Contents:
Sockets
Datagram Sockets
Working with URLs
Web Browsers and Handlers
Writing a Content Handler
Writing a Protocol Handler
The network is the soul of Java. Most of what is new and exciting about Java centers around the potential for new kinds of dynamic, networked applications. This chapter discusses the java.net package, which contains classes for communications and working with networked resources. These classes fall into two categories: the sockets API and classes for working with Uniform Resource Locators (URLs). Figure 9.1 shows all of the classes in java.net.
Java's sockets interface provides access to the standard network protocols used for communications between hosts on the Internet. Sockets are the mechanism underlying all other kinds of portable networked communications. Your processes can use sockets to communicate with a server or peer applications on the Net, but you have to implement your own application-level protocols for handling and interpreting the data. Higher-level functionality, like remote procedure calls and distributed objects, are implemented with sockets.
The Java URL classes provide an API for accessing well-defined networked resources, like documents and applications on servers. The classes use an extensible set of prefabricated protocol and content handlers to perform the necessary communication and data conversion for accessing URL resources. With URLs, an application can fetch a complete file or database record from a server on the network with just a few lines of code. Applications like Web browsers, which deal with networked content, use the URL class to simplify the task of network programming. They also take advantage of the dynamic nature of Java, which allows handlers for new types of URLs to be added on the fly. As new types of servers and new formats for content evolve, additional URL handlers can be supplied to retrieve and interpret the data without modifying the original application.
In this chapter, I'll try to provide some practical and realistic examples of Java network programming using both APIs. Sadly, the current state of affairs is disappointing. The real release of HotJava isn't available, and Netscape Navigator imposes many restrictions on what you can do. In addition, a few standards that we need haven't been defined. Nevertheless, you can use all of Java's networking capabilities to build your own free-standing applications. I'll point out the shortcomings with Netscape Navigator and the standards scene as I go along.
Sockets are a low-level programming interface for networked communications. They send streams of data between applications that may or may not be on the same host. Sockets originated in BSD UNIX and are, in other languages, hairy and complicated things with lots of small parts that can break off and choke little children. The reason for this is that most socket APIs can be used with almost any kind of underlying network protocol. Since the protocols that transport data across the network can have radically different features, the socket interface can be quite complex. (For a discussion of sockets in general, see UNIX Network Programming, by Richard Stevens [Prentice-Hall].)
Java supports a simplified object-oriented interface to sockets that makes network communications considerably easier. If you have done network programming using sockets in C or another structured language, you should be pleasantly surprised at how simple things can be when objects encapsulate the gory details. If this is the first time you've come across sockets, you'll find that talking to another application can be as simple as reading a file or getting user input. Most forms of I/O in Java, including network I/O, use the stream classes described in Chapter 8, Input/Output Facilities. Streams provide a unified I/O interface; reading or writing across the Internet is similar to reading or writing a file on the local system.
Java provides different kinds of sockets to support two distinct classes of underlying protocols. In this first section, we'll look at Java's Socket class, which uses a connection-oriented protocol. A connection-oriented protocol gives you the equivalent of a telephone conversation; after establishing a connection, two applications can send data back and forth; the connection stays in place even when no one is talking. The protocol ensures that no data is lost and that it always arrives in order. In the next section we'll look at the DatagramSocket class, which uses a connectionless protocol. A connectionless protocol is more like the postal service. Applications can send short messages to each other, but no attempt is made to keep the connection open between messages, to keep the messages in order, or even to guarantee that they arrive.
In theory, just about any protocol family can be used underneath the socket layer: Novell's IPX, Apple's AppleTalk, even the old ChaosNet protocols. But this isn't a theoretical world. In practice, there's only one protocol family people care about on the Internet, and only one protocol family Java supports: the Internet protocols, IP. The Socket class speaks TCP, and the DatagramSocket class speaks UDP, both standard Internet protocols. These protocols are available on any system that is connected to the Internet.
When writing network applications, it's common to talk about clients and servers. The distinction is increasingly vague, but the side that initiates the conversation is usually the client. The side that accepts the request to talk is usually the server. In the case where there are two peer applications using sockets to talk, the distinction is less important, but for simplicity we'll use the above definition.
For our purposes, the most important difference between a client and a server is that a client can create a socket to initiate a conversation with a server application at any time, while a server must prepare to listen for incoming conversations in advance. The java.net.Socket class represents a single side of a socket connection on either the client or server. In addition, the server uses the java.net.ServerSocket class to wait for connections from clients. An application acting as a server creates a ServerSocket object and waits, blocked in a call to its accept() method, until a connection arrives. When it does, the accept() method creates a Socket object the server uses to communicate with the client. A server carries on multiple conversations at once; there is only a single ServerSocket, but one active Socket object for each client, as shown in Figure 9.2.
A client needs two pieces of information to locate and connect to another server on the Internet: a hostname (used to find the host's network address) and a port number. The port number is an identifier that differentiates between multiple clients or servers on the same host. A server application listens on a prearranged port while waiting for connections. Clients select the port number assigned to the service they want to access. If you think of the host computers as hotels and the applications as guests, then the ports are like the guests' room numbers. For one guest to call another, he or she must know the other party's hotel name and room number.
A client application opens a connection to a server by constructing a Socket that specifies the hostname and port number of the desired server:
try { Socket sock = new Socket("wupost.wustl.edu", 25); } catch ( UnknownHostException e ) { System.out.println("Can't find host."); } catch ( IOException e ) { System.out.println("Error connecting to host."); }
This code fragment attempts to connect a Socket to port 25 (the SMTP mail service) of the host wupost.wustl.edu. The client handles the possibility that the hostname can't be resolved (UnknownHostException) and that it might not be able to connect to it (IOException).
As an alternative to using a hostname, you can provide a string version of the host's IP address:
Socket sock = new Socket("128.252.120.1", 25); // wupost.wustl.edu
Once a connection is made, input and output streams can be retrieved with the Socket getInputStream() and getOutputStream() methods. The following (rather arbitrary and strange) conversation illustrates sending and receiving some data with the streams. Refer to Chapter 8, Input/Output Facilities for a complete discussion of working with streams.
try { Socket server = new Socket("foo.bar.com", 1234); InputStream in = server.getInputStream(); OutputStream out = server.getOutputStream(); // Write a byte out.write(42); // Say "Hello" (send newline delimited string) PrintStream pout = new PrintStream( out ); pout.println("Hello!"); // Read a byte Byte back = in.read(); // Read a newline delimited string DataInputStream din = new DataInputStream( in ); String response = din.readLine(); server.close(); } catch (IOException e ) { }
In the exchange above, the client first creates a Socket for communicating with the server. The Socket constructor specifies the server's hostname (foo.bar.com) and a prearranged port number (1234). Once the connection is established, the client writes a single byte to the server using the OutputStream's write() method. It then wraps a PrintStream around the OutputStream in order to send text more easily. Next, it performs the complementary operations, reading a byte from the server using InputStream's read() and then creating a DataInputStream from which to get a string of text. Finally, it terminates the connection with the close() method. All these operations have the potential to generate IOExceptions; the catch clause is where our application would deal with these.
After a connection is established, a server application uses the same kind of Socket object for its side of the communications. However, to accept a connection from a client, it must first create a ServerSocket, bound to the correct port. Let's recreate the previous conversation from the server's point of view:
// Meanwhile, on foo.bar.com... try { ServerSocket listener = new ServerSocket( 1234 ); while ( !finished ) { Socket aClient = listener.accept(); // wait for connection InputStream in = aClient.getInputStream(); OutputStream out = aClient.getOutputStream(); // Read a byte Byte importantByte = in.read(); // Read a string DataInputStream din = new DataInputStream( in ); String request = din.readLine(); // Write a byte out.write(43); // Say "Goodbye" PrintStream pout = new PrintStream( out ); pout.println("Goodbye!"); aClient.close(); } listener.close(); } catch (IOException e ) { }
First, our server creates a ServerSocket attached to port 1234. On some systems there are rules about what ports an application can use. Port numbers below 1024 are usually reserved for system processes and standard, well-known services, so we pick a port number outside of this range. The ServerSocket need be created only once. Thereafter we can accept as many connections as arrive.
Next we enter a loop, waiting for the accept() method of the ServerSocket to return an active Socket connection from a client. When a connection has been established, we perform the server side of our dialog, then close the connection and return to the top of the loop to wait for another connection. Finally, when the server application wants to stop listening for connections altogether, it calls the close() method of the ServerSocket.[1]
[1] A somewhat obscure security feature in TCP/IP specifies that if a server socket actively closes a connection while a client is connected, it may not be able to bind (attach itself) to the same port on the server host again for a period of time (the maximum time to live of a packet on the network). It's possible to turn off this feature, and it's likely that your Java implementation will have done so.
As you can see, this server is single-threaded; it handles one connection at a time; it doesn't call accept() to listen for a new connection until it's finished with the current connection. A more realistic server would have a loop that accepts connections concurrently and passes them off to their own threads for processing. (Our tiny HTTP daemon in a later section will do just this.)
The examples above presuppose the client has permission to connect to the server, and that the server is allowed to listen on the specified socket. This is not always the case. Specifically, applets and other applications run under the auspices of a SecurityManager that can impose arbitrary restrictions on what hosts they may or may not talk to, and whether they can listen for connections. The security policy imposed by the current version of Netscape Navigator allows applets to open socket connections only to the host that served them. That is, they can talk back only to the server from which their class files were retrieved. Applets are not allowed to open server sockets themselves.
Now, this doesn't meant an applet can't cooperate with its server to communicate with anyone, anywhere. A server could run a proxy that lets the applet communicate indirectly with anyone it likes. What the current security policy prevents is malicious applets roaming around inside corporate firewalls. It places the burden of security on the originating server, and not the client machine. Restricting access to the originating server limits the usefulness of "trojan" applications that do annoying things from the client side. You won't let your proxy mail bomb people, because you'll be blamed.
Many networked workstations run a time service that dispenses their local clock time on a well-known port. This was a precursor of NTP, the more general Network Time Protocol. In the next example, DateAtHost, we'll make a specialized subclass of java.util.Date that fetches the time from a remote host instead of initializing itself from the local clock. (See Chapter 7, Basic Utility Classes for a complete discussion of the Date class.)
DateAtHost connects to the time service (port 37) and reads four bytes representing the time on the remote host. These four bytes are interpreted as an integer representing the number of seconds since the turn of the century. DateAtHost converts this to Java's variant of the absolute time (milliseconds since January 1, 1970, a date that should be familiar to UNIX users) and then uses the remote host's time to initialize itself:
import java.net.Socket; import java.io.*; public class DateAtHost extends java.util.Date { static int timePort = 37; static final long offset = 2208988800L; // Seconds from century to // Jan 1, 1970 00:00 GMT public DateAtHost( String host ) throws IOException { this( host, timePort ); } public DateAtHost( String host, int port ) throws IOException { Socket sock = new Socket( host, port ); DataInputStream din = new DataInputStream(sock.getInputStream()); int time = din.readInt(); sock.close(); setTime( (((1L << 32) + time) - offset) * 1000 ); } }
That's all there is to it. It's not very long, even with a few frills. We have supplied two possible constructors for DateAtHost. Normally we'll use the first, which simply takes the name of the remote host as an argument. The second, overloaded constructor specifies the hostname and the port number of the remote time service. (If the time service were running on a nonstandard port, we would use the second constructor to specify the alternate port number.) This second constructor does the work of making the connection and setting the time. The first constructor simply invokes the second (using the this() construct) with the default port as an argument. Supplying simplified constructors that invoke their siblings with default arguments is a common and useful technique.
The second constructor opens a socket to the specified port on the remote host. It creates a DataInputStream to wrap the input stream and then reads a 4-byte integer using the readInt() method. It's no coincidence the bytes are in the right order. Java's DataInputStream and DataOutputStream classes work with the bytes of integer types in network byte order (most significant to least significant). The time protocol (and other standard network protocols that deal with binary data) also uses the network byte order, so we don't need to call any conversion routines. (Explicit data conversions would probably be necessary if we were using a nonstandard protocol, especially when talking to a non-Java client or server.) After reading the data, we're finished with the socket, so we close it, terminating the connection to the server. Finally, the constructor initializes the rest of the object by calling Date's setTime() method with the calculated time value.[2]
[2] The conversion first creates a long value, which is the unsigned equivalent of the integer time. It subtracts an offset to make the time relative to the epoch (January 1, 1970) rather than the century, and multiples by 1000 to convert to milliseconds.
The DateAtHost class can work with a time retrieved from a remote host almost as easily as Date is used with the time on the local host. The only additional overhead is that we have to deal with the possible IOException that can be thrown by the DateAtHost constructor:
try { Date d = new DateAtHost( "sura.net" ); System.out.println( "The time over there is: " + d ); int hours = d.getHours(); int minutes = d.getMinutes(); ... } catch ( IOException e ) { }
This example fetches the time at the host sura.net and prints its value. It then looks at some components of the time using the getHours() and getMinutes() methods of the Date class.
Have you ever wanted your very own Web server? Well, you're in luck. In this section, we're going to build TinyHttpd, a minimal but functional HTTP daemon. TinyHttpd listens on a specified port and services simple HTTP "get file" requests. They look something like this:
GET /path/filename [optional stuff]
Your Web browser sends one or more as lines for each document it retrieves. Upon reading the request, the server tries to open the specified file and send its contents. If that document contains references to images or other items to be displayed inline, the browser continues with additional GET requests. For best performance (especially in a time-slicing environment), TinyHttpd services each request in its own thread. Therefore, TinyHttpd can service several requests concurrently.
Over and above the limitations imposed by its simplicity, TinyHttpd suffers from the limitations imposed by the fickleness of filesystem access, as discussed in Chapter 8, Input/Output Facilities. It's important to remember that file pathnames are still architecture dependent--as is the concept of a filesystem to begin with. This example should work, as is, on UNIX and DOS-like systems, but may require some customizations to account for differences on other platforms. It's possible to write more elaborate code that uses the environmental information provided by Java to tailor itself to the local system. (Chapter 8, Input/Output Facilities gives some hints about how to do this).
WARNING:
This example will serve files from your host without protection. Don't try this at work.
Now, without further ado, here's TinyHttpd:
import java.net.*; import java.io.*; import java.util.*; public class TinyHttpd { public static void main( String argv[] ) throws IOException { ServerSocket ss = new ServerSocket(Integer.parseInt(argv[0])); while ( true ) new TinyHttpdConnection( ss.accept() ); } } class TinyHttpdConnection extends Thread { Socket sock; TinyHttpdConnection ( Socket s ) { sock = s; setPriority( NORM_PRIORITY - 1 ); start(); } public void run() { try { OutputStream out = sock.getOutputStream(); String req = new DataInputStream(sock.getInputStream()).readLine(); System.out.println( "Request: "+req ); StringTokenizer st = new StringTokenizer( req ); if ( (st.countTokens() >= 2) && st.nextToken().equals("GET") ) { if ( (req = st.nextToken()).startsWith("/") ) req = req.substring( 1 ); if ( req.endsWith("/") || req.equals("") ) req = req + "index.html"; try { FileInputStream fis = new FileInputStream ( req ); byte [] data = new byte [ fis.available() ]; fis.read( data ); out.write( data ); } catch ( FileNotFoundException e ) new PrintStream( out ).println("404 Not Found"); } else new PrintStream( out ).println( "400 Bad Request" ); sock.close(); } catch ( IOException e ) System.out.println( "I/O error " + e ); } }
Compile TinyHttpd and place it in your class path. Go to a directory with some interesting documents and start the daemon, specifying an unused port number as an argument. For example:
% java TinyHttpd 1234
You should now be able to use your Web browser to retrieve files from your host. You'll have to specify the nonstandard port number in the URL. For example, if your hostname is foo.bar.com, and you started the server as above, you could reference a file as in:
http://foo.bar.com:1234/welcome.html
TinyHttpd looks for files relative to its current directory, so the pathnames you provide should be relative to that location. Retrieved some files? Al'righty then, let's take a closer look.
TinyHttpd is comprised of two classes. The public TinyHttpd class contains the main() method of our standalone application. It begins by creating a ServerSocket, attached to the specified port. It then loops, waiting for client connections and creating instances of the second class, a TinyHttpdConnection thread, to service each request. The while loop waits for the ServerSocket accept() method to return a new Socket for each client connection. The Socket is passed as an argument to construct the TinyHttpdConnection thread that handles it.
TinyHttpdConnection is a subclass of Thread. It lives long enough to process one client connection and then dies. TinyHttpdConnection's constructor does three things. After saving the Socket argument for its caller, it adjusts its own priority and then invokes start() to bring its run() method to life. By lowering its priority to NORM_PRIORITY-1 (just below the default priority), we ensure that the threads servicing established connections won't block TinyHttpd's main thread from accepting new requests. (On a time-slicing system, this is less important.)
The body of TinyHttpdConnection's run() method is where all the magic happens. First, we fetch an OutputStream for talking back to our client. The second line reads the GET request from the InputStream into the variable req. This request is a single newline-terminated String that looks like the GET request we described earlier. Since this is the only time we read from this socket, it's hard to resist the urge to be terse. Alternatively, we could break that statement into three steps: getting the InputStream, creating the DataInputStream wrapper, and reading the line. The three-line version is certainly more readable and should not be noticeably slower.
We then parse the contents of req to extract a filename. The next few lines are a brief exercise in string manipulation. We create a StringTokenizer and make sure there are at least two tokens. Using nextToken(), we take the first token and make sure it's the word GET. (If both conditions aren't met, we have an error.) Then we take the next token (which should be a filename), assign it to req , and check whether it begins with "/". If so, we use substring() to strip the first character, giving us a filename relative to the current directory. If it doesn't begin with "/", the filename is already relative to the current directory. Finally, we check to see if the requested filename looks like a directory name (i.e., ends in slash) or is empty. In these cases, we append the familiar default filename index.html.
Once we have the filename, we try to open the specified file and load its contents into a large byte array. (We did something similar in the ListIt example in Chapter 8, Input/Output Facilities.) If all goes well, we write the data out to client on the OutputStream. If we can't parse the request or the file doesn't exist, we wrap our OutputStream with a PrintStream to make it easier to send a textual message. Then we return an appropriate HTTP error message. Finally, we close the socket and return from run(), removing our Thread.
The biggest problem with TinyHttpd is that there are no restrictions on the files it can access. With a little trickery, the daemon will happily send any file in your filesystem to the client. It would be nice if we could restrict TinyHttpd to files that are in the current directory, or a subdirectory. To make the daemon safer, let's add a security manager. I discussed the general framework for security managers in Chapter 7, Basic Utility Classes. Normally, a security manager is used to prevent Java code downloaded over the Net from doing anything suspicious. However, a security manager will serve nicely to restrict file access in a self-contained application.
Here's the code for the security manager class:
import java.io.*; class TinyHttpdSecurityManager extends SecurityManager { public void checkAccess(Thread g) { }; public void checkListen(int port) { }; public void checkLink(String lib) { }; public void checkPropertyAccess(String key) { }; public void checkAccept(String host, int port) { }; public void checkWrite(FileDescriptor fd) { }; public void checkRead(FileDescriptor fd) { }; public void checkRead( String s ) { if ( new File(s).isAbsolute() || (s.indexOf("..") != -1) ) throw new SecurityException("Access to file : "+s+" denied."); } }
The heart of this security manager is the checkRead() method. It checks two things: it makes sure that the pathname we've been given isn't an absolute path, which could name any file in the filesystem; and it makes sure the pathname doesn't have a double dot (..) in it, which refers to the parent of the current directory. With these two restrictions, we can be sure (at least on a UNIX or DOS-like filesystem) that we have restricted access to only subdirectories of the current directory. If the pathname is absolute or contains "..", checkRead() throws a SecurityException.
The other do-nothing method implementations--e.g., checkAccess()--allow the daemon to do its work without interference from the security manager. If we don't install a security manager, the application runs with no restrictions. However, as soon as we install any security manager, we inherit implementations of many "check" routines. The default implementations won't let you do anything; they just throw a security exception as soon as they are called. We have to open holes so the daemon can do its own work; it still has to accept connections, listen on sockets, create threads, read property lists, etc. Therefore, we override the default checks with routines that allow these things.
Now you're thinking, isn't that overly permissive? Not for this application; after all, TinyHttpd never tries to load foreign classes from the Net. The only code we are executing is our own, and it's assumed we won't do anything dangerous. If we were planning to execute untrusted code, the security manager would have to be more careful about what to permit.
Now that we have a security manager, we must modify TinyHttpd to use it. Two changes are necessary: we must install the security manager and catch the security exceptions it generates. To install the security manager, add the following code at the beginning of TinyHttpd's main() method:
System.setSecurityManager( new TinyHttpdSecurityManager() );
To catch the security exception, add the following catch clause after FileNotFoundException's catch clause:
catch ( SecurityException e ) new PrintStream( out ).println( "403 Forbidden" );
Now the daemon can't access anything that isn't within the current directory or a subdirectory. If it tries to, the security manager throws an exception and prevents access to the file. The daemon then returns a standard HTTP error message to the client.
TinyHttpd still has room for improvement. First, it consumes a lot of memory by allocating a huge array to read the entire contents of the file all at once. A more realistic implementation would use a buffer and send large amounts of data in several passes. TinyHttpd also fails to deal with simple things like directories. It wouldn't be hard to add a few lines of code (again, refer to the ListIt example in Chapter 8, Input/Output Facilities) to read a directory and generate linked HTML listings like most Web servers do.