Dr. Dobb's | APR Networking & the Reactor Pattern

The Apache Portable Runtime is a C-based API that provides abstractions ranging from memory management to telling time.

October 03, 2006
URL:http://www.drdobbs.com/cpp/apr-networking-the-reactor-pattern/193101548

Ethan is a freelance technology consultant specializing in C++, Java, and Linux. He can be reached at [email protected]

Source code is said to be portable if it compiles and runs in several environments without modification. Applications built on portable code have a potentially wide user base and streamlined development lifecycle, because teams don't have to maintain multiple code bases for disparate platforms.

One path to portability is to limit yourself to the language's base features. This is straightforward in languages such as Java and Perl, which include rich toolsets beyond the pure syntax. C and C++, on the other hand, don't address networking and other concepts required for real-world applications. Native-code portability involves messy preprocessor macros to detect and compensate for OS-specific oddities. Application teams either dedicate a portion of their staff to developing a portability layer in-house, or they acquire third-party solutions such as ACE or RogueWave.

Another contender in the arena of native-code portability is the Apache Portable Runtime (apr.apache.org). This C-based API provides abstractions ranging from memory management to telling time. APR developers focus on the OS-specific oddities so you can focus on your app. APR has been slowly developed over the years as part of the ubiquitous Apache web server, and has also been thoroughly road-tested in several other real-world applications. Better still, it is released under the cost-free, commercial-friendly Apache license.

In this article, I focus on APR's networking and polling routines. To that end, I present a straightforward network service, then revamp it a couple of times. The end result is an APR-based implementation of the Reactor pattern (www.cs.wustl.edu/~schmidt/patterns-ace.html). I built and tested the sample code under Fedora Core 4 using GCC 4.0.1 and APR 1.1.1 and 1.2.1. Because portability is the name of this game, I encourage you to build the sample code under your own operating system—Linux, Mac OS, Windows, or whatever—to confirm that it works as advertised.

(To make the code easier to digest, I've made it tough to maintain—it exposes APR in several places a production app would not; for example, the APRReactor class directly handles apr_pollfd_t values. In a real-world scenario, you'd do well to limit how far a third-party toolkit spreads throughout your app. Hiding such an API behind abstractions and interfaces makes it easier to swap out or remove later on.)

Socket Programming Basics

At a high level, a socket represents one endpoint of a communications channel. Network-based sockets transmit data across the wire, perhaps between two machines. Some operating systems support socket operations via the filesystem using a special file type. Stream-oriented TCP sockets maintain a steady connection between the two endpoints. Datagram-oriented UDP sockets are connectionless: they pass data in small, discrete packets, and neither the order nor the delivery of those packets is guaranteed.

Typically, one socket represents the server side and the other is the client side. (I prefer the term "service" to "server," since the latter also refers to the machine on which the service runs.) The service listens for a client connection, exchanges some data with the client, and the two disconnect.

In code, the notion of a socket differs by the implementation. Java and ACE wrap it in an object. Traditional UNIX sockets are a combination of a file descriptor, represented by a numeric primitive, and a data structure that contains network information, such as IP addresses.

Merely creating a socket variable doesn't do anything. You have to tell it to wait for a connection (server-side) or initiate a connection to something (client-side). To wait for a connection, you first bind the socket to an address and port. Only one entity may listen on a given address:port pair at a time; binding is a way an entity stakes its claim. To describe networking in restaurant terms, for example, binding is the equivalent of renting out a space—nothing's there yet, but soon will be.

A bound socket is simply held so no one else can use it. To be available for connections, you must then listen on the socket. This means opening up your restaurant's doors for diners.

Finally, you accept client connections to perform a data exchange. Accepting clients means your host or hostess is waiting by the restaurant's door to admit and seat clients. For a blocking accept call, your host does nothing else until a client arrives. In a nonblocking accept call, the host periodically checks for clients and tends to other matters.

Time Service: Set up the Listener

Using APR, I can put this restaurant analogy to work. In addition to being portable, APR's networking routines are more convenient to use than the UNIX-style builtins. Both use data structures to represent connections and address info, but APR includes routines to abstract the developer from the raw bit-twiddling.

The sample code is a simple network-based time service. I leave the client side as a reader exercise. For now, you can telnet to the service port to simulate a client. If the service listens on port 8123 of the localhost interface, for example, run:

 
telnet localhost 8123

to connect.

The first version of the code is in the file step1.cc. I've broken the important part of the code into two functions: main() and handleClient(). Granted, this would be unpleasant from a support perspective, but it's easier to describe here.

step1's main() sets up the listener and prepares for client connections; see Listing One. Following some basic APR setup, main() creates a blank socket structure of type apr_socket_t (1) and sets some options. APR_SO_REUSEADDR lets you rebind to an address:port pair that has just been released, instead of having to wait for a timeout. The TRUE value is an APR constant. APR doesn't use the C++ keyword true because it is a C API, and C (before C99) has no built-in Boolean type.

int main( const int argc , const char** argv ){
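   const char* listenString = argv[1] ;
   if( argc < 2 ){ return 1 ; } // a listen address such as localhost:8123 is required
   // ... initialize APR 
   // ... setup memory pool "mainMemPool" ...
   // ... create CommandInterpreter "interpreter" ...
   apr_status_t aprResult = APR_SUCCESS ; // status of the most recent APR call
   apr_socket_t* serverSocket ;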
   // create socket
   apr_socket_create(          /* 1 */
      &serverSocket , 
      APR_INET ,
      SOCK_STREAM ,
      APR_PROTO_TCP ,
      mainMemPool
   ) ;
   // set options
   apr_socket_opt_set(
      serverSocket ,
      APR_SO_REUSEADDR ,
      TRUE // "TRUE" --> APR constant
   ) ;
   // create sockaddr_t
   apr_sockaddr_t* sockAddr = NULL ;
   char* listenHost ;
   char* scopeID ; // unused
   apr_port_t listenPort ;
   apr_parse_addr_port(         /* 2 */
      &listenHost ,
      &scopeID ,
      &listenPort ,
      listenString ,
      mainMemPool
   ) ;
   apr_sockaddr_info_get(       /* 3 */
      &sockAddr ,
      listenHost ,
      APR_UNSPEC , // let system decide
      listenPort ,
      0 ,
      mainMemPool
   ) ;
   // bind
   aprResult = apr_socket_bind(     /* 4 */
      serverSocket ,
      sockAddr
   ) ;
   // listen
   aprResult = apr_socket_listen(   /* 5 */
      serverSocket ,
      15
   ) ;
   apr_socket_t* clientSocket = NULL ;
   while( true ){
      std::cout << "accepting ..." << std::endl ;
      aprResult = apr_socket_accept(    /* 6 */
         &clientSocket ,
         serverSocket ,
         mainMemPool
      ) ;
      try{
         handleClient( clientSocket , interpreter ) ;
      }catch( const std::runtime_error& swallowed ){
         // ... handle exception ...
      }
      apr_socket_shutdown( clientSocket , APR_SHUTDOWN_READWRITE ) ;
      apr_socket_close( clientSocket ) ;
      std::cout << "closed" << std::endl ;
      clientSocket = NULL ;
   }
   // cleanup
   aprResult = apr_socket_close( serverSocket ) ;
} // main()

Listing One

Next, main() calls the apr_parse_addr_port() convenience routine (2) to parse the program's command-line argument into a host/address and port pair. The call to apr_sockaddr_info_get() (3) then resolves that pair into an apr_sockaddr_t structure, which describes the address and port the socket will listen on.
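To make the parsing step concrete, here is a minimal standard-library sketch of splitting a listen string such as localhost:8123 into its host and port. It is not APR's implementation: HostPort and splitHostPort() are hypothetical names, and unlike apr_parse_addr_port() the sketch ignores IPv6 literals, scope IDs, and port-only strings.

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical result type; APR returns these pieces through out-parameters.
struct HostPort {
   std::string host ;
   unsigned short port ;
} ;

// Split "host:port" at the last colon and validate the port range.
// Throws std::runtime_error on malformed input.
inline HostPort splitHostPort( const std::string& listenString ){
   const std::string::size_type colon = listenString.rfind( ':' ) ;
   if( colon == std::string::npos || colon == 0 || colon + 1 == listenString.size() ){
      throw std::runtime_error( "expected host:port, got \"" + listenString + "\"" ) ;
   }
   const std::string portText = listenString.substr( colon + 1 ) ;
   char* end = nullptr ;
   const long port = std::strtol( portText.c_str() , &end , 10 ) ;
   if( *end != '\0' || port < 1 || port > 65535 ){
      throw std::runtime_error( "invalid port: " + portText ) ;
   }
   assert( port >= 1 && port <= 65535 ) ; // guaranteed by the check above
   return HostPort{ listenString.substr( 0 , colon ) , static_cast<unsigned short>( port ) } ;
}
```

Calling splitHostPort( "localhost:8123" ) yields the host "localhost" and the port 8123; a string with no colon raises an exception instead of silently listening somewhere unexpected.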

Per its name, apr_socket_bind() attempts to bind to the specified address and port (4). Barring a failure, step1 calls apr_socket_listen() to open the door for client connections (5). Finally, apr_socket_accept() (6) blocks waiting for a client connection. When it returns, it has created a new socket that's used for the conversation with the client.

Returning to the restaurant analogy, these steps put the "open" sign in the window. Now, you wait for a customer.

Time Service: Handling Client Connections

step1's handleClient() (Listing Two, available electronically; see www.ddj.com/code/) is in charge of the client conversation. It runs in a loop, acting much like a command shell. There are only three valid commands:

now: retrieves the current time in CST.
help: shows a brief help message.
quit: ends the conversation.

handleClient() shows the basics of using APR to read from and write to a socket. It breaks the loop when the client enters quit or explicitly breaks the connection at the socket level. (In telnet, do the latter by typing control-], then quit.)

apr_socket_recv() (1) reads data from the client socket. The size parameter serves a dual purpose: On the way into the function, it states the size of the input buffer; on the way out, it states how many bytes were written to the buffer (that is, read from the client). This call blocks if there is no data waiting to be read. You can call apr_socket_timeout_set() to set a socket's read/write maximum wait time, or set the option APR_SO_NONBLOCK to make the socket completely nonblocking.
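The in/out size convention is easier to see in isolation. The sketch below does not use APR; FakeSocket is a hypothetical in-memory stand-in whose recv() follows the same convention as the size parameter described above: capacity on the way in, byte count on the way out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical in-memory stand-in for a connected socket.
class FakeSocket {
public:
   explicit FakeSocket( const std::string& pending ) : pending_( pending ) {}

   // On entry, *len is the capacity of buf; on return, *len is the number
   // of bytes actually copied into buf. Returns false once no data is left
   // (playing the role of an end-of-stream status).
   bool recv( char* buf , std::size_t* len ){
      assert( buf != nullptr && len != nullptr ) ;
      const std::size_t count = std::min( *len , pending_.size() ) ;
      std::memcpy( buf , pending_.data() , count ) ;
      pending_.erase( 0 , count ) ;
      *len = count ;
      return count > 0 ;
   }
private:
   std::string pending_ ;
} ;

// Read one chunk and return it as a string. The buffer is not
// null-terminated, so the returned length is the only reliable measure
// of how much was read.
inline std::string readChunk( FakeSocket& sock ){
   char buffer[ 8 ] ;
   std::size_t size = sizeof( buffer ) ; // in: capacity
   sock.recv( buffer , &size ) ;         // out: bytes read
   return std::string( buffer , size ) ;
}
```

Because the buffer here holds only eight bytes, a longer message arrives in pieces, which is exactly why the caller must rely on the returned size rather than on the buffer's contents.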

Note that the sample code cheats somewhat: All commands are a single line of plaintext, and are expected to be smaller than the receive buffer. In a more complex protocol, the service may require several reads from the client, storing the results in a temporary buffer, in order to build a complete request. (Consider, for example, a large HTTP POST operation.)
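One way to lift that restriction is to accumulate whatever each read returns and only act once a full line is available. The following is a rough sketch, not part of the sample code; LineAssembler is a hypothetical helper, and it assumes requests are terminated by a newline (optionally preceded by a carriage return, as telnet sends).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: collects raw chunks and hands back complete lines.
class LineAssembler {
public:
   // Append whatever a single read returned; it may hold a partial line,
   // exactly one line, or several lines.
   void append( const std::string& chunk ){ buffer_ += chunk ; }

   // If a full line is buffered, move it into 'line' (without the line
   // terminator) and return true; otherwise leave 'line' untouched.
   bool nextLine( std::string& line ){
      const std::string::size_type newline = buffer_.find( '\n' ) ;
      if( newline == std::string::npos ){
         return false ; // request still incomplete; keep reading
      }
      assert( newline < buffer_.size() ) ;
      std::string::size_type end = newline ;
      if( end > 0 && buffer_[ end - 1 ] == '\r' ){
         --end ; // drop the CR of a CR LF pair
      }
      line = buffer_.substr( 0 , end ) ;
      buffer_.erase( 0 , newline + 1 ) ;
      return true ;
   }
private:
   std::string buffer_ ;
} ;
```

A read loop would call append() after every receive and then drain nextLine() until it returns false, leaving any trailing partial command in the buffer for the next read.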

The function CommandInterpreter::processCommand() (2) returns a numeric constant based on the input string read from the client. This conveniently dovetails with a switch() (3) to determine the action: Send the time, disconnect, send a help message, or send an error message. Each case: label appends some data to a std::stringstream buffer to construct the response message.
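Listing Two isn't reproduced here, so the following is only a guess at the general shape of that interpreter-plus-switch arrangement. The enum values, the buildResponse() helper, and the reply strings are all hypothetical, and processCommand() is written as a free function rather than a CommandInterpreter member to keep the sketch short. The caller passes in the time text so the result is deterministic.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical command codes; the article's listing may use different names.
enum Command { CMD_NOW , CMD_HELP , CMD_QUIT , CMD_UNKNOWN } ;

// Map one line of client input to a command code.
inline Command processCommand( const std::string& input ){
   if( input == "now" )  return CMD_NOW ;
   if( input == "help" ) return CMD_HELP ;
   if( input == "quit" ) return CMD_QUIT ;
   return CMD_UNKNOWN ;
}

// Build the reply for one command. 'keepGoing' is set to false when the
// conversation should end.
inline std::string buildResponse( const std::string& input ,
                                  const std::string& timeText ,
                                  bool& keepGoing ){
   std::stringstream response ;
   keepGoing = true ;
   switch( processCommand( input ) ){
      case CMD_NOW:
         response << timeText << "\r\n" ;
         break ;
      case CMD_HELP:
         response << "commands: now, help, quit\r\n" ;
         break ;
      case CMD_QUIT:
         response << "goodbye\r\n" ;
         keepGoing = false ;
         break ;
      case CMD_UNKNOWN:
         response << "unknown command: " << input << "\r\n" ;
         break ;
      default:
         assert( false && "unhandled command code" ) ;
         break ;
   }
   return response.str() ;
}
```

Keeping the parsing (processCommand) separate from the reply construction means new commands touch one enum value, one comparison, and one case label.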

After the switch() block, handleClient() sends the buffer's contents back to the client. Similar to receiving data, the call to apr_socket_send() (4) uses the size parameter to declare both the buffer's size (on the way in) and how many bytes were written (on the way out). Note that it's not necessary to pass the function a null-terminated string, because you tell it how many bytes to send from the buffer.
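Because the size comes back as "bytes actually written," a send call may report fewer bytes than were offered. A defensive caller loops until everything has gone out. The sketch below shows that loop against ThrottledSocket, a hypothetical in-memory stand-in that accepts only a few bytes per call; it is an illustration of the convention, not the sample code's approach.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical in-memory stand-in for a socket whose send() may accept
// fewer bytes than offered, as a real socket can when its buffer is full.
class ThrottledSocket {
public:
   explicit ThrottledSocket( std::size_t maxPerCall ) : maxPerCall_( maxPerCall ) {}

   // On entry, *len is the number of bytes offered; on return, *len is the
   // number of bytes actually accepted. Returns false on failure.
   bool send( const char* buf , std::size_t* len ){
      assert( buf != nullptr && len != nullptr ) ;
      const std::size_t count = std::min( *len , maxPerCall_ ) ;
      received_.append( buf , count ) ;
      *len = count ;
      return count > 0 ;
   }
   const std::string& received() const { return received_ ; }
private:
   std::size_t maxPerCall_ ;
   std::string received_ ;
} ;

// Keep sending until the whole message has gone out or send() fails.
inline bool sendAll( ThrottledSocket& sock , const std::string& message ){
   std::size_t offset = 0 ;
   while( offset < message.size() ){
      std::size_t size = message.size() - offset ; // in: bytes offered
      if( !sock.send( message.data() + offset , &size ) ){
         return false ;
      }
      offset += size ;                             // out: bytes written
   }
   return true ;
}
```

Note that sendAll() passes message.data() plus a length, never a C string, which mirrors the point that no terminator is needed.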

Polling Concepts

The initial time service is akin to a restaurant with one table and one waiter: It can only service one client at a time. Everyone else waits outside until the current client has finished their business. Chances are, you do not require (nor want) the waiter at the table for the entire meal; instead, he can periodically check in on you and divide the rest of his time among several other tables.

This recurring check for activity is called "polling," and for socket programming it offers a cheap form of multitasking. A program can keep track of client sockets and cycle through those that need attention. Under UNIX-like operating systems, the poll() and select() functions determine which sockets in a given set have waiting data.
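For readers who haven't used poll() directly, here is a small POSIX-only sketch (it will not build on Windows). readableIndices() and pollDemo() are hypothetical names; the demo uses two pipes in place of client sockets so that it needs no network setup.

```cpp
#include <cassert>
#include <cstddef>
#include <poll.h>
#include <unistd.h>
#include <vector>

// Return the positions (within 'descriptors') of the descriptors that have
// data waiting to be read. Waits at most 'timeoutMs' milliseconds.
inline std::vector<std::size_t> readableIndices( const std::vector<int>& descriptors ,
                                                 int timeoutMs ){
   std::vector<pollfd> watched ;
   for( std::size_t ix = 0 ; ix < descriptors.size() ; ++ix ){
      pollfd entry ;
      entry.fd = descriptors[ ix ] ;
      entry.events = POLLIN ;  // interested in "data ready to read"
      entry.revents = 0 ;      // filled in by poll()
      watched.push_back( entry ) ;
   }
   std::vector<std::size_t> ready ;
   const int count = poll( watched.data() , watched.size() , timeoutMs ) ;
   if( count <= 0 ){
      return ready ; // timeout (0) or error (-1): nothing to report
   }
   for( std::size_t ix = 0 ; ix < watched.size() ; ++ix ){
      if( watched[ ix ].revents & POLLIN ){
         ready.push_back( ix ) ;
      }
   }
   assert( ready.size() <= static_cast<std::size_t>( count ) ) ;
   return ready ;
}

// Demonstration: two pipes stand in for two client sockets; only the
// second one has data waiting.
inline std::vector<std::size_t> pollDemo(){
   int quiet[ 2 ] ;
   int busy[ 2 ] ;
   if( pipe( quiet ) != 0 ){
      return std::vector<std::size_t>() ;
   }
   if( pipe( busy ) != 0 ){
      close( quiet[ 0 ] ) ; close( quiet[ 1 ] ) ;
      return std::vector<std::size_t>() ;
   }
   const ssize_t written = write( busy[ 1 ] , "now\n" , 4 ) ;
   std::vector<int> readEnds ;
   readEnds.push_back( quiet[ 0 ] ) ;
   readEnds.push_back( busy[ 0 ] ) ;
   std::vector<std::size_t> ready ;
   if( written == 4 ){
      ready = readableIndices( readEnds , 1000 ) ;
   }
   close( quiet[ 0 ] ) ; close( quiet[ 1 ] ) ;
   close( busy[ 0 ] ) ;  close( busy[ 1 ] ) ;
   return ready ;
}
```

pollDemo() reports only the second descriptor as ready, so a service built this way would read from that one without risking a blocked read on the idle one.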

The Reactor pattern formalizes polling's event-driven model into a framework. Its unit of currency is the handle, which is something that can receive events. Here, a handle is a socket (or a variable related to a socket), and events indicate that data is waiting; that is, attempts to accept a client or read data will not block.

Each handle has an associated handler that processes events. For example, a Reactor implementation of the time service would have handlers to process the commands sent by the remote clients.

The central Reactor object ties all of this together. Code registers handles with the Reactor, which runs a loop that checks for events on those handles. For each handle that has a waiting event, the Reactor calls its associated handler.
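Stripped of any networking, the moving parts described so far fit in a few dozen lines. The sketch below is a toy, written against C++11, and every name in it (Handle, EventHandler, Demultiplexer, Reactor, RecordingHandler) is hypothetical rather than taken from the sample code. The piece that finds ready handles is injected as a function so the dispatch logic can be exercised without real sockets.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

typedef int Handle ; // hypothetical: any value the demultiplexer understands

// Base type for anything that can react to an event on a handle.
class EventHandler {
public:
   virtual ~EventHandler() {}
   virtual void handleEvent() = 0 ;
} ;

// Given the registered handles, report which ones have waiting events.
// A real implementation would call select(), poll(), or similar.
typedef std::function< std::vector<Handle>( const std::vector<Handle>& ) > Demultiplexer ;

class Reactor {
public:
   explicit Reactor( Demultiplexer demux ) : demux_( demux ) {}

   void registerHandle( Handle handle , EventHandler* handler ){
      assert( handler != nullptr ) ;
      handlers_[ handle ] = handler ;
   }

   // One iteration of the event loop: find ready handles, call their
   // handlers. Returns the number of handlers called.
   std::size_t runOnce(){
      std::vector<Handle> registered ;
      for( std::map<Handle,EventHandler*>::const_iterator it = handlers_.begin() ;
           it != handlers_.end() ; ++it ){
         registered.push_back( it->first ) ;
      }
      std::size_t dispatched = 0 ;
      const std::vector<Handle> ready = demux_( registered ) ;
      for( std::size_t ix = 0 ; ix < ready.size() ; ++ix ){
         std::map<Handle,EventHandler*>::iterator found = handlers_.find( ready[ ix ] ) ;
         if( found != handlers_.end() ){
            found->second->handleEvent() ;
            ++dispatched ;
         }
      }
      return dispatched ;
   }
private:
   Demultiplexer demux_ ;
   std::map<Handle,EventHandler*> handlers_ ;
} ;

// Example handler that records which connection it was called for.
class RecordingHandler : public EventHandler {
public:
   RecordingHandler( const std::string& name , std::vector<std::string>& log )
      : name_( name ) , log_( log ) {}
   void handleEvent(){ log_.push_back( name_ ) ; }
private:
   std::string name_ ;
   std::vector<std::string>& log_ ;
} ;
```

A service would call runOnce() in a loop. Handlers run one after another within an iteration, so a slow handler delays every handle behind it.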

How the Reactor determines whether there are waiting events depends on the implementation. The textbook Reactor description calls this piece a "synchronous event demultiplexer." That's a five-dollar word for calls such as select() or poll().

The basic Reactor isn't as responsive as a multithreaded service; on each iteration of its event loop, it processes waiting events in serial fashion. That means a couple of slow connections at the head of the line can delay service to other connections in the set. Nonetheless, it's cheap to set up and will likely suffice for a small and/or low-activity client base. (For a more high-powered solution, consider the Proactor pattern described in Pattern-Oriented Software Architecture, Volume 2, commonly abbreviated POSA2.)

Time Service, Take 2: The Reactor

APR has all the tools necessary to implement a Reactor. The sample program step2 is a Reactor-based refinement of the time service.

The first thing to notice is that step2 wraps APR sockets in objects to simplify calls. APRSocketConnection and APRSocketListener are simple implementations of client and server sockets, respectively.

The handle in the sample APR-based Reactor is the apr_pollfd_t data type (Listing Three, available electronically). Unlike the traditional UNIX handle (a primitive), apr_pollfd_t is a full-blown structure. The union desc is the low-level descriptor—here, the familiar APR socket type—and desc_type lets other code know whether the descriptor is a socket (APR_POLL_SOCKET) or file (APR_POLL_FILE). Client code sets the reqevents member to express interest in certain types of events (APR_POLLIN) and APR polling routines set the rtnevents member to describe the type of event that was received. The client_data member is of type void*, so client code can attach any pointer to the handle structure.

The Reactor can't watch for events on handles it doesn't know. Client code calls APRReactor's registerHandle() member to register handles with it. In turn, registerHandle() calls apr_pollset_add() to add the handle to its watch list, of type apr_pollset_t; see Listing Four (available electronically).

On each iteration, the Reactor's event loop (Listing Five, available electronically) first calls apr_pollset_poll() to get an array of handles that have waiting events (1). This call blocks until there is activity on at least one of the handles. For each handle in the array, the event loop checks whether the returned event type (apr_pollfd_t.rtnevents) is an error (APR_POLLNVAL or APR_POLLERR). Otherwise, if the requested event type (apr_pollfd_t.reqevents) matches the returned event type, the event loop calls the handler to do some work and moves on to the next handle.
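The per-handle decision in that loop boils down to two bit tests. The sketch below isolates it. The flag names echo APR's, but the numeric values are invented for illustration and are NOT APR's definitions; the Action enum and classify() are hypothetical. Listing Five isn't shown here, so the assumption that an errored handle gets dropped is mine.

```cpp
#include <cassert>

// Hypothetical bit flags named after APR's; the values are made up.
enum EventBits {
   EV_POLLIN   = 0x01 , // data ready to read
   EV_POLLERR  = 0x10 , // error on the descriptor
   EV_POLLNVAL = 0x40   // descriptor is invalid
} ;

enum Action { ACTION_DISPATCH , ACTION_CLEANUP , ACTION_IGNORE } ;

// Decide what to do with one handle returned by the poll call.
// 'reqevents' is what the handle asked for; 'rtnevents' is what happened.
inline Action classify( int reqevents , int rtnevents ){
   assert( reqevents != 0 ) ; // a registered handle must request something
   if( rtnevents & ( EV_POLLERR | EV_POLLNVAL ) ){
      return ACTION_CLEANUP ;  // error: assume the handle should be dropped
   }
   if( rtnevents & reqevents ){
      return ACTION_DISPATCH ; // an event we asked for: call the handler
   }
   return ACTION_IGNORE ;
}
```

Testing for errors first matters: a descriptor can report both readable data and an error in the same pass, and the error check should win.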

In other Reactor implementations, this is where you'd have to use a std::map<> to track handle-handler associations. The sample code assigns the handler to the handle's client_data member, such that there's no need to keep a separate map. That means getting to the handler is as simple as casting apr_pollfd_t.client_data to the proper type (2).
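The opaque-pointer trick can be sketched without APR. PollEntry below is a hypothetical stand-in for the two apr_pollfd_t members that matter here, and EchoHandler, makeEntry(), and dispatch() are likewise invented for the example.

```cpp
#include <cassert>

class Handler {
public:
   virtual ~Handler() {}
   virtual bool doRead() = 0 ;
} ;

// Hypothetical stand-in for the part of apr_pollfd_t used here.
struct PollEntry {
   int descriptor ;
   void* client_data ; // opaque pointer owned by the caller
} ;

// Trivial handler that counts how often it was called.
class EchoHandler : public Handler {
public:
   EchoHandler() : reads_( 0 ) {}
   bool doRead(){ ++reads_ ; return true ; }
   int reads() const { return reads_ ; }
private:
   int reads_ ;
} ;

// Registration side: the parameter type converts the pointer to Handler*
// before it is stored as void*, so the cast back is exact.
inline PollEntry makeEntry( int descriptor , Handler* handler ){
   PollEntry entry ;
   entry.descriptor = descriptor ;
   entry.client_data = handler ;
   return entry ;
}

// Dispatch side: recover the handler from the opaque pointer.
inline bool dispatch( const PollEntry& entry ){
   Handler* handler = static_cast<Handler*>( entry.client_data ) ;
   assert( handler != nullptr ) ;
   return handler->doRead() ;
}
```

The one trap is type mismatch: a pointer stored as one type and cast back as another (say, stored as a derived type and read as the base) is not guaranteed to work, which is why makeEntry() takes a Handler* rather than a void*.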

If the handler call throws an exception or returns false, the Reactor adds its handle to a cleanup list. All handlers in this list are unregistered at the end of an event loop iteration (3). That covers housekeeping.
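A deferred cleanup list of this kind might look like the following sketch. It uses a plain vector of handler pointers in place of the poll set, and dispatchAndClean() plus the three test handlers are hypothetical; the real code unregisters apr_pollfd_t handles rather than erasing vector elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

class Handler {
public:
   virtual ~Handler() {}
   virtual bool doRead() = 0 ; // false means "unregister me"
} ;

// Call every ready handler, then unregister the ones that failed.
// Removal waits until the end of the iteration, mirroring the cleanup
// list described above. Returns the number of handlers removed.
inline std::size_t dispatchAndClean( std::vector<Handler*>& registered ,
                                     const std::vector<Handler*>& ready ){
   std::vector<Handler*> cleanup ;
   for( std::size_t ix = 0 ; ix < ready.size() ; ++ix ){
      assert( ready[ ix ] != nullptr ) ;
      bool keep = false ;
      try{
         keep = ready[ ix ]->doRead() ;
      }catch( const std::exception& ){
         keep = false ; // a throwing handler is treated like a finished one
      }
      if( !keep ){
         cleanup.push_back( ready[ ix ] ) ;
      }
   }
   for( std::size_t ix = 0 ; ix < cleanup.size() ; ++ix ){
      registered.erase(
         std::remove( registered.begin() , registered.end() , cleanup[ ix ] ) ,
         registered.end()
      ) ;
   }
   return cleanup.size() ;
}

// Test doubles for the three outcomes.
class KeepHandler  : public Handler { public: bool doRead(){ return true ; } } ;
class QuitHandler  : public Handler { public: bool doRead(){ return false ; } } ;
class ThrowHandler : public Handler {
public:
   bool doRead(){ throw std::runtime_error( "connection reset" ) ; }
} ;
```

One plausible reason to defer the removal is that the set being polled is never altered while the current batch of events is still being processed; a misbehaving handler is simply gone by the next iteration.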

Like any good framework, the Reactor knows little about a handler's details; it only knows to call member functions on the base Handler type. This Handler interface defines two member functions: getHandle() returns the raw handle that is registered with the Reactor. doRead() is called in response to a read event.

An AcceptorHandler wraps a socket listener. Its constructor (Listing Six, also available electronically) creates an apr_pollfd_t and assigns itself as the client_data member. The socket is the wrapped socket (here, hidden behind the APRSocketListener object). Setting the reqevents member to APR_POLLIN tells the APR polling routines that this descriptor is interested in read events.

To an AcceptorHandler, a read event is a new client connection on its socket listener. That is, a call to accept a client connection won't block. The handler creates a new client socket and wraps it in a client handler, DataHandler. (This was what main() did in step1.) AcceptorHandler's doRead() always returns true. Remember, returning false will force the Reactor to unregister this handler, and there's no need to let one bad client connection shut down the listener.

A DataHandler sees an "in" or read event as an opportunity to read data from a client socket and write a response. DataHandler's doRead() holds a conversation with the remote client, just like step1's handleClient() function. doRead() returns false if the client has requested a disconnect using the time service's quit command.

You don't have to take my word for it: Start the app and connect to it from several telnet windows. You can hold multiple, concurrent conversations with the time service.

Conclusion

Subtle differences between operating systems' network stacks can hinder your efforts to write portable native-code apps. APR can bring this goal closer to reality, and in a fashion that doesn't litter your code with #ifdef directives. In this article, I introduced APR's OS-neutral networking and polling APIs. By no means is that the entire APR story; this toolkit also includes OS-neutral abstractions for threading, files, and even process handling.