Using Apache Portable Run-Time


Oct00: Using Apache Portable Run-Time

Ryan is a senior software engineer at Covalent Technologies and a member of the Apache Software Foundation. He can be reached at [email protected].


Most software developers would agree that writing truly portable C code that also performs well is difficult. The Apache Software Foundation (ASF) has wrestled with this challenge since the first version of Apache was released in 1995. The Apache web server powers more than 60 percent of the web sites on the Internet and, as of Version 1.3.12, has been ported to most varieties of UNIX, Windows, OS/2, BeOS, and some mainframes. Because this level of portability was achieved with #ifdefs, Apache has become almost impossible to maintain. For Apache 2.0, the Apache Group realized that new porting methods would be needed if Apache was going to continue to expand its user base. To this end, the ASF began a new project, the Apache Portable Run-Time (APR), to abstract platform differences. While this article discusses the specifics of APR, with primary focus on Windows and UNIX, many of the details apply to portability problems in general.

APR's Goals

The original goal of APR was to provide a small library that Apache could use to handle portability issues: to port Apache to a new platform, you would simply implement APR and be done. This goal turned out to be infeasible, because some things simply cannot be abstracted. For example, the way Apache maps a request onto an execution primitive (a process or a thread) is essential to good performance, but the best mapping differs so much from platform to platform that it cannot be hidden behind a common interface. As a result, the plans for APR were modified.

The next plan was for APR to provide a common interface to a set of functions across all platforms. This also turned out to be impossible, because some platforms do not support features that APR would otherwise implement. What finally worked was a common interface to all APR functions across most platforms, combined with feature macros. For example, not all platforms can map files into memory. On platforms that can, APR defines the macro APR_HAS_MMAP as true (1); on all other platforms it defines it as false (0). This approach lets you make decisions based on the features a platform provides instead of on which platform the program is being compiled for.
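A minimal sketch of the feature-macro pattern in plain C follows. HAS_MMAP is a hypothetical stand-in for a configure-generated macro such as APR_HAS_MMAP, and the crude platform test that sets it is an assumption made only so the example compiles anywhere. Because a macro of this kind is always defined, to 1 or 0, code should test it with #if rather than #ifdef.

```c
#include <string.h>

/* HAS_MMAP is a hypothetical stand-in for a configure-generated feature
 * macro like APR_HAS_MMAP. It is always defined, to either 1 or 0; the
 * platform test below is only an assumption for this sketch. */
#ifndef HAS_MMAP
#if defined(__unix__) || defined(__APPLE__)
#define HAS_MMAP 1
#else
#define HAS_MMAP 0
#endif
#endif

/* Application code asks "is the feature there?", not "which OS is this?".
 * Note #if, not #ifdef: the macro is defined (as 0) even when the feature
 * is missing, so #ifdef would always select the first branch. */
const char *file_loading_strategy(void)
{
#if HAS_MMAP
    return "map the file into memory";
#else
    return "read the file into a buffer";
#endif
}
```

The caller never mentions an operating system; adding a new platform only means setting the macro correctly for it.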

Listing One shows code that Apache uses to set up shared memory. Listing One(a) is from Apache 1.3, where Apache is responsible for portability. Listing One(b) is from Apache 2.0, where Apache uses APR to provide portability. Listing One(b) is easier for new developers to read and understand because it hides how the feature is implemented and highlights the feature being used.

Another goal for the developers of APR was to use native functions whenever possible. When porting Apache to Windows, the developers learned that native functions generally work better than POSIX functions. When Apache was originally ported to Windows, POSIX function calls were used throughout the server. This caused instabilities and performance problems in Apache. For this reason, when developers began work on APR, taking advantage of native function calls was a priority.

The final goal for APR is to enforce portability whenever possible. This enforcement is accomplished by using incomplete types for most things in APR. Applications that use APR cannot access the fields of APR types directly. However, the APR developers have provided accessor functions for all of the important fields for each type.
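The incomplete-type technique can be sketched in a few lines of C. The counter_t type and its functions are hypothetical, not part of APR. For compactness the "header" and "implementation" halves are shown in one block; in a real library the struct definition would live only in the .c file, so application code could hold a counter_t pointer but never reach into its fields.

```c
#include <stdlib.h>

/* ---- counter.h: everything applications are allowed to see ---- */
/* An incomplete type: callers can hold a pointer to it but cannot see
 * or touch its fields. (counter_t is a hypothetical example type.) */
typedef struct counter_t counter_t;

counter_t *counter_create(long start);
void counter_increment(counter_t *c);
long counter_value(const counter_t *c);   /* accessor for the field */
void counter_destroy(counter_t *c);

/* ---- counter.c: the only file that knows the layout ---- */
struct counter_t {
    long value;
    /* Platform-specific members could be added here without breaking
     * any caller, because no caller depends on the layout. */
};

counter_t *counter_create(long start)
{
    counter_t *c = malloc(sizeof *c);
    if (c != NULL) {
        c->value = start;
    }
    return c;
}

void counter_increment(counter_t *c) { c->value++; }

long counter_value(const counter_t *c) { return c->value; }

void counter_destroy(counter_t *c) { free(c); }
```

Because callers compiled against counter.h alone cannot write `c->value`, any nonportable field access is a compile-time error rather than a porting surprise.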

APR provides a set of system-independent functions that make programs inherently portable. Those functions can be categorized based on what features they provide; see Table 1.

Error Codes in APR

Error codes are difficult to implement portably, because every platform reports errors differently. In APR, almost every function returns a status code; the only functions that do not are those that cannot fail on any platform, such as ap_MD5Init, which fills out an array that APR creates. APR status values fall into several groups. The first is APR_SUCCESS, which is 0 on every platform that supports APR, so you can compare a status value with 0 to determine whether an error occurred.

The remaining groups are APR errors, APR statuses, and system errors. An APR error occurs when an APR function fails because bad data was passed to it. An APR status occurs when an APR function has nonerror information for you. For example, the function ap_wait(), which checks the status of a child process, does not report success with APR_SUCCESS; instead it returns one of two status values, APR_CHILD_DONE or APR_CHILD_NOTDONE. The final group is system errors, which take one of two forms: an errno value or a converted native system error. The conversion is done by adding a defined constant to the native system error code, so APR returns a distinct error value regardless of what caused the error.
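The offset scheme can be sketched as a set of non-overlapping integer ranges. All names and boundaries below are hypothetical and chosen for illustration; current APR headers use the same idea with constants such as APR_OS_START_ERROR and APR_OS_START_SYSERR, but consult them for the real values.

```c
#include <errno.h>   /* errno values such as ENOENT fall in the low range */

/* Hypothetical status scheme; names and boundaries are illustrative. */
typedef int my_status_t;

#define ST_SUCCESS       0
/* 1 .. ST_START_ERROR-1 : raw errno values, returned unchanged     */
#define ST_START_ERROR   20000    /* library errors (bad arguments)   */
#define ST_START_STATUS  70000    /* informational, non-error results */
#define ST_START_SYSERR  720000   /* native OS codes, shifted upward  */

#define ST_EBADARG       (ST_START_ERROR + 1)
#define ST_CHILD_DONE    (ST_START_STATUS + 1)
#define ST_CHILD_NOTDONE (ST_START_STATUS + 2)

typedef enum {
    ST_KIND_SUCCESS, ST_KIND_ERRNO, ST_KIND_LIB_ERROR,
    ST_KIND_STATUS, ST_KIND_SYSERR
} st_kind;

/* Wrap a native error (for example a Win32 GetLastError() value) so it
 * cannot collide with errno values or the library's own codes. */
my_status_t status_from_native(int native) { return ST_START_SYSERR + native; }

/* Recover the native code from a wrapped system error. */
int status_to_native(my_status_t s) { return s - ST_START_SYSERR; }

/* Decide which group a status belongs to by looking at its range. */
st_kind status_classify(my_status_t s)
{
    if (s == ST_SUCCESS)     return ST_KIND_SUCCESS;
    if (s < ST_START_ERROR)  return ST_KIND_ERRNO;
    if (s < ST_START_STATUS) return ST_KIND_LIB_ERROR;
    if (s < ST_START_SYSERR) return ST_KIND_STATUS;
    return ST_KIND_SYSERR;
}
```

The design keeps every status a plain int, so comparing with 0 still works, while the range alone tells you where a value came from.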

You can do a couple of things with an APR status value. One is to convert it into a canonical value by calling ap_canonical_error(). This function takes an APR status that represents either an errno value or a native system error and maps it onto a common, platform-independent set of error values, on which you can then base programmatic decisions. Listing Two is an example of how ap_canonical_error() can be used. The other is to generate a human-readable string with ap_strerror().
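The canonicalization idea can be sketched with plain errno values. The canon_error codes and both functions are hypothetical; mapping EPERM onto the access-denied code is a design choice of this sketch, not documented APR behavior. A Windows build would add cases for offset Win32 codes, which are omitted here.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical canonical codes: a small, platform-independent set that
 * application code can switch on. */
typedef enum {
    CANON_OTHER = 0,
    CANON_EACCES,
    CANON_EISDIR,
    CANON_EMFILE,
    CANON_ENOENT
} canon_error;

/* Collapse platform-specific errno values into the canonical set. A
 * Windows build would add cases for offset Win32 codes (not shown). */
canon_error canonical_error(int err)
{
    switch (err) {
    case EACCES:
    case EPERM:   /* in this sketch, "not permitted" counts as access denied */
        return CANON_EACCES;
    case EISDIR:
        return CANON_EISDIR;
    case EMFILE:
        return CANON_EMFILE;
    case ENOENT:
        return CANON_ENOENT;
    default:
        return CANON_OTHER;
    }
}

/* Human-readable text, copied into a caller-supplied buffer. */
const char *error_string(int err, char *buf, size_t bufsize)
{
    snprintf(buf, bufsize, "%s", strerror(err));
    return buf;
}
```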

Miscellaneous Functions

A small set of functions in APR does not fit into any other category; they are grouped into a miscellaneous category. The first of these is ap_getopt(), which parses command-line arguments. Most platforms provide some version of getopt(), but the APR developers found minor differences between these versions that caused problems when writing portable programs.

So-called OtherChild functions are used extensively in Apache. These functions let you create a child process and define a function to be run under certain conditions. Each child process can have a file descriptor associated with it. If that file descriptor becomes unwritable, the function is called with that information. The function is also called if the child dies, the child becomes lost, the main process is told to restart, or the given child process is unregistered.
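The registry behind OtherChild-style functions can be sketched as a table of children, each paired with a callback that receives a reason code. Every name below (oc_register, oc_notify, the OC_* reasons, remember_reason) is hypothetical and does not reproduce APR's actual API; a real implementation would also poll each child's descriptor to detect the unwritable case.

```c
#include <stddef.h>

/* Hypothetical reasons a registered callback can be invoked with. */
typedef enum {
    OC_DEATH,       /* the child exited */
    OC_UNWRITABLE,  /* the child's descriptor can no longer be written */
    OC_RESTART,     /* the parent was told to restart */
    OC_UNREGISTER,  /* the child was removed from the registry */
    OC_LOST         /* the parent lost track of the child */
} oc_reason;

typedef void (*oc_callback)(oc_reason reason, void *data);

#define OC_MAX 16

typedef struct {
    int pid;          /* child process id */
    int fd;           /* descriptor watched for writability, or -1 */
    oc_callback cb;
    void *data;
    int in_use;
} oc_entry;

static oc_entry oc_table[OC_MAX];

int oc_register(int pid, int fd, oc_callback cb, void *data)
{
    for (int i = 0; i < OC_MAX; i++) {
        if (!oc_table[i].in_use) {
            oc_table[i] = (oc_entry){ pid, fd, cb, data, 1 };
            return 0;
        }
    }
    return -1;   /* table full */
}

/* Deliver an event for one child; OC_UNREGISTER also frees its slot. */
int oc_notify(int pid, oc_reason reason)
{
    for (int i = 0; i < OC_MAX; i++) {
        if (oc_table[i].in_use && oc_table[i].pid == pid) {
            oc_table[i].cb(reason, oc_table[i].data);
            if (reason == OC_UNREGISTER)
                oc_table[i].in_use = 0;
            return 0;
        }
    }
    return -1;   /* no such child registered */
}

/* Events such as a restart go to every registered child. */
void oc_notify_all(oc_reason reason)
{
    for (int i = 0; i < OC_MAX; i++)
        if (oc_table[i].in_use)
            oc_table[i].cb(reason, oc_table[i].data);
}

/* Example callback: records the most recent reason in an int. */
void remember_reason(oc_reason reason, void *data)
{
    *(int *)data = (int)reason;
}
```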

The miscellaneous category also includes startup and shutdown functions, which set up and clean up APR's internal data structures. The first thing every APR program must do is call ap_initialize(); without this call, the program will crash with a segmentation fault or a general protection fault (GPF), depending on the system on which it is running. The APR developers also recommend that, immediately after calling ap_initialize(), the application register ap_terminate() to run at exit (for example, with atexit()). This is very important: if APR is configured to use semaphores on a system and ap_terminate() is never called, the application leaks semaphores. APR automatically creates two APR locks when ap_initialize() is called; these protect threads while they allocate memory out of pools.
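In APR releases after the ap_ prefix was renamed apr_, the recommended idiom is `apr_initialize(); atexit(apr_terminate);`. The self-contained sketch below shows the same pattern with a hypothetical library (lib_initialize, lib_terminate, and the resource counter are illustrative stand-ins, not APR code): initialization is reference-counted, and the terminate function has the void(void) signature that atexit() requires.

```c
#include <stdlib.h>

/* Hypothetical library state standing in for APR's internal structures
 * (for example, semaphores or the locks that protect pool allocation). */
static int lib_init_count = 0;
static int lib_resources = 0;

int lib_initialize(void)
{
    if (lib_init_count++ == 0) {
        lib_resources = 2;   /* acquire system resources once */
    }
    return 0;
}

/* Signature void(void), so it can be passed straight to atexit(). Each
 * lib_initialize() call is balanced by one lib_terminate() call; the
 * last one releases the resources, which would otherwise leak. */
void lib_terminate(void)
{
    if (lib_init_count > 0 && --lib_init_count == 0) {
        lib_resources = 0;
    }
}

int lib_is_initialized(void) { return lib_init_count > 0; }

/* The start-up idiom: initialize first, then arrange cleanup at exit. */
int program_startup(void)
{
    if (lib_initialize() != 0) {
        return -1;
    }
    return atexit(lib_terminate);   /* 0 on success */
}
```

Registering the terminate function right away means every normal exit path, including an exit() deep inside the program, releases the library's resources.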

Memory Management

Memory management in APR is handled by memory pools. Every APR function has access to a pool, either directly or through an APR data type. This is true even for functions that do not need to allocate memory on any current APR platform, because a new platform that does need to allocate memory within the function could be added at any time. For example, when the function ap_strerror was initially written, it did not take a pool as an argument. This worked for UNIX, but Windows and OS/2 needed to allocate memory for the string themselves, so a pool was needed.

Pools save you from having to free memory manually. Each time memory is allocated from a pool, the pool keeps track of it. When the pool is destroyed, all of the memory in that pool is freed automatically. Pools never shrink. They grow each time memory is allocated until they are destroyed.

Each time memory is needed, the pool checks whether it already has enough free memory. If not, it allocates more for itself using malloc(). Pools never allocate less than 8192 bytes at a time; if the current request needs less, the pool keeps the remainder for later requests. Pools do this so that malloc() is called as seldom as possible, because on most systems malloc() is relatively expensive (it is a library call that may in turn make system calls). When a pool is cleared, its memory is not freed; it is just made available for the next request to that pool.
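The allocation policy just described can be sketched as a simplified bump allocator over a list of malloc()ed blocks. This is an illustration under stated assumptions (first-fit search, alignment to max_align_t, hypothetical names throughout), not APR's actual allocator, which differs in detail.

```c
#include <stddef.h>
#include <stdlib.h>

#define POOL_MIN_BLOCK 8192   /* never ask malloc() for less than this */

/* One malloc()ed chunk; allocations are carved off it in order. */
typedef struct block {
    struct block *next;
    size_t size;            /* usable bytes in data[] */
    size_t used;            /* bytes already handed out */
    max_align_t data[];     /* suitably aligned storage */
} block;

typedef struct {
    block *blocks;          /* newest block first */
} pool;

/* Round a request up so every returned pointer stays aligned. */
static size_t round_up(size_t n)
{
    size_t a = sizeof(max_align_t);
    return (n + a - 1) / a * a;
}

pool *pool_create(void) { return calloc(1, sizeof(pool)); }

void *pool_alloc(pool *p, size_t n)
{
    n = round_up(n == 0 ? 1 : n);
    /* Reuse free space left in any existing block. */
    for (block *b = p->blocks; b != NULL; b = b->next) {
        if (b->size - b->used >= n) {
            void *mem = (char *)b->data + b->used;
            b->used += n;
            return mem;
        }
    }
    /* Not enough free memory: grow by at least POOL_MIN_BLOCK bytes. */
    size_t size = n > POOL_MIN_BLOCK ? n : POOL_MIN_BLOCK;
    block *b = malloc(sizeof(block) + size);
    if (b == NULL)
        return NULL;
    b->size = size;
    b->used = n;
    b->next = p->blocks;
    p->blocks = b;
    return b->data;
}

/* Clearing keeps every block; their space simply becomes reusable. */
void pool_clear(pool *p)
{
    for (block *b = p->blocks; b != NULL; b = b->next)
        b->used = 0;
}

/* Destroying hands everything back to free() in one pass. */
void pool_destroy(pool *p)
{
    block *b = p->blocks;
    while (b != NULL) {
        block *next = b->next;
        free(b);
        b = next;
    }
    free(p);
}

size_t pool_block_count(const pool *p)
{
    size_t n = 0;
    for (const block *b = p->blocks; b != NULL; b = b->next)
        n++;
    return n;
}
```

Two small requests share one 8192-byte block, a large request gets a block of its own, and clearing leaves the block count unchanged, which matches the "pools never shrink" behavior in the text.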

Pools also let you register cleanups to be called when the pool is either cleared or destroyed. Most APR types store the pool that is associated with each instance. This lets APR register cleanups automatically whenever an APR object is created, and it also lets you easily attach data to any APR object. Automatic cleanups let you be lazy about cleaning up after yourself: for example, you don't need to close files, because a file is closed automatically when its pool is cleared or destroyed.
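Cleanup registration can be sketched as a list of function/data pairs run when the pool is cleared. The cpool type and its functions are hypothetical (allocation is left out to keep the example short), and running cleanups in reverse order of registration is this sketch's choice, made so that later objects, which may depend on earlier ones, are torn down first.

```c
#include <stdio.h>
#include <stdlib.h>

typedef int (*cleanup_fn)(void *data);

typedef struct cleanup {
    struct cleanup *next;
    cleanup_fn fn;
    void *data;
} cleanup;

/* Hypothetical pool that tracks only cleanups. */
typedef struct {
    cleanup *cleanups;   /* most recently registered first */
} cpool;

void cpool_register_cleanup(cpool *p, void *data, cleanup_fn fn)
{
    cleanup *c = malloc(sizeof *c);
    if (c == NULL)
        abort();         /* keep the sketch short */
    c->fn = fn;
    c->data = data;
    c->next = p->cleanups;
    p->cleanups = c;
}

/* Run cleanups in reverse order of registration, then forget them. */
void cpool_clear(cpool *p)
{
    while (p->cleanups != NULL) {
        cleanup *c = p->cleanups;
        p->cleanups = c->next;
        c->fn(c->data);
        free(c);
    }
}

/* Adapter so fclose() can be registered as a cleanup. */
static int close_file(void *data) { return fclose((FILE *)data); }

/* Open a temporary file whose lifetime is tied to the pool: the caller
 * never closes it explicitly. */
FILE *cpool_open_tmpfile(cpool *p)
{
    FILE *f = tmpfile();
    if (f != NULL)
        cpool_register_cleanup(p, f, close_file);
    return f;
}

/* Example cleanup: counts how many times it ran. */
int count_calls(void *data)
{
    (*(int *)data)++;
    return 0;
}
```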

Finally, it is possible to attach data to pools under a key that you provide; to retrieve the data later, you pass the same key to the retrieval function.

Portability Routines

The APR developers are aware that not all programs use APR. To let APR programs interact well with non-APR programs, APR includes routines that convert between APR and native types. For example, by calling ap_get_os_file() with an APR file, you can get the corresponding native file. This is not yet perfectly implemented: if the pool the APR file was allocated from is cleared or destroyed while the program is still using the native file, the file is closed and errors occur. With some planning, however, these functions provide a good way to convert between APR and non-APR code. A function for the other direction is also provided: by calling ap_put_os_file(), you can wrap a native, nonportable file in a portable APR file. Listing Three uses ap_get_os_file() to get a native file from an APR file; because it relies on UNIX file descriptors and dup2(), this code can be used only in a UNIX-only source file.
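The ownership question behind these conversions can be sketched with a hypothetical wrapper around a POSIX descriptor. The pfile type and functions below are illustrative only (current APR exposes the real operations as apr_os_file_get() and apr_os_file_put()); the point is that the native value borrowed from a wrapper is valid only while the wrapper lives, and a wrapped native descriptor need not be owned by the wrapper.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical portable file type wrapping a native POSIX descriptor. */
typedef int os_file_t;            /* would be a HANDLE on Windows */

typedef struct {
    os_file_t native;
    int owned;                    /* close the descriptor on destroy? */
} pfile;

/* Portable -> native: the wrapper still owns the descriptor, so the
 * returned value is valid only while the wrapper (and its pool) lives. */
int pfile_get_os(os_file_t *out, const pfile *f)
{
    if (f == NULL)
        return -1;
    *out = f->native;
    return 0;
}

/* Native -> portable: wrap an existing descriptor without taking
 * ownership, so destroying the wrapper leaves it open. */
int pfile_put_os(pfile **out, os_file_t native)
{
    pfile *f = malloc(sizeof *f);
    if (f == NULL)
        return -1;
    f->native = native;
    f->owned = 0;
    *out = f;
    return 0;
}

void pfile_destroy(pfile *f)
{
    if (f->owned)
        close(f->native);
    free(f);
}
```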

File and Network I/O

APR helps the most with file and network I/O. Most cross-platform applications use POSIX functions for these jobs on all platforms. On Windows, however, the POSIX functions are slow for both file and network I/O; the native functions work much better. Listings Four and Five show how to open a file on UNIX and on Windows, respectively. It is difficult to combine these two code segments cleanly so that a file can be opened on either platform. Listing Six shows the same function written with APR. When executed on Windows, it behaves like the code in Listing Five; when run on UNIX, it behaves like Listing Four. This is powerful because it lets a single C program take advantage of the strengths of whatever platform it is running on.

Network I/O is an interesting problem to solve across platforms. UNIX platforms treat sockets and files as the same internal type (a file descriptor), but Windows does not, so files and sockets in APR cannot be treated as a common type. On UNIX, the same functions that read and write files can read and write sockets; on Windows, different functions must be used. For this reason, APR provides ap_read() and ap_write() to read and write files, and ap_recv() and ap_send() to read from and write to the network. Because of the need to be cross platform, APR does not let Windows programs take advantage of completion ports and asynchronous network I/O. APR originally tried to provide these features, but it was not possible to do so portably. The APR developers have not given up; the problem has been set aside for now and will be revisited in the future.

Threads and Processes

Different platforms have different priorities with regard to execution primitives. Windows was designed to work best with threaded programs. A serious problem when Apache 1.3 was originally ported to Windows was that Windows does not let you fork a new process that starts executing at the same point as the original process; the only way to create a process on Windows is to have it run a new program. UNIX, by contrast, was designed to work with processes, and some UNIX implementations do not have native threads. There is no way to abstract these philosophical differences between platforms, and APR does not try to. It does let you test for them and make appropriate decisions by using the APR_HAS_FORK macro.

However, some other issues involving execution primitives can be abstracted, such as how a program creates a new process. Listings Seven and Eight show how to create a process running a new program on UNIX and Windows, respectively, and Listing Nine shows how to do the same with APR. These listings also deal with a very complicated issue: how to communicate with the new process once it has been created. APR takes care of these details for you, because they are hard to do portably and your time is better spent writing your application than battling two or three operating systems.

Critical Section Locking

Once multiple threads or processes are introduced into a program, they can interfere with each other. For example, in Apache on some UNIX variants, if several processes are accepting connections on the same socket, all of them are awakened when a connection comes in, and all but one immediately go back to sleep. This wastes a lot of time when one or two hundred processes are accepting on the same socket. Threads interfere with each other even more readily, because they all share the same address space.

One of the problems with locking critical sections is that every platform has its own way of doing it. For cross-process mutexes, some flavors of UNIX use fcntl() locks, some use flock() locks, some use System V semaphores, and some implement the full pthreads specification and provide cross-process pthreads mutexes. Windows offers critical sections that are completely different from anything on any kind of UNIX. Add the need to keep threads from interfering with each other, and the code can get ugly quickly.

APR solves this problem with a single type, ap_lock_t, which can lock code in one of three ways: cross process (APR_CROSS_PROCESS), cross thread (APR_INTRAPROCESS), and lock everything (APR_LOCKALL). Cross-process locks are guaranteed to keep multiple processes from entering the same piece of code, but make no guarantees about threads: on some platforms APR_CROSS_PROCESS locks block both threads and processes, but on others they block only processes. Cross-process locks should therefore be used only in nonthreaded applications. Cross-thread locks do not affect processes at all and should be used in threaded applications. APR_LOCKALL locks are guaranteed to block both threads and processes from entering a critical section of code; use them in any multithreaded, multiprocess application that must protect sections of code.
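The three lock scopes can be sketched on POSIX by combining a pthreads mutex (for threads) with an fcntl() record lock (for processes). The xlock type and XLOCK_* names are hypothetical, and this is one possible UNIX mapping, not APR's implementation. It also shows why a cross-process lock alone may not exclude threads: fcntl() locks belong to the whole process, so a second thread in the same process would be granted the lock its sibling already holds.

```c
#include <fcntl.h>
#include <pthread.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical lock type with the three scopes described above. */
typedef enum { XLOCK_CROSS_PROCESS, XLOCK_INTRAPROCESS, XLOCK_ALL } lock_scope;

typedef struct {
    lock_scope scope;
    pthread_mutex_t mutex;   /* excludes threads within this process */
    int fd;                  /* fcntl() lock file; excludes processes */
} xlock;

/* fd must be open for writing and shared by all cooperating processes,
 * for example a lock file opened before fork(). */
int xlock_init(xlock *l, lock_scope scope, int fd)
{
    l->scope = scope;
    l->fd = fd;
    return pthread_mutex_init(&l->mutex, NULL);
}

/* fcntl() record locks are owned per process, not per thread. */
static int fcntl_lock(int fd, short type)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = type;          /* F_WRLCK to acquire, F_UNLCK to release */
    fl.l_whence = SEEK_SET;    /* l_start = l_len = 0: the whole file */
    return fcntl(fd, F_SETLKW, &fl);
}

int xlock_acquire(xlock *l)
{
    /* Threads first, so at most one thread per process waits in fcntl(). */
    if (l->scope != XLOCK_CROSS_PROCESS && pthread_mutex_lock(&l->mutex) != 0)
        return -1;
    if (l->scope != XLOCK_INTRAPROCESS && fcntl_lock(l->fd, F_WRLCK) == -1) {
        if (l->scope == XLOCK_ALL)
            pthread_mutex_unlock(&l->mutex);
        return -1;
    }
    return 0;
}

int xlock_release(xlock *l)
{
    int rv = 0;
    if (l->scope != XLOCK_INTRAPROCESS && fcntl_lock(l->fd, F_UNLCK) == -1)
        rv = -1;
    if (l->scope != XLOCK_CROSS_PROCESS && pthread_mutex_unlock(&l->mutex) != 0)
        rv = -1;
    return rv;
}

void xlock_destroy(xlock *l) { pthread_mutex_destroy(&l->mutex); }
```

Compile with -pthread on systems where the pthreads functions live in a separate library.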

Does APR Work?

At this writing, Apache 2.0 has had two alpha releases. Apache 2.0 is portable largely because of APR. However, Apache isn't the only success story for APR. The program ApacheBench is a benchmarking tool for web servers that has always been released with Apache. Until now, this tool only worked on UNIX platforms. Utilizing APR, ApacheBench now works on all platforms that implement APR.

The Future of APR

APR is an Open Source project and APR developers are always looking for more people to help with the development effort. Currently, the development of APR is tied to Apache 2.0, which means that all development discussions related to APR take place on [email protected], the development list for the Apache web server. The current goal is to move APR off the Apache mailing lists after Version 1.0 has been released. The earliest this will happen is the day Apache 2.0 is released.

At this point, the only way to retrieve a copy of APR is to download the entire Apache 2.0 CVS source tree. The code can be found in the apache-2.0/src/lib/apr directory. Instructions for getting the code can be found at http://dev.apache.org/.

DDJ

Listing One

(a)

static void setup_shared_mem(pool *p)
{
#ifdef USE_OS2_SCOREBOARD
   ...
   m = (caddr_t) create_shared_heap("\\SHAREMEM\\SCOREBOARD", ...);
#elif defined(USE_POSIX_SCOREBOARD)
   ...
   /* the permission bits are shm_open()'s third argument, not flags */
   fd = shm_open(ap_scoreboard_fname, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
   ...
#else
   ...
   fd = ap_popenf(p, ap_scoreboard_fname, O_CREAT | O_BINARY | O_RDWR, 0644);
   ...
#endif
}

(b)
static void setup_shared_mem(pool *p)
{
   ...
   /* APR_HAS_SHMEM is always defined (as 1 or 0), so test it with #if */
#if APR_HAS_SHMEM
   if (ap_shm_init(&scoreboard_shm, SCOREBOARD_SIZE, fname, p)
       != APR_SUCCESS) {
      ...
   }
#else
   ...
   /* use a file for shared memory */
   ...
#endif
}

static void init_scoreboard(pool *p)
{
   if (ap_scoreboard_image == NULL) {
      setup_shared_mem(p);
   }
   ...
}


Listing Two

ap_file_t *fd;
ap_status_t rv;
rv = ap_open_file(&fd, "testfile", APR_WRITE, APR_OS_DEFAULT, pool);
if (rv != APR_SUCCESS) {
   switch (ap_canonical_error(rv)) {
      case APR_EISDIR:
         /* The file requested is a directory */
         break;
      case APR_EACCES:
         /* The current user doesn't have write access to this file */
         break;
      case APR_EMFILE:
         /* The process already has the maximum number of files open */
         break;
      default:
         /* Any other failure */
         break;
   }
}


Listing Three

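#include <unistd.h>   /* dup2(), STDERR_FILENO */

/* UNIX-only: send this process's stderr to an APR-managed log file. */
void duplicate_stderr(ap_file_t *error_log)
{
   int errfile;
   if (ap_get_os_file(&errfile, error_log) == APR_SUCCESS) {
      dup2(errfile, STDERR_FILENO);
   }
}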


Listing Four

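#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Returns a file descriptor, or -1 with the errno value in *errval.
 * (0 is a valid descriptor, so success is fd >= 0, not fd > 0.) */
int open_the_file(const char *fname, mode_t permissions, int access,
                  int *errval)
{
   int fd;
   fd = open(fname, access, permissions);
   if (fd < 0) {
      *errval = errno;
   }
   return fd;
}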


Listing Five

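#include <windows.h>

/* Returns an open handle, or INVALID_HANDLE_VALUE with the Win32 error
 * code stored in *errval. */
HANDLE open_the_file(const char *fname, DWORD access, DWORD sharemode,
                     DWORD createflags, DWORD *errval)
{
   HANDLE fd;
   fd = CreateFile(fname, access, sharemode, NULL,
                   createflags, FILE_ATTRIBUTE_NORMAL, NULL);
   if (fd == INVALID_HANDLE_VALUE) {
      *errval = GetLastError();
   }
   return fd;
}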


Listing Six

ap_status_t open_the_file(char *fname, int permissions, int access,
                          ap_pool_t *cont, ap_file_t **newfile)
{
   /* ap_open() fills in *newfile and reports success or failure
    * as a status code */
   return ap_open(newfile, fname, access, permissions, cont);
}


Listing Seven

#include <errno.h>
#include <unistd.h>

int create_the_process(const char *program_name, char *const args[],
                       char **env, int pipein, int pipeout, int pipeerr)
{
   pid_t pid;
   if ((pid = fork()) < 0) {
      return errno;
   }
   if (pid == 0) {
      /* Child process */
      dup2(pipein, STDIN_FILENO);
      dup2(pipeout, STDOUT_FILENO);
      dup2(pipeerr, STDERR_FILENO);
      execve(program_name, args, env);
      _exit(127);   /* reached only if execve() failed */
   }
   /* Parent: we forked properly, but there is no way to know if execve worked. */
   close(pipein);
   close(pipeout);
   close(pipeerr);
   return 0;
}


Listing Eight

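#include <windows.h>

/* Windows has no argv[]: the program name and its arguments go in one
 * writable command-line string, and env_block is a block of
 * NUL-terminated "NAME=value" strings ending with an empty string. */
int create_the_process(char *command_line, LPVOID env_block,
                       HANDLE pipein, HANDLE pipeout, HANDLE pipeerr)
{
   STARTUPINFO si;
   PROCESS_INFORMATION pi;
   ZeroMemory(&si, sizeof(si));
   si.cb = sizeof(si);
   si.dwFlags = STARTF_USESTDHANDLES;   /* honor the hStd* fields */
   si.hStdInput = pipein;               /* handles must be inheritable */
   si.hStdOutput = pipeout;
   si.hStdError = pipeerr;
   if (CreateProcess(NULL, command_line, NULL, NULL,
                     TRUE, 0, env_block, NULL, &si, &pi)) {
      CloseHandle(pi.hThread);
      CloseHandle(pi.hProcess);
      return 0;
   }
   return GetLastError();
}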


Listing Nine

ap_status_t create_the_process(char *program_name, char *const args[],
                               char **env, ap_proc_t **proc, ap_pool_t *pool)
{
   ap_procattr_t *attr = NULL;
   ap_status_t rv;
   if ((rv = ap_create_procattr(&attr, pool)) != APR_SUCCESS) {
      return rv;
   }
   /* Set up pipes to communicate with the child process. The second,
    * third, and fourth arguments say how to set up the pipe for the
    * child's stdin, stdout, and stderr, respectively. The options are:
    *   APR_FULL_BLOCK:   Both child and parent block on reads and writes.
    *   APR_PARENT_BLOCK: Parent blocks on reads and writes, child does not.
    *   APR_CHILD_BLOCK:  Child blocks on reads and writes, parent does not.
    *   APR_NO_PIPE:      No pipe between child and parent for this stream.
    */
   ap_set_procattr_io(attr, APR_FULL_BLOCK, APR_CHILD_BLOCK, APR_NO_PIPE);
   return ap_create_process(proc, program_name, args, env, attr, pool);
}
