Dr. Dobb's | Writing Portable Applications with APR

Writing Portable Applications with APR

A truly portable C runtime available right now.

September 01, 2003
URL:http://www.drdobbs.com/writing-portable-applications-with-apr/184401691

Anybody who has ever written a C or C++ program that must work on many operating systems has faced the same problem: not all C/C++ libraries are portable. Every platform implements POSIX functions slightly differently, and if your program needs to work on platforms other than Unix, such as Windows, the problem is made worse by the fact that those platforms have their own APIs. Although those platforms have a POSIX implementation, the native APIs are typically faster and have fewer bugs than the POSIX APIs.

Programmers have developed many solutions to the portability problem through the years. The first solution is to program using strictly POSIX functions because most platforms have a POSIX layer. This solution works, but POSIX causes its own problems. For example, on Unix platforms, writing in append mode ensures that data is always written at the end of the file. Windows, however, has no way to do this. So, to emulate append mode in Windows, you must seek to the end of the file and then write. Of course, a seek followed by a write is not an atomic operation, so you would need a lock around the pair. But Windows has a solution to this problem if you use native functions. It is possible to open a file for overlapped I/O, which allows you to seek and write in a single atomic operation.

The second option is to pick a core platform and find an emulation library for all other platforms. Many programmers use Unix as their core platform and Cygwin as their emulation layer. Although this solution works, applications that run within this kind of environment are running in an emulation mode, which means they don't behave like native applications to the user, and they are usually slower than native applications.

The third solution is for the programmer to write a portability library that abstracts the differences between platforms. The problem with this solution is that writing portable code isn't easy, and it takes a long time and a lot of testing to get a robust library. Unless your goal is to create a portable runtime, you are spending time duplicating work that has already been done many times.

The final option is to find a portable and robust library that has already been written. The rest of this article discusses the Apache Portable Run-time (APR), an Open Source portability run-time that aims to solve the C/C++ portability problem. This project is covered by the Apache License, which is a BSD-like license.

Introducing APR

Apache 1.3 was ported to many versions of Unix, Windows, OS/2, BeOS, Netware, OS/390, and AS/400. But Apache 1.3 relied on POSIX for its portability, which means that although it worked (for the most part), it wasn't always stable on non-Unix platforms. Apache 1.3 also had performance problems on Windows, because the POSIX wrappers have performance problems. Finally, to fix some of these problems, the Apache developers chose to fork some of the Apache 1.3 code using #ifdefs so that the important parts of the code would be written specifically for each platform. This approach caused maintenance problems as the product was updated, a common problem with programs that choose to use POSIX for portability.

The Apache Foundation developed APR as a part of creating the second version of the Apache web server. We wanted to port Apache 2.0 to as many versions of Unix as possible, as well as Windows, OS/2, BeOS, Netware, OS/390, AS/400, and other platforms. However, we also wanted to solve all of the problems associated with Apache 1.3. To that end, APR uses native functions on all platforms and only relies on POSIX when POSIX is the best option. Also, because all of the platform differences are isolated in the APR layer, the Apache code eliminates most of the #ifdefs that have caused such a maintenance problem. At its most basic level, APR is just an abstraction layer between the operating system and the application. When an #ifdef does occur in code that uses APR, it almost always refers to one of APR's feature macros and doesn't tie the code inside the #ifdef to a particular platform.
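To make the last point concrete, here is a minimal sketch in plain C of an #if that names a capability rather than an operating system. DEMO_HAS_THREADS and effective_workers are hypothetical names invented for this illustration; APR's real feature macros (APR_HAS_THREADS, for example) are defined by APR's own headers, which this sketch deliberately does not use.

```c
#include <assert.h>

/* Hypothetical feature macro. A portability layer would set it to 1 or 0
 * in its own per-platform header; it is defaulted here so the sketch
 * builds on its own. */
#ifndef DEMO_HAS_THREADS
#define DEMO_HAS_THREADS 1
#endif

/* Decide how many workers to start. The #if names a capability, so the
 * same source serves every platform that has (or lacks) threads. */
int effective_workers(int requested)
{
    assert(requested > 0);
#if DEMO_HAS_THREADS
    return requested;   /* threads available: honor the request */
#else
    return 1;           /* no threads: fall back to a single worker */
#endif
}
```

Nothing inside the conditional mentions Windows, Unix, or any other system by name, so porting to a new platform means setting the macro once in the portability layer and leaving the application code alone.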

APR is a runtime library analogous to the C runtime or the Microsoft runtime. To use APR, you must adapt your code to call the equivalent APR functions. For example, if you are trying to open a file, instead of calling fopen or CreateFile, you would call apr_file_open. Of course, introducing APR functions to your code means that you are tying your application to APR, but the payoff is an application that works on more platforms. APR has functions for all of the most common operations. For a small sample of the features available in APR, see Table 1.

Although APR was created to support the Apache web server, the Apache Foundation makes APR available to all programmers as a general-purpose portability tool. To use APR in your programs, you will have to link against the APR library for your system. Currently APR developers are only distributing source code, so you will have to build APR before you can use it in your applications. APR uses a standard autoconf build system on Unix, so to build APR run the commands:

./configure; make; make install

For builds on Windows, APR has a project file that you can use in Visual Studio. Once the library is built, you need to link it into your program. This is done with the -l flag to the linker on Unix. On Windows, you will have to add APR to your project.

The rest of this article examines two standard Unix utilities rewritten with APR to demonstrate how APR can solve the portability problem. These programs are not easily available on Windows as native programs. They are available as part of most Unix portability packages, such as Cygwin, but since they are not native programs, the Windows versions do not behave like standard Windows programs. The APR-based examples provide a more native implementation of the utilities. The example programs described in this article are intended for illustration purposes and do not implement all of the options found in their Unix equivalents.

A simple cat program

The first program, cat.c (see Listing 1), is a simplified version of the Unix cat command. The first two lines that I want to draw your attention to are lines 13 and 14. These lines are found at the beginning of every APR program. The apr_app_initialize function allocates some internal APR variables and ensures that APR is configured properly for the current machine. The second line registers apr_terminate, which will be called when the program exits. Just as apr_app_initialize allocates some memory, apr_terminate deallocates the same memory. On some platforms, apr_terminate releases semaphores, so if this function isn't called, you may find that other programs fail in unexpected ways.

After these initial setup steps, the program creates its first memory pool. If APR has a major drawback, this is it. APR was designed around a very specific memory model: pools. The idea is that memory is allocated early and reused as often as possible. This design can be a major advantage for programs that do the same operations repeatedly, because the memory usage hits a steady state and the same memory can be used repeatedly. However, pools may not work well for programs that perform many different tasks, such as games, which are constantly changing their memory usage. For a more complete description of memory pools, see the sidebar.

The next two lines open the two file descriptors that I need for this application. The first line is stdout. Most programmers are used to using stdout for this purpose, but stdout doesn't always work if you are in Windows. For example, it is standard practice in a Unix daemon to redirect stderr to a log file for easy debugging. Windows Services, however, do not have stdin, stdout, or stderr. By providing functions to access the equivalent of those file handles, APR can remove a major portability hurdle. The second file descriptor is to the file that we want to read. Unfortunately, file permissions are not well done in APR and are mostly meaningless on non-Unix platforms. This is a hard problem, and hopefully the APR developers will tackle it in a later release. Also notice that I check for success using APR_SUCCESS. This check is standard in APR; almost all APR functions return APR_SUCCESS if the function finished successfully and the exact error code if it did not. Functions that do not do this generally cannot fail and so do not return any value.
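The return-code convention can be sketched in plain C without APR. Everything below (demo_status_t, the DEMO_* codes, and demo_parse_uint) is hypothetical and only mimics the shape described above: the status is the return value, and the result travels through an out-parameter.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical status type: zero means success, any other value
 * identifies the specific failure. */
typedef int demo_status_t;
#define DEMO_SUCCESS 0
#define DEMO_EINVAL  1   /* input is not a decimal number */
#define DEMO_ERANGE  2   /* number does not fit in an unsigned */

/* Parse a string of decimal digits into *out. *out is written only on
 * success, so a caller that ignores the status never sees a half result. */
demo_status_t demo_parse_uint(const char *s, unsigned *out)
{
    unsigned value = 0;

    assert(out != NULL);
    if (s == NULL || *s == '\0') {
        return DEMO_EINVAL;
    }
    for (; *s != '\0'; s++) {
        unsigned digit;

        if (*s < '0' || *s > '9') {
            return DEMO_EINVAL;
        }
        digit = (unsigned)(*s - '0');
        if (value > (UINT_MAX - digit) / 10) {
            return DEMO_ERANGE;       /* value * 10 + digit would overflow */
        }
        value = value * 10 + digit;
    }
    *out = value;
    return DEMO_SUCCESS;
}
```

A caller compares the return value against DEMO_SUCCESS before touching the result, in the same way the listing compares the result of apr_file_open against APR_SUCCESS.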

Finally, the program loops through the file, reading one line at a time and writing it to stdout. Notice that I did not close either file descriptor. The pool model lets APR applications drop file descriptors when they are no longer needed. APR applications can register cleanups to run when a pool is cleared or destroyed. When APR opens a file, socket, or any other resource, it registers a cleanup to run when the pool is cleared. For files, the cleanup closes the file descriptor. As long as pools are used judiciously, resource leaks are rare because resources are cleaned up as a part of memory management.

A Version of rm for Windows

rm.c (see Listing 2) is a replacement for rm. Like the cat replacement, this program starts by preparing for APR. The program then opens stdin and stderr. The rm application prompts the user for information while executing, so I need a way to read the response. The really interesting logic happens from line 37 to line 52, however. Getopt is a simple function that is standard in Unix for processing command-line arguments. Many Unix programmers find it hard to believe, but there is no equivalent function in the Windows API. That means that all Windows programmers must re-implement command-line parsing. APR has solved this problem by providing a robust implementation of getopt for all platforms. In fact, APR uses its own getopt implementation even if the current platform provides one. The problem is that getopt is not exactly standard on Unix either. Some Unix variants allow optional arguments; others do not. Some platforms have getopt implementations that allow long option names, such as --force; others do not. Rather than allow programmers to write non-portable software, APR imposes its getopt version.

To use getopt, you must initialize it first using apr_getopt_init. Then you can loop, calling apr_getopt every time through the loop. As long as apr_getopt returns APR_SUCCESS, you know that the next option on the command line is an acceptable option. In this case, I have defined f and i to be the only options I will accept, and neither takes an argument. After returning from apr_getopt, you can act on the option. In this case, I am keeping track of the number of options so that I can find the list of files to delete later on. Then, depending on whether the user told us to force the delete (f) or prompt interactively (i), I set a boolean. If an unrecognized option is given, I call a simple function that reminds the user of the possible options and exits.
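For readers who have never seen a getopt-style loop, the following is a rough sketch of the mechanism in standard C. It is not APR's implementation; opt_state, opt_init, opt_next, and the OPT_* codes are invented for illustration, and the sketch handles only single-letter flags that take no argument.

```c
#include <assert.h>
#include <string.h>

enum { OPT_OK = 0, OPT_EOF = 1, OPT_BADCH = 2 };   /* hypothetical codes */

typedef struct {
    int argc;
    const char *const *argv;
    int ind;   /* index of the argument being scanned */
    int pos;   /* position inside argv[ind]; 0 means not started */
} opt_state;

void opt_init(opt_state *os, int argc, const char *const *argv)
{
    os->argc = argc;
    os->argv = argv;
    os->ind = 1;    /* skip the program name */
    os->pos = 0;
}

/* Store the next flag in *ch. Returns OPT_OK for an allowed flag,
 * OPT_BADCH for an unknown one, and OPT_EOF at the first argument that
 * is not a flag (os->ind then indexes that argument). */
int opt_next(opt_state *os, const char *allowed, char *ch)
{
    assert(os != NULL && allowed != NULL && ch != NULL);
    if (os->pos == 0) {
        const char *arg;

        if (os->ind >= os->argc) {
            return OPT_EOF;
        }
        arg = os->argv[os->ind];
        if (arg[0] != '-' || arg[1] == '\0') {
            return OPT_EOF;           /* first file name */
        }
        os->pos = 1;
    }
    *ch = os->argv[os->ind][os->pos++];
    if (os->argv[os->ind][os->pos] == '\0') {   /* argument used up */
        os->ind++;
        os->pos = 0;
    }
    return strchr(allowed, *ch) != NULL ? OPT_OK : OPT_BADCH;
}
```

Because the state records the index of the first non-option argument, a caller of this sketch can find the file list there once the loop ends, which also copes with grouped flags such as -fi.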

One quick warning about apr_getopt. Like most getopt implementations, it automatically prints an error if an illegal flag is passed to the program. For example, if -G is passed to rm, the following error message appears:

a.out: illegal option -- G
./a.out [-fi] file_name

Notice that I did not put the illegal option error message in the code.

This message can be suppressed by adding the line

opt->errfn = NULL;

after the call to apr_getopt_init. This line tells APR to leave it up to the programmer to print all error messages from apr_getopt. Most people will not want to do this, because printing the error message is standard for most getopt implementations.

Now we get to the meat of the program: a simple loop that tries to delete every file in the argument list. This simple example does not allow the user to delete directories. To support directory deletion, you would need to recursively delete every file in the directory, which you could add using apr_dir_read, but that step is left as an exercise for the reader. To keep people from deleting directories, though, I must know when somebody tries to do so. This can be done using apr_stat. If you are used to the standard stat function, apr_stat will look a little strange. The strangest part is the third argument, which in this case is APR_FINFO_TYPE. The APR developers had two major goals when writing APR: portability and performance. Often those goals are in conflict, and apr_stat is a good example. The problem is that stat is a very expensive call, and some platforms (most notably Windows) can return some information very cheaply, while other information takes more time. So, to balance portability against performance, the third argument was added to apr_stat. It is an OR'ed set of flags naming the information that you want returned. The contract is that apr_stat can return more information than you asked for, but never less (unless there is an error). Since all I care about is the type of the file, that is all I have asked for.
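The "more than you asked for, never less" contract is easier to see in a toy model. The sketch below is plain C with made-up names (the INFO_* flags, file_info, and query_info) and made-up data; it only imitates the shape of the idea, with a valid field recording which members were really filled in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request flags, one bit per piece of information. */
#define INFO_TYPE  0x1u
#define INFO_SIZE  0x2u
#define INFO_MTIME 0x4u
#define INFO_OWNER 0x8u   /* this pretend platform cannot supply it */

typedef struct {
    unsigned valid;   /* which of the fields below were filled in */
    int is_dir;
    long size;
    long mtime;
} file_info;

/* Stand-in for a platform query. Type and size come from one cheap
 * lookup, so both are always returned; the modification time needs a
 * second, costlier lookup and is fetched only on request.
 * Returns 0 if everything wanted was supplied, -1 otherwise. */
int query_info(file_info *out, unsigned wanted)
{
    assert(out != NULL);
    out->valid = 0;
    out->mtime = 0;

    out->is_dir = 0;                  /* made-up data for the sketch */
    out->size = 1024;
    out->valid |= INFO_TYPE | INFO_SIZE;

    if (wanted & INFO_MTIME) {
        out->mtime = 1062374400L;     /* pretend this lookup was costly */
        out->valid |= INFO_MTIME;
    }
    return (out->valid & wanted) == wanted ? 0 : -1;
}
```

A caller that asks only for INFO_TYPE gets the size as a free bonus but is not charged for the modification-time lookup, which is the trade the third argument of apr_stat is meant to offer.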

If the user does ask to delete a directory, the program prints a simple error message and continues to the next file on the command line. However, notice the error message that is printed. This is standard practice for APR applications. The APR_EOL_STR macro is defined by APR to be the correct end-of-line sequence for a given platform, so for Unix it will map to \n, but on Windows it is \r\n.
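The macro works because the C compiler joins adjacent string literals. Here is a small stand-alone sketch of the same trick; DEMO_EOL_STR and format_dir_error are hypothetical stand-ins, and the _WIN32 test is simply this sketch's way of picking a value without APR's headers.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a per-platform end-of-line macro. */
#if defined(_WIN32)
#define DEMO_EOL_STR "\r\n"
#else
#define DEMO_EOL_STR "\n"
#endif

/* Adjacent string literals are concatenated at compile time, so the
 * format string below is one literal that ends in the platform's line
 * terminator. Returns the value of snprintf. */
int format_dir_error(char *buf, size_t len, const char *name)
{
    assert(buf != NULL && name != NULL);
    return snprintf(buf, len, "Cannot delete directory %s" DEMO_EOL_STR,
                    name);
}
```

The platform test lives in one macro definition, and every message in the program picks up the right terminator without an #ifdef of its own.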

After checking for directories, the program checks if the user should be prompted before deleting the file and, if so, prompts accordingly. Assuming the user has elected to continue, the program can finally delete the file.

Conclusion

Both of these examples were very simple, and they have barely scratched the surface of what APR is capable of. APR is already being used by some of the most portable applications available today. While APR's original charter was to abstract out the lowest-level functions, such as file I/O, network I/O, and shared memory, it has gone much farther than that. APR has two sub-projects, apr-util and apr-iconv, that provide more portable features. apr-util includes string matching routines, encoding routines, and database access. apr-iconv, on the other hand, provides only one feature: portable character set conversion. Anybody who has tried to use iconv on a wide array of platforms knows that most platforms either do not support iconv or support widely varying sets of features. apr-iconv is an attempt to solve this problem.

You'll find more information on APR, including full source code, at <http://apr.apache.org>. The website includes full documentation for all APR APIs, as well as information on how to contribute to the APR project.

About the Author

Ryan Bloom is a consultant for TechLink Systems, currently working with Covalent Technologies. He was one of the creators of the Apache Portable Run-time project and is the author of "Apache Server 2.0: The Complete Reference". He can be reached at [email protected].

Listing 1: The full cat program

#include <stdlib.h>            /* atexit */
#include "apr_pools.h"
#include "apr_file_io.h"

#define STR_LEN 256

int main(int argc, const char * const argv[])
{
    apr_pool_t *pool;
    apr_file_t *thefile = NULL, *out = NULL;
    char str[STR_LEN];

    apr_app_initialize(&argc, &argv, NULL);
    atexit(apr_terminate);

    apr_pool_create(&pool, NULL);
    apr_file_open_stdout(&out, pool);

    /* Stop if no file was named or the file cannot be opened. */
    if (argc < 2 ||
        apr_file_open(&thefile, argv[1], APR_READ | APR_CREATE,
                      APR_UREAD | APR_UWRITE | APR_GREAD, pool) !=
                          APR_SUCCESS) {
        apr_file_printf(out, "Could not open file %s\n",
                        argc < 2 ? "(no file given)" : argv[1]);
        return 1;
    }

    /* apr_file_gets stops returning APR_SUCCESS at end of file. */
    while (apr_file_gets(str, STR_LEN, thefile) == APR_SUCCESS) {
        apr_file_puts(str, out);
    }

    /* Destroying the pool runs the cleanups that close both files. */
    apr_pool_destroy(pool);
    return 0;
}

Listing 2: The full rm program

#include <stdlib.h>            /* exit, atexit */

#include "apr_file_io.h"
#include "apr_file_info.h"
#include "apr_getopt.h"
#include "apr_lib.h"
#include "apr_pools.h"

void showUsage(const char *argv0, apr_file_t *err) {
    apr_file_printf(err, "%s [-fi] file_name" APR_EOL_STR, argv0);
    exit(1);
}

int main(int argc, const char * const argv[])
{
    apr_pool_t *pool;
    apr_file_t *err = NULL;
    apr_file_t *in = NULL;
    apr_getopt_t *opt;
    int prompt = TRUE;
    char c;
    const char *optarg;
    apr_status_t rv;
    apr_finfo_t finfo;
    int numargs = 1;           /* index of the next argv entry to handle */

    apr_app_initialize(&argc, &argv, NULL);
    atexit(apr_terminate);

    apr_pool_create(&pool, NULL);
    apr_file_open_stdin(&in, pool);
    apr_file_open_stderr(&err, pool);

    if (argc < 2) {
        showUsage(argv[0], err);
    }
    apr_getopt_init(&opt, pool, argc, argv);

    while ((rv = apr_getopt(opt, "fi", &c, &optarg)) == APR_SUCCESS) {
        numargs++;             /* assumes one option per argument: -f -i */
        switch (c) {
            case 'f':
                prompt = FALSE;
                break;
            case 'i':
                prompt = TRUE;
                break;
        }
    }
    if (APR_STATUS_IS_BADCH(rv)) {
        showUsage(argv[0], err);
    }

    /* Everything after the options is a file to delete. */
    while (numargs < argc) {
        const char *fname = argv[numargs++];

        /* Ask only for the file type; skip names that cannot be examined. */
        rv = apr_stat(&finfo, fname, APR_FINFO_TYPE, pool);
        if (rv != APR_SUCCESS) {
            apr_file_printf(err, "Cannot access %s" APR_EOL_STR, fname);
            continue;
        }

        if (finfo.filetype == APR_DIR) {
            apr_file_printf(err, "Cannot delete directory %s"
                    APR_EOL_STR,
                            fname);
            continue;
        }

        if (prompt == TRUE) {
            char answer[2];
            apr_size_t numread = sizeof(answer);  /* in: size, out: bytes read */

            apr_file_printf(err, "%s: remove file '%s'? ",
                            argv[0], fname);
            rv = apr_file_read(in, answer, &numread);

            /* Anything other than an answer starting with y means "no". */
            if (rv != APR_SUCCESS || numread == 0 ||
                apr_tolower(answer[0]) != 'y') {
                continue;
            }
        }

        apr_file_remove(fname, pool);
    }
    apr_pool_destroy(pool);
    return 0;
}

Memory Pools

Memory pools are one of the most misunderstood concepts in APR. Most new developers either use too many or too few pools, because they do not understand exactly what a memory pool represents or why it should be used. At its most basic level a memory pool (as defined by APR) is about scope. Pools are used to define how long memory is available, but they are also used to define when resources are cleaned up.

When a pool is created, it is allocated an 8K block of memory. After that, it is possible to allocate memory out of the pool using apr_palloc. If more memory is requested than is currently available to the pool, another 8K is given to the pool. The memory is never freed for the lifetime of the pool. It continues to grow until the pool is either cleared or destroyed. When the pool is cleared, the memory is marked as available again, and new calls to apr_palloc will reuse the same memory. In this way, memory pools keep applications from calling malloc too often because a steady-state is quickly reached where the maximum amount of required memory is already in a pool and the memory is re-used forever. When the pool is destroyed, the memory is released back to the parent of the current pool.
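The allocate, clear, and reuse cycle can be modelled in a few dozen lines of standard C. This is a toy, not APR's allocator: toy_pool, toy_palloc, and the other names are hypothetical, the pool has no parent, and the mallocs counter exists only so the steady state can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_BLOCK_SIZE 8192   /* the 8K figure from the text */

typedef struct toy_block {
    struct toy_block *next;
    size_t used;              /* bytes handed out from data[] */
    size_t cap;               /* bytes available in data[] */
    max_align_t data[];       /* payload, aligned for any object type */
} toy_block;

typedef struct {
    toy_block *first;         /* every block the pool owns */
    toy_block *active;        /* block currently being carved up */
    size_t mallocs;           /* how often malloc was really called */
} toy_pool;

void toy_pool_init(toy_pool *p)
{
    p->first = p->active = NULL;
    p->mallocs = 0;
}

void *toy_palloc(toy_pool *p, size_t size)
{
    const size_t align = sizeof(max_align_t);
    void *mem;

    assert(p != NULL);
    size = (size + align - 1) / align * align;   /* keep results aligned */

    /* Move on to a later block if one exists; blocks kept across a
     * clear are reused here instead of calling malloc again. */
    while (p->active != NULL && p->active->next != NULL &&
           p->active->cap - p->active->used < size) {
        p->active = p->active->next;
    }
    if (p->active == NULL || p->active->cap - p->active->used < size) {
        size_t cap = size > TOY_BLOCK_SIZE ? size : TOY_BLOCK_SIZE;
        toy_block *b = malloc(sizeof *b + cap);
        if (b == NULL) return NULL;
        b->next = NULL;
        b->used = 0;
        b->cap = cap;
        p->mallocs++;
        if (p->active != NULL) p->active->next = b; else p->first = b;
        p->active = b;
    }
    mem = (unsigned char *)p->active->data + p->active->used;
    p->active->used += size;
    return mem;
}

/* Mark all memory as available again without returning it. */
void toy_pool_clear(toy_pool *p)
{
    toy_block *b;
    for (b = p->first; b != NULL; b = b->next) b->used = 0;
    p->active = p->first;
}

/* A parentless pool really does give its memory back with free. */
void toy_pool_destroy(toy_pool *p)
{
    toy_block *b = p->first;
    while (b != NULL) {
        toy_block *next = b->next;
        free(b);
        b = next;
    }
    p->first = p->active = NULL;
}
```

If a program allocates the same amount on every pass and clears the pool between passes, the mallocs counter in this sketch stops growing after the first pass, which is the steady state the paragraph describes.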

However, since the memory is never freed, a memory pool could be a recipe for huge memory leaks. APR removes this danger by making pools hierarchical. When an application is about to perform a short operation that needs memory, a sub-pool is created within the current pool. When the operation is complete, the sub-pool is destroyed, giving the memory back to the current pool for use in either a new sub-pool or in this pool itself. This way, the free function is never called, but you get the same behavior you would get if you had called free. One small trick: it is always possible to create a pool without a parent by passing NULL in as the parent pool. When this pool is destroyed, the memory is actually released using the free function.
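The lifetime rules of a pool hierarchy can be sketched as follows. This is a simplified model with hypothetical names (hpool, hpool_create, and so on): it tracks each allocation separately and calls free directly instead of handing memory back to the parent, so it illustrates only the scoping behavior, not the memory reuse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every allocation; the union keeps the
 * payload that follows it aligned for any object type. */
typedef union hnode {
    union hnode *next;
    max_align_t align;
} hnode;

typedef struct hpool {
    struct hpool *parent;
    struct hpool *child;      /* first sub-pool */
    struct hpool *sibling;    /* next sub-pool of the same parent */
    hnode *allocs;            /* memory owned by this pool */
} hpool;

int live_allocs = 0;          /* demo instrumentation only */

/* Pass NULL to create a pool without a parent. */
hpool *hpool_create(hpool *parent)
{
    hpool *p = calloc(1, sizeof *p);
    if (p == NULL) return NULL;
    p->parent = parent;
    if (parent != NULL) {     /* link into the parent's child list */
        p->sibling = parent->child;
        parent->child = p;
    }
    return p;
}

void *hpool_alloc(hpool *p, size_t size)
{
    hnode *n;

    assert(p != NULL);
    n = malloc(sizeof *n + size);
    if (n == NULL) return NULL;
    n->next = p->allocs;
    p->allocs = n;
    live_allocs++;
    return n + 1;             /* payload starts right after the header */
}

/* Destroy a pool, all of its sub-pools, and everything they own. */
void hpool_destroy(hpool *p)
{
    assert(p != NULL);
    while (p->child != NULL) {
        hpool_destroy(p->child);      /* each call unlinks that child */
    }
    while (p->allocs != NULL) {
        hnode *n = p->allocs;
        p->allocs = n->next;
        free(n);
        live_allocs--;
    }
    if (p->parent != NULL) {          /* unlink from the parent's list */
        hpool **link = &p->parent->child;
        while (*link != p) link = &(*link)->sibling;
        *link = p->sibling;
    }
    free(p);
}
```

Destroying a sub-pool releases only what that task allocated, while destroying the root sweeps up any sub-pools that were forgotten, so a leak can last no longer than the enclosing pool.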

Many programmers are afraid to create sub-pools, thinking that creating a sub-pool must be a very expensive operation. In reality, sub-pool creation is very cheap, and pools should be created whenever you have an isolated task to perform.

The other way that pools manage scope is by allowing you to tie resources to pool scope. In APR, this is done through cleanups. The idea is that you can register a cleanup with a pool, so that when the pool is cleared or destroyed, the cleanup is run. If you are doing something like reading a file and you have a pool that is specifically used for reading the file, you know for a fact that you won't need the file to remain open after the pool is destroyed. So, you can register a cleanup with the pool to close the file and then ignore the actual act of closing the file. If used properly, cleanups can make program termination much easier to implement. For example, if your program creates a file to store the process ID on start-up, you will want that file to be deleted when the process dies. To accomplish this, create a pool whose scope is the lifetime of the program. Then, register a cleanup to delete the file when the pool dies. As part of terminating the process, destroy the pool, and the file will be deleted. This looks like it is just trading one type of cleanup for another (destroying the pool instead of deleting the file). The difference is that you can register any number of cleanups with one pool, so by destroying the pool, you can delete the file and also perform other tasks, such as releasing any semaphores you have opened.
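A cleanup list is simple enough to sketch in standard C. The names below (cpool, cpool_cleanup_register, demo_note) are hypothetical, and running the cleanups in reverse order of registration is a choice made for this sketch so that later resources are released before the earlier ones they may depend on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void (*cleanup_fn)(void *data);

typedef struct cleanup {
    struct cleanup *next;
    cleanup_fn fn;
    void *data;
} cleanup;

typedef struct {
    cleanup *cleanups;        /* most recently registered first */
} cpool;

/* Arrange for fn(data) to run when the pool is destroyed.
 * Returns 0 on success, -1 if the registration could not be stored. */
int cpool_cleanup_register(cpool *p, cleanup_fn fn, void *data)
{
    cleanup *c;

    assert(p != NULL && fn != NULL);
    c = malloc(sizeof *c);
    if (c == NULL) return -1;
    c->fn = fn;
    c->data = data;
    c->next = p->cleanups;
    p->cleanups = c;
    return 0;
}

/* Run every registered cleanup, newest first, then forget them. */
void cpool_destroy(cpool *p)
{
    assert(p != NULL);
    while (p->cleanups != NULL) {
        cleanup *c = p->cleanups;
        p->cleanups = c->next;
        c->fn(c->data);
        free(c);
    }
}

/* Demo cleanup: record a one-character tag so the order is visible.
 * A real cleanup would close a file, remove a PID file, and so on. */
char demo_log[16];
size_t demo_len = 0;

void demo_note(void *data)
{
    if (demo_len < sizeof demo_log - 1) {
        demo_log[demo_len++] = *(const char *)data;
    }
}
```

Registering one cleanup tagged "p" for the PID file and another tagged "s" for a semaphore, then destroying the pool once, runs both: the single destroy call replaces a hand-written list of shutdown steps.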

Table 1: Examples of APR Functions