Web Development

Web Extensions and Applications Using FastCGI

By Scott Dybiec and Philip Rousselle, June 01, 1997

By implementing a software layer that receives control from a web server, Open Market's FastCGI protocol addresses many of the shortcomings of both conventional CGI and server APIs. Our authors discuss the FastCGI protocol, describe the Open Market C programming library for FastCGI, and develop example applications using the Open Market library.

Dr. Dobb's Journal June 1997: Web Extensions and Applications Using FastCGI

Scott is a software-solutions architect with IBM's client/server division in Columbus, Ohio. He can be reached at sdybiec@vnet.ibm.com. Philip develops network-management software at Tivoli Systems in Austin, Texas. He can be reached at [email protected].

Since the inception of the World Wide Web, the Common Gateway Interface (CGI) has been the basis for most applications that provide functionality beyond HTML's hypertext capabilities. Using CGI to dynamically generate HTML has made it possible to implement an enormous variety of web-based client/server applications. However, CGI has several limitations:

Creating and destroying a process for each CGI request can degrade performance.
Complex programming is needed to implement shopping cart applications, which require user state information to be saved between CGI requests.
Internal web-server data is not accessible through CGI, which impedes its use in implementing security protocols, fine-grain usage tracking, or fault detection.
There is no standard way for CGI programs to communicate with one another.

Many web servers are equipped with an additional set of APIs -- Netscape's NSAPI or Apache's ASAPI, for instance -- that overcome some of CGI's limitations. However, while these APIs provide similar functionality, they are proprietary and incompatible. Because these APIs tend to be complex and idiosyncratic, the effort required to master them is considerable. Yet the time spent understanding a particular set of APIs and developing applications to exploit them cannot be applied universally. Developers using these APIs must be extremely careful because errors can compromise the security, integrity, or reliability of the server.

Open Market's FastCGI protocol addresses many of the shortcomings of both conventional CGI and server APIs. Its approach is to implement a software layer that receives control from a web server's API and invokes scripts or programs that conform to the FastCGI specification. In this article, we'll discuss the FastCGI protocol, describe the Open Market C programming library for FastCGI, and develop example applications using that library. The FastCGI developer's kit, as well as FastCGI-enabling modules for the public-domain Apache and NCSA web servers, is available at http://www.openmarket.com/fastcgi/.

CGI Architecture

Figure 1 shows the API architecture of a typical web server. The shaded portions indicate server components that are developed and installed by the user. CGI interfaces provide a friendly and portable programming model, but with poor performance and too little robustness to easily implement web security, accounting, and other kinds of applications. Server API applications are fast and powerful, but they're difficult to develop and port.

CGI programs are most commonly invoked when users click on the SUBMIT button of an HTML form. The browser constructs an HTTP request that encodes the input supplied by the user, and submits that request to the server. The web server looks at the URL and determines whether it should retrieve a file or execute a CGI program. If it needs to execute a CGI program, it creates a process to execute the program (on UNIX systems, this involves a fork() call). The new process is provided with a set of environment variables that describe the HTTP request and that contain information related to the state of the web server. The form data is passed to the process either by placing the data in the QUERY_STRING environment variable or by piping the data to the program's standard input.
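The two input paths just described can be sketched in C. This is a minimal illustration rather than code from the article: the function names are hypothetical, and it assumes the usual CGI convention that REQUEST_METHOD distinguishes the two cases and that CONTENT_LENGTH gives the number of bytes waiting on standard input.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy the raw (still URL-encoded) form data into buf and NUL-terminate it.
 *   GET:  the data is the value of QUERY_STRING.
 *   POST: CONTENT_LENGTH bytes are read from `in` (standard input under CGI).
 * Returns the number of bytes stored, or -1 if the request is malformed
 * or the data does not fit in buf.
 */
long read_form_data(const char *method, const char *query_string,
                    const char *content_length, FILE *in,
                    char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';
    if (method == NULL)
        return -1;                       /* not started by a web server */

    if (strcmp(method, "GET") == 0) {
        size_t len;
        if (query_string == NULL)
            query_string = "";
        len = strlen(query_string);
        if (len >= buflen)
            return -1;
        memcpy(buf, query_string, len + 1);
        return (long)len;
    }

    if (strcmp(method, "POST") == 0) {
        char *end;
        long want;
        size_t got;
        if (content_length == NULL || in == NULL)
            return -1;
        want = strtol(content_length, &end, 10);
        if (end == content_length || *end != '\0' ||
            want < 0 || (size_t)want >= buflen)
            return -1;
        got = fread(buf, 1, (size_t)want, in);
        buf[got] = '\0';
        return got == (size_t)want ? (long)got : -1;
    }
    return -1;                           /* other methods: out of scope here */
}

/* Convenience wrapper that pulls the three values from the CGI environment. */
long read_form_data_from_env(char *buf, size_t buflen)
{
    return read_form_data(getenv("REQUEST_METHOD"), getenv("QUERY_STRING"),
                          getenv("CONTENT_LENGTH"), stdin, buf, buflen);
}
```

A CGI program's main() would call read_form_data_from_env() once and then decode the name=value pairs. Passing the values as parameters to the inner function simply makes the logic easy to exercise outside a web server.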

The web server also redirects the CGI program's standard output and standard error. When a CGI program completes, the server parses the program's standard output and returns the parsed data to the browser as the response to the original HTTP request. The CGI program's standard error (typically diagnostic or logging output) is usually written to the server's log file.

As you can see, the CGI protocol requires the server to perform considerable processing each time a CGI program is requested: A new process must be created, pipes to accommodate the CGI program's input and output must be created and opened, and the executable file associated with the program must be loaded and run.

FastCGI Architecture

The FastCGI protocol eliminates CGI's overhead (see Figure 2). At run time, the server starts FastCGI applications and establishes sockets to communicate with them. Applications perform any needed initialization and listen to the socket for requests from the server. Each message exchanged across the socket is prefaced by a fixed-length header that specifies the message's protocol version, message type, request ID, and data length. Supported message types include standard input, standard output, standard error, environment variables, and an end-of-request indicator. Standard input messages flow only from the server to the application. Standard output and standard error messages go in the opposite direction. Either party can send environment variables or signal the end of processing for a request. The request ID field of a message enables multithreaded FastCGI applications to service multiple HTTP requests concurrently.
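To make the fixed-length header concrete, here is a small sketch of packing and unpacking it. The field order and the numeric type codes follow our reading of the FastCGI specification (an 8-byte header that also carries a padding length and a reserved byte, with 16-bit fields sent high byte first); the struct and function names are illustrative, not the Open Market library's.

```c
#include <assert.h>

enum { FCGI_VERSION_1 = 1, FCGI_HEADER_LEN = 8 };

/* Record types corresponding to the message types named in the text. */
enum {
    FCGI_END_REQUEST = 3,   /* end-of-request indicator */
    FCGI_PARAMS      = 4,   /* environment variables    */
    FCGI_STDIN       = 5,
    FCGI_STDOUT      = 6,
    FCGI_STDERR      = 7
};

typedef struct {
    unsigned version;
    unsigned type;
    unsigned request_id;       /* 16 bits on the wire */
    unsigned content_length;   /* 16 bits on the wire */
    unsigned padding_length;   /*  8 bits on the wire */
} RecordHeader;

/* Pack a header into the 8 bytes that precede every record. */
void encode_header(const RecordHeader *h, unsigned char out[FCGI_HEADER_LEN])
{
    assert(h->request_id <= 0xFFFF && h->content_length <= 0xFFFF);
    assert(h->version <= 0xFF && h->type <= 0xFF && h->padding_length <= 0xFF);
    out[0] = (unsigned char)h->version;
    out[1] = (unsigned char)h->type;
    out[2] = (unsigned char)(h->request_id >> 8);
    out[3] = (unsigned char)(h->request_id & 0xFF);
    out[4] = (unsigned char)(h->content_length >> 8);
    out[5] = (unsigned char)(h->content_length & 0xFF);
    out[6] = (unsigned char)h->padding_length;
    out[7] = 0;                                   /* reserved */
}

/* Unpack a header; returns 0 on success, -1 for an unknown version. */
int decode_header(const unsigned char in[FCGI_HEADER_LEN], RecordHeader *h)
{
    if (in[0] != FCGI_VERSION_1)
        return -1;
    h->version        = in[0];
    h->type           = in[1];
    h->request_id     = ((unsigned)in[2] << 8) | in[3];
    h->content_length = ((unsigned)in[4] << 8) | in[5];
    h->padding_length = in[6];
    return 0;
}
```

After reading a header, the receiver reads content_length bytes of payload followed by padding_length bytes to discard. The request ID is what lets records belonging to different requests be interleaved on one socket.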

Many of FastCGI's particulars are identical to CGI's. When the server receives a request for a FastCGI application, the server translates the HTTP request data into FastCGI protocol messages, then passes them to the FastCGI application through a socket. HTML response data is returned to the web server over the same socket, translated into an HTTP response, and forwarded to the browser. The overhead of process and pipe creation associated with CGI processing is eliminated.

FastCGI Applications

Since FastCGI is an open specification, you can implement FastCGI applications without depending on software from Open Market. However, Open Market does provide a set of libraries for C, Perl, Tcl, and Java to aid in the development of FastCGI applications. We use Open Market's C library to implement our sample applications.

From a programmer's perspective, one major difference between CGI and FastCGI is that FastCGI processes are persistent. Processes are not created and destroyed for each request; instead, FastCGI programs process a request, then wait to receive the next request.

Before translating standard I/O references into the necessary socket operations, the FastCGI library performs some optimization to improve the efficiency of communications between the web server and application. The FastCGI C library includes an FCGI_Accept() function that receives environment variable settings from the server through the socket and modifies the application's environment variables to conform with the CGI specification. FCGI_Accept() returns 0 when a request has arrived and -1 on error, so applications loop for as long as its result is nonnegative.

Example 1(a) is a simple C-based CGI application that dynamically generates a web page displaying the time of day. Example 1(b) shows its FastCGI counterpart. All the CGI logic is placed, unmodified, within a loop controlled by the FCGI_Accept() function. The program is linked with the Open Market library to create the FastCGI executable.
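The shape of that conversion can be sketched as follows. This is our own illustration in the spirit of Example 1, not the published listing. It assumes the library's fcgi_stdio.h header and the documented idiom of looping while FCGI_Accept() returns a nonnegative value. The USE_FASTCGI macro and the stand-in FCGI_Accept() are assumptions added only so the sketch also builds without the library, in which case it behaves like a run-once CGI program.

```c
#include <assert.h>
#include <time.h>

#ifdef USE_FASTCGI
#include "fcgi_stdio.h"   /* Open Market library: maps stdio onto the socket */
#else
#include <stdio.h>
/* Stand-in (an assumption, not the library): accept one request, then stop. */
int FCGI_Accept(void)
{
    static int served = 0;
    return served++ == 0 ? 0 : -1;
}
#endif

/* The CGI logic, unchanged: headers, a blank line, then the HTML body. */
void print_time_page(void)
{
    char stamp[32] = "unknown";
    time_t now = time(NULL);
    struct tm *local = localtime(&now);

    if (local != NULL) {
        size_t n = strftime(stamp, sizeof stamp, "%H:%M:%S", local);
        assert(n > 0);            /* 32 bytes is ample for HH:MM:SS */
    }
    printf("Content-type: text/html\r\n\r\n");
    printf("<html><head><title>Time of Day</title></head>\n");
    printf("<body><p>The time is %s.</p></body></html>\n", stamp);
}

/* The FastCGI wrapper: one pass through the loop per request.
 * Returns the number of requests served before the loop ended. */
int serve_time_requests(void)
{
    int served = 0;
    while (FCGI_Accept() >= 0) {
        print_time_page();
        served++;
    }
    return served;
}
```

A main() for either build would simply call serve_time_requests() and return 0. Any one-time initialization belongs before the loop, where it is paid for once per process rather than once per request.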

A special directive is added to the configuration file of the FastCGI-enabled server to start the process at the same time the server starts. The server directive in Example 1(c) can be added to the httpd.conf file of a FastCGI-enabled Apache server to initialize the time-of-day page generator. It causes the server to start the application and open a socket to that application. Finally, configuration file directives are needed to specify which URLs refer to FastCGI executables. Example 1(d) shows the access.conf directives that serve this purpose for the Apache server. They indicate that any requests that resolve to files in the /server-root/fcgi-executables directory and have a .fcgi suffix are to be handled by the corresponding FastCGI server process.

By simply adding FCGI_Accept() loops (as in Example 1), existing C-based CGI applications can be used as the basis for high-performance web-server extensions. Porting larger applications may not be as trivial. Larger CGI programs, developed to terminate after each request, sometimes have memory leaks that would not be harmful in a conventional CGI environment. Migration may also be complicated by the need to recompile all code that uses the stdio library routines with the FastCGI library. Applications that use utilities that cannot be recompiled (for instance, if source code is not available) may not be migratable.

Another potential performance boost resulting from the persistence of the FastCGI process is that data can be stored in memory rather than on disk. Example 2 shows how to enhance the time-of-day page generator of Example 1(b) to display and maintain a reference counter. The performance impact of this enhancement to the FastCGI program is negligible. However, adding similar functionality to the CGI code in Example 1(a) would require two additional references to a disk file each time the program is executed: one to read the count and another to store the updated value.
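The idea behind the in-memory counter can be sketched as below. This is an illustration, not the article's Example 2: the function names are hypothetical, and the page is formatted into a caller-supplied buffer so the sketch does not depend on the FastCGI library. In a real FastCGI program the handler would be called once per pass through the FCGI_Accept() loop and its output written with printf().

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Lives as long as the process does. A conventional CGI program would lose
 * this value at exit and would have to keep it in a disk file instead. */
static unsigned long request_count = 0;

/* Format one response into buf and bump the counter.
 * Returns the value of snprintf(): the length the page needs. */
int format_counted_page(char *buf, size_t buflen, time_t now)
{
    char stamp[32] = "unknown";
    struct tm *local = localtime(&now);

    assert(buf != NULL && buflen > 0);
    if (local != NULL)
        strftime(stamp, sizeof stamp, "%H:%M:%S", local);  /* always fits */
    request_count++;
    return snprintf(buf, buflen,
                    "Content-type: text/html\r\n\r\n"
                    "<html><body><p>The time is %s.</p>\n"
                    "<p>Requests served: %lu</p></body></html>\n",
                    stamp, request_count);
}

/* Accessor so other code can report the count without touching the static. */
unsigned long requests_served(void)
{
    return request_count;
}
```

The counter costs one increment per request. Note that it counts requests handled by this process only: it restarts from zero when the process restarts, and each replicated instance would keep its own count.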

Using FastCGI to Customize a Web Server

The techniques described thus far illustrate how FastCGI can be used to implement applications that are equivalent to CGI applications but provide better performance. However, the intent of FastCGI is not only to alleviate the performance limitations of CGI, but also to provide a means for developing portable web-server customizations.

To this end, FastCGI applications can be specified to fill different roles. For instance, a FastCGI program that fields a user's request and produces an HTML page in response (as in Examples 1(b) and 2) is termed a "Responder." In addition to Responders, the FastCGI specification also includes the "Filter" and "Authorizer" roles. The server uses Filter applications to post-process the data requested by a user. The server calls Authorizers immediately after receiving a request to validate a user's security credentials.

We have modified Open Market's Apache FastCGI module to enable the Authorizer functionality (available from http://www.humanfactor.com/downloads/). Example 3 shows the Apache directives used to activate a FastCGI Authorizer. Example 3(a) contains the httpd.conf directives, and Example 3(b) the access.conf directives. As in Example 1(c), the AppClass directive causes the indicated FastCGI executable to be started when the web server starts. The -initial-env clause is used to pass environment variables to the FastCGI process. The AddFastCgiAuth directive, Example 3(a), indicates that the process is an Authorizer rather than a Responder. The Authorizer in this example will base its authorization decision on whether a user ID and password supplied by a browser are valid to gain access to an FTP server. The FTP-HOST environment variable identifies which FTP server to consult.

The access.conf directives in Example 3(b) tell the web server that files in the directory /server-root/secret-docs are protected by the HTTP basic-authentication protocol. The protocol requires a user ID and password to gain access to documents in the directory. The AuthFastCgi directive, which is the only place Example 3(b) differs from standard Apache syntax, tells the web server not to consult its own user and group files. Instead, the FastCGI process specified by the AuthFastCgi directive operand performs the authorization.

Authorizer (available electronically; see "Availability," page 3) is the C program referenced in Example 3. The Authorizer application begins by initializing a hash table to cache valid user IDs and passwords. The remainder of its logic is contained within an FCGI_Accept() loop. Each time it fields a request, the SessionStatus() routine is called. If SessionStatus() finds a hash table entry matching the supplied user ID and password, and the entry has neither been inactive nor been resident in the table longer than the predefined limits for these periods, it allows the user to access the requested data. When access cannot be granted based on information in the hash table, the TryFtpLogon() routine is called, passing the user ID and password to the local FTP server. If the user ID and password are valid for access to the FTP server, the user is granted access to the web-server data and an entry for the user is added to the hash table (in NewSession()).
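The caching rules described above (match on user ID and password, reject entries that are too idle or too old) can be sketched with a small fixed-size table. This is not the authors' Authorizer source: the names, the table size, and the probing scheme are all assumptions made for illustration, and a production cache would store a digest rather than the password itself.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

#define MAX_SESSIONS 64
#define CRED_LEN     32

typedef struct {
    int    in_use;
    char   user[CRED_LEN];
    char   password[CRED_LEN];
    time_t created;      /* when the FTP logon succeeded     */
    time_t last_used;    /* last request that hit this entry */
} Session;

typedef struct {
    Session slots[MAX_SESSIONS];
    double  inactivity_timeout;   /* seconds */
    double  session_lifetime;     /* seconds */
} SessionCache;

static unsigned hash_user(const char *user)
{
    unsigned h = 5381;
    while (*user != '\0')
        h = h * 33u + (unsigned char)*user++;
    return h % MAX_SESSIONS;
}

static int expired(const SessionCache *c, const Session *s, time_t now)
{
    return difftime(now, s->last_used) > c->inactivity_timeout ||
           difftime(now, s->created)   > c->session_lifetime;
}

/* Probe every slot starting at the hash, so freed slots never hide entries. */
static Session *find_session(SessionCache *c, const char *user)
{
    unsigned start = hash_user(user), i;
    for (i = 0; i < MAX_SESSIONS; i++) {
        Session *s = &c->slots[(start + i) % MAX_SESSIONS];
        if (s->in_use && strcmp(s->user, user) == 0)
            return s;
    }
    return NULL;
}

void cache_init(SessionCache *c, double inactivity_timeout, double lifetime)
{
    assert(c != NULL);
    memset(c, 0, sizeof *c);
    c->inactivity_timeout = inactivity_timeout;
    c->session_lifetime   = lifetime;
}

/* Returns 1 if the cached credentials match and are still fresh, else 0. */
int session_status(SessionCache *c, const char *user, const char *password,
                   time_t now)
{
    Session *s = find_session(c, user);
    if (s == NULL || strcmp(s->password, password) != 0)
        return 0;
    if (expired(c, s, now)) {
        s->in_use = 0;            /* force a fresh check against FTP */
        return 0;
    }
    s->last_used = now;
    return 1;
}

/* Record a logon the FTP server just accepted. Returns 0, or -1 if the
 * table is full or a credential is too long to store. */
int new_session(SessionCache *c, const char *user, const char *password,
                time_t now)
{
    unsigned start = hash_user(user), i;
    Session *s = find_session(c, user);
    if (strlen(user) >= CRED_LEN || strlen(password) >= CRED_LEN)
        return -1;
    for (i = 0; s == NULL && i < MAX_SESSIONS; i++)
        if (!c->slots[(start + i) % MAX_SESSIONS].in_use)
            s = &c->slots[(start + i) % MAX_SESSIONS];
    if (s == NULL)
        return -1;
    s->in_use = 1;
    strcpy(s->user, user);
    strcpy(s->password, password);
    s->created = s->last_used = now;
    return 0;
}

/* Drop every stale entry; returns how many were removed. */
int reap_expired_sessions(SessionCache *c, time_t now)
{
    int removed = 0, i;
    for (i = 0; i < MAX_SESSIONS; i++)
        if (c->slots[i].in_use && expired(c, &c->slots[i], now)) {
            c->slots[i].in_use = 0;
            removed++;
        }
    return removed;
}
```

Inside the request loop, an authorizer built on this sketch would call session_status() first and fall back to the FTP logon only on a miss, calling new_session() when that logon succeeds.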

Once the Authorizer has processed the user ID and password, SendAuthStatus() is called to communicate the results to the web server. If the authorization is successful, two environment variables are sent back to the server: Variable-AUTH_TYPE and Variable-REMOTE_PASSWORD. The latter is used to reset the password for the user so no other CGI or FastCGI scripts can see this data. Each time a specified number of requests have been processed, ReapExpiredSessions() is called to remove old or inactive entries from the hash table. The time limits and frequency of pruning can be controlled by using the -initial-env option in the web server configuration file to set the INACTIVITY_TIMEOUT, SESSION_LIFETIME, and REAPER_FREQ environment variables.
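What an Authorizer sends back can be sketched as plain header text. The sketch below follows our reading of the FastCGI specification: a status of 200 grants access, headers whose names begin with Variable- are turned into environment variables for later processing of the request, and any other status is relayed to the browser. The function name, the masking value, and the 401 body are assumptions, not the authors' SendAuthStatus().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Placeholder (an assumption) that replaces the real password so that
 * later CGI or FastCGI scripts cannot see it. */
#define MASKED_PASSWORD "xxxxxxxx"

/* Format the Authorizer's reply into buf.
 * Returns the length snprintf() reports, or -1 if realm contains a
 * double quote and so cannot be placed safely inside the quoted string. */
int format_auth_status(char *buf, size_t buflen, int granted, const char *realm)
{
    assert(buf != NULL && realm != NULL);
    if (strchr(realm, '"') != NULL)
        return -1;

    if (granted)
        return snprintf(buf, buflen,
                        "Status: 200 OK\r\n"
                        "Variable-AUTH_TYPE: Basic\r\n"
                        "Variable-REMOTE_PASSWORD: %s\r\n"
                        "\r\n",
                        MASKED_PASSWORD);

    /* Denied: ask the browser to prompt for credentials again. */
    return snprintf(buf, buflen,
                    "Status: 401 Unauthorized\r\n"
                    "WWW-Authenticate: Basic realm=\"%s\"\r\n"
                    "Content-type: text/html\r\n"
                    "\r\n"
                    "<html><body><p>Authorization required.</p></body></html>\n",
                    realm);
}
```

In a FastCGI process the formatted text would be written to standard output inside the FCGI_Accept() loop, and the library would carry it back to the server over the socket.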

The Authorizer application illustrates the utility and simplicity of FastCGI for web-server customization. In this case, an FTP server, which may be accessible to multiple web servers, is used to provide centralized password validation. Authorization results are cached efficiently in memory because of process persistence. The caching reduces the likelihood of submitting excessive authorization requests to the FTP server.

Normally, an application like this is implemented using server APIs. The FastCGI code in the Authorizer application is more portable than similar API-based implementations, and has other advantages. For instance, because it runs in its own process, a failure will only cause documents in the protected directory to be inaccessible and will not affect other web-server operations.

While FastCGI processes can be specified to serve as Responders, Authorizers, or Filters, it is also possible for a single FastCGI application to perform more than one of these roles. An enhanced version of the Authorizer program (also available electronically) produces an HTML report detailing the settings of its parameters and the state of its internal database (see Figure 3). The main() routine of the enhanced program interrogates the FCGI_ROLE environment variable, which is set by FCGI_Accept(), to determine if it has been called to authorize an access request or produce an HTML report. In the latter case, respond() is called to build the portion of the table that shows the parameter settings. respond() calls DisplaySessions() to walk through the hash table and build one HTML table row for each active hash-table entry. This administrative interface can be used to monitor the usage of the Authorizer and to diagnose performance problems.
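The role test can be sketched in a few lines. The strings compared here (RESPONDER, AUTHORIZER, FILTER) are the FCGI_ROLE values as we understand them from the FastCGI specification; the enum and function names are our own and do not come from the enhanced Authorizer's source.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum {
    ROLE_UNKNOWN,      /* variable missing or unrecognized */
    ROLE_RESPONDER,
    ROLE_AUTHORIZER,
    ROLE_FILTER
} Role;

/* Map the text of FCGI_ROLE onto an enum the request loop can switch on. */
Role parse_role(const char *value)
{
    if (value == NULL)
        return ROLE_UNKNOWN;
    if (strcmp(value, "RESPONDER") == 0)
        return ROLE_RESPONDER;
    if (strcmp(value, "AUTHORIZER") == 0)
        return ROLE_AUTHORIZER;
    if (strcmp(value, "FILTER") == 0)
        return ROLE_FILTER;
    return ROLE_UNKNOWN;
}

/* Call after each successful FCGI_Accept(), which refreshes the environment. */
Role current_role(void)
{
    return parse_role(getenv("FCGI_ROLE"));
}
```

A dual-role program would switch on current_role() at the top of each pass through its request loop, running the authorization path for ROLE_AUTHORIZER and the report generator for ROLE_RESPONDER.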

Improving FastCGI Performance

Our FastCGI examples were all single-process, single-threaded servers. If requests arrive faster than they are processed, they are queued, which creates the possibility of a bottleneck. The easiest way to minimize this possibility is to direct the web server to spawn multiple instances of critical FastCGI processes. This is done by adding a -processes <n> option to the AppClass configuration-file directive that starts the FastCGI server process. Of course, additional programming may be required to achieve efficient performance of replicated processes. For example, if multiple instances of the FTP-based Authorizer are instantiated, it would be desirable to maintain the hash table in shared memory (where it would be accessible to all instances).

Further performance improvements can be achieved by using web-server environment variables for communicating between FastCGI processes. In the Authorizer program, SendAuthStatus() demonstrates how a FastCGI program could modify these variables. Using this technique, FastCGI Authorizer or Responder processes could set variables to be interpreted by other FastCGI processes that are invoked later for a given request.

DDJ
