Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Web Development

Working With CGI


Dr. Dobb's Sourcebook September/October 1997: Web Database Developer

William is the executive director of Global Internet Solutions and author of several books, including Web Publishing Unleashed, FrontPage 97 Unleashed, Peter Norton's Guide to Java Programming, and Netscape ONE Developer's Guide. William can be contacted at [email protected].


At times, web technologies seem to be spinning out of control, away from traditional development cycles that advance at a comfortably moderate pace. Just about the time you develop a working solution for your web-to-database needs, the paradigm shifts, and you are left with a solution that may not run properly with the latest and greatest browser or server software. Instead of trying to keep up with the insane pace of change, it may be time to return to the basics and forget about jumping on the technological bullet train.

In the dark and early days of the Web, developers created their own web-to-database solutions by tapping into the power of CGI, a standard that defines how external programs are used on web servers and how they interact with browsers and other applications. You can use a CGI script to act as the middleman between databases, web servers, and browsers. The wonderful thing about CGI is that your scripts can be written in just about any programming or scripting language, including C/C++, Perl, and UNIX shell. Thus, you can use structures you know to create a working web-to-database solution.

As a traditional software developer making the move to web development, you probably want to know what makes CGI tick and how you can use it. In this column, I will take a look at how CGI works.

Taking a Look at CGI

CGI scripts are used extensively on the Web. Anytime someone accesses a page at your web site or submits a form, you can use a script to handle the necessary transactions. If you want to serve unique pages based on browser type, you can use a CGI script to generate the necessary markup. If you want to use a form to make requests and updates to your database, you can use a CGI script to process the input, pass the request on to the database server, and format the results.

Regardless of how a CGI script is used at your web site, all data sent to the script via the server is either encapsulated in environment variables or sent directly in the standard input stream. After the script processes the input, the results can be formatted to make database queries and updates. In turn, your database server handles the requests and returns the results to the script, which formats the data so it can be sent to the user's browser.

The output from a CGI script begins with an HTTP header containing a directive to the web server. A blank line separates the header from the actual data you are passing back to the browser. After the server interprets the output and sets environment variables, it passes the results to the client.

Environment Variables and the Input Stream

Input to a CGI script is usually in the form of environment variables that are associated with the browser requesting information, the server processing the request, and the data passed in the request. Environment variables are case sensitive and are set automatically (whenever user input is passed to a server). Although some environment variables are browser and server specific, there is a core set of environment variables that you will see again and again. Table 1 describes these common environment variables.

Not all input fits neatly into an environment variable. When a user submits actual data to be processed by a CGI script, the data is received as a URL-encoded string via either an environment variable or the standard input stream. Data submitted using the GET method is sent via the environment variable QUERY_STRING. Data submitted using the POST method is sent via the standard input stream. Web servers track the submission method using the REQUEST_METHOD environment variable.

To get a snapshot of the environment variables sent to a script, you need to know the method used to submit the data, the applicable data fields, and the URL to the CGI script. Let's say you create a web page that allows your marketing department to perform a database lookup. The main attraction on the page is a form that has four input fields: CUSTOMER_NAME, CUSTOMER_ADDRESS, CUSTOMER_PHONE, and CUSTOMER_ACCOUNT.

When the form is submitted, a script called lookup.pl processes the data and passes it off to the database server. Following this, you can paint a fairly complete picture of a search with these values:

Fred Meyer
Jacksonville, Florida

When the GET method is used, the server sets the environment variables in Listing One, then passes the input to the lookup.pl script. When the POST method is used, the server sets the environment variables in Listing Two, then passes the input to the lookup.pl script. On the standard input stream, the script reads in the submitted data:

NAME=Fred+Meyer&ADDRESS=Jacksonville,+Florida.

By examining these examples, you can see that data submitted using the GET method is handled differently than data submitted using the POST method. When you use the GET method, any data submitted is stored in the QUERY_STRING environment variable. Because the QUERY_STRING variable is subject to the environment variable length restrictions of the server, data may get truncated at 256 characters or less. Generally speaking, data truncation is a bad thing -- which is why most web developers prefer to use the POST method and the standard input stream.

With the standard input stream, all the server has to do is tell the script the number of bytes to read. The server does this using the CONTENT_LENGTH variable. In turn, the script opens the standard input stream and reads the specified amount of data bytes.

Output and the Output Stream

After processing the input, your script can call other applications, like your database server. The key is to format the data in a way that allows your server to use it. For example, if your are using the Sybase SQL Server, you can put together a series of statements to execute in the database, then invoke ISQL with the name of the file containing your statements.

The results from the database are then returned to the script. The script, in turn, processes the results and passes the data to the web server. The server then returns the output to the client. Generally, this output is in the form of an HTTP response that includes a header followed by a blank line and a body.

CGI headers are strictly formatted sections that contain directives to the server. The three valid server directives are: Content-type, Location, and Status. (Usually, when the Location and Status directives are used, no additional data is sent to the browser.)

A single header can contain one or all of the server directives. Your CGI script is responsible for outputting these directives to the server. While the header is followed by a blank line that separates the header from the body, the output does not have to contain a body.

The Content-Type field in a CGI header identifies the MIME type of the data you are sending back to the client. Usually, the data output from a script is a fully formatted document, such as a web page. You could specify this in the header using: Content-Type: text/html.

The output of your script doesn't have to be a document created within the script. You can reference any document on the Web using the Location field. The Location field references a file by its URL. Servers process Location references either directly or indirectly, depending on the whereabouts of the file. If the server can find the file locally, it passes the file to the client. Otherwise, the server redirects the URL to the client and the client has to retrieve the file. You would specify a location using: Location: http://www. tvpress.com/.

The Status field passes a status line to the server for forwarding to the client. Status codes are expressed as a three-digit code followed by a string that generally explains what has occurred. The first digit of a status code shows the general status:

1XX Continue/Protocol Change
2XX Success
3XX Redirection
4XX Client error
5XX Server error

While many status codes are used by servers, the status codes you pass to a client via your CGI script are usually client error codes.

Creating the output from a CGI script is easier than it may seem. All you have to do is format the output into a header and body using your favorite programming language. If you want the script to output a simple web page, Listing Three shows how you could do it using Perl. The server processing the output sets environment variables, creates an HTTP header, then sends the data on to the client. Listing Four shows how the output sent to a client might look coming from the NCSA web server.

Conclusion

Getting back to the basics may be just what you need to stay on track with your web development efforts -- instead of relying on the latest technological achievements, you may be better off turning to your good old-fashioned know-how to get the job done. In upcoming columns, I plan to take a closer look at how you can use basic tools like CGI and HTML to design web-to-database solutions. Some of the things I will examine include friendly interface design with HTML, hands-on processing with Perl, and scripting your Sybase SQL server.

DDJ

Listing One

PATH=/bin:/usr/bin:/usr/etc
SERVER_SOFTWARE = NCSA/3.0
SERVER_NAME = www.tvpress.com
GATEWAY_INTERFACE = CGI/1.1
SERVER_PROTOCOL = HTTP/1.1
SERVER_PORT= 80
REQUEST_METHOD = GET
HTTP_ACCEPT = text/plain, text/html, application/rtf, application/postscript,
audio/basic, audio/x-aiff, image/gif, image/jpeg, video/mpeg
PATH_INFO =
PATH_TRANSLATED =
SCRIPT_NAME = /cgi-bin/lookup.pl
QUERY_STRING = NAME=Fred+Meyer&ADDRESS=Jacksonville,+Florida
REMOTE_HOST =
REMOTE_ADDR =


REMOTE_USER =
AUTH_TYPE =
CONTENT_TYPE =
CONTENT_LENGTH =

Back to Article

Listing Two

PATH=/bin:/usr/bin:/usr/etc
SERVER_SOFTWARE = NCSA/3.0
SERVER_NAME = www.tvpress.com
GATEWAY_INTERFACE = CGI/1.1
SERVER_PROTOCOL = HTTP/1.1
SERVER_PORT= 80
REQUEST_METHOD = POST
HTTP_ACCEPT = text/plain, text/html, application/rtf, application/postscript,
audio/basic, audio/x-aiff, image/gif, image/jpeg, image/tiff, video/mpeg
PATH_INFO =
PATH_TRANSLATED =
SCRIPT_NAME = /cgi-bin/lookup.pl
QUERY_STRING =
REMOTE_HOST =
REMOTE_ADDR =
REMOTE_USER =
AUTH_TYPE =
CONTENT_TYPE = application/x-www-form-urlencoded
CONTENT_LENGTH = 46

Back to Article

Listing Three

#!/usr/bin/perl

#Create header with blank line
print "Content-Type: text/html\n\n";

#Add body in HTML format
print <<"MAIN";
<HTML>
<HEAD>
<TITLE>Database Lookup Results</TITLE>
</HEAD>
<BODY BGCOLOR=#FFFFFF>

<H1>Search Parameters</H1>
<P>You searched for: $cust</P>

<H1>Search Results Empty</H1>
</BODY>
MAIN

Back to Article

Listing Four

HTTP/1.0 302 Found
MIME-Version: 1.0
Server: NCSA/3.0
Date: Tuesday, 30-June-97 00:00:00 PST
Content-Type: text/html
Content-Length: 162

<HTML>
<HEAD>
<TITLE>Database Lookup Results</TITLE>

</HEAD>
<BODY BGCOLOR=#FFFFFF>

<H1>Search Parameters</H1>
<P>You searched for: $cust</P>

<H1>Search Results Empty</H1>
</BODY>

Back to Article


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.