Writing CGI Scripts in REXX
Decoding input, sending output back to the client, reporting diagnostics and errors, and more!
CGI scripts can be written in many programming languages. REXX, a procedural language that allows programs to be written in a clear and structured way, is one such language. REXX has powerful arithmetic and character-manipulation capabilities and very comprehensive debugging facilities. Since it is often executed by an interpreter, it permits rapid program development.
In this article, I'll discuss writing your own Web-executable scripts in REXX using the Common Gateway Interface (CGI). I assume you are familiar with HTML and forms, have some programming experience, and have access to a REXX interpreter or documentation. In particular, I'll focus on how to use REXX to get and decode input, send output back to the client, and report diagnostics and errors. I'll also introduce a library of REXX CGI functions, freely available at http://www.slac.stanford.edu/slac/www/tool/cgi-rexx/cgi-lib.rxx, that can simplify writing scripts.
A REXX Backgrounder
The original definition of the REXX language was published in The REXX Language: A Practical Approach to Programming, by Mike Cowlishaw (Prentice Hall, 1985). The language was designed from the outset for easy readability, using a minimum of boilerplate, required punctuation, special escape characters, notations in general, and reserved words (keywords, for example, are only reserved in context). It allows you to spread a clause over several lines or to include multiple clauses on a single line (separated by semicolons), yet it does not require a semicolon at the end of a clause that is already terminated by a line end. Since it has only one data type, the character string, no declarations are needed. There are no inherent limits on the size of strings, and simple programs can be written with a minimum of overhead. As a result, REXX is easy for both computing professionals and "casual" users to use and remember.
REXX provides rich control constructs, including IF/THEN/ELSE, DO/END, WHILE, UNTIL, FOR, SELECT/WHEN/OTHERWISE/END, as well as ITERATE and LEAVE for modifying loop execution, and SIGNAL to provide abnormal transfer of control. It also provides associative variables, dynamic variable scoping, powerful string parsing and data extraction, character- and word-manipulating facilities, a powerful built-in library of functions, and support for internal and external procedures. REXX can access information from the host's environment and issue commands to the host environment or to programs written in other languages. REXX is an ANSI standard, and programs are highly portable across a wide variety of platforms, from mainframes to PCs and Macs, and across operating systems from MVS and VM through VMS and multiple flavors of UNIX to OS/2, PC-DOS, and MacOS. Both commercial and public-domain implementations of REXX are available.
To learn more about REXX, check out:
Most REXX implementations carefully follow the language as defined by Cowlishaw in The REXX Language, so scripts are usually highly portable between REXX implementations. The REXX examples I provide here were written for and tested in uni-REXX, as defined in the uni-REXX Reference Manual by the Workstation Group. uni-REXX is a UNIX implementation of the REXX language. In addition, any operating-system-dependent examples are provided for the UNIX environment. Most of the examples are common across multiple operating systems (for example, the finger command). However, you will need to change operating-system-dependent items such as filenames.
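To write a REXX script that is sensitive to the host's operating environment, use the REXX instruction PARSE SOURCE Architecture . to obtain the name of the operating system in the variable Architecture (for example, "UNIX" or "CMS"). The trailing period discards the rest of the source information, such as how the script was called and its filename.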
One area in which implementations of REXX currently differ is access to system environment variables. In uni-REXX, the GETENV(name) function returns the setting of the environment variable name, and PUTENV(name'='value) sets it. The examples in this article use both GETENV and PUTENV.
Other implementations of REXX, such as OS/2's, often use the REXX VALUE(name[,newvalue][,selector]) function (the brackets indicate optional arguments). This can return the variable's value by name. The selector names an implementation-defined external collection of variables. If newvalue is supplied, then the named variable is assigned this new value.
Thus you can discover the value of the environment variable QUERY_STRING in uni-REXX using Input=GETENV('QUERY_STRING') and in OS/2 REXX by using Input=VALUE('QUERY_STRING',,'OS2ENVIRONMENT').
To do the same with other versions of REXX, consult the documentation for your REXX implementation. Usually all you need is the literal selector string that your implementation uses for environment variables.
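As a sketch of how these pieces fit together, the function below picks an access method from the operating-system name returned by PARSE SOURCE. GETENV is the uni-REXX function and OS2ENVIRONMENT is the OS/2 selector described above. The 'ENVIRONMENT' selector used for every other system is an assumption (it is the one Regina accepts), so check your implementation's documentation. EnvVar is a hypothetical name and is not part of the CGI library.

```rexx
/* EnvVar: return an environment variable's value, choosing the   */
/* access method from the OS name that PARSE SOURCE reports.      */
SAY 'QUERY_STRING is "'EnvVar('QUERY_STRING')'"'
EXIT 0

EnvVar: PROCEDURE
  PARSE ARG Name
  PARSE SOURCE Architecture .          /* first word: the OS name   */
  SELECT
    WHEN Architecture = 'UNIX' THEN RETURN GETENV(Name)  /* uni-REXX */
    WHEN Architecture = 'OS/2' THEN RETURN VALUE(Name, , 'OS2ENVIRONMENT')
    OTHERWISE RETURN VALUE(Name, , 'ENVIRONMENT')  /* assumed selector */
  END
```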
Since REXX is case insensitive (apart from literals), I identify REXX keywords by making them all uppercase (for example, the name of a built-in function like VERIFY).
Reading and Processing the Input to the Script
The input may be passed to the script in several ways. The most common ways are via the QUERY_STRING and PATH_INFO environment variables.
The QUERY_STRING environment variable contains anything that follows the first question mark in the URL. This information can also be added by an HTML form that uses METHOD="GET". The string is usually an information query: for example, what users want to search for in a database, or the encoded results of your feedback form.
The QUERY_STRING input can be accessed in REXX via Input=GETENV('QUERY_STRING'). This string is encoded in the standard URL format, which changes spaces to plus signs (+) and encodes special characters with %XX hexadecimal encoding. You need to decode the string in order to use it. The REXX built-in TRANSLATE function provides a simple way to convert the plus signs to spaces: Input=TRANSLATE(Input,' ','+').
The DeWeb REXX PROCEDURE (from the library of REXX CGI functions) provides an example of how to decode the special hexadecimal characters; see Listing One. If the query is not form data, your script also gets the QUERY_STRING on the command line. The server looks for an unencoded equal sign, and when it finds none (as with an ISINDEX search) it passes the query as command-line arguments, split at the plus signs. The query is then available via the REXX PARSE ARG instruction. For example, for a URL http://www.my.box/cgibin/foobar?hello+world, if you use the REXX instruction PARSE ARG Arg1 Arg2, then Arg1 will contain "hello" and Arg2 will contain "world" (the plus sign is replaced with a space).
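The decoding that DeWeb performs can be sketched as follows. DecodeURL is a hypothetical function written for illustration and is not the library's code. It turns plus signs into spaces and replaces each valid %XX escape with the character whose hexadecimal code is XX, using the built-in X2C. A malformed escape such as %zz is left untouched rather than causing an error.

```rexx
/* DecodeURL: undo URL encoding (a sketch, not the library's DeWeb) */
SAY DecodeURL('hello+world%21')         /* displays: hello world!   */
EXIT 0

DecodeURL: PROCEDURE
  PARSE ARG S
  S = TRANSLATE(S, ' ', '+')            /* plus signs become spaces */
  Out = ''
  P = POS('%', S)
  DO WHILE P > 0
    Hex = SUBSTR(S, P + 1, 2)           /* blank-padded if too short */
    IF VERIFY(Hex, '0123456789ABCDEFabcdef') = 0 THEN DO
      Out = Out || LEFT(S, P - 1) || X2C(Hex)
      S = SUBSTR(S, P + 3)
    END
    ELSE DO                             /* malformed escape: keep % */
      Out = Out || LEFT(S, P)
      S = SUBSTR(S, P + 1)
    END
    P = POS('%', S)
  END
  RETURN Out || S
```

Translating plus signs first is safe. A literal plus sign in the original data arrives encoded as %2B, so it is restored only after the translation has run.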
The PATH_INFO environment variable contains the "extra" information after the path of your CGI script in the URL. This information is not encoded by the server in any way. For example, say you have a CGI script accessible to your server with the name foo. When users access foo, they may want to tell it to use the English language directory. A link in an HTML document would then reference your script as http://www.my.box/cgi-bin/foo/lang=eng. When the server executes foo, it will give you PATH_INFO containing /lang=eng, and your script can decode this and act accordingly. The PATH_INFO can be accessed in REXX via the PATH_INFO environment variable, as in Path=GETENV('PATH_INFO').
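A script like foo might decode PATH_INFO along these lines. LangFromPath is hypothetical, and the lang keyword and the eng default simply mirror the example above. PATH_INFO arrives undecoded and unchecked, so the sketch accepts only lowercase letters. A value such as /lang=../etc therefore cannot steer the script to another directory.

```rexx
/* LangFromPath: pick a language from PATH_INFO such as /lang=eng  */
/* In a real script: Lang = LangFromPath(GETENV('PATH_INFO'))      */
SAY LangFromPath('/lang=eng')           /* displays: eng */
EXIT 0

LangFromPath: PROCEDURE
  PARSE ARG Path
  PARSE VAR Path '/' Key '=' Lang '/' .  /* "/lang=eng" -> lang, eng */
  IF TRANSLATE(Key) \== 'LANG' | Lang == '' THEN RETURN 'eng'
  /* The server does not check PATH_INFO: allow letters only      */
  IF VERIFY(Lang, 'abcdefghijklmnopqrstuvwxyz') > 0 THEN RETURN 'eng'
  RETURN Lang
```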
PATH_INFO and QUERY_STRING may be combined. For example, a URL that names the script htimage, followed by the extra path /usr/www/img/map and the query ?40,45, will cause the server to run the script called htimage. It will pass the remaining path information "/usr/www/img/map" to htimage in the PATH_INFO environment variable, and pass "40,45" in the QUERY_STRING environment variable.
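For the image-map case, a query such as 40,45 can be split with PARSE VAR and checked before use. ParseXY is a hypothetical helper. It returns the two coordinates separated by a blank, or a null string if either one is missing or is not an unsigned whole number.

```rexx
/* ParseXY: split an image-map query such as "40,45" */
Coords = ParseXY('40,45')
IF Coords == '' THEN SAY 'Invalid coordinates'
ELSE DO
  PARSE VAR Coords X Y
  SAY 'x='X 'y='Y                        /* displays: x=40 y=45 */
END
EXIT 0

ParseXY: PROCEDURE
  PARSE ARG Query
  PARSE VAR Query X ',' Y               /* split at the comma     */
  Digits = '0123456789'
  IF X == '' | Y == '' THEN RETURN ''
  IF VERIFY(X, Digits) > 0 | VERIFY(Y, Digits) > 0 THEN RETURN ''
  RETURN X Y
```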
If your form has METHOD="POST" in its FORM tag, then your CGI script will receive the encoded form input on standard input (stdin in UNIX). The server will not send you an EOF at the end of the data. Instead, use the environment variable CONTENT_LENGTH to determine how much data to read from stdin. The ReadPost REXX PROCEDURE (from the library of REXX CGI functions) is an example of how to read the form's POST data from standard input; see Listing Two. Listing Three is an example of how to combine reading the various types of input in your script. The REXX PROCEDUREs ReadForm, MethGet, and MethPost (all available in the library of REXX CGI functions) in Listing Four can be used to simplify the task of reading input from a form.
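Reading a POST body with REXX stream I/O might look like the sketch below. It assumes your implementation provides CHARIN, one of the standard REXX stream functions, and it uses the uni-REXX GETENV function described earlier. ReadBody and its 65536-character cap are hypothetical. The cap is one way to avoid reading an unreasonably large body.

```rexx
/* ReadBody: read CONTENT_LENGTH characters of POST data from stdin */
Body = ReadBody(GETENV('CONTENT_LENGTH'))   /* GETENV as in uni-REXX */
SAY 'Content-type: text/plain'
SAY ''
SAY 'Read' LENGTH(Body) 'characters of form data'
EXIT 0

ReadBody: PROCEDURE
  PARSE ARG Count
  Count = STRIP(Count)
  IF \DATATYPE(Count, 'W') THEN RETURN ''     /* missing or non-numeric */
  IF Count < 1 | Count > 65536 THEN RETURN '' /* hypothetical size cap  */
  RETURN CHARIN(, , Count)        /* no stream name: default input */
```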
Decoding Forms Input
When you write a form, each of your input items has a NAME attribute. When users place data in these items in the form, that information is encoded into the form data. The value given by users to each input item is called "VALUE."
Form data is a stream of NAME=VALUE pairs separated by the ampersand (&) character. Each NAME=VALUE pair is URL encoded; that is, spaces are changed into plus signs and some characters are encoded into hexadecimal. To decode the form data, you must first parse the form-data block into separate NAME=VALUE pairs, tossing out the ampersands. Then, you must parse each NAME=VALUE pair into the separate NAME and VALUE. Use the first equal sign you encounter to split the data. If there is more than one, then something is wrong with the data. Again, toss out the equal signs. Finally, undo the URL encoding of each NAME and VALUE. Listing Five is an example of decoding the form input.
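The steps above can be sketched as a small REXX procedure. ParseForm and Decode are hypothetical names, and this is not Listing Five. ParseForm fills the compound variables Key.i and Val.i, with the number of pairs in Key.0. It splits each pair at its first equal sign, skips empty pairs, and decodes plus signs and %XX escapes.

```rexx
/* ParseForm: split form data into Key.i / Val.i (count in Key.0) */
CALL ParseForm 'name=Jane+Doe&topic=REXX%2FCGI'
DO i = 1 TO Key.0
  SAY Key.i '=' Val.i
END
EXIT 0

ParseForm: PROCEDURE EXPOSE Key. Val.
  PARSE ARG Data
  Key.0 = 0
  DO WHILE Data \== ''
    PARSE VAR Data Pair '&' Data      /* next NAME=VALUE pair       */
    IF Pair == '' THEN ITERATE        /* skip empty pairs ("&&")    */
    PARSE VAR Pair N '=' V            /* split at the first "="     */
    k = Key.0 + 1
    Key.k = Decode(N)
    Val.k = Decode(V)
    Key.0 = k
  END
  RETURN

Decode: PROCEDURE                     /* plus signs and %XX escapes */
  PARSE ARG S
  S = TRANSLATE(S, ' ', '+')
  Out = ''
  P = POS('%', S)
  DO WHILE P > 0
    Hex = SUBSTR(S, P + 1, 2)
    IF VERIFY(Hex, '0123456789ABCDEFabcdef') = 0 THEN DO
      Out = Out || LEFT(S, P - 1) || X2C(Hex)
      S = SUBSTR(S, P + 3)
    END
    ELSE DO                           /* malformed escape: keep %   */
      Out = Out || LEFT(S, P)
      S = SUBSTR(S, P + 1)
    END
    P = POS('%', S)
  END
  RETURN Out || S
```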
When using the NAME and VALUE information in the script, you need to be aware that:
- Nothing dictates the order in which the NAME=VALUE pairs appear in the form-data block.
- Not every NAME and VALUE defined in the form is necessarily sent by the client; for example, if nothing is selected.
- More than one VALUE may be sent for a given NAME; for example, if a scrolling list allows the selection of several options.
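Because a NAME may be missing or repeated, it can help to look values up by name rather than by position. FormValues is a hypothetical helper that scans the form data and returns every VALUE sent for one NAME, joined by a separator you choose. For brevity it decodes only plus signs, not %XX escapes.

```rexx
/* FormValues: all VALUEs sent for one NAME, joined by Sep */
Form = 'color=red&size=L&color=blue'
SAY 'color:' FormValues(Form, 'color', ', ')  /* displays: color: red, blue */
EXIT 0

FormValues: PROCEDURE
  PARSE ARG Data, Wanted, Sep
  Result = ''
  Found = 0
  DO WHILE Data \== ''
    PARSE VAR Data Pair '&' Data
    PARSE VAR Pair N '=' V
    IF TRANSLATE(N, ' ', '+') \== Wanted THEN ITERATE
    IF Found > 0 THEN Result = Result || Sep
    Result = Result || TRANSLATE(V, ' ', '+')
    Found = Found + 1
  END
  RETURN Result
```

A null result cannot distinguish a field that was not sent from one that was sent empty. If that matters, return the match count as well.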
Sending the Document Back to the Client
CGI programs can return a myriad of document types. To tell the server what kind of document you are sending back, CGI requires a short ASCII header that indicates the MIME type of the following document. Common MIME types relevant to Web development are:
- A "text" Content-Type to represent textual information. The two most likely subtypes are text/plain (text with no special formatting requirements) and text/html (text with embedded HTML commands).
- An "application" Content-Type to transmit application data or binary data. application/postscript is an example: the data is in PostScript and should be fed to a PostScript interpreter.
The first line of your output should read Content-type: type/subtype, where type/subtype is the MIME type and subtype of your output. Next, you must send a blank line. In REXX this can be done with SAY 'Content-type: text/html'; SAY. After these two lines, anything written to standard output (stdout), for example with the REXX SAY instruction, is included in the document sent to the client. The REXX PROCEDURE PrintHeader (from the library of REXX CGI functions) provides some help in creating such lines; see Listing Six.
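Here is a minimal sketch of the header step. HeaderLine is a hypothetical helper that builds the Content-type line; its default of text/html is an assumption of this sketch. The empty SAY supplies the blank line that ends the header.

```rexx
/* Send the CGI header, then the document body */
SAY HeaderLine('text/plain')
SAY ''                              /* blank line ends the header */
SAY 'This body is plain text.'
EXIT 0

HeaderLine: PROCEDURE
  PARSE ARG Type
  IF Type = '' THEN Type = 'text/html'   /* default for this sketch */
  RETURN 'Content-type:' Type
```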
Following the header lines, you will probably want to output an HTML title and header for the page, and at the end of the page you will want the matching lines. This can be simplified by using the REXX PROCEDUREs HTMLTop and HTMLBot (from the library of REXX CGI functions); see Listing Seven.
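HTMLTop and HTMLBot are in Listing Seven. The hypothetical PageTop and PageBottom below only illustrate the idea of returning the opening and closing boilerplate as strings. The title is inserted as-is, so it should not contain unescaped user-supplied text.

```rexx
/* PageTop/PageBottom: hypothetical page boilerplate helpers */
SAY 'Content-type: text/html'
SAY ''
SAY PageTop('Search results')
SAY '<p>No matches found.</p>'
SAY PageBottom()
EXIT 0

PageTop: PROCEDURE
  PARSE ARG Title
  RETURN '<html><head><title>'Title'</title></head>' ||,
         '<body><h1>'Title'</h1>'

PageBottom: PROCEDURE
  RETURN '</body></html>'
```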
With the boilerplate out of the way, you can print the variables you have read in from a form, using the REXX PROCEDURE PrintVariables (from the library of REXX CGI functions); see Listing Eight.
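When printing variables that came from a form, escape HTML special characters first so that user input is displayed rather than interpreted as markup. HTMLEscape is a hypothetical helper, not part of the library, and the sample field below is invented.

```rexx
/* Echo form fields as an HTML definition list, escaping user text */
Key.1 = 'comment'; Val.1 = '<b>bold</b> & more'
Key.0 = 1
SAY '<dl>'
DO i = 1 TO Key.0
  SAY '<dt>'HTMLEscape(Key.i)'</dt><dd>'HTMLEscape(Val.i)'</dd>'
END
SAY '</dl>'
EXIT 0

HTMLEscape: PROCEDURE
  PARSE ARG Text
  Out = ''
  DO j = 1 TO LENGTH(Text)
    C = SUBSTR(Text, j, 1)
    SELECT
      WHEN C == '&' THEN Out = Out || '&amp;'
      WHEN C == '<' THEN Out = Out || '&lt;'
      WHEN C == '>' THEN Out = Out || '&gt;'
      WHEN C == '"' THEN Out = Out || '&quot;'
      OTHERWISE Out = Out || C
    END
  END
  RETURN Out
```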
Diagnostics and Reporting Errors
Since stdout is included in the document sent to the browser, diagnostics output with the REXX SAY command will appear in the document. This output must be consistent with the Content-type: type/subtype mentioned earlier. Listing Nine illustrates diagnostic reporting.
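One way to keep diagnostics consistent with the Content-type is to format them per type. DiagLine is a hypothetical helper. For text/html it wraps the message in an HTML comment, which is visible in the page source but not in the rendered page. It breaks up any "--", because that sequence is not allowed inside an HTML comment. For other types it returns a plain DEBUG: line.

```rexx
/* DiagLine: format a diagnostic to suit the document's type */
Type = 'text/html'
SAY 'Content-type:' Type
SAY ''
SAY DiagLine(Type, 'QUERY_STRING was empty -- showing the form')
EXIT 0

DiagLine: PROCEDURE
  PARSE ARG Type, Msg
  IF Type = 'text/html' THEN DO
    DO WHILE POS('--', Msg) > 0       /* "--" may not appear in a comment */
      P = POS('--', Msg)
      Msg = LEFT(Msg, P) || ' ' || SUBSTR(Msg, P + 1)
    END
    RETURN '<!-- DEBUG:' Msg '-->'
  END
  RETURN 'DEBUG:' Msg
```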
If errors are encountered (for example, no input provided, invalid characters found, too many arguments specified, invalid syntax in the REXX script), the script should provide detailed error information. It may be very useful to provide information on the settings of various Web environment variables.
The CGIerror and MyURL REXX PROCEDUREs (from the library of REXX CGI functions) assist in error reporting; see Listing Ten. Listing Eleven is an example of REXX code using CGIerror to provide an error message together with a listing of the current environment variables; it produces the output in Figure 1. Listing Twelve is an example of REXX code using CGIdie (which is identical to CGIerror except that it exits instead of returning); its output is shown in Figure 2.
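REXX can also trap its own errors, so that a syntax or arithmetic error produces a readable page instead of a broken response. The sketch below uses SIGNAL ON SYNTAX, the special variables RC and SIGL, and the built-in ERRORTEXT function. SyntaxTrap and ErrorReport are hypothetical names and are not the library's CGIerror or CGIdie. The sketch sends the header first so that any error report is always preceded by one.

```rexx
/* Trap REXX errors and report them as an HTML page */
SIGNAL ON SYNTAX NAME SyntaxTrap
SAY 'Content-type: text/html'
SAY ''
Age = '42'                        /* imagine this came from a form */
SAY '<p>Next year you will be' (Age + 1)'</p>'
/* A non-numeric Age such as 'forty' would raise error 41 and     */
/* transfer control to SyntaxTrap.                                */
EXIT 0

SyntaxTrap:
  SAY ErrorReport(RC, SIGL)       /* RC: error number; SIGL: line */
  EXIT 1

ErrorReport: PROCEDURE
  PARSE ARG Code, Line
  RETURN '<h1>Script error</h1><p>REXX error' Code 'at line' Line':',
         ERRORTEXT(Code)'</p>'
```

The report deliberately omits the failing source line. Showing script source to clients can reveal more than you intend.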
Two Simple REXX CGI Scripts
To get your Web server to execute a CGI script you must:
1. Write the script. Listing Thirteen, for instance, is the source of a CGI script to execute a UNIX finger-type function; Figure 3 is the output of this script. Listing Fourteen, on the other hand, is the source of a minimal HTML form and script. Figure 4 is the output from the first part of this script, while Figure 5 is the output from the second part (after entering "testing" into the form created by part 1).
2. Move the script to a valid area as defined by the server software and make the script executable by your Web server. The procedures to accomplish this step vary from site to site. Contact your Webmaster to help you with this. The Webmaster will want to ensure that the security aspects of your script have been addressed before adding your script to the rules file.
Putting it all Together
First, you must write your REXX CGI script. If it is to run on a UNIX server, you will need to determine where the REXX interpreter is located. Typically this is somewhere like /usr/local/bin/rxx. This information must be placed on the first line of your script, immediately following the characters "#!". You can view the CGI REXX script in Listing Thirteen to see how this first line appears. This script enables the server to execute a UNIX finger command for a userid provided after the question mark in the URL. Typical output from this script is seen in Figure 3.
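A finger-style script in the spirit of Listing Thirteen (not its actual source) might be structured as shown below. The #! path is the one mentioned above and will differ between sites. SafeUserid, its 16-character limit, and its character set of lowercase letters and digits are assumptions. The userid arrives on the command line, as described earlier for PARSE ARG, and is checked before it is handed to the shell.

```rexx
#!/usr/local/bin/rxx
/* Finger-style CGI sketch: the userid follows the ? in the URL */
PARSE ARG Userid .                     /* first command-line word    */
SAY 'Content-type: text/plain'
SAY ''
IF \SafeUserid(Userid) THEN DO
  SAY 'Please supply a valid userid after the ? in the URL.'
  EXIT 0
END
'finger' Userid                        /* command for the host shell */
EXIT 0

SafeUserid: PROCEDURE                  /* limits are assumptions     */
  PARSE ARG Id
  Allowed = 'abcdefghijklmnopqrstuvwxyz0123456789'
  RETURN Id \== '' & LENGTH(Id) <= 16 & VERIFY(Id, Allowed) = 0
```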
Another simple example, illustrating the use of a form, is shown in Listing Fourteen. This script is self-referencing: the same script both displays the form and processes the user's input from it. The form is therefore invoked from a URL such as http://www.slac.stanford.edu/cgi-wrap/minimal and appears as shown in Figure 4.
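The self-referencing pattern can be sketched as follows; this is not the source of Listing Fourteen. The field name data and METHOD="GET" are assumptions. The standard CGI variable SCRIPT_NAME supplies the script's own path, so the form points back at whichever URL invoked it. For brevity the sketch decodes only plus signs and refuses to echo anything other than letters, digits, and spaces.

```rexx
/* Self-referencing form: empty query -> send the form;        */
/* otherwise echo the "data" field (GETENV as in uni-REXX)      */
SAY 'Content-type: text/html'
SAY ''
Query = GETENV('QUERY_STRING')
IF Query == '' THEN SAY FormPage(GETENV('SCRIPT_NAME'))
ELSE DO
  PARSE VAR Query 'data=' Data '&' .
  Data = TRANSLATE(Data, ' ', '+')
  Safe = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 '
  IF VERIFY(Data, Safe) > 0 THEN Data = '(input refused)'
  SAY '<p>You entered:' Data'</p>'
END
EXIT 0

FormPage: PROCEDURE
  PARSE ARG Action
  RETURN '<form method="GET" action="'Action'">' ||,
         'Data: <input name="data"> <input type="submit"></form>'
```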
If one enters "testing" into the form's Data field, then this script produces the output shown in Figure 5.
Next, you will need to test the script before exposing it to the server and the world. You can use the UNIX setenv command to set the environment variables that your script will read. Then you can run the script and redirect its output to a file. Finally, you can use your WWW browser to view the local file you created.
Before installing your script so the server can execute it, you need to ensure that it is working properly and can handle all kinds of stupid or malicious input. It must also not stall, get into a loop, provide access to private or sensitive data, or expose the security of the server. To learn more about the security issues involved with running CGI scripts, check out http://www.slac.stanford.edu/slac/www/resource/how-to-use/cgi-rexx/cgi-security.html. When you are ready to install the script, you will need to contact your local Webmaster to move the script to a valid area and make it executable by the server.
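A simple guard against the kind of stupid or malicious input mentioned above is to check length and characters against an explicit allow list before using the input. CheckInput is a hypothetical helper, and its limit and character set are placeholders to adapt. It reports the position of an offending character rather than echoing the character itself, so the message is safe to display.

```rexx
/* CheckInput: return '' if Text is acceptable, else a reason */
Allowed = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 '
Why = CheckInput('hello world', 64, Allowed)
IF Why == '' THEN SAY 'input accepted'
ELSE SAY 'input rejected:' Why
EXIT 0

CheckInput: PROCEDURE
  PARSE ARG Text, MaxLen, Allowed
  IF Text == '' THEN RETURN 'no input provided'
  IF LENGTH(Text) > MaxLen THEN RETURN 'input longer than' MaxLen 'characters'
  P = VERIFY(Text, Allowed)             /* first disallowed character */
  IF P > 0 THEN RETURN 'disallowed character at position' P
  RETURN ''
```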
For more detailed information on writing CGI scripts in REXX see http://www.slac.stanford.edu/slac/www/resource/how-to-use/cgi-rexx/.
For information in hard-copy form, see:
HTML & CGI Unleashed, by John December and Mark Ginsburg (Sams.net, 1995).
Other sources of more general interest include:
- Online Resources at http://www.mispress.com/introcgi/ref/ for more online details on standards and protocols in use with WWW.
- The newsgroup comp.infosystems.www.authoring.cgi, which covers the development of CGI scripts as it relates to Web-page authoring.
- Marc Hedlund's CGI FAQ.
Les, who is assistant director of the Stanford Linear Accelerator Center Computing Services group, was a contributor to HTML and CGI Unleashed (Sams/Macmillan, 1995).