Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Database

Integrating Web Pages with Databases


Sep00: Integrating Web Pages with Databases

Preprocessing HTML

David teaches computer science at the Rochester Institute of Technology. He can be reached at [email protected].


Ecommerce is more than about offering pretty web pages -- it's mostly about using and interacting with databases. A classic example of a database used on a web site is, of course, the online product catalog, where product descriptions, prices, and availability are retrieved from a database and offered to customers. If and when a customer completes a purchase, the credit-card number, name, address, and shopping list are recorded. If all goes well, purchased items appear on the customer's doorstep within a few days.

In one recent e-commerce project, I worked with a company wanting to build its own online pharmacy. The client wanted to sell over-the-counter products as well as prescription medicines. Clearly, the application demanded a database. Consequently, I chose Microsoft Access for the database mainly because it is fast, allowing almost instantaneous lookups from a line of more than 18,000 products. To integrate the database with the web site, I then examined three alternatives. In this article, I'll share with you the decisions made, along with the custom system I eventually implemented.

Typical Approaches

Initially, the alternatives I considered included Allaire's Cold Fusion, Microsoft's Internet Database Connector (IDC), and Microsoft's Active Server Pages (ASP).

Cold Fusion (http://www.coldfusion .com/) is a server that processes extensions to HTML. When a web server encounters a request for a particular file type, the web server first passes the file to the Cold Fusion server to preprocess the extended HTML. These extensions consist of extra tags that act as an embedded language. The technique is akin to embedding SQL inside a C program. A preprocessor translates the embedded SQL into C before sending it to the compiler. Cold Fusion scans for code written in the embedded language CFML (short for "Cold Fusion Markup Language"), executes that code, and returns only HTML to the client. The language is fairly complex, letting you query databases and return the result as HTML to the client's browser. Because the language is complex, you can use it for many things using CFML. Example 1 shows how you can generally query a database using CFML.

You embed the SQL between the <CFQUERY> and </CFQUERY> tags. Catalog is the name of a CFML variable that will hold the results of the query. Other CFML constructs manipulate this variable to produce HTML. Here, the CFOUTPUT construct generates a list using the Catalog variable. Ultimately, the Cold Fusion server strips all the CFML and only HTML reaches the browser.

Microsoft's Internet Database Connector (IDC) (http://www.microsoft.com/) is an ISAPI DLL. ISAPI (short for "Internet Server Application Programmer Interface") lets you build extensions to Microsoft's Internet Information Server (IIS). These extensions are DLLs that IIS can load and use as needed. IDC is a DLL that IIS uses to read from and write to databases.

The DLL is named httpodbc.dll and, as the name implies, it connects web pages with ODBC data sources. The IDC relies on two files -- one with the extension .idc, and the other with .htx. The .idc file contains information about the data source and the SQL for the query. The .htx file contains HTML plus extensions (hence the "x" in .htx) that refer to the data collected from the query.

For example, prodlist.idc contains Example 2(a). The corresponding .htx file, prodlist.htx, contains Example 2(b). When a request for an .idc file reaches IIS, the server executes the query in the .idc file and processes the .htx file. The result of this processing is standard HTML that the server sends back to the browser.

The final option I looked at was ASP, Microsoft's preferred approach to marrying HTML with ODBC data sources. ASP replaces the .idc and .htx files with a single .asp file. The .asp file combines both HTML and a Visual Basic-like language. Using ASP syntax, you can cause the server to load and run ActiveX components. These components, in turn, can perform a variety of tasks including the retrieval of data from ODBC data sources. Again, the result returned to the browser is pure HTML.

Each of these approaches required that I learn a proprietary programming language and purchase software that can translate that language. To avoid these additional costs, I decided to write my own preprocessor, which I called "Ceres." I particularly wanted to avoid learning a new language and decided to write a preprocessor that executes SQL and inserts the result as a JavaScript object. Once I had the JavaScript object in hand, I could write my web pages as I normally would without the need for another language. The complete source code and related files for the Ceres preprocessor are available electronically; see "Resource Center," page 5.

Choosing a Custom Approach

The Internet Service Provider (ISP) selected to host the web site used NT servers. However, I didn't want to design the system to run only on NT servers -- I wanted to keep open the possibility of hosting the site locally on a Linux server running Apache or possibly some other platform. With the introduction of Mandrake and Red Hat secure servers, this possibility is quickly becoming reality.

Unfortunately, the Microsoft options are proprietary and only work, as far as I know, on Microsoft's operating systems. Cold Fusion is available on more platforms but no one wanted to spend the money on it. Another downside is that all of these approaches required learning a proprietary programming language. I preferred to stick to standards such as JavaScript that would work across many platforms.

Inserting SQL

Rather than invent a new tag that could potentially confuse an unsuspecting browser, I decided to insert SQL into a specially formatted HTML comment. Example 3(a) is its syntax. The <!-- is followed by a space and an exclamation point that introduces the InsertResultTable command. The <datasource> and <sql> are required while the <tablename>, <userid>, and <password> are optional. If <tablename> is not supplied, the preprocessor uses the default name, ResultTable.

Example 3(b) better explains how to insert the SQL. When the preprocessor encounters this comment, it locates the command, datasource, and SQL statement. After connecting to the datasource (ProductCatalog), the preprocessor sends the SQL to the data source and waits for the data to return. When the data does return, the preprocessor replaces the original comment with a JavaScript object that stores the returned data.

The JavaScript

Example 4 is typical of the JavaScript code generated by the preprocessor. The Catalog object contains three arrays, each of which has the name of a column from the result table of the SQL. In this example, three products were returned and all the data was stored as strings in the arrays. Because JavaScript allows easy conversion between data types, storing the data entirely as strings presents no problem.

The preprocessor replaces embedded SQL with JavaScript wherever it appears in the HTML file. Perhaps the best place to embed the SQL is inside the <HEAD> </HEAD> tag of the document; see Example 5. This way the embedded SQL is in one place and not scattered throughout the HTML file.

Invoking the Preprocessor

The Ceres preprocessor is written in C++ and invoked using CGI. However, a preprocessor could be written as an Apache module or as an ISAPI DLL.

You invoke Ceres using either GET or POST. In either case, the name of the HTML file to process is passed to Ceres. Example 6(a) is a typical URL. Using this mechanism, you can pass other information to Ceres. For example, consider the URL in Example 6(b). With a simple change to the embedded SQL, I can easily introduce a where clause; see Example 6(c). Ceres replaces @Name with the value of the Name variable passed in as part of the CGI protocol.

As a final twist to invoking Ceres, I can pass parameters to control the number of records returned from the query, as in Example 6(d). Ceres sets up an ODBC cursor and uses it to fetch the appropriate data.

Conclusion

Ceres is not a full-fledged programming language. Instead, it is an example of a string search. You can write some surprisingly useful code by searching for a string and then doing something when you find it. Ceres searches for the string "<!-- !". When it finds the string, Ceres assumes that the next set of characters represents a command. If Ceres recognizes the command, it continues searching for additional strings that serve as parameters to the command. After it collects all the parameters, Ceres executes the command.

Because Ceres uses CGI, it sends all of its output to cout. Unless Ceres encounters the "<!-- !" string, it copies the HTML file directly to cout. Only when it encounters a command does Ceres do anything other than copy characters. When Ceres executes a command, it collects the results of the command and sends them to cout as well.

Because most, if not all, browsers support JavaScript, there was really no need to invent a new language. JavaScript is a wonderful object-oriented language. With it, I can manipulate the data, display it in a variety of ways, and interface directly with other standard languages such as Java.

Lastly, Ceres is extremely portable. Using CGI, it will run on virtually any web server. With additional (but not complicated) programming, Ceres can run as an ISAPI DLL or Apache module. While not as sexy as an ISAPI DLL or an Apache module, CGI remains a powerful and flexible technology for moving information between browsers and web servers.

DDJ


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.