Dr. Dobb's | Tiny Perl Server Pages and MySQL

Tiny Perl Server Pages and MySQL

Our authors use MySQL and Perl Server Pages to explore web-site personalization techniques.

June 01, 2002
URL:http://www.drdobbs.com/database/tiny-perl-server-pages-and-mysql/184405072

Jun02: Using Tiny Perl Server Pages and mySQL

Andy is a solution architect and Hung-Wai an MCSE. They can be reached at [email protected] and [email protected], respectively.

There are three basic reasons why organizations invest in web-based information technology: to increase revenue, reduce cost, and improve customer relationships. But when visiting web sites, customers increasingly want more than just static HTML documents — they now expect personalized content and interaction with the site. In this article, we'll explore personalization techniques that involve technologies ranging from cookies and URL rewriting to tracking user sessions and state management. The tools we use are mySQL and Tiny Perl Server Pages (TPSP) custom tag library facility (for more information on TPSP, see "A Tiny Perl Server Pages Engine," by Andy Yuen, DDJ, August 2001). In the process, we'll present ConsoleApp, a tool that facilitates building web applications using the Model-View-Controller paradigm. While the code presented here is written in TPSP, the techniques can easily be applied to other programming and scripting languages.

TPSP and Custom Tags

TPSP unites client-side HTML, server-side scripting, and component-based development to produce dynamic pages. TPSP has many JSP-like features including custom tag support. The latter lets you develop custom tag modules to encapsulate complex server-side behaviors and business rules into simple XML-like elements that content developers can use. The only requirements for running TPSP are that your web server supports CGI and that you have Perl 5.005 or later installed on your system. In other words, TPSP is platform- and web-server independent.

Like JSP, TPSP lets you create custom tags. But, unlike JSP, you are not required to provide an XML-based Tag Library Descriptor (TLD) with each tag library you develop. Listing One is a TPSP page that demonstrates the use of nested tags. It uses the custom tags <test:loop>, <test:if>, <test:condition>, <test:then>, and <test:else> to generate 10 random numbers and displays either "head" or "tail" in an unnumbered list, depending on whether the random number is greater than 0.5.

Cookies and Hidden Fields

Unlike JSP, previous versions of TPSP did not provide a built-in session object for session tracking. Since HTTP is a stateless protocol, this is a limitation if you want to implement personalization because personalization relies on the availability of a session-tracking mechanism.

A session is a set of interactions between the client's web browser and a web server. A session starts when users first invoke a site's URL and ends when they terminate their browser, or when the web server closes the session after a certain period of inactivity (timeout). Since HTTP is a stateless protocol, it requires an external mechanism to save the state information between client/web server interactions. The most common mechanisms used to maintain state are cookies and hidden fields.

Cookies are name-value pairs similar to CGI query strings. Cookies are sent back and forth between the browser and web server in the HTTP header and maintained by the browser. They can be created to have an indefinite lifespan or expire when the browser terminates. Cookies are easy to use. To create a cookie, you can use the TPSP built-in $response object: $mycookie = $request->cookie(-name=> $COOKIENAME, -value=>$sid, -expires=>'+30m');, where $COOKIENAME is defined elsewhere and contains the name of the cookie and $sid contains the session identifier (SID).

A cookie is then sent to the browser in the HTTP header: Out->print($request-> header(-cookie=>$mycookie));. To read a cookie, use: $sid = $request->cookie ($COOKIENAME);.

One limitation of cookie is its size constraint. The cookie header can store a maximum of 4 KB of text. Unpredictable behavior may result if your cookie is assigned a value exceeding 4 KB. This makes it impractical to store a huge amount of information in cookies. Another drawback is that they cannot be passed among different domains. Also, there is a limitation to the number of cookies you can create for a particular domain. However, the most severe drawback is that users can disable cookies from the browser (usually for privacy reasons). Hence, a site must have a fallback mechanism to store state information.

Hidden fields are one of the many HTML INPUT types. They are set by servers and do not correspond to any displayable user-interface element. To use hidden fields to store state information, use: <INPUT TYPE="hidden" NAME="STATE" VALUE="state info">.

The drawback to hidden fields is that stored information has to be sent across the network multiple times during a session. For example, if you save state information in the first page accessed by users, you have to send the information back in one or more hidden fields. If users add some more information in a subsequent page, you have to add that information in the next page you send, and so forth. This can quickly become messy. If your site relies on numerous existing legacy pages, it may be difficult to change all the legacy code to accommodate the use of hidden fields and preserve state information.

Security is another problem for both cookies and hidden fields. Hence, storing sensitive information directly in cookies and hidden fields is not recommended.

URL Rewriting

Instead of saving user information in cookies or hidden fields directly, the alternative is to only store an SID in them and save the state information on the server side. In case cookies are disabled on the client's browser, you can use URL rewriting to achieve a similar result. This involves the inclusion of the SID in the HTTP query string of the URL:

HTTP://domainName/cgi-bin/something.pl?SID=xxxxxxxxx

Once you use URL rewriting, you have to make sure that all URLs referencing your TPSP pages include this SID in the query string, otherwise the session will be lost. Listing Two uses cookie in the first attempt to save the SID and when it fails, reverts to URL rewriting. The simple approach works as follows:

Read the SID stored in the named cookie.
If the cookie is present, record that you are using the cookie and go to Step 5.
If SID is present in the query string, record that you are using URL rewriting and go to Step 5.
If you get here, either cookies have been disabled or this is the beginning of a new session. Generate a new and unique SID, redirect to the same URL but include the SID in the query string, and a cookie with the SID as its value in the HTTP header. The redirection will bring you back to Step 1.
The SID is now retrieved from the client. Check if the session is still valid before proceeding to service the request. If the session has expired, go back to Step 4.

State Management Strategies

Once you have a mechanism to identify a session by its SID, you must determine how to save the state information on the server. If you are using ASP and JSP, you can use the built-in session object to store the information in memory. PHP has a similar mechanism called "session variables." Some of these technologies also implement the URL rewriting mechanism.

There are several ways to save the session data. Hewlett-Packard's Bluestone J2EE application server offers at least three options for state management:

State Server-External process: Communication between application and State Server is via a TCP connection. This is hidden in a simple State Server API.
In-Process State Server-Internal process. This is different from the first approach in that the State Server runs in the same Java Virtual Machine (JVM) of the application server instead of in its own JVM.
Persistent State. Persistence of state to a JDBC data source.

By far, the most common approach is to make the data persistent in a relational database. For example, IBM's J2EE WebSphere application server uses a shared HTTPSession (a Java Servlet class) implementation. Each interaction with the shared HTTPSession is a transaction with a relational database. The SID is used to identify a row in the database. Serialized Java objects are saved in the database as binary large objects (BLOBs).

TPSP also uses the relational-database approach based on the mySQL relational database.

The session table is simple, having only four columns:

sid, to store the session identifier.
state, to store serialized Perl objects.
mtime, to store the timestamp when the row is modified.
ctime, to store the timestamp when the row is created.

A page object, session object, and several TPSP custom tags have been developed to facilitate state management in TPSP.

A Framework Based On the MVC Design Pattern

Model-View-Controller (MVC) is an abstraction that helps you divide functionality among objects to minimize the degree of coupling among them. The "model" deals with the business rules and data, "view" with presentation aspects of the application, and "controller" accepts and interprets user requests and controls the model and possibly multiple views to fulfill these requests.

Several approaches are possible in using MVC. One example is Struts, an open-source initiative from the Jakarta Project sponsored by the Apache Software Foundation. It uses J2EE servlets and JSP technologies. (For more information on Struts, see http://jakarta.apache.org/struts/). In TPSP's approach, each web application is made up of one controlling TPSP and one or more model and view TPSPs. The controlling TPSP receives all HTTP requests from clients and directs those requests to the appropriate model and view TPSPs. The model TPSPs are responsible for invoking the business objects (in our case Perl modules). In a nontrivial application, the model TPSP extracts parameters from the HTTP requests query string and translates them to a form suitable for use in invoking business objects. These objects should be designed to maximize reusability and hence should not normally be made to handle HTTP requests directly. Model TPSPs do not produce any direct output. After processing, they send the request back to the controlling TPSP so that it can direct the proper view TPSPs to create the content and send it to the client. The model stores the data in the TPSP page object for view TPSPs to pick up.

The controlling TPSP authenticates users for each request that it receives. That is, it makes sure that users have already logged in to the application before certain pages can be accessed.

Each application has an entry in the appl_global_mapping and multiple entries in the appl_page_mapping tables. These table entries map client HTTP requests to the appropriate model and view TPSPs. Listing Three is the schema for these tables.

The fields (columns) in the appl_global_mapping table are:

appname, the web application name.
path, the absolute directory path on the server where all the TPSPs are located.
login_page, the login page to display if authentication is required for a page and the user has not already logged in.
error_page, the error page to display when problems occur. The error page retrieves the error message from the page object for display.
home_page, the page to display when cmd (see next section) is not specified.
mdate, the date on which this entry was modified.
comment, the web application description.

The fields in the appl_page_mapping table are:

appname, the web application name. appname and cmd together form the primary key for this table.
cmd, the controlling TPSP uses this field and appname to locate the model and view TPSPs for handling an HTTP request.
model, the name of the TPSP that invokes the business objects for the command specified in cmd. A model TPSP may invoke other model TPSPs.
success_view, the name of the TPSP for display on successful execution of the model TPSP. The return code from the model TPSP is retrieved from the page object.
failure_view, the name of the TPSP for display on failure in execution of the model TPSP. The return code from the model TPSP is retrieved from the page object.
authen_level, specifies if authentication is needed before the request specified in cmd is carried out. It has a value ranging from 0 to 5.
comment, a description on this page.

Listing Four is the controlling TPSP. Its complex logic has been hidden by TPSP custom tags. The custom tags use the page and session objects introduced in earlier sections. This controlling TPSP can be reused by other web applications with only minor changes: the cookieName attribute in the PECS:UseSession and the name attribute in the PECS:Application tag. The rest is handled by the configuration information in the database. (PECS is short for "TPSP E-Commerce System.")

The page object is not persisted between user and server interaction. Its purpose is to provide a simple in-memory area for TPSPs to exchange information, for example, between model and view TPSPs. The only method it provides (other than the constructor) is attribute. To set an attribute, use: $page->attribute('errMsg', "error message");. To retrieve an attribute, use: $msg = $page->attribute('errMsg');.

The session object is derived from the page object. Other than the constructor and the attribute method inherited from the page object, it provides the following methods:

getSid, retrieves the unique session identifier.
include, includes a TPSP page.
encodeURL, appends the SID in the specified URL if URL rewriting is in use.
_runScript, is a private method that translates and executes a TPSP page.

The TPSP custom tags used include:

<PECS:UseSession timeout="N" cookie Name="appname" dsn="dsn" user="user" password="password"> </PECS:UseSession>, which creates a session object, saves it in the $request object, and handles the persistence of session data to the mySQL relational database. Listing One is extracted from this custom tag. It uses the Data::Dumper Perl module for object serialization and deserialization. The serialized session object is saved in the state column in the session table: timeout specifies the number of minutes of inactivity before the session times out (terminates). cookieName specifies the name to be used for the cookie. It should be unique for each web application. Although you specify the cookie name here, the custom tag may use URL rewriting for session tracking if cookies have been disabled on the user's browser. dsn, user, and password are the data source name, user name, and password, respectively. These are parameters for accessing the mySQL relational database using Perl DBI. TPSP does not support connection pooling. However, it keeps only one database connection for processing a page. This custom tag saves the database parameters and the database connection handle in the page object so the TPSPs that make up the page can access it. The database parameters are saved in the page attributes: db_dsn, db_user, db_password, and db_handle, respectively.
<PECS:Application name="appname" /> specifies the web application name so that the proper configuration and processing information for the application can be retrieved from the database for the controlling TPSP. It uses the appl_global_mapping and appl_page_mapping table entries to determine how to process requests and error conditions. This custom tag must be nested inside the PECS:UseSession tag.

The TPSP template (tpsp.tpl is available electronically together with the complete TPSP package and example applications; see "Resource Center," page 5) has been modified to support these new features. However, the new template is compatible with TPSP pages built using previous versions of TPSP.

To support the session and page object, the built-in $request and $response objects are now instantiated using class Request, a subclass of CGI, instead of using CGI directly as in previous versions of TPSP. The Request class provides one new method, attribute, which functions in the same way as the page object's attribute method. Its main use is for custom tags to retrieve the session and page objects, and other information items that are necessary to perform a task.

The ConsoleApp Example Application

To illustrate the concepts discussed here, we now present a web-based management console for the creation and administration of web applications using the MVC architecture. You can actually use it to change and enhance its own behavior or create new web applications. The application is bootstrapped using a SQL script to create a database and insert data into the appl_global_mapping, appl_page_mapping, user, and security tables. You have to change the absolute path for the application to point to the directory where you are putting all the TPSPs. The web server must be configured to allow executing CGI scripts in this directory. Write access is also needed because TPSPs are translated dynamically at least once. Figure 1 is a typical ConsoleApp screen. This application only helps you to define which TPSPs are needed to handle what requests and where they are located on the server. You are responsible to develop these TPSPs yourself.

There are a few things to be aware of when examining the source code:

The controller TPSP has been manually translated using tpspt.pl. This is done to avoid using different URL syntax to invoke the TPSP translator CGItpspt.pl on different web servers (see my previous article on TPSP). The controller has the same name as appname; for example, ConsoleApp.pl. All model and view TPSPs called by the controller will be translated automatically.
The SID is a 32-bit number generated using MD5 based on the current time, a random number, and the process number.
All URLs in TPSPs are encoded using the session object's encodeURL method, which appends the SID to the URL query string if URL rewriting is used by the session. Please note that static HTML pages cannot be used with URL rewriting.
Some model TPSPs such as savePage.tpsp, delPage.tpsp, and so on, invoke other model TPSPs at the end of their processing using the session object's include method.
The forwardToCMD.tpsp takes the URL of the page to forward from the page object's cmd attribute (placed there by a model TPSP). ForwardToCMD uses the refresh HTTP header (CONTENT attribute) of the <META> tag to do redirection. For example, <META HTTP-EQUIV="refresh" CONTENT="1; URL=URL">.
Access to each page is controlled by the page's authen_level and the user's sec_level. User security can be assigned by site. It is controlled by two tables: user and security.
If users have not logged in or their access level is insufficient to access a page, a login page prompts users to log in.
All business logic is encapsulated in the ConsoleApp::DataAccess.pm Perl module specifically developed for the ConsoleApp application. The model TPSPs translate parameters from the HTTP query string to a form suitable for calling the business object. For more complex applications, there can be more than one business object.
The model TPSP passes information to be displayed by a view TPSP using the page object. The information is usually returned in one or more sets of description and result lists (descList and resultList). The descList is a simple array that returns a description of each data item in resultList, which is returned by the DBI fetchall_arrayref([]) method call. In short, it returns a reference to an array containing a reference to an array for each row of data fetched.
Each view TPSP uses the TPSP's directive <%@ include file="url" %> to include header and footer files (header.tpsp and footer.tpsp) to give a consistent look and feel to the application.

Conclusion

In a future article, we'll delve into personalization, explain what it is, how it works, and enhance the web-based ConsoleApp to support the building and administering of web applications that provide rule-based personalized product recommendation. Updates, if any, will be posted on the TPSP home site at http://www.playsport.com/psp_home/.

DDJ

Listing One

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<TITLE>Nested Custom Tag Demo</TITLE>
</HEAD>
<
BODY>
<H3>PSP Nested Custom Tag Demo</H3><P>
<UL>
<
test:loop repts=10>
<LI>
<test:if>
  <test:condition><%= (rand > .5) %></test:condition>
  <test:then>Head</test:then>
  <test:else>Tail</test:else>
</test:if>
</test:loop>
<
/UL>
<
/BODY>
</HTML>

Back to Article

Listing Two

 ...
# extract SID from cookie
my $sid = $request->cookie($self->{'cookiename'});
if ($sid) {
    # cookie present: record the fact that we are using cookie
    $url_flag = 2;
}
else {
    # cookie not present
    if (($sid = $request->param("SID"))) {
        # sid found in query string: record that we are using URL rewriting
        $url_flag = 1;
    }
    else {
        # no sid found, redirect url using both cookie and url rewriting
        $self->_sendSid();
        exit(0);
    }
}
 ...
# check if session has timed out
 ...

Back to Article

Listing Three

DROP DATABASE IF EXISTS pecs;
CREATE DATABASE pecs;
USE pecs;
DROP TABLE IF EXISTS appl_global_mapping;
CREATE TABLE  appl_global_mapping(
  appname   varchar(32) NOT NULL PRIMARY KEY,
  path     varchar(128) NOT NULL,
  login_page    varchar(32) NOT NULL,
  error_page    varchar(32) NOT NULL,
  home_page varchar(32) NOT NULL,
  mdate     date NOT NULL,
  comment   text
);
DROP TABLE IF EXISTS appl_page_mapping;
CREATE TABLE  appl_page_mapping(
  appname   varchar(32) NOT NULL,
  cmd       varchar(32) NOT NULL,
  model     varchar(32) NOT NULL,
  success_view  varchar(32) NOT NULL,
  failure_view  varchar(32) NOT NULL,
  authen_level  int unsigned NOT NULL,
  comment   text,
  PRIMARY KEY (appname, cmd)
);
DROP TABLE IF EXISTS user;
CREATE TABLE user  (
  username  varchar(16) NOT NULL PRIMARY KEY,
  passwd    varchar(16) NOT NULL,
  email     varchar(64) NOT NULL
);
DROP TABLE IF EXISTS security;
CREATE TABLE security  (
  username  varchar(16) NOT NULL,
  appname   varchar(32) NOT NULL,
  sec_level int unsigned NOT NULL,
  PRIMARY KEY (username, appname)
  );
DROP TABLE IF EXISTS session;
CREATE TABLE session  (
  sid       varchar(36) NOT NULL PRIMARY KEY,
  mtime     timestamp NOT NULL,
  ctime     timestamp NOT NULL,
  state     text
);
GRANT select, insert, update, delete
ON pecs.*
TO pecs@localhost identified by 'xfiles';

Back to Article

Listing Four

<%!
use PECS::Request;
use PECS::Session;
use PECS::Page;

my $page = new PECS::Page;
$request->attribute('page', $page);
%>
<PECS:UseSession timeout="15" cookieName="ConsoleApp" 
         dsn="DBI:mysql:pecs" user="pecs" password="xfiles">
    <PECS:Application name="ConsoleApp" /> 
</PECS:UseSession>

Back to Article

Jun02: Using Tiny Perl Server Pages and mySQL

TPSP and Custom Tags

Cookies and Hidden Fields

URL Rewriting

State Management Strategies

A Framework Based On the MVC Design Pattern

The ConsoleApp Example Application

Conclusion

Listing One

Listing Two

Listing Three

Listing Four

Figure 1: Typical ConsoleApp screen.