Template Processing Classes for Python

Brad shows how you can embed Python objects in HTML pages using boilerplate template processing classes. Then Python creator Guido van Rossum adds a note on what's new in the just-released Python 1.5.

February 01, 1998
URL:http://www.drdobbs.com/web-development/template-processing-classes-for-python/184410485

Dr. Dobb's Journal February 1998: Template Processing Classes for Python

Embedding Python objects in HTML pages

Sidebar: What's New in Python 1.5?
Sidebar: About Python

Recently, I was prototyping a CGI infrastructure in Python to investigate possible implementation strategies for a production system. I had achieved a multithreaded, fast CGI service with persistent objects, but the HTML text was embedded inside the Python code, represented as print statements. This was cumbersome and failed to adhere to the object paradigm present elsewhere in the Python code (see Listing One). Also, when editing, I had to contend with two syntaxes at the same time -- Python and HTML -- and I was introducing errors in the latter with nearly every change. Furthermore, I was unable to easily use Emacs' wonderful HTML editing mode, with its complement of template HTML constructs and syntax coloring -- reason enough to look for an alternative.

At the most primitive level, I needed to insert run-time values into specific, tagged locations in a block of text. However, since I was working with rows of data from an object database, I wanted conditional and iterative control over regions of text; the former would include a block of text only if a condition was True, while the latter would map a set of values onto a text block, substituting different values for the tagged locations with each iteration. I had tried the HTML classes available for Python, but they focused on run-time HTML generation. I wanted, instead, to create a template of an HTML document, and at run time, substitute the placeholders in the template with run-time values.

The solution I developed is called "BoilerPlate." Listing Two is a sample BoilerPlate template that contains examples of the three properties just described. The %()... constructs represent placeholders for run-time values. Conditional inclusion is represented by #if# and #else# tags, and the #for# tag starts a region of text iterated over with a run-time set of values. Compared to Listing One, the HTML structure is predominant, and the BoilerPlate # tags clearly stand out. Unfortunately, the placeholder tag %()... does not. This is Python's string formatting descriptor, similar to that found in the C Standard Library printf family. I kept the Python syntax over a custom tag to keep the BoilerPlate implementation simple. As you will see, its power more than makes up for its orphaned appearance.

The complete BoilerPlate source is available electronically (see "Resource Center," page 3). This includes two source files and a patch file for Python 1.4. BoilerPlate.py contains the classes described in this article. The file Sink.py contains the definition of the Sink class, a utility class used by BoilerPlate to collect formatted output. It is a bit faster than standard Python string concatenation using the "+" operator.
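The Sink source is not reproduced in this article, so the following is only a minimal sketch of the idea, written in modern Python 3 syntax rather than Python 1.4. It assumes that a sink accumulates fragments in a list and joins them once at the end. The Append name matches the call in Listing Three; the Value method and the internal list are hypothetical.

```python
class Sink:
    """Hypothetical stand-in for the Sink class: collects text fragments."""

    def __init__(self):
        self.parts = []          # fragments, in the order they were appended

    def Append(self, text):
        # Appending to a list avoids building a new string on every call.
        self.parts.append(text)

    def Value(self):
        # One join at the end instead of repeated "+" concatenation.
        return "".join(self.parts)


sink = Sink()
for fragment in ("<TR>", "<TD>1</TD>", "</TR>"):
    sink.Append(fragment)
result = sink.Value()
```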

Although I predominantly use BoilerPlate for HTML documents, it has no ties to the HTML language. It can be used to process files of any content -- including Python code and, with some care, binary files.

BoilerPlate Processing Model

Processing within the BoilerPlate classes occurs in three stages. These stages work with three conceptual data types which roughly correspond to specific BoilerPlate classes; Table 1 presents the class hierarchy. The data types in the processing model are:

template, text with embedded BoilerPlate tags. Represented in the code as native Python strings.
text block, a range of text taken from a template. Represented by the RawText and Block classes.
format dictionary, a mapping of BoilerPlate tag names to values. Represented by the Formatter class.

These data types are used in the following processing stages:

fragmentation, which takes a template and breaks it up into one or more text blocks based on the BoilerPlate tags encountered.
value acquisition, which generates a mapping or dictionary of BoilerPlate tag names and their run-time value.
formatting, which creates a Python string by applying the current format dictionary to the text blocks of the template document.

The fragmentation stage normally occurs only once for each template object. The BoilerPlate classes save the resulting text blocks for later processing in the formatting stage. The other phases can occur at any time, to place new values in the active format dictionary or to format output. The BoilerPlate base class also remembers the result of the last formatting stage. However, any changes to the active format dictionary will clear this cached value so that BoilerPlate instances always emit up-to-date information.
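The caching behavior can be pictured with a small sketch in modern Python 3 syntax. The class below is a hypothetical stand-in, not the BoilerPlate base class: it skips fragmentation entirely and formats a single string, and the cached and format_count attributes are invented for illustration. Only the Remember and Value method names come from the article.

```python
class CachedTemplate:
    """Hypothetical sketch of cache invalidation on dictionary changes."""

    def __init__(self, text, **values):
        self.text = text
        self.values = dict(values)
        self.cached = None       # result of the last formatting stage
        self.format_count = 0    # counts real formatting passes

    def Remember(self, **values):
        # Later values silently replace earlier ones with the same key.
        self.values.update(values)
        self.cached = None       # any change invalidates the cached output

    def Value(self):
        if self.cached is None:
            self.cached = self.text % self.values
            self.format_count += 1
        return self.cached


t = CachedTemplate("Hello, %(name)s", name="world")
first = t.Value()
second = t.Value()               # served from the cache
t.Remember(name="Python")
third = t.Value()                # reformatted after the change
```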

Fragmentation Stage

The BoilerPlate base class Block performs the bulk of the processing of template objects during the fragmentation stage. It looks for special tags that denote regions of text in the template. All tags begin and end with the "#" character. I chose this character because it is Python's comment character, and would least likely be found in the expressions allowed in BoilerPlate constructs. A BoilerPlate text block begins with a tag that follows the format #<kind><data>#, where <kind> represents the block type, and <data> contains whatever information is required by the tag to process the block. A block ends with a corresponding tag formatted as #end<kind># (an optional space between the end and <kind> value is allowed). The <kind> values of the start and end tags must match.
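As a rough illustration of fragmentation, the sketch below (modern Python 3 syntax) splits a template into plain text and tag pieces with a regular expression. This is an assumption about one possible approach, not the parser in BoilerPlate.py: the fragment function and the TAG pattern are hypothetical, and a real parser would also check that the tag names a known block kind and that start and end tags match.

```python
import re

# Assumed tag shape: "#", then text without "#" or newlines, then "#".
TAG = re.compile(r"#([^#\n]+)#")


def fragment(template):
    """Split a template into ("text", ...) and ("tag", ...) pieces."""
    pieces = []
    pos = 0
    for match in TAG.finditer(template):
        if match.start() > pos:
            pieces.append(("text", template[pos:match.start()]))
        pieces.append(("tag", match.group(1).strip()))
        pos = match.end()
    if pos < len(template):
        pieces.append(("text", template[pos:]))
    return pieces


pieces = fragment("A#if a == 1#B#else#C#endif#D")
```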

Also recognized, though not actually a block, is the sequence #char# which will leave a single "#" in the formatted text. You would use this in the unlikely event that there is a conflict between a BoilerPlate tag and unformatted template text. Table 2 presents a list of legal BoilerPlate tags.

BoilerPlate supports nested text blocks. The template in Listing Two has a for block nested inside an else block. Apart from memory constraints, there is no limit to how deep you can nest.

Normally, all text between matching tags is owned by the Block instance indicated by the start tag. This includes any spaces and line-terminating characters that may appear after the block's start tag or immediately before the block's end tag. For some applications, such behavior may result in excessive whitespace when a BoilerPlate instance emits its output. To overcome this, the BoilerPlate Block class hierarchy supports a parsing mode called lineMode, which trims the first and last characters of the text owned by a Block instance if the characters are either a space or a line terminator. Entering lineMode is accomplished by passing a nonzero value for the lineMode argument in the __init__ method of any BoilerPlate Block class.

I usually set lineMode to True when using a template with HTML tables since the HTML table tag <TD> begins accepting data right after the closing ">" character. With lineMode on, I can place BoilerPlate tags on their own lines within <TD> elements without introducing spurious whitespace characters. Example 1 shows the difference between normal and lineMode processing. Notice how the lines of output terminate at different locations.
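The trimming rule can be sketched as a small function in modern Python 3 syntax. The name trim_line_mode is hypothetical, and the sketch assumes that exactly one character is removed at each end, as the description of lineMode suggests.

```python
def trim_line_mode(text):
    """Drop one leading and one trailing space or line terminator, if present."""
    trimmable = " \r\n"
    if text and text[0] in trimmable:
        text = text[1:]
    if text and text[-1] in trimmable:
        text = text[:-1]
    return text


owned = "\nb\n"                  # text between "#if b#" and "#else#" in Example 1
trimmed = trim_line_mode(owned)
```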

Conditional Text

BoilerPlate IfBlock instances represent if blocks. Their behavior should be familiar, since they act like the traditional conditional constructs found in programming languages. An IfBlock instance expects a valid Python expression as its <data> component. During the formatting stage, the instance evaluates the expression using Python's built-in eval function. If the result is True, then the corresponding text block is formatted and output. Example 2 shows the results of if processing.

After an initial if block but before its matching end tag, a template may contain any number of elif blocks. If the if condition does not result in a True value, the IfBlock instance visits each elif block in succession until one returns True and emits its output. Finally, an if block can close with an else block that will output its formatted text block if all preceding conditions return False. The classes ElifBlock and ElseBlock represent these additional constructs.
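The if/elif/else selection can be sketched as follows in modern Python 3 syntax. The choose_branch function and its list of (condition, text) pairs are hypothetical; the IfBlock, ElifBlock, and ElseBlock classes presumably hold equivalent information in their own way.

```python
def choose_branch(branches, values):
    """Return the text of the first branch whose condition is true.

    branches is a list of (condition, text) pairs; a condition of None
    plays the role of the else block.
    """
    for condition, text in branches:
        if condition is None or eval(condition, {}, values):
            return text
    return ""                    # no branch matched and there was no else


# Mirrors the template in Example 2, without the surrounding "A" and "E".
branches = [("a == 1", "B"), ("a == 2", "C"), (None, "D")]
results = [choose_branch(branches, {"a": n}) for n in (1, 2, 99)]
```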

Iterating Over Text

A for block represents an iteration over a block of text. Its behavior is defined in the ForBlock class. The syntax of the <data> part of the for tag is <name> in <value>, where <name> is a legal Python identifier and <value> is a valid Python expression. This is the same as Python's native for syntax. Before formatting its text blocks, the ForBlock instance evaluates <value> to obtain the set of run-time values to iterate over (again, using Python's eval function). The result of this evaluation must be a sequence (list, tuple, string) or a dictionary; anything else is an error (see Example 3).

During the formatting stage, the ForBlock instance makes available to the template text certain iterator values through the identifier <name>. For instance, if <name> is foo, you can access the current iterator value with the qualification foo.value. Table 3 lists the iterator attributes available from an iteration. Whether the sequence attributes or the dictionary attributes are present depends on the Python type being iterated over.

The iterator attributes of a for block are available through the <name> identifier even after the proper end of the block. As a result, you can show summary values at the end of formatted text, or use an iterator attribute in future if or for constructs. However, if a future for block has the same value for <name>, it will overwrite whatever values were there before. This also applies to any values stored in the formatting dictionary: Any slot in the dictionary that has a key that matches a for block's <name> value will be overwritten with the iteration attributes.
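The sketch below, in modern Python 3 syntax, shows one way the iterator attributes could be exposed and why they remain usable after the block ends. It models only the index, value, and sum attributes that appear in Example 3; the IterState, AttrResolver, and format_for names are hypothetical, and the running sum assumes numeric values.

```python
class IterState:
    """Hypothetical holder for the attributes seen in Example 3."""

    def __init__(self):
        self.index = 0           # position of the current item
        self.value = None        # the current item
        self.sum = 0             # running total of the values seen so far


class AttrResolver:
    """Mapping-like object that resolves keys such as "x.value" via eval."""

    def __init__(self, values):
        self.values = values

    def __getitem__(self, key):
        return eval(key.strip(), {}, self.values)


def format_for(name, sequence, body, values):
    state = IterState()
    values[name] = state         # overwrites any earlier value stored under name
    out = []
    for index, item in enumerate(sequence):
        state.index, state.value = index, item
        state.sum += item
        out.append(body % AttrResolver(values))
    return "".join(out)


values = {}
rows = format_for("x", range(0, 30, 10), "%(x.index)d %(x.value)d\n", values)
total = "Total: %(x.sum)d" % AttrResolver(values)   # still available afterward
```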

BoilerPlate Comments

BoilerPlate supports comments through the CommentBlock class. Comments begin and end with the sequences #!# and #end!#, respectively. All text contained in a comment block is ignored during the formatting stage, and is never output.

Value Acquisition

The BoilerPlate base class lets you supply values for the format dictionary in three methods: during instance creation in the __init__ method, in the Remember method, and in the output generation method Value. You can call the Remember method as many times as you want to build up the values held in the formatting dictionary. However, key collisions are not detected; only the last value corresponding to a particular key is remembered in the formatting dictionary.

All three methods accept format dictionary values in two ways: You can simply pass in a dictionary, or you can list name/value assignments within the method call, using Python's keyword argument feature. In brief, keyword arguments look just like Python assignment statements, but appear within function calls. For instance, if a method is defined as

def foo( a, b, c, d ):

then the call

foo( d=4, c=3, b=2, a=1 )

will invoke foo with the assignments listed in the call. The assignments do not have to follow the order in which the argument names appear in the function's definition. Furthermore, if the last parameter in the definition begins with "**" (def bar( a, b, **c ), for instance), then upon entry to the function, that parameter will contain a dictionary of all keyword arguments that did not match a named parameter. If bar is called with the same arguments as foo above, a and b again receive the values 1 and 2, respectively; however, c contains a dictionary with the keys c and d, and values 3 and 4.
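The two definitions can be tried directly. The following sketch uses modern Python 3 syntax; the return statements are added only so that the bound values can be inspected.

```python
def foo(a, b, c, d):
    return (a, b, c, d)


def bar(a, b, **c):
    # c collects every keyword argument that did not match a named parameter.
    return (a, b, c)


foo_result = foo(d=4, c=3, b=2, a=1)
bar_result = bar(d=4, c=3, b=2, a=1)
```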

Formatting Stage

The BoilerPlate classes use the Python string class format operator (%) to substitute placeholder tags in a template with run-time values. This entire operation takes place in the Cook method of the RawText class (Listing Three). The syntax of the format operator is <string> % <data>, where <string> is a Python string instance with embedded format descriptors that start with a "%" character (similar to those of the C printf). For each descriptor, the format operator takes a value from <data>, formats it according to the descriptor flags, and replaces the descriptor with the resulting value.

Although the <data> part of the expression is usually a Python tuple or list sequence, there is a variant called named value formatting that requires a dictionary on the right side of the % operator. Inside the <string> value, each format descriptor has a key value. The syntax for this extension is %(<key>).... When the format operator encounters the "(" character in a format descriptor, it grabs <key>, attempts to fetch a value from the <data> dictionary that corresponds to <key>, and formats the value per the rest of the descriptor.
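A short comparison of the two forms, in modern Python 3 syntax, may help here. The dictionary contents are made up for illustration.

```python
row = {"title": "Stock Report", "index": 3, "price": 12.5}

positional = "%s #%d" % ("Stock Report", 3)        # tuple on the right side
named = "%(title)s #%(index)d" % row               # dictionary on the right side
padded = "%(index)03d costs %(price).2f" % row     # flags follow the key
```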

There are three interesting facts about the % operator that are not obvious:

Text between the left and right parentheses of the % descriptor can contain any character, including spaces and additional embedded parenthesis pairs. (Support for embedded parentheses is in Python Version 1.5. There is a patch available for Version 1.4.)
The key is not evaluated in any way by the Python interpreter before it is used to access a value in the dictionary.
The right side of the % operator can be an instance of a class that implements a __getitem__ method, and not just a native Python dictionary.

As a result of these conditions, a Python script can gain control during the processing of each format descriptor when it uses named value formatting. This is how Formatter class instances resolve attribute references and function names.
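These three facts combine into a compact technique, sketched below in modern Python 3 syntax. The EvalDict class is a hypothetical illustration rather than the Formatter class itself: it receives the raw key text from the % operator and evaluates it as an expression against its stored values.

```python
class EvalDict:
    """Hypothetical mapping that evaluates each format key as an expression."""

    def __init__(self, **values):
        self.values = values

    def __getitem__(self, key):
        # The % operator passes the raw text found between the parentheses.
        return eval(key.strip(), {}, self.values)


data = EvalDict(items=["a", "b", "c"], price=4)
line = "%( len( items ) )d items, total %( price * len( items ) )d" % data
```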

Formatter Functions

The stock Formatter class in BoilerPlate contains simple methods that you can invoke within a % descriptor to change a value before Python applies it to a formatting descriptor. In Example 4, for instance, the format descriptor

%( HtmlEncode( '<' + Lower( Roman( foo ) ) + '>' ) )s

invokes three Formatter methods: the first, Roman, converts the contents of foo (a number) into its Roman numeral equivalent. The result is next given to Lower, which converts all uppercase characters in the string to lowercase. That result is then used in a Python string concatenation operation, which is the argument to the last Formatter method, HtmlEncode. It replaces characters in the set (<, &, >, ") with their corresponding HTML encoding.

Finally, the format operator converts the encoded result into a string because of the "s" format flag at the end of the descriptor (an NOP in this case). Table 4 lists the formatting functions in the Formatter class.
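For readers without the BoilerPlate source at hand, here is a sketch of what the three functions in this descriptor might look like as plain functions in modern Python 3 syntax. The function names follow the article; the bodies are assumptions, not the code in BoilerPlate.py.

```python
def Roman(number):
    """Convert a positive integer to an uppercase Roman numeral."""
    symbols = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
               (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
               (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    result = ""
    for value, symbol in symbols:
        while number >= value:
            result += symbol
            number -= value
    return result


def Lower(text):
    return text.lower()


def HtmlEncode(text):
    # "&" must be replaced first so the other entities are not re-encoded.
    for char, entity in (("&", "&amp;"), ("<", "&lt;"),
                         (">", "&gt;"), ('"', "&quot;")):
        text = text.replace(char, entity)
    return text


encoded = HtmlEncode("<" + Lower(Roman(14)) + ">")
```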

An unusual format function is the Null method. Like its siblings, it takes as its first argument the value to work on. Its second argument specifies a value to return if the first is a Python False value -- a member of the set (None, 0, 0.0, "", (), {}). Because Formatter uses the eval function, you can also use Python's logical operators to achieve the same effect. For instance,

    %( foo or 'N/A' )s

and

    %( Null( foo, 'N/A' ) )s

will always produce the same formatted output for all values of foo.
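The equivalence is easy to check with a stand-in for Null, written here in modern Python 3 syntax. The function body is an assumption based on the description above, not the Formatter method itself.

```python
def Null(value, default):
    """Return default when value is false in the Python sense."""
    if value:
        return value
    return default


samples = [None, 0, 0.0, "", (), {}, "text", 7]
same = all(Null(s, "N/A") == (s or "N/A") for s in samples)
```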

You can easily add your own functions either by subclassing the Formatter class or by installing them in the active format dictionary via the Remember method.

BoilerPlate and Python eval

BoilerPlate uses Python's eval built-in function to obtain conditional, iterative, and placeholder values. CGI developers might be concerned about this if they plan to use HTML form data in BoilerPlate expression tags. First, a standard rule to follow in any CGI application is to never blindly accept values received from a form. Otherwise, malicious users might be able to cause problems in your program with the data they enter. I have tried various Python expressions (range( 0, 99999999 ), for instance, and 1 / 0) in a simple CGI application without incident: Python gracefully raises an appropriate exception (MemoryError and ZeroDivisionError, respectively) and continues on. That is not to say that Python will always properly handle all errors or even on all platforms; however, I think the amount of mischief that can be caused is minimal since only expressions are evaluated, and not Python statements. Again, always be wary of data obtained from an external source.

If this is unsatisfactory, you can implement your own expression resolution mechanism for BoilerPlate instances to use. The Formatter class is the only one that uses Python's eval built-in function; it does so in its Resolve method. Simply create a Formatter subclass with your own custom Resolve method.
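A sketch of such a subclass follows, in modern Python 3 syntax. The Formatter class shown is a reduced stand-in that models only an eval-based Resolve; the Resolve signature, the __getitem__ wiring, and the LookupOnlyFormatter name are all assumptions. The subclass accepts plain names and rejects anything else, so expressions such as 1 / 0 are never evaluated.

```python
class Formatter:
    """Stand-in for the article's Formatter; only Resolve is modeled."""

    def __init__(self, **values):
        self.values = values

    def Resolve(self, key):
        return eval(key.strip(), {}, self.values)

    def __getitem__(self, key):
        return self.Resolve(key)


class LookupOnlyFormatter(Formatter):
    """Resolves plain names only; expressions are rejected, not evaluated."""

    def Resolve(self, key):
        name = key.strip()
        if not name.isidentifier():
            raise KeyError("expressions are not allowed: %r" % name)
        return self.values[name]


safe = LookupOnlyFormatter(user="guest")
greeting = "Hello, %(user)s" % safe
try:
    "%( 1 / 0 )s" % safe
    rejected = False
except KeyError:
    rejected = True
```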

Conclusion

BoilerPlate has proven to be an extremely useful library for CGI programming. My CGI applications are clearer and less cluttered, and editing an application's HTML components is no longer a chore. Perhaps most important, the incidence of HTML coding errors has dramatically decreased.

DDJ

Listing One

print "<HTML><HEAD><TITLE>%s</TITLE></HEAD><BODY>" % title
# Trap when there is no data to show.
if len( data ) == 0:
    print "<B>No data available</B><P>"
else:
    # Print table heading, then each row
    print "<TABLE><TR><TH>Index</TH><TH>Value</TH></TR>"
    for index in range( 0, len( data ) ):
        print "<TR><TD>%d</TD>" % index
        print "<TD>%s</TD></TR>" % data[ index ]
    print "</TABLE>"
print "</BODY></HTML>"

Listing Two

<HTML><!-- Example of BoilerPlate HTML -->
<HEAD>
<TITLE>%(title)s</TITLE>
</HEAD>
<BODY>
<!--
  -- Trap when there is no data
  -->
#if len( data ) == 0#
    <B>No data available</B><P>
#else#
    <!--
      -- Print table heading, then each row
      -->
    <TABLE>
    <TR><TH>Index</TH><TH>Value</TH></TR>
    #for each in data#
        <TR><TD>%(each.index)d</TD><TD>%(each.value)s</TD></TR>
    #end for#
    <TR><TH>Total:</TH><TD>%(each.sum)d</TD></TR>
    </TABLE>
#end if#
</BODY>
</HTML>

Listing Three

# Cook -- apply the given dictionary to a range of text we own.
def Cook( self, sink, dict ):
    sink.Append( self.text % dict )

By Brad Howes

>>> from BoilerPlate import String
>>> z = '''a
#if b#
b
#else#
c
#endif#'''
>>> print String( z, lineMode = 0, b = 1 )
a
b
>>> print String( z, lineMode = 1, b = 1 )
ab
>>> print String( z, lineMode = 1, b = 0 )
ac

Example 1: lineMode processing.

>>> from BoilerPlate import String
>>> z = 'A#if a == 1#B#elif a == 2#C#else#D#endif#E'
>>> print String( z, a = 1 )
ABE
>>> print String( z, a = 2 )
ACE
>>> print String( z, a = 99 )
ADE

Example 2: if processing.

>>> from BoilerPlate import String
>>> z = '''#for x in a#%(x.index)d %(x.value)d
#end for#Total: %(x.sum)d'''
>>> print String( z, a = range( 0, 100, 10 ) )
0 0
1 10
2 20
3 30
4 40
5 50
6 60
7 70
8 80
9 90
Total: 450

Example 3: for processing.

>>> from BoilerPlate import String
>>> z = "Page %( HtmlEncode( '<' + Lower( Roman( foo ) ) + '>' ) )s"
>>> print String( z, foo = 14 )
Page &lt;xiv&gt;

Example 4: The format descriptor invokes three Formatter methods.

What's New in Python 1.5?

By Guido van Rossum

Guido, Python's creator, works at the Corporation for National Research Initiatives in Reston, Virginia. He can be contacted at [email protected].

Python 1.5 has some powerful improvements over previous versions of the language. I'll briefly describe some of the major modifications here. For more information, see the Python web site at http://www.python.org/.

Packages. Perhaps the most important change is the addition of packages. A Python "package" is a named collection of modules, grouped together in a directory. A similar feature was available in earlier releases through the ni module (named after the Knights Who Say "new import"), but was found to be too important to be optional. Starting with 1.5, it is a standard feature, reimplemented in C, although it is not exactly compatible with ni.

A package directory must contain a file __init__.py -- this prevents subdirectories that happen to be on the path or in the current directory from accidentally preempting modules with the same name. (The __init__.py file was optional with ni.) When the package is first imported, the __init__.py file is loaded in the package namespace. (This is the other main incompatibility.)

For example, the package named "test" (in the Python 1.5 library) contains the expanded regression test suite. The driver for the regression test is the submodule regrtest, and the tests are run by invoking the function main() in this submodule. There are several ways to invoke it:

import test.regrtest
test.regrtest.main()

If you don't want to use fully qualified names for imported functions and modules, you can write:

from test import regrtest
regrtest.main()

or even:

from test.regrtest import main
main()

Assertions. There's now an assert statement to ease the coding of input requirements and algorithm invariants. For example,

assert x >= 0

will raise an AssertionError exception when x is negative. The argument can be any Boolean expression. An optional second argument can give a specific error message; for example:

assert L <= x <= R, "x out of range"

Once a program is debugged, the assert statements can be disabled without editing the source code by invoking the Python interpreter with the -O command-line flag. This also removes code like this:

if __debug__: statements

This form can be used for coding more complicated requirements, such as a loop asserting that all items in a list have the same type.
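As an illustration of that last case, here is a sketch in modern Python 3 syntax. The function name check_same_type is made up for the example, and the check only fires when the interpreter runs without the -O flag.

```python
def check_same_type(items):
    """Assert that every item in the list has the type of the first one."""
    if __debug__:
        # This whole loop is removed when Python runs with -O.
        for item in items:
            assert type(item) is type(items[0]), "mixed types in list"
    return items


checked = check_same_type([1, 2, 3])
try:
    check_same_type([1, "two", 3])
    raised = False
except AssertionError:
    raised = True
```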

Perl-style regular expressions. A new module, re, provides a new interface to regular expressions. The regular expression syntax supported by this module is identical to that of Perl 5.0 to the extent that this is feasible, with Python-specific extensions to support named subgroups. The interface has been redesigned to allow sharing of compiled regular expressions between multiple threads. A new form of string literals, dubbed "raw strings" and written as r"...", has been introduced, in which backslash interpretation by the Python parser is turned off. Example 5, for instance, searches for identifiers and integers in its argument string.

import re, sys
text = sys.argv[1]
prog = re.compile(
    r"\b([a-z_]\w*|\d+)\b",
    re.IGNORECASE)
hit = prog.search(text)
while hit:
    print hit.span(1),
    print hit.group(1)
    hit = prog.search(text, hit.end(0))

Example 5: Using Python 1.5 regular expressions.
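The named subgroups mentioned above can be combined with raw strings as in the following sketch. It is written in modern Python 3 syntax and relies on finditer and lastgroup, which belong to the current re module and are not claimed here to be part of the 1.5 interface. The group names ident and number and the classify function are invented for the example.

```python
import re

# The raw string keeps "\b", "\w", and "\d" intact for the re module.
token = re.compile(
    r"\b(?P<ident>[a-z_]\w*)\b|\b(?P<number>\d+)\b",
    re.IGNORECASE)


def classify(text):
    """Return (kind, text) pairs for each identifier or integer found."""
    return [(m.lastgroup, m.group(0)) for m in token.finditer(text)]


found = classify("x1 = 42 + y")
```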

Standard exception classes. All standard exceptions are now classes. There's a (shallow) hierarchy of exceptions, with Exception at the root of all exception classes, and its subclass StandardError as the base class of all standard exception classes. Since this is a potential compatibility problem (some code that expects exceptions to be string objects will inevitably break), it can be turned off by invoking the Python interpreter with the -X command-line flag. To minimize the incompatibilities, str() of a class object returns the full class name (prefixed with the module name) and list/tuple assignment now accepts any sequence with the proper length on the right side.
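One practical benefit of a class hierarchy is that a handler can name a base class and catch all of its subclasses. The sketch below uses modern Python 3 syntax (including "except ... as", which is newer than Python 1.5) and the LookupError and ArithmeticError base classes; it avoids StandardError, which no longer exists in Python 3. The describe function is made up for illustration.

```python
def describe(action):
    """Run action() and report which family of exception it raised, if any."""
    try:
        action()
    except LookupError as exc:        # base class of IndexError and KeyError
        return "lookup failed: " + type(exc).__name__
    except ArithmeticError as exc:    # base class of ZeroDivisionError
        return "arithmetic failed: " + type(exc).__name__
    return "ok"


results = [
    describe(lambda: [][0]),
    describe(lambda: {}["k"]),
    describe(lambda: 1 / 0),
    describe(lambda: None),
]
```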

Performance. The 1.5 implementation has been benchmarked as being up to twice as fast as Python 1.4. The standard Python benchmark, pystone, is now included in the test package (import test.pystone; test.pystone.main()).

The biggest speed increase is obtained in the dictionary lookup code. It is aided by a better, more uniformly randomizing hash function for string objects, and automatic "string interning" for all identifiers used in a program (this turns string comparisons into more efficient pointer comparisons). Some new dictionary methods make faster code possible if you don't mind changing your program: d.clear(), d.copy(), d.update(), d.get().
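The four dictionary methods named above behave as in this short sketch, written in modern Python 3 syntax with made-up data.

```python
counts = {}
for word in "spam eggs spam".split():
    # get() supplies a default, avoiding a separate membership test.
    counts[word] = counts.get(word, 0) + 1

snapshot = counts.copy()         # shallow copy, unaffected by later changes
counts.update({"ham": 5})        # merge another dictionary in place
merged = dict(counts)
counts.clear()                   # empty the dictionary in place
```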

Other speed increases include some inlining of common operations and improved flow control in the main loop of the virtual machine.

I/O speed has also been improved. On some platforms (notably Windows) the speed of file.read() (for large files) has improved dramatically by checking the file size and allocating a buffer of that size, instead of extending the buffer a few KB at a time.

Miscellaneous. The default module search path is chosen much more intelligently, so that a binary distribution for UNIX no longer requires a fixed installation directory. There are also provisions for site additions to the path without recompilation.

If you are embedding Python in an application of your own, you will appreciate the vastly simplified linking process -- everything is now in a single library. There's also much improved support for non-Python threads, multiple interpreters, and explicit finalization and reinitialization of the interpreter.

For those of us who like to read the source, the code now uses a uniform naming scheme (the "Great Renaming") wherein all names have a "Py" prefix. For example, the function formerly known as getlistitem() is now called PyList_GetItem().

DDJ

About Python

By David Arnold, Andy Bond and Martin Chilvers

The authors are researchers at the CRC for Distributed Systems Technology, University of Queensland, Australia. They can be contacted at [email protected].

Python is a portable, interpreted, object-oriented programming language influenced by a variety of other languages, most notably ABC, C, Modula-3, and Icon. With an elegant syntax and powerful, high-level data types, it is easy to learn and is ideal for CGI scripts, system administration, and many other extension and integration tasks. More importantly, its support for rapid prototyping and object-oriented programming makes it a valuable tool for serious software engineering and product development.

The small, but not oversimplified, core language provides the usual basic data types and flow-control statements, along with higher-level types such as strings, lists, tuples, and associative arrays. Object-oriented programming is supported by a class mechanism following the multiple-inheritance model. Exception handling is provided via the try/except construct.

The real power of Python, however, lies in its extensibility. The language can be extended by writing modules in either Python itself, or compiled languages such as C and C++. These modules can define variables, functions, new data types and their methods, or simply provide a link to existing code libraries. It is also possible to embed the Python interpreter in another application for use as an extension language. The standard Python library includes modules for a wide range of tasks, from debuggers and profilers to Internet services and graphical user interfaces. If you need it, it is probably already there.

Python runs under Windows 3.x, 95, and NT, most flavors of UNIX, Macintosh, and OS/2. It is freely copyable and can be used without fee in commercial products. More information (and source code) can be obtained at http://www.python.org/.

DDJ

Table 1: BoilerPlate class hierarchy.

Table 2: BoilerPlate tags.

Table 3: Iteration attributes: (a) common attributes; (b) sequence attributes; (c) dictionary attributes.