Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Web Development

Perl & Rapid Database Prototyping


Dec02: Perl & Rapid Database Prototyping

Tim is a freelance software developer and consultant. He can be reached at [email protected].


Even though the bulk of my production development is in C, Java, and assembly language, I often turn to Perl to quickly prototype new ideas. That's especially true with database applications. Perl's DBI module for database access provides convenient access to the most popular relational databases, and its strong text-processing abilities make it easy to synthesize complex queries.

Over the last several years, I've refined a simple framework that handles most routine database access automatically and significantly accelerates the development of simple database applications. This framework takes advantage of Perl's advanced object-oriented features to pack a lot of power into a small package.

Accessors

Accessor functions are the easiest way to abstract database access. By convention, accessors in Perl are subroutines that can either set or return a value, depending on how they are called. Listing 1 illustrates a simple accessor function. The accessor always returns the old value; if a new value is specified, the function also sets the value.

The Class-Per-Table Pattern

Accessors are a natural way to manipulate database data. A simple object-oriented design pattern uses a single class for each table in your database. An object of such a class represents a single row, and you use accessor methods to read or change individual fields. Such a system could be used as in Listing 2. The details of the data storage are completely hidden; this design makes it easy to change the details of the low-level storage without any impact on the higher level application code.

Exploiting AUTOLOAD

The class-per-table pattern can involve a lot of highly repetitive code. For each column in the database, you need a method that can read/write that column. Often, programmers build simple code generators to create reams of code for such classes.

Perl's AUTOLOAD capability lets you eliminate almost all of the explicit accessor methods. If there is no "phone" method defined, then an expression such as $e->phone invokes the AUTOLOAD method instead. The special $AUTOLOAD variable contains the full name (including package name) of the method that was requested. Using this, the AUTOLOAD method can load the requested method from a disk or database or simply emulate the call directly.

My database access framework uses AUTOLOAD to provide a default accessor method for any column of a database. I simply use the name of the column as the name of the accessor. For example, if there is no explicit implementation of the phone method, $e->phone('555-5555') invokes AUTOLOAD to store the specified value into the phone column. This gives you convenient access to any database column without having to write any code.

Packaging It Up

Since real-world projects have lots of tables, I've created a single DBTable class that contains the AUTOLOAD method just described and a number of other convenient methods. The per-table classes simply inherit most of their functionality from DBTable. You only need to write one or more constructors that let you access rows in your tables.

To illustrate, consider the schema in Listing 3. Listing 4 is the complete Perl code needed to use this schema. In particular, note that I've defined no accessors; the AUTOLOAD method in DBTable will handle them automatically.

Defining New Tables

The first three lines of Listing 4 are just standard boilerplate for any Perl class. The only interesting part is the byName constructor. The _fetchOrCreate method accepts a hash that identifies this row and returns a hash with the values of one or more columns. If the row doesn't exist, it is created.

The DBTable methods rely on the TABLE and KEY entries in the hash. The TABLE entry specifies the name of the database table. The KEY entry is a hash providing a primary key that identifies the particular row. In Listing 3, the primary key is the value of the employee_id field. The first argument to the _fetchOrCreate method specifies a condition for identifying or creating the necessary row; the second argument is the column whose value is to be returned.

If you now look carefully at Listing 2, you can see how the entire system works. The $e variable in that example contains everything needed to identify a particular row in a certain table. The call $e->phone('555-5555') invokes the AUTOLOAD method in the DBTable class, which can use the table name (from the TABLE key), the primary key (from the KEY key), and the column name (from the method name) to access the database.

Inside DBTable

The full implementation of DBTable is shown in Listings 8 and 9, available in the Source Code Appendix, pages 24-26. There are only four places where SQL statements are actually constructed: _fetch reads one or more columns from a row, _create creates a new row, _set sets the values of columns, and _columns obtains the names of the columns in this table. The string manipulations that convert hashes into SQL statements are admittedly a bit convoluted, but they are hidden inside just a couple of small functions. Also available electronically is an example program that uses this code. You can use this as a starting point for your own database applications.

Advanced Techniques

Sometimes you will want accessors that return objects rather than raw database fields. Listing 5 is a manager method that returns an object. With this addition, you can refer to $e->manager_id to obtain the numerical ID for the manager, or $e->manager to obtain an object. Such methods are especially useful when you have complex schemas with many interrelated tables. In like fashion, you can define methods that simulate nonexistent database fields (by transparently accessing joined tables, for instance).

The byID constructor in Listing 5 could have simply used $self->{'KEY'} = {'employee_id'=>$id} to set the KEY without any database access. Although that would be faster, it doesn't provide any guarantees that the requested row actually exists.

Finally, remember that $e->dbHandle returns the underlying DBI database handle to your application so that you can use SQL statements directly if necessary.

Performance

This framework makes it easy to prototype database applications. You simply define a package for each table in your schema, and those packages need only contain one or two simple constructors each. You can then begin to build your application, accessing database columns through accessor functions.

Of course, such applications are going to be slow. Every time you read or write a field, at least one database access is going to occur. As your application evolves, you can improve performance by overriding specific accessors. For example, if you access the primary key fields frequently, you can override them using code like that in Listing 6.

Listing 7 takes this idea a step further by modifying the constructor to store the entire row in memory, and overrides the _fetch and _set methods to read and update the in-memory versions, dramatically reducing database traffic. Such optimizations are not always appropriate, however, and are generally best delayed until you understand your application better.

Future Directions

Currently, DBTable stores the DBI database handle in a global variable, which limits you to a single database handle at a time. Applications that must access multiple databases will need to find a way to address this.

Right now, the _columns method is rather wasteful. It generates a full database query once for every object. It would be more efficient to do so only once for every table. Worse, it stores a list of valid columns in every object rather than once for each table, which could become a significant memory problem.

Conclusion

This framework has helped me to very rapidly build new database applications. In some cases, performance is not a serious issue, and such code can even be used in production. However, even when performance is an issue, this design is still a good place to start. You should generally be able to build functioning applications using this interface and then upgrade the underlying data access to improve performance without having to extensively modify your application.

TPJ

Listing 1

# Simple accessor
sub foo {
   my $a=$FOO;
   ($FOO)=@_ if @_;
   return $a;
}
# Using accessors
foo(4);  # foo = 4
print foo(5); # Print 4, set foo=5
print foo(); # Print 5

Back to Article

Listing 2

# Access a row
$e = Employee->byName('Doe', 'John');
# Get phone number
print $e->phone();
# Set phone number
$e->phone('555-5555');

Back to Article

Listing 3

CREATE TABLE employee(
   employee_id  INT PRIMARY KEY,
   first        CHAR(80),
   last         CHAR(80),
   phone        CHAR(80),
   manager_id   INT
);

Back to Article

Listing 4

package Employee
use vars qw(@ISA);@
ISA = qw(DBTable);
sub byName {
  my($class,$last,$first) = @_;
  my $self = bless {'TABLE'=>'employee'},$class;
  # _fetchOrCreate returns a hash
  # { 'employee_id' => <number> }
  $self->{'KEY'} = $self->_fetchOrCreate(
       {'last' => $last,'first' => $first}, 'employee_id' );
  return $self;
}

Back to Article

Listing 5

# An accessor returning an object
sub manager {
    my($self) = @_;
    return Employee->byId($self->manager_id);
}
# Another constructor
sub byID {
    my($class,$id) = @_;
    my $self = bless {},$class;
    $self->{'TABLE'} = 'employee';
    $self->{'KEY'} = $self->_fetchOrCreate
                     ({'employee_id' => $id}, 'employee_id' );
    return $self;
}

Back to Article

Listing 6

# An optimized employee_id accessor
sub employee_id {
  my $self = shift;
  $self->{'KEY'}->{'employee_id'};
}

Back to Article

Listing 7

# An optimized Employee class
package Employee
use vars qw(@ISA);@
ISA = qw(DBTable);
sub byName {
    my($class,$last,$first) = @_;
    my $self = bless {},$class;
    my $cols = $self->_columns;
    $self->{'TABLE'} = 'employee';
    # Read and cache all of the columns
    $self->{'FIELDS'} = $self->_fetchOrCreate
                  ({'last' => lc $last, 'first' => lc $first}, keys %$cols);
    $self->{'KEY'} =
        { 'employee_id' => $self->{'FIELDS'}->{'employee_id'} };
    return $self;
}
# Override _fetch to just pull the value from memory
sub _fetch {
    my ($self, $key, $col) = @_;
    return { $col => $self->{'FIELDS'}->{$col} };
}
# Override _set to modify the in-memory value
sub _set {
    my($self,$key,$set) = @_;
    $self->SUPER::_set($key,$set); # Set fields in DB
    foreach $k (keys %$set) {
       $self->{'FIELDS'}->{$k} = $set->{$k};
       # Remember to update the value in 'KEY', if it's there
       if(exists $self->{'KEY'}->{$k}) {
          $self->{'KEY'}->{$k} = $set->{$k};
       }
    }
}


TPJ

Back to Article


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.