Channels ▼
RSS

C/C++

The Persistent Template Library

Source Code Accompanies This Article. Download It Now.


Dr. Dobb's Journal March 1998: C Programming

Al is a DDJ contributing editor. He can be contacted at astevens@ddj.com.


In two recent columns, I introduced my new friend, Marine Corps Major Buzz Lightyear. I met Buzz when I attended Jim McCarthy's TeamworX BootCamp in May, 1997. Jim, Buzz, and I staged a reunion at Software Development '97 East in Washington, D.C. I gave Buzz his nickname at BootCamp because his attitude, enthusiasm, and occasional frustrations reminded me of the character in Toy Story. Until now, I have not revealed Buzz's true identity in this column because of an agreement with the BootCamp attendees that I would not use anyone's real name. Buzz released me from that commitment, but I continued to observe the original promise; other members of the team might read this column and think I was breaking my word.

I can now bring Buzz out of the closet (well, not in the traditional sense -- I'm just going to tell you who he is). Actually, he did it himself with the help of none other than Bill Gates. If you attended Bill's Fall Comdex keynote address or saw reports of it, you know he was joined on-stage by Marine Corps Major Jim Cummiskey and sports-legend Kareem Abdul-Jabbar. Major Jim Cummiskey, USMC, now famous far beyond the reach of this humble column, and Buzz Lightyear are one and the same. During his Comdex presentation, he managed to slip in the BootCamp buzzword, "resonate," a reference that I'm sure resonated with all the team members who happened to see it.

A Persistent STL

Three years ago this month, I interviewed Alexander Stepanov, the creator of the Standard Template Library, which the ANSI/ISO committee had accepted as part of Standard C++. We discussed STL and persistence, a property which, in the object-oriented lexicon, permits an object to outlive its creator (the program, not the programmer) and to move from the space in which it was created to some other space -- permanent storage, the memory space of other programs, or both. Persistence is usually implemented with disk files that store object memory, which permit programs to reinstantiate previously created objects, a concept that views traditional database management from an object-oriented perspective.

In our interview, I observed that STL does not implement a persistent object container model, and I asked Alexander to comment on STL and persistence. He responded:

This point was noticed by many people. STL does not implement persistence for a good reason. STL is as large as was conceivable at that time. I don't think that any larger set of components would have passed through the standards committee. But persistence is something that several people thought about. During the design of STL and especially during the design of the allocator component, Bjarne [Stroustrup] observed that allocators, which encapsulate memory models, could be used to encapsulate a persistent memory model. The insight was Bjarne's, and it is an important and interesting insight.

This statement caught my attention. I have long been interested in using programming languages to implement database-management software. In 1987, I published C Database Development, a book with code that implements a relational database by using the features of the C language and its preprocessor. The first edition used K&R C. The second edition, published in 1991, used Standard C. I called this software "CData: The Cheap Database Management System." In 1992, I published C++ Database Development, a book that presented an approach I called "Parody: The Persistent, Almost Relational Object Database Manager." Parody, and its successor, Parody 2 (a 1994 second edition that used newer C++ language constructs), employed inheritance to implement a persistent object mechanism. The technique works, but is somewhat cumbersome and limited because it inherits the classes that produce persistent objects from a persistent base class. Many programmers still use CData and Parody for their small database and object storage applications.

Following my conversation with Stepanov, I watched the committee's progress with the hope of expanding my persistent object ideas to use the STL container programming model -- templates and generic programming -- rather than inheritance. Persistent containers rather than persistent objects, it seemed to me, would better support the persistence needs of most programmers. I delayed starting work until the committee's work was near completion and until at least one major PC compiler supported enough of Standard C++ and the STL to enable me to develop, implement, and publish my ideas. That time is now. The committee has approved Standard C++, and Microsoft Visual C++ 5.0 implements most of Standard C++ and STL with only a few language enhancements remaining to be implemented -- complete enough for me to get started.

I looked at two other compilers. The gnu-win32 port of GNU C++ uses the original HP release of STL without default template arguments in the template class declarations. Borland C++ 5.2 uses a version of Rogue Wave's STL that also does not employ default template arguments. Neither of these implementations are close enough to the Standard C++ specification to work with what I have in mind. Microsoft licensed P.J. Plauger's STL implementation, which is close enough to the Standard to test my ideas. There may be others on the PC platform. Eventually, all compilers will comply with the Standard, and the code published with this column should work with those other compilers.

There is not, as far as I can find from a search of the Web, a public-domain object database system that uses STL as its basis. This project tries to fill that void.

iostream Iterators

Standard C++ addresses persistence in the STL by providing iostream iterators named istream_iterator and ostream_iterator. These iterators support forward sequential access only. By using them, you can write objects to and read objects from disk-based containers. Listings One and Two demonstrate this process. The first program, tstios01.cpp, constructs a memory-based vector container of pseudorandom numbers and uses an ostream_iterator<long> iterator as an argument to the STL copy algorithm to copy the memory-based container to a disk-based container. The second program, tstios02.cpp, constructs an empty memory-based vector container with enough space to hold the numbers in the disk file. Then it uses an istream_iterator<long> iterator as an argument in the copy algorithm to copy the disk-based container into the memory-based container.

File-based iterators are elegant indeed. Just the fact that they can be implemented and used so transparently highlights the genius (Stepanov) that underpins the generic programming model of STL. I see three drawbacks to using this approach for persistent containers, however. First, it takes a lot of code in the application to manage the disk container, code that, due to its complexity, should be encapsulated once and reused. Second, the technique supports only those operations that can be performed with forward iterators. Third, the disk containers record the objects as text strings by using the overloaded ostream::operator<< and istream::operator>> insertion and extraction operators. Each object is separated by a text string specified as an argument to the ostream_iterator constructor, and you must make sure that the separator is something that properly separates input strings as recognized by the istream class. Objects not only use more disk space than they need to, they are variable in length depending on their data values. Unless you specify the setw manipulator or width member function for the disk file stream output, an integer with the value 12 takes two character positions (plus the separator string), and an integer with the value 22334 takes five character positions. Because of this architecture, you cannot use random access or bidirectional iterators with these disk-based containers.

Persistence and STL Allocators

My first thoughts about a persistent STL were inspired by Alexander's statement that allocators might be used to encapsulate a persistent memory model. STL containers are memory based. The simplest application of persistence would save a memory-based container on disk to be retrieved later by an instantiation of that same container as identified by its file-name. The newly instantiated container would also be memory based. It could be any STL container built from a disk file when the container is constructed, and saved to the disk file when the container is destroyed. From that concept, I pictured the code in Listing Three.

The PAllocator template class in Listing Three would encapsulate everything it needs to load and save a "templatized" container. The specialized IntAllocator class would exist only to provide a filename for the particular container, in this case a vector of int objects. In the example, the filename would be "IntAllocator." You would not have to use the allocator class's typeid name, however. Any string value would do. When the program instantiates the persistent container, the allocator allocates memory and loads the memory with objects from the disk file. When the program destroys the container, the allocator saves the objects from memory back to the disk file to record any changes the program has made to the memory-based container.

That was the idea. I really wanted to be able to implement an interface like the one in Listing Three, because it is central to the idea that you could instantiate a standard container and have it silently exhibit persistent behavior. Based on what Alexander said in that interview, I tried diligently to use allocators (and iterators) as the key mechanism to implement a persistent STL.

Unfortunately, none of what I tried worked. It wasn't Stepanov's fault. In that same response, he observed that:

In October 1994...there was strong interest [among the Object Database Management Group] to make the containers within their emerging interface to conform to STL. They were not looking at the allocators as such.

I should have paid closer attention.

Here's why it didn't work. STL containers typically instantiate an allocator object in the container's constructor, although exactly how they manage that is an unspecified implementation detail. Each container has three or four overloaded constructors. (Container adapter constructors specify container arguments. This discussion is about the constructors of the container classes not those of the adapter classes.) The first constructor initializes an empty container. The second constructor constructs a container of specified size and fills it with objects of a common value. (The map, set, multimap, and multiset containers do not have this constructor.) The third constructor copies objects from a range specified by two iterators, presumably that point to another container. The fourth is a copy constructor. There are no communications between the constructors and their allocators about initializing the containers' allocated memory with object values. Consequently, allocators seem not to be the path to persistence after all, at least not yet.

I thought I might make the persistent allocator concept work by partially specializing each container to overload its constructors to communicate with the allocator. Until PC compilers implement partial specialization, however, I cannot experiment with this approach.

PTL

Following all that failed experimentation, I caved in and decided to use inheritance -- not to add persistence to object classes as Parody does, but to add persistence to containers. Persistent containers, being derived from standard containers, have to have their own names.

I hereby introduce PTL, the Persistent Template Library, Version 1. The header file ptl.h (available electronically; see "Resource Center," page 3) implements the template classes that constitute PTL. There is one class per STL container and some base classes. The naming convention for the public interface prefixes the STL container name with PTL. Consequently, a persistent vector is the template class PTLvector, and so on. The top base class in the template hierarchy is the PTLcntr class, which is derived from whatever STL container class is involved. Intermediate classes encapsulate operations common to more than one of the containers. For example, PTLvector, PTLdeque, and PTLlist are derived from PTLseq.

Let's start with an example. Tstptl01.cpp (available electronically) declares a PTLvector<int> container. Constructors for PTL containers specify their filenames. The library adds the file extension .ptl to the filename. The declaration instantiates a memory-based vector container that contains whatever was stored in the vector the last time one was instantiated with that particular filename. The first time you run the program, the file does not exist, and the declaration instantiates an empty vector. The program uses the push_back function to add integers to the vector. When the PTLvector object is destroyed, the memory-based vector is written to the disk-based container.

Tstptl02.cpp (also available electronically) instantiates the same PTLvector<int> container by using the same filename as the constructor argument. This declaration results in a vector container that is initialized with the values from the disk file. The program declares an iterator of type PTLvector<int>::iterator. This iterator is not specific to disk-based containers. It is the standard vector<int>::iterator object that STL declares. The PTL notation works because PTLvector<int> is indirectly derived from vector<int>. Accessing the container after it is instantiated uses STL conventions because the container is a standard memory-based STL container. If the program changes the contents of the container (it does not in this example), the contents are saved to the disk file when the container is destroyed.

There are other example programs to demonstrate all the PTL containers. I'll discuss two of them later in this column. They all are available electronically.

To Save or Not to Save

Currently, the base PTLcntr class assumes that it should always save the contents of the container on disk when the container object is destroyed. I considered overloading all the STL container functions that can change the container -- operator[], insert(), push(), and so on -- and saving only if one of them was called at run time. I might do that yet, but for now, PTL saves the container automatically unless you call the suppress_save function.

PTLStore

Ptlstore.h and ptlstore.cpp implement the PTLStore class to manage disk input and output of all persistent containers. Ptlstore.h overloads the << and >> operators with template functions that read and write objects of types that have flat memory data representations. These overrides use the sizeof operator and the object address to store the object's memory image on the disk. This approach works with all C++ intrinsic types and some user-defined types, which I'll address next. Ptlstore.cpp includes specialized insertion and extraction operator functions for string objects. These functions store the length of the string data and the string characters.

Preparing a Class to be Persistent

The examples discussed so far use containers of intrinsic types. PTL works with user-defined types if the types are STL-compatible and if they conform to certain conventions. First, if the class has no virtual functions (no vptr), no pointers, no references, and no embedded objects that have any of these things, the class is already PTL-compliant because its data representation is a flat memory image of the data values -- a C struct without pointers. Otherwise, if the class refers to other memory or has a vptr, the class has to participate in its own persistence by first overloading the insertion operator to save sufficient data to reconstruct the object and then overloading the extraction operator to do the reconstruction. The overloaded string insertion and extraction operator functions in ptlstore.cpp are examples of this procedure.

Date.h, the header file for a Date class, has a virtual destructor, so Date objects cannot be automatically saved and restored by the PTLStore class. Date.cpp includes overloaded insertion and extraction operators for both iostreams and PTLStore. You can see how similar they are. But the PTLStore functions use PTLStore's template functions to store intrinsic values, which work with memory images rather than character strings. Note that even though the Date class's data representation is a long integer to represent the number of days since 1/1/1, their persistent representation uses day, month, and year integer values. This approach permits someone to modify the Date class for a different data representation without having to make data conversions on all the PTL disk files that store dates.

Brothers.h is a header file that defines a predicate function to be used by a map container to compare string objects with a case-insensitive compare. Brothers.h also provides a typedef to give the shorthand alias, Brothers, to the PTLmap<std::string, Date, ltstr> class just to keep the code as brief as possible. Tstptl13.cpp and tstptl14.cpp use the Brothers alias to instantiate PTLmap containers keyed on string values and returning Date values.

Cross-Container Stores

The disk files for the containers share a common format. The file begins with an integer value that specifies how many objects are in the container followed by the object values in the sequence in which they are to be inserted into the memory-based container when it is reinstantiated. This common format means that you can construct a persistent vector container and use its filename to instantiate a list container, for example. There are two exceptions: First, maps and multimaps contain two objects of possibly different types for each container entry, so those files are not interchangeable with the other containers. Second, multiset and multimap containers are not compatible with set and map, unless you are willing to forsake the duplicate entries when you instantiate the nonmultivalue memory-based containers from the multivalue disk-based containers.

PersistentTemplateLibrary Namespace

PTL is enclosed in the PersistentTemplateLibrary namespace. Each program that uses PTL starts with a namespace alias statement to shorten the notation to ptl. Why didn't I simply use "ptl" as the namespace? Well, there is a better chance that no one else has used the longer name. Programmers use namespace aliases for brevity and can choose one that is not in the current scope. Library builders have no way of knowing what other libraries a programmer is likely to use. I asked two prominent committee members if they knew of any coordinated effort to build an international namespace registry. Apparently, no such effort is underway. P.J. Plauger likes "...Sun's trick with Java libraries: if you own the domain plauger.com, you have all rights to the package names COM.plauger.*," a natural construction for an Internet-based language. Bjarne Stroustrup says the reigning mechanism is "...if you call your namespace AT&T or Microsoft, you are likely to get into trouble," which puts management of namespace registration into the hands of the lawyers. Registry by saber rattling.

In any event, for now I'm using PersistentTemplateLibrary. If that name collides with yours, tell me, not your lawyer. I'll keep changing it until I come up with something that no one else would dream of using.

What's After PTL?

PTL has a limitation -- the real containers are memory based. The disk-based containers exist only to support persistence. You would not use PTL as a substitute for a large object database. My goal is to implement disk-based sequential and associative containers that conform to the STL interface and that offer some of the functionality of a relational database. I'll be looking at some of the low-level Parody code to see what I can reuse. To support bidirectional and random-access iterators, such containers must provide a way to retrieve an object by its position in the container, which means that the container storage must use fixed-length object images or that the container provides an efficient way to locate an object's disk image from some position-related index.

Memory-based objects are always fixed length as reported by the sizeof operator. Their members, however, can point to dynamically allocated memory of different sizes for different objects of the same class. The standard string class is an example. The sizeof operator will always return the same value irrespective of the length of the string, which is stored on the heap. An efficient object store would use a fixed-length memory image for two reasons -- first, so that retrievals based on position could simply seek to the position as a function of the image size; second, so that the program could rewrite an image over the disk position it previously held and delete objects by adding their fixed-length disk record space to a free space list.

One approach I am considering is the development of a persistent heap to contain the variable-length members of a fixed-length object. The persistent heap must be able to associate fixed object identifiers with allocated blocks so that the heap can be reorganized, concatenating associated storage and eliminating fragmented free space to optimize the heap's performance. Something similar to the MS-DOS FAT system without the fat.

Not all memory-based container data structures would be well-served by disk-based containers. Inserting and deleting objects in a large disk-based vector container would be an inefficient operation, for example, because of the shifting of objects. This project has to consider these factors and arrive at solutions that make sense, not merely strive for disk-based object stores that mimic the behavior of STL containers.

DDJ

Listing One

// ------ tstios01.cpp#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <iterator>
#include <cstdlib>


</p>
const char* fname = "tstiostr.txt";
int main()
{
    std::ofstream ofile(fname);
    std::ostream_iterator<long> iter(ofile, " ");
    // ------ vector of pseudorandom numbers
    std::vector<long> rno(100);
    for (int i = 0; i < 100; i++)
        rno[i] = rand();
    // ------- write the data
    std::copy(rno.begin(), rno.end(), iter);
    ofile.close();
    return 0;
}

Back to Article

Listing Two

// ------ tstios01.cpp#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <iterator>


</p>
const char* fname = "tstiostr.txt";
main()
{
    std::vector<long> rno(100);
    std::ifstream infile(fname);
    // ----- read vector of pseudorandom numbers
    std::istream_iterator<long> in(infile);
    std::istream_iterator<long> end;
    std::copy(in, end, rno.begin());
    // ------- display the input numbers
    for (int i = 0; i < 100; i++)
        std::cout << rno[i] << ' ';
    return 0;
}

Back to Article

Listing Three

#include <vector>#include <typeinfo>
// --- base persistent allocator class
template <class T>
class PAllocator : public std::allocator<T> {
protected:
    PAllocator(const char* filename);
    // ...
};
// --- specialized persistent allocator class
class IntAllocator : public PAllocator<int>  {
public:
    IntAllocator() : PAllocator<int>(typeid(*this).name())
        {  }
};
int main()
{
    std::vector<int, IntAllocator> pints;
    // ...
    return 0;
}

Back to Article


Copyright © 1998, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video