Persistent Objects in C++

By Al Stevens, December 01, 1992

Al discusses persistence, then presents a method for adding persistent objects to C++ programs by deriving applications classes from a persistent base class.

This article contains the following executables: PARODY.ZIP

Al is a contributing editor to DDJ and can be contacted through the DDJ offices or on CompuServe at 71101,1262.

This article describes a method for adding persistent objects to C++ programs by deriving applications classes from a persistent base class. Persistence is not part of the C++ language, so every application must deal with it in one way or another. The techniques described here represent a persistent-object database manager I've implemented as a class library. Because of space constraints, that library is only available electronically; see "Availability," page 5.

To understand persistent objects we must first agree on what an object is. In C++, an object is a declared instance of a data type. The object can be an instance of one of the C++ primitive data types, such as int or long, or it can be an instance of an abstract data type, defined as a C++ class. When you declare an object, it comes into scope, bare except for the memory it occupies and the constructor effects of any data-member initializers you provide. Its contents might change significantly while it is in scope because your program might do things to change it. When the object goes out of scope, its destructor runs, its memory is released, and the object ceases to exist. The object is not persistent because the changes are not in evidence the next time your program, or any other program, declares the same object.

A persistent object would retain its content from instance to instance. You could declare a persistent object, change it, and let it go out of scope. The next time you declared another instance of the same object, the object would reflect the changes you made. This concept implies that the object system uses an object database to store and retrieve objects and that such a database knows how to retrieve and save specific objects when they are constructed and destroyed.

Just as not every variable in the traditional program is in the database, not every object in an object-oriented program will be persistent. You would not want every class to include the persistent attribute. If every object were persistent, the object database would grow unnecessarily large as it gathered copies of dates, strings, complex numbers, and so on, that programs declared. Instead, the persistent attribute should apply only to those objects whose values the application needs to retain.

Nonpersistent C++

The C++ language does not support intrinsically persistent objects. In fact, it does not support the notion that an object has its own identity--that a subsequent declaration of a previously declared and destroyed object is, in fact, the same object at all. It is unlikely that C++ will ever have persistent objects as a part of the language. Type identification is, however, being considered by the ANSI X3J16 committee. Although type identification does not necessarily distinguish individual objects of a class, it can distinguish classes from one another, which is necessary to an object-database manager that must store and retrieve objects of different classes.

The Persistent-object Database

A number of commercial products already implement persistent C++ objects in one form or another. I've seen several available for MS-DOS computers and heard of others on other platforms. The objective of the approach described in this article is not to cover all the terrain with a package that competes with such commercial products, but to present a solution that supports most of a programmer's requirements without an overwhelming and complex interface.

You will observe that I do not identify the technique proposed in this article as an object-oriented database. No consensus exists on the definition of this term. I suspect that the formal definition will accrue by default to the format of whatever turns out to be the most successfully marketed object-database management system. Until then, it is prudent to avoid using the expression unless you are marketing such a system and hope to define the technology by your success. I will call this technique simply a persistent-object database.

Objectives of the Persistent-object Database

The persistent-object database must consist of a simple and intuitive class-library interface. There should be a minimum number of class member functions in the interface, and the interface should use the features of C++ in an intuitive manner. The objectives of the persistent object interface are as follows:

Associative access. Objects must be identified by key data member values. When the persistent-object database saves an object, it maintains an index that associates key values with their object records in the database. When a subsequent instance of an object of the same class specifies the same key value, the persistent-object database will retrieve the associated object. The database will also maintain indexes of other data members so that the program can retrieve objects on the basis of data values other than the primary key.

Associative access is preferred over navigational access, which assigns object identifications--identifications usually related to the object's address in the database. The application must remember these addresses to retrieve the objects later. Objects related to other objects of other classes would similarly use addresses to point to their relatives. Some existing systems work this way. Navigational access is one of the characteristics that the relational database model strives to eliminate because of the inherent instability in its use.

Maintenance of objects. The persistent-object database will provide methods for the application to change and delete objects that exist in the database.

Object integrity. The persistent-object database will indicate when a declared object does not already exist in the database. It will also refuse to add an object to the database if another object exists with the same key value.

Class relationships. The persistent-object database will maintain relationships between classes on the basis of key values, and it will maintain the integrity of those relationships. If a class includes another class's primary key as a secondary key, the classes are related. The persistent-object database will not delete an object if other objects that are related to it remain, and it will not attempt to relate an object to a nonexistent object.
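The associative-access and object-integrity objectives above can be illustrated with ordinary standard-library containers. What follows is a minimal sketch, not the class library described in this article: EmployeeRecord, EmployeeIndex, and their member names are hypothetical, and std::map and std::multimap stand in for the database indexes. It assumes a C++11 compiler.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type; the field names are assumptions for illustration.
struct EmployeeRecord {
    int emplno;        // primary key: unique
    int deptno;        // secondary key: may repeat
    std::string name;
};

class EmployeeIndex {
    std::map<int, EmployeeRecord> by_emplno;  // primary index
    std::multimap<int, int> by_deptno;        // secondary index: deptno -> emplno
public:
    // Refuses to add a record whose primary key value is already present.
    bool Add(const EmployeeRecord& r) {
        if (!by_emplno.insert(std::make_pair(r.emplno, r)).second)
            return false;
        by_deptno.insert(std::make_pair(r.deptno, r.emplno));
        return true;
    }
    // Associative retrieval by primary key; null when no such object exists.
    const EmployeeRecord* Find(int emplno) const {
        auto it = by_emplno.find(emplno);
        return it == by_emplno.end() ? nullptr : &it->second;
    }
    // Retrieval by secondary key: employee numbers of everyone in a department.
    std::vector<int> InDepartment(int deptno) const {
        std::vector<int> out;
        auto range = by_deptno.equal_range(deptno);
        for (auto it = range.first; it != range.second; ++it)
            out.push_back(it->second);
        return out;
    }
};

// Usage: two employees share department 123; a duplicate primary key is rejected.
bool IndexDemo() {
    EmployeeIndex ix;
    EmployeeRecord a = {1, 123, "Ada"};
    EmployeeRecord b = {2, 123, "Bo"};
    EmployeeRecord dup = {1, 456, "Cy"};
    bool added = ix.Add(a) && ix.Add(b);
    bool rejected = !ix.Add(dup);
    return added && rejected && ix.Find(2) != nullptr &&
           ix.Find(9) == nullptr && ix.InDepartment(123).size() == 2;
}
```

The program never handles a record address; it asks for an object by key value, which is the property that separates associative access from navigational access.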

The DBMS Connection

The aforementioned objectives sound like the same ones that have been around for years and are supported by traditional database management systems (DBMSs). We're breaking no new ground here, it seems. The requirements to store, maintain, and retrieve entities of data have not been overturned simply because we have newer ways to express programs. The object-oriented paradigm does not invalidate years of experience with database-management technology.

To understand how an object database can leverage that experience and still exploit the expressiveness of object-oriented design, we should examine the problems of implementing persistent objects and some of the solutions.

Persistent C++

Inasmuch as C++ does not support intrinsic persistent objects, what features of the language might contribute to the solution? First, we have stated that we need to specify which classes are persistent. The C++ inheritance mechanism takes care of that. If you derive a class from a persistent base class, the derived class is persistent. Second, the persistent object needs to retrieve a copy of itself when a program declares it. The C++ constructor mechanism is the best place to handle that. Finally, the persistent object needs to write itself back to the database when the object goes out of scope. The C++ destructor mechanism will do that. So, although C++ does not directly support persistent objects from within the language itself, it does provide the primitive language constructs with which we can add the feature.

Persistent C

Object persistence in C was a relatively simple matter. You defined a structure, put some data in it, and wrote it to a disk file. A generic file input/output function would read and write the structure by using the sizeof operator. This method is self-adjusting when you modify the program to change the contents of the structure. If you needed more features, you used a DBMS, but the solution still centered around the basic flat record structure, and so the C solution looks like Example 1.

Example 1: A C persistent object.

  struct Employee       {
       /* ... */
  };
  struct Employee empl;
  /* dp is a FILE * opened for binary reading */
  fread (&empl, sizeof (struct Employee), 1, dp);

The C solution is not perfect, however. If the structure has a pointer in it, the generic file input/output function becomes less generic. It has to know what the pointer points to, the length of that object, and, if the pointer points to the first member of an array, the size and number of the members. But, except for these restrictions, persistence in C is a straightforward procedure.

Persistent Wrinkles in C++

The C++ solution introduces a new set of problems, primarily because of the many new kinds of things that you can put into a class. Suppose you wanted to design a Persistent base class that would automatically manage all aspects of persistence for any derived class. Now consider the derived class in Example 2.

Example 2: A persistent-object class.

  class Employee : public Persistent {
       EmployeeNo emplno;
       String *name;
       Department& dept;
       int promotion_ctr;
       Date *promotions;
  public:
       virtual int SalaryReviewPeriod();
  };

The Employee class in Example 2 has several data members:

An instance of another class, known to the application.
A pointer to an object of a class taken from a class library.
A reference to an object of one of the application's classes.
A count of the number of members in an array.
A pointer to the first member of an array.

To complicate matters, the class has a virtual function, which means that a vtbl (virtual function table) pointer is somewhere among the other data members. Furthermore, you do not know if the embedded objects have vtbls.

How would you design a Persistent base class that would know how to find all the pertinent data members and write them to disk? How would it know how to construct those data members properly to read them back in? How would it know the size and location of the data members? How could it possibly understand the dependent relationship between the promotion_ctr member and the promotions member? How would it know how to get around the one or more vtbl pointers in the class?

The answer is, of course, that you couldn't design such a Persistent base class.
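Two of the obstacles can be seen in a few lines. The sketch below is only an illustration: NaivePersistent and SizeSeenByBase are hypothetical names, and the Employee class is cut down from Example 2. It shows that a base class asking for its own size sees only the base part of the object, because sizeof is evaluated against the static type at compile time.

```cpp
#include <cstddef>
#include <string>

// Hypothetical base class that tries to discover the object's size by itself.
class NaivePersistent {
public:
    virtual ~NaivePersistent() {}
    // sizeof is resolved at compile time against the static type of *this,
    // so this always reports the size of NaivePersistent, never of the
    // derived class.
    std::size_t SizeSeenByBase() const { return sizeof(*this); }
};

class Employee : public NaivePersistent {
public:
    int emplno;
    std::string* name;    // copying these bytes would store an address, not the text
    int promotion_ctr;
    Employee() : emplno(0), name(nullptr), promotion_ctr(0) {}
};

// True when the base class underestimates the size of the derived object.
bool BaseUnderestimates() {
    Employee e;
    return e.SizeSeenByBase() < sizeof(Employee);
}
```

Even if the base class were told the correct size, a raw byte copy would write the vtbl pointer and the value of name (an address) to disk, neither of which means anything to the next program that reads the record.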

Some Solutions

How can you solve these problems? There are a number of ways, the first and most frequently used of which is the least desirable.

Custom file input/output methods. You can forget the idea of a Persistent base class and write custom file storage and retrieval methods for every class in your design. This approach betrays everything we have learned about database management in the last thirty years. The apparent contrary nature of the C++ class definition is the result of its power. It is a strength of C++ that you can design a class that includes all things difficult to pin down in a database manager. But that doesn't mean we should abandon the effort--only that it is going to take some thought and work.

Extend the C++ language. You can propose to the X3J16 committee that they add the persistent attribute to the language and that they figure out how to make it work. You can. I won't.

Write a preprocessor. This is one way to extend C++ without getting involved with X3J16. You can write a preprocessor program that translates your extended C++ language into C++ code. The preprocessor would need to search all the header files to ferret out the formats of the embedded classes. This approach will not resolve problems such as the relationship between the counter and the array pointer mentioned above, but it is a step closer. One commercial product, POET from BKS Software (Cambridge, Massachusetts), uses a preprocessor to translate class-definition extensions into C++.

Limit the scope of a persistent class. You can get around all the aforementioned problems by laying down some rules. You can specify that to derive from the Persistent class, an object may not use any of the C++ features that cause those problems. If your application can get by without the features, then such an approach might work. Using sizeof in the base class would not work, though. The sizeof operator is not polymorphic. Nonetheless, if it suits you to limit a persistent-object class to that which is easy to implement, you might as well use one of the relational DBMSs already available. Read on.

Use a relational DBMS. You can build an object-database manager by putting a C++ wrapper around an existing relational (or other) DBMS. That is a viable, and sometimes appropriate, approach. At least one commercial object-database manager does just that. The CodeBase product from Sequiter Software (Edmonton, Alberta) is a C function library that uses the dBase database formats. Their C++ product consists of C++ wrapper classes around their C function library.

When should you use a relational database and when should you use an object database? The answer lies in an analysis of the mutually exclusive strengths and weaknesses of each and the requirements of your application. The relational data model has some advantages that the object data model does not support, namely:

The relational schema is stored in the database catalog.
General-purpose query programs can use the catalog.
The SELECT, PROJECT, and JOIN operators can build new database views at run time.
The database is cross-compatible with other applications that use the same DBMS.

Conversely, the object data model supports data representations that the rules for the relational data model absolutely forbid:

Variable-length data members to support such applications as imagery, multimedia, geographic data, and weather.

Abstract data types. The typical relational DBMS supports a small set of primitive data types.

Arrays.

Encapsulation of data formats with the methods that define behavior.

Polymorphism.

Clearly, the strength of the object data model is that it supports an object-oriented design. A designer must decide which way to go by weighing the benefits of both approaches. If you decide that you need those relational features not available in the object model, then read no further. You will find, however, that the object model can emulate much of the behavior of the relational model.

The derived class cooperates in its own persistence. This is the approach presented in this article. Everything that the base class cannot know about the data is known to the derived class. If the derived class provides a few required functions that the base class calls, and if the derived class calls specified base-class functions at specified times, the problems associated with building a Persistent base class evaporate. The only problem that remains is to design an interface to the persistent-object database that is simple enough not to clutter up the user's program.

The Persistent-object Database: Cooperation Among Classes

To cooperate with the Persistent base class, a derived persistent-object class will provide some of the methods. To be sure that it does, the Persistent class names them as pure virtual functions. You cannot declare a persistent object unless those functions exist in the derived-class definition. The Persistent class will call the functions when it needs them.

One such function in the derived class provides the class-type identifier. Some day this function will be unnecessary. Until then, we'll have to provide it.

The derived class's constructor calls LoadObject in the base class after all the construction is done. That tells the base class to use the key data member value to position the database at the object's record. Then the LoadObject function calls the derived class's Read function, which must be provided. A derived class will know which data members need to be read. The derived class does not do the physical reading and writing; the Persistent base class does that. The derived class calls the base class's ReadObject function, passing the address of each data member and its size.

The derived class's destructor calls SaveObject in the base class before destruction of the object begins. The base class positions the database at the object and calls the derived class's Write function, which writes the data members back to the database through the WriteObject function in the base class.
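The cooperation described above can be sketched as follows. The function names LoadObject, SaveObject, Read, Write, ReadObject, and WriteObject come from the article; everything else is an assumption made for illustration: the Store typedef (a std::map standing in for the object file), the constructor signatures, the integer key, and the fact that this simplified SaveObject always rewrites the object instead of consulting state flags.

```cpp
#include <cstddef>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Stand-in for the object file: one byte buffer per key value (assumption).
typedef std::map<int, std::vector<char> > Store;

class Persistent {
    Store& store;
    int key;
    std::vector<char> record;   // bytes of the current object
    std::size_t pos;            // read position within record
    bool exists;
protected:
    Persistent(Store& s, int k) : store(s), key(k), pos(0), exists(false) {}
    virtual ~Persistent() {}
    virtual void Read() = 0;    // supplied by the derived class
    virtual void Write() = 0;   // supplied by the derived class
    void ReadObject(void* buffer, std::size_t length) {
        if (length == 0 || pos + length > record.size()) return;
        std::memcpy(buffer, &record[pos], length);
        pos += length;
    }
    void WriteObject(const void* buffer, std::size_t length) {
        const char* p = static_cast<const char*>(buffer);
        record.insert(record.end(), p, p + length);
    }
    // Called by the derived constructor after construction is done.
    void LoadObject() {
        Store::iterator it = store.find(key);
        exists = it != store.end();
        if (exists) { record = it->second; pos = 0; Read(); }
    }
    // Called by the derived destructor before destruction begins.
    void SaveObject() {
        record.clear();
        Write();
        store[key] = record;
    }
public:
    bool ObjectExists() const { return exists; }
};

class Employee : public Persistent {
    int emplno;
    std::string name;           // variable length: stored as a count plus characters
    void Read() {
        ReadObject(&emplno, sizeof emplno);
        std::size_t n = 0;
        ReadObject(&n, sizeof n);
        name.resize(n);
        ReadObject(&name[0], n);
    }
    void Write() {
        WriteObject(&emplno, sizeof emplno);
        std::size_t n = name.size();
        WriteObject(&n, sizeof n);
        WriteObject(name.data(), n);
    }
public:
    Employee(Store& s, int no) : Persistent(s, no), emplno(no) { LoadObject(); }
    ~Employee() { SaveObject(); }
    const std::string& Name() const { return name; }
    void SetName(const std::string& n) { name = n; }
};

// Usage: the first instance saves itself; the second retrieves the saved value.
std::string RoundTrip() {
    Store store;
    { Employee e(store, 123); e.SetName("Ada"); }   // destructor saves
    Employee again(store, 123);                     // constructor loads
    return again.Name();
}
```

The calls to LoadObject and SaveObject are made from the derived class's constructor and destructor bodies for a reason: while those bodies run, the object is still an Employee, so the virtual calls to Read and Write reach the derived versions. The same calls made from the Persistent constructor or destructor would not.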

The Persistent-object Class Interface

The Persistent class provides an interface to the derived class's user, the program that declares the persistent object. The ObjectExists function tells if the specified object was found in the database. If not, the AddObject function tells the base class to add the object to the database when the destructor calls the SaveObject function. The DeleteObject function tells the base class that the user wants to delete the object from the database. The ChangeObject function tells the base class that the user changed the object in some way and that it should rewrite the object when the destructor calls the SaveObject function.
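The deferred nature of these requests can be sketched with a single self-contained class. The flag names match Example 4, but the Record class, the Store typedef, and the exact conditions under which each function returns false are assumptions for illustration, not the library's documented behavior.

```cpp
#include <map>
#include <string>

typedef std::map<int, std::string> Store;   // key value -> stored object (assumption)

// Hypothetical class that folds base and derived roles together for brevity.
class Record {
    Store& store;
    int key;
    bool changed, deleted, newobject, exists;
public:
    std::string data;
    Record(Store& s, int k)
        : store(s), key(k), changed(false), deleted(false),
          newobject(false), exists(false) {
        Store::iterator it = store.find(key);
        exists = it != store.end();
        if (exists) data = it->second;
    }
    // The requests only set flags; the destructor acts on them.
    ~Record() {
        if (deleted) store.erase(key);
        else if (newobject || changed) store[key] = data;
    }
    bool ObjectExists() const { return exists; }
    bool AddObject()    { if (exists)  return false; newobject = true; return true; }
    bool ChangeObject() { if (!exists) return false; changed = true;   return true; }
    bool DeleteObject() { if (!exists) return false; deleted = true;   return true; }
};

// Usage: add, modify without asking for a rewrite, change, then delete.
std::string FlagsDemo() {
    Store store;
    { Record r(store, 1); r.data = "first";   r.AddObject(); }     // added on destruction
    { Record r(store, 1); r.data = "ignored"; }                    // no ChangeObject: discarded
    { Record r(store, 1); r.data = "second";  r.ChangeObject(); }  // rewritten on destruction
    std::string kept = store[1];
    { Record r(store, 1); r.DeleteObject(); }                      // removed on destruction
    return kept + (store.count(1) ? " still present" : " then deleted");
}
```

FlagsDemo returns "second then deleted": the modification made without a call to ChangeObject never reaches the store.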

The application can navigate the object database by using the FirstObject, LastObject, NextObject, and PreviousObject methods. Calling these functions retrieves the relative object based on the key sequence of the key that the program used to construct the object.
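Navigation in key sequence amounts to walking an ordered index. In the hypothetical sketch below, a std::map plays the part of the index and a Cursor class carries the four function names from the article. The bool return values, the Current accessor, and the Cursor class itself are assumptions; in Example 4 the navigation functions are members of Persistent and return void.

```cpp
#include <map>
#include <string>

// Hypothetical cursor over an ordered index (key value -> object).
class Cursor {
    typedef std::map<int, std::string> Index;
    const Index& index;
    Index::const_iterator pos;
public:
    explicit Cursor(const Index& ix) : index(ix), pos(ix.end()) {}
    bool FirstObject() { pos = index.begin(); return pos != index.end(); }
    bool LastObject() {
        if (index.empty()) return false;
        pos = index.end();
        --pos;
        return true;
    }
    bool NextObject() {
        if (pos == index.end()) return false;
        ++pos;
        return pos != index.end();
    }
    bool PreviousObject() {
        if (pos == index.begin()) return false;
        --pos;
        return true;
    }
    // Precondition: the most recent navigation call returned true.
    const std::string& Current() const { return pos->second; }
};

// Usage: objects come back in key order regardless of insertion order.
std::string NamesInKeyOrder() {
    std::map<int, std::string> index;
    index[30] = "Cy";
    index[10] = "Ada";
    index[20] = "Bo";
    Cursor c(index);
    std::string out;
    for (bool ok = c.FirstObject(); ok; ok = c.NextObject()) {
        if (!out.empty()) out += " ";
        out += c.Current();
    }
    return out;
}
```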

Class Definition = Schema

Traditional DBMS languages include a data-definition language (DDL) that defines the format of records in the database files and the relationships of files to one another. The DDL is said to define the database schema. The persistent-object database's DDL is C++ itself. You design a database by designing classes that define the file formats and their key data members. Example 3 is an illustration of such a design.

Example 3: A database schema.

  // -------- key department number
  class DeptNo : public Key      {
       int deptno;
       //  ...
  };
  // -------- department class
  class Department : public Persistent     {
       DeptNo deptno;          // primary key
       String *name;
       //  ...
  };
  // -------- key employee number
  class EmployeeNo : public Key {
       int emplno;
       // ...
  };
  // -------- Employee class
  class Employee : public Persistent {
       EmployeeNo emplno;       // primary key
       DeptNo deptno;           // secondary key
       String *name;
       Date date_hired;
       Currency salary;
  public:
       // ...
  };

Two of the classes in Example 3 are derived from the Key class. This is how you specify a key data member. Observe that the Employee class has two key members, the EmployeeNo object and the DeptNo object. The first derived Key object is the class's primary key. All subsequent Key objects are secondary keys. There can be only one object of a particular class in the object database with a given primary key value because it's the primary key value that identifies the object. Multiple objects of the same class can share secondary key values, however. The Department class in Example 3 has a DeptNo member as its primary key, so there may be only one Department object with a department number of 123, for example. The Employee class has a DeptNo member as a secondary key. Several employees can be assigned to department number 123. This relationship is an implied one, based on the presence of those Key data members. Because the design implies the relationship, the persistent-object database maintains it. You will not be allowed to write an Employee object with a nonnull DeptNo key value unless there is a corresponding Department object with the same key value. You will not be allowed to delete a Department object if any Employee objects include the matching DeptNo key value.
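The two integrity rules at the end of the preceding paragraph can be expressed in a few lines. The Database class below is a hypothetical in-memory stand-in, not the article's library; the function names are invented for the sketch, and the use of 0 as the null DeptNo value is an assumption. It assumes a C++11 compiler.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical in-memory stand-in for the object database.
class Database {
    std::map<int, std::string> departments;  // DeptNo -> department name
    std::map<int, int> employees;            // EmployeeNo -> DeptNo (0 means null)
public:
    bool AddDepartment(int deptno, const std::string& name) {
        return departments.insert(std::make_pair(deptno, name)).second;
    }
    // Refuses an employee whose nonnull DeptNo matches no Department.
    bool AddEmployee(int emplno, int deptno) {
        if (deptno != 0 && departments.count(deptno) == 0)
            return false;
        return employees.insert(std::make_pair(emplno, deptno)).second;
    }
    bool DeleteEmployee(int emplno) { return employees.erase(emplno) == 1; }
    // Refuses to delete a Department that any Employee still refers to.
    bool DeleteDepartment(int deptno) {
        for (auto it = employees.begin(); it != employees.end(); ++it)
            if (it->second == deptno) return false;
        return departments.erase(deptno) == 1;
    }
};

// Usage: every step below is expected to succeed or fail as commented.
bool IntegrityDemo() {
    Database db;
    bool ok = db.AddDepartment(123, "Research");
    ok = ok && !db.AddEmployee(1, 999);    // refused: no department 999
    ok = ok && db.AddEmployee(1, 123);
    ok = ok && db.AddEmployee(2, 123);     // secondary key values may repeat
    ok = ok && db.AddEmployee(3, 0);       // a null DeptNo is allowed
    ok = ok && !db.DeleteDepartment(123);  // refused: employees 1 and 2 remain
    ok = ok && db.DeleteEmployee(1) && db.DeleteEmployee(2);
    ok = ok && db.DeleteDepartment(123);   // now permitted
    return ok;
}
```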

The Persistent Class

Example 4 is a listing of a simplified Persistent class; Example 5 is a simplified Key class. The classes in Example 3 derive from these two classes. The actual implementation is more complex than these examples, which are pared down to facilitate this discussion and illustrate the concept.

Example 4: The Persistent class.

  class Persistent     {
       // --- object state flags
       Bool changed, deleted, newobject, exists;
       // --- allows key constructors to associate with object
       static Persistent *thispers;
       // --- key indexes
       LinkedListHead Keys;
  protected:
       // --- interface for derived class
       Bool LoadObject ();
       Bool SaveObject ();
       // --- provided by derived class
       virtual int ClassID () = 0;
       virtual void Write () = 0;
       virtual void Read () = 0;
       void ReadObject (void *buffer, int length);
       void WriteObject (void *buffer, int length);
  public:
       Persistent();                      // constructor
       virtual ~Persistent();             // destructor
       // --- interface for user of derived class
       Bool AddObject();
       Bool DeleteObject();
       Bool ChangeObject();
       Bool ObjectExists() { return exists; }
       void AddKey (Key *key);
       void FirstObject();
       void LastObject();
       void NextObject();
       void PreviousObject();
  };

Example 5: The Key class.

  class Key : public LinkedList {
       int classid;
       int keyno;
  public:
       Key ();
       virtual ~Key ();
       virtual Bool operator> (Key& key) = 0;
       virtual Bool operator==(Key& key) = 0;
       virtual void Write (fstream& bfile) = 0;
       virtual void Read (fstream& bfile) = 0;
  };

The order in which constructors and destructors run is important to the way the persistent-object database works. When a program declares an object of type Employee, the constructors execute in this order:

Persistent::Persistent
emplno.Key::Key
emplno.EmployeeNo::EmployeeNo
deptno.Key::Key
deptno.DeptNo::DeptNo
Employee::Employee

This sequence supports the persistent-object database. The Persistent constructor stores the object's address (this) in the thispers pointer, which is a static data member that the constructors of the objects of the Key class use to associate themselves with the Persistent object. They call the AddKey function through that pointer, which adds the key object to a linked list within the Persistent object. The constructor for the Employee object completes its construction, which presumably includes putting the initialized employee number into the EmployeeNo key member. Then the constructor calls the Persistent class's LoadObject function to find the object in the database.

When the object goes out of scope, the destructors execute in the reverse order. The Employee destructor, which executes first, calls the Persistent class's SaveObject function before it does any destruction. This action either saves the object if it is to be changed or added or deletes it if indicated.
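The construction order and the thispers registration can be observed directly. This is a minimal sketch under stated assumptions: the trace vector, KeyCount, and the two demonstration functions are hypothetical, a std::vector replaces the linked list, and thispers is made public here only to keep the sketch short.

```cpp
#include <cstddef>
#include <string>
#include <vector>

std::vector<std::string> trace;   // records the order of constructor calls

class Key;

class Persistent {
    std::vector<Key*> keys;       // stands in for the linked list of keys
public:
    static Persistent* thispers;  // public here for brevity (assumption)
    Persistent() { thispers = this; trace.push_back("Persistent"); }
    virtual ~Persistent() {}
    void AddKey(Key* key) { keys.push_back(key); }
    std::size_t KeyCount() const { return keys.size(); }
};
Persistent* Persistent::thispers = nullptr;

class Key {
public:
    // Each key registers itself with the Persistent object under construction.
    Key() {
        trace.push_back("Key");
        if (Persistent::thispers) Persistent::thispers->AddKey(this);
    }
    virtual ~Key() {}
};

class EmployeeNo : public Key {
public:
    int emplno;
    EmployeeNo() : emplno(0) { trace.push_back("EmployeeNo"); }
};

class DeptNo : public Key {
public:
    int deptno;
    DeptNo() : deptno(0) { trace.push_back("DeptNo"); }
};

class Employee : public Persistent {
    EmployeeNo emplno;            // primary key
    DeptNo deptno;                // secondary key
public:
    Employee() { trace.push_back("Employee"); }
};

// Returns the constructor names in the order they ran.
std::string ConstructionOrder() {
    trace.clear();
    Employee e;
    std::string s;
    for (std::size_t i = 0; i < trace.size(); ++i) {
        if (i) s += " ";
        s += trace[i];
    }
    return s;
}

// Returns the number of keys that registered themselves with one Employee.
std::size_t KeysRegistered() {
    Employee e;
    return e.KeyCount();
}
```

ConstructionOrder returns "Persistent Key EmployeeNo Key DeptNo Employee", the sequence listed above, and KeysRegistered returns 2. The scheme depends on the base class being constructed before the members, which the language guarantees.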

The Key Class

Primary and secondary key classes derived from the Key class include functions to support the index mechanism. Besides the index-key values themselves, the derived classes supply overloaded relational operators so the index process can compare Key objects to one another and Read and Write functions to read and write the object's data values in the index file. Different keys may have different lengths, but the length of any particular key for a Persistent class must be fixed.
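A derived key class might look like the following sketch. It departs from Example 5 in ways that are assumptions made for illustration: the Key base class is reduced to its four virtual functions, the functions are const-qualified, and Read and Write take std::istream and std::ostream references instead of fstream so that an in-memory stream can stand in for the index file. The comparison operators assume they are only ever given a key of the same class.

```cpp
#include <iostream>
#include <sstream>

// Simplified Key base class (assumption: no LinkedList base, no classid/keyno).
class Key {
public:
    virtual ~Key() {}
    virtual bool operator>(const Key& key) const = 0;
    virtual bool operator==(const Key& key) const = 0;
    virtual void Write(std::ostream& out) const = 0;
    virtual void Read(std::istream& in) = 0;
};

class EmployeeNo : public Key {
    int emplno;                   // fixed-length key value
public:
    explicit EmployeeNo(int n = 0) : emplno(n) {}
    // Assumes the argument is an EmployeeNo, as it is within one index.
    bool operator>(const Key& key) const {
        return emplno > static_cast<const EmployeeNo&>(key).emplno;
    }
    bool operator==(const Key& key) const {
        return emplno == static_cast<const EmployeeNo&>(key).emplno;
    }
    void Write(std::ostream& out) const {
        out.write(reinterpret_cast<const char*>(&emplno), sizeof emplno);
    }
    void Read(std::istream& in) {
        in.read(reinterpret_cast<char*>(&emplno), sizeof emplno);
    }
    int Value() const { return emplno; }
};

// Usage: write a key to the stream, read it back, and compare.
bool KeyRoundTrips() {
    std::stringstream file(std::ios::in | std::ios::out | std::ios::binary);
    EmployeeNo a(42), b, larger(99);
    a.Write(file);
    b.Read(file);
    return b == a && !(a > b) && larger > a;
}
```

Because every EmployeeNo writes exactly sizeof(int) bytes, the key length is fixed for the class, which is the property the index requires.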

Physical Database Format

The Persistent class manages the persistent-object database. Besides the functional objectives described earlier, the persistent-object database must support:

Variable-length objects, even within the same class.
Fixed-length (per class) key indexes.
Automatic garbage collection when the program deletes an object or changes its size.
Two physical files per database: object file and indexes.

The object and index files use a common file mechanism that stores data in a linked list of fixed-length nodes. An object uses as many nodes as it needs to hold all its data. An index fills a node with keys. When the application deletes an object or when changes to an index release an index node, the node or nodes are added to a linked list of deleted nodes. The next process that needs a node takes it from this list.
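The node mechanism can be sketched in memory. In the hypothetical NodeFile class below, a std::vector stands in for the disk file, node addresses are vector indexes, and the eight-byte node payload is chosen only to keep the example small. None of these names or sizes come from the article's library.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

const std::size_t kNodeData = 8;  // payload bytes per node (tiny, for illustration)
const int kNone = -1;             // marks the end of a chain

struct Node {
    int next;                     // next node of the same object, or kNone
    std::size_t used;             // payload bytes in use
    char data[kNodeData];
};

class NodeFile {
    std::vector<Node> nodes;      // stands in for the disk file
    int free_head;                // head of the list of deleted nodes
    // Take a node from the deleted list if there is one, else extend the file.
    int Allocate() {
        if (free_head != kNone) {
            int n = free_head;
            free_head = nodes[n].next;
            return n;
        }
        nodes.push_back(Node());
        return static_cast<int>(nodes.size()) - 1;
    }
public:
    NodeFile() : free_head(kNone) {}
    // Store the bytes in as many nodes as needed; return the first node's address.
    int Store(const std::string& bytes) {
        int first = kNone, prev = kNone;
        std::size_t pos = 0;
        do {
            int n = Allocate();
            std::size_t len = std::min(kNodeData, bytes.size() - pos);
            nodes[n].next = kNone;
            nodes[n].used = len;
            bytes.copy(nodes[n].data, len, pos);
            if (prev == kNone) first = n; else nodes[prev].next = n;
            prev = n;
            pos += len;
        } while (pos < bytes.size());
        return first;
    }
    std::string Load(int first) const {
        std::string out;
        for (int n = first; n != kNone; n = nodes[n].next)
            out.append(nodes[n].data, nodes[n].used);
        return out;
    }
    // Move every node of the object onto the list of deleted nodes.
    void Delete(int first) {
        while (first != kNone) {
            int next = nodes[first].next;
            nodes[first].next = free_head;
            free_head = first;
            first = next;
        }
    }
    std::size_t NodeCount() const { return nodes.size(); }
};

// Usage: a second object reuses the nodes released by the first.
bool ReusesDeletedNodes() {
    NodeFile file;
    int a = file.Store("twenty bytes of data");  // 20 bytes occupy 3 nodes
    std::size_t before = file.NodeCount();
    file.Delete(a);
    int b = file.Store("reuses freed nodes");    // 18 bytes occupy 3 nodes
    return file.NodeCount() == before && file.Load(b) == "reuses freed nodes";
}
```

The file does not grow when the second object is stored, which is the garbage-collection behavior listed among the requirements.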

The persistent-object database uses the B-tree algorithm to implement the object indexes. Each entry in the B-tree includes the class identification, the relative key number within the class, the key value, and the node address where the object is stored.

Copies of Objects

The Persistent class in Example 4 has no copy constructor or overloaded assignment operator. Neither should the classes you derive from it. Making copies of a persistent object has its perils. The system would not know which copy to save. Therefore, it must ensure that only one copy of any particular persistent object is in memory at a time. A C++ program copies objects in several ways, two of which are the copy constructor and assignment. In a third way, the program can simply declare another instance of the same object. Rather than allow these processes to make copies of an object, the Persistent class must take measures to prevent it.

Although Example 4 does not show it, the Persistent class uses the handle/copy idiom to implement reference counting. The using program instantiates the object by declaring an instance of the handle class. The handle class contains a pointer to a copy class, which is the body of the object and which contains all the members. Only one instance of the copy class exists for any given object, regardless of the number of copies. The copy class also contains a count of the current number of copies. When the first copy of the class is constructed, the constructor uses the new operator to create a new copy object. Each new handle object gets the address of the copy object in its pointer, and the counter in the copy class gets incremented. When a handle class's destructor runs, it decrements the counter. When the counter is 0, the handle class deletes the copy object. The handle class provides the copy constructor and overloaded assignment operator. It also monitors for the presence of other instances of the object. Thanks to Jim Coplien for documenting this and many other good C++ techniques in his book, Advanced C++ Programming Styles and Idioms (Addison-Wesley, 1992). As mentioned earlier, a persistent-object database manager I've implemented is available electronically. For a complete discussion of the class library, refer to my book, C++ Database Development (MIS Press, 1992).
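The reference-counting arrangement described in this section can be sketched as follows. EmployeeHandle and EmployeeBody are hypothetical names (the body corresponds to what the text calls the copy class), and the sketch omits the persistence machinery and the check for other instances of the same object.

```cpp
#include <string>

// The body holds the members and the count of handles that refer to it.
class EmployeeBody {
    friend class EmployeeHandle;
    int refs;
    std::string name;
    explicit EmployeeBody(const std::string& n) : refs(1), name(n) {}
};

// The program declares handles; all copies of a handle share one body.
class EmployeeHandle {
    EmployeeBody* body;
    void Release() { if (--body->refs == 0) delete body; }
public:
    explicit EmployeeHandle(const std::string& name)
        : body(new EmployeeBody(name)) {}
    EmployeeHandle(const EmployeeHandle& other) : body(other.body) {
        ++body->refs;
    }
    EmployeeHandle& operator=(const EmployeeHandle& other) {
        ++other.body->refs;       // increment first so self-assignment is safe
        Release();
        body = other.body;
        return *this;
    }
    ~EmployeeHandle() { Release(); }
    void SetName(const std::string& n) { body->name = n; }
    const std::string& Name() const { return body->name; }
    int Copies() const { return body->refs; }
};

// Usage: a change made through one handle is visible through the other.
bool CopiesShareOneBody() {
    EmployeeHandle a("Ada");
    EmployeeHandle b(a);          // copy constructor: same body, count becomes 2
    b.SetName("Grace");
    return a.Name() == "Grace" && a.Copies() == 2;
}
```

Because both handles refer to the same body, there is never a question of which copy to save: only one set of data members exists.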



