Object-Oriented Programming and Database Design

MAR88: OBJECT-ORIENTED PROGRAMMING AND DATABASE DESIGN

Dr. Jacob Stein has been a senior software engineer at Servio Logic Corp. of Beaverton, OR since 1984. He is also an adjunct assistant professor at the Oregon Graduate Center.

Is structured programming still relevant? In the early 1970's, E.W. Dijkstra introduced us to structured programming and Nicklaus Wirth introduced us to stepwise refinement. These programming methodologies were intended to foster a systematic and scientific approach to software development. That they were an improvement over previous methodologies, or the lack thereof, is not in doubt. What is in doubt is whether they are still germane to many applications being developed today. Structured programming appears to fall apart when applications exceed 100,000 or so lines of code. Programmer productivity falls to meet expectations and software maintenance becomes a nightmare. What we are left with is programmer fear: fear of side effects when fixing even minor bugs; fear of improving what already works, even if it does so poorly.

It may be that the complexity of the application itself is responsible for this fallure. On the other hand, our understanding of the application domain may be the limiting factor. Although the underlying models for banking and other conventional applications have been understood for millenia, the underlying models for new application areas such as computer-aided software engineering (CASE) and computer-integrated manufacturing (CIM) are as much a product of the process of application development as they are inputs to the process. Structured programming assumes that the application developer stands a reasonable chance of getting it right the first time, and that once the application has been successfully coded, the application domain will not change. For many applications being addressed today, neither of these is likely.

Object-oriented programming provides a paradigm suitable for these new application areas. Instead of looking at programs as algorithms and data structures, it looks at applications as the behavior of objects and the communication between objects. This communication takes place in the form of messages to which objects respond with suitable behavior. In addition, object-oriented programming addresses the vital need to validate abstractions before committing many man-years to a detalled implementation and the fact that application domains change with time. Diedrich and Milton refer to the style of development encouraged by object-oriented languages as "fearless programming."<fn1>

By organizing objects into classes, which encapsulate object behavior in a manner similar to abstract data types, and organizing classes into an inheritance hierarchy, object-oriented programming may finally provide the long sought alter sharing and reusability of code. The code that gives a data structure its meaning, its semantics, is no longer spread amongst disparate application programs. Perhaps one of the most frustrating aspects of trying to reuse code in a new application is that the old code almost, but not quite, fits the needs of the new application. In an object-oriented language, a new class can be created from an existing class by specifying only the differences between the existing and new class; all else is inherited by the new class from the existing class. PPI's belief in the encapsulation provided by object-oriented systems and lead them to trademark the term Software-IC.

Object-oriented languages tend to be memory-based and are geared toward supporting a single user or application. Objects either do not persist between program executions or are saved in a primitive manner that does not lend itself to sharing (for example, copyihg a program's image to disk). It would be difficult for a design layout tool and a design rule checker to share data, let alone run concurrently. Object-oriented data management systems rectity this situation by providing data management facilities. The trick here is to provide these facilities in a manner that provides flexibffity and merges well with the object-oriented paradigm.

AR Object-Oriented Model

This section outlines the OPAL programming language and GemStone data model developed by Seivio Logic Corp. Most of the constructs described here are present, in one form or another, in all object-oriented languages. The concepts of objects, messages, methods, classes, and class hierarchy are essential to all object-oriented models. You should beware of systems that claim to be object-oriented yet do not contain similar concepts.

The three principal concepts of the model and language are object, message, and class. They correspond roughly to record, procedure call, and record type in conventional systems. Table 1, page 22, gives some other correspondences.) An object is a chunk of private memory with a public interface. Objects communicate with other objects by passing messages, which are requests for the receiving object to change its state and/or return a result. The set of messages an object responds to is called its protocol (its "public interface"). An object may be inspected or changed only through its protocol. The means by which an object responds to a message is a method. A method is an OPAL procedure that is invoked when an object receives a particular message. So that each object need not carry around its own methods, objects with the same internal structure and methods are grouped into a class and are called instances of the class. The methods and structure are m a single object describing the class-the class-defining object (CDO)-and all instances of the class contain a reference to the class-defining object. Unlike some other object models, an object is an instance of exactly one class.

Table 1: Correspondences between concepts in GemStone and conventional systems


Gemstone                 Conventional

object                   record instance,
                         set instance

instance variable        field, attribute
instance variable        field type,
contraint                domain

message                  procedure call

method                   procedure body

class-defining           record type,
object                   relation scheme

class hierarchy          database scheme

class instance           record instance
                         tuple

collection class         set, relation

Every object has a unique identity. In OPAL, an object is identified by its object-oriented pointer (OOP). An object's identity is invariant: it remains the same regardless of changes in the object's state. OPAL does not support the become: message, which interchanges the identity of two objects. Smalltaik uses this message for screen manipulation and growing objects. GemStone's designers considered become: an inappropriate solution to the problems it addresses. The functionality it is used for is implemented in other ways.

Objects

Internally, most objects are divided into fields, called instance variables. Each instance variable can hold a value, which is usually a reference to another object. Figure 1, page 22, shows an Employee object with four instance variables: empName, ssNo, address, and salary. The empName instance variable holds an object that is an instance of the PersonName class. Not all objects are internally divided into instance variables. Certain basic types such as SmallInteger and Character are not further decomposed. These basic types are seif-identifying as they have a direct representation.

Figure 1: An Employee object and its four instance variables

         EMPLOYEE                     
                    
EmpName                PersonName         
              first    String               
                         Ray       
              last     String              
                         Ross       

ssNo                   Small Integer       
                         11122333        
                         
Address                StreetAddress         
              stNumber SmallInteger         
                         1055                 
              street   String              
                         Alameda       
              city     String               
                         Gresham       

Salary                 SmallInteger         
                         45558

The instance variables in the Employee object are called named instance variables. An object can have indexed instance variables, which can be viewed as instance variables with numbers instead of names. Indexed instance variables are used mainly for implementing ordered collections, such as arrays.

Messages

The basic form of all message expressions is <receiver> <message>. The <receiver> part is an identifier or expression denoting an object that receives and interprets the message. The <message> part gives the selector of the message and possibly arguments to the message. Every message returns a result to the sender, which is usually another object. There are three kinds of messages: unary, binary, and keyword.

Unary messages have no arguments and have selectors that are a single identifier. Assume that emp is a variable holding an Employee object. If there is a unary message firstName to retrieve the first name of the employee, then emp firstName returns a string that is the first name of emp.

Binary message expressions have a receiver, one argument, and a message selector that is one or two nonalphanumeric characters. To multiply 8 by 3, for example, you send to 8 the message "Multiply yourself by 3": 8 * 3. Here, * is the binary selector for the multiplication message. Comparisons are also handled with binary messages. In the following expression:

(emp1 salary) <= (emp2 salary)

the receiver and argument of the <= binary message are both results of unary message expressions.

Keyword messages have one or more arguments and have multipart selectors composed of alphanumeric characters and colons. To set the third component of an array held in anArray to 'Ross', you use:

anArray at: 3 put: 'Ross'

The same effect is accomplished in other languages by:

anArray[3] := 'Ross'

The message selector here has two parts--at: and put:-- because it takes two arguments: the array index and the object to be stored at that index. You refer to a message is referred to by the concatenation of its parts: at:put.

Methods

To construct a method you need to know what objects are visible within its scope. All the named instance variables of the receiver are available via their names. Thus, in a method for class Employee, as defined in Figure 1, the instance variables empName, ssNo, address, and salary are accessible.

A method may also have temporary variables, which are declared between vertical bars at the beginning of the method: .temp1 temp2.. Each user has one or more dictionaries of global variables, and those global variables can appear in methods. You need two other operators before you can write methods: the assignment operator (:=) and the operator which returns the value of the expression as the result of a method.

For a unary message wholeName in class PersonName that returns the first and last name concatenated with a space between, you can use the following method:

wholeName

  |temp|

  temp := first.

  temp := temp + '' + last.

  ^ temp

The first statement, temp := first, assigns the value of the instance variable first in the receiver (a PersonName object) to the temporary variable temp. The statement:

temp := temp + '' + last

concatenates the value of temp, a blank, and the contents of the instance variable last of the receiver (+ is the binary message for concatenation of strings). The result of the concatenations is assigned to temp. Finally, the statement temp returns the value of temp as the result of the message wholeName. (This particular message could be implemented with the single message expression ^ first + ' ' + last with no need for a temporary variable.)

A method can also change the state of the receiver, by assigning new values to instance variables. Suppose objects of class Department have a budget instance variable. The following method increases a department's budget by a certain percentage, up to a limit:

increaseBudgetBy: aPercentage upTo: aLimit

     budget := budget + (budget * (aPercentage / 100)).

     (budget > aLimit) ifTrue: [budget := aLimit].

     ^budget

The increaseBudgetBy:upTo: message takes two arguments, represented by variables aPercentage and aLimit. This method changes the budget instance variable of the Department object that receives the increaseBudgetBy:upTo: message. In the second line of the method, you see a keyword message ifTrue: that functions as a conditional control construct. The receiver of this message is a Boolean value. The argument of the message is a block. A block is a sequence of one or more OPAL statements within brackets and is a first-class object in GemStone. The effect of a message aBoolean ifTrue: aBlock is to perform the code in aBlock if aBoolean is true. The third line of the method returns the new value of budget.

OPAL includes messages for iterative control structures, using blocks with arguments. Block arguments are declared at the beginning of a block. For example:

     [:n.(2 * n) - 1]

is a block with one argument, n. For this block to be evaluated, it needs a value for the argument. The value: message provides the argument and causes execution. A block returns the value of its last expression when executed. Thus:

     [:n.(2 * n) - 1] value:7

returns 13. In general, given a value of N, this block returns the Nth odd number.

All methods you have seen so far have been defined in terms of other messages. The definitions of methods are not completely circular. At the bottom of everything are primitive methods. When the OPAL interpreter encounters a message that has a primitive method, it executes a piece of machine code rather than an OPAL method. Primitive methods exist for arithmetic, comparisons, object creation and copying, array selection, string manipulation, and set functions. The set of primitive methods in GemStone cannot be augmented by a GemStone programmer, although such an extension could be provided.

Classes

Every class is represented by a class-defining object that describes the structure and behavior of instances of the class as well as the position of the class in the class hierarchy. Any object will return the CDO for its class in response to the message class. Suppose the variable assoc holds an object of class Association. (Association is a key-value pair used in building dictionaries.) The assignment:

ac := assoc class

causes ac to be assigned the CDO for class Association. CDOs respond to messages just as all other objects do. For example, CDOs respond to the name message. The result of the expression assoc class name is #Association. (Symbols are indicated with a # as a prefix.)

The instance variable names and methods for instances of a class are stored in the CDO for the class. CDOs also store the methods that instances execute in response to various messages. When an object receives a message, it consults the CDO to find out how to execute that message.

OPAL provides a class hierarchy to exploit similarities in structure and behavior of entities. (The built-in hierarchy is shown in Figure 2, page 24.) A subclass inherits structure and behavior from its superclass. Structure is inherited in that all named instance variables in the superclass are also present in any subclass. Suppose you wanted objects that represented people's names with titles. You could create a subclass TitledName of PersonName, and instances of TitledName would automatically have instance variables first and last. You could add an instance variable, title, to TitledName to hold the title. (See Table 2, page 24.)

Figure 2: Kernel classes in the superclass hierarchy

Object
     Association
          SymbolAssociation
     Behavior
          Class
          MetaClass
     Boolean
     Collection
          SequenceableCollection
               Array

                    InvariantArray
                    Repository
               String
                    InvariantString
                         Symbol
          Bag
               Set
                    Dictionary
                         SymbolDictionary
                              LanguageDictionary
                    SymbolSet
                    UserProfileSet
     CompiledMethod
     Magnitude
          Character
          DateTime
          Number
               Float
               Fraction
               Integer
                    LargeNegativeInteger
                    LargePositiveInteger
                    SmallInteger
     MethodContext
          Block
               SelectionBlock
     Segment
     Stream
          PositionableStream
               ReadStream
               WriteStream
     System
     UndefinedObject
     UserProfile

[TABLE 2]

      Classes                    Instances
      PersonName                 Ray Ross
       |                          |
      TitledName                 Dr. Ray Ross
       |                          |
      Titled Name with Letters   Dr. Ray Ross, OBE

Class                    InstanceVariables     Messages       
PersonName               first                 first          
                         last                  first:          
                                               last           
                                               last:           
                                               fullName       

TitledName               (above) +             (above)+       
                         title                 titledName     

 TitledNameWithLetters   (above) +             (above) +      
                         letters               titledName     
                                               (new Method)

A subclass inherits methods from its superclass. Thus, if fullName is a message for PersonName, instances of TitledName respond to that message by using the method from PersonName. The lookup process to determine which method corresponds to a message starts in an object's class. If the message is not defined in that class, the search proceeds in the class's superciass, the superclass of that class, and so forth. Thus, the implementation of fullName can be overridden in TitledName if desired.

A subclass can implement messages of its own. For TitledName you might want a message that returns a string contalning the full name with title:

     titledName      ^title + '' + self fullName

A new feature in this example is sea which is a special variable whose value is always the receiver of the message.

A subclass can also reimplement an inherited message. Suppose you define a class TitledName WithLetters as a subclass of TitledName. This subclass adds an instance variable letters that holds the letters after a person's name, as in 'Dr. Ray Ross, OBE'. You can reimplement the titledName message in TitledName WithLetters to include the letters alter the name.

In OPAL, unlike some object-oriented languages, instance variables may be constrained to a kind of a class. An object is a kind of class foo if its class is foo or a subclass of foo. If instance variable personName in class Employee is constrained to PersonName, then in instances of Employee, personName could be an instance of PersonName, TitledName, or TitledName WithLetters. Constraints may also be placed on the elements of collection classes such as Bag and Set.

The Read-Write Boundary

In the same manner as conventional programming methodologies are not appropriate for today's applications, conventional data management systems are not appropriate for managing their data. Conventional data management systems are best at two things. One, they are good at batch processing large amounts of homogeneous data such as monthly credit-card billings. Two, they are good for high-transaction-rate applications such as ATM networks. The processing of data in applications such as CASE, cartography, and CIM fits neither of these two categories. The data in a cartographic database is not homogeneous, and transactions in a CASE system can take hours, or days, not just seconds.

Perhaps the best argument for integrating a data management system with an object-oriented language can be made by looking at the impedance mismatch across the interface between conventional application and database languages. Conventional application and database languages are incompatible both computationally and structurally.

Data manipulation languages use a read-write model of interaction with application code. Database calls are embedded in the application language and allow for fetching, inserting, deleting, and modifying records or tuples. Although query languages have matured, allowing for the definition of complex views through sophisticated queries, they are by no means complete, and views are difficult to update.

In any event, the communication between application and database is declarative, not procedural. The application tells the database what it wants, not how to get it. This works well when what the application wants can be specified declaritively and the database can figure out how to get it in an efficient manner. All too often, though, an application needs to intersperse several database calls amongst application code to perform what should be a single logical operation.

A similar problem arises with regard to maintaining the consistency of a database. Many database systems do not allow constraints any more sophisticated than requiring key values to be unique. The application program ends up with the responsibility of maintaining all other constraints on the data.

Consider an office management system in which several applications reserve meeting rooms. In such a system, each application would first check for the availability of the room. This might involve fetching records to check that the individual reserving the room is authorii:ed to do so and that the room has not already been reserved. The application would then post a record to indicate the reservation and perhaps post other records to appointment calendars. A change to the structure of the database, or the policy for reserving rooms, might require locating and modifying every application that modifies the database. In an object-oriented data management system, a message could be defined that performed all necessary checks and updates to the database. In the event that the policy for reserving a room changes, the method for reserving a room could be easily located and modified.

You might argue that in a conventional system the code for reserving a room could be factored into a single procedure--thus, achieving the same effect. In a conventional system, however, none of the applications is forced to use the procedure when making a reservation. The encapsulation provided by object-oriented data management allows the designer of a class to control access to data in instances of the class. By organizing the methods for a class along with the structural description of its instances, the impact of changes in structure are localized. This control and organization becomes more important as the size and complexity of applications grow: placing the data in one file and the code that determines its meaning into several other files no longer makes sense.

Two other negative aspects of the read-write model should be pointed out. One, a procedure may need to make numerous calls to the database. If a procedure needs to fetch 1,000 records, it may need to make 1,000 calls. In an object-oriented data management system, one message can take the place of many database operations. Two, conventional systems may be doing a lot of unnecessary copying. If a database is queried for employees who have been with a company for at least five years, the tuples corresponding to those employees are copied into the result. In an object-oriented system the result would contain only the identities of those employees.

The differences between data types supported by the application language and the data management language cause structural impedance across the read-write boundary. Although programming languages directly support arrays, most relational systems must encode them by adding a field to represent offsets. This encoding requires a translation when arrays are retrieved from, or stored to, the database. Union types and nested structures are difficult to represent using relations. Application developers are typically faced with two choices. One, they can normalize the representation so that each relation stores only one of the unioned types and only flat structures are stored. Two, they can store uninterpreted bit strings if they fit within the size limitations imposed by the relational system.

With the first choice, the application becomes dependent on the decomposition in order for the translation to work properly when storing and fetching. Furthermore, in the relational model, the properties of an entity must be sufficient to distinguish it from all other entities. For an employee tuple to reference a department tuple, there must be some fields in department tuples that uniquely and immutably identity departments. Using department names to identity department tuples is fine until a department's name changes. Maintaining the validity of foreign keys, such as a department name stored in an employees tuple, is known as referential integrity. Making up unique department numbers might solve the problem; however, this adds an attribute that may not be present in the world being modeled and is a burden on application developers. With the second choice, the database isn't providing any more functionality than a good file manangement system.

Suffice it to say that conventional programming languages manipulate data types by representing their structure in one place and distributing their semantics throughout the application code, whereas data managment systems, because of their limited repertoire, can only manage data structures. An object-oriented data management system combines object-oriented programming with an integrated data model to free programmers from the tedious and error-prone coding needed to accommodate a separate data management system that understands neither the execution model of the language nor the structures used by the language in representing its data types.

An Object-Oriented Data Management System

Arguing in favor of an object-oriented language that provides a common abstraction for data definition, data manipulation, and general computation is one thing; actually developing one is quite another. What follows is a discussion of object-oriented data management issues and how they are addressed by GemStone.

One of the guiding principles in GemStone's development was uniformity; all objects, be they temporaries or shared and disk-resident, should be accessed in a uniform manner. This principle helped address the issue of how objects persist. Does an application explicitly state that an object is to be persistent and must be explicitly deleted, or do objects exist as long as they are reachable from a special root collection of objects? Some folks argue that persistence through reachability is not practical for a disk-based system: garbage collection of disk-based objects incurs too much overhead.

Let's take a look at the implications of explicit deletion. In the first place, who decides when an object can safely be deleted? In a shared environment, in which many applications are using the data, this is not an easy question to answer. When an object is deleted, what is to be done with all the references to that object? The problem is extremely similar to that of referential integrity in relational databases.

Assume that it would be satisfactory to replace these references with references to a special object Nil. Either all the references are located and replaced in one operation, which corresponds to garbage collection, or all references are screened for deleted objects, which corresponds to incremental garbage collection. If references are screened, are all the instance variables of an object screened when the object is accessed, or are only those instance variables to which messages are sent screened? The former involves greater overhead, whereas the latter allows the propagation of references to a deleted object: an object can be assigned to an instance variable without sending messages.

Given that explicit deletion does not incur less computational overhead, is difficult to control in a shared environment, and is less transparent to applications, persistence through reachability was chosen for GemStone. Instances of class UserProfile are used to represent properties of each user, including a list of dictionaries to be used in resolving symbols when compiling OPAL code for that user. Any object that is reachable from a dictionary in any user's UserProfile is persistent. Dictionaries are also used to manage name spaces and sharing. When an identifier is encountered and it does not correspond to a temporary, instance, or class variable, the dictionaries are searched to find an object corresponding to that identifier. A user may have any number of dictionaries to accommodate various degrees of sharing.

Data Integrity

Three kinds of failure can compromise the integrity of a database: an application may fai to complete because of a run-time error; the processor managing secondary storage may fail; and the media used to store objects may fail. Protection against media failure is achieved by replicating objects on disk.

Failures of the first two kinds require the careful posting of an application's changes to the database. By ensuring that either all, or none, of an applications's changes are posted, inconsistent states of the database are avoided. This all-or-nothing form of posting changes is known as atomicity. Applications generally use commit and abort commands to define the changes that are to be atomically posted. When an application commits, an attempt is made to post all changes made by the application since the last commit or abort (you pretend that the first thing an application does is an abort). If all changes are successfully posted, then the commit succeeds; else the commit fails and the application is so informed. When an application aborts, all the changes made since the last abort or successful commit are thrown away.

The activity that occurs between an abort or commit and the succeeding abort or commit is known as a transaction. When a failure of the first two kinds occurs, it is treated as though the transaction had aborted. When the processor falls, a lot of fancy footwork is required to guarantee that all the needed information is safely on disk, but it works.

The two basic choices for implementing atomicity are logging and shadowing. With logging changes are posted direcfly to the database and logged. When a transaction aborts, the log is used to restore the database to its previous state. A successful commit allows the log to be thrown away and a new log started. With shadowing, changes to an object are posted against a copy (a shadow) of the object. When a transaction aborts, the shadow copies of objects are thrown away. The database is left alone, as it has not been modified. For a transaction to successfully commit, all modified objects must be carefully replaced with their shadow copy.

Concurreney Control

Any system that allows concurrent access to shared data must handle conflicting changes made by applications running concurrently. Either conflicting changes are prevented (pessimistic concurrency control), or conflicting changes are discarded (optimistic concurrency control). Pessimistic concurrency control requires that a transaction state its intention prior to accessing an object by acquiring either a read-lock or a write-lock. While a transaction holds a lock on an object, other transactions are prohibited from acquiring a lock that would allow conflict. Optimistic concurrency control allows a transaction to proceed as though it were running alone. At transaction commit, the changes made by the transaction are checked for conflict with other transactions. If a conflict is detected, the changes made by the transaction must be undone.

As you may have noticed, concurrency control and data integrity are strongly related. Pessimistic concurrency control is generally bundled with logging, optimistic concurrency control with shadowing. Conventional data management systems tend to use locking and logging. The advantage of optimism and shadowing is that transactions do not manage locks. The down side is that ar'bitrary amounts of work can be lost when conflict is detected at commit. The situation is reversed for locking and logging. One further disadvantage of locking is deadlock: two transactions are unable to complete until each gets a hold of a lock the other is holding.

For the initial implementation, optimism and shadowing were chosen. This choice was guided by a desire for uniformity. As GemStone supports general programming and data manipulation in a sing!e language, the objects in a GemStone database span the continuum between objects that correspond to values of variables in conventional programming languages and those that correspond to large design objects as may be found in VLSI databases. Conflict is unlikely on objects that correspond to program variables. For one thing, they are unlikely to be shared or persistent. Imposing pessimistic concurrency control on objects at this end of the continuum is an unnecessary burden.

Optimism and shadowing allow GemStone to provide each transaction with a private workspace, within which the activities of other transactions cannot affect the transaction's progress. At commit, privacy is abandoned, changes in the private workspace are made available to other transactions, and changes committed by other transactions become visible. Servio Logic Corp. realized, however, that in the case of large design objects, the amount of work that must be undone when conflict is detected may be unacceptable. They therefore began investigating supporting pessimism control in addition to optimism.<fn2> The goal was to make pessimism optional and, to as large a degree as possible, not of concern to applications that chose not to use it. Support for pessimism will be included in a forthcoming release.

Concurrency control is one of the areas in which object-oriented data management offers great promise. Recent research efforts in programming languages have explored the notion of behavior-based concurrency control. As a simple example, consider a savings account to which deposits are credited and withdrawals are debited. There is no intrinsic reason why two transactions that are both crediting a given account need conflict. Neither transaction is concerned with the final balance alter the credit. All they care about is that the proper amount is credited to the account; the order in which the amounts are credited is not of concern. Basically, each transaction adds the amount deposited to a list of credits to the account. The credits are processed in the order in which they were added to the list. That the credits were added to the list in the opposite order to that in which the two transactions committed has no effect upon the desired behavior. The read-write barrier of conventional systems prevents this form of concurrency as the database has no knowledge of the semantics of operations beyond read and write.

Large Objects and Large-Object Space

Although relational systems can support a large number of tuples, they generally do not allow tuples to be larger than a page. Object-oriented systems must support large objects as well as a large numbers of objects. If large objects are not supported, application developers will have to encode large objects into objects no larger than those supported by the system. A GemStone system supports 2³¹ objects (2³² counting instances of SmallInteger), and an object can contain 2³¹ instance variables.

When an object is, or grows, larger than a page, it is broken into pieces and is no longer stored contiguously. Large objects can be accessed and updated without bringing the entire objects into memory. Large objects can also grow and shrink without copying the entire object.

The basic data formats provided by an object-oriented system must support reasonably direct and efficient implementations of user-defined classes. Although relational systems support both records (tuples) and sets (relations), arrays must be encoded. A data management system must support all three.

The underlying basic storage formats must in turn efficiently support the basic data formats. GemStone supports five basic storage formats--self-identifying (for example, Smalllnteger, Character, byte (for example, String), named, indexed, and nonsequenceable collections. The byte format is used for classes whose instances may be considered unstructured. Structure is imposed by the methods that operate on the objects. The indexed format supports access to components of an object by integers, as in instances of Array. The byte and indexed formats support, without copying changes in the size of an object. The named format supports access by instance variable names. Classes whose instances have both named and indexed instance variables are supported by a hybrid format. The nonsequenceable collection (NSC( format supports collection classes such as Bag and Set. The members of such collections are not identified by name or index; instead, collections can be have members added, removed, or enumerated, and efficienlly support union, intersection, difference and tests for membership.

Two other areas that need to be addressed by data management systems are clustering and associative access. Clustering is the placement of objects that tend to be accessed together near each other on the disk. The objects are placed on as few, preferably contiguous, pages as possible. By so placing the objects on disk, fewer disk accesses are required to bring these objects into memory.

Consider the Employee object of Figure 1. If the clusterDepthFirst message were sent to the object, its layout on disk would be as in Figure 3, page 30. Since the ssNo, stNumber and Salary instance variables are selfidentifying, they are directly represented in their containing objects and need not be clustered. Now consider a collection of Employee objects. If enumeration of the elements of the collection occurs frequently, the elements can be clustered together. If these enumerations tend not to access the addresses of employees, it may be desirable to omit addresses from the clustering. The layout on disk of the employees would be as in Figure 4, page 30. By omitting addresses from the clustering, employees and their frequently accessed instance variables can be enumerated with even fewer disk accesses. OPAL's clustering protocol is flexible enough to allow such clustering.

Figure 3: Clustering of an Employee object

Employee
  Person Name   String   String   Street  Address   String   String

Figure 4: Clustering of Employee objects omitting their addresses

Employee  Person Name  String  String  Employee  Person Name  String  String

An associative access is a search of a collection based upon the internal state of the collection's elements--for example, a collection of employees can be associatively accessed for employees who live on a certain street. Even with clustering, searching large collections by a sequential scan may yield unacceptable performance. Associative accesses should take time that is no more than logarithmic on the size of the collection. Hashing would allow the associative access of an employee with a particular social security number in nearly constant time, regardless of the size of the collection being searched. B-tree indexes support associative access in time that is logarithmic on the size of the collection and efficiently support range queries, such as locating those employees whose salary is within a given range.

There are many issues with regard to associative access in object-oriented systems.<fn3> For example, should indexes index all instances of a class or only instances of explicit collections? How is the use of indexes to be indicated? Should indexes be based upon the structure or protocol of objects? If by structure, should indexes be on identity or value? GemStone supports B-tree indexes into explicit collections using the structure of objects. Both identity- and value-based indexing is supported. A limited calculus sublanguage is provided to make use of indexes. The language was constructed so that associative queries may be viewed as procedural OPAL code.

State of the Technology

Object-oriented languages and data managment are emerging technologies. The first commercially available data management systems have only recently arrived on the scene. Deciding when and if to jump on the bandwagon is difficult. Strong interest in object-oriented systems, as indicated by the Conference on Object-Oriented Systems, Languages and Applications (OOPSLA) becoming ACM's third largest conference in just two years, is not sufficient. Other promising technologies of the past have failed to yield their expected benefits.

If, however, you are encountering the frustration with structured programming I discussed in the introduction, you might just browse through the proceedings of the OOPSLA conferences<fn4><fn5> to see what advances are being made and the kinds of applications being developed using object-oriented systems. You may well be surprised at the siguificant applications being developed with object-oriented technology. GemStone, for example, is being used by an agency of the U.S. government to develop a new, automated coastal chart production and maintenance system. In this system, GemStone manages both the static feature information and the procedural knowledge to render that information into readable and reliable charts. At another U.S. government facility, dedicated to research in CIM, GemStone is being used to manage information flow between the many, often incompatible, systems involved in computer-integrated manufacturing.

You may find that object-oriented technology is more mature than you thought. You might start asking hard questions about performance and how to compare object-oriented systems. Many argue that part of the price to be paid for the benefits of object-oriented systems is an apparent increase in the consumption of machine resources. It may well be the case, however, that many applications can be developed using today's technology and still yield adequate performance. After all, an application that can be developed quickly, that is easily maintained, and whose performance is adequate is often preferable to missed production deadlines, buggy code, and blazing performance.

In the same manner as relational systems have markedly improved their performance over the last decade, so will object-oriented systems. Hardware will continue to get faster and cheaper. If performance isn't quite adequate on the hardware you are running today, it will be tomorrow.

Convinced? Want one? Which one? That may be a hard decision. For one thing, there is no single object model. In some models the collection of all instances of a class is meaningful; in others it isn't. Some models support explicit deletion; others don't. There are object-oriented extensions to C. Will these C extensions perform better than systems whose heritage is Smalltalk? If so, there may be a price, such as the slower development and more difficult maintenance implied by a compile-and-link methodology. Perhaps what you really need is the EXODUS extensible database system being developed at the University of Wisconsin or the POSTGRES system being developed by Stonebraker and crew at Berkeley.

What about benchmarks? A good set of benchmarks would make comparison shopping easier. There are problems too numerous to mention here. It's difficult enough developing benchmarks for relational systems or any given programming language. It's even harder for systems that integrate different language extensions with different object models.

See you at OOPSLA.

This article was condensed by the author from Development and Implementation of an Object-Oriented DBMS by David Maier and Jacob Stein.

Notes

1. J. Diedrich and J. Milton, "Experimental Prototyping in Smalltalk," IEEE Software, vol. 4, no. 3 (May 1987).

2. D. Maier and J. Stein, "Indexing in an Object-Oriented DBMS," Proc. International Workshop on Object-Oriented Database Systems (Asilomar, Calif.: IEEE Computer Society Press, September 1986).

3. D. J. Penney, J. Stein, and D. Maier, "Is the Disk Half Full or Half Empty?: Combining Optimistic and Pessimistic Concurrency Mechanisms in a Shared, Persistent Object Base," Proc. Workshop on Persistent Object Stores (Appin, Scotland, August 1987).

4. Proc. of ACM Object-Oriented Programming Systems, Languages and Applications Conference (OOPSLA-86) (Portland, Oreg., October 1986.) Also published as ACM SiGPLAN Notices vol. 21, no. 11 (November 1986).

5. Proc. of ACM Object-Oriented Programming Systems, languages and Applications Conference (OOPSLA-87) (Orlando, Fla., October 1987). Also published as ACM SIGPLAN Notices, vol. 22, no. 12 (December 1987).

Object-Oriented Programming and Database Design

AR Object-Oriented Model

Table 1: Correspondences between concepts in GemStone and conventional systems

Objects

Figure 1: An Employee object and its four instance variables

Messages

Methods

Classes

Figure 2: Kernel classes in the superclass hierarchy

[TABLE 2]

The Read-Write Boundary

An Object-Oriented Data Management System

Data Integrity

Concurreney Control

Large Objects and Large-Object Space

Figure 3: Clustering of an Employee object

Figure 4: Clustering of Employee objects omitting their addresses

State of the Technology

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

Object-Oriented Programming and Database Design

Related Reading

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content