Extending the Standard Template Library with Association Classes

By Eli Tilevich, September 01, 2001

These classes extend the STL container concepts to manage complex relationships among objects.

September 2001/Extending the Standard Template Library with Association Classes

Extending the Standard Template Library with Association Classes

Eli Tilevich

These classes extend the STL container concepts to manage complex relationships among objects.

Introduction

The STL (Standard Template Library) provides a large set of standard data structures called containers and many standard algorithms to operate on them. Using this library allows programmers to concentrate on the higher-level design without worrying about many implementation details when it comes to data structures and algorithms. Despite the abundance of different containers in the STL, representing associations between two application classes still can be quite challenging. Such an association is actually a relationship that exists between different kinds of objects. You may, for example, wish to create a data structure that represents all the employees who belong to a particular department, or that represents all the interconnections between nodes in a network. In this article I explain why such associations are difficult to represent using just the STL, and I describe a set of classes that enable them to be represented naturally. Of particular importance, these classes effectively decouple the objects that participate in such relationships.

The Problem of Representing Associations

Representing associations with STL containers is difficult because none of the containers, with the exception of std::map and std::multimap, are Pair Associative [1]. A Pair Associative container stores elements that consist of pairs of sub-elements; the other STL containers store elements of just one particular type (or of base-class pointers to polymorphic types).

Storing entities in containers that are not Pair Associative just groups them together, it does not represent any associations between two different classes of objects. For example, an STL vector of pointers to Employee objects provides little information about how objects of the Employee class relate to other kinds of objects in the program. In contrast, a Pair Associative container, such as a std::map<Employee *, Department *>, accomplishes two goals: it stores Employee and Department pointers, and shows that each Employee has an associated department.

The map and multimap classes, which are the only Pair Associative containers in the STL, quite often are not sufficient to represent all possible associations between application classes. There are three types of possible associations between objects of two classes: one-to-one, one-to-many, and many-to-many. One-to-one means that an object of the first type relates to one and only one object of the second type. For instance, for each Employee object there is one and only one unique ID object. One-to-many means that for each object of the first type there can be many objects of the second type but for each object of the second type there is one and only one object of the first type. For example, there can be many Employees in one Department but each Employee belongs to one and only one Department. And finally, many-to-many means that for each object of the first type there can be many objects of the second type and vice versa. For example, an Employee can be enrolled in many Courses and there can be many Employees enrolled in a particular Course.

An Unsatisfactory Solution

When it comes to representing such associations with the STL, there are several choices. Consider a common textbook problem in which there are a set of employees and a set of departments. I will show how this problem can be represented in a C++ program.

One approach is to use a non-Pair-Associative container inside an application class. Consider the following class declaration:

class Employee;
class Department{
   set <Employee *> _employees.
   typedef set 
    <Employee *>::iterator empl_iterator;

public:
   ...
   void addEmployee(const Employee* e);
   bool deleteEmployee
    (const Employee* e);
   empl_iterator beginEmployee();
   empl_iterator endEmployee();
};

This declaration implies a possible one-to-many association between classes Department and Employee. A close examination reveals several shortcomings of this approach. Suppose you need to access the set of Employee pointers (_employees) from outside the Department class. If this data member is made private, more methods will be required to access it. Besides, it is not clear what access methods should be provided. Providing a method that returns a reference to the container defeats the data encapsulation principle — it might as well be declared public. Instead, the class designer would probably want to provide methods for adding and deleting items from the container and searching and traversing the container itself. It becomes rather problematic to enforce a uniform coding convention in providing such methods.

Understanding this code is problematic as well. If this declaration is the only piece of information that exists, there is no way to tell whether it represents just a one-to-many association. For example, if some of the employees belong to multiple departments, we are dealing with a many-to-many association. In many cases, proper maintenance of this code is impossible without understanding what type of association may exist. The only way to find out what type of association may exist is to carefully study not just the implementation of the Employee and Department classes but also all their interactions with other classes. Finally, since all the data relationships are hidden inside each individual class declaration, writing code to iterate through the application’s data becomes possible only if we have previously studied each application class’s declaration.

Using Association Classes

A different approach is to use special-purpose association classes. Application classes no longer have to keep track of which associations they participate in. All associations are externalized and put in one place:

namespace DataOrg{

//each department has a unique ID
OneOne<Department*, int> Dept2ID;
typedef OneOne<Department*, 
  int>::iterator DeptIter;

//There are can be many employees in one
//department but each employee belongs 
//to only one department
OneMany<Department*, 
  Employee*> Employees;
typedef OneMany<Department*, 
  Employee*>::iterator EmplIterator;

//Each employee can participate in 
//several projects and each project can 
//have many employees
ManyMany<Employee*, Project*> Projects;
typedef ManyMany<Employee*, 
  Project*>::iterator1 EmplByProjIter;
typedef ManyMany<Employee*, 
  Project*>::iterator2 ProjByEmplIter;

}; //end DataOrg namespace

Partial definitions of the template classes OneOne, OneMany, and ManyMany are shown in Listing 1. This listing also shows the implementation of a few of the methods in the ManyMany class, including insert, begin, end, and operator++ of one of the nested iterator classes. These template classes are like STL containers in a couple of ways: they can store more than one element, and they can grow or shrink dynamically. Thus, the OneOne container can store any number of one-to-one associations. But the association classes are not just ordinary containers; for one thing, they provide more kinds of iterators. For example, given the many-to-many Projects container as defined above, it is possible to obtain an iterator that iterates over all the Employees involved in a particular Project; but it is also possible to iterate over all the Projects being worked on by a particular Employee.

Using these classes, suddenly all the data associations in the program become very clear, which makes the program easier to manage and understand. For example, if the requirements should change and employees can no longer share projects, the programmer can just change the type of the Projects association from ManyMany to OneMany. The data integrity requirement (making sure that there are no two employees sharing the same project) will be enforced by the OneMany class itself and there is no need to provide any additional code for that purpose. Since all class associations are now externalized, it is easy to add a new association to the program.

Consider how you would add logic to organize Projects by Departments using both the old and new approaches. Without the association classes, you would have to add a vector of Projects to the Department class. In addition to this, you would also have to add several methods to access projects, add them to the Department, and delete them. Using the new approach, you need only add another association:

OneMany<Department*, Project*> ProjsByDept;

The OneMany class already has all the methods for adding, deleting, and iterating projects for each department inserted into it. An interesting question arises: where should objects of such association classes be created? We could either make them to be members of the main application class or declare them global, depending on other design factors.

Design and Implementation Considerations

Later in the article I provide a few examples that show how standard general-purpose association classes can be useful. Regardless of how they are implemented, there are several design considerations that should be taken into account. One decision is whether to make those classes intrusive or not. By intrusive I mean that the pointers that are used to maintain the associations are stored inside application classes. There are several existing class libraries that take this approach. The DataObject Library from CodeFarms Inc. uses a code generator to put such pointers inside application classes. The Pattern Template Library from the same company uses template-based inheritance to do the same thing. The latter class library also provides a many-to-many data structure based on the intrusive approach.

The obvious advantage of using the intrusive approach is performance. For example, if each many in a OneMany association stores a pointer to its one, then the one associated with a particular many can be retrieved in constant time. Where no such back-pointer exists, finding the one associated with a particular many typically requires a search operation. The only drawback of the intrusive approach is that if the data organization of the program changes, the programmer must either rerun a code generator, or change the inheritance structure of the application classes.

Since the STL has become a de-facto industry standard (not to mention a part of the C++ Standard), I designed my relationship classes to extend the STL, and I wanted to make them as STL compatible as possible. My goal was that anyone familiar with the STL should be able to start using the three relationship classes right away. Since all the STL containers are non-intrusive, I have chosen a non-intrusive approach as well. I use the STL map and multimap classes to implement the internal structure of the relationship classes. map and multimap utilize a balanced search tree (specifically, a “red-black” tree), so all insertion, removal, and retrieval operations have logarithmic complexity. This is of course less efficient than the constant access complexity of intrusive data structures; however, for reasonably-sized data the difference is not significant.

The three association classes, OneOne, OneMany, and ManyMany, were designed to be extremely flexible to fit the general programming purpose. Each class provides a standard interface to insert and remove the data, as well as STL-style iterators to traverse through the stored data. The ManyMany class provides two types of iterators for each application class participating in the association.

I had to take particular care to make the ManyMany association flexible enough to be useful in different programming scenarios. The question of how to deal with many-to-many associations has been tackled by many OO designers. Rumbaugh et. al suggest that

the best approach is usually to implement the association as a distinct class, in which each instance represents one link and its attributes

This quotation introduces the notion of link attributes. A link attribute is a piece of data that applies only to the association between two entities, not to either entity in itself. For instance, in the case of a many-to-many relationship between Employee and Project, possible link attributes would be the hours put in, the start and end date, and some other information that applies only to a particular employee working on a particular project. A class to hold such data might be called EmplProj and the class would serve as a link attribute.

A problem arises because sometimes a link attribute is not needed. This implies the necessity of having two different internal data organizations for the ManyMany class. That’s where the advanced C++ feature called partial template specialization comes into play. The ManyMany class can be specialized in two different ways. If the link attribute is not used, the class should be specialized with the two application classes participating in the association, and possibly with the corresponding less-than predicates for each application class. (If the predicates are not specified, the default STL less predicate will be used.) If the link attribute is needed, then the specialization requires five parameters: the first two are the application classes followed by their corresponding less-than predicates, with the link attribute class at the end. In the ManyMany class implementation, the fifth template parameter is defaulted to type void. This generates the correct internal representation and the partial template specialization does the magic.

The ManyMany class provides two overloaded insert methods. The first two parameters corresponding to the elements stored in the associations are identical in both methods. The third parameter in the second insert method is the link attribute. Choosing an incorrect insert method that is not supported by the chosen specialization will result in a compiler error. The same holds true for the getAttribute method that returns a link attribute given two entities participating in the relationship. If the ManyMany class was not instantiated with a link attribute, the compiler is not be able to generate code for the getAttribute method and signals an error.

Unfortunately, not all C++ compilers support the partial template specialization feature. I have tested the sample code using egcs-1.1.2 running on both Solaris and Linux.

Examples

The following examples show a couple of simple uses for association classes. To get an idea how useful they are in practice, see the sidebar, “Field Experience.”

To use these association classes, you need only include the file Relations.h, partially shown in Listing 1. (The full source code is available on the CUJ website at <www.cuj.com/code>.)

Listing 2 (MotherChildren.C) demonstrates a typical use of the one-to-many association. This example if often referred to in database literature as an “obvious” case of such use: while a mother can have one or more children, each child has only one mother. The listing shows the definition of two functions besides main: insertChild and listChildren. The first one shows how the OneMany class enforces the integrity of the association and the second demonstrates a use of the class’s iterators.

This example demonstrate that the iterators provided for the association classes behave a little differently than STL iterators. Consider the following statement from the listChildren function defined in Listing 2:

OneMany<string, string>::const_iterator it =
   MotherChildren.begin(Mother),
   end = MotherChildren.end(Mother);

Unlike the begin and end functions in STL containers, these begin and end functions take a parameter. Here, begin returns an iterator to all children who have the mother represented by the string Mother. The end function returns an iterator that points one-past-the-end of this range of children.

Listing 3 (StudentClasses.C) shows a textbook example of the many-to-many association. Indeed a student can take many classes and there can be many students in one class. This listing shows how the ManyMany classes can be instantiated with and without the link attribute template parameter. In this example, the link attribute of the association is expressed via the means of the Record struct. This struct contains information that applies to a unique student-class association: the number of absences and the student’s grade for a given class. The listing also shows a typical use of the class’s iterators.

Conclusion

Using the special purpose association classes makes the resulting code easier to manage and understand. These classes can easily be ported to Java, which starting from JDK 1.2, supports the Collections framework with many useful Pair Associative data structures such as Map and HashMap. In addition, I would like to propose that the C++ standards committee consider adding relationship classes to the STL. I would not insist either on this particular implementation or even on the method interface that I have provided. However, I would like to make a point that such classes are generic enough and difficult enough to implement to be considered a useful addition to the STL.

Note

[1] The term Pair Associative comes from Matt Austern’s book, Generic Programming and the STL (Addison-Wesley, 1999).

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

Extending the Standard Template Library with Association Classes