Associative Chains in C++

By Phillip Bruce, February 01, 2006

Phillip gets around some of the limitations of the pointer-to-member with a model of class associations that he calls "Associative Chains."

February, 2006: Associative Chains in C++

Phillip Bruce is a self-employed programmer focusing on artificial intelligence, multimedia, and database systems. He can be contacted at [email protected].

In C++, the pointer-to-member lets you obtain a certain member of any object of a given data type. However, it doesn't immediately allow the expression of other useful class associations (for example, the member of an aggregate) or relationships of higher order (member of a member of a member). In this article, I consider an extension of the pointer-to-member that lets you dynamically compose and dereference heterogeneous associations from a source class to a target class. To demonstrate the effectiveness of this idea, I show its use in a common UI abstraction model. (The complete source code for an associative chain and scheduler is available at http://www.cuj.com/.)

The Pointer-to-Member And Its Limitations

The well-known C++ pointer-to-member [1] describes a class association from a source (that is, the class itself) to a target (the class member). Furthermore, it allows traversal from objects of the source type to the corresponding target objects by way of a dereferencing operation. For example, in a resource-scheduling system, I might define the reservation of a resource like Listing 1(a). t then denotes the containment of resource_ type in Reservation, and for any object of type Reservation (r in this case), I can dereference its resource type using the syntax shown.

While the pointer-to-member is useful, it is also limited. For starters, there are many interesting relationships besides class membership. For example, the Reservation class likely exists in a relational model, where user_id specifies users by their id attribute. The pointer-to-member obviously cannot describe this relationship.

Second, the pointer-to-member is not immediately extendable to relations of higher order. Suppose I were to resolve the user directly within the reservation (inviting a data integrity mishap, by the way) as in Listing 1(b). Now, even though the pointer-to-member is able to dereference the user, it is unable to further dereference a member of the user (for example, dereference the name of the reserver given their reservation). This limitation is not something inherent in the class model—the user's name is indeed accessible from Reservation_Hack—but rather the inability to easily compose two pointer-to-member variables.

I conclude, then, that the pointer-to-member lacks the generality and extension necessary to be a truly useful programming tool, even as it gives tantalizing intimations at the concepts of associations and links in UML [2]. What I should really like is an extension of the pointer-to-member that permits more complex class associations.

Associative Chains

Weighing the strengths and weaknesses of the pointer-to-member, I tried to develop a better model of class associations with three goals in mind:

The expression of most any association from a source to a target class.
The ability to compose associations of compatible types.
Well-defined dereferencing operation (that produces the corresponding target object for any object of the source type).

From these criteria, I developed what I call "associative chains." "Associative" comes from UML nomenclature, where an association models relations between classes. "Chain," of course, indicates that I want to compose the associations. If you recall the problem of directly accessing the user's name given his reservation, you should have no difficulty in interpreting the code Listing 2(a).

In this example, observe that Chain is a templatized class whose parameters indicate the source and target types of the association. Appropriately enough, in this example the source has type Reservation and the target std::string. You can also see that the chain rc comprises three simpler associations: The first goes from the reservation to the user's ID, the second from the user's ID to the user (presumably by some database lookup), and the final one from the user to the user's name. Dereferencing is done by applying the traverse() method to the source object.

While the associative chains evidence the type for the source and target, they leave the type and number of intermediate classes unspecified. As such, the chain rc could resolve the user by a database access or class-member lookup; it could contain three links or 30. These details are occulted by the type.

Design and Implementation

To implement associative chains, I used the class hierarchy in Figure 1. The abstract class Link in Listing 3(a) represents an association from one class to another, with the exact types shrouded by void *. It enables the Chain to manipulate Typed_Link objects without regard to the latter's type parameters. As such, it's basically just another manifestation of the familiar pimpl idiom [3]. However, its typeless interface makes it most unsuitable for general use. Instead, you use the templatized Typed_Link, which clearly manifests the source and target classes.

The v_traverse() and traverse() methods of these classes are the respective untyped and typed dereferencing operations. The get_size() method indicates the number of indivisible links spanned by an association, which obviously can only be greater than one in a chain proper. The clone() method lets the chain copy links without knowing their type.

To create a new association (for example, one that takes an integer with a record from a table in a database), you simply subclass the Typed_Link class with the appropriate template parameters and implement the traverse() and clone() methods.

The task of catenating links is done by the appropriately named Chain class template; see Listing 3(b). It is a list of privately managed link objects—which themselves may be chains!—with some special constructors that simplify the initialization syntax for the programmer and inductively guarantee the type correctness of the chain. The implementation of the get_size() method is simple enough—it computes the total size recursively. The chain class also implements the traverse() method by passing the source object through each link in its list. Finally, I have implemented an addition operator for Typed_Link objects that collaborates with the chain constructors to allow the handy construction syntax previously presented.

Some Simple Links

To gainfully employ the chain, I must of course implement some typed links. For starters, here's the familiar pointer-to-member; see Listing 4(a). This is straightforward, and yet in conjunction with the chain, it already allows me the higher order associations sought at the onset.

Now what if the member is publicly accessible only through a get() method? Well, you could try rewriting Member_Link to accomodate member functions and then prepend it to some evaluator that returns the appropriate value. However, it's much cleaner if you simply condense the two operations into a single Get_Link template; see Listing 4(b). (Here, the template parameter Get lets me adjust the type of the get() method.)

More complicated than class members are aggregate members. For example, to resolve a C array, you could write something like Listing 4(c). Array_Link's traverse() method performs some rudimentary bounds checking and throws an exception in the case of failure. Unfortunately, many C-style programs use pointers and zero-length arrays with an accompanying size variable. To safely check the bounds for such arrays, I need to introduce an auxiliary class that contains the first element and size of the array. (I leave the details for the prodigious reader.)

Clanks and Tangential Reference

Looking back at my goals for the associative chains, you will find that I wanted a well-defined dereferencing operation for all objects of the source type. Unfortunately, many interesting associations simply aren't defined for every object of a given type. As seen in the previous section, accessing the sixth element of a variable-length array is only sensible for such arrays having six or more elements. On the other hand, it is not really convenient to specify array-of-size-six-or-more as the source type. As such, links throw a Clank (Listing 5) whenever dereferencing fails.

To identify the source of failure, the Clank contains a handle for the source object at the link that failed. Now, the handle is not especially interesting. It does what most any handle does: Present a void pointer that may only be compared with others. Nevertheless, this handle also uses RTTI, which might cause some puzzlement. A simple pointer, though, may cause serious misinterpretations without the accompanying type information. To see why, consider the first member of an object. It probably has the same address as its containing object, but what does that address refer to? Because I am interested in this type of containment association, I cannot allow this sort of confusion.

The Clank further contains a distance attribute, which indicates the number of objects successfully dereferenced prior to failure. This concept of distance turns out to be important in measuring an interesting property that I call "tangential reference." Basically, even though a given object might not be the target of a chain for a given source, I might still want to know if the object gets referenced during chain traversal. Moreover, I might want to know at what distance from the source this tangential reference occurs. This distance may be found using the chain's get_distance() method, which basically traverses the chain starting from a given source until it finds the specified handle. It then reports the number of simple links traversed, returning -1 if the handle was never dereferenced. Measuring the distance of tangential reference is a powerful way for determining which object in a given collection is closest to the target.

Practical Application

With the associative chains explained, I return to the resource-scheduling system and to see how they might be useful. In this application, the user requests a batch of reservations for a particular event (for example, a projector, a laptop, and a meeting room for a product demonstration). Programmatically, I might model this operation as in Listing 6(a), defining the function schedule() to transform a request into a list of reservations. In many cases, schedule() may fail in its attempt to make the reservations because:

A requisite specifies an invalid time range.
The user doesn't have permission to reserve.
The matching algorithm couldn't satisfy all of the requisites (though maybe it came close).
The user is simultaneously scheduled for another event.

Ideally, the user interface should do something useful or informative in the case of failure (for example, highlight the requisite that was not scheduled), but given the complexity of failure, the user interface can't really do much with the lone integer that schedule() returns. On the other hand, schedule() doesn't seem a likely location for UI code. Using associative chains, though, I can devise a general solution to this kind of problem that maximizes the effectiveness of the UI with minimal impact to the functional core.

For starters, I add an Error class in Listing 6(b) that contains a description of the problem and an associative chain that directs me to the object deemed the protagonist of the failure. In the functional core (that is, schedule() and its helpers), I replace the integral return values with Error exceptions, always using some parameter of the function as the source of the Error. In cases where a function has several parameters of the same type, the Error's source attribute indicates which one is the culprit. In this environment, error handling at the UI level looks like Listing 6(c). So with this error handler in place, I can easily do things such as highlight the specific UI artifact responsible error of type Error<Request, User>.

What happens, though, if the scheduler throws an exception for a target type the UI programmer did not anticipate? For example, the cause of the error could be a low-level object not recognized in the UI layers. Ideally, the schedule() function tries to present the errors at a level that the caller understands, but there is no way for schedule() to know who its caller might be. For such cases, I added Handle_Link for absconding a chain's target type behind a handle; see Listing 6(d). I also added the Error_Base class with a single template parameter for the chain's source type. Now, all Error objects are also Error_Base objects and have a method to obtain their respective chains with an appended Handle_Link. So it is now possible to write "catch-all" handlers for each parameter of the scheduler function.

Using tangential reference, I can greatly improve even this. In the exception handlers for errors of type Error_Base, I don't know much about the target. However, I can measure the distance of the tangential reference among the members of the request object to see which member is the closest to the point of error. What's more, I can compare objects associated with the request in hopes of an even closer match. Now, taking the object with the maximal distance—and hence, the minimal distance to the target—I can activate the UI artifact appropriate for that object, and thus guide the user in the resolution of the problem; see Listing 6(e). In fact, with some care, I can automate this sort of classification and response without really having to understand the underlying cause of the error.

Improving Performance

While the associative chain may be useful, the implementation I have presented also has some glaring inefficiencies:

~2n virtual function lookups to traverse a chain of n links.
O(n^2) heap allocations for piecewise construction of chain.

While the virtual function lookups are pretty much unavoidable, there are several ways to get heap allocations down to an acceptable O(n):

A special typelist-style class [4] that is used as a parameter to the chain constructor.
Incorporating chain subsumption, whereby catenation operands let the newly created object take their links.

Of course, if heap allocation bothers you, consider replacing the link with a locally allocated, discriminated union [5] of typed links. (In fact, the main reason I didn't do this in the first place was to avoid a digression about discriminated unions.)

Conclusion

Having presented the associative chain, perhaps I can best conclude by contrasting them with some other familiar programming paradigms, in particular decoupling patterns such as facades and mediators [6]. Unlike many of these patterns, which are statically modeled in classes, associative chains facilitate dynamically built relations not easily enumerated at a single point in the program. This explains why the association chain worked well in the example: The variety of errors was such that it was not really practical to explicitly enumerate them; moreover, the objects that caused the errors at low level might be temporary objects that could not be returned. Thus, I have a handy decoupling pattern for complex interactions between two classes. (And as for the pointer-to-member that initiated this inquiry, I recommend the Member_Link in its place.)

References

[1] Stroustroup, B. C++ Programming Language, Third Edition, Addison-Wesley, 1997.
[2] Object Management Group, "OMG Unified Modeling Language Specification," Version 1.5, March 2003.
[3] Sutter, H. Exceptional C++, Addison-Wesley, 2000.
[4] Alexandrescu, A. "Typelists and Applications." C/C++ Users Journal, February 2002.
[5] Alexandrescu, A. "Discriminated Unions." C/C++ Users Journal, April 2002.
[6] Gamma, E. et al. Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, 1995.

1 2 3 4 5 6 7 8 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

Associative Chains in C++