Dr. Dobb's | Reusable Associations

Our authors take the Model-Driven Development (MDD) concept of reusable associations and implement them for C++, C#, and Java.

October 10, 2007
URL:http://www.drdobbs.com/architecture-and-design/reusable-associations/202401093

Martin is a Design Manager at Nortel. Jiri is a cofounder of Cadence Design Systems and president of Code Farms.

The best way to explain this problem is with a real-world example. We chose the computer-aided design of silicon chips (VLSI circuits) because it involves complex data organization with only a few classes. A fully functional computer can be on a single chip, and the data to store and traverse are huge—millions of objects. Figure 1 explains the engineering concepts of building these chips.

If you use standard class libraries, the initial code may look like Listing One, where ChipLib is a library storing both complete chips and partially designed blocks. ChipLib also keeps frequently needed basic geometries such as contacts. Associations among classes in Listing One are represented by collections, pointers, or a combination of both, and a single relation may require members in several classes. For example, the relation between Block and Terminal requires the two members shown in red (Block::termsByBlock and Terminal::block).


Figure 1: A silicon chip is designed from blocks, which are hierarchically composed from smaller blocks and so on. This is a conceptual diagram, not a real circuit. Pins are points to which the outside signal nets may be attached. A set of pins connected together inside a block is a logical entity called "terminal" (shown as a dashed line). A terminal often has only one pin. The nets are formed by wires (rectangles assigned to one of the layers; here blue or red) and by contacts (geometries that connect several layers). The two parts of net1 are connected through one of the BLK blocks.

class ChipLib {
  Collection<Master> masters;      // complete chips and partially designed blocks
  Master *chip;
  Collection<Geometry> geometries; // frequently needed basic geometries such as contacts
};
class Master {
  String name;
  int xWidth,yWidth;
  Collection<Block> blocks;        // blocks used inside this master
  Collection<Net> nets;
};
class Block {
  String name;
  int x,y;
  int orientation;
  Master *master;                  // the master of this block, not the master in which it is used
  Collection<Terminal> termsByBlock;
};
class Net {
  String name;
  Collection<Connector> connectors;
  Collection<Terminal> termsByNet;
  Collection<Pin> pins;
  Master *master;
};
class Terminal {
  Block *block;
  Net *net;
  Net *masterNet;
};
 ... and so on

Listing One

The problem with this code is that the associations are buried inside the class definitions, so the purpose of these members is obscured when reading the code. Worse, although they look like the two halves of one association, the green members in classes Master and Block (Master::blocks and Block::master) do not belong to the same relation. Member Block::master points to the master of the block, not to the master in which the block is used. In some situations, this style of implementing associations prevents automatic derivation of the UML class diagram from the code; additional information such as variable names, written documentation, or comments would have to be included.

Today, you typically start with a UML class diagram describing classes and the associations among them (Figure 2). UML class diagrams are popular because they provide a network representation of a problem, which is difficult to understand without a picture, and because associations used in the UML diagram are more powerful than collections in existing class libraries. UML works with a number of different views, each using a different style of diagram. When we say "UML diagram" in this article, we refer to the most popular of these—the class diagram.


Figure 2: UML class diagram of the data supporting the hierarchical VLSI chip design. Compare with Listing One where each association is implemented in a different color with the exception of green (blocks and master here).

If we could expand existing libraries such as STL or Java Collections to include associations, it would improve the existing software design methodology in several ways:

It would force us to think in more general terms—in associations instead of collections and individual references/pointers.
MDD code generators would be simpler because there would be a one-to-one match between the associations in the diagram and those in the code.
For the same reason, UML diagram generators would be easy to code and always safe to use.

Collections deal with only two classes where one of them controls the other, while associations involve two or more cooperating classes that may know (and access) each other. Adding a collection requires an addition to the controlling class only, but adding an association may require additions to several participating classes (Listing Two). The Java implementation would be identical, except for references instead of pointers.

// LIBRARY OF ASSOCIATIONS:
template<class Parent,class Child> class ParentAggregate {
    Collection<Child> children;
};
template<class Parent,class Child> class ChildAggregate {
    Parent *parent;
};
template<class Source,class Link,class Target> class SourceXtoX {
    Collection<Link> links;
};
template<class Source,class Link,class Target> class LinkXtoX {
    Source *source;
    Target *target;
};
template<class Source,class Link,class Target> class TargetXtoX {
    Collection<Link> links;
};
// APPLICATION CODE
class Master {
    ParentAggregate<Master,Net> nets;
    ...
};
class Net {
    ChildAggregate<Master,Net> nets;
    TargetXtoX<Block,Terminal,Net> blockNets;
    ...
};
class Block {
    SourceXtoX<Block,Terminal,Net> blockNets;
    ...
};
class Terminal {
    LinkXtoX<Block,Terminal,Net> blockNets;
    ...
};

Listing Two

We find it logical and convenient to use the association names, such as nets or blockNets, for the inserted member. When an application class participates in several associations (such as Net in Listing Two), several members are inserted, one for each association.
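To make the member insertion concrete, here is a minimal compilable sketch of the one-to-many half of Listing Two. It assumes std::vector stands in for the article's Collection, and the helper functions addNet and removeNet are hypothetical names introduced only for this illustration; they are not part of any library discussed here.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Library side: the two halves of a one-to-many association.
template <class Parent, class Child>
struct ParentAggregate {
    std::vector<Child*> children;   // assumption: std::vector replaces Collection<Child>
};

template <class Parent, class Child>
struct ChildAggregate {
    Parent* parent = nullptr;
};

// Application side: one inserted member per association, named after it.
struct Net;
struct Master {
    std::string name;
    ParentAggregate<Master, Net> nets;
};
struct Net {
    std::string name;
    ChildAggregate<Master, Net> nets;
};

// Hypothetical helpers that keep both halves of the association consistent.
inline void addNet(Master& m, Net& n) {
    assert(n.nets.parent == nullptr);   // a net belongs to at most one master
    m.nets.children.push_back(&n);
    n.nets.parent = &m;
}

inline void removeNet(Net& n) {
    Master* m = n.nets.parent;
    if (m == nullptr) return;           // not attached, nothing to do
    auto& v = m->nets.children;
    v.erase(std::remove(v.begin(), v.end(), &n), v.end());
    n.nets.parent = nullptr;
}
```

Note that both classes carry a member called nets, yet each member holds only its own half of the association's data.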

Once popular, pointer-based data structures have been neglected and are completely missing from existing class libraries. Like associations, intrusive data structures (IDS) require coordinated insertion of members into participating classes. For example, collection masters in Listing One may be implemented as an intrusive linked list:


class ChipLib {
    Master *masters;
    ...
};
class Master {
    Master *next;
    ...
};

Besides being more efficient than array-based collections (faster to traverse, smaller footprint), such lists provide effective runtime protection of data integrity when they are implemented as rings instead of NULL-terminated lists. As demonstrated by the IN_CODE library (www.codefarms.com/products.htm), one-to-one, one-to-many, and many-to-many associations can be implemented in the intrusive style. Conversely, every IDS we know of implements one association or another. Perhaps there are some IDS that are not associations, but for a library of generic associations, we should use a mechanism that would support any IDS as well.
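The ring idea can be sketched as follows. This is one possible reading of the integrity claim, not the IN_CODE implementation: in a ring, every linked object has a non-null next pointer, so a null pointer reliably means "not in any list" and a second insertion can be refused at run time. The function names addMaster and countMasters, and the choice to let ChipLib::masters point at the tail, are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>

struct Master {
    const char* name;
    Master* next = nullptr;   // intrusive link; nullptr means "not in any ring"
    explicit Master(const char* n) : name(n) {}
};

struct ChipLib {
    Master* masters = nullptr;   // assumption: points to the tail of the ring
};

// Append m to the ring. Returns false if m is already linked somewhere,
// which is the runtime integrity check that a ring makes possible.
inline bool addMaster(ChipLib& lib, Master& m) {
    if (m.next != nullptr) return false;
    if (lib.masters == nullptr) {
        m.next = &m;                   // a single element points to itself
    } else {
        m.next = lib.masters->next;    // new tail points to the head
        lib.masters->next = &m;
    }
    lib.masters = &m;
    return true;
}

// Walk the ring once, starting at the head (tail->next).
inline std::size_t countMasters(const ChipLib& lib) {
    if (lib.masters == nullptr) return 0;
    std::size_t n = 0;
    const Master* p = lib.masters;
    do {
        assert(p->next != nullptr);    // a ring never contains a null link
        p = p->next;
        ++n;
    } while (p != lib.masters);
    return n;
}
```

With a NULL-terminated list, the last element would have a null next pointer and would be indistinguishable from an unlinked object.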

Structural design patterns are data structures that combine associations with inheritance. In Figure 2, classes Connector, Geometry, Wire, and Contact form a pattern Composite that allows complex hierarchical designs from several different types.

This pattern is different from the commonly used association called "composition aggregate," which is simply a specific implementation of OneToMany. An example of a composition aggregate is the relation between Master and Block. Each block belongs to only one master and ceases to exist if its master is destroyed.
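The lifetime rule of a composition aggregate can be illustrated with a small sketch. It assumes std::unique_ptr ownership as one possible way to express "ceases to exist if its master is destroyed"; the member name owner, the counter Block::live, and the function addBlock are hypothetical and exist only to make the behavior observable.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Master;

struct Block {
    std::string name;
    Master* owner;              // hypothetical name: the master this block is placed in
    static int live;            // live Block count, for demonstration only

    Block(std::string n, Master* o) : name(std::move(n)), owner(o) { ++live; }
    ~Block() { --live; }
    Block(const Block&) = delete;
    Block& operator=(const Block&) = delete;
};
int Block::live = 0;

struct Master {
    std::string name;
    std::vector<std::unique_ptr<Block>> blocks;   // owning side of the composition

    Block& addBlock(const std::string& blockName) {
        blocks.push_back(std::make_unique<Block>(blockName, this));
        return *blocks.back();
    }
};

// Usage: destroying the master destroys the blocks it owns.
inline void compositionDemo() {
    {
        Master chip;
        chip.addBlock("BLK1");
        chip.addBlock("BLK2");
        assert(Block::live == 2);
    }
    assert(Block::live == 0);
}
```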

Popularity Cycle of Graphical Tools

We have observed an interesting cycle. Programmers write programs that grow bigger and more complex until their own authors cannot debug or modify them safely. Graphical tools become popular to manage the complexity, but then a new programming language or new programming paradigm is invented, programmers return to the more compact textual programming, and a new cycle begins.

In "The Inevitable Cycle: Graphical Tools and Programming Paradigms" (IEEE Computer, August 2007; www.codefarms.com/OOPSLA07/workshop/cycles.doc), we describe three historical cycles:

Flow charts were used to manage spaghetti logic in the code but were eliminated by structured programming.
Diagrams of table indices helped to manage Fortran programs but were eliminated by the introduction of C structures and pointers.
Pointer diagrams representing C data structures were eliminated by the introduction of class libraries.

Software complexity is a serious problem and the prevalent use of graphical tools including UML fits this observation. If this popularity cycle really exists, it implies that the arrival of a new programming technique is imminent and a new paradigm will eliminate or reduce the use of UML class diagrams. The new paradigm would have to include the major improvements that UML gave us:

Instead of programming with collections, UML uses more general associations.
Associations and classes are both treated as first-class entities.

There is more to UML than just the UML class diagrams though, and the new paradigm would also have to cover those other areas to provide the same utility as all of UML.

A Matter of Control

Most associations need an iterator class, and these iterators can be implemented in the same style as collection iterators. However, when implementing associations, reusable or not, the question is where to keep the methods that operate on the association.

The commonly used practice today is to add individual methods to classes where the implementation of the method is easiest or seems natural:

void SourceXtoX::add(Terminal *t,Net *n);
void LinkXtoX::remove();

// USING THE METHODS
Block *b; Terminal *t; Net *n;
b->blockNets.add(t,n);
t->blockNets.remove();

This may look elegant, but it adds to the problem of having parts of the associations scattered throughout the classes and no guidance on where to find them.

The second possibility is to select one class, preferably the "main" or "most important" class of the association, and keep all the methods in it. In the case of ManyToMany, there isn't any clear difference between the Source and Target, which mirror each other. Let's choose the Source:


void SourceXtoX::add(Terminal *t,Net *n);
void SourceXtoX::remove(Terminal *t);
// USING THE METHODS
Block *b; Terminal *t; Net *n; 
b->blockNets.add(t,n);
b->blockNets.remove(t);

This interface is similar to the interface we use for collections today.

Alternatively, for each association, we introduce a new class just to keep the association methods (Listing Three, available online; see "Resource Center," page 5). This association control class (ACC) is a completely different concept from the UML association class (UAC) commonly used today. ACC provides the interface for the association, while UAC represents the data or state associated with the association.

Note the paradigm shift. The user interface commonly used today:

    
     b->termsByBlock.add(t,n);

reads "for Block b and the collection termsByBlock on it, add a link that connects it (meaning b) through t to n," with emphasis on b.

The new centralized control has a different order of parameters:

          
     blockNets::add(b,t,n);

reads "for association blockNets, add a link from b through t to n," with emphasis on the association.
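A centralized control class for blockNets might look like the sketch below. Listing Three is not reproduced here, so this is only an illustration under assumptions: std::vector replaces Collection, and the small structs SourceLinks, TargetLinks, and LinkEnds are simplified, non-template stand-ins for SourceXtoX, TargetXtoX, and LinkXtoX.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Block;
struct Terminal;
struct Net;

// Simplified stand-ins for the inserted members of Listing Two.
struct SourceLinks { std::vector<Terminal*> links; };
struct TargetLinks { std::vector<Terminal*> links; };
struct LinkEnds    { Block* source = nullptr; Net* target = nullptr; };

struct Block    { SourceLinks blockNets; };
struct Net      { TargetLinks blockNets; };
struct Terminal { LinkEnds    blockNets; };

// Association control class: every operation on blockNets lives here.
struct blockNets {
    static void add(Block* b, Terminal* t, Net* n) {
        // A terminal can serve as the link of only one (block, net) pair.
        assert(t->blockNets.source == nullptr && t->blockNets.target == nullptr);
        t->blockNets.source = b;
        t->blockNets.target = n;
        b->blockNets.links.push_back(t);
        n->blockNets.links.push_back(t);
    }

    static void remove(Terminal* t) {
        Block* b = t->blockNets.source;
        Net* n = t->blockNets.target;
        if (b == nullptr || n == nullptr) return;   // not linked
        erase(b->blockNets.links, t);
        erase(n->blockNets.links, t);
        t->blockNets = LinkEnds{};
    }

private:
    static void erase(std::vector<Terminal*>& v, Terminal* t) {
        v.erase(std::remove(v.begin(), v.end(), t), v.end());
    }
};
```

A call such as blockNets::add(&b, &t, &n) then names the association first and treats all three participants symmetrically.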

Data Hiding

If we use centralized control, only the control class needs access to the inserted data. In C++, this can be easily arranged by adding friend statements:


class Net {
  friend class nets;
  friend class blockNets;
  ChildAggregate<Master,Net> nets;
  TargetXtoX<Block,Terminal,Net> blockNets;
  ...
};

In Java, the inserted data must be public. If support for associations becomes a part of these languages, the compilers should take care of this problem.
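The friend arrangement can be tried out in a compilable form. The following is a minimal sketch, assuming std::vector in place of Collection and a control class nets with hypothetical methods add, parent, and count; the inserted members stay private, and only the control class can reach them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <class Parent, class Child>
struct ParentAggregate { std::vector<Child*> children; };

template <class Parent, class Child>
struct ChildAggregate { Parent* parent = nullptr; };

class Master;
class Net;
class nets;   // association control class, defined below

class Master {
    friend class nets;
    ParentAggregate<Master, Net> nets;   // private: only class nets may touch it
};

class Net {
    friend class nets;
    ChildAggregate<Master, Net> nets;    // private
};

class nets {
public:
    static void add(Master* m, Net* n) {
        assert(n->nets.parent == nullptr);   // a net has at most one master
        m->nets.children.push_back(n);
        n->nets.parent = m;
    }
    static Master* parent(const Net* n) { return n->nets.parent; }
    static std::size_t count(const Master* m) { return m->nets.children.size(); }
};
```

Application code that tries to write to the inserted member directly, for example through a Net pointer, no longer compiles, so the association can only be changed through its control class.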

Other Ways of Inserting Data

Until now, we have assumed that the additional data needed to implement the association is inserted as members. We mention only briefly other approaches that we explored and that proved less practical for inserting this data.

Required data can be inserted through inheritance, but when a class participates in several associations we get multiple inheritance, and this approach cannot be used in Java or C#. For example:


class Net : ChildAggregate<Master,Net>,
            TargetXtoX<Block,Terminal,Net>,
            Parent1toX<Net,Pin>,
            Parent1toX<Net,Connector> {
    ...
};

The Pattern Template Library (PTL; www.codefarms.com/products.htm) was built on this principle. The Data Object Library (DOL; www.codefarms.com/products.htm) provided automatically persistent associations based on ACC and inserted members. It was implemented with a code generator and C macros. Over 18 years, this library was successfully used on many large and complex projects.

Aspect-oriented programming (AOP) transparently inserts sections of code using a precompiler. However, AOP needs instructions about where and what should be inserted, and generating these instructions is more complicated than directly generating the members. We reached this conclusion after a discussion with Olaf Spinczyk, the author of AspectC++ (see www.codefarms.com/aspects.doc).

Expanding Existing Languages

Something somewhere must tell the compiler that, for example, class Aggregate represents the association while classes ParentAggregate and ChildAggregate are a part of its implementation. Also, the compiler must know that when invoking Aggregate<Net,Pin>, an instance of ParentAggregate must be inserted into Net and an instance of ChildAggregate into Pin. The best way to provide this information would be to add two new keywords; in Listing Four (available online), we used keywords Association and Participants.

When implementing the association, we also need to parameterize member names, which is something that existing templates/generics do not support. There are three ways to get around this last hurdle:

Style 1: Templates use a special parameter Name to parameterize member names, as in Listing Four.

Style 2: Implementation uses a new keyword, id, and the compiler replaces it by the association ID, for example:

template<class Parent,class Child>
class Aggregate {
  Participants(ParentAggregate,ChildAggregate);
  void add(Parent *p, Child *c){
    p->id.children.add(c);
    c->id.parent=p;
  }
};

Style 3: No additional keywords. The compiler is just smarter. In templates that implement the associations, it detects references to participating classes and transparently inserts ID:


template<class Parent,class Child>
class Aggregate {
  Participants(ParentAggregate,ChildAggregate);
  void add(Parent *p, Child *c){
    p->children.add(c);
    c->parent=p;
    // which is interpreted as
    // p->ID.children.add(c);
    // c->ID.parent=p;
  }
};

The implementation should let the debugger step through all the code when debugging association-based applications as well as when debugging associations themselves.
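For comparison, today's C++ can approximate parameterized member names with pointer-to-member template parameters. The sketch below is a workaround under stated assumptions, not one of the three styles proposed above: the members still have to be inserted by hand, and the names ParentPart, ChildPart, and parentOf are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Data inserted into the participating classes.
template <class Child>  struct ParentPart { std::vector<Child*> children; };
template <class Parent> struct ChildPart  { Parent* parent = nullptr; };

// The association is told which member of each class holds its data,
// so one template can serve several associations between the same classes.
template <class Parent, class Child,
          ParentPart<Child> Parent::* PM,
          ChildPart<Parent> Child::* CM>
struct Aggregate {
    static void add(Parent* p, Child* c) {
        assert((c->*CM).parent == nullptr);   // a child has at most one parent
        (p->*PM).children.push_back(c);
        (c->*CM).parent = p;
    }
    static Parent* parentOf(const Child* c) { return (c->*CM).parent; }
};

struct Pin;
struct Net {
    ParentPart<Pin> pins;   // inserted by hand; a compiler or generator would do this
};
struct Pin {
    ChildPart<Net> pins;
};

// The association itself, named like its inserted members.
using pins = Aggregate<Net, Pin, &Net::pins, &Pin::pins>;
```

The call pins::add(&net, &pin) then reads like the centralized interface described earlier, and a debugger can step through it like any other template code.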

Interim Implementation

Until languages and compilers support reusable associations, we have implemented a library of associations that uses a code generator (precompiler). The implementation and the application interface are the same in both C++ and Java. (For free source, including documentation, see www.codefarms.com/incode.htm.) The precompiler is coded with the library and uses itself to recompile.

Because the code generator must substitute member names in the association files, we decided not to use templates and do all the substitutions with the precompiler. The result is a simple scheme that makes coding of new associations easy and readable (Listing Five; available online). This approach is a cross between style 1 and style 2. The library inserts each group of members as one instance of an automatically generated class (ZZ_Net, ZZ_Pin,...).

When interpreting the implementation of:


Association Aggregate<Net,Pin> pins;

the code generator performs the following substitutions:



$$  is replaced by  pins
$0  is replaced by  ZZds._pins
$1  is replaced by  Net
$2  is replaced by  Pin

Instead of using keyword Participants, the library registry file (Listing Five) describes the roles of participating classes and the UML representation of individual associations. For example, B1-* means "bidirectional one-to-many."
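The substitution step can be sketched in a few lines. This is not the actual precompiler: the struct Substitution, the function expand, and the sample input strings are hypothetical, and only the four markers listed above are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replacement text for one association declaration.
struct Substitution {
    std::string name;    // replaces $$ (association name)
    std::string member;  // replaces $0 (inserted member)
    std::string first;   // replaces $1 (first participating class)
    std::string second;  // replaces $2 (second participating class)
};

// Copy text to the output, expanding the two-character markers on the way.
inline std::string expand(const std::string& text, const Substitution& s) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const std::string* rep = nullptr;
        if (text[i] == '$' && i + 1 < text.size()) {
            switch (text[i + 1]) {
                case '$': rep = &s.name;   break;
                case '0': rep = &s.member; break;
                case '1': rep = &s.first;  break;
                case '2': rep = &s.second; break;
                default: break;
            }
        }
        if (rep != nullptr) {
            out += *rep;
            ++i;               // skip the second character of the marker
        } else {
            out += text[i];
        }
    }
    return out;
}

// Usage with the values from the table above.
inline void expandDemo() {
    const Substitution pins{"pins", "ZZds._pins", "Net", "Pin"};
    assert(expand("$1 owns $2 through $$", pins) == "Net owns Pin through pins");
}
```

Scanning left to right and consuming both characters of a marker ensures that the pair $$ is always recognized as the association name.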

Conclusion

We have no doubt that libraries of associations will eventually supersede existing class libraries. The new libraries will provide a uniform treatment for existing containers, intrusive data structures, associations, structural design patterns, and other data organizations missing in the standard libraries today. Associations will become first-class entities, as visible and important as classes. Experience with several libraries based on this idea, one of them used commercially for over 18 years, shows that this approach is viable and significantly increases readability, reusability, and supportability of the resulting software.