Keith is a software engineering consultant, specializing in software auditing and automatic software inspection systems. He can be reached at firstname.lastname@example.org.
A software system has both a logical and a physical design. The logical design is expressed in terms of logical entities (such as class, data member, and member function) and logical relations (such as inherits, uses, and contains). The physical design shows how those logical entities are assigned to files.
The speed of compilation, modifiability, testability, and maintainability of a software system all depend crucially on its physical design. For example, cyclic dependencies among components make a system harder to test, and unnecessary include directives make it slower to compile. Both Large Scale C++ Software Design, by J. Lakos (Addison-Wesley, 1996), and Designing Object-Oriented Applications, by R.C. Martin (Prentice-Hall, 1996), define several principles of good physical design.
In Large Scale C++ Software Design, for instance, Lakos lays down several principles of good physical design intended to ensure that compilation and linking are not too slow, and that the software is easily modified, tested, and maintained.
Likewise, in Designing Object-Oriented Applications, Martin shows how the unnecessary use of the include directive can make a system unduly slow to compile and unduly hard to modify. He lays down several rules that define when you must use the include directive and when you can get away with just the forward declaration.
In this article, I'll describe how breaches of these principles can be detected automatically, provided that you can extract a set of key relationships among the entities of the system (such as classes and member functions). Examples of such relations include:
- A inherits from B.
- A::f uses a variable in class B.
- A::f invokes a member function in class B.
To extract and manipulate these relationships, I built PDCHECK, an automatic code inspection tool that flags the places where the design contravenes the rules. PDCHECK extracts and examines relationships from C++ systems, and reports breaches of the Lakos and Martin guidelines. (For more information about PDCHECK availability, contact me at email@example.com.)
PDCHECK starts by extracting key relations between the parts of a C++ system using Code Check, a rule-based expert system from Abraxas Software (http://www.abxsoft.com/). PDCHECK displays the relations as matrices or as directed graphs, as appropriate. PDCHECK then issues three types of reports:
- The Physical Organization Report assesses the assignment of the logical entities (classes and member functions) to the physical entities (.cxx files and .h files), and points out where this assignment fails to follow the design rules laid down by Lakos.
- The Include Directives Report assesses each include directive according to the criteria established by Martin. If the include is necessary, PDCHECK explains why; if it is not necessary, PDCHECK explains whether to replace it with a forward declaration or to omit it entirely.
- The Metrics Report measures the six numerical properties defined by S.F. Chidamber and C.F. Kemerer in "A Metrics Suite for Object Oriented Design" (IEEE Transactions on Software Engineering, June 1994).
PDCHECK is currently being extended to measure a wider range of metrics, including those defined by J. Bansiya and C. Davis in "Automated Metrics and Object-Oriented Development" (DDJ, December 1997).
In this discussion, I'll use capital letters for classes and small letters for members. Table 1 lists the relations you need in discussing the Lakos and Martin principles, whereas Table 2 defines the relation A depends on B. All these relations are logical in that they make no mention of the files in which the entities are stored. You also need relations involving physical entities, such as:
- File a.cxx implements A::f.
- File a.h defines class A.
- File a.cxx includes file b.h.
Lakos points out that the appropriate unit of physical organization of object-oriented software is not the class but the component, where the component is defined by amalgamating classes that lie on cycles. More formally, Lakos lays down that class A and class B must reside in the same file if and only if there is a cycle in the class-dependency graph that contains A and B.
For example, consider the six-class system in Figure 1. Now suppose that the designer has assigned these classes to files, as in Figure 2. On the left side of the diagram, the cycle (A,B,C) implies that these three classes should be in the same file, yet they are in separate files. On the right side of the diagram, the three classes D, E, and F lie on no cycle and should therefore be in separate files, yet they are in the same file. The designer has violated the Lakos guidelines twice, in opposite senses. The three files on the left are too small and should be merged into one; the one file on the right is too big and should be split into three separate files.
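The rule can be checked mechanically once the class-dependency graph and the class-to-file assignment are known. The following C++ sketch shows one way to do it; it is not PDCHECK's actual implementation. It assumes that two classes "lie on a common cycle" when each is reachable from the other, and the names Graph, Finding, onCommonCycle, and checkAssignment are hypothetical.

```cpp
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <vector>

// Class-dependency graph: each class maps to the classes it depends on.
using Graph = std::map<std::string, std::set<std::string>>;

// Collect every class reachable from `start` by following dependency edges.
// `start` itself appears in the result only if it lies on a cycle.
std::set<std::string> reachableFrom(const Graph& g, const std::string& start) {
    std::set<std::string> seen;
    std::vector<std::string> stack{start};
    while (!stack.empty()) {
        std::string cur = stack.back();
        stack.pop_back();
        auto it = g.find(cur);
        if (it == g.end()) continue;
        for (const auto& next : it->second)
            if (seen.insert(next).second) stack.push_back(next);
    }
    return seen;
}

// Assumption: "a cycle contains a and b" is read as mutual reachability.
bool onCommonCycle(const Graph& g, const std::string& a, const std::string& b) {
    return reachableFrom(g, a).count(b) > 0 && reachableFrom(g, b).count(a) > 0;
}

// One breach of the rule: the pair is either wrongly split or wrongly merged.
struct Finding {
    std::string a;
    std::string b;
    bool shouldShareFile;  // true: merge the files; false: split the file
};

// Compare every pair of classes against the "same file iff common cycle" rule.
std::vector<Finding> checkAssignment(
        const Graph& g, const std::map<std::string, std::string>& fileOf) {
    std::vector<Finding> out;
    for (auto i = fileOf.begin(); i != fileOf.end(); ++i) {
        for (auto j = std::next(i); j != fileOf.end(); ++j) {
            bool cyclic = onCommonCycle(g, i->first, j->first);
            bool sameFile = (i->second == j->second);
            if (cyclic != sameFile) out.push_back({i->first, j->first, cyclic});
        }
    }
    return out;
}
```

Because Figure 1 is not reproduced here, any edges fed to this sketch are assumptions. With the assumed edges A to B, B to C, C to A, D to E, D to F, and E to F, and with A, B, and C in separate files while D, E, and F share one, checkAssignment reports three pairs to merge and three pairs to split, which matches the two breaches described above. The pairwise reachability test is quadratic; a strongly-connected-components algorithm would be the usual choice for large systems.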
You need to know where the design breaks the Lakos principles. PDCHECK tells you this in the Physical Organization Report; see Table 3.
You can see that this report tells you not only what is wrong (left column) but what to do to correct it (right column). Even better, the corrections to the physical design can be made automatically. If you submit the Physical Organization Report to PDCHECK's companion system, PDMODIFY, it will create a new version of your system with these corrections made. The new system will therefore obey the Lakos principles.
Include or Forward Declare
You use the include directive to make the current file aware of the interface of a class; sometimes all the current file needs to know is that a class exists. Table 4 shows this distinction. The first block of Table 4 invokes a member function in class B, so the include is required. The second block of Table 4, on the other hand, merely passes onwards a pointer to an object in class B, so a forward declaration is sufficient.
Martin sums this up as:
Following the multiple-inclusion protection should be the #include statements that describe the interface dependencies of this module. An interface dependency is a dependency upon the interface of the included class. Typically such a dependency only occurs for base classes, classes that are contained by value, or classes whose member functions are called within inline functions.
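The cases Martin names can be seen side by side in a small sketch. It is written as a single translation unit so that it compiles on its own; the comments mark where the hypothetical files b.h and a.h would begin. Classes A, B, and C and their members are invented for illustration and are not the contents of Table 4 or Example 1.

```cpp
// File b.h (hypothetical): the full definition of B.
class B {
public:
    int size() const { return 42; }
};

// No c.h is needed: a.h never looks inside C, so a forward declaration
// is enough and C can stay an incomplete type.
class C;

// File a.h (hypothetical).
class A {
public:
    // Calls a member function of B inside an inline function:
    // an interface dependency, so a.h must include b.h.
    int sizeOf(const B& b) const { return b.size(); }

    // Uses the B object contained by value (see below).
    int ownSize() const { return byValue_.size(); }

    // Only stores the pointer and passes it onwards: "class C;" suffices.
    void attach(C* c) { c_ = c; }
    C* attached() const { return c_; }

private:
    B byValue_;       // contained by value: needs B's full definition
    C* c_ = nullptr;  // pointer member: the compiler only needs C's name
};
```

If C were later given a definition in c.h, only the files that call C's member functions would need to include it; a.h could stay unchanged.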
For instance, consider the file a.h in Example 1. I have deliberately used an include directive for every class mentioned, although according to Martin's criteria, some are unnecessary.
You need to know which of your includes are really necessary and which can be replaced by a forward declaration. PDCHECK tells you this in the Include Directives Report; see Table 5.
You can see that this report assesses each include directive. If the include is required, the comment in the middle column tells you why; if the include is not required, the action in the right-hand column shows you what forward declaration to use. Even better, each redundant include can be automatically downgraded to the appropriate forward declaration. If you submit the Include Directives Report to PDCHECK's companion system, PDMODIFY, it will create a new version of each offending file with these corrections made. The new version of the system will therefore obey the so-called "Martin parsimonious inclusion principle."
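The decision behind each row of such a report can be approximated by a small function. The sketch below is an assumption about how the criteria quoted from Martin might be encoded, not a description of PDCHECK's rules; the names Use, Verdict, and assessInclude are hypothetical. It takes the set of ways in which a header uses an included class and returns one of the three outcomes the report distinguishes.

```cpp
#include <set>

// How a header uses a class whose header it includes (simplified).
enum class Use {
    Inherits,            // the class is a base class
    ContainsByValue,     // the class is a data member held by value
    CallsInInline,       // a member function is called in an inline function
    PointerOrReference   // only pointers or references are mentioned
};

// The three outcomes an include directive can receive.
enum class Verdict { KeepInclude, ForwardDeclare, Omit };

// Assess one include directive from the uses found in the including header.
Verdict assessInclude(const std::set<Use>& usesInHeader) {
    // The included class is never mentioned: the include can go entirely.
    if (usesInHeader.empty()) return Verdict::Omit;
    // Any interface dependency forces the full definition to be visible.
    for (Use u : usesInHeader) {
        if (u != Use::PointerOrReference) return Verdict::KeepInclude;
    }
    // Only the name is needed: downgrade to a forward declaration.
    return Verdict::ForwardDeclare;
}
```

Martin's wording says such dependencies "typically" arise in these three cases, so a real checker would need further cases (for example, taking sizeof of the class) that this sketch leaves out.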
Chidamber and Kemerer proposed the suite of metrics in Table 6 for object-oriented systems. The values of these measurements (see Figure 3) depend on relations among the logical entities of the system. Figure 3 also identifies which measurements require which relations. Thus, the measurement WMC requires the relation "class A defines function f" and the relation "A::f has cyclomatic complexity c."
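As an illustration of how a metric follows from extracted relations, WMC (weighted methods per class) is the sum of the complexities of the member functions a class defines. The sketch below assumes the two relations have already been extracted and joined into one record per member function; the names MethodComplexity and weightedMethodsPerClass are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// One extracted fact: "cls defines method" together with
// "cls::method has cyclomatic complexity `complexity`".
struct MethodComplexity {
    std::string cls;
    std::string method;
    int complexity;
};

// WMC per class: add up the complexity of every member function it defines.
std::map<std::string, int> weightedMethodsPerClass(
        const std::vector<MethodComplexity>& facts) {
    std::map<std::string, int> wmc;
    for (const auto& f : facts) wmc[f.cls] += f.complexity;
    return wmc;
}
```

If every member function is given complexity 1, WMC reduces to a simple count of the member functions in each class.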
Chidamber and Kemerer's article deals only briefly with the relation A depends on B. A fuller treatment is provided in several books on object-oriented design, notably those by Lakos and Martin, which show that to compute the relation A depends on B, you need the following 13 relations:
- A inherits from B.
- A defines an object in class B.
- A defines a pointer to an object in class B.
- A::f returns an object in class B.
- A::f returns a pointer to an object in class B.
- A::f receives an object in class B.
- A::f receives a pointer to an object in class B.
- A::f defines an object in class B.
- A::f defines a pointer to an object in class B.
- A::f invokes a member function on an object in class B.
- A::f invokes a member function on a pointer to an object in class B.
- A::f uses an object in class B.
- A::f uses a pointer to an object in class B.
Of these 13 relations, only one (inherits) is required to measure the other five metrics. CBO, which rests on the full relation A depends on B, is therefore by far the most demanding metric to compute (see Table 6 and Figure 3). Nonetheless, the coupling between classes is an important property and worth measuring accurately.
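A minimal sketch of the computation follows. It makes two assumptions that the text does not settle: that A depends on B whenever any of the 13 relations links A (or a member function of A) to B, and that CBO for a class is the number of distinct other classes it depends on, counting outgoing dependencies only. The names Kind, Relation, dependsOn, and couplingBetweenObjects are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// The 13 relations listed above, in the same order.
enum class Kind {
    Inherits, DefinesObject, DefinesPointer,
    ReturnsObject, ReturnsPointer,
    ReceivesObject, ReceivesPointer,
    LocalObject, LocalPointer,
    InvokesOnObject, InvokesOnPointer,
    UsesObject, UsesPointer
};

// One extracted relation, lifted to class level: `from` is class A
// (or the class that owns A::f) and `to` is class B.
struct Relation {
    std::string from;
    Kind kind;
    std::string to;
};

// Assumption: any of the 13 relations makes `from` depend on `to`.
std::map<std::string, std::set<std::string>> dependsOn(
        const std::vector<Relation>& rels) {
    std::map<std::string, std::set<std::string>> deps;
    for (const auto& r : rels) {
        if (r.from != r.to) deps[r.from].insert(r.to);  // ignore self-use
    }
    return deps;
}

// CBO per class: how many distinct other classes it depends on.
std::map<std::string, std::size_t> couplingBetweenObjects(
        const std::vector<Relation>& rels) {
    std::map<std::string, std::size_t> cbo;
    for (const auto& entry : dependsOn(rels)) {
        cbo[entry.first] = entry.second.size();
    }
    return cbo;
}
```

Several relations between the same pair of classes count once, because the set keeps only distinct targets. Classes with no outgoing dependencies do not appear in the result and can be read as having a CBO of zero under these assumptions.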
If, as programmers, we agree to adopt the principles laid down by Lakos and Martin, the next step is to ask if software can be automatically inspected to detect breaches of these principles. The answer is yes. That's what PDCHECK does. The tool carries out these steps:
1. Extracts relationships.
2. Reports breaches of the Lakos principles.
3. Reports breaches of the Martin parsimonious inclusion principle.
4. Shows how to correct breaches of the Martin parsimonious inclusion principle.
5. Reports values of Chidamber and Kemerer metric suite.
Steps 2-5 are reasonably straightforward programming tasks and caused me no difficulty. Step 1, however, requires the tool to parse the code in order to extract the relationships, which is a more specialized task. That's why I used Abraxas Software's Code Check, which parses the code and stops at every token to execute a set of rules supplied by the user. I created a special set of rules to extract exactly the relationships I needed. I have been writing rules for Code Check for many years in my consulting business and have found that they save both me and my clients time and money.
If you want to write this sort of automatic code inspection tool for yourself, it may be possible to obtain the same effect with such tools as YACC or Bison.
If you merely want to use such a tool without the bother of writing it, it may be possible to find other suitable tools by searching the Web.
Lakos and Martin define several principles of good physical design. Breaches of these principles can be detected automatically, provided that you can extract and manipulate a set of key relationships among the entities (such as classes and member functions) of the system.