Two C++ Gotchas

Get these before they get you.


February 01, 2003
URL:http://www.drdobbs.com/two-c-gotchas/184401613


Adapted from S. Dewhurst, C++ Gotchas: Avoiding Common Problems in Coding and Design (#29 & #97). © 2003 Pearson Education, Inc. Reproduced by permission of Pearson Education, Inc. All rights reserved.

Steve Dewhurst's book C++ Gotchas: Avoiding Common Problems in Coding and Design [1] begins thus:

Following are two prize Gotchas...

Gotcha #29: Converting through void *

Even C programmers know that a void * is second cousin to a cast and should be avoided to the extent possible. As with a cast, converting a typed pointer to void * removes all useful type information. Typically, the original type of the pointer must be "remembered" and restored when the void * is used. If the type is resupplied correctly, everything will work fine (except, of course, that having to remember types for later casting implies that the design needs work).

void *vp = new int(12);
// . . .
// will work
int *ip = static_cast<int *>(vp); 
Unfortunately, even this simple use of void * can open the door to portability problems. Remember that static_cast is the cast operator we use (when we must cast) for relatively safe and portable conversions. For example, one might use a static_cast to cast from a base class pointer to a publicly derived class pointer. For unsafe, platform-dependent conversions, we're forced to use reinterpret_cast. For example, one might use a reinterpret_cast to cast from an integer to a pointer or between pointers to unrelated types:


// error!
char *cp = static_cast<char *>(ip); 
// works.
char *cp = reinterpret_cast<char *>(ip);
The use of reinterpret_cast is a clear indication to you and to the readers and maintainers of your code that you're not only casting, but that you're casting in a potentially nonportable way. Use of a void * intermediary allows that important warning to be circumvented:


// put int addr into a char *!
char *cp = static_cast<char *>(vp);
It gets worse. Consider a user interface that allows the address of a "Widget" to be stored and later retrieved:


typedef void *Widget;
void setWidget( Widget );
Widget getWidget();
Users of this interface recognize that they have to remember the type of Widget they set, so they can restore its type information when it's retrieved:

// In some header file . . .
class Button {
   // . . .
};
class MyButton : public Button {
   // . . .
};
// elsewhere . . .
MyButton *mb = new MyButton;
setWidget( mb );

// somewhere else entirely . . .
// might work!
Button *b = static_cast <Button *>
   (getWidget());
This code will usually work, even though we lose some type information when we extract the Widget. The stored Widget refers to a MyButton but is extracted as a Button. The reason this code will often work has to do with the likely way that the storage for a class object is laid out in memory.

Typically, a derived class object contains the storage for its base class sub-object starting at offset 0, as if its base class part were the first data member of the derived class, and simply appends any additional derived class data below that, as in Figure 1. Therefore, the address of a derived class object is generally the same as that of its base class. (Note, however, that the Standard guarantees correct results only if the address in the void * is converted to exactly the same type used to set the void *. See Gotcha #70 for one way this code could fail even under single inheritance.)

However, this code is fragile, in that a remote change during maintenance may introduce a bug. In particular, a straightforward and proper use of multiple inheritance may break the code:

// in some header file . . .
class Subject {
   // . . .
};
class ObservedButton :
   public Subject, public Button {
   // . . .
};
// elsewhere . . .
ObservedButton *ob = new ObservedButton;
setWidget( ob );
// . . .
Button *badButton =
   static_cast<Button *>(getWidget()); // disaster!
The problem is with the layout of the derived class object under multiple inheritance. An ObservedButton has two base class parts, and only one of them can have the same address as the complete object. Typically, storage for the first base class (in this case, Subject) is placed at offset 0 in the derived class, followed by the storage for subsequent base classes (in this case, Button), followed by any additional derived class data members, as in Figure 2. Under multiple inheritance, a single object commonly has multiple valid addresses.

Ordinarily this is not a problem, since the compiler is aware of the various offsets and can perform the correct adjustments at compile time:

Button *bp = new ObservedButton;
ObservedButton *obp =
   static_cast<ObservedButton *>(bp);
In the code above, bp correctly points to the Button part of the ObservedButton object, not to the start of the object. When we cast from a Button pointer to an ObservedButton pointer, the compiler is able to adjust the address so that it points to the start of the ObservedButton object. It's not hard, since the compiler knows the offset of each base class part within a derived class, as long as it knows the type of the base and derived classes.

And that's our problem. When we use setWidget, we throw away all useful type information. When we cast the result of getWidget to Button, the compiler can't perform the adjustment to the address. As a result, the Button pointer is actually referring to a Subject!

Void pointers do have their uses, as do casts, but they should be used sparingly. It's never a good idea to use a void * as part of an interface that requires one use of the interface to resupply type information lost through another use.

Gotcha #97: Cosmic Hierarchies

More than a decade ago, the C++ community decided that the use of "cosmic" hierarchies (architectures in which every object type is derived from a root class, usually called Object) was not an effective design approach in C++. There were a number of reasons for rejecting this approach, both on the design level and on the implementation level.

From a design standpoint, cosmic hierarchies often give rise to generic containers of "objects." The contents of these containers are often unpredictable and lead to unexpected run-time behavior. Bjarne Stroustrup's classic counterexample considered the possibility of putting a battleship in a pencil cup -- something a cosmic hierarchy would allow but that would probably surprise a user of the pencil cup.

A pervasive and dangerous assumption among inexperienced designers is that an architecture should be as flexible as possible. Error. Rather, an architecture should be as close to the problem domain as possible while retaining sufficient flexibility to permit reasonable future extension. When "software entropy" sets in and new requirements are difficult to add within the existing structure, the code should be refactored into a new design. Attempts to create maximally flexible architectures a priori are similar to attempts to create maximally efficient code without profiling; there will be no useful architecture, and there will be a loss of efficiency. (See also Gotcha #72.)

This misapprehension of the goal of an architecture, coupled with an unwillingness to do the hard work of abstracting a complex problem domain, often results in the reintroduction of a particularly noxious form of cosmic hierarchy:


class Object {
 public:
   Object( void *, const type_info & );
   virtual ~Object();
   const type_info &type();
   void *object();
   // . . .
};
Here, the designer has abdicated all responsibility for understanding and properly abstracting the problem domain and has instead created a wrapper that can be used to effectively "cosmicize" otherwise unrelated types. An object of any type can be wrapped in an Object, and we can create containers of Objects into which we can put anything at all (and frequently do).

The designer may also provide the means to perform a type-safe conversion of an Object wrapper to the object it wraps:


template <class T>
T *dynamicCast( Object *o ) {
   if( o && o->type() == typeid(T) )
      return reinterpret_cast<T *>
         (o->object());
   return 0;
}
At first glance, this approach may seem acceptable (if somewhat ungainly), but consider the problem of extracting and using the content of a container that can contain anything at all:


void process( list<Object *> &cup ) {
   typedef list<Object *>::iterator I;
   for( I i(cup.begin()); i != cup.end(); 
       ++i ) {
       if( Pencil *p =
           dynamicCast<Pencil>(*i) )
           p->write();
       else if( Battleship *b =
           dynamicCast<Battleship>(*i) )
           b->anchorsAweigh();
       else
           throw InTheTowel();
   }
}
Any user of the cosmic hierarchy will be forced to engage in a silly and childish "guessing game," the object of which is to uncover type information that shouldn't have been lost in the first place. In other words, that a pencil cup can't contain a battleship doesn't indicate a design flaw in the pencil cup. The flaw may be found in the section of code that thinks it's reasonable to perform such an insertion. It's unlikely that the ability to put a battleship in a pencil cup corresponds to anything in the application domain, and this is not the type of coding we should encourage or submit to. A local requirement for a cosmic hierarchy generally indicates a design flaw elsewhere.

Since our design abstractions of pencil cups and battleships are simplified models of the real world (whatever "real" means in the context), it's worth considering the analogous real-world situation. Imagine that, as the designer of a (physical) pencil cup, you received a complaint from one of your users that his ship didn't fit in the cup. Would you offer to fix the pencil cup, or would you offer some other type of assistance?

The repercussions of this abdication of design responsibility are extensive and serious. Any use of a container of Objects is a potential source of an unbounded number of type-related errors. Any change to the set of object types that may be wrapped as Objects will require maintenance to an arbitrary amount of code, and that code may not be available for modification. Finally, because no effective architecture has been provided, every user of the container is faced with the problem of how to extract information about the anonymous objects.

Each of these acts of design will result in different and incompatible ways of detecting and reporting errors. For example, one user of the container may feel just a bit silly asking questions like "Are you a pencil? No? A battleship? No? ..." and opt for a capability-query approach. The results are not much better (see Gotcha #99).

Often, the presence of an inappropriate cosmic hierarchy is not as obvious as it is in the case we just discussed. Consider a hierarchy of assets, as in Figure 3.

It's not immediately clear whether the Asset hierarchy is overly general or not, especially in this high-level picture of the design. Often the suitability of a design choice is not clear until much lower-level design or coding has taken place. If the general nature of the hierarchy leads to certain disreputable coding practices (see Gotchas #98 and 99), it's probably a cosmic hierarchy and should be refactored out of existence. Otherwise, it may simply be an acceptably general hierarchy.

Sometimes, refactoring our perceptions can improve a hierarchy, even without source code changes. Many of the problems associated with cosmic hierarchies have to do with employing an overly general base class. If we reconceptualize the base class as an interface class and communicate this reconceptualization to the users of the hierarchy, as in Figure 4, we can avoid many of the damaging coding practices mentioned earlier.

Our design no longer expresses a cosmic hierarchy but three separate hierarchies that leverage independent subsystems through their corresponding interfaces. This is a conceptual change only, but an important one. Now employees, vehicles, and contracts may be manipulated as assets by an asset subsystem, but the subsystem, because it's ignorant of classes derived from Asset, won't attempt to uncover more precise information about the Asset objects it manipulates. The same reasoning applies to the other interface classes, and the possibility of a run-time type-related error is small.

Notes

[1] Steve Dewhurst. C++ Gotchas: Avoiding Common Problems in Coding and Design (Addison-Wesley, 2002).

[2] Ibid., p. xi.

About the Author

Stephen C. Dewhurst (<www.semantics.org>) is the president of Semantics Consulting, Inc., located among the cranberry bogs of southeastern Massachusetts. He specializes in C++ consulting and in training in advanced C++ programming, STL, and design patterns. Steve is also one of the featured instructors of The C++ Seminar (<www.gotw.ca/cpp_seminar>).

Figure 1: Likely layout of a derived class under single inheritance


Figure 2: Likely layout of an object under multiple inheritance. An ObservedButton object contains sub-objects for both its Subject and Button base classes. Loss of type information causes badButton to refer to a non-Button address.


Figure 3: An iffy hierarchy -- it's not clear whether the use of Asset is overly general or not


Figure 4: An effective reconceptualization -- the is-a relationship is appropriately weakened if we consider Asset to be a protocol rather than a base class
