Two C++ Gotchas



Adapted from S. Dewhurst, C++ Gotchas: Avoiding Common Problems in Coding and Design (#29 & #97). © 2003 Pearson Education, Inc. Reproduced by permission of Pearson Education, Inc. All rights reserved.

Steve Dewhurst's book C++ Gotchas: Avoiding Common Problems in Coding and Design [1] begins thus:

    This book is the result of nearly two decades of minor frustrations, serious bugs, late nights, and weekends spent involuntarily at the keyboard. This collection consists of 99 of some of the more common, severe, or interesting C++ gotchas, most of which I have (I'm sorry to say) experienced personally. The term "gotcha" has a rather cloudy history, and a variety of definitions. For the purposes of this book, we'll define C++ gotchas as common and preventable problems in C++ programming and design. The gotchas described here run the gamut from minor syntactic annoyances to basic design flaws to full-blown sociopathic behavior. [2]

Following are two prize Gotchas...

Gotcha #29: Converting through void *

Even C programmers know that a void * is second cousin to a cast and should be avoided to the extent possible. As with a cast, converting a typed pointer to void * removes all useful type information. Typically, the original type of the pointer must be "remembered" and restored when the void * is used. If the type is resupplied correctly, everything will work fine (except, of course, that having to remember types for later casting implies that the design needs work).

void *vp = new int(12);
// . . .
// will work
int *ip = static_cast<int *>(vp); 
Unfortunately, even this simple use of void * can open the door to portability problems. Remember that static_cast is the cast operator we use (when we must cast) for relatively safe and portable conversions. For example, one might use a static_cast to cast from a base class pointer to a publicly derived class pointer. For unsafe, platform-dependent conversions, we're forced to use reinterpret_cast. For example, one might use a reinterpret_cast to cast from an integer to a pointer or between pointers to unrelated types:

// error!
char *cp = static_cast<char *>(ip);
// works.
char *cp = reinterpret_cast<char *>(ip);
The use of reinterpret_cast is a clear indication to you and to the readers and maintainers of your code that you're not only casting, but that you're casting in a potentially nonportable way. Use of a void * intermediary allows that important warning to be circumvented:

// put int addr into a char *!
char *cp = static_cast<char *>(vp);
It gets worse. Consider a user interface that allows the address of a "Widget" to be stored and later retrieved:

typedef void *Widget;
void setWidget( Widget );
Widget getWidget();
Users of this interface recognize that they have to remember the type of Widget they set, so they can restore its type information when it's retrieved:

// In some header file . . .
class Button {
   // . . .
};
class MyButton : public Button {
   // . . .
};
// elsewhere . . .
MyButton *mb = new MyButton;
setWidget( mb );

// somewhere else entirely . . .
// might work!
Button *b = static_cast<Button *>(getWidget());
This code will usually work, even though we lose some type information when we extract the Widget. The stored Widget refers to a MyButton but is extracted as a Button. The reason this code will often work has to do with the likely way that the storage for a class object is laid out in memory.

Typically, a derived class object contains the storage for its base class sub-object starting at offset 0, as if its base class part were the first data member of the derived class, and simply appends any additional derived class data below that, as in Figure 1. Therefore, the address of a derived class object is generally the same as that of its base class. (Note, however, that the Standard guarantees correct results only if the address in the void * is converted to exactly the same type used to set the void *. See Gotcha #70 for one way this code could fail even under single inheritance.)

However, this code is fragile, in that a remote change during maintenance may introduce a bug. In particular, a straightforward and proper use of multiple inheritance may break the code:

// in some header file . . .
class Subject {
   // . . .
};
class ObservedButton :
   public Subject, public Button {
   // . . .
};
// elsewhere . . .
ObservedButton *ob = new ObservedButton;
setWidget( ob );
// . . .
Button *badButton =
   static_cast<Button *>(getWidget()); // disaster!
The problem is with the layout of the derived class object under multiple inheritance. An ObservedButton has two base class parts, and only one of them can have the same address as the complete object. Typically, storage for the first base class (in this case, Subject) is placed at offset 0 in the derived class, followed by the storage for subsequent base classes (in this case, Button), followed by any additional derived class data members, as in Figure 2. Under multiple inheritance, a single object commonly has multiple valid addresses.
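The layout described here can be observed directly. What follows is only a minimal sketch: the data members subjectData and buttonData and the helper buttonOffset are hypothetical, added so that each base class occupies storage. The Standard does not fix these offsets, so the value returned is implementation-dependent.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical data members give each base class some storage.
// They are never read in this sketch.
class Subject {
 public:
   int subjectData;
};
class Button {
 public:
   int buttonData;
};
class ObservedButton :
   public Subject, public Button {
};

// Distance in bytes from the start of the complete object to its
// Button part. The implicit conversion to Button * lets the compiler
// apply whatever adjustment its chosen layout requires.
std::ptrdiff_t buttonOffset( ObservedButton *ob ) {
   assert( ob );
   Button *b = ob; // adjusted by the compiler
   return reinterpret_cast<char *>(b)
        - reinterpret_cast<char *>(ob);
}
```

On implementations that place Subject first, buttonOffset returns a nonzero value (commonly sizeof(Subject)), which is the second "valid address" of the object.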

Ordinarily this is not a problem, since the compiler is aware of the various offsets and can perform the correct adjustments at compile time:

Button *bp = new ObservedButton;
ObservedButton *obp =
   static_cast<ObservedButton *>(bp);
In the code above, bp correctly points to the Button part of the ObservedButton object, not to the start of the object. When we cast from a Button pointer to an ObservedButton pointer, the compiler is able to adjust the address so that it points to the start of the ObservedButton object. It's not hard, since the compiler knows the offset of each base class part within a derived class, as long as it knows the type of the base and derived classes.

And that's our problem. When we use setWidget, we throw away all useful type information. When we cast the result of getWidget to Button *, the compiler sees only a void * and can't perform the adjustment to the address. As a result, the Button pointer is actually referring to a Subject!
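The lost adjustment can be made visible by comparing the two ways of restoring the type. This is a sketch under the same assumptions as the surrounding text: the data members and the helper functions restoreCorrectly and restoreIncorrectly are hypothetical names, and the addresses differ only on implementations that do not place Button at offset 0.

```cpp
#include <cassert>

// Hypothetical data members, never read, so each base has storage.
class Subject { public: int subjectData; };
class Button { public: int buttonData; };
class ObservedButton : public Subject, public Button {};

// Restores exactly the type that was stored, then lets the compiler
// convert to the base class. The offset adjustment is applied.
Button *restoreCorrectly( void *vp ) {
   assert( vp );
   return static_cast<ObservedButton *>(vp);
}

// Mimics the faulty cast on the result of getWidget: the compiler
// never sees ObservedButton, so no adjustment can occur. The result
// must not be dereferenced.
Button *restoreIncorrectly( void *vp ) {
   assert( vp );
   return static_cast<Button *>(vp);
}
```

Given an ObservedButton ob and void *vp = &ob, restoreCorrectly(vp) yields the address of the Button part, while restoreIncorrectly(vp) yields the unadjusted address of the complete object.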

Void pointers do have their uses, as do casts, but they should be used sparingly. It's never a good idea to use a void * as part of an interface that requires one use of the interface to resupply type information lost through another use.
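One possible way to follow this advice is to give the interface a real type. The sketch below assumes that every stored widget is a Button; the redefinition of Widget as Button *, the file-local variable currentWidget, and the virtual destructor are all assumptions made for illustration, not part of the original interface.

```cpp
#include <cassert>

class Subject { public: int subjectData; };
class Button {
 public:
   virtual ~Button() {}
   int buttonData;
};
class ObservedButton : public Subject, public Button {};

// Typed, so the derived-to-base conversion (and any address
// adjustment) happens at the call site, where the compiler still
// knows the argument's real type.
typedef Button *Widget;

namespace {
   Widget currentWidget = 0; // hypothetical storage for the sketch
}

void setWidget( Widget w ) {
   assert( w ); // this sketch disallows null widgets
   currentWidget = w;
}

Widget getWidget() { return currentWidget; }
```

With this version, setWidget( ob ) converts the ObservedButton pointer to a correctly adjusted Button pointer before anything is stored, and a caller that needs the derived type back can ask for it with dynamic_cast rather than guess.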

Gotcha #97: Cosmic Hierarchies

More than a decade ago, the C++ community decided that the use of "cosmic" hierarchies (architectures in which every object type is derived from a root class, usually called Object) was not an effective design approach in C++. There were a number of reasons for rejecting this approach, both on the design level and on the implementation level.

From a design standpoint, cosmic hierarchies often give rise to generic containers of "objects." The contents of these containers are often unpredictable and lead to unexpected run-time behavior. Bjarne Stroustrup's classic counterexample considered the possibility of putting a battleship in a pencil cup -- something a cosmic hierarchy would allow but that would probably surprise a user of the pencil cup.

A pervasive and dangerous assumption among inexperienced designers is that an architecture should be as flexible as possible. Error. Rather, an architecture should be as close to the problem domain as possible while retaining sufficient flexibility to permit reasonable future extension. When "software entropy" sets in and new requirements are difficult to add within the existing structure, the code should be refactored into a new design. Attempts to create maximally flexible architectures a priori are similar to attempts to create maximally efficient code without profiling; there will be no useful architecture, and there will be a loss of efficiency. (See also Gotcha #72.)

This misapprehension of the goal of an architecture, coupled with an unwillingness to do the hard work of abstracting a complex problem domain, often results in the reintroduction of a particularly noxious form of cosmic hierarchy:

class Object {
 public:
   Object( void *, const type_info & );
   virtual ~Object();
   const type_info &type();
   void *object();
   // . . .
};
Here, the designer has abdicated all responsibility for understanding and properly abstracting the problem domain and has instead created a wrapper that can be used to effectively "cosmicize" otherwise unrelated types. An object of any type can be wrapped in an Object, and we can create containers of Objects into which we can put anything at all (and frequently do).

The designer may also provide the means to perform a type-safe conversion of an Object wrapper to the object it wraps:

template <class T>
T *dynamicCast( Object *o ) {
   if( o && o->type() == typeid(T) )
      return reinterpret_cast<T *>
         (o->object());
   return 0;
}
At first glance, this approach may seem acceptable (if somewhat ungainly), but consider the problem of extracting and using the content of a container that can contain anything at all:

void process( list<Object *> &cup ) {
   typedef list<Object *>::iterator I;
   for( I i(cup.begin()); i != cup.end();
       ++i ) {
       if( Pencil *p =
           dynamicCast<Pencil>(*i) )
           p->write();
       else if( Battleship *b =
           dynamicCast<Battleship>(*i) )
           b->anchorsAweigh();
       else
           throw InTheTowel();
   }
}
Any user of the cosmic hierarchy will be forced to engage in a silly and childish "guessing game," the object of which is to uncover type information that shouldn't have been lost in the first place. In other words, that a pencil cup can't contain a battleship doesn't indicate a design flaw in the pencil cup. The flaw may be found in the section of code that thinks it's reasonable to perform such an insertion. It's unlikely that the ability to put a battleship in a pencil cup corresponds to anything in the application domain, and this is not the type of coding we should encourage or submit to. A local requirement for a cosmic hierarchy generally indicates a design flaw elsewhere.
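For contrast, here is a sketch of a cup that keeps its type information. The PencilCup class, its insert and process members, and the strokes counter are hypothetical names introduced for illustration; only write and anchorsAweigh come from the code above.

```cpp
#include <cassert>
#include <list>

class Pencil {
 public:
   Pencil() : strokes_( 0 ) {}
   void write() { ++strokes_; }
   int strokes() const { return strokes_; }
 private:
   int strokes_; // hypothetical state, so write() has an effect
};

class Battleship {
 public:
   void anchorsAweigh() {}
};

// A cup that can hold only pencils.
class PencilCup {
 public:
   void insert( Pencil *p ) {
      assert( p );
      pencils_.push_back( p );
   }
   void process() {
      typedef std::list<Pencil *>::iterator I;
      for( I i( pencils_.begin() ); i != pencils_.end(); ++i )
         (*i)->write(); // no guessing game required
   }
 private:
   std::list<Pencil *> pencils_;
};
```

An attempt to insert the address of a Battleship into a PencilCup is rejected at compile time, so the error surfaces in the code that tries the insertion rather than in every function that later empties the cup.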

Since our design abstractions of pencil cups and battleships are simplified models of the real world (whatever "real" means in the context), it's worth considering the analogous real-world situation. Imagine that, as the designer of a (physical) pencil cup, you received a complaint from one of your users that his ship didn't fit in the cup. Would you offer to fix the pencil cup, or would you offer some other type of assistance?

The repercussions of this abdication of design responsibility are extensive and serious. Any use of a container of Objects is a potential source of an unbounded number of type-related errors. Any change to the set of object types that may be wrapped as Objects will require maintenance to an arbitrary amount of code, and that code may not be available for modification. Finally, because no effective architecture has been provided, every user of the container is faced with the problem of how to extract information about the anonymous objects.

Each of these acts of design will result in different and incompatible ways of detecting and reporting errors. For example, one user of the container may feel just a bit silly asking questions like "Are you a pencil? No? A battleship? No? ..." and opt for a capability-query approach. The results are not much better (see Gotcha #99).

Often, the presence of an inappropriate cosmic hierarchy is not as obvious as it is in the case we just discussed. Consider a hierarchy of assets, as in Figure 3.

It's not immediately clear whether the Asset hierarchy is overly general or not, especially in this high-level picture of the design. Often the suitability of a design choice is not clear until much lower-level design or coding has taken place. If the general nature of the hierarchy leads to certain disreputable coding practices (see Gotchas #98 and #99), it's probably a cosmic hierarchy and should be refactored out of existence. Otherwise, it may simply be an acceptably general hierarchy.

Sometimes, refactoring our perceptions can improve a hierarchy, even without source code changes. Many of the problems associated with cosmic hierarchies have to do with employing an overly general base class. If we reconceptualize the base class as an interface class and communicate this reconceptualization to the users of the hierarchy, as in Figure 4, we can avoid many of the damaging coding practices mentioned earlier.

Our design no longer expresses a cosmic hierarchy but three separate hierarchies that leverage independent subsystems through their corresponding interfaces. This is a conceptual change only, but an important one. Now employees, vehicles, and contracts may be manipulated as assets by an asset subsystem, but the subsystem, because it's ignorant of classes derived from Asset, won't attempt to uncover more precise information about the Asset objects it manipulates. The same reasoning applies to the other interface classes, and the possibility of a run-time type-related error is small.
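In code, treating Asset as an interface class might look something like the following sketch. The value operation, the constructors, the data members, and the totalValue function are all assumptions made for illustration; the text names only the Asset, employee, and vehicle concepts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Interface class: the only type the asset subsystem knows about.
class Asset {
 public:
   virtual ~Asset() {}
   virtual double value() const = 0; // hypothetical operation
};

class Employee : public Asset {
 public:
   explicit Employee( double salary ) : salary_( salary ) {}
   double value() const { return salary_; }
 private:
   double salary_;
};

class Vehicle : public Asset {
 public:
   explicit Vehicle( double price ) : price_( price ) {}
   double value() const { return price_; }
 private:
   double price_;
};

// Asset subsystem: works purely through the interface and never
// asks what concrete type it has been handed.
double totalValue( const std::vector<const Asset *> &assets ) {
   double total = 0.0;
   for( std::size_t i = 0; i < assets.size(); ++i ) {
      assert( assets[i] );
      total += assets[i]->value();
   }
   return total;
}
```

Because totalValue depends only on Asset, adding a new kind of asset requires no change to it, and it contains no casts that could fail at run time.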

Notes

[1] Steve Dewhurst. C++ Gotchas: Avoiding Common Problems in Coding and Design (Addison-Wesley, 2002).

[2] Ibid., p. xi.

About the Author

Stephen C. Dewhurst (<www.semantics.org>) is the president of Semantics Consulting, Inc., located among the cranberry bogs of southeastern Massachusetts. He specializes in C++ consulting, and training in advanced C++ programming, STL, and design patterns. Steve is also one of the featured instructors of The C++ Seminar (<www.gotw.ca/cpp_seminar>).

