C++ Theory and Practice


August 1997/C++ Theory and Practice

Dan has second thoughts about an early C++ design decision.


Copyright © 1997 by Dan Saks

In C++, classes and structs are essentially the same construct. They obey nearly identical rules. Rather than repeatedly refer to "class or struct," the draft C++ Standard neatly folds structs into classes with this single statement:

A structure is a class defined with the class-key struct; its members and base classes are public by default.

(The identical statement appears in the ARM[1].) Aside from coding examples, the draft hardly mentions structs. It speaks only of classes with the understanding that classes include structs.

Thus, a struct in C++ can have members (including constructors and a destructor), access specifiers, and base classes. The only real difference between a class declared as a class and a class declared as a struct is in the default access to their bases and members. For example, in

class B : D
    {
    T m;
    };

D is a private base of B and m is a private member of B. If you simply change the keyword class to struct, as in

struct B : D
    {
    T m;
    };

then D is a public base and m is a public member. If you always specify the access on the base class and the first member, it doesn't matter whether you use the keyword class or struct.
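
The difference in default access can be checked mechanically. The following is only a sketch: it uses std::is_convertible and static_assert, which arrived in the language long after this column was written, and the names B1, B2, B3, the empty base D, and the typedef for T are stand-ins chosen for illustration.

```cpp
#include <type_traits>

struct D
    {
    };

typedef int T;   // hypothetical stand-in for the member type T in the text

class B1 : D     // class-key class: D is a private base, m is private
    {
    T m;
    };

struct B2 : D    // class-key struct: D is a public base, m is public
    {
    T m;
    };

class B3 : public D   // access spelled out, so the keyword no longer matters
    {
public:
    T m;
    };

// Outside code can convert a pointer to the derived class into a pointer
// to D only when D is an accessible (public) base.
static_assert(!std::is_convertible<B1 *, D *>::value, "D is a private base of B1");
static_assert( std::is_convertible<B2 *, D *>::value, "D is a public base of B2");
static_assert( std::is_convertible<B3 *, D *>::value, "D is a public base of B3");
```

B3 behaves like B2 even though it is declared with the keyword class, because both the base and the first member carry an explicit access specifier.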

Bjarne Stroustrup[2] explained his reasons for defining structs as classes as follows:

Maybe we could have lived with two sets of rules, but a single concept provides a smoother integration of features and simpler implementations. I was convinced that if struct came to mean "C and compatibility" to users and class to mean "C++ and advanced features," the community would fall into two distinct camps that would soon stop communicating... Only a single concept would support my ideas of a smooth and gradual transition from "traditional C-style programming," through data abstraction, to object-oriented programming. Only a single concept would support the notion of "you only pay for what you use" ideal.

He later added that:

I think the idea of keeping struct and class the same concept saved us from classes supporting an expensive, diverse, and rather different set of features than we have now. In other words, the "a struct is a class" notion is what has stopped C++ from drifting into becoming a much higher-level language with a disconnected low-level subset.

Like many beginning C++ programmers, I was surprised when I learned that a struct is a class. But it appealed to me right away. I have always liked designs that unify seemingly disparate concepts within a common set of rules, provided the common set of rules is indeed simpler than the separate sets of rules would have been.

Therein lies my current disillusionment. If the Grand Unification (of classes and structs) had indeed been a good idea, it would have led to a simpler and safer language. I don't believe it did. I now believe it led to a language that is more complicated and perilous than it had to be.

One of the direct consequences of the Grand Unification is the language's penchant for generating copy constructors and copy assignment operators. These generated functions are a well-known source of bugs.

Generated Copy Operations

The copy constructor for a class T is a constructor that can be called with a single argument of type T. The typical declaration for a copy constructor has the form

T(T const &);

(My habit of writing T const & rather than const T & is purely a matter of style.)

A copy assignment operator for class T is an operator= that can be called with an argument of type T. The typical declaration for a copy assignment operator has the form

T &operator=(T const &);

Although the term copy assignment sounds redundant, it isn't. A class can have numerous assignment operators. The copy assignment is the only one that compilers can generate. The term copy assignment distinguishes this assignment operator from all other assignment operators.

A generated copy constructor uses memberwise construction. For example, if class T has members m and n, then the generated copy constructor behaves as if it were defined as:

T::T(T const &t) :   m(t.m), n(t.n)
    {
    }

For any member that has a scalar type, such as int or char *, the compiler turns its member initializer into an assignment.

Similarly, a generated copy assignment uses memberwise assignment. For class T, copy assignment behaves as if it were defined as:

T &T::operator=(T const &t)
    {
    m = t.m;
    n = t.n;
    return *this;
    }
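
One way to watch memberwise copying happen is to give a class a member that counts its own copies. This is a minimal sketch, not anything from the draft: the type counted and the function copies_are_memberwise are hypothetical names introduced for the example, and T here is simply a struct that declares no copy operations of its own.

```cpp
#include <cassert>   // for the assert shown in the usage note

// Hypothetical member type that counts how often it is copied.
struct counted
    {
    static int constructions, assignments;
    counted() { }
    counted(counted const &) { ++constructions; }
    counted &operator=(counted const &) { ++assignments; return *this; }
    };

int counted::constructions = 0;
int counted::assignments = 0;

// T declares no copy operations, so the compiler generates both.
struct T
    {
    counted m;
    int n;
    };

// Copies a T both ways and reports whether each member was copied in turn.
inline bool copies_are_memberwise()
    {
    counted::constructions = counted::assignments = 0;
    T a;
    a.n = 42;
    T b(a);     // generated copy constructor: behaves like m(a.m), n(a.n)
    T c;
    c.n = 0;
    c = a;      // generated copy assignment: behaves like m = a.m; n = a.n;
    return counted::constructions == 1 && counted::assignments == 1
        && b.n == 42 && c.n == 42;
    }
```

A caller could write assert(copies_are_memberwise()); to confirm that the generated functions invoked the member's copy constructor once and its copy assignment once, and copied the int member both times.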

Copy constructors and copy assignment operators are profoundly important in C++. Initialization and assignment are fundamental operations available to all types in C, and these functions are the means by which C++ extends them to class types. C++ uses copy constructors not only for declarations, but also for passing function arguments and returning function results.

It often appears that C++ is doing you a favor by generating the copy operations for you. After all, most classes need them and, at least for many simple classes, the compiler-generated functions are just right. This allows rank beginners to write code that successfully passes class objects by value even before they know enough about const and references to write a decent copy constructor.

For example, you might have a class representing rational numbers (exact fractions) defined as:

class rational
    {
    ...
private:
    long num, denom;
    };

For this class, the compiler-generated copy constructor is exactly as it should be. In this case, the compiler has indeed done you a favor.

On the other hand, many interesting classes contain pointers to dynamically-allocated memory. For example, a typical vector class contains a pointer to a dynamically-allocated array that holds the vector elements:

template <class T>
class vector
    {
    ...
private:
    T *v;
    size_t n;
    };

For these vector types, the generated copy operations produce objects that share dynamically-allocated arrays without realizing that they are doing so. This leads to real confusion about who owns what. Programs with such objects usually wind up leaking memory or scrambling the free store.
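
The sharing is easy to demonstrate with a stripped-down version of such a class. The sketch below assumes a hypothetical naive_vector of int with no destructor, so that the example can delete the array exactly once by hand and stay well-defined; a realistic class with a destructor would delete the shared array twice.

```cpp
#include <cassert>   // for the assert shown in the usage note
#include <cstddef>

// Hypothetical, deliberately naive vector of int: no user-declared copy
// operations and no destructor, so a copy merely duplicates the pointer.
struct naive_vector
    {
    int *v;
    std::size_t n;
    };

// Reports whether a generated copy aliases the original's array.
inline bool copy_shares_array()
    {
    naive_vector a;
    a.n = 3;
    a.v = new int[a.n]();   // three zero-initialized elements
    naive_vector b(a);      // generated copy constructor copies v, not the array
    b.v[0] = 99;            // writing through b ...
    bool shared = (a.v == b.v) && (a.v[0] == 99);   // ... changes a as well
    delete [] a.v;          // exactly one delete; b.v now dangles
    return shared;
    }
```

Here assert(copy_shares_array()); holds: after the copy, a and b point at the same three ints, and neither object knows the other exists.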

A Question of Compatibility

If a compiler might generate erroneous copy operations, why does it generate them at all? As Stroustrup[2] explained:

I personally consider it unfortunate that copy operations are defined by default... However, C++ inherited its default assignment and copy constructors from C, and they are frequently used.

(By "default assignment," he meant what the draft now calls "copy assignment.")

In C, you can assign an object of a struct type to another object of the same struct type. You can also pass and return objects of struct type by value. Struct copy operations (assignment and initialization) assign each member of the source object to the corresponding member of the destination object. That is, they perform what C++ calls memberwise assignment.

When you compile C code as C++, C++ treats the structs as classes. If that code performs a struct assignment, the C++ compiler will look for a copy assignment that will carry out the assignment. Since the code is C, the compiler will not find what it's looking for. If the compiler is to treat structs as classes, and yet succeed at compiling the code, it must generate a copy assignment that will behave just like a C struct assignment. By the same token, the compiler must generate a copy constructor as needed for struct initialization.
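
As an illustration, here is the kind of C code in question. The struct point and the functions shifted and struct_copy_demo are hypothetical; the point is only that every copy below is legal C, and that a C++ compiler accepts the same text by using the generated copy constructor and copy assignment for point.

```cpp
#include <cassert>   /* for the assert shown in the usage note */

/* C-style code: nothing here is specific to C++. */
struct point
    {
    int x, y;
    };

/* Takes and returns a struct by value. */
static struct point shifted(struct point p, int dx)
    {
    p.x += dx;
    return p;
    }

/* Returns nonzero if the struct copies behaved as C promises. */
static int struct_copy_demo(void)
    {
    struct point a = { 1, 2 };
    struct point b;
    b = a;                  /* struct assignment */
    b = shifted(b, 10);     /* copies in, copies out */
    return b.x == 11 && b.y == 2 && a.x == 1;
    }
```

Compiled as C++, assert(struct_copy_demo()); passes, which is exactly the behavior the generated copy operations exist to preserve.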

But, did C++ really inherit the copy assignment and copy constructor from C, as Stroustrup suggested? I don't think so. C++ inherited struct assignment, that's all. C has copy assignments and copy constructors only if you first posit that "a struct is a class." By abandoning the dream of the Grand Unification, we could have had a language without a propensity for generating inappropriate copy operations.

The Unification That Wasn't

The real irony of the Grand Unification is that it was an illusion. The ARM preserved the illusion by failing to address various incompatibilities with C. C makes various promises about the layout of members within a struct. One such promise is that you can convert a pointer to a struct into a pointer to the first member of that struct. Since a C++ class object can have hidden data such as a vptr (a pointer to a virtual function table), C++ cannot extend those promises to all class types.
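
The first-member promise can be sketched as follows. The types plain and polymorphic are hypothetical, and std::is_standard_layout is a later addition to the language, used here only as a rough modern proxy for "the C layout promises apply"; the draft under discussion has no such trait.

```cpp
#include <cassert>   // for the assert shown in the usage note
#include <type_traits>

struct plain            // laid out as a C struct would be
    {
    int first;
    double second;
    };

struct polymorphic      // typically carries a hidden vptr
    {
    virtual ~polymorphic() { }
    int first;
    };

static_assert(std::is_standard_layout<plain>::value, "layout promises apply");
static_assert(!std::is_standard_layout<polymorphic>::value, "no layout promises");

// Converts a pointer to the struct into a pointer to its first member.
inline bool first_member_is_at_front()
    {
    plain p = { 7, 2.5 };
    int *ip = reinterpret_cast<int *>(&p);
    return ip == &p.first && *ip == 7;
    }
```

For plain, assert(first_member_is_at_front()); holds. The same cast applied to a polymorphic object would carry no such guarantee, since the vptr may well occupy the front of the object.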

In removing the incompatibility, the draft C++ Standard added a definition for a restricted category of classes called POD classes. POD stands for "plain ol' data," and (surprise!) it's pretty much a struct or union as defined in C.

PODs are defined in terms of aggregates. An aggregate is an array or a class with no user-declared constructors, no private or protected non-static data members, no base classes, and no virtual functions. In essence, an aggregate is a class type that can be brace-initialized. For example,

struct date
    {
    int mm, dd, yy;
    };

is an aggregate. You can declare a date d_day and initialize it with a sequence of expressions enclosed in braces, as in

date d_day = { 6, 6, 1944 };

On the other hand, if a date has private members and a constructor, as in:

class date
    {
public:
    date(int m, int d, int y);
private:
    int mm, dd, yy;
    };

then it is not an aggregate, and you cannot brace initialize it. Rather, you must initialize a date by calling the constructor:

date d(6, 6, 1944);

An aggregate may sound like it's just a C struct, but it's more than that. An aggregate can have members with non-aggregate types (such as classes with constructors). It can also have members that are references or pointers to members.
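
For instance, the following hypothetical struct labeled_count is an aggregate even though no C struct could look like it: one member is a class with constructors (std::string) and the other is a reference. The sketch shows that it can still be brace-initialized.

```cpp
#include <cassert>   // for the assert shown in the usage note
#include <string>

// Hypothetical aggregate: no constructors, no private or protected data,
// no base classes, no virtual functions.
struct labeled_count
    {
    std::string label;   // member of a class type that has constructors
    int &count;          // reference member
    };

// Brace-initializes the aggregate and writes through its reference member.
inline bool brace_init_demo()
    {
    int total = 3;
    labeled_count lc = { "widgets", total };
    lc.count += 1;       // updates total through the reference
    return lc.label == "widgets" && total == 4;
    }
```

A call such as assert(brace_init_demo()); succeeds: the braces initialize label by way of a std::string constructor and bind count to total.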

The draft defines a POD as an aggregate that has no non-static data members of type pointer to member or non-POD (or array of such types) or reference, and has no user-defined copy assignment operator and no user-defined destructor.

How is a POD different from a C struct? A POD can have static data members and non-virtual member functions (other than the ones specifically prohibited). I believe that's the only difference.
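
A sketch of that difference follows. Be aware that it leans on assumptions: it approximates "POD" with the later library traits std::is_trivial and std::is_standard_layout, whose definition is not identical to the draft's wording quoted above, and the helper is_pod_like and the struct with_dtor are names invented for the example.

```cpp
#include <type_traits>

struct date
    {
    int mm, dd, yy;
    static int count;        // static data member
    bool is_leap() const     // non-virtual member function
        {
        return yy % 4 == 0 && (yy % 100 != 0 || yy % 400 == 0);
        }
    };

int date::count = 0;

struct with_dtor
    {
    int mm, dd, yy;
    ~with_dtor() { }         // user-defined destructor: no longer a POD
    };

// Hypothetical helper approximating the POD test with modern traits.
template <class T>
struct is_pod_like
    {
    static const bool value =
        std::is_trivial<T>::value && std::is_standard_layout<T>::value;
    };

static_assert(is_pod_like<date>::value, "static members and plain functions are allowed");
static_assert(!is_pod_like<with_dtor>::value, "a user-defined destructor is not");
```

The static data member and the member function leave date's object layout untouched, which is why it can still be treated as plain data.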

In addition to eliminating the incompatibility with C, the notion of POD types is useful in describing what kinds of class objects can reside in read-only memory (ROM). Although the draft doesn't say so explicitly, I believe it implies that a const-qualified class object can reside in ROM only if it has a POD type.

The current draft Standard makes distinctions between POD and non-POD classes in a dozen or so places. It makes distinctions between aggregate and non-aggregate classes in a few more. If the concepts of class and struct were truly unified, I don't think we'd see so many such distinctions.

What to Do

As I mentioned earlier, I have always liked designs that unify seemingly disparate concepts within a common set of rules, provided the common set of rules is indeed simpler than the separate sets of rules would have been. I don't like the Grand Unification of classes and structs; the rules aren't any simpler, and they have some very bad effects.

Well, it's interesting to critique the language and think about what we might have done if we had the power to do it over. But we're not about to change C++. Nonetheless, I think the insights are helpful. They lead us to investigate programming techniques we can use to avoid the traps built into the language.

Next month I will look at some other problems that arise from the Grand Unification, and look at what we can do to cure those problems.

References

[1] Margaret Ellis and Bjarne Stroustrup. The Annotated C++ Reference Manual (Addison-Wesley, 1990).

[2] Bjarne Stroustrup. The Design and Evolution of C++ (Addison-Wesley, 1994).

Dan Saks is the president of Saks & Associates, which offers training and consulting in C++ and C. He is active in C++ standards, having served nearly seven years as secretary of the ANSI and ISO C++ standards committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield, OH 45504-4906 USA, by phone at +1-937-324-3601, or electronically at dsaks@wittenberg.edu.

