Stepping Up To C++

September 1995/Stepping Up To C++

Other Assorted Changes, Part 3

Dan Saks is the president of Saks & Associates, which offers consulting and training in C++ and C. He is secretary of the ANSI and ISO C++ committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield OH, 4450-4906, by phone at (513)324-3601, or electronically at dsaks@wittenberg.edu.

This is the third and final installment in a series on assorted changes to the C++ language wrought by years of ongoing standardization. These changes do not include new features, but rather features defined in the Annotated C++ Reference Manual (ARM) [1] that now have different behaviors under the draft C++ standard. In some cases, the changes simply disallow constructs that the ARM allowed.

Thus far, I've explained the following new rules:

The left-hand side of a member access expression is always evaluated, even if the right-hand side designates a static member or enumeration constant.
Enumerations are not integral, so that built-in arithmetic operators, such as ++ and --, no longer apply to enumerations; however, enumerations can be promoted to int, unsigned int, long, or unsigned long.
In most cases, temporary objects are destroyed as the last step in evaluating the full-expression that (lexically) contains the point where they were created. This is true even if that evaluation ends in throwing an exception.
A local class may declare a global function as a friend, but it may not define it.
A program may no longer convert a pointer-to-function to or from a pointer-to-object-type.
A program may not use an operator function name, such as operator+, as the name of an ordinary variable.

See "Stepping Up to C++: Other Assorted Changes," Parts 1 and 2, CUJ, July and August, 1995.

In addition to these changes, I described the changes specific to the scope rules in "Stepping Up to C++: Changes in Scope Rules," CUJ, June, 1995.

CV-Qualifiers in Parameter Types

C++ allows function name overloading. That is, a program can declare two or more functions with the same name in the same scope, as long as each function has a distinct signature. A function's signature is the sequence of types in its parameter list. For example,

int put(int c);
int put(int c, FILE *stream);
int put(const char *s);
int put(const char *s, FILE *stream);

is a family of overloaded functions. The last one has signature (const char *, FILE *).

A program can also overload non-static member functions as const and non-const, as well as volatile and non-volatile. For example,

class X
    {
public:
    int f(int);
    int f(int) const;
    int f(int) volatile;
    int f(int) const volatile;
    // ...
    };

declares four distinct functions named X::f. Thus, the signature of a non-static member function must also include any cv-qualifiers (const and/or volatile qualifiers) applied to the function itself so the compiler can tell it apart from the others. The last of the X::f functions above has signature (int) const volatile.
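As a minimal sketch of how a compiler chooses among these four functions, the following variant of X has each overload report which version ran. The string return values and the function demo_member_overloads are illustrative assumptions, not part of the original class.

```cpp
#include <cassert>
#include <string>

// Variant of class X from the text. Each overload returns a label so
// the caller can see which cv-qualified version was selected.
class X
    {
public:
    std::string f(int)                { return "plain"; }
    std::string f(int) const          { return "const"; }
    std::string f(int) volatile       { return "volatile"; }
    std::string f(int) const volatile { return "const volatile"; }
    };

// Hypothetical driver: the cv-qualification of the object selects
// the member function with the matching cv-qualifiers.
void demo_member_overloads()
    {
    X x;
    const X cx = X();
    volatile X vx;
    const volatile X cvx = X();

    assert(x.f(0) == "plain");
    assert(cx.f(0) == "const");
    assert(vx.f(0) == "volatile");
    assert(cvx.f(0) == "const volatile");
    }
```

In each call, the object's own qualification decides the match, which is why the qualifiers must be part of the signature.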

In some cases, two functions with the same name may have different signatures, but still cannot overload each other. In other words, they cannot appear in the same set of overloaded functions. The ARM states:

Since for any type T, a T and a T& accept the same set of initializer values, function declarations with parameter types differing only in this respect may not have the same name. For example,
int f(int i);
int f(int& r); // error: function types
               // not sufficiently different

The ARM also states that:

Similarly, since for any type T, a T, a const T, and a volatile T accept the same set of initializer values, functions with parameter types differing only in this respect may not have the same name.

This means that you can't declare

void f(T t);
void f(const T t);

as overloaded functions because the compiler can't tell the fs apart when you try to call one. Every actual argument you can pass to f(T), you can also pass to f(const T), and vice versa.

At first blush, the ARM's rule also appears to preclude declaring overloaded functions such as

void g(T *p);
void g(const T *p);

void h(T &r);
void h(const T &r);

It doesn't. The ARM goes on to say:

It is, however, possible to distinguish between const T&, volatile T&, and plain T& so function declarations that differ only in this respect can be overloaded. Similarly, it is possible to distinguish between const T*, volatile T*, and plain T* so function declarations that differ only in this respect can be overloaded.

Essentially, the difference between the set

void g(T *p);
void g(T const *p);

which you can overload, and the set

void f(T t);
void f(const T t);

which you can't, is whether the const that distinguishes the functions applies to the parameter's top type. That is, the ability to overload hinges on whether the const applies to the formal parameter itself (the top type), or to the actual argument accessed through the parameter.

In the case of

void g(T *p);
void g(T const *p);

the const in the second function applies, not to parameter p, but to the object addressed by p. Thus, the two functions may have clearly different outward behaviors. The first function is entitled to modify the object addressed by p; the second is not. Therefore, the first function can accept only addresses of non-const T objects, while the second can accept addresses of either const or non-const T objects.
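A small sketch can make the difference in outward behavior concrete. Here T is assumed to be a simple struct with one int member, and the return values 1 and 2 are arbitrary tags used only to show which overload ran.

```cpp
#include <cassert>

struct T { int value; };   // hypothetical stand-in for T

// The const is buried inside the parameter type, so these overload.
int g(T *p)       { p->value += 1; return 1; }  // may modify *p
int g(const T *)  { return 2; }                 // may only inspect *p

void demo_pointer_overloads()
    {
    T t = { 0 };
    const T ct = { 0 };

    assert(g(&t) == 1);    // &t is a T *, so g(T *) is the better match
    assert(g(&ct) == 2);   // &ct is a const T *, so only g(const T *) fits
    assert(t.value == 1);  // g(T *) really did modify t
    }
```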

In the case of

void f(T t);
void f(const T t);

the const in the second function applies to formal parameter t, not to the actual argument. In other words, the functions only differ in that the first function can alter its parameter t, but the second cannot. But in both cases, the parameter is passed by value, so the function only has access to a copy of the actual argument, not the argument itself. So even if the first function alters its parameter, it has no effect on the actual argument. And there's no difference in the type of actual arguments either function will accept. Thus, there's little to distinguish the outward behavior of these functions.

In case the horse isn't completely dead, I'll beat it a little more. Note that you cannot overload

void f(X *t);
void f(X *const t);

because this is just a special case of the rule that you can't overload

void f(T t);
void f(T const t);

where T is X *. The order of the type-specifier T and the cv-qualifier const doesn't matter, so

void f(T const t);

and

void f(const T t);

are the same function. In either case, the const applies to parameter t itself, not to the actual argument.

You might argue that, given

void f(T t);
void f(const T t);

a call f(t) should call f(const T) when t is const and call f(T) when t is non-const. But first, consider the situation where T is a non-class type, say int:

void f(int n);
void f(const int n);

Which function should f(42) call? If you think it should call f(const int) because 42 is a constant, think again. There are subtle reasons why it's not that simple.

Syntactically, 42 is surely a constant, but during semantic analysis, the compiler regards it as an rvalue of type int. Since 42 is not an lvalue (an object), it cannot be const-qualified (or volatile-qualified). That is, although 42 is a constant-expression, it is not const-qualified. A subtle point indeed.

Thus, a simple rule that matches const with const and non-const with non-const implies that f(42) selects f(int) over f(const int). If this were the rule, I suspect many programmers would find the behavior surprising. So it's just as well that you can't overload f(T) and f(const T). C++ already has more than its share of surprises.

Unfortunately, the rule that prohibits overloading f(T) and f(const T) has a problem — it's incompatible with C. Although you can't overload functions in C, you can declare a function more than once in a given scope. Moreover, the declarations need not be exactly the same; they just have to be close enough. More precisely, they must be compatible.

For example, the declarations

void f(int);
void f(const int i);

are valid in the same scope of a standard C program, and they refer to the same function. In C, two function declarations are compatible if they have the same name and return type, and their parameter types are the same, ignoring any top-level cv-qualifiers on the parameter types.

The C++ standards committees, WG21 and X3J16, agreed to eliminate this incompatibility. They took the approach that the presence of cv-qualifiers in the top type of a formal parameter is merely a detail of implementing the function; it's irrelevant to a function's interface. Therefore, C++ should ignore such cv-qualifiers in determining a function's signature for declaration matching and overload resolution.

The draft now says:

Parameter declarations that differ only in the presence or absence of const and/or volatile are equivalent. That is, the const and volatile type-specifiers for each parameter type are ignored when determining which function is being declared, defined, or called. For example,
typedef const int cInt;

int f(int);
int f(const int);    // redeclares f(int)
int f(int) { ... }   // defines f(int)
int f(cInt) { ... }  // error: redefines f(int)
Only the const and volatile specifiers at the outermost level of the parameter type specification are ignored in this fashion; const and volatile specifiers buried within a parameter type specification are significant and can be used to distinguish overloaded function declarations.

Thus, according to the current draft, a function such as

void f(const char *p, const int i);

has signature (const char *, int). Someday, C++ compilers will actually behave this way.
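On a compiler that implements the draft rule, the effect can be checked directly. The sketch below assumes C++11 or later for static_assert and the type_traits header, neither of which existed when this rule was drafted. The function next is a hypothetical example.

```cpp
#include <cassert>
#include <type_traits>

// Top-level cv-qualifiers on parameters drop out of the function type...
static_assert(std::is_same<void(const char *, const int),
                           void(const char *, int)>::value,
              "top-level const on a parameter is ignored");

// ...but qualifiers buried inside the parameter type remain significant.
static_assert(!std::is_same<void(const char *), void(char *)>::value,
              "const on the pointed-to type is part of the signature");

int next(int);           // declares next(int)
int next(const int);     // redeclares next(int)
int next(const int i)    // defines next(int); i is const only in here
    {
    return i + 1;
    }

void demo_signature()
    {
    assert(next(41) == 42);
    }
```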

A C++ compiler is supposed to ignore the cv-qualifiers outside the function, but not inside. For example,

int f(int);
...
int f(const int i)
    {
    return ++i;   // error
    }

is an error because it tries to modify parameter i in a context where i is const-qualified. Interestingly,

int f(const int);
...
int f(int i)
   {
   return ++i;
   }

is valid C. To my dismay, I'm afraid it will also be valid C++.

CV-Qualifiers on Return Types

As part of the investigation that led to the previous change, the committees considered whether to also ignore cv-qualifiers at the top of return types. In other words, the question was whether a function declared as

const T f();

really returns a const T or just a T.

Why does this matter? Well, if T is a non-class type, it doesn't matter much. The result of a function call expression such as f() is an rvalue unless the function returns a reference, in which case the result is an lvalue. As in C, an rvalue of a non-class type is not an object, so you can't modify it. Thus cv-qualifiers are meaningless when applied to rvalues.

However, if T is a class type, then the const qualifier in the return type of f might influence overload resolution in member function calls. For example, suppose T is

class T
    {
public:
    void g();
    void g() const;
    // ...
    };

and f is declared as above (with return type const T). Does f().g() call the const member function g, or the non-const one? If C++ were to ignore cv-qualifiers at the top of return types, then calling f would return a non-const T, and so f().g() would apply the non-const member T::g to the result of f. If C++ were to heed the cv-qualifiers in return types, then the result of f() would be const-qualified, and so f().g() would call the const member T::g.

The committees found that most existing implementations did not ignore cv-qualifiers in return types. Since the ARM did not suggest they should be ignored, the committees simply decided to affirm what they perceived as existing practice. The C++ draft now contains an explicit footnote stating that

As indicated by the syntax, cv-qualifiers are a significant component in function return types.

It also states elsewhere that

Class rvalues can have qualified types; non-class rvalues always have unqualified types.

Thus, given

T f();
const T fc();

where T is as defined above, then f().g() calls the non-const member T::g, and fc().g() calls the const member T::g. However, given

int h();
const int hc();

calling either h() or hc() returns an rvalue of type int.
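The following sketch exercises both statements. The string return values of g and the function bodies are illustrative assumptions. The static_assert checks assume C++11 or later for decltype and the type_traits header. Some compilers warn that the const on hc's return type has no effect, which is exactly the point.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

class T
    {
public:
    const char *g()       { return "non-const"; }
    const char *g() const { return "const"; }
    };

T f()        { return T(); }
const T fc() { return T(); }   // class rvalue keeps its const

int h()        { return 1; }
const int hc() { return 1; }   // non-class rvalue loses its const

static_assert(std::is_same<decltype(h()), int>::value,
              "h() is an rvalue of type int");
static_assert(std::is_same<decltype(hc()), int>::value,
              "hc() is also an rvalue of type int");

void demo_return_qualifiers()
    {
    assert(std::strcmp(f().g(), "non-const") == 0);
    assert(std::strcmp(fc().g(), "const") == 0);
    }
```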

Arrays with Unknown Bound

As in C, a C++ program can declare objects that are arrays with unknown bounds, such as

extern int a[];

The type composition rules allow arrays with unknown bounds in other contexts, such as

typedef int UNKA[];

and even

void f(UNKA *p);

The committees decided that functions such as this, that accept a pointer or reference to an array with unknown bound, complicate declaration matching and overload resolution rules in C++. The committees agreed that, since such functions have little utility and are fairly uncommon, it would be simplest to just ban them. Hence, the C++ draft now states:

If the type of a parameter includes a type of the form pointer to array of unknown bound of T or reference to array of unknown bound of T, the program is ill-formed.

Implicit int

Like C, C++ as described in the ARM employs the "implicit int" rule: you can omit the type specifier from certain declarations, in which case the type defaults to int. For example,

const N = 10;

defines N as a constant with type int, and

extern f();

declares a function f that returns an int.

Surprisingly, C++ as described in the ARM allows implicit int in places that even ISO C does not. For example, the rules in the ARM permit

f(int);

as the declaration for a function accepting an int and returning an int. This is not valid C.

Implicit int complicates parsing, which, in turn, complicates error detection and recovery. Since the dawn of man, WG21 and X3J16 members had been making noises about banning implicit int from C++. But it took the committees many meetings to work up the collective courage to do anything about it.

Late in 1993, the committees started chipping away at implicit int. They agreed to do the following:

ban implicit int wherever ISO C does.
ban implicit int in a typedef. For example,
```
typedef T;
```
ceased to be valid C++, even though it still is valid C.
deprecate all other uses of implicit int. (A deprecated feature is one that's on the chopping block. It may be removed from a future standard.)

The C++ standards committees remained reluctant to ban implicit int altogether as long as it remained alive in C. Late last year, Thomas Plum, the liaison between the C and C++ committees, presented the issue to the C committee, WG14. He reported back to WG21 and X3J16 that there is considerable support in WG14 to either ban or at least deprecate implicit int. He added that if WG21 and X3J16 elected to ban implicit int, WG14 would not complain.

Once relieved of their inhibitions, the C++ committees voted to ban implicit int altogether. The draft C++ standard now states:

Only in function-definitions and in function declarations for constructors, destructors, and type conversions can the decl-specifier-seq be omitted.

The decl-specifier-seq is the sequence of one or more storage class specifiers, cv-qualifiers, or type specifiers at the beginning of a declaration. For example, in

extern const unsigned int *p;

the decl-specifier-seq is extern const unsigned int.

But that rule is not sufficient to complete the ban. The draft also states:

At least one type-specifier is required in a typedef declaration. At least one type-specifier is required in a function declaration unless it declares a constructor, destructor or type conversion operator.

One possibly unpopular consequence of the ban is that

main()
{
    ...
}

is no longer valid C++. I, for one, am delighted.
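For comparison, here is a sketch of how the earlier implicit-int declarations look once every type is spelled out. The function twice and the bodies are hypothetical, added only so the fragment links and can be exercised.

```cpp
#include <cassert>

const int N = 10;     // ARM-era spelling: const N = 10;
extern int f();       // ARM-era spelling: extern f();
int twice(int);       // ARM-era spelling: twice(int);

int f()          { return N; }
int twice(int n) { return 2 * n; }

// main needs the same treatment: write int main() { ... }

void demo_explicit_int()
    {
    assert(f() == 10);
    assert(twice(f()) == 20);
    }
```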

Initializers of the Form T()

In C++, you can construct temporary objects on-the-fly using function-style casts of the form

simple-type-specifier ( expression-list-opt )

For example, in

complex z;
...
z = 2 * complex(3, 1) + 1;

complex(3, 1) is a function-style cast that yields an object of type complex initialized by a constructor that accepts 3 and 1 as actual arguments.

The expression list in a function-style cast is optional. According to the ARM:

If the type is a class with a suitably declared constructor that constructor will be called; otherwise the result is an undefined value of the specified type.

For example, if class complex has a default constructor, then

z = complex();

initializes a temporary complex object using its default constructor, and copies that object to z (by assignment). On the other hand,

int i;
...
i = int();

assigns an unspecified value to i.

Bjarne Stroustrup, the inventor of C++, suggested changing the behavior of function-style casts so that T() has a specified value for every type T. In particular, he suggested that T() should yield the value that a static T object has by default (typically zero for scalars). For example, int() yields zero because that's the default initial value for i defined as

static int i;

Stroustrup requested this change because he wanted to be able to write template classes that construct all their members with sane values. For example, given

template <class T>
class C
    {
public:
    C();
    //...
private:
    T m;
    int i;
    };

the constructor

template <class T>
C<T>::C() : i(0), m(T()) { }

uses T() to specify a reasonable initial value for member m using any type T, be it a class or scalar. The committees adopted the change.
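A sketch of the template in use, instantiated with a scalar, a pointer, and a class type. The accessors member and count are assumptions added so the initial values can be observed. The initializers are listed in declaration order, which is the order in which members are actually initialized.

```cpp
#include <cassert>
#include <string>

template <class T>
class C
    {
public:
    C();
    const T &member() const { return m; }   // hypothetical accessor
    int count() const { return i; }         // hypothetical accessor
private:
    T m;
    int i;
    };

// T() calls the default constructor for class types and yields a
// zero value for scalars and pointers.
template <class T>
C<T>::C() : m(T()), i(0) { }

void demo_template_init()
    {
    C<int> ci;
    C<double *> cp;
    C<std::string> cs;

    assert(ci.member() == 0);       // int() is zero
    assert(cp.member() == 0);       // a null pointer
    assert(cs.member().empty());    // default-constructed string
    assert(cs.count() == 0);
    }
```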

Earlier this year, the committees refined the rules for T() and extended them to cover T() occurring in a new expression, such as

T *p = new T();

and to cover a ctor-initializer of the form T(), as in

X::X() : T() { }

The draft now describes initialization for static objects differently than the ARM. The intent is to clarify the behavior described in the ARM, not to change it. The draft says:

The storage for objects with static storage duration is zero-initialized before any other initialization takes place.

It later explains that:

To zero-initialize storage for an object of type T means:

if T is a scalar or pointer-to-member type, the storage is set to the value of 0 (zero) converted to T;

if T is a non-union class type, the storage for each nonstatic data member and each base-class subobject is zero-initialized;

if T is a union type, the storage for its first nonstatic data member is zero-initialized;

if T is an array type, the storage for each element is zero-initialized;

if T is a reference type, no initialization is performed.

This just spells out in greater detail what it means to initialize static storage to zero.
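The cases in that list can be observed with a few static objects that have no initializers. This is a sketch only. The type Pair and all the object names are hypothetical.

```cpp
#include <cassert>

struct Pair { int first; double *second; };   // hypothetical class type

static int counter;              // scalar: zero
static double *cursor;           // pointer: null
static int Pair::*member_ptr;    // pointer to member: null
static Pair origin;              // class: each member zero-initialized
static int table[4];             // array: each element zero-initialized

void demo_static_zero_init()
    {
    assert(counter == 0);
    assert(cursor == 0);
    assert(member_ptr == 0);
    assert(origin.first == 0 && origin.second == 0);
    for (int k = 0; k < 4; ++k)
        assert(table[k] == 0);
    }
```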

The draft then describes the meaning of the function-style cast T() in terms of default initialization:

The expression T(), where T is a simple-type-specifier, creates an rvalue of the specified type, whose value is determined by default-initialization.

where default initialization is defined in terms of zero initialization:

To default-initialize an object of type T means:

if T is a non-POD class type, the default constructor for T is called (and the initialization is ill-formed if T has no accessible default constructor);

if T is an array type, each element is default-initialized;

otherwise, the storage for the object is zero-initialized.

(A POD is a class with no user-declared constructors, no private or protected members, no base classes, no virtual functions, no references, and no pointers to members. I described POD types in "Stepping Up to C++: Even More Minor Enhancements," CUJ, May 1995.)

Essentially, the above rules specify in detail the behavior for T() that Stroustrup requested. In short, T() either calls a constructor or zero-initializes the temporary as if it were static.

The draft now also makes a subtle distinction between

T *p = new T; // 1

and

T *p = new T(); // 2

If T is a non-POD class type, these two forms for a new expression have the same effect. This is not a change from the ARM. But, if T is a POD type or a scalar, then // 1 above leaves the allocated object uninitialized, while // 2 applies default-initialization to the object. This is a change.
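A minimal sketch of the two forms, assuming a hypothetical POD type Point. Only the objects created with the parenthesized form are read before assignment, because the value of the other is indeterminate. Later revisions of the standard call this behavior value-initialization, but the observable result for these types is the same.

```cpp
#include <cassert>

struct Point { int x; int y; };   // hypothetical POD type

int demo_new_forms()
    {
    int *pi = new int();        // form 2: *pi is zero
    Point *pp = new Point();    // form 2: both members are zero
    int *qi = new int;          // form 1: *qi is uninitialized

    assert(*pi == 0);
    assert(pp->x == 0 && pp->y == 0);

    *qi = 7;                    // assign before any read of *qi
    int sum = *pi + pp->x + pp->y + *qi;

    delete pi;
    delete pp;
    delete qi;
    return sum;
    }
```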

The draft makes a similar distinction between omitting a ctor-initializer and writing an explicit initializer of the form T(). For example, given

class X
    {
public:
    X();
private:
    T m;
    };

under the current rules of C++, if T is a non-POD class type, then it doesn't matter whether you define the constructor as

X::X() { }       // 3

or as

X::X() : m() { } // 4

In either case, X() applies T's default constructor to m. This is not a change. However, if T is a POD type or a scalar, then // 3 above leaves m uninitialized, but // 4 applies default initialization to m. This is a change.
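The sketch below shows form // 4 with T taken to be int. The accessor value is an assumption added so the member can be inspected. There is no corresponding check for form // 3, since reading the uninitialized member would be an error.

```cpp
#include <cassert>

class X
    {
public:
    X();
    int value() const { return m; }   // hypothetical accessor
private:
    int m;                            // T is int in this sketch
    };

X::X() : m() { }   // form 4: m is initialized to zero

void demo_ctor_initializer()
    {
    X x;
    assert(x.value() == 0);
    }
```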

References

[1] Margaret A. Ellis and Bjarne Stroustrup, The Annotated C++ Reference Manual (Addison-Wesley, 1990).
