Dr. Dobb's | Standard C | November 01, 1993

URL:http://www.drdobbs.com/standard-c/184402794

C++ Library Ground Rules

P.J. Plauger is senior editor of The C Users Journal. He is convenor of the ISO C standards committee, WG14, and active on the C++ committee, WG21. His latest books are The Standard C Library, published by Prentice-Hall, ANSI and ISO Standard C (with Jim Brodie), published by Microsoft Press, and Programming on Purpose (three volumes), published by Prentice-Hall. You can reach him at [email protected].

Introduction

Last month, I discussed some of the history and current politics behind development of the library portion of the standard for programming language C++. (See "Standard C: Developing the Standard C++ Library," CUJ, October 1993.) The joint ANSI/ISO committee X3J16/WG21 is developing that standard. Nominally, the first draft for public review will be out within a year (though more than a few harbor doubts about that schedule).

I continue this month by describing the evolving library standard in more detail. Numerous words have been written to date about the C++ language proper. Yet remarkably little has been said, by comparison, about the library that accompanies a typical C++ translator. I hope to help rectify that imbalance in this and subsequent installments of this column.

I begin by repeating the overall structure of the C++ library draft standard:

(0) introduction, the ground rules for implementing and using the Standard C++ library

(1) the Standard C library, as amended to meet the special requirements of a C++ environment

(2) language support, those functions called implicitly by expressions or statements you write in a C++ program

(3) iostreams, the extensive collection of classes and functions that provide strongly typed I/O

(4) support classes, classes like string and (perhaps) complex that pop up in some form in every library shipped with a C++ compiler

Component (3) iostreams requires by far the most description, followed by component (4) support classes. Component (1), the Standard C library, may actually involve more code than these two combined, but it is also more familiar to many and more widely described. (See, for example, references [1] and [2].) For now, I'll just plod ahead and take the components in order. The larger ones will doubtless be the subjects of repeat visits. But the sermon for this month will be confined to (0) introduction.

As we learned with the C Standard, it takes a lot of words to lay down all the ground rules for using a library. It doesn't help matters that words like these have never been generated before for the C++ library. We have to be all the more careful to say things clearly and to get widespread agreement among members of the joint committee.

I encouraged this particular process at the Munich meeting last July by circulating a draft that included about nine pages of such "front matter" for the C++ library. It is a distillation of rules inherited from the C Standard, suitably amended, along with various discussions that have occurred in the Library Working Group (LWG) of the joint committee. An edited form of that draft went out with the post-meeting mailing, for consideration by everybody active in the C++ standardization effort.

Let me emphasize again that these words have not been approved as a formal part of the draft C++ standard. Some have been discussed within the LWG, and on its e-mail reflector, with general agreement. But all are obviously subject to amendment, or even rejection, by the joint committee.

Still, they're currently the only game in town. If you want some hint about how to use various bits of the Standard C++ library, you have to start somewhere.

Library Headers

Take library headers, for example. These are the critters you include, by naming them in #include directives, to get definitions and declarations of library entities into a translation unit. The C Standard spells out several clear rules:

Every macro or type peculiar to the library is defined in one (or more) headers, and every object or function is declared in a library header.
Library definitions and declarations live only in the headers; they appear only after you include the relevant header (but any external names are nevertheless reserved even if you don't include the header).
You can include headers more than once with no harm, or in any order, since the headers do not include each other or depend on each other.
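
A small sketch of what these rules permit: Standard C headers included in arbitrary order, one of them twice. The helper count_spaces is a made-up name, used only to show that the declarations arrive intact.

```cpp
// Standard C headers may be included in any order, and more than once.
#include <string.h>
#include <stdio.h>
#include <string.h>   // a repeat include is harmless

// Hypothetical helper: counts the blanks in a null-terminated string.
size_t count_spaces(const char *s)
{
    size_t n = 0;
    for (size_t i = 0, len = strlen(s); i < len; ++i)
        if (s[i] == ' ')
            ++n;
    return n;
}
```

A call such as count_spaces("a b c") needs nothing beyond the includes shown.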

You may be surprised to learn that none of these rules (except, perhaps, part of the second) was widespread before the C Standard was developed. Lots of library functions had no header to declare them — the C committee had to invent homes for them. Headers included each other all the time. Some required that others be included first. The situation was, in short, a non-standard mess. Sorting it out dramatically eased writing portable code.

Much the same sort of mayhem exists today in C++. Sorting it out, however, requires a slightly different set of rules. C++ library classes, for example, refer to each other extensively. So the newer C++-specific headers must include each other in all sorts of combinations. There's no point in even trying to "fix" that behavior. Better to institutionalize it.

So the current (yet to be approved) rules for C++ headers read roughly as follows:

The headers inherited from C, which all have names that end in .h, follow the usual C rules, outlined above.
The newer C++ headers, which have names with no suffix, include no C headers but can include other C++ headers in arbitrary combinations.

As an example, <iostream> is the standard name for the header that declares the streams cin and cout (among other things). The name is a deliberate departure from the commonly used <iostream.h>. It doubtless includes the new header <exception>, so that it can define its own nested exception class ios::failure. But it does not include <stdio.h>, however convenient that might prove to the implementors. (And however common that might be in current practice.)

I emphasize again that these rules are preliminary. Even if widely accepted as sensible, they may have to change because of another significant factor. At the Munich meeting last July, the joint committee voted to accept yet another major addition to C++. The language now has facilities for wrapping a chunk of code inside a namespace, from which you can selectively export names. How these new facilities will affect the library is unknown, at this writing, except in one important regard. One of the strongest arguments for adding namespaces to C++ at this date was to better structure the standard C++ library (and other, vendor-supplied libraries). We may not know yet how to use namespaces in the C++ library, but whether we use them doesn't seem to be an option.

Reserved Identifiers

And that brings us naturally to the topic of reserved identifiers. The C Standard pioneered the business of partitioning namespaces between implementors and users, and for good and proper reason. What had been a small problem with languages like FORTRAN and Pascal became a major problem with C. Suddenly, you could write nontrivial programs and move them among compilers sold by half a dozen different vendors. The dialects of C were similar enough, and the tricks for writing portable C were easy enough to learn. In fact, the biggest problems in porting code had moved from the language proper to the library.

Yes, everybody provided printf, sqrt, and functions of that ilk. The problem came with the functions that these stalwarts called behind the scenes. Each implementor felt free to use a different set of names for lower-level library support functions, such as read, seek, and domain_error. You learned what names to avoid on system A, then got surprised anew when moving to system B.

So the C Standard reserved whole sets of names for the implementors. Typically, these begin with an underscore, but I won't repeat here all the niggling details. Suffice it to say that a program that makes use of names from this reserved space deserves any problems it encounters. And a program that avoids such reserved names deserves to be free of name collisions.

C++, as usual, faces the same issues as C, but with a twist or two. Essentially the same set of names is reserved to the implementor as in C. And now for the twists:

The Standard C library can be implemented as "alien" C code, decorated with lots of extern "C" qualifiers. Equally, it can be implemented as C++ code. Thus, all the global names from that library are reserved in the space of extern "C" names, but the function names are still not guaranteed to be of that flavor. That effectively rules out declaring any of these functions inline, as permitted by the C Standard. Include the appropriate header, or perish.
C++ cares about function signatures, not just names. A program can overload sqrt, for example, even if it includes <math.h>. Thus, only the global function signatures explicitly defined by the Standard C++ library are reserved. That effectively rules out the use of any masking macros in the Standard C headers, despite what the C Standard normally promises.
C++ classes define lots of names. Even those nested deep inside classes can come to grief if a user defines a macro of the same name. And, of course, you never know how many classes you drag in by including any Standard C++ header. Hence, you'll soon be presented with a list of hundreds of names that you must not use for your own macros in a C++ program.
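
The second twist can be sketched as follows. The type Hundredths and its sqrt overload are invented for illustration. The point is only that a user signature the library does not declare can sit beside the sqrt from <math.h>.

```cpp
#include <math.h>

// Hypothetical user type: a value stored in hundredths.
struct Hundredths {
    long units;
};

// A user overload of sqrt. Its signature is not one the library declares,
// so it coexists with the sqrt(double) from <math.h>.
Hundredths sqrt(Hundredths h)
{
    // Call the library version explicitly through its double signature.
    double root = ::sqrt(h.units / 100.0);
    Hundredths result = { static_cast<long>(root * 100.0 + 0.5) };
    return result;
}
```

Had <math.h> been free to define sqrt as a masking macro, the definition above would have been rewritten by the preprocessor before the compiler ever saw it.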

If you're both a macro freak and a C++ programmer (a heady combination), that last rule may weigh heavily on you. I suggest you cultivate a peculiar naming style for macros, if you haven't already done so. It should avoid leading underscores and involve at least one capital letter.
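
One possible naming style, as a sketch. The MYAPP_ prefix and both names below are assumptions of mine, not anything the draft prescribes, and std::string is today's spelling of the library class.

```cpp
#include <string>

// A macro in a deliberately peculiar style: no leading underscore, a
// project prefix, and capital letters. MYAPP_ is a made-up prefix.
#define MYAPP_Clamp(x, lo, hi) ((x) < (lo) ? (lo) : (hi) < (x) ? (hi) : (x))

// By contrast, a macro named after a library member rewrites every later
// use of that name, including uses in any library header included afterward:
//   #define length 80      // don't: collides with the member length()

// Hypothetical function that uses the macro alongside a library class.
int clamped_length(const std::string& s)
{
    return MYAPP_Clamp(static_cast<int>(s.length()), 1, 10);
}
```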

Blanket Restrictions

The C Standard lists a few blanket restrictions that also apply to C++:

The description of a library function doesn't always say what happens for a funny argument value, such as a null pointer or a pointer into Never Never Land. In such cases, the behavior of the library function is simply undefined.
Functions that expect pointers into arrays have every right to expect suitable pointers. (But remember that any object of type T can be treated as an array of char, of size sizeof (T). And a pointer to any object can be treated as a pointer to an array of such objects, of size 1.)
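
The two parenthetical remarks can be sketched as follows. The function names are hypothetical. Only memcpy comes from the library.

```cpp
#include <string.h>

// Hypothetical function that expects a pointer into an array of n ints.
long sum(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += a[i];
    return total;
}

// A single int passed as an array of one element.
long sum_of_one(int x)
{
    return sum(&x, 1);
}

// Any object can be viewed as an array of sizeof (T) chars. Here the
// bytes of one double are copied to another through that view.
double copy_bytes(double d)
{
    double out;
    memcpy(&out, &d, sizeof (double));
    return out;
}
```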

Implementors are also subject to blanket constraints. They must make all macros that yield integer constants suitable for use in #if expressions, unless explicitly permitted to do less.
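
What that constraint buys you, in a minimal sketch: a library macro from <limits.h> tested by the preprocessor itself. The typedef name int_least_32 is made up for the example.

```cpp
#include <limits.h>

// INT_MAX is a library macro that yields an integer constant, so it can
// appear in an #if expression.
#if INT_MAX >= 2147483647
typedef int int_least_32;      // hypothetical name: int has at least 32 bits here
#else
typedef long int_least_32;     // long is guaranteed to have at least 32 bits
#endif
```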

Alternate Definitions for Functions

The C++ Standard has a problem not shared by that for C. All functions in the Standard C library are provided by the implementor. A user can replace a library function only with the indulgence of a particular implementation. The operation is not portable. But C++ provides any number of legitimate ways to replace a library function with a user-supplied version:

by providing a definition in the program for certain functions, such as ::operator new(size_t)
by registering a handler function, as with a call to set_new_handler
by overriding a virtual function in a class derived from a library class, such as streambuf::overflow(int)
by overloading the "placement operator new" (::operator new(size_t, void *)) with a signature that can be called in its place, such as with ::operator new(size_t, myclass)
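
The third route, overriding a virtual function, might look like the sketch below. It is written against the library as it exists today (std::streambuf, with int_type in place of int), and the class counting_buf is invented for the example.

```cpp
#include <streambuf>
#include <ostream>
#include <cstddef>

// Hypothetical class: a stream buffer that discards characters but
// counts them.
class counting_buf : public std::streambuf {
public:
    counting_buf() : count_(0) {}
    std::size_t count() const { return count_; }
protected:
    // Called by the library whenever the put area is full. This class
    // never sets one up, so every character arrives here.
    virtual int_type overflow(int_type c)
    {
        if (!traits_type::eq_int_type(c, traits_type::eof()))
            ++count_;
        // Any value other than eof reports success.
        return traits_type::not_eof(c);
    }
private:
    std::size_t count_;
};

// Writes text through an ostream that sits on a counting_buf.
std::size_t count_chars(const char *text)
{
    counting_buf buf;
    std::ostream out(&buf);
    out << text;
    return buf.count();
}
```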

In all such cases, the C++ Standard has a double job to perform. It must describe what you can expect the library-supplied version to do for you. It must also describe what you are obliged to do with any version you supply. These two specifications are highly similar, of course. But they can also be different. (Otherwise, what benefit do you get from writing your own?)

The C++ Standard handles this problem by making a generic distinction. Required behavior is what any version of the replaceable library function must provide. Default behavior is the particular flavor of required behavior you can count on from the version supplied by the library.
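
A sketch of the distinction, using the new handler as the replaceable function. It assumes the rule that a handler must either make more memory available or give up by throwing std::bad_alloc. The reserve buffer and both function names are invented here.

```cpp
#include <new>
#include <cstdlib>

static char *reserve = 0;   // hypothetical emergency reserve

// A user-supplied handler. To meet the required behavior it must either
// make more memory available or refuse by throwing std::bad_alloc.
void release_reserve()
{
    if (reserve != 0) {
        std::free(reserve);    // make memory available for the retry
        reserve = 0;
        return;
    }
    throw std::bad_alloc();    // nothing left to give back
}

// Sets aside the reserve and installs the handler. Returns the handler
// that was installed before, so the caller can restore it.
std::new_handler install_reserve(std::size_t bytes)
{
    reserve = static_cast<char *>(std::malloc(bytes));
    return std::set_new_handler(release_reserve);
}
```

What this particular handler does with its reserve is its own business. What it may not do is return without having freed anything.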

Objects Within Classes

Here's another problem in describing things. One of the widely touted benefits of classes is encapsulation. A class has its published interface and its internal implementation. Stored data, for example, might be encoded for efficient storage, or cached, or shared among objects. Any internal implementation that meets the needs of the published interface should be legitimate.

That's all well and good, but how do you talk about a class whose stored data can take so many forms? Descriptions can vary from horribly detailed to hopelessly abstract. Neither of these extremes meets the needs of a language standard.

The draft library standard addresses the problem by applying the now infamous as if rule from the C Standard. It describes each class in terms of a simple, even naive, version of its stored data. That makes many member functions much easier to describe. The library front matter then provides a general disclaimer. It reassures the reader that any alternative implementation is equally permissible, provided it appears from the outside as if it possesses the simple internal data structure.
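
To make the as if rule concrete, here is a sketch with two invented classes. The first stores its data the naive way a description might assume. The second stores less and computes on demand. From the outside they cannot be told apart.

```cpp
#include <cstddef>
#include <cstring>

// The naive picture a description might use: a pointer and a length.
class text_naive {
public:
    explicit text_naive(const char *s) : ptr_(s), len_(std::strlen(s)) {}
    std::size_t length() const { return len_; }
private:
    const char *ptr_;
    std::size_t len_;
};

// An alternative that records no length up front and computes it on the
// first request. Callers see the same results as with text_naive.
class text_lazy {
public:
    explicit text_lazy(const char *s) : ptr_(s), len_(0), known_(false) {}
    std::size_t length() const
    {
        if (!known_) {
            len_ = std::strlen(ptr_);
            known_ = true;
        }
        return len_;
    }
private:
    const char *ptr_;
    mutable std::size_t len_;
    mutable bool known_;
};
```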

Functions Within Classes

Member functions present a related problem. The external interface to most library classes is generally on the "thick" side. That's another way of saying that the library tends to spell out lots of different operand types for each operation that involves a class. With a thick interface, the chances are better that typical expressions you write will involve fewer conversions to intermediate forms. Hence, the code will create fewer temporaries and may be smaller and faster.

The library class string, for example, contains a number of operations with similar form:

string& append(const string& str, size_t n = NPOS);
string& assign(const string& str, size_t n = NPOS);
string& insert(size_t pos, const string& str, size_t n = NPOS);
string& remove(size_t pos, size_t n = NPOS);
string& replace(size_t pos, size_t n1, const string& str, size_t n2 = NPOS);

In every case, the last argument can specify a length shorter than the actual length of the string argument. The constant NPOS is a huge value, which you get by default when you want to use the entire string. This definition lets you perform each of the string operations in either of two ways:

   s1.insert(3, s2);
   s1.insert(3, s2, 5);

The alternative is to provide a "thin" interface. Given the constructor:

string(const string& str, size_t n = NPOS);

then each of the member functions above could be declared without the length option, as in:

string& append(const string& str);
string& assign(const string& str);
string& insert(size_t pos, const string& str);
string& remove(size_t pos);
string& replace(size_t pos, size_t n1, const string& str);

The expressions above would then be written:

   s1.insert(3, s2);
   s1.insert(3, string(s2, 5));

You get the same functionality, but with a bit more notation in the source code, and probably more wheel spinning at run time.
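
The two styles can be compared with a stand-in class. The class draft_string below is hypothetical, built on today's std::string purely so the draft-style signatures can be exercised. It is not the draft class itself.

```cpp
#include <string>
#include <cstddef>

const std::size_t NPOS = std::string::npos;   // plays the role of the draft's NPOS

// Hypothetical stand-in for the draft class.
class draft_string {
public:
    draft_string(const char *s) : rep_(s) {}
    // The constructor that makes a thin interface possible.
    draft_string(const draft_string& str, std::size_t n = NPOS)
        : rep_(str.rep_, 0, n) {}

    // Thick form: the length travels with the call.
    draft_string& insert(std::size_t pos, const draft_string& str,
                         std::size_t n = NPOS)
    {
        rep_.insert(pos, str.rep_, 0, n);
        return *this;
    }
    const std::string& text() const { return rep_; }
private:
    std::string rep_;
};

// Thick call: no intermediate object is written in the source.
std::string thick(const char *a, const char *b)
{
    draft_string s1(a), s2(b);
    s1.insert(3, s2, 5);
    return s1.text();
}

// Thin call: the same result by way of an explicit temporary.
std::string thin(const char *a, const char *b)
{
    draft_string s1(a), s2(b);
    s1.insert(3, draft_string(s2, 5));
    return s1.text();
}
```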

On the other hand, sometimes a default argument is not as cheap as it appears. An implementation might want to provide the shorter versions I showed immediately above, perhaps because they're smaller and faster and used far more often than the full-blown versions. (In that case, the original flavors would be written without a default value for the last argument, so the translator can determine which version to call.)

The draft library standard permits this sort of flexibility in implementation. It even allows an implementation to add default arguments not specified in the standard. What this latitude costs the user is a bit of uncertainty. Taking the address of a library member function is now potentially ambiguous. (If you must do so, write a wrapper function that calls the one you want, then take the address of the wrapper.)
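
A sketch of the wrapper trick, using an invented class split_string in which one declaration with a default argument has been split into two overloads. The name &split_string::remove then refers to an overload set, and which signatures it contains depends on the implementation.

```cpp
#include <string>
#include <cstddef>

// Hypothetical class showing what an implementation may do.
class split_string {
public:
    split_string(const char *s) : rep_(s) {}
    split_string& remove(std::size_t pos)                  // short, common form
    {
        rep_.erase(pos);
        return *this;
    }
    split_string& remove(std::size_t pos, std::size_t n)   // full-blown form
    {
        rep_.erase(pos, n);
        return *this;
    }
    const std::string& text() const { return rep_; }
private:
    std::string rep_;
};

// The portable route: wrap the call you want in a function whose
// signature you control, then take the address of the wrapper.
split_string& remove_tail(split_string& s, std::size_t pos)
{
    return s.remove(pos);
}

typedef split_string& (*edit_fn)(split_string&, std::size_t);

// Applies an edit through a function pointer and returns the result.
std::string apply(edit_fn f, const char *text, std::size_t pos)
{
    split_string s(text);
    f(s, pos);
    return s.text();
}
```

The wrapper compiles the same way whether the library supplies one remove with a default argument or two without.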

This latitude is limited, however. An implementation must not overload member functions to avoid other kinds of conversions. Consider, for example, the effect of adding:

string& insert(size_t pos, const char *s);

One constructor for string converts a const char * to a string, on the assumption that it points at a null-terminated string. It would seem that adding this member function would make for a potentially better implementation. Expressions of the form:

   s1.insert(3, " + ");

would no longer generate a temporary string object along the way.

True enough. But where you cause trouble is in user-defined classes that try to be conscientious. Let's say that your string class mystring offers conversions to both string and const char *. The first conversion lets you mix your flavor of strings with those from the library. The second conversion helps you with the occasional call to the functions in <string.h>. All is well until you move to a helpful implementation with this added member function. Suddenly, you start getting ambiguities reported by the translator. Not so good.
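
The ambiguity can be reproduced in miniature. In this sketch the pair of length_of overloads is a made-up stand-in for a member function and the overload a helpful implementation adds. The mystring class has only what the example needs.

```cpp
#include <string>
#include <cstring>

// Hypothetical user class offering both conversions described above.
class mystring {
public:
    mystring(const char *s) : rep_(s) {}
    operator std::string() const { return rep_; }
    operator const char *() const { return rep_.c_str(); }
private:
    std::string rep_;
};

// Stand-ins for a library function and the overload that a "helpful"
// implementation might add.
std::size_t length_of(const std::string& s) { return s.size(); }
std::size_t length_of(const char *s) { return std::strlen(s); }

std::size_t measure(const mystring& m)
{
    // length_of(m);   // ill-formed: both conversions are equally good
    return length_of(static_cast<const char *>(m));   // say which one
}
```

With only the first length_of declared, the commented-out call is fine. Adding the second is what breaks it.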

So the latitude granted implementors is specifically limited to playing games with default arguments. That still helps the standard offer a thicker interface without specifying so many different functions. And it lets implementors decide how best to offer the required services.

By the way, the same latitude does not extend to global functions in the library. There are relatively few of these, and many of them are serious candidates for having their addresses taken on a regular basis.

Conclusion

The current front matter has a few other odds and ends of lesser interest. Mostly, these deal with definitions of terms and notational conventions used throughout the library draft. A few concern more esoteric issues of how library classes can be derived from other classes, or how an implementation can represent certain "utility" types (such as ios::fmtflags). What I have focused on here are those issues that most directly affect how you use the Standard C++ library (once it exists, at least).

I find it significant that it takes nine typeset pages of text to spell out such matters. And that it has taken the joint committee four years to get to where it can start worrying about such matters. That, more than anything else, should tell you how desperately the world needs a clear standard for the C++ library.

Bibliography

[1] ISO/IEC 9899:1990, International Standard for Programming Language C.

[2] P.J. Plauger, The Standard C Library, Prentice-Hall, 1992.