P.J. Plauger is senior editor of The C Users Journal. He is convenor of the ISO C standards committee, WG14, and active on the C++ committee, WG21. His latest books are The Standard C Library, published by Prentice-Hall, ANSI and ISO Standard C (with Jim Brodie), published by Microsoft Press, and Programming on Purpose (three volumes), published by Prentice-Hall. You can reach him at [email protected]
I continue this month by describing the evolving library standard in more detail. Numerous words have been written to date about the C++ language proper. Yet remarkably little has been said, by comparison, about the library that accompanies a typical C++ translator. I hope to help rectify that imbalance in this and subsequent installments of this column.
I begin by repeating the overall structure of the C++ library draft standard:
(0) introduction, the ground rules for implementing and using the Standard C++ library
(1) the Standard C library, as amended to meet the special requirements of a C++ environment
(2) language support, those functions called implicitly by expressions or statements you write in a C++ program
(3) iostreams, the extensive collection of classes and functions that provide strongly typed I/O
(4) support classes, classes like string and (perhaps) complex that pop up in some form in every library shipped with a C++ compiler
Component (3) iostreams requires by far the most description, followed by component (4) support classes. Component (1), the Standard C library, may actually involve more code than these two combined, but it is also more familiar to many and more widely described. (See, for example, references [1] and [2].) For now, I'll just plod ahead and take the components in order. The larger ones will doubtless be the subjects of repeat visits. But the sermon for this month will be confined to (0) introduction.
As we learned with the C Standard, it takes a lot of words to lay down all the ground rules for using a library. It doesn't help matters that words like these have never been generated before for the C++ library. We have to be all the more careful to say things clearly and to get widespread agreement among members of the joint committee.
I encouraged this particular process at the Munich meeting last July by circulating a draft that included about nine pages of such "front matter" for the C++ library. It is a distillation of rules inherited from the C Standard, suitably amended, along with various discussions that have occurred in the Library Working Group (LWG) of the joint committee. An edited form of that draft went out with the post-meeting mailing, for consideration by everybody active in the C++ standardization effort.
Let me emphasize again that these words have not been approved as a formal part of the draft C++ standard. Some have been discussed within the LWG, and on its e-mail reflector, with general agreement. But all are obviously subject to amendment, or even rejection, by the joint committee.
Still, they're currently the only game in town. If you want some hint about how to use various bits of the Standard C++ library, you have to start somewhere.
- Every macro or type peculiar to the library is defined in one (or more) headers, and every object or function is declared in a library header.
- Library definitions and declarations live only in the headers; they appear only after you include the relevant header (but any external names are nevertheless reserved even if you don't include the header).
- You can include headers more than once with no harm, or in any order, since the headers do not include each other or depend on each other.
Much the same sort of mayhem exists today in C++. Sorting it out, however, requires a slightly different set of rules. C++ library classes, for example, refer to each other extensively. So the newer C++-specific headers must include each other in all sorts of combinations. There's no point in even trying to "fix" that behavior. Better to institutionalize it.
So the current (yet to be approved) rules for C++ headers read roughly as follows:
- The headers inherited from C, which all have names that end in .h, follow the usual C rules, outlined above.
- The newer C++ headers, which have names with no suffix, include no C headers but can include other C++ headers in arbitrary combinations.
I emphasize again that these rules are preliminary. Even if widely accepted as sensible, they may have to change because of another significant factor. At the Munich meeting last July, the joint committee voted to accept yet another major addition to C++. The language now has facilities for wrapping a chunk of code inside a namespace, from which you can selectively export names. How these new facilities will affect the library is unknown, at this writing, except in one important regard. One of the strongest arguments for adding namespaces to C++ at this date was to better structure the standard C++ library (and other, vendor-supplied libraries). We may not know yet how to use namespaces in the C++ library, but whether we use them doesn't seem to be an option.
Yes, everybody provided printf, sqrt, and functions of that ilk. The problem came with the functions that these stalwarts called behind the scenes. Each implementor felt free to use a different set of names for lower-level library support functions, such as read, seek, and domain_error. You learned what names to avoid on system A, then got surprised anew when moving to system B.
So the C Standard reserved whole sets of names for the implementors. Typically, these begin with an underscore, but I won't repeat here all the niggling details. Suffice it to say that a program that makes use of names from this reserved space deserves any problems it encounters. And a program that avoids such reserved names deserves to be free of name collisions.
C++, as usual, faces the same issues as C, but with a twist or two. Essentially the same set of names is reserved to the implementor as in C. And now for the twists:
- The Standard C library can be implemented as "alien" C code, decorated with lots of extern "C" qualifiers. Equally, it can be implemented as C++ code. Thus, all the global names from that library are reserved in the space of extern "C" names, but the function names are still not guaranteed to be of that flavor. That effectively rules out declaring any of these functions inline, as permitted by the C Standard. Include the appropriate header, or perish.
- C++ cares about function signatures, not just names. A program can overload sqrt, for example, even if it includes <math.h>. Thus, only the global function signatures explicitly defined by the Standard C++ library are reserved. That effectively rules out the use of any masking macros in the Standard C headers, despite what the C Standard normally promises.
- C++ classes define lots of names. Even those nested deep inside classes can come to grief if a user defines a macro of the same name. And, of course, you never know how many classes you drag in by including any Standard C++ header. Hence, you'll soon be presented with a list of hundreds of names that you must not use for your own macros in a C++ program.
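The point about signatures can be sketched in a few lines. The type Fixed and its sqrt overload below are hypothetical, invented purely for illustration; only <math.h> and the double version of sqrt come from the library.

```cpp
#include <math.h>

// Hypothetical user type: a fixed-point number with 8 fractional bits.
struct Fixed {
    long raw;   // the value, scaled by 256
};

// A user overload of sqrt. The signature sqrt(Fixed) is not one the
// library defines, so it coexists with sqrt(double) from <math.h>.
Fixed sqrt(Fixed x)
{
    Fixed r;
    // The inner call has a double argument, so it selects the library version.
    r.raw = (long)(sqrt((double)x.raw / 256.0) * 256.0 + 0.5);
    return r;
}
```

If <math.h> were allowed to define a masking macro named sqrt, the macro would capture calls meant for the Fixed overload as well, which is presumably why such macros have to go.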
- The description of a library function doesn't always say what happens for a funny argument value, such as a null pointer or a pointer into Never Never Land. In such cases, the behavior of the library function is simply undefined.
- Functions that expect pointers into arrays have every right to expect suitable pointers. (But remember that any object of type T can be treated as an array of char, of size sizeof (T). And a pointer to any object can be treated as a pointer to an array of such objects, of size 1.)
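The parenthetical remark about arrays can be made concrete with a short sketch. The names sum, Point, and copy_bytes are hypothetical; memcpy is the usual function from <string.h>.

```cpp
#include <stddef.h>
#include <string.h>

// Hypothetical helper: adds up n ints starting at p, so it expects a
// pointer into an array of at least n elements.
long sum(const int *p, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += p[i];
    return total;
}

// Hypothetical record type.
struct Point {
    int x;
    int y;
};

// Copies one object by treating it as an array of sizeof (Point) chars.
Point copy_bytes(const Point& src)
{
    Point dst;
    memcpy(&dst, &src, sizeof (Point));
    return dst;
}
```

A call such as sum(&one, 1), where one is a lone int, is then a legitimate use of a single object as an array of size 1.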
- by providing a definition in the program for certain functions, such as ::operator new(size_t)
- by registering a handler function, as with a call to set_new_handler
- by overriding a virtual function in a class derived from a library class, such as streambuf::overflow(int)
- by overloading the "placement operator new" (::operator new(size_t, void *)) with a signature that can be called in its place, such as with ::operator new(size_t, myclass)
The C++ Standard handles this problem by making a generic distinction. Required behavior is what any version of the replaceable library function must provide. Default behavior is the particular flavor of required behavior you can count on from the version supplied by the library.
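As a rough illustration of the distinction, here is a sketch that overrides overflow in a class derived from streambuf. It is written against the library as it later settled (std::streambuf, with int_type in place of the draft's int), which is an assumption on top of the text above. The names CountingBuf and count_chars are hypothetical.

```cpp
#include <cstddef>
#include <ostream>
#include <streambuf>

// Hypothetical derived class: counts characters instead of storing them.
class CountingBuf : public std::streambuf {
public:
    CountingBuf() : count_(0) {}
    std::size_t count() const { return count_; }
protected:
    // Called when the put area is full. This class never sets one up,
    // so every character written arrives here.
    virtual int_type overflow(int_type c)
    {
        if (!traits_type::eq_int_type(c, traits_type::eof()))
            ++count_;
        // Required behavior: report success with a value other than eof.
        return traits_type::not_eof(c);
    }
private:
    std::size_t count_;
};

// Writes text through a CountingBuf and reports how many characters it saw.
std::size_t count_chars(const char *text)
{
    CountingBuf buf;
    std::ostream out(&buf);
    out << text;
    return buf.count();
}
```

The required behavior is the contract every version of overflow must honor (fail by returning eof, succeed by returning anything else). The default behavior is what the base class version happens to do, which in the library as later standardized is simply to fail.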
That's all well and good, but how do you talk about a class whose stored data can take so many forms? Descriptions can vary from horribly detailed to hopelessly abstract. Neither of these extremes meets the needs of a language standard.
The draft library standard addresses the problem by applying the now infamous "as if" rule from the C Standard. It describes each class in terms of a simple, even naive, version of its stored data. That makes many member functions much easier to describe. The library front matter then provides a general disclaimer. It reassures the reader that any alternative implementation is equally permissible, provided it appears from the outside as if it possesses the simple internal data structure.
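A toy example may help here. Both classes below are hypothetical and are not taken from the draft: DescribedText stores its data the naive way a standard might describe it, while ShippedText stores it differently but answers every question the same way.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical class as a standard might describe it:
// a pointer to the characters plus a length.
class DescribedText {
public:
    explicit DescribedText(const char *s)
        : len_(std::strlen(s)), ptr_(new char[len_ + 1])
    { std::memcpy(ptr_, s, len_ + 1); }
    ~DescribedText() { delete[] ptr_; }
    std::size_t length() const { return len_; }
    char at(std::size_t i) const { return ptr_[i]; }
private:
    DescribedText(const DescribedText&);             // copying is left out
    DescribedText& operator=(const DescribedText&);  // to keep the sketch short
    std::size_t len_;   // declared first, so it is initialized before ptr_
    char *ptr_;
};

// Hypothetical alternative implementation with different stored data.
class ShippedText {
public:
    explicit ShippedText(const char *s) : chars_(s, s + std::strlen(s)) {}
    std::size_t length() const { return chars_.size(); }
    char at(std::size_t i) const { return chars_[i]; }
private:
    std::vector<char> chars_;
};

// Compares two objects using only their public interfaces.
template <class A, class B>
bool same_from_outside(const A& a, const B& b)
{
    if (a.length() != b.length())
        return false;
    for (std::size_t i = 0; i < a.length(); ++i)
        if (a.at(i) != b.at(i))
            return false;
    return true;
}
```

A program that sticks to the public members cannot tell the two apart, which is all the "as if" rule asks.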
The library class string, for example, contains a number of operations with similar form:
    string& append(const string& str, size_t n = NPOS);
    string& assign(const string& str, size_t n = NPOS);
    string& insert(size_t pos, const string& str, size_t n = NPOS);
    string& remove(size_t pos, size_t n = NPOS);
    string& replace(size_t pos, size_t n1, const string& str, size_t n2 = NPOS);
In every case, the last argument can specify a length shorter than the actual length of the string argument. The constant NPOS is a huge value, which you get by default when you want to use the entire string. This definition lets you perform each of the string operations each of two different ways:
    s1.insert(3, s2);
    s1.insert(3, s2, 5);
The alternative is to provide a "thin" interface. Given the constructor:
    string(const string& str, size_t n = NPOS);
then each of the member functions above could be declared without the length option, as in:
    string& append(const string& str);
    string& assign(const string& str);
    string& insert(size_t pos, const string& str);
    string& remove(size_t pos);
    string& replace(size_t pos, size_t n1, const string& str);
The expressions above would then be written:
    s1.insert(3, s2);
    s1.insert(3, string(s2, 5));
You get the same functionality, but with a bit more notation in the source code, and probably more wheel spinning at run time.
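The two calling styles can be tried out with a small stand-in class. Text below is hypothetical: it wraps today's std::string and mimics only the draft-style constructor and insert discussed above, so nothing about it should be read as the draft's actual class.

```cpp
#include <cstddef>
#include <string>

// Stand-in for the draft's NPOS constant.
const std::size_t NPOS = (std::size_t)(-1);

// Hypothetical class with a draft-style interface.
class Text {
public:
    Text(const char *s) : rep_(s) {}
    // Draft-style constructor: copy at most n characters of str.
    Text(const Text& str, std::size_t n = NPOS) : rep_(str.rep_, 0, n) {}
    // "Thick" insert: the optional last argument limits how much of str is used.
    Text& insert(std::size_t pos, const Text& str, std::size_t n = NPOS)
    {
        rep_.insert(pos, str.rep_, 0, n);
        return *this;
    }
    const std::string& str() const { return rep_; }
private:
    std::string rep_;
};

// Thick style: the member function trims s2 itself.
Text thick(Text s1, const Text& s2)
{
    return s1.insert(3, s2, 5);
}

// Thin style: the caller builds a trimmed temporary first.
Text thin(Text s1, const Text& s2)
{
    return s1.insert(3, Text(s2, 5));
}
```

Both functions produce the same result. The thin style constructs and destroys an extra temporary along the way, which is the wheel spinning mentioned above.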
On the other hand, sometimes a default argument is not as cheap as it appears. An implementation might want to provide the shorter versions I showed immediately above, perhaps because they're smaller and faster and used far more often than the full-blown versions. (In that case, the original flavors would be written without a default value for the last argument, so the translator can determine which version to call.)
The draft library standard permits this sort of flexibility in implementation. It even allows an implementation to add default arguments not specified in the standard. What this latitude costs the user is a bit of uncertainty. Taking the address of a library member function is now potentially ambiguous. (If you must do so, write a wrapper function that calls the one you want, then take the address of the wrapper.)
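The wrapper trick might look like the following sketch. Buf, append_all, Appender, and apply_twice are all hypothetical names. Buf stands in for a library class whose documented interface is a single append with a defaulted length, but whose implementation has split it into two overloads.

```cpp
#include <cstddef>
#include <string>

// Hypothetical library class. Its documented interface is
//     Buf& append(const std::string& str, std::size_t n = NPOS);
// but this implementation supplies two overloads instead.
class Buf {
public:
    Buf& append(const std::string& str)
    {
        text_ += str;
        return *this;
    }
    Buf& append(const std::string& str, std::size_t n)
    {
        text_.append(str, 0, n);
        return *this;
    }
    const std::string& text() const { return text_; }
private:
    std::string text_;
};

// Wrapper with exactly one signature. Its address is unambiguous
// whichever way the implementation chose to declare append.
Buf& append_all(Buf& b, const std::string& str)
{
    return b.append(str);
}

typedef Buf& (*Appender)(Buf&, const std::string&);

// Calls f twice through its address and returns the accumulated text.
std::string apply_twice(Appender f, const std::string& s)
{
    Buf b;
    f(b, s);
    f(b, s);
    return b.text();
}
```

A default argument is not part of a function's type, so a program that writes &Buf::append cannot know in advance whether a one-parameter member exists to point at. The wrapper hides that uncertainty behind a name the program controls.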
This latitude is limited, however. An implementation must not overload member functions to avoid other kinds of conversions. Consider, for example, the effect of adding:
    string& insert(size_t pos, const char *s);
One constructor for string converts a const char * to a string, on the assumption that it points at a null-terminated string. It would seem that adding this member function would make for a potentially better implementation. Expressions of the form:
    s1.insert(3, " + ");
would no longer generate a temporary string object along the way.
True enough. But where you cause trouble is in user-defined classes that try to be conscientious. Let's say that your string class mystring offers conversions to both string and const char *. The first conversion lets you mix your flavor of strings with those from the library. The second helps you with the occasional call to the functions in <string.h>. All is well until you move to a helpful implementation with this added member function. Suddenly, you start getting ambiguities reported by the translator. Not so good.
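The ambiguity is easy to reproduce in miniature. In this sketch MyString, Plain, and Helpful are hypothetical classes, and today's std::string plays the part of the library string. Plain offers only the specified insert, while Helpful adds the const char * overload.

```cpp
#include <cstddef>
#include <string>

// Hypothetical user class offering both conversions described above.
class MyString {
public:
    MyString(const char *s) : rep_(s) {}
    operator std::string() const { return rep_; }
    operator const char *() const { return rep_.c_str(); }
private:
    std::string rep_;
};

// Hypothetical library class with only the specified member function.
class Plain {
public:
    Plain(const char *s) : text_(s) {}
    Plain& insert(std::size_t pos, const std::string& str)
    {
        text_.insert(pos, str);
        return *this;
    }
    const std::string& text() const { return text_; }
private:
    std::string text_;
};

// The same class from a "helpful" implementation that adds an overload.
class Helpful {
public:
    Helpful(const char *s) : text_(s) {}
    Helpful& insert(std::size_t pos, const std::string& str)
    {
        text_.insert(pos, str);
        return *this;
    }
    Helpful& insert(std::size_t pos, const char *s)
    {
        text_.insert(pos, s);
        return *this;
    }
    const std::string& text() const { return text_; }
private:
    std::string text_;
};

std::string with_plain(const MyString& m)
{
    Plain p("ab");
    p.insert(1, m);     // fine: only the conversion to std::string applies
    return p.text();
}

std::string with_helpful(const MyString& m)
{
    Helpful h("ab");
    // h.insert(1, m);  // rejected as ambiguous: both conversions fit equally well
    h.insert(1, static_cast<const char *>(m));  // the caller must now choose
    return h.text();
}
```

Code that compiled cleanly against Plain needs an explicit cast to compile against Helpful, even though the user's class did nothing wrong.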
So the latitude granted implementors is specifically limited to playing games with default arguments. That still helps the standard offer a thicker interface without specifying so many different functions. And it lets implementors decide how best to offer the required services.
By the way, the same latitude does not extend to global functions in the library. There are relatively few of these, and many of them are serious candidates for having their addresses taken on a regular basis.
I find it significant that it takes nine typeset pages of text to spell out such matters. And that it has taken the joint committee four years to get to where it can start worrying about such matters. That, more than anything else, should tell you how desperately the world needs a clear standard for the C++ library.
[1] ISO/IEC 9899:1990, International Standard for Programming Language C.
[2] P.J. Plauger, The Standard C Library, Prentice-Hall, 1992.