The New C++ Not-So-Standard Library

By Pete Becker, June 01, 2005

In the first installment of this new column, Pete Becker examines the ISO standardization process and how it applies to the C++ Standard and TR1, as well as the templates and classes that make up TR1.

June, 2005: The New C++ Not-So-Standard Library

Pete Becker is a software developer employed by Dinkumware Ltd., where he works on standard library implementation and documentation for C, C++, and Java. He is project editor for the C++ Standard, and currently writing a book about TR1. He can be contacted at [email protected].

TR1, the Technical Report on C++ Library Extensions, provides a grab bag of templates, classes, and functions that supplement those currently in the C++ Standard Library. Some are more powerful or more flexible versions of existing templates, some are analogs of existing templates with various new properties, some bring the library more in line with the 1999 changes to the Standard C Library, and some provide wholly new features. The TR1 Library is not, however, part of the C++ Standard, so compilers that conform to the Standard are not required to provide it.

TR1 extends the Standard C++ Library with new facilities in these areas:

Utilities. A reference-counted smart pointer; a class template, tuple, that generalizes the std::pair class template to handle various numbers of arguments.
Function Objects. More powerful and more flexible templates to provide wrappers around functions, member functions, and other function objects, allowing them to be used more easily as arguments to algorithms.
Type Traits. A set of templates that extract properties of types or modify types at compile time, simplifying template metaprogramming.
Numerics. A rich set of random number generators; advanced mathematical special functions.
Containers. Sets and maps implemented with hash tables, a fixed-size array.
Regular Expressions. Classes and functions to describe and search for patterns in text.
C Compatibility. Numeric facilities, types, functions, and macros such as those added to the C language in 1995 and 1999.

This column will examine the ISO standardization process and how it applies to the C++ Standard and TR1. We'll then look at the templates and classes that make up TR1. Over the next 15 months or so, those templates and classes will be discussed in more detail.

The ISO Standardization Process

ISO technical standards are written by committees of volunteers who meet regularly, typically for several years. For C++, this committee is ISO Working Group 21, which is part of Subcommittee 22 of ISO's Joint Technical Committee 1. In "standardese" that's JTC1/SC22/WG21, or WG21 for short. In addition, there's the American National Standards Institute's J11, which has become less important over the years, as the focus within the standards committee shifted from creating an American standard that would then become an international standard to directly creating an international standard [1].

Working Group decisions are made by consensus. That means that there must be general agreement on a technical decision; the vote of a bare majority is not sufficient. In WG21, most of the work of developing a consensus is done in subgroups. Part of the responsibility of the subgroup chairs is to make sure that there is broad agreement on matters that will be brought before the committee for a vote. Most of the time, if something is controversial, it's not ready for a decision. Further discussion has usually led to deeper technical insights, and, in turn, to resolving the controversy.

Work on the first C++ Standard began in 1991, and it was approved by ISO in 1998 [2]. Like most technical specifications, it had some errors—some merely typographical, some where the text wasn't clear enough, and some where the text was simply wrong. Even when the text was clear and correct, there were places where we later realized that a simple change could make the language or the library more powerful, easier to use, or both. ISO has formal procedures for addressing both of these forms of problems. Errors in standards are addressed through Defect Reports (DRs); enhancements are typically addressed through a Technical Corrigendum (TC) or a revised standard.

As the name suggests, when someone tells the standards committee that there's a defect in the standard, the result is a Defect Report. ISO has stringent rules for tracking and responding to DRs; every DR must get a response, even if it's only the minimal NAD (that's our abbreviation for "Not A Defect"). Even then, we usually try to explain why we think it's not a problem. There are documents available on the WG21 web site that give the status of all the DRs that have been submitted to the standards committee [3].

In the Library Working Group, we adopted a rather narrow definition of a defect; if the words in the standard said what we meant them to say, and what they said was technically feasible, we wouldn't change the standard. That's not to say that such a change didn't have merit, just that it wasn't something we were willing to do in the form of a DR. We felt that such substantive changes should be made in a subsequent major revision of the standard.

In 2003, the standards committee issued a revised standard that included all of the changes made in response to DRs up to that time [4]. That's a bit unusual; the more common approach is to issue a TC that lists the changes made to the existing standard. For the C++ Standard, this list would have been quite long, and we decided it would be better to integrate the changes and issue a revised Standard.

Also in 2003, we started working on the next revision of the C++ Standard. That's a long-term project, with no set completion date and, as yet, no established list of new features. That's all okay, though, because ISO rules require at least five years between revisions of a standard, so there's plenty of time remaining before we're allowed to change the 2003 version of the Standard.

Five years may seem like a long time, especially if you've bought into the hype about working in Internet time. But we have to live in the real world, and for programming languages, that means we need to allow time for compilers to catch up with new language requirements, for libraries to be written to match the new specifications, and for users, implementors, and educators to understand the new standard. Only then can we evaluate its strengths and weaknesses, and focus properly on what needs to be changed and what needs to be added [5]. Making changes too soon runs the risk that the changes won't be right, necessitating further changes.

This delay doesn't mean the language has to stagnate, though. The standards committee can write Technical Reports that, among other things, indicate possible future directions for the standard. However, a TR does not change the standard. If your compiler comes with an implementation of TR1, use it. But be aware that if you change compilers, your code may no longer work; you can complain to your new compiler vendor that they don't support TR1, but you cannot, on that basis, complain that they do not conform to the C++ Standard.

Utilities

The class template shared_ptr<Ty> provides a nonintrusive reference-counted smart pointer to a resource of type Ty. When the last shared_ptr object that owns a given resource is destroyed, the resource will be destroyed. In some situations, this can simplify resource management. For example, programmers often want to create containers that hold pointers to resources. The containers in the Standard Library weren't designed to do that. If you create a vector<int*>, you can populate it with int objects allocated off the heap. However, when the container is destroyed, the memory for those objects won't be freed, and if you haven't provided some way of deleting those objects, you've got a memory leak. With a vector<shared_ptr<int> >, however, if you don't have any other shared_ptr objects that own the resources, when the container is destroyed the resources will also be destroyed. For example, Listing 1 produces this output:

  creating one object
   created object at: 00320A38
  calling use_vector
  creating two objects
   created object at: 00320AF8
   created object at: 00320AE8
   destroyed object at: 00320AF8
   destroyed object at: 00320AE8
  returned from use_vector
   destroyed object at: 00320A38

The instrumented objects created in the function use_vector are destroyed when the vector object is destroyed. The object created in main is not destroyed until main returns, even though a copy of its pointer was put into the vector.

The class template tuple is a generalization of the template std::pair. It lets you create a type with an arbitrary [6] number of contained objects of various types. As with std::pair, there is also a function template, make_tuple, that creates a tuple object that holds copies of the arguments passed to the function. As you can see in Listing 2, the syntax for getting at the members of a tuple object is rather different from the syntax for getting at the members of a pair object. With a pair object, you use the member names first and second; this doesn't extend very well to arbitrary numbers of members, so with a tuple object, you pass the tuple object as an argument to the function template get<n>, where n designates the element that you are interested in. If it's done right, the compiler can generate the code for get<n> inline, and the resulting code is the same as the code that would be generated to access a member of a struct by name.

The class template reference_wrapper, as its name suggests, provides a wrapper type that acts pretty much like a reference to another object. The big difference between a reference_wrapper object and a reference is that reference_wrapper objects can be copied and assigned to. As with tuple and pair, there are functions to create reference_wrapper objects that refer to their argument. The function call ref(x) returns an object of type reference_wrapper<Ty>, where Ty is the type of the function argument x. The object acts like a reference to x. The function call cref(x) is similar, but it returns an object of type reference_wrapper<const Ty>. In Listing 3, note that assigning to an ordinary reference changes the value of the object to which the reference points. Assigning to a reference_wrapper changes the reference_wrapper object to refer to the new object. If you need an actual reference to the object referred to by a referece_wrapper, there is a conversion operator, or you can use the member function get.

Function Objects

TR1 provides four new function objects. I've already looked at one of them; reference_wrapper can be called with an appropriate argument list when it holds a reference to a callable object; see Listing 4.

The class template function provides a wrapper type with a function call operator specified by its template argument. Thus, function<float(float, float)> is a type with a function call operator that can be called with two arguments of type float and returns a float. In itself, that's not a big deal; the real strength of the function template is that it can hold any callable object that can be called with arguments of the specified types and whose return type can be converted to the specified return type; see Listing 5.

TR1 doesn't specify the details of the types of the other two new function objects. Instead, it specifies template functions that return objects of these new types. You can't name the type in your code, so you use these as function objects that you pass to template functions.

The function bind, like std::bind1st and std::bind2nd, creates a function object that can take a different number of arguments in a different order from the callable object that it wraps. When you call bind, you call it with an argument list consisting of values and placeholders [7]. The function returns an object with a function call operator; when you call the function call operator, the values in the original call are passed to the wrapped object, and the placeholders are replaced by the values passed in the call to the function call operator; see Listing 6.

The function mem_fn provides a generalized wrapper for pointers to member functions. The first argument to the resulting function object is the object that the member function should be applied to. It can be the name of an object, a reference to an object, a pointer to an object, or a smart pointer to an object. The remaining arguments are the named arguments to the member function; see Listing 7. The other three kinds of function objects can also hold pointers to member functions and treat their argument lists the same way as mem_fn.

Type Traits

Sometimes in template programming you need to look at the properties of a type that was passed as a type argument. TR1 provides a set of "type predicates" that tell your code whether their type argument is an integral type, a floating-point type, and so on, or whether their argument has, for example, a const or a volatile qualifier. It also provides binary type predicates that tell your code whether two types are the same, whether one type is convertible to another, and whether one type is a base class of another type. You can use "type transformations" to create new types that are related to a given type by adding or removing const or volatile qualifiers, getting the type that a pointer points to, and so on. While many of the things that can be done with type traits can also be done with function overloading, type traits give you compile-time results that can often be used to simplify coding in ways that overloading can't. The trick in Listing 8 is that type predicates are derived from true_type when the predicate is true, or from false_type when the predicate is false. Thus, the function copy calls the version of do_copy that uses memcpy when it is called with pointers to a type that has a trivial assignment operator; otherwise, it calls the version that copies each element separately.

Numerics

TR1 adds 23 mathematical special functions, each with float, double, and long double overloads, for computing irregular modified cylindrical Bessel functions, confluent hypergeometric functions, elliptical integrals of the second kind, and so on. If you don't know what these names mean, you don't need them. If you write code for scientific or engineering applications, you've probably used some or all of them.

If you write code that uses random number generators, TR1 will interest you. It has four class templates that define uniform random number generators. The template linear_congruential implements the most common form of generator, which uses the recurrence relation x(i)=(A*x(i-1)+C) mod M. The template mersenne_twister implements a more complex generator that usually provides random numbers with better distribution characteristics. The templates subtract_with_carry and subtract_with_carry_01 use a simple subtraction algorithm to generate fairly good random numbers; the first generates integral values and the second generates floating-point values.

All of these generators produce values that are uniformly distributed; that is, the probability of getting a value in any given subrange of all the generator's possible values is the same as that of getting a value in any other subrange of the same size. Sometimes you need a different distribution. For example, if you toss a coin 30 times, you'll get roughly 15 heads and 15 tails. The actual number you get if you repeat those 30 tosses many times fits a binomial distribution. If you're writing code to simulate sets of 30 coin tosses, you can do each toss separately, or you can save time by generating random numbers in a binomial distribution. TR1 provides nine templates that implement various distributions; you can combine any of the four generators (or a generator of your own) with any of the nine distributions (or a distribution of your own); see Listing 9.

Containers

TR1 adds four templates, named unordered_set, unordered_map, unordered_multiset, and unordered_multiset, that meet most of the requirements for associative containers. These are designed to use a hash table internally, providing constant-time access to elements in most cases. It also adds a template named array that is a fixed-size array and satisfies most of the requirements for a sequence container. Like a built-in array object, an array<Ty, N> object can be initialized with an aggregate initializer. Listing 9 uses an array<int, 31> to count the number of trials with a given number of heads.

Regular Expressions

Nearly a quarter of the text in TR1 is devoted to regular expressions. A "regular expression" is a sequence of characters that can match one or more target sequences of characters according to a regular expression grammar. In its simplest form, you use the class regex to store a regular expression, and pass the regex object to one of the three functions regex_match, regex_search, and regex_replace. These functions all take a regex object and a target character sequence. The function regex_match checks whether the regular expression stored in the regex object matches the entire target sequence; regex_search finds the leftmost, longest match in the target character sequence; and regex_replace replaces every matching subsequence with replacement text given by a format string. You can also create two kinds of iterators that let you step through the subsequences of the target sequence that match the regular expression; see Listing 10.

C Compatibility

The C++ Standard was based on the 1990 C Standard [8]. The C Standard was amended in 1995 [9] and revised in 1999 [10]. TR1 adopts the changes that these revisions made to the library, modified somewhat to take advantage of C++ language features. It provides low-level control of floating-point operations, including things such as rounding mode and trap handling [11]. It also adds several functions that provide inverse trigonemetric functions for complex arguments and support for fixed-width integral types, including format specifiers for printf and scanf.

TR1 also matches the changes to the C Standard by adding function overloads that follow the Fortran argument promotion rules for most of the math functions. Currently, a call to the function atan2 with one argument of type float and one argument of type double is ambiguous: All three of its variants, atan2(float, float), atan2(double, double), and atan2(long double, long double), are equally good matches. Under the new rules, atan2(double, double) is the best match. Generally speaking, integral arguments are promoted to double, then all arguments are promoted to the largest floating-point type in the resulting list of argument types. This does what most people would expect without the need for casts to unify the argument types.

Coming Up

Next time, I'll look at the TR1 header <tuple> and the changes that TR1 makes to the standard header <utility> to make std::pair look more like tuple.

References

[1] Pretty much the only remaining vestige of the original dual structure is the requirement for two votes on every technical issue, the first by X3J11 to advise the U.S. representative to ISO on how to vote, followed immediately by the WG21 vote to make the actual decision.
[2] ISO/IEC Standard 14882:1998 (Geneva: International Standards Organization, 1998).
[3] See http:///www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html and its internal links for the core language defect reports, and http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html for the library defect reports.
[4] ISO/IEC Standard 14882:2003 (Geneva: International Standards Organization, 2003).
[5] Hypothetically, things can be removed, too. The usual approach to this is for a revision of the standard to mark those things as "deprecated," which serves as a warning that they might be removed in a future standard. However, even if something is removed from the standard, there will be code that relies on it, and resulting pressure on implementors from users to keep it.
[6] Where "arbitrary" means 10 or fewer. TR1 requires implementations of tuple to support up to 10 members. Implementors are permitted, of course, to allow more.
[7] There's more to it than this, but there is limited space in this column.
[8] ISO/IEC Standard 9899:1990 (Geneva: International Standards Organization, 1990).
[9] ISO/IEC Amendment 1 to Standard 9899:1990 (Geneva: International Standards Organization, 1995).
[10] ISO/IEC Standard 9899:1999 (Geneva: International Standards Organization, 1999).
[11] The floating-point control functions are designed to support the operations required by IEC 60559:1999, Binary floating-point arithmetic for microprocessor systems (Geneva: International Electrotechnical Commission, 1989) (formerly IEC 559:1989).

1 2 3 4 5 6 7 8 9 10 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

The New C++ Not-So-Standard Library

The ISO Standardization Process

Utilities

Function Objects

Type Traits

Numerics

Containers

Regular Expressions

C Compatibility

Coming Up

References

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

The New C++ Not-So-Standard Library

The ISO Standardization Process

Utilities

Function Objects

Type Traits

Numerics

Containers

Regular Expressions

C Compatibility

Coming Up

References

Related Reading

News

Commentary

Slideshow

Video

Most Popular

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content