Next, consider what happens in the real world under the following assumptions, which happen to be drawn from a popular platform:
- Pointers and
int
s are four bytes long (typical for 32-bit platforms). -
sizeof(string)
is 16. Note that this is just the size of the immediatestring
object and ignores any data buffers thestring
may itself allocate; the number and size ofstring
s internal buffers will vary from implementation to implementation, but doesnt affect the comparative results below. (This is more variable; I took the value of one popular implementation.) - The default memory allocation strategy is to use fixed-size allocation where the block sizes are multiples of 16 bytes. (Typical for Microsoft Visual C++.) Table 3 contains a sample analysis with these numbers.

Table 3: Same actual overhead per contained object (implementation-dependent assumptions: sizeof(string) == 16, 4-byte pointers and ints, and 16-byte fixed-size allocation blocks)
You can try this at home; just plug in the appropriate numbers for your platform to see how this kind of analysis applies to your own current environment. To see how to write a program that figures out what the actual block overhead is for allocations of specific sizes on your platform, see Jon Bentleys classic Programming Pearls, Second edition [2].
Looking at Table 3, we immediately spy one interesting
result: For many cases that is, for about 75% of possible sizes of the
contained type T
list
and set/multiset
actually
incur the same memory overhead in this particular environment. Whats more,
heres an even more interesting result: even list<char>
and
set<int>
have the same actual overhead in this particular environment,
even though the latter stores more object data and more housekeeping data in
each node. If memory footprint is an important consideration for your choice
of data structure in specific situations, take a few minutes to do this kind
of analysis and see what the difference really is in your own environment
sometimes its less than you might think!
Summary
Each kind of container chooses a different space/performance tradeoff. You can do things efficiently with vector
and set
that you cant do with list
, such as O(log N) searching [3]; you can do things efficiently with vector
that you cant do with list
or set
, such as random element access; you can do things efficiently with list
, less so with set
, and more slowly still with vector
, such as insertion in the middle; and so on. To get more flexibility often requires more storage overhead inside the container, but after you account for data alignment and memory allocation strategies, the difference in overhead may be significantly different than youd think! For related discussion about data alignment and space optimizations, see also Item 30 in Exceptional C++ [4].
Acknowledgments
Thanks to Pete Becker for the discussion that got me thinking about this topic.
References
[1] L. Marrie. "Alternating Skip Lists," Dr. Dobbs Journal, 25(8), August 2000.
[2] Jon Bentley. Programming Pearls, Second Edition (Addison-Wesley, 2000).
[3] This is possible if the vector
s contents are sorted.
[4] Herb Sutter. Exceptional C++, 47 Engineering Puzzles, Programming Problems, and Solutions (Addison-Wesley, 2000).
[5] Herb Sutter. Standard Library News, Part 1: Vectors and Deques, C++ Report, 11(7), July/August 1999.
[6] Herb Sutter. When is a Container not a Container? C++ Report, 11(5), May 1999.
Standards Update
Breaking news at press time: On Friday, October 27, 2000 at the conclusion of the Toronto meeting, the C++ standards committee approved two important milestones:
1. Approved the contents of the C++ Standards first TC (Technical Corrigendum). The vote passed unanimously, and this material will become the first official update to the ISO/ANSI C++ Standard pending only a few more months of grinding through the routine publication mechanics and paperwork that the standards bodies require.
One of the interesting changes in the TC is that the Standard will now guarantee that vector
storage is contiguous (except of course for the specialization vector<bool>
), where "contiguous" means to be stored in the same way as a C-style array; see Figure 2.
Figure 2: What "contiguous" means typical in-memory layout of an array of n-byte structs that require m-byte alignment
One reason it is important that vector
be stored contiguously is so that it can be used easily as a complete drop-in replacement for C-style arrays, even when calling legacy facilities designed to operate on plain arrays; for more details, see my July/August 1999 column in C++ Report [5]. If youre wondering why vector<bool>
would get a seemingly surprising exception from this rule, see also my May 1999 column in C++ Report [6] for the scoop on that juicy little eccentricity.
2. Approved initiation of work toward a Library Extensions Technical Report. The LWG (Library Working Group) and the full committee agree that its time we start considering extensions to the standard library; of course, any such extensions wont appear in an official standard for probably at least three years yet, if not more, but the point is that there are things that the community wants/needs and that we ought to start working into the standard library. Commonly requested items include things like hash-based containers, regular expressions, smart pointers, and expression templates, among other facilities.
Between now and the next meeting (Copenhagen, April 29 - May 4, 2001) we will be drafting an official ISO request for a New Work Item, which essentially translates to "a request for ISOs blessing/authorization to do this work." I fully expect this request to be drafted and approved by/at the Copenhagen meeting; and after a few more months worth of bureaucratic machinery we should be officially in business by summer. Of course, some people have already been starting to work on such facilities in anticipation of this approval; if you havent checked out Boost yet, be sure to do so at www.boost.org
.
Herb Sutter ([email protected]) is chief technology officer of PeerDirect Inc. and the architect of their heterogeneous database replication products. He is secretary of the ISO/ANSI C++ standards committees, a member of the ANSI SQL committee, and author of the book Exceptional C++ (Addison-Wesley). Copyright © 2000 by Herb Sutter