C/C++

Sutter's Mill

By Herb Sutter, January 01, 2001

If you are basing your selection of standard containers on memory requirements, then Herb has some bad news and some good news.

Next, consider what happens in the real world under the following assumptions, which happen to be drawn from a popular platform:

Pointers and ints are four bytes long (typical for 32-bit platforms).
sizeof(string) is 16. Note that this is just the size of the immediate string object and ignores any data buffers the string may itself allocate; the number and size of string’s internal buffers will vary from implementation to implementation, but doesn’t affect the comparative results below. (This is more variable; I took the value of one popular implementation.)
The default memory allocation strategy is to use fixed-size allocation where the block sizes are multiples of 16 bytes. (Typical for Microsoft Visual C++.) Table 3 contains a sample analysis with these numbers.

Table 3: Same actual overhead per contained object (implementation-dependent assumptions: sizeof(string) == 16, 4-byte pointers and ints, and 16-byte fixed-size allocation blocks)

You can try this at home; just plug in the appropriate numbers for your platform to see how this kind of analysis applies to your own current environment. To see how to write a program that figures out what the actual block overhead is for allocations of specific sizes on your platform, see Jon Bentley’s classic Programming Pearls, Second edition [2].

Looking at Table 3, we immediately spy one interesting result: For many cases — that is, for about 75% of possible sizes of the contained type T — list and set/multiset actually incur the same memory overhead in this particular environment. What’s more, here’s an even more interesting result: even list<char> and set<int> have the same actual overhead in this particular environment, even though the latter stores more object data and more housekeeping data in each node. If memory footprint is an important consideration for your choice of data structure in specific situations, take a few minutes to do this kind of analysis and see what the difference really is in your own environment — sometimes it’s less than you might think!

Summary

Each kind of container chooses a different space/performance tradeoff. You can do things efficiently with vector and set that you can’t do with list, such as O(log N) searching [3]; you can do things efficiently with vector that you can’t do with list or set, such as random element access; you can do things efficiently with list, less so with set, and more slowly still with vector, such as insertion in the middle; and so on. To get more flexibility often requires more storage overhead inside the container, but after you account for data alignment and memory allocation strategies, the difference in overhead may be significantly different than you’d think! For related discussion about data alignment and space optimizations, see also Item 30 in Exceptional C++ [4].

Acknowledgments

Thanks to Pete Becker for the discussion that got me thinking about this topic.

References

[1] L. Marrie. "Alternating Skip Lists," Dr. Dobbs Journal, 25(8), August 2000.

[2] Jon Bentley. Programming Pearls, Second Edition (Addison-Wesley, 2000).

[3] This is possible if the vector’s contents are sorted.

[4] Herb Sutter. Exceptional C++, 47 Engineering Puzzles, Programming Problems, and Solutions (Addison-Wesley, 2000).

[5] Herb Sutter. “Standard Library News, Part 1: Vectors and Deques,” C++ Report, 11(7), July/August 1999.

[6] Herb Sutter. “When is a Container not a Container?” C++ Report, 11(5), May 1999.

Standards Update

Breaking news at press time: On Friday, October 27, 2000 at the conclusion of the Toronto meeting, the C++ standards committee approved two important milestones:

1. Approved the contents of the C++ Standard’s first TC (Technical Corrigendum). The vote passed unanimously, and this material will become the first official update to the ISO/ANSI C++ Standard pending only a few more months of grinding through the routine publication mechanics and paperwork that the standards bodies require.

One of the interesting changes in the TC is that the Standard will now guarantee that vector storage is contiguous (except of course for the specialization vector<bool>), where "contiguous" means to be stored in the same way as a C-style array; see Figure 2.

Figure 2: What "contiguous" means — typical in-memory layout of an array of n-byte structs that require m-byte alignment

One reason it is important that vector be stored contiguously is so that it can be used easily as a complete drop-in replacement for C-style arrays, even when calling legacy facilities designed to operate on plain arrays; for more details, see my July/August 1999 column in C++ Report [5]. If you’re wondering why vector<bool> would get a seemingly surprising exception from this rule, see also my May 1999 column in C++ Report [6] for the scoop on that juicy little eccentricity.

2. Approved initiation of work toward a Library Extensions Technical Report. The LWG (Library Working Group) and the full committee agree that it’s time we start considering extensions to the standard library; of course, any such extensions won’t appear in an official standard for probably at least three years yet, if not more, but the point is that there are things that the community wants/needs and that we ought to start working into the standard library. Commonly requested items include things like hash-based containers, regular expressions, smart pointers, and expression templates, among other facilities.

Between now and the next meeting (Copenhagen, April 29 - May 4, 2001) we will be drafting an official ISO request for a New Work Item, which essentially translates to "a request for ISO’s blessing/authorization to do this work." I fully expect this request to be drafted and approved by/at the Copenhagen meeting; and after a few more months’ worth of bureaucratic machinery we should be officially in business by summer. Of course, some people have already been starting to work on such facilities in anticipation of this approval; if you haven’t checked out Boost yet, be sure to do so at www.boost.org.

Herb Sutter ([email protected]) is chief technology officer of PeerDirect Inc. and the architect of their heterogeneous database replication products. He is secretary of the ISO/ANSI C++ standards committees, a member of the ANSI SQL committee, and author of the book Exceptional C++ (Addison-Wesley). Copyright © 2000 by Herb Sutter

Previous 1 2 3 4 5 6 7 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

C/C++