Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

C/C++

Sutter's Mill


Next, consider what happens in the real world under the following assumptions, which happen to be drawn from a popular platform:

  • Pointers and ints are four bytes long (typical for 32-bit platforms).
  • sizeof(string) is 16. Note that this is just the size of the immediate string object and ignores any data buffers the string may itself allocate; the number and size of string’s internal buffers will vary from implementation to implementation, but doesn’t affect the comparative results below. (This is more variable; I took the value of one popular implementation.)
  • The default memory allocation strategy is to use fixed-size allocation where the block sizes are multiples of 16 bytes. (Typical for Microsoft Visual C++.) Table 3 contains a sample analysis with these numbers.


Table 3: Same actual overhead per contained object (implementation-dependent assumptions: sizeof(string) == 16, 4-byte pointers and ints, and 16-byte fixed-size allocation blocks)

You can try this at home; just plug in the appropriate numbers for your platform to see how this kind of analysis applies to your own current environment. To see how to write a program that figures out what the actual block overhead is for allocations of specific sizes on your platform, see Jon Bentley’s classic Programming Pearls, Second edition [2].

Looking at Table 3, we immediately spy one interesting result: For many cases — that is, for about 75% of possible sizes of the contained type Tlist and set/multiset actually incur the same memory overhead in this particular environment. What’s more, here’s an even more interesting result: even list<char> and set<int> have the same actual overhead in this particular environment, even though the latter stores more object data and more housekeeping data in each node. If memory footprint is an important consideration for your choice of data structure in specific situations, take a few minutes to do this kind of analysis and see what the difference really is in your own environment — sometimes it’s less than you might think!

Summary

Each kind of container chooses a different space/performance tradeoff. You can do things efficiently with vector and set that you can’t do with list, such as O(log N) searching [3]; you can do things efficiently with vector that you can’t do with list or set, such as random element access; you can do things efficiently with list, less so with set, and more slowly still with vector, such as insertion in the middle; and so on. To get more flexibility often requires more storage overhead inside the container, but after you account for data alignment and memory allocation strategies, the difference in overhead may be significantly different than you’d think! For related discussion about data alignment and space optimizations, see also Item 30 in Exceptional C++ [4].

Acknowledgments

Thanks to Pete Becker for the discussion that got me thinking about this topic.

References

[1] L. Marrie. "Alternating Skip Lists," Dr. Dobbs Journal, 25(8), August 2000.

[2] Jon Bentley. Programming Pearls, Second Edition (Addison-Wesley, 2000).

[3] This is possible if the vector’s contents are sorted.

[4] Herb Sutter. Exceptional C++, 47 Engineering Puzzles, Programming Problems, and Solutions (Addison-Wesley, 2000).

[5] Herb Sutter. “Standard Library News, Part 1: Vectors and Deques,” C++ Report, 11(7), July/August 1999.

[6] Herb Sutter. “When is a Container not a Container?” C++ Report, 11(5), May 1999.


Standards Update


Breaking news at press time: On Friday, October 27, 2000 at the conclusion of the Toronto meeting, the C++ standards committee approved two important milestones:

1. Approved the contents of the C++ Standard’s first TC (Technical Corrigendum). The vote passed unanimously, and this material will become the first official update to the ISO/ANSI C++ Standard pending only a few more months of grinding through the routine publication mechanics and paperwork that the standards bodies require.

One of the interesting changes in the TC is that the Standard will now guarantee that vector storage is contiguous (except of course for the specialization vector<bool>), where "contiguous" means to be stored in the same way as a C-style array; see Figure 2.

Figure 2: What "contiguous" means — typical in-memory layout of an array of n-byte structs that require m-byte alignment

One reason it is important that vector be stored contiguously is so that it can be used easily as a complete drop-in replacement for C-style arrays, even when calling legacy facilities designed to operate on plain arrays; for more details, see my July/August 1999 column in C++ Report [5]. If you’re wondering why vector<bool> would get a seemingly surprising exception from this rule, see also my May 1999 column in C++ Report [6] for the scoop on that juicy little eccentricity.

2. Approved initiation of work toward a Library Extensions Technical Report. The LWG (Library Working Group) and the full committee agree that it’s time we start considering extensions to the standard library; of course, any such extensions won’t appear in an official standard for probably at least three years yet, if not more, but the point is that there are things that the community wants/needs and that we ought to start working into the standard library. Commonly requested items include things like hash-based containers, regular expressions, smart pointers, and expression templates, among other facilities.

Between now and the next meeting (Copenhagen, April 29 - May 4, 2001) we will be drafting an official ISO request for a New Work Item, which essentially translates to "a request for ISO’s blessing/authorization to do this work." I fully expect this request to be drafted and approved by/at the Copenhagen meeting; and after a few more months’ worth of bureaucratic machinery we should be officially in business by summer. Of course, some people have already been starting to work on such facilities in anticipation of this approval; if you haven’t checked out Boost yet, be sure to do so at www.boost.org.


Herb Sutter ([email protected]) is chief technology officer of PeerDirect Inc. and the architect of their heterogeneous database replication products. He is secretary of the ISO/ANSI C++ standards committees, a member of the ANSI SQL committee, and author of the book Exceptional C++ (Addison-Wesley). Copyright © 2000 by Herb Sutter


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.