Channels ▼

Mark Nelson

Dr. Dobb's Bloggers

C++11 Hash Containers and Debug Mode

November 29, 2011

Microsoft has never been a slacker in the C++ department — it has always worked hard to provide a top-notch, compliant product. Visual Studio 10 supports its current incarnation, and for the most part it is up to Microsoft's usual standards. It's a great development environment, and I am a dedicated user, but I have to give Microsoft a demerit in one area: Its C++11 hash containers have some serious performance problems — so much that the debug versions of the containers may well be unusable in your application.

Background

I first noticed the problem with unordered_map when I was working on the code for my updated LZW article. I found that when running in the debugger, my program would hang after exiting the compression routine. A little debugging showed that the destructor for my hash table was taking a long time to run. And by a long time, I mean it was approaching an hour.

Destroying a hash table didn't seem to be a complicated task, so I decided to see if I could come up with a reasonable benchmark. I wrote a test program that does a simple word frequency count. As a starter data set, I used the first one million whitespace-delimited words in the 2010 CIA factbook, as published by Project Gutenberg. This data set yields 74,208 unique tokens.

I wrote a simple test rig that I used to test the word count program using four different containers:

  • unordered_map indexed by std::string
  • unordered_map indexed by std::string *
  • map indexed by std::string
  • map indexed by std::string *

Testing with std::string * reduces the cost of copying strings into the hash table as it was filled, and then reduces the cost of destroying those strings when the table was destroyed.

I ran tests against map, expecting to see a pretty big difference in performance. Because map is normally implemented using a balanced binary tree structure, it has O(log(N)) performance on insertions. A sparsely populated hash table can have O(1) performance. By using fairly large data sets, I expected to see a big difference between the two.

I tried to eliminate a few obvious sources for error in my test function — and I used a template function so that I could use the same code on all the different container types:

template<class CONTAINER, class DATA>
void test( const DATA &data, const char *test_name )
{
  std::cout << "Testing container: " << test_name << std::endl;

#ifdef _DEBUG
  const int passes = 2;
#else
  const int passes = 10;
#endif
  double fill_times = 0;
  double delete_times = 0;
  size_t entries;
  for ( int i = 0 ; i < passes ; i++ ) {
    CONTAINER *container = new CONTAINER();
    std::cout << "Filling... " << std::flush;
    clock_t t0 = clock();
    for ( auto ii = data.begin() ; ii != data.end() ; ii++ ) 
      (*container)[*ii]++;
    double span = double(clock() - t0)/CLOCKS_PER_SEC;
    fill_times += span;
    entries = container->size();
    std::cout << " " << span << " Deleting... " << std::flush;
    t0 = clock();
    delete container;
    span = double(clock() - t0)/CLOCKS_PER_SEC;
    delete_times += span;
    std::cout << span << " " << std::endl;
  }
  std::cout << "Entries: " << entries 
            << ", Fill time: " << (fill_times/passes) 
            << ", Delete time: " << (delete_times/passes) 
            << std::endl;
}

I didn't go overboard when it came to instrumenting this problem, I just used the timing functions built into the C++ library. On my Windows and Linux test systems, the values of CLOCKS_PER_SEC are both high enough that I'm not worried about granularity issues.

Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 


Video