C++11 Hash Containers and Debug Mode
Microsoft has never been a slacker in the C++ department — it has always worked hard to provide a top-notch, compliant product. Visual Studio 10 supports its current incarnation, and for the most part it is up to Microsoft's usual standards. It's a great development environment, and I am a dedicated user, but I have to give Microsoft a demerit in one area: Its C++11 hash containers have some serious performance problems — so much that the debug versions of the containers may well be unusable in your application.
Background
I first noticed the problem with unordered_map when I was working on the code for my updated LZW article. I found that when running in the debugger, my program would hang after exiting the compression routine. A little debugging showed that the destructor for my hash table was taking a long time to run. And by a long time, I mean it was approaching an hour.
Destroying a hash table didn't seem to be a complicated task, so I decided to see if I could come up with a reasonable benchmark. I wrote a test program that does a simple word frequency count. As a starter data set, I used the first one million whitespace-delimited words in the 2010 CIA factbook, as published by Project Gutenberg. This data set yields 74,208 unique tokens.
I wrote a simple test rig that I used to test the word count program using four different containers:
unordered_map indexed by std::string
unordered_map indexed by std::string *
map indexed by std::string
map indexed by std::string *
Testing with std::string * reduces the cost of copying strings into the hash table as it was filled, and then reduces the cost of destroying those strings when the table was destroyed.
I ran tests against map, expecting to see a pretty big difference in performance. Because map is normally implemented using a balanced binary tree structure, it has O(log(N)) performance on insertions. A sparsely populated hash table can have O(1) performance. By using fairly large data sets, I expected to see a big difference between the two.
I tried to eliminate a few obvious sources for error in my test function — and I used a template function so that I could use the same code on all the different container types:
template<class CONTAINER, class DATA>
void test( const DATA &data, const char *test_name )
{
std::cout << "Testing container: " << test_name << std::endl;
#ifdef _DEBUG
const int passes = 2;
#else
const int passes = 10;
#endif
double fill_times = 0;
double delete_times = 0;
size_t entries;
for ( int i = 0 ; i < passes ; i++ ) {
CONTAINER *container = new CONTAINER();
std::cout << "Filling... " << std::flush;
clock_t t0 = clock();
for ( auto ii = data.begin() ; ii != data.end() ; ii++ )
(*container)[*ii]++;
double span = double(clock() - t0)/CLOCKS_PER_SEC;
fill_times += span;
entries = container->size();
std::cout << " " << span << " Deleting... " << std::flush;
t0 = clock();
delete container;
span = double(clock() - t0)/CLOCKS_PER_SEC;
delete_times += span;
std::cout << span << " " << std::endl;
}
std::cout << "Entries: " << entries
<< ", Fill time: " << (fill_times/passes)
<< ", Delete time: " << (delete_times/passes)
<< std::endl;
}
I didn't go overboard when it came to instrumenting this problem, I just used the timing functions built into the C++ library. On my Windows and Linux test systems, the values of CLOCKS_PER_SEC are both high enough that I'm not worried about granularity issues.

