Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

C/C++

C++ STL Hash Containers and Performance


How the Code Works

Here's how this code works. Conceptually, you are placing key.first into the upper 16 bits of the size_t and key.second into the lower 16 bits. To do this, you use the bitwise << operator to shift key.first the appropriate number of bits to the left. The number of bits to shift is half the number of bits in our key type, which you calculate with the formula sizeof(unsigned int)*8/2. Next, you use ^, the bitwise XOR operator, to put key.second into the lower half of the hash value. I have chosen to use XOR instead of OR because if key.second grows beyond 16 bits, XOR yields hash values that are more unique than OR, since OR sets the 17th bit to 1 for all values of key.second larger than 65536. ShiftedPairHasher is quite fast: It does just one left shift operation and one XOR operation. The compiler optimizes the sizeof(unsigned int)*8/2 to a constant at compile time. This same concept can be extended if you want to hash more than two values simply by shifting each value the appropriate number of bits and then XORing it with the other shifted values.

With proper hashers, hash containers can give you O(1) access time. Nonetheless, even if you are able to create excellent hash functions, there are certain cases when nonhashed containers actually give better runtime performance. Consider the case where you create a huge number of containers that hold only a few values, and you will be doing only a few lookups on each container (Listing Five). If you typedef TMap to a regular map in doMapBenchmark(), this program takes about 5.9 seconds to execute (all timings were generated on a Pentium 4 2.8 GHz). But if you change TMap to be a hash_map, the same program takes nearly 16.5 seconds. The reason for this 280 percent increase in runtime lies in how memory is allocated when an application instantiates hash maps and maps.

#include <iostream>
#include <ext/hash_map>
#include <map>
int doMapBenchmark() {
  typedef __gnu_cxx::hash_map<int,int> TMap;
  //typedef std::map<int,int> TMap;
  std::vector<TMap> myMaps;
  int mySum=0;
  for(int mapCnt=0; mapCnt<1000000; ++mapCnt) {
    TMap myMap;
    for(int i=0; i<10; ++i)
      myMap.insert(std::make_pair(i,i+1));
    mySum=myMap.find(0)->second+myMap.find(1)->second;
    myMaps.push_back(myMap);
  }
  return mySum;
}

Listing Five

Because maps are implemented as binary trees, the map constructor only needs to create a few pointers when the application instantiates a new map. But hash maps are implemented as arrays of linked lists. When you create a new hash map, by default, the STL creates 193 buckets and inserts a NULL pointer into each one. If you are creating a large number of hash maps, this can take a long time. The problem can be somewhat mitigated by asking for a smaller number of buckets. In the benchmark program (Listing Five), you can change "TMap myMap;" to "TMap myMap(0);". Although you won't get zero buckets, you'll get the smallest number of buckets that your STL's implementation offers; in the case of the SGI STL it's 53 buckets. Making this change reduces the benchmark's wall-clock time to about 8.8 seconds—nearly a 50 percent decrease in runtime.

It is also important to consider how long it takes to compute the hash function relative to the function's uniqueness. For instance, if you have two hash functions—one of which produces many hash collisions but is much faster to calculate—and another—which is slower but produces very few collisions—it may make sense to choose the faster one even though this leads to scaling that is closer to O(n) than O(1). For instance, this kind of strategy may make sense when dealing with long strings. Because the default string hasher iterates over each character in the string when calculating the hash, the lookups become O(l) where l is the length of the string. If the characters of the string are well distributed, wall-clock time may be improved by hashing fewer characters, as in Listing Six (available in the source code link at the top of this page). In this example we are inserting 10,000 randomly generated strings of length 10,000 into a hash map and then looking up each one 10,000 times. If we use the default string hasher, this takes 17.1 seconds. However, if we calculate the hash based only on the first three characters of the string, the runtime is reduced to 8.4 seconds. Clearly the calculation time of the hash function dominates the additional linked list traversals required to resolve hash collisions.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.