Channels ▼
RSS

.NET

User-Defined Hash Functions for Unordered Maps in C++0x


The container classes included in the C++ standard library serve as good illustrations of both the strengths and weaknesses of the language. The strengths are obvious: efficient, type-safe containers with performance guarantees suitable for a huge variety of applications. And the weaknesses? Compiler error messages that redefine the term useless, and documentation that makes a mockery of the word.

In this article I'll illustrate how you might bump into these problems using the unordered_map container, as well showing you how to work past the problems. By rights, this basic hash map should be the first- or second-most used container in your arsenal, but if you are less than a C++ savant, you might find yourself ditching it out of frustration.

Hash Tables

It's a little bit of an embarrassment to the C++ community that it didn't have a hash table in the standard library until TR1 was published in 2005. In a perfect world, the original standard should have contained hash map and hash set containers. But Alexander Stepanov didn't include these containers in the original Standard Template Library, and the standardization committee was reluctant to bless containers that didn't have a decent amount of mileage in the real world.

By 2005, there were enough non-standard implementations to allow the TR1 extension to confidently add four new template classes:

  • unordered_map
  • unordered_multimap
  • unordered_set
  • unordered_multiset

With basically the same semantics as their ordered counterparts (map, multimap, etc.) and the ideal O(1) performance afforded by hashed indexing, C++ finally had a feature that was basically table stakes for any high-level language created since the mid-1980s.

The container classes in general, and unordered_map in particular, are such a useful part of the language that we generally want to introduce them as early as possible to people learning C++. Admittedly, the underpinnings of template programming are out of the beginner's depth, but using the containers doesn't require much knowledge about templates beyond an understanding of a few syntax rules.

A simple piece of sample code that you might use to teach beginners about these containers is shown here:

//
// Compile this example with Visual Studio 2010
// or g++ 4.5 using -std=c++0x
//
#include <iostream>
#include <unordered_map>
#include <string>

using namespace std;

typedef string Name;

int main(int argc, char* argv[])
{
    unordered_map<Name,int> ids;
    ids["Mark Nelson"] = 40561;
    ids["Andrew Binstock"] = 40562;
    for ( auto ii = ids.begin() ; ii != ids.end() ; ii++ )
        cout << ii->first << " : " << ii->second << endl;
    return 0;
}

This example works great in the classroom or in the documentation page for unordered_map. But the new C++ programmer runs into trouble as soon as he or she steps outside the classroom and tries to implement a real-world example. A very common change to this program on its path to the real world will be in the use of a simple class or structure to hold the person's name. To keep it simple, I'll just assume we want to keep first and last names separate, and use the built-in pair class to hold the name:

//
// Compile this example with Visual Studio 2010
// or g++ 4.5 using -std=c++0x
//
#include <iostream>
#include <unordered_map>
#include <string>

using namespace std;

typedef pair<string,string> Name;

int main(int argc, char* argv[])
{
    unordered_map<Name,int> ids;
    ids[Name("Mark", "Nelson")] = 40561;
    ids[Name("Andrew","Binstock")] = 40562;
    for ( auto ii = ids.begin() ; ii != ids.end() ; ii++ )
        cout << ii->first.first 
        << " "
        << ii->first.second 
        << " : "
        << ii->second
        << endl;
        return 0;
}

This seemingly small change generates seven errors in Visual C++, five in g++, and none of the errors point the user to the actual problem.

And what is the problem? It's actually a simple one: unordered_map doesn't know how to create a hash for the given key type of std::pair. Instead, the user is left to puzzle over things like this:

c:\program files\microsoft visual studio 10.0\vc\include\xfunctional(768):
 error C2440: 'type cast' :
 cannot convert from 'const Name' to 'size_t'

Or worse yet from g++:

/tmp/cc0B9FPH.o: In function 'std::__detail::_Hash_code_base<std::pair<std::basic_string<char, std::char_traits<char>, 
std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, 
std::pair<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, 
std::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, int>, 
std::_Select1st<std::pair<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, 
std::basic_string<char, std::char_traits<char>, std::allocator<char> > > const, int> >, 
std::equal_to<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, 
std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, 
std::hash<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, 
std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, 
std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, false>::_M_hash_code(
std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, 
std::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&) const'

The g++ compiler doesn't seem to actually produce a sentence describing the error. I think it figures no human could parse it anyway.

The inscrutability of compiler errors in template classes is hardly something new — I was complaining about it all the way back in 1995. And the C++0x committee tried to do something about it. Had the C++0x Concepts feature been accepted, the compiler might instead have issued an error message that looked more like this:

hash_test.cpp(15): type 'Name' does not have a hash function

Unfortunately, Concepts did not make it, and we are stuck with error messages that are of no help at all.

RTFM

There are a couple of obvious places to try to figure out what these errors mean. Google would be one, and Visual Studio's class documentation pages would be another.

Just as an experiment, try putting yourself in the place of a novice, and execute a search on:

error C2440: 'type cast' : cannot convert from 'const Name' to 'size_t'

or show some sophistication and change your search to:

"error C2440: type cast : cannot convert from" to size_t

You will find some good clues, but probably no solutions to your problem. Much of the published code deals with pre-standard hash tables using boost implementations, or early g++ hash tables, which are not going to help.

But let's say you eventually figure out that you need to write a hash function for your Name class. All you need to know are the mechanics. Where do you find out what they are?

The logical place to do this would be in the Visual Studio unordered_map documentation page. This page has some good information in it, but nowhere does it address an undoubtedly common problem: how do I create a user-defined hash function for unordered_map?. And don't even think about getting anything useful from the g++ documentation page.

These stumbling blocks are the kind of thing that give C++ a reputation as one of the most difficult languages to learn. To see the contrast, you might compare it to Java. The Java programmer won't ever run into this problem, because the language defines a default hash function in the base class Object. Any Java object may be used as a key in Java's HashMap generic class. While the base class definition may be far from optimal for many cases, it will at least work, and won't prevent the beginner from using the container in real programs.

C++ could have done the same thing when implementing the unordered containers, but it was constrained by both philosophy and language limitations. It would have been a real accomplishment to have overcome both, and to be honest, the people on the standards committee have to work hard enough as it is. Insisting on X-Ray vision and a cape for each member might be pushing it.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video