STL and TR1: Part III

With TR1, C++ gets hash tables in the form of the template classes unordered_map, unordered_multimap, unordered_set, and unordered_multiset.


February 01, 2006
URL:http://www.drdobbs.com/stl-and-tr1-part-iii/184402066

February, 2006: STL and TR1: Part III

Pete Becker is a software developer at Dinkumware Ltd., where he works on standard library implementation and documentation for C, C++, and Java. He is Project Editor for the C++ Standard, and for several years has written a column for C/C++ Users Journal. He is currently writing a book about TR1. He can be contacted at [email protected].


One of the most commonly requested additions to the C++ Standard Library is hash tables. Properly used, they can provide significant speed improvements in searches. There was a proposal to add STL-style hash tables to the Standard Library in 1995; it was rejected because the target date for completion of the standard did not leave enough time for proper evaluation and adaptation of the proposed changes. TR1 provides hash tables under the names unordered_map, unordered_multimap, unordered_set, and unordered_multiset [1]. These template classes are defined in the headers <unordered_map> and <unordered_set>. Listing 1 is a synopsis of the header <unordered_map>. Listing 2 is a synopsis of the header <unordered_set>. In addition to these four containers, TR1 provides a set of specializations of a template class named hash that serves as the default hash functions for these containers. Listing 3 has a synopsis of this template class and its specializations, which are defined in the header <functional>.

These containers don't quite fit into the existing set of container requirements, so TR1 provides a new set of requirements for what it formally refers to as "unordered associative containers." Don't let the name confuse you: They are not more powerful forms of associative containers. Their requirements are slightly different from the existing requirements for associative containers, so the two kinds aren't interchangeable. We'll look at their differences in more detail a little later.

One of the biggest problems in designing a generic interface to a hash table is choosing the right set of parameters for tuning the operation of the resulting objects. With too few parameters, the template is too inflexible for broad use. With too many, it's too flexible, and it becomes hard to know what it's really doing. TR1's unordered containers can be tuned at compile time by selecting suitable type arguments for hashing data values and for comparing data values for equality. They can be tuned at runtime by setting the maximum allowable load factor, which determines how many objects can be inserted in the container before it reshuffles them into more buckets, and by calling the member function rehash, which forces that reshuffling.

Hash Tables

A hash table stores elements in buckets. Each bucket can hold zero or more elements. The hash table chooses a bucket for an element based on the hash value for that element. The hash value for an element is determined by passing that element to a hash function. If this is done right, a hash table can provide constant-time searches.

For example, the code in Listing 4 stores integer values in a hash table with five buckets, where each bucket is an object of type std::list<int>. The hash function, hash, takes an argument of type int, and simply converts it to a value of type size_t [2]. The function insert calls hash with the value to be inserted, reduces the result modulo to the size of the container, and uses the result as an index to determine which linked list to append the value to. As you can see, the time it takes to insert a new element into this hash table does not depend on how many elements are already in the list. Thus, insertion into this table is O(1).

Searching this table for a value is more complicated. The function contains calls hash with the target value, reduces the result modulo the size of the container, and uses the result as an index to determine which linked list to search. Having done that, it calls the standard algorithm find to look for the target value in that linked list. In looking for the target value, find starts at the beginning of the sequence of values managed by the linked list and walks through the sequence until it finds a matching value or reaches the end of the sequence [3]. Thus, the time needed to find an element is proportional to the number of elements in the linked list. That's not a problem if there are only a few elements in each linked list, but with a fixed number of linked lists, as in this particular version of a hash table, as we add elements to the table, the linked lists become proportionately longer. On average, each linked list will have n/M elements, where n is the number of elements in the table and M is the number of buckets. When n/M is large, the time needed to find an element is proportional to the number of elements in the table, so it is much longer than the constant-time lookup that hash tables are capable of.

The key to keeping hash table searches fast is to keep the number of elements in each bucket as low as is reasonable. Because the average number of elements in each bucket is n/M, this means that we need to keep that ratio low. To do this, as n increases, we also need to increase M. That is, we need to add more buckets to the table and redistribute the elements held in the table throughout the new set of buckets. This process, known as "rehashing," is so important to maintaining the speed of hash tables that TR1 hash tables are automatically rehashed whenever the average number of elements in each bucket exceeds the table's load factor.

Hash Tables as STL Containers

The C++ Standard defines four standard categories of container types: containers, reversible containers, sequence containers, and associative containers. If you're careful, these abstract categories allow you to design an application that can be implemented with one of several different container types, leaving you the flexibility to select the actual container to use later on, when you know more about the characteristics of the real-world data that the application will handle. If your application only uses operations defined for a particular container category, you can use any container type that satisfies the requirements for that category. For example, we saw last month that the TR1 template class array is a reversible container, but not a sequence container. It can be used in place of a list, a vector, or a deque in any application that relies only on the operations defined for a reversible container.

The C++ Standard Library has four template classes that satisfy the requirements for associative containers: map, multimap, set, and multiset. They differ in how they compare objects and in how they handle insertion of objects that compare as equal. Each element in a map and a multimap is a pair of objects; the first element in the pair is the key, which the container uses to determine whether two elements are equal; the second element in the pair is the value, which is not used in comparing elements. A map won't insert a new element whose key is already present; a multimap allows multiple elements with the same key. A set uses the entire element as the key; a multiset allows multiple elements with the same key.

To be an associative container, a data structure must meet all the requirements for a container as discussed last month, and it must meet several more. Rather than present them all here, Listing 5 has a synopsis of the template class multimap and its associated free functions; the other associative containers support the same interface [4].

TR1 adds a fifth container category. Unordered containers satisfy all of the requirements for containers except that they do not support any of the six comparison operators [5]. Unordered containers also provide a set of operations that is similar to the operations available for associative containers, as well as a set of operations for examining the contents of individual buckets and tuning the operation of the container.

In addition to the container requirements, unordered containers must provide the nested type names key_type and key_equal, which name the type of the container's key and the type of an object that can be used to compare objects of type key_type for equality. They have a default constructor, a copy constructor, and an assignment operator, as well as a templated constructor that takes two iterators that designate a sequence of values to be inserted into the container. Just as with ordered containers, you can insert a value t with the member function insert(t), you can insert with a hint using insert(q, t), where q is an iterator into the container, and you can insert a sequence of values with insert(i, j), where i and j are iterators that designate a sequence of values.

To remove elements, unordered containers provide the usual clear and erase member functions. You can call erase with a value to remove all elements whose key compares equal to that value. You can also call it with an iterator that designates the element to be removed. And, finally, you can call it with a pair of iterators that designates a range of elements within the container to be removed.

To find elements, unordered containers provide the member functions find, count, and equal_range. Each of them takes a key value to search for. The member function find returns an iterator that points to an element whose key compares equal to the key value, or an iterator equal to end() if no such element exists. The member function count returns the number of elements whose keys compare equal to the key value. The member function equal_range returns a pair of iterators that designate a range of elements within the container, all of whose keys compare equal to the key value [6].

Listing 6 is a synopsis of the template class unordered_multimap. If you compare that listing with the synopsis of the template class multimap in Listing 5, you'll see that they have quite a bit in common. The main differences are that unordered_multimap does not provide reverse iterators (that is, it's not a reversible container), and it has a handful of functions for tuning its operation.

Listing 7 shows some of the container operations for the template class unordered_multimap. Although I cautioned you earlier that unordered containers are not direct replacements for associative containers, in this code example, they are interchangeable. If you change the definition of the type table to use a multimap instead of an unordered_multimap, the program still runs correctly, although it might show the container's contents in a different order.

Next Time

Next time we'll look at the rest of the interface to unordered containers: the member functions for examining and tuning the organization of the container. We'll also look at more details of the four unordered containers in TR1.

References

  1. [1] The more obvious names, hash_set and the like, were rejected because they are already widely used for hash tables that are somewhat different from the hashed containers in TR1.
  2. [2] Despite what some compilers may tell you, this conversion is well defined and meaningful. If you get a warning for this code, either complain to the compiler writer or add a cast.
  3. [3] The actual algorithm used in the unordered containers is more sophisticated than this, but it still degenerates into linear time when the bins are overfilled.
  4. [4] As is often the case, some of the associative containers provide additional members that are not in the associative container requirements. In particular, the template class map provides operator[] and, in the next revision of the Standard, the member function at to look up stored values from key values.
  5. [5] The reason for this omission is that the unordered containers are unordered. Comparing two unordered containers for equality is expensive, and it is not at all clear what it would mean for one unordered container object to be less than another.
  6. [6] This requirement means that the simple implementation of a hash table in Listing 4 can't be used as an unordered container. It doesn't group elements that compare equal together, so there might not be a valid range that holds all elements that compare equal to a given key and no others. The TR1 unordered containers use a more sophisticated technique that groups equal elements together.

February, 2006: STL and TR1: Part III

Listing 1

// the TR1 header <unordered_map>

namespace std {     // C++ Standard Library
 namespace tr1 {    // TR1 additions

    // TEMPLATE CLASS unordered_map
template <class Key, class Ty,
    class Hash, class Pred, class Alloc> class unordered_map;

    // TEMPLATE CLASS unordered_multimap
template <class Key, class Ty,
    class Hash, class Pred, class Alloc> class unordered_multimap;

    // TEMPLATE FUNCTIONS swap
template <class Key, class Ty, class Hash, class Pred, class Alloc>
void swap(unordered_map(Key, Ty, Hash, Pred, Alloc>& left,
    unordered_map(Key, Ty, Hash, Pred, Alloc>& right);
template <class Key, class Ty, class Hash, class Pred, class Alloc>
void swap(unordered_multimap(Key, Ty, Hash, Pred, Alloc>& left,
    unordered_multimap(Key, Ty, Hash, Pred, Alloc>& right);

} }

February, 2006: STL and TR1: Part III

Listing 2

// the TR1 header <unordered_set>

namespace std {     // C++ Standard Library
 namespace tr1 {    // TR1 additions

    // TEMPLATE CLASS unordered_set
template <class Key,
    class Hash, class Pred, class Alloc> class unordered_set;

    // TEMPLATE CLASS unordered_multiset
template <class Key,
    class Hash, class Pred, class Alloc> class unordered_multiset;

    // TEMPLATE FUNCTIONS swap
template <class Key, class Hash, class Pred, class Alloc>
void swap(unordered_set(Key, Hash, Pred, Alloc>& left,
    unordered_set(Key, Hash, Pred, Alloc>& right);
template <class Key, class Hash, class Pred, class Alloc>
void swap(unordered_multiset(Key, Hash, Pred, Alloc>& left,
    unordered_multiset(Key, Hash, Pred, Alloc>& right);

} }

February, 2006: STL and TR1: Part III

Listing 3

// partial synopsis of TR1 header <functional>

namespace std {     // C++ Standard Library
 namespace tr1 {    // TR1 additions

    // TEMPLATE CLASS hash
template <class Ty> struct hash;

    // TEMPLATE CLASS hash SPECIALIZATIONS
template <> struct hash<bool>;
template <> struct hash<char>;
template <> struct hash<signed char>;
template <> struct hash<unsigned char>;
template <> struct hash<wchar_t>;
template <> struct hash<short>;
template <> struct hash<unsigned short>;
template <> struct hash<int>;
template <> struct hash<unsigned int>;
template <> struct hash<long>;
template <> struct hash<unsigned long>;
template <> struct hash<float>;
template <> struct hash<double>;
template <> struct hash<long double>;

template <class Ty> struct hash<Ty*>;

template <> struct hash<std::string>;
template <> struct hash<std::wstring>;

} }

February, 2006: STL and TR1: Part III

Listing 4

#include <array>
#include <iomanip>
#include <iostream>
#include <iterator>
#include <limits>
#include <list>
using std::tr1::array;
using std::copy; using std::find;
using std::ostream_iterator;
using std::list;
using std::cout; using std::setw;
using std::numeric_limits;

typedef list<int> bucket;
typedef array<bucket, 5> table;

size_t hash(int i)
  { // return hash value for i
  return i;
  }

void show(const table& tbl)
  { // show contents of buckets in table
  for (int i = 0; i < tbl.size(); ++i)
    { // show contents of bucket i
    cout << "bucket " << setw(2) << i << ": ";
    copy(tbl[i].begin(), tbl[i].end(), ostream_iterator<int>(cout, " "));
    cout << '\n';
    }
  }

void insert(table& tbl, int val)
  { // insert val into table
  size_t hash_val = hash(val) % tbl.size();
  tbl[hash_val].push_back(val);
  }

bool contains(const table& tbl, int val)
  { // return true if tbl contains val
  int hash_val = hash(val) % tbl.size();
  bucket::const_iterator first = tbl[hash_val].begin();
  bucket::const_iterator last = tbl[hash_val].end();
  return find(first, last, val) != last;
  }

void report(const table& tbl, int val)
  { // report whether tbl contains val
  cout << "table "
    << (contains(tbl, val) ? "contains " : "does not contain ")
    << val << '\n';
  }

int main()
  { // demonstrate simple hash table
  table tbl;
  insert(tbl, 3);
  insert(tbl, 195);
  insert(tbl, 5);
  insert(tbl, 6);
  insert(tbl, 55);
  insert(tbl, 1);
  insert(tbl, 33);
  insert(tbl, 40);
  show(tbl);
  report(tbl, 3);
  report(tbl, 4);
  return 0;
  }

February, 2006: STL and TR1: Part III

Listing 5

// synopsis of template class multimap

template <class Key, class Ty,
  class Pr = std::less<Key>,
  class Alloc = std::allocator<std::pair<const Key, Ty> > >
class multimap { // ordered container holding (Key, Ty) pairs
public:
    // NESTED TYPES
    typedef Key key_type;
    typedef Ty mapped_type;
    typedef Pr key_compare;
    typedef Alloc allocator_type;
    typedef std::pair<const Key, Ty> value_type;
    class value_compare;
    typedef Alloc::pointer pointer;
    typedef Alloc::const_pointer const_pointer;
    typedef Alloc::reference reference;
    typedef Alloc::const_reference const_reference;
    typedef T0 size_type;
    typedef T1 difference_type;
    typedef T2 iterator;
    typedef T3 const_iterator;
    typedef std::reverse_iterator<iterator> reverse_iterator;
    typedef std::reverse_iterator<const_iterator> const_reverse_iterator;

    // CONSTRUCTORS
    multimap();
    explicit multimap(const Pr& pr);
    multimap(const Pr& pr, const Alloc& al);
    multimap(const multimap& right);
    template <class InIt>
    multimap(InIt first, InIt last);
    template <class InIt>
    multimap(InIt first, InIt last, const Pr& pr);
    template <class InIt>
    multimap(InIt first, InIt last, const Pr& pr, const Alloc& al);

    // MODIFICATION
    iterator insert(const value_type& val);
    iterator insert(iterator hint, const value_type& val);
    template <class InIt>
    void insert(InIt first, InIt last);

    iterator erase(iterator pos);
    iterator erase(iterator first, iterator last);
    iterator erase(cosnt Key& key);
    void clear();

    void swap(multimap& right);
  
    // CONTAINER ITERATORS
    iterator begin();
    const_iterator begin() const;
    iterator end();
    const_iterator end() const;
    reverse_iterator rbegin();
    const_reverse_iterator rbegin() const;
    reverse_iterator rend();
    const_reverse_iterator rend() const;

    // ELEMENT ACCESS
    iterator find(const Key& key);
    const_iterator find(const Key& key) const;
    size_type count(const Key& key) const;
    iterator lower_bound(const Key& key);
    const_iterator lower_bound(const Key& key) const;
    iterator upper_bound(const Key& key);
    const_iterator upper_bound(const Key& key) const;
    pair<iterator, iterator> equal_range(const Key& key);
    pair<const_iterator, const_iterator> equal_range(const Key& key) const;

    // QUERIES
    size_type size() const;
    size_type max_size() const;
    bool empty() const;
    Alloc get_allocator() const;
    key_compare key_comp() const;
    value_compare value_comp() const;
};

template <class Key, class Ty, class Pr, class Alloc>
bool operator==(
  const multimap<Key, Ty, Pr, Alloc>&,
  const multimap<Key, Ty, Pr, Alloc>&);
/* analogous template functions for !=, <, <=, >, and >= */

template <class Key, class Ty, class Pr, class Alloc>
void swap(
  const multimap<Key, Ty, Pr, Alloc>&,
  const multimap<Key, Ty, Pr, Alloc>&);

February, 2006: STL and TR1: Part III

Listing 6

// synopsis of template class unordered_multimap

template <class Key, class Ty,
  class Hash = std::tr1::hash<Key>,
  class Pred = std::equal_to<Key>,
  class Alloc = std::allocator<Key> >
class unordered_multimap { // unordered container holding (Key, Ty) pairs
public:
    // NESTED TYPES
    typedef Key key_type;
    typedef Ty mapped_type;
    typedef std::pair<const Key, Ty> value_type;
    typedef Hash hasher;
    typedef Pr key_equal;
    typedef Alloc allocator_type;

    typedef Alloc::pointer pointer;
    typedef Alloc::const_pointer const_pointer;
    typedef Alloc::reference reference;
    typedef Alloc::const_reference const_reference;

    typedef T0 size_type;
    typedef T1 difference_type;
    typedef T2 iterator;
    typedef T3 const_iterator;
    typedef T4 local_iterator;
    typedef T5 const_local_iterator;

    // CONSTRUCTORS
    unordered_multimap(size_type nbuckets = N0,
        const Hash& hfn = Hash(),
        const Pred& pr = Pred(),
        const Alloc& al = Alloc());
    template <class InIt>
    unordered_multimap(InIt first, InIt last,
        size_type nbuckets = N0,
        const Hash& hfn = Hash(),
        const Pred& pr = Pred(),
        const Alloc& al = Alloc());
    unordered_multimap(const unordered_multimap& right);

    // MODIFICATION
    iterator insert(const value_type& val);
    iterator insert(iterator pos, const value_type& val);
    template <class InIt>
        void insert(InIt first, InIt last);

    iterator erase(iterator pos);
    iterator erase(iterator first, iterator last);
    size_type erase(const key_type& key);
    void clar();

    void swap(unordered_multimap& right);

    // CONTAINER ITERATORS
    iterator begin();
    const_iterator begin() const;
    iterator end();
    const_iterator end() const;

    // ELEMENT ACCESS
    const_iterator find(const key_type& key) const;
    size_type count(const key_type& key) const;
    std::pair<iterator, iterator>
        equal_range(const key_type& key);
    std::pair<const_iterator, const_iterator>
        equal_range(const key_type& key) const;

    // QUERIES
    size_type size() const;
    size_type max_size() const;
    bool empty() const;

    allocator_type get_allocator() const;
    key_type key_eq() const;
    hasher hash_function() const;

    // TUNING
    local_iterator begin(size_type nbucket);
    const_local_iterator begin(size_type nbucket) const;
    local_iterator end(size_type nbucket);
    const_local_iterator end(size_type nbucket) const;

    size_type bucket_count() const;
    size_type max_bucket_count() const;
    size_type bucket_size(size_type nbucket) const;
    size_type bucket(const key_type& key) const;

    float load_factor() const;
    float max_load_factor() const;
    float max_load_factor(float f);
    void rehash(size_type nbuckets);
};

February, 2006: STL and TR1: Part III

Listing 7

// demonstrate unordered container construction and insertion
#include <unordered_map>
#include <array>
#include <iomanip>
#include <iostream>
#include <ostream>
#include <iterator>
#include <string>
#include <utility>
using std::tr1::unordered_multimap; using std::tr1::array;
using std::string; using std::ostream_iterator;
using std::pair; using std::make_pair;
using std::basic_ostream; using std::cout; using std::setw;

typedef unordered_multimap<int, string> table;
typedef table::iterator iter;
typedef table::value_type elt;

array<elt, 5> values =
    { // initial values for unordered containers
    elt(1, "first"),
    elt(2, "second"),
    elt(3, "third"),
    elt(4, "fourth"),
    elt(5, "fifth")
    };

namespace std { // put inserter in namespace std
template <class Elem, class Traits>
basic_ostream<Elem, Traits>& operator<<(
  basic_ostream<Elem, Traits>& os, const elt& val)
  { // insert elt into stream
  return os << '[' << val.first << ',' << val.second << ']';
  }
}

void show(const char * title, iter first, iter last)
    { // show title and contents of range [first, last)
    cout << title << ":\n  ";
    copy(first, last, ostream_iterator<elt>(cout, " "));
    cout << '\n';
    }

int main()
    { // demonstrate use of std::tr1::unordered_multimap
    table t0(values.begin(), values.end());
    show("initialized table", t0.begin(), t0.end());
    table t1;
    show("empty table", t1.begin(), t1.end());
    t1.insert(values.begin(), values.end());
    show("insert range", t1.begin(), t1.end());
    t1.insert(make_pair(4, "other fourth"));
    show("insert element", t1.begin(), t1.end());
    t1.erase(3);
    show("erase element", t1.begin(), t1.end());
    pair<iter, iter> res = t1.equal_range(4);
    show("equal range", res.first, res.second);
    t1.erase(res.first, res.second);
    show("erase range", t1.begin(), t1.end());
    return 0;
    }

Terms of Service | Privacy Statement | Copyright © 2024 UBM Tech, All rights reserved.