Dr. Dobb's | The Standard Librarian:Hash Tables for the Standard Library

The Standard Librarian:Hash Tables for the Standard Library

The C++ Standard library is being extended, and the standardization committee is looking for proposals. This column gives you an idea of what a proposal should look like, by presenting a sample proposal on hash tables.

April 01, 2002
URL:http://www.drdobbs.com/the-standard-librarianhash-tables-for-th/184403822

April 2002 C++ Experts Forum/The Standard Librarian

Now that work on the next revision of C++ has begun, the Standardization Committee is beginning to consider extensions to the Standard C++ library. One of the first proposals the committee will consider is a proposal to add hash tables to the standard library.

Why hash tables? One reason is that hash tables are one of the most frequently requested extensions. C++ has associative containers (set, map, multiset, and multimap) that allow fast O(log N) lookup of an element by key and that store elements in ascending order. Unlike some other languages, C++ doesn't have hash tables. Many users find this an obvious hole in the language. Hash tables and sorted associative containers have different performance tradeoffs: hash tables have poorer worst-case performance, but, in many real-world applications, better performance on average.

A second reason for this proposal is that there is extensive experience with hash tables implemented in C++ in the style of standard containers. Hash tables were proposed for the C++ Standard in 1995 [1]; the proposal was rejected, mainly because it was submitted at the wrong time. Three independently written libraries (SGI [2], Dinkumware [3], and Metrowerks [4]) now provide hashed associative containers as an extension. The GNU C++ library includes hash tables derived from SGI's hashed associative containers. Standardization committees do lots of different things, but one of the things they're supposed to do is standardize existing practice. We now have multiple, widely used, incompatible definitions of hash tables, and that's exactly what standardization is supposed to prevent.

I recently submitted a new hash table proposal [5], and the standardization committee has begun to consider it. This column is based on that proposal. There will undoubtedly be changes -- there always are -- so, while it's safe to assume that hash tables will eventually make it into the standard library, you should not assume they will look exactly like the hash tables described in this proposal.

What Does a Proposal Look Like?

A formal proposal should include these sections:

Motivation. You should explain why this proposal should be considered. Why is it important, and what problem area does it address? What kinds of programmers will use it? To what extent is it based on existing practice?

Impact on the Standard and on users. What does it depend on, and what depends on it? Is it a pure extension, or do you propose changing existing library components? Does your proposal involve core language changes? Can it be implemented portably, or does it require nonstandard facilities or compiler magic?

Design decisions. Having a record of your intentions will be important to programmers who are trying to understand the Standard, and to the committee as it makes changes and bug fixes in the future. (In the past, we've occasionally had problems when the committee made "obvious" changes without a full understanding of the original design.) This section should address design issues that you think are important, as well design issues that other people raise. If you've made any decisions that you don't think are important, but where you had to make some definite choice, that too is important to know.
Some specific questions to address in this section: Why did you choose the specific design that you did? What alternatives did you consider, and what are the tradeoffs? What are the performance implications of your choices? What decisions are left up to implementers? Were there any "obvious" ideas that you tried and rejected?

Proposed text. Wording that's suitable for the Standard. There will probably be some editorial tinkering, but you should assume that it will be minor. The style and order of presentation should be consistent with that of the C++ Standard's library clauses [6].

References. This section can be important in areas that a lot of people have worked on: if the committee is choosing one design in an area where there are lots of alternatives, we can be more confident if we know you've looked at the other possibilities.

I already discussed the motivation for standardizing hash tables. As for the impact on the Standard, this proposal is almost a pure extension. (Only "almost," because I'm proposing to add a new declaration to the header <functional>.) It doesn't require changes to any of the standard requirement tables, and it can be and has been implemented in Standard C++.

The rest of this column will discuss design decisions. As you'll see, designing a set of requirements for the Standard isn't quite the same as designing a library. The important point is that you're not just documenting what your library does: you're imposing requirements that are supposed to be satisfied by any conforming implementation. There are tradeoffs between specificity and implementer latitude, and you have to think carefully about those tradeoffs.

Packaging Issues

The three implementations in current use, as well as the Barreiro/Fraley/Musser proposal, all used the names hash_set, hash_map, hash_multiset, and hash_multimap. Existing practice suggests that these names should be retained.

The distinction between the four hashed containers is the same as the distinction between the four standard associative containers. All four hashed containers allow lookup of elements by key. In hash_set and hash_multiset, the elements are the keys; modification of elements is not allowed. In hash_map and hash_multimap, the elements are of type std::pair<const Key, Value>. The key part can't be modified, but the value part can. In hash_set and hash_map, no two elements may have the same key; in hash_multiset and hash_multimap, there may be any number of elements with the same key.

Similarly, because of existing practice, this proposal defines the hash_set and hash_multiset classes within a new <hash_set> header, and hash_map and hash_multimap within <hash_map>.

This proposal defines a set of Hashed Associative Container requirements and then separately describes four classes that conform to those requirements. In that respect, this proposal follows the lead of the Standard and the Barreiro/Fraley/Musser proposal. However, Barreiro, Fraley, and Musser proposed more extensive changes to the container requirements. They proposed two new requirements tables, not one: Sorted Associative Container, satisfied by the existing standard associative containers, and Hashed Associative Container. They then modified the existing Associative Container requirements (Table 69 in the C++ Standard) so that both sorted associative containers and hashed associative containers would satisfy the new Associative Container requirements. The difference is shown in Figure 1.

I believe that the Barreiro/Fraley/Musser taxonomy is better: the generality of the name "Associative Container," and the specificity of Table 69, aren't a good match. However, that proposal was made before the C++ Standard was finalized. The superiority of the Barreiro/Fraley/Musser taxonomy isn't so great as to justify changing an existing requirements table that users may be relying on.

The three hash table implementations in current use are not identical, but they are similar enough that, for simple uses, they are interchangeable. This proposal attempts to maintain a similar level of compatibility.

Chaining vs. Open Addressing

Knuth [7] distinguishes between two kinds of hashing: chaining, where a hash code is associated with the head of a linked list, and open addressing, where a hash code is associated with an index into an array.

I'm not aware of any satisfactory implementation of open addressing in a generic framework. Open addressing presents a number of problems:

It's necessary to distinguish between a vacant position and an occupied one.

It's necessary either to restrict the hash table to types with a default constructor and construct every array element ahead of time, or else to maintain an array some of whose elements are objects and others of which are raw memory.

Open addressing makes collision management difficult: if you're inserting an element whose hash code maps to an already-occupied location, you need a policy that tells you where to try next. This is a solved problem, but the best solutions are complicated.

Collision management is especially complicated when erasing elements is allowed. (See [7] for a discussion.) A container class for the standard library ought to allow erasure.

Collision management schemes for open addressing tend to assume a fixed-size array that can hold up to N elements. A container class for the standard library ought to be able to grow as necessary when new elements are inserted, up to the limit of available memory.

Solving these problems could be an interesting research project, but, in the absence of implementation experience in the context of C++, it would be inappropriate to standardize an open-addressing container class.

All further discussion will assume chaining. Each linked list within a hash table is called a bucket. The average number of elements per bucket is called the load factor, or a.

Lookup within a Bucket

When looking up an item in a hash table by key k, the general strategy is to find the bucket that corresponds to k and then to perform a linear search within that bucket. The first step uses the hash function; the second step must use something else.

The most obvious technique is to use std::find or the equivalent: look for an item whose key is equal to k. Naturally, it would be wrong for operator== to be hard-wired in; it should be possible for the user to provide a function object with semantics of equality. As an example where some predicate other than operator== is useful, suppose the user is storing C-style strings (i.e., pointers to null-terminated arrays of characters). In this case, equality of keys k1 and k2 shouldn't mean pointer comparison; it should mean testing that strcmp(k1, k2) == 0.

This proposal takes such an approach. Hashed associative containers are parameterized by two function objects: a hash function and an equality function. Both have defaults.

An alternative technique is possible: instead of testing for equality, sort each bucket in ascending order by key. Linear search for a key k would mean searching for a key k' such that k < k' and k' < k are both false. Again, this shouldn't be taken to mean that operator< would literally be hard-wired in; users could provide their own comparison function, so long as that function has less-than semantics.

The performance characteristics of the two techniques are slightly different. The following table shows the average number of comparisons required for a search through a bucket of n elements:

Using Equality Using Less-Than

Failed Search n n/2

Successful Search n/2 n/2 + 1

The difference for a failed search is that with less-than you can tell that a search has failed as soon as you see a key that's larger than k; with equal-to you have to get to the end of the bucket. The difference for a successful search is with equal-to you can tell that a search has succeeded as soon as you see a key that's equal to k; with less-than all you know when you find a key that's not less than k is that the search has terminated, and you need an extra comparison to tell whether it has terminated in success or failure.

I do not see a clear-cut performance advantage from either technique. Which technique is faster depends on usage pattern: the load factor and the relative frequency of failed and successful searches. (Remember, a bucket may only contain a single element. That extra "+ 1" can be important!) There are also performance implications for insertion, but I expect those differences to be smaller because in most cases I expect insertion to be dominated by the cost of memory allocation or element construction and not by the cost of lookup.

For users, it's sometimes important for a container to present its elements as a sorted range; sorting elements by inserting them into an std::set, for example, is a common idiom. However, I see no value (other than the performance issues discussed above) in guaranteeing the order of elements within a single bucket. If the hash function is well chosen, after all, elements will be distributed between buckets in a seemingly random way. It is useful to guarantee that equivalent elements are adjacent, but we can make that guarantee with either equal-to or less-than.

From the point of view of user convenience, there isn't a huge difference between the two alternatives of equal-to and less-than. I view equality as slightly more convenient, since it's common to define data types that have equality operations but not less-than operations, and rather less common to do the reverse. There are some types where less-than is not a natural operation, and users would have to define a somewhat arbitrary less-than operation for no reason other than to put the objects in a hash table. One obvious example is std::complex<>.

Existing implementations differ. The SGI and Metrowerks implementations use equal-to, and the Dinkumware implementation uses less-than.

An aside: in principle, linear search isn't strictly necessary. A bucket doesn't have to be structured as a linked list; it could be structured as a binary tree, or as some other data structure. This proposal assumes linked lists, partly for reasons of existing practice (all of the C++ hash table implementations in widespread use are implemented in terms of linked lists) and partly because I believe that in practice a tree structure would hurt performance more often than it would help. Balanced trees have large per-node space overhead, and binary tree lookup is faster than linear search only when the number of elements is large. If the hash table's load factor is small and the hash function well chosen, trees have no advantage over linear lists.

The Hash Function Interface

Abstractly, a hash function is a function f(k) that takes an argument of type Key and returns an integer in the range [0, B), where B is the number of buckets in the hash table. A hash function must have the property that f(k₁) and f(k₂) are the same when k₁ and k₂ are the same. A good hash function should also have the property that f(k₁) and f(k₂) are unlikely to be the same when k₁ and k₂ are different.

It is impossible to write a fully general hash function that's valid for all types. (You can't just convert an object to raw memory and hash the bytes; among other reasons, that idea fails because of padding.) Because of that, and also because a good hash function is only good in the context of a specific usage pattern, it's essential to allow users to provide their own hash functions.

There can be a default hash function for a selected set of types; ideally, it should include the most commonly used types (e.g., strings).

There are two design decisions involving non-default hash functions:

How can a user-written function return an integer in the range [0, B) for arbitrary B, especially since B may vary at run time?

Should the hash function be a stand-alone function object or part of a larger package that controls other aspects of the hash policy?

In principle there are two possible answers to the first question. First, the hash function could take two arguments instead of one, where the second argument is B. The hash function would have the responsibility of returning some number in the range [0, B). Second, the hash function could return a number in a very large range, say [0, std::numeric_limits<std::size_t>::max()). The hash table class would be responsible for converting the hash code (the value returned by the hash function) into a bucket index in the range [0, B).

This proposal uses a single-argument hash function. The reasons are:

Existing practice. The three shipping hash table implementations all use single-argument hash functions.

The main argument in favor of a two-argument hash function is that a user-written hash function might make good use of the bucket count; it might, for example, involve multiplication modulo B. However, this argument is weaker than it might seem, since a user-written hash function that took B as an argument would have to cope with arbitrary B. (B is chosen by the hash table class, not the user.) This is a significant restriction, because it means a user-written hash function couldn't rely on any special numerical properties.

As discussed later, the bucket count won't always remain the same during the lifetime of a hash table. This means that, for a particular key k, the hash table class may need the bucket index for two different bucket counts B₁ and B₂. With a double-argument hash function, the hash table class would have to call the hash function twice. With a single-argument hash function, the hash table class only has to invoke the hash function once.

If the hash function is to be packaged along with other aspects of hash policy, what should those aspects be? There are two obvious candidates. First, it could be packaged along with the function object that tests for key equality, or perhaps, even more generally, with a function object that specifies a policy for linear search within a bucket. Second, it could be packaged along with the parameters that govern resizing the bucket count.

This proposal uses a stand-alone hash function, rather than a hash function that's part of a policy package. This is mostly a consequence of other design decisions. First, as discussed later, this proposal controls bucket resizing by floating-point parameters that can be changed at run time, so there is no advantage in putting them in a policy class. Second, linear search within a bucket uses equality, and equality is such a common operation that in most cases I expect that a user-supplied equality predicate will have been written for some other purpose and will be reused as a hash table template argument. Making equality part of a larger policy class would make such reuse harder.

This proposal includes a function object hash<>, with an operator() that takes a single argument and returns an std::size_t. The hash<> template is an incomplete type; it is specialized, and declared as a complete type, for a few common types. I've chosen all of the built-in integer types, void*, char* and const char*, and std::basic_string<charT, traits, Allocator>. I believe that char*, const char*, and std::basic_string are important, because hash tables are often used for strings. Beyond that, the list is fairly arbitrary.

The hash<> function object is defined in the <functional> header. Another sensible alternative would have been to declare it in both <hash_set> and <hash_map>. Implementers would have to arrange for there to be only a single definition when both headers are included, but that's straightforward. The main reason I chose to put it in <functional> is that authors of user-defined types may need to specialize hash<> (which means they need the declaration) even if they have no need to use any of the hashed containers, but this choice too is basically arbitrary.

Trivial as it may seem, hash function packaging may be the most contentious part of this proposal. Existing implementations differ. The SGI and Metrowerks implementations use hash functions that aren't bundled with anything else, but the Dinkumware implementation uses a more general hash policy class. Since this is an interface issue, a decision is necessary.

Legacy implementations are a tricky enough problem that some people have suggested a drastic solution: abandon the names hash_set, hash_map, hash_multiset, and hash_multimap, because of the multiple pre-standard designs using those names, and instead standardize some alternate set of names like hashset, hashmap, hashmultiset, and hashmultimap. I hope nothing of the sort turns out to be necessary.

Control of Hash Resizing

The time required for looking up an element by key k is c₁ + c₂ n, where c₁ and c₂ are constants, and where n is the number of elements in the bucket indexed by k's hash code. If the hash function is well chosen, and elements are evenly distributed between buckets, this is approximately c₁ + c₂ N/B, where N is the number of elements in the container and B is the bucket count. If the bucket count is taken as a constant, then the asymptotic complexity for element lookup is O(N).

To maintain average case complexity O(1) for lookup, the bucket count must grow as elements are added to the hash table; on average the bucket count must be proportional to N. Another way of putting this is that the load factor, N/B, must be approximately constant.

Two methods of maintaining a roughly constant load factor are in current use. First is traditional rehashing. When the load factor becomes too large, choose a new and larger bucket count, B'. Then go through every element in the hash table, computing a new bucket index based on B'. This operation is called rehashing, and it is expensive. Since we want the amortized complexity of element insertion to be constant, we must use exponential growth; that is, B' = g B, where the growth factor, g, is larger than 1. In general this proportionality can only be approximate, since many hashing schemes require B to have special numerical properties (e.g., primality).

Second is a newer technique, incremental hashing [8]. Incremental hashing structures the hash table in such a way that it is possible to add a single bucket at a time. When adding a bucket it is only necessary to examine the elements of a single old bucket, distributing some of them to the new one.

The advantage of incremental hashing is that insertion time becomes more predictable: there's no longer a large time difference between insertions that do trigger rehashing and insertions that don't. The disadvantage is that incremental hashing makes lookup slightly slower. The slowdown is for two reasons. First, the logic to determine a bucket index from a hash code is slightly more complicated: it requires one extra test. Second, incremental hashing results in a somewhat less uniform distribution of elements within buckets. It relies on a construction where there are conceptually B buckets, of which U are in current use; B is a power of two, and U > B/2. We first find a bucket index i in the range [0, B) and then find a bucket index j in the range [0, U) by subtracting U from i if necessary. If the original hash codes are evenly distributed, a bucket in the range [0, B-U) will on average have twice the number of elements as a bucket in the range [B-U, U).

Because of this tradeoff, there is not a clear choice between incremental hashing and traditional rehashing; both are legitimate implementation techniques. A standard, of course, need not and should not dictate implementation technique. The goal of this proposal is to allow both.

From a user's perspective, all of this is invisible in normal use. It's visible when users want to do one of these things:

Examine the number of buckets, or the distribution of elements between buckets.

Request a rehash even when the load factor isn't yet large enough so that one would be triggered automatically. Suppose, for example, that a hash table currently contains 100 elements, and a user is about to add 1,000,000 elements. A user would save a lot of time by requesting a rehash before performing those insertions.

Control the parameters that govern automatic rehashing.

What are those parameters? The most obvious is the maximum load factor, since that's what triggers an automatic rehash. There's also a second parameter, which can be thought of in two different ways: as a growth factor (the constant of proportionality by which the bucket count grows in a rehash) or as a minimum load factor.

Letting users control that second parameter is less straightforward than it seems at first.

For incremental hashing, it makes no sense to talk about a growth factor; incremental hashing doesn't work by exponential growth.

Even if we regard the second parameter as a minimum load factor, it's not clear how it would be used in an incremental-rehashing implementation. The natural strategy for incremental rehashing is to add one bucket whenever the load factor exceeds the maximum; this automatically keeps the load factor within a tight range.

The interactions between a minimum load factor and manual rehash are tricky. Again, consider the example where a hash table has 100 elements, and the user, anticipating a large insertion, requests that the table rehash itself for 1,000,000 elements. In between the rehash and the insertion, the load factor will be very small -- probably below reasonable values of the minimum.

A minimum load factor presents difficulties for the hash table's initial state. An empty hash table -- one that has a nonzero bucket count, but zero elements -- has a load factor of zero. If we impose an invariant that the load factor must always lie between a_min and a_max, an empty hash table would fail to satisfy that invariant.

The second parameter, however it's expressed, would probably have to be interpreted in some approximate or asymptotic sense. The number of buckets must always be an integer, and in some implementations it has to obey other constraints (e.g., primality). If we specify an exact growth factor, or a tight minimum and maximum, then there might simply be no suitable number available.

I don't know how to specify invariants that are precise enough to be meaningful and normative, but loose enough to accommodate traditional and incremental hashing, hash tables with bucket counts that are restricted to prime numbers, manual rehashing, and empty hash tables. We could include a member function for setting the growth factor (or minimum load factor), but not say exactly how that number is used. However, I see very little value in such a vacuous requirement. This proposal provides user control of the maximum load factor, but not of growth factor or minimum load factor.

One implication is that this proposal says what happens as the number of elements in a hash table increase, but doesn't say what happens as the number decreases. This is unfortunate. An unnecessarily small load factor wastes space (the magnitude of the load factor is a time/space tradeoff) and can lead to unnecessarily slow iteration.

There's still one last question about the maximum load factor: How should the user specify it? Setting a_max to a small value optimizes for speed, and setting a_max to a large value optimizes for space; surely this should be under user control. Letting users specify a_max as an integer is an unnecessarily restrictive choice, since fractional values (especially in the range 0 < a_max < 1) are sensible. There are three reasonable options: as a rational number (perhaps an ad hoc rational number, where the user provides the numerator and denominator separately), as an enum (the user may select one of a small number of predetermined values, such as 1/4, 1/2, 1, 3/2, 2), or as a floating-point number.

This proposal provides a member function that allows the user to set the maximum load factor at run time, using a floating-point number. The reasons are:

In my opinion, there is no compelling performance advantage gained from setting the maximum load factor at compile time rather than at run time. The cost of a run-time floating-point parameter is one floating-point multiplication at every rehash (not every insertion). Even with incremental hashing, this is almost certain to be dwarfed by the cost of a rehash. And, except when using a compile-time constant of one, the costs of an integer or rational limit aren't much less in any case.

An enum is less flexible from the user's point of view, because a user might want a load factor that's not in the predetermined list. Ad hoc rational numbers are just clumsy, both for the user and for the implementer.

I propose that the maximum load factor initially be set to one. This is an arbitrary choice.

The Dinkumware hash table implementation uses a compile-time integer constant (part of a hash traits class) to control the maximum load factor, and the Metrowerks implementation uses a run-time floating-point parameter. The SGI implementation does not provide any mechanism for controlling the maximum load factor.

Iterators

There is one basic decision to be made about hash table iterators, which can be expressed either from the implementer's or the user's point of view. From the implementer's point of view: are the buckets singly linked lists, or doubly linked lists? From the user's point of view: are the iterators forward iterators, or bidirectional iterators?

From the implementer's point of view, there's no question that doubly linked lists are much easier to work with. One advantage is that you don't have to maintain a separate list for each bucket. You can keep a single long list, taking care that elements within a bucket remain adjacent; a bucket is then just a pair of pointers into the list. This is nice for the implementer, because a hash table iterator can just be a recycled std::list<>::iterator. It's nice for the user, because iteration is fast. This implementation technique may not appear to depend on the choice between singly and doubly linked lists, but it seems that it does. I don't know of a way to make the single-long-list technique work for singly linked lists: some operations that ought to be linear time would require a linear search through buckets. (The sticking point turns out to be that erasing a node in a singly linked list requires access to the node before it. The obvious workarounds just push the problem to a different place.)

From the user's point of view, the choice is a tradeoff. Hash tables that use singly linked lists have slower iterators, because an iterator first steps within a bucket and then, upon reaching the end of a bucket, steps to the next. Additionally, users may sometimes want to apply algorithms that require bidirectional iterators. If a hash table supplies bidirectional iterators, it's easier for users to switch between (say) hash_set<> and std::set<>. (But "easier" still doesn't mean easy. Some applications use the standard associative containers because of those containers' ordering guarantees, which can't possibly be preserved by hash tables.)

For the user, the disadvantage of bidirectional iterators is greater space overhead. The space overhead for singly linked lists is N + B words, where N is the number of elements and B is the bucket count, and the space overhead for doubly linked lists is 2N + 2B. This is an important consideration, because the main reason for using hashed associative containers is performance.

The SGI and Metrowerks implementations provide forward iterators. The Dinkumware implementation provides bidirectional iterators.

This proposal allows both choices. It requires hashed associative containers to provide forward iterators. An implementation that provides bidirectional iterators is conforming, because bidirectional iterators are forward iterators.

Bucket Interface

Like all STL containers, each of the hashed containers has member functions begin and end. The range [c.begin(), c.end()) contains all of the elements in the container, presented as a flat range. Elements within a bucket are adjacent, but the iterator interface presents no information about where one bucket ends and the next begins.

It's also useful to expose the bucket structure, for two reasons. First, it lets users investigate how well their hash function performs: it lets them test how evenly elements are distributed within buckets and look at the elements within a bucket to see if they have any common properties. Second, if the iterators have an underlying segmented structure (as they do in existing hash table implementations that use singly linked lists), algorithms that exploit that structure, with an explicit nested loop, can be more efficient than algorithms that view the elements as a flat range.

The most important part of the bucket interface is an overloading of begin and end. If n is an integer, [begin(n), end(n)) is a range of iterators pointing to the elements in the nth bucket. These member functions return iterators, of course, but not of type X::iterator or X::const_iterator. Instead they return iterators of type X::local_iterator or X::const_local_iterator. A local iterator is able to iterate within a bucket, but not necessarily between buckets; in some implementations it's possible for X::local_iterator to be a simpler data structure than X::iterator. X::iterator and X::local_iterator are permitted to be the same type; implementations that use doubly linked lists will probably take advantage of that freedom.

This bucket interface is not provided by the SGI, Dinkumware, or Metrowerks implementations. It is inspired partly by the Metrowerks collision-detection interface, and partly by earlier work [9] on algorithms for segmented containers.

Conclusion

The full text of the proposal, in a form and style that's similar to the existing C++ Standard, is shown in the sidebar, "Standardese for the Hash Table Proposal." All of the important design decisions have already been discussed; the text is merely a more formal way of saying the same thing.

Will the Standardization Committee adopt this proposal unchanged? Almost certainly not. It's a safe bet that hash tables will make it into the next revision of the C++ Standard, but it's also a safe bet that there will be changes along the way. And, of course, hash tables won't be the only way in which the standard library gets extended! If you want to write a proposal for a library extension, this can serve as a model.

Notes

[1] J. Barreiro, R. Fraley, and D. R. Musser. "Hash Tables for the Standard Template Library," Standardization Committee Document X3J16/94-0218 = WG21/N0605, 1995.

[2] SGI. "Standard Template Library Programmer's Guide," <www.sgi.com/tech/stl>.

[3] Dinkumware. "Dinkum C++ Library Reference," <www.dinkumware.com/htm_cpl/index.html>.

[4] Metrowerks. "Metrowerks CodeWarrior Pro 7 MSL C++ Reference Manual."

[5] M. H. Austern. "A Proposal to Add Hash Tables to the Standard Library," Standardization Committee Document N1326 = 01-0040, 2001.

[6] 17.3 of the Standard gives some guidance. 17.3.1.3, in particular, gives the format for detailed specification of class definitions and function prototypes.

[7] D. E. Knuth. The Art of Computer Programming Volume 3: Sorting and Searching, Second Edition, (Addison-Wesley, 1998). Hash tables are discussed in 6.4.

[8] P. J. Plauger. "State of the Art: Hash It," Embedded Systems Programming, September, 1998.

[9] M. H. Austern. "Segmented Iterators and Hierarchical Algorithms," 1998, in M. Jazayeri, R. G. K. Loos, and D. R. Musser, ed., Generic Programming: International Seminar on Generic Programming, Castle Dagstuhl, Springer, 2001.

Matt Austern is the author of Generic Programming and the STL and the chair of the C++ standardization committee’s library working group. He works at AT&T Labs — Research and can be contacted at [email protected].

April 2002/The Standard Librarian/Figure 1

Figure 1: Differences between this proposal and the Barreiro/Fraley/Musser proposal

April 2002/The Standard Librarian/Sidebar

Standardese for the Hash Table Proposal

A. Requirements

Hashed associative containers provide an ability for fast retrieval of data based on keys. The worst-case complexity for most operations is linear, but the average case is much faster. The library provides four basic kinds of hashed associative containers: hash_set, hash_map, hash_multiset, and hash_multimap.

Each hashed associative container is parameterized by Key, by a function object Hash that acts as a hash function for values of type Key, and on a binary predicate Pred that induces an equivalence relation on values of type Key. Additionally, hash_map and hash_multimap associate an arbitrary mapped type T with the Key.

A hash function is a function object that takes a single argument of type Key and returns a value of type std::size_t in the range [0, std::numeric_limits<std::size_t>::max()).

Two values k1 and k2 of type Key are considered equal if the container's equality function object returns true when passed those values. If k1 and k2 are equal, the hash function must return the same value for both.

A hashed associative container supports unique keys if it may contain at most one element for each key. Otherwise, it supports equivalent keys. hash_set and hash_map support unique keys. hash_multiset and hash_multimap support equivalent keys. In containers that support equivalent keys, elements with equivalent keys are adjacent to each other.

For hash_set and hash_multiset the value type is the same as the key type. For hash_map and hash_multimap it is equal to std::pair<const Key, T>.

The iterator types iterator and const_iterator of a hashed associative container are of at least the forward iterator category. For hashed associative containers where the key type and value type are the same, both iterator and const_iterator are const iterators.

The elements of a hashed associative container are organized into buckets. Keys with the same hash code appear in the same bucket. The number of buckets is automatically increased as elements are added to a hashed associative container so that the average number of elements per bucket is kept below a bound. Rehashing invalidates iterators, changes ordering between elements, and changes which buckets elements appear in, but does not invalidate pointers or references to elements.

In the following table, X is a hashed associative container class, a is an object of type X, a_uniq is an object of type X when X supports unique keys, a_eq is an object of type X when X supports equivalent keys, i and j are input iterators that refer to value_type, [i, j) is a valid range, p and q2 are valid iterators to a, q and q1 are valid dereferenceable iterators to a, [q1, q2) is a valid range in a, r and r1 are valid dereferenceable const iterators to a, r2 is a valid const iterator to a, [r1, r2) is a valid range in a, t is a value of type X::value_type, k is a value of type key_type, hf is a value of type hasher, eq is a value of type key_equal, n is a value of type size_type, and z is a value of type double.

Hashed Associative Container Requirements (in Addition to Container)

Expression Return Type Assertion/Note
Pre/Post-Condition Complexity

X::key_type Key Key is Assignable and CopyConstructible Compile time

X::hasher Hash Hash is a unary function object that takes an argument of type Key and returns a value of type std::size_t. Compile time

X::key_equal Pred Pred is a binary predicate that takes two arguments of type Key. Pred is an equivalence relation. Compile time

X::local_iterator An iterator type whose category, value type, difference type, and pointer and reference types are the same as X::iterator's. A local_iterator object may be used to iterate through a single bucket, but may not be used to iterate across buckets. Compile time

X::const_local_iterator An iterator type whose category, value type, difference type, and pointer and reference types are the same as X::const_iterator's. A const_local_iterator object may be used to iterate through a single bucket, but may not be used to iterate across buckets. Compile time

X(n, hf, eq)
X a(n, hf, eq); X Constructs an empty container with at least n buckets, using hf as the hash function and hf as the key equality predicate. Constant

X(n, hf)
X a(n, hf); X Constructs an empty container with at least n buckets, using hf as the hash function and key_equal() as the key equality predicate. Constant

X(n)
X a(n); X Constructs an empty container with at least n buckets, using hasher() as the hash function and key_equal() as the key equality predicate. Constant

X()
X a; X Constructs an empty container with an unspecified number of buckets, using hasher() as the hash function and key_equal as the key equality predicate. Constant

X(i, j, n, hf, eq)
X a(i, j, n, hf, eq); X Constructs an empty container with at least n buckets, using hf as the hash function and hf as the key equality predicate, and inserts elements from [i, j) into it. Average case O(N)
(N is
std::distance(i, j));
worse case O(N²)

X(i, j, n, hf)
X a(i, j, n, hf); X Constructs an empty container with at least n buckets, using hf as the hash function and key_equal() as the key equality predicate, and inserts elements from [i, j) into it. Average case O(N)
(N is
std::distance(i, j));
worse case O(N²)

X(i, j, n)
X a(i, j, n); X Constructs an empty container with at least n buckets, using hasher() as the hash function and key_equal() as the key equality predicate, and inserts elements from [i, j) into it. Average case O(N)
(N is
std::distance(i, j));
worse case O(N²)

X(i, j)
X a(i, j); X Constructs an empty container with an unspecified number of buckets, using hasher() as the hash function and key_equal as the key equality predicate, and inserts elements from [i, j) into it. Average case O(N)
(N is
std::distance(i, j));
worse case O(N²)

a.hash_funct() hasher Returns the hash function out of which a was constructed. Constant

a.key_eq() key_equal Returns the key equality function out of which a was constructed. Constant

a_uniq.insert(t) std::pair<iterator, bool> Inserts t if and only if there is no element in the container with key equivalent to the key of t. The bool component of the returned pair indicates whether the insertion takes place, and the iterator component points to the element with key equivalent to the key of t. Average case O(1); worst case O(a_uniq.size())

a_eq.insert(t) iterator Inserts t, and returns an iterator pointing to the newly inserted element. Average case O(1); worst case O(a_uniq.size())

a.insert(r, t) iterator Equivalent to a.insert(t). Return value is an iterator pointing to the element with the key equivalent to that of t. The const iterator r is a hint pointing to where the search should start. Implementations are permitted to ignore the hint. Average case O(1); worst case O(a.size())

a.insert(i, j) void Pre: i and j are not iterators in a.
Equivalent to a.insert(t) for each element in [i,j). Average case O(N), where N is std::distance(i, j); worst case O(N * a.size())

a.erase(k) size_type Erases all elements with key equivalent to k. Returns the number of elements erased. Average case O(a.count(k)); worst case O(a.size()

a.erase(r) void Erases the element pointed to by r. Average case O(1); worst case O(a.size())

a.erase(r1, r2) void Erases all elements in the range [r1, t2). Average case O(std::distance(r1, r2)); worst case O(a.size())

a.clear() void Erases all elements in the container.
Post: a.size() == 0 Linear

a.find(k) iterator;
const_iterator for const a Returns an iterator pointing to an element with key equivalent to k, or a.end() if no such element exists. Average case O(1); worst case O(a.size())

a.count(k) size_type Returns the number of elements with key equivalent to k. Average case O(1); worst case O(a.size())

a.equal_range(k) std::pair<iterator, iterator>;
std::pair<const_iterator, const_iterator> for const a Returns a range containing all elements with keys equivalent to k. Returns std::make_pair(a.end(), a.end()) if no such elements exist. Average case O(a.count(k)); worst case O(a.size())

a.bucket_count() size_type Returns the number of buckets that a contains. Constant

a.max_bucket_count() size_type Returns an upper bound on the number of buckets that a might ever contain. Constant

a.bucket(k) size_type Returns the index of the bucket in which elements with keys equivalent to k would be found, if any such elements exist.
Post: the return value is in the range [0, a.bucket_count()). Constant

a.bucket_size(n) size_type Pre: n is in the range [0, a.bucket_count()).
Returns the number of elements in the n^th bucket. O(a.bucket_size(n))

a.begin(n) local_iterator;
const_local_iterator for const a Pre: n is in the range [0, a.bucket_count()).
Note: [a.begin(n), a.end(n)) is a valid range containing all of the elements in the n^th bucket. Constant

a.end(n) local_iterator;
const_local_iterator for const a Pre: n is in the range [0, a.bucket_count()). Constant

a.load_factor() double Returns the average number of elements per bucket. Constant

a.max_load_factor() double Returns a number that the container attempts to keep the load factor less than or equal to. The container automatically increases the number of buckets as necessary to keep the load factor below this number.
Post: return value is positive. Constant

a.set_max_load_factor(z) void Pre: z is positive.
Changes the container's maximum load load factor.
Post: a.max_load_factor() == z Constant

a.rehash(n) void Pre: n > a.size() / a.max_load_factor().
Changes the number of buckets so that it is at least n. O(a.size()).

B. Hash Function

1. To be added to the <functional> synopsis
    // Hash function base template
    template <class T> struct hash;

    // Hash function specializations

    template <> struct hash<char>;
    template <> struct hash<signed char>;
    template <> struct hash<unsigned char>;
    template <> struct hash<wchar_t>;
    template <> struct hash<short>;
    template <> struct hash<int>;
    template <> struct hash<long>;
    template <> struct hash<unsigned short>;
    template <> struct hash<unsigned int>;
    template <> struct hash<unsigned long>;

    template <class charT, class traits>
    struct hash<std::basic_string<charT, traits> >;

    template<> struct hash<const char*>;
    template<> struct hash<char*>;
2. Class template hash

The function object hash is used as the default hash function by the hashed associative containers. This class template is required to be instantiable only for the following types: char, signed char, unsigned char, wchar_t, short, int, long, unsigned short, unsigned int, unsigned long, char*, const char*, and (for any valid set of charT, traits, and Alloc) std::basic_string<charT, traits, Alloc>.
    template <class T>
    struct hash : public std::unary_function<T, std::size_t>
    {
      std::size_t operator()(T val) const;
    };
The return value of operator() is unspecified, except that equal arguments yield the same result. For char* and const char*, two arguments s1 and s2 yield the same result if std::strcmp(s1, s2) == 0.

C. Hashed Associative Containers

1. Header <hash_set> synopsis
namespace std {
 template <class Value,
            class Hash = hash<Value>,
            class Pred = std::equal_to<Value>,
            class Alloc = std::allocator<Value> > class hash_set;

 template <class Value, class Hash, class Pred, class Alloc>
 bool operator==(const hash_set<Value, Hash, Pred, Alloc>&,
                 const hash_set<Value, Hash, Pred, Alloc>&);

 template <class Value, class Hash, class Pred, class Alloc>
 bool operator!=(const hash_set<Value, Hash, Pred, Alloc>&,
                 const hash_set<Value, Hash, Pred, Alloc>&);

 template <class Value,
            class Hash = hash<Value>,
            class Pred = std::equal_to<Value>,
            class Alloc = std::allocator<Value> > class hash_multiset;

 template <class Value, class Hash, class Pred, class Alloc>
 bool operator==(const hash_multiset<Value, Hash, Pred, Alloc>&,
                 const hash_multiset<Value, Hash, Pred, Alloc>&);

 template <class Value, class Hash, class Pred, class Alloc>
 bool operator!=(const hash_multiset<Value, Hash, Pred, Alloc>&,
                 const hash_multiset<Value, Hash, Pred, Alloc>&);}
2. Header <hash_map> synopsis
namespace std { 
 template <class Key,
            class T,
            class Hash = hash<Key>,
            class Pred = std::equal_to<Key>,
            class Alloc = std::allocator<std::pair<const Key, T> > > class hash_set;

 template <class Key, class T, class Hash, class Pred, class Alloc> 
 bool operator==(const hash_set<Key, T, Hash, Pred, Alloc>&,    
                 const hash_set<Key, T, Hash, Pred, Alloc>&);

 template <class Key, class T, class Hash, class Pred, class Alloc>
 bool operator!=(const hash_set<Key, T, Hash, Pred, Alloc>&,
                 const hash_set<Key, T, Hash, Pred, Alloc>&);

 template <class Key,
            class T,
            class Hash = hash<Key>,
            class Pred = std::equal_to<Key>,
            class Alloc = std::allocator<std::pair<const Key, T> > > class hash_multiset;

 template <class Key, class T, class Hash, class Pred, class Alloc> 
 bool operator==(const hash_multiset<Key, T, Hash, Pred, Alloc>&,
                 const hash_multiset<Key, T, Hash, Pred, Alloc>&);

 template <class Key, class T, class Hash, class Pred, class Alloc>
 bool operator!=(const hash_multiset<Key, T, Hash, Pred, Alloc>&,
                 const hash_multiset<Key, T, Hash, Pred, Alloc>&);}
3. Class template hash_set

A hash_set is a kind of hashed associative container that supports unique keys (a hash_set contains at most one of each key value) and in which the elements' keys are the elements themselves.

A hash_set satisfies all of the requirements of a container and of a hashed associative container. It provides the operations described in the preceding requirements table for unique keys; that is, a hash_set supports the a_uniq operations in that table, not the a_eq operations. For a hash_set<Value> the key type and the value type are both Value. The iterator and const_iterator types are both const iterator types. It is unspecified whether or not they are the same type.

This section only describes operations on hash_set that are not described in one of the requirement tables, or for which there is additional semantic information.
 namespace std {
    template <class Value, 
              class Hash = hash<Value>,
              class Pred = std::equal_to<Value>,
              class Alloc = std::allocator<Value> >
    class hash_set
    {
    public:
      // types
      typedef Value                                    key_type;
      typedef Value                                    value_type;
      typedef Hash                                     hasher;
      typedef Pred                                     key_equal;
      typedef Alloc                                    allocator_type;
      typedef typename allocator_type::pointer         pointer;
      typedef typename allocator_type::const_pointer   const_pointer;
      typedef typename allocator_type::reference       reference;
      typedef typename allocator_type::const_reference const_reference;
      typedef implementation defined            size_type;
      typedef implementation defineddifference_type;

      typedef implementation defined            iterator;
  typedef implementation defined            const_iterator;

typedef implementation defined
              local_iterator;
      typedef implementation defined
              const_local_iterator;
      // construct/destroy/copy
      explicit hash_set(size_type n = implementation defined,
                        const hasher& hf = hasher(),
                        const key_equal& eql = key_equal(),
                        const allocator_type& a = allocator_type());
      template <class InputIterator>
        hash_set(InputIterator f, InputIterator l,
                 size_type n = implementation defined,
                 const hasher& hf = hasher(),
                 const key_equal& eql = key_equal(),
                 const allocator_type& a = allocator_type());
      hash_set(const hash_set&);
      ~hash_set();
      hash_set& operator=(const hash_set&);
      allocator_type get_allocator() const;

      // size and capacity
      bool empty() const;
      size_type size() const;
      size_type max_size() const;

      // iterators
      iterator       begin();
      const_iterator begin() const;
      iterator       end();
      const_iterator end() const;

      // modifiers
      std::pair<iterator, bool> insert(const value_type& obj);
      iterator insert(const_iterator hint, const value_type& obj);
      template <class InputIterator>
        void insert(InputIterator first, InputIterator last);

      void erase(const_iterator position);
      size_type erase(const key_type& k);
      void erase(const_iterator first, const_iterator last);
      void clear();
      void swap(hash_set&);

      // observers
      hasher hash_funct() const;
      key_equal key_eq() const;

      // lookup
      iterator       find(const key_type& k);
      const_iterator find(const key_type& k) const;
      size_type count(const key_type& k) const;
      std::pair<iterator, iterator> 
        equal_range(const key_type& k);
      std::pair<const_iterator, const_iterator>
        equal_range(const key_type& k) const;

      // bucket interface
      size_type bucket_count() const;
      size_type max_bucket_count() const;
      size_type bucket_size(size_type n);
      size_type bucket(const key_type& k);
      local_iterator begin(size_type n);
      const_local_iterator begin(size_type n) const;
      local_iterator end(size_type n);
      const_local_iterator end(size_type n) const;

      // hash policy
      double load_factor() const;
      double max_load_factor() const;
      void set_max_load_factor(double z);
      void rehash(size_type n);
    };

    template <class Value, class Hash, class Pred, class Alloc>
    bool operator==(const hash_set<Value, Hash, Pred, Alloc>& x,
                    const hash_set<Value, Hash, Pred, Alloc>& y);

    template <class Value, class Hash, class Pred, class Alloc>
    bool operator!=(const hash_set<Value, Hash, Pred, Alloc>& x,
                    const hash_set<Value, Hash, Pred, Alloc>& y);

    template <class Value, class Hash, class Pred, class Alloc>
      void swap(const hash_set<Value, Hash, Pred, Alloc>& x,
                const hash_set<Value, Hash, Pred, Alloc>& y); }
a. hash_set constructors
      explicit hash_set(size_type n = implementation defined,
                        const hasher& hf = hasher(),
                        const key_equal& eql = key_equal(),
                        const allocator_type& a = allocator_type());
Effects: Constructs an empty hash_set using the specified hash function, key equality function, and allocator, and using at least n buckets. If n is not provided, the number of buckets is implementation defined.

Complexity: Constant.

Note: The maximum load factor is initially 1.0.
      template <class InputIterator>
        hash_set(InputIterator f, InputIterator l,
                 size_type n = implementation defined,
                 const hasher& hf = hasher(),
                 const key_equal& eql = key_equal(),
                 const allocator_type& a = allocator_type());
Effects: Constructs an empty hash_set using the specified hash function, key equality function, and allocator, and using at least n buckets. (If n is not provided, the number of buckets is implementation defined.) Then inserts elements from the range [first, last).

Complexity: Average case linear; worst case quadratic.

Note: The maximum load factor is initially 1.0.

b. hash_set swap
      template <class Value, class Hash, class Pred, class Alloc>
        void swap(const hash_set<Value, Hash, Pred, Alloc>& x,
                  const hash_set<Value, Hash, Pred, Alloc>& y);
Effects:
 x.swap(y);
4. Class template hash_map

A hash_map is a kind of hashed associative container that supports unique keys (a hash_map contains at most one of each key value) and that associates values of another type mapped_type with the keys.

A hash_map satisfies all of the requirements of a container and of a hashed associative container. It provides the operations described in the preceding requirements table for unique keys; that is, a hash_map supports the a_uniq operations in that table, not the a_eq operations. For a hash_map<Key, T> the key type is Key, the mapped type is T, and the value type is std::pair<const Key, T>.

This section only describes operations on hash_map that are not described in one of the requirement tables, or for which there is additional semantic information.
 namespace std {
    template <class Key,
              class T,
              class Hash = hash<Key>,
       class Pred = std::equal_to<Key>,
              class Alloc = std::allocator<std::pair<const Key, T> > >
    class hash_map
    {
    public:
      // types
      typedef Key key_type;
      typedef std::pair<const Key, T>                  value_type;
      typedef T mapped_type;
      typedef Hash                                     hasher;
      typedef Pred                                     key_equal;
      typedef Alloc                                    allocator_type;
      typedef typename allocator_type::pointer         pointer;
      typedef typename allocator_type::const_pointer   const_pointer;
      typedef typename allocator_type::reference       reference;
 typedef typename allocator_type::const_reference const_reference;
      typedef implementation defined  size_type;
      typedef implementation defined difference_type;

      typedef implementation defined iterator;
      typedef implementation defined const_iterator;

  typedef implementation defined
              local_iterator;
      typedef implementation defined
              const_local_iterator;

      // construct/destroy/copy
      explicit hash_map(size_type n = implementation defined,
                        const hasher& hf = hasher(),
                        const key_equal& eql = key_equal(),
                        const allocator_type& a = allocator_type());

      template <class InputIterator>
        hash_map(InputIterator f, InputIterator l,
                 size_type n = implementation defined,
                 const hasher& hf = hasher(),
                 const key_equal& eql = key_equal(),
                 const allocator_type& a = allocator_type());
      hash_map(const hash_map&);
      ~hash_map();
      hash_map& operator=(const hash_map&);
      allocator_type get_allocator() const;

      // size and capacity
      bool empty() const;
      size_type size() const;
      size_type max_size() const;

      // iterators
      iterator       begin();
      const_iterator begin() const;
      iterator       end();
      const_iterator end() const;

      // modifiers
      std::pair<iterator, bool> insert(const value_type& obj);
      iterator insert(const_iterator hint, const value_type& obj);
      template <class InputIterator>
        void insert(InputIterator first, InputIterator last);

      void erase(const_iterator position);
      size_type erase(const key_type& k);
      void erase(const_iterator first, const_iterator last);
      void clear();

      void swap(hash_map&);

      // observers
      hasher hash_funct() const;
      key_equal key_eq() const;

      // lookup
      iterator       find(const key_type& k);
      const_iterator find(const key_type& k) const;
      size_type count(const key_type& k) const;
      std::pair<iterator, iterator> 
        equal_range(const key_type& k);
      std::pair<const_iterator, const_iterator>
        equal_range(const key_type& k) const;

      mapped_type& operator[](const key_type& k);

      // bucket interface
      size_type bucket_count() const;
      size_type max_bucket_count() const;
      size_type bucket_size(size_type n);
      size_type bucket(const key_type& k);
      local_iterator begin(size_type n);
      const_local_iterator begin(size_type n) const;
      local_iterator end(size_type n);
      const_local_iterator end(size_type n) const;

       // hash policy
      double load_factor() const;
      double max_load_factor() const;
      void set_max_load_factor(double z);
      void rehash(size_type n);
    };

    template <class Key, class T, class Hash, class Pred, class Alloc>
    bool operator==(const hash_map<Key, T, Hash, Pred, Alloc>& x,
                    const hash_map<Key, T, Hash, Pred, Alloc>& y);

    template <class Key, class T, class Hash, class Pred, class Alloc>
    bool operator!=(const hash_map<Key, T, Hash, Pred, Alloc>& x,
                    const hash_map<Key, T, Hash, Pred, Alloc>& y);

    template <class Key, class T, class Hash, class Pred, class Alloc>
      void swap(const hash_map<Key, T, Hash, Pred, Alloc>& x,
                const hash_map<Key, T, Hash, Pred, Alloc>& y); }
a. hash_map constructors
      explicit hash_map(size_type n = implementation defined,
                        const hasher& hf = hasher(),
                        const key_equal& eql = key_equal(),
                        const allocator_type& a = allocator_type());
Effects: Constructs an empty hash_map using the specified hash function, key equality function, and allocator, and using at least n buckets. If n is not provided, the number of buckets is implementation defined.

Complexity: Constant.

Note: The maximum load factor is initially 1.0.
      template <class InputIterator>
        hash_map(InputIterator f, InputIterator l,
                 size_type n = implementation defined,
                 const hasher& hf = hasher(),
                 const key_equal& eql = key_equal(),
                 const allocator_type& a = allocator_type());
Effects: Constructs an empty hash_map using the specified hash function, key equality function, and allocator, and using at least n buckets. (If n is not provided, the number of buckets is implementation defined.) Then inserts elements from the range [first, last).

Complexity: Average case linear; worst case quadratic.

Note:The maximum load factor is initially 1.0.

b. hash_map element access
      mapped_type& operator[](const key_type& k);
Effects: If the hash_map does not already contain an element whose key is equivalent to k, inserts std::pair<const key_type, mapped_type>(k, mapped_type()).

Returns: A reference to x.second, where x is the (unique) element whose key is equivalent to k.

c. hash_map swap
      template <class Value, class Hash, class Pred, class Alloc>
        void swap(const hash_map<Value, Hash, Pred, Alloc>& x,
                  const hash_map<Value, Hash, Pred, Alloc>& y);
Effects:
 x.swap(y);
5. Class template hash_multiset

A hash_multiset is a kind of hashed associative container that supports equivalent keys (a hash_multiset may contain multiple copies of the same key value) and in which the elements' keys are the elements themselves.

A hash_multiset satisfies all of the requirements of a container and of a hashed associative container. It provides the operations described in the preceding requirements table for equivalent keys; that is, a hash_multiset supports the a_eq operations in that table, not the a_uniq operations. For a hash_multiset<Value> the key type and the value type are both Value. The iterator and const_iterator types are both const iterator types. It is unspecified whether or not they are the same type.

This section only describes operations on hash_multiset that are not described in one of the requirement tables, or for which there is additional semantic information.
 namespace std {
    template <class Value, 
              class Hash = hash<Value>,
              class Pred = std::equal_to<Value>,
              class Alloc = std::allocator<Value> >
    class hash_multiset
    {
    public:
      // types
      typedef Value                                    key_type;
      typedef Value                                    value_type;
      typedef Hash                                     hasher;
      typedef Pred                 key_equal;
      typedef Alloc                                    allocator_type;
      typedef typename allocator_type::pointer         pointer;
      typedef typename allocator_type::const_pointer   const_pointer;
      typedef typename allocator_type::reference       reference;
      typedef typename allocator_type::const_reference const_reference;
      typedef implementation defined size_type;
      typedef implementation defined difference_type;

      typedef implementation defined iterator;
      typedef implementation defined const_iterator;

      typedef implementation defined
              local_iterator;
      typedef implementation defined
              const_local_iterator;

      // construct/destroy/copy
      explicit hash_multiset(size_type n = implementation defined,
                             const hasher& hf = hasher(),
                             const key_equal& eql = key_equal(),
                             const allocator_type& a = allocator_type());
      template <class InputIterator>
        hash_multiset(InputIterator f, InputIterator l,
                 size_type n = implementation defined,
       const hasher& hf = hasher(),
                 const key_equal& eql = key_equal(),
                 const allocator_type& a = allocator_type());
      hash_multiset(const hash_multiset&);
      ~hash_multiset();
      hash_multiset& operator=(const hash_multiset&);
      allocator_type get_allocator() const;

      // size and capacity
      bool empty() const;
      size_type size() const;
      size_type max_size() const;

      // iterators
      iterator       begin();
      const_iterator begin() const;
      iterator       end();
      const_iterator end() const;

      // modifiers
      iterator insert(const value_type& obj);
      iterator insert(const_iterator hint, const value_type& obj);
      template <class InputIterator>
        void insert(InputIterator first, InputIterator last);

      void erase(const_iterator position);
      size_type erase(const key_type& k);
      void erase(const_iterator first, const_iterator last);
      void clear();

      void swap(hash_multiset&);

      // observers
      hasher hash_funct() const;
      key_equal key_eq() const;

      // lookup
      iterator       find(const key_type& k);
      const_iterator find(const key_type& k) const;
      size_type count(const key_type& k) const;
      std::pair<iterator, iterator> 
        equal_range(const key_type& k);
      std::pair<const_iterator, const_iterator>
        equal_range(const key_type& k) const;

      // bucket interface
      size_type bucket_count() const;
      size_type max_bucket_count() const;
      size_type bucket_size(size_type n);
      size_type bucket(const key_type& k);
      local_iterator begin(size_type n);
      const_local_iterator begin(size_type n) const;
      local_iterator end(size_type n);
      const_local_iterator end(size_type n) const;

      // hash policy
      double load_factor() const;
      double max_load_factor() const;
      void set_max_load_factor(double z);
      void rehash(size_type n);
    };

    template <class Value, class Hash, class Pred, class Alloc>
    bool operator==(const hash_multiset<Value, Hash, Pred, Alloc>& x,
                    const hash_multiset<Value, Hash, Pred, Alloc>& y);

    template <class Value, class Hash, class Pred, class Alloc>
    bool operator!=(const hash_multiset<Value, Hash, Pred, Alloc>& x,
                    const hash_multiset<Value, Hash, Pred, Alloc>& y);

    template <class Value, class Hash, class Pred, class Alloc>
      void swap(const hash_multiset<Value, Hash, Pred, Alloc>& x,
                const hash_multiset<Value, Hash, Pred, Alloc>& y); }
a. hash_multiset constructors
      explicit hash_multiset(size_type n = implementation defined,
                             const hasher& hf = hasher(),
 const key_equal& eql = key_equal(),
                             const allocator_type& a
 = allocator_type());
Effects: Constructs an empty hash_multiset using the specified hash function, key equality function, and allocator, and using at least n buckets. If n is not provided, the number of buckets is implementation defined.

Complexity: Constant.

Note: The maximum load factor is initially 1.0.
      template <class InputIterator>
        hash_multiset(InputIterator f, InputIterator l,
 size_type n = implementation defined,
 const hasher& hf = hasher(),
                      const key_equal& eql = key_equal(),
                      const allocator_type& a = allocator_type());
Effects: Constructs an empty hash_multiset using the specified hash function, key equality function, and allocator, and using at least n buckets. (If n is not provided, the number of buckets is implementation defined.) Then inserts elements from the range [first, last).

Complexity: Average case linear; worst case quadratic.

Note: The maximum load factor is initially 1.0.

b. hash_multiset swap
      template <class Value, class Hash, class Pred, class Alloc>
        void swap(const hash_multiset<Value, Hash, Pred, Alloc>& x,
                  const hash_multiset<Value, Hash, Pred, Alloc>& y);
Effects:
 x.swap(y);
6. Class template hash_multimap

A hash_multimap is a kind of hashed associative container that supports equivalent keys (a hash_multimap may contain multiple copies of each key value) and that associates values of another type mapped_type with the keys.

A hash_multimap satisfies all of the requirements of a container and of a hashed associative container. It provides the operations described in the preceding requirements table for equivalent keys; that is, a hash_multimap supports the a_eq operations in that table, not the a_uniq operations. For a hash_multimap<Key, T> the key type is Key, the mapped type is T, and the value type is std::pair<const Key, T>.

This section only describes operations on hash_multimap that are not described in one of the requirement tables, or for which there is additional semantic information.
 namespace std {
    template <class Key,
            class T,
              class Hash = hash<Key>,
              class Pred = std::equal_to<Key>,
              class Alloc = std::allocator<std::pair<const Key, T> > >
    class hash_multimap
    {
    public:
      // types
      typedef Key key_type;
      typedef std::pair<const Key, T>            value_type;
      typedef T mapped_type;
      typedef Hash                                     hasher;
      typedef Pred key_equal;
      typedef Alloc                                    allocator_type;
      typedef typename allocator_type::pointer         pointer;
      typedef typename allocator_type::const_pointer   const_pointer;
      typedef typename allocator_type::reference       reference;
      typedef typename allocator_type::const_reference const_reference;
      typedef implementation defined size_type;
      typedef implementation defined difference_type;

      typedef implementation defined iterator;
      typedef implementation defined const_iterator;

      typedef implementation defined
              local_iterator;
      typedef implementation defined
              const_local_iterator;

      // construct/destroy/copy
      explicit hash_multimap(size_type n = implementation defined,
                             const hasher& hf = hasher(),
                             const key_equal& eql = key_equal(),
                             const allocator_type& a = allocator_type());
      template <class InputIterator>
        hash_multimap(InputIterator f, InputIterator l,
                      size_type n = implementation defined,
                      const hasher& hf = hasher(),
                      const key_equal& eql = key_equal(),
                      const allocator_type& a = allocator_type());
      hash_multimap(const hash_multimap&);
      ~hash_multimap();
      hash_multimap& operator=(const hash_multimap&);
      allocator_type get_allocator() const;

      // size and capacity
      bool empty() const;
      size_type size() const;
      size_type max_size() const;

      // iterators
      iterator       begin();
      const_iterator begin() const;
      iterator       end();
      const_iterator end() const;

      // modifiers
      iterator insert(const value_type& obj);
      iterator insert(const_iterator hint, const value_type& obj);
    template <class InputIterator>
        void insert(InputIterator first, InputIterator last);

      void erase(const_iterator position);
      size_type erase(const key_type& k);
      void erase(const_iterator first, const_iterator last);
      void clear();

      void swap(hash_multimap&);

      // observers
      hasher hash_funct() const;
      key_equal key_eq() const;

      // lookup
      iterator       find(const key_type& k);
      const_iterator find(const key_type& k) const;
      size_type count(const key_type& k) const;
      std::pair<iterator, iterator> 
        equal_range(const key_type& k);
      std::pair<const_iterator, const_iterator>
        equal_range(const key_type& k) const;

      mapped_type& operator[](const key_type& k);

      // bucket interface
      size_type bucket_count() const;
      size_type max_bucket_count() const;
      size_type bucket_size(size_type n);
      size_type bucket(const key_type& k);
      local_iterator begin(size_type n);
      const_local_iterator begin(size_type n) const;
      local_iterator end(size_type n);
      const_local_iterator end(size_type n) const;

      // hash policy
      double load_factor() const;
      double max_load_factor() const;
      void set_max_load_factor(double z);
      void rehash(size_type n);
    };

    template <class Key, class T, class Hash, class Pred, class Alloc>
    bool operator==(const hash_multimap<Key, T, Hash, Pred, Alloc>& x,
                    const hash_multimap<Key, T, Hash, Pred, Alloc>& y);

    template <class Key, class T, class Hash, class Pred, class Alloc>
    bool operator!=(const hash_multimap<Key, T, Hash, Pred, Alloc>& x,
                    const hash_multimap<Key, T, Hash, Pred, Alloc>& y);

    template <class Key, class T, class Hash, class Pred, class Alloc>
      void swap(const hash_multimap<Key, T, Hash, Pred, Alloc>& x,
                const hash_multimap<Key, T, Hash, Pred, Alloc>& y); }
a. hash_multimap constructors
      explicit hash_multimap(size_type n = implementation defined,
                             const hasher& hf = hasher(),
                             const key_equal& eql = key_equal(),
                             const allocator_type& a 
= allocator_type());
Effects: Constructs an empty hash_multimap using the specified hash function, key equality function, and allocator, and using at least n buckets. If n is not provided, the number of buckets is implementation defined.

Complexity: Constant.

Note: The maximum load factor is initially 1.0.
      template <class InputIterator>
        hash_multimap(InputIterator f, InputIterator l,
                      size_type n = implementation defined,
                      const hasher& hf = hasher(),
                      const key_equal& eql = key_equal(),
                      const allocator_type& a = allocator_type());
Effects: Constructs an empty hash_multimap using the specified hash function, key equality function, and allocator, and using at least n buckets. (If n is not provided, the number of buckets is implementation defined.) Then inserts elements from the range [first, last).

Complexity: Average case linear; worst case quadratic.

Note: The maximum load factor is initially 1.0.

b. hash_multimap swap
      template <class Value, class Hash, class Pred, class Alloc>
        void swap(const hash_multimap<Value, Hash, Pred, Alloc>& x,
                  const hash_multimap<Value, Hash, Pred, Alloc>& y);
Effects:
 x.swap(y);

	Using Equality	Using Less-Than
Failed Search	n	n/2
Successful Search	n/2	n/2 + 1

Hashed Associative Container Requirements (in Addition to Container)
Expression	Return Type	Assertion/Note Pre/Post-Condition	Complexity
X::key_type	Key	Key is Assignable and CopyConstructible	Compile time
X::hasher	Hash	Hash is a unary function object that takes an argument of type Key and returns a value of type std::size_t.	Compile time
X::key_equal	Pred	Pred is a binary predicate that takes two arguments of type Key. Pred is an equivalence relation.	Compile time
X::local_iterator	An iterator type whose category, value type, difference type, and pointer and reference types are the same as X::iterator's.	A local_iterator object may be used to iterate through a single bucket, but may not be used to iterate across buckets.	Compile time
X::const_local_iterator	An iterator type whose category, value type, difference type, and pointer and reference types are the same as X::const_iterator's.	A const_local_iterator object may be used to iterate through a single bucket, but may not be used to iterate across buckets.	Compile time
X(n, hf, eq) X a(n, hf, eq);	X	Constructs an empty container with at least n buckets, using hf as the hash function and hf as the key equality predicate.	Constant
X(n, hf) X a(n, hf);	X	Constructs an empty container with at least n buckets, using hf as the hash function and key_equal() as the key equality predicate.	Constant
X(n) X a(n);	X	Constructs an empty container with at least n buckets, using hasher() as the hash function and key_equal() as the key equality predicate.	Constant
X() X a;	X	Constructs an empty container with an unspecified number of buckets, using hasher() as the hash function and key_equal as the key equality predicate.	Constant
X(i, j, n, hf, eq) X a(i, j, n, hf, eq);	X	Constructs an empty container with at least n buckets, using hf as the hash function and hf as the key equality predicate, and inserts elements from [i, j) into it.	Average case O(N) (N is std::distance(i, j)); worse case O(N²)
X(i, j, n, hf) X a(i, j, n, hf);	X	Constructs an empty container with at least n buckets, using hf as the hash function and key_equal() as the key equality predicate, and inserts elements from [i, j) into it.	Average case O(N) (N is std::distance(i, j)); worse case O(N²)
X(i, j, n) X a(i, j, n);	X	Constructs an empty container with at least n buckets, using hasher() as the hash function and key_equal() as the key equality predicate, and inserts elements from [i, j) into it.	Average case O(N) (N is std::distance(i, j)); worse case O(N²)
X(i, j) X a(i, j);	X	Constructs an empty container with an unspecified number of buckets, using hasher() as the hash function and key_equal as the key equality predicate, and inserts elements from [i, j) into it.	Average case O(N) (N is std::distance(i, j)); worse case O(N²)
a.hash_funct()	hasher	Returns the hash function out of which a was constructed.	Constant
a.key_eq()	key_equal	Returns the key equality function out of which a was constructed.	Constant
a_uniq.insert(t)	std::pair<iterator, bool>	Inserts t if and only if there is no element in the container with key equivalent to the key of t. The bool component of the returned pair indicates whether the insertion takes place, and the iterator component points to the element with key equivalent to the key of t.	Average case O(1); worst case O(a_uniq.size())
a_eq.insert(t)	iterator	Inserts t, and returns an iterator pointing to the newly inserted element.	Average case O(1); worst case O(a_uniq.size())
a.insert(r, t)	iterator	Equivalent to a.insert(t). Return value is an iterator pointing to the element with the key equivalent to that of t. The const iterator r is a hint pointing to where the search should start. Implementations are permitted to ignore the hint.	Average case O(1); worst case O(a.size())
a.insert(i, j)	void	Pre: i and j are not iterators in a. Equivalent to a.insert(t) for each element in [i,j).	Average case O(N), where N is std::distance(i, j); worst case O(N * a.size())
a.erase(k)	size_type	Erases all elements with key equivalent to k. Returns the number of elements erased.	Average case O(a.count(k)); worst case O(a.size()
a.erase(r)	void	Erases the element pointed to by r.	Average case O(1); worst case O(a.size())
a.erase(r1, r2)	void	Erases all elements in the range [r1, t2).	Average case O(std::distance(r1, r2)); worst case O(a.size())
a.clear()	void	Erases all elements in the container. Post: a.size() == 0	Linear
a.find(k)	iterator; const_iterator for const a	Returns an iterator pointing to an element with key equivalent to k, or a.end() if no such element exists.	Average case O(1); worst case O(a.size())
a.count(k)	size_type	Returns the number of elements with key equivalent to k.	Average case O(1); worst case O(a.size())
a.equal_range(k)	std::pair<iterator, iterator>; std::pair<const_iterator, const_iterator> for const a	Returns a range containing all elements with keys equivalent to k. Returns std::make_pair(a.end(), a.end()) if no such elements exist.	Average case O(a.count(k)); worst case O(a.size())
a.bucket_count()	size_type	Returns the number of buckets that a contains.	Constant
a.max_bucket_count()	size_type	Returns an upper bound on the number of buckets that a might ever contain.	Constant
a.bucket(k)	size_type	Returns the index of the bucket in which elements with keys equivalent to k would be found, if any such elements exist. Post: the return value is in the range [0, a.bucket_count()).	Constant
a.bucket_size(n)	size_type	Pre: n is in the range [0, a.bucket_count()). Returns the number of elements in the n^th bucket.	O(a.bucket_size(n))
a.begin(n)	local_iterator; const_local_iterator for const a	Pre: n is in the range [0, a.bucket_count()). Note: [a.begin(n), a.end(n)) is a valid range containing all of the elements in the n^th bucket.	Constant
a.end(n)	local_iterator; const_local_iterator for const a	Pre: n is in the range [0, a.bucket_count()).	Constant
a.load_factor()	double	Returns the average number of elements per bucket.	Constant
a.max_load_factor()	double	Returns a number that the container attempts to keep the load factor less than or equal to. The container automatically increases the number of buckets as necessary to keep the load factor below this number. Post: return value is positive.	Constant
a.set_max_load_factor(z)	void	Pre: z is positive. Changes the container's maximum load load factor. Post: a.max_load_factor() == z	Constant
a.rehash(n)	void	Pre: n > a.size() / a.max_load_factor(). Changes the number of buckets so that it is at least n.	O(a.size()).