Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

C/C++

Tables Within Tables: C++ and Brent's Technique


Hash Tables

Hash tables are a method of implementing the abstract data type Dictionary, which supports the following operations:

  • CREATE - create an empty dictionary
  • INSERT - insert an object into the dictionary
  • LOOKUP - find an object in the dictionary or return an error
  • REMOVE - remove an object from the dictionary.

The simplest implementation of a dictionary is a linear list. To insert an object, stick it in the first empty slot in the list. To find an object, look at each object in the list, succeed if you find it, and fail otherwise. To delete an object, mark its slot as empty. This is fine for small numbers of objects, but the time taken by the LOOKUP operation is proportional to the number of objects in the list.

Hashing is one of the methods used to improve the time complexity of LOOKUP. Define a function that maps the data contained in an object into a table address. Instead of looking at every object in the table to see if an object is present, compute the address from the object you're seeking and look at that element of the table. Either it's there or it isn't. The time complexity of a hash LOOKUP is determined by the time it takes to compute the hash function, which is independent of the table size.

The catch is that if you are using a finite table (which is generally the case!), more than one key can map to the same address. This duplicate mapping is called a collision. You can minimize collisions by increasing the size of the table; if the table is most empty, new objects are unlikely to collide with existing ones. But there will always be a practical limit to how much memory you can afford to waste. Therefore, the time complexity of a hashing algorithm is determined primarily by how efficiently it can resolve collisions.

Use a second function to handle collisions. Add the secondary hash value to the primary one to get a new address and, while a collision still exists, add the secondary hash value to the current address. Eventually, you will arrive at an empty slot in which to put your data.

The best hash function is the simplest one that sufficiently minimizes collisions. Knuth discusses several hash functions, but the best is based on modulo arithmetic and careful selection of table size. If the table size is prime, you will get an even distribution of objects in the table. So your primary hash function becomes:

H1(data) = data MOD tablesize

or, more properly:

H1(data) = map(data) MOD tablesize

where map is some function that derives an integer from the data in the key.

The secondary function must never collide with the primary function. To prevent such collisions, select a number relatively prime to the table size. You can start at any address in the table and visit every other address by repeatedly adding in the secondary hash function.

The best behavior of all would be to use twin primes -- two prime numbers that have the property p1 - p2 = 2. (Why this is the case isn't within the scope of this article; take it on faith or read Knuth.) LOOKUP closely follows the logic of INSERT. The sequence of address produced by adding the secondary hash value is called the collision list for a particular object. To find an object, look down its collision list until you find either the object or an empty slot.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.