Channels ▼

Bil Lewis

Dr. Dobb's Bloggers

The Canonical Object Hashtable

November 28, 2008



Let us now turn to the other primary use of association--when we want
to associate an object with a canonical counter-part. The whole idea
here is that we are going to have a single object that represents
whatever it is we want (say a person, Tiia). We are going to get
indirect references to this person in different fashions. Perhaps
someone will type her name into our program. Perhaps we'll get a web
request with her id in it. Perhaps we'll issue a database request and
get back a Person object.

No matter which technique we use to get a reference to Tiia, we want
to be certain that there's only one actual Person object for
Tiia. This means that we have to maintain some sort of mapping from
whatever unique identifiers we have to the one Person. We may well
write code that looks a bit like this:

Person findByName(String name) {
  Person p = nameToPersonMap.get(name);
  if (p != null) return p;
  p = (Person) database.lookup("FROM Person p WHERE p.name ='?'", name);
  if (p == null) return null;
  nameToPersonMap.put(name, p);
  return p;
}

The three most common implementations for nameToPersonMap are: a
hashtable (or HashMap), a list of name/Person pairs, and a binary
tree. 

A list is fastest for very small numbers of objects. How many objects
depends on the speed of the hashCode() method. For objects that
precompute the hash code and store it in an instance variable (e.g.,
String objects), the number is 1 or 2. If the hash code computation is
dynamic (very rarely useful), then the cutoff point is up in the air.

I ran a few numbers for computing string hash dynamically. For strings
of 10 characters, the list was fastest for upwards of 10 objects.

The main take-away here is that you should use hashtables all the
time, unless you sit down and calculate the cutoff point very
carefully.

Binary trees have similar behaviors as lists. The cutoffs are a bit
higher, but the same take-away is valid. Unless you do the
calculations, use a hashtable. Binary trees have other attributes that
make them attractive for somewhat different purposes. In particular,
they are sorted and finding a range of objects (e.g., everyone with
age between 19 and 21) in a binary tree is super-fast.

But that's not what we're dealing with here.

For us, hashtables are the way to go.

====

The JVM makes use of canonical objects in one instance. Any literal
string in your program code is guaranteed to be "interned". So if the
same literal string shows up in other code, you can rest assured that
the two references with be to the same object. This will print out
"The same object!" and "The same characters!".

String s0 = "XYZZY";
String s1 = "XYZZY";

if (s0 == s1) System.out.println("The same object!");
if (s0.equals(s1)) System.out.println("The same characters!");


whereas this will just print the second message:

String s0 = "XYZZY"+i;
String s1 = "XYZZY"+i;

if (s0 == s1) System.out.println("The same object!");
if (s0.equals(s1)) System.out.println("The same characters!");


because the compiler doesn't know what the value of i is going to be,
so it won't intern the new strings. It will just print the second
message. (If Java wanted to, it could intern all strings all the time,
but the cost of doing so wouldn't be worth the effort for most
programs.)

====

The big difference between the two situations is that for ownership,
we always want to find the entry for exactly the object we have. In
the second case we're looking for objects that are similar to the
object we have (e.g., strings that have the same characters in
them). Thus an equals() method for strings that compares characters is
appropriate for canonical lookups, while any equals() for Person other
than just == is inappropriate. This is a very common mistake for
programmers to make.

It's inappropriate because we have just promised ourselves that we
will never have two Person objects that represent the same person

If your code doesn't follow this pattern, you probably want to rewrite
your code(!)

For the ownership case, it makes no difference how the hash code is
computed. As long as there's no pattern to the hash codes that puts
too many of them in the same hash bucket, all is well. Most (all?)
JVMs use some of the bits (20 for Sun's JVM) in one of the header
words to store an identity hash code. This number is written upon
object creation and should be completely random. The only thing we use
it for is to find the starting bucket in the hashtable.

For the canonical lookup case, it's different. Here we require that
different objects that are "equals" have the same hash code. So two
strings with the letters "ABC" must produce the same hash code. Some
clever folks have spent a lot of time coming up with the best
algorithm for creating hash codes for strings.

For our own objects, a similar approach would be useful, although
close is usually just fine. It is very rare that we ever want declare
our own equals() method for an object. Almost every instance of
equals() that I've seen is wrong. Wrong in the sense that it doesn't
accomplish what the programmer intended.

We'll leave the evils of equals() for another time.

Chao!

-Bil

Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 


Video