How Dictionaries Work

February 06, 2013

Last week, I described a strategy for comparing complicated objects: Reduce each object to a single value that can then be compared. I'd like to continue the discussion with another strategy that is often useful: Treat each object as a sequence of symbols and impose a dictionary order on the symbols.

I hinted at this approach when I asked how one might compare two `Thing`s defined as

```struct Thing { int a, b; };
```

Many programmers have tripped up by defining a comparison function similar to

```               bool compare(const Thing& t1, const Thing& t2)
{
return t1.a < t2.a && t1.b < t2.b;
}
```

This function does not meet the requirements of a C++ order relation because it implies an "unordered" relation that is not transitive. However, this function is based on a reasonable idea: Compare respective elements of the two objects and combine those comparisons into a single result.

This strategy is complementary to the one I described last week. That strategy was to reduce each object to a single value and compare the values associated with the two objects. This one is to do the comparisons first, then combine the results of the comparisons into a single final result.

The general idea is to pick an order in which to compare elements. For example, we might decide to compare `t1.a` with `t2.a`. If one of these two values is "less than" the other, then that comparison will be the final result. Otherwise, `t1.a` and `t2.a` are unordered, so our result will be the result of comparing `t1.b` with `t2.b`. We can implement this strategy as follows:

```               bool compare(const Thing& t1, const Thing& t2)
{
if (t1.a < t2.a)
return true;
if (t2.a < t1.a)
return false;
return t1.b < t2.b;
}
```

Notice that I have carefully written this function to as to use only < for comparisons. We can rewrite it more succinctly, although not (in my opinion) more clearly, as

```               bool compare(const Thing& t1, const Thing& t2)
{
return t1.a < t2.a || !(t2.a < t1.a) && t1.b < t2.b;
}
```

This strategy is analogous to how a dictionary works, so it is often called "dictionary order." All the words that begin with the same letter are together. Within the group of words beginning with a particular letter, all the ones that have the same second letter are together, and so on. Note that we have said nothing about how individual letters are compared. Instead, our strategy allows us to generalize a comparison function between individual components into a comparison function over collections of components.

We can generalize this notion still further to account for the possibility that the two sequences might not have the same number of elements. To compare two sequences:

• If the sequences are the same length, and the corresponding elements of the sequences are unrelated, then the sequences themselves are unrelated. Remember that unrelated elements belong to the same equivalence class, so it means that each element of each sequence is in the same equivalence class as the corresponding element of the other sequence.
• If one sequence is a prefix of the other (i.e., the sequences have different lengths, and each element of the shorter sequence is unrelated to the corresponding sequence of the longer sequence), then the shorter sequence is "smaller" than the longer one.
• Otherwise, at least one element of one sequence is "less than" the corresponding element of the other sequence. The first such pair of elements (i.e., the ones closest to the beginnings of their respective sequences) determines the result of the overall comparison.

For example, if we're thinking about strings of characters, then `abc` and `abc` are unrelated because they have the same length and the corresponding elements are unrelated. Moreover, `abc` is "less than" `abcd`, because `abc` is a prefix of `abcd`. Finally, if we compare `abc` and `abacus`, we find that the first pair of corresponding elements in which one is "less than" the other is the `c` of `abc` and the `a` of `abacus`, so the result of comparing `c` with `a` determines the result of comparing `abc` with `abacus`.

Next week, we'll start talking about more general examples of comparison functions over sequences.

More Insights

 To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.