Channels ▼

Andrew Koenig

Dr. Dobb's Bloggers

How Dictionaries Work

February 06, 2013

Last week, I described a strategy for comparing complicated objects: Reduce each object to a single value that can then be compared. I'd like to continue the discussion with another strategy that is often useful: Treat each object as a sequence of symbols and impose a dictionary order on the symbols.

I hinted at this approach when I asked how one might compare two Things defined as

struct Thing { int a, b; };

Many programmers have tripped up by defining a comparison function similar to

               bool compare(const Thing& t1, const Thing& t2)
{
return t1.a < t2.a && t1.b < t2.b;
}

This function does not meet the requirements of a C++ order relation because it implies an "unordered" relation that is not transitive. However, this function is based on a reasonable idea: Compare respective elements of the two objects and combine those comparisons into a single result.

This strategy is complementary to the one I described last week. That strategy was to reduce each object to a single value and compare the values associated with the two objects. This one is to do the comparisons first, then combine the results of the comparisons into a single final result.

The general idea is to pick an order in which to compare elements. For example, we might decide to compare t1.a with t2.a. If one of these two values is "less than" the other, then that comparison will be the final result. Otherwise, t1.a and t2.a are unordered, so our result will be the result of comparing t1.b with t2.b. We can implement this strategy as follows:

               bool compare(const Thing& t1, const Thing& t2)
{
if (t1.a < t2.a)
          return true;
if (t2.a < t1.a)
           return false;
     return t1.b < t2.b;
}

Notice that I have carefully written this function to as to use only < for comparisons. We can rewrite it more succinctly, although not (in my opinion) more clearly, as

               bool compare(const Thing& t1, const Thing& t2)
{
return t1.a < t2.a || !(t2.a < t1.a) && t1.b < t2.b;
}

This strategy is analogous to how a dictionary works, so it is often called "dictionary order." All the words that begin with the same letter are together. Within the group of words beginning with a particular letter, all the ones that have the same second letter are together, and so on. Note that we have said nothing about how individual letters are compared. Instead, our strategy allows us to generalize a comparison function between individual components into a comparison function over collections of components.

We can generalize this notion still further to account for the possibility that the two sequences might not have the same number of elements. To compare two sequences:

  • If the sequences are the same length, and the corresponding elements of the sequences are unrelated, then the sequences themselves are unrelated. Remember that unrelated elements belong to the same equivalence class, so it means that each element of each sequence is in the same equivalence class as the corresponding element of the other sequence.
  • If one sequence is a prefix of the other (i.e., the sequences have different lengths, and each element of the shorter sequence is unrelated to the corresponding sequence of the longer sequence), then the shorter sequence is "smaller" than the longer one.
  • Otherwise, at least one element of one sequence is "less than" the corresponding element of the other sequence. The first such pair of elements (i.e., the ones closest to the beginnings of their respective sequences) determines the result of the overall comparison.

For example, if we're thinking about strings of characters, then abc and abc are unrelated because they have the same length and the corresponding elements are unrelated. Moreover, abc is "less than" abcd, because abc is a prefix of abcd. Finally, if we compare abc and abacus, we find that the first pair of corresponding elements in which one is "less than" the other is the c of abc and the a of abacus, so the result of comparing c with a determines the result of comparing abc with abacus.

Next week, we'll start talking about more general examples of comparison functions over sequences.

Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Comments:

ubm_techweb_disqus_sso_-826ad3d60cb32366c17900d34c93db85
2013-02-13T15:57:22

Clever! I'm not sure whether that's an argument for the technique or against it.


Permalink
ubm_techweb_disqus_sso_-826ad3d60cb32366c17900d34c93db85
2013-02-13T15:54:52

I think the choice of term depends mostly on which community happens to be nearby.


Permalink
ubm_techweb_disqus_sso_-5769185644cc5a3d9f8fb9eaf2098132
2013-02-13T08:19:21

I have heard "lexicographic(al) order" more often than "dictionary order".


Permalink
ubm_techweb_disqus_sso_-299869749595b6420305549d9e827720
2013-02-07T23:50:14

It is also possible to implement dictionary order with some help from std::tuple like this: std::forward_as_tuple( t1.a, t1.b ) < std::forward_as_tuple( t2.a, t2.b )


Permalink


Video