Channels ▼

Andrew Koenig

Dr. Dobb's Bloggers

Putting Things In Order

June 15, 2011

Last week, I gave an example of a function that should never be passed as an argument to sort:

struct Point { int x, y; };

bool compare(const Point& p, const Point& q)
{
   return p.x < q.x && p.y < q.y;
}

One reader took me to task for naming this function compare, and another commented more generally about how to teach sorting, merging, and recursion. What surprised me is that no one asked what was wrong with this function. I can think of two possible reasons for this phenomenon:

  1. What's wrong with this function is so obvious that no one thought it worth mentioning.
  2. I didn't make it clear enough that this code was broken.

To help me decide which of these cases applies (or, for that matter, whether there's a third possibility that I've missed), I want to talk about what you have to do if you want to write a comparison function to pass to sort.

Obviously, such a function should make it possible for sort to sort its input. It doesn't actually have to be a less-than comparison; in fact, one common way to use sort is to pass it a greater-than comparison, which causes sort to sort its input into decreasing order. However, the function does have to be able to "compare" elements in a way that lets sort put them in order.

The most fundamental requirement of such a function is consistency. In particular, sort should never be able to determine that a must come before b in the output or that a must come after b. We can express this requirement by saying that the comparison function must be:

  • Antisymmetric: If compare(a, b) is true, compare(b, a) must be false. By implication, compare(a, a) must always be false for any value of a.
  • Transitive: If compare(a, b) and compare(b, c) are both true, compare(a, c) must also be true.

These two requirements are enough to ensure that comparison is internally consistent. However, by themselves, they are not enough.

To see why, consider what might happen when we compare two values a and b. There are three possibilities:

  • compare(a, b) is true, in which case compare(b, a) must be false.
  • compare(b, a) is true, in which case compare(a, b) must be false.
  • Both compare(a, b) and compare(b, a) are false.

Less formally, the three cases are:

  • a must come before b in the output.
  • b must come before a in the output.
  • a and b can appear in either order.

For consistency, then, we have an additional requirement, which we might describe as follows:

  • If both compare(a, b) and compare(b, a) are false, we will say that a and b are unordered.
  • If a and b are unordered, and b and c are also unordered, then a and c are unordered. In other words, the notion of "unordered" is transitive.

If you're mathematically inclined (as I imagine that most readers are who have gotten this far), you will see that "unordered" is reflexive (a and a are always unordered because compare(a, a) is always false) and symmetric (because "and" is commutative), and therefore that this requirement makes "unordered" an equivalence relation.

If your eyes have glazed over, you might start reading again here, because I have an interesting example for you. I used to program in APL, which, among other interesting features, tried to make floating-point computation a little more user-friendly by treating numbers as equal if they were close enough together. In APL, you could divide 1 by 10 and be confident that the result would be equal to 0.1.

As a result of this attempted kindness, it was possible to create three numbers, which I'll call A, B, and C, with the following characteristics:

  • A is equal to B
  • B is equal to C
  • A is less than C

It is precisely this kind of inconsistency that the C++ requirements on comparison functions is intended to prevent.

If you really want to understand the rules for comparison functions, here are two exercises:

  1. Find numbers A, B, and C with the characteristics I mentioned.
  2. Explain why the compare function I showed at the beginning of this article is wrong.

I'll give the solutions to these exercises next week.

Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 


Video