# If C++ Objects Are Unrelated, Are They Equal?

Last week, I said that sorting in C++ depended on an appropriately defined comparison operation, and ended by explaining some properties that any such comparison operation must have. If for convenience we use < to refer to such an operation, then

- < must be irreflexive, antisymmetric, and transitive.
- If
`a<b`

and`b<a`

are both false, we say that`a`

and`b`

are unordered. - The unordered relation must be reflexive, symmetric, and transitive.

Readers experienced in mathematical jargon will recognize the phrase *reflexive*, *symmetric*, and *transitive*. An operation with those properties is called an *equivalence relation*. Every equivalence relation over a set of objects *partitions* those objects. In other words, the equivalence relation divides the objects into subsets, called partitions, with the property that every object is a member of exactly one partition. As far as the particular equivalence relation is concerned, the relation holds between any two objects in the same partition, and does not hold between objects in different partitions.

To make this explanation complete, let's take a look at the comparison operation I suggested as an exercise last week:

bool compare(int m, int n) { return abs(m) < abs(n); }

I asked whether this operation meets the requirements for a comparison operation; the answer is yes in theory, but no in practice. The reason is that C++ implementations almost always represent integers as 2's-complement binary values, and if an integer has `n`

bits (including the sign), it can potentially take on 2`n`

possible values. One of those values is zero, so by implication the number of distinct negative integers must differ from the number of distinct positive integers. In other words, there must be at least one value `k`

such that `abs(k)`

cannot be represented as an integer. This fact has a surprising implication:

*It is not possible to write a portable program that applies abs to an integer unless that integer is constrained in a machine-dependent way.*

Therefore, to make this function work correctly in all cases, we have to rewrite it without using `abs`

. Moreover, we cannot apply unary — to its arguments either, for the same reason. What we can do, however, is to break the problem down into cases:

- If both arguments are positive or zero, we can just compare them.
- If both arguments are negative, we can also just compare them, so long as we invert the result of the comparison.
- Otherwise, their signs differ. In this case, we can compare their magnitudes by adding them and looking at the sign of the result. Because the signs differ, we are guaranteed that the addition will not overflow. In this case, saying that
`abs(m) < abs(n)`

is equivalent to saying that`m+n`

is nonzero and has the same sign as`n`

.

These observations allow us to rewrite our comparison function to work correctly in all cases:

bool compare(int m, int n) { if (m >= 0 && n >= 0) return m < n; if (m < 0 && n < 0) return n < m; int sum = m + n; return sum != 0 && ((sum < 0) == (n < 0)); }

Having corrected our comparison function, we can now think about its implications. This function treats `m`

as "less than" `n`

if the magnitude of `m`

is less than that of `n`

. In this case, it should be easy to see that `m`

and `n`

will be unordered if they have the same magnitude; that is, if they are the same except possibly for their sign.

The equivalence relation "equal except for sign" partitions the integers into sets of integers with the same magnitude. One of these sets has only a single element, namely 0. The others have two elements each: {-1, 1}, {-2, 2}, and so on. It should be easy to see that if you pick `m`

from one of these sets and `n`

from a different set, then (exactly) one of `compare(m, n)`

and `compare(n, m)`

will be true. Moreover, if you pick `m`

and `n`

from the same set, then both `compare(m, n)`

and `compare(n, m)`

will be false.

When you think about designing a C++ comparison function, it is a useful cross-check to verify that "unordered" is an equivalence relation and see if you can explain what two unordered elements have in common. In this example, integers are considered unordered if they have the same magnitude, regardless of sign. Other such situations may be harder to understand.

Here's another exercise:

struct Thing { int a, b; }; bool compare(const Thing& t1, const Thing& t2) { return t1.a < t2.a && t1.b < t2.b; }

Is this `compare`

function a valid C++ comparison function? If not, how would you go about correcting it?

## Comments:

2013-01-30T14:54:04

Andrew, you are great, but just plain evil! And I mean that in a good way! You tend to "stir the pot" with developers, and that is the good thing. Anyway, excellent analysis and exposition of the issues related to this oft ignored issue. Thanks. It gives me some grist for my mental mill.

Permalink

2013-01-29T22:23:55

i++ is guaranteed to fail for integers when i equals INT_MAX, so does that mean I cannot use ++ for even built-in types?

Permalink

2013-01-26T17:06:09

you are right right about intransitiveness. However, your comparison leaves some Thing's unordered, when they can be ordered. Also numerical overflow is a problem. To avoid those problems,

t1.a < t2.a || ( t1.a==t2.a && t1.b < t2.b )

Permalink

2013-01-26T11:37:06

Your comparison function creates needlessly large equivalence sets (the circles around (0, 0)), which is not very useful as a predicate for std::set or std::map.

The simplest solution is to "do like the ints", which assumes that "=" is the equality induced by "<". Also this style composes nicely with larger structs:

bool less(const Thing& lhs, const Thing& rhs)

{

if (lhs.a != rhs.a)

return lhs.a < rhs.a;

return lhs.b < rhs.b;

}

Permalink

2013-01-26T06:28:38

This function is not a valid comparison function.

For objects the are unrelated, they do not meet the transitive requirement.

For example, with three objects, t1(a1, b1), t2(a2, b2), and t3(a3,b3).

a1 < a2, b1 > b2; a1 < a3, b1 > b3; a2 < a3, b2 < b3

Then, t1 and t2 are unordered, t1 and t3 are unordered, but t2 < t3.

To make it work, we can use the following function,

bool compare(const Thing& t1, const Thing& t2)

{

return (t1.a * t1.a + t1.b * t1.b) < (t2.a * t2.a + t2.b * t2.b);

}

Permalink