Channels ▼

Andrew Koenig

Dr. Dobb's Bloggers

If C++ Objects Are Unrelated, Are They Equal?

January 24, 2013

Last week, I said that sorting in C++ depended on an appropriately defined comparison operation, and ended by explaining some properties that any such comparison operation must have. If for convenience we use < to refer to such an operation, then

  • < must be irreflexive, antisymmetric, and transitive.
  • If a<b and b<a are both false, we say that a and b are unordered.
  • The unordered relation must be reflexive, symmetric, and transitive.

Readers experienced in mathematical jargon will recognize the phrase reflexive, symmetric, and transitive. An operation with those properties is called an equivalence relation. Every equivalence relation over a set of objects partitions those objects. In other words, the equivalence relation divides the objects into subsets, called partitions, with the property that every object is a member of exactly one partition. As far as the particular equivalence relation is concerned, the relation holds between any two objects in the same partition, and does not hold between objects in different partitions.

To make this explanation complete, let's take a look at the comparison operation I suggested as an exercise last week:

bool compare(int m, int n)
{
     return abs(m) < abs(n);
}
 

I asked whether this operation meets the requirements for a comparison operation; the answer is yes in theory, but no in practice. The reason is that C++ implementations almost always represent integers as 2's-complement binary values, and if an integer has n bits (including the sign), it can potentially take on 2n possible values. One of those values is zero, so by implication the number of distinct negative integers must differ from the number of distinct positive integers. In other words, there must be at least one value k such that abs(k) cannot be represented as an integer. This fact has a surprising implication:

It is not possible to write a portable program that applies abs to an integer unless that integer is constrained in a machine-dependent way.

Therefore, to make this function work correctly in all cases, we have to rewrite it without using abs. Moreover, we cannot apply unary — to its arguments either, for the same reason. What we can do, however, is to break the problem down into cases:

  • If both arguments are positive or zero, we can just compare them.
  • If both arguments are negative, we can also just compare them, so long as we invert the result of the comparison.
  • Otherwise, their signs differ. In this case, we can compare their magnitudes by adding them and looking at the sign of the result. Because the signs differ, we are guaranteed that the addition will not overflow. In this case, saying that abs(m) < abs(n) is equivalent to saying that m+n is nonzero and has the same sign as n.

These observations allow us to rewrite our comparison function to work correctly in all cases:

bool compare(int m, int n)
{
     if (m >= 0 && n >= 0)
           return m < n;
     if (m < 0 && n < 0)
           return n < m;
     int sum = m + n;      
     return sum != 0 && ((sum < 0) == (n < 0));
}
 

Having corrected our comparison function, we can now think about its implications. This function treats m as "less than" n if the magnitude of m is less than that of n. In this case, it should be easy to see that m and n will be unordered if they have the same magnitude; that is, if they are the same except possibly for their sign.

The equivalence relation "equal except for sign" partitions the integers into sets of integers with the same magnitude. One of these sets has only a single element, namely 0. The others have two elements each: {-1, 1}, {-2, 2}, and so on. It should be easy to see that if you pick m from one of these sets and n from a different set, then (exactly) one of compare(m, n) and compare(n, m) will be true. Moreover, if you pick m and n from the same set, then both compare(m, n) and compare(n, m) will be false.

When you think about designing a C++ comparison function, it is a useful cross-check to verify that "unordered" is an equivalence relation and see if you can explain what two unordered elements have in common. In this example, integers are considered unordered if they have the same magnitude, regardless of sign. Other such situations may be harder to understand.

Here's another exercise:

           struct Thing { int a, b; };
 
     bool compare(const Thing& t1, const Thing& t2)
     {
           return t1.a < t2.a && t1.b < t2.b;
     }

Is this compare function a valid C++ comparison function? If not, how would you go about correcting it?

Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Comments:

ubm_techweb_disqus_sso_-ff6fe7ac6b446db67dd665e3c8e52316
2013-01-30T14:54:04

Andrew, you are great, but just plain evil! And I mean that in a good way! You tend to "stir the pot" with developers, and that is the good thing. Anyway, excellent analysis and exposition of the issues related to this oft ignored issue. Thanks. It gives me some grist for my mental mill.


Permalink
ubm_techweb_disqus_sso_-16d7c50d6c078abc2acbddb01ccf46d4
2013-01-29T22:23:55

i++ is guaranteed to fail for integers when i equals INT_MAX, so does that mean I cannot use ++ for even built-in types?


Permalink
ubm_techweb_disqus_sso_-9194e6632fb2064475cc727130aa021e
2013-01-26T17:06:09

you are right right about intransitiveness. However, your comparison leaves some Thing's unordered, when they can be ordered. Also numerical overflow is a problem. To avoid those problems,
t1.a < t2.a || ( t1.a==t2.a && t1.b < t2.b )


Permalink
Zenju
2013-01-26T11:37:06

Your comparison function creates needlessly large equivalence sets (the circles around (0, 0)), which is not very useful as a predicate for std::set or std::map.

The simplest solution is to "do like the ints", which assumes that "=" is the equality induced by "<". Also this style composes nicely with larger structs:

bool less(const Thing& lhs, const Thing& rhs)
{
if (lhs.a != rhs.a)
return lhs.a < rhs.a;
return lhs.b < rhs.b;
}


Permalink
ubm_techweb_disqus_sso_-f4701f1f1b238a738686a9cb27de840f
2013-01-26T06:28:38

This function is not a valid comparison function.

For objects the are unrelated, they do not meet the transitive requirement.

For example, with three objects, t1(a1, b1), t2(a2, b2), and t3(a3,b3).
a1 < a2, b1 > b2; a1 < a3, b1 > b3; a2 < a3, b2 < b3
Then, t1 and t2 are unordered, t1 and t3 are unordered, but t2 < t3.

To make it work, we can use the following function,

bool compare(const Thing& t1, const Thing& t2)
{
return (t1.a * t1.a + t1.b * t1.b) < (t2.a * t2.a + t2.b * t2.b);
}


Permalink


Video