 Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726. # It's Hard To Compare Floating-Point Numbers

March 01, 2013

Last week I posed a problem: Suppose you have an inheritance hierarchy that lets you represent integers or floating-point numbers. How would you define comparison within your hierarchy? We can restate this problem in a language-independent way: How can we compare two numbers, either of which might be integer or floating-point?

## More Insights

More >>

More >>

### Webcasts

More >>

There are three cases: (1) Both numbers are integers; (2) Both numbers are floating-point; (3) One is an integer and the other is floating-point. We shall treat each case separately.

The first case, comparing two integers, is easy: Integer comparison is well defined and works on any integer value. However, even an operation seemingly as simple as comparing two floating-point numbers can cause trouble. The source of this trouble is a special value called "not a number," usually abbreviated NaN, that is implemented in most floating-point hardware as an extension to what C or C++ requires from a machine's floating-point arithmetic.

The purpose of NaN is to provide a value — or, more accurately, a family of values — that can be used to indicate that something went wrong during the process of computing that value. NaN values have the curious property that they compare as "unordered" with all values, even with themselves. In other words, if `x` is a NaN, and `y` is any floating-point value, NaN or not, then `x<y`, `x>y`, `x<=y`, `x>=y`, and `x==y` are all false, and `x!=y` is true.

This definition means that < on floating-point values fails to meet the requirements for C++ order relations! The reason for this failure is that if `x` and `y` are unequal ordinary values, and `z` is NaN, then `x` is unrelated to `z`, `z` is unrelated to `y`, but `x` and `y` are not unrelated to each other. Therefore, "unrelated" is not transitive, as C++ requires.

Remember that the requirements on "unrelated" cause it to partition the set of all values into equivalence classes. Because all NaN values are mutually unrelated, it makes sense to include them in a single equivalence class. As soon as we do so, we are saying that all NaN values are indistinguishable from each other for comparison purposes. In order for comparison to make sense, then, we must define it so that NaN is distinguishable from other values, so that non-NaN values can consistently be distinguished from each other.

The most straightforward strategy for defining comparison on floating-point numbers that might include NaN is to decide either that every NaN is "less than" every other number or "greater than" every other number. For the sake of the discussion, we'll define NaN as "greater than" every other number. Then we can define comparison between floating-point numbers as follows:

```
bool compare(double x, double y)
{
if (isnan(y))
return !isnan(x);
return x < y;
}
```

We use a function named `isnan`, which tests whether its argument is a NaN. This function is part of C++11, but might not be available in earlier versions of C++. Fortunately, it is easy to implement:

```
bool isnan(double x)
{
return x != x;
}
```

Our definition of compare collapses the case analysis. If `y` is NaN, then either `x` is also NaN, in which case we must return false (because all NaN values are unrelated to each other), or `x` is an ordinary value, in which case we must return true (because we decided earlier that NaN is "greater than" any ordinary value).

We are left with the case in which `y` is an ordinary number. If `x` is NaN, we've seen that < will return false — which is what we want because `x` should be "greater than" `y`. If `x` is not NaN, then < does comparison on ordinary numbers, which is again what we want.

In short, comparing two floating-point numbers is harder than it looks if your goal is to meet the C++ order-relation requirements. What about comparing a floating-point number with an integer? C++ defines such comparisons by converting the integer to floating-point. Such conversions are accurate as long as the floating-point number has at least as many bits in its fraction as the integer.

Unfortunately, integers sometimes have more bits than do the fractions of floating-point numbers. In particular, this sad state of affairs will occur on machines in which integers and floating-point numbers have the same total number of bits, because floating-point numbers devote some of those bits to the exponent. So, for example, most implementations devote only 24 bits of a 32-bit floating-point number to the fraction, and only 53 bits of a 64-bit floating-point number.

For many years, it was reasonable to assume that the double type had at least 64 bits, and the `int` and `long` types had at most 32 bits. Therefore, one could be pragmatically confident in the result of comparing integer and double values, though not necessarily float values. This assumption no longer applies: Now that both disk and memory addresses routinely exceed 232, it is not uncommon to find implementations with 64-bit integers and 64-bit floating-point values. Because 64-bit floating-point values have fewer than 64 bits in their floating-point fractions, comparisons between 64-bit integers and 64-bit floating-point numbers will not meet the C++ ordering requirements on such implementations.

How can we do such comparisons correctly? I'll show you one way of doing so next week. To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.

## Featured Reports ## Featured Whitepapers 