When the Simplest Case Is One of the Hardest to Get Right
What have we learned so far about floating-point input and output?
Floating-point input involves a conversion with a known target precision. For example, when we write

double x; std::cin >> x;

we know the precision of x. Therefore, we can reasonably expect that an implementation will cause x to be the best possible approximation to whatever value we read, where "best possible" means "the result of rounding the infinite-precision input value according to the rounding rules currently in effect."
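To see what that expectation means in practice, here is a small sketch, assuming an IEEE 754 implementation whose input conversion rounds correctly: reading the text 0.1 from a stream should store exactly the same double that the compiler produces for the literal 0.1, namely the representable value closest to one tenth.

#include <iostream>
#include <iomanip>
#include <sstream>

int main()
{
    std::istringstream in("0.1");   // stands in for std::cin
    double x;
    in >> x;

    // On an implementation that rounds input correctly, x is the double
    // closest to 1/10, the same value the compiler chose for the literal.
    std::cout << std::boolalpha << (x == 0.1) << '\n';   // typically: true
    std::cout << std::setprecision(17) << x << '\n';     // typically: 0.10000000000000001
}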
In contrast, floating-point output involves a conversion that might not have a known target precision. In particular, when we write

std::cout << 0.1 << std::endl;

there is no obvious choice about the number of significant digits that should appear in the decimal conversion of the floating-point number that was converted from 0.1. To be sure, it is reasonable to expect that this statement will print 0.1; but if we change it slightly:
std::cout << 1.0 / 3.0 << std::endl;
it is far from clear how many 3s should appear in 0.33333….
In other words, despite appearances, input and output are far from symmetric.
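In practice, the standard library settles the question by fiat: a stream starts out with a precision of 6 significant digits, and the program can ask for more. Here is a small sketch of what one typically sees on an IEEE 754 implementation; the exact digits in the second line may vary with the quality of the library's conversion:

#include <iostream>
#include <iomanip>

int main()
{
    double third = 1.0 / 3.0;

    std::cout << third << '\n';                           // 0.333333 (default precision is 6)
    std::cout << std::setprecision(17) << third << '\n';  // typically: 0.33333333333333331
}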
Whenever we do floating-point input, we know exactly what the input is, and we also know exactly what precision the result should have. Accordingly, it seems to be possible to specify how an ideal implementation should handle input: Store the best possible approximation to the given input in the required precision. The hard problem, then, is how to specify the behavior of floating-point output.
One reason that this problem is hard is that it is trying to meet conflicting goals. For example, on a machine with 53-bit double-precision floating-point fractions (i.e., most of the computers in use today), the closest double-precision value to 0.1, converted back to decimal, is (exactly) 0.1000000000000000055511151231257827021181583404541015625. The first significant digit of this number is 1, the next sixteen are all 0, and the 18th significant digit is 5 (with nonzero digits after it). As a result, if we convert this value to decimal with 16 significant digits, we will get 0.1, but if we use 17 significant digits, we will get 0.10000000000000001.
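A short sketch shows this behavior on an implementation whose output conversion is correctly rounded; glibc, for example, will even produce the full exact expansion if asked, although not every library is required to produce exact digits that far out:

#include <iostream>
#include <iomanip>

int main()
{
    double x = 0.1;   // really the closest double to 0.1

    // The exact value: the closest double to 0.1 has exactly 55 digits
    // after the decimal point.
    std::cout << std::fixed << std::setprecision(55) << x << '\n';
    // typically: 0.1000000000000000055511151231257827021181583404541015625

    // Back to the default (general) format, with 16 and 17 significant digits:
    std::cout << std::defaultfloat;
    std::cout << std::setprecision(16) << x << '\n';   // typically: 0.1
    std::cout << std::setprecision(17) << x << '\n';   // typically: 0.10000000000000001
}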
This is a nasty state of affairs, because it implies that if we want (the closest floating-point value to) 0.1 to print as 0.1, we must limit our output to 16 significant digits. Unfortunately, 2^53 is 9007199254740992, which already has 16 digits, and in the worst case it takes 17 significant digits before two distinct floating-point numbers with 53-bit fractions are guaranteed to convert to distinct decimals. In other words, 16 significant digits are not always enough to represent accurately the value of a floating-point number with a 53-bit fraction.
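The standard library records both sides of this tension: std::numeric_limits<double>::digits10 is the number of decimal digits that always survive a trip through a double (15 for IEEE double), and max_digits10, added in C++11, is the number of digits needed so that every double survives a trip through decimal (17 for IEEE double). On an IEEE 754 implementation one typically sees:

#include <iostream>
#include <limits>

int main()
{
    std::cout << std::numeric_limits<double>::digits10 << '\n';      // typically: 15
    std::cout << std::numeric_limits<double>::max_digits10 << '\n';  // typically: 17
}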
In other words, we have two laudable goals that seem impossible to meet at once:
- Floating-point output should not automatically lose information; that is, when we convert two distinct floating-point numbers to decimal, the conversion should yield two distinct results.
- Converting a floating-point number that is equal to an obviously simple value such as 0.1 should yield a similarly simple result.
We can rephrase the first of these goals in terms of idempotence: When we convert a floating-point number to decimal, and then convert the decimal representation back to floating-point, we would like the result of this round-trip conversion to be exactly the same value with which we started.
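Here is one way to write that round-trip test, using nothing beyond the standard streams; whether it reports success depends on how many digits we request and on how carefully the implementation converts in each direction:

#include <iostream>
#include <iomanip>
#include <sstream>

// Convert a double to decimal text with the given number of significant
// digits, convert the text back to a double, and report whether we got
// back exactly the value we started with.
bool round_trips(double original, int digits)
{
    std::ostringstream out;
    out << std::setprecision(digits) << original;

    std::istringstream in(out.str());
    double restored;
    in >> restored;

    return restored == original;   // exact equality is the point here
}

int main()
{
    std::cout << std::boolalpha
              << round_trips(1.0 / 3.0, 15) << '\n'   // typically: false
              << round_trips(1.0 / 3.0, 17) << '\n';  // typically: true
}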
The first of these goals is easy enough to implement: When we convert a floating-point number to decimal, the result could simply be the exact decimal representation of the number. However, this suggestion is not really practical, because it would mean printing 0.1 as 0.1000000000000000055511151231257827021181583404541015625. So what should we do?
The first major step toward resolving these problems was Jerome Coonen's observation, around 1980, that it was possible to place specific bounds on how much error could be allowed in input and output while maintaining idempotence. Moreover, it was possible to come up with bounds that could be implemented efficiently. These observations were far from easy to prove; indeed, they were ultimately the core of his 1984 PhD thesis, and also found their way into the IEEE floating-point standard.
However, as we observed last week, loose bounds of this sort are an invitation for implementations to differ from each other — and sometimes even from themselves at different times or in different contexts — and such differences can stand in the way of effective debugging. Next week, we'll take a look at a beautifully elegant way of specifying floating-point input and output that avoids all these problems — albeit at the cost of being harder to implement — and then we'll explore the social factors that make this elegant solution hard to implement in practice.