Even Simple Floating-Point Output Is Complicated
We continue the theme from last week and the week before that by discussing an idea that is both fundamental and complicated: floating-point input-output — in particular, the behavior that a programmer should have the right to expect from a program that reads or writes a floating-point number in human-readable form.
Much thought has gone into such behavior over the years, and as a result, an answer has emerged that is easy to define and to understand. Moreover, open-source implementations of this desirable behavior are generally available. Nevertheless, most programming languages still do not specify this aspect of implementations' behavior — because of the very tension between theory and practice that we have been discussing.
The root of the problem is that people usually write numbers in base 10 and computers usually store numbers in base 2. 10 is, of course, 2 times 5, and 2 does not divide 5 evenly. It follows that, apart from 1 itself, no power of 2 is a power of 10, and vice versa. It further follows that numbers that are easy to write exactly in base 10, such as 0.1, are often impossible to write exactly in base 2: a finite binary fraction is always an integer divided by a power of 2, and no such fraction can equal 1/10, whose denominator contains a factor of 5. For example, 0.1 in base 10 is 0.0001100110011… in base 2, with the 0011 pattern repeating forever; any finite number of bits yields only an approximation, whose accuracy depends on how many bits one is willing to use.
Interestingly, there is no problem in the other direction: converting numbers in base 2 to their exact equivalents in base 10. For example, 0.1, 0.01, and 0.001 in base 2 are 0.5, 0.25, and 0.125 in base 10, respectively. More generally, because 1/2^n is equal to 5^n/10^n, it is always possible to represent a number in base 2 with n bits after the binary point exactly as a number in base 10 with n digits after the decimal point. This phenomenon leads to a fundamental conclusion about floating-point output:
It is always possible to represent a binary floating-point number exactly in decimal if you are willing to use as many decimal digits after the decimal point as there are bits after the binary point.
Of course, the computation that produced a particular binary floating-point number may have its own errors; but every binary floating-point number represents a mathematically exact value that always has an exact decimal representation.
Let's see what happens when we try to convince a C++ implementation to show us one of those exact decimal values. We'll start with a simple example:
double d = 1.0 / 3.0;
std::cout << d << std::endl;
When I run this on my computer, it prints 0.333333. This behavior suggests that the default is to display either six digits after the decimal point or six significant digits overall; I'll leave it to the reader to devise an experiment to determine which case applies.
However, 0.333333 is not the closest decimal approximation to the floating-point value of 1.0 / 3.0 that we could print. We can see this by running the following program fragment:
double d = 1.0 / 3.0;
std::cout << d - 0.333333 << std::endl;
Doing so prints 3.33333e-07, which is the difference between 1.0 / 3.0 and 0.333333, rounded to six significant digits. Moreover, if we increase the precision that we use to print our value:
// std::setprecision requires #include <iomanip>
double d = 1.0 / 3.0;
std::cout << std::setprecision(20) << d << std::endl;
we get 0.33333333333333331483, a result that suggests that our d contains about 16 decimal digits' worth of precision, or at least that we can print that many digits. Is this result an exact representation of the value of d? Probably not. On this particular computer, a double variable comprises 64 bits, of which 12 are devoted to the sign (1 bit) and exponent (11 bits). That leaves 52 bits, to which we add a hidden leading bit that is always 1 for normalized values, to obtain 53 significant bits. Because 1.0 / 3.0 lies between 1/4 and 1/2, the first of those 53 bits falls in the second place after the binary point, so the last one falls in the 54th place. Accordingly, representing that value exactly in decimal should require 54 digits after the decimal point.
What happens when we ask for nearly that many digits, say 53 significant digits, which for this value means 53 digits after the decimal point? If we execute
double d = 1.0 / 3.0;
std::cout << std::setprecision(53) << d << std::endl;
we get 0.3333333333333333148296162562473909929394722, which has only 43 digits after the decimal point. Without even going to the trouble of verifying these digits, we can be confident that what we have here is not an exact representation.
Why should something as seemingly simple as printing a floating-point number be so hard? The answer is a tangle of technical, pragmatic, and historical reasons, which we shall begin to explore next week.