More Accurate + Faster = Better, Right?
Last week, I invited you to consider a hypothetical computer that always computes the product of two
float values as a
double. How should a compiler writer for such a machine implement
float f1 = 0.1, f2 = 10;
double d = f1 * f2;
In particular, in the last statement above, is it OK for the implementation to store the double-precision value of
f1 * f2 directly in
d, or must it round the product to
float and extend it back to
double before storing it?
On most computers, a
double has at least twice as many fraction bits as a
float. On such computers, if we assume that
f1 and f2 are exact, then the
double value of
f1 * f2 is also exact. Accordingly, if we store the double-precision value of
f1 * f2 directly in
d, we can be confident that the value of
d is the best possible approximation to the exact value of
f1 * f2. If, on the other hand, we round the product to
float, then extend it back to
double, and then store it in
d, the value of
d will generally be slightly incorrect as a result of the rounding.
If the only way to multiply two
float values yields a
double result, then not only does converting that result to
float and back to
double make it less accurate, but it also makes the computation slower. As a result, storing the
double value directly in
d is not only more accurate, but also faster than converting to
float and back.
Moreover, this extra accuracy is a useful way to avoid an important pitfall:
vector<float> v1, v2;
// Find the bug in the code below:
assert(v1.size() == v2.size());
double sum = 0;
auto iter = v2.begin();
for (float x: v1)
    sum += x * (*iter++);
This code checks that
v1 and v2 have the same number of elements, then multiplies each element of
v1 by the corresponding element of
v2, adding all of the products together and putting their sum in
sum. This computation is surprisingly hard to do with perfect accuracy; but by using a
double to hold the intermediate sums, the code tries to eliminate one major source of inaccuracy.
Unfortunately, if the computation
x * (*iter++) is rounded to
float, much of the accuracy that might be gained by making
sum a double is thrown away — and quietly, to boot. In contrast, if the implementation computes
x * (*iter++) as a
double, then the final value of
sum will almost surely be more accurate than it would be if each product were rounded to float.
Despite the benefits of an implementation's exploiting single-precision multiplication that yields a double-precision result, I would like to argue that bypassing the extra steps of rounding the result to
float and then extending it back to
double is wrong from both a theoretical and a practical point of view.
The theoretical point of view is easy to explain: The C++ standard says that the product of two
float values is a
float, and if the
double result is not exactly equal to any
float, then that product is not being computed correctly. However, this theoretical argument does not immediately answer the question of why a more accurate result should not be preferred. If an implementation can produce a more accurate result than the standard requires, why not do so?
The answer to that question is practical: In this context, consistency trumps accuracy. To see why, consider these statements:
float f = f1 * f2;
double d = f;
If the value of
d after these statements is not exactly the same as it would be after
double d = f1 * f2;
then we have the strange situation that the expression
f1 * f2, which the language defines as having type
float, has two different values in different contexts. Such discrepancies can make it dramatically harder to reason about programs, because two different ways of writing a program that have the same meaning as far as the language definition is concerned might wind up with different results in practice.
I encountered just such a discrepancy when I was using a PL/I implementation many years ago. This implementation offered two different compilers, called respectively the optimizing compiler and the checkout compiler. The checkout compiler was designed to enforce all the language rules; the optimizing compiler was intended to generate fast code but to leave open the possibility that incorrect code might quietly generate incorrect results during execution.
I was working on a program that I wanted to be able to test. Accordingly, I ran it on the checkout compiler until I was confident that it did what I wanted it to do. After that, I ran it on the optimizing compiler — and was astonished to find that it gave different results. The whole point of having these two compilers was that if a program passed the checkout compiler, it would be possible to count on the program producing the same results under the optimizing compiler.
The problem turned out to be exactly the one we've been discussing in this article: The PL/I implementation was on a machine on which the only way to multiply two single-precision floating-point numbers yielded a double-precision product, and the optimizing compiler did not shorten that double-precision result to single precision. As a result, the optimizing compiler produced a result that, although theoretically more accurate, was different from the result produced by the compiler that was supposedly enforcing all of the language rules. I reported this discrepancy to the implementors, who eventually told me that they did not consider the compiler's behavior to be a bug because the result was at least as precise as the language required.
What this experience taught me was that "at least as precise" is sometimes just a polite way of saying "loosely specified." Moreover, this loose specification turned what should have been a key verification tool into yet one more piece of software to debug. To put it more succinctly:
In order to know whether your program is producing the right results, you must know what the right results are.
Whenever an implementation has latitude about what results it produces, that latitude makes the whole notion of "the right results" that much harder to nail down.