# More Accurate + Faster = Better, Right?

Last week, I invited you to consider a hypothetical computer that always computes the product of two `float`

values as a `double`

. How should a compiler writer for such a machine implement

float f1 = 0.1, f2 = 10; double d = f1 * f2;

In particular, in the last statement above, is it OK for the implementation to store the double-precision value of `f1 * f2`

directly in `d,`

, or must it round the product to `float`

and extend it back to `double`

before storing it?

On most computers, a `double`

has at least twice as many fraction bits as a `float`

. On such computers, if we assume that `f1`

and `f2`

are exact, then the `double`

value of `f1 * f2`

is also exact. Accordingly, if we store the double-precision value of `f1 * f2`

directly in `d`

, we can be confident that the value of `d`

is the best possible approximation to the exact value of `f1 * f2`

. If, on the other hand, we round the product to `float`

, then extend it back to `double`

, and then store it in `d`

, the value of `d`

will generally be slightly incorrect as a result of the rounding.

If the only way to multiply two `float`

values yields a `double`

result, then not only does converting that result to `float`

and back to `double`

make it less accurate, but it also makes the computation slower. As a result, storing the `double`

value directly in `d`

is not only more accurate, but also faster than converting to `float`

and back.

Moreover, this extra accuracy is a useful way to avoid an important pitfall:

vector<float> v1, v2; // Find the bug in the code below: assert(v1.size() == v2.size()); double sum = 0; auto iter = v2.begin(); for (float x: v1) sum += x * (*iter++);

This code checks that `v1`

and `v2`

have the same number of elements, then multiplies each element of `v1`

by the corresponding element of `v2`

, adding all of the products together and putting their sum in `sum`

. This computation is surprisingly hard to do with perfect accuracy; but by using a `double`

to hold the intermediate sums, the code tries to eliminate one major source of inaccuracy.

Unfortunately, if the computation `x * (*iter++)`

is rounded to `float`

, much of the accuracy that might be gained by making `sum`

a `double`

is thrown away — and quietly, to boot. In contrast, if the implementation computes `x * (*iter++)`

as a `double`

, then the final value of `sum`

will almost surely be more accurate than it would be if the computation is rounded to `float`

.

Despite the advantages of an implementation taking advantage of single-precision multiplication yielding a double-precision result, I would like to argue that bypassing the extra steps of rounding the result to `float`

and then extending it back to `double`

is wrong from both a theoretical and a practical point of view.,/p>

The theoretical point of view is easy to explain: The C++ standard says that the product of two `float`

values is a `float`

, and if the `double`

result is not exactly equal to any `float`

, then that product is not being computed correctly. However, this theoretical argument does not immediately answer the question of why a more accurate result should not be preferred. If an implementation can produce a more accurate result than the standard requires, why not do so?

The answer to that question is practical: In this context, consistency trumps accuracy. To see why, consider these statements:

float f = f1 * f2; double d = f;

If the value of `d`

after these statements is not exactly the same as it would be after

double d = f1 * f2;

then we have the strange situation that the expression `f1 * f2`

, which the language defines as having type `float`

, has two different values in different contexts. Such discrepancies can make it dramatically harder to reason about programs, because two different ways of writing a program that have the same meaning as far as the language definition is concerned might wind up with different results in practice.

I encountered just such a discrepancy when I was using a PL/I implementation many years ago. This implementation offered two different compilers, called respectively the optimizing compiler and the checkout compiler. The checkout compiler was designed to enforce all the language rules; the optimizing compiler was intended to generate fast code but to leave open the possibility that incorrect code might quietly generate incorrect results during execution.

I was working on a program that I wanted to be able to test. Accordingly, I ran it on the checkout compiler until I was confident that it did what I wanted it to do. After that, I ran it on the optimizing compiler — and was astonished to find that it gave different results. The whole point of having these two compilers was that if a program passed the checkout compiler, it would be possible to count on the program producing the same results under the optimizing compiler.

The problem turned out to be exactly the one we've been discussing in this article: The PL/I implementation was on a machine on which the only way to multiply two single-precision floating-point numbers yielded a double-precision product, and the optimizing compiler did not shorten that double-precision result to single precision. As a result, the optimizing compiler produced a result that, although theoretically more accurate, was different from the result produced by the compiler that was supposedly enforcing all of the language rules. I reported this discrepancy to the implementors, who eventually told me that they did not consider the compiler's behavior to be a bug because the result was at least as precise as the language required.

What this experience taught me was that "at least as precise" is sometimes just a polite way of saying "loosely specified." Moreover, this loose specification turned what should have been a key verification tool into yet one more piece of software to debug. To put it more succinctly:

*In order to know whether your program is producing the right results, you must know what the right results are.*

Whenever an implementation has latitude about what results it produces, that latitude makes the whole notion of "the right results" that much harder to nail down.