
Andrew Koenig


More Accurate + Faster = Better, Right?

February 27, 2014

Last week, I invited you to consider a hypothetical computer that always computes the product of two float values as a double. How should a compiler writer for such a machine implement

 
     float f1 = 0.1, f2 = 10;
     double d = f1 * f2;

In particular, in the last statement above, is it OK for the implementation to store the double-precision value of f1 * f2 directly in d, or must it round the product to float and extend it back to double before storing it?

On most computers, a double has at least twice as many fraction bits as a float. On such computers, if we assume that f1 and f2 are exact, then the double value of f1 * f2 is also exact. Accordingly, if we store the double-precision value of f1 * f2 directly in d, we can be confident that the value of d is the best possible approximation to the exact value of f1 * f2. If, on the other hand, we round the product to float, then extend it back to double, and then store it in d, the value of d will generally be slightly incorrect as a result of the rounding.
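
To make the difference concrete, here is a minimal sketch that simulates the hypothetical machine with explicit casts: the product is computed once in double precision, and once rounded to float and then extended back to double. The variable names are mine and the values are purely illustrative; on a typical implementation with IEEE 754 arithmetic, the two printed results differ.

     #include <iostream>
     #include <iomanip>

     int main()
     {
         float f1 = 0.1, f2 = 10;

         // The double-precision product: exact, given the float values
         // actually stored in f1 and f2, because a double has enough
         // fraction bits to hold the product of two floats.
         double wide = static_cast<double>(f1) * static_cast<double>(f2);

         // The same product rounded to float, then extended back to double.
         double narrowed = static_cast<float>(wide);

         std::cout << std::setprecision(17)
                   << "double product:          " << wide << '\n'
                   << "rounded, then extended:  " << narrowed << '\n';
     }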

If the only way to multiply two float values yields a double result, then not only does converting that result to float and back to double make it less accurate, but it also makes the computation slower. As a result, storing the double value directly in d is not only more accurate, but also faster than converting to float and back.

Moreover, this extra accuracy is a useful way to avoid an important pitfall:

 
     vector<float> v1, v2;
 
     // Find the bug in the code below:
 
     assert(v1.size() == v2.size());
     double sum = 0;
     auto iter = v2.begin();
     for (float x: v1)
           sum += x * (*iter++);
 

This code checks that v1 and v2 have the same number of elements, then multiplies each element of v1 by the corresponding element of v2, adding all of the products together and putting their sum in sum. This computation is surprisingly hard to do with perfect accuracy; but by using a double to hold the intermediate sums, the code tries to eliminate one major source of inaccuracy.

Unfortunately, if the computation x * (*iter++) is rounded to float, much of the accuracy that might be gained by making sum a double is quietly thrown away. In contrast, if the implementation computes x * (*iter++) as a double, then the final value of sum will almost surely be more accurate than it would be if the computation were rounded to float.
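
The following sketch contrasts the two behaviors explicitly. The data and variable names are mine, chosen only for illustration: it fills two vectors with 0.1f and accumulates the same products twice, once kept in double precision and once rounded to float first. On an IEEE 754 implementation the two sums come out visibly different.

     #include <cassert>
     #include <iostream>
     #include <iomanip>
     #include <vector>

     using namespace std;

     int main()
     {
         // Made-up data: any values whose products are inexact as floats will do.
         vector<float> v1(1000, 0.1f), v2(1000, 0.1f);

         assert(v1.size() == v2.size());
         double sum_wide = 0;      // products kept in double precision
         double sum_narrow = 0;    // products rounded to float before adding
         auto iter = v2.begin();
         for (float x: v1) {
             double product = static_cast<double>(x) * (*iter++);
             sum_wide += product;
             sum_narrow += static_cast<float>(product);
         }

         cout << setprecision(17)
              << "products kept as double:   " << sum_wide << '\n'
              << "products rounded to float: " << sum_narrow << '\n';
     }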

Despite the advantages of letting single-precision multiplication yield a double-precision result, I would like to argue that bypassing the extra steps of rounding the result to float and then extending it back to double is wrong from both a theoretical and a practical point of view.

The theoretical point of view is easy to explain: The C++ standard says that the product of two float values is a float, and if the double result is not exactly equal to any float, then that product is not being computed correctly. However, this theoretical argument does not immediately answer the question of why a more accurate result should not be preferred. If an implementation can produce a more accurate result than the standard requires, why not do so?

The answer to that question is practical: In this context, consistency trumps accuracy. To see why, consider these statements:

 
     float f = f1 * f2;
     double d = f;

If the value of d after these statements is not exactly the same as it would be after

 
     double d = f1 * f2;

then we have the strange situation that the expression f1 * f2, which the language defines as having type float, has two different values in different contexts. Such discrepancies can make it dramatically harder to reason about programs, because two different ways of writing a program that have the same meaning as far as the language definition is concerned might wind up with different results in practice.
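
As a sketch of the consistency I have in mind, the two routes can be compared directly. If an implementation rounds the product to float in both contexts, the assertion below holds; an implementation that stores the double-precision product directly in the second case, as discussed above, could make it fire, and that is exactly the kind of discrepancy at issue.

     #include <cassert>

     int main()
     {
         float f1 = 0.1, f2 = 10;

         // First route: round the product to float, then extend it to double.
         float f = f1 * f2;
         double d1 = f;

         // Second route: store the product directly in a double.
         double d2 = f1 * f2;

         // The language defines f1 * f2 as having type float, so both
         // routes should yield exactly the same double value.
         assert(d1 == d2);
     }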

I encountered just such a discrepancy when I was using a PL/I implementation many years ago. This implementation offered two different compilers, called respectively the optimizing compiler and the checkout compiler. The checkout compiler was designed to enforce all the language rules; the optimizing compiler was intended to generate fast code but to leave open the possibility that incorrect code might quietly generate incorrect results during execution.

I was working on a program that I wanted to be able to test. Accordingly, I ran it on the checkout compiler until I was confident that it did what I wanted it to do. After that, I ran it on the optimizing compiler — and was astonished to find that it gave different results. The whole point of having these two compilers was that if a program passed the checkout compiler, it would be possible to count on the program producing the same results under the optimizing compiler.

The problem turned out to be exactly the one we've been discussing in this article: The PL/I implementation was on a machine on which the only way to multiply two single-precision floating-point numbers yielded a double-precision product, and the optimizing compiler did not shorten that double-precision result to single precision. As a result, the optimizing compiler produced a result that, although theoretically more accurate, was different from the result produced by the compiler that was supposedly enforcing all of the language rules. I reported this discrepancy to the implementors, who eventually told me that they did not consider the compiler's behavior to be a bug because the result was at least as precise as the language required.

What this experience taught me was that "at least as precise" is sometimes just a polite way of saying "loosely specified." Moreover, this loose specification turned what should have been a key verification tool into yet one more piece of software to debug. To put it more succinctly:

In order to know whether your program is producing the right results, you must know what the right results are.

Whenever an implementation has latitude about what results it produces, that latitude makes the whole notion of "the right results" that much harder to nail down.
