Channels ▼

Andrew Koenig

Dr. Dobb's Bloggers

A Simple Matter of Conversion

August 09, 2010

Suppose you write a program that prints the value of the floating-point literal 0.1 in the programming language of your choice. Do you think it will actually print 0.1, or will it print something like 0.10000000000000001? Can you be confident in your answer without trying it?

Suppose you print the value of a floating-point number and then copy that output into a program as a floating-point literal. Will the value of that literal be exactly equal to the floating-point number that you printed? Can you be confident in your answer without trying it?

Surprisingly, the most useful answers to these questions have not only been understood for more than 20 years, but these answers have even been implemented efficiently, and made available free of charge. Nevertheless, most programmers do not even know that these solutions exist, and most programming-language implementations do not use them.The reason that these questions are interesting at all is that people normally use decimal numbers and computers normally use binary numbers. Moreover, decimal fractions, such as 0.1, do not have exact binary representations. Therefore, whenever we want to ask a computer to work with floating-point numbers, there is the possibility that the machine may not be able to represent exactly the numbers we wish to use. When that happens, it is necessary for the machine to use an approximation.

Under these circumstances, what is the right way to deal with decimal/binary floating-point conversions? Naively, one might think that the answer is always to use the most accurate possible converion. However, this notion fails to take into account two crucial differences between the decimal numbers that people write and the binary numbers that computers use.

The first difference is that computer numbers are usually limited in precision. For example, most computers limit their floating-point values to 64 bits, which must encompass the sign, exponent, and fraction. In contrast, when people write floating-point values in decimal, they tend to write as many digits as they need to express the values they intend.

The second difference is that there are some decimal values, such as 0.1, that cannot be expressed exactly in binary, no matter how many bits are available to do so. In contrast, every binary fraction can be expressed exactly in decimal--although the number of digits needed to do so may be surprising.

These differences between binary and decimal fractions mean that there is a fundamental difference between the two directions. When we convert decimal to binary, there is no way to keep more precision than will fit in the binary destination; when we convert binary to decimal, we can always obtain an exact result if we are willing to use enough digits.

If we were always willing to use enough digits to represent a floating-point value exactly we could always do so. However, 1/2 is 0.5, 1/4 is 0.25, 1/8 is 0.125, and in general, 1/(2^n) requires n digits after the decimal point. Because the exponent of a floating-point number can be large and negative, we can easily get into the situation of needing hundreds of decimal digits to represent a floating-point number with only, say, 52 bits in its fractional part.

In 1990, the ACM PLDI (Programming Language Design and Implementation) conference proceedings published two papers named How to read floating-point numbers accurately (by William Dl Clinger) and How to print floating-point numbers accurately (by Guy Steele and Jon White) that argued persuasively for an elegant solution to these problems.

The reading problem is not so difficult: It is hard to argue that a well-implemented programming language should do anything but yield the most accurate possible representation of whatever number is read. In other words, if you read 0.1 and convert it to binary floating-point, you should get whatever the result would be of converting the exact (i.e. infinite precision) value 0.1 to floating-point. There is never any question of what this value is, because the precision of the floating-point number that you are trying to read is known in advance.

Clinger's contribution to the solution of this problem was to publish an efficient way of doing this conversion. Shortly after that, David Gay came up with an even more efficient implementation, and made the code available to the public free of charge.

The writing problem is slightly more troublesome, because one must figure out not only how to obtain an appropriate result, and how to do so efficiently, but also how many decimal digits the result should have. Steele and White had the nice insight is that the right number of digits is the smallest number that will yield exactly the same value when used for input. Again David Gay implemented this algorithm efficiently and made the code available to the world.

These two algorithms together have some nice properties. For example, they guarantee that if you read a number that has a short representation as a literal, such as 0.1, and then print it again, you will never get a number with a longer representation. To see this, suppose that the internal representation of 0.1 is x. Then when we print x, 0.1 is one of the possible outputs. Any representation longer than 0.1 will therefore be rejected because there is a shorter representation, namely 0.1, that yields exactly x when we read it. So any time we can write a short representation for a floating-point number, the computer will choose the same representation (or an even shorter one).

Moreover, it should be obvious that if we use these algorithms for floating-point input and output, we can always write a floating-point number in decimal form, read it back, and obtain exactly the same value again. It may be more efficient in time and space to use a binary representstion, but it will be no more accurate.

These algorithms were published in 1990, and the authors said they had been thinking about the problems since the 1970s. In other words, there are decades of experience behind the solutions to these problems. These facts raise an interesting sociological question: Why don't all programming languages behave this way? Because in fact they don't. The C++ 2003 standard doesn't require it. I don't think the C++0x standard will do so either. Python started behaving this way only in release 3.1. Why? I think there are three reasons.

First, many progammers don't care. It is possible to write substantial programs without doing any floating-point arithmetic at all, and programmers who do significant floating-point computation often have other things to worry about.

Second, although the implementation is available, it's still not easy to get right--especially if one expects to be able to test it thoroughly.

Third, and probably most significantly, a lot of the code that handles floating-point conversions is probably even older than the Steele/White and Clinger papers. The reason is that input-output conversions are one of these areas that, once code exists to handle a problem, the code is largely left alone except in cases of absolute necessity. Moreover, any change to the code is apt to introduce an incompatibility: A program that once produced one result now produces a slightly different result. Few people would be happy about having to deal with bug reports that say that a program now prints 0.1 when it used to print 0.100000000000000005. So there is a lot of inertia that acts against improving code, especially code that's part of a run-time library, that already works reasonably well, or appears to do so.

This phenomenon is an example of one that appears in many contexts: there is a problem; a solution comes along that, although not perfect, is pretty good; and there the matter rests for a very long time. Even though it a better solution is known, that solution doesn't get implemented because a lot of users don't care, the improvement is hard to get right, and changing it would mean that programs no longer behave in exactly the same way as formerly.

The next few columns will discuss more examples of implementations that have persisted because they are "good enough."

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.