Why Do Even Trivial Improvements Take So Long To Adopt?
Last week, I outlined two simple rules for floating-point conversion:
- On input, treat a decimal value as an exact rational number; round that number to floating-point according to the normal rounding rules.
- On output, produce the decimal value with the smallest number of significant digits that yields exactly the same value when converted back to floating-point.
Although these rules are not easy to implement, reasonably fast implementations do exist. Moreover, these rules usually avoid annoyances such as printing numbers with long strings of trailing nines or zeroes followed by one or more garbage digits, and they allow programmers to write floating-point numbers in human-readable form without losing information. In fact, it is hard to think of any disadvantages that might weigh against the advantages of this approach. So why doesn't every C++ implementation follow these rules?
To help understand the answer to this question, consider something that happened to me many years ago. I was part of a group providing software support for a university mainframe. One day we figured out that we could provide slightly more memory to our users by default. Because this change increased, rather than decreased, the resources available to user programs, we thought that the change would be so transparent that few if any users would even notice. Were we ever wrong!
The first sign of trouble was when a user reported that a program that he had been running every day was now unaccountably producing different results with the same input. We tested the program with the old, smaller memory limits and found that indeed, changing the program's memory allocation changed its results. We told that to the user, and explained that the only way in which this could happen was if the program contained a bug that was causing it to access uninitialized memory. Perhaps reducing the amount of memory available to a program might cause it to stop working, but there was no way in which increasing the amount of memory available could do so.
The user went away, and we thought the problem had been solved. Shortly after that, the facility director ordered us to roll back the memory expansion until the user had found the problem. His rationale was that the program was behaving satisfactorily before the change; it was behaving differently now; therefore, the change had broken something.
I tried to explain to the director that the program could not possibly have been working correctly before the change, and therefore that what we had really discovered was that the program's results should not have been trusted. His answer was that he really didn't care about that; a change in behavior was a change in behavior, and until we could prove to this user's satisfaction that everything was working properly, we had to roll back the change.
This happened long enough ago that I have forgotten how the problem was eventually resolved. Someone probably looked at the user's program and figured out what it had been doing wrong, and once it had been fixed, the memory-limit change no longer caused a behavior change. However, the lesson has stayed with me:
When a change to the programming environment changes a program's behavior, that environment change is considered suspicious until proven otherwise.
This lesson is true even when the behavior change is trivial. The reason is that it is often far from trivial to determine whether a change in a program's behavior is trivial or profound. As an example, I once knew someone who ran a suite of floating-point accuracy tests every night on the computer he normally used. One day, this suite started producing different results than it had previously. It took a lot of investigation, but what he finally found was that the electrical contacts on the floating-point circuit board had corroded, with the net effect of causing certain floating-point computations to be done in single precision instead of double precision.
As another example, consider the 1994 bug in floating-point division on the Intel Pentium chip. On chips with this bug, dividing 4,195,835 by 3,145,727 would yield 1.333739… instead of the correct value 1.333820…. Many programs did not encounter this bug at all, but others would show the problem only through very subtle changes in their results.
Because of how programmers behave when they encounter problems of this sort, changing how a programming system behaves at all is much harder than adding new behavior — even when the change is a clear improvement. There will always be some programmers who consider any kind of change to be a bug.
In addition to the social pressure created by this behavior on the part of programmers, C++ is in an unusual situation that makes improvements of this kind even harder to adopt than they might be in other languages. We'll talk about that situation in detail next week.