NaNs Just Don't Get No Respect
Floating-point NaN (Not A Number) values are the Rodney Dangerfields of numerical work. They are met mostly with a baffled "what are these silly things for?" and then avoided and ignored. The D programming language, however, has elevated NaNs to a more significant role, and I'll try to show how they can be very useful.
What Is A NaN?
NaNs are special floating-point values. They have the interesting property that nearly any arithmetic operation with a NaN as one or more of its operands yields a NaN result. NaNs are a bad smell that sticks to whatever you do with them. But that stickiness is the key to their utility.
The floating point representation on modern computers contains a representation of the number's sign, exponent, and mantissa. Special encodings are used to represent 0, positive and negative infinity, and NaN. In other words, NaN is a special bit pattern that is recognized by the floating-point hardware as, well, a NaN.
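To make the "special bit pattern" concrete, here is a minimal sketch that inspects the bits of a single-precision value. It assumes the IEEE 754 single-precision layout (8-bit exponent, 23-bit mantissa); the helper names bitsOf and looksLikeNaN are made up for this illustration and are not part of any library.

```d
import std.stdio : writefln;

// Reinterpret the bits of a float as a 32-bit unsigned integer.
uint bitsOf(float x)
{
    return *cast(uint*) &x;
}

// True if the pattern encodes a NaN under the assumed IEEE 754 layout:
// exponent field all ones, mantissa field nonzero.
bool looksLikeNaN(uint bits)
{
    enum uint exponentMask = 0x7F80_0000;
    enum uint mantissaMask = 0x007F_FFFF;
    return (bits & exponentMask) == exponentMask
        && (bits & mantissaMask) != 0;
}

void main()
{
    writefln("float.nan      = 0x%08X", bitsOf(float.nan));
    writefln("float.infinity = 0x%08X", bitsOf(float.infinity));

    assert(looksLikeNaN(bitsOf(float.nan)));
    // Infinity shares the all-ones exponent but has a zero mantissa.
    assert(!looksLikeNaN(bitsOf(float.infinity)));
    assert(!looksLikeNaN(bitsOf(1.0f)));
}
```

Infinity and NaN differ only in the mantissa field, which is why many distinct bit patterns all count as NaN.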
Getting NaN values is straightforward in D, as they are a property of their corresponding type:
    float.nan
    double.nan
    real.nan
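A short sketch of the sticky behavior, and of how to test for a NaN, might look like the following. It uses isNaN and sqrt from std.math; the variable names are only for illustration.

```d
import std.math : isNaN, sqrt;

void main()
{
    double x = double.nan;
    double y = x;

    // Arithmetic with a NaN operand yields a NaN.
    assert(isNaN(x + 1));
    assert(isNaN(x * 0));
    assert(isNaN(sqrt(x)));

    // A NaN compares unequal to everything, including another NaN,
    // so == is the wrong tool for detecting one.
    assert(!(x == y));
    assert(x != y);

    // isNaN is the reliable test.
    assert(isNaN(x));
}
```

Note that comparisons are the exception to the stickiness: they return an ordinary bool rather than a NaN.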
What's It Good For?
Let's suppose you've got a deep space probe. It's a million miles from Earth, and there's a bad sensor in an array of sensors, like a dead pixel on a monitor. It can't be fixed; you just have to live with it. A lot of math will be run on the output of that sensor array, and in order to survive scrutiny you'll need to be sure that the results are not influenced by that one bit of bad data.
What value is best to pick for that bad data?
A NaN fits the bill perfectly. Since any operation on a NaN yields a NaN, any NaN in the results of the mathematical analysis marks a result that depends on the bad data. Simply substituting 0 for the bad data and assuming it won't have a significant effect on the results just won't pass scrutiny.
It's not just for deep space probes; any real-world data collection is going to get bad data now and then, and it must be dealt with correctly.
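As a rough illustration of the sensor scenario, the sketch below marks one reading as bad with a NaN and compares that against substituting 0. The readings, the mean function, and the meanOfValid policy of skipping bad sensors are all invented for this example.

```d
import std.math : isNaN;

// Mean of all readings; the result is NaN if any reading is NaN.
double mean(const double[] readings)
{
    double total = 0;
    foreach (r; readings)
        total += r;
    return total / readings.length;
}

// Mean of the valid readings only (hypothetical policy: skip bad sensors).
double meanOfValid(const double[] readings)
{
    double total = 0;
    size_t count = 0;
    foreach (r; readings)
    {
        if (!isNaN(r))
        {
            total += r;
            ++count;
        }
    }
    return count == 0 ? double.nan : total / count;
}

void main()
{
    double[] withNaN  = [20.0, 22.0, double.nan, 24.0]; // third sensor is dead
    double[] withZero = [20.0, 22.0, 0.0, 24.0];        // dead sensor papered over

    // The NaN marks the result as tainted by the bad sensor.
    assert(isNaN(mean(withNaN)));

    // Substituting 0 gives a plausible-looking but wrong answer.
    assert(mean(withZero) == 16.5);

    // Handling the bad reading deliberately gives the intended value.
    assert(meanOfValid(withNaN) == 22.0);
}
```

The point is that the NaN forces a deliberate decision about the bad reading, whereas the 0 quietly skews the result.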
That's explicit use of NaNs, and of course you can choose to not use them.
But D goes a bit farther, making implicit use of NaNs. By that I mean that floating-point data is default initialized to NaN. This is highly unusual. C and C++, for example, default initialize them to 0 if they are in static data, and to garbage if they are on the heap or the stack. I don't know of any other language that default initializes them to NaN.
    float f;         // f is now set to float.nan
    float h = 0;     // h is initialized to 0
    float g = f + 1; // g is initialized to float.nan
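The same default applies beyond simple local variables. The following sketch checks it for each floating-point type, for array elements, and for struct fields; the Reading struct is a made-up example type.

```d
import std.math : isNaN;

struct Reading
{
    double value;     // no initializer: defaults to double.nan
    double scale = 1; // explicit initializer
}

void main()
{
    float f;
    double d;
    real r;
    assert(isNaN(f) && isNaN(d) && isNaN(r));

    double[3] samples; // every element defaults to NaN
    foreach (s; samples)
        assert(isNaN(s));

    auto buffer = new float[4]; // so do dynamically allocated elements
    foreach (b; buffer)
        assert(isNaN(b));

    Reading reading;
    assert(isNaN(reading.value));
    assert(reading.scale == 1);

    int i; // integers have no NaN; they default to 0
    assert(i == 0);
}
```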
The first reaction to this is often one of irritation: "Why not just default initialize it to 0, which is actually useful?" The reply is that the programmer's intent that it be 0 initialized is indistinguishable from simply forgetting to initialize it to the correct value (which may not be 0 at all). Default initializing it to NaN will ensure that it won't silently produce results that look valid but are garbage.
The next reaction is along the lines of:
    float f;
    ... code ...
    f = 0;
"I cannot afford the cost of the extra assignment." Fortunately, modern optimizer technology that's been around for 30 years is fairly good at eliminating dead assignments (dead assignments are values written but never read). Normally, this should never be an issue.
Given the code:
    float f;
    if (condition)
        ++f;
the user writes, "In language X, the compiler complains on line 3 saying that f is used without being initialized. Isn't that a better solution?" For that example, it probably is a better solution. But consider:
    float f;
    if (condition1)
        f = 7;
    ... code ...
    if (condition2)
        ++f;
Because condition1 and condition2 are runtime tests, there are plenty of scenarios where condition2 is always true if condition1 is true, yet no static analyzer could possibly deduce that. This leaves the compiler with the unfortunate choice of:
- Ignoring the issue, even though there might be a serious bug there.
- Issuing a diagnostic on the ++f anyway, and adding the word "perhaps" to the diagnostic.
The latter is what the compiler developer will likely select. This leads to the programmer getting annoyed with false positive error diagnostics, and he'll likely add an =0 like this:
    float f = 0;
    if (condition1)
        f = 7;
    ... code ...
    if (condition2)
        ++f;
All is well and good, and life goes on. The programmer leaves the company for a better job, and the maintenance programmer is given the job of modifying condition1. He modifies it so that condition2 is not always true if condition1 is true, so that sometimes the ++f will see f as 0. Now, suppose that in the algorithm being expressed, 0 is an invalid out-of-range value. A subtle bug has appeared, which is hardly obvious. After all, how could the maintenance programmer know that that =0 is invalid? His predecessor was a top-notch programmer and must have known what he was doing.
If f is instead default initialized to NaN, the fallout is:
- If f is to be initialized elsewhere, the initializer can be left off of the declaration.
- If the static analysis can prove that the default initializer is a dead assignment, it will remove it for no ill effect.
- If a bug is introduced into the code by providing an alternate path where the initialization is bypassed, then subsequent operations will see the NaN and will propagate it to the results. A NaN will be glaringly obvious in the output of the program, and so the bug can be detected and traced back to its source.
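The last point can be sketched in a few lines. The compute function below is a hypothetical stand-in for the earlier example, with condition1 and condition2 passed in as runtime values so that each path can be exercised.

```d
import std.math : isNaN;

// Hypothetical wrapper around the example: the two conditions are
// supplied by the caller instead of being computed by "... code ...".
float compute(bool condition1, bool condition2)
{
    float f; // default initialized to float.nan
    if (condition1)
        f = 7;
    if (condition2)
        ++f;
    return f;
}

void main()
{
    // Paths where f is assigned before use behave as intended.
    assert(compute(true, true) == 8);
    assert(compute(true, false) == 7);

    // The path that bypasses the initialization yields a NaN
    // rather than a plausible-looking 1.
    assert(isNaN(compute(false, true)));
}
```

Had f been declared with =0, the last call would return 1, a value that looks valid and gives no hint that the initialization was skipped.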
I hope this explanation illuminates why NaNs are useful for explicit handling of bad data, and useful for detecting initialization bugs when used implicitly. I know they can be annoying, like that obnoxious red light on the dashboard indicating you've got no oil pressure, but they are annoying for good reasons.
I almost forgot — if you really don't want default initialization, you can tell the compiler, "Relax, I'm a professional, I know what I'm doing" and void initialize:
float f = void;
Thanks to Jason House and Brad Roberts for their helpful comments on this article.