This article discusses the question of numerical problems in programming, and focuses on the approach of using on-the-fly code instrumentation to uncover them at runtime. Two realizations are presented: a complete and stable solution for Java applications, and a proof-of-concept Valgrind add-on for Linux executables. Both tools require no intervention on the source code and no recompilation, and should be helpful as a diagnostic tool for developers, as well as for education purposes for undergraduate programmers.

### Numerical Pitfalls

The internal representation of computer numbers sometimes leads to abnormal behavior when compared with the world of mathematics. In particular, two levels of problems are commonly found:

- The basic differences between math and computer numbers that every programmer has to cope with in daily work. These are widespread and tricky enough, with nasty consequences (bug, vulnerabilities, and waste of time).
- The subtler and more challenging difficulties that experts in numerical applications and scientific computing face; for example, improving an algorithm with the help of numerical-error analysis, or selecting the right IEEE 754-rounding method when necessary. These matters, although very interesting, are outside the scope of this article. (However, a good introduction to these problems is available in PDF form.)

For the first group of problems, here is a possible classification of the most elementary numerical conflicts between theoretical and computer arithmetic:

- Integer overflow: Standard integer number types have a fixed-length representation, so the result of an arithmetic operation may fall out of bounds. In many programming languages (Java, C/C++), integer overflows are not signalled at runtime and are typically handled with circular arithmetic.
- Rounding: As a substitute for real numbers, standard types include floating-point representations. The result of an operation is always rounded to keep a fixed amount of significant bits. Moreover, as the representation relies on base two, some numbers as simple as 1.1 have a periodic (hence approximated) representation. Rounding can be harmless, but there are at least the following annoying consequences:
- Numerical stability: Rounding errors can propagate to the point where the final result of a cascade of operations is completely different from the theoretical value.
- Smearing: Adding a non-zero floating-point number to another may have no effect if the order of magnitude of the latter is sufficiently higher (for example, with Java doubles, A+1,000,000 gives exactly A if A>10
^{22}). This can break the logic of your algorithm. - Cancellation: If the manipulated values suffer from imprecision (either because of a noisy acquisition or as an effect of rounding), subtracting two very close numbers essentially leaves only garbage data because all most-significant bits are lost. The result has a correct order of magnitude, but the accuracy has disappeared.
- Underflow: The result of an operation has an order of magnitude that is too small to fit in the representation and gets rounded to exactly zero.

- Offending typecasting: When converting from one type to another, the value might be out-of-domain for the target type. This occurs between integers (signed-ness issues, narrowing conversion to a smaller width), between floating-point numbers (single/double precision), or between one another. In many programming languages, such typecasting must be explicitly stated, but at runtime, nothing special happens other than the result having nothing in common with the original value.
- Special values: Floating-point types define some special values such as NaN (Not a Number) and positive/negative infinity. So for instance, the square root of a negative number or 3.0/0.0 gives NaN, respectively +∞; a CPU flag is raised, but the event goes unnoticed when programming in high-level languages. When the final result gives an unexpected special value, it is often tedious to track down the particular line of code that first introduced the problem.

All this being well known, can it be tagged as a harmless phenomenon or should it be considered as a serious threat? The bad news is that these numerical pitfalls indeed have a high cost:

- Developers spend hours to detect then fix the bugs related to poor manipulation of numbers. As if this were not tedious enough, some language specifications (like C) complicate things further by considering several arithmetic events as "undefined behavior," and optimizing compilers make use of this, producing very counter-intuitive code. Isn't it frightening that whenever you add two signed
`int`

s, half of the possible computations may put your program in an undefined state, just like dereferencing an uninitialized pointer? - Numerical problems, and overflows in particular, are the kind of vulnerability that hackers like to exploit to break software. A very recent article gives an interesting analysis of the prevalence of the overflow problem and proposes its own detection utility, implemented as a compiler extension.

In sum, numerical problems are easy to understand, but hard to detect. They can turn into nasty bugs: The program generates seemingly perfect output, but may have a catastrophic effect when processing a bad conjunction of data.

As an example, let's consider the power-modulo operation (`x`

), which is used in the RSA crypto-system. Listing One shows two faulty, naive implementations in Java, along with a JUnit test case. The test checks the output of one invocation with small parameter values, then with larger ones.
^{y} mod z

#### Listing One: Faulty, naive implementations of power-modulo realizations.

static int powerModA(int x, int y, int z) { int res=1; while(y>0) { res = res*x; y--; } return res % z; } //---------------------------------------- static int powerModB(int x, int y, int z) { return (int) Math.pow(x,y) % z; } //--------------------------------------- @Test public void tinyTest() { int a,b; a= powerModA(2, 4, 5); b= powerModB(2, 4, 5); assertEquals(1, a); assertEquals(1, b); a= powerModA(497, 1854, 281); b= powerModB(497, 1854, 281); assertEquals(157, a); assertEquals(157, b); }

With this code, many occurrences of numerical flaws (integer overflow, infinite result, bad cast) are encountered during the test execution. The computed results on large parameter values are then nonsense — but they just happen to be correct with those particular test values. Thus, the test silently claims that all is OK, although a step-by-step walk through the computation reveals several clues about the defects.

### Defense Against Numerical Defects

Now that the threats conveyed by computer numbers have been highlighted, let's look at the possible means to help the programmer avoid the pitfalls and keep computations under control.

In the long term, as Bil Lewis points out, it would be legitimate to ask for a better support at several levels: hardware, runtime systems, and programming languages. Such a decisive improvement might happen in a decade or two, but in the meantime, what can help to cope with the dangers carried by computer numbers? We'll focus on the present situation of a developer working with C/C++ or Java.

### Attitude

The topmost weapon should be education and discipline. The good programmer is aware of the counter-intuitive numerical behavior, and exercises vigilance in every piece of code. Considering integer overflows or numerical stability has to become a reflex, just like automatic testing, code-injection prevention, memory freeing, deadlock-free threading, or GUI usability. But acquiring this skill is difficult. The topic is certainly addressed early in a beginning programming course, but can be perceived as an anecdotal feature — a normal consequence of the finite representation with detailed rules dependent on the programming language. Later, students may follow a course on "numerical recipes," but the discussion is focused on numerical stability and on comparing various algorithms. In the end, it's common to find computer scientists who have difficulties predicting the behavior of numbers in non-trivial situations. For them, some of the tools discussed shortly will be of help.

### Hardware

Most CPUs detect events such as integer overflows or NaN results, but these features (part of IEEE 754) are not always available to the developer, as the corresponding register flags are essentially hidden from high-level programming languages. Accessing to this information is possible in C/C++ by inserting assembly code, but this solution is tedious and not portable.

### Static Analysis

Compiler warnings and static-analysis tools can contribute to the prevention of numerical problems and many potential dangers can be tracked and signalled during development. Static analysis leverages the information known at compile-time. Unfortunately, on most programs, far too many false positives can be expected when dealing with integer overflow or floating-point cancellation. The support would improve only if the developer adds dedicated annotations in the code to explicitly state the assumptions about the valid range of values for several fields, parameters, and variables. This implies a shift of the development effort towards more formal specification (which in itself is an interesting trend).

### Source Code

Many numerical computations boil down to classic operations such as matrix manipulations. Whenever possible, it is a good practice to take advantage of well-tested numerical computation libraries (such as LAPACK) that are designed to avoid most problems and use the most appropriate algorithms.

At source-code level, the developer can decide to rely on safer types of numbers that can be found in third-party libraries (sometimes even in the standard library of the language). These include:

- Less-risky alternatives to standard types, with stricter policies for type conversions or overflow-signalling behavior, such as the SafeInt package for C++;
- Unlimited integers, such as BigInteger in Java;
- Unlimited decimal numbers, such as BigDecimal in Java;
- Fractions (which tend to avoid the floating-point pitfalls and increase the risk of integer overflows);
- Lower/upper bounds couples as used in interval computation;
- Symbolic expressions solvers (as in Mathematica).

As can be seen, the alternatives to using raw number types do exist, but they come with some drawbacks (performance, compatibility), which might explain why they are not in wide enough use yet.

In the common choice of primitive number types, developers still must decide among many possible representations (`short`

versus` int`

, or money as a floating-point number of dollars or an integral number of cents). But above all, they then have a two-fold burden:

- Analyzing every arithmetic operation and proving that nothing harmful can ever break the result;
- Wrapping every remaining risky operation with an explicit check and an appropriate reaction to handle the problematic cases.

The explicit check itself can be tricky. Even an experienced programmer is likely to occasionally make wrong assumptions about arithmetic robustness. Furthermore, the additional tests make the code a lot uglier and harder to maintain.

### Compiler

As powerful as compilers have become, we must not expect them to guess the intent of a computation and improve the numerical soundness by replacing the data types and algorithms. However, compilers can contribute by preparing the target code to uncover numerical problems during testing time. It is a tedious task for the developer, but a compiler can add necessary checking code in a systematic way, so that any anomaly will be detected at runtime and a predefined reaction applied (such as signalling or even aborting). This offers the ability to uncover the root of the problem during testing. This approach is far better than just hoping for a test case in which the defect causes an infection that propagates until an observable failure of the program.

Indeed, some compilers do provide an option for checking integer overflows (`gcc -ftrapv`

). As far as we know, it seems uncommon to find a similar feature for floating-point anomalies.

Activating such compiler flags has an impact on the performance of the resulting program, as it slows down every risky computation. A related research prototype has obtained a runtime increase of less than 10%, which is remarkable.