Runtime: The On-the-Fly Instrumentation Approach
The additional checking code that the compiler is able to generate can be injected later at runtime, provided that on-the-fly instrumentation of the running program is possible. This is notably the case when the code runs inside a virtual machine or an emulator.
This on-the-fly instrumentation approach leads to diagnostic tools that help uncover numerical problems during the testing and debugging phases. Following this approach, the program is executed under the supervision of a component that watches every arithmetic operation and signals whenever the operand values introduce a potential problem into the result. We call this watching component a "numerical problem sniffer." Such sniffers are not a perfect remedy, but they provide two important advantages: They require no modification of the source code, and there is no need to recompile the code under test.
The idea of sniffers is not brand new, but we haven't found any complete implementation of this kind of diagnostic tool. The next sections present our contribution: two designs and implementations of the on-the-fly instrumentation approach, one in the context of Java and another as a Valgrind tool.
Cojac: The Numerical Problem Sniffer for Java
The Java context is especially well suited to the on-the-fly instrumentation approach:
- The target machine language is the bytecode, a very simple stack-based language.
- Great tools are available to manipulate bytecode files and their contents (we used the ASM library).
- Java offers a way to define a new class loader, so it is possible to perform bytecode instrumentation at class-loading time.
- Java's arithmetic model is fully defined and specifies the behavior in every situation (as opposed to C, which leaves some cases "undefined," such as signed integer overflow).
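A fully defined model means a sniffer always knows exactly what an instruction would have produced. The minimal sketch below (the class `DefinedSemanticsDemo` is ours, for illustration only) shows three behaviors that the Java Language Specification pins down and that raise no error at runtime: int addition wraps around, a double-to-int cast saturates (and maps NaN to 0), and floating-point division by zero yields an infinity.

```java
// Illustration only: Java-defined arithmetic behaviors that raise no error
// at runtime, yet usually indicate a numerical problem.
final class DefinedSemanticsDemo {

    // Integer addition wraps modulo 2^32 (JLS 15.18.2): MAX_VALUE + 1 == MIN_VALUE.
    static int wrappedSum() {
        int big = Integer.MAX_VALUE;
        return big + 1;
    }

    // A double-to-int cast saturates out-of-range values and maps NaN to 0 (JLS 5.1.3).
    static int saturatedCast(double d) {
        return (int) d;
    }

    // Floating-point division by zero yields an infinity, not an exception.
    static double infiniteQuotient() {
        return 1.0 / 0.0;
    }

    public static void main(String[] args) {
        System.out.println(wrappedSum());              // -2147483648
        System.out.println(saturatedCast(1e20));       // 2147483647
        System.out.println(saturatedCast(Double.NaN)); // 0
        System.out.println(infiniteQuotient());        // Infinity
    }
}
```

In C, the first case (signed overflow) is undefined behavior, so a tool cannot rely on any particular outcome; in Java, each outcome is deterministic and can be checked for.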
We have developed a full-featured numerical problem sniffer for Java, named Cojac. In 2008, we discussed a first prototype in Dr. Dobb's, which was restricted to the detection of overflows and limited to offline use: It took a bytecode file and produced another one with additional instructions wrapped around every integer operation. The current version is a complete solution for the on-the-fly instrumentation approach, able to detect at runtime a wide range of numerical problems on both integer and floating-point numbers.
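Wrapping extra instructions around an integer operation can be pictured at the source level. The following is a minimal sketch of our own, not Cojac's generated code: `CheckedOps`, its methods, and the `report` hook are hypothetical names. Each helper returns exactly what the plain Java operation would, but first checks whether that value is mathematically wrong.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical source-level view of checks an instrumenter could wrap
// around IMUL, IADD and D2I; this is not Cojac's generated bytecode.
final class CheckedOps {
    // Reports collected by the hypothetical report() hook.
    static final List<String> reports = new ArrayList<>();

    static void report(String instruction) {
        reports.add("Overflow : " + instruction);
    }

    // IMUL: the exact product of two ints always fits in a long,
    // so comparing it with its 32-bit truncation detects overflow.
    static int imul(int a, int b) {
        long exact = (long) a * (long) b;
        if (exact != (int) exact) report("IMUL");
        return (int) exact; // same wrapped value as plain a * b
    }

    // IADD: overflow occurred iff both operands have the same sign
    // and the wrapped result has the opposite sign.
    static int iadd(int a, int b) {
        int r = a + b;
        if (((a ^ r) & (b ^ r)) < 0) report("IADD");
        return r;
    }

    // D2I: the cast is faithful only if it equals d truncated toward zero;
    // NaN, infinities and out-of-range values all fail this comparison.
    static int d2i(double d) {
        double truncated = d < 0 ? Math.ceil(d) : Math.floor(d);
        int result = (int) d;
        if ((double) result != truncated) report("D2I");
        return result;
    }

    public static void main(String[] args) {
        imul(65536, 65536);          // 2^32 wraps to 0
        d2i(1e20);                   // saturates to Integer.MAX_VALUE
        System.out.println(reports); // [Overflow : IMUL, Overflow : D2I]
    }
}
```

A bytecode instrumenter could replace each IMUL, IADD, or D2I instruction with a call to a helper of this kind, or inline an equivalent instruction sequence.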
For the end user, Cojac acts as an easy-to-use Java launcher. It is invoked simply by adding a JVM option to the normal command line; this option relies on the Java Agent mechanism to instrument classes at load time.
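For readers unfamiliar with the Java Agent mechanism, here is a minimal skeleton under our own naming (`SnifferAgent` and `SnifferTransformer` are hypothetical; Cojac's real classes and its actual command-line option are not shown). The JVM calls `premain` before the application's `main` when started with `-javaagent:<jar>`, provided the jar's manifest names the agent class in its `Premain-Class` attribute. Every class loaded afterward passes through the registered `ClassFileTransformer`, which may return rewritten bytes or `null` to keep the class unchanged.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical agent entry point. Declared package-private only to keep the
// sketch in one file; a real agent jar would normally make it public.
final class SnifferAgent {
    // Invoked by the JVM before main() when launched with -javaagent:<jar>.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new SnifferTransformer());
    }
}

// Sees the bytes of every class as it is loaded.
final class SnifferTransformer implements ClassFileTransformer {
    // Number of classes that would have been instrumented (illustration only).
    static final AtomicInteger candidates = new AtomicInteger();

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        // Internal names use '/', e.g. "java/lang/String"; trust java.* classes.
        if (className == null || className.startsWith("java/")) {
            return null; // null means "leave this class unchanged"
        }
        candidates.incrementAndGet();
        // A real sniffer would rewrite classfileBuffer here (for example,
        // with ASM) and return the instrumented bytes.
        return null;
    }
}
```

Because the hook runs at class-loading time, neither the source code nor the compiled class files of the application need to change.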
Cojac is able to detect the following events in any part of the executed code:
- Any integer overflow in any primitive type with any operation (except shifts);
- Any type cast that loses the essence of the original value (for example, an out-of-range double converted to an int);
- NaN or infinite results of arithmetic operations (operators as well as predefined functions defined in java.lang.Math/StrictMath) from normal-value operands;
- Smearing when adding/subtracting a floating-point number whose exponent is too small relative to that of the other operand, so that its contribution is largely lost;
- The comparison of floating-point numbers that are very close, differing only in the least-significant bits;
- Cancellation when subtracting (or adding the opposite of) two close numbers;
- Underflow when dividing (or multiplying) a number so that the result gets rounded to 0.0.
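Several of the floating-point detectors above reduce to simple predicates on operands and results. The sketch below is our own approximation, not Cojac's code: the class and method names are hypothetical, and the thresholds (`minLostBits`, `maxUlps`) and exact criteria are assumptions chosen for illustration. For smearing, it tests only the extreme case where the smaller operand is absorbed completely.

```java
// Hypothetical predicates approximating some of the detectors listed above;
// the thresholds and exact criteria are assumptions, not Cojac's.
final class FloatingPointChecks {

    // NaN or infinity produced from finite operands.
    static boolean newNonFinite(double a, double b, double result) {
        return Double.isFinite(a) && Double.isFinite(b) && !Double.isFinite(result);
    }

    // Smearing, extreme case: adding a nonzero b leaves a unchanged.
    static boolean absorbed(double a, double b) {
        return b != 0.0 && a + b == a;
    }

    // Cancellation: the difference is at least minLostBits binades smaller
    // than the larger operand, i.e. roughly that many leading bits cancelled.
    static boolean cancellation(double a, double b, int minLostBits) {
        double diff = a - b;
        if (diff == 0.0 || !Double.isFinite(diff)) return false;
        int operandExp = Math.max(Math.getExponent(a), Math.getExponent(b));
        return operandExp - Math.getExponent(diff) >= minLostBits;
    }

    // Underflow: the product of two nonzero finite numbers rounds to 0.0.
    static boolean underflowToZero(double a, double b) {
        return a != 0.0 && b != 0.0
                && Double.isFinite(a) && Double.isFinite(b)
                && a * b == 0.0;
    }

    // Suspicious comparison: the operands differ, but only by a few ulps.
    static boolean closeComparison(double a, double b, int maxUlps) {
        if (a == b || !Double.isFinite(a) || !Double.isFinite(b)) return false;
        double ulp = Math.ulp(Math.max(Math.abs(a), Math.abs(b)));
        return Math.abs(a - b) <= maxUlps * ulp;
    }

    public static void main(String[] args) {
        System.out.println(absorbed(1e20, 1.0));                // true
        System.out.println(closeComparison(0.1 + 0.2, 0.3, 4)); // true
    }
}
```

An instrumenter would evaluate such a predicate next to each DADD, DSUB, DMUL, DDIV, or DCMP instruction and signal a problem when it returns true.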
The behavior of Cojac can be adjusted using a set of options. Here are some available parameters:
- Several possible reaction policies: A warning message to stderr or to a log file, exception raising, or a user-supplied callback.
- Fine-grained selection of the activated detectors: you can restrict detection to particular types (int/long/float/double), to casts, to all Math.* operations, or even to a particular bytecode instruction.
- Message filtering: A warning message can be made more verbose with a full stack trace; the log can be shortened so that a problem is reported only once per location (useful in loops). Cojac can display a summary of the encountered problems at the end of the run.
- Class filtering: A list of prefixes can be provided to prevent the instrumentation of matching class names; by default, the Java standard library is trusted.
- The detected events can be accessed by a JMX-based tool such as jconsole. This feature helps monitor long-running applications.
- On-the-fly instrumentation is the default, but we also provide a batch mode in case you want to get the resulting instrumented bytecode as an output file.
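Several of these options amount to simple bookkeeping. As an illustration, the sketch below (all names hypothetical; this is neither Cojac's option syntax nor its internal API) combines three of them: prefix-based class filtering, a pluggable reaction policy expressed as a `Consumer<String>`, and once-per-location message filtering keyed on the problem and its location.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical sketch of option handling; not Cojac's actual options or API.
final class SnifferConfig {
    // Two example reaction policies: warn on stderr, or raise an exception.
    static final Consumer<String> WARN = System.err::println;
    static final Consumer<String> THROW = msg -> { throw new ArithmeticException(msg); };

    private final List<String> excludedPrefixes;  // class filtering
    private final boolean oncePerLocation;        // message filtering
    private final Consumer<String> reaction;      // reaction policy
    private final Set<String> alreadyReported = new HashSet<>(); // not thread-safe

    SnifferConfig(List<String> excludedPrefixes, boolean oncePerLocation,
                  Consumer<String> reaction) {
        this.excludedPrefixes = excludedPrefixes;
        this.oncePerLocation = oncePerLocation;
        this.reaction = reaction;
    }

    // Classes whose internal name starts with an excluded prefix stay uninstrumented.
    boolean shouldInstrument(String internalClassName) {
        for (String prefix : excludedPrefixes) {
            if (internalClassName.startsWith(prefix)) return false;
        }
        return true;
    }

    // Called by instrumented code whenever a detector fires.
    void signal(String problem, String location) {
        String message = problem + " at " + location;
        // Set.add returns false if this problem was already reported here.
        if (oncePerLocation && !alreadyReported.add(message)) return;
        reaction.accept(message);
    }

    public static void main(String[] args) {
        SnifferConfig cfg = new SnifferConfig(Arrays.asList("java/", "javax/"), true, WARN);
        for (int i = 0; i < 3; i++) {
            cfg.signal("Overflow : IMUL", "Demo.java:10"); // reported only once
        }
    }
}
```

Keying the filter on both the problem and its location keeps distinct problems on the same source line (such as an Infinity result followed by a D2I overflow) as separate reports.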
Figure 1 shows the output when running the test case of Listing One in Cojac mode.
COJAC: Overflow : IMUL HelloCojac.powerModA(HelloCojac.java:158)
COJAC: Maths error (Infinity) with Math.pow(DD)D HelloCojac.powerMoB(HelloCojac.java:165)
COJAC: Overflow : D2I HelloCojac.powerModB(HelloCojac.java:165)
COJAC: Problematic instructions:
1848 times -> COJAC: Overflow : IMUL HelloCojac.powerModBuggy01(HelloCojac.java:158)
1 times -> COJAC: Maths error (Infinity) with Math.pow(DD)D HelloCojac.powerModBuggy02(HelloCojac.java:165)
1 times -> COJAC: Overflow : D2I HelloCojac.powerModBuggy02(HelloCojac.java:165)
An Eclipse plugin is provided to show how Cojac can be integrated into conventional IDEs. Our plugin defines a Cojac Run Configuration as a substitute for the usual "Java Application" configuration. When a program is launched in this mode, the plugin reports any signalled problems directly in the editor through Eclipse's warning annotation mechanism.
Cojac-grind: A Lower-Level Numerical Problem Sniffer Based on Valgrind
In the Linux C/C++ programming community, Valgrind is a must-have tool that performs runtime memory checking. It is a free yet invaluable companion for novice and expert programmers alike, and it helps detect insidious memory problems that can creep into C/C++ programs, including double frees, memory leaks, and out-of-bounds accesses. Valgrind itself is a very general instrumentation framework for Linux, and the Memcheck memory checker is only one example of what can be built with it. The kind of numerical problem sniffer we designed for Java bytecode can be ported to the Valgrind emulator (or to other tools such as DynInst).
As a proof of concept, we have developed Cojac-grind, a first quick-and-dirty prototype of a Valgrind tool that detects a wide subset of the anomalies described for Cojac, on integers (16, 32, and 64 bits) and floating-point numbers (32 and 64 bits). Like other Valgrind tools, Cojac-grind takes an executable Linux program as a parameter (Figure 2 shows an example).
prompt$ valgrind --tool=cojac HelloCojac
==22322== Cojac-0.0.1, the Cojac-grind numerical problem sniffer
==22322== Cojac: Precision, AddF64 at 0x8048EFB: demoFunc (HelloCojac.c:280)
==22322== Cojac: Infinity, AddF64 at 0x8048F01: demoFunc (HelloCojac.c:281)
==22322== Cojac: Precision, SubF64 at 0x8048F0C: demoFunc (HelloCojac.c:282)
==22322== Cojac: NaN, DivF64 at 0x8048F15: demoFunc (HelloCojac.c:283)
==22322== Cojac: DivByZero, DivF64 at 0x8048F1E: demoFunc (HelloCojac.c:284)
==22322== Cojac: DivByZero, DivF64 at 0x8048F27: demoFunc (HelloCojac.c:285)
==22322== Cojac: DivByZero, DivF64 at 0x8048F30: demoFunc (HelloCojac.c:286)
==22322== Cojac: Cancellation, AddF64 at 0x8048F44: demoFunc (HelloCojac.c:287)
==22322== Cojac: Overflow, Add32 at 0x8048FA0: demoFunc (HelloCojac.c:294)
==22322== Cojac: Overflow, Mul32 at 0x8048FAB: demoFunc (HelloCojac.c:295)
==22322== Cojac: Overflow, Sub32 at 0x8048FB2: demoFunc (HelloCojac.c:296)
==22322== Cojac: Overflow, Mul32 at 0x8048FBD: demoFunc (HelloCojac.c:297)
==22322== Cojac: Overflow, Sub32 at 0x8048FC7: demoFunc (HelloCojac.c:300)
==22322== Cojac instrumentation statistics:
==22322== Add32 239 Sub16 2 Sub32 79 Mul16 1
==22322== Mul32 6 32to16 4 AddF64 7 SubF64 5
==22322== MulF64 2 DivF64 6
==22322== ERROR SUMMARY: 13 errors from 13 contexts (suppressed: 0 from 0)
It is worth mentioning that the "semantic distance" between source code and object code is greater for native code than for Java and its bytecode. This distance causes additional complications for the numerical problem sniffer approach. For instance:
- There are two different 32-bit integer additions in C/C++, one for signed and one for unsigned operands. In the x86 instruction set, however, both map to a single machine instruction. So at instrumentation time there is no way to know which condition to watch for: unsigned carry or signed overflow. We chose to check signed arithmetic only, which leaves the user with more potentially irrelevant messages (for example, when the operands are actually unsigned).
- On some architectures, the compiler may translate a single operation (such as one 64-bit int addition) into several operations (two 32-bit int additions), some of which might erroneously be signalled as suspicious.
Such complications reduce diagnostic accuracy (false positives and false negatives). This suggests that numerical problem sniffing is more relevant when applied to an intermediate language that is closer to the source code. Nevertheless, our Cojac-grind prototype works and can indeed spot numerical problems.