Oliver is the president of OC Systems.
Software bugs are a fact of life. No matter how hard we try, the reality is that even the best programmers can't write error-free code all the time. On average, even well-written programs have one to three bugs for every 100 statements. It is estimated that testing to find those bugs consumes half the labor involved in producing a working program (see Software Testing Techniques, Second Edition, by Boris Beizer, Van Nostrand Reinhold, 1990, ISBN 1850328803). Statistics like these explain why so much attention is focused on making testing more effective.
Traditionally, there are two main approaches to testing software: black-box (or functional) testing and white-box (or structural) testing. In black-box testing, software is exercised over a full range of inputs and the outputs are observed for correctness. How those outputs are achieved -- or what is inside the box -- doesn't matter.
Although black-box testing has many advantages, by itself it is not sufficient. First, real-life systems have too many different kinds of inputs, resulting in a combinatorial explosion of test cases. It is fine to run a set of representative test cases on a 100-line check-balancing program, but a commercial 747 pilot simulator/trainer has too many inputs/outputs to be able to test strictly using black-box techniques.
Second, the correct operation of the program may not be a measurable output. The output of a check-balancing program is the current account balance, which is simple to verify. A 747 pilot simulator, however, has some outputs that are not so obvious -- the height of the bounce on a too-aggressive approach, for example.
Third, it is impossible to determine whether portions of the code have even been executed by black-box testing. Code that has not been executed during testing is a sleeping bomb in any software package. Certainly, code that has not been executed has not been tested.
Finally, and perhaps most convincingly, empirical evidence shows that black-box testing alone does not uncover as many errors as a combination of testing methods does (see "Comparing and Combining Software Defect Detection Techniques: A Replicated Empirical Study," by Murray Wood, et al., Proceedings of the 6th European Conference held jointly with the 5th ACM SIGSOFT Symposium on Software Engineering, 1997).
Enter White-Box Testing
One solution to this problem is to use white-box testing in addition to black-box testing. White-box testing strategies include designing tests such that every line of source code is executed at least once, or requiring every function to be individually tested.
Very few white-box tests can be done without modifying the program, whether by changing values to force different execution paths or by generating a full range of inputs to test a particular function. Traditionally, this modification has been done using interactive debuggers, or by actually changing the source code. While this may be adequate for small programs, it does not scale well to larger applications. Traditional debuggers greatly affect the timing, sometimes enough that a large application will not run without major modifications. Changing the source code is also unwieldy on a large program that runs in a test bed environment.
There are a number of testing tools that let you perform white-box testing on executables, without modifying the source and without incurring the high overhead of an interactive debugger. Advantages of this approach include:
- These tools speed testing and debugging because there is no need to wait for test support code to be inserted into the program. In commercial software development, with many developers and a separate department in charge of testing and integration, this can save significant time.
- Best use can be made of a test environment. For example, running a 747 simulator requires a specific hardware setup. If each tester changes the loaded software on the simulator configuration, then each test takes much longer to set up and each testing scenario is much more error prone. A 747 test environment is expensive to maintain. The faster testing can occur, the cheaper the cost of the testing process.
- Source code may not be available for all of the software. It is common to use third-party products or software delivered from another organization as part of any commercial software effort. These typically do not ship with source code and certainly do not ship with the ability to rebuild them after changing the source code.
- It is better to test the actual executable that will be delivered, rather than a special testing executable. For one thing, it eliminates extra error-prone steps in the testing process. It also lets the testing scenario be more easily repeated on demand, rather than the single shot effort that typically occurs with source code modification.
Most white-box testing tools change the executable in one way or another, or check for certain classes of failures. Rational's Purify (http://www.rational.com/), for instance, modifies the compiled code so each load or store initiates a check of the memory location against specific criteria. BugTrapper from Mutek (http://www.mutek.com/) alters the executable to record all of the system calls. Applied Microsystems' LiveCODE (http://www.applied-microsystems.com/) adds trace commands to embedded systems so execution paths can be analyzed. Aprobe from OC Systems (my company; http://www.ocsystems.com/) allows you to modify the executable in whatever way you might choose.
Typically, white-box testing tools such as these provide a high-level programming interface for writing code that patches the executable and performs a specific function or obtains a particular type of information. Other features might include patches that execute as part of the application, at full machine speeds, which makes them noninvasive and useful for experimentation. Also, the tools might be able to automatically mangle/demangle names so users can use source code names even though the executable contains mangled object code names.
To illustrate, I'll present examples that use code for Aprobe. Aprobe uses ANSI C (with a few keywords added) as the base language in which the patches are specified, so it is straightforward to read/write code.
The aprobe.h include file contains a number of functions useful for writing patches for testing, debugging, and benchmarking. All of these functions have descriptive names and start with the prefix "ap_". These functions are designed to support the testing process. While adding arbitrary source code to an executable is advantageous, much of the power of such tools lies in these support functions.
One thing to note is the shortness of the code sequences. With this type of testing tool, most of the patches are extremely short -- much shorter than the corresponding source code modifications.
White-box testing requires visibility into the executable to determine what to test. It also requires a method to determine the outcome of the test. The ability to output values from within the application in the most noninvasive manner possible is a necessary capability in any white-box testing tool.
The log function takes a variable and writes it out to a disk file in binary form. The variable is either defined in the program, in which case there is a $ in front of the variable identifier, or defined in the probe itself, in which case there is no $ in front of the identifier.
This write is done at RAM speeds through a shared-memory segment, and the disk I/O is performed outside the context of the running process, so the impact on the timing of the program is small. Simple configuration commands let you set the log to wrap around so that file systems do not fill up during long-running applications.
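The wraparound behavior can be pictured as a fixed-size circular buffer in which new entries overwrite the oldest ones. The following is a minimal sketch in plain C, not Aprobe code: the names ring_log, ring_write, and ring_get and the tiny capacity are assumptions made for illustration, and the buffer lives in ordinary process memory rather than in a shared segment.

```c
#include <stddef.h>

#define RING_CAPACITY 4   /* deliberately tiny so the wraparound is easy to see */

/* Hypothetical in-memory log that keeps only the newest RING_CAPACITY values. */
typedef struct {
    long   entries[RING_CAPACITY];
    size_t next;    /* slot that the next write goes to */
    size_t count;   /* number of valid entries, at most RING_CAPACITY */
} ring_log;

void ring_init(ring_log *r)
{
    r->next = 0;
    r->count = 0;
}

/* Store a value, overwriting the oldest entry once the buffer is full. */
void ring_write(ring_log *r, long value)
{
    r->entries[r->next] = value;
    r->next = (r->next + 1) % RING_CAPACITY;
    if (r->count < RING_CAPACITY)
        r->count++;
}

/* Return the i-th oldest retained entry (0 = oldest). Caller ensures i < count. */
long ring_get(const ring_log *r, size_t i)
{
    size_t start = (r->next + RING_CAPACITY - r->count) % RING_CAPACITY;
    return r->entries[(start + i) % RING_CAPACITY];
}
```

Writing the values 1 through 6 into this buffer leaves 3, 4, 5, and 6 retained, so storage stays bounded however long the program runs.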
After running the program (or as a separate process while the program is running), Aprobe formats the data in human-readable form according to the type of data that was logged. For example, if a struct or an object was logged, then the field names of the various fields are printed along with the actual values.
Example 1 is code for the log function, along with the output as it would be formatted. It takes only a few lines of code to produce this type of annotated data. Aprobe automatically determines the type and format for the log function. (Users can override the Aprobe default formatting, of course.)
While it is possible to add such logging by modifying the source code, it is cumbersome: opening a file, mapping it to a shared segment, actually writing the values. Most importantly, since you almost always log a number of different items, it can be complicated to construct sufficient logic to determine which logged items are which. Also, the added logging code has to be thread safe, which adds another level of complexity. And, of course, the source code that is added has to be debugged. Using a tool designed to do logging avoids all these concerns.
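To give a sense of what hand-written logging involves, here is a minimal sketch in plain C of tagged binary records, where the tag is the "logic to determine which logged items are which." It is not Aprobe code; the record layout and the names log_header, log_value, read_record, TAG_BALANCE, and TAG_QUEUE_LEN are all assumptions made for illustration. It writes through stdio rather than a shared segment and is not thread safe, so a production version would need considerably more.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record header: tag identifies the logged item,
   size gives the number of payload bytes that follow. */
typedef struct {
    uint32_t tag;
    uint32_t size;
} log_header;

enum { TAG_BALANCE = 1, TAG_QUEUE_LEN = 2 };

/* Append one tagged binary record. Returns 0 on success, -1 on failure. */
int log_value(FILE *out, uint32_t tag, const void *data, uint32_t size)
{
    log_header h = { tag, size };
    if (fwrite(&h, sizeof h, 1, out) != 1)
        return -1;
    if (size > 0 && fwrite(data, size, 1, out) != 1)
        return -1;
    return 0;
}

/* Read the next record into buf, which holds up to cap bytes.
   Returns 1 if a record was read, 0 at end of file, and -1 on a
   read error or a truncated or oversized record. */
int read_record(FILE *in, log_header *h, void *buf, uint32_t cap)
{
    size_t n = fread(h, 1, sizeof *h, in);
    if (n == 0)
        return ferror(in) ? -1 : 0;
    if (n != sizeof *h || h->size > cap)
        return -1;
    if (h->size > 0 && fread(buf, h->size, 1, in) != 1)
        return -1;
    return 1;
}
```

Even this small version needs a reader that understands every tag before the log becomes human-readable, which is the formatting work the tool is described as doing automatically.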
A fundamental measurement in any system is time. Many system requirements are specified in terms of time, particularly response time, and a relatively large amount of testing is focused on timing. The necessary primitive for white-box testing of timing is a precise clock and the ability to take times at various points in the program while affecting the program's execution as little as possible.
Example 2 is code required to read and log a nanosecond time-of-day clock. The time is logged at several points: on entry to and exit from the read_q function, and at line 32 (along with descriptive strings). The formatted output is included.
The ability to get a nanosecond time from any function entry, exit, or source line provides for easy benchmarking. For example, logging the arrival and departure time of messages in a message-passing system is a key system health metric. Example 3 benchmarks the throughput of a SendMsg function.
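The same measurement can be approximated in plain C by reading a time-of-day clock before and after a batch of calls. The sketch below is not Aprobe code and does not reproduce the article's examples: it assumes a C11 library that provides timespec_get, the SendMsg body is a hypothetical stand-in that only counts calls, and now_ns, bench_result, bench_send, and throughput are names invented for illustration. The actual clock resolution depends on the platform.

```c
#include <stdint.h>
#include <time.h>

/* Time of day in nanoseconds since the epoch, or -1 if the clock is unavailable. */
int64_t now_ns(void)
{
    struct timespec ts;
    if (timespec_get(&ts, TIME_UTC) != TIME_UTC)
        return -1;
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical stand-in for the function being benchmarked. */
static long sent_count = 0;

int SendMsg(const char *msg)
{
    (void)msg;
    sent_count++;
    return 0;
}

typedef struct {
    int64_t start_ns;   /* time taken just before the first call */
    int64_t end_ns;     /* time taken just after the last call */
    long    calls;      /* number of SendMsg calls made */
} bench_result;

/* Take a timestamp, call SendMsg the requested number of times, take another. */
bench_result bench_send(long calls)
{
    bench_result r;
    r.calls = calls;
    r.start_ns = now_ns();
    for (long i = 0; i < calls; i++)
        SendMsg("ping");
    r.end_ns = now_ns();
    return r;
}

/* Messages per second; 0.0 when the interval is too short to measure. */
double throughput(const bench_result *r)
{
    int64_t elapsed = r->end_ns - r->start_ns;
    if (elapsed <= 0)
        return 0.0;
    return (double)r->calls * 1e9 / (double)elapsed;
}
```

Unlike a probe applied to the executable, this version requires the timing calls to be compiled into the program, which is the source modification the article argues against for large systems.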
In any operational environment, faults occur and must be handled correctly. Even the simplest programs must be able to recover from mistakes: mistyped user inputs, disk-full conditions, and such.
This raises a problem: How do you induce faults to test how fault-tolerant a system is? For example, if the system is supposed to be able to recover from disk errors, how do you test for that?
In the past, users had to modify the source code to simulate the various faults, explicitly changing the application to make it take error paths. Given that a system may need to tolerate hundreds of faults, this can be onerous. Also, given the amount of work involved, it is unlikely to be repeated when modifications are made to the software.
With a white-box testing tool, fault injection becomes simple and can be repeated as part of normal regression testing. Example 4 shows the Aprobe code used to inject a disk error fault into the application. In this example, it is assumed that a -1 return indicates an error.
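The idea behind fault injection can be sketched in plain C as a switch that forces the error return. This is not Aprobe code: disk_write and save_record are hypothetical stand-ins, the inject_disk_faults counter is an assumption made for illustration, and the switch is compiled into the program here, whereas a probe would apply the same change to the executable.

```c
#include <stddef.h>

/* Hypothetical fault switch: while greater than zero, each write fails
   and the counter is decremented. */
static int inject_disk_faults = 0;

/* Hypothetical stand-in for a disk write.
   Returns the number of bytes written, or -1 on error. */
long disk_write(const void *buf, size_t len)
{
    (void)buf;
    if (inject_disk_faults > 0) {
        inject_disk_faults--;
        return -1;
    }
    return (long)len;
}

/* Code under test: retries once after a failed write.
   Returns 0 on success, -1 if both attempts fail. */
int save_record(const void *buf, size_t len)
{
    for (int attempt = 0; attempt < 2; attempt++) {
        if (disk_write(buf, len) == (long)len)
            return 0;
    }
    return -1;
}
```

Setting the counter to 1 exercises the recovery path, and setting it to 2 exercises the failure path; both runs can be repeated in every regression cycle.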
Any variable in the host program can be referenced by prepending a $ to its name, so it is simple to corrupt data at various points in the host program. If the routine ReadfromMsgQueue takes a single parameter called Msg, which points to the buffer in which the message is placed, then the probe in Example 5 will overwrite the first five words of that buffer with 42.
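The effect of such a corruption probe can be shown in plain C with a test-only wrapper. This is a sketch, not the article's probe: the body of ReadfromMsgQueue, the eight-word message size, the 32-bit word type, and the wrapper name ReadfromMsgQueue_corrupted are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MSG_WORDS 8   /* assumed message size, in 32-bit words */

/* Hypothetical stand-in for the host routine: fills Msg with 100, 101, ... */
void ReadfromMsgQueue(uint32_t *Msg)
{
    for (size_t i = 0; i < MSG_WORDS; i++)
        Msg[i] = 100u + (uint32_t)i;
}

/* Test-only wrapper: run the real read, then overwrite the first
   five words of the buffer with 42. */
void ReadfromMsgQueue_corrupted(uint32_t *Msg)
{
    ReadfromMsgQueue(Msg);
    for (size_t i = 0; i < 5 && i < MSG_WORDS; i++)
        Msg[i] = 42u;
}
```

Feeding the corrupted buffer to the message-handling code then shows whether bad input is detected or silently processed.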
Test coverage is an important component of white-box testing. The goal is to try to execute (that is, test) all lines in an application at least once.
Because white-box testing tools can individually or collectively instrument source lines, it is straightforward to determine which lines in a host program have or have not been executed without modifying source. Aprobe can reference and change variables in the application, so it can easily support the test coverage effort.
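The bookkeeping behind a coverage report amounts to one counter per instrumented point. Here is a minimal sketch in plain C; it is not Aprobe code, the COVER macro, the point names, and the sign_of function are assumptions made for illustration, and the counters are inserted into the source by hand, whereas the tools described here attach them to the executable.

```c
#include <stddef.h>

/* One hypothetical coverage point per branch of the function under test. */
enum { PT_ENTRY, PT_NEGATIVE, PT_ZERO, PT_POSITIVE, PT_COUNT };

static unsigned long hits[PT_COUNT];

#define COVER(pt) (hits[(pt)]++)

/* Function under test, with a counter at each point of interest. */
int sign_of(int x)
{
    COVER(PT_ENTRY);
    if (x < 0) {
        COVER(PT_NEGATIVE);
        return -1;
    }
    if (x == 0) {
        COVER(PT_ZERO);
        return 0;
    }
    COVER(PT_POSITIVE);
    return 1;
}

/* Number of coverage points that no test has reached yet. */
size_t unexecuted_points(void)
{
    size_t missing = 0;
    for (size_t i = 0; i < PT_COUNT; i++) {
        if (hits[i] == 0)
            missing++;
    }
    return missing;
}
```

After tests that pass only positive and negative values, unexecuted_points reports one remaining point, the zero branch, which tells the tester which case to add.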
However, what about exception handlers/catchers? How do you get the code in an exception handler/catcher to be executed? Example 6 throws an exception at line 32. Using probes like this, you can easily force different execution paths to ensure execution of all lines.
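C has no exceptions, but the idea of forcing control into a handler can be approximated with setjmp and longjmp. The sketch below is an analogy only, not the article's Example 6: the force_throw flag, the parse_step and process functions, and the handler_runs counter are assumptions made for illustration.

```c
#include <setjmp.h>

static jmp_buf handler;        /* where the emulated exception is caught */
static int force_throw = 0;    /* test switch: raise the error unconditionally */
static int handler_runs = 0;   /* how many times the handler code has executed */

/* Raises the emulated exception on bad input, or whenever the test forces it. */
static void parse_step(int value)
{
    if (force_throw || value < 0)
        longjmp(handler, 1);
}

/* Returns value * 2 on the normal path, or -1 if the handler ran. */
int process(int value)
{
    if (setjmp(handler) != 0) {
        /* Handler code that ordinary inputs rarely reach. */
        handler_runs++;
        return -1;
    }
    parse_step(value);
    return value * 2;
}
```

With the switch off, a valid input takes the normal path; with it on, the same input drives execution through the handler so those lines are counted as tested.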
A common white-box technique for increasing software quality is to add assertions to the code. These assertions typically abort the program as soon as a fatal error is detected, so the error does not propagate and become more difficult to diagnose.
Usually, assertions are added to source code and turned on/off at compilation time. However, adding assertions to the executable is easier. Any number of assertions can be coded and added to the executable on demand. They can be added or removed on each run, without having to recompile the code.
In Example 7, the factorial function takes a formal parameter called n. The probe asserts that this value is never less than 0, since the factorial function is undefined for negative arguments. If n is ever less than 0, the bad value is logged along with a traceback so you can determine the caller that supplied the bad value. The probe then calls exit, although you could just as easily omit this so the application would continue.
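The check itself is simple to express in plain C. The following sketch is not the article's probe: the function bodies and the names check_factorial_arg and checked_factorial are assumptions made for illustration, and because standard C has no portable traceback facility, the caller's name is passed in explicitly as a stand-in for the traceback.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical function under test; assumes n is small enough not to overflow. */
unsigned long factorial(int n)
{
    unsigned long result = 1;
    for (int i = 2; i <= n; i++)
        result *= (unsigned long)i;
    return result;
}

/* Check the precondition n >= 0. On violation, log the bad value and the
   caller, and return 0; otherwise return 1. */
int check_factorial_arg(int n, const char *caller, FILE *log)
{
    if (n >= 0)
        return 1;
    fprintf(log, "assertion failed: factorial called with n = %d by %s\n",
            n, caller);
    return 0;
}

/* Checked entry point: stop the program on a bad argument. Dropping the
   exit call would let the application continue after logging. */
unsigned long checked_factorial(int n, const char *caller)
{
    if (!check_factorial_arg(n, caller, stderr))
        exit(EXIT_FAILURE);
    return factorial(n);
}
```

Separating the check from the decision to exit keeps both behaviors available, mirroring the choice the article describes between aborting and continuing.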
Specific assertions can also be designed for specific tests. For example, when testing the handling of a network error, it may be useful to add an assertion that message queue lengths do not grow without bound.
White-box testing is a powerful software testing technique. However, it can be cumbersome when source code modifications are needed. Source-level patching of the executable gives you valuable insight into the way a program works. Having the ability to force errors and stress the application allows for more thorough testing. Because testing can proceed without waiting for source code changes, more extensive testing can occur in a shorter time frame. When used in combination with black-box testing, white-box tools allow more thorough and comprehensive testing of an application. The result is an increase in the quality of the software.