A brutally simple regression-testing tool
I am a customer of a software company that recently made a surprising announcement: They had become so dissatisfied with the number of bugs in their product that they were going to suspend all new-feature development and work exclusively on fixing bugs until they were once again satisfied with their product's reliability. It is rare for a company to take such a courageous stand, and it reminded me of something similar that happened about 20 years ago.
At the time, I was working with some developers who were in charge of producing a commercially viable version of cfront, the program that compiled C++ programs into C programs. Back then, cfront was the only way to compile C++ programs, so of course there was a great clamor for the latest and greatest features. In the rush to give users what they said they wanted, testing got short shrift, and users were soon clamoring not only for new features but also for fixes to the bugs that they kept finding.
We didn't really have a good idea then of how to go about testing compilers. We did the obvious things: We verified that the compiler could build itself successfully, and that several major applications that relied on it continued to work; but we had a harder time determining whether specific bugs that users reported had been fixed, and an even harder time verifying that once a bug was fixed, it stayed fixed in the next release.
This last criterion was particularly important from a user's standpoint. A user who reports a bug probably cares about getting it fixed, and once it is fixed, the user is apt to rely on the fixed behavior--otherwise, the user would probably not have reported the bug in the first place. So if a bug is fixed in one release of the compiler, and it comes back in a later release, the user is apt to be particularly annoyed.
I think that my biggest contribution to this project was to invent a strategy for checking for specific bugs that was so simple as to require almost no explanation. This strategy had three parts:
1) Construct a test case for every bug report. These test cases were always C++ programs, and often (and ideally) very small C++ programs. Such a program would be deemed to run correctly if (a) it compiled with no errors, and (b) its output consisted of pairs of identical lines.
The structure of such a program would typically be to do a computation, print the result, and then print the expected result on a separate line. If the computation were correct, the two lines would match; otherwise there would be a discrepancy.
The point of this part of the strategy is that there is no need to know anything about the contents of the test program: Either it works properly or it doesn't.
2) Sometimes, test cases were expected not to compile. A typical example would be when a user reported that the compiler gave an incorrect diagnostic message, or when it accepted a program it should have rejected. In this case, we would mark the test program with special comments. Such a comment would begin with four slashes in a row (////) and then contain a regular expression that would be expected to match every diagnostic message that can be traced back to the line of source code containing the comment.
In other words, if we had a test program that we expected not to compile, we would take each line that we expected to cause a diagnostic message and decorate it with a regular expression that would match not only the message we expected, but other forms of the message that we might expect in the future.
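For instance, a test that was expected to be rejected might look like the sketch below. The exact regular expression and the error being provoked are reconstructions of the convention described above, not a surviving test file:

```cpp
int main()
{
    const int n = 10;
    // This assignment should be rejected; the //// comment gives a
    // regular expression loose enough to match present and future
    // wordings of the diagnostic for this line.
    n = 20;  //// .*(const|read-only).*
    return 0;
}
```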
3) With (1) and (2) out of the way, we now had a collection of little programs that we could use to test our compiler without worrying about their contents. Simply try to compile and run each program, and then: (a) if no line in the program contains ////, verify that the program compiled without errors and that its output consisted only of pairs of identical adjacent lines; or (b) if one or more lines in the program contain ////, verify that the program failed to compile and check every diagnostic to ensure that a //// line accounts for it.
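The core of case (a), checking that output consists of identical adjacent pairs, takes only a few lines. The original tool was written in awk and shell scripts; this C++ function is my reconstruction of the same check, not the original code:

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Return true if the stream consists entirely of pairs of adjacent
// identical lines: line 1 == line 2, line 3 == line 4, and so on.
// An odd number of lines, or any mismatched pair, means failure.
bool pairedLines(std::istream& in)
{
    std::string computed, expected;
    while (std::getline(in, computed)) {
        if (!std::getline(in, expected))
            return false;   // odd number of lines
        if (computed != expected)
            return false;   // a computation disagreed with its expectation
    }
    return true;
}
```

Because the check is purely structural, the harness can apply it to any test program's output without knowing what that program computes.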
With such an easy procedure for running an individual test, we could now run the entire library of tests. At first this process took a few hours; as our test library grew to thousands of cases, it eventually had to run over an entire weekend. The result was a file with one line per test, giving the name of the test program and whether it passed or failed. We saved these files for each release, which made it a trivial matter to check whether any bug fixed in one release was broken again in a later one.
The entire regression-testing program was less than 200 lines of awk code and shell scripts, so it hardly deserves to be called a system. Nevertheless, I believe that this simple tool played a major role in reducing the number of bugs in cfront and--perhaps more importantly--ensuring that once fixed, bugs stayed fixed.