Why Testing Isn't Enough
Last week, I discussed some reasons that software is hard to develop. I would like to continue by concentrating on two of those reasons: Programmers generally assume that
- They can translate their knowledge of what a program is supposed to do accurately into code, and
- The code actually does what its author thinks it does.
Test-driven development tends to strengthen these assumptions: We use tests to express our knowledge of what the program is supposed to do, and then we take the view that when a program passes all of the tests, it's done.
This approach works in some circumstances, but not all. Dijkstra put it well in an essay when he used as an example the problem of determining whether a computer's (fixed-point) multiplication hardware is working correctly. He happened to be talking about a computer with 27-bit words; today we would expect to find 32-bit words. With 32-bit words, there are 264 distinct cases to test. If we can test 227 (about 134 million) of those cases per second, then testing every possible case will take 237 seconds, or well over 4,000 years. Even if we wait for computers to become faster before we begin our testing, we will have done nothing toward testing the particular computer we were assigned to test. So even for a very simple problem, there is no way to test that every possible input yields a correct solution.
Testing has another problem: Sometimes our programs are not intended to have exact results. For example, suppose we have written a program to add the values of the elements of a floating-point array:
double sum(double* begin, double* end) { double result = 0; double* p = begin; while (p != end) { result += *p; ++p; } return result; }
How do we go about testing code such as this in light of the fact that floating-point arithmetic is not perfectly accurate? How do we predict accurately what the program's output is supposed to be when that very prediction might be subject to the same inaccuracies as the program that we are trying to test?
What we need is a way to look at the code and verify, rather than merely test, that the code produces the result we intend. As a simple example, let's think about what happens if we call sum
with two pointers that are equal to each other. In other words, let's start by seeing what we can figure out about what happens when begin == end
.
Our program begins by setting result
to 0.0
. It then copies begin
into p
and starts a loop that continues as long as p
is not equal to end
. Because we are assuming that begin == end
, we know that that loop does not get executed at all. After the loop is a return
statement; this statement returns 0.0
because result
was set to 0.0
at the beginning of sum
.
We have just proved that our sum
function correctly computes the sum of a floating-point array that doesn't have any elements. It is true that our proof might contain an error, or that the code might contain an error that causes it not to correspond accurately to the proof, but nevertheless, if it is correct, this proof does something that testing does not do: It establishes that this program should compute the sum of a floating-point array correctly, at least when the array is empty.
This proof may seem trivial, and so far, it is. We can make it slightly less trivial by looking at the case where end == begin + 1
. As before, we set result
to 0.0
and copy begin
into p
. This time, however, the loop is actually executed. As a result, we add *p
(which is equal to *begin
) to result
, then increment p
. Now p
is equal to end
, because we are investigating the case in which end == begin + 1
. Accordingly, we add *begin
to result
, which we had previously set to 0.0
. So long as *begin + 0.0
is equal to *begin
(or close enough to equal for our purposes), this function will return the value of (only) element of the array.
We have just generalized our proof to show that our code works correctly when the number of elements in the input array is zero or one. Again, this proof accomplishes something that testing might not, namely to show that the program should work correctly in these cases.
Our next step is where things get tricky: How do we generalize our proof to handle any number of elements? I'll let you think about that problem before I discuss it in detail next week.