# Why Testing Isn't Enough

Last week, I discussed some reasons that software is hard to develop. I would like to continue by concentrating on two of those reasons: Programmers generally assume that

- They can translate their knowledge of what a program is supposed to do accurately into code, and
- The code actually does what its author thinks it does.

Test-driven development tends to strengthen these assumptions: We use tests to express our knowledge of what the program is supposed to do, and then we take the view that when a program passes all of the tests, it's done.

This approach works in some circumstances, but not all. Dijkstra put it well in an essay when he used as an example the problem of determining whether a computer's (fixed-point) multiplication hardware is working correctly. He happened to be talking about a computer with 27-bit words; today we would expect to find 32-bit words. With 32-bit words, there are 2^{64} distinct cases to test. If we can test 2^{27} (about 134 million) of those cases per second, then testing every possible case will take 2^{37} seconds, or well over 4,000 years. Even if we wait for computers to become faster before we begin our testing, we will have done nothing toward testing the particular computer we were assigned to test. So even for a very simple problem, there is no way to test that every possible input yields a correct solution.

Testing has another problem: Sometimes our programs are not intended to have exact results. For example, suppose we have written a program to add the values of the elements of a floating-point array:

double sum(double* begin, double* end) { double result = 0; double* p = begin; while (p != end) { result += *p; ++p; } return result; }

How do we go about testing code such as this in light of the fact that floating-point arithmetic is not perfectly accurate? How do we predict accurately what the program's output is supposed to be when that very prediction might be subject to the same inaccuracies as the program that we are trying to test?

What we need is a way to look at the code and verify, rather than merely test, that the code produces the result we intend. As a simple example, let's think about what happens if we call `sum`

with two pointers that are equal to each other. In other words, let's start by seeing what we can figure out about what happens when `begin == end`

.

Our program begins by setting `result`

to `0.0`

. It then copies `begin`

into `p`

and starts a loop that continues as long as `p`

is not equal to `end`

. Because we are assuming that `begin == end`

, we know that that loop does not get executed at all. After the loop is a `return`

statement; this statement returns `0.0`

because `result`

was set to `0.0`

at the beginning of `sum`

.

We have just proved that our `sum`

function correctly computes the sum of a floating-point array that doesn't have any elements. It is true that our proof might contain an error, or that the code might contain an error that causes it not to correspond accurately to the proof, but nevertheless, if it is correct, this proof does something that testing does not do: It establishes that this program *should* compute the sum of a floating-point array correctly, at least when the array is empty.

This proof may seem trivial, and so far, it is. We can make it slightly less trivial by looking at the case where `end == begin + 1`

. As before, we set `result`

to `0.0`

and copy `begin`

into `p`

. This time, however, the loop is actually executed. As a result, we add `*p`

(which is equal to `*begin`

) to `result`

, then increment `p`

. Now `p `

is equal to `end`

, because we are investigating the case in which `end == begin + 1`

. Accordingly, we add `*begin`

to `result`

, which we had previously set to `0.0`

. So long as `*begin + 0.0`

is equal to `*begin`

(or close enough to equal for our purposes), this function will return the value of (only) element of the array.

We have just generalized our proof to show that our code works correctly when the number of elements in the input array is zero or one. Again, this proof accomplishes something that testing might not, namely to show that the program *should* work correctly in these cases.

Our next step is where things get tricky: How do we generalize our proof to handle any number of elements? I'll let you think about that problem before I discuss it in detail next week.