# Abstractions For Binary Search, Part 9: What Do We Need to Test?

Last week, we wrote a `linear_search`

function that is intended to yield the same results as our `binary_search`

function whenever we present these two functions with identical, valid input. However, it is not good enough simply to call these two functions with a bunch of test cases and verify that they produce the same result.

This claim is controversial enough that it bears repeating: *Verifying that a program's output is correct is not enough to test it thoroughly.*

Where is the controversy? It comes from confusing the idea of a test case working correctly with the idea of a program producing correct output. The trouble is that it is possible for a program to produce what is apparently correct output, even though it is doing so entirely by coincidence.

For example, consider the expression `i+j+k`

, where `i`

, `j`

, and `k`

all have type `int`

. This expression is defined as being equivalent to `(i+j)+k`

, by the normal precedence rules. Now let's consider what happens if `k`

is negative, and `i`

and `j`

have large enough values that `i+j`

overflows. In both C and C++, the result of a (signed) integer overflow is undefined: The implementation is entitled to do as it pleases. Most implementations do not check for overflow, so as long as the infinite-precision result of this expression is within bounds for an `int`

, the result will be the same as if we had written `i+(j+k)`

. However, in such cases and on such implementations, writing the expression as `i+(j+k)`

would be correct even though the incorrect expression `i+j+k`

would yield the same result. This example should be enough to show that it is possible for an incorrect program to produce correct output.

The fact that incorrect programs can produce correct output greatly complicates the notion of test-driven development. Wikipedia defines test-driven development, in part, as

*First the developer writes an (initially failing) automated test case that defines a desired improvement or new function, then produces the minimum amount of code to pass that test, and finally refactors the new code to acceptable standards.*

You can see the attraction of this approach: It forces developers to think about the essential behavior of what they're trying to develop, and to capture that behavior in a way that can be verified automatically. However, behavior is not all that matters. For example, our overflow example might cause a program to fail if we tried to move it to a different implementation, or even if a future version of our local implementation started checking for overflow. Array indices out of bounds can be even a more serious problem: It is entirely possible for a program to appear to produce correct results, but to have a side effect of disrupting memory used by an unrelated program. As a result, it behooves us to test for array bounds errors, even if they do not produce obviously incorrect behavior. In fact, we might argue that testing for such errors is *especially* important if they do not produce obviously incorrect behavior.

With these ideas in mind, let's look again at our binary-search code:

size_t binary_search(T x, const T* array, size_t n) { size_t begin = 0, end = n; // Invariant: This function will eventually return a value in the range [begin, end] while (begin != end) { size_t mid = (begin + end) / 2; if (array[mid] < x) { begin = mid + 1; } else { end = mid; } } return begin; // Or return end, because begin == end }

What claims might we want to test about this code that will not generally be reflected in its behavior? The first notion that the foregoing discussion brings to mind is that inside the loop body, `begin`

and `mid`

should both be valid array indices, and `end`

should be an array index or `n`

. Moreover, `mid`

must be in the range `[begin, end)`

. We know that none of these variables is ever negative, because `size_t`

is an unsigned type. Therefore, we can augment the first few lines of the loop by writing something such as

while (begin != end) { assert (begin < end && begin < n && end <= n); size_t mid = (begin + end) / 2; assert (begin <= mid && mid < end); if (array[mid] < x) { // …

What about the computation of `mid`

? As we saw with our `i+j+k`

example, it is possible for integer arithmetic to overflow. Indeed, as one reader pointed out, it is entirely possible that `begin + end`

might overflow, even though computing `(begin + end) / 2`

might appear to yield a correct result. What's particularly interesting about this specific error is that if we were using classes, rather than integers, to implement our indices, this problem might not occur. If our index class supported *only* the random-access iterator operations, the problem would show up during compilation, because the result of adding two random-access iterators is not defined. With this observation, we see that we should have written

size_t mid = (begin + end) / 2;

as

size_t mid = begin + (end – begin) / 2;

all along, because although the sum of two random-access iterators is not well defined, the difference is.

So far, then, we have found three changes that we can make to our binary-search function to ensure that it is operating correctly from an internal viewpoint: two assert statements and one change that we could find only through a generally suspicious attitude toward arithmetic. Each of these changes detects problems for which there is no guarantee that they would ever be visible from outside the function. They are good examples of why white-box testing is rarely enough by itself — in other words, of why it is not enough to verify that a program behaves correctly. Next week, we shall continue building our tests.