# Abstractions For Binary Search, Part 6: How On Earth Do You Test It?

After much discussion and refinement, we have arrived at this code:

size_t binary_search(T x, const T* array, size_t n) { size_t begin = 0, end = n; // Invariant: This function will eventually return a value in the range [begin, end] while (begin != end) { size_t mid = (begin + end) / 2; if (array[mid] < x) { begin = mid + 1; } else { end = mid; } } return begin; // Or return end, because begin == end }

We must now determine whether it works — which means that we must figure out *how* to determine whether it works.

Obviously, the most fundamental aspect of such a program working is that it must produce the correct result whenever we present it with legitimate input. However, it is reasonable for us to have other expectations as well. For example, we should expect that the code never tries to access elements of the array that do not exist. Moreover, each time it accesses an element, it should choose one that divides the remaining elements to be searched approximately in half. If it fails to do so, it will not be able to achieve the desired performance characteristics.

This last claim may be controversial, because it implies that the notion of correctness extends beyond "Does it give the right result?" However, if performance is not an issue, then substituting a linear search for our binary search would still be correct. If you were teaching a course in algorithms, and a student turned in a linear search in response to a homework assignment requiring a binary search, would you give that student full credit?

These observations suggest that a key part of determining whether this code is correct is to verify that

- It returns the same result as a linear search;
- It accesses only legitimate array elements; and
- It approximately bisects the available elements each time through the inner loop.

The first of these requirements should be easy: We write a linear-search function that we believe should behave the same way as our binary-search function. Then, for whatever test cases we construct, we call both functions and verify that they return the same result. The other two requirements are more interesting, because there is no way to verify them by looking only at the function's result. Instead, we must instrument the code somehow, in order to verify that it is not doing anything out of line. We shall have more to say about these distinctions next week. For now, let's write our linear-search function.

Recall that the binary-search function returns the lowest index `k`

for which `vless(x, k)`

is `true`

. We defined `vless`

here, in a way that `vless(x, k)`

is `true`

if

`k`

is the position of an element of the sequence that is not (strictly) less than`x`

, or`k`

is equal to`n`

(i.e.,`k`

is the off-the-end position).

We might define this function as follows:

size_t linear_search(T x, const T* array, size_t n) { size_t k = 0; // Invariant: vless(x, i) is false for all values of i such that 0 <= i < k while (!vless(x, k)) ++k; return k; }

Because we call `vless`

only once, we can substitute its definition for its call, so that

while (!vless(x, k))

becomes

while (!(k == n || !(array[k] < x)))

This convoluted expression comes directly from the specification. We start by observing that `!(a||b)`

is equivalent to `!a && !b`

, even with short-circuit evaluation, so we can rewrite it as

while (k != n && array[k] < x)

so the whole function becomes

size_t linear_search(T x, const T* array, size_t n) { size_t k = 0; while (k != n && array[k] < x) ++k; return k; }

Of course, this code might also be incorrect; but if its results match those of the binary-search function, we have increased our confidence in both. Moreover, we can look at the results from some simple test cases; if those results are correct, we can be much more confident that that correctness will extend to larger cases than we would have been had we looked only at the binary-search function.

We have now reduced a substantial part of our problem to calling these two functions repeatedly with suitably chosen test cases and verifying that they return the same results. The rest of our problem will be to ensure that the binary-search function never goes out of bounds while it is computing those results.

Next week, we shall discuss how to choose test cases, after which we will look at the problem of staying in bounds.