Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Channels ▼

Andrew Koenig

Dr. Dobb's Bloggers

Abstractions For Binary Search, Part 9: What Do We Need to Test?

December 16, 2014

Last week, we wrote a linear_search function that is intended to yield the same results as our binary_search function whenever we present these two functions with identical, valid input. However, it is not good enough simply to call these two functions with a bunch of test cases and verify that they produce the same result.

This claim is controversial enough that it bears repeating: Verifying that a program's output is correct is not enough to test it thoroughly.

Where is the controversy? It comes from confusing the idea of a test case working correctly with the idea of a program producing correct output. The trouble is that it is possible for a program to produce what is apparently correct output, even though it is doing so entirely by coincidence.

For example, consider the expression i+j+k, where i, j, and k all have type int. This expression is defined as being equivalent to (i+j)+k, by the normal precedence rules. Now let's consider what happens if k is negative, and i and j have large enough values that i+j overflows. In both C and C++, the result of a (signed) integer overflow is undefined: The implementation is entitled to do as it pleases. Most implementations do not check for overflow, so as long as the infinite-precision result of this expression is within bounds for an int, the result will be the same as if we had written i+(j+k). However, in such cases and on such implementations, writing the expression as i+(j+k) would be correct even though the incorrect expression i+j+k would yield the same result. This example should be enough to show that it is possible for an incorrect program to produce correct output.

The fact that incorrect programs can produce correct output greatly complicates the notion of test-driven development. Wikipedia defines test-driven development, in part, as

First the developer writes an (initially failing) automated test case that defines a desired improvement or new function, then produces the minimum amount of code to pass that test, and finally refactors the new code to acceptable standards.

You can see the attraction of this approach: It forces developers to think about the essential behavior of what they're trying to develop, and to capture that behavior in a way that can be verified automatically. However, behavior is not all that matters. For example, our overflow example might cause a program to fail if we tried to move it to a different implementation, or even if a future version of our local implementation started checking for overflow. Array indices out of bounds can be even a more serious problem: It is entirely possible for a program to appear to produce correct results, but to have a side effect of disrupting memory used by an unrelated program. As a result, it behooves us to test for array bounds errors, even if they do not produce obviously incorrect behavior. In fact, we might argue that testing for such errors is especially important if they do not produce obviously incorrect behavior.

With these ideas in mind, let's look again at our binary-search code:

size_t binary_search(T x, const T* array, size_t n)
   size_t begin = 0, end = n;
   // Invariant: This function will eventually return a value in the range [begin, end]
    while (begin != end) {
 size_t mid = (begin + end) / 2;
 if (array[mid] < x) {
     begin = mid + 1;
 } else {
    end = mid;
     return begin;   // Or return end, because begin == end

What claims might we want to test about this code that will not generally be reflected in its behavior? The first notion that the foregoing discussion brings to mind is that inside the loop body, begin and mid should both be valid array indices, and end should be an array index or n. Moreover, mid must be in the range [begin, end). We know that none of these variables is ever negative, because size_t is an unsigned type. Therefore, we can augment the first few lines of the loop by writing something such as

while (begin != end) {
  assert (begin < end && begin < n && end <= n);
  size_t mid = (begin + end) / 2;
  assert (begin <= mid && mid < end);
  if (array[mid] < x) {
     // …

What about the computation of mid? As we saw with our i+j+k example, it is possible for integer arithmetic to overflow. Indeed, as one reader pointed out, it is entirely possible that begin + end might overflow, even though computing (begin + end) / 2 might appear to yield a correct result. What's particularly interesting about this specific error is that if we were using classes, rather than integers, to implement our indices, this problem might not occur. If our index class supported only the random-access iterator operations, the problem would show up during compilation, because the result of adding two random-access iterators is not defined. With this observation, we see that we should have written

size_t mid = (begin + end) / 2;


size_t mid = begin + (end – begin) / 2;

all along, because although the sum of two random-access iterators is not well defined, the difference is.

So far, then, we have found three changes that we can make to our binary-search function to ensure that it is operating correctly from an internal viewpoint: two assert statements and one change that we could find only through a generally suspicious attitude toward arithmetic. Each of these changes detects problems for which there is no guarantee that they would ever be visible from outside the function. They are good examples of why white-box testing is rarely enough by itself — in other words, of why it is not enough to verify that a program behaves correctly.

 Next week, we shall continue building our tests.

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.