Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Channels ▼

Andrew Koenig

Dr. Dobb's Bloggers

Why Testing Isn't Enough

August 18, 2014

Last week, I discussed some reasons that software is hard to develop. I would like to continue by concentrating on two of those reasons: Programmers generally assume that

  • They can translate their knowledge of what a program is supposed to do accurately into code, and
  • The code actually does what its author thinks it does.

Test-driven development tends to strengthen these assumptions: We use tests to express our knowledge of what the program is supposed to do, and then we take the view that when a program passes all of the tests, it's done.

This approach works in some circumstances, but not all. Dijkstra put it well in an essay when he used as an example the problem of determining whether a computer's (fixed-point) multiplication hardware is working correctly. He happened to be talking about a computer with 27-bit words; today we would expect to find 32-bit words. With 32-bit words, there are 264 distinct cases to test. If we can test 227 (about 134 million) of those cases per second, then testing every possible case will take 237 seconds, or well over 4,000 years. Even if we wait for computers to become faster before we begin our testing, we will have done nothing toward testing the particular computer we were assigned to test. So even for a very simple problem, there is no way to test that every possible input yields a correct solution.

Testing has another problem: Sometimes our programs are not intended to have exact results. For example, suppose we have written a program to add the values of the elements of a floating-point array:

     double sum(double* begin, double* end)
           double result = 0;
           double* p = begin;
           while (p != end) {
                result += *p;
           return result;

How do we go about testing code such as this in light of the fact that floating-point arithmetic is not perfectly accurate? How do we predict accurately what the program's output is supposed to be when that very prediction might be subject to the same inaccuracies as the program that we are trying to test?

What we need is a way to look at the code and verify, rather than merely test, that the code produces the result we intend. As a simple example, let's think about what happens if we call sum with two pointers that are equal to each other. In other words, let's start by seeing what we can figure out about what happens when begin == end.

Our program begins by setting result to 0.0. It then copies begin into p and starts a loop that continues as long as p is not equal to end. Because we are assuming that begin == end, we know that that loop does not get executed at all. After the loop is a return statement; this statement returns 0.0 because result was set to 0.0 at the beginning of sum.

We have just proved that our sum function correctly computes the sum of a floating-point array that doesn't have any elements. It is true that our proof might contain an error, or that the code might contain an error that causes it not to correspond accurately to the proof, but nevertheless, if it is correct, this proof does something that testing does not do: It establishes that this program should compute the sum of a floating-point array correctly, at least when the array is empty.

This proof may seem trivial, and so far, it is. We can make it slightly less trivial by looking at the case where end == begin + 1. As before, we set result to 0.0 and copy begin into p. This time, however, the loop is actually executed. As a result, we add *p (which is equal to *begin) to result, then increment p. Now p is equal to end, because we are investigating the case in which end == begin + 1. Accordingly, we add *begin to result, which we had previously set to 0.0. So long as *begin + 0.0 is equal to *begin (or close enough to equal for our purposes), this function will return the value of (only) element of the array.

We have just generalized our proof to show that our code works correctly when the number of elements in the input array is zero or one. Again, this proof accomplishes something that testing might not, namely to show that the program should work correctly in these cases.

Our next step is where things get tricky: How do we generalize our proof to handle any number of elements? I'll let you think about that problem before I discuss it in detail next week.

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.