Andrew Koenig

Dr. Dobb's Bloggers

When Systematic Testing Isn't Enough

September 13, 2012

Last week, I promised to describe some aspects of software that make testing difficult. As I was organizing my thoughts to do so, I noticed Andrew Binstock's article, which argues that the whole idea of unit testing is so important and useful as to be beyond dispute.

So let me get this out of the way up front: He's right. Unit testing — which I sometimes think of as sanity testing — is an extremely important idea, and everyone ought to be doing it. In fact, hardware designers have been doing it literally for decades. They call it BIST, which stands for Built-In Self-Test. In both hardware and software, the idea is the same: It is better to have a program or device say "I'm broken" than for it to go nuts, and much better than for it to produce incorrect results quietly. As Andrew Binstock says:

Today (and for years now), I write code and then I write unit tests that exercise the edge cases and one or two main cases. Right away, I can tell if I missed something obvious or if my implementation has a slight burble that mishandles cases I expected to flow through easily.

The issue that I want to address is when this kind of testing is not enough. In particular, I'll argue that a testing strategy that says "When all the tests pass, you're done" is not enough for anything beyond trivial programs.

One reason that such a mechanical approach doesn't work in practice is that once a program grows beyond what a few programmers can handle, it will never be completely bug-free. For example, a handful of relatively unimportant bugs may require an architectural change to fix, and it may be much easier to wait for that change than to fix each of the bugs individually and then redo the fixes for the new architecture. Alternatively, a bug report might come in when testing is almost complete, and fixing that bug might delay shipment unacceptably. In either of these circumstances, it won't do just to run the tests and wait to ship until all the tests succeed.

However, even if we disregard such situations — treating them as project-management issues rather than as testing issues — there are still at least five kinds of bugs that systematic testing does an unusually bad job of revealing, and for which other strategies are therefore necessary:

  1. Performance bugs. I'm not talking about the kind of bugs that might come from using an insertion sort or a bubble sort instead of a more sensible algorithm; rather, I'm talking about bugs, or even architectural errors, that result in systems that work fine on a small scale but perform unacceptably under heavy but realistic loads. Detecting such bugs requires simulated load testing at the very least, and of course the load has to be a sensible model of the conditions that the system actually encounters.
  2. Resource leaks. Curiously, even programs in languages with built-in garbage collection can leak memory. For example, suppose we execute a statement such as y=f(x), where x and y are both large data structures. As long as the variable x continues to exist, the entire data structure that it represents will stick around, consuming memory even if the programmer knows that x will not be used again. Of course programmers can solve these problems by setting variables such as x to a null value once they know that the variables will not be used again. However, the same argument can be made for the case of deleting variables explicitly in languages without garbage collection.
  3. Security vulnerabilities. Testing for security bugs requires a different mindset than testing for ordinary bugs. Security testers must assume that problems they encounter will be the work of a malicious adversary — if you like, that they will come from Machiavelli rather than from Murphy.
  4. Timing bugs in parallel systems. Such bugs can be extraordinarily hard to find because the same program can work on one occasion and fail on another.
  5. Corrupted data structures. Such problems can be particularly nasty when the data structures are on a disk or network. The trouble is that a program can ask for several data-structure operations, but they might not all be executed. For example, a network or power failure might halt such a sequence of operations in the middle. What is worse, especially in the case of disk operations, is that programs cannot generally even assume that all the operations before a particular point in time will have been executed and the ones after that point will not be executed. For example, a disk controller might rearrange the sequence of operations in its queue, thereby causing an operation to be executed even though one requested before it is not executed.
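
To make the resource-leak item concrete, here is a minimal Python sketch of the y=f(x) situation. The names BigStructure, f, and demo are hypothetical stand-ins invented for illustration, and the behavior described assumes CPython, where an object is reclaimed as soon as its last reference goes away. A weak reference acts as a probe that observes the structure without keeping it alive.

```python
import gc
import weakref


class BigStructure:
    """Hypothetical stand-in for a large data structure."""

    def __init__(self, size):
        self.payload = list(range(size))


def f(x):
    """Hypothetical transformation that builds a new structure from x."""
    result = BigStructure(0)
    result.payload = [v * 2 for v in x.payload]
    return result


def demo():
    x = BigStructure(100_000)
    probe = weakref.ref(x)  # observes the object without keeping it alive
    y = f(x)

    gc.collect()
    alive_while_bound = probe() is not None  # x is still bound, so still reachable

    x = None  # the programmer signals that x will not be used again
    gc.collect()
    alive_after_clear = probe() is not None

    return alive_while_bound, alive_after_clear, len(y.payload)
```

Under those assumptions, the structure survives collection for as long as x is bound and becomes reclaimable only once x is cleared. A unit test that checks only the value of y would pass either way, which is why this kind of leak tends to slip through.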

Unit tests by themselves don't do a very good job of detecting any of these kinds of problems. Clearly, a strategy is necessary that goes well beyond unit testing — regardless of how much effort the programming language exerts in order to avoid undefined behavior.
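
As an illustration of why timing bugs in particular escape unit tests, the following sketch simulates two tasks that each increment a shared counter in two steps, a read followed by a write. It is a deterministic model of interleaving built on generators, not real threading; the names increment and run, and the schedule lists, are assumptions made for this example.

```python
def increment(counter):
    """One non-atomic increment, pausing between the read and the write."""
    value = counter["n"]      # read the shared value
    yield                     # a context switch may happen here
    counter["n"] = value + 1  # write back a possibly stale result


def run(schedule):
    """Run two increments, stepping the tasks in the order given by schedule."""
    counter = {"n": 0}
    tasks = [increment(counter), increment(counter)]
    for task_id in schedule:
        next(tasks[task_id], None)  # advance that task by one step
    return counter["n"]


sequential = run([0, 0, 1, 1])   # each task finishes before the other starts
interleaved = run([0, 1, 0, 1])  # both tasks read before either one writes
```

With the sequential schedule the counter ends at 2, but with the interleaved schedule one update is lost and it ends at 1. A unit test that happens to exercise only the first ordering passes every time, while in a real parallel system the scheduler chooses the ordering and the failure appears only occasionally.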
