Unit and Regression Testing

Dr. Dobb's Journal, February 1997

Adrian is a software engineer for Intuit and can be contacted at [email protected].


While we were looking at an image generated by my ray tracer, my brother asked, "How do you know it's right?" The question sparked an interesting discussion on software testing. Until then, I had done only black-box testing: Create a scene and inspect the output for inconsistencies. One bug in my ray tracer -- a sign error in a vector cross-product function -- eluded detection for nearly a year because the symmetries in my early test scenes caused the sign error to cancel itself. Even after detection, it took a long time to find the bug. It could have been detected easily (and the bug found and fixed more quickly) by testing the vector operations in isolation from the rest of the system.

Of course, black-box system testing is an important step in making sure your program works, but it's only one piece of the puzzle. Unit testing (testing a function, module, or object in isolation from the rest of the program) and regression testing (rerunning tests to detect unexpected changes in behavior) can dramatically reduce your bug counts.

In his excellent book, Writing Solid Code (Microsoft Press, 1993), Steve Maguire mentions unit testing, but he doesn't give many details on how to set up such tests and use them. In this article, I describe how to build simple and effective unit tests and how to automate regression testing using make.

Unit testing should not be a luxury accessible only to huge product teams with hordes of testers. In fact, unit and regression testing make it possible for individuals and small teams to outperform big companies that rely too heavily on black-box testing to ferret out bugs.

Automating Unit Tests

Suppose for each key object or module in your project, you wrote a small test program to test just that portion. That might sound like a lot of work, but, as you'll see, it's not. If each of these unit tests accepts sample data from standard input and sends the results to standard output, you can automate the tests using your favorite variety of make.

Imagine implementing a C++ class for performing vector arithmetic in a file called VECTOR.CPP. Its unit test, TVECTOR.CPP, would get makefile targets indicating its dependency on the module it tests; see Example 1(a). The unit test is driven by a data file called TVECTOR.IN and produces output that you capture in TVECTOR.OUT, as in Example 1(b). To pull it all together, a pseudotarget generates output from all of your unit tests; see Example 1(c).
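
Example 1 is not reproduced here, but a minimal reconstruction in Borland MAKE syntax (using the file names above; Listing One gives the full treatment) might look like this:

# (a) The unit test depends on the module it exercises.
tvector.exe : tvector.obj vector.obj
    $(LINK) tvector.obj vector.obj

# (b) Running the test program against its data file captures the
#     results in TVECTOR.OUT.
tvector.out : tvector.exe tvector.in
    tvector.exe < tvector.in > tvector.out

# (c) A pseudotarget that regenerates the output of every unit test.
TESTOUTS = tvector.out
tests : $(TESTOUTS)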

As you add more units to your project, you build more unit-test programs and add the desired output file to the TESTOUTS list. You now have a way to automatically test all parts of your program that have changed.

Big deal, you think -- those output files still have to be checked by hand to see if they're right. True, but only the first time. After that, you care only about the changes in behavior. Changes occur when you make a fix, when you make a global change, or when you enhance a unit test. In all of these cases, you need only verify that the changes are appropriate. Example 2 adds an accept target, enhancing the regress target.

If your unit tests have passed, you run make with the accept target. That makes reference copies (often called "canon files") of the test results. When you make changes in the future, the regress target will compare the new test outputs with the canon files and list the differences in REGRESS.RPT. The regression report tells you immediately what changed. If the changes are correct, you simply accept them to make new reference copies. If not, you correct the problem and run the regression tests again. These makefile examples have been simplified for discussion. Listing One presents a more realistic example.
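
Example 2 is likewise not reproduced here; a hypothetical sketch of the simplified accept and regress targets it describes might be:

# Once the current results have been verified, make them the new
# reference copies ("canon files").
accept : $(TESTOUTS)
    copy tvector.out ref\tvector.out

# Compare the current results against the canon files and collect
# any differences in the regression report.
regress : $(TESTOUTS)
    $(DIFF) ref\tvector.out tvector.out >> regress.rpt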

Have you ever considered making a global change in a big project and felt some trepidation? Perhaps you've been building and testing debug versions, loaded with assertion macros. The ship date is approaching, and it's time to disable the debug code. What if somebody accidentally put code with important side effects into an assertion? Maybe your team has been slaving away, porting code to a new compiler, memory model, or operating system. At times like these, wouldn't it be nice to type MAKE REGRESS and know in a few minutes which modules may have changed their behavior?

Building Unit Tests

Is it worth making a test program for each unit? Yes. The effort is small, since the test program can be simple. More importantly, the process will make your code better and smooth your development by eliminating bugs earlier.

What should a unit test do? That depends on the module being tested. Minimally, the test should thoroughly exercise the module, striving for complete coverage. For example, to test the Vector class, an exerciser would call each of the class's member functions with a list of vectors read from TVECTOR.IN (see Listing Two). Fortunately, this simplistic approach is appropriate for many modules.

Slightly more ambitious is a unit test that attempts to verify the requirements directly. For example, if you're testing a preprocessor that should strip comments and trailing spaces, then the checker would examine the output for comments and spaces that were missed.
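
For illustration, here is a minimal sketch of such a checker (not code from the article): it reads the preprocessor's output from standard input and reports anything that should have been stripped. Because a clean run always prints the same summary, the output diffs cleanly against a canon file.

#include <iostream>
#include <string>

int main()
{
    std::string line;
    long lineno = 0;
    int failures = 0;
    while (std::getline(std::cin, line)) {
        ++lineno;
        // The preprocessor should have removed all comments...
        if (line.find("/*") != std::string::npos ||
            line.find("//") != std::string::npos) {
            std::cout << "line " << lineno << ": comment not stripped\n";
            ++failures;
        }
        // ...and all trailing spaces.
        if (!line.empty() && line[line.size() - 1] == ' ') {
            std::cout << "line " << lineno << ": trailing space\n";
            ++failures;
        }
    }
    std::cout << failures << " failure(s)\n";
    return failures == 0 ? 0 : 1;
}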

A sophisticated unit test might try to simulate a client. A simulator makes sense for units that maintain state information from call to call, as is typical with parsers and event-driven systems. In these types of modules, the order of the calls is significant. You can have the simulator generate random calls, but that is incompatible with our regression testing, since we require the output of a successful test to match the reference run. One approach is to have the simulator use the same input-driven technique as the exerciser and have a separate program create the random data. A long, randomly generated sequence provides a good chance of uncovering sequence-dependent bugs while maintaining reproducibility.

For particularly troublesome modules that maintain state information and require intensive abuse to beat out the bugs, a combination simulator/checker might be the ticket. The simulator portion would simulate calls in a random order. Rather than outputting the results directly (and thereby making it impossible to compare to the canon files), the checker portion would verify the results and output a summary. Summaries of successful runs would always match the canon file, but one from a failure would not. One disadvantage of the simulator/checker is the amount of effort required to build it. In addition to writing code to simulate and check the module, you should add a facility to reproduce a run so that when a bug is uncovered, it can be traced. Often this can be done by supplying the seed for your random-number generator.
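
For instance, a sketch of the seed technique (the module calls are hypothetical): the simulator takes its seed on the command line and records it in the output, so any failing run can be replayed exactly by rerunning with the same seed.

#include <cstdlib>
#include <iostream>

int main(int argc, char *argv[])
{
    // Use the seed from the command line if given; default otherwise.
    unsigned seed = (argc > 1) ? (unsigned)std::atoi(argv[1]) : 12345u;
    std::cout << "seed: " << seed << "\n";  // record it for replaying
    std::srand(seed);
    for (int i = 0; i < 1000; ++i) {
        switch (std::rand() % 3) {          // pick the next call at random
        case 0: /* exercise Open()  */ break;
        case 1: /* exercise Write() */ break;
        case 2: /* exercise Close() */ break;
        }
        // ... verify the module's state; output only a pass/fail summary ...
    }
    return 0;
}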

Once you've decided which approach to take, you need to decide exactly what your test program is going to do. Make it data driven so you can easily add more tests. Don't be afraid to let it produce reams and reams of output. Remember, you're only going to check it manually once.

To decide which calls to make and what data to use, consider the following three things:

  • Your requirements list.
  • The module's public interface.
  • Your knowledge of boundary cases.

If your requirements specifically mention certain cases (like what to do if a buffer is too small), make sure your test module tries those special cases. If you don't have a formal requirements list, you still have a good idea of what the module is supposed to do. For a moment, try to forget what you know about the implementation and pretend you're a client who only has access to the comments in the header file or other documentation. The goal is to ensure that the module works as advertised.

Make sure you call every function in the public interface. For example, if you're testing a complex number object, make sure you test all of the operators like + and +=. Don't assume that one works because the other does, even if you know you implemented one by calling the other. When you add a new function to an existing module, make sure you expand the unit test to exercise it. The goal is to ensure coverage.
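
For example, here is a sketch that exercises + and += separately, using std::complex as a stand-in for the hypothetical complex-number class:

#include <complex>
#include <iostream>

int main()
{
    std::complex<double> a(1, 2), b(3, 4);
    std::complex<double> sum = a + b;   // exercise operator+
    std::complex<double> acc = a;
    acc += b;                           // exercise operator+= on its own
    std::cout << a << " + "  << b << " = " << sum << "\n";
    std::cout << a << " += " << b << " -> " << acc << "\n";
    return 0;
}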

Finally, think about how you implemented the module. What assumptions did you make? What are the buffer limits? Where are the boundary cases? Test them by adding more test data to the input file. For my vector class, I made sure I had the zero vector, orthogonal unit vectors, vectors with lots of decimal places, normalized vectors, unnormalized vectors, and so on; see Listing Three. Again, the goal is to ensure coverage.

How many cases should you handle? How do you know when your test is complete? You'll never know for sure. When I write a new module, I work on the unit test until I find at least a few bugs. If your test passes the very first time, then you probably missed something. Stress it harder until you uncover a few bugs.

Listing Two presents the unit test for a vector class, and Listing Three shows the input file that drives the test. The program itself is trivial, but by adding cases to the input data, I eventually shook out about a dozen bugs.

The test for the vector class tries each sample vector with each of the unary operators (unitize, scale, negate, and so on), and it tests each binary operator with every pair of sample vectors. That creates a lot more tests, but that's the whole idea.

Benefits of Regressive Unit Testing

From the start, I was convinced that unit and regression testing were worthwhile, but I was surprised at how useful they have proven to be. These are not theoretical benefits, but real advantages I observed almost immediately after implementing my first few unit tests. Unit and regression testing will:

  • Find bugs early.
  • Isolate bugs faster.
  • Stop bugs from coming back.
  • Prevent global changes from creating surprises.
  • Validate interfaces.
  • Increase coverage.
  • Test error handling.
  • Debug smaller chunks.
  • Give you a deeper understanding of your code.

If you're still doubtful, consider writing just one unit test, perhaps with a module you already use or with the next one you write. While it's useful to have unit tests for all the modules in your program, having one or two is better than none, especially for the key modules. If it doesn't seem to help much and you abandon the practice, at least leave the one you built in the makefile. One day it'll catch something nasty and unexpected.

Practical Suggestions

Some people resist the idea of unit tests because it seems that only toy modules can be tested in isolation. Usually, this just reflects a lack of creativity on the programmer's part. If you have a nontrivial module, consider it a challenge to write its unit test. Unit tests can be written to simulate asynchronous I/O events, user events, client requests, server responses, plain function calls, and just about anything else required to test a module. After all, if the rest of your program can interface to a module, then you should be able to write a different program that also connects.

Some modules may seem too dependent on other modules to be tested in isolation. Often, this occurs when there's a pipeline of modules operating on data. It might be awkward to test a parser without also calling the lexical analyzer, which calls the preprocessor. There are a couple of approaches you can try.

One is to let each stage of the pipeline rely on the earlier stages, but to write a test for each stage. For example, you could have one unit test for the preprocessor, one for the lexical analyzer and preprocessor in combination, and a third for the entire pipeline. By using the same input and checking the outputs at each stage, you can isolate changes and bugs about as easily as if you had truly independent tests.
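
In makefile terms, the staged tests might look something like this sketch (the program and file names are hypothetical):

# One test per stage, all driven by the same input file, so changes
# can be isolated stage by stage.
tprepro.out : tprepro.exe pipe.in
    tprepro.exe < pipe.in > tprepro.out      # preprocessor alone
tlexer.out : tlexer.exe pipe.in
    tlexer.exe < pipe.in > tlexer.out        # preprocessor + lexer
tparser.out : tparser.exe pipe.in
    tparser.exe < pipe.in > tparser.out      # the entire pipeline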

Another solution is to have the test module provide stub interfaces for the other dependencies. For example, the test program for the parser might include an interface equivalent to the lexical analyzer. The stub would simply provide a canned stream of tokens from the input file. Stubs also work with nonpipelined dependencies.
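
A sketch of such a stub (the names are hypothetical, not the article's code): the parser's unit test links against a NextToken that replays a canned token stream instead of invoking the real lexical analyzer.

#include <iostream>
#include <string>

// Stub with the same interface the real lexical analyzer exposes to
// the parser; it simply replays whitespace-separated tokens from the
// test's input file via standard input.
bool NextToken(std::string &token)
{
    return bool(std::cin >> token);
}

int main()
{
    // In the real unit test the parser would consume these tokens;
    // here we just echo them to show the idea.
    std::string tok;
    while (NextToken(tok))
        std::cout << "token: " << tok << "\n";
    return 0;
}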

If a module has so many dependencies that the unit test is difficult, then it probably needs a unit test more than any other module. When there's that much complexity, it will have more bugs, and it'll be much easier to find and repair them if you've got some scaffolding to help you get to the heart of the matter.

When writing test programs, give a little thought to the output that will be captured and compared. Keep in mind that you'll be looking primarily at differences as reported by your favorite file-comparison program. Put enough information on a single output line to identify which case failed. Each output line from the Vector exerciser shows the vector or vectors used, the operator, and the resulting vector. In more abstract cases, this explicit detail might not be possible. An alternative is to refer to the line number in the input file that generated the current case.

If you get serious about unit and regression testing, take the time to integrate them into your development environment. Your test programs, sample data, and canon files should be kept in your source-control system. The ability to recreate tests from an earlier version can help isolate changes that led to elusive bugs. Establish a naming convention for your tests. In my examples, I've simply prepended the module name with T (for Test). You might want to do something more sophisticated. If you have automated weekly or nightly builds of your system, expand them to run the unit tests as well. A fancy system might even send e-mail to team leads if the regression tests indicate changes in behavior.

In my makefile examples, the accept target cavalierly replaces the current canon files with the current test results. This may be too accident-prone for a large project. There are several alternative ways to handle it:

  • Write a batch file that asks for confirmation.
  • Have the accept target make sure the current references are recoverable by checking them into your source-control system.
  • Create makefile rules to check in test results individually.

The best solution might be a combination of these approaches.
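
The first of these might be as simple as this hypothetical DOS batch file (directory names follow Listing One's conventions):

@echo off
rem ACCEPT.BAT -- confirm before replacing the canon files.
echo This will overwrite the reference results in ..\test\ref.
choice Are you sure
if errorlevel 2 goto end
copy ..\test\out\*.out ..\test\ref
:end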

Conclusion

Fortunately, few programmers write an entire application before testing it at all. Common practice is to write a bare skeleton and then fill in the pieces, testing along the way. But the first working skeleton may require bringing together quite a few untested pieces before having anything that runs. The program-in-progress generally isn't the best test bed for new components, nor does it provide for automation of ongoing testing. Nonetheless, common practice survives because the perceived value of better strategies doesn't justify perceived cost. My experiences, however, indicate that unit testing has more benefits than one might think, and automated regression testing of these units is easy to implement with existing tools.

DDJ

Listing One

######################################################################
# Makefile with automated regression testing
# Copyright 1996 Adrian C. McCarthy
######################################################################
# Written for Borland MAKE. This copy has been simplified to demonstrate 
# the integration of unit testing and automated regression testing. It 
# is not a complete platform for building a program.
######################################################################
# Targets
#  accept   copies test results to reference directory
#  all      builds all executables (default)
#  clean    deletes all non-source files (executables, objs, etc.)
#  regress  builds all unit tests and creates a regression report
#  release  PKZIPs the deliverable files and puts them in BINDIR
# Option Macros
#  DEBUG  if defined, compile and link with debugging information
#  MODEL  memory model to compile for 
#         (l, m, c, or s for large, medium, compact, or small)
#  PLATFORM 2, 3, 4, or 5 for 286, 386, 486, or Pentium
# Directory Macros
#  BINDIR directory to place executables
#  TINDIR directory containing module test data
#  OBJDIR directory to place object files and libraries
#  OUTDIR directory to place test results
#  REFDIR directory where reference test results are kept
#  SRCDIR directory containing sources
######################################################################


# Target platform
!ifdef PLATFORM
  PLATFORM=-$(PLATFORM)
!endif


# Debug control
!ifdef DEBUG
  CDBGOPT = -v -DDEBUG -w
    # -v       include debug info
    # -DDEBUG  #defines DEBUG
    # -w       display all warnings
  LDBGOPT = /v
  !message Debugging enabled -- no optimization
!else
  CDBGOPT = -Ox $(PLATFORM)
  !message Optimization enabled -- no debugging
!endif
!ifndef BINDIR
  BINDIR = ..\bin
!endif
!ifndef TINDIR
  TINDIR = ..\test\in
!endif
!ifndef OBJDIR
  OBJDIR = ..\obj
!endif
!ifndef OUTDIR
  OUTDIR = ..\test\out
!endif
!ifndef REFDIR
  REFDIR = ..\test\ref
!endif
!ifndef SRCDIR
  SRCDIR = ..\source
!endif


# Memory model
!ifndef MODEL
  MODEL = s
  !message Using small memory model by default.
!endif


####### Tools and Options #######
CMDLOPT = -m$(MODEL) -I$(INCLUDE)
STARTUP = c0$(MODEL).obj
RUNTIME = emu.lib math$(MODEL).lib c$(MODEL).lib
# Borland C/C++ compiler
CC = bcc -c
COPTS = $(CDBGOPT) $(CMDLOPT)
# Borland TLINK
LINK = tlink
LOPTS = /c /m /n /Tde /L$(LIB);$(OBJDIR) $(LDBGOPT)
# File Compare -- Use your favorite DIFF program here!
DIFF = fc


####### Paths #######
.path.exe=$(BINDIR)
.path.map=$(OBJDIR)
.path.lib=$(OBJDIR)
.path.obj=$(OBJDIR)
.path.c  =$(SRCDIR)
.path.cpp=$(SRCDIR)
.path.in =$(TINDIR)
.path.out=$(OUTDIR)
.path.rpt=$(OUTDIR)


####### Files #######
# Note, for each new unit test, you need to add a $(DIFF) line to the
# regress.rpt target.  You also need to add an explicit link rule for
# the corresponding unit test.  This is because the $** and $? macros
# don't work in implicit rules nor do they work with macro modifiers.
TESTEXES = tvector4.exe tmatrix4.exe tprepro.exe
TESTOUTS = tvector4.out tmatrix4.out tprepro.out
REGRPT = $(OUTDIR)\regress.rpt
.precious $(TESTOUTS) $(REGRPT)


####### Rules #######
.c.obj:
    $(CC) $(COPTS) -n$(OBJDIR) $<
.cpp.obj:
    $(CC) $(COPTS) -n$(OBJDIR) $<
# This rule runs a unit test.
.in.out:
    $(BINDIR)\$&.exe < $< > $@


####### Pseudo-targets #######
all : $(EXES) $(REGRPT)


# The accept target copies the current test results to the reference
# directory.  To avoid costly accidents, we generally keep the
# reference copies read-only, and we make backups of the current ones
# right before replacing them.
accept : $(TESTOUTS)
  -attrib -r $(REFDIR)\*.bak
  -del $(REFDIR)\*.bak
  rename $(REFDIR)\*.out *.bak
  copy $(OUTDIR)\*.out $(REFDIR)
  attrib +r $(REFDIR)\*.out
clean :
  -del $(OBJDIR)\*.obj
  -del $(BINDIR)\*.exe
  -del $(BINDIR)\*.map
  -del $(OUTDIR)\*.out
  -del $(REGRPT)
regress : $(REGRPT)
release : $(EXES)
    $(ZIP) $(BINDIR)\ART96.ZIP $(EXES)


####### Regression testing #######
#  The macro modifiers don't work on the $** or $? targets, so
#  you have to add a $(DIFF) line for each new unit test.
$(REGRPT) : $(TESTOUTS)
  echo Regression Test Report > $(REGRPT)
  echo ====================== >> $(REGRPT)
  $(DIFF) $(REFDIR)\tvector4.out $(OUTDIR)\tvector4.out >> $(REGRPT)
  $(DIFF) $(REFDIR)\tmatrix4.out $(OUTDIR)\tmatrix4.out >> $(REGRPT)
  $(DIFF) $(REFDIR)\tprepro.out  $(OUTDIR)\tprepro.out  >> $(REGRPT)


####### Dependencies #######
vector4.obj : vector4.cpp vector4.h
tvector4.obj : tvector4.cpp xbase.h vector4.h
tvector4.exe : tvector4.obj vector4.obj
  $(LINK) $(LOPTS) @&&|
$(STARTUP)+
$**
$*.exe
$*.map
$(RUNTIME)
|
tvector4.out : tvector4.exe tvector4.in
matrix44.obj : matrix44.cpp matrix44.h
tmatrix4.obj : tmatrix4.cpp xbase.h matrix44.h
tmatrix4.exe : tmatrix4.obj matrix44.obj vector4.obj
  $(LINK) $(LOPTS) @&&|
$(STARTUP)+
$**
$*.exe
$*.map
$(RUNTIME)
|
tmatrix4.out : tmatrix4.exe tmatrix4.in
prepro.obj : prepro.cpp prepro.h
tprepro.obj : tprepro.cpp prepro.h
tprepro.exe : tprepro.obj prepro.obj
  $(LINK) $(LOPTS) @&&|
$(STARTUP)+
$**
$*.exe
$*.map
$(RUNTIME)
|
tprepro.out : tprepro.exe tprepro.in


Listing Two

// Unit test for Vector4.cpp. Reads a file of n vectors, exercises the
// Vector4 functions on those vectors, and outputs the results. The
// input file should start with an integer (the number of vectors)
// followed by the vectors.


#include <iostream.h>
#include "vector4.h"


#define TF(x)  ((x) ? "true " : "false")


int main()
{
    int count, i, j;
    cin >> count;  // number of vectors to read from cin
    Vector4 *pV = new Vector4[count];
    Vector4 temp;


    cout << "VECTOR4 Regression Test" << endl
         << "=======================" << endl << endl;


    cout << "Reading " << count << " vectors." << endl;


    // The Vector4 module supports iostream input and output which
    // must be tested, but is also very convenient for this unit test
    // in general.
    cout << endl
         << "I/O Test" << endl
         << "========" << endl;
    for (i = 0; i < count; i++) {
        cin >> pV[i];
        cout << "I/O  " << i << "  " << pV[i] << endl;
    }
    cout << endl
         << "Unary Functions" << endl
         << "===============" << endl;
    for (i = 0; i < count; i++) {
        cout << pV[i] << " Homogeneous:  "
             << TF(pV[i].IsHomogeneous()) << endl;
        cout << "Homogenize " << pV[i] << " == ";
        pV[i].Homogenize();
        cout << pV[i] << endl;


        cout << pV[i] << " Unit:  " << TF(pV[i].IsUnit()) << endl;
        temp = pV[i];
        if (temp.Magnitude() != 0.0) {
            temp.Unitize();
            cout << "Unitize " << pV[i] << " == " << temp 
                 << " (" << temp.Magnitude() << ")" << endl;
        }
        else {
            cout << "Can't make zero vector unit length." << endl;
        }
        cout << "||" << pV[i] << "||   = " << pV[i].Magnitude()
             << endl;
        cout << "||" << pV[i] << "||^2 = " << pV[i].MagSquare()
             << endl;
        temp = -pV[i];
        cout << " -" << pV[i] << " = " << temp << endl;
        temp = 0*pV[i];
        cout << "0*" << pV[i] << " = " << temp << endl;
        temp = 1.0*pV[i];
        cout << "1*" << pV[i] << " = " << temp << endl;
        temp = pV[i]*2;
        cout << pV[i] << "*2"    " = " << temp << endl;
        cout << "---" << endl;
    }
    cout << endl
         << "Binary functions" << endl
         << "================" << endl;
    for (i = 0; i < count; i++) {
        for (j = 0; j < count; j++) {
            cout << pV[i] << " == " << pV[j] << ":  "
                 << TF(pV[i] == pV[j]) << endl;
            cout << pV[i] << " != " << pV[j] << ":  "
                 << TF(pV[i] != pV[j]) << endl;
            temp = pV[i] + pV[j];
            cout << pV[i] << " + " << pV[j] << " = " << temp << endl;
            temp = pV[i] - pV[j];
            cout << pV[i] << " - " << pV[j] << " = " << temp << endl;
            temp = pV[i];
            temp += pV[j];
            cout << pV[i] << " += " << pV[j] << " = " << temp
                 << endl;
            temp = pV[i];
            temp -= pV[j];
            cout << pV[i] << " -= " << pV[j] << " = " << temp << endl;
            cout << pV[i] << " dot " << pV[j] << " = "
                 << Dot(pV[i], pV[j]) << endl;
            temp = Cross(pV[i], pV[j]);
            cout << pV[i] << " cross " << pV[j] << " = " << temp
                 << endl;
            cout << "---" << endl;
        }
    }
    delete [] pV;
    return 0;
}


Listing Three

9
[1 0 0]
[0 1 0]
[0 0 1]
[1 1 1]
[0 0 0]
[4 4 4 (2)]
[1 2 3 (1)]
[3 2 1 (0)]
[1.234567 1.23456789 1.2345678901234567890]



Copyright © 1997, Dr. Dobb's Journal

