Channels ▼


Tough Apps: Combinatorial Testing

Dennis Shasha is a professor of Computer Science at New York University. His latest puzzle book Puzzles for Programmers and Pros outlined some general approaches to programmatic puzzle solving. This column challenges you to find the solutions to the puzzles that lie at the core of some cool tough applications. He solicits your help in identifying such cool apps.

Long-time Dr. Dobb's readers surely remember Dr. Ecco's Omniheurist Corner, where Dennis Shasha posed puzzles that required computer solutions. Dennis is returning to Dr. Dobbs, but with a new twist on puzzles -- one that takes inspiration from the challenging apps that readers have faced and to turn those into puzzles.

If you have written an application -- or run across one -- that required extensive use of heuristics (because no algorithm would solve it) or that is otherwise algorithmically interesting, please send mail to Dennis at Put "Tough Apps" in the subject line. If Dennis likes it, he will contact you, try to understand what you've done, and create a puzzle explaining the algorithmic core. Please include a blurb about yourself and the problem.

In the 1990s, four researchers at Telcordia found a good way to test software. Instead of trying to achieve high code coverage (which might not be possible when one doesn't control the source), they proposed to test the external behavior of the software in a clever way. By "external behavior" I mean testing the interface and user-settable configuration parameters. For example, if testing the implementation of a language, one would test language constructs and optimization parameters.

Why does one need to be clever about such testing? The short answer is -- combinatorial explosion. A language will often allow a countably infinite number of statements. It's impossible to test them all. The good news is that they found that many software bugs occur because of an interaction between two factors (e.g., two language features or perhaps a language feature and a configuration parameter), so if they could test every combination of values of every pair of factors, they would often catch most if not all bugs. Because even a single factor may have an infinite number (or at least billions) of values (e.g., all floats), one has to discretize the test space. This may mean extensive sampling on boundary conditions and a few samples of large intermediate sets of values.

For example, suppose you are testing a new SQL engine. Optimization parameters include buffer space, query caching, replacement algorithm, and so on. Language features include the formation of the SELECT clause, number of tables, the WHERE clause, grouping, aggregates, and subqueries.

In my work, I like to lay out each factor in a line along with its values as you can see below:

buffer space: 20 megabytes, 200 megabytes, 2 gigabytes, 20 gigabytes, 200 gigabytes
query caching: off, on
replacement algorithm: least recently used, most recently used
select clause: asterisk, named columns, aggregates
number of tables: 1, 2, 10, 50
grouping: unused, used one attribute, used many attributes
aggregates: none, single, five
subqueries: none, one, five

Testing all these combinations would require 5 * 2 * 2 * 3 * 4 * 3 * 3 * 3 = 6,480 experiments. You may consider this to be too many, so you'd like to do far fewer but still satisfy the two factor condition, viz. for every pair of values in every pair of factors, some experiment tests this pair. The puzzle is to find such a small set of experiments.


Suppose there were only four factors each having three values:

F1: 1 2 3
F2: 1 2 3
F3: 1 2 3
F4: 1 2 3

Testing every combination requires 3*3*3*3 = 81 experiments, but achieving the two factor condition requires only 12 experiments:

On the left you see the experiment number followed by the values of each factor in each experiment. For example, experiment 5 has value 3 for factor F1, and 1 for all other factors. The two factor condition is achieved because for any pair of factors e.g. F1 and F4, every pair of values is found in at least one experiment. Experiment 1 has F1 = 3 and F4 = 3, experiment 2 has F1 = 1 and F4 = 3, experiment 3 has F1 = 2 and F4 = 2, experiment 4 repeats F1 = 3 and F4 = 3 and so on. A covering set for F1 and F4 would be experiments 11, 7, 2, 9, 3, 12, 5, 10, and 1.

End of Warm-Up

Here is the challenge for you: Can you build a program that will give you a set of experiments that satisfies the two factor condition for the SQL case?

Hint: You need fewer than 25 experiments.

The original paper by the Telcordia people is: The Combinatorial Design Approach to Automatic Test Generation, by David M. Cohen, Siddhartha R. Dalal, Jesse Parelius, Gardner C. Patton.


In this solution, the X represents a "don't care," meaning any value for that factor would be okay.

[Click image to view at full size]

Reader Response

Michael Schneider of Saarbruecken Germany suggested a simple nine experiment solution to the warm-up problem.

This one is clearly optimal since even just two factors demand nine experiments. See this Excel spreadsheet.

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.