Channels ▼

Andrew Koenig

Dr. Dobb's Bloggers

An Example of Data-Structure Auditing in Practice

November 23, 2012

Last week, I introduced the notion of a data-structure audit routine. I would like to continue that discussion with a concrete example.

More Insights

White Papers

More >>

Reports

More >>

Webcasts

More >>

In 1988, I wrote what I think may be the first generic associative-array library that anyone had ever written for C++. Such facilities had long existed in other languages, of course, but C++ did not yet even have templates available, so writing generic programs was an interesting challenge. I met that challenge by using — some would say abusing — the C preprocessor. I defined a macro named Map, with the idea that a macro expression such as Map(int, string) would translate into the name of a type that, in turn, would contain components with type int and string, respectively.

Although I did not say so explicitly, my goals for this library included making it both fast enough and robust enough to use for large-scale production programs. To that end, I made the somewhat controversial decision to use an order-based algorithm rather than a hash-based algorithm for storing associative arrays. I felt that if I used a hash-based algorithm, I would have to require users of the associative-array library to supply their own hashing algorithm, and programmers who did not go to the trouble of figuring out an effective hashing algorithm would wind up with programs that performed terribly. It would not take too many such programs to give the library a bad reputation that it would be impossible to erase.,/p>

The algorithm I chose, AVL trees, has the useful property that inserting or searching for an item in an n-element data structure never takes more than order log(n) time. The algorithm achieves this useful property at the expense of being quite tricky to implement correctly. An AVL tree is a binary tree with the following properties:

  • Every node has two (possibly empty) subtrees, which we can call the left and right subtrees.
  • For every node N, every element in N's left subtree has a key that is less than or equal to N's key, and every element in N's right subtree has a key that is greater than or equal to N's key.
  • For every node N, the height of N's left subtree differs by no more than 1 from the height of N's right subtree.

Here, the height of a tree is 0 if the tree is empty. Otherwise, the tree's height is one more than the greater of the heights of the tree's two subtrees.

If you look at the description of the AVL tree algorithm, you will see that maintaining the balance requirement requires some moderately tricky tree manipulation — which, in C++ terms, really means pointer manipulation. Moreover, these manipulations divide into multiple cases, depending on whether, and in what direction, the tree is unbalanced, and where items are being inserted. These complexities raised the question of how to go about testing this code.

I decided on a combination of data-structure auditing and brute force.

The auditing part was straightforward: I simply took the AVL tree properties that I mentioned earlier and gave my class a member function named audit that would use an assert to check those properties. For the brute-force part, I wrote a simple program that would create 10 values, insert them into an associative array, and then verify that it could find each of those values in the appropriate place. I then combined these two testing strategies by generating each of the 3,628,800 permutations of 10 values and calling this test program with each of those permutations. The test program would create a new associative array, insert that permutation into it — auditing the data structure after each insertion — and then verify that it could find each element from the permutation in the proper place.

I don't think that this test program took more than an hour to write. Of course, it took much more than an hour to run. In fact, I think it took an entire weekend. However, after I had gotten it to run successfully, and sent the program out in the field for people to use, I do not think I ever received any bug reports against that code.

Related Reading






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 


Video