An Example of Data-Structure Auditing in Practice
Last week, I introduced the notion of a data-structure audit routine. I would like to continue that discussion with a concrete example.
- The People Problem: Cyber Threats Aren't Just a Technology Challenge
- Securosis Analyst Report: Security and Privacy on the Encrypted Network
- High Performance Computing in Finance: Best Practices Revealed
- Big Data and Customer Interaction Analytics: How To Create An Innovative Customer Experience
In 1988, I wrote what I think may be the first generic associative-array library that anyone had ever written for C++. Such facilities had long existed in other languages, of course, but C++ did not yet even have templates available, so writing generic programs was an interesting challenge. I met that challenge by using — some would say abusing — the C preprocessor. I defined a macro named
Map, with the idea that a macro expression such as
Map(int, string) would translate into the name of a type that, in turn, would contain components with type
Although I did not say so explicitly, my goals for this library included making it both fast enough and robust enough to use for large-scale production programs. To that end, I made the somewhat controversial decision to use an order-based algorithm rather than a hash-based algorithm for storing associative arrays. I felt that if I used a hash-based algorithm, I would have to require users of the associative-array library to supply their own hashing algorithm, and programmers who did not go to the trouble of figuring out an effective hashing algorithm would wind up with programs that performed terribly. It would not take too many such programs to give the library a bad reputation that it would be impossible to erase.,/p>
The algorithm I chose, AVL trees, has the useful property that inserting or searching for an item in an n-element data structure never takes more than order log(n) time. The algorithm achieves this useful property at the expense of being quite tricky to implement correctly. An AVL tree is a binary tree with the following properties:
- Every node has two (possibly empty) subtrees, which we can call the left and right subtrees.
- For every node
N, every element in
N's left subtree has a key that is less than or equal to
N's key, and every element in
N's right subtree has a key that is greater than or equal to
- For every node
N, the height of
N's left subtree differs by no more than 1 from the height of
N's right subtree.
Here, the height of a tree is 0 if the tree is empty. Otherwise, the tree's height is one more than the greater of the heights of the tree's two subtrees.
If you look at the description of the AVL tree algorithm, you will see that maintaining the balance requirement requires some moderately tricky tree manipulation — which, in C++ terms, really means pointer manipulation. Moreover, these manipulations divide into multiple cases, depending on whether, and in what direction, the tree is unbalanced, and where items are being inserted. These complexities raised the question of how to go about testing this code.
I decided on a combination of data-structure auditing and brute force.
The auditing part was straightforward: I simply took the AVL tree properties that I mentioned earlier and gave my class a member function named
audit that would use an
assert to check those properties. For the brute-force part, I wrote a simple program that would create 10 values, insert them into an associative array, and then verify that it could find each of those values in the appropriate place. I then combined these two testing strategies by generating each of the 3,628,800 permutations of 10 values and calling this test program with each of those permutations. The test program would create a new associative array, insert that permutation into it — auditing the data structure after each insertion — and then verify that it could find each element from the permutation in the proper place.
I don't think that this test program took more than an hour to write. Of course, it took much more than an hour to run. In fact, I think it took an entire weekend. However, after I had gotten it to run successfully, and sent the program out in the field for people to use, I do not think I ever received any bug reports against that code.