*
Timothy is an associate professor of computer science at Eastern Washington University. He can be contacted at [email protected] ewu.edu.*

The binary search tree (BST) is a data structure for holding information that allows rapid access according to some ordering key. As the name indicates, it is a tree structure in which the nodes containing information can be connected to two subtrees (hence the "binary"). The search condition is met by requiring that all information to the left of a node comes before the node itself, and that all information to the right of the node comes after the node itself. If you want to allow equal comparisons, you choose which side to put them on.

Searching for an item within such a data structure is simply a matter of examining each node (starting from the beginning, called the "root" of the tree). If the value desired is found within that node, you're finished. If not, you can see how your desired value compares with the node value: If the desired value is less than the node value, you can ignore the entire right subtree and just move down the left subtree, repeating the same operation. If the desired value is greater than the node, then ignore the left subtree and move down the right.

Insertion into a BST involves searching for the value to be inserted, then putting it in exactly at the spot where it wasn't* *found.

Deletion is a bit more of a problem. The more elegant solution is to find a replacement node from farther down in the tree that can replace the deleted node and still retain the BST condition. In other words, replace the node with a node in the tree that, in the key order generating the tree, comes immediately before or after that node.

If the BST is most densely organized, you can, within a path of length *k*, access one piece of information among 2^{k+1} pieces of information. The extremely careful would want that written as 2^{ k+1} -1, though the -1 makes little difference as *k* becomes large. Starting from the top, every level of the tree is fully populated, and the number of levels in the tree of *N *nodes is approximately 3.3 *log*_{10} *N -- *in other words, the base-2 logarithm of *N*, written as *lg*(*N*).

However, the BST has a significant problem. When the tree is at its lowest density, the path of length *k* only chooses from among *k*+1 pieces of information. This happens when every node in the tree has just one subtree; perhaps at every level there is no left subtree, but only a right subtree (referred to as a degenerate tree because it really is nothing but a linked list). This is exactly what happens if you build the BST with values that are already in order.

With random data, the number of levels in the tree tends to be almost 3 *lg*(*N*), and on average you go down a bit over 1 1/3 *lg*(*N*) levels to find a value.

People who want faster access have devised schemes that guarantee that the BST is denser than that (which would certainly avoid the degenerate tree case), regardless of the order of operations on the tree. The first one was proposed by Adel'son-Velskii and Landis, and is called the "AVL tree" in their honor (see G.M. Adel'son-Velskii and Y.M. Landis, "An Algorithm For The Organization Of Information," *Soviet Math Dockl*., 1962).

Every node in a tree has what is called its "height" -- the number of links in the longest path from that node down to what is called a "leaf node" (one with no children) at the bottom of the tree. (A memory aid for this is to remember that height tells you how far up you are.) The AVL condition for a BST is that, for every node, the heights of its two children differ at most by one. (You can see that you have to allow a difference of one just by looking at a two-node tree.)

The operations in a BST that affect the height are insertion and deletion. I will focus on deletion as effectively removing the node that becomes the replacement for the deleted node. On an insertion, there are obviously no nodes below the point of insertion since there's nothing there. With the way we are focusing for deletion, there are also no changes in height for nodes in the subtree below the node removed. Consequently, the adjustments to enforce the AVL condition must be performed in the nodes from the point of the operation back up to the root of the tree. This might appear to be a problem, but it really isn't. To see why, we will have to get into explicit programming, and we will use C++ as its basis (C++ because I depend on having a pointer as a reference parameter). I will explicitly discuss insertion, but also include the deletion function in the accompanying code.

Insertion is most easily handled as a recursive function: recurse to the point of failure, then change the pointer (and, because of the reference parameter, the pointer in the calling routine) to reference the newly created node. Written for a BST without regard to the AVL condition, this function (see Listing One) is particularly simple. Assume a *class BST* for the tree itself, a *class *or *struct BSTnode* for the nodes of the tree, and a *class*, *struct*, or type *Data* for the data portion of the *BSTnode* (for which the relational operator "<" is defined). To keep the logic simple, I'll omit validation of the pointer returned by the *new* operator.

Because of the recursion, the entire path back up to the root of the tree is available as the function goes through its returns. Thus, you can easily do the correction for the AVL condition simply by putting the appropriate code after the *if/else-if/else* structure in Listing One.

Obviously, you need to get at the height of each tree node easily for an AVL tree. This means that you will have a small expense in space within each AVL node to retain that information. While there are strategies for retaining the necessary AVL information in just a couple of bits, the logic is more easily seen if you simply store the height, and unless you get into bit fields, you'll probably end up using a default *int* anyway.

When checking the AVL condition, you test whether the two subtrees differ in height by more than one. Because the tree was a valid AVL tree before the most recent operation, you know that the difference will be at most two. So what you need is an operation that changes the height relationship. That is found in what is called a "BST rotation" -- shuffle links at three levels of the BST so that one side goes up and the other side goes down. The rightward and leftward rotations are simply inverses of each other, so that a single figure shows both.

As Figure 1 illustrates, it is most useful to show the grandparent (Gr) with the link that must be altered, and then instead of parent and child, leftward node (Lt) and rightward node (Rt). For completeness, the leftward child of Lt is shown, even though it is not moved, as well as the rightward child of Rt. The child that is moved is the subtree occurring between Lt and Rt (the subtree labeled Q): In one circumstance it is the rightward child of Lt, in the other it is the leftward child of Rt.

Because of the reference parameter, you have available for alteration the link from the grandparent that needs to be changed in the rotation. If the subtree to the left has a height longer than the subtree on the right, then a rightward rotation should decrease the leftward subtree by one and increase the rightward subtree. Listing Two is the rightward rotation function; the leftward rotation is its mirror image.

There is a problem, though, with the subtree Q. As you can see in Figure 1, it remains at the same level (even though it switches sides). If the height of the leftward subtree is based on Q (that is, Q's height is one more than R's) you end up not correcting the AVL condition. You can, however, do preprocessing to ensure that the leftward side is longer by doing a leftward rotation at Lt before doing the rightward rotation at Rt.

You have enough information now to build the function *AVLadjust* (see Listing Three), then insert a call to it at the end of the *Insert* function; see Listing One. Note that the *Rotate* functions adjust heights after the rotation has been accomplished -- something also checked if there is a rotation.

The AVL tree has an interesting property. In its worst case (that is, the AVL tree in which every node has subtrees that differ in height by one), the number of nodes in the entire tree, and within each subtree that comprises the entire tree, is close to a Fibonacci number. (The recurrence is actually *H _{k}*+1=

*H*+

_{k}*H*-1+1.) On this basis, there is proof that the worst-case height (the largest number of links that must be traversed in a search) is less than 1.44

_{ k}*lg*(

*N*+2)-1.328. This is a marked improvement over the average case for a BST, noted earlier as nearly 3

*lg*(

*N*). The average height for AVL trees, unfortunately, has not been mathematically determined.

Figure 2 combines two sources of experimental data. For AVL trees of up to 12 nodes, you can explicitly generate all possible trees (up to 12 permutations) to get the average tree height (worst-case search path length) and the average node depth (the average of the search path lengths to find a value). For trees with more than 12 nodes, the results are based on generating, for each tree size, 5000 AVL trees from random data, and then generating their averages. Besides the average height (top graph) and the average depth (bottom graph), you also see as the step function between them the height of a complete binary tree -- the best-case tree in which every tree level, except possibly the bottom, is completely filled. The average tree height shows some interesting undulations for each power of two in the size of the tree. As the tree size grows, however, the ratio of the average tree height to the average node depth settles down to approximately 1.3 to 1. There is no plot of the average node depth in the complete binary tree because it is close to the average node depth of the AVL tree, lying at most about 0.2 units below the AVL tree average node depth. To the scale of the graph, the two can't be distinguished.

Figure 3 shows the results from taking the AVL tree experimentation out to trees with 5000 nodes, again running 5000 random trees of each size. You see that the tree height approaches 10 percent more than the corresponding perfect tree, and the average depth comes in a bit under 15 percent less than the height of the corresponding perfect tree. This represents a marked improvement over the BST: For tree height, 1.1 versus 3; for average depth, 0.85 versus 1 1/3.

You do have to pay for this, of course, by the added AVL-adjustment logic on each insertion and deletion. For my implementations of these data structures, the insertions into AVL trees take approximately twice as long as insertions into BSTs where the question of node height is ignored. On the other hand, if you are using this tree data structure as a look-up table (where most accesses are finds rather than inserts or deletes), the denser organization will double throughput on average. In addition, if your look-up table is generated from data stored exactly in order, you will still easily generate an efficient tree structure for your look-ups, while the straight binary search tree would require explicit balancing to avoid the worst-case behavior of a degenerate tree (effectively, just a linked list).

Since Adel'son-Velskii and Landis proposed their self-adjusting data structure, there have been numerous others proposed, each claiming some improvement on the AVL tree. All of these, however, are more complex than the original AVL tree, both in concept and in implementation, and the AVL tree provides attractive performance with a simple implementation.

### References

Binary search trees are discussed in all introductory texts on the topic of data structures. Some of those texts also discuss the self-adjusting binary search trees, and more extensive information is available on all of these topics in the more advanced data structures texts. The information on random binary search trees is from Robert Sedgewick and Philippe Flajolet's *An Introduction to the Analysis of Algorithms* (Addison-Wesley, 1996). The information on the worst-case AVL tree height is from Mark Allen Weiss's *Algorithms, Data Structures, and Problem Solving with C++ *(Addison-Wesley, 1996). This is the text for the data-structures course I am teaching at Eastern Washington University. In addition to the fundamental BST and the AVL tree, Weiss also discusses several of the more recent self-adjusting trees (2-3 trees, red-black trees, and splay trees). In January 1994, *DDJ *published an article by Bruce Schneier with the title "Skip Lists" about an alternative structure for look-ups.

### Online Resources

The file AVL.ZIP is available electronically (see "Resource Center," page 5). When expanded, it generates the source code used to obtain the statistics for the average depths of AVL trees -- appropriate class files and two main program files (one to generate all permutations up to a maximum tree size, the other to generate random permutations). The mains are even written (under Linux on a dual-processor machine) to generate each of these statistics in two parallel computations. In addition, there is an RTF file giving a single sheet with exact figures and references for quantities with approximate values.

**DDJ**

#### Listing One

void BST::Insert( BSTnode* &Node, const Data &Value ) { if ( Node == NULL ) // failure point --- recursion base case Node = new BSTnode(Value); // change Node (reference parameter) else if ( Value < Node->Value ) Insert(Node->Left, Value); // recurse left else // equal keys go right Insert(Node->Right, Value); // recurse right }

#### Listing Two

void AVL::RotateRight(BasePtr &Rt) // Rotate rightward --- right node moves down, left node moves up. { BasePtr Lt = Rt->Left, Q = Lt->Right;x Lt->Right= Rt; Rt->Left = Q; HtAdjust (Lt); // Shifting Q may well have changed Lt and Rt heights HtAdjust (Rt); Rt = Lt; // Change the link itself that was passed by reference }

#### Listing Three

void AVL::AVLadjust ( AVLnode* &Node ) {//We presume the AVL class has access to AVLnode's Height data member int Lht = Node->Left == NULL ? -1 : Node->Left->Height, Rht = Node->Right == NULL ? -1 : Node->Right->Height, Diff = Lht - Rht; if ( abs(Diff) < 2 ) // AVL condition is met HtAdjust(Node); // May need to adjust the height anyway else if ( Lht > Rht )// Left side two longer than right { int Lck = Node->Left->Left == NULL ? -1 : Node->Left->Left->Height, Rck = Node->Left->Right == NULL ? -1 : Node->Left->Right->Height; if ( Lck < Rck ) RotateLeft (Node->Left); // Make left the longer RotateRight (Node); // Adjust Node itself HtAdjust(Node); // Update current node's height } else // Mirror image logic to that above: Right is two longer than left { int Lck = Node->Right->Left == NULL ? -1 : Node->Right->Left->Height, Rck = Node->Right->Right == NULL ? -1 : Node->Right->Right->Height; if ( Lck > Rck ) RotateRight (Node->Right); RotateLeft (Node); HtAdjust(Node); } }