Dr. Dobb's | Implementing Splay Trees in C++

Implementing Splay Trees in C++

Splay trees are self-adjusting binary search trees that are typically used in caches, memory allocators, routers, garbage collectors, data compression, and the like.

September 01, 2005
URL:http://www.drdobbs.com/cpp/implementing-splay-trees-in-c/184402007

Splay trees are self-adjusting binary search trees. If the set of frequently accessed elements is a small subset of the elements in the tree, splay trees perform better than other search trees [1]. Splay trees are typically used in the implementation of caches, memory allocators, routers, garbage collectors, data compression, and ropes (replacement of string used for long text strings). Additionally, I've used splay trees as the internal representation of associative container classes with the same interface as the associative containers in the C++ Standard Library. The implementation I present in this article is a plug-in replacement for the standard associative containers, the only difference being performance characteristics. The source code for my containers is available at http://www.dtilib.dk.

Setting the Stage

The C++ Standard Library describes a set of containers based on the concept of associative containers. They all belong to a category of data structures referred to as "dictionaries." Dictionaries let you make fast searches for their elements. One popular group of dictionary implementations is based on binary search trees. All implementations of the Standard Library I'm aware of use red-black trees to implement associative containers. Here, I present a splay-tree-based alternative implementation of associative containers. Splay trees have an interesting property compared to red-black trees. They have a sort of caching behavior: Elements recently accessed have fast access times. If some elements are accessed more frequently than others, splay trees perform the searches faster than traditional balanced search trees.

The Standard defines the set of operations that each associative container must support. For each operation, it not only defines how the operation is supposed to work, but also how efficient the operation must be. I wanted to understand how these requirements affected an implementation. The performance characteristics of splay trees do not quite fit the requirements of the Standard. This wasn't a big issue for me because most problems associated with implementing associative containers is related to the structure of binary trees, not the precise rebalancing algorithm. Consequently, I decided to base my implementation on splay trees because they represent an interesting alternative to traditional red-black implementations.

Finding the Data Fast

When searching is the primary operation on your data, it pays to structure the data so that searching is efficient. A general strategy is to always keep the data sorted and use bisection (binary search) for searching [2]. If efficient insertion and deletion are also needed, a binary search tree (BST) is often the data structure of choice. A BST is based on a set of nodes. Each node holds one data element and two pointers to child nodes (left and right child). The nodes are organized so that all the nodes in the left subtree are less than or equal to the element in the parent node. In the same manner, all elements in the right subtree are greater than or equal to the element in the parent node (Figure 1).

Figure 1: Binary search tree.

A search for a specific element starts at the top of the tree (the "root"). With two comparisons, it can be decided if the search has to continue in the left or right subtree or if the search has ended. A node without children is a "leaf node." The time a search takes for an insertion/deletion is a function of the distance from the root to the wanted node. In the worst case, this is the longest distance from the root to a leaf node, referred to as the "height" of the tree. A BST of height n can maximally hold 2ⁿ-1 nodes, which is also the key to efficiency of binary search trees. If balanced properly, the height is logarithmic in the number of nodes. The logarithm function is a slowly growing function. Consequently, if there are many nodes in the tree, the height is still a small number compared to the number of nodes—making operations fast. There is a catch. You must ensure that the left and right subtrees hold approximately the same number of nodes on all levels; otherwise, there's a risk that the tree degenerates to a sorted chained list of nodes. So how do you keep the tree balanced when nodes are deleted and new ones are inserted?

Enter Rebalancing

If the tree is unbalanced, "rotations" can help (Figure 2). Rotations change the structure of the tree without disturbing the sorting order. Balanced search trees do just that in an ordered manner. They keep some kind of information in each node to help rebalancing the tree when needed—typically when a node is deleted or a new one inserted. The most commonly known variants of balanced search trees are AVL trees and red-black trees [2]; the latter is typically used in implementations of the associative containers in the Standard Library. The difference between AVL trees and red-black trees is in how balanced they guarantee the tree is:

Figure 2: Rotation operation.

AVL trees guarantee that the longest and shortest path from the root to a leaf node at most differs by one.
Red-black trees only guarantee that the longest path from the root to a leaf node is at most twice as long as the shortest path from the root to a leaf node.

Both rebalancing heuristics keep the tree sufficiently balanced to guarantee good performance. There is a trade-off: Guaranteeing a more perfectly balanced tree requires more work when changing the tree structure. Searches in a tree that are guaranteed to be well-balanced are fast, but insertions/deletions can be slow.

Another Approach: Splay Trees

What is important about balanced search trees is that performance guarantees are based on the fact that the longest search path is only logarithmic in the number of nodes in the tree. This guarantees that any individual operation is also fast, making the overall performance good. The tree in Figure 1 is just such a balanced tree. However, is it as fast as it can be if the most frequently accessed node is node-6, while node-4 is only seldom accessed? The answer is "probably not." A tree like that in Figure 3 probably performs better overall, even though the tree isn't as balanced as the tree in Figure 1.

Figure 3: A tree that performs well, even though it isn't as balanced as Figure 1.

In other words, the performance gain by moving node-6 to the top outweighs the extra time used to access node-4, because node-4 is only infrequently accessed. If you know the access patterns in advance, you can arrange the nodes for best performance, but this is seldom the case and difficult to implement if insertion/deletion are also allowed. Splay trees make access to frequently referred nodes fast by using the splaying heuristic: Move the accessed node to the top, but try to keep nodes already close to the top still close. Not all nodes can stay at the top, of course, so nodes not accessed slowly move down the tree.

After each search is performed, a "splay" operation is executed. The splay operation moves the just-found node to the top of the tree, making it the new root. To do this, it uses three primitives based on rotations—zig, zig-zig, and zig-zag (I didn't make up these names). The operations zig-zig and zig-zag are "double rotations" that are implemented using two rotations. Zig is an ordinary rotation. Figure 4 shows the different kinds of operations minus symmetric cases. In all cases, the node named x is moved to the top. The zig operation is only performed if x is the direct child of the root node. In all other cases, zig-zag and zig-zig operations are performed.

Figure 4: Splay operations.

This is referred to as "bottom-up splaying" because it moves a node to the top of the tree starting at the node itself. You can combine the search and the splay operation into one operation so the splaying is done while the search is moving down the tree. In this case, the splay operation is referred to as a "top-down splay." Usually, only one of the splay methods is implemented. The performance of both splaying types is about the same, but top-down splaying tends to be slightly faster if the bottom-up splay has to first search for the node.

The splay operation has a remarkable property—the distance from the root to any node visited during a splay operation is approximately halved. This has the effect that no matter how unbalanced the tree is, it won't stay unbalanced for long. The efficiency guarantee for splay trees is not based on a promise that individual operations are fast. Instead, a splay tree promises that a sequence of operations performs well. For most uses, this distinction isn't important because it is only the total time spent that counts. For time-critical applications, however, the distinction may be important. Splay trees can exploit this freedom and not always rebalance the tree after an update. The tree can be left unbalanced if this is faster, then rebalanced at a later, more convenient time. This type of performance guarantee is known from the push_back function in the standard container vector. Some calls to push_back are slow, but most are so fast that you won't notice the slow ones. These are "amortized performance guarantees." The splay operations restructure the tree in a similar way to the rebalancing operations of balanced search trees—but with the extra twist that if the access pattern has some sort of locality (many searches for the same element), the splay tree exploits this to boost performance. This is basically the same as a disk cache—exploiting locality to boost the performance of disk access.

Implementation

The current C++ Standard describes four associative containers—map, multimap, set, and multiset. Their implementation only slightly differs, making it possible for the four containers to be based on a single implementation. They differ in two ways:

The first is whether the sorting key is a part of the inserted value as with set and multiset, or map and multimap, where the key is a separate entity. For map and multimap, this is handled by storing pairs in each node, combined with a functor extracting the key part to do the comparison between nodes.
The second difference is that set and map only allow unique keys, while multiset and multimap allow duplicates (equivalent keys). The only place where this difference is important is in the insert member functions. Consequently, my implementation has two versions of the insert operation—one for unique key containers and another for equivalent key containers. The four container classes basically keep an instance of the common implementation class and delegate their operations to this instance.

The Standard Library defines several operations on containers. This makes implementing look-alike associative containers a lot of work. Fortunately, you can implement some of the operations with the help of others. This means that after implementing the most basic operations, others can use these to implement their functionality. For example, the insert method taking an iterator range as arguments can use the single element insert as its working horse; see Listing One.

Listing One

template <typename InputIterator>
void insert( InputIterator first, InputIterator last )
{
   iterator pos = end();
   for( InputIterator it = first; it != last; ++it )
   {
      pos = insert( pos, *it );
   }
}

The Standard comes to the rescue by defining the semantics for some of the operations in terms of other operations. For example, the method equal_range is defined in terms of lower_bound and upper_bound. All in all, most of the implementation work is done after implementing the tree manipulation, iterators, and basic versions of insert and erase.

Many books on binary search trees suggest implementing the tree manipulation using recursion. This seems obvious because of the recursive nature of the binary tree structure (Listing Two). However, it isn't really a good idea. Programming languages like C++ allow recursion, but they aren't built for huge recursion depth. Huge recursion depth is exactly what you get when manipulating large binary search trees. For C++ implementations, a better approach is to base the tree manipulation on an iterative implementation; see Listing Three.

Listing Two

void internal_clear( node_base* n )
{
   if( n != 0 )
   {
      internal_clear( n->left );
      internal_clear( n->right );
      destroy( n );
   }
}

Listing Three

void internal_clear( node_base* n )
{
   while( n != data_ )
   {
      if( n->left != 0 )
      {
         n = n->left;
      }
      else if( n->right != 0 )
      {
         n = n->right;
      }
      else
      {
         node_base* p = n->parent;
         if( p->left == n )
         {
            p->left = 0;
         }
         else if( p->right == n )
         {
            p->right = 0;
         }
         destroy( n );
         n = p;
      }
   }
}

Implementing tree handling with loops instead of recursion doesn't do anything good for the readability of the code. On the other hand, it makes it possible to manipulate huge search trees, and is faster.

Code and Data Layout

Like all template classes, my splay-tree implementation has the potential of generating huge amounts of code if used with different types, especially if the compiler does aggressive inlining. To fight this, I split my implementation class and node class in two—a base class containing all the code not dependent on the template parameters, and a derived template class containing the rest of the code (Figure 5). This is a well-known idiom, but won't necessarily fight code bloat. It does, however, give the compiler an opportunity to share code between instantiations if it optimizes for minimal space. And while splitting the implementation into two classes doesn't improve the readability of the code, it seems to be a necessity if the implementation has to be used in serious projects. For the node class, this trick works quite well. Almost all code is moved to the base class; only a constructor remains. All nodes in the tree are accessed through pointers to node_base. If it is necessary to retrieve the element in the node, a static_cast from pointer to node_base to pointer to node is done. This is safe because all nodes are of type node, even though they are pointed to by node_base pointers. As an extra benefit, it's fast because the static_cast probably won't generate any code (the two pointers probably have the same value).

Figure 5: Division of classes into template and nontemplate part.

The trick didn't work as well for the splay_tree class. I had originally planned to put some of the code that didn't depend on the element type (such as the method size and the variable size_, which holds the number of elements) in the base class. Instead, it went into the template class. This was necessary because they depend on the type size_type, defined as an alias to the type size_type in the allocator, thereby indirectly depending on the element type. For other functions, I placed some of the tree manipulation code in splay_tree_base, but not as much as I had hoped for. The reason for this is that code dependent on the template parameter is interleaved with code that is not dependent, making it almost impossible to untangle the code in two parts without changing the semantics.

It is possible to implement splay trees with only an element and two pointers to child nodes in each node. Such an implementation would use top-down splaying. In my implementation, the nodes hold a parent pointer to make bottom-up splay work, and to facilitate iterators.

I wanted the iterators to be lightweight objects by only having a reference to the current node. This is important because iterators are always passed around by value and must be efficient to copy. The iterator must be able to find the previous and next node by knowing the current node. The basic algorithm to find the next node is: If the current node has a right subtree, find the left-most node in that subtree; otherwise, find the parent node by walking up parent pointers until the current node is in the left subtree of that node. The algorithm to find the previous node is analogous.

A bottom-up splay has to traverse upward through the tree, which is greatly simplified by using parent pointers. Bottom-up splaying seems like a natural fit for splay-tree implementations with an associative container interface because not all splay operations are initiated from the root. Some versions of the insert and erase functions take an iterator instead of the key as an argument to reference the wanted element, thereby omitting the initial search for the element (and eliminating the chance for a top-down splay to do its work). When I started implementing my splay tree, I planned to use only bottom-up splay because it is the easiest to implement. However, I also decided to implement a top-down splay for performance reasons. I use the top-down splay in the version of insert, only taking a value, and in the find function. In both cases, a search for a key is followed by a splay operation. Originally, I also planned to use top-down splaying in the implementation of lower_bound and upper_bound, but that didn't work well for containers supporting equivalent keys.

The End-Iterator

I was surprised by the impact iterators had on my implementation. The most important one was how to represent the end-iterator. In my implementation, the iterator holds a pointer to the referenced node. If I only wanted to support forward iterators for my containers, I could represent an end-iterator using a null pointer. However, I wanted to support bidirectional iterators like the standard associative containers. To make it easy, I introduced a special node in the splay tree pointed to by the iterator to represent the end (Figure 6). It was most convenient if the special node stayed at the top of the tree—above the root of the tree—because this position naturally follows the right-most node. Hint: In a tree traversal, all nodes in the left subtree are visited, then the node itself, and finally all nodes in the right subtree. By placing the node at the top, no special codes have to be inserted in the method finding the next node, making it very efficient. Again, the node is implemented via two classes: a class node_base containing all the pointers to other node_base's, and a derived class node holding the element. The special node at the top is of type node_base because it makes things easier if only one type is used for all nodes. The special node cannot be of type node because there is no value to associate with the element. I use a static_cast every time I want to dereference an element in a node. Introducing this special node means that not all node_base pointers point to a node, potentially giving undefined behavior. This doesn't leave me sleepless at night, though. The only node not being of type node is the special end-node. This node is only dereferenced if users dereference the end-iterator, which the standard deems as undefined behavior.

Figure 6: Tree layout. The special node "end" is used by the end iterator.

For the end-node, the node_base class contains three pointers. The left pointer points to the root of the tree. If the splay tree is considered the left subtree of the end-node, the method to find the previous node also works without extra code, making this both fast and elegant. This leaves two pointers untouched. The right pointer isn't used and is always null. The parent pointer is used to hold the left-most node in the tree (Figure 6). The left-most node is used in the call to begin, which must return the first node in the sequence. The Standard requires this to be a constant operation, disallowing a search from the top for the left-most node in the call to begin. Even though this is a hack, it makes it possible to minimize the state of the container. I can just as well hold an additional pointer to the left-most node of the tree in the container itself, and just leave the parent pointer null.

Performance

How fast is a splay tree? That depends. If all elements are accessed with equal frequency or in a random fashion, the splay-tree implementation rearranges elements without gaining an edge over a traditional balanced search tree. Fortunately, many real-world problems have some unbalance in the access frequency [3], giving splay trees an opportunity to move the most frequently accessed elements near the top of the tree.

While I have also made my own measurements, I won't reproduce them here because they depend on the compiler and might be misleading. I made various synthetic tests ranging from worst case to best case for a splay tree. I compared the time for splay-tree-based containers to that of the standard containers. The results were as expected. The performance of the splay-tree implementation varied from half as fast as the standard container in the worst case, to about 50 times as fast as the standard container in the best case.

So splay trees may be faster than balanced search trees, but how about memory consumption? In an ideal world, splay trees would use less memory than balanced search trees because they do not store extra information in each node to help rebalancing. But the world isn't perfect! The memory saved by a splay tree is typically spent on alignment issues, so in most cases, the size of a node in a splay tree and in a balanced tree are the same.

Thread Safety

How about thread safety? Typically, thread safety of a container refers to two properties. The first property is that different instances running on different threads are independent. This is easy to guarantee if the states of two containers are independent. I don't use static member or global variables, so independence of instances is guaranteed without me doing anything special. The second property is that multiple threads calling accessor functions on the same instance should be safe. My implementation doesn't guarantee this, because during many calls to accessor functions (for example, a call to find), the tree is restructured. During this restructuring, many pointers to node_base's are updated. I don't know of a method to make these updates threadsafe without incurring significant overhead, especially if the thread-safety property isn't needed. The bottom line is that my splay-tree implementation isn't threadsafe, so if you use it in multithreaded code, you should make sure all accesses to the container are properly synchronized, even if they are just calls to accessor functions. If there's one weakness to the splay tree, this is it! Traditional balanced search trees only do restructuring during insertion/deletion. This makes it easy to guarantee that multiple threads can call accessor functions on one instance simultaneously.

Conclusion

Extending the Standard Library with a new algorithm isn't that hard because the interface to the algorithm is small and easily comprehensible. On the other hand, extending the Standard Library with a new container can be challenging because containers in the Standard Library have an extra iterator feature compared to the traditional way data structures are described. This makes it hard to copy code from a text book for a baseline of the implementation. I also found that implementation decisions sometimes depend on the small print in the standard.

Splay trees are just a special kind of binary search tree. My implementation can therefore use the same interface as the standard associative containers. The only difference to the standard containers is their performance characteristic. Whether splay trees are the right thing for you depends on your application. If the elements accessed most frequently are a small subset of all elements in the container, it might be worth changing to a splay-tree implementation to improve the performance of your application.

Acknowledgments

Thanks to Claus Tndering for constructive feedback and Kristian Lippert for commenting on draft versions.

References

Sleator, Daniel Dominic and Robert Endre Tarjan. "Self-Adjusting Binary Search Trees," ACM Journal, July 1985.
Sedgewick, Robert. Algorithms in C++: Fundamentals, Data Structures, Sorting, Searching Parts 1-4, Third edition, Addison Wesley, 1998; ISBN 0201350882.
Pfaff, Ben. "Performance Analyses of BSTs in System Software," 2004; http://www.stanford.edu/~blp/papers/libavl.pdf. q

Ralf Mattethat is a senior consultant at the Danish Technological Institute. He can be contacted at [email protected].