Treaps in Java

Dr. Dobb's Journal July 1997: Treaps in Java

Stefan is a lecturer in the department of Computer Science at the Helsinki University of Technology. He is doing research in the area of algorithms and data structures. Stefan can be contacted at Stefan [email protected]

If you could use only one data structure, which one would you choose? A hash table? While it supports the basic insert, find, and remove operations, it doesn't keep the elements in sorted order. Therefore, it can't efficiently perform some tasks that are frequently encountered, such as finding the minimum element or producing an ordered list of all elements. The standard Java packages don't contain a data structure for ordered data.

What would you require of this ideal, sole structure? It should be easy to use (and preferably easy to implement); it should be able to hold an object of any class (as long as you provide a method for comparing objects); it should be efficient in terms of both speed and memory; and it should be thread safe.

The randomized search tree ("treap"), devised by C.R. Aragon and R. Seidel and described in "Randomized Search Trees" (Algorithmica, 16(4/5):464-497, 1996), fulfills all of these requirements, and it offers the functionality of a general-purpose sorting routine, priority queue, hash table, stack, or standard queue.

The Data Structure

The data structure consists of a balanced tree. Therefore, an element is never more than about log2 n steps away, where n is the number of elements in the treap and log2 n is the binary logarithm (you may think of log2 n as the number of bits needed to represent the number n in binary notation).

Being able to access the elements in the treap in log n time is a big improvement over some of the most common basic data structures; see Table 1.

The treap is reasonably space efficient as well. Each element is stored in a node in a tree, and each node contains an integer and two references, one to the left subtree and one to the right subtree. Hence, to store n elements in a treap, we need n integers and 2n references. The standard hash-table implementation in java.util uses n integers for the hash values and n references for the linked lists. Furthermore, it uses an array with a size that is typically close to n. For example, with a load factor of 1.0, the hash table will use the same amount of extra space as a treap if the table is full and more space if it's only partly filled. One problem with the java.util hash-table implementation is that the array doesn't shrink. Even if you remove all but one of the elements, the size of the array is still proportional to the maximum number of elements that were stored in the table during its lifetime. The treap doesn't have this problem.

Interface

Java doesn't have a general method for comparing two elements. Therefore, you must make sure that an object that will act as a key in a treap implements a comparison method. To assure this, the treap package contains an interface, Ordered, that declares the abstract method compareTo() (see Listing One). Any class that is to be used as a key in the treap must implement this interface.

To illustrate, I'll write a simple class, Year, that implements the interface Ordered. You are free to design the class as you see fit. The only requirement is that you implement the compareTo() method (see Listing Two). This is easy in this case: You just compare the integers representing the years. You need to use an explicit cast to tell the compiler that the argument is an instance of the class Year, because the comparison method implements the abstract method compareTo(), whose argument is of type Ordered.

With this out of the way, you can use the treap in the same way as a Java hash table. In this example, you use objects of the class Year as keys, and strings as values. The call to the method keys() returns an enumeration of the treap. The enumeration is the standard Java enumeration class, whose only purpose is to return the keys of the treap in sorted order. Similarly, the method elements() returns an enumeration of the values.

Implementation

If you want to add an element to the middle of an array, you have to move a large number of other elements to make room for the new one. The most common way to avoid this problem is to use a linked list, where the elements don't reside at consecutive memory positions but are instead linked together: Each element in the list has a reference to the next one. In a linked list, it's possible to insert an element in between two other elements just by changing two references. However, in a linked list, you can't directly access an element in the middle of the list, you have to follow a large number of links to find it. One standard technique to overcome this problem with arrays and linked lists is to use a search tree.

A search tree consists of a number of nodes. Each node contains a key and has two references, one to the left subtree and one to the right; see Figure 1. Note that the tree is ordered. All keys in the left subtree are smaller than the key in the node, and all keys in the right subtree are larger. This property holds for every node in the tree. Therefore, it's possible to search the tree in a simple manner. For example, to check if the key F is present in the tree, you start by comparing F to the key in topmost node (the root) of the tree. F is larger than E and you know that F must be in the right subtree (if it's in the tree at all).

Next compare F to H, the root of the right subtree. It's smaller, and you turn to the left, where you find the key you are looking for. You can use the same technique to insert a new node. If the key of the new node isn't already present in the tree, the search will take you to a position at the bottom of the tree where the new node can be inserted.

But you still have a potential problem. The algorithm above is efficient only if the tree is well balanced. For example, if each node in the tree has only one child, the structure will behave just like a linked list: You have to visit every node in the tree to find the one at the bottom. There are plenty of algorithms available that try to keep a search tree balanced. Most of them are rather complicated. The treap offers a simple solution.

You will use rotations (the simplest possible rebalancing operation) to help keep the tree balanced. The idea is to change the form of the tree without disturbing the order of the keys. The left tree in Figure 2 is transformed into the right one by a "right rotation." This operation is easy to implement -- only two references need to be updated, and the order of the keys is preserved. The corresponding balancing operation that transforms the right tree into the left one is called a "left rotation."

Now comes the stroke of genius. You want the tree to look as if it was constructed from a random sequence of updates. (As R. Sedgewick and P. Flajolet describe in An Introduction to the Analysis of Algorithms, Addison-Wesley, 1996, random trees are nicely balanced. The expected length of an average search path is roughly 1.4 log2 (n-2), and the expected length of the longest search path is about 4.3 log2 n.) To achieve this, you assign a random priority number to each node before you insert it into the tree, and you make sure that each node in the tree has higher priority than its children. If you do this, the tree will have the same form as if the nodes had been added in priority order (that is random order).

The insertion algorithm is best explained by an example. In Figure 3(a) the node G (with priority 2) has just been added to the treap. It has been placed in the correct position with respect to the ordering of the keys, but it violates the priority requirement: It has higher priority than its parent. To fix this you perform a left rotation, Figure 3(b), to move it higher up in the tree. G still has higher priority than its parent and we perform a right rotation; see Figure 3(c). Finally, G is rotated up to the root of the tree; Figure 3(d). Now the tree fulfills both the key order and the priority order requirements.

At first this algorithm may seem hard to implement. It looks as if you would have to walk up the tree, which is hard since the references point in the other direction. However, this problem is easily overcome if you formulate the insertion algorithm recursively; see the pseudocode in Example 1.

Compared to those of other balanced search tree schemes, the delete operation is also simple.

• If the node to be deleted is at the very bottom of the tree, you remove it directly and no rebalancing is necessary.
• If the node has only one nonempty subtree, you remove the node and replace it with the root of the subtree.
• If the node has two nonempty subtrees, you look at the priority numbers of the roots of the subtrees. If the left subtree has the highest priority, you rotate the root of this subtree to the left; otherwise, rotate the root of the other subtree to the right. This will move the node to be deleted further down the tree and you have reduced the original problem to a smaller one that may be solved by calling the delete algorithm recursively.

The only piece of code in the Java treap implementation that is somewhat complicated is the enumeration -- the object that returns the nodes in the tree in sorted order. A simple solution would be to have the enumeration store the latest key that was returned and find the next one by searching the tree starting at the root. To traverse the complete tree in this fashion would require at least n log n time. But it's possible to do it in linear time. The idea is to use an explicit stack to keep track not only of the current key but also the path from the root of the tree down to this node. To find the next node, you continue down the tree to the right (pushing the nodes onto the stack). If there are no elements in the right subtree, you instead backtrack on the path using the stack. The code is complicated because you need to check whether you arrived at the current node from the left or from the right.

Finishing Touches

To turn the treap into a really useful general-purpose data structure, you should add some more functionality, such as methods for reporting the size of the treap, removing the minimum (or maximum) element, and creating a clone or a string representation of the treap. You should also wrap up the classes and interfaces in a package and add Java style documentation. This is all straightforward pedestrian programming. The complete package is available electronically; see "Availability," page 3.

There are many ways to further optimize the code. Most of them will yield only minor improvements. For example, eliminating the recursion from the insert and delete methods will make the code harder to maintain and the gain is likely to be only marginal, since simple function calls are efficiently implemented on most modern architectures.

The one optimization that is most likely to significantly improve the speed is to minimize the cost of the comparisons. A drastic way to do this is to use integers, rather than objects, as keys. The modification of the code is slight. All variables of type Ordered should be changed to int, and all calls to compareTo() should be replaced by an integer comparison. This modification will have to be done manually, since Java doesn't (yet) offer a way to write code that operates on both basic data types and objects.

Conclusion

The failure to use an appropriate data structure to maintain an ordered dynamic set of elements is one major reason why so many programs are slow. This problem can be solved by using the right data structure (and the treap is a very good choice).

Listing One

```/* An interface for ordered objects. A class that is to be used * as a key in a treap must implement this interface. */
public interface Ordered
{
/* Compares two ordered objects. Returns the value 0 if this object equals
* the argument; a value less that 0 if this object is less than argument;
* and a value larger than 0 if this object is greater than the argument. */
int compareTo(Ordered anotherOrderedObject);
}
```

Listing Two

```import order.*;import java.util.Enumeration;

</p>
class Year implements Ordered
{
int year;
Year(int y) { year = y; }
public int compareTo(Ordered y) {
if (year > ((Year) y).year)
return 1;
else if (year < ((Year) y).year)
return -1;
else
return 0;
}
public String toString() {
return Integer.toString(year);
}
}
class TreapDemo
{
public static void main(String[] args) {
Treap treap = new Treap();
treap.put(new Year(1935), "Elvis Presley");
treap.put(new Year(1926), "Chuck Berry");
treap.put(new Year(1941), "Bob Dylan");
treap.put(new Year(1936), "Roy Orbison");
treap.put(new Year(1915), "Muddy Waters");
Enumeration k = treap.keys();
Enumeration e = treap.elements();
while (k.hasMoreElements()) {
System.out.print(k.nextElement() + ": ");
System.out.println(e.nextElement());
}
}
}
```

DDJ

More Insights

 To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.