Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Design

Splay Trees


DEC92: SPLAY TREES

SPLAY TREES

For fast access of frequently accessed data

Dean Clark

Dean is a programmer at Logicon R&D Associates in Albuquerque, New Mexico, specializing in computer graphics. He can reached on CompuServe at 71160,2426.


We tend to view software programs as composed of two distinct parts: code and data. We think of code as the animated, active component, while data just sort of lays around wherever we happen to put it. As programmers, one of our duties is to arrange for data to be stored in such a way that our programs can access it efficiently. Usually we assume that once we've put the data in place, it stays put.

The very term "data structure" implies something static, like a bridge or a sky-scraper. Most data structures do not provide a mechanism to rearrange data to account for changing conditions as a program executes. However, in many situations we could improve performance if the data itself could respond to the way the program uses it. What we'd like to have, in fact, are self-adjusting data structures.

Coincidentally, such beasts do exist. They not only modify their shape after obvious operations like insert and delete, they also adjust themselves for innocuous operations like find.

Self-adjusting data structures have been around for quite a while. A type of self-adjusting linear list is used to implement the least-recently-used paging scheme in virtual-memory systems. More recently we have the introduction of skew heaps, Fibonacci heaps, red-black trees, and (drum roll ... ) splay trees.

A splay tree is a normal binary search tree (BST) in all outward respects. The difference is that the standard tree operations--insert, delete, and find--are implemented in terms of an operation called SPLAY.

The easiest way to describe the SPLAY operation is by example. Suppose you want to find a particular key K in a tree (assume K is in the tree). The first step is to search for K in the tree. SPLAY locates the node by performing a normal tree traversal.

Now comes the weird part. Instead of just returning K, SPLAY promotes it to the root of the tree via a series of operations called "rotations." Find then returns the root of the tree. These rotations reorganize the tree such that skewed trees tend to become bushier and therefore more balanced. The node containing the target of the search is now at the root of the tree, so the next time you go looking for it, it'll be waiting right near the top.

That's the real reason for splay trees; frequently accessed data stays near the root of the tree, so most accesses are very fast. Suppose a bank kept its customer database in a splay tree. Frequent customers would tend to be near the root of the tree, while inactive customers would be located near the leaves. Since records for frequent customers are accessed more often, overall performance is improved.

At the same time, SPLAY tends to balance skewed trees. A splay tree is not a balanced tree in the sense that height balancing is one of its invariants, and it can have any shape (even all left or all right children, just like a BST. Nevertheless, for any sequence of k insert, delete and find operations, on a tree of size N, splay trees guarantee that we'll do O(k lg N) work.

Hold it. If a splay tree can be just as skewed as a BST, why isn't its worst case O(n), like the BST? Because the splay tree adjusts itself. In fact, the O(k lg N) bound for the splay tree is shown not through traditional worst-case analysis but by using amortized analysis. Amortized analysis deals with sequences of operations instead of focusing on a single horrible situation like traditional techniques.

Rather than showing a formal proof, let's see if we can get a feel for the splay tree's behavior by comparing it to a plain vanilla binary search tree. The worst thing that can happen in a BST is for all nodes to end up on either the left or right side. If we access the very bottom node, we pay O(n). We could do that forever, paying the same price each time. We also pay O(n) (worst case) to build our unfortunate tree in the first place. Overall cost: O(n).

What if we try the same thing with the splay tree? First of all, in order to get a linear tree, all the nodes must have gone to the root of the tree, so we only pay O(1) for each insert, not O(n), as is the worst case for the ordinary BST. Now, finding the very bottom node of a linear splay tree is about the same amount of work as finding it in a BST. But since all the inserts were so cheap, on the whole we've done much less than O(n) work--in fact, just about O(lg n).

Furthermore, since finding the bottom node means we "splayed" the tree, the whole tree is no longer linear. In fact, it's about half as deep as it was, so if we go get the deepest node again, it's only half as far to go, which again reduces the tree depth, and so on. Eventually, the tree is just about balanced, and all finds take about O(lg n) work.

The gist of the splay tree analysis is that some operations are worse than lg n, and some are better. For any possible combination of operations, the good ones always balance the bad ones to result in O(lg n) overall behavior.

In a splay tree, the find operation is implemented as a SPLAY operation to promote the node to the root. The find operation then simply returns the root of the tree (assuming the node is in the tree). The other common BST operations, delete and insert, are implemented similarly. For insert, the tree is traversed to find the right place for the new node, a normal insert is performed, and then SPLAY is called to promote the new node to the root.Figure 1 illustrates the insert operation.

Deleting is slightly more complex. First, perform a SPLAY on the node to be deleted; this promotes the node to the root. Delete the node; this leaves you with two orphaned subtrees. Pick one of the subtrees (say the left one) and again SPLAY on the node just deleted. Of course the node won't be found; you'll get the in-order successor or predecessor to the node, which becomes the new root of the tree. Because everything in the right orphaned subtree must be greater than everything in the left orphaned subtree, the right subtree is simply attached to the new root. Delete therefore requires two splayings on the tree. Figure 2 illustrates a delete operation.

Rotations

The basis of splaying is the rotations that bring the target node to the root. Three kinds of rotations correspond to three patterns of subtrees consisting of the key node, its parent, and its grandparent. The rotations are called zig, zig-zig, and zig-zag.

The zig rotation is easiest and can only occur when the key node's parent is the root of the tree. This is illustrated in Figure 3, where K is the key node and P is the parent. For the zig-zig and zig-zag rotations, the key node has both a parent and grandparent. If the key node K is the left child of a left child, then a zig-zig rotation is called for, as shown in Figure 4. Finally, if the key is the right child of a left child, then we do a zig-zag rotation; see Figure 5.

Each rotation has its mirror twin: That is, we also zig when the key is the right child of the root, zig-zig for a right child of a right child, and zig-zag for a left child of a right child.

It's easy to see that a zig-zag rotation tends to make the tree more balanced. It's not as readily apparent that the other rotations help the balancing as well. Let's try an example.

Assume we have a BST with all left children and we want to find the left-most child. Figure 6 shows the tree before SPLAY gets ahold of it. Performing a SPLAY on node 1 results in the intermediate steps shown in Figure 7.

It's not readily apparent from an example with only nine nodes, but the resulting tree is roughly half as deep as the original. Notice also that only zig-zig rotations were necessary.

The Code

The subroutines for the six SPLAY rotations are in Listing One (page 106). All rotations require that we have some way to get to the parents of a node. For bottom-up splaying (it's also possible to splay from the top down), we can either push the traversal path onto a stack or store parent pointers in the nodes themselves. I've chosen the latter approach.

Listing Two (page 106) contains the SPLAY routine itself. Listing Three (page 107) contains generic find, delete, and insert routines. You must supply functions for comparing keys and recovering memory used by a deleted node, because these are data dependent.

Note that the overhead for the rotations is pretty low, just a few pointer assignments and a couple of tests. This, combined with the fact that the number of rotations required is roughly half the height of the tree, means that splay trees have good overall performance compared to standard BSTs.

Conclusions

Splay trees are one example of a self-adjusting data structure, a kind of data structure that rearranges itself in response to changing program operations. They're easy to code and maintain, have low overhead, and can really improve performance for data-access situations that are heavily skewed but not predictably so.

References

Moret, Bernard M.E. and Henry D. Shapiro. Algorithms from P to NP. vol. 1. Redwood City, CA: Benjamin Cummings, 1991.

Sleator, Daniel D. and Robert E. Tarjan. "Self-Adjusting Binary Search Trees." Journal of the ACM (July, 1985).

Weiss, Mark A. Data Structures and Algorithm Analysis. Redwood City, CA: Benjamin Cummings, 1992.



_SPLAY TREES_
by Dean Clark


[LISTING ONE]
<a name="02c7_0009">

void zig_left(TREENODE *t)
{
    TREENODE    *p, *r;
    p = t->parent;
    r = t->right;
    t->right = p;
    if (r) {
        r->parent = p;
        }
    p->left = r;
    p->parent = t;
}
void zig_right(TREENODE *t)
{
    TREENODE    *p, *l;
    l = t->left;
    p = t->parent;
    p->right = t;
    if (l) {
        l->parent = t;
    }
    p->right = l;
    p->parent = t;
}
void zig_zig_left(TREENODE *t)
{
    TREENODE    *p, *pr, *g, *r, *gp;
    p = t->parent;
    g = p->parent;
    gp = g->parent;
    r = t->right;
    pr = p->right;
    p->right = g;
    g->left = pr;
    t->parent = gp;
    t->right = p;
    p->left = r;
    if (r) {
        r->parent = p;
    }
    if (pr) {
        pr->parent = g;
    }
    if (gp) {
        if (gp->left == g) {
            gp->left = t;
        }
        else {
            gp->right = t;
        }
    }
}
void zig_zig_right(TREENODE *t)
{
    TREENODE    *p, *pl, *g, *l, *gp;
    p = t->parent;
    g = p->parent;
    gp = g->parent;
    l = t->left;
    pl = p->left;
    p->left = g;
    g->right = pl;
    t->parent = gp;
    t->left = p;
    p->right = l;
    if (l) {
        l->parent = p;
    }
    if (pl) {
        pl->parent = g;
    }
    if (gp) {
        if (gp->left == g) {
            gp->left = t;
        }
        else {
            gp->right = t;
        }
    }
}
void zig_zag_left(TREENODE *t)
{
    TREENODE    *p, *gp, *ggp, *l, *r;
    p = t->parent;
    gp = p->parent;
    ggp = gp->parent;
    l = t->left;
    r = t->right;
    t->parent = ggp;
    t->left = p;
    t->right = gp;
    gp->parent = t;
    p->parent = t;
    p->right = l;
    gp->left = r;
    if (l) {
        l->parent = p;
    }
    if (r) {
        r->parent = gp;
    }
    if (ggp) {
        if (ggp->left == gp) {
            ggp->left = t;
        }
        else {
            ggp->right = t;
        }
    }
}
void zig_zag_right(TREENODE *t)
{
    TREENODE    *p, *gp, *ggp, *l, *r;
    p = t->parent;
    gp = p->parent;
    ggp = gp->parent;
    l = t->left;
    r = t->right;
    t->left = gp;
    t->right = p;
    t->parent = ggp;
    p->parent = t;
    gp->parent = t;
    gp->right = l;
    p->left = r;
    if (l) {
        l->parent = gp;
    }
    if (r) {
        r->parent = p;
    }
    if (ggp) {
        if (ggp->left == gp) {
            ggp->left = t;
        }
        else {
            ggp->right = t;
        }
    }
}




<a name="02c7_000a">
<a name="02c7_000b">
[LISTING TWO]
<a name="02c7_000b">

/* Splay functions
**  Assumptions: Tree is pointed to by global variable T,
**   type KEYTYPE
**     External function Compare_Key(KEYTYPE k1, KEYTYPE k2) returns
**         0 if the two keys are equal, -1 if first key is less than
**         second, 1 if first is greater than second
**     External rotation functions from listing 1
*/

Splay(KEYTYPE k)
{
    int gle;
    /* Assume global tree T.  Traverse T looking for key.  Compare_Key()
    ** returns 0 if the two keys are equal, -1 if the first is less than
    ** the second, 1 if first is greater than the second */
    while ((gle = Compare_Key(T->key,k)) != 0) {
        if (gle > 0) {
            if (T->left) {
                T = T->left;
            }
            else break;
        }
        else {
            if (T->right) {
                T = T->right;
            }
            else break;
        }
    }
    /* T now points to the node containing the key k, or to the inorder
    ** successor or predecessor to k. We don't really care which at
    ** this point. T will be root when its parent pointer points to itself */
    while (T->parent != T) {
        if (T->parent->parent) {
            /* zig-zig or zig-zag*/
            if (T->parent->parent->left == T->parent) {
                if (T->parent->left == T) {
                    zig_zig_left(T);
                }
                else {
                    zig_zag_left(T);
                }
            }
            else {
                if (T->parent->right == T) {
                    zig_zig_right(T);
                }
                else {
                    zig_zag_right(T);
                }
            }
        }
        else {
            /* zig */
            if (T->parent->left == T) {
                zig_left(T);
            }
            else {
                zig_right(T);
            }
        }
    }
}






<a name="02c7_000c">
<a name="02c7_000d">
[LISTING THREE]
<a name="02c7_000d">

/* Splay Tree Standard Operations -- Find, Delete and Insert functions for
**   splay trees. Assumptions:
**     Tree is pointed to by global variable T, type KEYTYPE
**     External function Compare_Key(KEYTYPE k1, KEYTYPE k2) returns
**         0 if the two keys are equal, -1 if first key is less than
**         second, 1 if first is greater than second
**     External function Destroy_Key(KEYTYPE k) that recovers memory
**         used by a key
**     Special KEYTYPE variable ERROR_KEY used as an error sentinel
*/

KEYTYPE Find(KEYTYPE k)
{
    Splay(k);
    /* If k was in the tree it's now the root node */
    if (T->key == k) {
        return (T->key);
    }
    else {
        return (ERROR_KEY);
    }
}
void Delete(KEYTYPE k)
{
    TREENODE    *l, *r;
    /* Bring target node to the root */
    Splay(k);
    /* Make sure key was in the tree... */
    if (Compare_Key(T->key, k) == 0) {
        /* Detach the node and dispose of it */
        l = T->left;
        r = T->right;
        Destroy_Node(T);
        /* Splay left subtree to find the new root of tree (see text) */
        T = l;
        Splay(k);
        /* Root and left subtree are fine now, just attach right subtree */
        T->right = r;
    }
}
void Insert(KEYTYPE k)
{
    TREENODE    *x;
    /* Insert the new node into the tree */
    Attach_Node(k);
    /* Now Splay to bring the new node to root. This is somewhat inefficient
    ** because Splay searches the tree all over again. */
    Splay(k);
}














Copyright © 1992, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.