Algorithms often require fast dynamic set operations -- SEARCH, INSERT, DELETE, and SUCCESSOR/PREDECESSOR, to name a few. One rich dynamic set interface is
std::set, a part of the Standard C++ Library. Library vendors typically implement this interface using a red-black tree, though other types of balanced search trees can also be used. Implementing any sizable dynamic set interface efficiently can be a daunting task.
In this article, I describe a different approach to implementing a dynamic set -- the alternating skip list (ASL). ASLs are an option anywhere that balanced search trees are appropriate. I present a useful subset of
std::set functionality using ASLs, and give a basic space/time comparison against a typical red-black tree implementation. If you are developing advanced data structures based on balanced trees, you will likely profit from the ASL approach.
The 1-2 Deterministic Skip List
Unlike the probabilistic skip list (PSL), described by William Pugh in "Skip Lists: A Probabilistic Alternative to Balanced Trees" (Communications of the ACM, June 1990), deterministic versions of skip lists (DSLs) can support SEARCH, INSERT, and DELETE in logarithmic time in the worst case. ASLs are derived from the array form of the 1-2 deterministic skip list (see "Deterministic Skip Lists," by J.I. Munro, T. Papadakis, and R. Sedgewick, Proceedings of the ACM-SIAM Third Annual Symposium on Discrete Algorithms, January 1992), which serves as the starting point for our discussion. Figure 1 shows an array implementation of a 1-2 skip list.
Figure 1: Array implementation of a 1-2 skip list.
In array form, a 1-2 DSL can be viewed as a size
N singly linked list
augmented with variable-size nodes. Each node contains an array of forward pointers
of some exponential capacity: for example, a power of 2 (1, 2, 4, 8, and so on).
Like an unmodified list, one pointer links to a SUCCESSOR node. Nodes with two
or more pointers employ a fraction of the remaining pointers as skip pointers.
(Not all are necessarily used; more on this later.) The first skip pointer links
either to the second or third node down. That is, it skips over at least one,
but at most two nodes. This creates another (level 2) linked list with fewer
nodes. Second (level 3) skip pointers likewise skip over nodes in the level
2 list. The pattern repeats to a maximum of
and represents the "balance" or "gap" invariant of the 1-2 skip list. J.I. Munro
et al., refer to the "height" of a node as the number of linked lists in which
it is contained. Assuming extra HEADER and TAIL nodes 1 higher than the highest
ordinary node, then "between any two nodes of height
or higher there exist either 1 or 2 nodes of height
As with PSLs, SEARCH in a 1-2 DSL is a matter of efficient forward movement using skip pointers. INSERT and DELETE involve adding or removing a node, then re-adjusting node heights until gap invariance is restored. Forward iteration is analogous to the case of the singly linked list.
The 1-2 DSL is not, however, sufficient for implementing all of
std::set. First, you need a doubly linked list at level 1 to support reverse iteration. Less trivially, some operations require the ability to move efficiently in both directions. For each forward skip pointer we could add another to move backward, but that increases memory requirements significantly. The approach I adopt is to alternate the direction of the skip pointers at each level and turn each list into a ring. This places the HEADER node at both ends, so we'll call it END from now on. Due to gap invariance, you can still determine the predecessor node at any given level in constant time.
Another consideration is how to represent nodes. In C/C++, you can implement variable size nodes by exploiting the lack of array bounds checking -- a common PSL technique. That is, one possibility is to store both elements and pointer arrays contiguously in memory. A second method is to use fixed size nodes containing pointers to the pointer arrays. The choice is easy, given that INSERT and DELETE require the ability to raise and lower node heights. In some cases, this involves exchanging one pointer array for another of a different physical size. The first approach forces us to replace an entire node. Iterators necessarily store pointers to nodes, and height adjustments could create dangling pointers. Iterators are not invalidated when fixed size nodes are used.
A disadvantage of fixed size nodes is that more memory allocations are required. Nodes and pointer arrays are allocated separately, incurring a time cost not present with variable size nodes. Since all nodes require both a SUCCESSOR and PREDECESSOR pointer, but not necessarily any skip pointers, it makes sense to store these separately. This saves at least N/2+1 allocations, because pointer arrays now need to be allocated only for nodes with skip pointers; by gap invariance, at least half of all nodes have height 1 and need no array at all.
A final modification is to require that END is always of even height; the definition of gap invariance is changed accordingly. With all of the aforementioned changes, the DSL in Figure 1 becomes the ASL in Figure 2.
Figure 2: An alternating skip list.
The Alternating Skip List
The program asl_set (available electronically; see "Resource Center," page 5) is an ASL implementation of a dynamic set. Two types each of SEARCH, INSERT, and DELETE are implemented. An iterator class provides for both SUCCESSOR and PREDECESSOR operations. For clarity,
asl_set is not a Standard C++ container, but if you are familiar with the interface requirements, you will be able to make the necessary changes without difficulty. With more work, you can even create a complete implementation of std::set.
It is because the search path changes direction at each level that I named this modified 1-2 DSL the "Alternating Skip List." Search in an ASL involves following the second-to-top level list until you encounter the END node or a node with element >= the search key. You then drop a level and continue in the opposite direction until you reach the END node or a node with element <= the search key. The process repeats until the bottom level, where the search key is subsequently found or determined not to be in the list. The search path for 7 in the ASL from Figure 2 is END, 11, 8, 4, and 7, for example. Gap invariance ensures that you never make more than two comparisons before dropping. It follows that search is O(logN).
The motivation for requiring that the END node keep even height is simply that implementation is easier if the search always begins in the same direction. Observe also that when implementing SEARCH, a subtle optimization is possible -- key comparisons against nodes with height greater than the current search level can be skipped. In such cases, it is always necessary to drop a level.
Now that you know how to search top-down for an element, you might ask how to search bottom-up (an operation not found in
std::set). That is, can you use an iterator as a search finger, or starting point, for the search? A poor hint iterator will result in slower search, but a perfect hint promises constant time lookup. To implement search fingers, you need only climb levels until you overshoot the element you wish to find. You then drop back down to the bottom, as in the top-down search procedure. In the worst case, you may ascend to the highest level before dropping. Search beginning with a hint iterator is therefore also logarithmic in the worst case.
To insert a node, simply search top-down and link into the bottom-level list. This gives the new node a height of 1. If there are now three nodes of height 1 in a row, increase the height of the middle node to 2. If this creates three nodes of height 2 in a row, raise the middle of these to height 3, and so on until gap invariance is restored. In the worst case, one node is raised at each level, finishing with the END node. Raising END is a special case because an increase of 2 is required to maintain even height. The same algorithm applies also to insertion using a hint iterator.
Figure 3 shows the transformations used by the insertion procedure. Figure 4 shows the result of inserting 0 into the ASL in Figure 2.
Figure 3: (a) Raising X, the middle of three nodes in a row (may leave X or Y as a middle node); (b) raising the END node. [Cases not shown are analogous to (a).]
Figure 4: Inserting 0 raises heights of 1 and 4.
Increasing the height of a node is straightforward. If the node height is less than the physical capacity of the skip pointer array, simply increment the height variable (which is stored in the node) and link into the list of the same level. If instead the node height is equal to the capacity of the array, copy the pointers over to a new array of twice the size -- the rest is the same as in the first case. The growth factor is not required to be 2, but it is important that the skip pointer arrays increase exponentially in capacity. To see this, suppose node heights always equaled the physical sizes of the skip pointer arrays. No skip pointers would be unused, but in the worst case you would end up copying 1+2+ ... +(logN+2) pointers, resulting in an O(log^2 N) insertion cost. The worst case improves to O(logN) if array sizes increase exponentially since most raisings will then take constant time.
A failure to allocate memory during the raising procedure would prevent the restoration of gap invariance. A remaining consideration is therefore error handling. The easiest remedy is to preallocate as many skip pointer arrays as necessary to handle the worst case. If the preallocation step fails, a new node is not created and the structure stays gap invariant. Adding a preallocation step is as simple as maintaining a free list of skip pointer arrays in increasing order of size. In addition, this provides a substantial speed improvement by eliminating many calls to the memory manager. When a larger skip pointer array is required, a replacement is removed from the free list and the discarded array is added. The effect is to iterate over the free list, always returning all but the last array requested. One call to the memory manager then restores the free list to its previous state. In the worst case, the preallocation step consists of both this call and allocating one new array to compensate for a raised END node.
To delete an element, search to find the node to be removed. If the node has height greater than 1, lower it to 0 by passing the skip pointer array to its SUCCESSOR or PREDECESSOR node and updating all predecessor skip pointers. Finally, unlink from the bottom-level list. If there are now two nodes with heights greater than 1 in a row, lower one of them from 2 to 1. If this creates three nodes of height 1 in a row, finish by raising the middle of these -- gap invariance is restored. If there remain two nodes with heights greater than 2 in a row, lower one from 3 to 2. If this creates three nodes of height 2 in a row, finish by raising the middle node. Otherwise, check for two nodes with heights greater than 3 in a row, and so on. In the worst case, the skip pointer array of the highest node will be passed to a neighbor, and one node will be lowered at each level finishing with the END node. As with raising, lowering END is a special case because a decrease of 2 is required to maintain an even height. Note that since gap invariance is restored bottom-up, the same algorithm applies to deletion with an iterator argument.
Figure 5 shows the transformations used by the deletion procedure. Figure 6 shows the result of deleting 19 from the ASL in Figure 2.
Figure 5: (a) Lowering X and raising Y (gap invariance is restored); (b) lowering X [Y becomes X in (a), (b), or (c)]; (c) lowering X (gap invariance is restored); (d) lowering the END node. [Cases not shown are analogous to (a), (b), or (c).]
Figure 6: Deleting 19 lowers 16 and 11, and raises 8.
It is easier to lower node heights than to raise them because we do not need to worry about memory allocation. To lower the height of a node, simply decrement the height variable. If the node height is now equal to one half the capacity of the array (assuming a growth factor of 2), copy the pointers over to a smaller array exactly one half the size. It is easy to see that like insertion, the deletion algorithm is
O(log N). Each time a smaller skip pointer array is required, the previous one released is used as the replacement. When gap invariance is finally restored, the last array discarded is added to the free list. If the free list already contains an array of this size, it is simply returned to the memory manager.
Tables 1 and 2 show basic space and "wall clock" time measurements for
asl_set and a red-black tree implementation of
std::set. Integers from 0 to
N were inserted in random order, searched for, and then deleted. The insertion and deletion algorithms for the red-black tree are based on those in Introduction to Algorithms, by Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest (McGraw-Hill, 1990). The test environment did not include virtual memory.
Table 1: Time comparison (in seconds, with relative times shown in parentheses).
Table 2: Space comparison after N insertions.
The times for
asl_set are faster than those of the red-black tree, though the numbers recorded give only a limited basis for comparison. The red-black tree uses a memory pool to improve performance, for example. A more comprehensive time comparison is beyond the scope of this article, so I'll state only that ASL algorithms appear to be competitive.
asl_set compares less favorably in terms of space. It is clear that
asl_set uses more memory than the red-black tree. The difference is actually even greater than reported, however. ASLs make more calls to the memory manager than balanced search trees in general, incurring additional space overhead not recorded in Table 2. When a typical allocator receives a request for a block of memory, a slightly larger amount is reserved. The extra space stores the size of the block (and perhaps other information) for later use if it is released. Since ASLs require many small arrays of fixed sizes, this additional space overhead quickly adds up. This suggests that space requirements can be reduced by using a memory pool. In addition, one could choose to store node heights together with the skip pointer arrays.
I have described a data structure based on the array form of the 1-2 deterministic skip list. Readers may not be surprised to discover that there is a one-to-one correspondence between ASLs, 1-2 DSLs, and 2-3 trees (a type of balanced search tree). ASLs happen to offer the same flexibility and complexity guarantees as balanced search trees, but have algorithms that are easier to express iteratively. They may therefore prove a worthwhile addition to your programming toolbox.
Laurence can be contacted at email@example.com.