A Template Implementation of Skip Lists

, January 01, 1997

January 1997/A Template Implementation of Skip Lists

A Template Implementation of Skip Lists

Michael Martinka

Skip lists are an interesting alternative to balanced trees. This template class encapsulates the essential data and logic of a generic skip list.

Introduction

For storing an ordered collection of objects, balanced trees are often the data structure of choice, because balanced trees provide fast access to data. However, balanced trees become inefficient when objects are frequently inserted and deleted from the collection. Each insertion/deletion requires expensive rebalancing of the tree. In this situation, skip lists may be the better choice of data structure. Skip lists perform the same tasks as balanced trees but use a probabilistic algorithm to avoid the overhead of balancing a tree after each modification of the collection.

Skip lists were first introduced in 1990 by William Pugh [1] . I have been working with skip lists since 1991 and have found them very effective in cases where I need to maintain a sorted collection of rapidly changing data. In this article, I show how to create a templatized skip list. A templatized skip list is one whose contained objects are all of a type as specified by a C++ template parameter. Specifying object type as a parameter at compile time enables creation of type-safe and efficient generic lists.

What is a Skip List

A skip list is a set of singly linked lists (see Figure 1) . These lists are ranked so that the bottom-most list contains all objects in the collection, the next list up contains a subset of the collection, and so on, so that each list contains fewer and fewer objects. Each object in the skip list can potentially reside in one or more lists. The number of lists in which the object actually resides is determined at random, without regard to the data or the number of objects currently in the collection. In Figure 1, the element labeled '2' resides in five lists; element '6' resides in two lists; the rest reside in only one.

Searching this structure proceeds at a rate comparable to that of searching a balanced tree. The search begins at the topmost list and proceeds along that list (horizontally) to and including the last element less than the element being searched for; the search process then drops down a list and continues in the horizontal direction, starting with the same element. When the search of the bottom list is completed, the search is finished. (Note that the search always terminates in the bottom list, even when the object sought could have been found in an upper list. Checking at every level for a match would destroy the skip list's speed advantage.) Figure 2 shows the search path used to find item 7.The average search speed for a skip list is O(log n).

The insertion and deletion routines use the same search process but also build an update vector which contains the pointers followed in the search. This update vector will contain one pointer from each list in the skip list. Each pointer will determine where in each corresponding list the object in question is to be inserted or deleted. This insertion or deletion occurs in each list just as if that list was a simple singly-linked list.

Two factors influence the search speed and memory usage of the skip list. They are the probability of a node being added to the next higher list and the maximum number of lists in the skip list. Pugh shows that the most efficient search will result when the probability, p, of adding a node to the next higher list is equal to 1/e, with 1/2 and 1/4 providing only slightly slower search performance. Selecting p = 1/4 also results in an average of 1.33 pointers per node, reducing memory overhead from a selection of 1/e. The other factor is the maximum number of lists, h, in the skip list. This number affects the number of objects the list can store, (1/p)^h, before search performance begins to degrade. For p = 1/4 and h = 15 the skip list can store approximately one billion objects before performance suffers.

Skip lists offer several advantages over balanced trees. Insertion times range from 1.5 times faster than non-recursive AVL trees to almost 4 times faster than recursive 2-3 trees. Deletion times show similar advantages. Pugh also mentions that skip lists offer advantages when multiple processors are updating the list in shared memory.

The main disadvantage of skip lists is their probabilistic nature. In Pugh's 1990 article he performs an empirical analysis of the probabilities of getting suboptimal search times. This analysis shows that for lists containing a large number of objects (greater than 256) the probability of a search taking more than twice the optimal time is about 1 in 1,000. This probability decreases rapidly as the number of objects in the skip list increases.

A C++ Template Implementation

Overview

The C++ implementation contains two template classes. Skiplist encapsulates the skip list and its programmer interface; SLPosition represents a single node in the skip list (see Listing 1) . The implementation shown is a two-parameter template, with one parameter for the object type and one for the sort key type. This second parameter allows the objects stored in multiple skip lists to be ordered by keys of different types. The implementation requires that the KEY class have a copy constructor, and less-than and equal-to operators defined.

Class SLPosition

The implementation creates an SLPosition object for each object stored in the skip list. In an ideal implementation, SLPosition would be a nested class of the Skiplist class; however, I have not found a compiler that will allow nested templates.

SLPosition's data member is a pointer to the object being stored. The key member contains a copy of the sort key. The forward member is an array of pointers. This array contains one pointer for each list that contains this SLPosition object.

The SLPosition class contains two constructors, shown in Listing 2. The first accepts only an integer; it is used by the Skiplist to build the head of the list. The SLPosition object created using this kind of constructor contains no key or data. The second constructor accepts an integer, a pointer of type DATA, and a KEY. This constructor allocates sufficient space to hold a pointer for each list of which this SLPosition will be a part. The number of lists is the first argument.

Skiplist

The Skiplist class constructor, Listing 3, builds the head of the skip list by calling the SLPosition constructor for the maximum number of lists to be used by the skip list. Building a full head for the list will waste memory if the skip list contains only a few items. If the skip list is used to store a small collection of objects it may be prudent to allocate only one pointer in the constructor and then reallocate the list head as needed.

Skiplist's find member, Listing 4, searches the skip list for an object with a key that matches the key parameter. find returns a pointer to an object with a matching key, or NULL if no object in the skip list has a matching key. The search begins at the top-most non-empty list and proceeds as previously described until it reaches the bottom list. At each list, the search procedure is the same. The search traverses each list until it encounters the last object whose key is less than the search key. After it has searched the bottom list, the search routine advances the pointer by one SLPosition and returns a pointer to the object if the SLPosition's key matches the search key. Otherwise, the search failed and the search routine returns a NULL.

Insertion

The insertion function, insert (Listing 5) , performs a search using the key of the object to be inserted. For each list in the skip list, the algorithm stores a pointer to the SLPosition that is the last in the list with a key that is less than the key of the object to be inserted. These pointers are stored in the update array. After constructing the update array, the insert function builds a new SLPosition for the object being inserted. The rand_level member determines how many lists the new SLPosition will be a part of. insert picks this number of lists so that roughly 1/4 of the SLPositions in list 0 (the bottom list) will be contained in list 1; 1/4 of the SLPositions in list 1 will be contained in list 2; and so on. The algorithm then inserts the SLPosition object into the lists. Each list uses the pointer saved in the update array to identify the object whose key is less than the one to be inserted. The insertion itself is done just as it would be for a singly-linked list.

Deletion

Removing an object from the skip list is similar to inserting one The remove function (Listing 6) constructs an update array using the key of the object to be removed, just as was done in the insert function. After constructing the update array the delete function removes the SLPosition object from each list in which it is a part. Each list uses the pointer saved in the update array to identify the object whose key is less than the one to be removed. Again, removing the object proceeds just as it would for a singly linked list. delete returns a pointer to the object removed. delete returns NULL if it finds no matching object.

An Example

Listing 7 shows a test program for the Skiplist class. The program instantiates the Skiplist template class for objects of type person, and keys of type int. person is a struct, which stores a name and age for each person. Although this program is fairly trivial, it demonstrates the basics of using a skip list. It shows how to create a list of people ordered by age, insert and delete people from the list, and how to search by age.

Summary

Skip lists offer a more efficient alternative to balanced trees when the number of insertions and deletions is comparable to the number of searches. This implementation demonstrates the basic capabilities of skip lists. Skip lists also allow for simple implementation of more advanced capabilites, such as merging and searching for objects matching a range. A more complete implementation would contain iterators to traverse the list and handle cases of multiple objects with the same key value.

Acknowledgment

I would like to acknowledge Barrett Richey of BTG, Inc., whose C implementation first introduced me to skip lists.

References

[1] William Pugh. "Skip Lists: A Probabilistic Alternative to Balanced Trees," Communications of the ACM. June 1990, Vol. 33, Num. 6, pp. 668-676.

[2] Frederick W. Hegeman. "Skip Lists," The C Users Journal. Febraury 1991, Vol. 9, Num. 2, p. 33.

Michael E. Martinka has an M.S. in Computer Science from George Washington University. He is a software engineer with QuesTech, Inc., a systems development firm in the Washington D.C. area. He has been programing in C and C++ for the last five years. He may be contacted at [email protected].

Previous 5 6 7 8 9 10

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

A Template Implementation of Skip Lists