Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Database

Skip Lists


JAN94: Skip Lists

Skip Lists

They're easy to implement and they work

Bruce Schneier

Bruce is the author of Applied Cryptography (John Wiley & Sons) and can be reached at 730 Fair Oaks Ave., Oak Park, IL 60302.


Binary trees look great on paper, but they can have problems in real-life implementations. If nodes are inserted and deleted randomly, binary-tree performance is unmatched. If, however, nodes are inserted in order, the binary-tree structure degenerates into a linked list, and performance plummets. There are tree-balancing algorithms in every post-Knuth book on algorithms ever written, but they are difficult to implement and consume a lot of execution time. Dean Clark's "Splay Trees" (DDJ, December 1992) and my article, "Red-Black Trees" (DDJ, April 1992) describe binary trees with almost-balanced properties that are relatively efficient. Skip lists are even better.

Skip lists were invented by William Pugh at the University of Maryland as an easy alternative to balanced trees. To understand skip lists, we first have to look at their derivation from simple linked lists.

A linked list is a dynamic data structure in which every node points to the node after it. It's easy to implement, but to find a particular node, you have to look at every node before it in the list; see Figure 1(a). If, however, every other node had an additional pointer that skipped to the node two ahead of it, you would only have to look at every other node in the list before you found the node you were looking for, and then maybe one more; see Figure 1(b). If, in addition, every fourth node had yet another pointer to the node four ahead of it, you would only have to look at about every fourth node plus another two before you found the one you wanted; see Figure 1(c). And so on. This would be great for searching--as efficient as a balanced binary tree--but insertion and deletion would be a nightmare. With every insertion and deletion, every node would have to have its pointers rearranged to conform to the skip structure.

Skip Lists

Pugh's skip lists get around this problem by taking a probabilistic approach to the skip pointers; see Figure 1(d). Instead of demanding that every second node have two pointers, every fourth node have three pointers, and so on, the number of pointers in a particular node is determined probabilistically. On the average, half the nodes only have pointers to the node directly in front of them. The other half also have pointers to the node two ahead of them. Half of those nodes also have pointers to the node four ahead of them, half of those nodes also have pointers to the node eight ahead of them, and so on. This means that half the nodes have one pointer, a quarter have two pointers, an eighth have three pointers, and so forth. The addition of randomness means that insertions and deletions won't require modifications to the whole list. Sure, the random dice might degenerate into a normal linked list with lousy performance characteristics, but the odds of that happening are minuscule for a list of reasonable size.

Skip-list algorithms are substantially simpler than balanced tree algorithms. C code for searching, inserting, and deleting nodes in skip lists is given in Listing One (page 88). Pascal code is available electronically; see "Availability," page 3.

The function search() implements the searching algorithm. Searches start at the top level of pointers, those that cover the most ground. Simply search until the node's value is greater than the value you are looking for, and then drop down to the next level of pointers. Eventually you will either find the node or overshoot the value on the bottom pointer level. Figure 2(d) shows an example search path through a skip list.

Skip-list insertion is implemented in the function insert(), which is nothing more than searching and splicing. Depending on the number of pointers the new node has, determined randomly by the function randomlevel(), different pointers will have to be updated. In the example source code, the array update[] contains the list of pointers that must be updated.

Deleting is also easy: Search for the correct node, pull it out of the list, and reconnect a few pointers. The function delete() implements skip-list deletion.

More with Skip Lists

There are still more tricks to improve skip-list performance. You can vary the constant p in the probabilistic algorithm that determines how many pointers a given node has. This affects both performance as well as disk-storage requirements. If p=1/2 (as described in the example), the average node has two pointers.

This is no better than a linked list. If p is reduced to 1/4, the algorithm runs no slower, but the average node has only 1.33 pointers. A p of 1/8 runs one-third slower than a p of 1/4, but only 1.14 pointers are required for each node. If p=16, search time is double that of a p of 1/4, but only 1.07 pointers are required for each node, and so on. Since the space savings converge on 1 fairly rapidly, little savings is gained by making p any less than 1/4.

Conclusion

William Pugh said it best: "From a theoretical point of view, there is no need for skip lists. Balanced trees can do everything skip lists can and have good worst-case time bounds (unlike skip lists). However, the difficulty of implementing balanced trees greatly restricts their use_. Skip lists are simple data structures that can be used in place of balanced trees for most applications." They're easy to implement. They're as versatile as any balanced tree algorithm. They're faster than most tree-balancing "trick" algorithms. They require less memory storage for pointers than tree structures. And they work.

Bibliography

Pugh, William. "Skip Lists: A Probabilistic Alternative to Balanced Trees." Communications of the ACM (June 1990).

Pugh, William. "A Skip List Cookbook." Tech. Report CS-TR-2286.1. Department of Computer Science, University of Maryland, July 1989.

Pugh, William. "Concurrent Maintenance of Skip Lists." Tech. Report, CS-TR-2222.1. Department of Computer Science, University of Maryland, April 1989.

Figure 1: (a) Finding a particular node; (b) looking at every other node in the list; (c) looking at about every fourth node plus another two; (d) taking a probabilistic approach to skip pointers.

Figure 2: Search path through a skip list.


[LISTING ONE]


/* This file contains source code to implement a dictionary using
skip lists and a test driver to test the routines. A couple of comments
about this implementation: The routine randomLevel has been hard-coded to
generate random levels using p=0.25. It can be easily changed. The insertion
routine has been implemented so as to use the dirty hack described in the CACM
paper: if a random level is generated that is more than the current maximum
level, the current maximum level plus one is used instead. Levels start at
zero and go up to MaxLevel (which is equal to (MaxNumberOfLevels-1).
The compile flag allowDuplicates determines whether or not duplicates
are allowed. If defined, duplicates are allowed and act in a FIFO manner.
If not defined, an insertion of a value already in the file updates the
previously existing binding. BitsInRandom is defined to be the number of bits
returned by a call to random(). For most all machines with 32-bit integers,
this is 31 bits as currently set. The routines defined in this file are:
  init: defines NIL and initializes the random bit source
  newList: returns a new, empty list
  freeList(l): deallocates the list l (along with any elements in l)
  randomLevel: Returns a random level
  insert(l,key,value): inserts the binding (key,value) into l. If
    allowDuplicates is undefined, returns true if key was newly
    inserted into the list, false if key already existed
  delete(l,key): deletes any binding of key from the l. Returns
    false if key was not defined.
  search(l,key,&value): Searches for key in l and returns true if found.
    If found, the value associated with key is stored in the
    location pointed to by &value
*/

#define false 0
#define true 1
typedef char boolean;
#define BitsInRandom 31
#define allowDuplicates
#define MaxNumberOfLevels 16
#define MaxLevel (MaxNumberOfLevels-1)
#define newNodeOfLevel(l) (node)malloc(sizeof(struct
nodeStructure)+(l)*sizeof(node *))

typedef int keyType;
typedef int valueType;

#ifdef allowDuplicates
boolean delete(), search();
void insert();
#else
boolean insert(), delete(), search();
#endif

typedef struct nodeStructure *node;
typedef struct nodeStructure{
    keyType key;
    valueType value;
    node forward[1]; /* variable sized array of forward pointers */
    };
typedef struct listStructure{
   int level;     /* Maximum level of the list
            (1 more than the number of levels in the list) */
   struct nodeStructure * header; /* pointer to header */
   } * list;
node NIL;
int randomsLeft;
int randomBits;
init() {
  NIL = newNodeOfLevel(0);
  NIL->key = 0x7fffffff;
  randomBits = random();
  randomsLeft = BitsInRandom/2;
};
list newList() {
  list l;
  int i;
  l = (list)malloc(sizeof(struct listStructure));
  l->level = 0;
  l->header = newNodeOfLevel(MaxNumberOfLevels);
  for(i=0;i<MaxNumberOfLevels;i++) l->header->forward[i] = NIL;
  return(l);
  };
freeList(l)
  list l;
  {
  register node p,q;
  p = l->header;
  do {
    q = p->forward[0];
    free(p);
    p = q; }
    while (p!=NIL);
  free(l);
  };
int randomLevel()
  {register int level = 0;
   register int b;
   do {
    b = randomBits&3;
    if (!b) level++;
    randomBits>>=2;
    if (--randomsLeft == 0) {
    randomBits = random();
    randomsLeft = BitsInRandom/2;
    };
    } while (!b);
    return(level>MaxLevel ? MaxLevel : level);
    };
#ifdef allowDuplicates
void insert(l,key,value)
#else
boolean insert(l,key,value)
#endif

register list l;
register keyType key;
register valueType value;
  {
  register int k;
  node update[MaxNumberOfLevels];
  register node p,q;
  p = l->header;
  k = l->level;
  do {
    while (q = p->forward[k], q->key < key) p = q;
    update[k] = p;
    } while(--k>=0);
#ifndef allowDuplicates
  if (q->key == key) {
    q->value = value;
    return(false);
    };
#endif
    k = randomLevel();
    if (k>l->level) {
    k = ++l->level;
    update[k] = l->header;
    };
    q = newNodeOfLevel(k);
    q->key = key;
    q->value = value;
    do {
    p = update[k];
    q->forward[k] = p->forward[k];
    p->forward[k] = q;
    } while(--k>=0);
#ifndef allowDuplicates
    return(true);
#endif
    }
boolean delete(l,key)
register list l;
register keyType key;
  {
  register int k,m;
  node update[MaxNumberOfLevels];
  register node p,q;
  p = l->header;
  k = m = l->level;
  do {
    while (q = p->forward[k], q->key < key) p = q;
    update[k] = p;
    } while(--k>=0);
  if (q->key == key) {
    for(k=0; k<=m && (p=update[k])->forward[k] == q; k++)
      p->forward[k] = q->forward[k];
    free(q);
    while( l->header->forward[m] == NIL && m > 0 )
         m--;
    l->level = m;
    return(true);
    }
    else return(false);
    }
boolean search(l,key,valuePointer)
register list l;
register keyType key;
valueType * valuePointer;
  {
  register int k;
  register node p,q;
  p = l->header;
  k = l->level;
  do while (q = p->forward[k], q->key < key) p = q;
      while (--k>=0);
  if (q->key != key) return(false);
  *valuePointer = q->value;
  return(true);
  };
#define sampleSize 65536
main() {
  list l;
  register int i,k;
  keyType keys[sampleSize];
  valueType v;
  init();
  l= newList();
  for(k=0;k<sampleSize;k++) {
    keys[k]=random();
    insert(l,keys[k],keys[k]);
    };
  for(i=0;i<4;i++) {
    for(k=0;k<sampleSize;k++) {
        if (!search(l,keys[k],&v)) printf("error in search #%d,#%d\n",i,k);
        if (v != keys[k]) printf("search returned wrong value\n");
        };
    for(k=0;k<sampleSize;k++) {
        if (! delete(l,keys[k])) printf("error in delete\n");
        keys[k] = random();
        insert(l,keys[k],keys[k]);
        };
    };
  freeList(l);
  };
End Listing


Copyright © 1994, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.