Threaded Binary Trees

Textbook discussions of threaded binary trees are almost always accompanied by source code in Pascal, aimed at the university student. In this article, Jim describes some routines he developed for traversing and managing these trees in Microsoft C.


March 01, 1988
URL:http://www.drdobbs.com/database/threaded-binary-trees/184407918

Figure 1

Figure 1

MAR88: THREADED BINARY TREES

THREADED BINARY TREES

Ease of traversal is the major advantage of a threaded binary tree

James Matthews

James Mathews, Blue Sly Software, P.O. Box 232, Absecon, NJ 08201.


I recently discovered a data structure called a threaded binary tree that combines the quick random lookup qualities of a binary tree with the easy traversal of a linked list. This data structure provided a clean and reasonably elegant solution to an application program I was developing. In this article I describe how to create and traverse threaded binary trees and provide a set of functions written in Microsoft C that illustrate how I implemented this data structure in my application.

I'm the type of programmer who likes to tinker with programs just to see how they work. That particular personality trait recently lead me to work on an interactive spelling checker for the letters and documents I write (it certainly would have been more cost-effective simply to buy a spelling checker).

The spelling checker required a data structure that would allow all the unique words in a document to be identified and then retrieved in sorted order for comparison with the dictionary. I knew that a binary tree would be a fairly efficient way of identitying and sorting the unique words in a single step, and methods of building and traversing binary trees are well known and documented.

However, I also wanted the ability to scan back and forth interactively in the list of words not found in the dictionary and select whether a particular word was to be added to the dictionary, marked as misspelt, corrected interactively, or simply ignored. And here is where I ran into trouble with a binary tree--all my solutions to moving back and forth interactively to selected nodes in the tree involved more code than seemed necessary or reasonable. At one point I even considered converting the binary tree to a doubly linked list once the (possibly) misspelled words were identified, just to facilitate scanning back and forth over the words.

And then, while browsing though Fundamentals of Data Structures by Horowitz and Sahni,<fn1> I came across a description of threaded binary trees. Great, a threaded binary tree seemed to be just what I needed.

Each node of a binary tree typically contains pointers to left and nght child nodes. If a particular node does not have a left or right child, the corresponding pointer has a null value. In fact, it turns out that slightly more than half of the pointers in any binary tree will be null. For a binary tree with N nodes, there are 2N pointers (two per node) and N + 1 of those pointers will be null (refer to Horowitz and Sahni or another text for a discussion about why N + 1 pointers are null).

Threaded binary trees differ from other binary trees in that the null pointers to nonexistent left and right child nodes are used as "threads" to point to prior and successor nodes in the tree. Using the (otherwise) null pointers as threads to prior and successor nodes allows a threaded binary tree to be traversed in order almost as easily as a linked list while retaining the quick lookup of a binary tree. Figure 1, page 43, illustrates (a) a standard binary tree and (b) the same tree with the null pointers replaced by threads.

Figure 1: Standard and threaded binary trees.

The Code

Listing One, page 58, shows a C language header file (tbtree.h) that defines a TREE_NODE structure containing some of the fields used by my spelling checker application. A node is created in the threaded binary tree for every unique word found in the document being checked. Each TREE_NODE contains a pointer to the node's word (stored as a null-terminated string), an integer used to hold flag values, a usage count indicating how many times the word appears in the document, and pointers to left and right child nodes.

When traversing a threaded tree structure, it is necessary to know if a particular pointer field contains the address of a child node or if the pointer field contains a thread to a prior or successor node. The tbtree.h header file also defines two flags, RBIT and LBIT, which indicate if the right and left child pointers contain valid child node addresses or if they contain threads to other nodes. If the RBIT flag is set in a node, then that node has a right child node; if RBIT is not set, the right child pointer is a thread. Similarly, the LBIT flag identifies the usage of the left child pointer.

Notice that the RBIT and LBIT flags are the only extra storage required in the tree nodes. The threads occupy the right and left child pointers that would otherwise have been left null. Knuth<fn2> suggests that even these flags could be eliminated if a child node always resides at a higher memory address than its parent--a child node would have a higher address than the current node, whereas a thread would always contain a lower address. In my application, I chose to play it safe and added the two 1-bit flags. Because I already required a group of other flags for each node, these additional flags did not increase the size of a node.

tbtree.c in Listing Two, page 58, shows the routines I created to build and traverse a threaded binary tree. The function add2tree() adds a new word to the tree each time it's called provided the word doesn't already exist in the tree. add2tree() compares the prospective new word to existing nodes in the tree until it either locates a node already containlng that word or it locates a node to which the new word should be added as a right or left child. If the word already appears in the tree, add2word() simply increments the word's usage count and exits. If the word needs to be added to the tree, add2tree() invokes the linsert() or rinsert() function to add a node as a left or right child, respectively.

Functions linsert() and rinsert() implement the logic required to add a node to a threaded binary tree. Notice that these routines are actually quite simple. To add a new left child node, the new node is given the left child pointer from the parent node, the new node's right child pointer becomes a thread back to the parent, and the new node is linked to the parent as a left child. Function rinsert() is an equally simple implementation to add a right child to a node.

I should point out that the linsert() and rinsert() routines call function talloc() to allocate the memory for a new node. The first implementation of these routines called the standard malloc() library routine to allocate memory, but in analyzing the performance of the program on large documents, I found that malloc() was by far the largest consumer of CPU time. I therefore created the talloc() function to allocate tree nodes more efficiently. For relatively small tree structures, malloc() could certainly be used without noticeable degradation.

The functions shown in Listing Two require the presence of a specially initialized root node (although its structure is the same as that of any other node). The add2tree(), linsert(), and rinsert() functions build the threaded binary tree as a left subtree of the root node. An empty threaded binary tree consists of just the root node.

The root node contains a data value that is higher than that of any other node in the tree (I chose the ASCII . character as the root node's word value because . is numerically greater than any uppercase or lowercase alphabetic character). This avoids special code in the add2tree() function for dealing with the root node.

The right and left child pointers and the RBIT and LBIT fields in the root node are initialized such that the root node becomes the in-order predecessor of the lowest-valued tree node and the in-order successor of the highest-valued node. Traversing the tree from start to finish begins and ends at the root node. The definition of the root node occurs in Listing Two just prior to the add2tree() function.

Function inorder_succ() (also in Listing Two) returns the in-order successor of any node in the threaded binary tree. To find the successor node, inorder_ucc() looks at the right child of the starting node. If the right child pointer is a thread, then it points to the successor node and inorder_ucc() simply returns that node address. If the starting node has an actual right child node, however, the successor is the leftmost child of the starting node's right child. The leftmost child is located by a simple while statement that moves down the list of left child pointers.

Function inorder_pred() correspondingly locates the in-order predecessor of any node in the tree. inorder_pred() returns the starting node's left child pointer (if it's a thread) or the rightmost child of the starting node's left child.

To perform an in-order traversal of the entire threaded binary tree, you need only make successive calls to inordersucc() (for nodes in ascending order by data value) or inorder_pred() (descending order by data value). The following few lines of code, for example, would print out all the words in the tree in ascending order:

TREE_NODE *tp;

tp = &root;                    /* start at root */
while ((tp = inorder succ(tp)) != &root) /* end at root */
       puts(tp->word);

In Summary

Ease of traversal is the major advantage of a threaded binary tree. There are other well-known algorithms for traversing binary trees, but some of those require the use of recursion or an auxiliary stack to maintain the current location within the tree. Recursive algorithms and those that use an auxiliary stack require additional memory above and beyond the requirements of the tree itself. For applications in which the size and structure of the tree is not known beforehand (such as a spelling checker), trying to ensure that sufficient memory exists for the program or auxiliary stack can be difficult. Because threaded binary trees use the otherwise null pointers as threads, this type of tree can be traversed without additional runtime storage.

There are other algorithms that can traverse a binary tree without recursion or the use of an auxiliary stack, but they typically require the addition of a parent node pointer field to each node or the temporary reversal of the direction of the pointer fields and the addition of a flag to indicate if a particular node has already been processed. The linked list approach of a threaded binary tree is conceptually simpler and easier to implement.

Threaded binary trees are not a new form of data structure. They were documented in 1960 by Perlis and Thornton<fn3\> and are discussed by Knuth<fn 2>in his The Art of Computer Programming series. Threaded binary trees do not seem to have received the widespread recognition or usage that other forms of binary trees have achieved, however. I was prompted to write this article in the hope that I could introduce (or possibly reintroduce) the concepts to others who might find them useful.

Notes

    1. Ellis Horowitz and Sartaj Sahni, Fundamentals of Data Structures (Rockville, Md.: Computer Science Press, 1982).

    2. D. Knuth, The Art of Computer Programming: Fundamental Algorithms, Volume I, 2d ed (Reading, Mass.: Addison-Wesley, 1973).

    3. A. Perlis and C. Thornton, "Symbol Manipulation by Threaded Lists," Communications of the ACM, vol. 3, no. 4 (April 1960): 195-204.

[LISTING ONE]





/*  001  23-Apr-87  tbtree.h

   Header file for threaded binary tree functions.

   This code is hereby placed into the public domain.

*/

/* define TREE_NODE type */

typedef struct _tree_ent {
   char *word;                 /* word ptr for this node    */
   int usage;                  /* usage counter for word    */
   int flags;                  /* word flags                */
   struct _tree_ent *lchild;   /* thread or left child ptr  */
   struct _tree_ent *rchild;   /* thread or right child ptr */
} TREE_NODE;

/* define bit values for node flags */

#define RBIT (1)         /* right child if set, thread if 0 */
#define LBIT (2)         /* left child if set, thread if 0  */





[LISTING TWO]





/*  001  24-Apr-87  tbtree.c

   Functions to create and traverse a threaded binary tree.

                  James Mathews
                  Blue Sky Software
                  172 Manor Drive
                  Absecon, NJ  08201

   This code is hereby placed into the public domain.

*/

#include <stdio.h>
#include "tbtree.h"

#ifdef LINT_ARGS       /* setup for Microsoft C V 4.0 */
#include <malloc.h>
TREE_NODE *talloc();
void add2tree(char *);
TREE_NODE *linsert(TREE_NODE *);
TREE_NODE *rinsert(TREE_NODE *);
TREE_NODE *inorder_succ(TREE_NODE *);
TREE_NODE *inorder_pred(TREE_NODE *);
#else
char *malloc();
void add2tree();
TREE_NODE *talloc(), *linsert(), *rinsert();
TREE_NODE *inorder_succ(), *indorder_pred();
#endif

/* define the root node for the threaded tree */

TREE_NODE root = { "|", 0, RBIT, &root, &root };


/**********************************************************
                  A D D 2 T R E E
 **********************************************************/

void
add2tree(word)         /* add given word to the B-tree */
char *word;
{
   register int cmp;
   register TREE_NODE *tp = &root;

   /* search tree to find word, add it if not there */

   while ((cmp = stricmp(word,tp->word)) != 0) {


      if (cmp < 0)        /* not equal, look to the left?  */

         if (tp->flags & LBIT)  /* is there a left child?  */
            tp = tp->lchild;    /* yes, go look at it      */
         else {
            tp = linsert(tp);   /* no left child, make one */
            break;
         }

      else                /* not left, look right          */

         if (tp->flags & RBIT)  /* is there a right child? */
            tp = tp->rchild;    /* yes, look it over */
         else {
            tp = rinsert(tp);   /* no right child, make one */
            break;
          }
   }

   if (cmp == 0)                /* did we find the word?    */
      tp->usage++;              /*  if so bump usage count  */

   else {                       /* must have added new word */

      tp->word = word;          /* point it to the word     */
      tp->usage = 1;            /* new word in town         */
   }
}


/**********************************************************
                    R I N S E R T
 **********************************************************/

static TREE_NODE *
rinsert(tp)    /* insert new node as right child of tp */
register TREE_NODE *tp;
{
   register TREE_NODE *new;

   if (new = talloc()) {         /* alloc new entry */

      new->rchild = tp->rchild;     /* new gets what tp had */
      new->flags = tp->flags & RBIT;   /* as a right child  */

      new->lchild = tp;          /* lchild is thread to tp */

      tp->rchild = new;          /* new is rchild of tp   */
      tp->flags |= RBIT;         /* real node, not thread */
   }

   return(new);
}


/**********************************************************
                     L I N S E R T
 **********************************************************/

static TREE_NODE *
linsert(tp)    /* insert new node as left child of tp */
register TREE_NODE *tp;
{
   register TREE_NODE *new;

   if (new = talloc()) {         /* alloc new entry */

      new->lchild = tp->lchild;     /* new gets what tp had */
      new->flags = tp->flags & LBIT;     /* as a left child */

      new->rchild = tp;          /* rchild is thread to tp */

      tp->lchild = new;          /* new is lchild of tp   */
      tp->flags |= LBIT;         /* real node, not thread */
   }

   return(new);
}


/**********************************************************
                      T A L L O C
 **********************************************************/

static TREE_NODE *
talloc() {     /* allocate a TREE_NODE */

#define NODES_PER_ALLOC (50)

   static TREE_NODE *tp;
   static int tidx = NODES_PER_ALLOC;

   /* this routine allocates space for TREE_NODE nodes.  It
      exists to cut down on the number of calls to malloc(). */

   if (tidx < NODES_PER_ALLOC) /* free nodes from last alloc? */
      return(&tp[tidx++]);     /*   yes, return the next one */

   /* allocate another block of TREE_NODE nodes */

   tp = (TREE_NODE *) malloc(sizeof(TREE_NODE) * NODES_PER_ALLOC);

   if (tp == NULL) {
      printf("\nOut of memory in talloc()\n");
      exit();
   }

   tidx = 1;                   /* return # 1 next time */
   return(tp);                 /* return # 0 this time */
}


/**********************************************************
                 I N O R D E R _ S U C C
 **********************************************************/

TREE_NODE *
inorder_succ(tp)       /* return in-order successor of tp */
register TREE_NODE *tp;
{
   register TREE_NODE *next;

   /* if the right child is a thread, it's the successor node */

   next = tp->rchild;

   /* but if the right child is a real node, the successor is
      left-most child of the right child */

   if (tp->flags & RBIT)               /* real right child? */
      while (next->flags & LBIT)       /*   find left-most  */
         next = next->lchild;          /*   child of that   */

   return(next);                       /* return successor  */
}


/**********************************************************
                  I N O R D E R _ P R E D
 **********************************************************/

TREE_NODE *
inorder_pred(tp)       /* return in-order predecessor of tp */
register TREE_NODE *tp;
{
   register TREE_NODE *prev;

   /* if the left child is a thread, it's the predecessor */

   prev = tp->lchild;

   /* but if the left child is a real node, the predecessor is
      right-most child of the left child */

   if (tp->flags & LBIT)               /* real left child?  */
      while (prev->flags & RBIT)       /*   find right-most */
         prev = prev->rchild;          /*   child of that   */

   return(prev);                       /* return predecessor */
}








Terms of Service | Privacy Statement | Copyright © 2024 UBM Tech, All rights reserved.