Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

C/C++

C Programming


JUN90: C PROGRAMMING

Programmers tend to think of hypertext in ways that we use it in our applications. A frequent use in our experience is in on-line help systems where the help text is organized into a structured set of help windows. You get to a help window as a result of some context-sensitive control, perhaps with the cursor on a keyword or the program at a particular place in a menu or data entry screen. The help system then allows you to select higher and lower levels of detail of help from keywords strategically positioned in the current window. Such help systems use lightweight hypertext concepts, but they are the ones we are most familiar with.

The real potential of hypertext technology comes when we consider using it for structured retrievals from large static text databases. Many disciplines deal regularly with such data. Lawyers, engineers, programmers, doctors, proposal writers, and researchers of all kinds work constantly with large amounts of boilerplate text. Hypertext offers a way to build structured and disciplined retrieval capabilities into these databases.

The heart of most heavy-duty hypertext applications is the index. To find your way to an associated body of text from a selected word or phrase, you need an index that translates the word into a pointer, into the text database. Last February we built an index system for our TEXTSRCH project that used a hashing technique to derive the pointer to a list of files where a selected keyword exists. That index technique was useful to the purpose of TEXTSRCH, which was the selection of files that match a Boolean keyword query. This application is a form of hypertext, but it too is a lightweight use of it.

This edition of the "C Programming" column looks at a different indexing technique, a loose adaptation of the the B-tree, one that offers a more heavy-duty kind of support to the problems of text searching. Besides providing the random selection of the hashed keyword, the B-tree allows you to navigate an index in the sequence of its keys as well. This facilitates the kind of search where a selected word delivers a list of words that are close, for example.

The B-tree is a data structure resembling an inverted tree. The root is at the top of the tree and the leaves are at the bottom. Each node contains key values, and each key value points to the node at the next lower level in the tree where values reside that are greater than it and less than the next adjacent key. The lower nodes are child nodes, the higher ones are parent nodes. Mixed metaphors, these families and trees. The pointers in the nodes at the bottom level -- the leaves -- are not node pointers but instead contain data values that match the keys. B-trees are typically used by database management systems to manage the indexes for primary and secondary data element keys. To support such applications, the traditional B-tree algorithms must be capable of adding keys in a random sequence and deleting them. I published the C language source code for such B-tree algorithms in two books, C Development Tools for the IBM PC (Brady Books, 1986) and C Data Base Development (MIS Press, 1987).

The modified B-trees that we will use to index a static hypertext database do not need to support key deletion, and the index construction can be done in the sequential order of the keys. These two features greatly simplify the code needed for tree maintenance. Because the index construction techniques are different, the trees of this method are not always precisely balanced after the fashion of B-trees. Every B-tree node below the root always contains at least half the number of keys that it can hold. This is because when a node is filled to capacity, it splits into two nodes to accommodate the next key. When it splits, the key value from the middle goes into the node above it in the tree. When a key is deleted, the node is combined with one of its sibling nodes if the two together can now hold the keys of both. The modified B-tree structure for this problem does not work that way. We add keys in sequential order, so the nodes fill up from left to right. No splitting occurs, and all but the rightmost node at each level will be full. Because the index supports a static database, no key deletion ever happens.

Our index is different from the B-tree in one other way. A B-tree consists of nodes each of which can contain a fixed number of keys. By definition, the key length is fixed if the node length is fixed. In our trees, the node length is fixed and the key length is variable.

We call this data structure the "HyperTree." It consists of a header record and some number of nodes. The header record contains the node number of the root node. Each node contains a header block and an array of variable length keys. The node header block contains a flag indicating whether the node is a leaf or not, a pointer to the node that is a parent to the current one, and a pointer to the node at the next lower level that contains keys that are less than the keys in this node. Each key in the array contains the string value of the key and a pointer to the node at the next lower level in the tree that contains keys greater than the current key and less than the next adjacent key in this array.

When the node is a leaf, the key pointers and the node header's pointer to lower keys serve a different purpose. They contain some value that is relevant to the key. In some applications they will point to a data record. In others they will be the data record. Perhaps they point to a list of data records. The function of the key value is independent of the operation of the HyperTree. When you add a key, you add its associated key value. When you retrieve a key, you retrieve its value. The HyperTree, therefore, serves two retrieval objectives: It tells you if a key argument exists in the index, and it delivers the key value associated with key arguments.

Listing One, page 154, is <hyprtree.h>, the header file that describes the HyperTree structures. It defines several global values that you will change to suit your needs. The first one to consider is NODELEN. This is the byte length of a HyperTree node. If the node is too short, it will contain a small number of keys, and the tree will have many levels. This circumstance would affect retrieval performance because each search must navigate all the levels of the tree. If the node length is too long, the individual node searches, which are serial, would be affected.

The next value in <hyprtree.h> is MAXLEVELS, the maximum number of levels in a tree. This value is a function of the average number of keys in a node and the total number of keys in the index. You can conservatively set it to a safe high value such as 10. Ten levels of HyperTree would be a big index, indeed. The value itself is used as the dimension for an array of pointers. We need some value to use, because we do not know the upper limit until we reach it at run time.

MAXKEYLENGTH is used to limit the length of a key string. This value will depend on your application. It prevents a key from completely occupying a node. You should set it so that each node can contain at least three maximum-length keys. This will assure the correct behavior of the HyperTree algorithms.

You will need to consider the KEYVALUE typedef in <hyprtree.h>. As shown in the listing, KEYVALUE is a long integer. That usage will support the examples that follow, which use the KEYVALUE as a pointer into a file. KEYVALUE could be any valid C type including a structure. Remember that each leaf node will contain KEYVALUEs for the keys, so do not make them too long. If each key in your HyperTree is a vector to some complex data structure, put that structure into its own file and use the KEYVALUE as a pointer into that file.

Building a HyperTree

Listing Two, page 154, is <addkey.c>, the code needed to build a HyperTree. To prepare for this process, you need to have selected your keywords and phrases from the text database and sorted them into the collating sequence of the tree. The search algorithm described later assumes a case-insensitive collating sequence. You must provide the data value to associate with each keyword. That data type is described in <hyprtree.h> as the typedef KEYVALUE. The value will be the value returned by the searching algorithms. It could be a pointer to the text position in a text file, or it could be a pointer to a data structure that describes files, chapters, and paragraphs. Depending on the architecture of your Hypertext database, the KEYVALUE value could have the database document, chapter, and paragraph identifications encoded into its bit string. The important thing here is that you have prepared to build the HyperTree by extracting keywords, combining each of them with a data value to be returned, and sorting them. We'll have an example of that process later in this column.

In some applications, it might be appropriate to further process the keys following the sort. The duplicate values would be deleted, and the data value associated with each key value would become a pointer into a more complex data record or structure that describes the location and significance of the key value.

Building the HyperTree consists of reading the keys in sequence and filling nodes. When a node is full, the next value is added, in sequence, to the next higher node with the node number of the current node as its associated data value. Subsequent keys are added to a new node at the same level as the one that filled up. This upward growth is a recursive operation.

You call the addkey function to add nodes to a HyperTree. Its parameters are a pointer to the key's string and the KEYVALUE associated with the key. The first time you call this function, it creates the index file. When you are done adding keys, you call it with a NULL key string pointer to tell it to finish up. The addkey function calls the recursive addkeyentry function. This function is the tree-level manager for adding keys. The calls to it from addkey specify that keys are to be added to level zero. The addkeyentry function calls itself by specifying progressively higher levels as nodes fill and the tree grows.

Each node contains the node number of its parent. This data element supports the search of a HyperTree and so must be built when the tree is built. Because parents are growing when the children are fully grown -- these metaphors are backwards, it seems -- the program does not know the parent node number until the parent is filled. The adopt function takes care of that. When the writenode function writes a node, it scans the node and extracts the node numbers of each of its children. Then it calls the adopt function to write the parent's node number into each of the node records of the children.

Searching a HyperTree

Listing Three, page 156, is <srchtree.c>, which contains the functions that search a HyperTree. Several of the functions modify the position of the search pointers to a key. Others return the key's string or associated value.

The first search function is findkey. You pass it a pointer to a string that contains the key value you want to find. If it finds a match, it returns a true value. Otherwise it returns a false value. In either case, it positions the search pointers to the key that terminated the search. You may now call current_value to retrieve the KEYVALUE of the key that terminated the search, or current_keystring to retrieve the key's string value. This latter function is useful when findkey returns a false value to 4 say that it found no matching entry in the HyperTree. The terminating key is the next highest one in the collating sequence of the HyperTree.

There are other search functions that are used to navigate the HyperTree in its collated sequence. These are firstkey, lastkey, nextkey, and prevkey. They modify the search pointer's position, and subsequent calls to current_ keyvalue and current_keystring will reflect the new position. Each function returns a true value if it can satisfy the request. If firstkey or lastkey returns a false value, the index is empty. If nextkey returns a false value, the search pointers are already positioned at the end of the HyperTree's collating sequence. If prevkey returns a false value, the search pointers are already positioned at the beginning of the HyperTree's collating sequence.

The findkey search of a HyperTree begins at the root node. The search functions find the root node number by reading the HyperTree header record. To search the tree, the function searches the root node for a match on the argument. Keys are recorded in sequence in a node, so the search proceeds from left to right. If the key is not there, the function picks up the node pointer to the left of the key that terminated the search, reads that node, and starts over. This continues until the search either finds a match or terminates in a leaf node.

When you request the KEYVALUE by calling the current_keyvalue function, the search algorithm must navigate from the key's position in the HyperTree's system of levels to the leaf level. KEYVALUEs exist only at the leaf level. Higher levels contain pointers to lower levels. If you land on a key in a non-leaf node, you find the KEYVALUE by navigating down to the leaf this way. Begin by reading the node pointed to by the pointer just to the right of the matching key. Then read the node pointed to by the lower-key node pointer in the header block of the current node. Continue doing that until you reach a leaf. The lower-key pointer field contains (in a union) the KEYVALUE of the matching key. It doesn't make sense when I explain it, but it really does work that way.

A HyperTree Example

To put all this to use, I devised a simple hypertext-like application that builds a HyperTree index from a text file and lets you search it. We begin with some text file to test with. It should be made of unadorned ASCII with CR/LF pairs terminating each line. Use your editor to mark some key index values in the text by surrounding the keys with angle brackets (less-than, greater-than pairs). Make lots of keys. Now we can build a HyperTree.

Listing Four, page 158, and Listing Five, page 158, are <keyextr.c> and <keybuild.c>, two simple filter programs to build the HyperTree. You must link the object file from <keybuild.c> with that from <addkey.c>. Compile <keyextr.c> by itself. Run the two programs along with the SORT filter (assuming MS-DOS) this way:

  keyextr<textfile.dat | sort | keybuild

where <textfile.dat> is the name of the file of ASCII text. The <keyextr> program extracts the key values you marked with angle brackets into a quoted string, which is followed by a comma and a numeric value that represents the key's position in the text file. The extracted file would look like this:

  "fortran", 125 "the merry widow", 257 "godfrey daniel", 3244

The SORT filter sorts the extracted values and pipes them to the <keybuild> program, which builds the HyperTree. That's all there is to that. You can look at the index with a dump utility. Its name is <hyprtree.ndx>.

To use the new HyperTree index, you will build the example program in Listing Six, page 159, <hyprsrch.c>. Its command line specifies the text file name and a key string to search, such as this one:

  hyprsrch textfile.dat "windows on the world"

The program searches the HyperTree by using the findkey function. It then displays a screen of text starting at the text position represented by the KEYVALUE of the key that terminated the search. The top of the screen shows the actual key value that was found along with the position. The bottom of the screen is a menu that lets you press F, L, N, P, or Q to move to the first, last, next, or previous key, or to quit and terminate the program.

HyperTree Extended

The small example we used here is merely a taste of how you can use the HyperTree data structure. The technique could be used as the retrieval engine for a spell checker or thesaurus program. It could be used as an inverted index to other documents from within a document. It could even be used in the concordance-like retrieval system that we built into TEXTSRCH.

There are some areas where HyperTree could be enhanced. Here are a few that come to mind.

In the example application, we do not bind the index to the file it indexes, requiring instead that you name the file on the command line. Your application would no doubt manage this association for the user. One way would be to include the text file's name as a part of the TREEHDR structure.

Because the HyperTree node contains keys in a collated sequence, you could add the data compression technique called "run length encoding" to the node structure. Many adjacent keys will begin with identical character sequences as in this list:

  program
  programmer
  programming
  progress
  prototype

A simple escape sequence could specify how many characters the key inherits from the one that preceeds it. The list might then look this way:

  program
  \ 7mer
  \ 8ing
  \ 5ess
  \ 3totype

The random-then-sequential properties of HyperTree make it a natural for wild card retrievals. If you use the ? character in the first position of a key ("?obbs," for example), you would do searches on all combinations of the key with letters and numbers in the first position. If the question mark was further into the key (for example, "Doc?or Dobbs"), you would use the findkey function to position yourself near the likely match and the nextkey function to retrieve all the possibilities, discarding those that do not fit.

Hypertext Editing

There are two programmer's editor tools -- BTAGS and 4C -- that offer Hypertext-like features to the PC programmer who is working with a system that involves a lot of source code files. We'll look at each of them in turn.

BTAGS is a program that you use along with the Brief programmer's editor. Brief is one of the most popular programmer's editors for the PC and it includes a comprehensive C-like macro language. BTAGS is a source file preprocessor and a set of Brief macros that implement a feature similar to the one used in the Unix vi editor. Here is how it works.

The BTAGS program builds a "tags" file by reading all your source files and finding where all the functions are defined. The tags file conforms to the Unix ctags format, which identifies each function identifier, the file where it is declared, and a string taken from the declaration. The string is built so that if the editor searches the file for the string, it will find the function declaration.

The BTAGS brief macros add command keys to the Brief editor. With the cursor on a function call, you press Ctrl-T, and the macro opens a new Brief window, reads the file where the function is declared, and positions the cursor on the function declaration. If you are not near a call to the function you want, Ctrl-E lets you type in a function name. Ctrl-B returns you to where you were before the last Ctrl-T or Ctrl-E.

BTAGS works only with function declarations and typedefs. It would be better if it also recorded external variables and structure members. Another minor annoyance is that the Ctrl-T macro works only if the cursor is on the first character of a function name. You get the Brief macro language source code for the macros, so you could hack a fix to that one if you wanted to.

4C is a programmer's editor that is similar to BTAGS in that its source code analyzer program produces a ctags-compatible tags file. 4C produces ctags records for all the C language constructs, not just function declarations. 4C includes its own editor, which uses the tags file to allow you to jump around through your source files by putting the cursor anywhere on a variable name and pressing a function key. If you do prefer to use your own editor, you can tell 4C to call it instead of its own.

The 4C editor and its invocation of a user-specified editor employ a display strategy that I find unnatural. If you call up the main function, that's all you get. You cannot page beyond or ahead of it. If you call another function or variable, that's all you see. I find it frustrating to have to call a function by name when I already know that it appears within scrolling reach of where I am at the time.

At SD90 last February, the 4C folks were showing a new version of 4C that they plan on releasing later this year. The enhanced version provides interfaces to a variety of editors, including Brief, Multi-edit, Vedit, ME, and Slick, while the source-code analyzer supports C++, object-oriented Pascals, Modula-2, and ASM. The database query allows lookups by filename and directory and, for object-oriented languages, lookups by class and class hierarchy.

The upshot is that I like the BTAGS integration with Brief because Brief is the editor I use. I like 4C's inclusion of all the C constructs in their tags file. Both products use source analyzers that produce ctags-compatible tags files, so I tried the obvious. I used 4C to build a tags file and BTAGS brief macros to process them. Of course it did not work, but on the surface it looks like some hacking would bring it into line. Perhaps later.

Product Information

BTAGS SD Enterprises P.O. Box 621 Carpinteria, CA 93013 805-566-1317 Price: $49.95

4C Tri-Technology Systems Inc. 1225 S. Elgin Forest Park, IL 60130 312-366-7595 Price: $119

_C PROGRAMMING COLUMN_ by Al Stevens

[LISTING ONE]

<a name="0151_000b">

/* ---------- hyprtree.h --------- */

#define NODELEN 256         /* length of a node   */
#define MAXLEVELS 10        /* tree levels        */
#define MAXKEYLENGTH 25     /* maximum key length */

#define TRUE 1
#define FALSE !TRUE

#define KSPACE (NODELEN-sizeof(NODEHDR))
#define HYPERTREE "hyprtree.ndx"

/* ----- computes length of a key in a node -------- */
#define keylength(nd,kp) (strlen(kp) + 1 + \
    ((nd).nodehdr.isleaf?sizeof(KEYVALUE):sizeof(NODEPOINTER)))

/* ---------- node pointers and key values ---------- */
typedef long KEYVALUE;
typedef int NODEPOINTER;

typedef union {
    KEYVALUE keyvalue;
    NODEPOINTER nodepointer;
} KEYPOINTER;

/* ---------- header record for a hypertree ---------- */
typedef struct {
    NODEPOINTER rootnode;
} TREEHDR;

/* ---- header structure for a hypertree node -------- */
typedef struct  {
    int isleaf;         /* true if the node is a leaf */
    NODEPOINTER parent; /* node number of parent */
    KEYPOINTER lower;   /* keys < this node (or value if leaf)*/
} NODEHDR;

/* --------- node record for a hypertree ---------- */
typedef struct {
    NODEHDR nodehdr;
    char keys[KSPACE];
} TREENODE;

/* ------------- prototypes --------------- */
void addkey(char *key, KEYVALUE keyvalue);
int findkey(char *key);
int firstkey(void);
int lastkey(void);
int nextkey(void);
int prevkey(void);
KEYVALUE current_keyvalue(void);
char *current_keystring(void);



<a name="0151_000c"><a name="0151_000c">
<a name="0151_000d">
[LISTING TWO]
<a name="0151_000d">

/* ---------------- addkey.c -------------- */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "hyprtree.h"

static void addkeyentry(int level, char *key,
                KEYPOINTER keypointer, NODEPOINTER lowernode);
static void nullnode(int level);
static int freespace(TREENODE node);
static void writenode(int level);
static void adopt(NODEPOINTER child, NODEPOINTER parent);

static TREENODE *nodes[MAXLEVELS];
static NODEPOINTER nodenbr[MAXLEVELS];
static NODEPOINTER nextnode;

static FILE *htree;
static TREEHDR th;

/*
 *  Add a key string value and its associated KEYVALUE to
 * the hypertree.
 */
void addkey(char *key, KEYVALUE keyvalue)
{
    KEYPOINTER keypointer;
    if (key == NULL)    {
        /* ------ terminal key, complete the tree ------ */
        if (htree != NULL)  {
            int level = 0;
            /* -------- write the unwritten nodes -------- */
            while (nodes[level] != NULL)    {
                writenode(level);
                free(nodes[level]);
                level++;
            }
            /* ------ update the header block ---------- */
            th.rootnode = nodenbr[level-1];
            fseek(htree, 0L, SEEK_SET);
            fwrite(&th, sizeof(TREEHDR), 1, htree);

            fclose(htree);
        }
    }
    else    {
        keypointer.keyvalue = keyvalue;
        if (strlen(key) <= MAXKEYLENGTH)
            addkeyentry(0, key, keypointer, 0);
    }
}

static void addkeyentry(int level, char *key,
                KEYPOINTER keypointer, NODEPOINTER lowernode)
{
    char *kp;

    if (nodes[level] == NULL)   {
        /* -------- build a new node ---------- */
        if (level == 0) {
            /* ------ building a new tree ------ */
            htree = fopen(HYPERTREE, "wb+");
            /* ----- write a NULL header for now ----- */
            fwrite(&th, sizeof(TREEHDR), 1, htree);
        }
        if ((nodes[level] = malloc(sizeof(TREENODE))) == NULL)  {
            fputs("\nOut of memory!", stderr);
            exit(1);
        }
        nodes[level+1] = NULL;
        nullnode(level);
        /* ---- point to the node of the lower keys --- */
        nodes[level]->nodehdr.lower.nodepointer = lowernode;
        /* --- assign the next node number to this node --- */
        nodenbr[level] = ++nextnode;
    }
    if (keylength(*nodes[level],key) >
            freespace(*nodes[level]))   {
        /* -------- this node is full -------- */
        KEYPOINTER keyp;
        NODEPOINTER lowernode;
        /* ---- write the node to the index file ---- */
        writenode(level);
        /* ---- remember the node nbr at this level ---- */
        lowernode = nodenbr[level];
        /* --- assign a new node number to this level --- */
        nodenbr[level] = ++nextnode;
        /* --- grow or add to parent with the current key --- */
        memset(&keyp, 0, sizeof(KEYPOINTER));
        keyp.nodepointer = nodenbr[level];
        addkeyentry(level+1, key, keyp, lowernode);
        /* - now set the node at this level to NULL values - */
        nullnode(level);
        nodes[level]->nodehdr.lower = keypointer;
    }
    else    {
        /* ------ insert the key into the node ------ */
        kp = nodes[level]->keys;
        /* ----- scan to the end of the node ----- */
        while (*kp)
            kp += keylength(*nodes[level], kp);
        strcpy(kp, key);
        /* ----- attach the key pointer to the key ----- */
        kp += strlen(kp)+1;
        if (level == 0)
            *((KEYVALUE *)kp) = keypointer.keyvalue;
        else
            *((NODEPOINTER *)kp) = keypointer.nodepointer;
    }
}

/* --------- build a null node ------------ */
static void nullnode(int level)
{
    memset(nodes[level], 0, NODELEN);
    nodes[level]->nodehdr.isleaf = level == 0;
}

/* ------ compute space remaining in a node ------- */
static int freespace(TREENODE node)
{
    int sp = KSPACE;
    char *kp = node.keys;

    while (*kp) {
        sp -= keylength(node,kp);
        kp += keylength(node,kp);
    }
    return sp;
}

/* ------ write a completed node ------- */
static void writenode(int level)
{
    long where = (nodenbr[level]-1)*NODELEN+sizeof(TREEHDR);

    fseek(htree, where, SEEK_SET);
    fwrite(nodes[level], NODELEN, 1, htree);
    if (level)  {
        /* - this is a parent node, update its children - */
        char *kp = nodes[level]->keys;
        adopt(nodes[level]->nodehdr.lower.nodepointer,
                    nodenbr[level]);
        while (*kp) {
            adopt(*((NODEPOINTER *)(kp+strlen(kp)+1)),
                    nodenbr[level]);
            kp += keylength(*nodes[level], kp);
        }
    }
}

/* ---- write a parent node number into a child node ---- */
static void adopt(NODEPOINTER child, NODEPOINTER parent)
{
    NODEHDR nodehdr;
    long here = ftell(htree);
    long where = (child-1)*NODELEN+sizeof(TREEHDR);

    fseek(htree, where, SEEK_SET);
    fread(&nodehdr, sizeof(NODEHDR), 1, htree);
    nodehdr.parent = parent;
    fseek(htree, where, SEEK_SET);
    fwrite(&nodehdr, sizeof(NODEHDR), 1, htree);
    fseek(htree, here, SEEK_SET);
}




<a name="0151_000e"><a name="0151_000e">
<a name="0151_000f">
[LISTING THREE]
<a name="0151_000f">

/* ----------- srchtree.c ------------ */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "hyprtree.h"

/* ---------- search packet ----------- */
typedef struct  {
    NODEPOINTER node;
    char *ky;
} SEARCH;

FILE *htree;
static TREEHDR th;
static TREENODE node;
static SEARCH current;

static void readnode(NODEPOINTER nodepointer);
static void opentree(void);

/*
 * Find a specified key in the tree.
 * Return TRUE if found.
 * Return FALSE if not found.
 */
int findkey(char *key)
{
    opentree();
    if (th.rootnode)    {
        SEARCH save = current;
        readnode(th.rootnode);
        while (TRUE)    {
            int cmp;
            current.ky = node.keys;
            while (*current.ky) {
                cmp = stricmp(key, current.ky);
                if (cmp < 0)    {
                    save = current;
                    break;
                }
                if (cmp == 0)
                    return TRUE;
                current.ky += keylength(node, current.ky);
            }
            if (!node.nodehdr.isleaf)   {
                if (current.ky == node.keys)
                    readnode(node.nodehdr.lower.nodepointer);
                else
                    readnode(*((NODEPOINTER *)
                        (current.ky-sizeof(NODEPOINTER))));
            }
            else    {
                current = save;
                readnode(current.node);
                break;
            }
        }
    }
    return FALSE;
}

/*
 * Find the first sequential key in the tree.
 * Return TRUE if found.
 * Return FALSE if not found (the tree is empty).
 */
int firstkey(void)
{
    opentree();
    if (th.rootnode)    {
        readnode(th.rootnode);
        while (!node.nodehdr.isleaf)
            readnode(node.nodehdr.lower.nodepointer);
        current.ky = node.keys;
        return TRUE;
    }
    return FALSE;
}

/*
 * Find the last sequential key in the tree.
 * Return TRUE if found.
 * Return FALSE if not found (the tree is empty).
 */
int lastkey(void)
{
    NODEPOINTER np;
    opentree();
    if (th.rootnode)    {
        int len = 0;
        np = th.rootnode;
        current.node = th.rootnode;
        current.ky = node.keys;
        do  {
            SEARCH save = current;
            readnode(np);
            current.ky = node.keys;
            while (*current.ky) {
                len = keylength(node, current.ky);
                current.ky += len;
            }
            if (current.ky == node.keys)    {
                readnode(save.node);
                current.ky = save.ky;
                break;
            }
            else
                np = *((NODEPOINTER *)
                    (current.ky-sizeof(NODEPOINTER)));
        } while (!node.nodehdr.isleaf);
        current.ky -= len;
        return TRUE;
    }
    return FALSE;
}

/*
 * Find the next sequential key in the tree.
 * Return TRUE if found.
 * Return FALSE if not found (the tree is empty or at the end).
 */
int nextkey(void)
{
    opentree();
    if (th.rootnode)    {
        if (current.ky == NULL)
            return firstkey();
        current.ky += keylength(node, current.ky);
        if (!node.nodehdr.isleaf)   {
            readnode(*((NODEPOINTER *)
                    (current.ky-sizeof(NODEPOINTER))));
            current.ky = node.keys;
            while (!node.nodehdr.isleaf)
                readnode(node.nodehdr.lower.nodepointer);
        }
        /* ----- while at the end of a node ------ */
        while (*current.ky == '\0') {
            NODEPOINTER child = current.node;
            if (node.nodehdr.parent == 0)
                break;
            readnode(node.nodehdr.parent);
            current.ky = node.keys;
            if (child == node.nodehdr.lower.nodepointer)
                break;
            while (*current.ky) {
                NODEPOINTER this = *((NODEPOINTER *)
                    (current.ky+strlen(current.ky)+1));
                current.ky += keylength(node, current.ky);
                if (this == child)
                    break;
            }
        }
        return *current.ky != '\0';
    }
    return FALSE;
}

/*
 * Find the previous sequential key in the tree.
 * Return TRUE if found.
 * Return FALSE if not found
 * (the tree is empty or at the beginning).
 */
int prevkey(void)
{
    char *kp;
    opentree();
    if (th.rootnode)    {
        if (current.ky == NULL)
            return lastkey();
        if (!node.nodehdr.isleaf)   {
            /* ----- navigate to end of the lower leaf ----- */
            NODEPOINTER np;
            if (current.ky == node.keys)
                np = node.nodehdr.lower.nodepointer;
            else
                np = *((NODEPOINTER *)
                    (current.ky-sizeof(NODEPOINTER)));
            while (!node.nodehdr.isleaf)    {
                readnode(np);
                current.ky = node.keys;
                while (*current.ky)
                    current.ky += keylength(node, current.ky);
                np = *((NODEPOINTER *)
                    (current.ky-sizeof(NODEPOINTER)));
            }
        }
        /* ----- while at the beginning of a node ------ */
        while (current.ky == node.keys) {
            NODEPOINTER child = current.node;
            if (node.nodehdr.parent == 0)
                break;
            readnode(node.nodehdr.parent);
            current.ky = node.keys;
            if (child == node.nodehdr.lower.nodepointer)
                continue;

            while (*current.ky) {
                NODEPOINTER this = *((NODEPOINTER *)
                    (current.ky+strlen(current.ky)+1));
                current.ky += keylength(node, current.ky);
                if (this == child)
                    break;
            }
        }
        /* -------- go to previous key in node -------- */
        if (current.ky != node.keys)    {
            kp = node.keys;
            while (kp+keylength(node, kp) < current.ky)
                kp += keylength(node, kp);
            current.ky = kp;
            return TRUE;
        }
    }
    return FALSE;
}

/*
 * Return the key value associated with the most recently
 * retrieved key.
 */
KEYVALUE current_keyvalue(void)
{
    KEYVALUE rtn;
    if (!node.nodehdr.isleaf)   {
        SEARCH save = current;
        current.ky += keylength(node, current.ky);
        while (!node.nodehdr.isleaf)    {
            NODEPOINTER np;
            if (current.ky == node.keys)
                np = node.nodehdr.lower.nodepointer;
            else
                np = *((NODEPOINTER *)
                    (current.ky-sizeof(NODEPOINTER)));
            readnode(np);
            current.ky = node.keys;
        }
        rtn = node.nodehdr.lower.keyvalue;
        current = save;
        readnode(save.node);
        return rtn;
    }
    return *((KEYVALUE *)(current.ky+strlen(current.ky)+1));
}

/*
 * Return the key string value of the most recently
 * retrieved key.
 */
char *current_keystring(void)
{
    return current.ky;
}

/* -------- open the tree ----------- */
static void opentree(void)
{
    if (htree == NULL)  {
        if ((htree = fopen(HYPERTREE, "rb")) == NULL)   {
            fputs("\n" HYPERTREE " not found", stderr);
            exit(1);
        }
        fread(&th, sizeof(TREEHDR), 1, htree);
    }
}

/* ---------- read a specified node ------------- */
static void readnode(NODEPOINTER nodepointer)
{
    long where = (nodepointer-1)*NODELEN+sizeof(TREEHDR);

    fseek(htree, where, SEEK_SET);
    fread(&node, NODELEN, 1, htree);
    current.node = nodepointer;
}




<a name="0151_0010"><a name="0151_0010">
<a name="0151_0011">
[LISTING FOUR]
<a name="0151_0011">

/* -------- keyextr.c ---------- */

/*
 * Build a test HyperTree from standard input.
 * <> delimiters define the key values.
 * This is pass one, which builds the sortable table.
 */

#include <stdio.h>
#include <string.h>
#include "hyprtree.h"

#define KEYTOKEN '<'
#define TERMINAL '>'
#define QUOTE    '"'
#define ESCAPE   '\\'

void main(void)
{
    int c;
    while ((c = getchar()) != EOF)  {
        if (c == KEYTOKEN)  {
            long fileposition = ftell(stdin)-1;
            int counter = 0;
            putchar(QUOTE);
            while ((c = getchar()) != TERMINAL) {
                if (c == EOF || counter++ == MAXKEYLENGTH)
                    break;
                if (c == QUOTE)
                    putchar(ESCAPE);
                putchar(c);
            }
            putchar(QUOTE);
            putchar(',');
            printf("%ld\n", fileposition);
        }
    }
}





<a name="0151_0012"><a name="0151_0012">
<a name="0151_0013">
[LISTING FIVE]
<a name="0151_0013">

/* ----------- keybuild.c ----------- */

/*
 * Build a test HyperTree from standard input.
 * This is pass three, which builds the HyperTree
 * from the sorted table.
 */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "hyprtree.h"

#define QUOTE    '"'
#define ESCAPE   '\\'

void main(void)
{
    int c;
    char key[MAXKEYLENGTH+1];
    char kv[20];
    KEYVALUE keyvalue;

    while ((c = getchar()) != EOF)  {
        if (c == QUOTE) {
            int counter = 0;
            char *cp = key;
            while ((c = getchar()) != QUOTE)    {
                if (c == EOF || counter++ == MAXKEYLENGTH)
                    break;
                if (c == ESCAPE)
                    c = getchar();
                *cp++ = c;
            }
            *cp = '\0';
            if (getchar() == ',')   {
                char *kp = kv;
                while ((c = getchar()) != '\n' && c != EOF)
                    *kp++ = c;
                *kp = '\0';
                keyvalue = atol(kv);
                addkey(key, keyvalue);
            }
        }
    }
    addkey(NULL, keyvalue);
}




<a name="0151_0014"><a name="0151_0014">
<a name="0151_0015">
[LISTING SIX]
<a name="0151_0015">

/* ------------- hyprsrch.c -------------- */

/*
 * Search the test HyperTree for a match on the
 * command-line parameter.
 * Display a screen of text starting at the
 * position indicated for the matching (or closest)
 * entry in the HyperTree.
 */

#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <conio.h>
#include "hyprtree.h"

#define SCRLINES 25

void main(int argc, char *argv[])
{
    if (argc < 3)
        fprintf(stderr, "Usage: hyprsrch textfile keyvalue");
    else    {
        int kb = 0;
        FILE *fp = fopen(argv[1], "r");
        if (fp == NULL) {
            fprintf(stderr, "No such file as %s", argv[1]);
            return;
        }
        findkey(argv[2]);
        while (toupper(kb) != 'Q')  {
            int lc = 4, c;
            KEYVALUE kv = current_keyvalue();
            printf("\n%s (%ld)", current_keystring(), kv);
            printf("\n------------------------------------\n");
            if (kv) {
                fseek(fp, kv, SEEK_SET);
                while ((c = getc(fp)) != EOF && lc < SCRLINES){
                    if (c == '\n')
                        lc++;
                    putchar(c);
                }
            }
            printf("\nF = first, "
                     "L = last, "
                     "N = next, "
                     "P = previous, "
                     "Q = quit...");
            kb = getch();
            switch (toupper(kb))    {
                case 'F':
                    firstkey();
                    break;
                case 'L':
                    lastkey();
                    break;
                case 'N':
                    nextkey();
                    break;
                case 'P':
                    prevkey();
                    break;
                default:
                    break;
            }
        }
    }
}










Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.