The Power of LISP for C Programmers

The trick is removing the links from the records


June 03, 2007
URL:http://www.drdobbs.com/cpp/the-power-of-lisp-for-c-programmers/199900573

What's all this list processing stuff about anyway? Even if you're a programmer with no LISP experience, you probably know that LISP stands for list processing. You probably use linked-list structures in your code from time to time and wonder what LISP can do that you can't. Like other languages, LISP has unique properties that make it a powerful language for solving certain types of problems. Unique language properties are often quite handy and an enterprising programmer will incorporate those properties into another language.

The two best-known LISP properties are symbolic processing and list processing. Both concepts can be developed in other languages with a set of generic functions or procedures that operate on appropriate data structures and by developing a proper mental attitude concerning the abstraction.

In this article, I address generic list-processing procedures in C. The same concepts can be adapted for other high-level languages and assembly language.

Most C programmers have used linked-list structures such as the one in Listing One.

typedef struct record
{
  struct record *link; /* next record */
  char    data[ 100];  /* text date   */
} RECORD;
Listing One: Linked-list structure.

In this case, the first item in the structure is a link pointer to the next structure, and the rest of the structure is a place to store data. This, of course, is a simple example. Real-world programming requirements usually result in more complex data struc tures for keeping track of things like airline reservations, satellite ephemeries, journal entries, and so on. Lists are usually built fr om these structures (Figure 1).

Figure 1: Common approach to linked lists.

List maintenance procedures can be developed to add, delete, and sort list items, among other things. Essentially, by doing so we are processing a list. The difference is that these list procedures usually have a special purpose -- for example, we might write a procedure to add a new record to the head of the list with code as shown in Listing Two.

RECORD *addrecord(rec.list)
RECORD *rec;     /* new record */
RECORD *list;    /* pointer to head of list */
{
   rec->link=list;
   return rec;
} 
USAGE: 
head = addrecord(arecord.head);
Listing Two: Use of linked-list structure.

Unfortunately, such a procedure can only be used to maintain lists composed of RECORD data structures. However, if we could develop a method of adding a record to a list regardless of the record's structure, we could use this method for every list in our program. LISP performs just this method; the notion of a list is embedded in the language, as are generic procedures for list processing. Let's do the same for C.

The trick is removing the links from the records. We need to define a data structure to construct lists separate from the one that defines list items (Figure 2).

Figure 2: Improved approach to linked lists.

A CONS is a data structure that can construct lists. It consists of two cells, called the "car" cell and the "cdr" (pronounced "could-er") cell. A CONS is a simple structure, but don't let its simplicity fool you. A CONS is to symbolic processing what a binary digit is to numerical processing. By combining CONS we can form lists, binary trees, hierarchies, and an infinite number of other useful data structures.

List Processing

Let's take a top-down approach. We'll start with a real-world programming problem and define the type and requirements of the list procedures we'll need. We'll use the corresponding LISP functions so those of you who are LISPers will see the C implementations.

Listing Three contains the entire source code for a sort program. The program reads lines of text from the standard input, sorts them, and writes them to the standard output. It can be used on UNIX or MS-DOS systems as a filter program; in conjunction with other programs, pipes, and redirection, the program can perform more complex processing. Three special-purpose procedures exist in addition to main: getaline, output-list, and string-compare. The other procedures perform general-purpose list processing and can be used for many programming problems. Once a library of such procedures has been built, a new application can be developed with minimal effort.

/*********************************************************/
/*    File:         sort.c                               */
/*                                                       */
/*    Description:  This file contains source code for   */
/*                  a program written in C which sorts   */
/*                  text files.  It is a demonstration   */
/*                  of LISP-like list procedures.        */
/*                                                       */
/*    Version Date     History                           */
/*    1.00    14SEP88  Original                          */
/*********************************************************/

#include 
#include 
#include 
#include 

/*********************************************************/
/*    typedefs                                           */
/*********************************************************/

typedef void   *POINTER;      /* General purpose pointer */

typedef struct                /* A CONS is two pointers  */
   {
   POINTER  car;
   POINTER  cdr;
   } CONS;
typedef CONS    *LIST;        /* A LIST is a pointer to  */
                              /* a CONS                  */


/*********************************************************/
/*    return codes at the program level                  */
/*********************************************************/

#define  RET_NORMAL  0  /* Normal return code            */
#define  RET_IOERR   2  /* I/O Error                     */
#define  RET_MEMERR  3  /* Memory allocation error       */

/*********************************************************/
/*    Manifest Constants                                 */
/*********************************************************/

#define  LINESIZE 500   /* size of local buffer          */
                        /* lines larger then this will   */
                        /* read as multiple lines        */


/*********************************************************/
/*    Prototypes                                         */
/*********************************************************/

LIST     reverse(LIST  lst);
LIST     merge(LIST lst1,LIST lst2,int (*compare)());
LIST     sort(LIST lst,int (*compare)());
CONS     *cons(POINTER a,POINTER b);
LIST     cpush(CONS *z,LIST *lst);
LIST     push(POINTER item,LIST *lst);
CONS     *cpop(POINTER *lst);
POINTER  pop(POINTER  *lst);
LIST     outputlist(LIST lst,FILE *stream);
int      string_compare(char *a,char *b);
char     *getaline(FILE *stream);


/*********************************************************/
/*    main     sort program                              */
/*                                                       */
/*    Reads lines of text from stdin, sorts them, and    */
/*    outputs the result to stdout.                      */
/*********************************************************/
int main(void)
{

   LIST mylist=NULL;       /* List of text records as    */
                           /* read from stdin            */

   char  *lp;              /* pointer to duplicate string*/

   while  ( (lp = getaline(stdin)) != NULL )
      push(lp,&mylist);
   outputlist(sort(mylist,string_compare),stdout);

   return RET_NORMAL;
}


/*********************************************************/
/*    getaline         get a line of text                */
/*                                                       */
/*    Returns a pointer to a line of text read from      */
/*    stream.  Returns NULL if end of file.  Exits       */
/*    on error.                                          */
/*********************************************************/
char *getaline(FILE *stream)
{
   char  line[LINESIZE];   /* input buffer               */
   char  *lp;              /* points to strdup buffer    */

   if (fgets(line,LINESIZE,stream) != NULL)
      {
      if ( (lp = strdup(line)) != NULL)
         return lp;        /* normal return              */
      else
         exit(RET_MEMERR); /* strdup could not malloc    */
      }
   else
      {
      if (feof(stream))
         return NULL;      /* end of file                */
      else
         exit(RET_IOERR);  /* I/O error in fgets         */
      }
}

/*********************************************************/
/*    string_compare    compares strings                 */
/*                                                       */
/*    Returns TRUE value if string a is ordered after    */
/*    string b. Returns  FASE if string a is equal to    */
/*    or ordered before string b.                        */
/*********************************************************/
int string_compare(char *a,char *b)
{
   return strcmp(a,b) > 0;
}

/*********************************************************/
/*    push          push an item onto a list             */
/*                                                       */
/*    Uses cons to create a CONS and sets its car cell   */
/*    to item.  Its cdr is set to *lst.  *lst is set to  */
/*    point to the CONS created.  A pointer to the CONS  */
/*    is also returned.                                  */
/*********************************************************/
LIST push(POINTER item,LIST *lst)
{
   return *lst = cons(item,*lst);
}

/*********************************************************/
/*    cons          create a CONS                        */
/*                                                       */
/*    Uses malloc to allocate a CONS and set its         */
/*    car cell to a and its cdr cell to b.  Exits if     */
/*    malloc fails!                                      */
/*********************************************************/
CONS *cons(POINTER a,POINTER b)
{
   CONS  *z;
   z = (CONS *) malloc(sizeof(CONS));
   if (z == NULL) exit(RET_MEMERR);
   z->car = a;
   z->cdr = b;
   return z;
}  


/*********************************************************/
/*    pop         pop an item off a list                 */
/*                                                       */
/*    Uses cpop to pop a CONS off lst.  Return NULL      */
/*    if lst is empty.  free the CONS and return the     */
/*    car cell of the CONS.                              */
/*********************************************************/
POINTER  pop(LIST  *lst)
{
   POINTER  item;
   CONS     *z;

   z = cpop(lst);
   if(z == NULL) return NULL; /* list is empty           */

   item = z->car;             /* get what cons pointed to*/

   free(z);                   /* release the cons        */

   return item;
}

/*********************************************************/
/*    cpop         pop a CONS off a list                 */
/*                                                       */
/*    if *lst is empty return NULL, else set *lst to the */
/*    the car cell of the first CONS on *lst and return  */
/*    a pointer to the first CONS on *lst.               */
/*********************************************************/
CONS    *cpop(LIST *lst)
{
   LIST  rvalue;
   rvalue = *lst;
   if (rvalue != NULL)
      *lst = rvalue->cdr;
   return rvalue;
}

/*********************************************************/
/*    outputlist   output each item in a list            */
/*                                                       */
/*    start with the first item in the list, output it   */
/*    to stream assuming it is a string, repeat for each */
/*    item in the list.  Return lst.                     */
/*********************************************************/
LIST outputlist(LIST lst,FILE *stream)
{
   CONS *acons;
   LIST  l=lst;

   while ( acons=cpop(l) != NULL )
      fputs( (char *) acons->car,stream);
   return lst;
}


/*********************************************************/
/*    cpush      cons push                               */
/*                                                       */
/*    Adds a CONS to the head of a list                  */
/*    modify the cdr cell of *z to point to *lst.  set   */
/*    *lst to point to the CONS z and return a pointer   */
/*    to the CONS.                                       */
/*********************************************************/
LIST cpush(CONS *z,LIST *lst)
{
   z->cdr = *lst;
   return *lst = z;
}

/*********************************************************/
/*    reverse      reverse a list                        */
/*                                                       */
/*    distructively modify the cdr cells of the CONS     */
/*    which make up lst so that a new list is created    */
/*    which is the reverse of the original list.         */
/*    return the new list.                               */
/*********************************************************/
LIST  reverse(LIST  lst)
{
   LIST  rlst=NULL;

   while (lst != NULL) cpush(cpop(&lst),&rlst);
   return rlst;
}

/*********************************************************/
/*    split      split a list in half                    */
/*                                                       */
/*    Find the mid CONS in lst.  set its cdr cell to     */
/*    NULL.  return the remainder of lst, i.e. a pointer */
/*    to the next CONS after the mid CONS.               */
/*********************************************************/
LIST split(LIST lst)
{
   LIST tail=lst->cdr;

   if ( (lst == NULL) || (tail == NULL) )
      return lst;

   while( (tail != NULL) && ( (tail=tail->cdr) != NULL) )
      {
        lst = lst->cdr;
        tail = tail->cdr;
      }
   tail = lst->cdr;
   lst->cdr=NULL;
   return tail;
}

/*********************************************************/
/*    sort        sort a list                            */
/*                                                       */
/*    If cmp is a two argument ordering procedure,       */
/*    sort the items in lst based on cmp and             */
/*    return the results                                 */
/*********************************************************/
LIST  sort( LIST lst,int (*cmp)())
{
   LIST lst2;

   if ((lst == NULL) || (lst->cdr == NULL)) return lst;

   lst2 = split(lst);
   return merge(sort(lst,cmp),sort(lst2,cmp),cmp);
}


/*********************************************************/
/*    merge        merge two lists                       */
/*                                                       */
/*    If cmp is a two argument ordering procedure,       */
/*    merge the items in l1 and l2 based on cmp and      */
/*    return the results                                 */
/*********************************************************/
LIST  merge(LIST l1,LIST l2,int (*cmp)())
{
   LIST  rlst=NULL;

   while ( (l1 != NULL) && (l2 != NULL) )
      cpush((cmp(l2->car,l1->car) ? cpop(&l1) : cpop(&l2)), &rlst);

   while ( l1 != NULL ) cpush(cpop(&l1),&rlst);
   while ( l2 != NULL ) cpush(cpop(&l2),&rlst);

   return reverse(rlst);
}
Listing Three: Efficient sort program.

The main procedure begins with a simple loop that reads each line of text and puts the text on a list called "mylist", which is declared as a LIST type. Think of a list as a new data abstraction available to you as a C programmer. A list is a data entity to which you can add and recover items regardless of type and do all kinds of things that seem reasonable to do with lists. Our program includes a sampling of list-processing procedures needed to satisfy our program's requirements. In addition, you can write your own procedures to do list processing. Your procedures should be generic and independent of the type of the items in the list. In fact, we can even make up a list that contains items of varying types. For example, we might store employee records, bowling-league scores, and parts lists in the same list. Yes, we can even store lists in lists.

The program has a simple approach. We get text lines one at a time from the standard input and push them onto a list. When we reach the end of file, the list is sorted and output on the standard output. We can see from the code in main that a list can have properties similar to a stack, since we are pushing the lines of text onto mylist. But what are we pushing and where does it go? Well, lp is the variable we are using and it is a POINTER. Therefore, we must be pushing a POINTER onto the list.

getaline is a simple procedure that uses the library procedure fgets to read a string from the designated input stream (stdin in our case). The string is stored in a local buffer called line. fgets terminates when LINESIZE-1 characters have been read or if a newline character is encountered, whichever comes first. fgets will return a null if an end of file or an input error occurs.

Under normal circumstances, a null-terminated string will be stored in line and getaline will then use the library routine strdup to duplicate the string and return a POINTER to the duplicate. strdup uses the memory-allocation routine malloc to get space to store the duplicate. If the system is out of memory, strdup will return null and getaline will exit with a return parameter indicating the error. getaline returns a pointer to a string that has been stored in memory allocated with malloc. When no more data is available, getaline will ret urn null unless an error occurs, in which case it will exit with an error indication. getaline's logic for handling abnormal conditions is adequate; however, production code might include some diagnostic messages to the standard error stream.

Let's return to the question of what gets stored on a list. We see from its use that the push procedure pushes a POINTER onto a list. In the example, lp points to a type char. It should be obvious by now that the pointer does not have to point to a char; it can point to any type of C item. push and other list procedures in our program work with lists similar to Figure 2. With this approach, we can build lists independent of items stored in the lists. The list is made up of CONS linked together by their cdr cells. The car cell in the CONS points to the item stored in the list. To simplify the code, we use typedefs to define the necessary data structures. Take a look at the definition of a CONS and a LIST.

List Procedures

Listing Three shows the push procedure we use to add an item to the head of a list. The item to be added is defined as a generic POINTER that points to void. push uses cons to allocate a new CONS and set its car and cdr cells. push also updates the LIST variable so that it now points to the correct CONS at the head of the list. The LIST variable is passed to push by reference -- a POINTER to the LIST variable is passed so push can update it. Finally, push returns a POINTER to the head of the list as a convenience to the caller.

cons returns a POINTER to a CONS, but where did it get the CONS? In a LISP implementation, CONS are managed as an allocated-memory resource. At initialization, they are collected together and put on a list called the "free list." Since LISP uses CONS liberally, they must be managed efficiently. In addition, CONS are frequently used to make temporary copies of lists that may or may not be used. As a result, a method or recovering unused CONS and storing them on the free list becomes necessary. We'll use malloc for the necessary memory to hold a CONS. We'll let the system clean up after us by restoring all dynamically allocated memory after exiting.

cons uses malloc and checks to see whether it granted our request. Since there is always a chance malloc cannot allocate memory, we must account for this possibility. Again, production code should have better diagnostics for such error conditions.

pop is the opposite of push: it removes an item from a list. pop removes the first CONS from the list, retrieving its car cell and then releasing the CONS. pop, like push, must be passed a POINTER to a list rather than the LIST variable itself. That's because pop needs to modify the LIST variable to point to the second item in the list. pop uses procedure cpop to retrieve the first CONS and modify the LIST variable. cpop is very useful for walking through a list without releasing the CONS structures that make up the list. We use cpop in several places in our program.

outputlist is a procedure that iteratively removes each item from the list and sends it to the output stream (stdout in our case). However, outputlist makes a copy of the LIST variable and leaves the original list intact. outputlist is not a general list procedure, as it makes assumptions about each item in the list (it assumes each item is a string). Generic functions in LISP will send the printed representation of an argument to the output stream regardless of the argument type.

These functions depend on a more elaborate strategy for data structure than we'll be using here. We'll have to accept the fact that our output function only works for lists of strings. outputlist uses the library procedure fputs, which outputs a string. outputlist takes a LIST as its first argument. We supply this argument by using the sort procedure, which returns a LIST. Syntactically, the fact that sort returns a POINTER to the sorted list is convenient for calling out putlist.

LISP functions always return something. We follow this convention here even in our outputlist procedure, allowing us to use constructs such as:

outputlist 
(sort(outputlist(item,&1st), compare))))

Here the innermost procedure, push, returns a list used by the next procedure, outputlist, which in turn returns a list used by sort, which returns a list used by the outermost outputlist. Now one can see why LISP stands for "Lots of Insidious Sets of Parentheses."

Sort Technique

sort (Listing Three) is an example of a common LISP technique. Self-calling LISP functions are frequently encountered. This technique, called "recursion," is available to programmers in most modern languages but is more common in LISP. Many programming problems lend themselves to recursive solutions, particularly list processing. Recursion can be a very powerful technique, but it must be used cautiously since each recursive call uses stack space and poorly contructed procedures can run through the stack. sort, however, is a recursive program that works very well.

sort checks to see if the list is empty or contains only one item. If so, its work is very easy. A list that is empty or contains only one item is already sorted, so sort just returns the list itself. Otherwise, sort splits the list, sorts each half, and merges the results. It sorts each half with sort, a procedure that we know will sort a list. Fortunately, the problem, which the recursive call to sort must handle, has been reduced by half. Eventually, the list will be either reduced to one, or empty, in which case the first statement in sort will cause the recursion to terminate. Since the problem is divided by half each time, the recursive depth is equal to the base-2 logarithm of the size of the list. A list of 65536 items only recurses to a depth of 16!

sort has a second parameter that is a procedure: the string-compare that was passed by main. sort is designed to sort lists consisting of different types of items. To determine the sorting order, one must be able to determine the ordering sequence for the items. For strings, string-compare(a,b) returns a value greater than zero if a comes before bin the collating order. To sort a list of items that are not strings, the user only needs to supply an appropriate compare procedure.

sort uses split to break the list into two halves. split walks two POINTERs through the list. The first POINTER is the variable 1st that was passed to split as a POINTER to the list head. Since C passes arguments by value, we are really working with a copy of the LIST variable and can use it as we wish without destroying the variable in the calling procedure. The second POINTER, tail, is stepped through the list twice as fast as the first. When tail reaches the end of the list, the first pointer, 1st, will point to the midpoint of the list. The list is pruned at this point and a POINTER to the second half of the list is returned.

Merge Procedure

merge's strategy is to build a temporary list in reverse-sorted order by pushing CONS onto the list that are selectively popped off the appropriate input list. The appropriate input list is chosen based on the compare function passed as a parameter to merge. When either of the input lists is exhausted, merge transfers the remaining items from the other list until it's empty. Finally, merge returns the reverse of the temporary list. merge manipulates CONS and thus uses cpop and cpush. This process is more efficient than using pop and push since these procedures use free and malloc to release and allocate a CONS on each call.

reverse, a procedure that reverses a list, uses cpop and cpush to change the order of the CONS in a list. Like merge, reverse pops from one list and pushes to another, yielding a list whose order is reversed from the original.

The sort program is written in ANSI-compatible C so it should be adaptable to any system. I have compiled it using Microsoft C 5.1 and a compact memory model, allowing access to all available memory under MS-DOS. Text files of over 250K can be sorted using the program (the sort program provided with MS-DOS is limited to files of 64K or less). On an 8MHz 8086 system, sorting a file of 239K bytes took 40 seconds. On an 18 MHz 80386, the same sort took less than five seconds. In general, the sort program is several times faster than the one provided with MS-DOS.

These procedures can be improved in many ways. For example, many text files contain blank lines. Each time one of these lines l is read, space is allocated to hold the '\n' (newline) character as well as the terminating '\0' (null). This problem results in numerous cases of duplicate strings that contain identical data. The storage space for a string exceeds that needed for just the characters in a string since each memory fragment required by the memory allocation scheme has overhead. If we could remove these duplicates, we could save data. For our purposes, the list, which consists of CONS with pointers to the strings, is not affected if the car cell of several CONS point to the same string. We might employ a method of searching our list of existing strings to see if we have one that matches the new one before allocating the memory for it. Another method is to recode some of the simple procedures as macros. For example, push on most systems is faster and takes less space as a macro.

The sort program gives an indication of the power of list processing for nonLISP programmers. The small set of procedures shown here provides a powerful toolbox for solving the very large class of problems that lends itself to list processing. Of course, many additions can be made to this toolbox. Try it. Or better yet, if you don't already speak LISP, learn it -- you'll be glad you did.

Terms of Service | Privacy Statement | Copyright © 2024 UBM Tech, All rights reserved.