List Processing
Let's take a top-down approach. We'll start with a real-world programming problem and define the type and requirements of the list procedures we'll need. We'll use the corresponding LISP functions so those of you who are LISPers will see the C implementations.
Listing Three contains the entire source code for a sort program. The program reads lines of text from the standard input, sorts them, and writes them to the standard output. It can be used on UNIX or MS-DOS systems as a filter program; in conjunction with other programs, pipes, and redirection, the program can perform more complex processing. Three special-purpose procedures exist in addition to main: getaline, output-list, and string-compare. The other procedures perform general-purpose list processing and can be used for many programming problems. Once a library of such procedures has been built, a new application can be developed with minimal effort.
/*********************************************************/ /* File: sort.c */ /* */ /* Description: This file contains source code for */ /* a program written in C which sorts */ /* text files. It is a demonstration */ /* of LISP-like list procedures. */ /* */ /* Version Date History */ /* 1.00 14SEP88 Original */ /*********************************************************/ #include <stdlib.h> #include <stdio.h> #include <string.h> #include <malloc.h> /*********************************************************/ /* typedefs */ /*********************************************************/ typedef void *POINTER; /* General purpose pointer */ typedef struct /* A CONS is two pointers */ { POINTER car; POINTER cdr; } CONS; typedef CONS *LIST; /* A LIST is a pointer to */ /* a CONS */ /*********************************************************/ /* return codes at the program level */ /*********************************************************/ #define RET_NORMAL 0 /* Normal return code */ #define RET_IOERR 2 /* I/O Error */ #define RET_MEMERR 3 /* Memory allocation error */ /*********************************************************/ /* Manifest Constants */ /*********************************************************/ #define LINESIZE 500 /* size of local buffer */ /* lines larger then this will */ /* read as multiple lines */ /*********************************************************/ /* Prototypes */ /*********************************************************/ LIST reverse(LIST lst); LIST merge(LIST lst1,LIST lst2,int (*compare)()); LIST sort(LIST lst,int (*compare)()); CONS *cons(POINTER a,POINTER b); LIST cpush(CONS *z,LIST *lst); LIST push(POINTER item,LIST *lst); CONS *cpop(POINTER *lst); POINTER pop(POINTER *lst); LIST outputlist(LIST lst,FILE *stream); int string_compare(char *a,char *b); char *getaline(FILE *stream); /*********************************************************/ /* main sort program */ /* */ /* Reads lines of text from stdin, sorts them, and */ /* outputs the result to stdout. */ /*********************************************************/ int main(void) { LIST mylist=NULL; /* List of text records as */ /* read from stdin */ char *lp; /* pointer to duplicate string*/ while ( (lp = getaline(stdin)) != NULL ) push(lp,&mylist); outputlist(sort(mylist,string_compare),stdout); return RET_NORMAL; } /*********************************************************/ /* getaline get a line of text */ /* */ /* Returns a pointer to a line of text read from */ /* stream. Returns NULL if end of file. Exits */ /* on error. */ /*********************************************************/ char *getaline(FILE *stream) { char line[LINESIZE]; /* input buffer */ char *lp; /* points to strdup buffer */ if (fgets(line,LINESIZE,stream) != NULL) { if ( (lp = strdup(line)) != NULL) return lp; /* normal return */ else exit(RET_MEMERR); /* strdup could not malloc */ } else { if (feof(stream)) return NULL; /* end of file */ else exit(RET_IOERR); /* I/O error in fgets */ } } /*********************************************************/ /* string_compare compares strings */ /* */ /* Returns TRUE value if string a is ordered after */ /* string b. Returns FASE if string a is equal to */ /* or ordered before string b. */ /*********************************************************/ int string_compare(char *a,char *b) { return strcmp(a,b) > 0; } /*********************************************************/ /* push push an item onto a list */ /* */ /* Uses cons to create a CONS and sets its car cell */ /* to item. Its cdr is set to *lst. *lst is set to */ /* point to the CONS created. A pointer to the CONS */ /* is also returned. */ /*********************************************************/ LIST push(POINTER item,LIST *lst) { return *lst = cons(item,*lst); } /*********************************************************/ /* cons create a CONS */ /* */ /* Uses malloc to allocate a CONS and set its */ /* car cell to a and its cdr cell to b. Exits if */ /* malloc fails! */ /*********************************************************/ CONS *cons(POINTER a,POINTER b) { CONS *z; z = (CONS *) malloc(sizeof(CONS)); if (z == NULL) exit(RET_MEMERR); z->car = a; z->cdr = b; return z; } /*********************************************************/ /* pop pop an item off a list */ /* */ /* Uses cpop to pop a CONS off lst. Return NULL */ /* if lst is empty. free the CONS and return the */ /* car cell of the CONS. */ /*********************************************************/ POINTER pop(LIST *lst) { POINTER item; CONS *z; z = cpop(lst); if(z == NULL) return NULL; /* list is empty */ item = z->car; /* get what cons pointed to*/ free(z); /* release the cons */ return item; } /*********************************************************/ /* cpop pop a CONS off a list */ /* */ /* if *lst is empty return NULL, else set *lst to the */ /* the car cell of the first CONS on *lst and return */ /* a pointer to the first CONS on *lst. */ /*********************************************************/ CONS *cpop(LIST *lst) { LIST rvalue; rvalue = *lst; if (rvalue != NULL) *lst = rvalue->cdr; return rvalue; } /*********************************************************/ /* outputlist output each item in a list */ /* */ /* start with the first item in the list, output it */ /* to stream assuming it is a string, repeat for each */ /* item in the list. Return lst. */ /*********************************************************/ LIST outputlist(LIST lst,FILE *stream) { CONS *acons; LIST l=lst; while ( acons=cpop(l) != NULL ) fputs( (char *) acons->car,stream); return lst; } /*********************************************************/ /* cpush cons push */ /* */ /* Adds a CONS to the head of a list */ /* modify the cdr cell of *z to point to *lst. set */ /* *lst to point to the CONS z and return a pointer */ /* to the CONS. */ /*********************************************************/ LIST cpush(CONS *z,LIST *lst) { z->cdr = *lst; return *lst = z; } /*********************************************************/ /* reverse reverse a list */ /* */ /* distructively modify the cdr cells of the CONS */ /* which make up lst so that a new list is created */ /* which is the reverse of the original list. */ /* return the new list. */ /*********************************************************/ LIST reverse(LIST lst) { LIST rlst=NULL; while (lst != NULL) cpush(cpop(&lst),&rlst); return rlst; } /*********************************************************/ /* split split a list in half */ /* */ /* Find the mid CONS in lst. set its cdr cell to */ /* NULL. return the remainder of lst, i.e. a pointer */ /* to the next CONS after the mid CONS. */ /*********************************************************/ LIST split(LIST lst) { LIST tail=lst->cdr; if ( (lst == NULL) || (tail == NULL) ) return lst; while( (tail != NULL) && ( (tail=tail->cdr) != NULL) ) { lst = lst->cdr; tail = tail->cdr; } tail = lst->cdr; lst->cdr=NULL; return tail; } /*********************************************************/ /* sort sort a list */ /* */ /* If cmp is a two argument ordering procedure, */ /* sort the items in lst based on cmp and */ /* return the results */ /*********************************************************/ LIST sort( LIST lst,int (*cmp)()) { LIST lst2; if ((lst == NULL) || (lst->cdr == NULL)) return lst; lst2 = split(lst); return merge(sort(lst,cmp),sort(lst2,cmp),cmp); } /*********************************************************/ /* merge merge two lists */ /* */ /* If cmp is a two argument ordering procedure, */ /* merge the items in l1 and l2 based on cmp and */ /* return the results */ /*********************************************************/ LIST merge(LIST l1,LIST l2,int (*cmp)()) { LIST rlst=NULL; while ( (l1 != NULL) && (l2 != NULL) ) cpush((cmp(l2->car,l1->car) ? cpop(&l1) : cpop(&l2)), &rlst); while ( l1 != NULL ) cpush(cpop(&l1),&rlst); while ( l2 != NULL ) cpush(cpop(&l2),&rlst); return reverse(rlst); }
The main procedure begins with a simple loop that reads each line of text and puts the text on a list called "mylist", which is declared as a LIST type. Think of a list as a new data abstraction available to you as a C programmer. A list is a data entity to which you can add and recover items regardless of type and do all kinds of things that seem reasonable to do with lists. Our program includes a sampling of list-processing procedures needed to satisfy our program's requirements. In addition, you can write your own procedures to do list processing. Your procedures should be generic and independent of the type of the items in the list. In fact, we can even make up a list that contains items of varying types. For example, we might store employee records, bowling-league scores, and parts lists in the same list. Yes, we can even store lists in lists.
The program has a simple approach. We get text lines one at a time from the standard input and push them onto a list. When we reach the end of file, the list is sorted and output on the standard output. We can see from the code in main that a list can have properties similar to a stack, since we are pushing the lines of text onto mylist. But what are we pushing and where does it go? Well, lp is the variable we are using and it is a POINTER. Therefore, we must be pushing a POINTER onto the list.
getaline is a simple procedure that uses the library procedure fgets to read a string from the designated input stream (stdin in our case). The string is stored in a local buffer called line. fgets terminates when LINESIZE-1 characters have been read or if a newline character is encountered, whichever comes first. fgets will return a null if an end of file or an input error occurs.
Under normal circumstances, a null-terminated string will be stored in line and getaline will then use the library routine strdup to duplicate the string and return a POINTER to the duplicate. strdup uses the memory-allocation routine malloc to get space to store the duplicate. If the system is out of memory, strdup will return null and getaline will exit with a return parameter indicating the error. getaline returns a pointer to a string that has been stored in memory allocated with malloc. When no more data is available, getaline will ret urn null unless an error occurs, in which case it will exit with an error indication. getaline's logic for handling abnormal conditions is adequate; however, production code might include some diagnostic messages to the standard error stream.
Let's return to the question of what gets stored on a list. We see from its use that the push procedure pushes a POINTER onto a list. In the example, lp points to a type char. It should be obvious by now that the pointer does not have to point to a char; it can point to any type of C item. push and other list procedures in our program work with lists similar to Figure 2. With this approach, we can build lists independent of items stored in the lists. The list is made up of CONS linked together by their cdr cells. The car cell in the CONS points to the item stored in the list. To simplify the code, we use typedefs to define the necessary data structures. Take a look at the definition of a CONS and a LIST.