Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

C/C++

An Improved Lisp-Style Library for C


SEP92: AN IMPROVED LISP-STYLE LIBRARY FOR C

This article contains the following executables: CLISP.ARC

Douglas is a mathematician for the U.S. Army. He can be reached through the DDJ offices.


I was first exposed to Lisp in 1975, while working as a research mathematician for the Army. Since then, I've been fortunate enough to develop most of my algorithms using Lisp. I've found it to be a wonderful language, because it allows me to express abstract ideas quickly and naturally, while ignoring mundane considerations such as memory allocation and type declarations. So I was very pleased to see Daniel Ozick's article, "A Lisp-Style Library for C," in the August 1991 issue of Dr. Dobb's Journal. At the time, I had just begun translating a large program from Common Lisp to C.

However, once the translation process was underway, I discovered that some of the predicates in Daniel's library did not function properly. Also, the library was missing some "standard" Lisp predicates and structures to which I'd become accustomed. Fortunately, Daniel's approach is modular enough to encourage serious tinkering, so in the course of a few months, I developed a more complete library of predicates that behave like their Common Lisp equivalents.

Unfortunately, I then discovered that Ozick's garbage collector prevented my program from executing like its Lisp counterpart. By redesigning the garbage-collection process, I've now succeeded in making the system efficient, nearly automatic, and easy to use. The result is an improved Lisp library in C which I call "C-lisp." This library should facilitate the translation of Lisp programs to C.

I've tested the library on a variety of platforms, using different C and C++ compilers. The platforms include an IBM 486, a Macintosh FX, and a Sun SPARCII. The compilers used were Borland C, Symantec's ThinkC, Saber/CenterLine C++, and Sun C++ V2.1.

The Missing Predicates

Certain familiar Lisp predicates and structures were missing from Ozick's original implementation. These include Lisp atoms and associated property structures, the eq and equal predicates, and the list and append functions.

A key strength of Lisp is the language's ability to represent highly abstract data structures--sets, trees, and lattices--using Lisp atoms and their associated indicator-property structures. As you may recall, an indicator-property structure (sometimes called an "attribute/value pair") is a way of associating a set of features and corresponding values with a single atom. You can use indicator-property pairs to represent a real-world object in terms of an associated descriptor set. For example, consider Tweety--a small yellow bird, one year old, which cannot fly. In Lisp, the atom which is the pointer to Tweety would have the set of indicators and associated properties shown in Example 1.

Example 1: A partial list of descriptors for Tweety.

             Indicator  Property
  --------------------------------

  LISP ATOM  Name       Tweety

             Animal     T

             Type       bird

             Age        one year

             Color      yellow

             Fly        nil

             Size       small

You might argue that Tweety could also have been represented using a list. However, although the list would contain Tweety's descriptors, you would have a problem referencing it. The only referencing requirement is that Tweety's label be unique, since the entity Tweety is assumed to be unique. (Note that another Tweety with an identical set of Lisp descriptors may exist, but it is not the original Tweety.) Lisp atoms with unique symbol names solve this problem.

You can use the gensym (generate symbol) function from the C-lisp library to generate unique atoms. Then, you use the put_prop (put property) function to add/update indicators and their associated properties. Likewise, given an indicator, you can use the get_prop function to retrieve the corresponding property value. The remprop function removes both the indicator and its associated property from an atom. The code in Example 2 shows how this is done.

Example 2: Using gensym, put_prop, get_prop, and remprop.

  Object foo = gensym();                 /* generate a new Lisp atom  */

  put_prop(foo, "name", "Tweety");       /* give it name = Tweety     */
  write_object (get_prop (foo, "name"))  /* print name ("Tweety")     */
  remprop(foo, "name");                  /* remove name               */
  write_object (get_prop(foo,"name"));   /* print name (prints "Nil") */

EQ is not Equal

In Common Lisp, (eq x y) returns True if and only if x and y address the same memory location. However, x and y can be considered "equal" if they are structurally similar (isomorphic) objects. In the case of Lisp predicates such as member, using eq to test for equality is frequently inappropriate. For example, consider this Lisp fragment: (setf foo '(d e f)) (setf foo2 '(a b (d e f))) (member foo foo2:test#'eq). As shown, the member predicate returns Nil; by contrast, (member foo foo2 :test #'equal) returns the desired True.

I implemented a C-lisp lisp_equal function which recursively examines complex structures, testing for C equality (==) only as appropriate. The lisp_equal predicate is used in is_member, which returns True or Nil; in member, which returns the list whose car is lisp_equal to the first function argument; and in remove_item, which behaves like its Common Lisp counterpart. The C-lisp functions remove_duplicates, intersection, and set_differenc use is_member, member, and/or remove_item as well.

List and Append

Frequently in a Lisp program you want to create a list of lists. Occasionally, some of the member lists may be empty. The original implementation, which used a NULL (the empty set) as an argument terminator in calling the list function, didn't allow this. The C statement foo=list(make_ integer(33),NULL) results in foo being equal to (33). This means that foo= list(NULL,NULL) returns NULL rather than (Nil).

To make this function work properly, I changed the calling protocol to use the special symbol T_EOF as an argument terminator. Then the C statement foo = list(NULL,NULL,T_EOF) produces the desired result of (Nil Nil).

The append function was also anomalous. As originally implemented, C-lisp append behaved like Lisp nconc, a smash function that destroys its first argument. For example, the C statement foo = append(foo_bar,foo2) results in foo_bar being destroyed. I modified append to copy foo_bar and to cons each element of (reverse foo_bar) to the front of list foo2, which is modified as expected. The function nconc is now available as well.

Improving Garbage Collection

The improvements described earlier are incremental changes to the original implementation. After these were complete, I discovered a problem which necessitated a redesign of the original memory manager and garbage collector (GC).

The original design was admittedly simple in both concept and implementation. Although "real" Lisp systems use a variety of sophisticated GC algorithms (such as mark-and-sweep and generation scavenging), Daniel's implementation did not use a GC as such. Instead, memory was partitioned into two disjoint sets: temporary and persistent, and the mark and mark_persistent functions tag objects as one or the other. When memory needs to be reclaimed, the application program calls free_to_mark, which uses the free() function of the C standard library to deallocate memory. Objects are normally stored in temporary memory, where they get reclaimed by the GC. To protect an object from the GC, you must copy it into persistent memory. Eventually, this results in running out of temporary memory, because dynamic changes to a C-lisp data structure require many copies of the original object. A typical sequence is shown in Example 3(a). This is a result of having only two memory types.

Example 3: (a) Using the original garbage collector; (b) using the improved garbage collector, (c) how to protect a symbol's plist.

  (a)

  mark();                       /* set memory type = temporary           */
  ...
  mark_persistent();            /* set memory type = persistent          */
  foo = copy_object (foo_bar);  /* foo gets copied into persistent
                                   memory                                */
  unmark_persistent();          /* set memory type = temporary           */
  ...
  free_to_mark();               /* free temporary-tagged memory          */

  (b)

  foo = list (make_integer(33),NULL,T_EOF);  /* make object foo       */
  mark_object (foo);            /* mark the object (set protect bit)  */
  collect_garbage();            /* invoke the garbage collector       */
  write_object(foo);            /* display object (prints "(33 NULL)" */
  unmark_object (foo);          /* clear protect bit                  */
  collect_garbage();            /* invoke the garbage collector again */
  write_object(foo);            /* object no longer exists ("error")  */

  (c)

  foo_bar = gensym("girl");
  put_prop (foo_bar,"age", make_integer(25));
  mark_object(foo_bar);         /* foo_bar gets marked                */
  foo=gensym("boy");
  mark_object(foo);
  put_prop(foo,"age",make_integer(30));    /* C-bit set on foo        */
  put_prop (foo, "friend", foo_bar);
  mark_object(foo);                        /* second marking of foo
                                               needed                 */
  collect_garbage();
  write_object(get_prop (foo, "friend"));  /* displays girl-1         */

A better approach is to tag C-lisp objects, not memory. Listing One (page 112) is my memory-manager implementation. This more closely resembles the GCs in actual Lisp implementations, while still remaining rudimentary and nonautomatic. Each C-lisp object has a tag which indicates that object's type--symbol, pair, or integer. Depending upon the computer architecture, the minimal tag size is 1 byte. I use three bits of the type byte to specify the object type. Two of the remaining five bits are used during garbage collection to classify an object as "protected" (P bit) or "changed" (C bit). Normally, an object's P bit is 0, and the object is considered "unmarked." Unmarked objects may be reclaimed by the GC; objects with P bit equal to 1 are protected from the GC. You set an object's P bit with the function mark_object and clear it with unmark_object. Calls to mark_object are efficient because previously marked objects are ignored. Example 3(b) shows a typical code fragment.

Now suppose foo is an object of type symbol, and foo is marked. Foo's property list (plist) can be changed by using either put_prop or remprop, as described previously. If foo is not protected, the changed portion of foo's plist will be reclaimed during garbage collection with possibly disastrous results. To avoid this, both put_prop and remprop set the C bit if and only if the P bit is set. The mark_object function reexamines objects whose C bit is set, searching for any unmarked portion of the plist. To make this process memory efficient, a special unmark algorithm, free_structure, is used to unmark the portion of foo's plist being removed or replaced by put_prop or remprop. The free_structure function differs from unmark_object in that the value of the P bit for objects of type symbol is left unchanged. This strategy, illustrated in Example 3(c), prevents the unwanted reclamation of memory by the GC for any symbol objects which may be associated with foo. Example 4 shows the pseudocode for the key routines in the garbage collector.

Example 4: Pseudocode for memory allocation and garbage collection.

  At Start-Up:
        Initialize the GC stack pointer to NULL.
        Return.

  When a C-lisp object is created:
        Obtain memory from the system by using malloc().
        Push GC stack pointer to location of last GC stack location.
        Return.

  Upon a call to collect_garbage();
        CG:     Pop pointer from GC_Stack.
                If pointer is NULL {
                      Pop pointer from Protect_Stack.
                      If pointer is NULL
                           Push pointer onto GC_Stack
                      Else
                           Return
                }
             Else {
                      If object type > 7
                            Pass object location to free().
                      else
                            Push pointer onto Protect_Stack.
              Goto CG.
               }
  Upon calls to put_prop() or remprop():
          If object type > 7 {
                  Remove/post information on Object's plist.
          }
          else {
                  Set object's C bit to 1
                  If there is data to remove from plist {
                          Call free-structure(data).
                  }
                  Post information.
  }

To test the garbage collector, I wrote a program that first creates a lattice structure, and then cycles through the following steps:

  • Increase the size of the lattice.
  • Mark-and-copy the lattice.
  • Collect garbage.
  • Unmark the lattice.
  • Repeat.
In one test run using an O(n{2}) algorithm, the original implementation quickly depleted available memory in less than 12 iterations. The revised implementation was still running at 1035 iterations, at which time I halted the process.



_AN IMPROVED LISP-STYLE LIBRARY FOR C_
by Douglas Chubb



[LISTING ONE]
<a name="01f0_000d">

/*
   File MEMORY.C, part of C-LISP Library written by Douglas Chubb, 1991-92.
   Memory management using pointers and two marking bits as part of Object "type"
   declaration.
*/

/** Memory Allocation and Deallocation Functions **/
/* Include Files */
#include <stdio.h>
#include <stdlib.h>
#include "lisp-header.h"
#include "int-lisp-syms.h"
/** Variables **/
/* memory_pointer_list -- pointer to linked list of memory storage blocks */
Pointer memory_pointer_list = NULL;
/* temp_pointer_list -- pointer to linked list of temporally allocated blocks */
Pointer temp_pointer_list = NULL;
/** Functions **/

void initialize_garbage_collector (void)  {
  memory_pointer_list = NULL;     
  temp_pointer_list = NULL;  
}

/* push_memory_pointer -- push pointer to block on 'memory_pointer_list' */
void push_memory_pointer (Pointer p)  {
  * (Pointer *) p = memory_pointer_list;
  memory_pointer_list = p;
}

/* pop_memory_pointer -- pop pointer to block from 'memory_pointer_list' */
Pointer pop_memory_pointer (void)  {
  Pointer p;
  p = memory_pointer_list;
  if (p != NULL) {
    memory_pointer_list = * (Pointer *) p;
    return (p);
  } else
    error ("pop_memory_pointer: 'memory_pointer_list' is empty");
}

/* push_temp_pointer -- push pointer to block on 'memory_pointer_list' */
void push_temp_pointer (Pointer p)  {
  * (Pointer *) p = temp_pointer_list;
  temp_pointer_list = p;
}

/* pop_temp_pointer -- pop pointer to block from 'temp_pointer_list' */
Pointer pop_temp_pointer (void)  {
  Pointer p;
  p = temp_pointer_list;
  if (p != NULL) {
    temp_pointer_list = * (Pointer *) p;
    return (p);
  } else
    error ("pop_temp_pointer: 'temp_pointer_list' is empty");
}

/* collect_garbage -- 'safe_free' all malloc'ed data */
void collect_garbage (void)  {
  Pointer p, pp;
  if(memory_pointer_list == NULL)
    error ("collect_garbage: memory_pointer_list empty'");
  else {
    temp_pointer_list = NULL;
    while (memory_pointer_list != NULL) {
      p = pop_memory_pointer();
      pp = (char *) p + sizeof (Pointer);
      safe_free (pp);
    }
    while(temp_pointer_list != NULL)
      push_memory_pointer(pop_temp_pointer());
    /* fill marked_block stack  */
  }
}

/* "C"'free' with first byte of block set to zero */
void safe_free (void *p)  {
  if(type((char *) p) <= 7) {
    * (char *) p = (char) 0;
    /* free block, including header, for link in memory_pointer_list */
    free ((char *) p - sizeof (Pointer));
  } else /* maybe store data temporarily on 'temp_pointer_list'  */
    push_temp_pointer((char *) p - sizeof (Pointer));
}

/* safe_malloc -- Unix 'malloc' wrapped inside test for sufficient memory */

Pointer safe_malloc (size_t size)  {
  Pointer memory;
  static long num_calls = 0;
  /* allocate block, including header for link in 'memory_pointer_list' */
  memory = malloc (size + sizeof (Pointer));
  num_calls++;
  /*   total_space += size; */
  if (memory != NULL) {
    push_memory_pointer (memory);
    /* return beginning of user data block */
    return ((char *) memory + sizeof (Pointer));
  } else
    error ("safe_malloc: out of memory"     
           " (number malloc calls = %ld) \n ",  num_calls);
}

/* mark_object -- recursively marks object "type" negative to save object
   iff object is either "unmarked" or, if "marked", object has
   not been changed by 'put_prop' or 'remprop' functions. 
*/
void mark_object (Object obj) {
  if (obj == NULL || (type(obj) > 7 && (type(obj) & '\040') == 0))
    return; /* 'obj' marked, but NOT changed => return */
  else {
    type(obj) = ntype(obj);
    mark2_object(obj);
    type(obj) = '\100' | ntype(obj); /* remove "changed = 040" tag */
  }
}

/* mark2_object -- recursively marks the object "type" negative */
void mark2_object (Object obj)  {
  if (obj == NULL) return;
  else
    switch (ntype(obj)) {
    case SYMBOL:
           if(type(obj) > 7 && (type(obj) & '\040') == 0)
             return;
           else {
             type(obj) = '\100' | ntype(obj);
             if(get_prop(obj, "pn") == NULL)
               symbol_plist(obj) = first_put(list(make_string("pn"),
                                       make_string(symbol(obj)->print_name),
                                       T_EOF), symbol_plist(obj));
             mark2_object(symbol_plist(obj));
             mark2_object(symbol(obj)->value);
           }
        break;
    case STRING:
    case INTEGER:
    case FUNCTION:
        break;
    case PAIR:
         type(obj) = type(obj) | '\100'; /* mark type negative */
         mark2_object(first(obj));
         mark2_object (but_first(obj));
         break;
    default:
         error ("\nmark2_object: not standard object: %d", type(obj));
         break;
    }
    type(obj) = type(obj) | '\100'; /* mark type negative */
}

/* unmark_object -- recursively marks Object-type positve to free Object */
void unmark_object (Object obj)  {
  if (obj == NULL || type(obj) <= 7)
    return;
  else
    switch (ntype(obj)) {
    case SYMBOL:
         if(type(obj) == ntype(obj)) return;
         else {
           type(obj) = ntype(obj);
           unmark_object(symbol_plist(obj));
           unmark_object(symbol(obj)->value);
           symbol(obj)->print_name = string(get_prop(obj, "pn"));
         }
         break;
    case STRING:
    case INTEGER:
    case FUNCTION:
         break;
    case PAIR:
         type(obj) = ntype(obj); /* remove protect bit */
         unmark_object (first(obj));
         unmark_object (but_first(obj));
         break;
    default:
         error("unmark_object: not standard object");
         break;
    }
    type(obj) = ntype(obj); /* remove protect bit */
}


Copyright © 1992, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.