Polymorphic C

PCC, Greg's Polymorphic C interpreter, combines the benefits of incremental compilation with a mainstream language, making it particularly useful in developing and debugging C routines used in Windows applications.


August 01, 1994
URL:http://www.drdobbs.com/cpp/polymorphic-c/184409288

Figure 1


Copyright © 1994, Dr. Dobb's Journal

Figure 1


Copyright © 1994, Dr. Dobb's Journal

AUG94: Polymorphic C

Polymorphic C

An extended C interpreter

Greg Voss

Greg is a C++ and Smalltalk consultant and the "OOP Alley" columnist for Windows Tech Journal. He can be contacted at [email protected].


Failure is crucial to learning and productivity. The faster you find out what is wrong, the faster you can make it right. If I can fail six times a minute, I'll acquire knowledge six times faster than a programmer who fails only once every minute. To facilitate this sort of failure, I wrote an interpreter that lets me combine the benefits of incremental compilation with those of using a mainstream language.

My Polymorphic C interpreter (PCC) has proven useful in developing and debugging C routines used in Windows applications. The environment comes close to approximating the feel of developing programs in Lisp and Smalltalk. Once subroutines are debugged and stable, they can be passed off to standard commercial C compilers and incorporated into DLLs, which in turn can be called from PCC statements and functions still undergoing development.

PCC does not interpret C; it interprets a variant I call "Polymorphic C." Polymorphic C adds to C features traditionally associated with Smalltalk, Actor, and Lisp: flexible (polymorphic) typing, incremental compilation, and direct, interactive access to the symbol table and to all objects in the run-time image of the program being developed. The program is designed with a direct-manipulation GUI interface that lets you select individual expressions, statements, or blocks for immediate compilation and execution.

The Implementation Language

One of the first steps in developing PCC was choosing the implementation language. Both Smalltalk and Lisp offer advantages over C or C++: In addition to incremental compilation, they have built-in support for processing lists and ordered collections, which can dramatically change the nature of development. Lisp's strong support for macros and symbolic processing makes it a better choice for language processing than Smalltalk. In contrast, Smalltalk has better support for complex windowing interfaces. Lisp's run-time type checking virtually eliminates program and system crashes associated with typical Windows development tools. Ultimately, I chose Franz's Allegro Common Lisp, which enabled me to develop PCC in about six months, spending an average of ten to fifteen hours a week.

The grammar for Polymorphic C is a nearly complete implementation of the ANSI-compatible C grammar specified in the appendix of Kernighan and Ritchie's The C Programming Language, second edition. Preprocessor directives, such as #define, #include, and #if, are not supported. It is a relatively trivial exercise to add preprocessor directives to Polymorphic C, but these directives do not play the same central role in incremental compilation that they play in batch processing of C source files. All grammar productions other than preprocessor directives are recognized.

Polymorphic C provides no library, instead allowing direct, interactive calls to any Windows DLL, including the entire Windows API. In addition, Polymorphic C allows subroutines to be implemented in either C or Lisp, thereby giving C programmers access to all Common Lisp, CLOS (Common Lisp Object System), and Common Graphics functions and objects. C's standard-library function, printf, for example, is much easier to implement in Lisp than in C. A complete implementation of printf is provided with Polymorphic C as an example of a nontrivial, C subroutine; see Listing Three, page 95. Compilation of subroutines of up to several hundred lines of C source takes one or two seconds. With small subroutines, compilation is instantaneous. Run-time type checking adds a degree of safety not currently available in popular commercial compilers for C.

C Interpreters

There have been incremental compilers and interpreters for C on the PC platform ever since its inception, from Tiny C, cTerp, and Instant-C, to Al Stevens's latest endeavor, Quincy. Some of the early DOS interpreters suffered from the awkward disadvantage of shuffling code back and forth between an incremental development tool and a tool for producing the final application. The awkwardness of this shuffling between development code and production code had much more to do with the speed limitations of the 8086 and the 640K memory limitations of DOS than with the basic idea of incremental compilation.

As Windows has grown in popularity, Windows programs have increased in size and complexity so that, even with the fastest compilers, you can rarely run through a minor edit-compile-link cycle in less than 30 seconds. A minute is more typical, and several minutes is not unusual. With the large amount of memory and the fast processor speed of most machines used with Windows, incremental compilation and run-time development images make more sense than ever. More than 640K of memory allows the interpreter to cache more objects in memory and dramatically reduce the amount of processing necessary to modify a program image.

Using Polymorphic C

One difference between standard C programming and programming with incremental C compilers is the granularity of the compilation unit or module. What is the minimum program quantum that can effectively be processed by the compiler, linker, and loader? In standard C, the file is the minimum quantum for the compiler. But linking and loading are still separate, time-consuming steps. In Smalltalk and Lisp, the expression is the minimum processing quantum. Loading of the compiled expression is instantaneous, and linkage occurs dynamically while the expression is being evaluated. In Polymorphic C, the statement is the minimum processing quantum for the compiler/linker/ loader. For this reason, the main interface for program development in Polymorphic C is the C Statement Editor (see Figure 1), the primary interface used in developing C statements and functions. The C Statement Editor assumes you are compiling, running, or translating a single C statement. This is not a limitation of the PCC compiler, but rather an intentional restriction designed to help catch errors such as missing semicolons, commas, and curly braces. The compiler can be smarter if it can make assumptions about the type of construct it is parsing.

While it may be adequate to move back and forth between compilation and execution of several related statements, compound statement blocks provide the most convenient way of executing several statements at once. For convenience, variables need not be declared as long as they are assigned a value before they are used in expressions. Declaring local variables, however, does help keep the symbol table clean. Local variables are added to the symbol table on block entry and removed from the symbol table on block exit.

A typical scenario for usage is one in which you build statements incrementally, making liberal use of the polymorphic echo() function to output the values of variables and check for unexpected behavior. For more control over the format of output, you call printf() instead of echo().

Program Architecture

From a structural point of view, it's best to think of PCC as four interconnected objects: a parser, a scanner, a symbol table, and a reader (input system). There are other, secondary objects, but the main action takes place between these four primary objects. Table 1 shows the primary objects and their roles in PCC.

Implicitly, there is another major component of PCC: the run-time interpreter. Actually the interpreter is Allegro's Lisp interpreter, so no special global object called interpreter actually exists. However, to properly understand the support mechanism of the run-time environment for programs generated by PCC (or any C compiler), it's important to think of the interpreter as an entity in itself. For typical C compilers, the run-time interpreter is the CPU, which interprets the meaning of sequences of numbers assumed to be machine instructions. Run-time languages can also be defined at a higher level of abstraction; for example, the various byte-code sequences used by Basic, Pascal, and Smalltalk interpreters, and the p-code used by Microsoft C. Such higher-level program representations require support code above and beyond the CPU to implement language behavior at run time.

PCC has a global object, *c-environment*, which maintains state data relevant to the most recently compiled program or statement. If a special run-time interpreter were required by PCC, it would be placed inside *c-environment*, which should be thought of as the image in memory that would be set up by a program loader when a C program is invoked from the command line. The environment sets up the code (text) and data segments (both initialized and uninitialized) as well as the stack for automatic variables and the heap used by the translated program.

As a convention, global variables in Lisp are preceded and followed by an asterisk. Thus, the four primary objects in PCC have the names *c-char-reader*, *c-scanner*, *c-parser*, and *symbol-table*.

In PCC, the parser takes legal C token sequences and turns them into Lisp code. The actual objects generated by the parser are lists--Lisp symbolic expressions which can be evaluated by the Lisp interpreter. At one point, I considered generating code for a stack machine--very much like Forth--but decided that there were advantages to using Lisp itself as the target language. The primary advantages include the outstanding debugging support provided by Lisp browsers and inspectors, steppers, and tracing mechanisms.

Listing One shows the internal instance variables of the four primary objects in PCC. Careful study of the class definitions in Listing One shows the natural hierarchical relationship between a parser, scanner, and reader. The parser has a pointer (reference) to the scanner, which in turn has a pointer to the character reader. The parser can ask the scanner for the next input token without knowing anything about the reader. Similarly, the scanner asks the reader for input characters to build up tokens.

While it is possible to access the scanner and the reader through the global variables *c-scanner* and *c-char-reader*, you don't need to do this--except while testing methods for the scanner and reader. Thus, the primary interface to the compiler is through *c-parser*, which is mediated through the local menu of the C Statement Editor. However, it can also be accessed directly through Lisp symbolic expressions. It's convenient to have *symbol-table* as a global so that you can dump and inspect the contents of the table after a statement or program is compiled or when program execution is suspended at a break point while debugging.

PCC Objects and Methods

The PCC reader has 7 methods, the scanner has 27, and the symbol table has 8. About 60 methods in the parser class are central to compilation. Of these, 50 methods are direct implementations of the productions for the ANSI-C grammar specified in Kernighan and Ritchie. The remaining 10 parser methods provide support for managing input strings, retrieving tokens and line numbers, matching tokens against input, and reporting errors. Error recovery is enabled by exception handling (catch and throw) to unwind the interpreter's stack when errors are encountered in the middle of a parse.

The most revealing function in PCC is the test-xlate-statement in Listing Two . This is a function rather than a method of c-parser-class, so it can be invoked even if instances of c-parser-class, c-scanner, and c-input-class have not yet been created. The only required argument to this function is the identifier string; this argument contains the text source for a single C statement, which may be a compound statement and therefore arbitrarily complex.

The code for test-xlate-statement shows all the major types of transactions you are likely to ask the parser to carry out if you are not working through a GUI interface like the C Statement Editor. In fact, the C Statement Editor calls test-xlate-statement directly whenever you issue the compile command from the local menu. The C Statement Editor passes whatever text is highlighted as the string argument to test-xlate-statement.

First, the run-time environment and error handlers are initialized. Next, the echo run-time library subroutine is installed in *symbol-table*. This could be done once globally rather than each time a statement is translated, but leaving it here shows the interface for installing functions in the symbol table and also reveals the simplicity of the definition for echo.

Now for the heart of the compiler: An instance of c-parser-class called *c-parser* is created by the Lisp expression: (setf *c-parser* (make-instance 'c-parser-class)).

A lot goes on behind the scenes here. The constructor for c-parser-class actually creates an instance of, and a link to, a c-scanner-class object. The scanner in turn creates a character-reader object and initializes a pointer to it. This design lets the user focus all interaction on *c-parser*. Setting the input string for translation is handled in the expression: (source-string *c-parser* string).

Normally the string is retrieved from the text highlighted in the C Statement Editor window. This expression sends the message source-string to the object *c-parser* with the argument string. Compared to Smalltalk and C++, Lisp inverts the order of the object/message syntax: In Lisp the message comes first. For a variety of reasons, CLOS makes message sends use syntax identical to that used for function invocation. The first argument is the object receiving the message.

To get the translation started, the token pointer must be advanced to point to the first input token. Then the statement translator is invoked. If translation is successful, a Lisp expression is left on top of the parse stack. To run the resulting program, the Lisp expression can be evaluated as follows: (eval (car (stack *c-parser*))).

The function car returns the first element of its list argument--in this case, the parse stack, implemented as a list. At any time, you can print the Lisp expression in Allegro's TopLoop window, which is similar to Smalltalk's Transcript window; for example, (pprint (car (stack *c-parser*))).

The translate menu command in the C Statement Editor invokes a similar expression to show the Lisp translation of a C program in a newly opened edit window.

When interacting directly with the C Statement Editor, the aforementioned mechanisms are entirely hidden. You have a sense of compiling and running programs directly by highlighting text and issuing applicable menu commands. Output normally directed to C's stdout device appears in Allegro Lisp's TopLoop window. Windows applications would probably use dialog boxes for input and output. Standard dialogs are readily accessible through Allegro's Common Graphics package or by calls to the Windows API.

Conclusion

I have found PCC to be a practical alternative to popular, commercial C development environments, although PCC is best used as an enhancement to commercial compilers rather than a replacement. Hopefully PCC will provide insight into the use of incremental compilation in off-the-shelf tools.

Figure 1 The C Statement Editor window is the primary interface for writing code.

Table 1: PCC primary components.

Component    Description

reader       Remembers input string for C source.
             Keeps track of current position in input stream.
             Remembers current character.
             Keeps track of current line number (for errors and 
               warnings reported by scanner or parser).
             Maintains pushback stack for characters.

scanner      Converts input to token stream.
             Allows putback of one token.
             Remembers current token and last-read token.
             Strips comments from input.
             Installs identifiers in symbol table.

parser       Recognizes and validates legal token sequences 
               (sentences). 
             Translates token stream into expressions 
               in target language (Lisp). 
             Parses type declarations and generates type 
               structures. 
             Recognizes and reports syntax and semantic errors 
               (some semantic errors are recognized and reported 
               by the run-time interpreter).

symbol       Stores values and types associated with identifiers 
table          (strings). Implemented as hash table for fast lookup.
             Identifiers used as lookup key.
             Stores values for constants, variables, and code for 
               functions.
             Scoping: Allows shadowing of variables redefined in 
               local blocks.

Listing One


;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; CLASS c-input-class
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defclass c-input-class ()
  ((source-stream     :accessor source-stream
                      :initform nil
                      :initarg :source-stream)
   (line              :accessor line :initform nil)
   (line-size         :accessor line-size :initform 0)
   (line-number       :initform 0 :accessor line-number)
   (index             :accessor index :initform 0)
   (putback-stack     :accessor putback-stack :initform nil)))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; CLASS c-scanner-class
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defclass c-scanner-class ()
  ((token           :accessor token :initform nil)
   (text            :accessor text :initform nil)
   (value           :accessor value :initform nil)
   (current-char    :accessor current-char :initform nil)
   (previous-token  :accessor previous-token :initform nil)
   (previous-text   :accessor previous-text :initform nil)
   (previous-value  :accessor previous-value :initform nil)

   (char-reader     :accessor char-reader :initform nil)
   (lexeme-buffer   :accessor lexeme-buffer
                    :initform (make-array '(1024)
                               :element-type 'string-char
                               :initial-element (character 0)
                               :fill-pointer 0))))
 
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; CLASS c-parser-class
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defclass c-parser-class ()
    ((scanner       :accessor scanner       ; link to scanner
                    :initform nil)
     (look-ahead    :accessor look-ahead    ; look-ahead symbol
                    :initform -1)
     (stack         :accessor stack         ; parse stack
                    :initform nil)))
 
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; CLASS symbol-table
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defclass symbol-table ()
    ((table         :accessor table
                    :initform (make-hash-table
                                 :size 1000
                                 :test #'equal))))

Listing Two


;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; Function acting as primary interface between
;;; C Statement Editor window and *c-parser*.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defun test-xlate-statement (string         ; source text
                           &optional      ; optional args follow
                           (print nil)    ; print translation if t
                           (interpret t)) ; run program if t
                                          ;
 (ct-block-reset *c-environment*)         ; Reset all compile-time
                                          ; block data.
                                          ; 
 (catch 'exit-parser             ; Exception handler: Throw
                                          ; expressions in parser 
                                          ; methods are caught here.
    ;; ==================================================
    ;; Define 'echo' function. (to be installed as a 
    ;; function in *symbol-table*). Simulates a compiled C function 
    ;; which echos its argument to the TopLoop (Transcript) window. 
    ;; Argument can be any data type. (i.e. argument is 
    ;; polymorphic--hence name 'Polymorphic C').
    ;; ==================================================

    (symbol-install *symbol-table* "echo" 'BUILT-IN
       #'(lambda (&rest x)
            (dolist (e x) (format t "~a " e))
            (format t "~%") x))
    ;; ==================================================
    ;; Make new instance of parser and set source
    ;; string to be translated.
    ;; ==================================================
    (setf *c-parser* (make-instance 'c-parser-class))
    (source-string *c-parser* string)

    ;; ==================================================
    ;; PARSE and TRANSLATE. Creates Lisp expression as top element
    ;; of parse stack (stack *c-parser*).
    ;; ==================================================
    (format t "~%COMPILING~%")
    (advance-token *c-parser*)
    (xlate-statement *c-parser*)
    (format t "~%COMPILE DONE~%")

  ;; ==================================================
  ;; Wrap translation in 'return-label' block 
  ;; so C return statements will work in compound statements--even 
  ;; when enclosing function isn't defined.
  ;; ==================================================
    (let ((s (pop (stack *c-parser*))))
       (push `(block return-label ,s) (stack *c-parser*)))

    ;; ==================================================
    ;; PRINT (optional). Expressions can get long, so printing is 
    ;; optional.
    ;; Pass T (true) as second argument to test-xlate-statement 
    ;; if you want to see the translation.

    ;; ==================================================
    (when print
       (terpri)
       (format t "~%LISP TRANSLATION OF=====================>~%  ~a" string)
       (format t "~%<========================================")
       (pprint (car (stack *c-parser*))))

    ;; ==================================================
    ;; INTERPRET (run--also optional). Evaluate translated 
    ;; Lisp expression (form) on top of parse stack.
    ;; ==================================================
    (when interpret
       (terpri)
       (format t "~%INTERPRETING STATEMENT==================>~%~a" string)
       (format t "~%<========================================")
       (terpri)
       (eval (car (stack *c-parser*))))
    (format t "~%OK~%> ") t))


Listing Three


;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 
;;; printf --- the behavior of C's printf is simulated 
;;; by converting calls to
;;;            printf into Lisp expressions which are evaluated at run time.
;;; The basic algorithm is:
;;;    while more chars in format-string
;;;      get next char
;;;      if % process-conversion
;;;      else if \ process escape character
;;;      else process normal character
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; 
(defun printf (format-string &rest args)
   (do* ((in-stream (make-string-input-stream format-string))
         (out-stream (make-string-output-stream))
         (c (read-char in-stream nil 'eof) (read-char in-stream nil 'eof))
         (out-args) (justify) (prefix) (alt-output)
         (width nil nil) (precision nil nil) (conversion-char))
        ((eq c 'eof)                                   ; terminate loop
         (let* ((f-string (get-output-stream-string out-stream))
                (return-expr
                 (append `(format t ,f-string ) (reverse out-args))))
            (eval return-expr)))
      (macrolet

         ((next-char ()
             '( progn (setf c (read-char in-stream nil 'eof))
                 (if (eq c 'eof)
                    (error "printf: eof in print format string"))))
          (get-number-for (attribute)
             `(progn
                 (when (char= c #\*)
                    (setf ,attribute '*)
                    (push (pop args) out-args)
                    (next-char)
                    (format t " * for ~a specifier " ',attribute))
                 (when (digit-char-p c)
                    (do ((stream (make-string-output-stream)))
                        ((not (digit-char-p c))
                         (setf ,attribute (get-output-stream-string stream)))
                       (write-char c stream)
                       (next-char))))))
         (case c
            (#\% ;;(princ 'print-directive)
             (next-char)
             (when (char= c #\-) (setf justify 'left) (next-char))
             (when (member c '(#\+ #\space)) (setf prefix c)(next-char))
             (when (char= c #\#) (setf alt-output t) (next-char))
             ;; field width
             (when (or (char= c #\*) (digit-char-p c))
                (get-number-for width))
             ;; precision
             (when (char= c #\.)
                (next-char)
                (get-number-for precision))
             (case c
                (#\c ;;(princ 'character)
                 (write-string "~c" out-stream)
                 (push `(character ,(pop args)) out-args))
                (#\s ;;(princ 'string)
                 (write-char #\~ out-stream)
                 (when width
                    (if (eq width '*)
                       (write-char #\v out-stream)
                       (write-string width out-stream)))
                 ;; parameters done, now modifiers
                 (unless (eq justify 'left)
                    (write-string ":@" out-stream))
                 (write-char #\a out-stream)
                 (push (pop args) out-args))
                (#\d ;;(princ 'integer)
                 (write-char #\~ out-stream)
                 (when width
                    (if (eq width '*)
                       (write-char #\v out-stream)
                       (write-string width out-stream)))
                 (write-char #\d out-stream)
                 (push (pop args) out-args))
                (#\f ;;(princ 'float)

                 (write-char #\~ out-stream)
                 (if width
                    (if (eq width '*)
                       (write-char #\v out-stream)
                       (write-string width out-stream)))
                 (write-char #\, out-stream)
                 ;; if precision not specified, set default precision to 6
                 (when (null precision) (setf precision "6"))
                 (cond ((eq precision '*)
                        (write-char #\v out-stream)
                        (write-char #\f out-stream))
                       ((zerop (parse-integer precision))
                        ;; Precision specified as "0"
                        ;; To avoid printing decimal point
                        ;; used d (decimal) format instead of f.
                        ;; This is what C's printf does.  See
                        ;; K&R 2nd Ed. pp. 12-13.  You must also
                        ;; explicitly round the argument in Lisp
                        ;; or it will print as a float
                        (write-char #\d out-stream)
                        (let ((arg (pop args)))
                           (push `(round ,arg) out-args)))
                       (t (write-string precision out-stream)
                          (write-char #\f out-stream)
                          (push (pop args) out-args))))
                ((#\x #\X)
                 (write-char #\~ out-stream)
                 (write-char #\x out-stream)
                 (push (pop args) out-args))
                (#\% (princ 'percent)
                 (write-char #\% out-stream))))
            (#\\                                ; escape character
             (next-char)
             (case c
                (#\t (princ 'tab)
                 (write-string "~,8@T" out-stream))
                (#\r (princ 'carriage-return))
                (#\n (princ 'integer) (write-string "~%" out-stream))))
            (t                                  ; normal character
               (write-char c out-stream))))))
;;;
;;; Install printf function in *symbol-table*
;;;
(symbol-install *symbol-table* "printf" 'BUILT-IN
         (function printf))


Copyright © 1994, Dr. Dobb's Journal

Terms of Service | Privacy Statement | Copyright © 2024 UBM Tech, All rights reserved.