Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

C/C++

C Programming


JUL94: C PROGRAMMING

The Quincy Preprocessor (Continued)

I should be grouchy writing this column. It's overdue as usual, the lawn needs mowing, the old car needs a new idler arm, I need a haircut, the deadline for my new book looms near, papers for the Borland conference are due next week, and federal tax returns must be filed by Friday. When things start to pile up like that, I react swiftly and with purpose. I take the day off.

Yesterday Judy and I drove over to Lakeland to the annual Sun-and-Fun fly-in. Her cousin-in-law, Mike Horvath, is an owner of Kolb Aircraft, a company that manufactures ultralight airplanes, and they were displaying their wares at the show. As a former pilot of real airplanes, I have always viewed those noisy little toys with the same distant regard usually reserved for venomous snakes, earthquakes, pitbulls, and women scorned.

Former pilot, indeed. Some time ago the FAA in their infinite wisdom decided that due to a malfunctioning pancreas I was no longer fit to pilot airplanes. That ruling being nonnegotiable and my condition being irreversible, I set aside my doubts to take another look at ultralights. You don't need to be a licensed pilot to fly one. And with good reason, I always thought. No sane pilot would want to. When they first came out, they were little more than tube and nylon hang gliders with patio chairs and lawn-mower engines strapped on. They've come a long way since then, and if I ever get all of these other things taken care of, I might just slap one together and take it out for some bumps and circuits.

I looked at Mike's product and a bunch of others. If I didn't say here and now that the Kolb FireStar was the clear, standout best on the field, I'd have trouble at home. Fortunately, it turns out to be true. One company was selling fan-driven parachute machines similar to the one that landed in the middle of a heavyweight title fight and later on the roof of Buckingham Palace. The birdman who flew it was naked and painted green when the Bobbies took him into custody. See, that's what I mean. You have to have a few bulbs burned out to get into some of these hobbies. I can't believe that I'm seriously considering it. The salesman didn't want to talk about the naked, green fan-man. He said it gave his product a bad name. As part of his sales approach he asked about my profession. Then all he wanted to talk about was his PC and whether Visual C++ would have templates soon. It's everywhere.

Preprocessing

This month continues the Quincy C-interpreter project. Last month, I began describing Quincy's preprocessor. Since writing that column, I've made a number of changes to the preprocessor. I'm building Quincy at the same time that I'm writing the introductory C book that Quincy supports. Developing the book's exercises reveals Quincy's bugs and shortcomings. For example, last month I said that Quincy tends to reflect the subset of the language that I use, and among the things that I rarely use are the # and ## preprocessing macro tokens. To add Standard C's assert.h functions, which I use a lot, I found that I needed the # "stringizing" token. Since I was adding that, adding the ## argument-pasting token was not much more trouble, so I did. Perhaps I'll find a use for it later. As I predicted last month, adding those tokens meant that I had to change the internal representation of argument-replacement tokens in macros. They are now 8-bit integers with the most significant bit set instead of #<digit> pairs, as described last month.

Last month's column deferred the parts of the preprocessor that resolve #define macros and evaluate #if expressions. We continue that discussion now.

Resolving Macros

When the preprocessor scan in preproc.c (from last month) finds an identifier, it calls the ResolveMacro function preexpr.c (Listing One), which tests to see if the identifier has been redefined in a macro. If not, the function returns the identifier itself. If so, the function returns the string that represents the resolution of the macro. Macros can invoke other macros, so the ResolveMacro process is recursive.

ResolveMacro calls FindMacro to see if the identifier has been redefined. If not, the function returns the original identifier copied into the space specified by the caller. If the identifier has been redefined, the function must resolve the macro. Some macros are simple identifier substitutions in that they have no parameters and therefore, accept no arguments; #define ESC 27 for example.

In the case of such substitutions, the function makes the substitution and returns to the top to test the substituted value to see if it, too, invokes a macro substitution.

When a macro has parameters, ResolveMacro calls CompileMacro to substitute the macro definition replacing the parameters in the macro definition with the macro call's arguments. Since that substitution can result in a completely different code sequence, ResolveMacro scans the result and calls itself to resolve any identifiers in the new substituted string.

The CompileMacro function expands a macro call, including argument substitution for parameters. First, it builds an array of pointers to the macro call's arguments. Then it scans the macro definition, copying ASCII characters to the result. When it hits a parameter token, which is an 8-bit integer with the most significant bit set, the function copies the corresponding argument from the array to the result. The tokens are integer values that represent the parameter number. The most significant bit identifies the integer as a parameter token rather than an ASCII value to be copied.

If the macro definition includes the # character outside of a literal, the substitution surrounds the argument that immediately follows with double quotes, turning it into a string. A ## pair causes the immediately preceding and following arguments to be concatenated.

Expression Evaluation

The #if and #elif preprocessing directives test a constant expression and include or exclude the lines of code that follow it, depending on whether the expression evaluates to a true or false value. Evaluating expressions uses a recursive-descent expression analyzer that starts with a call to MacroExpression.

A recursive-descent parser calls the function that processes operators with the lowest precedence first. That function starts by calling the function for operators with the next-higher precedence. Then it loops, looking for the operators that it manages, processing them when it finds them, and breaking out of the loop when it does not. If the operator is binary, the function saves the first operand and again calls the next-higher precedence function to get the second operand. Then it computes its result by applying the two operands and the operator. Each of the operator functions returns the value that it evaluates.

The highest-precedence function extracts a numerical value to contribute to the expression; this function is at the bottom of the recursive descent. First, it looks to see if there is a left parenthesis. If so, it calls the top of the descent to evaluate the expression in parentheses, making sure that a right parenthesis terminates the expression. Then it returns the result of the evaluation.

If there is no left parenthesis, the highest-precedence function checks for unary operators. If it finds one, the function calls itself to get an expression value against which to apply the operator.

The highest-precedence function parses out numerical constants and returns their value. It uses the interpreter's tokenize function to convert the C constant into a number.

If the highest-precedence function finds an identifier, the function calls ResolveMacro to convert the identifier to a value that the function can use in a call to itself to get a number. If there is no macro, the identifier is translated into a 0 value.

Piles of Books

My desk is piled high with reference books used in research for the C book I'm writing and in verifying C-language features as I build Quincy. I'll discuss some of those books here.

The ANSI-standard document is at the top of the pile. Its official name is The American National Standard for Programming Languages-C. My copy is the December 1988 draft, which is close to, but not, the final copy. They added some sections at the beginning, so the paragraph numbers are off, but otherwise the draft seems to be complete. The ANSI-draft document package includes a rationale document that "summarizes the deliberations of X3J11."

Every C programmer should have a copy of the ANSI document. You can order the official version by calling 212-642-4900 and pressing 1 twice. Ask for ANSI/ISO 9899-1990. The cost is $130.00, plus shipping and handling. Rather than pay that much, you could buy a cheaper book that has it all. The Annotated ANSI C Standard (Osborne/McGraw-Hill), reproduces the complete ANSI specification, with annotations by Herb Shildt. The book was released in 1993, but the copyright notice cites only the 1990 copyright date of the original ISO/ANSI standard document. The purpose of the book is to translate the standard document's terse technical specification into a more readable form. A side benefit is that it publishes the complete ANSI C standard at a substantially lower cost ($39.95 list) than ANSI's. The book devotes two facing pages with identical page numbers to each page from the standard document--the standard-document page on the left and Herb's comments on the right. The standard pages were printed from plates provided to the publisher by ANSI, so you can be sure that you are getting the real McCoy. Herb's comments are relevant and to the point, making some of the more arcane parts of the document understandable.

One time, I turned to this book when I doubted what I read in the draft. The fprintf formatting string includes a %n token that I have never used, and the ANSI language that discusses it was less than clear. At first glance it looked like a typo. Here was a situation where Herb's comments might shed light, or, if not, maybe the final document would be clearer than the draft. Wouldn't you know it? That was a page that the publisher inadvertently left out. I called Herb, and he said that subsequent printings corrected the error. I have the third printing. When you get the book, if pages 131 and 132 are the same, ask for a more recent printing. I was able to clear up my question about fprintf on my own. A more careful reading of the terse language in the ANSI draft proved that it was correct.

Next in my pile is P.J. Plauger's The Standard C Library (Prentice-Hall, 1992), a book that I discussed in an earlier column. Quincy implements a subset of the standard C-library functions, mostly by passing calls to them through to the Borland C++ run-time library. Sometimes, however, as with stdarg.h macros and setjmp.h functions, that is not possible. Other times, as with assert.h macros and functions, the compiler's implementation might work, but not well enough. An assert call can abort the interpreted program, but it shouldn't abort Quincy, too. Plauger's book explains how all of these libraries are implemented.

Both editions of The C Programming Language (1978 and 1988), by Brian Kernighan and Dennis Ritchie are prominent in my pile: the first, to provide a historical reference of how C used to be; the second, for Ritchie's occasional reflections on the effects that an international committee had on his creation. For example, "The rules for nested uses of ## are arcane_." I'll say.

Then there are the books about compilers and interpreters. Although Quincy's K&R interpreter was mostly written when I started this project, adding the ANSI constructs and improving its performance required considerable modifications to the interpreter code. Three books that have helped me better understand the architecture of language translation are: Principles of Compiler Design, by Alfred Aho and Jeffrey Ullman (Addison-Wesley, 1977), the well-known "dragon book;" Writing Compilers & Interpreters, by Ronald Mak (John Wiley & Sons, 1991); and Writing Interactive Compilers and Interpreters, by P.J. Brown (John Wiley & Sons, 1979). The Brown book is my favorite. It doesn't take itself too seriously and explains things in ways that we poor, humble, country programmers can understand.

Variable Arguments: stdarg.h

Having mentioned stdarg.h, I'll use this column to discuss what it does and how Quincy implements it. It's one of those oddities that sets C apart from other languages and complicates the business of writing a language translator. The ability for a C function to have a variable number of arguments of variable types is an enigma of sorts. On one hand, the prototype mechanism that ANSI borrowed from C++ wants all of a function's arguments to be defined at compile time. That way, the compiler can validate the function definition and all calls of the function. On the other hand, functions such as printf need to have variable arguments whose number and types can be determined at run time, based on format specifiers in a fixed argument whose type is declared. The prototype for printf is shown in Example 1(a).

The ellipse token (_) tells the compiler not to check the types or presence of any of the arguments past the first one, and that's all that a C compiler has to do to support variable arguments. There may be any number of fixed arguments before the ellipse.

Writing a function that accepts variable arguments involves using some macros defined in stdarg.h. At first glance, it might seem that implementing a variable argument list would involve some trickery in the compiler, but, in fact, the whole thing is done with macros. The real trick is in knowing how arguments get passed. Example 1(b) is the Quincy implementation of stdarg.h. It works very much like the one found in most compilers that pass adjacent arguments on a stack in increasing address order.

To understand how the macros work, you need to look at how you use them. Example 1(c) is a simple function with variable arguments. It assumes that the first argument is an int containing the number of arguments that follow, and those following arguments are pairs of one int and one char* each. It doesn't have to be that rigid. The printf function infers the types of arguments from the formatting tokens in the first argument. That's why you see such strange stuff when your formatting string and your arguments don't match.

Here's how the macros work. The va_list variable becomes a reference point for the scan of the variable argument list. The va_start macro call uses that variable and the function's last fixed argument as its own arguments. The va_list variable is a pointer to a char type by virtue of the typedef in stdarg.h. The va_start macro assigns to the variable the address of whatever follows its second argument, which should be the last fixed argument to the function. This technique assumes that arguments are passed on the stack, that they are adjacent, and that the first listed argument has a lower address than the second, the second, a lower address than the third, and so on. The va_start macro takes the address of its second argument, adds to that the size of the second argument, casts the resulting address to type va_list, and assigns the address to the va_list variable named by its first argument.

After this, it is the responsibility of the using function to know what types are in the argument list. The va_arg macro takes the address of its va_list first argument, casts it to an address of the type specified by its second argument, dereferences the object pointed to by that address, and increments the va_list variable by the type size named as its second argument. By using the va_arg macro in an expression, you can dereference the argument itself where the macro call occurs.

With most macro definitions, the macro calls resemble function calls with valid C expressions as arguments. The va_arg macro is an exception. Its second argument has to be a type, which is why va_arg must be a macro and cannot be implemented as a function. The macro expansion inserts the second argument into a cast and two sizeof expressions. The va_start macro cannot be a function, either. It needs to know the size of its second argument, which can be of any valid type, and a C function cannot work that way.

In Quincy, the va_end macro is what assembly-language programmers call a NOP (pronounced "no op," also used to describe nonproductive programmers). It does nothing. Some C implementations use it to clean up the variable-argument environment.

"C Programming" Column Source Code

Quincy, D-Flat, and D-Flat++ are available to download from the DDJ Forum on CompuServe, DDJ Online, and the Internet by anonymous ftp; see "Availability," page 3. If you cannot get to one of the online sources, send a diskette and a stamped, addressed mailer to me at Dr. Dobb's Journal, 411 Borel, San Mateo, CA 94402. I'll send you a copy of the source code. It's free, but if you want to support my Careware charity, include a dollar for the Brevard County Food Bank.

Example 1: (a) Prototype for printf; (b) Quincy implementation of stdarg.h; (c) a function with variable arguments.

(a)
      int printf(const char *fmt, ...);
(b)
typedef char *va_list;
#define va_start(ap,pn) (void)((ap)=(va_list)&(pn)+sizeof(pn))
#define va_arg(ap,ty) (*(ty*)(((ap)+=sizeof(ty))-sizeof(ty)))
#define va_end(ap) (void)0
(c)
#include <stdio.h>
#include <stdarg.h>
void foo(int n,...)
{
    va_list ap;
    va_start(ap,n);
    while (n--) {
        printf("\n%d:",va_arg(ap,int)); /* int argument   */
        puts(va_arg(ap,char*));         /* char* argument */
    }
    va_end(ap);
}

Listing One


/* ---------- preexpr.c ---------- */
#include <string.h>
#include <stdlib.h>
#include "qnc.h"
#include "preproc.h"

void bypassWhite(unsigned char **cp);

/* ------- compile a #define macro ------- */
static void CompileMacro(unsigned char *wd, MACRO *mp,unsigned char **cp)
{
    char *args[MAXPARMS];
    int i, argno = 0;
    char *val = mp->val;
    int inString = 0;

    if (**cp != '(')
        error(SYNTAXERR);
    /* ---- pull the arguments out of the macro call ---- */
    (*cp)++;
    while (**cp && **cp != ')')    {
        char *ca = getmem(80);
        int parens = 0, cs = 0;
        args[argno] = ca;
        bypassWhite(cp);
        while (**cp)    {
            if (**cp == ',' && parens == 0)
                break;
            if (**cp == '(')
                parens++;
            if (**cp == ')')    {
                if (parens == 0)
                    break;
                --parens;
            }
            if (cs++ == 80)
                error(SYNTAXERR);
            *ca++ = *((*cp)++);
        }
        *ca = '\0';
        argno++;
        if (**cp == ',')
            (*cp)++;
    }
    /* -- build the statement substituting arguments for the parameters -- */
    while (*val)    {
        if (*val & 0x80 || (*val == '#' && !inString))    {
            char *arg;
            int stringizing = 0;
            if (*val == '#')    {
                val++;
                if (*val == '#')    {
                    val++;
                    continue;
                }
                else    {
                    *wd++ = '"';
                    stringizing = 1;
                }
            }
            arg = args[*val & 0x3f];
            while (isspace(*arg))
                arg++;
            while (*arg != '\0')
                *wd++ = *arg++;
            if (stringizing)    {
                while (isspace(*(wd-1)))
                    --wd;
                *wd++ = '"';
            }
            val++;
        }
        else if ((*wd++ = *val++) == '"')
            inString ^= 1;
    }
    *wd = '\0';
    for (i = 0; i < argno; i++)
        free(args[i]);
    if (argno != mp->parms)
        error(ARGERR);
    if (**cp != ')')
        error(SYNTAXERR);
    (*cp)++;
}
/* ---- resolve a macro to its #defined value ----- */
int ResolveMacro(unsigned char *wd, unsigned char **cp)
{
    unsigned char *mywd = getmem(MAXMACROLENGTH);
    MACRO *mp;
    int sct = 0;
    ExtractWord(wd, cp, "_");
    while (alphanum(*wd) && (mp = FindMacro(wd)) != NULL &&
                        sct != MacroCount)    {
        if (mp->val == NULL)
            break;
        if (mp->isMacro)    {
            unsigned char *mw = mywd;
            int inString = 0;
            CompileMacro(mywd, mp, cp);
            while (*mw)    {
            if (*mw == '"' && (mw == mywd || *(mw-1) != '\\'))
                    inString ^= 1;
                if (!inString && alphanum(*mw))    {
                    ResolveMacro(wd, &mw);
                    wd += strlen(wd);
                }
                else
                    *wd++ = *mw++;
            }
            *wd = '\0';
        }
        else
            strcpy(wd, mp->val);
        sct++;
    }
    free(mywd);
    return sct;
}
/* --- recursive descent expression evaluation for #if --- */
static int MacroPrimary(unsigned char **cp)
{
    /* ---- primary:
            highest precedence;
            bottom of the descent ---- */
    int result = 0, tok;
    if (**cp == '(')    {
        /* ---- parenthetical expression ---- */
        (*cp)++;
        result = MacroExpression(cp);
        if (**cp != ')')
            error(IFERR);
        (*cp)++;
    }
    else if (isdigit(**cp) || **cp == '\'')    {
        /* --- numerical constant expression ---- */
        char num[80];
        char con[80];
        char *ch = *cp, *cc = con;
        while (isdigit(*ch) || strchr(".Ee'xX", *ch))
            *cc++ = *ch++;
        *cc = '\0';
        tokenize(num, con);
        *cp += cc - con;
        switch (*num)    {
            case T_CHRCONST:
                result = *(unsigned char*) (num+1);
                break;
            case T_INTCONST:
                result = *(int*) (num+1);
                break;
            case T_LNGCONST:
                result = (int) *(long*) (num+1);
                break;
            default:
                error(IFERR);
        }
    }
    else if (alphanum(**cp))    {
        /* ----- macro identifer expression ----- */
        unsigned char *np = getmem(MAXMACROLENGTH);
        unsigned char *npp = np;
        result = (ResolveMacro(np, cp) == 0) ? 0 : MacroPrimary(&npp);
        free(np);
    }
    else    {
        /* ----- unary operators ----- */
        tok = **cp;
        (*cp)++;
        result = MacroPrimary(cp);
        switch (tok)    {
            case '+':
                break;
            case '-':
                result = -result;
                break;
            case '!':
                result = !result;
                break;
            case '~':
                result = ~result;
                break;
            default:
                error(IFERR);
        }
    }
    bypassWhite(cp);
    return result;
}
/* ----- * and / operators ----- */
static int MacroMultiplyDivide(unsigned char **cp)
{
    int result = MacroPrimary(cp);
    int iresult, op;
    for (;;)    {
        if (**cp == '*')
            op = 0;
        else if (**cp == '/')
            op = 1;
        else if (**cp == '%')
            op = 2;
        else
            break;
        (*cp)++;
        iresult = MacroPrimary(cp);
        result = op == 0 ? (result * iresult) :
                 op == 1 ? (result / iresult) :
                           (result % iresult);
    }
    return result;
}
/* ------ + and - binary operators ------- */
static int MacroAddSubtract(unsigned char **cp)
{
    int result = MacroMultiplyDivide(cp);
    int iresult, ad;
    while (**cp == '+' || **cp == '-')    {
        ad = **cp == '+';
        (*cp)++;
        iresult = MacroMultiplyDivide(cp);
        result = ad ? (result+iresult) : (result-iresult);
    }
    return result;
}
/* -------- <, >, <=, and >= operators ------- */
static int MacroRelational(unsigned char **cp)
{
    int result = MacroAddSubtract(cp);
    int iresult, lt;
    while (**cp == '<' || **cp == '>')    {
        int lt = **cp == '<';
        (*cp)++;
        if (**cp ==  '=')    {
            (*cp)++;
            iresult = MacroAddSubtract(cp);
            result = lt ? (result <= iresult) : (result >= iresult);
        }
        else    {
            iresult = MacroAddSubtract(cp);
            result = lt ? (result < iresult) : (result > iresult);
        }
    }
    return result;
}
/* --------  == and != operators  -------- */
static int MacroEquality(unsigned char **cp)
{
    int result = MacroRelational(cp);
    int iresult, eq;
    while ((**cp == '=' || **cp == '!') && *(*cp+1) == '=')  {
        eq = **cp == '=';
        (*cp) += 2;
        iresult = MacroRelational(cp);
        result = eq ? (result==iresult) : (result!=iresult);
    }
    return result;
}
/* ---------- & binary operator ---------- */
static int MacroBoolAND(unsigned char **cp)
{
    int result = MacroEquality(cp);
    while (**cp == '&' && *(*cp+1) != '&')    {
        (*cp) += 2;
        result &= MacroEquality(cp);
    }
    return result;
}
/* ----------- ^ operator ------------- */
static int MacroBoolXOR(unsigned char **cp)
{
    int result = MacroBoolAND(cp);
    while (**cp == '^')    {
        (*cp)++;
        result ^= MacroBoolAND(cp);
    }
    return result;
}
/* ---------- | operator -------- */
static int MacroBoolOR(unsigned char **cp)
{
    int result = MacroBoolXOR(cp);
    while (**cp == | && *(*cp+1) != |)    {
        (*cp) += 2;
        result |= MacroBoolXOR(cp);
    }
    return result;
}
/* ---------- && operator ---------- */
static int MacroLogicalAND(unsigned char **cp)
{
    int result = MacroBoolOR(cp);
    while (**cp == '&' && *(*cp+1) == '&')    {
        (*cp) += 2;
        result = MacroBoolOR(cp) && result;
    }
    return result;
}
/* ---------- || operator -------- */
static int MacroLogicalOR(unsigned char **cp)
{
    int result = MacroLogicalAND(cp);
    while (**cp == | && *(*cp+1) == |)    {
        (*cp) += 2;
        result = MacroLogicalAND(cp) || result;
    }
    return result;
}
/* -------- top of the descent ----------- */
int MacroExpression(unsigned char **cp)
{
    bypassWhite(cp);
    return MacroLogicalOR(cp);
}


Copyright © 1994, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.