



JUN94: C PROGRAMMING


The Quincy Preprocessor

Al Stevens

Last month I introduced Quincy, a new "C Programming" column project. Quincy is a C-language teaching interpreter with an interactive D-Flat user interface. Its original version was a K&R interpreter. The new project is much closer to Standard C with a CUA integrated environment.

This month I'll discuss the interpreter's preprocessor, which implements a subset of Standard C's preprocessing operators. Quincy supports #if, #ifdef, #ifndef, #else, #elif, #endif, #define, #undef, #include, and the backslash (\) line-continuation character in macros. It does not support the # "stringizing" and ## concatenation operators in macros, but I might add them later. Quincy also does not support the #line, #error, or #pragma directives.

A preprocessor reads C source code and translates it for the compiler. The preprocessor deletes comments, excess white space, and code that compile-time conditionals (#if, and so on) delete. It also resolves #define macros and inserts other source-code files that the #include directive specifies. The preprocessor maintains line-number integrity in the output source code so a source-level debugger can set breakpoints and step through the code.

Traditionally, the preprocessor is a stand-alone program that runs as the first pass of a compile, producing a temporary file for the second pass to read. Quincy is an interactive interpreter, so the preprocessor is implemented through a function that the interpreter calls before it begins translating the code.

A p Descendant

A preprocessor is a complex piece of code. The original Quincy did not have any preprocessing, although it supported simple #define macro substitutions without parameters. Other preprocessing directives were comments. You could put an #include statement in, for example, but it did nothing. All of the library functions were built in, and K&R C did not have prototypes, so a preprocessor was not necessary. The current version has header files with prototypes and macros. Some header files even have functions. Consequently, a preprocessor became necessary.

Not wanting to reblaze old trails, I went looking for an existing C preprocessor to adapt. My first thought was to download the Gnu version. I'm sure it's tucked away somewhere in one of those megabytes of Gnu uploads, but I couldn't tell which one from the file descriptions, and I sure didn't want to download all of that stuff. A search of the likely CompuServe libraries with PREPROCESSOR keywords didn't turn up anything productive, either, so I did the obvious--I turned to the Doctor for help.

Years ago, DDJ published an article with a preprocessor for the Small C compiler. The program was called "p." I found it in one of the annual bound editions. Because of its age, the source code is not available electronically, so I typed it in and compiled it. By gum, it worked. It's not the program you see in this issue, but the example showed me how to handle all of the nested #if, #ifdef, #ifndef, #else, and #elif operators. The p program is an interesting study in how we used to recklessly treat pointers and integers interchangeably. I used to write programs that way. Trying to adapt the p code to ANSI C showed me how much the standard language encourages better coding practices. Eventually, I gave up and just extracted the logic I wanted. Even though I couldn't use the p code itself, the exercise demonstrates the endurance of the early DDJ issues. Don't throw anything away.

Preprocessing

Listing One, page 143, is preproc.c, the Quincy preprocessor. There are other parts, which the preprocessor shares with the interpreter, and I will discuss them in later columns, but preproc.c is the main thread.

Quincy calls the PreProcessor function after the programmer types or loads a source program and tells Quincy to run it. The function accepts two parameters: a pointer to the buffer that receives the preprocessed code, and a pointer to the raw source code. When the function returns, the preprocessed code is ready to be translated.

The Quincy source-code model consists of one source-code module in memory, which may have been loaded from disk, and zero or more #include files that are on disk. Because the environment is an interactive interpreter, there is no link process, so there are no other compiled object modules or libraries with which to link. The preprocessor translates the source code of the main and #include files into one source-code stream. Each input source-code file and the preprocessed source-code file must, therefore, fit into a 64K buffer.

After some housekeeping, the PreProcessor function calls the PreProcess function to translate the code. This function is the top level of the preprocessing loop, which calls itself from a lower level when it encounters an #include statement in the code. The function processes source code one line at a time. The program passes through the input buffer by calling the ReadString function, which first determines the length of the next line in the input buffer, allocates a line buffer to hold the line, and copies it into the line buffer.
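The line-at-a-time pass over an in-memory buffer can be pictured with a small sketch. This is not Quincy's ReadString. The name read_line and the caller-supplied, fixed-size line buffer are assumptions made for illustration; the real function allocates its line buffer.

```c
#include <stddef.h>

/* Hypothetical helper, not Quincy's ReadString: copy the next line
   (including its '\n', if there is one) from *ip into buf, advance
   *ip past it, and return the number of characters copied.
   Returns 0 at the end of the input. A line longer than bufsize-1
   characters is split across calls. */
size_t read_line(const char **ip, char *buf, size_t bufsize)
{
    size_t n = 0;

    if (bufsize == 0)
        return 0;
    while (**ip != '\0' && n < bufsize - 1) {
        char c = *(*ip)++;
        buf[n++] = c;
        if (c == '\n')
            break;
    }
    buf[n] = '\0';
    return n;
}
```

A caller keeps one pointer into the source buffer and calls read_line until it returns 0, much as the PreProcess loop in the listing runs until ReadString returns 0.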

Throughout the preprocessing, the program uses the ExtractWord function to pull logical words from the input stream. This function accepts a pointer to a buffer to receive the word, the address of the input-stream pointer, and a string of special characters that are allowed in the word. The function copies characters as long as they are alphabetic, numeric, or one of the specified allowed characters. Usually the underscore is the only non-alphanumeric character allowed, as in C identifiers. Preprocessing directive keywords themselves allow no special characters. When the program extracts the filename from the #include directive, it allows periods, dollar signs, underscores, and backslashes.
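The effect of the allowed-character string is easiest to see in isolation. The sketch below is modeled on the ExtractWord function in Listing One, but it uses plain char pointers and the hypothetical name extract_word so that it compiles on its own.

```c
#include <ctype.h>
#include <string.h>

/* Sketch modeled on ExtractWord in Listing One. Copies characters
   from *cp into wd while they are alphanumeric or appear in the
   allowed string, and leaves *cp at the first character that was
   not copied. The caller must make wd large enough. */
void extract_word(char *wd, const char **cp, const char *allowed)
{
    while (**cp != '\0' &&
           (isalnum((unsigned char)**cp) ||
            strchr(allowed, **cp) != NULL))
        *wd++ = *(*cp)++;
    *wd = '\0';
}
```

With the allowed string ".$_\\", the input stdio.h> yields the word stdio.h and leaves the pointer on the closing angle bracket. With an empty allowed string, the input ifdef DEBUG yields only ifdef.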

Tests for white space in the source code are done by the isSpace macro in preproc.h, Listing Two, page 146. This test recognizes Quincy's internal notation for tab expansion, which uses the tab and form-feed characters with the most-significant bit set.

Preprocessing Directives

Each preprocessing directive must, by definition, be on its own source-code line. If the program finds a pound sign (#) as the first non-white-space character, the line is assumed to be a preprocessing directive, and the function extracts the directive keyword. To convert the directive into a token, the program calls FindPreProcessor, passing the directive's keyword. This function is in a different place in Quincy--the place where all symbol translations occur. There are functions that translate C-language identifiers and keywords into character tokens. A switch statement tests the directive token and calls a function to process it.

#include

The #include directive tells the program to include another source file. The program maintains a linked list of source-code files that contribute to the running program. This list stays in place while the program is running so the interpreter can identify the location of errors. Quincy recognizes the difference between #include <filename> and #include "filename". If you use angle brackets, Quincy looks for the file in the subdirectory where the Quincy executable is located. Otherwise, it looks in the current subdirectory. The preprocessor makes sure the source program does not include a file more than once. This is to avoid #include loops, such as when file A includes file B, which includes file A.
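The include-once check amounts to a lookup in a list of names before a file is read. The following is a minimal sketch of that idea, not the code from the listing: the IncFile type and the name_equal and note_include functions are hypothetical, new names go on the front of the list rather than the end, and a portable comparison loop stands in for the compiler-specific stricmp.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical list node holding one included file name. */
typedef struct IncFile {
    char *name;
    struct IncFile *next;
} IncFile;

/* Case-insensitive equality test; a portable stand-in for stricmp. */
static int name_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Returns 1 if name was not seen before and is now recorded (the
   caller should go on to read the file), 0 if it was already
   included, or -1 if memory could not be allocated. */
int note_include(IncFile **list, const char *name)
{
    IncFile *f;

    for (f = *list; f != NULL; f = f->next)
        if (name_equal(f->name, name))
            return 0;
    f = malloc(sizeof *f);
    if (f == NULL)
        return -1;
    f->name = malloc(strlen(name) + 1);
    if (f->name == NULL) {
        free(f);
        return -1;
    }
    strcpy(f->name, name);
    f->next = *list;
    *list = f;
    return 1;
}
```

Because the check happens before the file is opened, a loop in which file A includes file B and file B includes file A stops at the second request for A.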

Each source file being processed has its own context, and the #include logic saves the current context, reads the new file into a fresh buffer, and calls PreProcess to continue the process. When PreProcess returns, the program frees the buffer, restores the context, and returns to continue processing the previous source file. Each context includes a file number and line number. As Quincy emits preprocessed source-code lines, it generates newline tokens, which are just newline characters followed by C comments that specify the current file and line number like: /*1:3*/. This format is valid C-language source code and provides the debugger with file- and line-number information for setting breakpoints and reporting errors.
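A short sketch shows how such a marker can be written and read back. The sprintf format string matches the one used by OutputLine in the listing; the function names format_marker and parse_marker are hypothetical, snprintf is used here in place of sprintf, and the parsing half is only a guess at what a consumer of the markers might do.

```c
#include <stdio.h>

/* Write a newline followed by a file:line comment into buf.
   Returns the number of characters the marker needs, as
   snprintf does. */
int format_marker(char *buf, size_t size, int file_no, int line_no)
{
    return snprintf(buf, size, "\n/*%d:%d*/", file_no, line_no);
}

/* Hypothetical reader: recover the two numbers from a marker.
   Returns 1 on success and 0 if s does not start with a marker. */
int parse_marker(const char *s, int *file_no, int *line_no)
{
    return sscanf(s, "\n/*%d:%d*/", file_no, line_no) == 2;
}
```

For file 1, line 3, format_marker produces a newline followed by the eight-character-long marker text shown above (counting the newline), and parse_marker turns it back into the pair 1 and 3.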

#define and #undef

Quincy supports the #define directive with recursive argument substitutions. That operation divides into two parts, the logic that records the macro itself and the logic that substitutes arguments for parameters when the source program calls the macro.

The DefineMacro function adds a new macro to a linked list of defined macros, first making sure the macro is not already defined. A macro may or may not have a parameter list. One with no parameters may or may not have an empty parameter list. One with no parentheses at all is meant to be used for simple substitutions. The DefineMacro function breaks the macro into three strings: the macro name, its parameter list, and the macro definition. Then it calls the AddMacro function. This function builds an array of pointers to the parameter identifiers in the macro. Then it converts the matching identifiers in the macro definition into parameter-number tokens. A macro that looks like this in source code: #define min(a,b) (a<b?a:b) looks like this internally: min (#0<#1?#0:#1).
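The conversion of parameter names into parameter-number tokens can be sketched on its own. This is an illustration of the technique rather than the AddMacro code: the function name tokenize_params is hypothetical, it assumes no more than ten parameters so that every token is exactly two characters, and it assumes the output buffer is large enough (twice the length of the body plus one is always sufficient).

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

static int is_ident_char(int c)
{
    return isalnum(c) || c == '_';
}

/* Copy body into out, replacing each whole identifier that matches
   params[i] with the token "#i". Identifiers longer than 63
   characters are not handled correctly by this sketch. */
void tokenize_params(const char *body, const char *const params[],
                     int nparams, char *out)
{
    while (*body != '\0') {
        if (is_ident_char((unsigned char)*body)) {
            char word[64];
            size_t n = 0;
            int i;

            while (is_ident_char((unsigned char)*body) &&
                   n < sizeof word - 1)
                word[n++] = *body++;
            word[n] = '\0';
            for (i = 0; i < nparams; i++)
                if (strcmp(word, params[i]) == 0)
                    break;
            if (i < nparams)
                out += sprintf(out, "#%d", i);   /* parameter token */
            else {
                memcpy(out, word, n);            /* ordinary word */
                out += n;
            }
        }
        else
            *out++ = *body++;
    }
    *out = '\0';
}
```

With the parameters a and b, the body (a<b?a:b) becomes (#0<#1?#0:#1). Matching whole identifiers matters: the body (ab+a) becomes (ab+#0), because ab is a different word from a.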

The ResolveMacro function (to be discussed in a later column) substitutes the arguments in the macro call for the matching parameter numbers in the macro definition. If I decide to implement the # and ## operators later, I will probably need to use a different token for the internal parameter numbers.
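Until ResolveMacro appears, the substitution step can be pictured with a sketch. This is not Quincy's code. The function name substitute_args and its interface are assumptions, it handles single-digit parameter numbers only, and it does no recursive expansion of macros that appear in the arguments.

```c
#include <ctype.h>
#include <string.h>

/* Expand an internal macro body into out, replacing each "#n"
   token with args[n]. Returns 0 on success, or -1 if a parameter
   number is out of range or the result does not fit in outsize
   bytes (including the terminating null). */
int substitute_args(const char *body, const char *const args[],
                    int nargs, char *out, size_t outsize)
{
    size_t used = 0;

    if (outsize == 0)
        return -1;
    while (*body != '\0') {
        if (body[0] == '#' && isdigit((unsigned char)body[1])) {
            int n = body[1] - '0';
            size_t len;

            if (n >= nargs)
                return -1;
            len = strlen(args[n]);
            if (used + len >= outsize)
                return -1;
            memcpy(out + used, args[n], len);
            used += len;
            body += 2;
        }
        else {
            if (used + 1 >= outsize)
                return -1;
            out[used++] = *body++;
        }
    }
    out[used] = '\0';
    return 0;
}
```

With the arguments x and y+1, the internal body (#0<#1?#0:#1) expands to (x<y+1?x:y+1). The sketch also shows why a # character in the internal form would collide with a stringizing operator: the expander cannot tell the two uses apart.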

The #undef directive removes the macro named by its argument from the linked list of #define macros. If no such macro is defined, the program ignores the directive.

Compile-Time Conditionals

The #ifdef and #ifndef directives test whether the macro named by the argument is defined and set the Skipping variable accordingly: #ifdef starts skipping if the macro is not defined, and #ifndef starts skipping if it is. The #if and #elif directives each evaluate their constant-expression arguments, which may involve calls to other macros, and set the Skipping variable if the value is false (zero). The Skipping variable tells the preprocessor when to skip source code. Since the #if, #ifdef, and #ifndef forms can be nested, they each increment the IfLevel variable and use it to set the Skipping variable. This is the logic I borrowed from the aforementioned p.

The #if and #elif directive functions call MacroExpression, which is a recursive-descent parsing algorithm that evaluates constant expressions. I'll be discussing expression evaluation in a later column. For now, it is enough to know that MacroExpression returns a false value if the argument expression evaluates to 0, or returns a true value otherwise.
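For readers who have not met recursive descent before, here is a minimal sketch of the technique. It is not MacroExpression. It handles only decimal constants, parentheses, unary ! and -, the operators * / + -, and the comparisons == != < and >. It does not expand macros, it does not recognize <= or >=, and it quietly treats malformed input and division by zero as 0. All of the function names are hypothetical. Each level of operator precedence gets its own function, and each function calls the next-higher level for its operands.

```c
#include <ctype.h>

static long parse_compare(const char **p);

static void skip_blanks(const char **p)
{
    while (**p == ' ' || **p == '\t')
        (*p)++;
}

/* primary: a decimal constant or a parenthesized expression */
static long parse_primary(const char **p)
{
    long v = 0;

    skip_blanks(p);
    if (**p == '(') {
        (*p)++;
        v = parse_compare(p);
        skip_blanks(p);
        if (**p == ')')
            (*p)++;
        return v;
    }
    while (isdigit((unsigned char)**p))
        v = v * 10 + (*(*p)++ - '0');
    return v;
}

/* unary: the ! and - prefix operators */
static long parse_unary(const char **p)
{
    skip_blanks(p);
    if (**p == '!') { (*p)++; return !parse_unary(p); }
    if (**p == '-') { (*p)++; return -parse_unary(p); }
    return parse_primary(p);
}

/* term: * and /, left to right */
static long parse_term(const char **p)
{
    long v = parse_unary(p);

    for (;;) {
        skip_blanks(p);
        if (**p == '*') { (*p)++; v *= parse_unary(p); }
        else if (**p == '/') {
            long d;
            (*p)++;
            d = parse_unary(p);
            v = d ? v / d : 0;
        }
        else
            return v;
    }
}

/* sum: + and -, left to right */
static long parse_sum(const char **p)
{
    long v = parse_term(p);

    for (;;) {
        skip_blanks(p);
        if (**p == '+') { (*p)++; v += parse_term(p); }
        else if (**p == '-') { (*p)++; v -= parse_term(p); }
        else
            return v;
    }
}

/* compare: == != < > */
static long parse_compare(const char **p)
{
    long v = parse_sum(p);

    for (;;) {
        skip_blanks(p);
        if ((*p)[0] == '=' && (*p)[1] == '=') { *p += 2; v = (v == parse_sum(p)); }
        else if ((*p)[0] == '!' && (*p)[1] == '=') { *p += 2; v = (v != parse_sum(p)); }
        else if (**p == '<') { (*p)++; v = (v < parse_sum(p)); }
        else if (**p == '>') { (*p)++; v = (v > parse_sum(p)); }
        else
            return v;
    }
}

/* Returns 0 if the expression evaluates to 0, and 1 otherwise. */
int eval_condition(const char *expr)
{
    return parse_compare(&expr) != 0;
}
```

An #if handler built on this sketch would start skipping when eval_condition returns 0.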

The #else and #endif directives manage the Skipping value based on the current IfLevel setting. These variables have the following meanings. If the Skipping variable is greater than 0, the preprocessor ignores all source-code lines except those that have compile-time conditional directives. While the IfLevel variable is greater than 0, the program is within one or more levels of nested #ifs and #elses. Every #if form increments IfLevel and, if Skipping is not set and the test fails (the argument's value is false), sets Skipping to the IfLevel value.

For the #endif, #else, or #elif directives to be valid, the IfLevel variable must be greater than 0. #endif decrements the IfLevel variable. If the IfLevel variable is greater than 0 at the end of the preprocessing stage, there is an unterminated #if form somewhere in the source code.
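The interplay of the two variables is easier to follow in a stripped-down form. The sketch below follows the If, Else, and Endif functions in Listing One, but it keeps the state in a hypothetical CondState structure, takes the already-evaluated condition as an argument, and returns -1 instead of calling an error function. In it, skipping is 0 while lines are being emitted; otherwise it holds the nesting level of the conditional that turned skipping on.

```c
/* Hypothetical container for the two state variables. */
typedef struct {
    int if_level;   /* current depth of nested #if forms */
    int skipping;   /* 0, or the level that started the skip */
} CondState;

/* Any #if form: go one level deeper; if lines are currently being
   emitted and the test failed, start skipping at this level. */
void cond_if(CondState *s, int value)
{
    s->if_level++;
    if (!s->skipping && !value)
        s->skipping = s->if_level;
}

/* #else: flip the state, but only when this level controls it. */
int cond_else(CondState *s)
{
    if (s->if_level == 0)
        return -1;              /* #else without #if */
    if (s->skipping == s->if_level)
        s->skipping = 0;
    else if (s->skipping == 0)
        s->skipping = s->if_level;
    return 0;
}

/* #endif: stop skipping if this level started it, then pop. */
int cond_endif(CondState *s)
{
    if (s->if_level == 0)
        return -1;              /* #endif without #if */
    if (s->skipping == s->if_level)
        s->skipping = 0;
    s->if_level--;
    return 0;
}
```

Storing the level in skipping, rather than a simple flag, is what makes nesting work: a conditional that sits inside a skipped region raises and lowers if_level, but it can never match the stored level, so it cannot turn output back on.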

Code Output

If the first character in the source-code line was not a pound sign, and the program is not skipping source lines because of a compile-time conditional, the function calls the OutputLine function to process a source-code line. Every identifier on a source line is searched against the table of #define macros to see if the identifier is a macro. Every nonidentifier--operators, constants, literals, and so on--is passed to the preprocessed output. The OutputLine function inserts the file/line-number token comments and strips white space and comments from the input.
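The white-space rule used by OutputLine in the listing is that a run of blanks survives as a single space only when it separates two alphanumeric characters; everywhere else it disappears. A minimal sketch of just that rule follows. The name squeeze_blanks is hypothetical, and the sketch deliberately ignores comments, string literals, macros, and Quincy's internal tab notation, so it is not a substitute for the real function.

```c
#include <ctype.h>

static int ident_char(int c)
{
    return isalnum(c) || c == '_';
}

/* Copy one source line from in to out, dropping runs of spaces and
   tabs except where one space is needed to keep two identifier
   characters apart. Copying stops at '\n' or at the end of the
   string. out must be at least as large as in. */
void squeeze_blanks(const char *in, char *out)
{
    char last = '\0';    /* last character written, if any */

    while (*in != '\0' && *in != '\n') {
        if (*in == ' ' || *in == '\t') {
            while (*in == ' ' || *in == '\t')
                in++;
            if (ident_char((unsigned char)*in) &&
                ident_char((unsigned char)last))
                *out++ = ' ';
            continue;
        }
        last = *in;
        *out++ = *in++;
    }
    *out = '\0';
}
```

Given the line "  int   x  =  a + 1 ;", the sketch produces "int x=a+1;". The space between int and x is kept because removing it would fuse two words into one.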

Resolving Macros

To convert identifiers, the OutputLine function calls the ResolveMacro function, which translates its result into the string pointed to by its first argument. The result is either the identifier itself when it is not a macro invocation, or the resolution of the macro. Resolving macros is a recursive operation, because macros often call other macros. The ResolveMacro function is a part of the code that evaluates expressions.

Quincy Error Checking

Quincy does some of its error checking during code compilation and some during run time. This reflects its interactive interpreter status. I could go overboard and turn Quincy's dialect of C into a strong run-time type and bounds-checking language, but that would belie Quincy's role as a C interpreter. The original Quincy allowed you to use full expressions to initialize global variables, for example. That was easy to do because everything was interpreted at run time. That does not, however, reflect the way C works, and so, even though it added work to change the behavior, the new Quincy emulates the compiled C program when it interprets the source code.

Error checking stops the compiling or interpretation of the program at the first error and returns to the editor part of the IDE. If the cursor is on the offending source-code line, an appropriate error message displays. If the error is in an #include file, the error message names the file and the line number where the error was found. Since #include files may contain executable code--some of the standard header files do--these errors, too, can occur during translation or run time.

The programmer sees no difference between compiling and run time. When you tell Quincy to run or step through a program, it runs the preprocessor, the lexical scanner, the translator, and then begins interpreting.

Subsets

Looking at Quincy's subset of C, in both the interpreter and the preprocessor, I find it reflects the ways I use the C language. For example, last month I said that Quincy does not support the typedef keyword. It does now. I kept missing it.

A notable exception to that rule is the goto statement. I never use it in a program, but I put support for it into Quincy. The original Quincy did not support goto because of the way the interpreter constructed and destroyed local variables. goto would have been hard to implement. The new interpreter uses different logic for local variables, and goto is relatively easy to accommodate. Rather than force my view of goto on students and other teachers, I decided to include it and let them decide for themselves.

The only reason Quincy does not yet support multidimensional arrays is that the code necessary to parse and process their initializers is hard to fit into the program. Even though most of the existing program is gone, the underlying structure of the interpreter is the same, and I keep running into walls I have to tear down in order to add something. It bothers me that the feature is missing, however, and I intend to put it in.

If you find yourself wanting a particular feature, let me know. Remember Quincy's purpose, though, which is to help students teach themselves C. Whether or not I add a feature depends on how difficult it is and how relevant it is to learning C at the primary level. For instance, I probably won't put #pragmas in.

C is not an easy language to interpret. It has some nutty constructs. There are comma-separated declarators, with and without initializers; initializers that must be constants under some circumstances and may be full expressions under others; auto-increment and decrement operators on either side of a variable identifier; an incestuous relationship between pointers and arrays; and so on. Don't misunderstand me. As a programmer, I like using those features in C and C++. But parsing and interpreting them are something else again. The compiler builders have my respect. Doing a translator by hand makes you appreciate why they came up with tools such as LEX and YACC to make the job easier.

Quincy's Influence on D-Flat

Using D-Flat as the user interface for Quincy was a natural choice. Practically everything I needed was already there, and of course, there was no learning curve. I did, however, find some things about D-Flat I wanted to change as a direct result of using Quincy.

The first area to improve was the editor. For years D-Flat users beat me up for not having an editor that expands and collapses tabs. My answer was always that D-Flat provides a basic edit-box class. If you need more than that, use the window-class derivation technique to build one. Well, finally, I needed one for Quincy, so I built the Editor class specifically for that purpose. You can stop beating me up now.

The second area was the Help system. To begin with, there has been an insidious bug in the Help system for a while. For some reason, it would crash an application upon exit to DOS if you did a lot of navigating around the help database using the hypertext links. I always suspected a heap problem but could never get the program to crash consistently enough to find it. Quincy relies heavily on the Help system in its tutorials. I had to fix that bug. I tore apart all of the hypertext stuff and overhauled it to not use the heap so flamboyantly. The bug seems to have gone away.

Next was the size of the Help database. D-Flat loads the database by reading all of the text and building an internal table of help windows. Quincy's database is going to be big. It was taking a long time on slower machines just to start the program. I modified D-Flat's program that compresses the help file to build the table and add it at the end of the file. Now D-Flat applications load much faster regardless of the help database size.

I never liked the D-Flat File Open and Save As common dialogs. I designed them according to the CUA spec. When I built D-Flat++, I improved the design to look more like those in Windows 3.1. Before I started Quincy, I decided to port the improved design to D-Flat.

The last change was to accommodate the tutorial. Not all Quincy users will need or want it, so I built it as a second Help database. I had to modify D-Flat to allow an application to switch between Help databases.

As a result of these changes, you need D-Flat version 18 or later to build Quincy.

Why Not D-Flat++?

You might wonder why Quincy uses D-Flat rather than D-Flat++. Sometimes I ask myself the same question. First, Quincy is a C program. Converting it to C++ would have added work. In retrospect, I can see that it might have saved some work, too, but that's another story. Second, D-Flat has more features than D-Flat++, most notably the hypertext Help system, which is central to the tutorial. Porting that feature to D-Flat++ would have been a sizeable job. Finally, Quincy is a C interpreter. Something said to me that writing a C interpreter in C++ was backwards, kind of like going to a hog-calling contest in a Lexus. It just didn't sit right.

C Programming Source Code

Quincy, D-Flat, and D-Flat++ are available to download from the DDJ Forum on CompuServe and on the Internet by anonymous ftp. See page 3 for details. If you cannot get to one of the online sources, send a diskette and a stamped, self-addressed mailer to me at Dr. Dobb's Journal, 411 Borel, San Mateo, CA 94402. I'll send you a copy of the source code. It's free, but if you want to support the Careware program, include a dollar for the Brevard County Food Bank. They help hungry and homeless citizens.

[LISTING ONE]



/* -------- preproc.c -------- */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <ctype.h>     /* isalnum, isspace */
#include <dos.h>
#include <sys\stat.h>
#include "qnc.h"
#include "preproc.h"

static MACRO *FirstMacro;
int MacroCount;

/* --- #included source code files --- */
typedef struct SourceFile    {
    unsigned char *fname;
    struct SourceFile *NextFile;
} SRCFILE;
static SRCFILE *FirstFile;
static SRCFILE *LastFile;
static SRCFILE *ThisFile;
static unsigned char FileCount;

static int Skipping;
static int IfLevel;
static unsigned char *Line;
static unsigned char *Word;
static unsigned char *FilePath;
static unsigned char *Ip, *Op;
static unsigned char *IncludeIp;

/* ------ local function prototypes ------- */
static void FreeBuffers(void);
static void PreProcess(void);
static void OutputLine(void);
static void DefineMacro(unsigned char*);
static void Include(unsigned char*);
static void UnDefineMacro(unsigned char*);
static void If(unsigned char *);
static void Elif(unsigned char *);
static void IfDef(unsigned char *);
static void IfnDef(unsigned char *);
static void Else(void);
static void Endif(void);

static void UnDefineAllMacros(void);
static int ReadString(void);
static void WriteChar(unsigned char);
static void WriteWord(unsigned char*);

/* --- preprocess code in SourceCode into pSrc --- */
void PreProcessor(unsigned char *pSrc,unsigned char *SourceCode)
{
    Op = pSrc;
    Ip = SourceCode;
    Ctx.CurrFileno = 0;
    Ctx.CurrLineno = 0;
    IfLevel = 0;
    Skipping = 0;
    Word = getmem(MAXMACROLENGTH);
    FilePath = getmem(128);
    PreProcess();
    if (IfLevel)
        error(IFSERR);
    FreeBuffers();
}
/* --- delete all preprocessor heap usage on error --- */
void CleanUpPreProcessor(void)
{
    FreeBuffers();
    DeleteFileList();
}
/* ---- free heap buffers used by preprocessor ---- */
static void FreeBuffers(void)
{
    UnDefineAllMacros();
    free(IncludeIp);
    free(Line);
    free(FilePath);
    free(Word);
    IncludeIp = NULL;
    FilePath  = NULL;
    Word      = NULL;
    Line      = NULL;
}
/* ---- bypass source code white space ---- */
void bypassWhite(unsigned char **cp)
{
    while (isSpace(**cp))
        (*cp)++;
}
/* ---- extract a word from input --- */
void ExtractWord(unsigned char *wd, unsigned char **cp, unsigned char *allowed)
{
    while (**cp)    {
        if (isalnum(**cp) || strchr(allowed, **cp))
            *wd++ = *((*cp)++);
        else
            break;
    }
    *wd = '\0';
}
/* ---- internal preprocess entry point ---- */
static void PreProcess()
{
    unsigned char *cp;
    while (ReadString() != 0)    {
        if (Line[strlen(Line)-1] != '\n')
            error(LINETOOLONGERR);
        cp = Line;
        bypassWhite(&cp);
        if (*cp != '#')    {
            if (!Skipping)
                OutputLine();
            continue;
        }
        cp++;
        /* --- this line is a preprocessing token --- */
        bypassWhite(&cp);
        ExtractWord(Word, &cp, "");
        switch (FindPreProcessor(Word))    {
            case P_DEFINE:
                if (!Skipping)
                    DefineMacro(cp);
                break;
            case P_ELSE:
                Else();
                break;
            case P_ELIF:
                Elif(cp);
                break;
            case P_ENDIF:
                Endif();
                break;
            case P_IF:
                If(cp);
                break;
            case P_IFDEF:
                IfDef(cp);
                break;
            case P_IFNDEF:
                IfnDef(cp);
                break;
            case P_INCLUDE:
                if (!Skipping)
                    Include(cp);
                break;
            case P_UNDEF:
                if (!Skipping)
                    UnDefineMacro(cp);
                break;
            default:
                error(BADPREPROCERR);
                break;
        }
    }
}
/* ----- find a macro that is already #defined ----- */
MACRO *FindMacro(unsigned char *ident)
{
    MACRO *ThisMacro = FirstMacro;
    while (ThisMacro != NULL)    {
        if (strcmp(ident, ThisMacro->id) == 0)
            return ThisMacro;
        ThisMacro = ThisMacro->NextMacro;
    }
    return NULL;
}
/* ----- compare macro parameter values ---- */
static int parmcmp(char *p, char *t)
{
    char tt[80];
    char *tp = tt;
    while (alphanum(*t))
        *tp++ = *t++;
    *tp = '\0';
    return strcmp(p, tt);
}
/* ---- add a newly #defined macro to the table ---- */
static void AddMacro(unsigned char *ident,unsigned char *plist,
                                                     unsigned char *value)
{
    char *prms[MAXPARMS];

    MACRO *ThisMacro = getmem(sizeof(MACRO));
    ThisMacro->id = getmem(strlen(ident)+1);
    strcpy(ThisMacro->id, ident);
    /* ---- find and count parameters ---- */
    if (plist)    {
        /* ---- there are parameters ---- */
        ThisMacro->isMacro = 1;
        plist++;
        while (*plist != ')')    {
            while (isspace(*plist))
                plist++;
            if (alphanum(*plist))    {
                if (ThisMacro->parms == MAXPARMS)
                    error(DEFINERR);
                prms[ThisMacro->parms++] = plist;
                while (alphanum(*plist))
                    plist++;
            }
            while (isspace(*plist))
                plist++;
            if (*plist == ',')
                plist++;
            else if (*plist != ')')
                error(DEFINERR);
        }
    }
    /* --- build value substituting parameter numbers --- */
    if (value != NULL)    {
        /* ---- there is a value ---- */
        ThisMacro->val =
            getmem(strlen(value)+1+ThisMacro->parms);
        if (ThisMacro->parms)    {
            char *pp = ThisMacro->val;
            while (*value)    {
                if (alphanum(*value))    {
                    int p = 0;
                    ExtractWord(Word, &value, "_");
                    while (p < ThisMacro->parms)    {
                        if (parmcmp(Word, prms[p]) == 0)  {
                            sprintf(pp, "#%d", p);
                            pp += 2;
                            break;
                        }
                        p++;
                    }
                    if (p == ThisMacro->parms)    {
                        strcpy(pp, Word);
                        pp += strlen(Word);
                    }
                }
                else
                    *pp++ = *value++;
            }
            *pp = '\0';
        }
        else
            /* --- no parameters, straight substitution --- */
            strcpy(ThisMacro->val, value);
    }
    ThisMacro->NextMacro = FirstMacro;
    FirstMacro = ThisMacro;
    MacroCount++;
}
/* ----- #define a new macro ----- */
static void DefineMacro(unsigned char *cp)
{
    unsigned char *vp = NULL, *vp1;
    unsigned char *lp = NULL;
    bypassWhite(&cp);
    ExtractWord(Word, &cp, "_");
    if (FindMacro(Word) != NULL)
        error(REDEFPPERR);    /* --- already defined --- */
    /* ---- extract parameter list ---- */
    if (*cp == '(')    {
        lp = cp;
        while (*cp && *cp != ')' && *cp != '\n')
            cp++;
        if (*cp++ != ')')
            error(DEFINERR);
    }
    bypassWhite(&cp);
    /* ---- extract parameter definition ---- */
    if (*cp)
        vp = getmem(strlen(cp)+1);
    vp1 = vp;
    while (*cp && *cp != '\n')    {
        char *cp1 = cp;
        while (*cp && *cp != '\n')
            cp++;
        --cp;
        while (isSpace(*cp))
            --cp;
        cp++;
        strncpy(vp1, cp1, cp-cp1);
        vp1[cp-cp1] = '\0';
        vp1 = vp + strlen(vp)-1;
        if (*vp1 != '\\')
            break;
        ReadString();
        cp = Line;
        bypassWhite(&cp);
        vp = realloc(vp, strlen(vp)+strlen(cp)+1);
        if (vp == NULL)
            error(OMERR);
        vp1 = vp + strlen(vp)-1;
    }
    if (strcmp(Word, vp))
        AddMacro(Word, lp, vp);
    free(vp);
}
/* ----- remove all macros ------ */
static void UnDefineAllMacros(void)
{
    MACRO *ThisMacro = FirstMacro;
    while (ThisMacro != NULL)    {
        MACRO *tm = ThisMacro;
        free(ThisMacro->val);
        free(ThisMacro->id);
        ThisMacro = ThisMacro->NextMacro;
        free(tm);
    }
    FirstMacro = NULL;
    MacroCount = 0;
}
/* ------ #undef a macro ------- */
static void UnDefineMacro(unsigned char *cp)
{
    MACRO *ThisMacro;
    bypassWhite(&cp);
    ExtractWord(Word, &cp, "_");
    if ((ThisMacro = FindMacro(Word)) != NULL)    {
        if (ThisMacro == FirstMacro)
            FirstMacro = ThisMacro->NextMacro;
        else     {
            MACRO *tm = FirstMacro;
            while (tm != NULL)    {
                if (ThisMacro == tm->NextMacro)    {
                    tm->NextMacro = ThisMacro->NextMacro;
                    break;
                }
                tm = tm->NextMacro;
            }
        }
        free(ThisMacro->val);
        free(ThisMacro->id);
        free(ThisMacro);
        --MacroCount;
    }
}
/* ------ #include a source code file ------ */
static void Include(unsigned char *cp)
{
    FILE *fp;
    int LocalInclude;
    int holdcount;
    unsigned char holdfileno;
    SRCFILE *holdfile;
    unsigned char *holdip;
    struct stat sb;

    holdfile = ThisFile;
    *FilePath = '\0';
    bypassWhite(&cp);
    /* ---- test for #include <file> or #include "file" ---- */
    if (*cp == '"')
        LocalInclude = 1;
    else if (*cp == '<')
        LocalInclude = 0;
    else
        error(BADPREPROCERR);
    cp++;
    /* ---- extract the file name ---- */
    ExtractWord(Word, &cp, ".$_\\");
    if (*cp != (LocalInclude ? '"' : '>'))
        error(BADPREPROCERR);
    /* ---- build path to included file ---- */
    if (!LocalInclude)    {
        unsigned char *pp;
        strcpy(FilePath, _argv[0]);
        pp = strrchr(FilePath, '\\');
        if (pp != NULL)
            *(pp+1) = '\0';
    }
    strcat(FilePath, Word);
    /* --- test to see if the file was already included --- */
    ThisFile = FirstFile;
    while (ThisFile != NULL)    {
        if (stricmp(Word, ThisFile->fname) == 0)
            return;
        ThisFile = ThisFile->NextFile;
    }
    /* ---- add to list of included files --- */
    ThisFile = getmem(sizeof(SRCFILE));
    ThisFile->fname = getmem(strlen(Word)+1);
    strcpy(ThisFile->fname, Word);
    if (LastFile != NULL)
        LastFile->NextFile = ThisFile;
    ThisFile->NextFile = NULL;
    LastFile = ThisFile;
    if (FirstFile == NULL)
        FirstFile = ThisFile;
    /* ----- get file size ----- */
    stat(FilePath, &sb);
    /* - save context of file currently being preprocessed - */
    holdip = Ip;
    holdcount = Ctx.CurrLineno;
    holdfileno = Ctx.CurrFileno;
    /* --- file/line numbers for #included file --- */
    Ctx.CurrFileno = ++FileCount;
    Ctx.CurrLineno = 0;
    /* -------- open the #included file ------ */
    if ((fp = fopen(FilePath, "rt")) == NULL)
        error(INCLUDEERR);
    /* ---- allocate a buffer and read it in ---- */
    Ip = IncludeIp = getmem(sb.st_size+1);
    fread(Ip, sb.st_size, 1, fp);
    fclose(fp);
    /* ----- preprocess the #included file ------ */
    PreProcess();
    free(Ip);
    IncludeIp = NULL;
    /* restore context of file previously being preprocessed */
    Ctx.CurrFileno = holdfileno;
    Ctx.CurrLineno = holdcount;
    Ip = holdip;
    ThisFile = holdfile;
}
/* ---- delete files from the file list ---- */
void DeleteFileList(void)
{
    ThisFile = FirstFile;
    while (ThisFile != NULL)    {
        SRCFILE *sf = ThisFile;
        free(ThisFile->fname);
        ThisFile = ThisFile->NextFile;
        free(sf);
    }
    FirstFile = LastFile = NULL;
    FileCount = 0;
}
/* -------- #if preprocessing token -------- */
static void If(unsigned char *cp)
{
    IfLevel++;
    if (!Skipping)    {
        if (MacroExpression(&cp) == 0)
            Skipping = IfLevel;
    }
}
/* -------- #ifdef preprocessing token -------- */
static void IfDef(unsigned char *cp)
{
    IfLevel++;
    if (!Skipping)    {
        bypassWhite(&cp);
        ExtractWord(Word, &cp, "_");
        if (FindMacro(Word) == NULL)
            Skipping = IfLevel;
    }
}
/* -------- #ifndef preprocessing token -------- */
static void IfnDef(unsigned char *cp)
{
    IfLevel++;
    if (!Skipping)    {
        bypassWhite(&cp);
        ExtractWord(Word, &cp, "_");
        if (FindMacro(Word) != NULL)
            Skipping = IfLevel;
    }
}
/* -------- #else preprocessing token -------- */
static void Else()
{
    if (!Skipping && IfLevel == 0)
        error(ELSEERR);
    if (Skipping == IfLevel)
        Skipping = 0;
    else if (Skipping == 0)
        Skipping = IfLevel;
}
/* -------- #elif preprocessing token -------- */
static void Elif(unsigned char *cp)
{
    if (IfLevel == 0)
        error(ELIFERR);
    if (Skipping == IfLevel)
        /* resume output if true, else keep skipping at this level */
        Skipping = (MacroExpression(&cp) == 0) ? IfLevel : 0;
}
/* -------- #endif preprocessing token -------- */
static void Endif()
{
    if (!Skipping && IfLevel == 0)
        error(ENDIFERR);
    if (Skipping == IfLevel)
        Skipping = 0;
    --IfLevel;
}
/* ----- write a preprocessed line to output ----- */
static void OutputLine()
{
    unsigned char *cp = Line;
    unsigned char lastcp = 0;
    while (isSpace(*cp))
        cp++;
    if (*cp != '\n')    {
        char eol[32];   /* room for two full-width ints */
        sprintf(eol, "\n/*%d:%d*/", Ctx.CurrFileno, Ctx.CurrLineno);
        WriteWord(eol);
    }
    while (*cp && *cp != '\n')    {
        if (isSpace(*cp))    {
            while (isSpace(*cp))
                cp++;
            if (alphanum(*cp) && alphanum(lastcp))
                WriteChar(' ');
        }
        if (alphanum(*cp))    {
            ResolveMacro(Word, &cp);
            WriteWord(Word);
            lastcp = 'x';
            continue;
        }
        if (*cp == '/' && *(cp+1) == '*')    {
            int inComment = 1;
            cp += 2;
            while (inComment)    {
                while (*cp && *cp != '\n')    {
                    if (*cp == '*' && *(cp+1) == '/')    {
                        cp += 2;
                        inComment = 0;
                        break;
                    }
                    cp++;
                }
                if (inComment)    {
                    lastcp = ' ';
                    if (ReadString() == 0)
                        error(UNTERMCOMMENT);
                    cp = Line;
                }
            }
            continue;
        }
        else if (*cp == '"')    {
            WriteChar(*cp++);
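            while (*cp != '"')    {
                if (*cp == '\n' || *cp == '\0')
                    error(UNTERMSTRERR);
                /* copy an escaped character, such as \", as a pair */
                if (*cp == '\\' && *(cp+1) != '\n' && *(cp+1) != '\0')
                    WriteChar(*cp++);
                WriteChar(*cp++);
            }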
        }
        lastcp = *cp++;
        WriteChar(lastcp);
    }
}
/* ----- write single character to output ---- */
static void WriteChar(unsigned char c)
{
    *Op++ = c;
}
/* ----- write a null-terminated word to output ----- */
static void WriteWord(unsigned char *s)
{
    int lastch = 0;
    while (*s)    {
        if (*s == '"')    {
            /* --- the word has a string literal --- */
            do
                WriteChar(*s++);
            while (*s && *s != '"');
            if (*s)
                WriteChar(*s++);
            continue;
        }
        if (isSpace(*s))    {
            /* --- white space --- */
            while (isSpace(*s))
                s++;
            /* --- insert one if char literal or id id --- */
            if (lastch == '\'' ||
                    (alphanum(lastch) && alphanum(*s)))
                WriteChar(' ');
        }
        lastch = *s;
        WriteChar(*s++);
    }
}
/* ------ read a line from input ---- */
static int ReadString()
{
    unsigned char *lp;
    Ctx.CurrLineno++;
    if (*Ip)    {
        int len;
        /* --- compute the line length --- */
        lp = strchr(Ip, '\n');
        if (lp != NULL)
            len = lp - Ip + 2;
        else
            len = strlen(Ip)+1;
        if (len)    {
            free(Line);
            Line = getmem(len);
            lp = Line;
            while ((*lp++ = *Ip++)  != '\n')
                if (*(lp-1) == '\0')    {
                    --Ip;   /* stay on the terminator for the next call */
                    break;
                }
            if (*(lp-1) == '\n')
                *lp = '\0';
            return 1;
        }
    }
    return 0;
}
/* ----- find file name from file number ---- */
char *SrcFileName(int fileno)
{
    ThisFile = FirstFile;
    while (ThisFile != NULL && --fileno)
        ThisFile = ThisFile->NextFile;
    return ThisFile ? ThisFile->fname : NULL;
}
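
The conditional-compilation functions in Listing One (If, Else, Elif, and Endif) share two counters: IfLevel, the current nesting depth, and Skipping, which is zero while output is being emitted and otherwise holds the depth at which output was turned off. The following is a minimal sketch of that bookkeeping, not Quincy's code. The names CondState, cond_feed, and cond_count_kept are hypothetical, and the sketch assumes that the only conditions are the literals "#if 0" and "#if 1" where Quincy would call MacroExpression.

```c
#include <string.h>

/* Hypothetical model of the IfLevel/Skipping scheme. */
typedef struct {
    int if_level;   /* current #if nesting depth */
    int skipping;   /* 0 = emitting, else the depth that turned output off */
} CondState;

/* Feed one source line; returns 1 if the line should be emitted. */
int cond_feed(CondState *st, const char *line)
{
    if (strncmp(line, "#if ", 4) == 0) {
        st->if_level++;
        /* assumption: the condition is the single digit 0 or 1 */
        if (!st->skipping && line[4] == '0')
            st->skipping = st->if_level;
        return 0;
    }
    if (strncmp(line, "#else", 5) == 0) {
        if (st->skipping == st->if_level)
            st->skipping = 0;               /* this level was off: turn on */
        else if (st->skipping == 0)
            st->skipping = st->if_level;    /* this level was on: turn off */
        return 0;
    }
    if (strncmp(line, "#endif", 6) == 0) {
        if (st->skipping == st->if_level)
            st->skipping = 0;
        if (st->if_level > 0)
            st->if_level--;
        return 0;
    }
    return st->skipping == 0;
}

/* Count the ordinary lines that survive in an array of n lines. */
int cond_count_kept(const char *const *lines, int n)
{
    CondState st = {0, 0};
    int i, kept = 0;
    for (i = 0; i < n; i++)
        kept += cond_feed(&st, lines[i]);
    return kept;
}
```

Because skipping records a depth rather than a simple flag, an #else or #endif that belongs to a nested conditional inside a skipped region compares unequal and leaves the skip in force. Only the directive at the depth that started the skip can end it.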


[LISTING TWO]



/* ------- preproc.h -------- */

#ifndef PREPROC_H
#define PREPROC_H

/* ---- #define macro table ---- */
typedef struct MacroTbl    {
    unsigned char *id;      /* macro identification */
    unsigned char *val;     /* macro value          */
    int isMacro;            /* true if () macro     */
    unsigned char parms;    /* number of parameters */
    struct MacroTbl *NextMacro;
} MACRO;

extern int MacroCount;

#define isSpace(c) \
    ((c) == ' ' || (c) == '\t' || (c) == '\t'+128 || (c) == '\f'+128)
#define MAXMACROLENGTH 2048

int MacroExpression(unsigned char **);
int ResolveMacro(unsigned char *, unsigned char **);
MACRO *FindMacro(unsigned char*);
void ExtractWord(unsigned char *, unsigned char **, unsigned char *);
#endif


Copyright © 1994, Dr. Dobb's Journal

