Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.



m1: A Mini Macro Processor


In Macro Processors and Techniques for Portable Software, P.J. Brown says:

"When attending computer conferences and the like, I have listened to (and probably delivered) my full share of boring lectures, but there is one class of bore who easily outshines all the others: This is the man who talks in full details about the way his system has been implemented."

I'm going to try to outshine even the bores of Brown's nightmares by not only describing the implementation but also giving the complete code. We'll start with the simple version that supports definition and replacement.

Listing One shows the complete Awk program, which contains two pattern-action pairs.

Listing One

awk '
/^@define[ \t]/ {
        name = $2
        $1 = $2 = ""; sub(/^[ \t]+/, "")    # erase "@define name" and leading blanks
        symtab[name] = $0                   # the rest of the line is the replacement text
        next
}
{
        for (i in symtab)
                gsub("@" i "@", symtab[i])
        print
}
' "$@"

The first pattern recognizes @define lines. Its action stores the name, erases the @define and name fields and the white space around them, then stores the remainder of the input line in the symbol table (implemented as an Awk associative array). Execution then proceeds with the next input line. The null second pattern ensures that the action will be executed on all other input lines. The for loop iterates over all entries in the symbol table, and the gsub globally substitutes replacement values for their names. The print statement writes the transformed input line.
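As a quick check of the behavior just described, the sketch below wraps Listing One in a shell function and feeds it two lines. The function name m1_v1 and the sample input are invented for this illustration; they are not part of the original program.

```shell
# Listing One wrapped in a shell function; m1_v1 is a made-up name for this demo.
m1_v1() {
    awk '
    /^@define[ \t]/ {
        name = $2
        $1 = $2 = ""; sub(/^[ \t]+/, "")
        symtab[name] = $0
        next
    }
    {
        for (i in symtab)
            gsub("@" i "@", symtab[i])
        print
    }
    ' "$@"
}

# Sample input (invented): one definition, then one line that uses it.
printf '%s\n' '@define LANG Awk' 'This program is written in @LANG@.' | m1_v1
# prints: This program is written in Awk.
```

The @define line produces no output of its own; only the second line is printed, with the macro call replaced.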

In the next version of the program we will add a simple include facility. The input line @include filename is replaced by the contents of filename. We will restructure the program around a recursive routine to read files and add functions to make it easier to extend.

Listing Two shows the resulting code. If the program is invoked with a single argument, the BEGIN block takes that as the name of the input file; otherwise it processes the standard input. The function dofile processes a file, dodef processes a definition, and dosubs applies the substitutions in the symbol table to its input string. The dodef function uses a complex regular expression in a sub command to remove the first two fields, because assigning to them (as in the first version) causes Awk to rebuild the line with every run of field separators replaced by a single blank.

Listing Two

awk '
function dofile(fname) {
        while ((getline < fname) > 0) {
                if (/^@define[ \t]/)
                        dodef()
                else if (/^@include[ \t]/)
                        dofile(dosubs($2))
                else
                        print dosubs($0)
        }
        close(fname)
}

function dodef(  name) {
        name = $2
        sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+/, "")
        symtab[name] = $0
}

function dosubs(s,  i) {
        for (i in symtab)
                gsub("@" i "@", symtab[i], s)
        return s
}

BEGIN {
        if (ARGC == 2) dofile(ARGV[1])
        else dofile("/dev/stdin")
}
' "$@"
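The remark about field separators is easy to verify. The sketch below contrasts the field-assignment approach of Listing One with the sub call used by dodef, on a definition whose replacement text contains a run of blanks. The sample line and the bracketed output format are invented for this illustration.

```shell
# A definition whose replacement text contains three consecutive blanks.
line='@define GAP a   b'

# Listing One style: assigning to a field makes Awk rebuild $0 with single blanks.
printf '%s\n' "$line" |
    awk '{ $1 = $2 = ""; sub(/^[ \t]+/, ""); print "[" $0 "]" }'
# prints: [a b]

# Listing Two style: sub edits $0 in place, so the original spacing survives.
printf '%s\n' "$line" |
    awk '{ sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+/, ""); print "[" $0 "]" }'
# prints: [a   b]
```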

So far we have assumed that macro definitions expand into unadorned text. But look what happens when the replacement text contains further macro calls, as in:

@define DIR /usr/jlb/macro.paper
@define PROBSECFILE @DIR@/sec2.in

After these definitions, the string @PROBSECFILE@ should be expanded into /usr/jlb/macro.paper/sec2.in. The previous implementation may or may not handle this correctly (details are left as an exercise for Awkophiles). The implementation of dosubs in Listing Three handles nested macros by repeatedly expanding the string until no more expansions are made.

Listing Three

function dosubs(s,  changes, i) {
        do {
                changes = 0
                for (i in symtab)
                       changes += gsub("@" i "@", symtab[i], s)
        } while (changes)
        return s
}
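To see the repeated expansion at work, here is a minimal sketch that places the Listing Three dosubs in a small driver and runs it on the two nested definitions from the text. The driver, built from pattern-action rules rather than dofile, and the function name m1_nested are assumptions made only to keep the example short.

```shell
# Minimal driver around the Listing Three dosubs; m1_nested is a made-up name.
m1_nested() {
    awk '
    function dosubs(s,  changes, i) {
        do {
            changes = 0
            for (i in symtab)
                changes += gsub("@" i "@", symtab[i], s)
        } while (changes)
        return s
    }
    /^@define[ \t]/ {
        name = $2
        sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+/, "")
        symtab[name] = $0
        next
    }
    { print dosubs($0) }
    ' "$@"
}

printf '%s\n' \
    '@define DIR /usr/jlb/macro.paper' \
    '@define PROBSECFILE @DIR@/sec2.in' \
    '@PROBSECFILE@' | m1_nested
# prints: /usr/jlb/macro.paper/sec2.in
```

The result is the same whatever order the for loop visits the symbol table in, because the outer loop keeps going until a full pass makes no change. One consequence worth knowing: a definition that refers to itself, such as @define X @X@, would keep this loop running forever, since every pass reports a change.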

That version is correct but slow; we can speed it up with a guard to check for the common case of no remaining @ characters:

        ...
        changes = 0
        if (s ~ /@.*@/)
                for (i in symtab)
                        ...

Without the guard, the program takes 5.4 seconds to process one large file; with the guard, the time drops to 2.3 seconds. The faster version of dosubs described in "A Substitution Function" takes just 0.8 seconds on the same file.
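For readers who want to see where the guard sits, the sketch below shows one plausible way to fit the fragment into the whole function. The placement inside the do loop is my reading of the elided fragment; the BEGIN block, the sample strings, and the name m1_guard_demo are invented for this illustration.

```shell
# Guarded dosubs, exercised from a BEGIN block; m1_guard_demo is a made-up name.
m1_guard_demo() {
    awk '
    function dosubs(s,  changes, i) {
        do {
            changes = 0
            if (s ~ /@.*@/)        # scan the table only if two @ signs remain
                for (i in symtab)
                    changes += gsub("@" i "@", symtab[i], s)
        } while (changes)
        return s
    }
    BEGIN {
        symtab["DIR"] = "/usr/jlb/macro.paper"
        print dosubs("no macros here")    # guard fails: no table scan at all
        print dosubs("user@host")         # a single @ also fails the guard
        print dosubs("cd @DIR@")          # guard passes, one expansion
    }'
}

m1_guard_demo
# prints:
# no macros here
# user@host
# cd /usr/jlb/macro.paper
```

The output is identical to the unguarded version; the guard only saves the loop over the symbol table for strings that cannot contain a macro call.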

A Substitution Function

Several versions of the dosubs function perform macro substitution. The final version of the program (Listing Four) uses an even faster version of the function.

The idea is to process the string from left to right, searching for the first substitution to be made. We then make the substitution and rescan the string starting at the fresh text. We implement this idea by keeping two strings: the text processed so far is in L (for left), and unprocessed text is in R (for right). Here is the pseudocode for dosubs (the final version will be shown in Listing Four).

L = Empty
R = Input String
while R contains an "@" sign do
        let R = A @ B; set L = L A and R = B
        if R contains no "@" then
                L = L "@"
                break
        let R = A @ B; set M = A and R = B
        if M is in SymTab then
                R = SymTab[M] R
        else
                L = L "@" M
                R = "@" R
return L R
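As a sanity check on the pseudocode, here is one way it might be transcribed into Awk using index and substr. This is a sketch written for this illustration, not the Listing Four code, which may differ. The lowercase variable names l, r, and m, the BEGIN driver, the sample strings, and the name m1_scan_demo are all assumptions.

```shell
# One possible Awk transcription of the pseudocode; m1_scan_demo is a made-up name.
m1_scan_demo() {
    awk '
    function dosubs(s,  l, r, i, m) {
        l = ""; r = s
        while ((i = index(r, "@")) > 0) {
            l = l substr(r, 1, i - 1)    # L = L A
            r = substr(r, i + 1)         # R = B
            i = index(r, "@")
            if (i == 0) {                # lone @: keep it literally
                l = l "@"
                break
            }
            m = substr(r, 1, i - 1)      # M = A
            r = substr(r, i + 1)         # R = B
            if (m in symtab)
                r = symtab[m] r          # rescan starting at the fresh text
            else {
                l = l "@" m
                r = "@" r                # the closing @ may open the next call
            }
        }
        return l r
    }
    BEGIN {
        symtab["DIR"] = "/usr/jlb/macro.paper"
        symtab["PROBSECFILE"] = "@DIR@/sec2.in"
        print dosubs("@PROBSECFILE@")
        print dosubs("a@b.com and @DIR@")
    }'
}

m1_scan_demo
# prints:
# /usr/jlb/macro.paper/sec2.in
# a@b.com and /usr/jlb/macro.paper
```

In the second string, the text between the first two @ signs is not a macro name, so it is copied to the left side and the second @ is pushed back onto the right side, where it opens the call to DIR.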

Sometimes you want to make a file you can conditionally change. Consider the arduous task of writing a Ph.D. thesis, which can strain even the best professor-student relationship. A friend of mine organized his thesis so that by setting a given flag, he could remove all references to his thesis advisor. The version he showed his advisor (whom we'll call "Professor Newton" to protect the innocent) was compiled from a file like this:

@define WANTNEWT 1
   ...
@if WANTNEWT
This area was profoundly influenced by the groundbreaking work of Professor Newton.
@fi

For his private amusement, the poor student could recompile the document after setting WANTNEWT to zero. The semantics of the @if statement are that the text up to the next @fi statement is included if the variable is defined and not equal to zero. To implement the statement, we need this function to discard text:

function gobble(fname) {
        while ((getline < fname) > 0)
                if (/^@fi/)
                        break
}

We then add these lines to the chain of if statements in dofile:

        } else if (/^@if[ \t]/) {
                if (!($2 in symtab) || symtab[$2] == 0)
                        gobble(fname)
        } ...
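Putting the pieces together, the sketch below combines the non-nested dosubs and dodef from Listing Two with gobble and the new @if branch, then runs the thesis example with the flag set to zero. Two things here are assumptions rather than part of the excerpt: the branch that silently drops an @fi line when the enclosed text is kept, and the function name m1_if. The sample file is shortened from the example above.

```shell
# Sketch of the @if facility; m1_if is a made-up name. Takes one file name.
m1_if() {
    awk '
    function dosubs(s,  i) {
        for (i in symtab)
            gsub("@" i "@", symtab[i], s)
        return s
    }
    function dodef(  name) {
        name = $2
        sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+/, "")
        symtab[name] = $0
    }
    function gobble(fname) {
        while ((getline < fname) > 0)
            if (/^@fi/)
                break
    }
    function dofile(fname) {
        while ((getline < fname) > 0) {
            if (/^@define[ \t]/)
                dodef()
            else if (/^@if[ \t]/) {
                if (!($2 in symtab) || symtab[$2] == 0)
                    gobble(fname)
            } else if (!/^@fi/)    # assumption: a bare @fi is simply dropped
                print dosubs($0)
        }
        close(fname)
    }
    BEGIN { dofile(ARGV[1]) }
    ' "$1"
}

tmp=$(mktemp)
printf '%s\n' \
    '@define WANTNEWT 0' \
    'Chapter 1.' \
    '@if WANTNEWT' \
    'This area was profoundly influenced by Professor Newton.' \
    '@fi' \
    'Chapter 2.' > "$tmp"
m1_if "$tmp"
rm -f "$tmp"
# prints:
# Chapter 1.
# Chapter 2.
```

Changing the first line to @define WANTNEWT 1 restores the sentence about Professor Newton between the two chapter lines.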

