Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.



m1: A Mini Macro Processor


The complete m1 program makes a couple of additions to this simple conditional. Text may contain nested @if statements, so gobble is modified to keep a count of the current if/fi nesting depth. The @unless statement is the complement of @if: it includes the subsequent text (up to a @fi, the same delimiter that ends an @if) if the variable is undefined or defined to be zero.
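The nesting counter in gobble is easy to try in isolation. What follows is a minimal sketch, not the code from Listing Four: it assumes a cut-down input language in which @define takes a one-word value and only @define, @if, @unless, and @fi are recognized, and it reads with plain getline rather than m1's readline. The name cond_filter is hypothetical.

```shell
#!/bin/sh
# cond_filter (hypothetical name): a cut-down sketch of m1 conditionals.
# Assumes one-word @define values; all other lines are copied to output.
cond_filter() {
    awk '
    # Skip input up to the @fi that closes the @if/@unless just read,
    # counting nested @if/@unless ... @fi pairs along the way.
    function gobble(    depth, line) {
        depth = 1
        while ((getline line) > 0) {
            if (line ~ /^@(if|unless)[ \t]/)
                depth++
            if (line ~ /^@fi/ && --depth <= 0)
                break
        }
    }
    /^@define[ \t]/ { symtab[$2] = $3; next }
    /^@if[ \t]/     { if (!($2 in symtab) || symtab[$2] == 0) gobble(); next }
    /^@unless[ \t]/ { if (($2 in symtab) && symtab[$2] != 0) gobble(); next }
    /^@fi/          { next }
    { print }
    '
}

# The inner @if VERBOSE is skipped together with the outer @if DEBUG.
printf '%s\n' '@define DEBUG 0' '@define VERBOSE 1' 'a' '@if DEBUG' 'b' \
    '@if VERBOSE' 'c' '@fi' 'd' '@fi' 'e' '@unless DEBUG' 'f' '@fi' 'g' |
    cond_filter
# prints a, e, f and g on separate lines
```

Because gobble counts every @if and @unless it passes, a @fi belonging to an inner block cannot end the outer skip early.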

The final version of m1 also supports multiline @defines. If a @define line ends with a backslash (\), the text is continued on the next line (discarding white space before the first text character). To implement long defines, we make a minor change to dodef so that it continues reading text as long as lines end with a backslash. We must also make a major change to the I/O structure of the entire program, because macro expansion can generate lines that need to be read by the dofile function. The new readline function reads a line from the text buffer if it is not empty; otherwise, it reads from the current file. A string s can be pushed back onto the input stream by concatenating it onto the front (left) of buffer with the idiom buffer = s buffer; s must end with a newline, because readline splits buffer at newline characters.
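The continuation rule can be exercised on its own. This sketch follows the approach of dodef in Listing Four, with three assumptions of its own: it reads with plain getline instead of readline, it prints each finished definition instead of storing it, and it drops the trailing backslash with substr rather than the listing's sub call, which sidesteps sub's special treatment of & and \ in replacement text. The name join_defines is hypothetical.

```shell
#!/bin/sh
# join_defines (hypothetical name): collects @define values, joining
# lines that end in a backslash, and prints each finished definition.
join_defines() {
    awk '
    /^@define[ \t]/ {
        name = $2
        sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+/, "")   # drop "@define name "
        str = $0
        while (str ~ /\\$/) {                # value still ends in a backslash
            if ((getline) <= 0) {
                print "EOF inside definition" | "cat 1>&2"
                exit 1
            }
            sub(/^[ \t]+/, "")               # discard leading white space
            # Replace the trailing backslash with a newline plus the new line.
            str = substr(str, 1, length(str) - 1) "\n" $0
        }
        printf "%s=[%s]\n", name, str
    }
    '
}

printf '%s\n' '@define greeting Hello,\' '    world' '@define short one' |
    join_defines
# prints:
# greeting=[Hello,
# world]
# short=[one]
```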

The complete program is adorned with several other bells and whistles. Here are the most interesting and important:

  • Comments. It is immoral to design a language without comments. Lines that begin with @comment are therefore ignored.
  • Error checking. The final Awk program has a number of if statements that check for weird conditions, which are reported by the error function.
  • Defaults. The @default statement is a @define that takes effect only if the variable was not previously defined; we'll see its use shortly. We could get the same effect with an @unless around a @define, but the @default is used frequently enough to merit its own command.
  • Performance. Without special handling, a line of text unadorned with @ characters would go through several tests and function calls in dofile. The final version adds an if statement at the top of the loop that prints such a line immediately.
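The difference between @define and @default comes down to one membership test on the symbol table, and the performance shortcut is a single index call. A minimal sketch, assuming one-word values and a hypothetical @show command for inspecting the table (m1 has no such command); define_demo is a hypothetical name.

```shell
#!/bin/sh
# define_demo (hypothetical name): @define always assigns, @default
# assigns only when the name is not already in the symbol table.
# @show is a hypothetical inspection command, not part of m1.
define_demo() {
    awk '
    index($0, "@") == 0 { print; next }      # fast path for plain text
    /^@define[ \t]/     { symtab[$2] = $3; next }
    /^@default[ \t]/    { if (!($2 in symtab)) symtab[$2] = $3; next }
    /^@show[ \t]/       { print $2 "=" symtab[$2]; next }
    # any other line containing @ is silently dropped in this sketch
    '
}

printf '%s\n' '@define WIDTH 80' '@default WIDTH 72' '@default HEIGHT 24' \
    '@show WIDTH' '@show HEIGHT' | define_demo
# prints WIDTH=80 and HEIGHT=24
```

An earlier @define wins over a later @default, which lets a file supply fallback values that a user can override by defining the name first.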

Figure 1 summarizes the m1 language.


@comment any text       Ignored
@define name value      Set name to value
@default name value     Set if name undefined
@include filename       Read and process filename
@if varname             Include subsequent text if varname is defined and != 0
@unless varname         Include subsequent text if varname is undefined or 0
@fi                     Terminate @if or @unless
@name@                  Replaced by the value of name (anywhere in a line)
Figure 1: The m1 Language
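To see how @name@ references behave, including @ characters that are not part of a macro call, here is a simplified sketch. It is not dosubs from Listing Four: it finds candidates with match() and an identifier-shaped regular expression, and it does not rescan an expansion for further macros. The function name subst_names and the two predefined names are assumptions for illustration.

```shell
#!/bin/sh
# subst_names (hypothetical name): replaces @name@ with the value of name
# when name is defined; other @ text is copied unchanged. Simplified:
# names must look like identifiers and expansions are not rescanned.
subst_names() {
    awk '
    BEGIN { symtab["NAME"] = "m1"; symtab["LANG"] = "Awk" }   # assumed table
    {
        out = ""; s = $0
        while (match(s, /@[A-Za-z_][A-Za-z0-9_]*@/)) {
            m = substr(s, RSTART + 1, RLENGTH - 2)            # text between the @s
            if (m in symtab) {
                out = out substr(s, 1, RSTART - 1) symtab[m]
                s = substr(s, RSTART + RLENGTH)
            } else {
                # Not a macro: copy "@m", but keep the closing @ because
                # it may open the next reference.
                out = out substr(s, 1, RSTART + RLENGTH - 2)
                s = substr(s, RSTART + RLENGTH - 1)
            }
        }
        print out s
    }
    '
}

printf '%s\n' '@NAME@ is written in @LANG@' 'mail user@example.com' 'a@b@NAME@' |
    subst_names
# prints:
# m1 is written in Awk
# mail user@example.com
# a@bm1
```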

The m1 program could be extended in many ways. Here are some of the biggest temptations to "feeping creaturism":

  • A long definition with a trail of backslashes might be more graciously expressed by a @longdefine statement terminated by a @longend.
  • An @undefine statement would remove a definition from the symbol table.
  • I've been tempted to add parameters to macros, but so far I have gotten around the problem by using an idiom described in the next section.
  • It would be easy to add stack-based arithmetic and strings to the language through @push and @pop commands that read and write variables.
  • As soon as you try to write interesting macros, you need to have mechanisms for quoting strings (to postpone evaluation) and forcing immediate evaluation.
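As one illustration of how small such an extension can be, here is a sketch of an @undefine statement (which m1 does not have) built on Awk's delete. The cut-down driver, its one-word @define values, and the @show inspection command are assumptions; in m1 itself, the new case could be one more else if branch in dofile. The name undef_demo is hypothetical.

```shell
#!/bin/sh
# undef_demo (hypothetical name): shows a possible @undefine statement.
undef_demo() {
    awk '
    /^@define[ \t]/   { symtab[$2] = $3; next }
    /^@undefine[ \t]/ { delete symtab[$2]; next }   # remove the entry, if any
    /^@show[ \t]/     {                             # hypothetical inspection command
        if ($2 in symtab) print $2 " = " symtab[$2]
        else              print $2 " is undefined"
        next
    }
    { print }
    '
}

printf '%s\n' '@define X 1' '@show X' '@undefine X' '@show X' | undef_demo
# prints "X = 1", then "X is undefined"
```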

Listing Four contains the complete implementation of m1 in about 100 lines of Awk, which is significantly shorter than other macro processors.

Listing Four

awk '
# Report a fatal error on standard error and stop
function error(s) {
    print "m1 error: " s | "cat 1>&2"; exit 1
}

# Process one file, expanding macros and handling @ commands
function dofile(fname,    savefile, savebuffer, newstring) {
    if (fname in activefiles)
        error("recursively reading file: " fname)
    activefiles[fname] = 1
    savefile = file; file = fname
    savebuffer = buffer; buffer = ""
    while (readline() != EOF) {
        if (index($0, "@") == 0) {          # Fast path: plain text
            print $0
        } else if (/^@define[ \t]/) {
            dodef()
        } else if (/^@default[ \t]/) {
            if (!($2 in symtab))
                dodef()
        } else if (/^@include[ \t]/) {
            if (NF != 2) error("bad include line")
            dofile(dosubs($2))
        } else if (/^@if[ \t]/) {
            if (NF != 2) error("bad if line")
            if (!($2 in symtab) || symtab[$2] == 0)
                gobble()
        } else if (/^@unless[ \t]/) {
            if (NF != 2) error("bad unless line")
            if (($2 in symtab) && symtab[$2] != 0)
                gobble()
        } else if (/^@fi[ \t]?/) {          # Could do error checking
        } else if (/^@comment[ \t]?/) {
        } else {
            newstring = dosubs($0)
            if ($0 == newstring || index(newstring, "@") == 0)
                print newstring
            else                            # Push back for rescanning
                buffer = newstring "\n" buffer
        }
    }
    close(fname)
    delete activefiles[fname]
    file = savefile
    buffer = savebuffer
}

# Put the next line in $0: from the pushback buffer if it is
# non-empty, else from the current file; return EOF at end of file
function readline(    i, status) {
    status = ""
    if (buffer != "") {
        i = index(buffer, "\n")
        $0 = substr(buffer, 1, i-1)
        buffer = substr(buffer, i+1)
    } else {
        if ((getline < file) <= 0)
            status = EOF
    }
    return status
}

# Skip text up to the @fi matching the current @if or @unless
function gobble(    ifdepth) {
    ifdepth = 1
    while (readline() != EOF) {
        if (/^@(if|unless)[ \t]/)
            ifdepth++
        if (/^@fi[ \t]?/ && --ifdepth <= 0)
            break
    }
}

# Replace each @name@ whose name is in symtab with its value;
# any other @ text is copied unchanged
function dosubs(s,    l, r, i, m) {
    if (index(s, "@") == 0)
        return s
    l = ""      # Left of current pos; ready for output
    r = s       # Right of current; unexamined at this time
    while ((i = index(r, "@")) != 0) {
        l = l substr(r, 1, i-1)
        r = substr(r, i+1)      # Currently scanning @
        i = index(r, "@")
        if (i == 0) {
            l = l "@"
            break
        }
        m = substr(r, 1, i-1)
        r = substr(r, i+1)
        if (m in symtab) {
            r = symtab[m] r
        } else {
            l = l "@" m
            r = "@" r
        }
    }
    return l r
}

# Store a @define (with backslash continuation lines) in symtab
function dodef(    name, str) {
    name = $2
    sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+/, "")   # Strip "@define name "
    str = $0
    while (str ~ /\\$/) {
        if (readline() == EOF)
            error("EOF inside definition")
        sub(/^[ \t]+/, "")
        sub(/\\$/, "\n" $0, str)
    }
    symtab[name] = str
}

BEGIN {
    EOF = "EOF"
    if (ARGC == 1) dofile("/dev/stdin")
    else if (ARGC == 2) dofile(ARGV[1])
    else error("usage: m1 fname")
}
' "$@"

The program uses several techniques that can be applied in many Awk programs:

  • Symbol tables are easy to implement with Awk's associative arrays.
  • The program makes extensive use of Awk's string-handling facilities: regular expressions, string concatenation, sub, index, and substr.
  • Awk's file handling makes the dofile procedure straightforward.
  • The readline function and pushback mechanism associated with buffer are of general utility.
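The readline and pushback pair can be lifted out of m1 almost unchanged. The sketch below is a variant, not the listing's code: readline returns 1 or 0 instead of "" or EOF and falls back to Awk's main input rather than a named file, and the @twice command is a hypothetical example of a rule that generates new input. Because pushed-back text is read before anything else, a generated line is itself rescanned, which is how m1 handles macro values that contain further macro calls. The name pushback_demo is hypothetical.

```shell
#!/bin/sh
# pushback_demo (hypothetical name): "@twice TEXT" pushes TEXT back onto
# the input twice; every other line is printed.
pushback_demo() {
    awk '
    # Put the next line in $0, taking it from the pushback buffer when
    # that is non-empty and from the main input otherwise.
    # Returns 1 if a line was read, 0 at end of input.
    function readline(    i) {
        if (buffer != "") {
            i = index(buffer, "\n")
            $0 = substr(buffer, 1, i - 1)
            buffer = substr(buffer, i + 1)
            return 1
        }
        return (getline) > 0
    }
    BEGIN {
        while (readline()) {
            if (sub(/^@twice[ \t]+/, ""))
                buffer = $0 "\n" $0 "\n" buffer   # each pushed copy ends in a newline
            else
                print
        }
    }
    '
}

printf '%s\n' 'a' '@twice b' '@twice @twice c' | pushback_demo
# prints a, b, b, then c four times
```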

