Programming Paradigms

Michael presents some general principles for classifying parallel processing algorithms.

December 01, 1988
URL:http://www.drdobbs.com/database/programming-paradigms/184408047

Figure 2

DEC88: STRUCTURED PROGRAMMING

STRUCTURED PROGRAMMING

Cross-Referencing Pascal and Modula-2 programs

Kent Porter

The SUB.PAS utility that I did for the August issue of DDJ drew favorable comments from the esteemed readership. Compliments are always good for my ego (keep 'em coming!), but, more important, they let me know what you want. Because you obviously like utilities, here we go with another one, this one specifically targeted toward programmers.

It's a symbolic cross-reference utility for Pascal and Modula-2 programs. XREF prints an alphabetized listing showing the line numbers where all identifiers occur, an "identifier" being either a symbolic name or a language keyword. Because of the syntactic similarities between Pascal and Modula-2, the same program fits both languages, with only minor nods in one direction or the other. You can use the cross-reference listing to find all occurrences of any identifier, and also to spot variables declared but never referenced. That's its utilitarian side.

From a more theoretical viewpoint, XREF implements some unusual dynamic data structures that have applications in compilers and other text-and token-processing programs. There's probably a definitive name for these data structures buried in Knuth or some such. I call them "Binary B+ Trees" for reasons that will become apparent later.

Because of space limitations the version of XREF published in Listing One, page 126, is written in Turbo Pascal. There's also a version written for JPI's TopSpeed Modula-2, called XREF.MOD. It doesn't appear here, but you can get it by downloading from CompuServe or ordering a diskette from DDJ There are only two significant difference between the Pascal and Modula programs other than the source languages. One is that the Pascal version is not case sensitive by default and the Modula-2 version is. This is consistent with the languages themselves. The other is that the Pascal version assumes a filename extension of .PAS if none is explicitly furnished, but the Modula program makes no assumptions. Again, this is consistent; Pascal source files typically have the .PAS extension, whereas Modula source files might have .DEF or .MOD extensions.

The full command line for the Pascal version is thus

     XREF <Filepath.[Ext]> [/C|/N]

and for the Modula-2 program it's

     XREF <Filepath.Ext> [/C|/N]

The optional switches /C or /N specify case sensitivity and non-case sensitivity, respectively. By coercing the utility into non-case sensitivity, you group identifiers such as myVar and MyVar (equivalent in Pascal but not in Modula), receiving a cross-reference listing for MYVAR on the final output. The /N switch thus results in a slightly shorter cross-reference listing for long programs, especially those written in Pascal.

Listing One shows a corollary utility, NUMBER.PAS. The format of the listing shows exactly what NUMBER.PAS does: inserts a line number before each source line. This program furnishes a listing that makes it easy to use the cross-reference shown in Listing Two, page 126, the output for NUMBER.PAS produced by XREF using the command line

     XREF NUMBER.PAS /C

For example, you can see that the identifier Len (initial cap) occurs in line 23 of NUMBER.PAS, and len (all lowercase) appears in lines 11, 24, and 27. A similar situation exists for identifiers Nbr and nbr. Had I specified /N on the command line --the default in the Pascal version --the cross-reference listing would have grouped occurrences of Len and len, and also of Nbr and nbr. The report heading would also have been different.

Note that the utility includes some language keywords but not others. For example, it shows occurrences of WORD and NOT, but excludes FOR, WHILE, BEGIN, and END. The rationale is to apply a selective common-sense filter that gets rid of nuisance keywords while showing those you might be interested in finding. Note, as an example, that lines 268-282 in Listing Three, page 127 --the XREF.PAS source program--eliminate the keyword TO used in FOR loops, but not DOWNTO. The reason is that decrementings are unusual, and you might want to find all loops that count down by cross-referencing the DOWNTO keyword. If you disagree with my list of nuisance keywords, you can easily modify the XREF source accordingly.

Now let's look at the internals of XREF, which are the parts that make it not only useful but interesting from a programmer's viewpoint.

How It Works

The main body for XREF begins at line 400 in Listing Three. The flow is broken down into several steps: initializing global variables, processing the command line, opening the file, announcing the program, and processing the file. There's nothing surprising in any of these macro steps.

The wrinkle occurs at line 454, where XREF decides what to do according to whether or not the cross-reference is case sensitive. Let's consider the implications. If case is not an issue, as in Pascal, all identifiers are reduced to uppercase and are thus implicitly in alphabetic order. With case sensitivity, however, the ordering of identifiers is complicated by the ASCII collating order. Identifiers that start with an uppercase letter always precede those that begin with a lowercase letter. Thus, Zulu comes before alpha. Makes sense to the computer, but the human mind rebels.

For that reason the program branches at line 454. If case sensitive, it re-alphabetizes the symbol table using its symbols shifted to uppercase. This resorts the table according to absolute alphabetic order, not ASCII collating sequence. XREF then reports the results in an order such as able, Across whisky, WORD, zebra, Zulu. If not case sensitive, XREF simply lists the identifiers in the existing order (for example, ABLE, ACROSS WHISKY, WORD, ZEBRA, ZULU).

Let's delve further inside and see what's going on.

Of Parsing and Tokens and Such

Writing a symbol cross-reference utility is somewhat like writing a compiler. You have to pull the lexical elements keywords and identifiers -out of source text lines, and then organize them in some meaningful way that makes it easy to refer back to them later. A compiler has to pay more attention to detail than XREF does, of course, but there's still much in common between the two. And the same principles apply to writing HyperText data bases and other text-manipulating pre grams.

Plucking text elements from source lines is called parsing, and the resulting elements are often referred to as toe kens. For example, let's say we have the source line

     IF This < That THEN

For our purposes, this line contains four text tokens that the parser must produce: IF, This, That, and THEN. The fifth token is the less-than operator; a compiler would pay attention to it as well, but for a symbolic cross-reference it's ignored.

A parser operates according to rules that define what is and is not a data element of a specific category. In effect, it feels its way through a line of text, deciding what to do next on a character-by character basis. For example, if the parser encounters a space at the start of a line, it feels its way through successive characters until it finds a nonspace. At that point, it switches modes, accumulating characters of a specific class until it finds a character outside that class: say another space. When that happens, it returns the toe ken it has assembled. The parser's caller then has to decide what to do with the token.

Figure 1: A binary tree inherently organizes data in left-to-right order by key value.

Figure 2: A B Tree consists of variable-lenght lists of pointers indicating wither other list modes not end modes

In Pascal and Modula-2, characters of a specific class can be represented by sets. For example, the characters that constitute symbols are made up of the digits, letters, period, underscore, and caret (pointer indicator). Line 34 in Listing Three specifies them as the SymChars set. The Token function tests characters for inclusion in this set at lines 132, 169, and 178. If the character being "felt" is not in the SymChars set, then it must not be part of a valid token. A failure of the test triggers some appropriate action, such as sending the token accumulated to date back to the caller. That's what happens in line 178; it sends the token back to the caller at line 267, which assigns it to the variable Symbol and initiates other processing.

Source programs almost always contain comments and quoted literals. Because these are not symbols, it's necessary to skip over them, lest the cross-reference listing become cluttered with irrelevant information. The token parser does this by watching for the bookend symbols that open and close comments and quoted literals. For example, a Pascal comment can begin with a left curly brace. When the parser finds one, it scans the intervening text until it encounters a right curly brace, indicating the end of the comment. It behaves similarly for a literal enclosed by single or double quotes.

The (* and *) comment brackets used in both Pascal and Modula-2 are a little more complicated, but not much. On finding a left parenthesis, the parser checks to see if the next character is an asterisk; if so, it begins searching for another asterisk followed by a right parenthesis. The scanning operation is performed by the FindEndOfComment procedure, which receives a parameter specifying the end-of-comment character to look for.

Because comments can be nested, the Token procedure tracks the nesting level using the CommentLevel variable. Ordinarily this variable has a value of o to indicate that the parser is outside any comments or quoted literals. The value goes up and down as the nesting level changes; as long as it's nonzero, all text elements are ignored.

Valid symbols and the line numbers where they occur go into an elaborate dynamic data structure.

TheBinaryB+ Tree

Binary trees are two-way branching data structures that lend themselves to fast searches and the inherent ordering of data elements by the key value ( Figure 1, page 89). At any given node, all the nodes conceptually to the left have keys of a lower value, and all nodes to the right have higher-valued keys. B Trees (Figure 2, page 89) are different. Here, each node consists of a variable-length list of pointers, each pointing to a lower-level object that is either an end node (a "leaf") or another list. A familiar example of a B Tree is the DOS directory structure. A B+ Tree is a minor variant, the difference being that the list elements also point to one another, forming a linked list. A Binary B+ Tree combines both.

Here's how it works in XREF. For each new symbol found by the parser, an object of type SymTreeNode (lines 23 - 29 in Listing Three) is inserted into the binary tree indicated by the global Head pointer. The BNode function starting at line 186 finds the insertion point by comparing the new symbol with those' already in the tree, and branching left or right accordingly. The comparison is based on ASCII collating sequence, which separates lowercase identifiers from equivalent uppercase identifiers if the cross-reference is case sensitive. If the new symbol is not already in the tree, the point of insertion occurs when the search reaches a NIL pointer. The NewNode procedure adds the binary tree node and returns a pointer so that its caller can update the parent node's pointer. Thus the to-level binary tree (symbol table) grows as the parser discovers new tokens.

But that's not all. Each symbol tree node contains a pointer called XList, which leads the way to a subsidiary linked list containing the line numbers where the symbol occurs in the source listing. When a new symbol node is being created, the Append procedure starting at line 213 begins the line number list by allocating a node of type XLineNode (line 17), which contains the first line number where the symbol is found. Append notes the address in the owning symbol tree node.

Now let's say that the parser finds the second occurrence of a particular symbol. In this case, BNode finds a match as it searches the tree and returns a pointer to the node for that symbol. Append gets control again, but this time --finding that the line-number list already exists --it searches the singly linked list until it finds the tail, indicated by a NIL pointer in an object of type XLineNode. It allocates a new node of this type, stores the line number, and updates the old tail's Next pointer.

The Binary B+ Tree thus assumes the form shown in Figure 3, page 90, where each symbol node potentially points to its left and right neighbors and also to its line-number list. This structure makes for fast searches and updates, assuming that the tree is fairly well balanced. In most cases it will be, because identifiers occur in alphabetically random order in source text. The structure also offers the advantage of automatically sorting symbols into ASCII order.

Figure 3: A Binary B+ Tree organizes data according to the key value like a binary tree, but also employs linked lists in the manner of B* Trees.

Traversals of a Binary B+ Tree are the same as the recursive techniques employed in connection with binary trees. For example, to do something with nodes in lowest to highest sequence by key value, you can use inorder traversal as follows:

PROCEDURE InOrder (Node:NodePointer);

BEGIN
      IF Node <> NIL THEN BEGIN
           InOrder (Node^.Left);
           (* Do something with this node*)
           InOrder (Node^.Right);
      END;
END;

The first recursive call forces the program to strive left, visiting successive nodes until it reaches a NIL pointer. The stack keeps track of the order in which nodes are visited. Upon hitting a NIL pointer, control backs up to the most recent node and does to it whatever is required. The second recursive call then moves to the right from this node, where the leftward striving begins again. Thus all the nodes are visited in lowest to highest order by key value. Traversal ends when every NIL pointer has been found.

The Report, Alphabetize, and Count procedures in Listing Three are all based on this in-order traversal model. The body of Alphabetize (lines 379 - 386) adds another wrinkle by disposing of the symbol tree node just before leaving it for the last time. Doing something to a node after working on all its left and right children is called postorder traversal.

And what purpose does Alphabetize serve? You'll recall that the symbol tree is ordered by ASCII sequence, which means that Z comes before a. In noncase sensitive mode all symbols are shifted to uppercase, so it doesn't matter. If case sensitivity is required, though, we need to resort the table so that able comes before Acme as the mind expects, and not somewhere toward the end of the printout. That's what Alphabetize does. It structures a new symbol tree based on the alphabetic order of the uppercase equivalents of each symbol. This new tree replaces the old one, which is why the body of the procedure disposes of the old nodes after copying their contents. The work is now bone except for printing the results.

Getting the Cross-Reference Listing

The Report procedure performs an inorder traversal of a Binary B+ Tree: either the original indicated by the global bead pointer, or the reordered one hinted to by Alpha. The IF logic starting at line 454 decides how to handle be situation depending on the Case_Sensitive switch.

Report is chiefly a printer control routine that recursively visits the tree nodes in order, thus producing an alphabetic listing. It prints the symbol and, in parentheses, the number of occurrences. It then follows the singly linked list associated with the node, printing each line number. The Col variable keeps track of the horizontal position and triggers a new line when the current line is full. The global LineNumber variable, no longer needed to' track the source line, is pressed back into service to control pagination of the printed output. The Count procedure runs through the symbol tree one last time, yielding a symbol count that appears at the end of the report.

One final point: If the comment level is not zero at the conclusion of the parsing and symbol table creation, you have either a runaway comment or an unterminated string somewhere in the source. The program reports this problem at line 450; it's equivalent to an "Unexpected end of file" error message from a compiler.

So there it is, a handy utility for Pascal and Modula-2 programmers, as well as an intriguing dynamic data structure that you can apply to all sorts of problems that require the ordering of complex information.

_STRUCTURED PROGRAMMING COLUMN_ by Kent Porter

[LISTING ONE]



   1| Program number;
   2|
   3|   { Puts line numbers at start of each line, stores in file by same
   4|     name except extension is .NUM }
   5|
   6| USES crt;
   7|
   8| VAR   Filename, Newname : STRING [80];
   9|       I, O              : TEXT;
  10|       Line              : STRING [135];
  11|       Nbr, len, p       : WORD;
  12|       Num               : STRING [4];
  13|
  14| BEGIN
  15|   Nbr := 0;
  16|   Newname := '';
  17|   IF ParamCount < 1 THEN BEGIN
  18|     Writeln ('USAGE: NUMBER <Filename.ext>');
  19|     EXIT;
  20|   END;
  21|   Filename := ParamStr (1);
  22|
  23|   Len := pos ('.', Filename);
  24|   IF len = 0 THEN
  25|     Newname := Filename + '.NUM'
  26|   ELSE BEGIN
  27|     FOR p := 1 TO len DO
  28|       Newname := Newname + Filename [p];
  29|     Newname := Newname + 'NUM';
  30|   END;
  31|   Assign (I, Filename);
  32|   {$I-}
  33|   Reset (I);
  34|   {$I+}
  35|   IF IOResult <> 0 THEN BEGIN
  36|     Writeln ('Unable to open ', Filename);
  37|     EXIT;
  38|   END;
  39|   Assign (O, Newname);
  40|   Rewrite (O);
  41|   Writeln;
  42|
  43|   WHILE NOT eof (I) DO BEGIN
  44|     Readln (I, Line);
  45|     INC (Nbr);
  46|     GotoXY (1, WhereY-1); Writeln (Nbr);
  47|     Str (Nbr:4, Num);
  48|     Line := Num + '| ' + Line;
  49|     Writeln (O, Line);
  50|   END;
  51|   Close (O);
  52|   Close (I);
  53|   GotoXY (1, WhereY-1); Writeln (nbr, ' lines in file');
  54|   Writeln ('Output is in ', Newname);
  55| END.
  56|
  57|

[LISTING TWO]



Case-sensitive symbolic cross-reference for number.pas

0 (3), 15, 24, 35
1 (7), 17, 21, 27, 46, 46, 53, 53
135 (1), 10
4 (2), 12, 47
80 (1), 8
Assign (2), 31, 39
Close (2), 51, 52
crt (1), 6
eof (1), 43
EXIT (2), 19, 37
Filename (7), 8, 21, 23, 25, 28, 31, 36
GotoXY (2), 46, 53
I (6), 9, 31, 33, 43, 44, 52
IOResult (1), 35
Len (1), 23
len (3), 11, 24, 27
Line (5), 10, 44, 48, 48, 49
Nbr (5), 11, 15, 45, 46, 47
nbr (1), 53
Newname (9), 8, 16, 25, 28, 28, 29, 29, 39, 54
NOT (1), 43
Num (3), 12, 47, 48
number (1), 1
O (5), 9, 39, 40, 49, 51
p (3), 11, 27, 28
ParamCount (1), 17
ParamStr (1), 21
pos (1), 23
Program (1), 1
Readln (1), 44
Reset (1), 33
Rewrite (1), 40
Str (1), 47
STRING (3), 8, 10, 12
TEXT (1), 9
USES (1), 6
WhereY (2), 46, 53
WORD (1), 11
Writeln (7), 18, 36, 41, 46, 49, 53, 54

-- 40 symbols reported

[LISTING THREE]



   1| PROGRAM Xref;
   2|
   3| { Builds and lists a Pascal/Modula-2 symbol cross-reference report   }
   4| { Uses binary trees and doubly-linked lists to effect B-Tree         }
   5| { Command line is XREF <filename.ext> [/C|/N]                        }
   6| { /C makes xref case-sensitive                                       }
   7| { /N makes it non-case sensitive (default)                           }
   8| { Turbo Pascal 5.0 (4.0 will work, too)                              }
   9| { K. Porter, DDJ, December '88 Structured Programming Column         }
  10|
  11| USES crt, printer;
  12|
  13| TYPE  SymString = STRING [39];
  14|       CharSet = SET OF CHAR;
  15|       LineString = STRING [135];
  16|       XLinePtr = ^XLineNode;      { Pointer to xref line number node }
  17|       XLineNode = RECORD          { Xref line number structure (SLL) }
  18|         Line : WORD;
  19|         Next : XLinePtr;
  20|       END;
  21|
  22|       SymTreePtr = ^SymTreeNode;       { Pointer to symbol tree node }
  23|       SymTreeNode = RECORD                 { Binary tree symbol node }
  24|         Symbol       : SymString;
  25|         UCsymbol     : SymString;
  26|         Count        : WORD;
  27|         XList        : XLinePtr;
  28|         LLink, RLink : SymTreePtr;
  29|       END;
  30|
  31| CONST Quote =  #39;
  32|       DQuote = #34;
  33|       Eject =  #12;
  34|       SymChars : CharSet = ['0'..'9','A'..'Z','a'..'z','.','_','^'];
  35|       PComment : CharSet = ['{', '}', '(', '*', ')', Quote, DQuote];
  36|       Heading  = ' symbolic cross-reference for ';
  37|
  38| VAR   Filepath       : STRING [80];
  39|       Case_Sensitive : BOOLEAN;
  40|       F              : TEXT;
  41|       Head, Alpha    : SymTreePtr;
  42|       CommentLevel   : WORD;
  43|       Line           : LineString;
  44|       LineNumber     : WORD;
  45|       NSymbols       : WORD;
  46| { ------------------------------------------------------------------ }
  47|
  48| PROCEDURE FindEndOfComment (VAR line : LineString;
  49|                             VAR i    : WORD;
  50|                                 eoc  : CHAR);
  51|   { Scan until end of current comment is found }
  52|
  53| VAR   ch        : CHAR;
  54|       Searching : BOOLEAN;
  55|
  56| BEGIN
  57|   Searching := TRUE;
  58|   WHILE Searching DO BEGIN
  59|     WHILE i <= Length (Line) DO BEGIN
  60|       ch := Line [i];
  61|       INC (i);
  62|       IF ch = eoc THEN
  63|         CASE eoc OF
  64|           '}':    Searching := FALSE;
  65|           '*':    IF line [i] = ')' THEN BEGIN
  66|                     Searching := FALSE;
  67|                     INC (i);
  68|                   END;
  69|           Quote:  Searching := FALSE;
  70|           DQuote: Searching := FALSE;
  71|         END;
  72|
  73|       IF Searching = FALSE THEN BEGIN
  74|         DEC (CommentLevel);
  75|         EXIT;
  76|       END;
  77|     END;
  78|
  79|     { If we get here, the comment continues on the next line }
  80|     Readln (F, Line);
  81|     i := 1;
  82|     INC (LineNumber);
  83|   END;
  84| END;
  85| { --------------------------- }
  86|
  87| FUNCTION UpShift (VAR Symbol : SymString) : SymString;
  88|   { Return upshifted version of passed string }
  89|
  90| VAR   p : INTEGER;
  91|       s : SymString;
  92|
  93| BEGIN
  94|   s := '';
  95|   FOR p := 1 TO Length (Symbol) DO
  96|     s := s + UpCase (Symbol [p]);
  97|   UpShift := s;
  98| END;
  99| { --------------------------- }
 100|
 101| FUNCTION NewNode (VAR Symbol : SymString) : SymTreePtr;
 102|   { Allocate and set up new symbol node, return pointer }
 103|
 104| VAR   node : SymTreePtr;
 105|
 106| BEGIN
 107|   NEW (node);
 108|   Node^.Symbol   := Symbol;
 109|   Node^.UCSymbol := UpShift (Symbol);
 110|   Node^.Count    := 1;
 111|   Node^.XList    := NIL;
 112|   Node^.RLink    := NIL;
 113|   Node^.LLink    := NIL;
 114|   Node^.RLink    := NIL;
 115|   NewNode := node;
 116| END;
 117| { --------------------------- }
 118|
 119| FUNCTION Token (VAR line : LineString;
 120|                 VAR i : WORD) : SymString;
 121|   { Return next symbol or keyword from line }
 122|   { Index to next char returned as a side-effect }
 123|   { Also checks for comments }
 124|
 125| VAR   sym          : SymString;
 126|       ch, ScanChar : CHAR;
 127|       nch          : WORD;
 128|
 129| BEGIN
 130|   { Scan for first valid alphanumeric or for comment }
 131|   ScanChar := #0;
 132|   WHILE (NOT (Line [i] IN SymChars)) AND (i <= Length (line)) DO BEGIN
 133|     ch := line [i];
 134|     INC (i);
 135|     IF ch IN PComment THEN BEGIN
 136|       CASE ch OF
 137|         Quote: BEGIN
 138|                  INC (CommentLevel);
 139|                  ScanChar := Quote;
 140|                END;
 141|         '{':   BEGIN
 142|                  INC (CommentLevel);
 143|                  ScanChar := '}';
 144|                END;
 145|         '}':   IF CommentLevel > 0 THEN
 146|                  DEC (CommentLevel);
 147|         '(':   IF line [i] = '*' THEN BEGIN
 148|                  INC (CommentLevel);
 149|                  ScanChar := '*';
 150|                  INC (i);
 151|                END;
 152|         '*':   IF line [i] = ')' THEN
 153|                  IF CommentLevel > 0 THEN BEGIN
 154|                    DEC (CommentLevel);
 155|                    INC (i);
 156|                  END;
 157|       END;
 158|       IF CommentLevel > 0 THEN
 159|         FindEndOfComment (line, i, ScanChar);
 160|     END;
 161|   END;
 162|
 163|   { Pull out the symbol }
 164|   sym := '';
 165|   nch := 1;
 166|   IF i < Length (Line) THEN
 167|     REPEAT
 168|       ch := Line [i];
 169|       IF ch IN SymChars THEN BEGIN
 170|         IF (ch = '^') AND (nch = 1) THEN
 171|           { Skip leading pointer char }
 172|         ELSE BEGIN
 173|           sym := sym + ch;
 174|           INC (nch);
 175|         END;
 176|         INC (i);
 177|       END;
 178|     UNTIL (NOT (ch IN SymChars)) OR (i > Length (Line));
 179|   IF NOT Case_Sensitive THEN
 180|     Token := UpShift (sym)
 181|   ELSE
 182|     Token := sym;
 183| END;
 184| { --------------------------- }
 185|
 186| FUNCTION BNode (VAR sym : SymString) : SymTreePtr;
 187|   { Find sym's node in binary tree, or add it if it doesn't exist }
 188|
 189| VAR   Node, Parent : SymTreePtr;
 190|
 191| BEGIN
 192|   Node := Head;
 193|   WHILE ((Node^.Symbol <> sym) AND (Node <> NIL)) DO BEGIN
 194|     Parent := Node;
 195|     IF sym < Node^.Symbol THEN
 196|       Node := Node^.LLink
 197|     ELSE
 198|       Node := Node^.RLink
 199|   END;
 200|   IF Node <> NIL THEN                  { Node exists for this symbol }
 201|     INC (Node^.Count)
 202|   ELSE BEGIN                      { Else add new node to binary tree }
 203|     Node := NewNode (sym);
 204|     IF sym < Parent^.Symbol THEN           { Update parent's pointer }
 205|       Parent^.LLink := Node
 206|     ELSE
 207|       Parent^.RLink := Node
 208|   END;
 209|   BNode := Node;
 210| END;
 211| { --------------------------- }
 212|
 213| PROCEDURE Append (Target : SymTreePtr; LineNbr : WORD);
 214|   { Add line cross-ref to target's dependent list }
 215|
 216| VAR    XR, Parent : XLinePtr;
 217|
 218| BEGIN
 219|   IF Target^.XList = NIL THEN BEGIN     { First occurrence of symbol }
 220|       NEW (XR);
 221|       XR^.Line := LineNbr;
 222|       XR^.Next := NIL;
 223|       Target^.XList := XR;
 224|     END
 225|   ELSE BEGIN                        { Append to end of existing list }
 226|       XR := Target^.Xlist;
 227|       REPEAT
 228|         Parent := XR;
 229|         XR := XR^.Next
 230|       UNTIL XR = NIL;                             { Find list's tail }
 231|       NEW (XR);                                       { Append there }
 232|       XR^.Line := LineNbr;
 233|       XR^.Next := NIL;
 234|       Parent^.Next := XR;
 235|   END;
 236| END;
 237| { --------------------------- }
 238|
 239| PROCEDURE AddToTree (VAR Symbol : SymString; LineNbr : WORD);
 240|   { Place symbol into binary tree, add line xref to dependent list }
 241|
 242| VAR   Target : SymTreePtr;
 243|
 244| BEGIN
 245|   IF Head = NIL THEN BEGIN          { The tree is empty, so start it }
 246|       Head := NewNode (Symbol);            { Build first binary node }
 247|       Append (Head, LineNbr);                { Build first XREF node }
 248|     END
 249|   ELSE BEGIN
 250|       Target := BNode (Symbol);
 251|       Append (Target, LineNbr);
 252|   END;
 253| END;
 254| { --------------------------- }
 255|
 256| PROCEDURE Process (VAR Line : LineString);
 257|   { Controls parsing and construction of BTree }
 258|
 259| VAR   Symbol  : SymString;
 260|       p, oldp : WORD;
 261|
 262| BEGIN
 263|   p := 1;
 264|   IF Length (Line) > 0 THEN
 265|     WHILE p <= Length (Line) DO BEGIN
 266|       oldp := p;
 267|       Symbol := Token (line, p);                        { Get symbol }
 268|       IF Symbol = 'BEGIN' THEN Symbol := ''     { Weed out nuisances }
 269|       ELSE IF Symbol = 'END'     THEN Symbol := ''
 270|       ELSE IF Symbol = 'IF'      THEN Symbol := ''
 271|       ELSE IF Symbol = 'THEN'    THEN Symbol := ''
 272|       ELSE IF Symbol = 'ELSE'    THEN Symbol := ''
 273|       ELSE IF Symbol = 'DO'      THEN Symbol := ''
 274|       ELSE IF Symbol = 'WHILE'   THEN Symbol := ''
 275|       ELSE IF Symbol = 'FOR'     THEN Symbol := ''
 276|       ELSE IF Symbol = 'TO'      THEN Symbol := ''
 277|       ELSE IF Symbol = 'VAR'     THEN Symbol := ''
 278|       ELSE IF Symbol = 'INC'     THEN Symbol := ''
 279|       ELSE IF Symbol = 'DEC'     THEN Symbol := ''
 280|       ELSE IF Symbol = 'OF'      THEN Symbol := ''
 281|       ELSE IF Symbol = 'PROGRAM' THEN Symbol := ''
 282|       ELSE IF Symbol = 'END.'    THEN Symbol := '';
 283|       IF Length (Symbol) > 0 THEN
 284|         AddToTree (Symbol, LineNumber);        { Place info in BTree }
 285|       IF p = oldp THEN INC (p);              { Prevents endless loop }
 286|     END;
 287| END;
 288| { --------------------------- }
 289|
 290| PROCEDURE Report (Node : SymTreePtr);
 291|   { Print symbol cross-reference listing }
 292|   { In-order (recursive) traversal of binary tree, printing the info
 293|       and dependent list for each node }
 294|
 295| VAR   Col, Width : WORD;
 296|       Lnode      : XLinePtr;
 297|
 298|   PROCEDURE NewLine;
 299|     { Control pagination }
 300|   BEGIN
 301|     Writeln (LST);
 302|     Col := 0;
 303|     INC (LineNumber);
 304|     IF LineNumber > 58 THEN BEGIN
 305|       Write (LST, Eject);
 306|       Writeln (LST, 'Continuing cross-reference for ', Filepath);
 307|       Writeln (LST);
 308|       LineNumber := 2;
 309|     END;
 310|   END;                                        { End nested procedure }
 311|
 312| BEGIN
 313|   IF node <> NIL THEN BEGIN
 314|     Report (Node^.LLink);                    { Follow left-hand path }
 315|
 316|     { Print info from node }
 317|     Col := 0;
 318|     Write (LST, Node^.Symbol, ' (', Node^.Count, ')');
 319|     Col := Col + Length (Node^.Symbol) + 6;
 320|
 321|     { Print line number references }
 322|     Lnode := Node^.XList;
 323|     While Lnode <> NIL DO BEGIN
 324|       IF Col > 0 THEN
 325|         Write (LST, ', ', Lnode^.Line)
 326|       ELSE
 327|         Write (LST, '  ', Lnode^.Line);
 328|       IF Lnode^.Line < 10 THEN Width := 1
 329|         ELSE IF Lnode^.Line > 99 THEN Width := 3
 330|           ELSE Width := 2;
 331|       Col := Col + Width + 2;
 332|       IF (Col > 70) AND (Lnode^.Next <> NIL) THEN NewLine;
 333|       Lnode := Lnode^.Next;
 334|     END;
 335|     NewLine;
 336|
 337|     Report (Node^.RLink);              { Then follow right-hand path }
 338|   END;
 339| END;
 340| { --------------------------- }
 341|
 342| PROCEDURE Alphabetize (sym : SymTreePtr);
 343|   { Rearrange tree when case-sensitive so that upper- and lower-case
 344|     identifiers occur in alphabetic order regardless of case }
 345|   { RECURSIVE: Traverses symbol table in-order, builds alpha list }
 346|
 347|   PROCEDURE Resort (sym : SymTreePtr);
 348|     { NESTED: Place new node in tree headed by Alpha pointer }
 349|
 350|   VAR   Node, Parent : SymTreePtr;
 351|         UCsymbol     : SymString;
 352|
 353|   BEGIN
 354|     IF Alpha = NIL THEN BEGIN       { Make first node in sorted tree }
 355|         Alpha := NewNode (sym^.symbol);
 356|         Alpha^.count    := sym^.count;
 357|         Alpha^.XList    := sym^.XList;
 358|       END
 359|     ELSE BEGIN                               { Add new node in order }
 360|         UCsymbol := UpShift (sym^.symbol);
 361|         Node := Alpha;
 362|         WHILE node <> NIL DO BEGIN            { Find insertion point }
 363|           Parent := node;
 364|           IF UCsymbol < Node^.UCsymbol THEN    { based on U/C symbol }
 365|             Node := Parent^.LLink
 366|           ELSE
 367|             Node := Parent^.RLink;
 368|         END;
 369|         Node := NewNode (sym^.symbol);                    { Add node }
 370|         Node^.Count    := sym^.count;
 371|         Node^.XList    := sym^.XList;
 372|         IF UCsymbol < Parent^.UCsymbol THEN
 373|           Parent^.LLink := node
 374|         ELSE
 375|           Parent^.RLink := node;
 376|       END;
 377|   END;
 378|
 379| BEGIN  { Body of Alphabetize }
 380|   IF sym <> NIL THEN BEGIN
 381|     Alphabetize (sym^.LLink);                     { Do nodes to left }
 382|     Resort (sym);                          { Realphabetize this node }
 383|     Alphabetize (sym^.RLink);                { Now do nodes to right }
 384|     Dispose (sym);                         { All thru with this node }
 385|   END;
 386| END;
 387| { --------------------------- }
 388|
 389| PROCEDURE Count (Node : SymTreePtr);
 390|   { Count nodes in tree }
 391| BEGIN
 392|   IF node <> NIL THEN BEGIN
 393|     Count (Node^.LLink);
 394|     INC (NSymbols);
 395|     Count (Node^.RLink);
 396|   END
 397| END;
 398| { --------------------------- }
 399|
 400| BEGIN
 401|   { Initialize }
 402|   Head := NIL;
 403|   Alpha := NIL;
 404|   CommentLevel := 0;
 405|   LineNumber := 0;
 406|   NSymbols := 0;
 407|
 408|   { Process command line }
 409|   IF ParamCount < 1 THEN BEGIN
 410|     Writeln ('USAGE: XREF <Filename.[ext]> [/C|/N]');
 411|     EXIT;
 412|   END;
 413|   Filepath := ParamStr (1);
 414|   IF pos ('.', Filepath) = 0 THEN
 415|     Filepath := Filepath + '.PAS';          { Default is Pascal file }
 416|   Case_Sensitive := FALSE;            { Set default case sensitivity }
 417|   IF ParamCount = 2 THEN                 { or alter per command line }
 418|     IF (ParamStr (2) = '/c') OR (ParamStr (2) = '/C') THEN
 419|       Case_Sensitive := TRUE;
 420|
 421|   { Open the file }
 422|   Assign (F, Filepath);
 423|   {$I-}
 424|   Reset (F);
 425|   {$I+}
 426|   IF IOResult <> 0 THEN BEGIN
 427|     Writeln ('Unable to open ', Filepath);
 428|     EXIT;
 429|   END;
 430|
 431|   { Announce the program }
 432|   ClrScr;
 433|   IF Case_Sensitive THEN
 434|     Write ('Case-sensitive')
 435|   ELSE
 436|     Write ('Non-case sensitive');
 437|   Writeln (Heading, Filepath);
 438|   Writeln;
 439|
 440|   { Process the file }
 441|   WHILE NOT eof (F) DO BEGIN
 442|     Readln (F, line);
 443|     INC (LineNumber);
 444|     GotoXY (1, WhereY-1); Writeln (LineNumber);  { Meter line number }
 445|     Process (Line);
 446|   END;
 447|   Close (F);
 448|   GotoXY (1, WhereY-1); Writeln (LineNumber, ' lines in file');
 449|   IF CommentLevel <> 0 THEN
 450|     Writeln ('Unbalanced comments detected');
 451|
 452|   { Alphabetize tree into non-ASCII order if case-sensitive }
 453|   LineNumber := 3;
 454|   IF Case_Sensitive THEN BEGIN
 455|       Alphabetize (Head);
 456|       Writeln (LST, 'Case-sensitive', Heading, Filepath);
 457|       Writeln (LST);
 458|       Report (Alpha);
 459|       Count (Alpha);
 460|     END
 461|   ELSE BEGIN
 462|       Writeln (LST, 'Non-case sensitive', Heading, Filepath);
 463|       Writeln (LST);
 464|       Report (Head);
 465|       Count (Head);
 466|     END;
 467|   Writeln (LST);
 468|   Writeln (LST, '-- ', NSymbols, ' symbols reported');
 469|   Write (LST, Eject);
 470| END.