Channels ▼
RSS

Design

Recursive Descent, Tail Recursion, & the Dreaded Double Divide


Truck is a programmer in Southern California and can be contacted at truck@ttscp.com.


Tail Recursion


It's simple enough, almost textbook. The textbook in this case is Compilers: Principles, Techniques and Tools, by Aho, Sethi, and Ullman (Addison-Wesley, 1986)—the "Dragon Book"—with a smattering of elementary recursion thrown in. It took me a while to figure it out, though. I was working on a formula evaluator for my company's latest product. I've implemented a lot of parsers, so this was supposed to be a fairly easy job. The last time I'd implemented one, I used a YACC-type parser, so this time I decided to write a recursive-descent parser, just for practice.

The formulas were straightforward enough and I soon had a decent parser, except for one problem. Addition, subtraction, and multiplication all passed their unit tests. So did division—until I decided to try double divide expressions. I fed it the expression, "18/6/3" and it gave me the answer "9."

I was expecting "1"; how did it get "9"? At first I thought it was a bug, but it turns out this is the "correct" answer, at least according to the parsing method.

Recursive-descent parsing works by mimicking a language grammar in code. The standard grammar for expressions looks something like this:

e := t
| t '+' e
| t '-' e
t := f
| f '*' t
| f '/' t
| '(' e ')'

That is, an expression e can be parsed as a term t, optionally followed by a plus or minus sign, followed by an expression. A term can be a factor f, optionally followed by a multiplication or division sign followed by a term, or it can be an expression enclosed in parentheses.

A recursive-descent parser takes a string of symbols ("18/6/3") and tries to match it recursively by calling a routine to match the expected symbol. The C++ code to implement the division rule of the grammar:

t := f
| f '/' t

looks like Listing One (leaving out the messy details of tokenizing with a lexor and error processing). The parse for "18/6/3" gets constructed in recursive calls by the Term routine. In the first call, the factor "18" and the operator "/" at the front of the input string are processed and the rest is passed recursively to the Term function for further evaluation. At this point, the parse looks like Figure 1(a). On the second call, "6" and "/" are processed; see Figure 1(b). On the third call, the "3" at the end of the string is processed by the Factor function; see Figure 1(c).

Because the character after the last "3" is not a division, the Term function returns with a result of 3.

It's only at this point that the second pending division can be performed and the Term function returns the result: 6 divided by 3 gives 2; see Figure 1(d).

Now the first pending division can be performed: 18 divided by 2 gives 9.

More simply, the recursive-descent parser works from right to left, where we humans generally work from left to right.

I tried several alternatives to get around this. I even considered defining it as the "correct" behavior—a feature, if you will—but a quick survey of my office indicated that I would have an uphill battle. It was when my boss started talking about left associativity of the division operation that the double divide attained "dreaded" status: I knew I'd have to find a solution.

In a YACC parser, you could simply mark division as left associative and that would take care of it. How do you handle it in a recursive-descent parser?

If I could rewrite the grammar to move the recursion to the front, then the parser would operate left to right, the same way my boss does. The simple way to do this:

t := t '/' f

doesn't work because the parser gets stuck in an infinite recursive loop (a term is a term is a term is a term is a term...) and doesn't stop until it runs out of stack space.

Back to the textbook. The Dragon book talks about eliminating left recursion from grammars. I won't bore you with the details, but if you follow the book's process, you come up with this:

t := f
| f g
g := '/' t

You create an intermediate result g, which takes the recursion off into another rule. For a simple calculator, this isn't much help because the corresponding routine would have to return an expression to be evaluated, not a simple number.

It does suggest another way of looking at it, however. In the Lisp community, where they do everything with recursion, there are special rules for speeding up calculations. What compiler/parser people call "right recursion," Lisp people call "tail recursion" because it occurs at the end (tail) of the calculation. (See the accompanying text box entitled "Tail Recursion" for details.)

Tail recursion can always be replaced by a simple loop. The Lisp people like this because it saves on stack space and recursive function calls; in other words, it's faster and uses less memory.

I like it because it gives me a way of reordering the calculations. The revised grammar rule for division now looks like this:

t := f ( '/' t )*

where I'm extending BNF using regular expression notation (the "*" at the end indicates zero or more occurrences of the expression in parenthesis).

The code for this turns out to be as simple as the original code. The only change is in the Term routine. I replace the if with a while (Listing Two), and the resulting parser gets the right answer.

DoubleDivide.cpp (available electronically; see "Resource Center," page 4) is the full code for the test program. The test program uses parsing objects to make comparing the results of the formula evaluation easier, but the recursive descent is performed by the methods of the parsing object in the same manner just described.

Recall I mentioned the lexor (see DoubleDivide.cpp). In compiler writing, it is common to separate out the job of turning the input stream into tokens. This makes it possible to get the next token from many different places, a necessity for recursive descent. For this article, I've coded a simple lexor; all it does is return either the next char or turn a string of digits into an int.

Following the lexor is the abstract parser class with some utility functions to identify the output. After that, the two implementations of the parser are in the classes RecursiveDescent and TailRecursion.

There is no provision for anything other than the expected token in these routines. This is the way recursive descent works. If the expected token doesn't show up, it's an error and the compiler reports it as such. In a more complete implementation, there would be some logic based on the tokens to determine which BNF rules to apply. For instance, a star would indicate multiplication and a slash division. If I implemented the rule about parenthesized expressions:

t := '(' e ')'

the Term routine would have to look at the next token, and if it was an open parenthesis, it would call the Expression routine. I've left all these details out because it complicated the presentation of tail recursion.

I hate writing repetitive code. My test functions in DoubleDivide.cpp are an example. The DoesParseMatchResult tests the results but also collects the printing for both header and detail in one place, so I can modify the formatting in one place. The actual printing function, Line, is a template function, so it will handle both the headers (char const *) and results (ints). The class is just a convenient way to group the two static functions.

One of the nice things about recursive descent is the way the functional expression closely follows the BNF rules for the language. In the case of the dreaded double divide, making the result meet expectations turned out to have an equally nice functional expression. Even my boss liked the result.

DDJ



Listing One

Lexor lexor( "18/6/3" );
int Factor() {
    int d = lexor.GetNumber();
}
int Term() {
    int t = Factor();
    if( lexor.CurrToken() == '/' ){
        lexor.MoveNext();
        int f = Term();
        t = t / f;
    }
   return t;
}
Back to article


Listing Two
int Term() {
    int t = Factor();
    while( lexor.CurrToken() == '/' ){
        lexor.MoveNext();
        int f = Factor();
        t = t / f;
    }
    return t;
}
Back to article

Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video