Stack Machines, Expression Evaluation, and the Magic of Reverse Polish
To teach a friend how compilers work, I've written a compiler for a small subset of Pascal. It lexically analyses and parses a program, code-generates the syntax tree into code for a stack machine not unlike the Ocode virtual machine once used by C's ancestral language BCPL, and interprets the code. I'm going to write a few essays about the compiler and its target machine; in this first one, I'll demonstrate how a stack machine evaluates expressions.
A good place to start is reverse Polish. As many readers will know, this is a notation for expressions whereby all operators and functions follow their arguments. It's important because once we have the reverse Polish form of an expression, we can trivially transcribe it into a program that evaluates the expression.
Here's a very simple example. The reverse Polish form of the expression
11 * 22<br />is
11 22 *<br />
This works with any number of operators. So, for example, the reverse Polish form of the expression
11 * 22 + 33<br />is
11 22 * 33 +<br />
Notice that this is the reverse Polish for "E + 33", which is "E 33 +", with "E" replaced by the reverse Polish for "11 * 22". Reverse Polish has this lovely recursive structure.
What's magic about this is that we can transcribe any reverse-Polish expression into a machine-code program that evaluates the expression. Providing, that is, that our machine has a stack.
So, assume a stack. Let there be a "loadconst" instruction, which pushes a constant onto the stack. And let there be an "add" instruction, which pops the top two elements off the stack, adds them, and pushes the result back. Let "mult" operate similarly. And let us transcribe the reverse Polish above, replacing constants by "loadconst", and renaming operators to the corresponding instruction.
Doing this,
11 22 * 33 +<br />becomes
loadconst 11 loadconst 22 mult loadconst 33 add<br />
I fed the instruction sequence above to my stack-machine interpreter, and this is what it displayed. The stack, which starts off empty, is shown as a list, with the topmost element on the left:
Stack: []<br /><br />About to do OP(loadconst, 11)<br />Stack: [11]<br /><br />About to do OP(loadconst, 22)<br />Stack: [22, 11]<br /><br />About to do OP(mult)<br />Stack: [242]<br /><br />About to do OP(loadconst, 33)<br />Stack: [33, 242]<br /><br />About to do OP(add)<br />Stack: [275]<br />
Here's a bigger example. I'm going to evaluate this expression:
((11 * 22 + 33 * 44) / 121 = 14)<br />and<br />(10 = 20 / 2)<br />
And here is the same expression rewritten in reverse Polish. The line breaks have no significance, they just stop me overflowing the blog's right-hand margin:
11 22 * 33 44 * +<br />121 /<br />14 =<br />10 20 2 /<br />=<br />and<br />
This is the reverse Polish transcribed into machine instructions:
loadconst 11<br />loadconst 22<br />mult<br />loadconst 33<br />loadconst 44<br />mult<br />add<br />loadconst 121<br />div<br />loadconst 14<br />eq<br />loadconst 10<br />loadconst 20<br />loadconst 2<br />div<br />eq<br />logand<br />
I've used a greater variety of instructions here. The "div" instruction divides the element next to the top of the stack by the element on the top, popping them and pushing back the integer quotient. The "eq" instruction compares the top two elements, pops them, and pushes back 1 (signifying TRUE) if they are equal, 0 (signifying FALSE) otherwise. And the "logand" instruction does a logical "and", popping the top two elements and pushing back 1 if they are both 1, otherwise 0.
Now here is my interpreter interpreting those instructions:
Stack: []<br /><br />About to do OP(loadconst, 11)<br />Stack: [11]<br /><br />About to do OP(loadconst, 22)<br />Stack: [22, 11]<br /><br />About to do OP(mult)<br />Stack: [242]<br /><br />About to do OP(loadconst, 33)<br />Stack: [33, 242]<br /><br />About to do OP(loadconst, 44)<br />Stack: [44, 33, 242]<br /><br />About to do OP(mult)<br />Stack: [1452, 242]<br /><br />About to do OP(add)<br />Stack: [1694]<br /><br />About to do OP(loadconst, 121)<br />Stack: [121, 1694]<br /><br />About to do OP(div)<br />Stack: [14]<br /><br />About to do OP(loadconst, 14)<br />Stack: [14, 14]<br /><br />About to do OP(eq)<br />Stack: [1]<br /><br />About to do OP(loadconst, 10)<br />Stack: [10, 1]<br /><br />About to do OP(loadconst, 20)<br />Stack: [20, 10, 1]<br /><br />About to do OP(loadconst, 2)<br />Stack: [2, 20, 10, 1]<br /><br />About to do OP(div)<br />Stack: [10, 10, 1]<br /><br />About to do OP(eq)<br />Stack: [1, 1]<br /><br />About to do OP(logand)<br />Stack: [1]<br />
The two equality comparisons both returned TRUE, that is 1; and the logical "and" combined these, pushing a final 1 onto the stack.
In my next essay on this topic, I'll extend the machine to implement variables, jumps, and input and output. By the way, it has not been unknown for people to code directly in reverse Polish — as The Evolution of RPN & Numeric Entry in Hewlett-Packard's The Museum of HP Calculators tells us.
popx@j-paine.org

