Dr. Dobb's is part of the Informa Tech Division of Informa PLC


The Commando Forth Compiler

Inside the Compiler

Because awk automatically breaks input into word tokens, processing the Forth code is simple. The compiler makes two passes. First, it reads all words into a few arrays. My PC has plenty of memory, so I've decided life is too short to spend time optimizing memory usage on the PC. Once all the words have been read, the compiler starts pass two. In this pass, it finds the main word (nominally main) and compiles it. During compilation, it keeps a list (todo) of all the words that were used but haven't been compiled yet (the compiled words are tracked in the done array). When it finishes compiling a word, it picks another word from the todo list and compiles that word.
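The two passes can be sketched in a few lines of Python. This is only an illustration of the todo/done bookkeeping described above, not the compiler's actual awk code: the function names, the dictionary standing in for the pass-one arrays, and the simplified parsing of : and ; are all assumptions.

```python
def read_definitions(source):
    """Pass one: collect `: name body ;` definitions from whitespace-split tokens."""
    definitions = {}
    tokens = source.split()
    i = 0
    while i < len(tokens):
        if tokens[i] == ":":
            name = tokens[i + 1]
            end = tokens.index(";", i)          # first ; after the colon
            definitions[name] = tokens[i + 2:end]
            i = end + 1
        else:
            i += 1
    return definitions


def compile_program(definitions, main="main"):
    """Pass two: compile `main`, then every word it reaches, once each."""
    todo = [main]   # words that were used but not compiled yet
    done = set()    # words already compiled
    output = []     # (word, body) pairs in the order they were compiled
    while todo:
        word = todo.pop(0)
        if word in done:
            continue
        done.add(word)
        body = definitions[word]
        output.append((word, body))
        for token in body:
            # Only user-defined words are queued; everything else is
            # treated here as a primitive or a literal.
            if token in definitions and token not in done and token not in todo:
                todo.append(token)
    return output
```

One consequence of this scheme is that a word nobody references is never compiled, so unused library words cost nothing in the output.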

The output is assembly language destined for the cross assembler. A special pair of words, { and }, allows words to contain assembly language. What's more, words that contain just a single word or a single assembly language line are expanded inline automatically when used in other words. You can also force this behavior for any word by using the special word macro in its definition (inline is a synonym for macro). For example:

: x someword ;
: y macro word2 word3 ;
: main x y ;

This is really the same as writing:

: main someword word2 word3 ;

And the generated code will look roughly like:

	call someword
	call word2
	call word3

Without the macro optimization, the generated code would look roughly like this:

	call x
	call y

I say "roughly," because the compiler emits special labels for Forth words. That's because Forth allows any string of non-space characters to form a word. So #*$#! is not only a comic book expletive, it is also a legal Forth word. But it isn't a legal assembler label. So the compiler makes arbitrary labels for words and also for other places where it needs labels (for example, in if statements).
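Here is one way such a label generator might look, sketched in Python. The L prefix, the numbering scheme, and the LabelMaker class are hypothetical; the actual compiler may name its labels quite differently.

```python
import itertools


class LabelMaker:
    """Hands out assembler-safe labels for arbitrary Forth word names."""

    def __init__(self, prefix="L"):
        self._prefix = prefix
        self._counter = itertools.count(1)
        self._by_word = {}

    def fresh(self):
        """Return a new, unused label (for example, for an if branch)."""
        return f"{self._prefix}{next(self._counter)}"

    def for_word(self, word):
        """Return the label for `word`, creating it on first use."""
        if word not in self._by_word:
            self._by_word[word] = self.fresh()
        return self._by_word[word]
```

Because the label never contains the word's own characters, any legal Forth name maps to a legal assembler label, and asking for the same word twice returns the same label.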


The compiler performs several optimizations. For example, if a word call occurs right before a return, the compiler will emit a single jump in place of the two instructions. In fact, if the compiler gets ready to jump to a word at the end, and that word has not been compiled yet, the compiler will go ahead and emit the word, saving the final jump as well as the call and return. Some of these optimizations are done on the fly during compilation and others are done by the optimization script that manipulates the initial assembly language output.
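The call-before-return case is easy to picture as a pass over the assembly text. The Python sketch below is an assumption-laden illustration: the ret and jmp mnemonics and the one-instruction-per-line layout are placeholders, not necessarily what the cross assembler uses.

```python
def fold_tail_calls(lines):
    """Replace a call that is immediately followed by a return with a jump."""
    out = []
    for line in lines:
        parts = line.split()
        prev = out[-1].split() if out else []
        if parts == ["ret"] and len(prev) == 2 and prev[0] == "call":
            # The called word's own return will serve as ours.
            out[-1] = f"\tjmp {prev[1]}"
        else:
            out.append(line)
    return out
```

Note that a label sitting between the call and the return blocks the rewrite, since some other code may jump to that label and still need the return.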

Of course, I mentioned earlier that words that are themselves single words (or those marked as macros or inline) are expanded inline, which is another optimization. Another optimization is that words that contain only literals are replaced by the literal itself.
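The inline expansion can be sketched in Python as follows. This is a simplified model rather than the compiler's code: definitions are assumed to be lists of tokens keyed by word name, and the sketch assumes no word expands into itself.

```python
def expand_tokens(definitions, tokens):
    """Expand macro/inline words and single-token words in place."""
    expanded = []
    for token in tokens:
        body = definitions.get(token)
        if body is None:
            expanded.append(token)      # primitive or literal: keep as is
            continue
        is_macro = bool(body) and body[0] in ("macro", "inline")
        if is_macro:
            body = body[1:]             # drop the marker word itself
        if is_macro or len(body) == 1:
            expanded.extend(expand_tokens(definitions, body))
        else:
            expanded.append(token)      # ordinary word: leave the call
    return expanded


definitions = {
    "x": ["someword"],
    "y": ["macro", "word2", "word3"],
    "main": ["x", "y"],
}
calls = ["\tcall " + t for t in expand_tokens(definitions, definitions["main"])]
```

With the x, y, and main definitions from the earlier example, calls holds the three call lines for someword, word2, and word3.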

The post-optimization script also handles the common case where a value is pushed on the stack and then immediately popped off the stack. The script replaces this push/pop combination with a single move. However, it does not look deeper than one level. That is to say, for a code sequence that pushes A, pushes B, pops X, and then pops Y, the optimizer will only move B to X; the A to Y transfer will still use the stack.
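A one-level peephole pass of this kind might look like the Python sketch below. The push, pop, and mov mnemonics and the source-first operand order are assumptions made for illustration; the real script works on the cross assembler's own syntax.

```python
def fold_push_pop(lines):
    """Turn a push that is immediately followed by a pop into one move."""
    out = []
    for line in lines:
        parts = line.split()
        prev = out[-1].split() if out else []
        if (len(parts) == 2 and parts[0] == "pop"
                and len(prev) == 2 and prev[0] == "push"):
            # Only the adjacent pair is folded; after the rewrite the
            # previous line is a mov, so an outer push/pop pair is not seen.
            out[-1] = f"\tmov {prev[1]},{parts[1]}"
        else:
            out.append(line)
    return out
```

Running it on push A, push B, pop X, pop Y leaves push A, a move of B to X, and pop Y, which matches the one-level limitation: the A to Y transfer still goes through the stack.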


The compiler recognizes special comments as directives. These are typically found at the start of a file and control the compiler's operation. For example, the \!org directive sets the start address of the program. You can find all of the directives in the compiler's source, but here are a few of the more common ones:

  • \!include - Include another file into this one (note: included last, not at the current point!)
  • \!org - Set origin (default 0x200)
  • \!quick - Use quick jumps and calls (note: some library calls don't respect this)
  • \!main - Use this word for main (default _pre_main)
  • \!wordcomment - If set, include comments showing the Forth words in a compiled word
  • \!ignorecase - If set, ignore case in words
  • \!nomainloop - Halt if main returns when this is set
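To make the idea concrete, here is a rough Python sketch of how directive comments could be collected. The defaults for org and main come from the list above; the assumption that an argument follows the directive after a space, the settings dictionary, and the file name in the test are illustrative guesses, not the compiler's actual behavior.

```python
def parse_directives(lines):
    """Collect \\!name [argument] directives from source lines."""
    settings = {"org": "0x200", "main": "_pre_main", "include": []}
    for line in lines:
        tokens = line.split()
        if not tokens or not tokens[0].startswith("\\!"):
            continue                    # not a directive comment
        name, args = tokens[0][2:], tokens[1:]
        if name == "include":
            # Includes are only recorded here; per the note above they
            # are processed last, not at the current point.
            settings["include"].extend(args)
        else:
            # Flags with no argument (such as ignorecase) become True.
            settings[name] = args[0] if args else True
    return settings
```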

The Built-In Words

The compiler has several built-in words (these would be "immediate" words in a normal Forth interpreter):

* : - Define a new word (must be first thing on a line, although definition may span lines)
* ; - End word definition (must be last thing on a line)
* constant. Must have a single constant argument, be on a line by itself, and be global (outside any colon definition)

314 constant pi*100
* variable. Creates a word that defines an address. Optional argument reserves space after the variable. Must be on a line by itself and global

variable x 32
variable y
* value. Create a variable that is initialized (still use @ and ! with these). Must be on a line by itself and global

33 value rpm
* {...}. Copies tokens out to the output (use label if you need a label, one line per {})

: zerodisp { MOV FZERO,FIO_DISP } ;
* label. Inserts a label in the output (see goto)
* goto. Inserts a jump in the output

: forever label gohere goto gohere ;
* macro (or inline). Inside a word definition, tells the compiler to copy the word "in place" instead of making a standalone word. This always happens for literals and single-word words.

: 4x macro 2* 2* ;
: foo 10 4x .disp ; ( same as 10 2* 2* .disp )
* if, then, endif, else. Standard Forth "if" words (endif is a synonym for then)
* char. Push a character on the stack

: exclaim char '!' emit ;
* recurse. Call current word
* begin, until, again, while, repeat. Standard Forth looping words
* ' - Get execution token

: doadd ' + execute ;
* regcall. Used as the first word of a colon definition, causes a register to be allocated and initialized for calling this word. Obviously this does not make sense to use with a macro. Registers start at 63 and work downward. You must not use any of the registers used by the monitor unless you don't use the monitor (say, register 1F and below to be safe).

: foo regcall + + 2/ ;
