JForth: Implementing Forth in Java



May 21, 2008
URL:http://www.drdobbs.com/embedded-systems/jforth-implementing-forth-in-java/207801675

Craig Lindley is a hardware engineer who, for many years, has been writing large-scale Java applications. Craig has authored more than 30 technical articles on various programming topics and published five books, dealing mostly with multimedia. His Digital Audio With Java is all about processing sound with DSP software written in Java. Craig can be contacted at [email protected] .


After years programming large software systems, I occasionlly feel the need to do some programming on the small scale, on small hardware. Sometimes I just need to dispense with object orientation and structured design and write some code in assembler, C, or (my personal favorite) Forth. Programming close to the metal can be rewarding on many levels, not the least of which is that it lets you your arms around a complete problem and solve it outside of the group think mentality.

If there is a list somewhere of obscure computer programming languages Forth is on it. Although Forth occupies a place on the evolutionary continuum of programming languages that continues to this day, many programmers never got to or never wanted to see what Forth was all about. Programmers who did experiment with Forth typically fell into one two camps -- those that thought it was the best language ever (at least for a certain class of problems), and those who hated it completely. There was rarely any middle ground taken. Discussions of Forth often take on the tone of religious debates akin to discussions of what is the best programming editor.

I was an early convert to Forth after having worked at JPL on inter-planetary spacecraft where Forth was used for command, control, and testing purposes, Rolm Corporation on Forth-based PBX control and test applications, and Sun Microsystems where Forth is used as the boot environment on many Sun workstations. As an interesting aside, Forth is still going strong in the space industry.

Then as now, Forth is an excellent choice of programming language/programming environment/operating system for small computer systems where direct control of the hardware, of input and output (I/O) and/or a small memory footprint are required.

What Is Forth Anyway?

Forth is a stack-based procedural programming language that was invented by Charles H. Moore around 1968. By 1970 Forth was being used to control the space telescopes at Kitt Peak Observatory in Arizona. At that time programming tools for small computer systems were rather crude and computer memory was very expensive. C compilers were not yet widely available, BASIC was still fully interpreted and therefore very slow, and Fortran compilers were too expensive for many. Programming during this time period usually meant assembler language or worse yet machine language. Forth offered a higher level of programming that increased productivity immensely while still offering the assembly language interface if required for an application. And as counter-intuitive as it seems, the larger a Forth program got the more efficient it used memory. In fact, Forth programs could be smaller than equivalent assembler language programs on the same hardware platform.

Forth source code is line-oriented and can be entered interactively via a keyboard or stored on a mass storage device of some kind and loaded in. In the early days, Forth source code was stored in 1K blocks on disk, but today it is stored in normal files. A line of Forth source consists of a series of space character delimited tokens. The Forth interpreter reads in the tokens and either executes them interactively or, if in compile mode, compiles them for later execution.

The Forth word is a language element that can be used interactively or within a program. Source code for a particular word is called its "definition." Primitive Forth words typically have their definitions coded in the CPU's native machine language, whereas non-primitive words have their definitions coded in Forth. A Forth program is run by executing the highest level word which defines the program.

The Forth word ":" (colon) puts the interpreter into compile mode and the word ";" (semicolon) ends compile mode. Immediate Forth words execute even in compile mode, whereas non-immediate Forth words have their addresses compiled into the word being defined.

Many things make Forth unique but two things need to be specifically pointed out.

Let's use the Forth word DUP as an example of argument passing. DUP duplicates the value currently on the top of the data stack. DUP takes its input value from the stack and pushes two copies of that value back onto the stack.

Using RPN for expression evaluation takes getting used to when you're used to normal algebraic or infix evaluation. The algebraic expression 2 + 5 * 3 evaluates as 2 + (5 * 3), whereas the RPN evaluation becomes 2 5 3 * +. Expression conversion to RPN is easy if you keep the following in mind:

It should be noted that Forth leaves all data typing issues to the programmer. Typing is usually handled by having Forth words defined for each data type. For example, the addition operator "+" expects its arguments to be integers and integers only. If floating-point addition is required in an application a special operator like "F+" might be defined. (In the Forth implementation discussed later, I have overloaded operators like "+" to handle all valid data types.)

Traditionally Forth was implemented as a threaded, interpretive language (TIL) though that is not necessarily the case today where Forth can be compiled into machine code for direct execution just like C++. A TIL implementation (as discussed in this article) has to do with how a language's source code is translated into executable form. TILs generally employ a two-phase translation process. In the first phase (compile mode), source code is interpreted into a list of fully resolved address references to underlying functionality. At runtime, a thread of execution is established from the top-level Forth word through the lists of compiled addresses of the words with which it was defined. Forth source code is converted into a fully analyzed internal form (lists of addresses) by the first phase of the translation process so the compiler is no longer needed. In addition, no further interpretation is used at runtime as the thread of execution has been fully established.

Three components -- a dictionary, the inner interpreter, and the outer interpreter -- make up traditional Forth implementations. Typically implemented with a linked list, the dictionary contains an entry for every Forth word (primitives and non-primitives) that can be executed interactively or used in the definition of new Forth words. New, successfully compiled Forth words are added to the dictionary for subsequent lookup.

The outer interpreter provides users with a crude command-line interface to the Forth environment. It typically provides a prompt such as ">" and waits for user input. The outer interpreter gathers up a line of input at a time and submits that line to the inner interpreter where all the action happens. If the line is interpreted successfully, "OK" is output and the process repeats until no more input is available.

Listing One is pseudocode for the inner interpreter I used in my Java implementation. It is fairly typical in its operation.

begin
    prepare tokenizer for parsing line of text
    while tokens exist on line
        get next token
        if not in compile mode
            if token is string literal
                push string onto data stack
                continue
            endif
            search dictionary for word
            if word found in dictionary
                execute it
            else
                try to parse word as number
                if numeric
                    push number onto data stack
                else
                    output error indication - word followed by ?
                    return false
                endif
            endif
        else ( in compile mode )
            if token is string literal
                add string to word being defined
                continue
            endif
            search dictionary for word
            if word found in dictionary
                if word marked as immediate
                    execute word
                else
                    add word to word being defined
                endif
            else ( word not found in dictionary )
                try to parse word as number
                if numeric
                    add number to word being defined
                else
                    output error indication - word followed by ?
                    return false
                endif
            endif
        endif
     end while
     return true
end
Listing One: Pseudocode for the Inner Interpreter

The following examples give you a sense of how Forth works. The ">" prompt and the "OK" are provided by the outer interpreter.

> 1 . 1 OK

Since we are not in compile mode, the first 1 is identified as a numeric literal and is pushed onto the data stack. The "." (called "dot") causes the item at the top of the stack to be popped off and output to the console; hence the second 1.

> 10 HEX . A OK

This example performs number base conversion. Unless changed, Forth uses base 10 for numeric operations. Here we push a decimal 10 value onto the stack, change the numeric base to HEX and then use dot to print the top of the stack value. An "A" is printed as A is 10 in hex.

>  : hello_world  "*** Hello World ***" .  cr cr ; OK

This is an example of Forth compilation. Here we define a new word called hello_world which echoes the string *** Hello World *** to the console followed by a couple of carriage returns. The final OK signifies successful compilation. Later when hello_world is typed on the command line, the string will be output.

> "c:/archive/java/jforth/testcode" load OK

Here a file/pathname string is pushed onto the data stack and the Forth word load pops this string off of the stack and compiles the Forth code in the file. This is how you would compile a complete Forth program. It is interesting to note that both compile and runtime behaviors can be combined in a file of loaded Forth code. Below, the definition of dl1 is compiled and the very next statement executes the new word interactively.

( Do loop with positive 1 loop increment ) 
: dl1 10 0 do i . loop ;
" do loop with positive 1 loop increment: " . dl1 cr

Consult any book on Forth for hundreds of other examples.

Java Implementation

You might first ask why I was motivated to implement Forth in Java and the answer would be to see how it might be done. Before being flamed, I should state the implementation I provide should really be called "Forth-like" as it stems from my recollection of how Forth works rather than a strict adherence to Forth standards. Also it was not done with any practical purpose in mind other than as a feasibility demonstration for fun. Finally, the Java implementation which I call "JForth," is not particularly efficient in terms of speed, memory usage, or anything else, but it served its purpose. The complete source code for JForth (packaged in a jar file which contains all of the javadocs, compiled classes, and source files) is available online.


Running JForth

Running JForth requires a Java 1.5 or newer JRE being available on your target machine. The jar file, Jforth.jar, available from DDJ, contains all of the javadocs, source code, test code and executable code required to run JForth. The code can either be run in a command shell directly out of the jar file using the command:

java -jar com.craigl.jforth.JForth

or the code can be extracted from the jar file and compiled in your preferred programming environment. Either way, once the main JForth class is executed you will be given the prompt > and the program will wait for you to enter Forth expressions. Type the Forth word "bye" to end your Forth session.

The file of test code, "testcode" must be extracted from the jar file to be executed. To compile and run this test code at the JForth prompt type:

>  "path_to_testcode_file/testcode" load <Enter>

In JForth, all words are derived from the abstract class BaseWord which implements the ExecuteIF interface. The BaseWord class maintains the name of the Forth word, a flag indicating whether the word is immediate or not and another flag indicating whether this word is a primitive.

The ExecuteIF interface consists of a single method that all words must implement:

public int execute(OStack dataStack, OStack variableStack);

This method takes references to the data and the variable stacks and returns a non-zero int value on success.

Derived from BaseWord are the PrimitiveWord and NonPrimitiveWord classes. Primitive words are defined in Java code; non-primitive words contain a list of BaseWords (both primitive and non-primitive) that make up the definition. A primitive word is executed by calling its execute method to run the Java code which makes up its definition. The execute method in a non-primitive word traverses its list of BaseWords calling the execute method of each sequentially. Listing Two shows the definition of the Forth multiplication operator as an example of how primitive words are implemented.

...
new PrimitiveWord("*", false, new ExecuteIF() {
	public int execute(OStack dStack, OStack vStack) {
	
	if (dStack.size() < 2)
		return 0;
	
	Object o1 = dStack.pop();
	Object o2 = dStack.pop();

	// Determine if both are of the same type
	if ((o1 instanceof Integer) && (o2 instanceof Integer)) {
		int i1 = ((Integer) o1).intValue();
		int i2 = ((Integer) o2).intValue();
		i2 *= i1;
		dStack.push(new Integer(i2));
		return 1;
	}	else	{
		System.out.println("* - cannot multiply strings");
		return 0;
		}
	}
  ...
Listing Two: Forth Multiply

The Forth dictionary is provided by the WordsList class which encapsulates a typed LinkedList. Helper methods in the class provide required functionality. Both the inner and the outer interpreters are methods in the JForth main class.

Two stacks are used in this implementation -- the data stack and the variable stack. Both use the stock Java Stack class. These stacks only contain objects. Currently the allowed objects on the stack are: Integers, Strings, BaseWords, and various types of control words used for conditional expressions and loop constructs. (Note: Since generics are used in this JForth implementation, Java 1.5 or newer must be used to compile and run the provided code.)

See the sidebar Running JForth to see how to run this code.

Typing "words" at the command line prompt will show you all available JForth words as shown below:

bye key random load array @ +! ! r@ r> >r variable constant forget wordsd words ; : hex decimal binary spaces cr . xor or and abs min max mod / * 2- 2+ 1- 1+ - + false true not 0> 0= 0< > = < depth rot over swap drop dup end begin +loop loop leave j i do else then if execute ' ( 

To see details about the defined words type "wordsd" at the command prompt.

Words:
Name: "bye", Primitive: true, Immediate: false
Name: "key", Primitive: true, Immediate: false
Name: "random", Primitive: true, Immediate: false
Name: "load", Primitive: true, Immediate: false
Name: "array", Primitive: true, Immediate: false

Most of the words implemented in JForth work as they do in Forth. To understand behavior in detail, consult the file JForth.java (available in JForth.jar) where all of the primitive words are defined.

Conclusion

Writing JForth was a worthwhile exercise that was also fun. Just about every programming task we do teaches us something and this was no exception. If this article made some readers want to experiment with Forth, all the better.

Forth not only had a place in the past ,it has a place in the future as well. The following quote from the article "A Forth Apologia" (Programmer's Journal, Volume 6, Number 6, November/December 1988, page 56) sums things up very well. It is truer now then it was then.

Forth development systems are available for nearly every CPU in existence, and Forth is consistently the first third-party high level language to appear on a new CPU architecture, personal computer, or operating system. Forth continues to enjoy wide use in real time control and data acquisition applications, and can be found in the most unexpected places: from the bar-code-reader "wand" carried by your friendly Federal Express driver, to the American Airlines baggage control system, to instrumentation for SpaceLab experiments, to a networked monitoring and control system with some 400 processors and 17,000 sensors at an airport in Saudi Arabia. While I think it likely that Forth will always remain a somewhat obscure language, its speed and interactivity coupled with minimal hardware requirements and ready portability are a unique combination of assets. In its niche, Forth has no real competition, and I expect it will be around for a long time to come.

Here, here to that!

Resources

The following resources may be helpful for learning more about Forth.

Terms of Service | Privacy Statement | Copyright © 2024 UBM Tech, All rights reserved.