The Return of Tiny Basic

January, 2006: The Return of Tiny Basic

Tom is a consultant and can be contacted at [email protected]

In January 1976, Dr. Dobb's Journal of Tiny Basic Calisthenics and Orthodontia, Running Light without Overbyte was launched on the popularity of the article "Build Your Own Basic," originally written anonymously by Dennis Allison. He was the "D" of "Dobb's" in the new magazine name. The "B" was Bob Albrecht, who ran a store-front, walk-in, computer time-share service called the "People's Computer Company" and who published a newsprint tabloid of the same name (PCC). The BYOB article first appeared in PCC after people began to realize that the $400 computer kits they had bought were too hard to program in machine language, while Bill Gates's Altair Basic was considered too expensive at $150.

I was not the first implementor. Dick Whipple and John Arnold were there ahead of me. But while there were a number of Tiny Basic projects well underway before I started, mine was distinctive in several ways. For instance, the others all ran on Intel CPUs (mostly the 8080); I did only other chips. They were all free, but I charged $5 up front. Many people fondly remember using my Tiny Basic on their 8080 or Z80 when it was most likely Li-Chen Wang's Palo Alto Tiny Basic that they were using.

Another significant difference is that my Tiny Basic was the only one to use Allison's original IL code (available electronically; see "Resource Center," page 4) pretty much as defined. There were several bugs I had to correct, and I added some opcodes to support a couple of Tiny Basic functions, but the BYOB interpreter was written in a special-purpose pseudocode, which Dennis recommended interpreting. I knew about interpreters, having done several of them over the years. Everybody else hand-translated the pseudocode to assembly language. So, as far as I know, my Tiny Basic was the only one with a two-level interpreter.

This technology is still interesting and useful today. In this article, I examine what's involved in recreating my original Tiny Basic in today's universal assembly language—ANSI Standard C.

The first issue of DDJ (available electronically as a PDF e-zine; see "Resource Center," page 4) reproduced the original concept article and three follow-up articles from the PCC, showing the progression in Allison's thinking. It's a valuable study in the formulation of a sophisticated software solution to an understandable problem. I don't have space here to recreate all the thinking he alludes to, but I do touch on some of the technical issues and trade-offs of implementing pseudocode interpreters.

Virtual Programming Languages

My introduction to virtual machines (VMs) came as an offhand remark at my first full-time job. Drexel Heater pointed out that regularly using my own library of Fortran subroutines amounted to designing my own programming language. Although never articulated so clearly, this idea pervades the programming community even today, particularly in C++ with the Standard Template Library, which by operator overloading actually changes the meaning of the Standard C operators. This concept gives you ultimate control over software design: Where Alan Kay predicts the future by inventing it, we predict software behavior by inventing the computer it runs on.

Every programming language is the machine language of some abstract computer whose machine operations are exactly that language's primitives. If you remove an operator from the language by not using it, you have effectively changed (limited) the computer. But let's increase, not reduce, the power of the machine. We do this by making the operations do more. That was Drexel's point about the subroutines.

C is more powerful than binary absolute, not because you can do more (you can't), but because the same programmer effort accomplishes more. A simple assignment—four keystrokes—hides a great deal of work the compiler does to allocate the variable: It chooses a representation of the value as well as the hardware instructions to copy those bits into that memory location. All this work is necessary, but not part of the problem the programmer is solving. Correspondingly, those four keystrokes are necessary, but not part of the specified program requirements. The most powerful programming language is where you state the requirements in a task-specific language, then the computer does it. That is the programming language you should write in.

Dennis Allison's (and my) Tiny Basic gets close to a direct implementation of a task-specific programming language.

Grammar as Programming Language

Allison encouraged his collaborators to read a compiler book. At the risk of appearing self-serving, I repeat the advice. The Art of Compiler Design: Theory and Practice, which I cowrote with James Peters (Prentice Hall, 1991; ISBN 0130481904), emphasizes the context-free grammar as the primary design tool for the entire compiler. (You can download a free copy of my self-compiling compiler-compiler, written entirely in a grammar, at http://www.IttyBittyComputers.com/IttyBitty/TAGC/TAGinfo.html.) I learned this emphasis from Allison's Tiny Basic. The important point is that the grammar is a compact notation for expressing a program as a sequence of statements, where each statement is (in the case of Tiny Basic) a keyword followed by other specified parts, and so on, as in Figure 1, the original Tiny Basic grammar. The grammar is itself written in a precise formal language. The grammar is a program; all that remains is to compile (translate) the grammar-program into machine language and run it.

Well, not quite. The grammar only specifies syntax. You also need to specify semantics, what to do when the grammatical (correct) program is accepted and run. I'll expand on the grammar somewhat, by means of English pseudocode: Tiny Basic reads lines. Each line is a statement, possibly with a line number in front (first grammar rule). If it has a line number, insert the numbered statement into the stored program; otherwise just do whatever the statement is (semantics). Repeat indefinitely.

A statement is one of the following 11 things: The keyword PRINT followed by a list of expressions to print, which are evaluated one by one, printed, then spaced over to the next "zone"; or else it is the keyword IF followed by an expression, which is evaluated, then a relational operator and another expression that is also evaluated and compared to the first expression value, followed by the keyword THEN and then finally another statement, which is executed if the comparison is True; or else a statement is the keyword GOTO followed by a line number expression, which is evaluated and becomes the next line to execute; and so on.
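Each alternative in that list begins the same way: test for a keyword at the cursor, and fall through to the next alternative if it does not match. As a rough illustration of how such a test might work in C (a sketch only; the names here are hypothetical, not the actual IL implementation):

```c
#include <string.h>

/* Hypothetical sketch of a keyword test: if the text at the cursor
   matches the keyword, advance the cursor past it and report success;
   otherwise leave the cursor alone so the next alternative can try. */
static const char *cursor;

static int tst(const char *keyword) {
    const char *p = cursor;
    while (*p == ' ')
        p++;                          /* skip leading blanks */
    size_t n = strlen(keyword);
    if (strncmp(p, keyword, n) != 0)
        return 0;                     /* no match: cursor unchanged */
    cursor = p + n;                   /* match: move past the keyword */
    return 1;
}
```

On failure the cursor is untouched, which is what lets the alternatives cascade: a failed test for PRINT leaves the line intact for the IF test, and so on.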

Similarly, an expression is a term, possibly preceded by a sign, then followed by any number of additional terms each preceded by + or -. Evaluate each term, then add or subtract each successive term. The result is the expression value.

A term is a factor, followed by any number of additional factors, each preceded by a * or /. You get the idea.

Now you have specified the whole interpreter. That is the program, more complete than the grammar alone because it also says how to do the computation. But it is even less understandable to a 1975 microcomputer than the grammar was. Even now, 30 years later, only people understand English—and not all that well. So let's formalize this a little and make it more like a conventional programming language.

For "Tiny Basic reads lines" I use the programming command GETLINE to read each line. Still buried in there are all the mechanics of accepting characters into a line buffer, stopping when users press the Enter key, and preventing buffer overrun, details that do not concern us at this level.
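A C rendering of those buried mechanics might look like the following sketch (the buffer size and function name are hypothetical):

```c
#include <stdio.h>

/* Sketch of GETLINE's mechanics: accumulate characters into a fixed
   buffer, stop at the Enter key, and silently drop anything that would
   overrun the buffer.  Returns 0 only at end of input. */
#define LINESIZE 72                    /* hypothetical buffer length */

static int getline_tb(char *buf, FILE *in) {
    int c, n = 0;
    while ((c = getc(in)) != EOF && c != '\n')
        if (n < LINESIZE - 1)
            buf[n++] = (char)c;        /* keep it; overflow is ignored */
    buf[n] = '\0';
    return (c != EOF || n > 0);
}
```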

For "possibly with a line number in front," I use the command/test TSTL and the label of that part of the program to handle the line if there is no number. To "insert the numbered statement into the stored program," I simply write INSERT. The English word "insert" could mean any number of things in other contexts, none of them relevant to interpreting a Tiny Basic program; thus, this single word is sufficient to explain the whole intent of the operation. Notice also that the grammar says nothing about insertion; it is semantics—what to do with this line now that you know it has a line number.
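The INSERT semantics can be sketched the same way: keep the stored program sorted by line number, replace a line whose number already exists, and treat a number with no statement after it as a deletion. The array-of-structs layout here is purely illustrative; the 1976 interpreters packed lines into a contiguous byte array to save memory:

```c
#include <string.h>

/* Illustrative INSERT: sorted table of numbered lines.  A matching
   number replaces the old text; empty text deletes the line. */
#define MAXLINES 100

struct line { int num; char text[72]; };
static struct line prog[MAXLINES];
static int nlines = 0;

static void insert_line(int num, const char *text) {
    int i = 0;
    while (i < nlines && prog[i].num < num)
        i++;                                   /* find the slot */
    if (i < nlines && prog[i].num == num) {    /* line already stored */
        if (*text == '\0') {                   /* bare number: delete */
            memmove(&prog[i], &prog[i+1], (nlines-1-i) * sizeof prog[0]);
            nlines--;
        } else {
            strcpy(prog[i].text, text);        /* replace old text */
        }
        return;
    }
    if (*text == '\0' || nlines == MAXLINES)
        return;                                /* nothing to insert */
    memmove(&prog[i+1], &prog[i], (nlines-i) * sizeof prog[0]);
    prog[i].num = num;
    strcpy(prog[i].text, text);
    nlines++;
}
```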

For "Repeat indefinitely" (not in Allison's grammar, although it should be there as a star), you can write JMP with the name of the line where we started. That starts to look like a real programming language, and that's exactly what it is: the assembly language for a VM that interprets Tiny Basic. Table 1 presents a partial correlation between grammar, English pseudocode, and IL, both Allison's original in DDJ #1 and my own subsequent TBIL. Listing One shows the resulting IL from DDJ #1, with a few obvious errors corrected. This program in IL will execute a Tiny Basic statement. The operators TST, TSTV, TSTN, and PRS all use a cursor to find characters of the Tiny Basic line; other operations (NXT, XFER) move the cursor so it points to a Tiny Basic line. TBILasm is an assembler written in Tiny Basic. TBILasm and my extensions showing the hexadecimal byte codes are available as part of the Tiny Basic Experimenter's Kit (TBEK.txt) at http://www.IttyBittyComputers.com/IttyBitty/TinyBasic/TBEK.txt.

Just as the particular words of the English-language pseudocode depend on who is saying it and how they happen to think of the problem at that moment, the particular expression of the IL is also idiosyncratic. You can see that in the difference between Allison's IL and mine. Some of it is just spelling differences (to simplify my assembler) and added features. However, there are also some differences in what the words actually do, particularly in startup and line advance, where Allison was initially somewhat vague. No two independently written programs will ever exactly match, even if they perform identically.

The point of this exercise is to express all of the relevant program operation in a concise form. In Tiny Basic, a large part of the program specification is already in a concise grammar, but we needed to add semantics. Other projects would have different requirements.

Pseudocode Trade-Offs

One of my objectives in doing this C Tiny Basic project was to investigate the mechanics for putting a graphical user interface (GUI) on a Windows program. Three years ago, when Apple killed the only commercially viable WYSIWYG operating system that ever existed, I threw away 67,000 lines of active Mac-only code. I can rewrite that program, but I need to get away from proprietary platforms the vendor can arbitrarily discontinue. C/C++ is an ANSI-standard multiplatform programming language, but there is no standard GUI. So I chose to build a virtual GUI, emulated on whatever platform is available, and implemented interpretively (in an IL) like Tiny Basic. Tiny Basic is thus a good testbed for it.

There is a performance cost to interpretation. It depends on the implementation language and the IL design, but typically costs three to 10 machine instructions or lines of source-code overhead on each VM operation, besides the code to do the operation. Very high-level operations such as the TBIL or my GUI emulator spend a lot of time doing each operation, so the overhead is a small penalty. A low-level VM such as Java bytecode, which does much less in each operation, typically runs an order of magnitude slower than native code to do the same job. Because the VM is relatively fixed and widely used, the cost of implementing a just-in-time (JIT) compiler from bytecode to native machine code is justified. Other VMs such as Forth, which has no fixed IL, can reduce the cost by careful IL design: Depending again on the host hardware, a threaded-code (Forth) interpreter costs only one to three overhead machine instructions for each VM operation, giving it performance comparable to compiled code.

Another trade-off to consider is readability (maintainability). Anybody who has studied computer science in the last four decades can read grammars; TBIL, however, is as hard to read as an assembly language but not nearly so widespread. Forth is completely idiosyncratic to the individual programmer and application; it is truly a "write-only" language.

Against that readability fog index we trade off panorama, the ability to see in a single view everything that is going on. Allison's IL was only 110 lines, which could be viewed in less than one column of tiny font on the original PCC tabloid, or in two screenfuls on a modern high-res display. Furthermore, it divides nicely in half, the first half being the statement executor, the rest expressions and other subroutines. My own TBIL, at 230 lines, is correspondingly less readable.

Implementation Options

There are a variety of techniques to choose from in implementing intermediate-level or pseudocode languages. Most of us are familiar with the early Java runtime interpreter and its replacement on all modern platforms by a JIT compiler. Apple went through the same transition when moving its customer base from the Motorola 68000 CPU family to the IBM PowerPC family, and again when dropping the (classic) Mac OS for UNIX. The 68000 instruction set was fully interpreted on the first PowerMacs, at about a 10× performance cost. Several independent developers (including myself) later got substantial performance gains by implementing a binary-to-binary JIT compiler. The "Classic" runtime environment is similarly carried along in OS X to run legacy Mac code, with comparable interpretation penalties.

Allison's TBIL (and mine) was implemented by a direct interpreter, which preserves the original pseudocode. Similarly, the TBIL interpreter itself preserves the original Basic code in memory as it interprets it. Most full Basic interpreters did a simple kind of JIT translation from Basic text to condensed internal tokens. The Whipple and Arnold Tiny Basic effectively did the same by hand-translating the IL to machine code as they wrote it. This loses the panorama of the original IL, but they didn't even use an assembler, choosing rather to code everything in octal absolute, a surprisingly popular Luddite methodology whose proponents have never survived long in the marketplace.

The implementation of Forth bears closer examination because of its heritage in my colleague Drexel Heater's remark. A VM consisting entirely of a library of subroutines called by the high-level problem solution is conceptually not so different from the TBIL approach: Each IL opcode can be thought of as a subroutine (for indeed it is) that is called by the IL code where it is used. Whipple and Arnold replaced those byte codes with the actual subroutine calls, which made their VM more closely resemble my Fortran VMs of 30 years ago. Forth takes that one step further and replaces all the overhead of a normal subroutine call with just the address of the subroutine. In this "threaded code," each successive subroutine call does nothing more than load the next address in sequence into the CPU program counter. The parameters are handled the same way because every operator is threaded code. It still needs to stack and unstack the current IL address entering and leaving nonprimitive subroutines, but the whole interpreter is straightforward.
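Standard C cannot express true threaded code (that requires jumping through addresses with no call/return overhead at all), but the flavor of it can be approximated with an array of function pointers, where the entire inner interpreter is a single statement. A sketch with invented primitives:

```c
/* Approximation of threaded code in portable C: the "program" is an
   array of addresses of primitive routines, and the inner interpreter
   does nothing but call the next address in sequence.  Real Forth
   drops even the call/return; this only sketches the dispatch idea. */
typedef void (*prim)(void);

static int stack[16], *sp;             /* data stack */
static prim const *ip;                 /* threaded "instruction" pointer */

static void lit1(void) { *sp++ = 1; }  /* toy primitives */
static void lit2(void) { *sp++ = 2; }
static void add(void)  { sp--; sp[-1] += sp[0]; }
static void halt(void) { ip = 0; }     /* stop the inner loop */

static int run(prim const *code) {
    sp = stack;
    ip = code;
    while (ip)
        (*ip++)();                     /* the whole inner interpreter */
    return sp[-1];                     /* result is on top of the stack */
}
```

The parameters ride on the data stack exactly as described above; only the dispatch differs from the byte-coded TBIL.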

Interpreter Speed

The direct interpretation technology chosen for Tiny Basic sacrifices up to an order of magnitude in execution speed, but the speed of modern computers reduces the impact. Moreover, the memory bus bandwidth limits in modern computers can completely cancel the speed advantage of native code over interpretation. How can this be? Consider that the Tiny Basic VM is completely defined in 443 bytes of IL code; the same code in handwritten assembly is more than 2K, a factor of 5×. Although Tiny Basic is tiny, the principle generalizes: If the IL interpreter fits into the primary CPU cache along with its IL while the much larger native code does not, the IL code will execute at native CPU speed while the native code thrashes much slower.

Twelve years ago, I implemented a dynamic recompiler to convert legacy 68000 code being interpreted on Apple's PowerMac computers into native PowerPC code. I expected (and got) about 4× speedup—for small programs. Larger programs actually ran slower than the pure interpreter. At first, I suspected my recompiler cache was thrashing, forcing unnecessary recompilation, but careful instrumentation showed that the recompiler was executing less than 10 percent of the total time. A logic analyzer showed that most of the time was spent waiting for the CPU cache to reload from main memory. I added a large L2 cache, but got only minor improvements. Modern CPUs, executing instructions 20 or more times faster than the memory bus, exhibit the problem even more severely. Recall that the cost of vanilla interpretation is 10×. An IL interpreter could actually make your application run faster than native code, depending on actual code sizes. I called this "the RISC penalty" because Reduced Instruction Set Computer (RISC) code is so much less dense than complex instruction sets; high-level ILs are even more dense, exaggerating the effect.

Ignoring the RISC penalty, what else can impact performance? The choice of implementation language is significant. Although "C/C++" is often written as if it named a single language, C and C++ really are not the same. Though compiler vendors have done a lot to mitigate the overhead cost of late-binding methods in support of inheritance and polymorphism, that cost is still significant compared to pure C.

Even in pure C, the cost of calling functions is substantial compared to inline code. If the interpreter is carefully designed as a big switch statement inside a while(true) and no function calls, modern optimizing compilers will give nearly optimal machine speed. In my first attempt at Tiny Basic IL in C, I used autoincrement on byte and int pointers for all memory accesses to get maximum performance. I later replaced that with function calls for compatibility to my previous implementations, which specified number stacks implemented as pairs of bytes in memory. This could still be inlined using C macros to eliminate the function call overhead, but at the cost of larger code.
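The big-switch shape looks like this on a toy byte code (the opcodes here are invented for illustration, not TBIL's):

```c
/* A switch-dispatched byte-code interpreter with no function calls in
   the inner loop.  A good compiler turns the switch into one indexed
   jump, selecting among the snippets in constant time. */
enum { OP_LIT, OP_ADD, OP_SUB, OP_HALT };

static int exec(const unsigned char *pc) {
    int stk[16], *sp = stk;
    for (;;) {
        switch (*pc++) {               /* fetch and dispatch in one step */
        case OP_LIT:  *sp++ = *pc++;          break;  /* push literal */
        case OP_ADD:  sp--; sp[-1] += sp[0];  break;
        case OP_SUB:  sp--; sp[-1] -= sp[0];  break;
        case OP_HALT: return sp[-1];   /* result on top of the stack */
        }
    }
}
```

Because everything lives in one function, the compiler can keep pc and sp in registers across opcodes, which is exactly the optimization that per-opcode function calls defeat.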

A master artisan controls his tools. For a programmer, that means understanding what kind of code the compiler gives you for each statement type. Allison recommended a cascading tree of nested IFs in his "Build Your Own Basic" articles, but his objective was minimal space. I recommend a switch statement for better speed. A good compiler can select any of many individual snippets of code in a single indexed jump in constant time, whereas nested IF trees require log(n) or more individual tests. The Forth interpreter overhead is so small it can be replicated at the end of every operation code, so there isn't even the overhead to jump to a shared interpreter loop.

Following Allison's suggestion, my original TBIL interpreters were implemented in assembly language and optimized for minimal space. In this project, I wrote in ANSI Standard C with a modest goal (not entirely served) of tuning for speed. The C source code and some Tiny Basic programs that run in it are available electronically from DDJ and at http://www.IttyBittyComputers.com/IttyBitty/TinyBasic/.


Listing One


START:  INIT                  ;INITIALIZE 
        NLINE                 ;WRITE CRLF 
        GETLINE               ;GET A LINE 
        TSTL    XEC           ;TEST FOR LINE NUMBER 
        INSERT                ;INSERT IT (MAY BE DELETE) 
        JMP     START 
XEC:    XINIT                 ;INITIALIZE 

STMT:   TST     S1,'LET'      ;IS STATEMENT A LET 
        TSTV    S17           ;YES, PLACE VAR ADDRESS ON AESTK 
        TST     S17,'='       ;(This line originally omitted) 
        CALL    EXPR          ;PLACE EXPR VALUE ON AESTK 
        DONE                  ;REPORT ERROR IF NOT NEXT 
        STORE                 ;STORE RESULT 
        NXT                   ;AND SEQUENCE TO NEXT 
S1:     TST     S3,'GO'       ;GOTO OR GOSUB? 
        TST     S2,'TO'       ;YES...TO, OR...SUB 
        CALL    EXPR          ;GET LABEL 
        DONE                  ;ERROR IF CR NOT NEXT 
        XFER                  ;SET UP AND JUMP 
S2:     TST     S17,'SUB'     ;ERROR IF NO MATCH 
        CALL    EXPR          ;GET DESTINATION 
        DONE                  ;ERROR IF CR NOT NEXT 
        SAV                   ;SAVE RETURN LINE 
        XFER                  ;AND JUMP 
S3:     TST     S8,'PRINT'    ;PRINT 
S4:     TST     S7,'"'        ;TEST FOR QUOTE 
        PRS                   ;PRINT STRING 
S5:     TST     S6,','        ;IS THERE MORE? 
        SPC                   ;SPACE TO NEXT ZONE 
        JMP     S4            ;YES JUMP BACK 
S6:     DONE                  ;ERROR IF CR NOT NEXT 
        NLINE                 ;THEN NEW LINE 
        NXT                   ;SEQUENCE TO NEXT 
S7:     CALL    EXPR 
        PRN                   ;PRINT IT 
        JMP     S5            ;IS THERE MORE? 
S8:     TST     S9,'IF'       ;IF STATEMENT 
        CALL    EXPR          ;GET EXPRESSION 
        CALL    RELOP         ;DETERMINE OPR AND PUT ON STK 
        CALL    EXPR          ;GET EXPRESSION 
        TST     S17,'THEN'    ;(This line originally omitted) 
        CMPR                  ;PERFORM COMPARISON, NXT IF FALSE 
        JMP     STMT 
S9:     TST     S12,'INPUT'   ;INPUT STATEMENT 
S10:    CALL    VAR           ;GET VAR ADDRESS 
        INNUM                 ;MOVE NUMBER FROM TTY TO AESTK 
        STORE                 ;STORE IT 
        TST     S11,','       ;IS THERE MORE? 
        JMP     S10           ;YES 
S11:    DONE                  ;MUST BE CR 
        NXT                   ;SEQUENCE TO NEXT 
S12:    TST     S13,'RETURN'  ;RETURN STATEMENT 
        DONE                  ;MUST BE CR 
        RSTR                  ;RESTORE LINE NUMBER OF CALL 
        NXT                   ;SEQUENCE TO NEXT STATEMENT 
S13:    TST     S14,'END'     ;END COMMAND 
        FIN 
S14:    TST     S15,'LIST'    ;LIST COMMAND 
        DONE                  ;MUST BE CR 
        LST                   ;LIST THE PROGRAM 
        NXT 
S15:    TST     S16,'RUN'     ;RUN COMMAND 
        DONE                  ;MUST BE CR 
        NXT 
S16:    TST     S17,'CLEAR'   ;CLEAR COMMAND 
        JMP     START 

S17:    ERR                   ;SYNTAX ERROR 

EXPR:   TST     E0,'-'        ;TEST FOR UNARY -. 
        CALL    TERM          ;GET VALUE 
        NEG                   ;NEGATE IT 
        JMP     E1 
E0:     TST     E1A,'+'       ;TEST FOR UNARY + 
E1A:    CALL    TERM          ;LEADING TERM 
E1:     TST     E2,'+'        ;LOOK FOR MORE 
        CALL    TERM          ;SUM TERM 
        ADD 
        JMP     E1 
E2:     TST     E3,'-'        ;ANY MORE? 
        CALL    TERM          ;DIFFERENCE TERM 
        SUB 
        JMP     E1 
E3: 
T2:     RTN 

TERM:   CALL    FACT 
T0:     TST     T1,'*' 
        CALL    FACT          ;PRODUCT FACTOR. 
        MUL 
        JMP     T0 
T1:     TST     T2,'/' 
        CALL    FACT          ;QUOTIENT FACTOR. 
        DIV 
        JMP     T0 

FACT:   TSTV    F0            ;VARIABLE? 
        IND                   ;YES, GET THE VALUE. 
        RTN 
F0:     TSTN    F1            ;NUMBER, GET ITS VALUE. 
        RTN 
F1:     TST     F2,'('        ;PARENTHESIZED EXPR. 
        CALL    EXPR 
        TST     F2,')' 
        RTN 
F2:     ERR                   ;ERROR. 

RELOP:  TST     R0,'=' 
        LIT     0             ;= 
        RTN 
R0:     TST     R4,'<' 
        TST     R1,'=' 
        LIT     2             ;<= 
        RTN 
R1:     TST     R3,'>' 
        LIT     3             ;<> 
        RTN 
R3:     LIT     1             ;< 
        RTN 
R4:     TST     S17,'>' 
        TST     R5,'=' 
        LIT     5             ;>= 
        RTN 
R5:     TST     R6,'<' 
        LIT     3             ;>< 
        RTN                   ;(This line originally omitted) 
R6:     LIT     4             ;> 
        RTN 