
The Small Scripting Language


Oct99: The Small Scripting Language

Thiadmer writes multimedia software and animation toolkits for ITB CompuPhase in the Netherlands. He can be contacted at [email protected].


Many years ago, I retyped the Small-C compiler from Dr. Dobb's Journal. (See "Putting C on a Microcomputer: The Original Small-C," by Ron Cain, and "The Small-C Compiler," by J.E. Hendrix, on Dr. Dobb's Small-C Resource CD-ROM; http://www.ddj.com/store/.) Having just grasped the basics of the C language, I found working on the Small-C compiler to be a learning experience of its own. The compiler, as published, generated code for an 8080 assembler. The first modification I needed to make was to adapt it to the 8086. Over the years, I used it to write low-level system software, extending the compiler with new features and fixing many details. Eventually, as I moved toward bigger applications in more conventional environments, I replaced the Small-C compiler with mainstream development environments.

In early 1998, I was looking for a scripting language for an animation toolkit. Among the languages that I evaluated were Lua (see "Lua: An Extensible Embedded Language," by Luiz Henrique de Figueiredo, Roberto Ierusalimschy, and Waldemar Celes, DDJ, December 1996), BOB (see "Your Own Tiny Object-Oriented Language," by David Betz, DDJ, September 1991), Scheme, REXX, Java, ScriptEase, and Forth. None of these languages met my requirements completely. Experimenting with Al Stevens's Quincy C interpreter (see "C Programming," by Al Stevens, DDJ, May-December 1994) brought me to the idea that a simplified C might be a good fit. I dusted off Small-C. The result is "Small," a simple, typeless, 32-bit extension language with a C-like syntax. The Small compiler outputs P-code (or bytecode) that subsequently runs on an abstract machine. Execution speed, stability, simplicity, and a small footprint were essential design criteria for both the language and the abstract machine.

Small is not a proper subset of C; that is, Small programs are not necessarily compilable with a C compiler. In other words, "Small" is Small-C without the "C." The complete Small toolkit and example programs are available at http://www.compuphase.com/small.htm and from DDJ (see "Resource Center," page 5).

The Language

Small is a descendant of the original Small-C, which in turn is a subset of C. The fundamental changes I made were the removal of the primitive type system and the substitution of pointers by references. Many of the other modifications are a direct or indirect consequence of these changes. To get a feel for the language, look at Listings Two (the Sieve of Eratosthenes) and Three (the day of the week, using Zeller's congruence algorithm).

The variable type system has a useful side effect: type checking. The type of a variable or constant is usually also an indication of its purpose. By not having types, Small would lack an automatic way to catch a whole class of common programming errors. That is why Small introduces variable tagnames that denote the purpose, or usage, of a variable, but without describing the memory layout of the data object. The Small compiler checks the tagnames of parameters passed to functions, and of the operands on both sides of a binary operator.

A language without pointers needs the ability to pass function arguments by reference. Without it, creating a function that reorders or sorts data would require a few clumsy workarounds, probably involving global variables. Small supports pass-by-reference arguments with a syntax similar to that of C++. Arrays are always passed by reference.

C language functions can pass output values via pointer arguments. The standard function scanf(), for example, stores the values or strings that it reads from the console into its arguments. You can design a function in C so that it optionally returns a value through a pointer argument. If the caller of the function does not care for the return value, it passes NULL as the pointer value. The standard function strtol() is an example of a function that does this. This technique frequently saves you from declaring and passing dummy variables. Small replaces pointers with references, but references cannot be NULL. Thus, Small needed a different technique to drop the values that a function returns via references. Its solution is the use of an argument placeholder that is written as an underscore character ("_"). Prolog programmers will recognize it as a similar feature in that language. The argument placeholder reserves a temporary anonymous data object (called a "cell" in Small) that is automatically destroyed after the function call.
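The C idiom described above can be sketched as follows. This is only an illustration of the optional-output-pointer convention that Small's argument placeholder replaces; divide() and parse_decimal() are hypothetical helpers made up for this example, while strtol() is the standard library function mentioned in the text.

```c
#include <stdlib.h>

/* Hypothetical helper: returns the quotient and, only when the caller
 * asks for it, stores the remainder through `remainder`. */
int divide(int numerator, int denominator, int *remainder)
{
    if (remainder != NULL)
        *remainder = numerator % denominator;
    return numerator / denominator;
}

/* strtol() follows the same convention: passing NULL as the second
 * argument means "I do not need the position where parsing stopped". */
long parse_decimal(const char *text)
{
    return strtol(text, NULL, 10);
}
```

A caller that wants both results writes divide(17, 5, &r); a caller that only wants the quotient writes divide(17, 5, NULL) and does not need a dummy variable. In Small, the underscore placeholder plays the role that NULL plays here.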

The temporary cell for the argument placeholder should still have a value. Therefore, a function must specify what value each passed-by-reference argument will have upon entry when the caller passes the placeholder instead of an actual argument. By extension, I also added default values for arguments that are pass-by-value. The feature to optionally remove all arguments with default values from the right was copied from C++.

When speaking of BCPL and B, Dennis Ritchie said that C was invented in part to provide a plausible way of dealing with character strings when one begins with a word-oriented language (see "The C Programming Language," by D.M. Ritchie, S.C. Johnson, M.E. Lesk, and B.W. Kernighan, DDJ, May 1980). Small provides two options for working with strings: packed and unpacked. In an unpacked string, every character occupies a cell of its own. The overhead for a typical 32-bit implementation is large: one character takes 4 bytes. Packed strings store up to four characters in one cell, at the cost of being significantly more difficult to handle. Modern BCPL implementations provide two array indexing methods: one to get a word from an array and one to get a character from an array. Small copies this concept, although the syntax differs from that of BCPL. The packed string feature also led to the new operator char. Example 1 shows how to access cells and characters.
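To make the two layouts concrete, here is a rough C sketch of cell indexing versus character indexing. It is not taken from the Small toolkit: the function names are invented for this illustration, a cell is assumed to be 32 bits wide with 8-bit characters, and the first character of a packed string is assumed to sit in the most significant byte of its cell.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t cell;              /* assumption: a 32-bit cell */

/* Unpacked access: one character per cell, so the index selects a cell. */
cell get_cell(const cell *s, size_t index)
{
    return s[index];
}

/* Packed access: four 8-bit characters per cell. Assumes the first
 * character occupies the most significant byte of the cell. */
unsigned char get_packed_char(const cell *s, size_t index)
{
    cell word = s[index / 4];
    unsigned shift = 8 * (3 - (unsigned)(index % 4));
    return (unsigned char)((word >> shift) & 0xFF);
}

/* Packs a C string, including its terminating zero, into `dest`, which
 * must have room for strlen(src) / 4 + 1 cells.
 * Returns the number of cells written. */
size_t pack_string(const char *src, cell *dest)
{
    size_t i = 0;
    for (;;) {
        unsigned char c = (unsigned char)src[i];
        if (i % 4 == 0)
            dest[i / 4] = 0;        /* start a fresh cell */
        dest[i / 4] |= (cell)c << (8 * (3 - i % 4));
        if (c == '\0')
            return i / 4 + 1;
        i++;
    }
}
```

Under these assumptions the five-character string "Small" packs into two cells, where the unpacked form needs six (five characters plus the terminator).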

Unicode applications often have to deal with two character sets: 8-bit ASCII for legacy file formats and standardized transfer formats (like many of the Internet protocols), and the 16-bit Unicode character set. Although the Small compiler has an option that makes characters 16-bit (so only two characters fit in a 32-bit cell), a more convenient approach may be to store 8-bit character strings in packed strings and 16-bit (Unicode) strings in unpacked strings. This turns a weakness in Small, the need to distinguish packed strings from unpacked strings, into a strength. Small can make the distinction quite easily, because of the way that packed characters are aligned in a cell. Example 2 is a function that distinguishes a packed string from an unpacked string and a function that determines the length of both packed and unpacked strings.
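The alignment trick can be sketched in C. This is a hedged illustration, not the Small code of Example 2: is_packed() and string_length() are names chosen here, and the sketch assumes 32-bit cells, 8-bit packed characters stored from the most significant byte down, and unpacked characters of at most 16 bits. Under those assumptions the top byte of the first cell is nonzero only for a (non-empty) packed string.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t cell;              /* assumption: a 32-bit cell */

/* A non-empty packed string has its first character in the top byte of
 * the first cell; an unpacked character (at most 16 bits wide) leaves
 * that byte zero. An empty string looks unpacked, which is harmless. */
int is_packed(const cell *s)
{
    return (s[0] >> 24) != 0;
}

/* Length in characters, for either representation. */
size_t string_length(const cell *s)
{
    size_t len = 0;
    if (is_packed(s)) {
        /* walk the bytes of each cell, most significant byte first */
        while (((s[len / 4] >> (8 * (3 - len % 4))) & 0xFF) != 0)
            len++;
    } else {
        /* one character per cell */
        while (s[len] != 0)
            len++;
    }
    return len;
}
```

With this layout a native function that receives a string does not need a separate flag to tell it which representation it was handed.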

Small supports named parameters in addition to the more common positional parameters. Argument names are often easier to recall than argument positions, especially if the argument list of a function is long. Named parameters are also more convenient to use than positional parameters if many parameters have default values; contrast Example 3(a) with 3(b), for instance.

In general, I have tried to keep Small close to C. For example, Small has the same operator set as C (with the exception of a few operators that deal with structures and unions). It is generally agreed that some of C's operators have counterintuitive precedence. In an expression parser that I wrote for the interactive multimedia development system EGO (which has an equally large set of operators), I have had favorable experiences with a different organization of operators in their precedence levels. It would have been a simple step to adapt Small to the operator set and the precedence levels that I prefer. For the sake of similarity with C, I resisted such a change.

Minor differences between Small and C are:

  • When the body of a function is a single instruction, the braces (for a compound instruction) are optional; see Example 4(a), the ubiquitous "hello world" program.

  • Escape characters are called "control characters" in Small, and they start with a caret ("^") rather than a backslash ("\"). I prefer the caret to the backslash, because path names in DOS and Windows use backslashes. Small provides a #pragma to change the control character back into a backslash.

  • Variables have no type. To declare a new variable, you use the keyword new; see Example 4(b). Variables may be declared anywhere in the function; they need not precede any instruction in the block. The first expression of a for statement may also hold a variable declaration.

  • Arrays can be filled with an incrementing or decrementing sequence using so-called "progressive initializers"; see Example 5.

  • Functions with a variable number of arguments are also supported (these arguments are always passed by reference).

  • Direct support for assertions. Small also has an "assertion directive" to flag compile-time errors.

  • The cases in a switch statement do not fall through to the next case.

  • There is no preprocessor. Conditional compilation is supported, but #define can only declare simple numeric constants.

  • char is an operator, not a type.

  • The empty statement is an empty compound block ("{}"), not a semicolon.
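As a rough illustration of the progressive initializers in the list above, the following C function mimics what such an initializer could expand to. This is a guess at the semantics, not the compiler's implementation: it assumes that the listed values are copied first and that the ellipsis then continues with the difference between the last two listed values (or repeats the value when only one is listed, as in the `{ true, ... }` of Listing Two). The name progressive_fill() is made up for this sketch.

```c
#include <stddef.h>

typedef long cell;      /* stand-in for a Small cell */

/* Fills arr[0..size-1]: copies the `count` listed values, then continues
 * the sequence with the step between the last two listed values.
 * With a single listed value the step is 0, so the value repeats. */
void progressive_fill(cell *arr, size_t size,
                      const cell *listed, size_t count)
{
    size_t i;
    cell step = (count >= 2) ? listed[count - 1] - listed[count - 2] : 0;

    for (i = 0; i < size; i++) {
        if (i < count)
            arr[i] = listed[i];             /* explicitly listed value */
        else if (i == 0)
            arr[i] = 0;                     /* nothing listed at all */
        else
            arr[i] = arr[i - 1] + step;     /* continue the progression */
    }
}
```

Under this reading, listing 10 and 8 for a five-element array yields 10, 8, 6, 4, 2, and listing a single 1 yields five ones.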

Interfacing with C Programs

The required functions for the abstract machine are all gathered in a single C file (assuming that you use the ANSI C version of the abstract machine), but there are two additional files for core functions for Small programs and a basic set of console I/O functions. The abstract machine itself is a data structure. That is, you can create two or more abstract machines by declaring more variables of the AMX type.

The abstract machine has no function to actually read a file from disk, but once you obtain a memory image of a compiled Small file in memory, you use it to initialize the abstract machine. At this stage, all native functions are also registered to the abstract machine. Function amx_Register returns an error code if it finds one or more functions in the compiled Small program that it cannot resolve from the list of native functions. You may have several lists of native functions, so you can continue to call amx_Register until it returns a success flag. If you have several abstract machines, you must register the native functions for every abstract machine separately.

The third step is to run the program by calling amx_Exec. You can start running from the main function or from any function in the Small program that is declared public.

Listing One is a run-time program that loads and runs a compiled Small program from the command line. The Small compiler optionally inserts symbolic information and line number information in the compiled file. The abstract machine provides a function to browse through all global symbols, as well as a debug callback function that is called on every event that might be of interest to a debugger (such as a function call or return, or the start of a new instruction). By installing a callback routine, an application can provide source-level debugging of the Small programs with relative ease.

The Abstract Machine

It appears to be something of a tradition to design an abstract machine as a stack machine. All of the abstract machine implementations that I studied were stack machines, from the B language in 1972 to Java today. With Small, I decided to deviate from this path, because a stack machine cannot take advantage of processor registers in the same way as an abstract machine with pseudoregisters.

The abstract machine mimics a dual-register processor. In addition to the two general-purpose registers, it has a few internal registers; see Table 1. Notably missing from the register set is a flags register. The abstract machine keeps no separate set of flags; instead, all conditional branches are taken depending on the contents of the PRI register.

Every instruction consists of an opcode followed by 0 or 1 parameters. Each opcode is 1 byte in size; an instruction parameter has the size of a cell (usually 4 bytes).

Most instructions have implied registers as operands. This reduces the number of operands and the amount of time needed to decode an instruction.

In several cases, the implied register is part of the name of the opcode. For example, PUSH.pri is the name of the opcode that stores the PRI register on the stack. This instruction has no parameters; its operand (PRI) is implied in the opcode name. See the Small manual for a list of all opcodes and their semantics.
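The instruction format and the flag-less branching can be illustrated with a toy interpreter in C. This is a sketch under stated assumptions, not the real abstract machine: the opcode names and numbers below are invented, the second register is simply called alt here, and only the properties described above are modeled (1-byte opcodes, an optional cell-sized parameter, implied register operands, and branches that test PRI instead of a flags register).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int32_t cell;

/* Hypothetical opcodes; the real instruction set is in the Small manual. */
enum { OP_HALT, OP_CONST_PRI, OP_CONST_ALT, OP_ADD_ALT,
       OP_DEC_PRI, OP_JNZ, OP_MOVE_PRI };

/* Appends a 1-byte opcode and, optionally, one cell-sized parameter. */
static size_t emit(unsigned char *code, size_t at, unsigned char op,
                   int has_param, cell param)
{
    code[at++] = op;
    if (has_param) {
        memcpy(code + at, &param, sizeof param);
        at += sizeof param;
    }
    return at;
}

/* Reads the cell-sized parameter that follows an opcode. */
static cell fetch_param(const unsigned char *code, size_t *cip)
{
    cell value;
    memcpy(&value, code + *cip, sizeof value);
    *cip += sizeof value;
    return value;
}

/* Switch-based interpreter with two registers and no flags register. */
cell run(const unsigned char *code)
{
    cell pri = 0, alt = 0;
    size_t cip = 0;                 /* code instruction pointer */

    for (;;) {
        switch (code[cip++]) {
        case OP_CONST_PRI: pri = fetch_param(code, &cip); break;
        case OP_CONST_ALT: alt = fetch_param(code, &cip); break;
        case OP_ADD_ALT:   alt += pri; break;       /* operands implied */
        case OP_DEC_PRI:   pri--; break;
        case OP_MOVE_PRI:  pri = alt; break;
        case OP_JNZ: {                  /* branch decided by PRI alone */
            cell target = fetch_param(code, &cip);
            if (pri != 0)
                cip = (size_t)target;
            break;
        }
        default:           return pri;              /* OP_HALT */
        }
    }
}

/* Assembles and runs a loop that adds n + (n-1) + ... + 1, for n >= 1. */
cell sum_to(cell n)
{
    unsigned char code[32];
    size_t at = 0, loop;

    at = emit(code, at, OP_CONST_PRI, 1, n);
    at = emit(code, at, OP_CONST_ALT, 1, 0);
    loop = at;
    at = emit(code, at, OP_ADD_ALT, 0, 0);
    at = emit(code, at, OP_DEC_PRI, 0, 0);
    at = emit(code, at, OP_JNZ, 1, (cell)loop);
    at = emit(code, at, OP_MOVE_PRI, 0, 0);
    at = emit(code, at, OP_HALT, 0, 0);
    (void)at;
    return run(code);
}
```

Because the loop counter lives in PRI, the conditional branch needs no separate compare instruction: decrementing PRI and branching on its contents is the whole loop test.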

Threading

In an indirect threaded interpreter, each opcode is an index in a table that contains a jump address for every instruction. A threaded abstract machine is conventionally written in assembler, because most high-level languages cannot store label addresses in an array. The GNU C compiler (GCC), however, extends the C language with a unary && operator that returns the address of a label. This address can be stored in a variable of type void * and may be used later in a goto instruction. Basically, the following snippet does the same as goto start;:

void *ptr = &&start;

goto *ptr;

The ANSI C version of the abstract machine uses a large switch statement to choose the correct instructions for every opcode. The GNU C version of the abstract machine runs twice as fast as the ANSI C version. Fortunately, GNU C runs on quite a few platforms. This means that the fast GNU C version is still fairly portable.
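A minimal sketch of table-driven dispatch with GCC's label addresses follows. It is a toy, not the abstract machine from the toolkit: the five opcodes are invented for the example, opcodes and parameters are both stored as int for brevity, and the code builds only with compilers that implement the "labels as values" extension (GCC, and also Clang).

```c
/* Hypothetical opcodes; each one is an index into the jump table. */
enum { OP_HALT, OP_CONST_PRI, OP_ADD_ALT, OP_DEC_PRI, OP_JNZ };

int run_threaded(const int *code)
{
    /* jump table: one label address per opcode (GNU C extension) */
    static void *const dispatch[] = {
        &&op_halt, &&op_const_pri, &&op_add_alt, &&op_dec_pri, &&op_jnz
    };
    const int *cip = code;          /* code instruction pointer */
    int pri = 0, alt = 0;

/* fetch the next opcode and jump straight to its handler */
#define NEXT() goto *dispatch[*cip++]
    NEXT();

op_const_pri:
    pri = *cip++;                   /* parameter follows the opcode */
    NEXT();
op_add_alt:
    alt += pri;
    NEXT();
op_dec_pri:
    pri--;
    NEXT();
op_jnz:
    {
        int target = *cip++;        /* index into code[] */
        if (pri != 0)
            cip = code + target;
    }
    NEXT();
op_halt:
    return alt;
#undef NEXT
}

/* Runs a small loop that adds n + (n-1) + ... + 1, for n >= 1. */
int threaded_sum(int n)
{
    int code[] = { OP_CONST_PRI, 0, OP_ADD_ALT, OP_DEC_PRI,
                   OP_JNZ, 2, OP_HALT };
    code[1] = n;
    return run_threaded(code);
}
```

Each handler ends with its own indirect jump, so there is no central loop and no switch to return to after every instruction. That is the difference from the switch-based ANSI C version described above.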

Two benchmark programs (the Sieve of Eratosthenes and the calculation of Fibonacci numbers via recursion) indicate that the ANSI C version of Small's abstract machine is about as fast as the Java Virtual Machine (JVM) 1.0. This should be seen as an order-of-magnitude measure; the Java and Small languages and their goals are different enough to make any comparison questionable.

Compiling the Tools

The Small compiler and the abstract machine are written in ANSI C as much as possible. I have compiled the sources with 16-bit and 32-bit compilers of different brands. There are several compile options that you can set to adjust the compiler and the abstract machine to your platform or your preferences; see Table 2.

Conclusion

Again, the Small toolkit, including documentation in PostScript format, source of the compiler and the abstract machine, and example programs, is available at http://www.compuphase.com/small.htm. Updates and additional notes will also be posted at that location as they become available.

DDJ

Listing One

#include <stdio.h>
#include <stdlib.h>
#include "amx.h"

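void *loadprogram(AMX *amx,char *filename)
{
  FILE *fp;
  AMX_HEADER hdr;
  void *program = NULL;

  if ((fp = fopen(filename,"rb")) == NULL)
    return NULL;
  /* read the header first, allocate hdr.stp bytes, then read the
   * hdr.size bytes of the compiled file into that block */
  if (fread(&hdr, sizeof hdr, 1, fp) == 1
      && (program = malloc((int)hdr.stp)) != NULL) {
    rewind(fp);
    if (fread(program, 1, (int)hdr.size, fp) != (size_t)hdr.size) {
      free(program);            /* short read: not a complete file */
      program = NULL;
    } /* if */
  } /* if */
  fclose(fp);                   /* close the file on every path */
  if (program != NULL && amx_Init(amx,program,NULL) != AMX_ERR_NONE) {
    free(program);
    program = NULL;
  } /* if */
  return program;
}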

int main(int argc,char *argv[])
{
extern AMX_NATIVE_INFO core_Natives[];
extern AMX_NATIVE_INFO console_Natives[];

  AMX amx;
  cell ret;
  int err;
  void *program;

  if (argc != 2 || (program = loadprogram(&amx,argv[1])) == NULL) {
    printf("Usage: SRUN <filename>\n\n"
           "The filename must include the extension\n");
    return 1;
  } /* if */

  core_Init();
  amx_Register(&amx, core_Natives, -1);
  err = amx_Register(&amx, console_Natives, -1);

  if (err == AMX_ERR_NONE)
    err = amx_Exec(&amx, &ret, AMX_EXEC_MAIN, 0);

  if (err != AMX_ERR_NONE)
    printf("Run time error %d on line %ld\n", err, amx.curline);
  else if (ret != 0)
    printf("%s returns %ld\n", argv[1], (long)ret);

  free(program);
  core_Exit();

  return 0;
}


Listing Two

/* Print all primes below 100, using the
 * "Sieve of Eratosthenes" algorithm */
#include <console>
main()
    {
    const max_primes = 100;
    new series[max_primes] = { true, ... };

    for (new i = 2; i < max_primes; ++i)
        if (series[i])
            {
            printf("%d ", i);
            /* filter all multiples of this "prime" from the list */
            for (new j = 2 * i; j < max_primes; j += i)
                series[j] = false;
            }
    }


Listing Three

/* illustration of Zeller's congruence algorithm to
 * calculate the day of the week given a date */
#include <console>

weekday(day, month, year)
    {
    if (month <= 2)
        month += 12, --year;
    new j = year % 100;
    new e = year / 100;
    return (day + (month+1)*26/10 + j + j/4 + e/4 - 2*e) % 7;
    }
readdate(&day, &month, &year)
    {
    print("Give a date (dd-mm-yyyy): ");
    day = getvalue(_,'-','/');
    month = getvalue(_,'-','/');
    year = getvalue();
    }
main()
    {
    new day, month, year;
    readdate(day, month, year);

    new wkday = weekday(day, month, year);
    printf("The date %d-%d-%d falls on a ", day, month, year);
    switch (wkday)
        {
        case 0:
            print("Saturday");
        case 1:
            print("Sunday");
        case 2:
            print("Monday");
        case 3:
            print("Tuesday");
        case 4:
            print("Wednesday");
        case 5:
            print("Thursday");
        case 6:
            print("Friday");
        }
    print("^n");
    }
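For readers who want to check the arithmetic of Listing Three outside Small, here is a C rendition of weekday(). It is a sketch with one deliberate difference: in C, the % operator can yield a negative remainder when the left operand is negative (the expression evaluates to -24 for 1 March 2000), so the result is normalized into the range 0..6. Whether Small's % needs the same guard is not stated in this article. The helper weekday_name() is added for illustration only.

```c
/* Zeller's congruence: 0 = Saturday, 1 = Sunday, ..., 6 = Friday */
int weekday(int day, int month, int year)
{
    int j, e, h;

    if (month <= 2) {       /* January and February count as months */
        month += 12;        /* 13 and 14 of the previous year */
        --year;
    }
    j = year % 100;         /* year within the century */
    e = year / 100;         /* century */
    h = (day + (month + 1) * 26 / 10 + j + j / 4 + e / 4 - 2 * e) % 7;
    return (h + 7) % 7;     /* C may give a negative remainder */
}

/* Hypothetical helper: maps the result to a name, using the same
 * numbering as the switch statement in Listing Three. */
const char *weekday_name(int wkday)
{
    static const char *const names[] = {
        "Saturday", "Sunday", "Monday", "Tuesday",
        "Wednesday", "Thursday", "Friday"
    };
    return names[wkday];
}
```

For example, weekday(1, 10, 1999) evaluates to 118 % 7 = 6, a Friday, and weekday(1, 1, 2000) evaluates to 126 % 7 = 0, a Saturday.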






Copyright © 1999, Dr. Dobb's Journal
