Dr. Dobb's | Extensibility in Tcl

Extensibility in Tcl

One of the major reasons the Tcl scripting language has been widely adopted is its extensibility. Tcl's creator describes the design decisions he made to ensure this quality.

June 01, 1999
URL:http://www.drdobbs.com/open-source/extensibility-in-tcl/184410965

John is CEO of Scriptics Corp., and creator of the Tcl scripting language and the Tk toolkit. He can be reached at ouster@scriptics.com.

Most programming languages are designed to be self-contained worlds. As a programmer, you choose a language, then do all your programming in that one language. It's often hard to make code written in one language work well with code in another language, so picking a particular language may prevent you from using other languages.

The Tcl scripting language has a different design philosophy. Instead of containing everything you need, Tcl was designed as an integration language to tie together pieces of code written in other languages. Tcl works well with almost any imaginable language or application, and most of the interesting functions you use in a Tcl script are implemented outside of Tcl.

Tcl's flavor comes in large part from the fact that it is extensible. It was designed from the start to make it as easy as possible to add to Tcl's built-in features by writing code in C or other languages. As a result, Tcl has been used in thousands of different situations to automate tasks or integrate disparate resources. In this article, I'll focus on how extensibility works in Tcl.

Why Extensibility?

Tcl is used in two common ways, both of which require extensibility.

As an embedded command language. This was my original motivation when I created Tcl. The idea was to build the Tcl interpreter as a library package that could be linked into an application as its command language, as shown in Figure 1. Tcl provides generic facilities that any command language needs, including variables, control structures (such as if and while), procedures, and string manipulation. Each application then adds its own features into the Tcl language as extensions, creating a powerful command language that can be used to automate and extend the application with Tcl scripts. I wanted the same base language to be usable for almost any application, so Tcl had to support as broad a variety of extensions as possible. Furthermore, extensions needed to behave naturally, as if they had been designed into Tcl from the beginning: There shouldn't be obvious differences between extensions and built-in facilities.
As a platform for integration applications. I did not foresee this usage when I created Tcl, but it has become the most common way of using Tcl today. When used for integration, Tcl is a stand-alone platform rather than a piece of another application. The extension mechanism connects Tcl to resources being managed, such as applications, databases, news feeds, devices, or the Web (see Figure 2). Tcl scripts can then be used to coordinate all the resources and build new functionality on top of their base features. The integration task can be as simple as connecting an application to its user via a graphical user interface, or as complex as the control system for an oil-well platform, which manages hundreds of devices and applications. For any language to be good for integration, it must connect to a huge variety of other resources; Tcl's extension mechanism allows this.

The bottom line is that extensibility gives tremendous power to a scripting language. Extensibility makes it possible for Tcl to connect to resources and automate functions that were previously manual. In addition, extensibility lets Tcl connect to multiple disparate resources and integrate them to operate in a coordinated fashion.

Tcl Architecture

When designing Tcl, I developed the C APIs for extension at the same time as the language itself, and made deliberate tradeoffs in the design of the language to simplify and empower the extension mechanism. This resulted in an unusual design process. The goals that influenced Tcl's architecture include:

The core Tcl language should have as little structure and flavor as possible. Structure implies limitations, so a more structured language limits the kinds of things that extensions can do. Similarly, if a language has a strong flavor (such as complicated or restricted syntax), it will clash with extensions that need a different flavor. I wanted Tcl to take on the flavor of whatever extensions it is used with.
The language should be extensible in as many ways as possible. It should be easy to add not only new commands, but also new data types and even new control structures.
The extension mechanism should be as simple as possible.
Extensions should have access to all elements of the internal state of an interpreter, such as variables.
Data and code should be represented inside Tcl in a way that can easily be passed back and forth to extensions written in C. This, and the desire for as little structure as possible, led to the use of strings for almost everything.
The facilities of the core Tcl language should be implemented using the same mechanisms as extensions. The set of things that can only be done inside the Tcl core should be as small as possible.

Given these goals, I decided that interpreting a Tcl script should be a two-phase process. In the first phase, the Tcl interpreter parses a section of code, identifies an extension to execute it, and passes control to the extension. In the second phase, the extension executes the code. Control then returns to the Tcl interpreter to parse the next section of code. Ideally, the Tcl interpreter should understand only the bare minimum needed to parse some code and pass control to an extension. Everything else in the interpretation of the script should be left to the extension; this gives maximum power and flexibility to extensions.

Inspired by UNIX shells such as sh, I decided on a language syntax based on commands and words. A Tcl script consists of one or more commands, and each command consists of one or more words. For example, the command set a 43 sets the value of variable a to 43. It has three words: set, a, and 43. The interpreter parses the command and breaks it into words. It then uses the first word (set) as the name of the command, locates a C command procedure to execute the command, and invokes the command procedure, passing it all of the words as arguments. Some command procedures, such as the one for set, are part of the Tcl interpreter; these are called "built-in commands." Other command procedures are part of extensions. There is no difference between a built-in command and an extension except that the command procedures for built-in commands are part of the Tcl interpreter, so they are available in every Tcl application.
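The parse-then-dispatch cycle described above can be sketched in a few lines of plain C. This is a toy model, not Tcl's implementation: MiniEval, MiniCmdProc, and the two sample commands are hypothetical names invented for illustration. Words are split on whitespace only, and no substitutions are performed.

```c
/* Toy model of command dispatch; every name here is hypothetical. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MINI_OK 0
#define MINI_ERROR 1
#define MAX_WORDS 16

/* A command procedure receives every word of the command, name included. */
typedef int (*MiniCmdProc)(int argc, char *argv[], char *result, size_t size);

/* "add1 N": result is N+1 (N is not validated in this sketch). */
int MiniAdd1Cmd(int argc, char *argv[], char *result, size_t size) {
    if (argc != 2) {
        snprintf(result, size, "wrong number of arguments");
        return MINI_ERROR;
    }
    snprintf(result, size, "%ld", strtol(argv[1], NULL, 10) + 1);
    return MINI_OK;
}

/* "count w1 w2 ...": result is the number of words after the name. */
int MiniCountCmd(int argc, char *argv[], char *result, size_t size) {
    (void) argv;
    snprintf(result, size, "%d", argc - 1);
    return MINI_OK;
}

/* Registration table: command name -> command procedure. */
static const struct {
    const char *name;
    MiniCmdProc proc;
} commands[] = {
    {"add1", MiniAdd1Cmd},
    {"count", MiniCountCmd},
};

int MiniEval(const char *command, char *result, size_t size) {
    char buffer[256];
    char *argv[MAX_WORDS];
    char *word;
    int argc = 0;
    size_t i;

    assert(command != NULL && result != NULL && size > 0);
    if (strlen(command) >= sizeof buffer) {
        snprintf(result, size, "command too long");
        return MINI_ERROR;
    }
    strcpy(buffer, command);

    /* Phase 1: break the command into words (extra words are dropped). */
    for (word = strtok(buffer, " \t"); word != NULL && argc < MAX_WORDS;
            word = strtok(NULL, " \t")) {
        argv[argc++] = word;
    }
    if (argc == 0) {
        result[0] = '\0';
        return MINI_OK;
    }

    /* Phase 2: look up the first word and pass all words to its procedure. */
    for (i = 0; i < sizeof commands / sizeof commands[0]; i++) {
        if (strcmp(argv[0], commands[i].name) == 0) {
            return commands[i].proc(argc, argv, result, size);
        }
    }
    snprintf(result, size, "invalid command name \"%s\"", argv[0]);
    return MINI_ERROR;
}
```

The evaluator itself knows nothing about what add1 or count mean; adding a command means adding one row to the table, which mirrors the point that built-in commands and extensions are dispatched the same way.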

In addition to breaking up commands into words, the Tcl interpreter performs a few other string manipulations before passing the words to a command procedure. Listing One, which illustrates most of these features, contains five commands separated by newlines. In the second command, the $ invokes variable substitution: The letters after the $ are taken as the name of a variable, and the value of the variable is substituted into the command in place of the variable name. Thus the command procedure receives 43 as its third word, not $a, and variable b is assigned that value.

The [] construct in the third command invokes command substitution: Everything between the brackets is processed as a separate command and the result is substituted into the outer command. expr treats its argument (43+10 after the variable substitution) as an arithmetic expression and returns the value of the expression, which is 53. This value is passed to the set command and assigned to variable c.

The fourth command shows how double quotes can be used to specify words containing spaces: Everything between the quotes is passed to the command procedure as a single word. puts is a command that prints its argument; in this case it prints the message The value of c is 53. If a word is enclosed in curly braces (as in the last command), then the information between the braces is passed to the command procedure verbatim without substitutions. Thus the $ is printed by puts and does not cause variable substitution to occur.
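As a rough illustration of variable substitution and brace quoting, the following C sketch expands $name references in a single word. It is a simplification under stated assumptions: SketchSubst and SketchVar are hypothetical names, the word is assumed to have been isolated already (surrounding double quotes removed, braces still attached), variable names are limited to letters, digits, and underscores, and command substitution is not handled.

```c
/* Toy model of $-substitution on one word; all names are hypothetical. */
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef struct {
    const char *name;
    const char *value;
} SketchVar;

static int IsNameChar(char c) {
    return isalnum((unsigned char) c) || c == '_';
}

/* Find a variable whose name matches the first len characters of name. */
static const char *SketchLookup(const SketchVar *vars, size_t count,
        const char *name, size_t len) {
    size_t i;
    for (i = 0; i < count; i++) {
        if (strlen(vars[i].name) == len
                && strncmp(vars[i].name, name, len) == 0) {
            return vars[i].value;
        }
    }
    return NULL;
}

/* Returns 0 on success, -1 on an unknown variable or a full buffer. */
int SketchSubst(const SketchVar *vars, size_t count, const char *src,
        char *dst, size_t size) {
    size_t srcLen = strlen(src);
    size_t out = 0;

    assert(dst != NULL && size > 0);

    /* Braced word: strip the braces and copy the contents verbatim. */
    if (srcLen >= 2 && src[0] == '{' && src[srcLen - 1] == '}') {
        if (srcLen - 2 >= size) {
            return -1;
        }
        memcpy(dst, src + 1, srcLen - 2);
        dst[srcLen - 2] = '\0';
        return 0;
    }

    while (*src != '\0') {
        if (*src == '$' && IsNameChar(src[1])) {
            /* Replace $name with the variable's value. */
            const char *name = src + 1;
            const char *value;
            size_t len = 0;
            while (IsNameChar(name[len])) {
                len++;
            }
            value = SketchLookup(vars, count, name, len);
            if (value == NULL || out + strlen(value) >= size) {
                return -1;
            }
            strcpy(dst + out, value);
            out += strlen(value);
            src = name + len;
        } else {
            /* Ordinary character: copy it through unchanged. */
            if (out + 1 >= size) {
                return -1;
            }
            dst[out++] = *src++;
        }
    }
    dst[out] = '\0';
    return 0;
}
```

With variables a=43 and c=53, the word The value of c is $c expands to The value of c is 53, while {Lunch costs $6.95} comes out as Lunch costs $6.95 with the dollar sign intact.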

The Tcl interpreter knows nothing about commands except what is required to break them up into words and perform the substitutions just described. As far as the Tcl interpreter is concerned, all values are strings -- including commands, words, and results. Any further interpretation of information is carried out by command procedures. Thus only the command procedure for expr knows that its arguments are numbers and operators.

Control structures such as if and while are just commands that treat their arguments as Tcl scripts; see Listing Two for an example. The command procedure for foreach receives four words: foreach, i, 2 4 6 8 10, and the Tcl script contained between the curly braces. foreach implements a loop; for each value in the list (2, 4, 6, 8, and 10), it sets variable i to that value and then invokes the Tcl interpreter recursively, passing it the last argument of foreach as the script to execute. Only the command procedure for foreach knows that its third word is actually a list of values and the fourth word is a nested Tcl script. Because the script is enclosed in braces, no substitutions occur before it is passed to the foreach command procedure; however, when the script is passed back to the Tcl interpreter for each iteration of the loop, the braces are no longer present so substitutions are done. Tcl procedures are created in a similar fashion by invoking a command proc that takes as its arguments a procedure name, a list of arguments, and a Tcl script that is the procedure's body.

People often ask why Tcl requires the use of the set and expr commands, instead of traditional assignment statements with implicit arithmetic, such as c=a+10. The reason is that this would have predefined many features of the language. For example, a command couldn't have "=" as its second word without causing assignment, and "+" would always invoke addition. This would have reduced the power of extensions to apply their own meanings to their arguments, so it would have limited Tcl's extensibility.

A Simple Command Procedure

To create a new Tcl extension, you implement one or more new commands, writing a command procedure for each. Traditionally, command procedures have been written in C, and that's what I'll use here. However, you can also write command procedures in C++ or Java (using an extension called "TclBlend" that connects Tcl to Java; see "TclBlend: Blending Tcl and Java," by Scott Stanton, DDJ, February 1998). Once you've written the command procedures for your extension, you compile them, load them into an application containing Tcl, and register them with the Tcl interpreter by telling Tcl the name of each command and the address of its command procedure. I'll skip the details of compiling, loading, and registering command procedures to focus on the internals of command procedures.

The first example is a new command, add1, which takes a single integer argument. The command adds 1 to its argument and returns the result. For example, add1 12 returns 13. Listing Three is the command procedure for add1.

Once Add1Cmd has been registered as the command procedure for add1, the Tcl interpreter calls Add1Cmd whenever add1 is invoked. Command procedures receive four arguments. The first argument isn't used in this example; it is used in more complex cases to identify an object associated with the command, such as an open file or graphical control. The interp argument is a handle for the Tcl interpreter where the command was invoked. objc gives a count of the total number of words in the command (including the command name), and objv is an array whose elements are the values of the words after all substitutions have been performed by the Tcl interpreter. objc and objv are similar to the argc and argv parameters used to pass command-line arguments to a UNIX main() function.

Values are passed around in Tcl using structures of type Tcl_Obj. Each word of a command is represented with a Tcl_Obj, each command returns a Tcl_Obj result, each Tcl variable stores its value in a Tcl_Obj, and so on. Think of a Tcl_Obj as storing a string value of arbitrary length. Tcl provides a library of procedures that convert the string values in Tcl_Objs to/from other forms, such as integers. Tcl_Objs also contain information that improves efficiency by eliminating unnecessary string conversions.

A command procedure returns two values to the Tcl interpreter. The first is a result, which is stored in the interpreter and accessed via procedures such as Tcl_SetResult or Tcl_SetObjResult. The second value is an integer completion code, which is returned as the result of the command procedure. A completion code of TCL_OK means that the command completed successfully. TCL_ERROR means that an error occurred while executing the command and the script should be aborted; in this case the interpreter's result contains an error message to present to the user. Other values, such as TCL_RETURN and TCL_BREAK, are used to handle returns from Tcl procedures and escapes from loops.

The Add1Cmd procedure first makes sure that there were two words in the command (the command name and value to increment); if not, it calls Tcl_SetResult to store an error message string in the interpreter's result, then it returns the TCL_ERROR completion code. If the argument count is correct, Add1Cmd retrieves the integer value of the second word of the command by calling Tcl_GetIntFromObj. This procedure attempts to translate the string value of the argument to an integer. If the operation succeeds, it stores the integer value in i and returns TCL_OK. If the value can't be converted to an integer (the command was add1 dog), then Tcl_GetIntFromObj stores an error message in interp's result and returns TCL_ERROR. When Add1Cmd sees the error return, it returns an error to its caller. This style is used commonly throughout Tcl: Procedures use TCL_OK and TCL_ERROR return values to indicate whether they succeeded; if errors occur, they store error messages in the interpreter's result before returning TCL_ERROR. Once one procedure returns TCL_ERROR, its caller also returns TCL_ERROR until control returns to Tcl, which then aborts the script and displays the error message to users.

If the integer value is converted successfully, Add1Cmd calls Tcl_NewIntObj, which creates a new Tcl_Obj and stores an integer in it, automatically converting the integer value to a string. Then Tcl_SetObjResult stores that object as the interpreter's result and Add1Cmd returns with a successful completion code.
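The error-handling convention just described (store a message in the interpreter, return an error code, and let each caller pass the code upward) can be shown without the Tcl library. The sketch below is a stand-in built on strtol: ToyInterp, ToyGetInt, and ToyAdd1 are hypothetical names, and ToyGetInt only approximates the role that Tcl_GetIntFromObj plays in Listing Three.

```c
/* Toy model of the TCL_OK/TCL_ERROR convention; names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#define TOY_OK 0
#define TOY_ERROR 1

/* Stand-in for an interpreter: just a slot for the result or error text. */
typedef struct {
    char result[128];
} ToyInterp;

/* Convert string to int; on failure leave an error message in interp. */
int ToyGetInt(ToyInterp *interp, const char *string, int *intPtr) {
    char *end;
    long value;

    assert(interp != NULL && string != NULL && intPtr != NULL);
    errno = 0;
    value = strtol(string, &end, 10);
    if (end == string || *end != '\0' || errno == ERANGE
            || value < INT_MIN || value > INT_MAX) {
        snprintf(interp->result, sizeof interp->result,
                "expected integer but got \"%s\"", string);
        return TOY_ERROR;
    }
    *intPtr = (int) value;
    return TOY_OK;
}

/* Same shape as Add1Cmd: check the word count, convert, store the result. */
int ToyAdd1(ToyInterp *interp, int argc, const char *argv[]) {
    int i;

    if (argc != 2) {
        snprintf(interp->result, sizeof interp->result,
                "wrong number of arguments");
        return TOY_ERROR;
    }
    if (ToyGetInt(interp, argv[1], &i) != TOY_OK) {
        return TOY_ERROR;   /* message was already stored by ToyGetInt */
    }
    /* Widen before adding so that INT_MAX + 1 cannot overflow. */
    snprintf(interp->result, sizeof interp->result, "%lld",
            (long long) i + 1);
    return TOY_OK;
}
```

Note that ToyAdd1 never formats the conversion error itself; it only forwards the failure, which is the same division of labor the article describes between Add1Cmd and Tcl_GetIntFromObj.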

A New Looping Command

To illustrate how straightforward it is to define a new control structure in Tcl, the next example implements a new command called loop. Listing Four shows how loop is used. The loop command takes as arguments the name of a variable, two integers, and a Tcl script. It sets the variable to each integer value in the given range and invokes the Tcl script once for each value. Listing Five is the command procedure that implements the loop command.

LoopCmd uses several new Tcl procedures. Tcl_ObjSetVar2 sets the value of a Tcl variable, given a Tcl_Obj containing the variable's name and a Tcl_Obj containing the value. Tcl_EvalObj is the main entry point to the Tcl interpreter: It is called once for each iteration of the loop to evaluate the loop body. Errors can potentially occur in Tcl_ObjSetVar2 or Tcl_EvalObj. If this happens, the procedure leaves an error message in interp's result and returns TCL_ERROR; this causes LoopCmd to return an error as well. Tcl_DecrRefCount frees the object pointed to by valuePtr if it couldn't be assigned to the variable.

This example demonstrates three features of Tcl:

How new control structures can be implemented as extensions. This is an unusual feature of Tcl that is present in few, if any, other languages.
How the command procedures define the meanings of their arguments (two arguments are treated as integers, one as a variable name, and one as a Tcl script).
How extensions can access the internals of a Tcl interpreter, in this case by reading and writing variables.

More information about Tcl library procedures is available at http://www.scriptics.com/man/.

More On Tcl_Obj Structures

In versions of Tcl before Tcl 8.0, there were no Tcl_Obj structures. Instead, all information was represented with C strings. Each command procedure received an array of strings containing the words of the command and returned a string result in the interpreter instead of a Tcl_Obj. Variable values, scripts, and virtually all other things in Tcl were represented with strings.

Strings provided a simple and powerful way of passing information around, and they made it easy to write extensions that connect Tcl with almost anything -- but they were not efficient. For example, consider set x [expr $x * 2], which multiplies a variable by two. The value of the variable was stored as a string, so the expr command had to convert its arguments from strings to integers, perform the multiplication, then convert the result back to a string. If the command was executed repeatedly then the string conversions happened each time. A similar problem occurred with scripts: Each time the body of a looping command like loop was executed, it was passed into the Tcl interpreter as a string, so the Tcl interpreter had to parse the commands and words from scratch. Consequently, most of the execution time for Tcl scripts was spent converting to and from strings.

Tcl_Objs were introduced in Tcl 8.0 to eliminate unnecessary string conversions; they are now used in most of the places where strings were used in earlier versions of Tcl. A Tcl_Obj stores a string plus an internal representation; see Figure 3. If the value of a Tcl_Obj is required in a form other than a string, then the value is converted and the other form is saved as the internal representation of the Tcl_Obj. If the value is needed again in this other form, it can be retrieved immediately from the Tcl_Obj without recomputing it from the string. For example, the library procedure Tcl_GetIntFromObj creates and reuses integer internal representations. The value of a Tcl_Obj is defined by its string representation: If the string value of a Tcl_Obj is 4.800, it might be converted to a floating-point internal representation of 4.8, but it will still print as 4.800. The internal representation just caches the result of a string conversion to improve performance.

If an internal representation is available when a new Tcl_Obj is created, such as an integer result from an expr command, it is stored in the new Tcl_Obj and the string value of the Tcl_Obj is left empty. If the value is used only as an integer (such as in subsequent expr commands), then no string value is ever created. If the string value is needed, then at that time the integer value is converted to a string; both the integer and string values are stored in the Tcl_Obj so that either can be used in the future without any additional conversions.
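The caching behavior described in the last two paragraphs can be modeled with a small structure that holds a string, an integer, or both. This is a sketch of the idea only and does not reflect the real Tcl_Obj layout: DualObj and its functions are hypothetical names, only an integer internal representation is supported, and the conversions counter exists purely to make the caching visible.

```c
/* Toy model of a dual string/integer value; names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *string;     /* string form, or NULL if not generated yet */
    int hasInt;       /* nonzero when intValue caches a valid integer */
    long intValue;
    int conversions;  /* instrumentation: conversions performed so far */
} DualObj;

static DualObj *DualAlloc(void) {
    DualObj *obj = calloc(1, sizeof *obj);
    if (obj == NULL) {
        abort();
    }
    obj->string = NULL;
    return obj;
}

/* Create a value that starts out with only a string form. */
DualObj *DualNewString(const char *s) {
    DualObj *obj = DualAlloc();
    obj->string = malloc(strlen(s) + 1);
    if (obj->string == NULL) {
        abort();
    }
    strcpy(obj->string, s);
    return obj;
}

/* Create a value that starts out with only an integer form. */
DualObj *DualNewInt(long value) {
    DualObj *obj = DualAlloc();
    obj->hasInt = 1;
    obj->intValue = value;
    return obj;
}

/* Parse the string once; later calls reuse the cached integer. */
int DualGetInt(DualObj *obj, long *out) {
    assert(obj != NULL && out != NULL);
    if (!obj->hasInt) {
        char *end;
        long value;
        errno = 0;
        value = strtol(obj->string, &end, 10);
        if (end == obj->string || *end != '\0' || errno == ERANGE) {
            return -1;
        }
        obj->intValue = value;
        obj->hasInt = 1;
        obj->conversions++;
    }
    *out = obj->intValue;
    return 0;
}

/* Generate the string only when someone asks for it, then keep it. */
const char *DualGetString(DualObj *obj) {
    assert(obj != NULL);
    if (obj->string == NULL) {
        char buf[32];
        int n = snprintf(buf, sizeof buf, "%ld", obj->intValue);
        obj->string = malloc((size_t) n + 1);
        if (obj->string == NULL) {
            abort();
        }
        memcpy(obj->string, buf, (size_t) n + 1);
        obj->conversions++;
    }
    return obj->string;
}

void DualFree(DualObj *obj) {
    if (obj != NULL) {
        free(obj->string);
        free(obj);
    }
}
```

A value created from the string +43 converts to the integer 43 once, however many times it is read, and still reports its string as +43, in the same spirit as the 4.800 example. A value created from the integer 53 carries no string at all until DualGetString is first called.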

The Tcl_Obj mechanism allows for many different kinds of internal representations. For example, lists like the argument to foreach are converted to an internal representation that is an array of Tcl_Objs; this allows faster access than earlier versions of Tcl, which had to rescan the list from its beginning to retrieve any element. Before a Tcl script is executed, it is converted to an internal representation consisting of bytecodes that allow rapid execution. If a script is executed repeatedly, such as a loop body, subsequent executions are even faster because the script doesn't need to be parsed again; this provides a substantial speedup in Tcl 8.0.

To distinguish between different kinds of internal representations, each Tcl_Obj contains a field indicating the type of its internal representation. If a particular type of internal representation is desired (a list, for instance) and another type is present (bytecodes), then the existing internal representation is discarded and replaced with the desired type (a Tcl_Obj can hold only one internal representation at a time). New types can be defined by providing a few methods to implement that type, such as a method to copy the internal representation, one to free the internal representation, and one to regenerate the string value corresponding to the internal representation. Extensions can define new types to speed up their own conversions.
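To make the idea of a type field with per-type methods concrete, here is a toy version in plain C. Everything in it is an assumption made for illustration: TypedObj, TypedType, and the "point" type are hypothetical, and the method table only loosely mirrors the copy, free, and regenerate-string methods mentioned above rather than reproducing Tcl's actual type structure.

```c
/* Toy model of typed internal representations; names are hypothetical. */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct TypedObj TypedObj;

/* Methods a type supplies: free, copy, and regenerate the string. */
typedef struct {
    const char *name;
    void (*freeRep)(TypedObj *obj);
    void (*dupRep)(const TypedObj *src, TypedObj *dst);
    void (*updateString)(TypedObj *obj);
} TypedType;

struct TypedObj {
    char *string;           /* NULL when stale or not yet generated */
    const TypedType *type;  /* NULL when there is no internal rep */
    void *rep;              /* at most one internal rep at a time */
};

typedef struct { int x, y; } Point;

static void *Alloc(size_t n) {
    void *p = malloc(n);
    if (p == NULL) abort();
    return p;
}

static void PointFree(TypedObj *obj) { free(obj->rep); obj->rep = NULL; }

static void PointDup(const TypedObj *src, TypedObj *dst) {
    Point *copy = Alloc(sizeof *copy);
    *copy = *(const Point *) src->rep;
    dst->rep = copy;
    dst->type = src->type;
}

static void PointUpdateString(TypedObj *obj) {
    const Point *p = obj->rep;
    char buf[64];
    int n = snprintf(buf, sizeof buf, "%d %d", p->x, p->y);
    obj->string = Alloc((size_t) n + 1);
    memcpy(obj->string, buf, (size_t) n + 1);
}

const TypedType pointType = {"point", PointFree, PointDup, PointUpdateString};

TypedObj *TypedNew(const char *s) {
    TypedObj *obj = Alloc(sizeof *obj);
    obj->string = Alloc(strlen(s) + 1);
    strcpy(obj->string, s);
    obj->type = NULL;
    obj->rep = NULL;
    return obj;
}

const char *TypedGetString(TypedObj *obj) {
    assert(obj != NULL && (obj->string != NULL || obj->type != NULL));
    if (obj->string == NULL) obj->type->updateString(obj);
    return obj->string;
}

/* Give obj a Point rep, discarding any rep of another type. */
int TypedGetPoint(TypedObj *obj, Point **out) {
    assert(obj != NULL && out != NULL);
    if (obj->type != &pointType) {
        Point parsed, *p;
        char extra;
        if (sscanf(TypedGetString(obj), "%d %d %c",
                &parsed.x, &parsed.y, &extra) != 2) {
            return -1;
        }
        if (obj->type != NULL) obj->type->freeRep(obj);
        p = Alloc(sizeof *p);
        *p = parsed;
        obj->rep = p;
        obj->type = &pointType;
    }
    *out = obj->rep;
    return 0;
}

/* Modify the internal rep and mark the string as stale. */
int TypedMovePoint(TypedObj *obj, int dx, int dy) {
    Point *p;
    if (TypedGetPoint(obj, &p) != 0) return -1;
    p->x += dx;
    p->y += dy;
    free(obj->string);
    obj->string = NULL;
    return 0;
}
```

Generic code such as TypedGetString never needs to know what a point is; it only follows the type pointer. That indirection is what lets an extension introduce a new internal representation without changes to the core.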

The Tcl_Obj mechanism retains all the flexibility of using strings for representing data, while improving performance dramatically. I've found that most scripts execute two to five times faster under Tcl 8.0 than under previous versions. This gives Tcl about the same speed as Perl and other scripting languages that don't have Tcl's easy extensibility.

Sample Extensions

Tcl's extension mechanism has allowed Tcl to be used for a variety of applications, including the real-time control for oil platforms, automated hardware testing, factory automation, web content generation, financial trading applications, and character animation in motion pictures such as Toy Story and A Bug's Life. In many cases, extensions are created for internal use within an organization. In addition, there are numerous extensions freely available via the Web (visit http://www.scriptics.com/resource/). Examples of open-source extensions include:

Oratcl and Sybtcl, by Tom Poindexter, provide an easy way to access the popular Oracle and Sybase databases (http://www.nyx.net/~tpoindex/tcl.html).
TclX, by Mark Diekhans and Karl Lehenbauer, provides access to many of the UNIX kernel facilities. It also extends the Tcl facilities for manipulating lists, adds its own new data type (keyed lists), creates new control structures for scanning files, and adds a profiling mechanism to Tcl (http://www.neosoft.com/TclX/).
[incr Tcl], by Michael McLennan, adds object-oriented programming to Tcl. [incr Tcl] adds a class mechanism with objects, methods, and inheritance (http://www.tcltk.com/itcl/).
Expect, by Don Libes, simulates users typing at terminals, making it possible to automate terminal-oriented applications. It adds new control structures that associate Tcl scripts with patterns of output generated by the application (http://expect.nist.gov/).
Tk, a GUI toolkit I created, lets you create GUIs from Tcl. It also adds an event binding mechanism to associate Tcl scripts with UI events such as button clicks and keystrokes (http://www.scriptics.com/software/download.html).

Conclusion

Extensibility is one of the key reasons for Tcl's success. For example, extensibility made it easy to implement the Tk toolkit, which is one of the most common reasons people give for using Tcl. Extensibility also lets Tcl be used as a general-purpose automation tool -- it can be connected to, or embedded in, almost anything and used to automate previously manual tasks. For example, Tcl has become the language of choice for automated hardware and software testing. Lastly, extensibility has made Tcl into a powerful integration platform where the base language is augmented with extensions to connect to disparate resources, and Tcl scripts are written to coordinate the resources.

DDJ

Listing One

set a 43
set b $a
set c [expr $a+10]
puts "The value of c is $c"
puts {Lunch costs $6.95}

Listing Two

foreach i {2 4 6 8 10} {
    puts "$i squared is [expr $i*$i]"
}

Listing Three

#include <tcl.h>
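int Add1Cmd(ClientData dummy, Tcl_Interp *interp, int objc,
        Tcl_Obj *objv[]) {
    int i;
    /* Expect exactly two words: the command name and the value. */
    if (objc != 2) {
        Tcl_SetResult(interp, "wrong number of arguments", TCL_STATIC);
        return TCL_ERROR;
    }
    /* On failure, Tcl_GetIntFromObj has already stored an error message. */
    if (Tcl_GetIntFromObj(interp, objv[1], &i) != TCL_OK) {
        return TCL_ERROR;
    }
    /* Return the incremented value as the interpreter's result. */
    Tcl_SetObjResult(interp, Tcl_NewIntObj(i+1));
    return TCL_OK;
}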

Listing Four

set factorial 1
loop i 1 7 {
    set factorial [expr $factorial*$i]
}
puts "7 factorial is $factorial"

Listing Five

#include <tcl.h>
int LoopCmd(ClientData dummy, Tcl_Interp *interp, int objc,
        Tcl_Obj *objv[]) {
    int current, last, code;
    Tcl_Obj *valuePtr;

    if (objc != 5) {
        Tcl_SetResult(interp, "wrong number of arguments", TCL_STATIC);
        return TCL_ERROR;
    }
    if (Tcl_GetIntFromObj(interp, objv[2], &current) != TCL_OK) {
        return TCL_ERROR;
    }
    if (Tcl_GetIntFromObj(interp, objv[3], &last) != TCL_OK) {
        return TCL_ERROR;
    }
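    /* Run the body once for each value from current through last. */
    for ( ; current <= last; current++) {
        /* Store the loop value in the variable named by the first argument. */
        valuePtr = Tcl_NewIntObj(current);
        if (Tcl_ObjSetVar2(interp, objv[1], (Tcl_Obj *) NULL,
                valuePtr, TCL_LEAVE_ERR_MSG) == NULL) {
            Tcl_DecrRefCount(valuePtr);
            return TCL_ERROR;
        }
        /* Evaluate the loop body; pass any non-OK completion code upward. */
        code = Tcl_EvalObj(interp, objv[4]);
        if (code != TCL_OK) {
            return code;
        }
    }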
    return TCL_OK;
}

Figure 1: When Tcl is embedded in an application, it provides basic programming facilities that form the core of a command language for the application. The application then adds its own functions into the Tcl interpreter as extensions.

Figure 2: Tcl can also be used as a platform for integration: Extensions connect the Tcl interpreter to various resources, then Tcl scripts can be written to coordinate the resources and extend their facilities.

Figure 3: In Tcl 8.0 and later versions, Tcl_Obj structures are used to represent most data. A Tcl_Obj can hold a string value (with length) and also an equivalent but more efficient internal representation. Small internal representations can be stored directly in the Tcl_Obj; larger values are allocated separately with a pointer stored in the Tcl_Obj. The type field identifies the current form of the internal representation and makes the internal representation mechanism extensible. The reference count allows Tcl_Objs to be shared.