C/C++

Tools for Flexible Scripting

By Sergei Savchenko, January 01, 2000

Building scripting languages can be a time-consuming task. Sergei presents a pair of tools to simplify development.

Sergei works for CAE Electronics in Quebec, Canada. He can be contacted at [email protected].

Building scripting languages can be a time-consuming undertaking. To ease the job, I'll present two tools that simplify the development of small scripting instruments -- a C++ class template (called "FORMULA") for parsing and evaluation of expressions of different type and syntax, and a C function that lets applications reconstruct different data structures saved on secondary storage. The source code for both (available electronically; see "Resource Center," page 5) is portable and should compile and work on any platform with C/C++ compilers.

Parsing and Evaluation

The FORMULA class template implementation requires only a minimal amount of coding for instantiation with any particular type of underlying object and any particular syntax. You can instantiate this class for basic C++ types (such as int and float) as well as for newly created types. For proper instantiation, the class template must locate two static functions -- one describing the syntax of the expressions for the parser (see Listing One), and another describing the semantics of applications of functions and variable lookups for the evaluation stage (see Listing Two).

As Listing One illustrates, the syntax is described in terms of several lists of symbols that represent:

Characters that must be skipped before any term (such as spaces).
Terminators (such as brackets).
Unary functions.
Binary functions.

Some characters can serve multiple purposes and will be present in several lists. A space, for example, should be skipped before a term, but also often serves as a terminator limiting the term. Similarly, the minus sign can describe both a unary and a binary function. Thus, it must be placed into both lists. The order of binary operators in the list determines their evaluation precedence. Multiplications must be placed after additions in the list in order to guarantee the priority of the multiplication.
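As a rough illustration of the ordering rule, a parser could take an operator's precedence to be its position in the binary list. The helper names precedence and binds_tighter are hypothetical and are not taken from the FORMULA source. Giving every operator its own level (so that '/' outranks '*') is a simplification of this sketch, not something the article specifies.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper (not part of the FORMULA source): the precedence of
// a binary operator is its index in the operator list, or -1 when the
// character is not listed as a binary operator.
int precedence(const char* binary_ops, char op)
{
    assert(binary_ops != nullptr);
    if (op == '\0') return -1;  // strchr would otherwise match the terminator
    const char* pos = std::strchr(binary_ops, op);
    return pos ? static_cast<int>(pos - binary_ops) : -1;
}

// True when operator a should be applied before operator b.
bool binds_tighter(const char* binary_ops, char a, char b)
{
    return precedence(binary_ops, a) > precedence(binary_ops, b);
}
```

With the list from Listing One, "|&+-*/", the call binds_tighter("|&+-*/", '*', '+') is true, which matches the requirement that multiplication follow addition in the list.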

Unary and binary functions are limited to one character in length. Under some circumstances, this may be an unacceptable limitation that requires a few modifications to the class template and syntax description function.

These lists and conventions are used by the parser, which analyzes the expression and constructs its internal representation in reverse (postfix) notation. When the expression's value needs to be computed, the recursive evaluator walks this representation and calls the second of the two provided functions, which describes the semantics of the expressions. As Listing Two shows, this function must be capable of evaluating the:

Value of variables or constants.
Applications of unary functions.
Applications of binary functions.
Applications of multivariable functions.
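To make the evaluation stage more concrete, here is a minimal sketch of a recursive evaluator over a postfix token list for int values. It assumes that "reversed notation" means postfix order. The Token type and the functions apply_unary, apply_binary, eval, and eval_postfix are hypothetical names, and multivariable functions are left out for brevity. The real class presumably dispatches through a single user-supplied function, as in Listing Two.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical token of a postfix ("reversed") representation.
struct Token {
    enum Kind { TERM, UNARY, BINARY } kind;
    std::string text;  // operand text, used when kind == TERM
    char op;           // operator character, used for UNARY and BINARY
};

// Semantics for int, in the spirit of Listing Two.
int apply_unary(char op, int a) { return op == '-' ? -a : !a; }

int apply_binary(char op, int a, int b)
{
    switch (op) {
        case '+': return a + b;
        case '-': return a - b;
        case '*': return a * b;
        case '/': return a / b;
        case '&': return a && b;
        default:  return a || b;  // '|'
    }
}

// Recursively evaluates the subexpression that ends at index pos and
// leaves pos pointing just before that subexpression.
int eval(const std::vector<Token>& rpn, int& pos)
{
    assert(pos >= 0 && pos < static_cast<int>(rpn.size()));
    const Token& t = rpn[pos--];
    if (t.kind == Token::TERM) return std::atoi(t.text.c_str());
    if (t.kind == Token::UNARY) return apply_unary(t.op, eval(rpn, pos));
    int right = eval(rpn, pos);  // the right operand sits next to the operator
    int left = eval(rpn, pos);
    return apply_binary(t.op, left, right);
}

int eval_postfix(const std::vector<Token>& rpn)
{
    int pos = static_cast<int>(rpn.size()) - 1;
    return eval(rpn, pos);
}
```

The expression (1+2)*3 becomes the token sequence 1 2 + 3 * in this scheme. Walking from the last token backward means the evaluator needs no explicit operand stack, since the recursion plays that role.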

A multivariable function is recognized by the parser when the opening bracket immediately follows some term, such as function(a,b).

To provide for textual terms, all characters that are confined between quotes ('a text') are considered as one literal term. The opening and the closing quotes are different. This lets the parser match opening against closing quotes, thus providing an opportunity for nesting of literal terms within other literal terms. This capability often comes in handy. For example, it allows for consistent treatment of passing parameters to functions by expression as opposed to passing parameters by value. In other words, with function(a+b), for instance, the result of the expression a+b will be passed, whereas with function('a+b'), the expression is passed as a literal constant that can be parsed and handled in a particular manner within the evaluator. Because the opening and closing quotes are different, the FORMULA class is consistent and allows the nesting or passing of parameters by expression, such as in function('function('a+b')'), where the outer quotes delimit the term passed into the outer function and the inner quotes specify the expression to be passed into the inner function.
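The quote-matching idea can be sketched with a depth counter. The function match_quote is a hypothetical name, and the choice of a backtick as the opening quote and an apostrophe as the closing quote in the usage note is an assumption, since the article does not say which two characters FORMULA uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the index of the closing quote that matches the opening quote
// at s[open], or std::string::npos when the quotes are unbalanced.
// open_q and close_q must differ; that is what makes nesting decidable.
std::size_t match_quote(const std::string& s, std::size_t open,
                        char open_q, char close_q)
{
    assert(open_q != close_q);
    assert(open < s.size() && s[open] == open_q);
    int depth = 0;
    for (std::size_t i = open; i < s.size(); ++i) {
        if (s[i] == open_q) {
            ++depth;                       // entering a nested literal
        } else if (s[i] == close_q && --depth == 0) {
            return i;                      // the outermost literal is closed
        }
    }
    return std::string::npos;
}
```

Under that assumption, matching the outer literal of function(`function(`a+b')') skips over the inner literal instead of stopping at its closing quote. With identical opening and closing characters the scanner could not tell whether a quote starts a nested literal or ends the current one.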

This class template should let you keep the amount of coding of the expression evaluators to a minimum. It can be used within a larger interpreted scripting language, such as SRDL, a prototype of a scripting language for databases I developed (see "SRDL: A Small Relational Database Language," Dr. Dobb's Sourcebook, March/April 1997).

In the case of SRDL, the scripting language was built with two instances of the FORMULA class -- one instance implementing the relational algebra (operations on tables), and the other implementing the expressions of the field algebra (operations on fields). The latter were passed by expression into the procedures implementing operations on tables and evaluated with respect to the current context.

Data Interpretation

Extensive data structures require specialized approaches for retrieval and accommodation on secondary storage. Smaller data structures -- especially those that are read entirely into memory -- are also common. In the latter case, you often use some scripting strategy to store and read these data structures.

The tool I'll now present lets you describe any C data structure in a script file and reconstruct it in the program with a single function call. The script file carries both typing information and data definitions. The typing information is used by the interpreting function to allocate the proper amount of memory for the components of the data structure.

Figure 1 presents the grammar of the data description language. As you can see in the last line, the language is a sequence of statements of three kinds. The statements serve to define a type, describe data, and specify how the data structure should be passed back into the interpreting application.

The syntax for typing and data definition is similar to C. A type is defined to be either one of the basic types (see Figure 1) or one of the derived types, such as an array or a structure. Both derivation rules are recursive and let you have, for example, arrays of structures or structures of arrays. Rather than follow C-like syntax, where an array is written as int[N] (dimension follows the type), I use a Pascal-like convention that lets you specify an array as [N]int (type follows the dimension). This convention is more logical and simplifies the parser, enabling it to be built using a recursive descent approach. As such, the parser doesn't have to remember previously read terms to make a decision about a current term. Since the reason for having the typing information is to ensure proper memory allocation (and not to provide access to the individual elements of the data structure), members in structures are nameless.
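The benefit of putting the dimension first shows up in a small recursive-descent sketch: each rule is chosen by looking only at the next character. The function type_size is a hypothetical name, and the grammar it accepts (basic types int, float, and ptr, arrays as [N]type, structures as { type ... }) is inferred from Listing Four rather than copied from Figure 1. It returns an unpadded byte count and ignores named types and alignment.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <stdexcept>

// Hypothetical recursive-descent reader for type expressions such as
// "[4]{ int int [3]float ptr }". Returns the unpadded size in bytes and
// advances p past the type that was read.
std::size_t type_size(const char*& p)
{
    assert(p != nullptr);
    while (std::isspace(static_cast<unsigned char>(*p))) ++p;
    if (*p == '[') {                          // array: [N] type
        char* end = nullptr;
        unsigned long n = std::strtoul(p + 1, &end, 10);
        if (end == p + 1 || *end != ']')
            throw std::runtime_error("bad array dimension");
        p = end + 1;
        return n * type_size(p);              // element type follows the dimension
    }
    if (*p == '{') {                          // structure: { type type ... }
        ++p;
        std::size_t total = 0;
        for (;;) {
            while (std::isspace(static_cast<unsigned char>(*p))) ++p;
            if (*p == '}') { ++p; return total; }
            total += type_size(p);            // throws if the input ends early
        }
    }
    if (std::strncmp(p, "int", 3) == 0)   { p += 3; return sizeof(int); }
    if (std::strncmp(p, "float", 5) == 0) { p += 5; return sizeof(float); }
    if (std::strncmp(p, "ptr", 3) == 0)   { p += 3; return sizeof(void*); }
    throw std::runtime_error("unknown type name");
}
```

Because the '[' arrives before the element type, the function can commit to the array rule immediately and recurse for the element, with no need to back up over a type it has already read.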

Data definition is also similar to that in C, with the difference being that elements of an array are placed into square brackets for consistency with the typing syntax. The interpreting function reads the script, remembering the type definitions and using them to reconstruct the data descriptions that must be accompanied by their type names. The syntax for data definition is presented in Figure 1.

Many C data structures contain pointers. The data description language provides for the basic type ptr representing a typeless pointer. A name of a variable or a name of another script file encountered in the data definition statement specifies the value for the pointer (see Listing Four).

The export statement of the language specifies the name of the variable, the pointer to which must be returned by the interpreting function (see Listing Five). As Listings Four and Five show, only a single call is required to reconstruct a structure that can then be used normally within the application, assuming that the type definition in the script file and the application are the same. Admittedly, direct allocation of a C structure is often dangerous because some compilers align the elements of structures to optimize performance or to suit hardware requirements. Thus, a structure's elements may not occupy contiguous locations in memory. The data interpretation functions verify the alignment and use this information to properly construct the structures, thereby avoiding the danger.
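The alignment concern can be illustrated by computing a member's offset by hand and comparing it with what the compiler reports. The functions align_up and computed_pt_offset are hypothetical names. The rule used here (round each member up to its own alignment) matches the layout of mainstream ABIs but is not guaranteed by the language, and it is not necessarily how the article's data interpretation functions perform their check.

```cpp
#include <cassert>
#include <cstddef>

// Rounds offset up to the next multiple of align (a power of two).
std::size_t align_up(std::size_t offset, std::size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

// The structure from Listing Five.
struct datastruct {
    int a, b;
    float k[3];
    int* pt;
};

// Offset at which a loader would place the pointer member: after two ints
// and three floats, rounded up to the pointer's own alignment.
std::size_t computed_pt_offset()
{
    std::size_t off = 0;
    off = align_up(off, alignof(int))   + sizeof(int);        // a
    off = align_up(off, alignof(int))   + sizeof(int);        // b
    off = align_up(off, alignof(float)) + 3 * sizeof(float);  // k
    return align_up(off, alignof(int*));                      // pt
}
```

On a platform with 4-byte int and float and 8-byte pointers, the five leading members end at byte 20, so the pointer is pushed to offset 24 and four bytes of padding appear. A loader that simply packed values back to back would write the pointer into the wrong place.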

What's convenient about this tool (which can be used by a variety of applications) is that data can be stored externally from the application with minimal effort. You don't have to build individual languages for every particular kind of data, just describe the type and use a single data interpretation function for any data structure. This tool also provides the necessary error checking, which is sometimes neglected when you are pressed to build a scripting tool.

Of course, a similar mechanism requires certain modifications and rethinking if it is to be adapted for C++. Also, if such a language is supplied and made accessible to the end user, typing information must be kept separate to prevent desynchronization with the types defined in the application program.

DDJ

Listing One

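// Returns the list of characters that play the requested syntactic role.
const char* syntax(int type,int& tag)
{
 switch(type)
 {
  case SNTX_FILTER:     return(" \n");            // skipped before a term
  case SNTX_TERMINATOR: return(" !|&+-*/(),\n");  // characters that end a term
  case SNTX_UNARY:      return("-!");             // unary functions
  case SNTX_BINARY:     return("|&+-*/");         // binary, lowest precedence first
 }
 return("");  // assumption: an unknown list type yields an empty list
}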

Listing Two

void evaluate(int type,int op,TEXT& txt,int *c)
{
 switch(type)
 {
  case EVAL_VARIABLE: c[0]=atoi(txt.contents()); break;
  case EVAL_UNARY:    switch(op)
                      {
                       case '!': c[0]=!c[0]; break;
                       case '-': c[0]=-c[0]; break;
                      }
                      break;
  case EVAL_BINARY:   switch(op)
                      {
                       case '*': c[0]=c[0]*c[1];  break;
                       case '/': c[0]=c[0]/c[1];  break;
                       case '+': c[0]=c[0]+c[1];  break;
                       case '-': c[0]=c[0]-c[1];  break;
                       case '&': c[0]=c[0]&&c[1]; break;
                       case '|': c[0]=c[0]||c[1]; break;
                      }
                      break;
  case EVAL_MULTY:    if(txt=="funct") c[0]=c[0]+2*c[1];  // funct(a,b) = a+2*b
                      break;
 }
}

Listing Three

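#include <stdio.h>

int main()
{
 FORMULA<int> expr;
 expr="(funct(1,2)+4)*2";       // funct(1,2) is 1+2*2=5 under Listing Two
 printf(" %d\n",expr.value());  // so the expected value is (5+4)*2=18
 return 0;
}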

Listing Four

type datastruct { int int [3]float ptr }

var [2]int arr [ 1 2 ]
var [4]datastruct ds
[
 { 1 2 [3 4 5] arr }
 { 6 7 [8 9 10] arr }
 { 11 12 [13 14 15] arr }
 { 0 0 [0 0 0] arr }
]
export ds

Listing Five

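#include "data.h"
struct datastruct   /* must match the type defined in the script file */
{
 int a,b;
 float k[3];
 int *pt;
};
int main(void)
{
 /* a single call rebuilds the structure exported by the script */
 struct datastruct *ds=D_data("test.dat");
 return 0;
}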
