C/C++

PETE: The Portable Expression Template Engine

By Scott Haney, James Crotinger, Steve Karmesin and and Stephen Smith, October 01, 1999

PETE, short for "Portable Expression Template Engine," is a C++ framework that lets users easily add expression-template functionality to container classes and perform complex expression manipulations.

Oct99: Programmer's Toolchest

The authors are members of the technical staff in the Advanced Computer Laboratory at Los Alamos National Laboratory. They can be contacted at http://www [email protected].

Applicative Operator Functors

In 1995, Todd Veldhuizen and David Vandevoorde developed the expression-template technique to transform arithmetic expressions involving array-like containers into efficient loops that rivaled hand-coded C in speed (see "Expression Templates," by Todd Veldhuizen, C++ Report, July 1995 and "Linear Algebra with C++ Template Metaprograms," by Todd Veldhuizen and Kumaraswamy Ponnambalam, DDJ, August 1996). Expression templates are now at the heart of highly efficient scientific container libraries such as Pooma (http://www.acl.lanl.gov/pooma/) and Blitz++ (http://monet.uwaterloo.ca/ blitz/). Unfortunately, the technique's perceived complexity and the belief that it is limited to evaluating array expressions have deterred many potential expression-template users. Consequently, we developed a portable C++ framework that lets users easily add expression-template functionality to container classes and perform complex expression manipulations. This package is called "PETE," short for "Portable Expression Template Engine."

PETE is a compact package, consisting of about 3000 lines of ANSI/ISO C++. As the name suggests, PETE is widely portable and has been tested under UNIX with KAI C++ and EGCS, under Windows 95/98/NT with CodeWarrior Professional 4 and the Intel C++ compiler (part of VTune 3.0), and under MacOS with CodeWarrior Professional 4. Other compilers with ANSI-compliant template support should be able to compile the package with little difficulty. Achieving maximum performance requires aggressive inlining. In this regard, we get the best performance with KCC.

PETE is a mature framework. As part of the Pooma framework, we have used PETE in several large scientific codes in the areas of hydrodynamics, linear accelerator modeling, neutron transport, and fusion device modeling on platforms ranging from laptops to the 1.6 TFLOP Accelerated Strategic Computing Initiative (ASCI) BlueMountain supercomputer.

PETE is available electronically from DDJ (see "Resource Center," page 5) and the Los Alamos Advanced Computing Laboratory at http://www.acl.lanl .gov/ pete/).

An Overview of PETE

To illustrate PETE's features, we'll add full expression-template evaluation functionality to Standard Template Library (STL) vectors in fewer than 20 lines of code. Our goal is to transform statements such as:

A += -B + 2 * C;

(where A and C are vector<double>s and B is a vector<int>) into the single efficient inlined loop in Example 1.

The first step is to supply overloaded arithmetic operator functions. Unlike conventional overloaded operators, which immediately apply an operation and typically generate temporaries to hold intermediate results, PETE operator functions return objects that can be combined to incrementally build up the parse tree of an expression. In our example, operator-(const vector<T, Allocator>&) returns an object of type

Expression<UnaryNode<OpUnaryMinus, vector<T, Allocator>::const_iterator> >

while

operator*(T1, const vector<T2, Allocator>&)

returns an

Expression<BinaryNode<OpMultiply, Scalar<int>, vector<T2, Allocator>::const_ iterator> >

These objects are combined by operator+(const Expression<T1>&, const Expression<T2>&) to produce the type in Figure 1. By comparing this to the graphical representation of the parse tree, you can see that the structure of the parse tree is encoded in the object's type. The UnaryNode and BinaryNode classes represent nonterminal nodes containing an applicative operator functor, such as OpMultiply (see the accompanying text box entitled "Applicative Operator Functors"), and one or more leaves, consisting of scalars, iterators, or other nonterminal nodes. The class Scalar is used to wrap noncontainer objects, like integers, which have a constant value wherever the expression is evaluated. Finally, the class Expression wraps nonterminal nodes and thereby provides a common interface for all expression objects.

PETE knows how to store nonterminal nodes and scalars in the expression parse tree, but it must be told how to store the container objects that are passed as arguments to the operator functions. This is done in Listing One by specializing the CreateLeaf class for vector<T, Allocator>. There are basically two choices: You can store a helper class like an iterator or you can store a reference to the container class itself. To store an iterator as in Figure 1, you simply provide a Leaf_t typedef that specifies that a const_iterator is to be stored at the leaves and a make member function that takes a container as an argument and turns it into a Leaf_t by simply calling vector's begin function. Classes for which CreateLeaf is not specialized are assumed to be scalar objects.

By default, PETE supports 45 built-in operators including all of the C++ mathematical operators along with a collection of common mathematical functions like sin(). A where(a,b,c) function is also provided since the conditional expression a ? b : c cannot be overloaded. PETE includes an easy-to-use tool called "MakeOperators" to automatically generate all of the required functions. For example, the four-line MakeOperators input file in Example 2 produces the approximately 4000 lines of code and 249 functions needed to fully support arithmetic involving STL vectors. These functions are written to the Listing1Operators.h file (see Listing One). As Example 2 shows, a MakeOperators class specification consists of the template arguments (ARG) and the class (CLASS). MakeOperators substitutes unique numbers for the string "[n]" to number the arguments for binary and ternary operators. MakeOperators can also be used to add custom functions to the expression-template system.

MakeOperators can generate global assignment operator functions such as operator+=. This feature is optional so you can define your own assignment operators if you want. MakeOperators can also generate an assign function to handle straight assignment for classes, like the standard vector, where it is not possible to add an operator= member function. Therefore, if you are willing to put up with somewhat awkward syntax like assign(X,Y+Z) in lieu of X=Y+Z, it is possible to add expression-template functionality to a class without changing or adding any member functions.

Example 3 shows the generated definition of operator+= for accumulating a PETE expression into an STL vector. This passes the arguments along with an applicative operator functor, OpAddAssign, on to a user-supplied evaluate function. Delegating the actual assignment operation to a functor allows a single evaluate function to be used by all of the various assignment operators.

Listing One contains the evaluate function for our STL vector example. This function iterates through the elements of the container, assigning the values computed from the expression on the right-hand side (RHS) of the assignment operator to the appropriate elements of the container on the left-hand side (LHS).

Because we store iterators in the expression tree, evaluation consists of two parts. First, for each iterator position, the RHS expression must be dereferenced and assigned to the LHS container. Second, the iterators stored in the RHS expression must be bumped to the next element. The first of these tasks is performed by

op(*iterLHS, forEach(rhs, DereferenceLeaf(), OpCombine()));

Here op is our applicative operator functor object. This call to its operator() will perform the appropriate assignment. For example, if op is an OpAddAssign object, this code inlines to

*iterLHS += forEach(rhs, DereferenceLeaf(), OpCombine()));

The forEach function is the most important tool in PETE. This function performs a compile-time post-order parse-tree traversal (see "Generic Programming in POOMA and PETE," by James A. Crotinger, Julian Cummings, Scott Haney, William Humphrey, Steve Karmesin, John Reynders, Stephen Smith, and Timothy J. Williams, Lecture Notes in Computer Science, 1999). Its general form is:

forEach(Expression, LeafTag, CombineTag);

This function call traverses the nodes of the Expression object, applying an operation selected by LeafTag at the leaves, and combining the results from nonleaf nodes' children according to CombineTag. In our example, each leaf iterator is dereferenced (as indicated by the DereferenceLeaf tag), and the results are combined using the operators stored at the nonleaf nodes (as indicated by the OpCombine tag). Thus, if RHS represents -B + 2 * C, the forEach call above will inline to the code

-*iterB + 2 * *iterC,

where iterB and iterC are the current values of the iterators to vectors B and C, respectively. Dereferencing STL iterators and combining results according to arithmetic rules are common operations. Therefore, PETE predefines the actions associated with DereferenceLeaf and OpCombine (as well as several others). After evaluating the expression, the leaf iterators are bumped by applying the IncrementLeaf operation at the leaves and ignoring the return values by specifying the NullCombine tag:

forEach(rhs, IncrementLeaf(),

NullCombine());

This call inlines to

++iterB; ++iterC;

Thus, we see that evaluate expands to precisely the code in Example 1.

Advanced Examples

Extending the aforementioned example to allow vectors and lists to be mixed in expressions turns out to be simple using PETE's tools. First, modify the MakeOperators input file to add entries for list, as in Example 4. Then add a custom CreateLeaf specialization for list, storing an iterator in the leaf nodes, and a custom evaluate function that can deal with a list on the left side. Listing Two demonstrates the mixed usage.

You can easily extend evaluation of STL vector expressions to check for size conformance (source code that illustrates this is available electronically; see "Resource Center," page 5). This introduces important techniques for extending PETE's functionality. The necessary operators are the same as in the first example, so you can reuse the MakeOperators file in Example 2. There is a new wrinkle here though. We need to access the vector objects themselves at the leaves because a vector's size cannot be inferred from a single iterator. By default, PETE stores leaves by value, which is appropriate for iterators or for container classes with shallow copy semantics. However, vectors have deep copy semantics, so we must modify PETE's behavior to have it store references to the vectors in the leaves. This requires a straightforward modification to CreateLeaf to wrap the container in a Reference object that tells PETE's nonterminal nodes to store it by reference.

Conformance testing is done by passing the size of the LHS container down to the leaves, with each leaf comparing the LHS size with its own and returning a bool to indicate conformance. To do this, you first define a SizeLeaf tag to express the fact that you want to check size conformance at the leaves. This tag is initialized with the size of the LHS, and it uses operator()(int) to compare its value with the argument. The actual leaf operations are delegated to the LeafFunctor class, which is templated on the leaf type and on the leaf tag. You specify PETE's actions on the expression's leaves by specializing LeafFunctor for a particular combination of leaf type and tag. LeafFunctor must supply an apply method that performs the desired action and a Type_t typedef giving apply's return type. When you introduce a new leaf tag (one that is not already defined in PETE), you must tell PETE what to do when it applies this tag to a leaf that contains a scalar value. This is done by specializing LeafFunctor for your tag and Scalar<T>. In this case, apply is specialized to return True for scalars (they always conform), and to invoke the SizeLeaf test on the leaf's size if the leaf is a vector.

Because the leaves contain vectors instead of iterators, the evaluate function must be modified. PETE provides an EvalLeaf1 tag that expresses the desire to evaluate a container at a particular index, which is stored as a data member of the EvalLeaf1 object. Although PETE supplies this tag, you have to tell PETE explicitly what this means for your container class. For the case of applying the EvalLeaf1 operation to an STL vector, you simply extract the index from the EvalLeaf1 object and return the appropriate element from the vector. Finally, you can write evaluate. It is similar to the first example, except that it includes a conformance test based on the value returned from:

forEach(rhs, SizeLeaf(lhs.size()),

AndCombine());

This traverses the parse tree, checking conformance of the leaves with lhs.size(), and combining the results with AndCombine, which simply takes the logical AND of the results from a node's children. This will return True if all of the leaves conform, in which case you then loop through the elements, evaluating the RHS at the ith element with:

forEach(rhs, EvalLeaf1(i), OpCombine())

The final example is the most complex, demonstrating the use of PETE to synthesize types. These types can be used to implement compile-time polymorphism, whereby highly efficient custom code can be generated based on the types involved in an expression. The goal here is to write a function, elem(expression, i), that evaluates the expression for the ith element of each term, where the terms can be either list or vector objects. If all of the leaves are vectors, the iterators in the leaves are directly offset and dereferenced. If any of the leaves contain lists, the iterators are first advanced to the appropriate position before dereferencing. Thus, the elem function has to determine the most general (lowest-common-denominator) iterator type for the expression and then perform the appropriate action depending on this type. This example is a bit simplistic: Instead of repeatedly incrementing the iterators, elem could just use the STL advance algorithm, in which case this type synthesis would be unnecessary to obtain an efficient evaluation. However, the principles here are applicable to more advanced situations.

To access the iterator types in an expression, you first need to add a new leaf tag, GetIteratorTagLeaf, and specialize LeafFunctor to define Type_t to be the iterator tag for each possible leaf type (source code that illustrates this is available electronically). You do not need to define an apply method here (or in the Combine2 functors) because this function is not used in type computations. Next, you specialize the Combine2 functor. Combine2 is only applied to binary-tree nodes in the parse tree. Its template parameters are the types of the node's left and right children, the applicative operator type (ignored here), and the combine-tag type. You define a new combine tag, IteratorTagCombine, and specialize on the various combinations of iterator tags to define Combine2::Type_t to be the most general iterator tag of the ones returned from the children. Thus, LeafFunctor will get the iterator tag types appropriate for the containers in the expression. These types are propagated up, and Combine2 selects the most general iterator type. This is further propagated up and combined with results from other subtrees until the root is reached, at which point the most general iterator for the entire expression will be determined.

Finally, you write the elem functions. The general one takes an expression and an offset. Here you see the direct use of the template metaprogram class ForEach. The forEach function used in most of this article is simply a template function wrapper for ForEach::apply(). This is done because template functions can infer their template parameters from their arguments, thus providing a simpler interface. However, for type computations you must use ForEach directly. ForEach::Type_t is the return type of the forEach function. This fact is used to deduce the return type for the elem function. Inside of elem, you use ForEach to deduce the appropriate iterator tag for the whole expression. The typedef statement, defining IteratorTag_t, will cause the ForEach template metaprogram to go through the expression, performing the type computation outlined earlier. You then use the resulting iterator tag type to select the appropriate overloaded version of elem. These are straightforward extensions of our earlier examples.

Conclusion

PETE is a powerful, yet easy-to-use, framework for adding expression-template functionality to a container class. You simply define a few external traits classes and provide a few lines of input to a tool that writes a required header file. PETE can also be used in a completely external manner, without making any changes to the user's container classes. Finally, PETE can be used for more general expression manipulations, such as conformance checking and type synthesis.

Acknowledgment

This work was performed under the auspices of the U.S. Department of Energy by Los Alamos National Laboratory under contract #W-7405-ENG-36.

DDJ

Listing One

#include <iostream.h>
#include <vector.h>

#include "PETE/PETE.h"
#include "Listing1Operators.h"
// We need to specialize the CreateLeaf traits class for STL vectors so that
// operators know what to stick in the leaves of the expression tree. In this
// case, we store a const_iterator at the leaves.

template<class T, class Allocator>
struct CreateLeaf<vector<T, Allocator> >
{
  typedef typename vector<T, Allocator>::const_iterator Leaf_t;
  inline static Leaf_t make(const vector<T, Allocator> &a) 
                                              { return a.begin(); }
};
// Loop over vector and evaluate the expression at each location.
template<class T, class Allocator, class AssignOp, class RHS>
inline void evaluate(vector<T, Allocator> &lhs, 
                                       const AssignOp &op, const RHS &rhs)
{
  typename vector<T, Allocator>::iterator iterLHS = lhs.begin();
  while (iterLHS != lhs.end())
    {
      // The actual assignment operation is performed here. PETE operator 
      // tags all define operator() to perform the operation. (In this case 
      // op performs an assignment operation.) forEach is used to compute 
      // the rhs value. DereferenceLeaf gets the values at each node by 
     // deferencing an iterator, and the tag OpCombine tells forEach to use 
     // the operator tags in the expression to combine values together.

      op(*iterLHS, forEach(rhs, DereferenceLeaf(), OpCombine()));

      // Increment the LHS iterator.
      iterLHS++;
      // Now, we have to increment the iterators for everything on the RHS.
      // The results don't need to be combined so we use a NullCombiner.
      forEach(rhs, IncrementLeaf(), NullCombine());
    }
}
int main()
{
  int i;
  const int n = 10;
  vector<double> A, C, E(n);
  vector<int> B, D;

  for (i = 0; i < n; ++i)
    {
      A.push_back(i);
      B.push_back(2*i);
      C.push_back(3*i);
      D.push_back(i);
    }
  A += -B + 2 * C;
  assign(B, 2);
  assign(D, A + B * C);
  A += where(D < 30, B, C);

  assign(E, C);
  E += E - 4 / (sin(C) + 1);
  for (i = 0; i < n; ++i)
    {
      cout << " A[" << i << "] = " << A[i]
           << " B[" << i << "] = " << B[i]
           << " C[" << i << "] = " << C[i]
           << " D[" << i << "] = " << D[i]
           << " E[" << i << "] = " << E[i]
           << endl;
    }
  return 0;
}

Back to Article

Listing Two

#include <iostream.h>
#include <list.h>
#include <vector.h>

#include "PETE/PETE.h"
#include "Listing2Operators.h"

// We need to specialize the CreateLeaf traits class for STL vectors and lists
// so that operators know what to stick in the leaves of the expression tree.
// In these cases, we store const_iterators at the leaves.

template<class T, class Allocator>
struct CreateLeaf<vector<T, Allocator> >
{
  typedef typename vector<T, Allocator>::const_iterator Leaf_t;
  inline static Leaf_t make(const vector<T,Allocator> &a) { return a.begin();}
};
template<class T, class Allocator>
struct CreateLeaf<list<T, Allocator> >
{
  typedef typename list<T, Allocator>::const_iterator Leaf_t;
  inline static Leaf_t make(const list<T, Allocator> &a) { return a.begin(); }
};

// Loop over vector and evaluate the expression at each location.
template<class T, class Allocator, class Op, class RHS>
inline void evaluate(vector<T, Allocator> &lhs, const Op &op, const RHS &rhs)
{
  for (int i = 0; i < lhs.size(); ++i)
    {
      // The actual assignment operation is performed here.
      // PETE operator tags all define operator() to perform the operation.
      // (In this case op performs an assignment operation.)
      // forEach is used to compute the rhs value.  DereferenceLeaf gets the
      // values at each node by deferencing an iterator, and the tag
      // OpCombine tells forEach to use the operator tags in the expression
      // to combine values together.

      op(lhs[i], forEach(rhs, DereferenceLeaf(), OpCombine()));

      // Now, we have to increment the iterators for everything on the RHS.
      // The results don't need to be combined so we use a NullCombiner.

      forEach(rhs, IncrementLeaf(), NullCombine());
    }
}
// Loop over list and evaluate the expression at each location.
template<class T, class Allocator, class Op, class RHS>
inline void evaluate(list<T, Allocator> &lhs, const Op &op, const RHS &rhs)
{
  typename list<T, Allocator>::iterator i = lhs.begin();
  while (i != lhs.end())
    {
      // The actual assignment operation is performed here.
      // PETE operator tags all define operator() to perform the operation.
      // (In this case op performs an assignment operation.)
      // forEach is used to compute the rhs value.  DereferenceLeaf gets the
      // values at each node by deferencing an iterator, and the tag
      // OpCombine tells forEach to use the operator tags in the expression
      // to combine values together.

      op(*i++, forEach(rhs, DereferenceLeaf(), OpCombine()));

      // Now, we have to increment the iterators for everything on the RHS.
      // The results don't need to be combined so we use a NullCombiner.

      forEach(rhs, IncrementLeaf(), NullCombine());
    }
}
int main()
{
  int i;
  const int n = 10;
  vector<double> A, C;
  vector<int> B;
  list<int> D;
  list<double> E;

  for (i = 0; i < n; ++i)
    {
      A.push_back(i);
      B.push_back(2*i);
      C.push_back(3*i);
      D.push_back(i);
      E.push_back(0.0);
    }

  A += -B + 2 * C;

  assign(B, 2);
  assign(D, A + B * C);
  A += where(D < 30, B, C);

  assign(E, C);
  E += E - 4 / (sin(C) + 1);
  list<int>::const_iterator di = D.begin();
  list<double>::const_iterator ei = E.begin();
  for (i = 0; i < n; ++i)
    {
      cout << " A[" << i << "] = " << A[i]
           << " B[" << i << "] = " << B[i]
           << " C[" << i << "] = " << C[i]
           << " D[" << i << "] = " << *di++
           << " E[" << i << "] = " << *ei++
           << endl;
    }

  return 0;
}

Back to Article

1 2 3 4 5 6 7 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

C/C++