Automated Wrapping of Complex C++ Code

We're confident you'll be as tickled as we are with this tool for generating extra-lingual wrappers for C++ code.


January 01, 2003
URL:http://www.drdobbs.com/automated-wrapping-of-complex-c-code/184401607

Introduction

Although many programmers are devoted supporters of C and C++, these compiled languages make prototyping with a large software system difficult. Programmers often prefer interpreted languages for prototyping tasks, partly to avoid the delay of compiling and linking programs, but also because interpreted languages are less complex than compiled languages and provide productivity tools such as GUI builders (e.g., Tk). Interpreted languages are also useful for system integration tasks because they use simple types evaluated at run time, which facilitates interprocess communication. As a result, interpreted languages such as Tcl, Python, Perl, and Visual Basic have become important parts of the programmer's toolbox.

There are benefits to both interpreted and compiled languages, and environments that include aspects of both are not uncommon. For example, C++ programs often include a built-in interpreter so that an application can be controlled at run time with simple scripts. Fully interpreted languages, such as Tcl, have been developed for this purpose. Unfortunately, the full power of such hybrid compiled/interpreted environments has not been realized, since only a small part of the functionality available to the C/C++ programmer is available to the interpreted portion of the application. This is because the work required to "wrap" the compiled code with an interpreted interface is considerable. Semi-automatic systems like SWIG (<http://swig.org>) have been developed that simplify the process. SWIG, which can wrap C (and some C++) code into Tcl, Python, Perl, and Guile, still requires configuration files that contain interface and conversion specifications. If the source files change, these configuration files may become out of date. Moreover, SWIG cannot directly handle complex, templated C++ code as is typical of generic programming or STL use. Other approaches to this problem, notably that adopted by VTK (<www.vtk.org>), use a custom parser that enforces a programming methodology on the C++ code so that automatic wrapping is possible. However, these solutions also cannot handle complex C++ code and do not provide a wrapped interface to all the C++ functionality.

The limitations of these tools, along with the benefits of a hybrid interpreted/compiled environment, motivated us to create CABLE, an open-source, fully automatic system to wrap C++ code of arbitrary complexity. CABLE requires no interface specification in configuration files, only a list of classes and class template instantiations. CABLE, which was developed for the National Library of Medicine Insight Segmentation and Registration Toolkit (<www.itk.org>), is integrated into our standard build process. C++ code is written and compiled, and CABLE is then run to build an interpreted interface to all public methods. In this way, programmers can easily build applications with the benefits of an interpreted interface and the speed and efficiency of high-performance C++ libraries.

Approach

CABLE's operation consists of four major steps. First, CABLE's parser records information about declarations, template instantiations, and the relationships among types in the original C++ source code. Second, CABLE generates new C++ code that contains calls to the constructors and methods of the classes being wrapped, as well as wrappers for the possible conversions that are defined for each type. Initialization code is generated to register the wrappers and conversion functions with a run-time facility (see below). Third, a compiler builds the newly generated wrapper code and produces a library that works as a loadable module for the interpreted language.

Finally, the user runs an interpreted program that loads the wrapper module. A special facility provided by CABLE for each interpreted language is also loaded along with the module. All wrapper modules register information with a single copy of this facility. This includes information about the wrapped classes, methods, and conversions. Type information is registered for every C++ type involved in a wrapped method or conversion. When the interpreter processes a command involving the wrappers, it hands the command over to the facility, which dispatches the call to the appropriate wrapper. Once a wrapper is invoked, it performs any necessary C++ type conversions and then calls the real method that it wraps.
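
The registration and dispatch flow described above can be pictured with a small sketch. It is not CABLE's actual code: the names WrapperFacility, Register, Dispatch, and RegisterStringWrappers are hypothetical, and arguments are simplified to plain strings. It only illustrates the idea of modules registering wrappers with one shared table that the interpreter later consults.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the run-time facility; not CABLE's real API.
// A wrapper receives the command arguments as text and returns a text
// result, much as a Tcl command would.
using Wrapper = std::function<std::string(const std::vector<std::string>&)>;

class WrapperFacility {
public:
  // Called from a module's initialization code, once per wrapped method.
  void Register(const std::string& className, const std::string& method,
                Wrapper wrapper) {
    table_[className + "::" + method] = std::move(wrapper);
  }

  // Called when the interpreter hands a command over to the facility.
  std::string Dispatch(const std::string& className,
                       const std::string& method,
                       const std::vector<std::string>& args) const {
    auto it = table_.find(className + "::" + method);
    if (it == table_.end()) {
      throw std::runtime_error("no wrapper for " + className + "::" + method);
    }
    return it->second(args);
  }

private:
  std::map<std::string, Wrapper> table_;
};

// Stand-in for generated initialization code: registers one wrapper that
// forwards to the real std::string::append on a given instance.
inline void RegisterStringWrappers(WrapperFacility& facility,
                                   std::string& instance) {
  facility.Register(
      "stdstring", "append",
      [&instance](const std::vector<std::string>& args) -> std::string {
        assert(args.size() == 1);
        instance.append(args[0]);
        return instance;
      });
}
```

With a std::string holding "Foo" registered this way, dispatching ("stdstring", "append", {"Bar"}) reaches the real append call and leaves the instance holding "FooBar".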

GCC-XML

CABLE's first step is to parse the C++ source describing the classes to be wrapped. Parsing and representing C++ in its entirety is a challenging problem. Previous wrapper generation tools have used partial parser implementations that try to identify what is interesting and discard the rest. Such parsers often have difficulty dealing with arbitrarily complex code and are therefore not suitable for a fully automated system.

Writing a complete C++ parser from scratch, however, is at least as large a problem as automated wrapper generation. We realized that the easiest way to get a complete parser was to start with a compiler's parser. The GNU C++ compiler (<http://gcc.gnu.org>) is freely available and open source. GCC, which provides a complete implementation of a C++ parser, already runs on a wide variety of platforms and is maintained independently as C++ evolves.

This realization resulted in the creation of GCC-XML (<www.gccxml.org>), an extension to GCC that dumps an XML representation of parsed C++ code. This representation was chosen because XML is easy to write and easy to parse. Such a representation may also have applications beyond CABLE, so GCC-XML is provided as a separate tool. The extension is relatively non-invasive, requiring only one additional source file and a small patch to a few existing files. GCC-XML is available as a stand-alone tool and does not require that a system compiler have the extension included.

CABLE

CABLE performs two major tasks when generating wrapper code. The first task is the parsing phase. A wrapper configuration file written in C++ is passed as input. This file defines (or #includes) the classes to be wrapped along with some configuration code. CABLE runs GCC-XML to parse the C++ configuration. An XML parser reads the GCC-XML output and builds an internal representation of the classes, methods, and types to be wrapped. This representation provides functionality for traversing the structure of the input program. All language-specific wrapper generators can share this representation.

The second task is the generation of wrappers for a given interpreted language. CABLE generates a call to every method, constructor, and conversion that is to be available to the interpreted language. Code to help dispatch the execution of each wrapper is also generated. This includes code to register type information and lists of classes and methods with the facility for the given language. The generator implementations are separate but can share the representation resulting from CABLE's parsing process.
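
To make "a call to every method, constructor, and conversion" more concrete, here is a rough guess at the shape such generated code could take for a few members of std::string. The function names (Wrap_stdstring_new and so on), the use of untyped object pointers, and the text-based argument passing are all assumptions made for illustration; CABLE's real output is not shown in this article and will differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical shape of generated wrapper code; not CABLE's real output.
// Arguments arrive as text from the interpreter, objects as untyped pointers.
using Args = std::vector<std::string>;

// Wrapper for the constructor std::string(const char*).
inline void* Wrap_stdstring_new(const Args& args) {
  assert(args.size() == 1);
  return new std::string(args[0].c_str());
}

// Wrapper for std::string::substr(size_type, size_type) const.
inline void* Wrap_stdstring_substr(void* self, const Args& args) {
  assert(args.size() == 2);
  const std::string* s = static_cast<const std::string*>(self);
  std::size_t pos = std::stoul(args[0]);  // text -> size_type
  std::size_t len = std::stoul(args[1]);
  return new std::string(s->substr(pos, len));
}

// Wrapper for std::string::c_str() const; the result is copied so that it
// can be handed back to the interpreter as text.
inline std::string Wrap_stdstring_c_str(void* self, const Args& args) {
  assert(args.empty());
  return static_cast<const std::string*>(self)->c_str();
}

// Wrapper for the destructor.
inline void Wrap_stdstring_delete(void* self) {
  delete static_cast<std::string*>(self);
}
```

Each wrapper does the same three things: it checks the argument count, converts the interpreter's values into the C++ types the real member expects, and then makes the real call.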

Challenges

Class templates cannot be wrapped directly because each instantiation produces separate code. Any code that is to be executed when called from an interpreted language must exist somewhere in compiled form. One must choose specific template instantiations to be wrapped so that they can be compiled ahead of time. Once a class template has been instantiated, it differs little from any other class apart from its longer name. Previous tools have had trouble wrapping templated classes, mostly due to the difficulty of parsing a class template and creating instantiations from it. Since CABLE uses GCC-XML to parse C++ code, all of this difficult work is done before CABLE reads it. When CABLE parses the XML output, class template instantiations are no different from other classes. The generated code refers to each instantiation by its complete name, so the compiler will produce wrappers referring to the correct class template instantiation.
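
The point that an instantiation must exist in compiled form can be shown in standard C++ alone. The sketch below uses a hypothetical Accumulator template and explicit instantiation definitions, which are one standard way to force the compiler to emit code for chosen instantiations. The listings at the end of this article use a different idiom, applying sizeof to each typedef so that the instantiation is a complete type.

```cpp
#include <cassert>

// A class template by itself produces no object code.
// (Accumulator is a hypothetical example class, not part of CABLE.)
template <typename T>
class Accumulator {
public:
  Accumulator() : total_(T()) {}
  void Add(T v) { total_ += v; }
  T Total() const { return total_; }
private:
  T total_;
};

// Explicit instantiation definitions: the compiler emits code for every
// member of these two instantiations in this translation unit.
template class Accumulator<int>;
template class Accumulator<double>;

// Once instantiated, each is an ordinary class with a longer name; a
// typedef gives it a short name suitable for an interpreter command.
typedef Accumulator<int> Accumulator_int;
typedef Accumulator<double> Accumulator_double;

// Small check that an instantiation behaves like any other class.
inline int DemoTotal() {
  Accumulator_int a;
  a.Add(2);
  a.Add(3);
  assert(a.Total() == 5);
  return a.Total();
}
```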

Type conversion in C++ is often done implicitly as part of a method call. Each argument to a wrapped method must be converted at run time into what is expected by that method. This means that a single call to a method wrapper from the interpreted language may result in calls to several wrapped conversions before the call to the real method. The results of the conversions may also need to be cleaned up after the call before returning control to the interpreter. The language-specific wrapper facility provided by CABLE maintains type information for every C++ object referenced in the interpreted language. A method wrapper identifies the real type of each argument it is given and dispatches a conversion function if necessary.
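
A minimal sketch of this run-time conversion step follows. It assumes a design in which every value carries a type tag and conversions are looked up in a table keyed by source and target type. TypedValue, ConversionTable, CallHalf, and MakeDefaultConversions are hypothetical names, and std::shared_ptr stands in for whatever cleanup mechanism CABLE actually uses.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <typeindex>
#include <typeinfo>
#include <utility>

// Hypothetical sketch; names and design are illustrative, not CABLE's.
// A value passed from the interpreter, tagged with its real C++ type.
struct TypedValue {
  std::type_index type;
  std::shared_ptr<void> object;
};

template <typename T>
TypedValue MakeValue(T value) {
  return TypedValue{std::type_index(typeid(T)),
                    std::make_shared<T>(std::move(value))};
}

class ConversionTable {
public:
  using Converter = std::function<TypedValue(const TypedValue&)>;

  void Register(std::type_index from, std::type_index to, Converter c) {
    table_[std::make_pair(from, to)] = std::move(c);
  }

  // Returns the argument unchanged when it already has the expected type;
  // otherwise applies a registered conversion. The converted temporary is
  // released when the last TypedValue referring to it goes out of scope.
  TypedValue ConvertTo(std::type_index expected, const TypedValue& arg) const {
    if (arg.type == expected) return arg;
    auto it = table_.find(std::make_pair(arg.type, expected));
    if (it == table_.end()) throw std::runtime_error("no conversion available");
    return it->second(arg);
  }

private:
  std::map<std::pair<std::type_index, std::type_index>, Converter> table_;
};

// Method wrapper for a function that expects a double: identify the real
// type of the argument, convert if necessary, then make the real call.
inline double CallHalf(const ConversionTable& conversions,
                       const TypedValue& arg) {
  TypedValue converted =
      conversions.ConvertTo(std::type_index(typeid(double)), arg);
  const double* d = static_cast<const double*>(converted.object.get());
  assert(d != nullptr);
  return *d / 2.0;
}

// Registers the single conversion int -> double.
inline ConversionTable MakeDefaultConversions() {
  ConversionTable t;
  t.Register(typeid(int), typeid(double), [](const TypedValue& v) {
    return MakeValue<double>(*static_cast<const int*>(v.object.get()));
  });
  return t;
}
```

Calling CallHalf with an int argument goes through the registered conversion first, while a double argument is passed straight through.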

Many software tools define their own format for configuration files. For CABLE, a user must specify the classes to be wrapped, the header files in which they are defined, and the names with which the classes are to appear in the generated language wrappers. Instead of defining an arbitrary configuration file format, CABLE uses C++. In fact, the generated wrapper files #include the configuration file itself. This allows users to define classes and their wrapper configuration in a single C++ source file. Any header files needed by the user's classes are automatically brought into the generated wrapper files with no special configuration.

Examples

Wrapping std::string

The standard C++ std::string class provides a good example for wrapping into Tcl. It is standard, widely used, and has constructors, methods, and operators. Listing 1 provides a complete CABLE input file to build a Tcl package called StringTcl. Note that the code specifies nothing about the methods provided by the std::string class, but only gives its name and the header in which it is found. CABLE can be invoked to generate wrappers like this:

cable stringtcl_config.cxx -tcl stringtcl.cxx

A compiler can be used to build the generated wrappers in stringtcl.cxx into a Tcl package. Once the package is built, it can be loaded into a Tcl interpreter with the load command. The package will provide one command called stdstring. Invoking this command with no arguments will call the default constructor of the string class as shown in this Tcl example:

set str1 [stdstring]

This creates a Tcl object to refer to the C++ instance and stores it in the variable str1. One can then call methods on this instance by using it as a command and providing the method name as the first argument. Additional arguments are treated as arguments to the method:

$str1 append "Bar"
puts [$str1 c_str]

Consider this short C++ example:

using namespace std;
string str1;
str1 = "Foo";
string str2(str1);
str1.append("Bar");
string::size_type i = str1.find_first_of("a");
string str3 = str1.substr(0, i);
cout << str1.c_str() << endl;
cout << str2.c_str() << endl;
cout << str3.c_str() << endl;

The example can be written in Tcl, after CABLE wrapping, like this:

set str1 [stdstring]
$str1 = Foo
set str2 [stdstring $str1]
$str1 append Bar
set i [$str1 find_first_of a]
set str3 [$str1 substr 0 $i]
puts [$str1 c_str]
puts [$str2 c_str]
puts [$str3 c_str]

Listing 2 provides a complete, commented Tcl script that uses the wrapper in the same way. The output from either the C++ or the Tcl program above is:

FooBar
Foo
FooB

Note the close correspondence between the C++ and Tcl code. One of the goals of CABLE is to make using the generated wrappers as intuitive as possible. Most features are targeted at closely duplicating C++ semantics. For example, automatic variables are provided:

proc foo {} {
  set temp [stdstring "Temper"]
  return [$temp substr 0 4]
}
set str [foo]
set str {}

The instance of std::string to which the Tcl variable temp refers will be destroyed as the procedure foo returns, but the result value (another std::string instance) will be stored in Tcl variable str. Setting this variable to something else will destroy the second instance of std::string as well.
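
For comparison, the same lifetime behavior can be imitated in plain C++. The sketch below is only an analogy: it uses std::shared_ptr as a stand-in for the interpreter's references to wrapped objects, and a hypothetical TrackedString class that counts live instances. The article does not describe how CABLE tracks object lifetimes internally.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical instrumented string that counts how many instances are alive.
class TrackedString {
public:
  explicit TrackedString(std::string s) : value_(std::move(s)) { ++live_; }
  TrackedString(const TrackedString& other) : value_(other.value_) { ++live_; }
  ~TrackedString() { --live_; }

  TrackedString substr(std::size_t pos, std::size_t n) const {
    return TrackedString(value_.substr(pos, n));
  }
  const std::string& str() const { return value_; }
  static int Live() { return live_; }

private:
  std::string value_;
  static int live_;
};
int TrackedString::live_ = 0;

// A handle plays the role of a Tcl variable referring to a C++ instance.
using Handle = std::shared_ptr<TrackedString>;

// Counterpart of the Tcl procedure foo.
inline Handle foo() {
  Handle temp = std::make_shared<TrackedString>("Temper");
  return std::make_shared<TrackedString>(temp->substr(0, 4));
}  // temp's only reference disappears here, destroying "Temper"
```

After `Handle str = foo();` exactly one instance is alive, holding "Temp"; calling `str.reset()` corresponds to `set str {}` and destroys it.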

Since std::string is a standard class in C++, a Tcl wrapper could be written just once to provide the same functionality described above. This approach does not work well when a class API changes rapidly. If such a class were to be wrapped by hand, the wrappers would have to be updated every time the class changed. CABLE solves this problem by providing automatic wrapping directly from the C++ source of a class.

Wrapping a Templated Counter Class

Consider this simple class template:

template <typename T>
struct Counter {
  Counter(): value(0) {}
  T Get() const { return value; }
  void Set(T v) { value = v; }
  void Reset() { value = 0; }
  void Increment() { ++value; }
private:
  T value;
};

Despite its simple appearance, writing wrappers for this class by hand would turn into a maintenance nightmare, especially when it is integrated with a larger system. CABLE can be used to automatically produce wrappers for Counter<int> and Counter<float> with only a few lines of configuration:

namespace _cable_
{
 namespace wrappers
 {
  typedef Counter<int> Counter_int;
  typedef Counter<float> Counter_float;
 }
}

This example omits a few lines of package configuration code; the complete CABLE configuration file is shown in Listing 3. Wrapped instantiations of Counter can be used like any other class:

set c [Counter_int]
lappend result [$c Get]
$c Set 4
lappend result [$c Get]
$c Increment
lappend result [$c Get]
$c Reset
lappend result [$c Get]

set c [Counter_float]
$c Increment
lappend result [$c Get]
puts $result

# Output is "0 4 5 0 1.0"

If the class template for Counter changes, CABLE will automatically regenerate wrappers to incorporate the changes into the interpreted language.
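
For reference, the Tcl session above corresponds to the following direct C++ use of the same template. RunCounterDemo is a hypothetical helper added for this comparison. Note that a C++ stream prints the float result as 1 where Tcl prints 1.0.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// The Counter template from the article.
template <typename T>
struct Counter {
  Counter() : value(0) {}
  T Get() const { return value; }
  void Set(T v) { value = v; }
  void Reset() { value = 0; }
  void Increment() { ++value; }
private:
  T value;
};

// Performs the same calls as the Tcl script and collects the results
// in a space-separated string.
inline std::string RunCounterDemo() {
  std::ostringstream result;

  Counter<int> c;
  result << c.Get();
  c.Set(4);
  result << ' ' << c.Get();
  c.Increment();
  result << ' ' << c.Get();
  c.Reset();
  assert(c.Get() == 0);
  result << ' ' << c.Get();

  Counter<float> f;
  f.Increment();
  result << ' ' << f.Get();

  return result.str();  // "0 4 5 0 1"
}
```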

Future Work

CABLE is a young, open-source tool with significant potential. We have successfully used it to wrap the large VTK and ITK systems. Currently, CABLE only generates Tcl wrappers, but we plan to add Python and Perl wrappers in the future. We have also found that the wrapper library size is bigger than expected; this is a result of the templated type conversion that we use to resolve method invocation. We are now looking at ways of reducing this overhead and of improving the overall speed of the wrapper generation process. Since CABLE is open source, we encourage anyone to contribute. See <http://public.kitware.com/Cable> for more information.

About the Authors

Brad King, a member of Kitware's technical staff, is principal author of CABLE. Brad is a Ph.D. student at Rensselaer Polytechnic Institute with interests in computer vision and programming languages. Brad may be contacted at [email protected].

Dr. William J. Schroeder is president and co-founder of Kitware, Inc. Kitware provides advanced visualization solutions for complex 3-D data. William may be contacted at [email protected].

Listing 1: StringTcl CABLE configuration

// stringtcl_config.cxx
// CABLE input file to build StringTcl package.

// Include string class's header.
#include <string>

// The symbol "CABLE_CONFIGURATION" is defined only when CABLE is
// reading this file.  Place the CABLE-specific code inside this
// section so it cannot be seen by the compiler.
#ifdef CABLE_CONFIGURATION
namespace _cable_
{
  // Specify package name.  Group configuration allows multiple
  // configuration files to define wrappers for a single package.
  const char* const group="StringTcl1";
  const char* const package="StringTcl";
  const char* const groups[]={"StringTcl1"};
  namespace wrappers
  {
    // Tell CABLE to wrap the "std::string" class.  We want the class
    // wrapper to be called "stdstring".
    typedef std::string stdstring;
  }
}

// Make sure that std::string is a complete type so that all the
// methods are available.
void _cable_instantiate()
{
  sizeof(_cable_::wrappers::stdstring);
}
#endif
 

Listing 2: Using the stdstring wrapper from Tcl

# stdstring.tcl
# Use the stdstring wrapper.
load ./libStringTcl.so

# Create an instance of std::string.
set s0 [stdstring]

# Call the C++ assignment operator to set it.
$s0 = "FooBar"

# Print the string's value:
puts [$s0 c_str]

# Create another instance 
set s1 [stdstring "Hello, World!"]
puts [$s1 c_str]

# Find the substring before the first comma.
set s2 [$s1 substr 0 [$s1 find_first_of ,]]
puts [$s2 c_str]

# The output from this script should be:
#   FooBar
#   Hello, World!
#   Hello

Listing 3: CounterTcl CABLE configuration

// countertcl_config.cxx
// CABLE input file to build CounterTcl package.

// Define Counter class in-line in the configuration file.
template <typename T>
class Counter
{
public:
  Counter(): value(0) {}
  T Get() const { return value; }
  void Set(T v) { value = v; }
  void Reset() { value = 0; }
  void Increment() { ++value; }
private:
  T value;
};

// The symbol "CABLE_CONFIGURATION" is defined only when CABLE is
// reading this file.  Place the CABLE-specific code inside this
// section so it cannot be seen by the compiler.
#ifdef CABLE_CONFIGURATION
namespace _cable_
{
  // Specify package name.  Group configuration allows multiple
  // configuration files to define wrappers for a single package.
  const char* const group="CounterTcl1";
  const char* const package="CounterTcl";
  const char* const groups[]={"CounterTcl1"};
  namespace wrappers
  {
    // Tell CABLE what instantiations to wrap, and what the class
    // wrappers should be called.
    typedef Counter<int> Counter_int;
    typedef Counter<float> Counter_float;
  }
}

// Make sure the Counter instantiations are complete types so that all
// the methods are available.
void _cable_instantiate()
{
  sizeof(_cable_::wrappers::Counter_int);
  sizeof(_cable_::wrappers::Counter_float);
}
#endif
