
Creating Libraries for Multiple Programming Languages

Ken Martin et al., February 01, 2002



Ken, William, and Berk develop software for Kitware Inc. (http://www.kitware.com/). They can be contacted at [email protected].


If you ask five software developers what programming language they use, you'll likely get five different answers. C++, Java, Python, Visual Basic, Perl, and Tcl are just a few of the languages commonly used. While it is convenient to have a wide variety of languages to choose from, each with its own set of advantages and disadvantages, it makes reusable library development a challenge. How do you develop a library that a wide range of software developers can use? How do you support rapid prototyping languages such as Python and Tcl while also providing the performance and efficiency required for production applications in C++? In creating the Visualization Toolkit (VTK) (see "3-D Surface Contours," by W.J. Schroeder and W.E. Lorensen, DDJ, July 1996; and The Visualization Toolkit User's Guide, by W.J. Schroeder et al., Kitware Inc. 2001, http://www.kitware.com/), we faced these issues and developed a strategy for addressing them.

VTK is a library designed to address the visualization needs of academic and commercial communities in fields such as medical and scientific research. As such, it supports a wide variety of developers. There are many researchers using VTK who typically use rapid prototyping languages such as Python or Tcl. They want the flexibility of an interpreted language allowing interaction and modification of a program while it is running. At the same time there are a number of commercial products using VTK where a small memory footprint and fast execution times are critical. These applications are typically done in C or C++, which usually produce faster and more compact code than languages such as Tcl or Visual Basic.

Our approach to writing multilanguage libraries was to write VTK in C++ using object-oriented methodologies, then create tools to automatically wrap the library into other languages. This lets us maintain the core functionality in one language that is fast and efficient, yet makes that functionality available to all languages. We integrate with user interfaces for languages such as Python, Java, Visual Basic (via ActiveX), and Tcl by creating a few support classes that let VTK display its results within a typical widget of the language, such as a JPanel for Java. In this article, we discuss how we did this and the challenges we encountered.

Implementation

There were a number of challenges to making VTK a library that could be used from many programming languages. With more than 500 C++ classes, VTK is a large library that is actively being developed and extended. Any wrapping technique we used had to be automated to keep up with the changes and additions to VTK. A manual process would quickly fall behind the current state of the software. We also required an approach that would let developers wrap their extensions to the VTK library so that they could access their new VTK classes from their preferred language, just like the core classes.

The first problem we ran into was parsing our C++ code. To wrap a library into these other languages, we needed a method of parsing the code and breaking it into its lexical components. We decided to use the traditional language tools: LEX for lexical analysis and YACC (Yet Another Compiler Compiler) for parsing the grammar. Writing LEX and YACC code for C++ turns out to be difficult because C++ has a plethora of features and its syntax is more context sensitive than that of other languages. Consequently, we had to make several compromises to facilitate the parsing. The first was our philosophy regarding C++: We believed in using only a subset of C++ features within a relatively rigid development environment. For example, we did not use templated classes, run-time type checking, or exceptions. In addition, our coding standards dictated that all class names start with a common prefix (vtk), that each file contain only one class, and that a set of simple macros be used for common set/get methods. VTK also uses reference counting, which is key to handling memory management in the various wrapped languages.

With these restrictions, we were able to develop a LEX- and YACC-based program that could parse VTK header files and extract the pertinent information. Figure 1 is a high-level diagram of the process. The class's name, methods, and method signatures (number of arguments, return type, argument types) are parsed, then stored in a data structure for later use. We made no attempt to handle preprocessor directives, so each class is wrapped based on its own header file alone, without looking at its superclasses' header files. (We are currently developing a next-generation wrapping approach using GNU C++ to parse the header files. This approach avoids these restrictions.)
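The data structure itself is not shown in this article. As a rough sketch only, the parsed information might be held in records like the ones below. The names MethodInfo, ClassInfo, and FormatSignature are hypothetical and are not taken from the VTK sources.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one parsed method (not the parser's real type).
struct MethodInfo {
  std::string name;                   // method name, e.g. "SetRadius"
  std::string returnType;             // return type as written in the header
  std::vector<std::string> argTypes;  // one entry per argument
};

// Hypothetical record for one parsed class header.
struct ClassInfo {
  std::string name;        // class name, starting with the common prefix
  std::string superclass;  // name only; the superclass header is not read
  bool isAbstract = false;
  std::vector<MethodInfo> methods;
};

// Render a signature as text, as a generator might when emitting wrapper
// code or diagnostics for a given target language.
std::string FormatSignature(const ClassInfo& c, const MethodInfo& m) {
  std::string out = m.returnType + " " + c.name + "::" + m.name + "(";
  for (std::size_t i = 0; i < m.argTypes.size(); ++i) {
    if (i > 0) out += ", ";
    out += m.argTypes[i];
  }
  return out + ")";
}
```

Each language-specific generator would then walk the same records, which is what keeps the parsing step independent of the target language.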

One advantage of using an object-oriented design for the library is that it maps easily to most languages. Java and Python are object oriented by design, while other languages, such as Tcl, already have a notion of an instance as a command whose arguments name the method to invoke. Through COM, Visual Basic also maps easily to the notion of an instance with methods that can be invoked on it. Example 1 shows the similarities between five languages calling the same method.

The first step in wrapping a class library is to specify which classes to wrap and whether they are concrete or abstract. In object-oriented terminology, abstract classes define an API for their subclasses. They are not meant to be instantiated and, under some circumstances, it can be a compiler error to do so (such as classes with pure virtual methods in C++). As such, we do not want to allow abstract classes to be instantiated in any of the wrapped languages, although we do want to wrap any methods they define because subclasses may rely on them.
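One way to picture the concrete/abstract distinction is a table of factories in which abstract classes are listed but have no factory. This is a sketch under our own assumptions, not VTK's mechanism: Object, Shape, Square, MakeRegistry, and CreateInstance are illustrative stand-ins.

```cpp
#include <functional>
#include <map>
#include <string>

struct Object { virtual ~Object() {} };                       // stand-in base class
struct Shape : Object { virtual double Area() const = 0; };  // abstract
struct Square : Shape { double Area() const override { return 4.0; } };

// Hypothetical registry built from the list of classes to wrap.
using Factory = std::function<Object*()>;

std::map<std::string, Factory> MakeRegistry() {
  std::map<std::string, Factory> r;
  r["Shape"] = Factory();  // abstract: known to the wrapper, but not creatable
  r["Square"] = []() -> Object* { return new Square; };
  return r;
}

// Returns a new instance, or nullptr when the class is unknown or abstract.
Object* CreateInstance(const std::map<std::string, Factory>& r,
                       const std::string& name) {
  auto it = r.find(name);
  if (it == r.end() || !it->second) return nullptr;
  return it->second();
}
```

The methods of Shape remain callable on a Square, which mirrors the point above: abstract classes are wrapped, but scripts cannot construct them.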

Once we have specified the class's name and whether it is concrete or abstract, the next step is to wrap the methods. Most languages provide mechanisms for extensions via C or C++. This extension support is what we use to wrap the methods of a class into a language. For Java we use the JNI (Java Native Interface), for Tcl we use the Tcl_CreateCommand function, for Visual Basic the Component Object Model (COM), and so on. The difficulty arises in three key areas:

  • Argument type conversions.
  • Memory management.
  • Supporting callbacks.

Argument type conversions are tricky because each language has its own structures for storing data. Some of the conversions, such as going from Java's jdouble to a C++ double, are easy, while others, such as going from a Java jobject to a properly cast C++ pointer, are more difficult. To aid in this, we created a set of utility conversion functions for each target language. These functions handle all the conversions between C++ and the target language. The most difficult conversions are object conversions due to the typecasting involved. We need to know if an instance of class A can be passed into a function that takes class B. Most object-oriented languages handle these type conversions and type safety automatically, but in other languages we must trace through the class hierarchy to determine if the type conversion can be safely done. To accomplish this, the automated wrapping typically adds type conversion functions to each wrapped class to convert to its superclasses. These functions chain up the hierarchy, providing full, safe type conversion; see Listing One.
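To make the chaining concrete, here is a self-contained sketch in the style of Listing One, using a three-level hierarchy. The classes Base, Cell, and Triangle and their typecast functions are illustrative stand-ins rather than generated VTK code.

```cpp
#include <cstring>

// Stand-in classes; the names are illustrative, not VTK's.
class Base { public: virtual ~Base() {} };
class Cell : public Base {};
class Triangle : public Cell {};

// One function per class: match the class's own name, otherwise defer to
// the superclass. The root of the hierarchy ends the chain.
void* Base_Typecast(void* me, const char* dType) {
  return std::strcmp("Base", dType) == 0 ? me : nullptr;
}

void* Cell_Typecast(void* me, const char* dType) {
  if (std::strcmp("Cell", dType) == 0) return me;
  Base* up = static_cast<Cell*>(me);  // upcast before asking the superclass
  return Base_Typecast(up, dType);
}

void* Triangle_Typecast(void* me, const char* dType) {
  if (std::strcmp("Triangle", dType) == 0) return me;
  Cell* up = static_cast<Triangle*>(me);
  return Cell_Typecast(up, dType);
}
```

A Triangle can be passed where a Cell or Base is expected because the request walks up the chain. A Cell asked to become a Triangle yields a null pointer, since the chain only ever moves toward superclasses.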

The second issue in wrapping methods is memory management. When a C++ method returns an instance of a class, who is responsible for freeing the memory? For Java, recent versions of the JNI have introduced weak global references that serve this purpose perfectly. A weak global reference provides a reference to the object but doesn't prevent the object from being garbage collected. This lets the language interface code know when Java considers the object to be disposed of. A different strategy is used for Visual Basic through COM. Anytime a C++ object is returned from a method, a new COM interface object is instantiated and wrapped around the C++ object. The C++ object has its reference count incremented, which requires that the underlying C++ objects support some form of reference counting. COM then manages the reference counting on the interface object in the standard way. When the COM interface is destroyed, it decrements the reference count on the underlying C++ object. This is important because C++ objects can be referenced in the C++ layer and may or may not have COM interfaces pointing at them.
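The interface-object strategy can be sketched without any COM machinery. In the sketch below, RefCounted stands in for a reference-counted library class and InterfaceHandle for the per-language interface object. Both names, and the AddRef/Release method names, are assumptions made for illustration.

```cpp
// Minimal reference-counted base, standing in for the library's own scheme.
class RefCounted {
public:
  RefCounted() : Count(1) {}  // the creator holds the first reference
  void AddRef() { ++this->Count; }
  void Release() { if (--this->Count == 0) delete this; }
  int GetReferenceCount() const { return this->Count; }
protected:
  virtual ~RefCounted() {}    // destroyed only through Release()
private:
  int Count;
};

// Hypothetical interface object: it holds exactly one reference to the
// underlying C++ object for as long as it exists.
class InterfaceHandle {
public:
  explicit InterfaceHandle(RefCounted* obj) : Object(obj) {
    this->Object->AddRef();
  }
  ~InterfaceHandle() { this->Object->Release(); }
  InterfaceHandle(const InterfaceHandle&) = delete;
  InterfaceHandle& operator=(const InterfaceHandle&) = delete;
  RefCounted* Get() const { return this->Object; }
private:
  RefCounted* Object;
};
```

Because the handle only adds and removes its own reference, the C++ object survives as long as either the C++ layer or some interface object still refers to it.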

The third issue is providing support for callbacks from C++. Many visualization algorithms are computationally expensive and can take minutes to compute. In these cases, having a progress bar is an advantage. But this requires that the C++ class be capable of invoking a callback to the wrapped language. To accomplish this we used the Subject Observer and Command design patterns (see Design Patterns: Elements of Reusable Object-Oriented Software, by Erich Gamma et al., Addison-Wesley, 1995). A VTK C++ class can have an observer added to its list of observers. When a specified event happens, the observer is notified and a command is invoked. Since the Command design pattern encapsulates a command into a C++ class, we can create subclass commands targeted to each wrapped language. Listing Two shows the Tcl command.

The class stores the Tcl interpreter and a string to execute as instance variables. In a similar manner, the Java command stores a handle to the Java environment (jenv), a Java object (jobject), and a method to invoke. For Visual Basic, we use COM connection points to provide callbacks. Listing Three is the command class for COM. Listings Four and Five illustrate Tcl and Visual Basic code using the observer objects.
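The observer and command mechanics can be shown in a compact, language-neutral sketch. Everything here is a simplified stand-in written for this example: Subject, Command, CallbackCommand, ProgressEvent, and RunFilter are hypothetical names, and a language-specific command would replace CallbackCommand with code that calls into the interpreter or virtual machine.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <vector>

class Subject;

// Command pattern: each target language would supply its own subclass.
class Command {
public:
  virtual ~Command() {}
  virtual void Execute(Subject* caller, unsigned long eventId,
                       void* callData) = 0;
};

// The subject keeps observers per event id and notifies them on request.
class Subject {
public:
  void AddObserver(unsigned long eventId, std::shared_ptr<Command> cmd) {
    this->Observers[eventId].push_back(cmd);
  }
  void InvokeEvent(unsigned long eventId, void* callData) {
    auto it = this->Observers.find(eventId);
    if (it == this->Observers.end()) return;
    for (auto& cmd : it->second) cmd->Execute(this, eventId, callData);
  }
private:
  std::map<unsigned long, std::vector<std::shared_ptr<Command>>> Observers;
};

// Stand-in for a language-specific command: forwards to any callable.
class CallbackCommand : public Command {
public:
  explicit CallbackCommand(std::function<void(double)> fn) : Fn(fn) {}
  void Execute(Subject*, unsigned long, void* callData) override {
    this->Fn(*static_cast<double*>(callData));
  }
private:
  std::function<void(double)> Fn;
};

const unsigned long ProgressEvent = 1;  // hypothetical event id

// A long-running algorithm that reports progress in four steps.
void RunFilter(Subject& s) {
  for (int i = 1; i <= 4; ++i) {
    double progress = i / 4.0;
    s.InvokeEvent(ProgressEvent, &progress);
  }
}
```

A progress bar would register a command for ProgressEvent and redraw itself each time Execute is called. The algorithm never needs to know which language is listening.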

In addition to wrapping the methods of the C++ classes, the wrapping process can add convenience commands that offer features not possible in C++. For example, the Tcl wrapper provides a ListMethods command that lists all of the methods for a particular class. Another command, DeleteAllObjects, lets you delete all objects created in the Tcl interpreter.
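A string-driven dispatcher gives a feel for how a generated command can offer both wrapped methods and an extra such as ListMethods. The sketch below does not use the Tcl C API. Sphere and SphereCommand are hypothetical, and returning results as strings is only meant to mimic how a Tcl command reports its result.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Stand-in for a wrapped C++ class.
class Sphere {
public:
  void SetRadius(double r) { this->Radius = r; }
  double GetRadius() const { return this->Radius; }
private:
  double Radius = 0.5;
};

// Hypothetical generated command: argv[0] is the method name and the
// remaining entries are its arguments, all passed as strings.
std::string SphereCommand(Sphere& self, const std::vector<std::string>& argv) {
  if (argv.empty()) return "error: no method given";
  const std::string& method = argv[0];
  // Convenience command that exists only in the wrapper layer.
  if (method == "ListMethods") return "GetRadius SetRadius";
  if (method == "GetRadius" && argv.size() == 1) {
    std::ostringstream out;
    out << self.GetRadius();
    return out.str();
  }
  if (method == "SetRadius" && argv.size() == 2) {
    std::istringstream in(argv[1]);
    double r;
    if (!(in >> r)) return "error: expected a number";
    self.SetRadius(r);
    return "";
  }
  return "error: unknown method or wrong argument count";
}
```

Because the generator already knows every method name it emitted, producing the ListMethods reply costs nothing extra at wrap time.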

Comparison with COM and CORBA

Another way to provide this type of multilanguage support from a single library would be to use the Interface Definition Language (IDL) and either COM or CORBA. IDL is a way to define interfaces for objects in a language-neutral way. Both COM and CORBA use variations of IDL for specifying object interfaces. However, IDL only provides the interface for the object, and the implementation must be hand coded in C++, C, or some other language. This adds complexity for library developers who must know both IDL and the implementation language. With our approach, VTK developers need only learn C++ to develop new VTK objects. To use existing objects, users can choose from a variety of wrapped languages. Both COM and CORBA also suffer from portability issues. COM is only available for Windows. CORBA has many different implementations, and code is often difficult to port from one CORBA implementation to the next.

However, if COM or CORBA bindings are desired, this approach can be used to generate the IDL from the C++ classes. In the VTK ActiViz software from Kitware (our company; http://www.kitware.com/), an IDL interface is created for each VTK class. Listing Six is an example of the IDL and C++ interfaces for the GetClassName method on vtkObject. The implementation objects for the IDL interfaces are automatically generated as well as thin proxy objects that talk to the C++ implementations.
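As an illustration of generating IDL text from a parsed C++ signature, consider the sketch below. Only the const char * to BSTR row is taken from Listing Six. The other type mappings and the function names IdlType and IdlForGetter are assumptions made for this example, not a description of how ActiViz works.

```cpp
#include <string>

// Hypothetical type mapping. Only "const char *" -> BSTR appears in this
// article; the int and double rows are assumptions for illustration.
std::string IdlType(const std::string& cppType) {
  if (cppType == "const char *") return "BSTR";
  if (cppType == "int") return "long";
  if (cppType == "double") return "double";
  return "";  // unsupported type: a generator might skip the method
}

// Emit an IDL declaration for a C++ getter that takes no arguments.
// The C++ return value becomes an [out, retval] parameter.
std::string IdlForGetter(const std::string& returnType,
                         const std::string& name) {
  const std::string idl = IdlType(returnType);
  if (idl.empty()) return "";
  return "HRESULT " + name + " ([out, retval] " + idl + " *arg);";
}
```

For a method declared as const char *GetClassName(), this produces a line of the same shape as the IDL half of Listing Six, apart from the generated argument name.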

Conclusion

The approach we describe here is a good solution for developing toolkits or libraries written in efficient C++ that also allow for rapid prototyping in scripted languages. Providing multiple programming language bindings for a library also gives the code a much wider audience by not forcing users into a particular language choice to use the software. However, the current implementation does have several drawbacks. Since the parser is not a true ISO-compliant C++ parser, developers of the toolkit can use only a subset of the C++ language. This can be frustrating for experienced C++ developers who want to use advanced features such as templates and exceptions.

We are currently working on the next-generation wrapping system, which uses the ISO-compliant C++ parser found in the GNU compiler. The GNU compiler generates XML representations of class interfaces. The resulting XML can be parsed more easily than the original C++. This lets the language wrappers support more C++ features.

DDJ

Listing One

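#include <jni.h>     // JNIEXPORT
#include <string.h>  // strcmp

// Returns me if dType names vtkCell or one of its superclasses, else NULL.
// vtkObject_Typecast is the generated function for the superclass.
extern "C" JNIEXPORT void* vtkCell_Typecast(void *me,char *dType)
{
  void* res;
  if (!strcmp("vtkCell",dType)) { return me; }
  if ((res= vtkObject_Typecast(me,dType)) != NULL) { return res; }
  return NULL;
}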


Listing Two

class vtkTclCommand : public vtkCommand
{
public:
  vtkTclCommand();
  ~vtkTclCommand();
  // Tcl script to evaluate when the observed event fires.
  void SetStringCommand(char *arg) { this->StringCommand = arg; }
  // Interpreter in which the script is evaluated.
  void SetInterp(Tcl_Interp *interp) { this->Interp = interp; }
  void Execute(vtkObject *, unsigned long, void *);
private:
  char *StringCommand;
  Tcl_Interp *Interp;
};


Listing Three

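// T is the COM wrapper class that supplies Fire_VTKEvent (an assumption:
// the original listing uses T without showing its declaration).
template <class T>
class vtkComCommand : public vtkCommand
{
public:
  vtkComCommand(T* o, unsigned long id)
    {
      this->Object = o;
      this->EventId = id;
    }
  // Forward the VTK event to COM clients through the connection point.
  virtual void Execute(vtkObject *caller, unsigned long, void *callData)
    {
      this->Object->Fire_VTKEvent(this->EventId);
    }
  T* Object;
  unsigned long EventId;
};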


Listing Four

vtkRenderer renderer
renderer SetStartRenderMethod start
proc start {} {
   puts "Start Render"
}


Listing Five

Dim WithEvents renderer As vtkRenderer
Private Sub renderer_StartEvent()
    MsgBox "Start Render"
End Sub


Listing Six

HRESULT GetClassName ([out, retval] BSTR *arg20); //IDL
virtual const char *GetClassName(); // C++ 



