Getting our two favorite languages to work together is so much easier now.
July 01, 2003
URL:http://www.drdobbs.com/building-hybrid-systems-with-boostpython/184401666
Still, for many programmers, these very differences mean that Python and C++ complement one another perfectly. Performance bottlenecks in Python programs can be rewritten in C++ for maximal speed, and authors of powerful C++ libraries choose Python as a middleware language for its flexible system integration capabilities. Furthermore, the surface differences mask some strong similarities:
These limitations have led to the development of a variety of high-level wrapping systems, most of which introduce their own specialized languages to control the process. In contrast, Boost.Python presents the user with a high-level C++ interface for wrapping C++ classes and functions, managing complexity behind the scenes with static metaprogramming. Boost.Python also goes beyond the scope of earlier systems by providing:
Development with Boost.Python is "user-guided": as much information is extracted directly from the source code to be wrapped as is possible within the limits of pure C++, and some additional information is supplied explicitly by the user. Mostly the process is mechanical and little intervention is required. Because the interface specification is written in the same full-featured language as the code being exposed, the user has unprecedented power available when she does need to take control.
However, it's also important not to translate all interfaces too literally: the idioms of each language must be respected. For example, though C++ and Python both have an iterator concept, the concepts are expressed very differently. Boost.Python has to be able to bridge the interface gap.
It must be possible to insulate Python users from crashes resulting from trivial misuses of C++ interfaces, such as accessing already-deleted objects. By the same token, the library should insulate C++ users from the low-level Python C API, replacing error-prone C interfaces like manual reference-count management and raw PyObject pointers with more-robust alternatives.
Support for component-based development is crucial, so that C++ types exposed in one extension module can be passed to functions exposed in another extension module without loss of essential information such as C++ inheritance relationships.
Finally, all wrapping must be non-intrusive. In other words, the wrapping must occur without modifying or even seeing the original C++ source code. Existing C++ libraries have to be wrappable by third parties who only have access to header files and binaries.
    char const* greet(unsigned x)
    {
        static char const* const msgs[] = { "hello", "Boost.Python", "world!" };

        if (x > 2)
            throw std::range_error("greet: index out of range");

        return msgs[x];
    }

To wrap this function in standard C++ using the Python C API, you'd need something like this:
    extern "C" // all Python interactions use 'C' linkage
               // and calling convention
    {
        // Wrapper to handle argument/result conversion
        // and checking. With METH_VARARGS the first parameter
        // is the module object and the second is the argument tuple.
        PyObject* greet_wrap(PyObject* /* self */, PyObject* args)
        {
            int x;
            // extract/check arguments
            if (PyArg_ParseTuple(args, "i", &x))
            {
                // invoke wrapped function
                char const* result = greet(x);
                // convert result to Python
                return PyString_FromString(result);
            }
            // error occurred
            return 0;
        }

        // Table of wrapped functions to be exposed by the module
        static PyMethodDef methods[] = {
            { "greet", greet_wrap, METH_VARARGS, "return one of 3 parts of a greeting" }
            , { NULL, NULL, 0, NULL } // sentinel
        };

        // module initialization function; Python looks for init<modulename>
        DL_EXPORT(void) inithello()
        {
            // add the methods to the module
            (void) Py_InitModule("hello", methods);
        }
    }

Now here's the wrapping code you'd use to expose it with Boost.Python:
    #include <boost/python.hpp>
    using namespace boost::python;

    BOOST_PYTHON_MODULE(hello)
    {
        def("greet", greet, "return one of 3 parts of a greeting");
    }

and here is the code in action:
    >>> import hello
    >>> for x in range(3):
    ...     print hello.greet(x)
    ...
    hello
    Boost.Python
    world!

Aside from the fact that the C API version is much more verbose than the Boost.Python version, it's worth noting that the C API doesn't handle a few things correctly:
Given:
    struct World
    {
        void set(std::string msg) { this->msg = msg; }
        std::string greet() { return msg; }
        std::string msg;
    };

The following code will expose the preceding code in our extension module:
    #include <boost/python.hpp>
    using namespace boost::python;

    BOOST_PYTHON_MODULE(hello)
    {
        class_<World>("World")
            .def("greet", &World::greet)
            .def("set", &World::set)
            ;
    }

Although this code has a certain pythonic familiarity, people sometimes find the syntax a bit confusing because it doesn't look like most of the C++ code they're used to. All the same, this is just standard C++. Because of their flexible syntax and operator overloading, C++ and Python are great for defining domain-specific (sub)languages (DSLs), and that's what we've done in Boost.Python. To break it down:
    class_<World>("World")

constructs an unnamed object of type class_<World> and passes "World" to its constructor. This creates a new-style Python class called World in the extension module and associates it with the C++ type World in the Boost.Python type conversion registry. We might have also written:
    class_<World> w("World");

but that would've been more verbose, since we'd have to name w again to invoke its def() member function:
    w.def("greet", &World::greet)

There's nothing special about the location of the dot for member access in the original example: C++ allows any amount of whitespace on either side of a token, and placing the dot at the beginning of each line allows us to chain as many successive calls to member functions as we like with a uniform syntax. The other key fact that allows chaining is that class_<> member functions all return a reference to *this.
So the example is equivalent to:
    class_<World> w("World");
    w.def("greet", &World::greet);
    w.def("set", &World::set);

It's occasionally useful to break down the components of a Boost.Python class wrapper in this way, but the rest of this article will stick to the terse syntax.
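The chaining idiom is not specific to C++. As a rough illustration only, the following Python 3 sketch mimics it with a hypothetical ClassBuilder class (an invention for this example, not part of Boost.Python) whose def_ method returns self, playing the role that returning a reference to *this plays for class_<>:

```python
class ClassBuilder:
    """Hypothetical stand-in for class_<T>: collects named callables."""

    def __init__(self, name):
        self.name = name
        self.methods = {}

    def def_(self, name, func):
        # Returning self is what makes the chained calls below possible.
        self.methods[name] = func
        return self

    def build(self):
        # Create an ordinary Python class from the collected methods.
        return type(self.name, (object,), dict(self.methods))


def greet(self):
    return self.msg


def set_msg(self, msg):
    self.msg = msg


World = (
    ClassBuilder("World")
    .def_("greet", greet)
    .def_("set", set_msg)
    .build()
)

planet = World()
planet.set("howdy")
greeting = planet.greet()
```

If def_ returned None instead, each call would have to name the builder again, just as in the w.def(...) form above.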
For completeness, here's the wrapped class in use:
    >>> import hello
    >>> planet = hello.World()
    >>> planet.set('howdy')
    >>> planet.greet()
    'howdy'

    >>> planet = hello.World()

However, well-designed classes in any language may require constructor arguments in order to establish their invariants. Unlike Python, where __init__ is just a specially-named method, in C++, constructors cannot be handled like ordinary member functions. In particular, we can't take their address: &World::World is an error. The library provides a different interface for specifying constructors.
Given:
    struct World
    {
        World(std::string msg); // added constructor
        ...

We can modify our wrapping code as follows:

    class_<World>("World", init<std::string>())
        ...

Of course, a C++ class may have additional constructors, and we can expose those as well by passing more instances of init<...> to def():

    class_<World>("World", init<std::string>())
        .def(init<double, double>())
        ...

Boost.Python allows you to overload wrapped functions, member functions, and constructors to mirror C++ overloading.

    class_<World>("World", init<std::string>())
        .def_readonly("msg", &World::msg)
        ...

and can be used directly in Python:

    >>> planet = hello.World('howdy')
    >>> planet.msg
    'howdy'

This does not result in adding attributes to the World instance __dict__, which can result in substantial memory savings when wrapping large data structures. In fact, no instance __dict__ will be created at all unless attributes are explicitly added from Python. Boost.Python owes this capability to the new Python 2.2 type system, in particular, the descriptor interface and property type.
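To make the role of descriptors concrete, here is a pure-Python analogy written for Python 3. It is only a sketch of the idea, not how Boost.Python implements def_readonly: the use of __slots__ and the _msg storage name are assumptions of this example. The read-only attribute lives on the class as a property, and instances carry no __dict__:

```python
class World(object):
    # __slots__ suppresses the per-instance __dict__ (an assumption of this
    # sketch; a wrapped C++ class keeps its data in the C++ object instead).
    __slots__ = ("_msg",)

    def __init__(self, msg):
        self._msg = msg

    @property
    def msg(self):
        # Read-only: no setter is defined, so assignment fails.
        return self._msg


planet = World("howdy")
has_instance_dict = hasattr(planet, "__dict__")
descriptor_on_class = isinstance(World.__dict__["msg"], property)

try:
    planet.msg = "hi"
    assignment_rejected = False
except AttributeError:
    assignment_rejected = True
```

Attribute lookup finds msg on the type rather than on the instance, which is why no per-instance storage is needed for it.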
In C++, publicly-accessible data members are considered a sign of poor design because they break encapsulation, and style guides usually dictate the use of getter and setter functions instead. In Python, however, __getattr__, __setattr__, and since 2.2, property mean that attribute access is just one more well-encapsulated syntactic tool at the programmer's disposal. Boost.Python bridges this idiomatic gap by making Python property creation directly available to users. If msg were private, we could still expose it as an attribute in Python as follows:
    class_<World>("World", init<std::string>())
        .add_property("msg", &World::greet, &World::set)
        ...

The example above mirrors the familiar usage of properties in Python 2.2+:

    >>> class World(object):
    ...     def __init__(self, msg):
    ...         self.__msg = msg
    ...     def greet(self):
    ...         return self.__msg
    ...     def set(self, msg):
    ...         self.__msg = msg
    ...     msg = property(greet, set)

    class_<rational<int> >("rational_int")
        .def(init<int, int>()) // constructor, e.g. rational_int(3,4)
        .def("numerator", &rational<int>::numerator)
        .def("denominator", &rational<int>::denominator)
        .def(-self)        // __neg__ (unary minus)
        .def(self + self)  // __add__ (homogeneous)
        .def(self * self)  // __mul__
        .def(self + int()) // __add__ (heterogeneous)
        .def(int() + self) // __radd__
        ...

The magic is performed using a simplified application of expression templates [1], a technique originally developed for optimization of high-performance matrix algebra expressions. The essence is that instead of performing the computation immediately, operators are overloaded to construct a type representing the computation. In matrix algebra, dramatic optimizations are often available when the structure of an entire expression can be taken into account, rather than evaluating each operation "greedily." Boost.Python uses the same technique to build an appropriate Python method object based on expressions involving self.

    class_<Derived, bases<Base1,Base2> >("Derived")
        ...

This has two effects:
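The idea of operators that describe a computation instead of performing it can be imitated in a few lines of Python 3. The sketch below is only an analogy under stated assumptions: self_, _SelfPlaceholder, RationalInt, and define are hypothetical names invented here, and the real library works at compile time with C++ templates rather than at run time:

```python
import operator
from fractions import Fraction


class _SelfPlaceholder:
    """Hypothetical stand-in for Boost.Python's `self` object.

    Its operators compute nothing; each returns a description of the
    requested operation as (special-method name, implementation).
    """

    def __neg__(self):
        return ("__neg__", operator.neg)

    def __add__(self, other):
        return ("__add__", operator.add)

    def __radd__(self, other):
        # Reflected form: the wrapped value is the right-hand operand.
        return ("__radd__", lambda value, other: other + value)

    def __mul__(self, other):
        return ("__mul__", operator.mul)


self_ = _SelfPlaceholder()


class RationalInt:
    """Toy wrapper around fractions.Fraction (an assumption of this sketch)."""

    def __init__(self, num, den=1):
        self.value = Fraction(num, den)

    @classmethod
    def define(cls, description):
        """Turn an operation description into a special method on cls."""
        name, impl = description

        def method(this, *args):
            # Unwrap RationalInt operands, pass plain numbers through.
            raw = [a.value if isinstance(a, cls) else a for a in args]
            result = impl(this.value, *raw)
            return cls(result.numerator, result.denominator)

        setattr(cls, name, method)
        return cls


(RationalInt
    .define(-self_)          # __neg__
    .define(self_ + self_)   # __add__
    .define(self_ * self_)   # __mul__
    .define(0 + self_))      # __radd__

half = RationalInt(1, 2)
total = (1 + half) + RationalInt(1, 4)   # uses __radd__, then __add__
```

Each expression such as -self_ is evaluated once, when the class is being described, and yields data that define can inspect; the arithmetic itself runs only later, when the generated method is called.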
1. When the class_<...> is created, Python type objects corresponding to Base1 and Base2 are looked up in the Boost.Python registry and are used as bases for the new Python Derived type object, so methods exposed for the Python Base1 and Base2 types are automatically members of the Derived type. Because the registry is global, this works correctly even if Derived is exposed in a different module from either of its bases.
2. C++ conversions from Derived to its bases are added to the Boost.Python registry. Thus wrapped C++ methods expecting (a pointer or reference to) an object of either base type can be called with an object wrapping a Derived instance. Wrapped member functions of class T are treated as though they have an implicit first argument of T&, so these conversions are necessary to allow the base class methods to be called for derived objects.
Of course, it's possible to derive new Python classes from wrapped C++ classes. Because Boost.Python uses the new-style class system, derivation works very much as for the Python built-in types. There is, however, one significant difference: the built-in types generally establish their invariants in their __new__ function, so that derived classes do not need to call __init__ on the base class before invoking its methods:

    >>> class L(list):
    ...     def __init__(self):
    ...         pass
    ...
    >>> L().reverse()
    >>>

Because C++ object construction is a one-step operation, C++ instance data cannot be constructed until the arguments are available in the __init__ function:

    >>> class D(SomeBPLClass):
    ...     def __init__(self):
    ...         pass
    ...
    >>> D().some_bpl_method()
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: bad argument type for built-in operation

This happened because Boost.Python couldn't find instance data of type SomeBPLClass within the D instance; D's __init__ function masked construction of the base class. The situation could be corrected by either removing D's __init__ function or having it call SomeBPLClass.__init__(...) explicitly.

    //
    // interface to wrap:
    //
    class Base
    {
     public:
        virtual int f(std::string x) { return 42; }
        virtual ~Base();
    };

    // f is a non-const member function, so b must be a non-const reference
    int calls_f(Base& b, std::string x) { return b.f(x); }

    //
    // Wrapping Code
    //

    // Dispatcher class
    struct BaseWrap : Base
    {
        // Store a pointer to the Python object
        BaseWrap(PyObject* self_)
            : self(self_) {}
        PyObject* self;

        // Default implementation, for when f is not overridden
        int f_default(std::string x) { return this->Base::f(x); }
        // Dispatch implementation
        int f(std::string x) { return call_method<int>(self, "f", x); }
    };
    ...
        def("calls_f", calls_f);
        class_<Base, BaseWrap>("Base")
            .def("f", &Base::f, &BaseWrap::f_default)
            ;

Now here's some Python code that demonstrates:

    >>> class Derived(Base):
    ...     def f(self, s):
    ...         return len(s)
    ...
    >>> calls_f(Base(), 'foo')
    42
    >>> calls_f(Derived(), 'forty-two')
    9

Things to notice about the dispatcher class:
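The failure and its fix can be reproduced without any extension module. In this Python 3 sketch, SomeWrappedClass is a hypothetical pure-Python stand-in whose __init__ creates the "instance data" that a real wrapped class would hold as a C++ object; the attribute name _cpp_instance and the error message are inventions of this example:

```python
class SomeWrappedClass(object):
    """Hypothetical stand-in for a class exposed from C++."""

    def __init__(self):
        # Stands in for constructing the underlying C++ object.
        self._cpp_instance = {"value": 42}

    def some_method(self):
        if not hasattr(self, "_cpp_instance"):
            raise TypeError("no instance data; was the base __init__ called?")
        return self._cpp_instance["value"]


class Broken(SomeWrappedClass):
    def __init__(self):
        pass  # masks construction of the base class


class Fixed(SomeWrappedClass):
    def __init__(self):
        # Explicitly construct the base part first.
        SomeWrappedClass.__init__(self)


fixed_result = Fixed().some_method()

try:
    Broken().some_method()
    broken_failed = False
except TypeError:
    broken_failed = True
```

Removing Broken.__init__ altogether would work equally well, since the inherited __init__ would then run.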
Boost.Python provides a class object that automates reference counting and provides conversion to Python from C++ objects of arbitrary type. This feature significantly reduces the learning effort for prospective extension module writers.
Creating an object from any other type is extremely simple:
    object s("hello, world"); // s manages a Python string

object has templated interactions with all other types, with automatic to-Python conversions. It happens so naturally that it's easily overlooked:

    object ten_Os = 10 * s[4]; // -> "oooooooooo"

In the example above, 4 and 10 are converted to Python objects before the indexing and multiplication operations are invoked.
The extract<T> class template can be used to convert Python objects to C++ types:
    double x = extract<double>(o);

If a conversion in either direction cannot be performed, an appropriate exception is thrown at runtime.
The object type is accompanied by a set of derived types that mirror the Python built-in types such as list, dict, tuple, etc. as much as possible. This enables convenient manipulation of these high-level types from C++:
    dict d;
    d["some"] = "thing";
    d["lucky_number"] = 13;
    list l = d.keys();

This almost looks and works like regular Python code, but it is pure C++. Of course we can also wrap C++ functions that accept or return object instances.
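For comparison, this is roughly the Python that the C++ snippets above mirror. It is written for Python 3, where keys() returns a view that has to be converted to a list explicitly; the variable names are chosen here to match the C++ ones:

```python
s = "hello, world"
ten_os = 10 * s[4]        # index first, then repeat the resulting character

d = {}
d["some"] = "thing"
d["lucky_number"] = 13
keys = list(d.keys())     # the C++ version assigns d.keys() to a list
```

The closeness of the two versions is the point: the C++ object, dict, and list wrappers let extension code read much like the interpreter session it replaces.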
Boost.Python enables us to "think hybrid." Python can be used for rapidly prototyping a new application; its ease of use and the large pool of standard libraries give us a head start on the way to a working system. If necessary, the working code can be used to discover rate-limiting hotspots. To maximize performance these hotspots can be reimplemented in C++, together with the Boost.Python bindings needed to tie them back into the existing higher-level procedure.
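One way to find such hotspots in a working prototype is the standard library profiler. The sketch below, for Python 3, uses cProfile and pstats; the functions slow_sum_of_squares and prototype are hypothetical placeholders for real application code:

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    # Deliberately naive inner loop: a candidate for a C++ rewrite.
    total = 0
    for i in range(n):
        total += i * i
    return total


def prototype():
    return [slow_sum_of_squares(20000) for _ in range(20)]


profiler = cProfile.Profile()
profiler.enable()
results = prototype()
profiler.disable()

# Collect the report in a string, sorted by cumulative time.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
report_text = report.getvalue()
```

Functions that dominate the cumulative-time column are the natural candidates for reimplementation in C++ behind an unchanged Python interface.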
Of course, this top-down approach is less attractive if it is clear from the start that many algorithms will eventually have to be implemented in C++. Fortunately Boost.Python also enables us to pursue a bottom-up approach. We have used this approach very successfully in the development of a toolbox for scientific applications. The toolbox started out mainly as a library of C++ classes with Boost.Python bindings, and for a while the growth was mainly concentrated on the C++ parts. However, as the toolbox is becoming more complete, more and more newly added functionality can be implemented in Python.
Figure 1 shows the estimated ratio of newly added C++ and Python code over time as new algorithms are implemented. We expect this ratio to level out near 70% Python. Being able to solve new problems mostly in Python rather than a more difficult statically typed language is the return on our investment in Boost.Python. The ability to access all of our code from Python allows a broader group of developers to use Python in the rapid development of new applications.
Computationally intensive tasks play to the strengths of C++ and are often impossible to implement efficiently in pure Python, while jobs like serialization that are trivial in Python can be very difficult in pure C++. Given the luxury of building a hybrid software system from the ground up, we can approach design with new confidence and power.
Ralf Grosse-Kunstleve is a scientist in the Physical Biosciences Division of the Lawrence Berkeley National Laboratory in California. He is part of the Computational Crystallography Initiative (cci.lbl.gov) which is leading an international effort aimed at advancing the automation of protein structure determination. The software system being developed by the collaboration has evolved together with the Boost.Python library and was designed from its inception as a hybrid Python/C++ system. The core components are available as an open source toolbox at cctbx.sourceforge.net.
Figure 1: Estimated ratio of newly added C++ and Python code over the course of the development of a hybrid system