Dr. Dobb's | Object Interconnections: Standard C++ and the OMG C++ Mapping

The OMG mapping from CORBA IDL to C++ was standardized in 1994. As Schmidt and Vinoski explain, there are some good reasons to leave that mapping alone, even though the C++ language changed significantly between 1994 and 1998, when it finally became an international standard. But that doesn't keep Schmidt and Vinoski from having a little fun. In this installment they explore their own alternative "what if" mappings, based on newer features of C++.

January 01, 2001
URL:http://www.drdobbs.com/object-interconnections-standard-c-and-t/184403765

January 2001 C++ Experts Forum/Object Interconnections

Introduction

Long-time readers of "Object Interconnections" know that our column generally covers CORBA programming in C++. In keeping with this theme, our previous column [1] described the history of the OMG IDL C++ Language Mapping [2]. Our goal was to explain the design constraints that shaped the mapping, especially the non-technical ones arising from the political forces that often accompany standardization efforts.

CORBA C++ programmers, especially those relatively new to CORBA or to C++, often wonder why the OMG C++ mapping has not kept up with ISO/ANSI C++ language standardization efforts. As we described in our previous column, the CORBA C++ mapping was standardized in late 1994. Since that time, the C++ mapping has stayed largely the same — the OMG has made only those changes needed to fix bugs and to track the evolution of the CORBA specification.

There are several reasons why there have been no major modifications to the OMG C++ mapping since its initial standardization:

One reason is the desire for stability on the part of both ORB users and ORB vendors. Since changes to the mapping require reworking, retesting, and redeploying C++ ORB implementations and the applications built to use them, major mapping renovations could be costly.
Another reason is that the OMG has never issued an RFP (Request For Proposals) for C++ mapping renovations. Perhaps this is due in part to the difficulties surrounding the first standardization effort.
ORBs are typically used for integration, and as a result they must be highly portable, so they are available on a wide range of platforms. The OMG C++ mapping is already known to be highly portable. Adding dependencies on features of Standard C++, even on simple class templates such as pair, would reduce portability because some features are still unsupported in certain C++ compilers.

In this column, we explore some ideas for hypothetical alternative mappings of OMG IDL to C++ that use Standard C++ features. The primary design criteria for these alternative mappings are flexibility and ease of use, though we also pay attention to performance issues. To avoid creating false expectations, we must stress the word "hypothetical," as there are currently no efforts within the OMG to define any new C++ mappings, nor do we necessarily condone the creation of any official efforts to do so.

Major Areas of the OMG C++ Mapping

The CORBA C++ mapping can be divided into the following four major areas:

IDL data type mapping: defines how IDL data types, such as structs and sequences, are mapped into C++, including how they are passed between client and target object in operation invocations.
Client-side interface mapping: defines how IDL interfaces are mapped into C++ to support client applications invoking distributed objects.
Server-side interface mapping: defines how IDL interfaces are mapped into C++ to support server applications implementing distributed objects.
Pseudo-object mapping: CORBA includes interfaces for aspects of the ORB implementation, including the ORB interface itself, Request, Object, and named value lists (NVList). The interfaces of these pseudo objects are special in that they sometimes are not defined entirely in IDL (e.g., some are defined in a C-like language). These pseudo-object interfaces must also be mapped into C++ to make them available for both client and server applications.

In the remainder of this column, we'll examine the first two areas listed above to see how we might apply features of Standard C++ to each of them. We'll cover the other areas in our next column.

IDL Data Type Mapping

OMG IDL provides types such as string, wstring, sequence, and arrays. In the OMG C++ mapping, strings map to char*, wstrings map to CORBA::WChar* (where WChar is a typedef, usually for wchar_t), sequences map to classes, and arrays map to C++ arrays. For string, wstring, and arrays, these mappings have the drawback that they map to raw unmanaged arrays, which are known to be error-prone in C and C++ (yet another reason to always analyze your code with Purify or BoundsChecker!).

Many Standard C++ algorithms assume sequential containers with bidirectional iterators. Thus, both sequences and arrays from the OMG C++ mapping can already be used with these algorithms. Doing so with sequences requires access to the underlying sequence buffer, which you can obtain by calling the standard CORBA get_buffer method on any sequence. This method returns a contiguous array, and pointers into the array can act as iterators for the standard algorithms. However, this addresses only a small part of the issues — especially memory management issues — that many CORBA developers would like to see addressed by a C++ mapping that takes advantage of Standard C++.
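To make the pointers-as-iterators idea concrete, here is a minimal sketch. The LongSeq class is a hypothetical stand-in for a mapped IDL sequence of longs, not ORB-generated code: only the length() and get_buffer() shape follows the mapping, and the helper sort_and_contains is our own illustrative name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a mapped sequence<long>. A real ORB generates
// this class; only the length()/get_buffer() shape is mirrored here.
class LongSeq {
public:
    explicit LongSeq(std::size_t n) : data_(n) {}

    std::size_t length() const { return data_.size(); }

    long &operator[](std::size_t i) {
        assert(i < data_.size());
        return data_[i];
    }

    // Returns a pointer to the contiguous element buffer (null if empty).
    long *get_buffer() { return data_.empty() ? 0 : &data_[0]; }

private:
    std::vector<long> data_;
};

// Sorts the sequence in place, then reports whether it contains value.
// The two pointers delimit the range exactly as a pair of iterators would.
bool sort_and_contains(LongSeq &seq, long value) {
    long *first = seq.get_buffer();
    long *last = first + seq.length();
    std::sort(first, last);
    return std::binary_search(first, last, value);
}
```

Nothing in the algorithm calls knows about the sequence class itself, which is why this technique helps with algorithms but does nothing for memory management.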

For sequences, the OMG C++ mapping is problematic in that the interfaces of the mapped C++ sequence classes lack certain functionality, such as vector operations. For example, the CORBA sequence mapping supplies no functions that you can use to append, insert, or iterate. Perhaps worse, the sequence mapping forces developers to manipulate the sequence length explicitly to manipulate the data. Here's the code required to append a string element to a sequence of strings, for example:

CORBA::ULong len = sequence.length ();
// Make room for the new element.
sequence.length (len + 1);
sequence[len] = CORBA::string_dup ("a new element");

Similarly, inserting an element into the middle of a sequence requires you to increase the sequence length and move all elements above the new element up by one to make room for it. As we all know, this type of low-level data type manipulation is tedious and error-prone. You can write helper functions that do all of this for you, but the point is that you shouldn't have to, particularly when the functionality is already implemented in the Standard C++ library.
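The kind of helper function we have in mind might look like the following sketch. StrSeq here is a hypothetical stand-in that exposes only the length() accessor pair and operator[], and it stores std::string elements rather than char* so that the example stays self-contained. The name insert_at is ours as well.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a mapped sequence<string>, limited to the
// operations the text describes: read length, set length, index.
class StrSeq {
public:
    std::size_t length() const { return items_.size(); }
    void length(std::size_t n) { items_.resize(n); }

    std::string &operator[](std::size_t i) {
        assert(i < items_.size());
        return items_[i];
    }

private:
    std::vector<std::string> items_;
};

// Inserts value before index pos using only length() and operator[].
void insert_at(StrSeq &seq, std::size_t pos, const std::string &value) {
    const std::size_t len = seq.length();
    assert(pos <= len);
    seq.length(len + 1);                  // make room for the new element
    for (std::size_t i = len; i > pos; --i)
        seq[i] = seq[i - 1];              // shift the tail up, last element first
    seq[pos] = value;
}
```

With a std::vector the whole body of insert_at collapses to a single call, v.insert(v.begin() + pos, value), which is the convenience the sequence mapping lacks.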

Obviously, mapping these data types to suitable Standard C++ types instead would make them easier to use. The string and wstring types could map to suitable instantiations of the C++ basic_string type, thus providing all the rich data manipulation and memory management features that it supports. Arrays could map to the C++ vector type, thereby providing largely the same benefits.

The choice of a Standard C++ mapping for sequences is not so straightforward, however. Specifically, should a sequence map to vector or to list? The best choice for the mapping depends entirely on how the application intends to use the sequence. For example, if the application intends to perform many insertions in the middle of the sequence, using list rather than vector would be more efficient. In fact, the best choice might even be different for the client than for the server, or different even for different parts of the same client or server application. We'll revisit this problem later.
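As a rough illustration of why the choice is about cost rather than code, the following sketch (with the hypothetical helper name insert_in_middle) compiles unchanged for both containers. For a vector, the insert must shift every later element; for a list, the insert itself is constant time, though reaching the middle still requires a linear walk.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Inserts v at the midpoint of any standard sequential container.
// The source code is identical for vector and list; only the cost differs.
template <typename Container>
void insert_in_middle(Container &c, const typename Container::value_type &v) {
    typename Container::iterator pos = c.begin();
    std::advance(pos, c.size() / 2);   // O(1) for vector, O(n) for list
    c.insert(pos, v);                  // O(n) for vector, O(1) for list
    assert(!c.empty());
}
```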

Implications on Parameter Passing Rules

When developers speak of an IDL C++ mapping based on Standard C++, the mapping of IDL data types to C++ data types is not the only issue they're trying to address. Many developers are concerned about the complexity of the parameter passing rules as well. For example, let's assume the following IDL:

interface A {
    void op (inout string s);
};

Because the argument to operation A::op is an inout argument, it passes first from the client to the target, and then back from the target to the client. A client application developer who invokes A::op must therefore remember to first set the argument being passed in to a valid string value and then remember to free the value that comes back, like this:

char *str = CORBA::string_dup ("in string value");
a->op (str);
// Use string value stored in str, and then free it.
CORBA::string_free (str);

Note that even the initial value must be heap allocated, because if the target object wants to change the value, it can expect to be able to invoke CORBA::string_free to do so. The servant method of the target object implementation might look like this:

void
MyServant::op (char *&str) throw (CORBA::SystemException)
{
    CORBA::string_free (str);
    str = CORBA::string_dup ("out string value");
}

Keep in mind that the CORBA C++ memory management rules were crafted carefully so that they work in both the distributed case and the collocated case [3]. In other words, these rules are the same regardless of whether the client and target are distributed across separate machines, or whether they're collocated within the same process and the client invokes the target object directly.

You might also recall from our previous column that these parameter passing rules were designed to avoid copying. In the collocated case, for example, the string allocated by the target can be passed back to the caller without having to copy it, simply by transferring ownership of the pointer. Let's apply these same rules to a hypothetical mapping in which we map an IDL string to std::string:

string *str = new string ("in string value");
a->op (str);
delete str;

and on the server side:

void
MyServant::op (string *&str) throw (CORBA::SystemException)
{
    delete str;
    str = new string ("out string value");
}

While we gain the benefits of encapsulated string manipulation provided by std::string, this mapping doesn't help much because it still suffers from the need for explicit low-level memory management via C++ operators new and delete, just like the existing OMG C++ mapping. A different approach is to change the parameter passing rules to simply pass the string parameter by reference. For such a client, the code looks like this:

string str ("in string value");
a->op (str);

and for the server:

void
MyServant::op (string &str) throw (CORBA::SystemException)
{
    str = "out string value";
}

This code is about as simple as it can get, especially compared to the original client code, which involved manual string duplication and deletion. At first glance, you might conclude that there's much to be gained from mapping IDL strings to std::string rather than to char*. Unfortunately, the comparison is not really fair, because any experienced CORBA C++ developer would have written the original example like this:

CORBA::String_var str = "in string value";
a->op (str);

This code looks strikingly similar to the std::string version. The String_var type provides for automatic memory management of the underlying character array, much the way that a Standard C++ auto_ptr does, thus eliminating the need for explicit string duplication and deletion.
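For readers who have not looked inside a _var type, the following is a deliberately simplified sketch of the idea, not the actual String_var class. StringVar, dup_string, and free_string are hypothetical stand-ins for String_var, CORBA::string_dup, and CORBA::string_free. The accessor names in() and inout() follow those of the real type, but copying and most of the real interface are omitted.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins for CORBA::string_dup and CORBA::string_free.
char *dup_string(const char *s) {
    assert(s != 0);
    char *copy = new char[std::strlen(s) + 1];
    std::strcpy(copy, s);
    return copy;
}

void free_string(char *s) { delete[] s; }

// Simplified String_var-like owner: copies on construction from a
// const char*, frees the string on destruction.
class StringVar {
public:
    StringVar(const char *s) : p_(dup_string(s)) {}
    ~StringVar() { free_string(p_); }

    const char *in() const { return p_; }  // read-only access
    char *&inout() { return p_; }          // lets a callee replace the string

private:
    StringVar(const StringVar &);             // copying omitted in this sketch
    StringVar &operator=(const StringVar &);
    char *p_;
};

// Plays the role of the servant: frees the old value and allocates a new one.
void op(char *&str) {
    free_string(str);
    str = dup_string("out string value");
}
```

The caller writes StringVar s ("in string value"); op (s.inout ()); and never frees anything by hand, because the destructor releases whichever string the callee left behind.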

What it ultimately comes down to is a trade-off of performance versus ease of use. The String_var code benefits from the fact that all copying and allocation is under control of the ORB, not under the control of the std::string implementation, which could have a huge performance impact on the application. The std::string version, on the other hand, is much easier to use because std::string supplies rich string manipulation facilities while String_var supplies none, and because std::string fits seamlessly with the non-CORBA portions of your applications. This last benefit cannot be overlooked, as developers often struggle with the question of how far they should let CORBA data types, such as String_var, intrude into their code [4].

If your application is going to live for a long time and run day after day in a performance-sensitive environment, you might not mind putting the appropriate effort into developing it (i.e., worrying about low-level details such as memory management so that it's as efficient as possible). Conversely, if you're writing a "quick and dirty" application to be used once or twice and then thrown away, then you don't want to waste time worrying about low-level details. Today's OMG C++ mapping is good at allowing you to build efficient applications, but is not as good at allowing you to build quick and dirty applications. It would be nicer if it allowed both.

Client-Side Interface Mapping

In the OMG C++ mapping, IDL interfaces map into C++ classes, and their operations and attributes map into member functions on those classes. This is a very natural and intuitive mapping. Can it be improved?

One possibility for improving the mapping of operations relates to our discussion above of how best to map IDL sequences to C++ classes. Consider the following IDL:

interface A {
    typedef sequence<string> StrSeq;
    StrSeq match (in StrSeq values, in string pattern);
};

With the standard OMG C++ mapping, this interface maps to (roughly) the following client-side proxy class:

class A {
  public:
    class StrSeq { /* sequence class */ };
    StrSeq *match (const StrSeq &values, const char *pat);
    // ...
};

If we assumed that sequences mapped to the std::vector type (and strings to std::string), this client proxy class might appear as follows:

class A {
  public:
    typedef vector<string> StrSeq;
    StrSeq match (const StrSeq &values, const string &pat);
};

As we mentioned above, however, it would be beneficial to allow developers to choose the exact mapping for the StrSeq type based on how their applications will use it. One way to do this is to use a C++ member template:

class A {
  public:
    typedef vector<string> StrSeq;

    template<typename Return, typename SeqArg>
    Return match (const SeqArg &values, const string &pat);
};

This approach still defaults the StrSeq typedef to the vector type, but it allows the application to choose what type to pass for the StrSeq type on a per-call basis:

StrSeq values;
// Initialize sequence values, then invoke operation.
list<string> matches =
   a->template match<list<string>, StrSeq> (values, "a.*");

Here, we've passed a vector<string> (a StrSeq) as the sequence argument type, but we're using list<string> as the returned sequence type. We use explicit template member function invocation syntax to indicate the type of the return sequence.
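A self-contained version of this idea, with no ORB involved, might look like the sketch below. The body of match is purely an assumption made so that the example runs: it does a local substring test in place of a remote invocation and real pattern matching.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Local stand-in for the client proxy; nothing here talks to an ORB.
class A {
public:
    typedef std::vector<std::string> StrSeq;

    // Return and SeqArg may be any standard sequential containers of strings.
    template <typename Return, typename SeqArg>
    Return match(const SeqArg &values, const std::string &pat) const {
        Return result;
        for (typename SeqArg::const_iterator it = values.begin();
             it != values.end(); ++it) {
            // Assumption: a substring test stands in for the remote match.
            if (it->find(pat) != std::string::npos)
                result.insert(result.end(), *it);
        }
        assert(result.size() <= values.size());
        return result;
    }
};
```

Because SeqArg can be deduced from the argument, a caller needs to name only the return type explicitly, as in a.match<std::list<std::string> > (values, "ap"), which softens the verbosity of the fully explicit form.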

This approach has several interesting implications:

Member template functions cannot be virtual. Typically, ORBs generate client-side proxy classes that are abstract and contain only pure virtual member functions to represent the IDL operations and attributes. They also generate derived concrete classes that they instantiate when necessary. This approach keeps the application unaware of ORB implementation details. Because member template functions cannot be virtual, it rules out this implementation approach and instead requires a delegation-based approach.
The client stub implementation of the match operation must assume that its sequence arguments support only those operations that Standard C++ sequential containers support. The marshaling code, for example, would most likely have to operate by iterating over the container element by element. For some types of sequences, such as sequences of basic types like long and short, this could have negative performance implications, because it disallows the typical approach of simply blasting the whole sequential array of values over the transport in one shot.
Member templates are not portable to all C++ compilers.
Operations with many sequence arguments result in member template functions with many template parameters. Given that member template functions cannot have default template parameters, using such member template functions with explicit invocation syntax could be tedious. Fortunately, in practice few operations have more than two or three arguments in total.
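The first two implications can be sketched together. In the code below, MatchInvoker and LocalInvoker are hypothetical names for an ORB-internal interface and an in-process implementation of it. The non-virtual member template copies its argument element by element into one canonical container and delegates to an ordinary virtual function, which is where a real stub would marshal the request.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical ORB-internal interface. It is not a template, so it can be
// virtual and can hide the implementation from the application.
class MatchInvoker {
public:
    virtual ~MatchInvoker() {}
    virtual std::vector<std::string>
    invoke_match(const std::vector<std::string> &values,
                 const std::string &pat) = 0;
};

// Hypothetical in-process implementation; a real one would marshal a request.
class LocalInvoker : public MatchInvoker {
public:
    virtual std::vector<std::string>
    invoke_match(const std::vector<std::string> &values,
                 const std::string &pat) {
        std::vector<std::string> out;
        for (std::size_t i = 0; i < values.size(); ++i)
            if (values[i].find(pat) != std::string::npos)
                out.push_back(values[i]);
        return out;
    }
};

// The proxy holds a pointer to the invoker instead of deriving from it.
class A {
public:
    explicit A(MatchInvoker *impl) : impl_(impl) { assert(impl_ != 0); }

    template <typename Return, typename SeqArg>
    Return match(const SeqArg &values, const std::string &pat) {
        // Element-by-element copy: only begin()/end() are assumed of SeqArg.
        std::vector<std::string> wire(values.begin(), values.end());
        std::vector<std::string> reply = impl_->invoke_match(wire, pat);
        return Return(reply.begin(), reply.end());
    }

private:
    MatchInvoker *impl_;
};
```

The extra copies into and out of the canonical vector are the price of the flexibility, and they are exactly the per-element cost described in the second implication.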

Some might consider this solution to be overkill. However, sequences are used heavily in many IDL interfaces, and allowing them to be used flexibly as shown here — in a manner akin to the generic programming facilities of Standard C++ — is important for maximizing the utility of mapping IDL to Standard C++.

There are a number of ways to evaluate what we've shown here, but for now we'll reserve such evaluation for our next column, after we've explored the mapping of the server side and pseudo objects to Standard C++ constructs.

Concluding Remarks

This column describes several ways to define a CORBA C++ mapping for IDL data types and client-side interfaces that uses C++ Standard library containers. We've shown how many of the IDL types, such as strings and arrays, map cleanly onto classes in the C++ Standard library. Other types, such as sequences, however, are more problematic because they involve subtle tradeoffs between performance and ease of use.

Acknowledgements

Occasional conversations with both Steinar Bang and Michi Henning over the past few years helped shape the contents of this column.

References

[1] D.C. Schmidt and S. Vinoski. "The History of the OMG C++ Mapping," C/C++ Users Journal, November 2000, http://www.cuj.com/experts/1811/vinoski.html.

[2] Object Management Group. IDL C++ Language Mapping, 1999, http://www.omg.org/technology/documents/formal/c++.htm.

[3] D.C. Schmidt, S. Vinoski, and N. Wang. "Collocation Optimizations for CORBA," C++ Report, October, 1999.

[4] Michi Henning and Steve Vinoski. Advanced CORBA Programming with C++ (Addison Wesley, 1999).

Steve Vinoski is chief architect and vice president of Platform Technologies for IONA Technologies and is also an IONA Fellow. A frequent speaker at technical conferences, he has been giving CORBA tutorials around the globe since 1993. Steve helped put together several important OMG specifications, including CORBA 1.2, 2.0, 2.2, and 2.3; the OMG IDL C++ Language Mapping; the ORB Portability Specification; and the Objects By Value Specification. In 1996, he was a charter member of the OMG Architecture Board. He is currently the chair of the OMG IDL C++ Mapping Revision Task Force. He and Michi Henning are the authors of Advanced CORBA Programming with C++, published in January 1999 by Addison Wesley Longman.

Doug Schmidt is an associate professor at the University of California, Irvine. His research focuses on patterns, optimization principles, and empirical analyses of object-oriented techniques that facilitate the development of high-performance, real-time distributed object computing middleware on parallel processing platforms running over high-speed networks and embedded system interconnects. He is the lead author of the book Pattern-Oriented Software Architecture: Patterns for Concurrent and Networked Objects, published in 2000 by Wiley and Sons. He can be contacted at [email protected].