Polyglot Programming


May 2002: Polyglot Programming

What does it take to support several programming languages within one environment? .NET, which has taken language interoperability to new heights, shows that it's possible, but only with the right design, the right infrastructure, and appropriate effort from both compiler writers and programmers. In this article, I'd like to go deeper than what I've seen published on the topic, to elucidate what it takes to provide true language openness. The experience that my colleagues have accumulated over the last three years of working to port Eiffel to .NET, as well as the countless discussions we've had with other .NET language implementers, informs this discussion.

Who Needs More Than One Language?
Let's start with the impolite question: Should one really care about multilanguage support? When this feature was announced at .NET's July 2000 debut, Microsoft's competitors sneered that it wasn't anything anyone needed. I've heard multilanguage development dismissed, or at least questioned, on the argument that most projects simply choose one language and stay with it. But that argument doesn't really address the issue. For one thing, it sounds too much like asserting, from personal observation, that people in Singapore don't like skiing. Lack of opportunity doesn't imply lack of desire or need. Before .NET, the effort required to interface modules from multiple languages was enough to make many people stick to just one; but, with an easy way to combine languages seamlessly and effortlessly, they may—as early experience with .NET suggests—start to appreciate their newfound freedom to mix and match languages.

Even more significant is the matter of libraries. Whether your project uses one language or more, it can take advantage of reusable libraries, whose components may have originated in different source languages. Here, interoperability means that you can use whatever components best suit your needs, regardless of creed or language of origin.

This ability to mix languages offers great promise for the future of programming languages, as the practical advance of new language designs has been hindered by the library issue: Though you may have conceived the best language in the world, implemented an optimal compiler and provided brilliant tools, you still might not get the users you deserve because you can't match the wealth of reusable components that other languages are able to provide, merely because they've been around longer. Building bridges to these languages helps, but it's an endless effort if you have to do it separately for each one. In recent years, this library compatibility issue may have been the major impediment to the spread of new language ideas, regardless of their intrinsic value. Language interoperability can overturn this obstacle. Under .NET, as long as your language implementation satisfies the basic interoperability rules of the environment (as explained in the following examples), you can take advantage of components written in any other language whose implementers have adhered to the same rules. That still means some work for compiler writers, but it's work they must do once for their language—not once for each language with which they want to interface.

The language openness of .NET is a welcome relief after the years of incessant Java attempts at language hegemony. For far too long, the Sun camp has preached the One Language doctrine. The field of programming language design has a long, rich history, and there is no credible argument that the alpha and omega of programming, closing off any future evolution, was uttered in Silicon Valley in 1995. Microsoft's .NET breaks this lock.

Everyone will benefit, even the Java community: Now that there's competition again, new constructs are—surprise!—again being considered for Java; one hears noises, for example, about Sun finally introducing genericity sometime in the current millennium. Such are the virtues of openness and competition.

The more than 20 languages ported or in the process of being ported to .NET range from Cobol and Fortran to Smalltalk, Oberon, Eiffel, Java, Perl, Scheme and Python. How does this all work? Do languages have to sacrifice anything? Should we believe those who say that it's all smoke and mirrors, and that deep down, all languages get reduced to a common denominator, whether we call it C#, Visual Basic .NET, managed C++ (or Java)? These are some of the questions I'll examine in this three-part article.

Language Interoperability at Work
Multilanguage communication techniques are nothing new. For some time, Eiffel has included an "external" mechanism for calling out to C and other languages, and a call-in mechanism known as Cecil (which is similar to the Java Native Interface). But all this only addresses calls; .NET goes much further:

  • A routine written in a language L1 may call another routine written in a different language L2.
  • A module in L1 may declare a variable whose type is a class declared in L2, and then call the corresponding L2 routines on that variable.
  • If both languages are object oriented, a class in L1 can inherit from a class in L2.
  • Exceptions triggered by a routine written in L1 and not handled on the L1 side will be passed to the caller, which—if written in L2—will process it using L2's own exception-handling mechanism.
  • During a debugging session, you may move freely and seamlessly across modules written in L1 and L2.

I don't know about you, but I've never seen anything coming even close to this level of interoperability.
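To make the list above more concrete, here is a minimal single-file C# sketch of three of these capabilities: typing a variable by a supplier class, inheriting from it, and handling its exceptions on the client side. It is only an approximation. In a real multilanguage system the supplier class would be compiled from another language into a separate assembly; here a C# class stands in for it. The names SupplierAccount, AuditedAccount and InteropSketch are assumptions introduced for illustration and do not come from the example system.

```csharp
using System;

// Stand-in for a class compiled from another language (L2).
// In a real system it would live in a separate assembly; once compiled,
// the runtime sees only a .NET class, whatever its source language.
public class SupplierAccount
{
    protected int balance;

    public int Balance { get { return balance; } }

    public virtual void Deposit(int amount)
    {
        if (amount <= 0)
            throw new ArgumentException("amount must be positive");
        balance += amount;
    }
}

// Client code in L1 (here C#): a class inheriting from the supplier class.
public class AuditedAccount : SupplierAccount
{
    public int DepositCount;

    public override void Deposit(int amount)
    {
        base.Deposit(amount);   // call the inherited (L2) routine
        DepositCount++;
    }
}

public static class InteropSketch
{
    // Returns true if the supplier's exception reached the client's handler.
    public static bool RejectsBadDeposit(SupplierAccount account)
    {
        try
        {
            account.Deposit(-5);
            return false;
        }
        catch (ArgumentException)
        {
            // Raised on the supplier side, handled with the client
            // language's own exception mechanism.
            return true;
        }
    }

    public static void Main()
    {
        // A variable whose type is the supplier class.
        SupplierAccount account = new AuditedAccount();
        account.Deposit(100);
        Console.WriteLine(account.Balance);
        Console.WriteLine(RejectsBadDeposit(account));
    }
}
```

The point of the sketch is that nothing in the client code would change if SupplierAccount came from an assembly produced by another compiler.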

Affirmative Action
Let's examine how .NET's language interoperation works. Here's the beginning of an ASP.NET page (from an example at dotnet.eiffel.com). The associated system is written mainly in Eiffel, but you wouldn't guess this from the page text; as stated by the ASP.NET PAGE LANGUAGE directive, the program code on the page itself, introduced by <SCRIPT RUNAT="SERVER">, is in C#:

<%@ Assembly Name="conference" %>
<%@ Import Namespace="Conference_registration" %>
<%@ Page Language="C#" %>

<HTML>
	<HEAD>
	<TITLE>TOOLS CONFERENCE</TITLE>
	<SCRIPT RUNAT="SERVER">

	/* Start of C# code */	
	REGISTRAR conference_registrar;
	bool registered;
	String error_message;
	void Page_Init(Object Source, EventArgs E) {
		conference_registrar = new REGISTRAR();
		conference_registrar.start();
	... More C# code ...
	}
	... More HTML ...

The first C# line is the declaration of a C# variable called conference_registrar, of type REGISTRAR. On the subsequent lines, we create an instance of that class through a new expression, and assign it to conference_registrar; and we call the procedure start on the resulting object. Presumably, REGISTRAR is just some C# class in this system.

Presume not. Class REGISTRAR is an Eiffel class. The only C# code in this example application is on the ASP.NET page, and consists of only a few more lines than shown above; its task is merely to read the text entered into the various fields of the page by a Web site visitor and to pass it on, through the conference_registrar object, to the rest of the system—the part written in Eiffel that does the actual processing.

Nothing in the above example (or the rest of the ASP.NET page) mentions Eiffel. REGISTRAR is not declared as an Eiffel class, or a class in any specific language: It's simply used as a class. The expression new REGISTRAR() that creates an instance of the class might look to the unsuspecting C# programmer like a C# creation, but in fact it calls the default creation procedure (constructor) of the Eiffel class. Not that this makes any difference at the level of the Common Language Runtime: At execution time, we don't have C# objects, Eiffel objects or Visual Basic objects; we have .NET citizens with no distinction of race, religion or language origin.

In the previous code sample, if we don't tell the runtime that REGISTRAR is an Eiffel class, how is it going to find that class? Simple: namespaces. Here's the beginning of the Eiffel class text of REGISTRAR:

indexing
	description: "[
		Registration services for a conference; include adding new
		registrants and new registrations.
		]"
	dotnet_name: "Conference_registration.REGISTRAR"
class
	REGISTRAR
inherit
	WEB_SERVICE
create
	start
feature -- Initialization
	start is
			-- Set empty error message.
		do
			set_last_operation_successful (True)
			set_last_error_message ("No Error")
			set_last_registrant_identifier (-1)
		end

... Other features ...

The line preceded by dotnet_name says: "To the rest of the .NET world, this class shall be part of the namespace Conference_registration, where it shall be known under the name REGISTRAR." This enables the Eiffel compiler to make the result available in the proper place for the benefit of client .NET assemblies, whether they originated in the same language or in another one.

Now reconsider the beginning of the ASP.NET page shown earlier:

<%@ Assembly Name="conference" %>
<%@ Import Namespace="Conference_registration" %>
<%@ Page Language="C#" %>

<HTML>
	<HEAD>
	<TITLE>TOOLS CONFERENCE</TITLE>
	<SCRIPT RUNAT="SERVER">
	... The rest as before ...

The second line says to import the namespace Conference_registration, and that does the trick. A namespace is an association between class names and the code they denote, a way of saying "The class name A denotes that code over there, and the class name B denotes this other code here." In that association, the class name REGISTRAR will denote the Eiffel class above, since we took care of registering it under that name in the dotnet_name entry of its indexing clause.

The basic technique will always be the same:

  1. When you compile one or more classes written in language L1, you specify the namespaces into which they will be compiled and the final names under which they will be known there.
  2. When you write a system in a language L2—the same as L1, or another one—you specify one or more namespaces to "import"; they will define how to understand any class name to which your system may refer.
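The two steps can be approximated in a single C# file. This is a sketch under stated assumptions: the producer namespace and class name mirror the dotnet_name entry shown earlier, but the class body is a C# stand-in for the compiled Eiffel class, and last_error_message, Conference_client and NamespaceSketch are hypothetical names added so that the example has something to observe.

```csharp
using System;

// Producer side (step 1): the class is published in a namespace under a
// fixed external name, here "Conference_registration.REGISTRAR".
namespace Conference_registration
{
    public class REGISTRAR
    {
        private string lastErrorMessage = "";

        // Simplified stand-in for the Eiffel creation procedure `start`.
        public void start()
        {
            lastErrorMessage = "No Error";
        }

        // Hypothetical query, not taken from the original class text.
        public string last_error_message()
        {
            return lastErrorMessage;
        }
    }
}

// Consumer side (step 2): importing the namespace tells the compiler how
// to understand the short class name REGISTRAR.
namespace Conference_client
{
    using Conference_registration;

    public static class NamespaceSketch
    {
        public static string StartAndReport()
        {
            REGISTRAR conference_registrar = new REGISTRAR();
            conference_registrar.start();
            return conference_registrar.last_error_message();
        }

        public static void Main()
        {
            Console.WriteLine(StartAndReport());
        }
    }
}
```

Without the using directive, the consumer could still reach the class, but only by writing its full name, Conference_registration.REGISTRAR.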

The details may vary depending on the languages involved. On the producer side, L1, you may retain the original class names or, as in the preceding Eiffel example, explicitly specify an external class name. On the consumer side, you may have mechanisms to adapt the names of external classes and their features to the conventions of L2. Some flexibility is essential here, since what's acceptable as an identifier in one language may not be in another: Visual Basic, for example, accepts a hyphen in a feature name, as in my-feature, but most other languages don't, so you'll need some convention to accept the feature under a different name. What's important is that you can have access to all the classes and features from any other .NET language.

Combining Different Language Models
How does the interoperability work in practice? The first key idea is to map all software to the .NET Object Model. Once compiled, classes don't reveal their language of origin.

[Figure: Combining Different Language Models. Compilers map programs from each source language to the .NET Object Model; once compiled, classes don't reveal their language of origin.]

Starting from a source language, the compiler will map your programs into a common target, as shown in "Combining Different Language Models." This by itself isn't big news, since we could use the same figure to explain how compilers map various languages to the common model of, say, the Intel architecture. What is new is that the object model, as we've seen in detail, retains high-level structures such as classes and inheritance that have direct equivalents in source programs written in modern programming languages, especially object-oriented ones. This is what allows modules from different languages to communicate at the proper level of abstraction, by exchanging objects—all of which, as .NET objects, are guaranteed to have well-understood, language-independent properties.

Object Model Discrepancies
Of course, the languages involved have their own models, which may differ significantly from the .NET object model. That's to be expected: Otherwise, they wouldn't really be different languages, just a different syntax and minor variations on a single language theme. To a certain extent, this characterization could be applied to C# and Visual Basic .NET; one may claim that these two are, deep down, just one language, now that VB has become an OO language. But it's definitely incorrect if we consider the entire set of .NET language players. The case of non-OO languages is the most obvious: Right from the initial announcements, .NET has included languages like APL and Fortran, which no one would accuse of being object oriented.

Even if we restrict our attention to object-oriented languages, we'll find discrepancies. Each has its own object model; while the key notions—class, object, inheritance, polymorphism, dynamic binding—are common, individual languages depart from the .NET model in some significant respects:

  • Eiffel and C++ allow multiple inheritance; the .NET object model (as well as Java, C# and Visual Basic .NET) permits a class to inherit from only one class, although it may inherit from several interfaces.
  • Eiffel and C++ each support a form of genericity (type parameterization): You can declare an Eiffel class as LIST [G] to describe lists of objects of an arbitrary type G without saying what G is; then you can use the class to define types LIST [INTEGER], LIST [EMPLOYEE], or even LIST [LIST [INTEGER]]. C++'s templates pursue a similar goal. This notion is unknown to the .NET object model, although planned as a future addition; currently, you have to write a LIST class that will manipulate values of the most general type, Object, and then cast them back and forth to the types you really want.
  • The .NET object model permits in-class overloading: Within a class, a single feature name may denote two or more features. Several languages disallow this possibility as incompatible with the aims of quality object-oriented development.
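The three properties of the .NET object model listed above can be illustrated from the C# side. The following is a minimal sketch, not a description of any compiler's mapping: ArrayList is the standard Object-based list from System.Collections, while Counter, ReportingCounter, IPrintable, IResettable and ObjectModelSketch are hypothetical names chosen for the example.

```csharp
using System;
using System.Collections;

public interface IPrintable { string Describe(); }
public interface IResettable { void Reset(); }

public class Counter
{
    protected int count;

    public int Count { get { return count; } }

    // In-class overloading: one feature name, two features that differ
    // only by signature.
    public void Add() { count += 1; }
    public void Add(int step) { count += step; }
}

// Inheritance from exactly one class, but from several interfaces.
public class ReportingCounter : Counter, IPrintable, IResettable
{
    public string Describe() { return "count = " + count; }
    public void Reset() { count = 0; }
}

public static class ObjectModelSketch
{
    // Without genericity, the list holds values of the most general type,
    // Object; the client casts each element back to the type it expects.
    public static int SumOfIntegers(ArrayList list)
    {
        int total = 0;
        foreach (object item in list)
        {
            total += (int)item;   // fails at run time if item is not an int
        }
        return total;
    }

    public static void Main()
    {
        ReportingCounter c = new ReportingCounter();
        c.Add();
        c.Add(4);
        Console.WriteLine(c.Describe());

        ArrayList list = new ArrayList();
        list.Add(1);
        list.Add(2);
        Console.WriteLine(SumOfIntegers(list));
    }
}
```

Note that the cast in SumOfIntegers is checked only at execution time; this is the cost of the missing type parameter that a declaration such as LIST [INTEGER] would make explicit.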

These object model discrepancies raise a serious potential problem: How do we fit different source languages into a common mold? There are two basic approaches: Either change the source language to fit the model, or let programmers use the language as before, and provide a mapping through the compiler.

No absolute criterion exists: Both approaches are found in current .NET language implementations. C++ and Eiffel for .NET provide contrasting examples.

The Radical Solution
C++ typifies the Procrustean solution: Make the language fit the model. To be more precise, on .NET, the name "C++" denotes not one language, but two: Unmanaged and Managed C++. Classes from both languages can coexist in an application: Any class marked __gc is managed; any other is unmanaged. The unmanaged language is traditional C++, far from the object model of .NET; unmanaged classes will compile into ordinary target code (such as Intel machine code), but not to the object model. As a result, they don't benefit from the Common Language Runtime and lack the seamless interoperability with other languages. Only managed classes are full .NET players.

But if you then look at the specifications for managed classes, you'll realize that you're not in Kansas any more (assuming, for the sake of discussion, that Kansas uses plain C++). On the "no" side, there's no multiple inheritance except from (you guessed it) completely abstract classes, no support for templates, no C-style type casts. On the "yes" side, you'll find new .NET mechanisms such as delegates (objects representing functions) and properties (fields with associated methods). If this sounds familiar, that's because it is: Managed C++ is very close to C#, in spite of what the default Microsoft descriptions would have you believe.
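For readers who haven't met the two mechanisms on the "yes" side, here is a minimal sketch of a delegate and a property. Since Managed C++ is described above as very close to C#, the sketch uses C# syntax rather than Managed C++; the names Transform, Thermostat, Adjust and DelegateSketch, and the clamping rule in the setter, are assumptions made up for the illustration.

```csharp
using System;

// A delegate type: its instances are objects representing functions.
public delegate int Transform(int value);

public class Thermostat
{
    private int target;   // backing field

    // A property: field-like access backed by get and set methods.
    public int Target
    {
        get { return target; }
        // Hypothetical rule for the example: negative values become 0.
        set { target = (value < 0) ? 0 : value; }
    }

    // Applies a function, passed as a delegate object, to the target.
    public void Adjust(Transform t)
    {
        Target = t(Target);
    }
}

public static class DelegateSketch
{
    public static int Twice(int x) { return 2 * x; }

    public static void Main()
    {
        Thermostat th = new Thermostat();
        th.Target = 10;                    // runs the set method
        th.Adjust(new Transform(Twice));   // wraps Twice in a delegate object
        Console.WriteLine(th.Target);      // runs the get method
    }
}
```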

Predictably, the restrictions also rule out any cross-inheritance between managed and unmanaged classes.

The signal to C++ developers is hard to miss: The .NET designers don't think too highly of the C++ object model and expect you to move to the modern world as they see it. The role of Unmanaged C++ is simply to smooth the transition by allowing C++ developers to move an application to the managed side one class at a time. An existing C++ application will compile straight away as unmanaged. Then you'll try declaring specific classes as managed. The compiler will reject those that violate the rules of the managed world, for example, by using improper casts; the error messages will tell you what you must correct to turn these classes into proper citizens of the managed world.

For C++, this is indeed a defensible policy, as the language's object model—defined to a large extent by the constraint of backward compatibility with C, a language more than three decades old—is obsolete by today's standards.

Respecting Other Object Models
Only time will tell how successful the .NET strategy will be at convincing C++ programmers to move over to the managed world. But even if they wholeheartedly comply, it won't mean that other languages should follow the same approach. This is particularly true of object-oriented languages that have their own views of what OO should be, with perhaps better arguments than C++. If you've chosen a language precisely because it supports such expressive mechanisms as multiple inheritance, Design by Contract and genericity, do you have to renounce them and step down to the lowest common denominator once you decide to use .NET?

Fortunately, the answer is no, at least not if "you" here means the programmer. The scheme described in "Combining Different Language Models" doesn't require that all languages adhere to the .NET object model; rather that they map to that model. That mapping can be made the responsibility of compilers rather than programmers, enabling programming languages to retain their normal semantics, and establishing a correspondence between the specific semantics of each language and the common rules of the common object model.

Tune in next issue and discover how this all works out.

