Dr. Dobb's | C and C++: Siblings

C and C++: Siblings

We're at a crossroads for compatibility between C and C++. Can siblings go their separate ways and still remain on speaking terms? In this first of three parts, Bjarne provides context for the discussion.

July 01, 2002
URL:http://www.drdobbs.com/c-and-c-siblings/184401543

Figure 1: The C family tree. A solid line means a massive inheritance of features, a dashed line means the borrowing of major features, and a dotted line means the borrowing of minor features. To simplify, I have left features that appeared almost simultaneously in both languages unrepresented.

Figure 2: Seven C/C++ compatibility categories

Figure 3: Nightmare scenario

Sidebar: Macros

C and C++ programmers view macros very differently. The difference is so great that it can be considered philosophical. C++ programmers typically avoid macros wherever possible, preferring facilities that obey type and scope rules. In most cases, C programmers don’t have such alternatives and use macros. For example, a C++ programmer might write something like:
const int mx = 7;

template<class T> inline T abs(T a)
{ return (a<0)?-a:a; }

namespace N {
  void f(int i) { /* ... */ }
}

class X {
public:
  X(int);
  ~X();
  // ...
};
A C programmer facing a similar task might write something like:
#define MX 7

#define ABS(a) (((a)<0)?-(a):(a))

void N_f(int i) { /* ... */ }

struct X { /* ... */ };
void init_X(struct X *p, int i);
void cleanup_X(struct X *p);
At the core of many C++ programmers’ distrust of macros lies the fact that macros transform the program text before tools such as compilers see it. Because macro substitution follows rules that don’t involve scope or semantics, surprises can result. Namespaces, class scopes, and function scopes provide no protection against a macro. Eliminating the use of macros to express ideas in code has been a constant aim of C++ (see Chapter 18 of [8]). A C++ programmer tends to view a solution involving a macro with suspicion and, at best, as a lesser evil. On the other hand, a C programmer often views that same solution as natural and often as elegant. Both programmers are right in their respective languages, and this is a source of some misunderstanding. Any solution to a compatibility problem that involves a macro is automatically considered suspect by many C++ programmers. Thus, any use of a macro in the Standard becomes a potential incompatibility as the C++ community looks for alternative solutions to avoid its use. The only macro found in the C++ Standard (besides those inherited from C) is __cplusplus.
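The sidebar's point that macro substitution ignores the language's own rules can be made concrete with a minimal sketch. It reuses the ABS macro and the abs template shown above; the template is renamed abs_fn to avoid clashing with the standard library, and the counting helper next_value is hypothetical, introduced only for this illustration.

```cpp
#include <cassert>

// The macro from the C version above.
#define ABS(a) (((a)<0)?-(a):(a))

// The template from the C++ version above, renamed for this sketch.
template<class T> inline T abs_fn(T a)
{ return (a<0)?-a:a; }

// Hypothetical helper: counts how many times it is evaluated.
static int calls = 0;
int next_value() { ++calls; return -3; }

// Number of times next_value() runs when passed through the macro.
int calls_via_macro()
{
    calls = 0;
    int r = ABS(next_value());    // argument text is substituted, then evaluated twice
    assert(r == 3);
    return calls;
}

// Number of times next_value() runs when passed through the template.
int calls_via_template()
{
    calls = 0;
    int r = abs_fn(next_value()); // argument is evaluated exactly once
    assert(r == 3);
    return calls;
}
```

Here calls_via_macro() yields 2 and calls_via_template() yields 1: the macro substitutes its argument text wherever a appears, so the call runs once for the test and once for the selected branch, while the function template evaluates its argument a single time.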

Bjarne Stroustrup

Classic C [1] has two main descendants: ISO C and ISO C++. Over the years, these languages have evolved at different paces and in different directions. One result of this evolution is that each language provides support for traditional C-style programming in slightly different ways. The resulting incompatibilities can make life miserable for people who use both C and C++, for people who write in one language using libraries implemented in the other, and for implementers of C and C++ tools.

This article is Part 1 in a series that explores the relationship between K&R C’s [2] most prominent descendants: ISO C and ISO C++. My focus is the areas where C and C++ differ slightly (“the incompatibilities”), rather than on the large area of commonality or the areas where one language provides facilities not offered by the other. A longer technical report that presents more historical context and many more examples is available online [3].

A Family Tree

How can I call C and C++ siblings? C++ is a descendant of K&R C. However, what we call “C” today (the C89 or C99 Standard) is also a descendant of K&R C, and it is therefore appropriate to think of C and C++ as siblings.

Figure 1 shows the C family tree. ISO C and ISO C++ emerge as the two major descendants of K&R C, and as siblings. Each carries with it the key aspects of Classic C, and neither is 100-percent compatible with Classic C. For example, both siblings consider const a keyword, and both deem this famous Classic C program not standard-compliant:
main()
{
  printf("Hello, world\n");
}
As a C89 program, Kernighan and Ritchie’s classic “Hello World” has one error. As a C++98 program, it has two errors. As a C99 program, it has the same two errors, and if those were fixed, the meaning would be subtly different from the identical C++ program.

As C and C++ drift further from Classic C, incompatibilities become more numerous and more pronounced. The siblings of Classic C share their various traits in a confusing array of combinations. Figure 2 reveals seven compatibility categories, and a programmer must understand which features fall in which category in order to write compatible code (see Table 1).

One of the big questions for the C/C++ community is whether the next phase of standardization (potentially adding two more circles to Figure 2) will pull the languages together or tear them further apart. In 10 years, there will be large and thriving C and C++ communities. However, if the languages are allowed to drift further apart, there will not be a C/C++ community, sharing tools, implementations, techniques, headers, and code. Figure 3 shows my nightmare scenario. Each separate area of the diagram represents a different set of incompatibilities that an implementer must address and that a programmer may have to be aware of.

The differences between C++ and C89 are documented in Appendix C of the ISO C++ Standard [4]. The major differences between C89 and C99 are summarized on two pages of the C99 foreword [5]. The differences between C++ and C99 are not officially documented because the ISO C committee had neither the time nor the expertise to document differences, and the C99 committee’s charter [6] did not require documenting C++/C99 incompatibilities. An unofficial, but extensive list of incompatibilities can be found on the Web [7].

The Spirit of C

The phrases “the spirit of C” and “the spirit of C++” are often used as weapons to condemn notions supposedly not in the right spirit and therefore somehow illegitimate. More reasonably, these phrases can be used to distinguish languages aimed at supporting low-level systems programming, such as C and C++, from languages without such support. I find these “spirit” arguments poisonous when they are thoughtlessly applied within the C/C++ community. More often than not, these phrases dress up personal likes and dislikes as philosophies supposedly backed by “the fathers of C” or “the fathers of C++.” These attacks can be amusing and occasionally embarrassing to Dennis Ritchie and me. We are still alive and do hold opinions, though Dennis, being older and wiser, is better able to keep quiet.

The following rules are often claimed as part of “the spirit of C”:
1. Keep the built-in operations close to the machine (and efficient).

2. Keep the built-in data types close to the machine (and efficient).

3. No built-in operations on composite objects.

4. Don’t do in the language what can be done in a library.

5. The standard library can be written in the language itself.

6. Trust the programmer.

7. The compiler is simple.

8. The run-time support is very simple.

9. In principle, the language is type-safe, but not automatically checked (use lint for checking).

10. The language isn’t perfect because practical concerns are taken seriously.

You can find support for all of these rules in the opening pages of [2].

Naturally, Classic C is a good approximation of “the spirit of C.” C99 and C++ are less so, but they still approximate those ideals. This is significant because most languages don’t. From the perspective of Ada, Java, or Python, C and C++ appear as twins. Only in discussions within the C/C++ community do the differences appear to overwhelm the commonalities.

In the spirit of rule 10, Classic C breaks rule 3 by adding structure assignment and structure argument passing to K&R C.

C++ starts out by breaking rule 7: a greater emphasis on type and scope distinguishes C++ from C. Consequently, a C++ compiler front end must do much more than a Classic C front end does. The introduction of exceptions complicates C++’s run-time support, violating rule 8. However, that may be defended on the grounds that if you don’t need exceptions, you can avoid using them. After 20 years, it is more remarkable that C++ closely follows the remaining eight criteria. In particular, C++ can be seen as the result of following rules 1 to 5 to their logical conclusion by allowing the user to define general and efficient types and libraries.

Compared to early C compilers, modern C implementations cannot be called simple, so C99 also breaks rule 7. Since <tgmath.h> cannot be written in C (though something almost identical can be written in C++), C99 breaks rule 5. Arguably, C99’s complex facilities violate rules 1, 2, and 3.

Contrary to popular myths, there is no more tolerance of time and space overheads in C++ than there is in C. The emphasis on run-time performance varies more between different communities using the languages than between the languages themselves. In other words, overheads are found in some uses of the languages rather than in the language features.

Underlying the flame wars over “the Spirit of C” is a genuine concern for the direction of C’s and/or C++’s evolution — that is, a consistent aim to provide a coherent language from a set of changes and extensions.

In their evolution from Classic C, C99 and C++ differ in philosophy. C++ has a clearly stated philosophy of language: the emphasis in the selection of new facilities is on mechanisms for defining and using new types safely and efficiently. Basic facilities for computation were, as much as possible, inherited from Classic C and later from C89. C++ will go a long way to avoid introducing a new fundamental type. The prevailing view is that if you need one type then many programmers will need similar types. Consequently, providing mechanisms for expressing such types in the language will serve many more programmers than providing the one type as a built-in. In other words, the emphasis is on facilities for organizing code and building libraries (often referred to as “abstraction mechanisms”).

By contrast, the emphasis in the evolution of C89 into C99 has been on the direct support for traditional (Fortran-style) numerical computation. Consequently, the major extensions of C99 compared to C89 are in new built-in numeric types, new mathematical functions and macros, numeric I/O facilities, and extensions to the notion of an array. The contrasting approaches to complex numbers and to vectors/VLAs illustrate the difference in C++’s and C99’s design philosophies: C adds built-in facilities where C++ adds to the standard library [3].
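The library-based approach on the C++ side can be sketched briefly. The example below is only an illustration under stated assumptions: the typedef Cx and the function rotate_all are hypothetical names, and the code uses nothing beyond std::complex and std::vector, both of which are ordinary library types rather than built-ins.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Complex numbers and resizable arrays come from the library, not the core language.
typedef std::complex<double> Cx;

// Multiply every element by i, that is, rotate it by 90 degrees in the complex plane.
std::vector<Cx> rotate_all(const std::vector<Cx>& in)
{
    const Cx i(0.0, 1.0);
    std::vector<Cx> out;
    out.reserve(in.size());
    for (std::size_t k = 0; k < in.size(); ++k)
        out.push_back(in[k] * i);   // operator* is defined by the library for Cx
    assert(out.size() == in.size());
    return out;
}
```

A C99 version of the same computation would typically use the built-in double complex type and a VLA instead, which is exactly the contrast the paragraph above describes.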

Ideally, C’s emphasis on built-in facilities and C++’s emphasis on abstraction mechanisms are complementary. However, to maximize compatibility, the emphasis on built-in facilities must be on fundamental computational issues (i.e., on facilities that cannot elegantly and efficiently be provided by composing already existing facilities). Care must be taken not to increase reliance on mechanisms known to cause problems for the abstraction mechanisms, such as macros (see sidebar), uneven support for built-in types, and type violations.

Understanding C/C++ Feature Differences

Most C and C++ compatibility problems fall into one of the following categories:

Issues that affect interfaces, such as virtual functions and VLAs.

Issues that affect only the form of the code that they are part of, such as declarations in conditions and designated initializers.

The following sections give examples of these compatibility issues and explore some of the perils programmers face when they navigate the incongruities of C and C++.

Trivial Interfaces

C++ programmers have always known that to make code accessible to C programs they must provide interfaces that avoid non-C features, such as classes with virtual functions. These C-to-C++ interfaces have typically been trivial. For example:
// C interface:
extern int f(struct X* p, int i);

// C++ implementation of C interface:
extern "C" int f(X* p, int i)
{ return p->f(i); }
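A slightly fuller sketch of such a trivial interface follows. It assumes a hypothetical class hierarchy (Shape, Square) and hypothetical function names (shape_new_square, shape_area, shape_delete); the idea is only that C callers see an incomplete struct type and plain functions, while the virtual call stays on the C++ side.

```cpp
#include <cassert>

// Hypothetical C++ classes; never exposed to C callers.
class Shape {
public:
    virtual ~Shape() {}
    virtual double area() const = 0;
};

class Square : public Shape {
public:
    explicit Square(double s) : side(s) {}
    double area() const { return side * side; }
private:
    double side;
};

// C-callable interface: an opaque handle plus free functions.
extern "C" {
    struct shape;   // never defined; C code only holds pointers to it

    // Note: a real interface would also have to stop exceptions
    // (for example std::bad_alloc) from escaping into C code.
    shape* shape_new_square(double side)
    { return reinterpret_cast<shape*>(static_cast<Shape*>(new Square(side))); }

    double shape_area(const shape* s)
    {
        assert(s != 0);
        return reinterpret_cast<const Shape*>(s)->area();   // virtual dispatch
    }

    void shape_delete(shape* s)
    { delete reinterpret_cast<Shape*>(s); }
}
```

The matching C header would contain only the struct shape declaration and the three function prototypes, so nothing in it depends on C++ features.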
C programmers typically assume any C header can be used from a C++ program. This assumption has largely been true (after someone adds suitable extern "C" directives), though headers that use C++ keywords as identifiers have been a constant irritant to C++ programmers (and sometimes a serious practical problem). For example:
// not C:
class X { /* ... */ };
// not C++:
struct S { int class; /* ... */ };
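The "suitable extern "C" directives" are usually added with a guard on __cplusplus, so that one header serves both languages. The following is a minimal sketch, assuming a hypothetical header legacy.h in which the offending member has been renamed class_ and a hypothetical accessor S_get; for the sake of a single file, the definition is compiled here as C++.

```cpp
#include <cassert>

/* --- contents of a hypothetical shared header, legacy.h --- */
#ifdef __cplusplus
extern "C" {                /* seen by C++ only: give these names C linkage */
#endif

struct S { int class_; };   /* 'class' renamed so that C++ can parse the header */
int S_get(const struct S* s);

#ifdef __cplusplus
}                           /* closes extern "C" */
#endif
/* --- end of header --- */

/* In a real project this definition would live in a C source file
   (without the extern "C"); it is placed here to keep the sketch in one file. */
extern "C" int S_get(const struct S* s)
{
    assert(s != 0);
    return s->class_;
}
```

Because __cplusplus is defined only by C++ compilers, a C compiler sees plain declarations and a C++ compiler sees the same declarations wrapped in a linkage specification.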
C99 introduces several features that, if placed in a header, will prohibit the use of that header in a C++ program (or in a C89 program). Examples include VLAs, restricted pointers, _Bool, _Complex, some inline functions, and macros with a variable number of arguments. For example:
// C99 interface features, not found in
// C++ or C89:

// equivalent to f1(int *const):
void f1(int[const]);
// p is supposed to point to at least
// 8 chars:
void f2(char p[static 8]);
void f3(double *restrict);
// p is a VLA:
void f4(char p[*]);

// may or may not be C++ also [3]:
inline void f5(int i) { /* ... */ }

void f6(_Bool);
void f7(double _Complex);

// first argument is the stream, as in fprintf:
#define PRINT(form, ...) \
fprintf(form, __VA_ARGS__)
If a C header uses one of those features, mediation code and a C++ header must be provided for the C code to be used from C++.

The ability to share header files is an important aspect of C and C++ culture and a key to performance of programs using both languages. If the header files are kept compatible, C and C++ programs can call libraries implemented in “the other language” with no data conversion overheads and no (or very minimal) call overhead.

Thin Bindings

Shared declarations are sometimes an insufficient solution to the header compatibility problem. In cases where the languages provide similar functionality in different ways, another approach to header compatibility is to provide “compatibility headers” that, through liberal use of #ifdefs, provide very different definitions for each language, but allow user code to look very similar. For example:
// my double precision complex

#ifdef __cplusplus
  #include<complex>
  using namespace std;
  typedef complex<double> Cmplx;
  inline Cmplx Cmplx_ctor(double r,
    double i)
  { return Cmplx(r,i); }
  //...
#else
  #include<complex.h>
  #include<tgmath.h> // type-generic sin
  typedef double complex Cmplx;
  #define Cmplx_ctor(r,i) \
  ((double)(r)+I*(double)(i))
  //...
#endif

void f(Cmplx z)
{
  Cmplx zz = z+Cmplx_ctor(1,2);
  Cmplx z2 = sin(zz);
  // ...
}
This approach requires the programmer to create a new dialect that maps into both languages. In other words, a user (or a library vendor) must invent a private language simply to compensate for compatibility problems. The resulting code is typically neither good C nor good C++. In particular, by using this technique, the C++ programmer is restricted to using what is easily represented in C. For example, unless exceptional effort is expended on the C mapping, arrays must be used rather than containers, overloading beyond what is offered by C99’s <tgmath.h> must be avoided, and errors cannot be reported using exceptions. In addition, macros tend to be used much more heavily than C++ programmers would like. Such restrictions can be acceptable when providing interfaces to other code, but these restrictions are typically too constraining for a C++ programmer to use within the implementation. Similarly, a C programmer using this technique is prevented from using C facilities not also supported by C++, such as VLAs and restricted pointers.

Real code/libraries will have much larger “thin bindings” with many more macros, typedefs, inlines, etc., and more conventions for their use. The likelihood that two such “thin bindings” can be used in combination is slim and the effort to learn a new binding is non-trivial. Thus, the “compatibility header” approach doesn’t scale and fractures the community.
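One of the restrictions mentioned above, that errors cannot be reported across the binding using exceptions, is commonly handled by translating exceptions into error codes at the boundary. The sketch below shows that idea only; the function names checked_get and table_get and the meaning of the return codes are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// C++ implementation detail: reports errors by throwing.
static int checked_get(const std::vector<int>& v, std::size_t i)
{
    return v.at(i);   // throws std::out_of_range on a bad index
}

// C-callable wrapper: no exception may escape, so failures become codes.
// Returns 0 on success, 1 for a bad index, 2 for any other failure.
extern "C" int table_get(std::size_t i, int* out)
{
    assert(out != 0);
    try {
        static const int init[] = { 10, 20, 30 };
        static const std::vector<int> table(init, init + 3);
        *out = checked_get(table, i);
        return 0;
    } catch (const std::out_of_range&) {
        return 1;
    } catch (...) {
        return 2;
    }
}
```

The C++ implementation stays free to use containers and exceptions internally, but every such choice has to be flattened at the interface, which is part of the cost the section describes.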

Competing Programming Models

Interfaces (e.g., information in header files) are all that matter to people who see C and C++ as distinct languages that just happen to be able to produce code that can be linked together (like C and Fortran). However, teachers, implementers, and all other programmers who work in both languages must contend with equally intractable compatibility issues related to the facilities used to express computations.

The differing programming models of C and C++ lead to alternative solutions for many common tasks. These alternative approaches are problematic for the following reasons:

An alternative forces programmers to choose between two sets of facilities and their associated programming techniques.

An alternative more than doubles the effort for teachers and students.

Code using separate alternatives can often cooperate only through specially written mediation code.

Consider the problem of manipulating a number of objects where that number is known only at run time. C++ and C99 offer alternative solutions not present in C89. Consider a C89 example:
#include <stdlib.h> /* malloc, free, exit */
#include <string.h> /* memcpy */

/* C89: v points to m Ys */
void f89(int n, int m, struct Y* v)
{
  /* not Classic C; not C++ */
  struct X* p =
    malloc(n*sizeof(struct X));
  struct Y* q =
    malloc(m*sizeof(struct Y));
  /* memory exhausted */
  if (p==NULL || q==NULL) exit(-1);
  if (3<n && 4<m) p[3] = v[4];
  /* copy: the third argument is a byte count */
  memcpy(q,v,m*sizeof(struct Y));
  /* ... */
  free(q);
  free(p);
}
Among the potential problems with this code is that v might not point to an array with at least m elements.

The obvious C99 alternative is:
#include <string.h> // memcpy

// C99: v points to m Ys
void f99(int n, int m, struct Y v[m])
{
  // not C89; not C++
  struct X p[n];
  struct Y q[m];
  if (3<n && 4<m) p[3] = v[4];
  // copy: the third argument is a byte count
  memcpy(q,v,m*sizeof(struct Y));
  // ...
}
The nicer syntax makes it less likely that v does not point to an array with at least m elements, but that is still possible. Unfortunately, the code does not define what happens if the array definition fails to allocate memory for the n elements required. The use of arrays automates the freeing of memory, though there could still be a memory leak if f99 is exited through a longjmp.

The obvious C++ alternative is:
// C++: v holds v.size() Ys
void fpp(int n, vector<Y>& v)
{
  // not C89; not C99
  vector<X> p(n);
  if (3<p.size() && 4<v.size()) p[3] = v[4];
  // copy
  vector<Y> q = v;
  // ...
}
A vector contains the number of its elements, so the programmer doesn’t have to worry about keeping track of array sizes or about freeing the memory used to hold those elements.

The standard library vector is more general than a VLA. For example, vector has a copy operation, you can change the size of a vector, and vector operations are exception safe (see Appendix E of [9]). This could imply a performance overhead compared to VLAs on some implementations, but so far I have not found significant overheads.
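The operations mentioned here (copying, resizing, and reporting errors through exceptions) can be shown in a few lines. This is a minimal sketch; the helper names at_throws and grown_copy are hypothetical and exist only for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if v.at(i) reports an out-of-range index by throwing.
bool at_throws(const std::vector<int>& v, std::size_t i)
{
    try { (void)v.at(i); return false; }
    catch (const std::out_of_range&) { return true; }
}

// Copies v, then grows the copy; the original is left untouched.
std::vector<int> grown_copy(const std::vector<int>& v, std::size_t extra)
{
    std::vector<int> q = v;         // element-wise copy
    q.resize(v.size() + extra);     // added elements are value-initialized to 0
    assert(q.size() == v.size() + extra);
    return q;
}
```

None of these operations has a direct counterpart for a VLA: a VLA cannot be assigned or returned as a whole, its size is fixed once it is created, and an out-of-range index is not detected.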

The key point is that users have to choose and the users of more than one of these languages have to understand the different programming styles and remember where to apply them. The result is that these differences in the programming models of C and C++ make it significantly more difficult to program in both languages than to program in one — even though the two languages share many features and are supposed to be closely related.

As Close as Possible...

The semi-official policy for C++ with regard to C compatibility has always been “As Close as Possible to C, but no Closer” [10]. Naturally, wits have answered with “As Close as Possible to C++, but no Closer,” but I have never seen that in any official context nor seen any elaboration of what it means.

How close is “as close as possible to C”? Traditionally, this statement has meant “compatible with C except where the C++ type system would be compromised.” Differences such as those for void*, C++’s insistence on function prototypes, the use of built-in types for bool and wchar_t, and even the inline rules, can be explained that way [3].
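Two of these type-system differences can be illustrated from the C++ side. The sketch below is an illustration under stated assumptions: make_ints and kind are hypothetical names, and the comments describe the well-known C behavior for comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// In C, `int* p = malloc(n * sizeof(int));` compiles because void*
// converts implicitly to any object pointer type. C++ rejects that
// implicit conversion, so the cast has to be written out.
int* make_ints(std::size_t n)
{
    assert(n <= static_cast<std::size_t>(-1) / sizeof(int));  // no overflow in the size
    void* raw = std::malloc(n * sizeof(int));
    return static_cast<int*>(raw);   // may be a null pointer if allocation failed
}

// In C++, bool and wchar_t are distinct built-in types, so functions
// can be overloaded on them. In C, wchar_t is a typedef for some
// integer type, and C has no overloading.
const char* kind(char)    { return "char"; }
const char* kind(wchar_t) { return "wchar_t"; }
const char* kind(bool)    { return "bool"; }
```

With these overloads, kind('a'), kind(L'a'), and kind(true) select three different functions, which would be impossible if wchar_t and bool were merely alternative names for integer types.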

The “as close as possible...” rules were crafted under the assumption that “the other language” was immutable. In reality, it has not been so: just look at the number of cross borrowings between C and C++ [3]. I believe that it would be technically feasible for “as close as possible” to be “identical in the subset supporting traditional C-style programming” assuming that changes could be made simultaneously to both languages systematically bringing them closer together.

Whatever is (or isn’t) done must be considered in light of the fact that the world changes rapidly and users expect programming languages to evolve to meet new challenges. Thus, compatibility issues must be considered in the wider context of language evolution. The most promising approach is to consider C and C++ close to complete in language support for their respective programming styles. Future extensions can focus on provision of standard or non-standard libraries. If you think of C and C++ as essentially complete, C/C++ compatibility emerges as part of the consolidation and cleanup of basic facilities.

You’ll learn more about the case for C and C++ compatibility in next month’s CUJ.

References

[1] Classic C is K&R C plus structure assignment, enumerations, and void. I picked the term “Classic C” from a sticker that used to be affixed to Dennis Ritchie’s terminal.

[2] Brian Kernighan and Dennis Ritchie. The C Programming Language (Prentice-Hall, 1978).

[3] Bjarne Stroustrup. Sibling Rivalry: C and C++ (AT&T Labs — Research Technical Report TD-54MQZY, January 2002), <www.research.att.com/~bs/siblingrivalry.pdf>.

[4] ISO/IEC 14882, Standard for the C++ Language.

[5] ISO/IEC 9899:1999, Programming Languages C.

[6] John Benito, the ISO C committee liaison to the ISO C++ committee, in response to a request to document C++/C99 incompatibilities similar to C89/C++ incompatibilities.

[7] David R. Tribble. “Incompatibilities between ISO C and ISO C++,” <http://david.tribble.com/text/cdiffs.htm>.

[8] Bjarne Stroustrup. The Design and Evolution of C++ (Addison-Wesley, 1994).

[9] Bjarne Stroustrup. The C++ Programming Language, Special Edition, (Addison-Wesley, 2000).

[10] Andrew Koenig and Bjarne Stroustrup. “C++: As Close to C as Possible but No Closer,” The C++ Report, July 1989.

[11] Graham Birtwistle, Ole-Johan Dahl, Bjorn Myrhaug, and Kristen Nygaard. SIMULA BEGIN (Studentlitteratur, 1979).

[12] ISO/IEC 9899:1990, Programming Languages C.

[13] Brian Kernighan and Dennis Ritchie. The C Programming Language, Second Edition (Prentice-Hall, 1988).

[14] Martin Richards and Colin Whitby-Strevens. BCPL, the language and its compiler (Cambridge University Press, 1980).

Bjarne Stroustrup is the designer and original implementer of C++. He has been a member of the C/C++ community since he first used C in 1975. For 17 years, he worked in Bell Labs’ Computer Science Research Center alongside people such as Dennis Ritchie and Brian Kernighan. In the early 1980s, he participated in the internal Bell Labs standardization of C. He is the author of The C++ Programming Language and The Design and Evolution of C++. His research interests include distributed systems, operating systems, simulation, design, and programming. He is an AT&T Fellow and heads AT&T Labs’ Large-scale Programming Research department. He is actively involved in the ANSI/ISO standardization of C++. He received the 1993 ACM Grace Murray Hopper award and is an ACM fellow.

Table 1: Traits shared by siblings