C++ Theory and Practice

Dan Saks, December 01, 1997


Storage Classes and Language Linkage

Ever wonder what extern "C" really means? Here's your chance to find out.


Copyright © 1997 by Dan Saks

C++ is popular today largely because it piggybacked on C's existing popularity. Bjarne Stroustrup designed C++ with this in mind [1], and it was a pretty smart decision. C programmers can start to use C++ as just a "better" C with little loss of productivity. They can grow into using data abstraction and object-oriented programming as project schedules permit.

Although you can translate a hefty C program into C++ in just weeks or even days, you may find it more practical to rewrite only those parts of your program that you are actively maintaining. If that's your preference, you can translate your existing C code into C++ one header or source file at a time. You may never get around to translating all the files, but that's okay.

The ability to mix C++ and C code is not just a migration tool. C++ programs built from the ground up may need to call upon functions that are available only from C libraries.

C++ makes it pretty easy, though not effortless, to mix C with C++ code. The principal mechanism is called a linkage-specification, which many programmers refer to as an extern "C" declaration. This month, I will try to explain the need for linkage-specifications, as well as what you need to know to use them.

A Brief Look Ahead

Before I get mired in the details, let me just show you a few examples of linkage specifications in the event you haven't seen them before.

A C++ program can call a standard C function such as

char *strcpy(char *, char const *);

If that C++ program declares the function as above, it will fail to link with the C library's version of that function. The C++ code must declare the function in a linkage specification such as

extern "C"
    char *strcpy(char *, char const *);

This declaration can appear in source files, but it really should be in a header file. This particular function declaration appears in the standard C header <string.h> along with many companions. Rather than specify extern "C" linkage for each function individually, the header can specify extern "C" linkage for the entire group using a linkage block, such as:

extern "C"
    {
    char *strcpy(char *, char const *);
    size_t strlen(char const *);
    ...
    }

In the past, libraries implemented the C++ version of each C header by wrapping an #include of the C header inside a linkage block. A typical library implementation might place the C headers in one directory, say /c/include, and place the C++ headers somewhere else. Then the C++ header <string.h> would contain simply:

extern "C"
    {
    #include "/c/include/string.h"
    }

These days, most libraries implement both the C and C++ versions of each header as one file that uses conditional compilation. For example, a version of <string.h> that compiles as either C or C++ looks something like:

#ifdef __cplusplus
extern "C"
    {
#endif
    char *strcpy(char *, char const *);
    size_t strlen(char const *);
    ...
#ifdef __cplusplus
    }
#endif

The preprocessor of a C++ compiler predefines the macro __cplusplus (with two leading underscores) to indicate that it's compiling the code as C++ rather than as C. If you look in the headers for your compiler, you'll probably see similar conditional compilation directives buried somewhere in each of the C library headers.
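
You can apply the same pattern to your own C code. The following is a minimal sketch, assuming a hypothetical header clib.h that declares one function, clib_add, which a C source file would normally define. The names clib.h, clib_add, and demo_clib are illustrations only. The header contents and the definition are combined in one listing so that it compiles on its own as C++.

```cpp
#include <cassert>

// Contents of a hypothetical header, clib.h, usable from C and C++.
#ifdef __cplusplus
extern "C" {
#endif

int clib_add(int a, int b);

#ifdef __cplusplus
}
#endif

// Normally a separate C source file would define clib_add. The definition
// appears here, with C linkage, only to keep the sketch self-contained.
extern "C" int clib_add(int a, int b)
{
    return a + b;
}

// C++ code calls the function like any other.
void demo_clib()
{
    assert(clib_add(2, 3) == 5);
}
```

When a C compiler reads the header part, __cplusplus is undefined, so it sees only the plain declaration.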

A Brief Look Back

Last month, I explained the syntax of storage class specifiers and two aspects of their semantics: how they affect when a declaration is also a definition, and how they affect linkage. (See "C++ Theory and Practice: Storage Classes and Linkage," CUJ, November 1997.) Since I will refer to some of those concepts again, here's a quick recap.

A declaration introduces a name into a program, and specifies attributes for that name. If the declarator-id (the name being declared) in the declaration designates an object or function, and the declaration reserves storage for that object or function, then that declaration is also a definition.

The linkage of a name is the extent to which a name might refer to a name declared elsewhere. C++ provides for three levels of linkage:

  • A name with external linkage denotes an entity that can be referenced via names declared in other scopes in the same or different translation units.
  • A name with internal linkage denotes an entity that can be referenced via names declared in other scopes in the same translation unit.
  • A name with no linkage denotes an entity that cannot be referenced via names from other scopes.
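
The three levels can be seen side by side in a short sketch. All of the names here (shared_total, file_total, add_to_totals, demo_linkage) are made up for illustration.

```cpp
#include <cassert>

int shared_total = 0;          // external linkage: other translation units can refer to it
static int file_total = 0;     // internal linkage: private to this translation unit

int add_to_totals(int amount)  // a function name with external linkage
{
    int doubled = 2 * amount;  // no linkage: a local name
    shared_total += amount;
    file_total += doubled;
    return file_total;
}

void demo_linkage()
{
    int before_shared = shared_total;
    int before_file = file_total;
    add_to_totals(5);
    assert(shared_total == before_shared + 5);
    assert(file_total == before_file + 10);
}
```

Another translation unit could declare extern int shared_total; and reach the same object, but it has no way to name file_total or doubled.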

Tables 1 and 2 summarize how the storage class specifiers and scope of a declaration determine linkage. Table 1 shows linkage for function names. Table 2 shows linkage for object names. Empty table entries indicate invalid combinations of scope and storage class.

Table 2 contains some information about objects that I did not cover last month, but added to the table to complete the picture. In particular, Table 2 also lists the storage duration of declared objects.

The storage duration of an object defines the lifetime of the storage containing the object. The storage duration of an object can be:

  • static
  • automatic
  • dynamic

An object with static storage duration has storage allocated at program startup. The memory remains in place for the duration of program execution. An object with automatic storage duration has storage allocated by a function call, and deallocated by the corresponding function return. An object with dynamic storage duration has storage allocated by a call to an allocation function (operator new or operator new[]) and deallocated by a corresponding call to a deallocation function (operator delete or operator delete[]).
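
Here is a small sketch showing one object of each storage duration. The function and variable names are hypothetical.

```cpp
#include <cassert>

int calls_made = 0;                 // static storage duration

int count_call()
{
    int step = 1;                   // automatic: gone when count_call returns
    calls_made += step;
    return calls_made;
}

int sum_on_heap(int a, int b)
{
    int *p = new int(a + b);        // dynamic: allocated by operator new
    int result = *p;
    delete p;                       // deallocated by operator delete
    return result;
}

void demo_storage()
{
    int first = count_call();
    int second = count_call();
    assert(second == first + 1);    // calls_made persisted between the calls
    assert(sum_on_heap(2, 3) == 5);
}
```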

Neither table includes a row for the storage class specifier mutable. The mutable specifier affects neither the linkage nor the storage duration of a name. mutable can appear only on a data member declaration (in class scope), and it specifies that the declared members are never const.
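
As a quick illustration of mutable, consider this sketch of a hypothetical class Reading that counts how often its value is read. The class and its members are invented for this example.

```cpp
#include <cassert>

class Reading {
public:
    explicit Reading(int v) : value(v), reads(0) {}
    int get() const
    {
        ++reads;            // allowed: reads is mutable
        return value;
    }
    int read_count() const { return reads; }
private:
    int value;
    mutable int reads;      // never const, even in a const Reading
};

void demo_mutable()
{
    const Reading r(42);
    assert(r.get() == 42);
    assert(r.get() == 42);
    assert(r.read_count() == 2);
}
```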

Overloading and Linkage

C++ supports function name overloading. That is, a C++ program can declare two or more different functions with the same name in the same scope. For a given call that names an overloaded function, the compiler selects (at compile and link time) the function whose formal parameters are the best match for the actual arguments in the call. The selection process is called overload resolution.

Each function in a set of overloaded functions must have a signature sufficiently distinct so that overload resolution can tell the functions apart. A function's signature is the information about that function that participates in resolving calls. Among other things, the signature includes the types of the function's parameters (but not their names). For instance, the signature of

char *strcpy(char *, char const *);

is the sequence { char *, char const * }.
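
A short sketch of overload resolution follows. The function name kind is hypothetical, and the example relies on constexpr and static_assert, which arrived in C++11, well after the draft discussed here. They simply let the compiler check the result.

```cpp
// Three functions named kind, distinguished only by their parameter types.
constexpr int kind(int)          { return 1; }
constexpr int kind(double)       { return 2; }
constexpr int kind(char const *) { return 3; }

// The compiler picks the best match from the argument types alone.
static_assert(kind(7) == 1, "int argument selects kind(int)");
static_assert(kind(7.5) == 2, "double argument selects kind(double)");
static_assert(kind("seven") == 3, "string literal selects kind(char const *)");
```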

In C++, as in C, declarations in one translation unit may declare functions that are defined in other units. Therefore, a particular call expression might (and often does) call a function defined in another unit. Since C doesn't allow overloaded function names, a C linker only needs function names to resolve calls across translation units. Since C++ allows overloading, a C++ linker needs to see signatures as well as function names.

Early C++ compilers, notably AT&T's cfront, used a technique called name mangling to encode a function's signature as a character string and attach it to the function name. Using this scheme, cfront mangled the declaration

char *strcpy(char *, char const *);

into strcpy__FPcPCc, or something very similar. (The details of this scheme aren't crucial to this discussion. If you're really intrigued, see the ARM [2] for details.) cfront mangled all function names, not just those of overloaded functions. The mangled names are what appeared in generated object modules.

C doesn't allow function name overloading, so C compilers have no need to encode function names and signatures into mangled names. C compilers just use the function names in the source as the names in the object modules. Some compilers adorn the names by adding a leading underscore, but no more. For example, those compilers turn the function name strcpy into the linker symbol _strcpy.

Name mangling presents a problem for C++ programs linked with C code. If you compile a function definition using C, the compiler won't mangle the function name. If you declare and call that function from C++, the compiler will mangle the name. The linker will see the mangled and unmangled names as different (because they are) and won't be able to resolve the calls. Linkage specifications were designed to solve this problem.
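
To make the mismatch concrete, here is a toy encoder that reproduces the strcpy example. It is not cfront's algorithm, only a sketch that assumes the caller supplies the type codes, with "Pc" read as char * and "PCc" as char const *. The function toy_mangle is hypothetical, and the code uses C++11 library features for brevity.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy encoder: name + "__F" + one code per parameter type.
// A real compiler derives the codes from the declaration itself.
std::string toy_mangle(std::string const &name,
                       std::vector<std::string> const &param_codes)
{
    std::string result = name + "__F";
    for (std::string const &code : param_codes)
        result += code;
    return result;
}

void demo_mangle()
{
    // The symbol a mangling C++ compiler might emit for strcpy.
    assert(toy_mangle("strcpy", {"Pc", "PCc"}) == "strcpy__FPcPCc");
    // A C compiler emits plain "strcpy" (or "_strcpy"), so the names differ.
    assert(toy_mangle("strcpy", {"Pc", "PCc"}) != "strcpy");
}
```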

Though linkage specifications were originally conceived to solve the problem of linking C++ programs with C code, the draft C++ Standard presents them as a more general mechanism with the potential to allow linking with code written in any language.

Language Linkage

The draft C++ Standard never says that linkage specifications turn name mangling off. In fact, the draft never mentions name mangling or any other name encoding scheme. Rather, it simply states that all function types, function names, and object names have a property called language linkage.

Language linkage is a bundle of implementation-dependent properties. The language linkage of function and object names probably involves some name encoding, but the draft only suggests this might be so. The language linkage of function types might involve calling conventions for placing arguments in registers or on the stack.

Not surprisingly, names with linkage declared outside a linkage specification have C++ linkage by default.

Linkage Specifications

A linkage-specification has two forms, a linkage-block and a linkage-declaration, as shown in the grammar in Table 3. Both forms begin with the keyword extern followed by a string literal. A linkage-block ends with a sequence of zero or more declarations enclosed in braces. A linkage-declaration ends with a single declaration.

The string literal specifies the language linkage of names declared within the linkage specification. All C++ implementations must recognize "C++" and "C" as language linkage strings. For example,

extern "C++"
    float sqrt(float);   // C++ linkage
extern "C"
    double sqrt(double); // C linkage
complex sqrt(complex);   // C++ linkage by default

Beyond "C++" and "C", the set of language linkage strings is implementation-defined. That is, an implementation might recognize other strings, such as "Fortran" or "Pascal", or it might not. The exact meaning of any language specification string is also implementation-defined, except that a linkage specification with an unrecognized string is an error requiring a diagnostic message.

Linkage specifications can occur only at namespace scope (which includes file scope). The brace-enclosed region of a linkage-block does not constitute a distinct scope.

A program can nest one linkage specification inside another. The name(s) declared in a linkage specification get the language linkage of the innermost linkage specification enclosing the declaration. For example,

extern "C++"
    {
    extern "C" int f(int);
    int g(int);
    }

declares f with C linkage and g with C++ linkage. More precisely, f has type "function with C linkage with parameter of int returning int". Whenever I think the meaning is clear, I prefer writing "X function" instead of "function with X linkage". For example, g has type "C++ function with parameter of int returning int".
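
The following sketch fills in the nesting example with definitions so that it compiles on its own. The names inc_c and dbl_cpp are hypothetical stand-ins for f and g.

```cpp
#include <cassert>

extern "C++" {
    extern "C" int inc_c(int);   // C linkage: the innermost specification wins
    int dbl_cpp(int);            // C++ linkage
}

extern "C" int inc_c(int n) { return n + 1; }
int dbl_cpp(int n) { return n * 2; }

// Because dbl_cpp has C++ linkage, it can be overloaded freely.
int dbl_cpp(double) { return -1; }

void demo_nested()
{
    assert(inc_c(1) == 2);
    assert(dbl_cpp(2) == 4);
    assert(dbl_cpp(2.0) == -1);
}
```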

Any declaration that can appear at namespace scope can appear inside a linkage specification. Even declarations for names with no linkage can appear inside linkage specifications. For example,

extern "C"
    {
    enum color { red, green, blue };
    typedef int index;
    }

declares types color and index, and constants red, green, and blue. None of these names has linkage, and therefore none has language linkage.

The specified language linkage applies to the function names and function types of all function declarators declared within a linkage specification. The draft illustrates this rule with examples similar to the following:

extern "C"
    {
    void f1(void (*pf)(int));
    extern "C" typedef void FUNC();
    }

Here, f1 names a function, so the C linkage specified in the enclosing linkage specification applies to f1. Neither pf nor FUNC names a function, so the specified C linkage does not apply to them. However, f1, *pf, and FUNC are all declarators with function types, so the specified C linkage applies to all these function types.

In summary, the effect is:

  • The name f1 has C linkage and its type is "C function with parameter of (pointer to C function with parameter of int returning void) returning void".
  • The name FUNC itself has no linkage, but it has type "C function with no parameters returning void".

The declaration

FUNC f2;

is not in a linkage specification, so the name f2 has C++ linkage by default. However, f2's function type has C linkage because FUNC's type has C linkage.

The declaration

extern "C" FUNC f3;

gives C linkage to both the name f3 and its function type. Finally,

void (*pf2)(FUNC*);

declares the name pf2 with C++ linkage by default. pf2's type is "pointer to C++ function with parameter of (pointer to C function with no parameters returning void) returning void".

C++ ignores language linkage for the names of class members and for the function type of class member function declarators. For example,

extern "C"
    {
    class T
        {
    public:
        void f();
        ...
        };
    }

declares the name T::f and its function type with C++ linkage.

As another example, consider

class X
    {
    FUNC f;
    void g(FUNC *);
    };

where FUNC is the typedef declared above as a function type with C linkage. C++ ignores the linkage of FUNC's type in declaring X::f; both the name and type of X::f have C++ linkage. On the other hand, C++ does not ignore the language linkage of a type used in forming the parameter type of a member function. Thus, the name and type of X::g have C++ language linkage, but its parameter has type "pointer to C function ...".
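
One visible consequence of this rule is that member functions declared inside a linkage block can still be overloaded. The sketch below uses a hypothetical class Meter to show it.

```cpp
#include <cassert>

extern "C" {
    class Meter {
    public:
        // Overloads are fine: member names keep C++ linkage.
        int scale(int n) const  { return n * 2; }
        int scale(double) const { return -1; }
    };
}

void demo_members()
{
    Meter m;
    assert(m.scale(3) == 6);
    assert(m.scale(3.0) == -1);
}
```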

The functions in a set of overloaded functions with the same name need not have the same language linkage. However, at most one function in a given set can have C linkage.
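
For instance, this sketch declares a hypothetical overloaded function twice, with one C function and two C++ functions.

```cpp
#include <cassert>

extern "C" int twice(int n) { return 2 * n; }   // the one C function in the set
double twice(double x) { return 2.0 * x; }      // C++ linkage
long twice(long n) { return 2L * n; }           // C++ linkage

// Declaring another overload with C linkage, for example
//     extern "C" float twice(float);
// would be an error, since twice(int) already has C linkage.

void demo_mixed_overloads()
{
    assert(twice(3) == 6);
    assert(twice(1.5) == 3.0);
    assert(twice(4L) == 8L);
}
```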

A linkage specification can contain declarations with explicit storage class specifiers. For example,

extern "C"
    {
    int f(int);
    extern int g(int);
    static int h(int i)
        {
        ...
        }
    }

Here, all three functions, f, g, and h have C language linkage. However, only f and g have external linkage. h has internal linkage.

As I explained last month,

extern int i;

is a declaration, but

int i;
int i = 0;
extern int i = 0;

are all definitions.

Placing a declaration in a linkage-block (with braces) doesn't affect whether that declaration is also a definition. On the other hand, placing a declaration in a linkage-declaration (without braces) behaves as if it had an extern specifier for the purpose of determining whether the declaration is also a definition. For example,

extern "C" int i; // declaration
extern "C"
    {
    extern int j; // declaration
    int k;        // definition
    }
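
A compilable sketch of the same distinction follows, using the hypothetical names shared_value and late_value.

```cpp
#include <cassert>

extern "C" int shared_value;    // linkage-declaration: declares, does not define

extern "C" {
    extern int late_value;      // explicit extern: declares, does not define
    int shared_value = 42;      // no extern inside braces: defines
    int late_value = 7;         // defines the name declared two lines up
}

void demo_definitions()
{
    assert(shared_value == 42);
    assert(late_value == 7);
}
```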

Language Linkage in the Library

Two function types with different language linkages are distinct types even if they are otherwise identical. For example, given

typedef void (*handler)();
extern "C"
    typedef void (*c_handler)();

handler and c_handler are pointer-to-function types whose function types have identical parameter and return types, but different language linkages. Thus, they are distinct types. This poses a problem for C++ programs that call certain functions in the C library.

Several functions in the C library have parameters of type "pointer to function", such as

int atexit(void (*f)(void));

and

void (*signal(int s, void (*f)(int)))(int);

as well as bsearch and qsort. As I explained earlier, the function declarators in parameter declarations have language linkage. Any pointer you pass as an argument must point to a function with the same linkage as its corresponding parameter.

Declaring a function with the same language linkage as a function parameter in the library shouldn't be much of a problem. But, until recently, it was. The current C++ draft (November, 1996 as of this writing) states that:

It is unspecified whether a name from the Standard C library declared with external linkage has either extern "C" or extern "C++" linkage.

Thus, the C++ library header <cstdlib> might declare

extern "C"
    {
    int atexit(void (*f)(void));
    ...
    }

so that both atexit and its pointer parameter have function types with C linkage. On the other hand, the header might omit the linkage specification so that both types have C++ linkage. Then again, it might declare

typedef void (*__atfunc)(void);
extern "C"
    int atexit(__atfunc f);

so that atexit has C linkage, but the parameter points to a function with C++ linkage.

The C++ committee realized that this left programmers with no way to know whether to declare atexit functions, signal handlers, and other such functions as C++ or C functions. This past July, the committee tentatively agreed to pin down language linkage more precisely. In particular:

  • Signal handlers must have C linkage.
  • Other C library functions with "pointer to function" parameters must be overloaded to accept functions with either C or C++ linkage.

I expect this agreement will make it into the eventual C++ Standard and the libraries you will use.
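
In the meantime, declaring a callback with C linkage is a reasonable default. The sketch below passes such a callback to qsort. It assumes an implementation whose <cstdlib> accepts a pointer to a C function, which should hold whether the library declares the parameter with C linkage or overloads qsort as described above. The helper names compare_ints, sort_ints, and demo_sort are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Comparison callback declared with C linkage.
extern "C" int compare_ints(void const *a, void const *b)
{
    int const x = *static_cast<int const *>(a);
    int const y = *static_cast<int const *>(b);
    return (x > y) - (x < y);   // avoids the overflow risk of x - y
}

void sort_ints(int *data, std::size_t count)
{
    std::qsort(data, count, sizeof data[0], compare_ints);
}

void demo_sort()
{
    int values[] = { 3, 1, 2 };
    sort_ints(values, 3);
    assert(values[0] == 1 && values[1] == 2 && values[2] == 3);
}
```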

References

[1] Bjarne Stroustrup. The Design and Evolution of C++ (Addison-Wesley, 1994).

[2] Margaret A. Ellis and Bjarne Stroustrup. The Annotated C++ Reference Manual (Addison-Wesley, 1990).

Dan Saks is the president of Saks & Associates, which offers training and consulting in C++ and C. He is active in C++ standards, having served nearly seven years as secretary of the ANSI and ISO C++ standards committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield, OH 45504-4906 USA, by phone at +1-937-324-3601, or electronically at [email protected].

