Like many other programming languages, C employs concepts such as scope and storage allocation. The C standard also employs the lesser known concept of linkage. Some other programming languages employ this concept as well, but few aside from C++ use the same terminology as C.
Although most programmers understand scope and storage allocation well enough to cope with common programming situations, that understanding often breaks down when they confront anything out of the ordinary. They also seem to have a sense of what linkage is, but don't really understand how it is distinct from the other concepts.
Much of the confusion stems from the complex semantics of storage class specifiers such as extern and static. The keyword static is particularly inscrutable. Sometimes it affects the way a program allocates storage. It can also affect how the linker resolves names as it links object files together. In C++, it can even restrict the behavior of class member functions. Understanding these distinctions can help you implement your designs more effectively and avoid some maintenance headaches.
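To make the first two roles of static concrete, here is a minimal sketch in C. The names call_count, note_call, running_total, and calls_so_far are hypothetical and chosen only for illustration. At file scope, static affects how the linker treats the name; at block scope, it affects how the object's storage is allocated.

```c
/* File scope: static here affects linkage. The linker will not match
   call_count or note_call against names in other object files. */
static int call_count = 0;

static void note_call(void)
{
    ++call_count;
}

/* Block scope: static here affects storage allocation. total is
   allocated once and keeps its value from one call to the next. */
int running_total(int n)
{
    static int total = 0;

    note_call();
    total += n;
    return total;
}

/* Hypothetical accessor so that other code can observe the counter. */
int calls_so_far(void)
{
    return call_count;
}
```

Calling running_total(5) and then running_total(7) should yield 5 and then 12, because total survives between the calls.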
In this article, I'll explain how the C standard defines the concept of scope. The C++ standard describes scope much the same way as the C standard, but with a few noteworthy differences. I'll focus initially on what the C standard says, and point out the differences with C++ as appropriate. I'll try to be reasonably precise without swamping you with unnecessary details.
Translation Units
As you know, a C program can consist of numerous source files. A compiler processes one source file at a time. A source file usually contains #include directives that refer to headers. The compiler's preprocessor merges those headers with the source file to produce a transitory source file, which the standard calls a "translation unit." Translation units are also known as compilation units.
Later phases of the compilation process transform each translation unit into an object file or object module. The linker combines object files and library components to produce an executable program.
As you'll see, you can't talk knowledgeably about scope and linkage without mentioning translation units.
Declarations and Definitions
A "declaration" is a construct in the source code that introduces one or more names into a translation unit and associates attributes with those names. Alternatively, a declaration might simply redeclare a name introduced by a declaration that appeared earlier in the translation unit.
A declaration might also be a definition. Informally, a "definition" is a declaration that not only says, "Here's a name," but also, "Here's all the information the compiler needs to create the code for that name."
For functions and objects, a definition is a declaration that generates storage. It's easy to tell when a function declaration is also a definition: a function definition has a body, which generates storage in the code space. For objects, the distinction is not so simple: it depends on the object's scope, linkage, and initializer. I'll get to them in a moment.
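The following sketch contrasts declarations that are not definitions with declarations that are. The names counter and next_id are hypothetical. It covers only the two simplest cases: a function with a body, and a file-scope object with an initializer.

```c
/* Declarations only: each introduces a name and its type,
   but generates no storage. */
extern int counter;     /* object declaration, not a definition */
int next_id(void);      /* function declaration: no body */

/* Definitions: each generates storage. */
int counter = 100;      /* the initializer makes this a definition */

int next_id(void)       /* the body makes this a definition */
{
    return ++counter;
}
```

The object cases that depend on scope and linkage are deliberately left out of this sketch.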
In C, a struct declaration never generates storage, so C doesn't distinguish struct definitions from other struct declarations. C++ does. In C++, a struct declaration is also a definition if it has a body, as in:
struct widget // a definition
{
    ...
};
It's only a declaration if it lacks a body, as in:
struct widget; // just a declaration
The C standard uses more complicated verbiage to distinguish these different forms of struct declarations. I prefer C++'s approach.
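As a rough illustration of why the two forms matter in practice, the sketch below uses both in C. The member id and the names selected, widget_size, and has_selection are hypothetical. After the bodiless declaration, the compiler knows only that widget is a struct; after the declaration with a body, it also knows the struct's size and members.

```c
#include <stddef.h>

struct widget;                  /* no body: the type is incomplete here */

/* A pointer to an incomplete type is allowed, because the compiler
   doesn't need the size of a widget to allocate the pointer. */
static struct widget *selected = NULL;

struct widget                   /* with a body: the type is now complete */
{
    int id;                     /* hypothetical member */
};

/* sizeof needs the complete type, so it can appear only after the body. */
size_t widget_size(void)
{
    return sizeof(struct widget);
}

int has_selection(void)
{
    return selected != NULL;
}
```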
In both C and C++, all typedef and enumeration constant declarations are also definitions.
Scope Regions in C
When the compiler encounters the declaration of a name, it stores that name and its attributes in a symbol table. When the compiler encounters a reference to a name, it looks up the name in the symbol table to find those attributes. Each declared name is visible (that is, it can be found by lookup) only within a portion of the translation unit called its scope.
Some programming languages use dynamic scoping, in which name lookup is done at run time and may yield different results depending on the state of the running program. That is not the case with C and C++. Both languages use static scoping and do all name lookup at compile time.
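Here is a small sketch of what static scoping means in practice. The names level, read_level, and call_with_local_level are hypothetical. The reference to level inside read_level is resolved at compile time to the declaration visible at that point in the source, no matter which function calls read_level at run time.

```c
int level = 1;              /* declared outside any function */

int read_level(void)
{
    return level;           /* resolved at compile time to the level above */
}

int call_with_local_level(void)
{
    int level = 2;          /* visible only inside this function */

    (void)level;            /* silence unused-variable warnings */
    return read_level();    /* returns 1: the caller's level plays no part */
}
```

In a dynamically scoped language, the equivalent of call_with_local_level would return 2, because lookup would find the caller's level at run time.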
C has four kinds of scope:
- A name has file scope if it's declared in the outermost scope of a translation unit, that is, outside of any function, structure, or union declaration. Its scope begins right after its declaration and runs to the end of the translation unit.
- A name (other than a statement label) has block scope if it's declared within a function definition (including that function's parameter list) or in a brace-enclosed block within that function. Its scope begins right after its declaration and runs to the end of the block immediately enclosing that declaration.
- A name has function prototype scope if it's declared in the function parameter list of a function declaration that is not also a definition. Its scope begins right after its declaration and runs to the end of the parameter list.
- Statement labels, and only statement labels, have function scope. A label can be defined only in the body of a function definition and is in scope everywhere in that body, even before the label has been defined.
For example, in Listing One:
- Object k and functions f and g have file scope.
- Parameter i in function f and parameter n in function h have function prototype scope.
- Parameter i, objects j and k, and function h, all within function g, have block scope.
- Label done in function g has function scope.
int k;
int f(int i);
int g(int i)
{
    int j, k;
    int h(int n);
    if (i < j)
        goto done;
    k = 42;
    ...
done:
    return 0;
}
Listing One.
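Listing One elides part of the body of g, so it won't compile as written. The following variant is a sketch that fills the gaps with assumptions of my own: the initial values of k and j, the return h(k) statement, and the bodies of f and h are all hypothetical. It keeps the same names in the same scopes, and it shows one consequence of them: the k declared inside g hides the file-scope k, so assigning to it leaves the file-scope k unchanged.

```c
int k = 10;                     /* file scope */

int f(int i);                   /* i: function prototype scope */

int g(int i)                    /* i: block scope */
{
    int j = 5, k;               /* block scope; this k hides the outer k */
    int h(int n);               /* h: block scope; n: prototype scope */

    if (i < j)
        goto done;              /* done is in scope before it is defined */
    k = 42;                     /* assigns the block-scope k only */
    return h(k);
done:
    return 0;
}

int f(int i)
{
    return i + k;               /* this k is the file-scope k */
}

int h(int n)
{
    return n + 1;
}
```

With these assumed values, g(1) takes the goto and returns 0, g(9) returns 43, and f(1) returns 11 either way, because the file-scope k is still 10.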
Most programmers, not just C and C++ programmers, refer to names declared in an inner scope (a block scope) as "local" names, and to names declared at the outermost scope (file scope) as "global" names. The C++ standard uses the terms local and global in this sense, but the C standard rarely does.