Understanding compiler rules better equips you to interpret messages the compiler sends you.
May 16, 2006
URL:http://www.drdobbs.com/cpp/living-by-the-rules/187203727
Pete is a consultant specializing in library design and implementation. He has been a member of the C++ Standards Committee since its inception, and is Project Editor for the C++ Standard. He is writing a book on the newly approved Technical Report on C++ Library Extensions to be published by Addison-Wesley. Pete can be contacted at [email protected].
BILL CLINTON GOT IN TROUBLE over a nuanced statement about context dependency, delivered in a context where nuance can be drowned in deliberate noise. The C++ Standard makes nuanced statements about valid and invalid programs, which usually don't drown in mere noise. Instead, they often disappear in a forest of perceived complexity, some essential, some the byproduct of simplifications made by teachers for the benefit of newcomers who soon outgrow them, and some from ad hoc learning that displaces detailed study. The rules for what a compiler must tell you when you try to compile invalid code are more complex than most programmers realize; at the same time, they're also much simpler. In this column, I examine those rules to understand their necessary complexity and their actual simplicity. With that knowledge, you'll be better equipped to interpret messages that you get from your compiler.
Most programming languages are modeled on human languages. They have a grammar that consists of syntactic rules, semantic rules, and transformations that map things in the language into things in the computer system that we're writing code for [1].
Syntactic rules are often expressed through grammar productions. You can read the productions for C++, in far greater detail than you're interested in, in the C++ Standard. They tell you what constitutes a valid statement in C++. Just as "This sentence no verb" is not a valid sentence in English, int x 3; is not a valid statement in C++. Both are missing a required element, and in both cases you can look through the grammar rules and find the one that's being violated.
Semantic constraints deal with context: A statement that is syntactically valid might not make sense in the place where it's used, typically because something is missing or ambiguous in the broader context. "C and C++ are third-generation programming languages. It can be far more expressive than assembler." Both sentences are syntactically correct, but the combination of the two doesn't make sense, because it's not possible to determine whether the "it" at the beginning of the second sentence refers to "C" or "C++." The sentences, taken together, violate the semantic rule that a singular pronoun must have exactly one antecedent to refer to. Similarly, in the code fragment void f(char); void f(double); f(3);, none of the three statements standing alone violates any syntax rules, but the combination is ambiguous, because it's not possible to determine whether the call to f(3) in the third statement refers to the first or the second version of f. The statements, taken together, violate the semantic rule that a call to an overloaded function must refer to exactly one of the overloads.
And, finally, the transformation from a valid statement in the language into a meaningful concept in the outside world is fraught with danger. "Colorless green ideas sleep furiously" violates neither syntactic rules nor any semantic rules. Nevertheless, when you map the words it uses and the abstract structure of the sentence into real-world concepts, it doesn't mean anything. Similarly, cout << "The cosine of 30 degrees is " << tan(90) << '\n'; violates no syntactic rules or any semantic rules, but its output is meaningless.
Of course, as programmers, it's our job to make sure that the output of the programs we write, whatever that output's form may be, is meaningful. So we try to write programs that don't violate any syntactic rules or any semantic rules. Having done that, we mentally apply the transformation rules that give meanings to programs, determine exactly what it is that the code we wrote is supposed to do, and check to be sure that what it does is what we want it to do. Easy enough, right? But it's not that simple: There are places where the transformation rules allow more than one meaning.
One of the primary goals of the C++ programming language, which it inherited from C, is to permit compilers to generate optimal code for their target platform. As a result, the C++ Standard doesn't spell out the exact details of things such as the size of the various integer types or the order of evaluation of arguments to functions. It leaves it to the compiler writer to decide what works best for the computer system that the compiler is targeting, and to fill in the details appropriately, within certain constraints.
For example, each of the integer types unsigned char, unsigned short, unsigned int, and unsigned long must be able to represent all of the values that can be represented by the type that precedes it in that list. In addition, each of the types must be able to represent all of the values in the minimal range specified for that type. As a result, this code snippet:
unsigned int i = 65535;
cout << ++i << '\n';
can display the value 0 or the value 65536, depending on the actual range of values supported by an unsigned int [2]. The Standard requires that an unsigned int be able to store values that are greater than or equal to 0 and less than or equal to 65535. With a compiler that supports that exact range, incrementing the value of i makes it wrap around to 0. With a compiler that supports a larger range, incrementing the value of i simply sets it to the next value, 65536 [3].
To provide some guidance for users of features that have this sort of flexibility, the C++ Standard provides two categories for requirements that can be satisfied in more than one way: implementation-defined behavior and unspecified behavior. In both cases, the implementor chooses from a usually small set of alternatives. When something is designated as implementation-defined behavior, the implementor must document which choice was made. For unspecified behavior, no documentation is required. For example, the fundamental type char must have the same representation as either signed char or unsigned char. The choice is implementation-defined, so the documentation that comes with the compiler must tell you which one it is. On the other hand, the order of evaluation of arguments to a function is unspecified. The following code can write its two output lines in either order, and the implementor has no obligation to tell you what the order will be [4]:
int f() { cout << "In f\n"; return 3; }
int g() { cout << "In g\n"; return 4; }
int sum(int i, int j) { return i + j; }
int main() { return sum(f(), g()); }
In both cases, if you want to write portable code, you should make sure that any code whose effect depends on implementation-defined or unspecified behavior is easily identifiable and well isolated, so that it can be changed easily if the need arises.
Some of the semantic rules in the C++ Standard explicitly say that if they are violated, "no diagnostic is required." Some say that violating the rule results in undefined behavior. Some say nothing about what behavior is required. The rest of the semantic rules are referred to as diagnosable semantic rules [5].
A well-formed program is a C++ program that does not violate any syntax rules, diagnosable semantic rules, or the One Definition Rule [6]. An ill-formed program is a program that is not a well-formed program. A diagnostic message is a message that is one of an implementation-defined subset of the implementation's output messages. If a program contains a violation of any diagnosable rule [7], the C++ Standard requires the implementation to issue at least one diagnostic message. If a program does not contain a violation of any of the rules (diagnosable or otherwise), and the program isn't too big for the compiler to handle [8], the C++ Standard requires the implementation to "accept and correctly execute" the program.
Read those definitions again, paying particular attention to what they don't say. They don't say that any code shouldn't compile, nor do they say anything about error messages or warnings. In fact, a compiler that always gives the message "compile successful," regardless of any violations of language rules, meets those requirements [9]. Of course, nobody would use such an uninformative compiler if they could avoid it, but that's just a practical detail.
Compilers have to offer more than mere standards conformance to succeed in the marketplace. When there's an error in the code, we expect compilers to tell us something about what was actually wrong and where. We also expect that a compiler won't generate an executable file if there's nothing sensible it can do with the code we fed it. That's what most programmers mean when they say that they expect "an error," or that the compiler "shouldn't compile" the code.
On the other hand, there often is something sensible that the compiler can do with code that violates diagnosable rules in the C++ Standard. That's known as a language extension, and it's one of the reasons that the C++ Standard says so little about what compilers should do with ill-formed programs. For example, C++ code that defines a variable whose type is long long int violates a syntactic rule: There is no such type. If the Standard prohibited compilers from accepting code that violated syntactic rules, that common extension (based on C99 and almost certainly coming to C++ in its next revision) couldn't be used. As it is, the compiler must issue a diagnostic, and it is then free to do just what you expect it to do: Treat your variable as an integer with type long long int, and adjust the normal rules as appropriate.
In practice, this means that we have two typical categories of diagnostic messages: Error messages, meaning "This code violates a requirement of the C++ Standard, and I refuse to compile it," and warnings, meaning "This code violates a requirement of the C++ Standard, but I'm going to do something (probably) sensible with it, anyway." Unfortunately, most compilers also use warning messages to give advice about programming style. Compilers should not be in the business of criticizing style; there are other tools that do that. Compiler output messages should clearly distinguish between extensions and advice, either with a different kind of message or with a switch that turns off all messages that don't relate to violations of language rules. That would make it much easier to ignore their advice, and concentrate on the real coding problems [10].
The next time someone comes to you with code that they say "shouldn't compile," smile knowingly, and get on to the real problem.