Pete is a consultant specializing in library design and implementation. He has been a member of the C++ Standards Committee since its inception, and is Project Editor for the C++ Standard. He is writing a book on the newly approved Technical Report on C++ Library Extensions to be published by Addison-Wesley. Pete can be contacted at [email protected]
BILL CLINTON GOT IN TROUBLE over a nuanced statement about context dependency, delivered in a context where nuance can be drowned in deliberate noise. The C++ Standard makes nuanced statements about valid and invalid programs, which usually don't drown in mere noise. Instead, they often disappear in a forest of perceived complexity, some essential, some the byproduct of simplifications made by teachers for the benefit of newcomers who soon outgrow them, and some from ad hoc learning that displaces detailed study. The rules for what a compiler must tell you when you try to compile invalid code are more complex than most programmers realize; at the same time, they're also much simpler. In this column, I examine those rules to understand their necessary complexity and their actual simplicity. With that knowledge, you'll be better equipped to interpret messages that you get from your compiler.
Grammars, Human and Otherwise
Most programming languages are modeled on human languages. They have a grammar that consists of syntactic rules, semantic rules, and transformations that map things in the language into things in the computer system that we're writing code for.
Syntactic rules are often expressed through grammar productions. You can read the productions for C++, in far greater detail than you're interested in, in the C++ Standard. They tell you what constitutes a valid statement in C++. Just as "This sentence no verb" is not a valid sentence in English, int x 3; is not a valid statement in C++. Both are missing a required element, and in both cases you can look through the grammar rules and find the one that's being violated.
Semantic constraints deal with context: A statement that is syntactically valid might not make sense in the place where it's used, typically because something is missing or ambiguous in the broader context. "C and C++ are third-generation programming languages. It can be far more expressive than assembler." Both sentences are syntactically correct, but the combination of the two doesn't make sense, because it's not possible to determine whether the "it" at the beginning of the second sentence refers to "C" or "C++." The sentences, taken together, violate the semantic rule that a singular pronoun must have exactly one antecedent to refer to. Similarly, in the code fragment void f(char); void f(double); f(3);, none of the three statements standing alone violates any syntax rules, but the combination is ambiguous, because it's not possible to determine whether the call to f(3) in the third statement refers to the first or the second version of f. The statements, taken together, violate the semantic rule that a call to an overloaded function must refer to exactly one of the overloads.
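The overload ambiguity described above can be sketched as follows. This is a minimal illustration, not part of the original fragment: the overloads return a label rather than void so that the selected overload is visible, and the names call_with_char, call_with_double, and check_overloads are hypothetical helpers added for the example. The ambiguous call stays in a comment because a conforming compiler must reject it.

```cpp
#include <cassert>
#include <string>

// The two overloads from the fragment above. They return a label
// (an assumption made for this sketch; the originals return void)
// so a caller can see which one overload resolution selected.
std::string f(char)   { return "f(char)"; }
std::string f(double) { return "f(double)"; }

// f(3);  // ill-formed: 3 is an int, and both int -> char and
//        // int -> double are conversions of the same rank, so
//        // neither overload is a better match than the other

// Supplying an argument that matches one parameter type exactly
// gives overload resolution a single best candidate.
std::string call_with_char()   { return f(static_cast<char>(3)); }
std::string call_with_double() { return f(3.0); }

// Hypothetical self-check; call it from main() to exercise the sketch.
void check_overloads() {
    assert(call_with_char() == "f(char)");
    assert(call_with_double() == "f(double)");
}
```

Neither fix is "the" right one; which overload the programmer meant is exactly the information the original call failed to supply.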
And, finally, the transformation from a valid statement in the language into a meaningful concept in the outside world is fraught with danger. "Colorless green ideas sleep furiously" violates neither syntactic rules nor any semantic rules. Nevertheless, when you map the words it uses and the abstract structure of the sentence into real-world concepts, it doesn't mean anything. Similarly, cout << "The cosine of 30 degrees is " << tan(90) << '\n'; violates neither syntactic rules nor any semantic rules, but its output is meaningless.
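For contrast, here is one way the output statement above could be made to mean what its text claims. This is a sketch under stated assumptions: the helper names degrees_to_radians, cosine_of_degrees, and check_cosine are hypothetical, and the only library facts relied on are that the trigonometric functions in <cmath> take their argument in radians and that acos(-1.0) yields pi.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: the <cmath> trigonometric functions expect
// radians, so an angle in degrees has to be converted first.
double degrees_to_radians(double degrees) {
    const double pi = std::acos(-1.0);
    return degrees * pi / 180.0;
}

// Hypothetical helper: the value the message text actually promises.
double cosine_of_degrees(double degrees) {
    return std::cos(degrees_to_radians(degrees));
}

// Hypothetical self-check; call it from main() to exercise the sketch.
void check_cosine() {
    // cos(30 degrees) is sqrt(3)/2.
    assert(std::fabs(cosine_of_degrees(30.0) - std::sqrt(3.0) / 2.0) < 1e-12);
    // tan(90.0) is the tangent of 90 radians, an unrelated number.
    assert(std::fabs(std::tan(90.0) - cosine_of_degrees(30.0)) > 1.0);
}
```

The compiler accepts both versions equally; nothing in the language rules can tell it that the label and the computation disagree.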
Of course, as programmers, it's our job to make sure that the output of the programs we write, whatever that output's form may be, is meaningful. So we try to write programs that don't violate any syntactic rules or any semantic rules. Having done that, we mentally apply the transformation rules that give meanings to programs, determine exactly what it is that the code we wrote is supposed to do, and check to be sure that what it does is what we want it to do. Easy enough, right? But it's not that simple: There are places where the transformation rules allow more than one meaning.