C/C++

Living By the Rules

By Pete Becker, May 16, 2006

Understanding compiler rules better equips you to interpret messages the compiler sends you.

Practical Details

Compilers have to offer more than mere standards conformance to succeed in the marketplace. When there's an error in the code, we expect compilers to tell us something about what was actually wrong and where. We also expect that a compiler won't generate an executable file if there's nothing sensible it can do with the code we fed it. That's what most programmers mean when they say that they expect "an error," or that the compiler "shouldn't compile" the code.

On the other hand, there often is something sensible that the compiler can do with code that violates diagnosable rules in the C++ Standard. That's known as a language extension, and it's one of the reasons that the C++ Standard says so little about what compilers should do with ill-formed programs. For example, C++ code that defines a variable whose type is long long int violates a syntactic rule: There is no such type. If the Standard prohibited compilers from accepting code that violated syntactic rules, that common extension (based on C99 and almost certainly coming to C++ in its next revision) couldn't be used. As it is, the compiler must issue a diagnostic, and it is then free to do just what you expect it to do: Treat your variable as an integer with type long long int, and adjust the normal rules as appropriate.
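A minimal sketch of code that relies on this extension follows. The function name `seconds_to_nanoseconds` is hypothetical, chosen only for illustration. A compiler that conforms strictly to the current C++ Standard must issue a diagnostic for the two uses of long long and may then go on to compile them. GCC invoked with `-std=c++98 -pedantic`, for instance, reports a warning rather than an error.

```cpp
// The type long long is not part of the 2003 C++ Standard, so a conforming
// compiler for that Standard must diagnose both uses of it below. Having
// issued the diagnostic, it is free to compile the code as an extension.
long long seconds_to_nanoseconds(int seconds)
{
    long long result = seconds;   // widen first so the multiplication is not done in int
    return result * 1000000000;   // the literal fits in a 32-bit int; the product is long long
}
```

Calling `seconds_to_nanoseconds(5)` yields a value too large for a 32-bit int, which is the practical reason people reach for the extension.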

In practice, this means that we have two typical categories of diagnostic messages: Error messages, meaning "This code violates a requirement of the C++ Standard, and I refuse to compile it," and warnings, meaning "This code violates a requirement of the C++ Standard, but I'm going to do something (probably) sensible with it, anyway." Unfortunately, most compilers also use warning messages to give advice about programming style. Compilers should not be in the business of criticizing style; there are other tools that do that. Compiler output messages should clearly distinguish between extensions and advice, either with a different kind of message or with a switch that turns off all messages that don't relate to violations of language rules. That would make it much easier to ignore their advice, and concentrate on the real coding problems [10].
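Note [10] gives a case in point, which can be sketched as follows. The function name `in_bounds` is hypothetical. The code is well-formed for every integral type, yet some compilers at high warning levels complain about the first comparison when `T` is unsigned, even though nothing in it violates a language rule.

```cpp
// Hypothetical helper: true when x is non-negative and below limit.
// One definition serves both signed and unsigned integral types.
template <class T>
bool in_bounds(T x, T limit)
{
    // When T is unsigned, 0 <= x is always true. That is advice-worthy at
    // most; the test is legal and its meaning is well defined.
    return 0 <= x && x < limit;
}
```

With `int` arguments, `in_bounds(-1, 10)` is false because of the first comparison; with `unsigned` arguments that comparison is redundant, and a compiler is welcome to remove it.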

The next time someone comes to you with code that they say "shouldn't compile," smile knowingly, and get on to the real problem.

Notes

[1] I'm using "semantic" here with its meaning in computer languages; in human languages, its meaning tends to incorporate both this meaning and the mapping between words and ideas.
[2] Java tried to do away with this flexibility by prescribing exact sizes for all integer and floating-point types. That seems to have worked okay for the integer types (although prematurely settling on 16 bits for its character type means that writing general-purpose character handling code is unnecessarily difficult now that Unicode doesn't fit in 16 bits), but with floating-point types it caused a major problem. Prescribing sizes and exact semantics meant that the Java runtime couldn't use the fastest available floating-point implementation on some hardware, so floating-point Java code ran significantly slower on Intel hardware than the equivalent C or C++ code. These restrictions on floating-point types have been relaxed at the request of people who write number-crunching code.
[3] If you need fully predictable behavior for code like this, the usual solution is an #if/#elif chain that checks the range, using the macros defined in the header <climits>. In its latest revision, the C Standard provides a set of types with well-defined sizes. The C++ Standards committee's Technical Report on C++ Library Extensions also adds these types.
[4] The order sometimes changes when the code is recompiled with the same compiler and different optimization settings.
[5] Violations of semantic rules that are not diagnosable semantic rules result in undefined behavior; that is, the C++ Standard doesn't impose any requirement on what a compiler does when faced with such code. When referring to this, please don't use the abominable wording "This program invokes undefined behavior." The correct phrasing is "The behavior of this program is undefined." And please keep in mind that undefined behavior means only that the C++ Standard doesn't say what the code in question does. It does not mean that compilers are obliged to do nasty things like set fire to your hard drive. Often, the best way to write code that takes maximal advantage of the hardware it will run on is to use code constructs whose behavior is undefined, but well understood. For example, if you really need speed, instead of testing whether an integer value is greater than or equal to zero and less than some upper limit, you can convert it to an unsigned integer type with the same number of bits and test whether the result is less than the upper limit. On most architectures, converting the value doesn't change any bits, so it doesn't require any code; negative values are simply treated as large unsigned values, which will always exceed the limit. The behavior of that code is formally undefined, but it works. Except when it doesn't. Test for it.
[6] The One Definition Rule says, in essence, that things that ought to be defined the same way must be defined the same way. For example, defining a struct named foo with a single member of type int in one translation unit and with a single member of type double in another translation unit violates the ODR. Violations of the ODR result in undefined behavior.
[7] The diagnosable rules are the syntactic rules and all diagnosable semantic rules.
[8] Some valid programs are simply too big for a compiler to handle. That doesn't make the compiler nonconforming, even if other compilers can handle the same program.
[9] Provided, of course, that its documentation lists that message as its diagnostic message.
[10] Advice is sometimes useful, especially for beginners, but I'm tired of having to write code that satisfies four different compiler writers' ideas of what constitutes good style, so that customers can compile it with warnings cranked up to the maximum. Yes, if(0 <= x) might be a mistake when x is an unsigned type; on the other hand, when I'm writing template code that works with all integral types, I don't want to write two separate versions for signed and unsigned types, and I don't want to have to write some compiler-breaking mess of metaprogramming. I just want to write the test, even though it's always true for an unsigned type. The code is legal and its meaning is well defined and clear. If the compiler writer wants to remove the test when x is unsigned, that's fine with me.
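
The single-comparison range check described in note [5] can be sketched as follows. The function name `in_range` is hypothetical, and the sketch assumes that the limit is not negative and that int and unsigned int have the same number of bits.

```cpp
// Hypothetical helper: true when 0 <= value < limit, using one comparison
// instead of two. Assumes limit >= 0.
bool in_range(int value, int limit)
{
    // A negative value converts to a large unsigned value, so it fails the
    // comparison in the same way that a too-large value does.
    return static_cast<unsigned int>(value) < static_cast<unsigned int>(limit);
}
```

As the note says, test for it: check a negative value, zero, the last valid value, and the limit itself on every platform you care about.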
