
Andrew Koenig

Dr. Dobb's Bloggers

Why Language Designers Tolerate Undefined Behavior

January 16, 2014

As I promised last week, here is my first example of the tension between theoretical and socially inspired solutions to technical problems in programming-language design. This example dates back to 1980 or so, and concerns a C compiler that was running on a PDP-11, a 16-bit minicomputer that was widely used at the time.

A user of this C compiler had come to the compiler's author with code along the following lines:

 
     int a, b, c;
     a = /* a value */;
     b = /* another value */;
     if ((c = a - b) != 0) { /* do something */ }

The complaint was that the code represented by /* do something */ was sometimes being executed even when c was zero. Obviously, this was a compiler bug, right?

To answer this question, we must understand a little about the PDP-11's instruction set. Every add or subtract operation sets a two-bit condition code, which represents one of four possible states: positive, negative, zero, or over/underflow. In other words, when an addition or subtraction overflows or underflows, the condition code reports nothing about the result beyond the fact that it overflowed or underflowed. The PDP-11 also had available an instruction to test the value of a word and set the condition code appropriately.

The compiler was generating the following machine code for this C fragment:

 
  1. Copy the value of a into a register.
  2. Subtract the value of b from the register.
  3. Store the register in c.
  4. If the condition code indicates “zero,” jump around the /* do something */ code.

The third of these instructions, which stores the result of the subtraction in memory, does not affect the condition code; so the jump would occur whenever the condition code indicated that the result was zero. The problem was that if the subtraction overflowed or underflowed, the condition code would indicate as much regardless of the numerical result, so the /* do something */ code would always be executed, even if the value of c happened to be zero.

The user's argument was straightforward: "If I write

 
if ((c = expression) != 0) { /* do something */ }

and /* do something */ is executed, I think I have a right to expect that c will not be zero. After all, that's what I'm explicitly testing for."

The argument on the other side was harder to frame and harder to understand. As a pragmatic matter, the compiler's author could have solved the problem by inserting another instruction between the subtraction and the jump (either before or after the store instruction) to test the value of the register and set the condition code accordingly. However, he did not want to make this change because it would make every such piece of code run more slowly. The question, then, was how to deal with the user's expectation that c will not be zero if /* do something */ is executed.

Obviously, he couldn't simply say that he wasn't going to make the code work the way the user expected because doing so would make it slower. Such a reply would invite the follow-up question: "You mean it's OK to get the wrong answer if you can do so more quickly?" Accordingly, in order to justify the compiler's actual behavior, it was necessary to argue that the user's expectations were incorrect.

As it happens, this argument is not hard to find. One must merely reframe the question to ask: "What does the user have the right to expect after an overflow?" Although work on the ISO C standard had not even started when this problem came up, there was already a firm sense of the answer, namely "Nothing at all." This answer came from two general design principles behind C:

  1. The language should impose no unnecessary overhead on the implementation.
  2. It should be as easy as possible to implement C on a wide variety of hardware.

The PDP-11 had no overhead-free way of checking for integer overflow, because it had no way of telling the processor to check for overflow automatically. Instead, every overflow check required at least one extra instruction, namely a conditional jump based on whether the condition code indicated an overflow. Therefore (1) argued that C should not require an overflow check. Moreover, there might someday be a C implementation on a computer that always checks for integer overflow, and signals an error condition when overflow occurs. In that case, (2) argued that the implementation should be allowed to pass that error condition on to the user, rather than figuring out how to ignore it.
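To make the cost of an overflow check concrete, here is a minimal sketch of what such a check looks like when a programmer writes it by hand in portable C. The helper name checked_sub is hypothetical; it does not come from the compiler discussed here, and it assumes only the INT_MAX and INT_MIN constants from <limits.h>.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper, for illustration only: stores a - b in *result
   and returns true when the mathematical difference fits in an int;
   returns false, leaving *result untouched, when it would overflow. */
bool checked_sub(int a, int b, int *result)
{
    /* The difference is too large when b < 0 and a > INT_MAX + b.
       It is too small when b > 0 and a < INT_MIN + b.
       Neither INT_MAX + b (with b < 0) nor INT_MIN + b (with b > 0)
       can itself overflow, so the test is well defined. */
    if ((b < 0 && a > INT_MAX + b) || (b > 0 && a < INT_MIN + b))
        return false;

    *result = a - b;   /* safe: the result is representable */
    return true;
}
```

A caller might write if (checked_sub(a, b, &c) && c != 0) { /* do something */ }. The extra comparisons run on every subtraction, whether or not overflow ever occurs, which is the kind of overhead that principle (1) was meant to keep out of the language.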

As a result, the C reference manual had already discussed the notion of undefined behavior, and had said that if a program behaves in an undefined way, then the user has no legitimate expectations at all about the program's behavior. Because of this policy, the compiler's author decided that the compiler's behavior in this case was correct, even though it led to the seemingly nonsensical result of c being zero even though a comparison had just established that c was not zero.
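A programmer who wants the test to mean what it appears to mean can avoid the undefined case altogether. The sketch below is one possible approach, not something the compiler's author proposed: it performs the subtraction in unsigned arithmetic, where C defines the result to wrap around modulo one more than the largest unsigned value, so no overflow can occur. The function name difference_is_nonzero is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper, for illustration only: computes the difference
   of a and b in unsigned arithmetic, stores it in *diff, and reports
   whether it is nonzero. Converting an int to unsigned int and
   subtracting unsigned values are both fully defined in C. */
bool difference_is_nonzero(int a, int b, unsigned int *diff)
{
    *diff = (unsigned int)a - (unsigned int)b;  /* wraps, never overflows */
    return *diff != 0;                          /* true exactly when a != b */
}
```

With this version, the stored value and the outcome of the test cannot disagree, because the comparison examines the very value that was stored. The price is that the difference is an unsigned quantity; converting it back to int when it is out of range is a separate matter, and this sketch does not attempt it.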

This example shows how the tension between theory and practice can yield surprising behavior. Such surprises often come from two circumstances that happen to occur at once:

  • The desire to broaden the opportunities for implementation gives latitude to implementors in how their products behave.
  • This latitude results in an individual implementation behaving in ways that are sometimes surprising, or even hazardous.

In effect, the desire to define a compiler's behavior in theoretically clean ways conflicts with the practical — i.e., the social — aspects of how people use compilers.

We shall look at another example of this tension between theory and practice next week.
