
Andrew Koenig

Dr. Dobb's Bloggers

Undefined Behavior Versus Teaching

August 22, 2012

Last week, I asked readers to find the bug in the following code example:

const char* hello = "Hello"; 
const char* world = "world"; 
char* helloworld = new char[strlen(hello) + strlen(world) + 2];
strcpy(helloworld, hello);
strcat(helloworld, ", ");
strcat(helloworld, world);

One reader posted a comment that identified the bug: The 2 in the third line should be 3, in order to account for the comma (in the quotes on the fifth line), the space after the comma, and the null character that terminates C-style strings. There is actually a second bug, which went unmentioned: The memory to which helloworld points was allocated with a new expression, so at some point it must be freed with an appropriate delete[] expression.
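A minimal sketch of the code with both fixes applied might look like the following. It assumes the code is wrapped in a hypothetical function named make_greeting, which is not part of the original example; the function exists only so that the ownership of the allocated memory is explicit.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from the original example): builds
// "<hello>, <world>" in memory obtained from new[].
// The caller owns the result and must release it with delete[].
char* make_greeting(const char* hello, const char* world) {
    // + 3: one char for ',', one for ' ', one for the terminating '\0'.
    const std::size_t size = std::strlen(hello) + std::strlen(world) + 3;
    char* helloworld = new char[size];
    std::strcpy(helloworld, hello);
    std::strcat(helloworld, ", ");
    std::strcat(helloworld, world);
    return helloworld;
}
```

A caller would write something like `char* s = make_greeting("Hello", "world");`, use the string, and then execute `delete[] s;` to release it. Forgetting that last step reintroduces the second bug.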

A second reader made the following comment:

I can see advantages and disadvantages to starting instruction at the high-level and starting at the low-level. Part of it depends on your application area. 


If you're a newly minted college grad with some C# or Java experience, you get to your first job and learn that you'll be doing a Linux driver for a new device the company is building. In such a case, though you want to make use of some C++ facilities, it doesn't necessarily move you along faster to start with std::string and std::vector because you need to know more about pointers and raw arrays in order to interface with the OS and the device.

This is an interesting comment, and one about which I think reasonable people can disagree. There is no question that people who intend to write device drivers need to understand low-level concepts thoroughly. However, I think there is definitely a question about the best way to reach that state of affairs.

I believe that even if the eventual goal is to show programmers how to write low-level code, it is still better to start at the top and work down. The trouble with code examples such as the one above is that all too often, they will appear to work — thereby giving the programmer the incorrect impression that the erroneous code is actually correct. Such misimpressions can take surprisingly long to unlearn.
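For comparison, here is a sketch of the same task as the example at the start of this article, written from the top down with std::string. The function name make_greeting_string is a hypothetical one chosen for illustration.

```cpp
#include <string>

// Hypothetical helper (name chosen for illustration): the same
// concatenation as the opening example, expressed with std::string.
std::string make_greeting_string(const std::string& hello,
                                 const std::string& world) {
    // std::string manages its own storage, so there is no buffer size
    // to compute by hand and no delete[] to remember afterward.
    return hello + ", " + world;
}
```

In this version there is no size to miscount and no memory to free, so neither of the two bugs in the original can occur.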

When a program performs an operation, the language in which it is written can treat that operation in one of three ways:

  1. The operation is well defined, so the program behaves according to the definition.
  2. The operation is defined to be invalid, so the program reports the error.
  3. The operation is not defined, so the program does something that may or may not be related to what the programmer intended.
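One way to see the three cases side by side in C++ is element access on a std::vector. The sketch below is an illustration under my own choice of example; the helper names checked_get and at_reports_error are hypothetical. Case (3) appears only in a comment, because executing it would prove nothing.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Case 1: well defined. When index < v.size(), at() returns the element.
int checked_get(const std::vector<int>& v, std::size_t index) {
    return v.at(index);
}

// Case 2: defined to be invalid. When index >= v.size(), at() is
// specified to throw std::out_of_range, so the error is reported.
bool at_reports_error(const std::vector<int>& v, std::size_t index) {
    try {
        (void)v.at(index);
        return false;  // index was in range; nothing to report
    } catch (const std::out_of_range&) {
        return true;   // the library detected and reported the error
    }
}

// Case 3: not defined. Writing v[index] with index >= v.size() has
// undefined behavior: it might return garbage, crash, or appear to work.
// It is deliberately not executed here.
```

With a three-element vector, `checked_get(v, 1)` returns the second element and `at_reports_error(v, 3)` returns true. Nothing useful can be said about what `v[3]` would do.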

My experience is that not only is case (3) a rich source of bugs that are hard to trace, but it is also a serious impediment to learning. If a student writes X and the machine does Y, the student will typically conclude, "If I write X, the machine will do Y." It is only after much experience that students learn that the right conclusion might actually be, "If I write X, the machine will do something, which might or might not happen to be Y; I cannot draw any conclusions from what it happens to do."

This experience argues — at least to me — that in teaching programming, it is important to avoid situations in which students might be tempted to write undefined code that happens to appear to work. The example above is typical of such code: A student who writes it and gets what appears to be correct results will have no reason to suspect that anything is wrong.

In other words, a programmer who does something undefined does not always realize what has happened. In the example above, the final strcat writes the terminating null character one byte past the end of the allocated array. It is entirely possible that the memory that is improperly overwritten happens not to be used for any other purpose, so the program will appear to work. How can a programmer who writes such code come to realize that it is a mistake?

Even worse: It is not always possible to test reliably for undefined behavior. The widely held view that a program is correct if it passes all of its test cases simply does not apply when the program invokes undefined behavior.

Because of such problems, I believe, and I think my experience bears this out, that it is important to avoid undefined behavior wherever possible when teaching beginners.
