Undefined Behavior Versus Teaching
Last week, I asked readers to find the bug in the following code example:
    const char* hello = "Hello";
    const char* world = "world";
    char* helloworld = new char[strlen(hello) + strlen(world) + 2];
    strcpy(helloworld, hello);
    strcat(helloworld, ", ");
    strcat(helloworld, world);
One reader posted a comment that identified the bug: The 2 in the third line should be 3, in order to account for the comma (in the quotes on the fifth line), the space after the comma, and the null character that terminates C-style strings. There is actually a second bug, which went unmentioned: The memory to which helloworld points was allocated with a new expression, so at some point it must be freed with an appropriate delete[] expression.
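For reference, here is one way the example might look with both bugs fixed. It is a minimal sketch rather than the only possible repair: the function name make_greeting and the local variable size are my own additions for illustration, and the caller is assumed to take ownership of the returned array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: builds "Hello, world" in a heap-allocated array.
// The caller owns the result and must release it with delete[].
char* make_greeting() {
    const char* hello = "Hello";
    const char* world = "world";

    // +3: one char for ',', one for ' ', and one for the terminating '\0'.
    const std::size_t size = std::strlen(hello) + std::strlen(world) + 3;
    char* helloworld = new char[size];

    std::strcpy(helloworld, hello);
    std::strcat(helloworld, ", ");
    std::strcat(helloworld, world);

    // The string plus its terminator now fills the array exactly.
    assert(std::strlen(helloworld) + 1 == size);
    return helloworld;
}
```

A caller would write something like char* s = make_greeting(); and later delete[] s; to fix the second bug. Note that delete[] is required here, not plain delete, because the memory came from an array new expression.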
A second reader made the following comment:
I can see advantages and disadvantages to starting instruction at the high-level and starting at the low-level. Part of it depends on your application area.
If you're a newly minted college grad with some C# or Java experience, you get to your first job and learn that you'll be doing a Linux driver for a new device the company is building. In such a case, though you want to make use of some C++ facilities, it doesn't necessarily move you along faster to start with std::vector because you need to know more about pointers and raw arrays in order to interface with the OS and the device.
This is an interesting comment, and one about which I think reasonable people can disagree. There is no question that people who intend to write device drivers need to understand low-level concepts thoroughly. However, I think there is definitely a question about the best way to reach that state of affairs.
I believe that even if the eventual goal is to show programmers how to write low-level code, it is still better to start at the top and work down. The trouble with code examples such as the one above is that all too often, they will appear to work — thereby giving the programmer the incorrect impression that the erroneous code is actually correct. Such misimpressions can take surprisingly long to unlearn.
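To make the contrast concrete, here is a sketch of the same concatenation written "at the top," using std::string. The function name make_greeting_string is hypothetical, and the assert is there only to show that the length comes out right without anyone having to count characters.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the same greeting built with std::string.
std::string make_greeting_string() {
    const std::string hello = "Hello";
    const std::string world = "world";

    // operator+ sizes the result itself, so there is no length to
    // miscount and no delete[] to forget.
    std::string greeting = hello + ", " + world;

    // 5 + 2 + 5 characters; the terminator is std::string's concern.
    assert(greeting.size() == hello.size() + 2 + world.size());
    return greeting;
}
```

Neither of the bugs in the original example can be written in this version, because the operations that invited them (computing a buffer size by hand and freeing the buffer later) no longer appear in the program.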
When a program performs an operation, the language in which it is written can treat that operation in one of three ways:
1. The operation is well defined, so the program behaves according to the definition.
2. The operation is defined to be invalid, so the program reports the error.
3. The operation is not defined, so the program does something that may or may not be related to what the programmer intended.
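The three cases can be illustrated with std::vector. This is a sketch of my own choosing, not the only possible illustration; the function names are hypothetical. The undefined case appears only as a comment, because by its nature there is no reliable outcome to demonstrate.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Well defined: size() on any valid vector returns its element count.
std::size_t element_count(const std::vector<int>& v) {
    return v.size();
}

// Defined to be invalid: at() checks the index and throws
// std::out_of_range, so a bad index is reported rather than ignored.
bool at_reports_bad_index(const std::vector<int>& v, std::size_t i) {
    try {
        (void)v.at(i);
        return false;   // the index was valid
    } catch (const std::out_of_range&) {
        return true;    // the index was rejected
    }
}

// Not defined: v[i] with i >= v.size() is undefined behavior. It may
// crash, yield garbage, or appear to work, so it is not executed here:
//     int x = v[v.size()];

void run_three_cases() {
    const std::vector<int> v{10, 20, 30};
    assert(element_count(v) == 3);
    assert(!at_reports_bad_index(v, 2));  // last valid index
    assert(at_reports_bad_index(v, 3));   // one past the end
}
```

The difference between at() and operator[] is exactly the difference between the second and third cases: the same mistake is reported by one and silently undefined with the other.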
My experience is that not only is case (3) a rich source of bugs that are hard to trace, but it is also a serious impediment to learning. If a student writes X and the machine does Y, the student will typically conclude, "If I write X, the machine will do Y." It is only after much experience that students learn that the right conclusion might actually be, "If I write X, the machine will do something, which might or might not happen to be Y; I cannot draw any conclusions from what it happens to do."
This experience argues, at least to me, that in teaching programming, it is important to avoid situations in which students might be tempted to write undefined code that happens to appear to work. The example above is typical of such code: A student who writes it and gets what appear to be correct results will have no reason to suspect that anything is wrong.
In other words, a programmer who does something undefined does not always realize what has happened. In the strcpy and strcat example above, it is entirely possible that the memory that is improperly overwritten happens not to be used for any other purpose, so the program will appear to work. How can a programmer who writes such code come to realize that it is a mistake?
Even worse: It is not always possible to test reliably for undefined behavior. The widely held view that a program is correct if it passes all of its test cases simply does not apply when the program invokes undefined behavior.
Because of such problems, I believe, and I think my experience bears this out, that it is important to avoid undefined behavior wherever possible when teaching beginners.