More Thoughts on Undefined Behavior
I first started thinking about undefined behavior as an important issue in the 1970s, when I read Edsger Dijkstra's then recently published book A Discipline of Programming. Among other ideas, this book advanced a notion of what a program should be that was quite radical at the time, and has become much more mainstream — though far from universal — today.
Dijkstra argued that instead of thinking of a program as a set of instructions for a computer, we should instead think of a computer as a device that is intended to execute our programs. This distinction may seem like splitting hairs, but it becomes an important part of deciding how we think about software errors. If a program's purpose is to instruct a computer, then it is the programmer's responsibility to issue only valid instructions. If a computer's purpose is to execute our programs, then it is the computer's responsibility to detect and refuse to follow invalid instructions.
In the early days of computing, the first of these viewpoints was overwhelmingly the most common. Even trivial coding errors would often result in the entire machine stopping instantly, with no easy way to determine where or why it had done so. A few programming languages, such as BASIC and APL, tried to report all runtime errors in their programs; but as a result of this checking, many people considered them too slow for serious applications. As computers became faster, programmers became more willing to accept the idea that some of that speed should go into checking that programs contained only well-defined operations. Some languages, such as PL/I, offered a choice of implementations, so that programmers could choose how aggressive the implementation would be about runtime checking.
One language whose designers made a particularly interesting choice in this regard was Modula-3. As its name suggests, a Modula-3 program consists of modules. Each module is either safe or unsafe. Programmers choose modules' designations, with safe being the default. The language also divides errors into checked and unchecked. All compile-time errors are checked; an unchecked runtime error is what a C or C++ programmer would call undefined behavior. A program that encounters a checked runtime error will always report that error — at least if the language is implemented correctly.
What interests me particularly about Modula-3 is that it allows unchecked errors to occur only in unsafe modules. Therefore, unless you designate a module as unsafe, the implementation is required to generate code that checks for all possible runtime errors. This requirement becomes tricky in the case of pointers — which Modula3 has — for the same reason as in C++, namely that deleting the object to which a pointer points invalidates all pointers to that object. Modula-3 solves that problem in a radical way: It says that its equivalent of the C++
delete expression is permitted only in unsafe modules. Of course this policy would lead to memory leaks were it not that Modula-3 requires its implementations to include a garbage collector.
In other words, the Modula-3 designers concluded that garbage collection is necessary for runtime safety as follows:
- A computer is a device that is intended to execute a program.
- Therefore, a computer that is unable to execute part of a program should report its inability to do so, rather than just stop working.
- The ability to make such reports requires that all operations be well defined.
- Explicitly freeing dynamically allocated memory creates a whole category of undefined operations.
- Therefore, programs should not be permitted to free dynamic memory explicitly.
- Therefore, a garbage collector is necessary.
However, garbage collection has some disadvantages of its own:
- If memory is freed automatically, the programmer loses at least some control over programs' resource usage.
- A less than perfectly designed garbage collector might introduce delays at inconvenient times. This problem is particularly important in systems with real-time constraints.
C++ smart pointers offer the possibility of a similar level of dynamic-memory safety as garbage collection, but with a different set of performance tradeoffs. Their very existence tells me that despite the more than 35 years since Dijkstra published his book, programmers are still trying to resolve the tension between safety and performance.