It seems like every week or so, we are reading about some new software disaster (often described as a "glitch" in the press) caused by a bug in a program. Recently publicized incidents include recalls of cars due to a significant error in the control software, and shortly before that, the security hole in many Apple operating systems. The "glitch du jour" is a little more spectacular: the Heartbleed bug has caused a security hole in literally tens of millions of devices from dozens of manufacturers. This is a particularly disturbing defect because there is no way to tell if some malevolent intruder has taken advantage of it.
These and other similar bugs have been the subject of extended discussions in the technical software community, with a particular focus on how they might be prevented in future. We will consider one narrow question in this article: Can the use of a more appropriate programming language prevent or at least reduce the chance of such errors?
The specific bug behind Heartbleed is a typical mistake in C code: the failure to perform a bounds check. C has no built-in notion of bounds checking, so unless the programmer is careful to insert explicit checking code, the result of an out-of-range access is undefined. Most programming languages (since the introduction of Fortran) have had some notion of "undefined," meaning that the language simply doesn't say what happens. There was an entertaining discussion of this subject in the Algol 68 community back in the 1960s. Gerhard Goos asked what "undefined" meant, and Charles Lindsey replied that "it could be anything, perhaps unimaginable chaos," to which Goos asked, "and how do I implement unimaginable chaos in my compiler?" Typically, the answer is that compilers steam ahead assuming that no undefined situation ever arises. If that assumption is wrong, then the generated code can indeed result in arbitrary chaos. In the case at hand, the arbitrary chaos came to be known as the "Heartbleed" bug, allowing confidential memory contents to be read by unauthorized users.
Built-in Bounds Checking
Could the use of some other programming language have helped? Definitely yes. In many languages, out-of-bounds references simply cannot happen: References are checked at compile time or, if necessary, with runtime code. The program is not allowed to proceed if an out-of-bounds situation is detected. Interestingly, such languages have been around for decades. Pascal introduced a notion of checked subtypes in the early 1970s and, a decade later, Ada extended this concept to cover many other runtime checks. Ada also included the notion of user-defined exceptions to handle runtime errors resulting from the violation of these and other checks. (Readers familiar with programming language history may know that both these notions have been around even longer.)
If the OpenSSL code had been written in Ada, for example, using the normal default mode with the compiler generating bounds checks, then there would have been no security vulnerability. Instead, an intruder could at worst have caused the code to terminate, requiring perhaps a restart of the affected process, but that's much better than allowing an undetected system intrusion.
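To make this concrete, here is a minimal sketch, not the actual OpenSSL code, of what Ada's default checking buys you. The buffer size and the attacker-supplied length are invented for illustration; the point is that the out-of-range access raises Constraint_Error rather than silently reading adjacent memory:

```ada
--  A minimal sketch (not OpenSSL itself): with Ada's default checks,
--  an out-of-range index raises Constraint_Error instead of silently
--  reading adjacent memory, and the error can be handled.
with Ada.Text_IO; use Ada.Text_IO;

procedure Checked_Read is
   Buffer : array (1 .. 16) of Character := (others => 'x');

   --  Requested_Length plays the role of the attacker-supplied length
   --  field in the Heartbeat message; 64 exceeds the real payload.
   Requested_Length : constant Integer := 64;
begin
   for I in 1 .. Requested_Length loop
      Put (Buffer (I));  --  raises Constraint_Error when I = 17
   end loop;
exception
   when Constraint_Error =>
      Put_Line ("out-of-bounds access detected; request rejected");
end Checked_Read;
```

The worst an intruder can do here is trigger the exception handler; no memory beyond Buffer is ever exposed.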
Going Beyond Ada's Checks
It has now been more than 30 years since the first Ada compiler was validated, so we can reasonably ask whether we can do even better in designing languages that help avoid errors in programs. There are two weaknesses in the Ada approach. First, Ada's predefined exceptions cover only a subset of the things that can go wrong. True, we can avoid integer overflow and buffer overrun errors, two of the most common sources of vulnerabilities in C programs, but that's hardly the whole world of possible errors. Suppose, for example, we decide to encode a month and day as a single integer value, so that 621 means June 21. In Ada we could write:
type MonthDay is range 101 .. 1231;
This range would catch some invalid values such as 1301, but it would not detect in-between values such as 1095. In an article in Dr. Dobb's on the latest version of Ada, Ada 2012, we discussed the use of contracts, including predicates, preconditions, postconditions, and other kinds of assertions, to extend the kinds of conditions that will be checked. As an example of a predicate in Ada 2012, we can write:
type MonthDay is new Integer with
   Static_Predicate => MonthDay in 101 .. 131 | 201 .. 229 | ... | 1201 .. 1231;
Now, an attempt to assign a value of 1095 to a variable of this type will cause a runtime exception. The use of preconditions and postconditions for subprograms allows arbitrary conditions to be specified that are required to be met on the call to a subprogram and, respectively, guaranteed to be true on the return.
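As an illustration, here is a hypothetical subprogram declaration (the procedure Advance and its contract are invented for this sketch; only the MonthDay type comes from the text) showing what Ada 2012 pre- and postconditions look like:

```ada
--  Hypothetical example of Ada 2012 contracts on a subprogram.
--  Pre must hold at each call; Post is checked on return.
--  D'Old in the postcondition refers to D's value at the call.
procedure Advance (D : in out MonthDay) with
   Pre  => D < 1231,    --  cannot advance past December 31
   Post => D > D'Old;   --  result is strictly later than the input
```

Each call site is checked against the precondition, and every return path is checked against the postcondition, so a violation surfaces as an exception at the point of the contract rather than as corruption somewhere downstream.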
That's a big step forward, and indeed the contracts feature makes Ada 2012 a major advance over previous versions of the language. But it's not enough, and that brings us to the second issue. Although the use of Ada could have reduced the seriousness of Heartbleed, it's really not satisfactory to let intruders cause your program to abort. It's one thing if it means you have to reboot your server, but it's quite another if the system in question is the critical software controlling a car or a plane. In such situations, it doesn't do much good to detect the problem at runtime. If you are driving along at 70 miles an hour, and you suddenly lose steering control, you are hardly going to be satisfied by seeing a message on your console saying that an out-of-bounds index has been detected.
So how can we do better still? The obvious point is that we want to know in advance, as we build and compile the program, that it is impossible for such errors to occur at runtime. Looking at the declaration above, we don't want to think of it as "please check at runtime that any assignment to a MonthDay variable meets this predicate." Instead, we want to think of it as meaning "please guarantee that the program cannot possibly attempt an assignment of an invalid value." Our focus shifts from checking for errors at runtime to proving at compile time that no errors can occur.