It's no secret that most languages have adopted the two-tier type system introduced by their early forebears. That model consists of a series of primitives, such as characters, integers, floating-point values, and an odd fellow the boolean which I'll return to in a moment. A higher tier, referred to in some languages as "compound data types," comprises entities that are containers for multiple instances of primitives. General-purpose languages generally include among these, familiar constructs such as arrays and structures, but rarely much else. It's probably best saved for a separate editorial, but I've often felt that the most interesting languages are the ones that define more-complex compound types and integrate them deeply into the language. Lists in Lisp and Haskell, or tables in Lua, come to mind. As do first-class functions in many languages. There's no doubt that integrated, compound data types allow for all kinds of interesting possibilities.
White PapersMore >>
- How to Mitigate Fraud & Cyber Threats with Big Data and Analytics
- Advanced Threat Protection For Dummies ebook and Using Big Data Security Analytics to Identify Advanced Threats Webcast
Primitive types are not without interest, however. Consider such apparently trivial entities as characters. In C, they were long interchangeable with bytes until the rest of the world thought that a Latin character set was a tad limiting and US-centric. And for reasons I'll never fathom, them foreigners did not cotton to using trigraphs for more complex characters. As a result, characters suddenly became wide and have continued widening with the zeitgeist until you have languages such as the new system language from Mozilla, called Rust, which defaults to 32-bit characters.
While character widths have been evolving with the needs of the larger community, booleans have remained stable, if homunculus-like, creatures the neglected stepchildren of language designers, seemingly constrained to blink true or false forever with no foreseeable evolution. Let me show you how bad they have it. Check a few languages, and you'll see the pattern of disrespect. In Java, the boolean data type can have only two values:
false. You can test for a value being
== true or
== false. However, true and false are not reserved words in Java. It's not like Java had too many reserved words (despite reserving two keywords that are not used). It's just that the literals for the boolean values were not elected to the fraternity. In all other aspects but the terminology, they are reserved words: You cannot create a variable named
false in Java.
Things are hardly better in native languages. C did not have a boolean type until C99, some 25 years after the language was first published. And even, then booleans were syntactically introduced with this monster syntax:
_Bool. Apparently, sensing that the syntax might be a trifle ponderous, the committee provided a header file, stdbool.h, that aliased the syntax to bool. I believe it's the only language primitive that requires its very own header file to assuage a peculiar syntactical choice. Pressing on, I should note booleans in C and C++ are not binary settings as they are in Java. Rather, they are integer values whose boolean status is determined by whether the integer equals 0. I leave it to others to work out the conceptual basis for having numbers be true or false. Only auditors could find that concept intuitive.
While I've clearly been carrying on with tongue firmly in cheek, I do believe that the purely binary implementation of boolean data types should be revised for motives of software engineering. There should be a third state, termed
uninitialized. An uninitialized state should be the default state for any boolean item until it's initialized. Thereafter, it could be true or false only. Any attempt to evaluate the true/false status of a boolean in an uninitialized state should result in an exception. This is similar to my longstanding wish that floating-point numbers default to NaN (not a number) when they are defined. Operations on NaN values throw exceptions or cause other conspicuous problems, meaning that they're much better from a software engineering point of view than the traditional defaults (generally 0.0), which can fail silently.
The idea of a three-way boolean did not originate with me. In fact, there is just such an animal in the Boost libraries, called a tribool. In that implementation, the third state is termed indeterminate and it's a valid state, on the same footing as true and false. To my eye, that's not a terribly useful implementation because it subverts the boolean entirely. Moreover, it provides no special security, as the default implementation is false, not indeterminate. Fortunately, writing an implementation for the boolean I describe is not difficult.
Native languages during the last 12 months have seen a lot of new growth. C and C++ both have new language standards, and bare-metal competitors such as Mozilla's Rust and Google's Go are emerging from their development labs. All these advances share a common concern for making the languages safer for use. Unfortunately, this concern reaches to only one data type the pointer. Specifically, the concern about null pointers. Eventually, we can hope that the language will provide better safety through spectacular blow-ups of uninitialized data items. When it does so, I expect three-way booleans will be part of the solution.