Practical Advantages of Immutable Values
For the past few weeks I've been showing examples of how to program in a hypothetical language that prohibits the value of a variable from changing once the variable has been created. This prohibition, which has roots in how mathematicians describe their art, has wide-ranging implications for how we program in such a language. The examples so far have shown some of those implications:
- There is no practical difference between two variables having the same value and one being a copy of the other.
- Iteration must usually be expressed as (tail) recursion.
- Arrays, by their nature, rely on the ability to change individual elements; so programs tend to use tree- or list-like data structures instead.
These implications are all mixed blessings — a fact that suggests asking why bother at all? That is, why would anyone go to the trouble of designing a language that forbids variables from varying? It is true that a language such as Lisp, which uses lists as its primary data structure, allows us to write some programs that are (at least arguably) easier to explain than their more familiar counterparts. Nevertheless, a language that uses such data structures could still permit elements of those data structures to vary — and, indeed, the versions of Lisp available today all allow mutable data. So what, really, is the advantage or prohibiting mutable data?
One advantage is that prohibiting mutable data eliminates the whole notion of side effects. So, for example, if you write an expression such as f(x)+g(y)
, you do not have to worry about whether f
might change a value that g
uses. Such a change might cause the program to work differently on one implementation or another, depending on whether the compiler decides to call f
before g
or vice versa. Prohibit mutable data and this entire problem vanishes.
Similarly, you never need to worry about whether a function might change your data. Many programmers have probably had the experience of writing an expression such as lookup(data, x)
, not realizing that this particular lookup
function actually inserts a copy of x
into data
if it doesn't find x
there already.
However, the notion of never changing data gets particularly interesting when one applies it to a filesystem. Optical data-storage devices such as CD and DVD have long been available in versions that allow writing only once, and those devices have some interesting practical uses.
For example, a filesystem that does not support any way of changing data is particularly useful for logging errors, because it ensures that bugs in the logging software can never change a log entry that has already been made. If you think that it's paranoid to try to defend against such bugs, perhaps you should consider security flaws as well as accidental bugs.
For example, people who successfully break into systems often alter log files to remove the evidence of how they managed to penetrate those systems, and sometimes even remove any evidence that they had been there at all in the first place. If a system maintains its log files on a device that is physically incapable of changing data once created, the possibility goes away of events being covered up after they have happened.
Filesystems on write-once devices have other advantages as well. For example, it is always possible, at least in principle, to recover earlier states of such filesystems. After all, if the hardware does not allow any data to be changed, then all of the data associated with earlier states is still there. In practical terms, such a design might mean "infinite undo" everywhere. In other words, if you're writing a paper and decide you don't really like the subject, you can throw it away and start something else instead, secure in the knowledge that if you change your mind, you can still retrieve it later.
Such a state of affairs has profound implications with both engineering and social aspects. As an example of engineering aspects, look again at that f(x)+g(y)
expression. As we've written it, a compiler in an immutable-data language can see that the subexpressions f(x)
and g(y)
are independent, so, for example, it can execute those expressions in parallel. If we wanted to make one of those expressions depend on the other, we would have had to write something such as g(f(x), y)
, so that the computation in g
could make use of the result of f
. We would also have had to define both f
and g
so as to make explicit the data that f
passes to g
. In other words, if we are really serious about immutable variables, we have to change many aspects of how we program in order to cater to the relevant restrictions.
As an example of social aspects, think about recovering what you discarded after having second thoughts. That's useful if you started working on a paper, changed your mind, and then changed it back. However, what about that angry email that you started writing late at night and — fortunately — had the foresight not to send? How does a filesystem that never lets you delete anything permanently affect how people use such systems? That's a topic for another day.