# When Is An Optimization More Than Just An Optimization?

While I've been discussing immutable values these past few weeks, I noticed an article about Haskell. Haskell is a language that carries the notion of immutable data to extremes, which makes it worth mentioning in the context of the current discussion.

One of the more interesting aspects of programming with immutable data is that if your data are truly immutable, then there is never any need to call a function more than once with the same arguments. All you have to do is remember what the result was the last time you called that function with the same arguments, and you can avoid repeating that computation. This optimization is sometimes called *memoization*, spelled (and pronounced) without an r.

For example, consider a function that computes factorials:

```cpp
unsigned fact(unsigned n)
{
    if (n == 0)
        return 1;
    return n * fact(n-1);
}
```

We have seen in a previous article that we can rewrite this code to change its recursion into a tail recursion, which the compiler can then rewrite as iteration:

```cpp
unsigned fact_aux(unsigned f, unsigned n)
{
    if (n == 0)
        return f;
    return fact_aux(n * f, n - 1);
}

unsigned fact(unsigned n)
{
    return fact_aux(1, n);
}
```

Because `fact_aux` calls itself as the very last thing it does, a smart compiler will avoid the recursive call by rewriting that call into a jump to the beginning of the function's code.
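To make that transformation concrete, here is a sketch of the loop a compiler can derive from `fact_aux` (my illustration; the article doesn't show the compiler's output):

```cpp
#include <cassert>

// A sketch of the iteration a compiler can produce from fact_aux:
// the tail call becomes a jump back to the top, and the parameters
// f and n become variables that are updated before the jump.
unsigned fact_iter(unsigned n) {
    unsigned f = 1;        // corresponds to fact_aux's f parameter
    while (n != 0) {       // "if (n == 0) return f", inverted
        f = n * f;         // the arguments of the tail call...
        n = n - 1;         // ...become assignments before the jump
    }
    return f;
}
```

No call stack grows here, which is exactly why tail recursion matters.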

If we memoize our `fact` function, it doesn't gain anything the first time we call it. However, suppose we call `fact(5)`. When that computation has completed, we will have stored not only the value of `fact(5)` but also the values of `fact(4)`, `fact(3)`, and so on. A subsequent call of `fact(n)` will avoid any multiplication so long as `n < 6`. In fact, even calling `fact(7)` will avoid most of the computation, and will store `fact(6)` and `fact(7)` for future reference.
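A minimal sketch of such a memoized `fact` in C++, assuming a simple `std::map` as the cache (the article doesn't prescribe a representation), might look like this:

```cpp
#include <cassert>
#include <map>

// Memoized factorial: one call to memo_fact(5) fills the cache with
// the values of fact(0) through fact(5); any later call with n <= 5
// is a pure cache lookup and performs no multiplication.
unsigned memo_fact(unsigned n) {
    static std::map<unsigned, unsigned> cache;
    auto it = cache.find(n);
    if (it != cache.end())
        return it->second;                  // previously computed: no work
    unsigned result = (n == 0) ? 1 : n * memo_fact(n - 1);
    return cache[n] = result;               // remember for next time
}
```

Note that the recursion itself populates the cache for every smaller argument, which is what makes the later call to `memo_fact(7)` cheap.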

So far, this optimization seems like no big deal. However, consider the following function to compute the *n*th Fibonacci number:

```cpp
unsigned fib(unsigned n)
{
    if (n < 2)
        return 1;
    return fib(n-1) + fib(n-2);
}
```

For starters, note that it is not obvious how to rewrite this function to use tail recursion, because the `+` inside the `return` statement is executed *after* both recursive calls. Note too that the execution time for this function is at least proportional to its result! To see this, observe that the execution time for the `return` statement must be at least the sum of the execution times for `fib(n-1)` and `fib(n-2)` unless `n < 2`, in which case the execution time is still nonzero. This computation exactly mirrors the computation of `fib`'s value, so the execution time itself must mirror that value.

What if we memoize `fib`? It should not be hard to see that the execution time now decreases from `O(fib(n))` to `O(n)` the first time we call `fib(n)`, and to `O(1)` on all subsequent calls. This is an enormous improvement.
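The same caching idea applied to `fib` might be sketched like this, again assuming a hash-map cache of my own choosing:

```cpp
#include <cassert>
#include <unordered_map>

// Memoized Fibonacci. The first call to memo_fib(n) computes each
// value fib(0)..fib(n) exactly once, so it does O(n) work instead of
// O(fib(n)); repeated calls are single cache lookups, i.e. O(1).
unsigned memo_fib(unsigned n) {
    static std::unordered_map<unsigned, unsigned> cache;
    auto it = cache.find(n);
    if (it != cache.end())
        return it->second;
    unsigned result = (n < 2) ? 1 : memo_fib(n - 1) + memo_fib(n - 2);
    return cache[n] = result;
}
```

The source code is nearly identical to the exponential-time original; only the cache lookup and store are new, which is why memoization is so often described as a transparent optimization.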

Memoization is one kind of *lazy evaluation*, a phrase that refers to the notion of evaluating only enough of an expression to determine its value. The kind of short-circuit evaluation that C++ does on the built-in `&&`, `||`, and `?:` operators is an example of lazy evaluation.
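For instance, in the following small sketch the right-hand operand of `&&` is evaluated only when the left-hand operand is true, so the division can never fault:

```cpp
#include <cassert>

// Short-circuit evaluation: total % n is never evaluated when n == 0,
// because && stops as soon as its left operand is false.
bool safe_divides(int n, int total) {
    return n != 0 && total % n == 0;
}
```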

The Haskell article that I mentioned at the beginning of this one gives the following example:

```haskell
let a = [2, 4..100]
let b = drop 5 a
```

as an example of how to generate a list with all but the first five even numbers between 2 and 100. What the article doesn't mention is that Haskell does not actually evaluate the elements of *either* `a` or `b` until the program does something that needs the values of those elements. This use of lazy evaluation sometimes allows remarkable simplifications in how programmers express problems.

For example, consider a problem that Edsger Dijkstra attributed to Richard Hamming many years ago: Write a program to produce, in increasing order, a sequence of positive integers that have no prime factors other than 2, 3, or 5. The first such integer is 1, of course; the next two are 2 and 3. Then we have 4, which is 2 * 2, then 5, then 6. We skip 7, which has 7 as a factor, then include 8, 9, and 10. And so on.
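For contrast with the lazy solution that follows, here is one eager way to sketch this computation in C++, using Dijkstra's three-index merge (my formulation; the article gives no C++ version). Unlike the Haskell code, it must be told in advance how many elements to produce:

```cpp
#include <cassert>
#include <vector>
#include <algorithm>

// Produce the first `count` members of Hamming's sequence eagerly.
// Each index tracks the next element of h still to be multiplied
// by 2, 3, or 5 respectively. (unsigned overflows for large counts.)
std::vector<unsigned> hamming(std::size_t count) {
    std::vector<unsigned> h{1};
    std::size_t i2 = 0, i3 = 0, i5 = 0;
    while (h.size() < count) {
        unsigned next = std::min({2 * h[i2], 3 * h[i3], 5 * h[i5]});
        h.push_back(next);
        if (next == 2 * h[i2]) ++i2;   // advance every index that
        if (next == 3 * h[i3]) ++i3;   // produced next, so duplicates
        if (next == 5 * h[i5]) ++i5;   // appear only once
    }
    return h;
}
```

The bookkeeping with three indices is exactly the kind of detail that the lazy formulation lets the language runtime handle for us.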

This sequence of values is unbounded — there is no *a priori* limit on how many elements it has. Nevertheless, because Haskell is based on lazy evaluation, we can solve this problem as follows. I'm sure that if I'm mistaken about the tiny little bit of Haskell I know, someone will correct me:

```haskell
seq = 1 : merge (scale 2 seq) (merge (scale 3 seq) (scale 5 seq))
```

Here, instead of defining a recursive function, we are defining recursive data: The name `seq` refers to a sequence with a first element of 1, and remaining elements obtained by merging three sequences. Each of these sequences is obtained by taking `seq` — all of its elements — and multiplying those elements by 2, 3, or 5, respectively.

To make this description clearer, here's the definition of `scale`:

```haskell
scale n [] = []
scale n (x:xs) = (n * x) : (scale n xs)
```

Colloquially: Applying `scale` to `n` and a null sequence yields a null sequence; otherwise, we obtain the first element of the result by multiplying `n` by the first element of the input sequence and calling `scale` recursively for the rest of the elements. Finally, we can define `merge` this way:

```haskell
merge xs [] = xs
merge [] ys = ys
merge (x:xs) (y:ys) =
    if x == y then x : (merge xs ys)
    else if x < y then x : (merge xs (y:ys))
    else y : (merge (x:xs) ys)
```

This code represents the usual way to merge two sorted sequences, except that the merged sequence has only a single copy of values that appear in both input sequences.

If not for lazy evaluation, this code would never terminate. In effect, both the Haskell compiler and language must guarantee that only as much of this computation will be executed as is necessary to determine the values that the programmer wants to see. The idea of writing a computation that takes forever, and trusting the compiler to do only the necessary parts of it, is radical enough that we need to think of it as more than just an optimization. Instead, it's a radical rethinking of what one does in order to write programs.

This radical thinking comes from following the idea of immutable data toward its logical conclusion: Immutable data means that we can avoid calling functions more than once, which means that we can write infinite recursion and let the compiler figure out where to end it. Of course, relying on compilers to do too much for you can have its own problems. I'll discuss some of those next week.