Reversing an Immutable List
Last week, I described a function that figured out the length of a Lisp-like list based on the following operations:
- For any values
x
andy
,pair(x, y)
is a single value that containsx
andy
. first(pair(x, y))
isx
.second(pair(x, y))
isy
.ispair(x)
is true if and only ifx
is apair
.nil
is a special value that is not apair
, and is not equal to any other value.
This function was easy to implement recursively, and it was also easy to change the recursion into a tail recursion.
Our next problem is a little harder: Reverse one of these lists. The C++ standard-library reverse algorithm operates in place: Give it a pair of iterators, and it reverses the sequence of the elements in the range associated with those iterators. Obviously, reversing an immutable data structure in place would be a contradiction in terms; so instead we must define a reverse function that takes a list (i.e., a pair
) and returns its reversal.
How do we go about implementing this algorithm? If the list to be reversed is empty, we're done; but what if it isn't? When we create a new list, we do so by creating a new first element (or, if you prefer, a new pair
that represents the root of a binary tree). If we had variables that we could modify, we might write something along these lines:
reverse(x) { result = nil; while (x != nil) { result = pair(first(x), result); x = second(x); } return result; }
This code exploits the notion that we can append to a list only at the beginning: We successively pull elements off the beginning of x
and push them onto the beginning of result
. When we're done, result
has the elements of x
in reverse order.
However, this code uses mutable variables. If we really wish to use immutable data structures, we need to find another way to write reverse
. The key insight needed to do so is that mutable variables are useful only in the context of loops: Without loops, we could simply use a new name every time we want to create a variable. It is only when loops enter the picture that it is necessary for a name to represent one value at one point in a program's execution and another value at another point.
To change a program that uses loops and mutable variables into one that uses recursion and immutable variables, we replace each loop by a tail-recursive function. This function has a parameter that corresponds to each variable that changes in the loop. The function's result contains the variable(s) we care about after the loop is over. In the case of reverse
, the variables that change inside the loop are x
and result
; after the loop is over, we care only about result
. Therefore, we can rewrite the loop as follows:
reverse_aux(x, result) { if (x == nil) return result; return reverse_aux(second(x), pair(first(x), result)); }
Once we have reverse_aux
, we can use it to implement reverse
:
reverse(x) { return reverse_aux(x, nil); }
I've seen a number of programmers look at code written in this style and say something such as, "This looks really weird — why would anyone program this way?" Part of the answer is that weird is often just a slightly pejorative way of writing unfamiliar. There is little question that this code looks weird to people who are accustomed to seeing only iterative code. However, the same could be said of code that uses while
loops when seen by programmers who are used to writing goto
statements. To see this, try to explain to someone who has never written a while
loop how to come up with the iterative version of the reverse
function.
These remarks don't really make it easier to understand how the recursive reverse
function works, because I showed how to write it by starting from the iterative version. As a result, my explanation assumes that you already understand the iterative version, so of course the recursive one is harder to understand. Next week, I'll explain how to write reverse
without thinking iteratively, after which I'll explain how to tackle sorting.