Introduction to Programming with Lists
Last week, I described the fundamental data structure in Lisp. This data structure is similar to the C++ pair library template, except that it is dynamically typed. To recap, it entails five operations, which I have given C++like names rather than Lisp-like names:
- For any values
x
andy, pair(x, y)
is a single value that containsx
andy
. first(pair(x, y))
isx
.second(pair(x, y))
isy
.ispair(x)
is true if and only ifx
is apair
.nil
is a special value that is not apair
, and is not equal to any other value.
Because we are assuming that we are working in a world in which values never change after being created, we will also assume that each of these operations runs in O(1)
time, regardless of the complexity of their arguments. The reason for this latter assumption is that there is never a need to copy an immutable value; rather, we can use a pointer to stand in for the value to which it points. In order for these data structures to be useful, we will also assume that there are values, such as integers, that are neither pair
s nor nil
.
It should not take too much thought to realize that the data structure I have just described is a binary tree. However, much of the time, Lisp programmers don't think of the data structure that way. Instead, they restrict themselves to a particular subset of binary trees, which they call lists. We can define a list this way:
A list is either nil
or a pair
whose second
element is a list.
So, for example, pair(x, nil)
is a list regardless of the value of x
, because it is a pair
whose second element is the list nil
. Similarly, pair(x, pair(y, nil))
is a list, as is pair(x, pair(y, pair(z, nil)))
, and so on. However, pair(nil, x)
is a list only if x
is nil
. In other words, a list is a tree on which successive calls of second yield
a nil
-terminated sequence of lists.
These restricted trees appear in so many contexts that Lisp has an abbreviated notion for them: (x y z)
is an abbreviation for pair(x, pair(y, pair(z, nil)))
. If we view a list in this way, we can see that there is a natural way of talking about its length — so we can try our hand at writing a function that computes the length of a list:
int length(x) { int n = 0; while (x != nil) { x = second(x); ++n; } return n; }
This isn't quite a C++ program because of x
's dynamic typing, but you should get the idea. If x
is nil
, we return 0; otherwise, we follow the second
elements as far as we can and count how many times we did so.
Oops! This program violates our original assumption that all values are immutable. Let's try again:
int length(x) { if (x == nil) return 0; return length(second(x)) + 1; }
I find this function easier to understand than the earlier one, despite the recursion, because it is easy to verify that its results are correct. The length of nil
is obviously zero, and the length of a non-nil
list is equally obviously one more than the length of its second
part. However, this simplicity comes at a cost: The function makes a recursive call for each element of the list.
We can remove that cost by rewriting the function to use tail recursion:
int length_aux(x, n) { if (x == nil) return n; return length_aux(second(x), n + 1); } int length(x) { return length_aux(x, 0); }
Readers must decide for themselves whether this rewrite is easier to understand than the original iterative version.
This example should start to give a feel for what it's like to program with immutable lists. Next week, I'm going to show how to implement two algorithms that are part of the C++ standard library: sort
and reverse
. Between now and then, I invite you to think about how to do so.