Channels ▼

Andrew Koenig

Dr. Dobb's Bloggers

Introduction to Programming with Lists

October 09, 2013

Last week, I described the fundamental data structure in Lisp. This data structure is similar to the C++ pair library template, except that it is dynamically typed. To recap, it entails five operations, which I have given C++like names rather than Lisp-like names:

  • For any values x and y, pair(x, y) is a single value that contains x and y.
  • first(pair(x, y)) is x.
  • second(pair(x, y)) is y.
  • ispair(x) is true if and only if x is a pair.
  • nil is a special value that is not a pair, and is not equal to any other value.

Because we are assuming that we are working in a world in which values never change after being created, we will also assume that each of these operations runs in O(1) time, regardless of the complexity of their arguments. The reason for this latter assumption is that there is never a need to copy an immutable value; rather, we can use a pointer to stand in for the value to which it points. In order for these data structures to be useful, we will also assume that there are values, such as integers, that are neither pairs nor nil.

It should not take too much thought to realize that the data structure I have just described is a binary tree. However, much of the time, Lisp programmers don't think of the data structure that way. Instead, they restrict themselves to a particular subset of binary trees, which they call lists. We can define a list this way:

A list is either nil or a pair whose second element is a list.

So, for example, pair(x, nil) is a list regardless of the value of x, because it is a pair whose second element is the list nil. Similarly, pair(x, pair(y, nil)) is a list, as is pair(x, pair(y, pair(z, nil))), and so on. However, pair(nil, x) is a list only if x is nil. In other words, a list is a tree on which successive calls of second yield a nil-terminated sequence of lists.

These restricted trees appear in so many contexts that Lisp has an abbreviated notion for them: (x y z) is an abbreviation for pair(x, pair(y, pair(z, nil))). If we view a list in this way, we can see that there is a natural way of talking about its length — so we can try our hand at writing a function that computes the length of a list:

 
     int length(x) {
           int n = 0;
           while (x != nil) {
                
                x = second(x);
                ++n;
           }
           return n;
     }
 

This isn't quite a C++ program because of x's dynamic typing, but you should get the idea. If x is nil, we return 0; otherwise, we follow the second elements as far as we can and count how many times we did so.

Oops! This program violates our original assumption that all values are immutable. Let's try again:

 
     int length(x) {
           if (x == nil)
                return 0;
           return length(second(x)) + 1;
     }
 

I find this function easier to understand than the earlier one, despite the recursion, because it is easy to verify that its results are correct. The length of nil is obviously zero, and the length of a non-nil list is equally obviously one more than the length of its second part. However, this simplicity comes at a cost: The function makes a recursive call for each element of the list.

We can remove that cost by rewriting the function to use tail recursion:

 
     int length_aux(x, n) {
           if (x == nil)
                return n;
           return length_aux(second(x), n + 1);
     }
 
     int length(x) { return length_aux(x, 0); }

Readers must decide for themselves whether this rewrite is easier to understand than the original iterative version.

This example should start to give a feel for what it's like to program with immutable lists. Next week, I'm going to show how to implement two algorithms that are part of the C++ standard library: sort and reverse. Between now and then, I invite you to think about how to do so.

Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Comments:

ubm_techweb_disqus_sso_-e45861e6d5eff9bcfc725427b05bbac6
2013-10-10T19:04:05

This is starting to look more and more like template meta-programming (i.e., a "pure functional" language). I generally still find it hard to see the real practical advantages of that style, at least, in a pure form. I do enjoy the "puzzles" that have to be solved to make it work, but that's "play" not "work". And if you like pure-functional style puzzles, just take a look at the Boost.MPL (a compile-time pure-functional implementation of almost the entire standard library). And doing a lot of template meta-programming requires thinking in a "pure-functional" way, with tail-recursions and all that, so, it's an important skill, for sure.

Meta-programs are "executed" as part of an AST construction, which is immutable (or non-destructive) by nature, thus, leading to a pure functional language. And using "immutable" data structures can clearly be more efficient in certain domains, as a kind of copy-on-write on steroid, i.e., you sort of have the three levels: "copy everywhere, write freely"; "copy-on-write"; and, "never write, rarely need to copy". But this really just covers the cases where a pure-functional style is imposed, or where it is reasonable and effective to use it.

But I simply cannot get passed the amount of implied under-the-hood overhead of this programming style. And using simple lists of integers in your examples doesn't help getting past the feeling that it's an insanely inefficient way to do things. It's clearly only suitable for very high-level tasks, and even then, I'm not sure it's really easier to write or maintain than the more classical schemes such as thin resource-holding classes with move-semantics. In usual C++ code, even though it looks like a lot of copying and overwriting, it is actually just light-weight tokens being tossed around, nearly always referring to immutable resources. I guess I still think that if immutability is preferable for performance reasons, then it is easier (on the programmer) to hide away that immutability (behind move-semantics, lazy evaluations and COW), as opposed to exposing it and imposing it. Call me conservative, if you want.

And in practice, it is also hard for pure-functional style to co-exist with imperative styles, because their rules are so different. I wouldn't mind having/using a pure-functional DSEL for C++, but it would have to be very distinctive from normal C++.


Permalink
VSChawathe
2013-10-10T16:41:07

The only discomfort from tail recursion that I feel is because data structures books made me habituated to recursion without particularly emphasizing tail recursion. Such discomfort breaks bad habits I've accumulated, so I'm grateful for your insight. Thank you.


Permalink


Video