Channels ▼

Andrew Koenig

Dr. Dobb's Bloggers

A Simple, Immutable, Node-Based Data Structure

October 02, 2013

We continue the discussion of programming without mutable data by looking at what operations make sense for such data structures. For example, last week I noted that array-like data structures aren't terribly useful in a world that doesn't allow data to change, because it's hard to implement even such simple operations as appending to an array efficiently. The difficulty is that in an environment with immutable data, you can't just append a value to an array; you have to create a new array that contains the old array along with the value that you want to append.

I suggested one way of finessing around the problem: Use clever data structures so that the old and new arrays actually share storage. In other words, if x is an array-like data structure, and you execute

 
               auto y = append(x, v);

then what is really stored is only a (single) copy of y; under the hood, x refers to the same data along with an indication that the last element of y isn't part of x. I also pointed out that this technique has limitations. In particular, after you execute

 
               auto z = append(x, w);
 

it is impossible for y and z to share storage, which means that all of the elements of x must be copied.

A reader pointed out that if the value to which you are appending is an rvalue, such as in

               auto w = append(foo(), v);

then it is possible to arrange for the compiler to reuse foo()'s memory and avoid copying its elements. Like my suggestion, this one can be useful in some circumstances and has shortcomings in others. In particular, both of these suggestions have the problem that two statements that appear to behave similarly might have dramatically different execution times, as one of them (such as the call to append(x, v) above) does not need to copy the elements of x, but the other one (i.e., the call to append(x, w)) does.

This need to copy data does not come from the fact that the user of these data structures is asking to "copy" them. A crucially important characteristic of immutable data is that there is never any need to copy it because of its contents alone; the copy always has the same value as the original. The only reason to copy elements of an array-like data structure is that the elements' location in memory is itself the result of a computation. If we never change a value after we create it, and we don't care where in memory that value is located, we don't need to copy values.

To explore this idea, consider a simplified version of the C++ standard-library pair template. Without getting too hung up on the details of specific programming languages, imagine a data structure with the following properties:

  • For any values x and y, pair(x, y) is a (single) value that contains both x and y.
  • If we define z as pair(x, y), then first(z) is x and second(z) is y.

If we were talking about C or C++ here, I would probably want to say that first(z) evaluates to a copy of x or something like that. However, in the world of immutable data, this distinction is unnecessary.

As we're not talking about C or C++, let's simplify matters still further by assuming that our data structure is dynamically typed. Then all we really need in order to make our data structure useful is one more function and one value:

  • For any value x, ispair(x) is true if and only if x is a pair.
  • The special value nil is not a pair, and is not equal to any other value.

Without going into implementation details, let me observe again that because we are dealing entirely with immutable data, the execution time for pair(x, y) does not depend on the complexity of x or y.

The data structure I've just described is the core of Lisp and its derivatives, and it has remained so over the more than 50 years that the Lisp family of languages has been in existence. Next week, I'm going to discuss how a data structure of this kind can represent both programs and data, and how it can be used to solve a wide variety of programming problems more easily than you might think.

Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 


Video