Channels ▼
RSS

Design

Function Purity and Immutable Data Structure Construction


The D programming language has a slightly unusual notion of function purity — A pure function does not read or write any mutable data that is not accessible via the function's arguments. This is enforced by the compiler:

     int x;
     immutable int y;
	 
     pure int foo(int* p) {
        *p++;     // ok
        ++x;      // error
        *p = x;   // error
        *p = y;   // ok
        for (int i = 0; i < 10; i+)
            *p += i;  // ok
        return x; // error
     }

In some cases, the compiler can generate better code for pure functions, and certainly there's a benefit to improving the ability for programmers to reason about the code. This capability is well studied in the context of functional languages.

A subtler and less mentioned consequence of purity is related to the construction of objects.

D also has the notion of immutable data structures. Immutable data has the convenient property of being shareable between threads without synchronization, as well as (also) making data structures easier to reason about. In a language that offers mutable as well as immutable data, this design requires that any data reachable through an immutable reference must also be immutable. That is, " it's turtles all the way down," which is otherwise known as "transitive immutability."

That's all well and good, but how is an immutable data structure created? Consider:

     immutable int* p = new int;

The new int allocates a mutable thread-local int on the heap and returns a pointer to it. But a pointer to mutable thread-local data is not implicitly convertible to immutable (type safety would pretty much go out the window if that were allowed). It can be explicitly cast:

     immutable int* p = cast(immutable)(new int);

and that works. But casting is a blunt instrument — it overrides any safety checks by the compiler, and so the programmer must ensure that it is correct, rather than the compiler doing so. And casting is error-prone, consider:

     struct S {
       int* p;
     }
     int x;
     immutable(S)* p = cast(immutable)(new S(&x)); // S.p is set to &x

This code creates an immutable pointer to a mutable, which breaks the transitivity property of immutability. (This is why a cast to immutable is not allowed in D code marked as @safe.)

Just looking at:

     new int;

it's obvious that it is perfectly safe to convert it to immutable, as there are no other references to the created instance and no mutable references. We'd like this to be implicitly convertible to immutable without needing a cast, while not implicitly converting the unsafe construction of S. The gain of allowing such implicit conversions is not needing to resort to using unsafe and unchecked casts.

Let's start with a few observations:

  1. An element or slice of an immutable array is also immutable.
  2. null is implicitly convertible to a pointer to immutable.
  3. Values that don't contain mutable references are implicitly convertible to immutable.

Generalizing this, we can state that an expression whose components are implicitly convertible to immutable is itself implicitly convertible to immutable. In math, an expression with components can be called a function with arguments.

You can guess where this is going:

       pure int* func(int, int, int);
       ...
       immutable p = func(arg1, arg2, arg3);

If the function is pure, then its argument list comprises the entirety of the inputs to the function, and this is enforced by the compiler when it compiles the implementation of the function. If each argument is implicitly convertible to immutable, then the return value of the function is also implicitly convertible to immutable! There's no other way — the function can only produce unique newly created outputs or transformations of its inputs. Either way, the result can be frozen as immutable.

In the immortal words of my thermodynamics Professor: "Und now zat vee haf zee eqvations, vee merely turn zee crank!" So let's turn the crank and see where this inevitably leads us.

The first stop is object construction. They are constructed with (obviously) constructors.

What are constructors, really? They are functions that take two sets of arguments:

  1. The arguments to the constructor
  2. The default field initializers

And now immutable objects can be constructed without needing a so-called "immutable constructor" — all that is necessary is to have a pure constructor, constructor arguments that can be implicitly cast to immutable, and default field initializers that can be implicitly cast to immutable. For example,

     struct T {
       immutable(int)* p = null;
       this(immutable(int)* q) pure {
         p = q;
       }
     }

     void foo() {
       immutable int y;
       immutable(T) p1 = T(&y);
       immutable(T)* p2 = new T(&y);
       immutable(T)* p3 = new T(new int);
     }

struct T has a pure constructor that can be used in various ways to safely construct immutable objects. Note that memory allocation is also considered pure, and so objects can be created on the GC heap and implicitly cast to pure.

Conclusion

Arbitrarily complex immutable data structures can be constructed without needing to cast anything to immutable, that is, they are mechanically checkable by the compiler. Pure functions make this possible (along with other useful consequences of function purity), but it's not sufficient for a function to be pure — the purity must be enforceable.

Acknowledgments

Thanks to Andrei Alexandrescu for reviewing a draft of this.


Walter Bright is a contributor to Dr. Dobb's and the designer of the D programming language. He was also the main developer of the first native C++ compiler, Zortech C++ (later to become Symantec C++, now Digital Mars C++).


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video