Containers That Never Change
Last week, I introduced a programming style that never changes the value of a variable once that value has been established. In that article, I made a statement that one reader questioned:
If we try to write this program recursively using the technique we mentioned earlier, we find that each recursive call passes the entire result array as an argument. As a result, the program's runtime will be quadratic in the number of iterations — hardly a happy state of affairs.
Here's why I made this claim.
We start with the example to which I was referring last week:
vector<unsigned> collatz_vec(unsigned n) { assert(n != 0); vector<unsigned> result; while (n != 1) { result.push_back(n); if (n % 2 == 0) n /= 2; else n = 3 * n + 1; } return result; }
Next, we rewrite it as a tail-recursive function. We do so by observing that the variables that change in the loop are result
and n
, so we define an auxiliary function with those values as parameters:
vector<unsigned> collatz_aux(unsigned n, vector<unsigned> result) { if (n == 1) return result; new_result = concat(result, n); if (n % 2 == 0) return collatz_aux(n / 2, new_result); else return collatz_aux(3 * n + 1, new_result); }
We now rewrite our collatz_vec
function to call our auxiliary function:
vector<unsigned> collatz_vec(unsigned n) { assert (n != 0); return collatz_aux(n, vector<unsigned>()); }
The subexpression vector<unsigned>()
is a convenient way of constructing a vector
with no elements.
At this point, we have a problem: How do we write concat
? The obvious technique is something like this:
vector<unsigned> concat(vector<unsigned> v, unsigned n) { v.push_back(n); return v; }
This function violates the restriction we wished to impose against changing the value of a variable once such a value was established. Unfortunately, so long as we stick with vector
s, it's the best we can do. At least, by using this function, we have been able to encapsulate the idea of appending an element to a vector
to create a new vector
. However, this function copies all of the elements of v
, which means that our collatz_vec
function winds up running in quadratic time — because as result grows, all of the vector's elements must be copied on each iteration.
In short, in order to make the recursive collatz_vec
function run in a reasonable amount of time, we need some kind of vector
-like data structure that allows us to append an element in O(1) time, rather than having to copy all of the elements each time we wish to append one. Putting it more simply, the whole notion of appending rests on being able to change the data structure to which we are appending. What we want is to be able to create a new data structure that contains all of the elements of the old one, along with one additional element, and we wish to do so in O(1) time.
There are many data structures of this kind. Lisp used one of the simpler ones. Next week I'll start explaining how it works.