What Does It Mean To Change An Object?
Some of the reader comments about last week's note suggest a question of definition. For example, one of the comments eventually leads to an example that includes the following code:
template<class Item>
inline auto concat(NcArray<Item> a, Item const v) -> NcArray<Item>
{
    a.append(movable(v));
    return movable(a);
}
This code appears as part of the definition of a class named NcArray, which is intended to represent an array that is never copied, but that still supports operations, such as the concat function defined here, that appear to create a new object containing a copy of the original object's elements.
I say appears in the previous sentence because what really happens is that the result of concat contains the contents of its argument, augmented by concatenating a new element, and the call has the side effect that the argument is no longer usable. In other words, if I have an NcArray object named foo and an element named x, and I write

auto bar = concat(foo, x);

then when I'm done, the value of bar is an NcArray whose elements are all of the elements in foo followed by a copy of x, and as a side effect, I must promise that I will never again use the value of foo.
The question, then, is whether this call of concat modifies foo. We might argue that the definition of concat requires its users to write their code in ways that work properly regardless of whether concat modifies foo. If a program never uses a variable again after passing it as the first argument of concat, it meets this requirement.
With this notion in mind, let's look again at the collatz_vec function I showed two weeks ago. This function executes the statement

result.push_back(n);

repeatedly in a loop, where result is a vector<unsigned>. Using the technique that I described in that article, we could define a tail-recursive auxiliary function something like
vector<unsigned> collatz_vec_aux(unsigned n, unsigned count, vector<unsigned> result)
and then replace the recursive calls in collatz_aux by something like
return collatz_vec_aux(n / 2, count + 1, concat(result, n));
Here, concat would be a function written along the lines of the first example in this article, which would give the appearance of creating a new vector containing the elements of result along with a new element at the end, but which would really recycle the vector's previous contents instead of copying them.
This technique would work for this particular example. However, it does not meet what I think is the definition of a data structure that never changes after being created. For such a data structure, I think it is reasonable to insist that after executing
auto bar = concat(foo, x);
the value of foo must be whatever it was before, and there should be no prohibition against looking at that value.
I had thought until recently that such a data structure must be impossible if it were based on contiguous data structures such as arrays or vectors — and, in a sense, I still think it is impossible. However, one can almost get there through cheating. Here's how.
Consider the following simple data structure, each instance of which represents an initial segment of a vector:
template<class T> struct Thing {
    typename vector<T>::size_type size;
    shared_ptr<vector<T>> p;
};
We can imagine implementing a nondestructive concat
function along these lines:
template<class T> Thing<T> concat(const Thing<T>& t, T n) {
    Thing<T> result = t;
    ++result.size;
    result.p->push_back(n);
    return result;
}
This code would affect the contents of the vector associated with a Thing, but because the Thing's size member would remain unchanged, the change to the vector would be invisible. Accordingly, after

auto bar = concat(foo, x);

it would still be possible to access the value of foo.
The reason I said one could almost get there is that the following example fails:
auto bar = concat(foo, x); auto baz = concat(foo, y);
Here, we would like baz to hold the (original) elements of foo followed by y, but bar has left the underlying vector in a state that renders this strategy impossible. The solution is to make concat check whether the Thing's size matches the vector's size; if it doesn't, it is necessary to copy the vector.
This strategy allows well-behaved programs such as collatz_vec to run efficiently, while still delivering semantics — at a price — that preserve the value of concat's argument. We can see this price: Defining bar above takes O(1) time, but defining baz takes O(n) time, where n is the number of elements in foo.
I think that these examples show a fundamental limitation of array-based data structures: There is only one place where "the next element" in such a data structure can go. As a result, if you want to be able to augment a data structure in a uniformly efficient way, it seems to be necessary to use a node-based structure rather than an array-based one.