
Andrew Koenig

Dr. Dobb's Bloggers

What Does It Mean To Change An Object?

September 26, 2013

Some of the reader comments about last week's note suggest a question of definition. For example, one of the comments eventually leads to an example that includes the following code:

 
template< class Item >
inline auto concat( NcArray<Item> a, Item const v )  ->
NcArray<Item>
{
        a.append( movable( v ) );
        return movable( a );
}

This code appears as part of the definition of a class named NcArray, which is intended to represent an array that is never copied, but that still supports operations, such as the function concat defined here, that appear to create a new object containing a copy of the original object's elements.

I say appears in the previous sentence because what really happens is that the result of concat contains the contents of its argument, augmented by concatenating a new element, with the side effect that the argument is no longer usable. In other words, if I have an NcArray object named foo, and an element named x, and I write

 
               auto bar = concat(foo, x);

then when I'm done, the value of bar is an NcArray whose elements are all of the elements in foo followed by a copy of x, and as a side effect, I must promise that I will never again use the value of foo.

The question, then, is whether this call of concat modifies foo. We might argue that the definition of concat requires its users to write their code in ways that work properly regardless of whether concat modifies foo. If a program never uses a variable again after passing it as the first argument of concat, it meets this requirement.

With this notion in mind, let's look again at the collatz_vec function I showed two weeks ago. This function executes the statement

 
               result.push_back(n);

repeatedly in a loop, where result is a vector<unsigned>. Using the technique that I described in that article, we could define a tail-recursive auxiliary function something like

 
               vector<unsigned> collatz_vec_aux(unsigned n, unsigned count, vector<unsigned> result)

and then replace the recursive calls in collatz_vec_aux by something like

 
               return collatz_vec_aux(n / 2, count + 1, concat(result, n));

Here, concat would be a function written along the lines of the first example in this article, which would give the appearance of creating a new vector containing the elements of result along with a new element at the end, but which would really recycle the vector's previous contents instead of copying them.

This technique would work for this particular example. However, it does not meet what I think is the definition of a data structure that never changes after being created. For such a data structure, I think it is reasonable to insist that after executing

 
               auto bar = concat(foo, x);
 

the value of foo must be whatever it was before, and there should be no prohibition against looking at that value.

I had thought until recently that such a data structure must be impossible if it were based on contiguous data structures such as arrays or vectors — and, in a sense, I still think it is impossible. However, one can almost get there through cheating. Here's how.

Consider the following simple data structure, each instance of which represents an initial segment of a vector:

 
     template<class T> struct Thing {
           typename vector<T>::size_type size;
           shared_ptr<vector<T>> p;
     };
 

We can imagine implementing a nondestructive concat function along these lines:

 
template<class T> Thing<T> concat(const Thing<T>& t, T n)
{
      Thing<T> result = t;
      ++result.size;
      result.p->push_back(n);
      return result;
}

This code would affect the contents of the vector associated with a Thing, but because its size element would remain unchanged, the change to the vector would be invisible. Accordingly, after

 
               auto bar = concat(foo, x);

it would still be possible to access the value of foo.

The reason I said one could almost get there is that the following example fails:

 
     auto bar = concat(foo, x);
     auto baz = concat(foo, y);

Here, we would like baz to hold the (original) elements of foo followed by y, but bar has left the underlying vector in a state that renders this strategy impossible. The solution is to make concat check whether the Thing's size matches the vector's size; if it doesn't, it is necessary to copy the vector. This strategy allows well-behaved programs such as collatz_vec to run efficiently, while still delivering semantics — at a price — that preserve the value of concat's argument. We can see this price: Defining bar above takes O(1) time, but defining baz takes O(n) time, where n is the number of elements in foo.

I think that these examples show a fundamental limitation of array-based data structures: There is only one place where "the next element" in such a data structure can go. As a result, if you want to be able to augment a data structure in a uniformly efficient way, it seems to be necessary to use a node-based structure rather than an array-based one.


Comments:

2013-09-27T13:03:18

Alfps is completely right that I missed the point of his example. Having used C++ for nearly 30 years, I momentarily forgot that its semantics had changed in the last two years in a way that affected his particular example but not the point I had been trying to make. Specifically his concat function has the useful property that if you call concat(foo, x) and foo is a variable, as I wrote in my article, then foo will be copied. If, on the other hand, you give an rvalue as concat's first argument, then the compiler will realize that the memory associated with that argument is going away, and recycle the vector rather than copying it.

That's a useful optimization. What it achieves is to make it possible to write code that appends elements to an array in a way that appears superficially to be copying the array many times, while still allowing the compiler to avoid doing so. This optimization is both useful and clever.

However, although it works on the particular code examples I gave (which I picked out of a desire to make them simple enough that people would bother reading them), it did so only because they all involved appending values successively to an array. This special case does not make the general case any easier: Because arrays are generally contiguous in memory, it is not easy to arrange to append to them while still maintaining reasonable efficiency.

Please look again at my last example from the article:

auto bar = concat(foo, x);
auto baz = concat(foo, y);

The point I am trying to make, and which I think does not depend on implementation details, is that the elements of bar and baz cannot occupy the same memory if bar and baz are implemented as arrays. For if they did, the copies of x and y would have to be in the same memory location even though they have different values. As a result, at least one of the calls to concat MUST copy its argument, regardless of whether there is any optimization or coding technique that makes one of those copies unnecessary.


2013-09-27T07:59:59

This article claims that the call "concat(foo, x);", with an lvalue actual argument for a pass by value formal argument, changes the supplied actual argument.

On the surface that's just absurd.

Anyway it's incorrect, not reality.

I find equally absurd the author's comment on his previous DDJ blog article where he, reasoning by analogy, compares the purely value/expression-oriented concat() to an update operation like *=.

If the author were not a world class authority on C++ I would think that perhaps the author is not yet familiar with C++11 std::move and move constructors. For that could make it possible to confuse the effect of a C++11 move with the implementation of an operation like concat, to somehow think that concat(), or perchance the array implementation?, did some fishy moving under the hood. However, this explanation of the mistakes is almost as absurd as the mistakes themselves, so possibly the author has read what he EXPECTED in some code and comments -- instead of dealing with the actual details.


2013-09-26T17:54:18

It seems to me array-based and node-based structures could be combined to get better performance than node-only constructs. Every time the list is modified, the shared unmodified part remains in the vector and the vector in the chained node contains the concatenated values. Of course performance will drop as the complexity of the data increases, but that happens already with nodes.

