Copying Container Elements From The C++ Library: It's Trickier Than It Looks
Last week, I pointed out that putting a new element in a vector might reallocate the vector. As a result of this behavior, we must generally assume that adding elements to a vector will invalidate every reference, pointer, or iterator that refers to a vector element.
More Insights
White Papers
More >>Reports
More >>Webcasts
- Intrusion Prevention Systems: What to Look for in a Solution
- How to Transform Paper Insurance Documents into Digital Data
With this assumption in mind, let's assume that t
is a Thing
and that v
is a vector<Thing>
, and consider the function call
v.push_back(t);
Obviously, this call appends a copy of t
to v
. What is the type of push_back
's parameter?
One's first thought is apt to be that the parameter should be a Thing
. However, in that case, calling v.push_back(t)
would always end up copying t
twice: first from t
to push_back
's parameter, and then from there into v
. To avoid this double copying, push_back
is actually defined to take a reference parameter. In other words, by definition a call to push_back
does not copy the argument directly; any copying must be done explicitly within the body of push_back
.
Now consider this call:
v.push_back(v[0]);
We assume that v
already has at least one element; otherwise, the mere act of evaluating v[0]
is undefined. In this call, push_back
's parameter is a reference to v[0]
. Like any other reference to an element of v
, this reference will become invalid if v
is reallocated. This fact leads to a requirement on the implementation of push_back
:
push_back
must copy its parameter before it invalidates any references to elements of v
.
However, in order to copy its parameter, push_back
must have some place to copy it. Unless the parameter is to be copied twice, that place must be the element of v
in which the copy will eventually live. But that element does not exist until v
is reallocated! So we seemingly have a requirement that push_back
copy its parameter after reallocating v
, even though we have already seen that it must copy it before reallocating v
.
Fortunately, there is a way around the seeming contradiction between these requirements: Unlike user code, library routines such as push_back
can know about how vectors are implemented. For example, reallocating a vector probably has four phases:
- Allocate enough memory to hold the new elements of
v
. - Move the old elements to the new memory.
- Destroy the old elements.
- Free the old memory.
Unlike user code, push_back
can copy its parameter to the appropriate place in the new memory between phases 1 and 2. At that point in the code, all of v
's elements are still in their original places, which means that any references to those elements are still valid. So long as push_back
's parameter has been copied to its destination before phase 2 begins, there is no risk of invalidating that parameter before copying its contents.
You should be able to see from even this simple example that it is harder for the C++ library to manipulate vectors than might appear on the surface. This difficulty is yet one more reason to use the library instead of trying to reimplement it.
In fact, this discussion has carefully concealed yet another complexity, which nevertheless peeked out in a single word in phase 2 above. Notice that I said that push_back
must move the old elements to the new memory, rather than copying them. This distinction is important, especially where runtime efficiency is concerned. We'll begin exploring the idea of moving data next week.