Copying Container Elements From The C++ Library: It's Trickier Than It Looks
Last week, I pointed out that putting a new element in a vector might reallocate the vector. As a result of this behavior, we must generally assume that adding elements to a vector will invalidate every reference, pointer, or iterator that refers to a vector element.
- The Role of the WAN in Your Hybrid Cloud
- The People Problem: Cyber Threats Aren't Just a Technology Challenge
With this assumption in mind, let's assume that
t is a
Thing and that
v is a
vector<Thing>, and consider the function call
Obviously, this call appends a copy of
v. What is the type of
One's first thought is apt to be that the parameter should be a
Thing. However, in that case, calling
v.push_back(t) would always end up copying
t twice: first from
push_back's parameter, and then from there into
v. To avoid this double copying,
push_back is actually defined to take a reference parameter. In other words, by definition a call to
push_back does not copy the argument directly; any copying must be done explicitly within the body of
Now consider this call:
We assume that
v already has at least one element; otherwise, the mere act of evaluating
v is undefined. In this call,
push_back's parameter is a reference to
v. Like any other reference to an element of
v, this reference will become invalid if
v is reallocated. This fact leads to a requirement on the implementation of
push_back must copy its parameter before it invalidates any references to elements of
However, in order to copy its parameter,
push_back must have some place to copy it. Unless the parameter is to be copied twice, that place must be the element of
v in which the copy will eventually live. But that element does not exist until
v is reallocated! So we seemingly have a requirement that
push_back copy its parameter after reallocating
v, even though we have already seen that it must copy it before reallocating
Fortunately, there is a way around the seeming contradiction between these requirements: Unlike user code, library routines such as
push_back can know about how vectors are implemented. For example, reallocating a vector probably has four phases:
- Allocate enough memory to hold the new elements of
- Move the old elements to the new memory.
- Destroy the old elements.
- Free the old memory.
Unlike user code,
push_back can copy its parameter to the appropriate place in the new memory between phases 1 and 2. At that point in the code, all of
v's elements are still in their original places, which means that any references to those elements are still valid. So long as
push_back's parameter has been copied to its destination before phase 2 begins, there is no risk of invalidating that parameter before copying its contents.
You should be able to see from even this simple example that it is harder for the C++ library to manipulate vectors than might appear on the surface. This difficulty is yet one more reason to use the library instead of trying to reimplement it.
In fact, this discussion has carefully concealed yet another complexity, which nevertheless peeked out in a single word in phase 2 above. Notice that I said that
push_back must move the old elements to the new memory, rather than copying them. This distinction is important, especially where runtime efficiency is concerned. We'll begin exploring the idea of moving data next week.