The Hazards of Remembering Positions in Vectors
Last week, I observed that vectors, unlike other library classes, get reallocated from time to time, and reallocating a vector invalidates all references to elements of the vector. Because vectors behave this way, it is problematic to use a pointer or iterator to remember a location in the vector. For example, if v
is a vector, and we execute
auto it = find(v.begin(), v.end(), x); v.push_back(y);
then the call to push_back
might reallocate v
. If push_back
does so, it
will no longer be valid. This behavior implies that you should usually treat pointers, references, or iterators that refer to vector elements as being strictly temporary.
Yet it is sometimes important to keep track of a location in a vector. Probably the most useful way of doing so is by using an index. If you already have an iterator, as in the example we just saw, you can bracket any operation that might reallocate the vector by statements that convert the iterator to an index and back:
auto it = find(v.begin(), v.end(), x); auto index = it – v.begin(); v.push_back(y); it = v.begin() + index;
This technique of converting an iterator to an index and back is a bit of a nuisance. Moreover, it doesn't work for lists, which means that code that uses this technique is not generic. At first glance, it may be surprising that programmers don't run into this problem very often.
One reason that it's not a larger problem is the programs that use vectors do not need to remember locations within those vectors. For example, countless textbooks and introductory articles have contained code along the following lines:
vector<Thing> v; Thing thing; while (cin >> thing) { v.push_back(thing); } sort(v.begin(), v.end()); // and so on
At no point does this code remember any specific positions within the vector. First, it appends elements using push_back
; then, it sorts the vector. It is likely that subsequent code will either write the contents of the sorted vector or use it as a subject for binary searches or other algorithms. It may well be that once the vector has been filled — once the input file has reached its end — the size of the vector will not change again throughout its lifetime.
In fact, this code avoids problems in remembering vector indices in two ways. The obvious one is that it doesn't use them. More subtly, however, it follows a pattern common to many programs that use vectors:
- First, it builds up a vector one element at a time, typically by appending to it.
- Once the vector has been built, subsequent operations do not increase its size.
If you don't increase a vector's size, you don't reallocate it — in which case, pointers, references, and iterators that refer to elements will remain valid. This fact suggests a rule of thumb:
Don't remember pointers, references, or iterators to vector elements until you're done putting new elements in the vector.
Not every program can follow this rule, but many of them can.
Next week, we'll see how this rule makes life more complicated for the parts of the standard library that manipulate vectors.