# Moving Data and Address Arithmetic

August 21, 2013

Last week, I observed that an important reason for moving data in C++ programs, rather than copying it, is to be able to implement types such as `vector` efficiently. Such types use the fact that a computer's memory addresses are numbers, and those numbers can be the results of computation. For example, the C++ standard guarantees that all of the elements of a `vector` are stored in contiguous memory locations. This property makes it possible to locate any `vector` element quickly from its index once the program knows the location and index of any other element.

If a data structure uses address arithmetic to determine the location of its data, then from time to time programs will almost surely find it necessary to put data at a particular place in memory — namely, at the address that is the result of a particular computation. If that location is already occupied, then the program must move data around so that that location is no longer occupied. For example, if a `vector` is currently stored in n contiguous chunks of memory, and the program wants to append an n+1st element to the `vector`, that element must wind up in the memory immediately following the nth element. If something else occupies that, then either that something else must move, or the program must reallocate the entire `vector`.

A program can often avoid address arithmetic by using pointers. Instead of storing data in contiguous memory, such a program typically stores the address of each piece of data along with one or more of its neighbors. It is possible to implement such a data structure without caring at all about the memory addresses at which its elements are stored. However, storing an element's address only with its neighbors often implies that there is no quick way to jump from one place in the data structure to another place that is not a neighbor. In an odd reversal of interface and implementation, data structures that store their elements consecutively make random access easier, while data structures that store their elements in random locations support only sequential access.

Programs that don't care about the addresses of their data rarely need to move those data. After all, we can view the whole point of moving an object as a way of changing the object's address without changing its contents; if we don't care what the address is, we don't need to change it. As a result, C++ library data structures such as `list` do not need to be able to move their elements once those elements have been created.

The tradeoff between moving data to appropriate addresses and using pointers to keep track of data in arbitrary locations is one of the fundamental tradeoffs that influences programmers' choice of data structures. The C++ library offers two extremes and a compromise, which, like most compromises, combines advantages and disadvantages of both of the extremes.

One extreme is `vector`. All of a `vector`'s elements are always contiguous, so accessing the nth element takes the same amount of time for all values of n. Maintaining this contiguous data structure requires reallocating all of the `vector`'s elements whenever there is no room for a new element.

The other extreme is `list`. You can think of each element of a `list` as being totally independent from all others and coupled with pointers to its immediate neighbors. As a result, the library never needs to move `list` elements once they are created, not even to append new elements to a `list`. On the other hand, locating the nth element of a list typically requires n pointer operations. Moreover, each element has at least a couple of pointers' worth of space overhead, which implies that `list`s can waste quite a bit of memory unless the `list` elements themselves are large.

The compromise data structure, `deque`, often gets glossed over in descriptions of the C++ library. It is typically implemented as a collection of chunks, which are `vector`-like data structures, together with an auxiliary data structure that makes it possible to find the right chunk quickly. A `deque` is almost surely harder to implement than either `list` or `vector`. However, from a user's viewpoint, it combines some nice properties of the two data structures.

For example, as with `list`, you can append elements to a `deque` at either end, not just at one end. Moreover, doing so does not require reallocating the `deque`'s already existing elements. As with `vector`, you can access a `deque`'s elements efficiently in random order. Moreover, `deque` does not have nearly the space overhead per element that `list` does.

Because of these properties, some writers have suggested using `deque` whenever there is no obvious reason to choose a different data structure. This advice leads naturally to the question of what some of those reasons might be.

One such reason is that like `vector` and unlike `list`, `deque` does not support fast insertion and deletion in the middle. Indeed, although `list` never requires moving elements around, `deque` will do so for insertions and deletions anywhere but at the ends of the structure. Compared with `vector`, a `deque` will typically have more overhead for `vector`-like operations such as incrementing and decrementing iterators, accessing elements, and so on. Moreover, in addition to fast insertion and deletion, `list` has specialized operations for transferring elements from one `list` object to another without moving the elements themselves. Because `deque` uses address arithmetic to support random access, it sometimes needs to move elements to other memory locations, something that `list` never needs to do.

Because moving data is so tied up with address arithmetic, programs that avoid address arithmetic can often avoid moving data. In fact, it is possible to write programs that never change the contents of memory once they have put something there in the first place. Such programs can have surprising properties: Programmers have to jump through some unusual hoops in order to write them, but there are some kinds of reliability and security problems that become much less likely. Next week, I'll continue this discussion with some examples of such programs and programming techniques.

### More Insights

 To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.