Channels ▼

Andrew Koenig

Dr. Dobb's Bloggers

Moving Data and Address Arithmetic

August 21, 2013

Last week, I observed that an important reason for moving data in C++ programs, rather than copying it, is to be able to implement types such as vector efficiently. Such types use the fact that a computer's memory addresses are numbers, and those numbers can be the results of computation. For example, the C++ standard guarantees that all of the elements of a vector are stored in contiguous memory locations. This property makes it possible to locate any vector element quickly from its index once the program knows the location and index of any other element.

If a data structure uses address arithmetic to determine the location of its data, then from time to time programs will almost surely find it necessary to put data at a particular place in memory — namely, at the address that is the result of a particular computation. If that location is already occupied, then the program must move data around so that that location is no longer occupied. For example, if a vector is currently stored in n contiguous chunks of memory, and the program wants to append an n+1st element to the vector, that element must wind up in the memory immediately following the nth element. If something else occupies that, then either that something else must move, or the program must reallocate the entire vector.

A program can often avoid address arithmetic by using pointers. Instead of storing data in contiguous memory, such a program typically stores the address of each piece of data along with one or more of its neighbors. It is possible to implement such a data structure without caring at all about the memory addresses at which its elements are stored. However, storing an element's address only with its neighbors often implies that there is no quick way to jump from one place in the data structure to another place that is not a neighbor. In an odd reversal of interface and implementation, data structures that store their elements consecutively make random access easier, while data structures that store their elements in random locations support only sequential access.

Programs that don't care about the addresses of their data rarely need to move those data. After all, we can view the whole point of moving an object as a way of changing the object's address without changing its contents; if we don't care what the address is, we don't need to change it. As a result, C++ library data structures such as list do not need to be able to move their elements once those elements have been created.

The tradeoff between moving data to appropriate addresses and using pointers to keep track of data in arbitrary locations is one of the fundamental tradeoffs that influences programmers' choice of data structures. The C++ library offers two extremes and a compromise, which, like most compromises, combines advantages and disadvantages of both of the extremes.

One extreme is vector. All of a vector's elements are always contiguous, so accessing the nth element takes the same amount of time for all values of n. Maintaining this contiguous data structure requires reallocating all of the vector's elements whenever there is no room for a new element.

The other extreme is list. You can think of each element of a list as being totally independent from all others and coupled with pointers to its immediate neighbors. As a result, the library never needs to move list elements once they are created, not even to append new elements to a list. On the other hand, locating the nth element of a list typically requires n pointer operations. Moreover, each element has at least a couple of pointers' worth of space overhead, which implies that lists can waste quite a bit of memory unless the list elements themselves are large.

The compromise data structure, deque, often gets glossed over in descriptions of the C++ library. It is typically implemented as a collection of chunks, which are vector-like data structures, together with an auxiliary data structure that makes it possible to find the right chunk quickly. A deque is almost surely harder to implement than either list or vector. However, from a user's viewpoint, it combines some nice properties of the two data structures.

For example, as with list, you can append elements to a deque at either end, not just at one end. Moreover, doing so does not require reallocating the deque's already existing elements. As with vector, you can access a deque's elements efficiently in random order. Moreover, deque does not have nearly the space overhead per element that list does.

Because of these properties, some writers have suggested using deque whenever there is no obvious reason to choose a different data structure. This advice leads naturally to the question of what some of those reasons might be.

One such reason is that like vector and unlike list, deque does not support fast insertion and deletion in the middle. Indeed, although list never requires moving elements around, deque will do so for insertions and deletions anywhere but at the ends of the structure. Compared with vector, a deque will typically have more overhead for vector-like operations such as incrementing and decrementing iterators, accessing elements, and so on. Moreover, in addition to fast insertion and deletion, list has specialized operations for transferring elements from one list object to another without moving the elements themselves. Because deque uses address arithmetic to support random access, it sometimes needs to move elements to other memory locations, something that list never needs to do.

Because moving data is so tied up with address arithmetic, programs that avoid address arithmetic can often avoid moving data. In fact, it is possible to write programs that never change the contents of memory once they have put something there in the first place. Such programs can have surprising properties: Programmers have to jump through some unusual hoops in order to write them, but there are some kinds of reliability and security problems that become much less likely. Next week, I'll continue this discussion with some examples of such programs and programming techniques.

Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Comments:

ubm_techweb_disqus_sso_-26be4e708c84c817e84abd2af4fd4ba2
2014-02-25T10:25:34

Seriously...

That's such another vague article...

"Because moving data is so tied up with address arithmetic, programs that avoid address arithmetic can often avoid moving data. In fact, it is possible to write programs that never change the contents of memory once they have put something there in the first place."

Ok so I'm using vector<string> in which I only push_back. I don't need to change the contents of memory, the vector needs to upon relocation.


Permalink


Video