Addresses and Nodes: Two Ways To Get Around
Last week, I noted that programs that avoid address arithmetic can also avoid moving data. The reason is that it is only arithmetic that really cares about where in memory an object is; in other circumstances, having a pointer to that memory is enough. Therefore, if we refer to memory only through a pointer to that memory, we don't actually care about the contents of that pointer. In particular, avoiding address arithmetic gives us a completely different way of thinking of one object as being "next to" another — instead of the two objects occupying adjacent memory locations, one of them is located in a place that is easy to reach from the other by following a pointer.
This alternative notion of "next to" is what distinguishes random-access iterators from other iterator categories in C++. Without using random-access iterators, there is no way to navigate instantly between two arbitrary places in a container. Instead, one must move from one element to the one next to it, then to the one next to that, and so on. On the other hand, avoiding address arithmetic also makes the notion of position in a sequence much more flexible.
As an example of this flexibility, consider the insert
member of C++ standard containers. If c
is a container, p
is an iterator that refers to a position in that container, and x
is a value that can stand as an element of that container, then
c.insert(p, x);
creates a new element in c
that contains a copy of x
. This element is inserted into the container ahead of the position p
. If the container uses address arithmetic to determine its elements' positions, then the newly created element must take the place of the element to which p
formerly referred (assuming that p
is not an off-the-end iterator). That element must be moved to the successor position of its old location, and all the elements after it must similarly be shifted. If, on the other hand, the container does not use address arithmetic, then the library can generally arrange for the new element to appear at the right place in sequence merely by manipulating pointers.
A container that uses pointers to locate its elements is sometimes called a node-based container. The word node normally carries the idea that the various nodes that constitute such a data structure are independent of each other. In particular, they can be inserted, deleted, and reconnected without ever moving the nodes themselves or changing their contents apart from the parts of the nodes that might be used to locate other nodes. We can think of nodes, connected by pointers, as an alternative to arrays, the elements of which are connected by the numbers that constitute their addresses.
These alternatives — nodes and arrays — go back to the dawn of programming languages. Four of the earliest programming languages were Cobol, Fortran, Algol, and Lisp. Cobol, Fortran, and Algol all used data structures based on arrays; Lisp used data structures based on nodes that were connected by pointers.
Notice that of these four early languages, three were array-based. To varying degrees, all three of them were also based on a desire to make computer hardware facilities easier for people to use. Lisp's philosophy was distinctly different: Its purpose was to use a computer to implement a particular mathematical abstraction. As a result, the people who designed Lisp didn't care nearly as much about how closely their language mirrored an actual computer's behavior.
Next week, I'm going to discuss some fundamental ways in which Lisp differed from the other three languages as a result of this difference in philosophy, after which I will trace some of these differences all the way to present-day languages.