Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Off-The-End Pointers And While Statements

June 05, 2014

Last week, I argued that once we have off-the-end pointers in a C-like language, off-the-beginning pointers are much less useful. This is an odd kind of argument: What is it about arrays that should make us treat the beginning of an array differently than the end when are talking about ranges of array elements?

It was not always thus. For example, two of the earliest high-level programming languages, FORTRAN and Algol, referred to an array, or to a range of indices, by their first and last elements. For example, in FORTRAN, we set the elements of an N-element array to zero by writing

DO 3 K = 1, N
3    A(K) = 0

In this example, the loop body (statement 3) is executed N times. The first time through, K has the value 1; the last time through, K has the value N. In this sense, FORTRAN treats the beginning and end of the array symmetrically.

Similarly, in Algol, we would write code such as

for k := 1 step 1 until N do
a[k] := 0

Again, the variable k takes on the value 1 the first time through the loop and N the last time through.

In FORTRAN as originally defined, the first element of an array had index 1; in Algol, the programmer had to state the lower bound explicitly. In both languages, writing a loop entailed writing the value that the control variable should have the first and last times through the loop. In contrast, when we write an analogous loop in C or C++, we typically do so as

for (k = 0; k != N; ++k)
a[k] = 0;

The last time through the loop, k has the value N, not N-1. Of course, we could write

for (k = 0; k <= (N-1); ++k)
a[k] = 0;

but in general, we don't. Why not? I don't think the following argument is completely conclusive, but I find it persuasive.

In their original forms, neither FORTRAN nor Algol had a while statement, whereas in C, a for statement is nothing more than an abbreviation for a while statement with some extra decoration. Using while statements changes one's perspective subtly from using for or DO statements, because when we write a while statement, instead of stating the last value that a variable should have within a loop, we state explicitly a condition that we expect to be false after the loop finishes. So, for example, when we write

for (k = 0; k != N; ++k)
a[k] = 0;

we are really writing

k = 0;
while (k != N) {
a[k] = 0; ++k;
}

By implication, expect k != N to be false when the loop finishes, so that the value of k will be equal to N. Claiming that k will be equal to N in this way is a precise statement about what the loop should do. In contrast, if we write the loop as

for (k = 0; k <= (N-1); ++k)
a[k] = 0;

we are being more vague about the value of k after the loop finishes: It might be equal to N-1, or it might be any greater value.

In contrast, when we write

for k := 1 step 1 until N do
a[k] := 0

we are saying that we want k to take on the value N the last time through the loop, and we don't care what value k has after the loop finishes.

This shift in emphasis carries a subtle implication: When we use a while statement, we tend to express the end of a range not by describing its last element, but instead by describing the first value past that last element. In other words, using off-the-end pointers is a natural outgrowth of shifting from ranges defined explicitly by their first and last elements to ranges defined by while statements. In such statements, the loop ends when the condition is false for the first time, not when it is true for the last time. As a result, there is a natural asymmetry about using while statements that maps into an asymmetry in the values that those while statements manipulate.

And yet, there is a natural symmetry between the beginning and the end of a sequence — a symmetry that using off-the-end pointers in this way appears to violate. Surely there should be an easy way to step backward through an array simply by swapping the beginning and end of the range of elements. Is there a way in which we can unify the symmetric nature of the beginning and end of a range with the asymmetric nature of a while statement?

To be continued…

More Insights

 To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.