# Still More Ado About Nothing: How Off-The-End Pointers Should Behave

May 21, 2014

Last week, I made the case that a programming language, such as C or C++, that allows pointers to array elements and supports arithmetic on those pointers needs a way to create a pointer past the end of an array. This line of reasoning has an important implication: It must be possible to have a pointer that is partially invalid, in the sense that the pointer itself is valid but it does not point to well-defined memory. Such a pointer is also sometimes called a sentinel, although strictly speaking, it would be more accurate to use the term to refer to the imaginary element to which an off-the-end pointer points.   The existence of off-the-end pointers raises two additional questions:

• Is it enough for the language to guarantee the validity of a pointer only to the position immediately past the end of an array, or should there be two-off-the-end pointers, three-off-the-end pointers, or even more?
• If there is always an off-the-end pointer, should there also always be an off-the-beginning pointer?

One way to see why off-the-end pointers are important is to think about a loop that touches every element of an array. If the array has n elements, that loop presumably executes n times. If there is a pointer that points to an array element, that pointer is presumably incremented each time through the loop. That means that the pointer is incremented n times — which means that the pointer must be able to take on n+1 distinct values during the loop's execution.

We can avoid this requirement by not incrementing the pointer the last (or the first) time through the loop. However, it is hard to make this strategy work properly if n is zero, because in that case, we would have to increment the pointer a negative number of times. Moreover, an array without any elements cannot have a pointer to one of its elements, because there are no elements to which the pointer can point! Accordingly, to use pointers to handle an n-element array without an off-the-end pointer, we must know that n must be greater than zero, which means that we must handle n=0 as a special case.

We might argue that both C and C++ require arrays to have at least one element, so there is no need to treat zero-element arrays as a special case. Unfortunately, this argument fails for two reasons. First, it is common to store information in part of an array and work on the part that is stored; programs that operate this way often use zero as the number of occupied elements. Second, although every explicitly declared array must have at least one element, it is possible to allocate memory dynamically, and such dynamic allocation definitely does allow for a zero-element allocation.

If we represent every sequence of elements as a pointer to the first element and an off-the-end pointer, it is easy to see how to represent an array without any elements: The "pointer to the first element" is the same as the off-the-end pointer because there is no first element. Without off-the-end pointers, it is once again hard to see how to handle an empty sequence without making it a special case.

We can now turn our attention to the first of the questions we noted earlier: Should there be more than one off-the-end pointer? Alternatively, if we define pointers to k distinct positions past the end of an array, what should k be?

I think it is a reasonable position that k should be 1, because it's hard to understand how to make use of a larger value. For example, suppose we knew that it was possible to form a pointer that pointed either one or two elements past the end of an array. What kind of code might use such a pointer? We might imagine processing array elements in pairs, supposing that two off-the-end pointers are available to us, thereby making `&array[n+1]` legitimate:

```
T *p = &array[0];
while (p < &array[n+1]) {
// Do something with the elements at *p and *(p+1)
p += 2;
}
```

Unfortunately, this code has a serious problem: What happens if `array` has an odd number of elements? The answer is that the last time through the loop, the code tries to use the array element at `*(p+1)`, and that element does not exist.

We can solve this problem in two different ways. One is to check in advance whether `array` has an odd number of elements and treat that case separately; the other is to rewrite the loop slightly so that if `array` does have an odd number of elements, the loop will disregard the "extra" element at the end:

```
T *p = &array[0];
while (p < &array[n]) {
// Do something with the elements at *p and *(p+1)
p += 2;
}
```

Here, we have changed the code to use `&array[n]` instead of `&array[n+1]` as the upper bound. This change makes no difference if `n` is even; but if `n` is odd, this loop executes one less time than did the previous one. However, this rewrite also completely removes the need for more than one off-the-end pointer.

More generally, I find it hard to understand what one might do with a pointer to more than one element past the end of an array. Perhaps someone who reads this article will have better luck.

The question about whether there should be an off-the-beginning pointer is more interesting; we shall discuss that next week.

### More Insights

 To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.