Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Channels ▼

Andrew Koenig

Dr. Dobb's Bloggers

Much Ado About Nothing: Why Off-The-End Pointers Are Necessary

May 16, 2014

I pointed out last week that C and C++ use pointers to locate array elements. I would like to continue by explaining why it is important to be able to form pointers that "point" just past the end of an array.

On the surface, one might suppose that there is no need for such pointers. Both C and C++ require that every array have at least one element, which means that every pointer into an array has an element to which it can point. However, that supposition turns out to have serious problems.

Let's start with a simple example: an array named array with N elements of type T. Suppose we want to do something with every element of that array. We might write a loop along the following lines:

     T *p = &array[0];
     while (p <= &array[N-1]) {
           /* Do something with the element to which p points */

Do you see the problem? The last time through the loop, p starts out pointing at array[N-1], but then when ++p is executed, p winds up pointing off the end of the array. If p starts out pointing at array[0], then clearly we can increment p no more than N-1 times before p becomes invalid. If p is incremented each time through a loop, that loop must execute no more than N-1 times. If we work on a single array element each time through the loop, we have to handle one element outside the loop in order to get them all:

     T *p = &array[0];
     while (p < &array[N-1]) {  /* < instead of <= */
           /* Do something with the element to which p points */
     /* Now p points to the last element of array, so we can handle that element separately. */

This loop works, but at the cost of duplicating the code that processes an array element. To avoid the code duplication, we might rewrite the loop to arrange not to increment p the last time through:

     T *p = &array[0];
     while (1) {
           /* Do something with the element to which p points */
                    if (p == &array[N-1])

Like the last version, this loop works. Moreover, it avoids code duplication. Therefore, at least one can imagine that C or C++ might have been made to work without an off-the-end pointer. However, I invite you to consider the social implications of such a language.

For example, suppose there was a choice between writing any of the three loops shown here. Which alternative do you think most programmers would choose? I think it would be the first one, because it looks the most like a loop that one might write in order to use indices rather than pointers. Suppose a programmer writes the first of these examples, then compiles and executes it. Does it work?

In a sense, the answer is obviously no, because we are supposing that the language is defined in a way that prohibits this first example. However, if the compiler does address arithmetic in the most straightforward way, the code will usually appear to work — unless the compiler explicitly generates code that checks pointers for bounds errors. In other words, even if the code is broken de jure, it will probably work de facto — which means that many programmers will come to expect it to work. As I pointed out eight weeks ago, once programmers come to expect a particular kind of behavior from an implementation, that behavior becomes very difficult to change — even if there is no formal justification for it. In short, even if it might be possible to get along without off-the-end pointers, their existence makes even very simple programs easier to write.

If we accept that off-the-end pointers are necessary, the logical next step is to figure out what characteristics they ought to have. I'll talk about those characteristics next week.

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.