Channels ▼

Andrew Koenig

Dr. Dobb's Bloggers

Asymmetric Bounds, Part 3: Generic Programming and First Principles

June 20, 2012

Last week , I said I'd talk about how to allow index variables to stray outside their bounds safely. The ability to do so is intimately bound up with the data structures that those index variables represent.

Consider a data structure S that represents a sequence of values, together with an index i that represents either an element of that sequence or an out-of-bounds value. If low and high are the lower and upper bounds on i, then we might think that we can verify that i refers to an element of S by checking whether low<i<high, where one or both of the < might be <= depending on whether the bounds are inclusive or exclusive.

When we get into the realm of generic programming, however, such tests fail — because there is no reason to believe that the < operation is defined at all on the values that we use to represent sequence elements. As a result, generic programs avoid using order relations on indices or iterators where they can, preferring instead to use equality or inequality comparisons.

Let's go back to the question of how to represent ranges, keeping in mind that programs are apt to use equality comparisons rather than ordering comparisons on those representations. What do we know so far?

  • We would like to be able to use two values of an appropriate type to represent a range of elements.
  • If a value represents an element, we need a way of accessing the element that corresponds to the value.
  • The sequence might be empty; therefore, it is necessary to be able to have a value that does not represent an element.

The most obvious way to represent a range is to note its first and last elements. This is also the way in which people usually refer to ranges colloquially: For example, we say "1 through 10" to refer to the integers 1, 2, … , 9, 10. If we represent a range symmetrically, we see that both bounds are the same when the range has a single element. In other words, the range "5 through 5" has just one element, 5. Moreover, unless the range is empty, the beginning and end are both elements; so there is no need to use any values that do not refer to actual elements.

The trouble with symmetric ranges shows up when we try to represent an empty range. Doing so requires at least one value to fail to denote an element, because there are no elements to denote. Moreover, the two values that denote the range cannot be equal to each other, because that equality would denote a range with one element. Therefore, if we use symmetric bounds, we need to be able to represent two distinct non-element values. Moreover, this need comes into play in the special case where the range is empty.

We have already seen that we need at least one non-element value, because there is no way to represent an element in a range that has no elements. What if we use two instances of that value to represent an empty range?

The answer, of course, is that we get the exact asymmetric-bound scheme that C++ uses: Every range, including the empty range, is represented by a pair of values. If the values are equal, the range is empty; otherwise, the first value represents the first element in the range, and the second value represents one past the last element. This technique allows the well-known loop form:

 
     iter = begin;
     while (iter != end) {
           // Process the element denoted by iter
          ++iter;
     }

If you try to write a corresponding loop for symmetric bounds, I think you will see why asymmetric bounds are to be preferred. In addition, they have two other nice properties:

  • The lower and upper bounds are equal if and only if the range is empty.
  • If it is possible to do so, subtracting the lower bound from the upper yields the number of elements.

I believe that representing ranges in this way generally makes programming easier. It is only when the asymmetric-bound technique conflicts with colloquial usage that there is any difficulty.

Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 


Video