# Asymmetric Bounds, Part 3: Generic Programming and First Principles

Last week , I said I'd talk about how to allow index variables to stray outside their bounds safely. The ability to do so is intimately bound up with the data structures that those index variables represent.

Consider a data structure `S`

that represents a sequence of values, together with an index `i`

that represents either an element of that sequence or an out-of-bounds value. If `low`

and `high`

are the lower and upper bounds on `i`

, then we might think that we can verify that `i`

refers to an element of `S`

by checking whether `low<i<high`

, where one or both of the < might be <= depending on whether the bounds are inclusive or exclusive.

When we get into the realm of generic programming, however, such tests fail — because there is no reason to believe that the < operation is defined at all on the values that we use to represent sequence elements. As a result, generic programs avoid using order relations on indices or iterators where they can, preferring instead to use equality or inequality comparisons.

Let's go back to the question of how to represent ranges, keeping in mind that programs are apt to use equality comparisons rather than ordering comparisons on those representations. What do we know so far?

- We would like to be able to use two values of an appropriate type to represent a range of elements.
- If a value represents an element, we need a way of accessing the element that corresponds to the value.
- The sequence might be empty; therefore, it is necessary to be able to have a value that does not represent an element.

The most obvious way to represent a range is to note its first and last elements. This is also the way in which people usually refer to ranges colloquially: For example, we say "1 through 10" to refer to the integers 1, 2, … , 9, 10. If we represent a range symmetrically, we see that both bounds are the same when the range has a single element. In other words, the range "5 through 5" has just one element, 5. Moreover, unless the range is empty, the beginning and end are both elements; so there is no need to use any values that do not refer to actual elements.

The trouble with symmetric ranges shows up when we try to represent an empty range. Doing so requires at least one value to fail to denote an element, because there are no elements to denote. Moreover, the two values that denote the range cannot be equal to each other, because that equality would denote a range with one element. Therefore, if we use symmetric bounds, we need to be able to represent *two* distinct non-element values. Moreover, this need comes into play in the special case where the range is empty.

We have already seen that we need at least one non-element value, because there is no way to represent an element in a range that has no elements. What if we use two instances of that value to represent an empty range?

The answer, of course, is that we get the exact asymmetric-bound scheme that C++ uses: Every range, including the empty range, is represented by a pair of values. If the values are equal, the range is empty; otherwise, the first value represents the first element in the range, and the second value represents one past the last element. This technique allows the well-known loop form:

iter = begin; while (iter != end) { // Process the element denoted by iter ++iter; }

If you try to write a corresponding loop for symmetric bounds, I think you will see why asymmetric bounds are to be preferred. In addition, they have two other nice properties:

- The lower and upper bounds are equal if and only if the range is empty.
- If it is possible to do so, subtracting the lower bound from the upper yields the number of elements.

I believe that representing ranges in this way generally makes programming easier. It is only when the asymmetric-bound technique conflicts with colloquial usage that there is any difficulty.