Asymmetric Bounds, Part 4: Pragmatic Advantages
People who come to C++ (or C) from languages such as Fortran or Basic that normally begin array indices with 1 sometimes argue that the C++ way of doing things is counterintuitive. For example, if a 42-element array has indices 1 through 42, the argument is that it is easier to tell that the array has 42 elements than it is if the array has indices 0 through 41.
This argument makes sense if you accept that "1 through 42" is the usual way of describing the range of permissible indices — and you also accept that 1 is a reasonable value for the smallest possible index. However, as soon as you admit that the first element of an array might not be element number 1, this simplicity evaporates.
For example, suppose you have a bar graph with an x-axis that represents each of the years 1986 through 2008, inclusive. How many bars are there? Subtract 1986 from 2008 and you get 22, so the answer is probably 21, 22, or 23. But without pausing for thought, do you know for sure which of these alternatives is the right one? I didn't think so.
This particular uncertainty comes from using symmetric bounds. If instead I had said that the first year in the interval was 1986 and the first year beyond the interval was 2009, then you could subtract 1986 from 2009 to get 23, and know immediately that the bar graph would have 23 bars. This observation gives us a useful way to think about asymmetric bounds in general: Instead of representing the first and last elements of a range, we can think of the range as being followed immediately by another range, and we represent the first element of each of those ranges.
It is easier to see the utility of representing only the beginnings of ranges if you think about how you would represent several consecutive ranges at once. If the ranges are consecutive, there is no need to store the beginning and end of each range separately: The end of each range except the last is just before the beginning of the one that follows it. The only wrinkle is that representing ranges this way uses asymmetric bounds for all but the last range. As a result, almost anything you might do with such ranges becomes simpler if you represent the last range in the same way.
As an example, suppose you want to convert numeric grades to letter grades. You might say that A is 90-100, B is 80-89, and so on. Or you might just write down the following:
A begins at 90 B begins at 80 C begins at 70 D begins at 60 F is anything below that.
This table has the same information as the explicit ranges, but it's much easier to manage than the explicit, inclusive ranges. Moreover, it's more honest. For example, if an A starts at 90 and a B ends at 89, what do you do with a grade of 89.5? You might argue that the grades have to be integers, so that condition is impossible; but that argument fails if someone comes back later and changes the code. If you don't believe that the first definition could lead to a crash when a student gets a grade of 89.5, you haven't been programming long enough.
Once we adopt the habit of representing only beginnings of ranges, and treating what comes immediately after a range as if it were another range, a lot of computations become much easier. For example, it becomes obvious that an empty range is equivalent to using the same value for the beginning and the end. It takes only a little more thought to realize that the number of elements in the range must be the difference between the beginning and the end. Moreover, you no longer have to worry about whether to use < or <= to end a loop; just use != (or ==) and be done with it. And of course the fact that you don't have to think about ordering comparisons makes it possible to use iterators that don't support + or - at all.