Dijkstra's Example Simplified By Off-The-End Values: First Try
Last week, I described a problem that Edsger Dijkstra posed in his excellent book, A Discipline of Programming. This article continues that description; please read that first entry before continuing.
Dijkstra used a variable r
to denote the index of the last red pebble (if any), w
to denote an index one less than the first white pebble, and b
to denote an index one less than the first blue pebble. In order to avoid confusion, I am going to choose entirely new variable names in the revised version. We shall stick with Dijkstra's notion of dividing the sequence of buckets into four consecutive subsequences that are known to contain red, arbitrary, white, and blue pebbles, respectively. Where we shall part company with Dijkstra is in how we delimit them:
rb
is the index of the first (as opposed to the last) red pebble;
xb
is the index of the first unknown pebble, or, equivalently, one past the last red pebble;
wb
is the index of the first white pebble;
bb
is the index of the first blue pebble; and
N
is an index that is one past the last blue pebble.
This description is vague, in the sense that it does not cater to the possibility of one or more of these regions being empty. However, we can make it precise:
- If
rb
<= k <xb
, bucket k contains a red pebble. - If
xb
<= k <wb
, bucket k has not yet been examined. - If
wb
<= k <bb
, bucket k contains a white pebble. - If
bb
<= k <N
, bucket k contains a blue pebble.
Where Dijkstra was striving to be symmetric, we are being uniformly asymmetric. We have also abandoned Dijkstra's use of 1
as a lower bound, preferring instead to use rb
to index the first bucket and N
to be one past the index of the last bucket. Because we are not constraining the lower bound, we use four variables in addition to N
where Dijkstra used three. However, one of those variables, rb
, is already initialized. We can establish the initial conditions similarly to how Dijkstra did so: Set xb
to rb
, and set both wb
and bb
to N
.
We shall follow Dijkstra's strategy of repeatedly examining the last unknown element to determine where to place it. Here, we encounter the first problem with our asymmetric notation: No variable contains the index of that last element. Rather, that element, if it exists, is element wb-1
. Accordingly, our code looks like this:
while (xb < wb) { Examine bucket wb-1. If the pebble in that bucket is Red: Swap buckets xb and wb-1; increment xb. White: Decrement wb. Blue: Swap buckets wb-1 and bb-1; decrement wb and bb. }
This code cries out for refactoring: It evaluates wb-1
in three different places and decrements wb
in two out of three cases in the three-way test. If we do not wish to introduce an additional variable, we can nevertheless rewrite the code by decrementing wb before examining any bucket:
while (xb < wb) { Decrement wb. Examine bucket wb. If the pebble in that bucket is Red: Swap buckets xb and wb; increment xb and wb. White: Do nothing. Blue: Swap buckets wb-1 and bb-1; decrement bb. }
I find this code easier to understand than Dijkstra's version. However, it still has two shortcomings:
- It decrements
wb
and then sometimes increments it again. Alternatively, it could have savedwb
in an auxiliary variable and restored it in the case that a red pebble was seen, but that strategy seems like the waste of a variable. - Even though we have a variable, namely
xb
, that we know contains the index of a bucket to examine, we do what seems like a gratuitous computation instead of using that variable for that purpose.
Can we do better? The advantage of examining the last unknown pebble rather than the first is that we never need to swap the pebble over more than one boundary between regions. If we looked at the first pebble, and it happened to be blue, we would have to swap it over all the unknown pebbles and all the white pebbles in order to put it in the right spot. Moreover, we would have to do something with the white pebble that might already be there, perhaps requiring two swaps.
However, we can get around this objection by changing the order of the four regions: Instead of putting the unknown region second, we can put it third. We'll try that alternative next week.