# Invariants for Binary Search, Part 4: Using The Improved Abstractions

*n*-element sequence.

We continue our discussion of how to implement our binary search in an I-element sequence. Last week , we defined a simple abstraction, which we called `vless(x, k)`

, that yields `true`

if

`k`

is the position of an element of the sequence that is not (strictly) less than`x`

, or`k`

is equal to`n`

(i.e.,`k`

is the off-the-end position).

This abstraction lets us simplify our code by pretending that `x`

is never so large that it must be inserted at the end of the sequence. In other words, when we ask the question "Where in the sequence can we insert `x`

without disrupting the sequence's order?" we can turn that question into "What is the smallest value of `k`

for which `vless(x, k)`

is `true`

?" with the confidence that there will always be such a value even if every element in the sequence is less than `x`

. This is an assumption that simplifies how we think about our code by eliminating a special case.

It is a useful idea to simplify code by removing special cases. In fact, this is the second special case that we have removed — the first being that we do not intend to return a result that indicates whether `x`

is in the sequence at all. If `x`

is not in the sequence, we shall return either an off-the-end position (i.e., `n`

) or the index of an element that is strictly greater than `x`

. It will be up to our caller to use that policy to decide whether `x`

is present in the sequence.

We can say with confidence that the value that we wish our binary-search function to return is in the range `[0, n]`

(i.e., the range from `0`

through `n`

that includes both `0`

and `n`

). The reason for this claim is that if `x`

is strictly less than every element of the sequence, it suffices to return `0`

; and if every element of the sequence is less than `x`

, it suffices to return `n`

. If the sequence has no elements, then both of these conditions is true; fortunately, in that case `n`

is equal to `0`

and we can satisfy both conditions by returning `0`

.

This observation suggests that we might begin our code as follows. We shall assume that we are searching an n-element array of type `T`

:

size_t binary_search(T x, const T& array, size_t n) { size_t begin = 0, end = n; // …

As usual, `size_t`

is the type of an array index. We are adopting the usual practice of using `begin`

and `end`

to refer to the limits of our range. However, because of how we have defined `vless`

, we shall feel free to work in the range `[begin, end]`

rather than the more usual `[begin,end)`

. In particular, the existence of `vless`

allows us to state the following loop invariant:

// Invariant: This function will eventually return a value in the range[begin, end]

Our strategy will be to increase `begin`

and/or decrease `end`

until the two are equal. At that point, there is only one value that the function can return, so we shall return it. That observation allows us to write the rest of the function except for the loop body:

while (begin != end) { // Reduce the size of the range [begin, end] while maintaining the invariant } return begin; // Or return end, because begin == end }

What about that loop body? As before, we start by computing a midpoint:

size_t mid = (begin + end) / 2;

We now use our `vless`

function to inspect the element at position `mid`

:

if (vless(x, mid)) { // Do something } else { // Do something else }

What are these two things that we might do? Let's think back to the purpose for which we defined `vless`

: We wanted the result of our function to be the smallest value `k`

for which `vless(x, k)`

is `true`

. If `vless(x, mid)`

is `true`

, then `k`

might be equal to `mid`

, and `k`

might even be less than `mid`

. However, `k`

cannot possibly be greater than `mid`

and still meet the requirement of being the smallest such value. We can therefore reduce our range to `[begin, mid]`

and still be confident in maintaining the invariant.

We can justify the phrase "reduce our range to `[begin, mid]`

" by noting that `mid`

must be less than `end`

. The reason is that `begin`

is less than `end`

(because of the `while`

condition), and therefore that `(begin + end) / 2`

is less than `end`

(because integer division on `unsigned`

values always truncates downward). Therefore, "Do something" translates to

end = mid;

and this somewhat long-winded argument shows that this simple assignment always reduces `[begin, end]`

by at least one element while maintaining the loop invariant.

What about the case where `vless(x, mid)`

is `false`

? In that case, we know that our function must return a value strictly greater than `mid`

, because all the values of `k`

for which `vless(x, k)`

is `false`

are smaller than the values of `k`

for which `vless(x, k)`

is `true`

. We can therefore translate "Do something else" to/

begin = mid + 1;

Analogously to the other case, this assignment reduces the range `[begin, end]`

because `mid`

cannot be less than `begin`

, hence `mid + 1`

is strictly greater than `begin`

. Moreover, because we know that `begin < end`

, it is not possible for `begin`

to cross `end`

and become too large.

For convenience, here is the entire function in one place:

size_t binary_search(T x, const T& array, size_t n) { size_t begin = 0, end = n; // Invariant: This function will eventually return a value in the range[begin, end] while (begin != end) { size_t mid = (begin + end) / 2; if (vless(x, mid)) { end = mid; } else { begin = mid + 1; } } return begin; // Or return end, becausebegin == end }

All that is left of the implementation is to write `vless`

. We shall do so next week, after which we shall consider how to go about testing this code.