Channels ▼

Andrew Koenig

Dr. Dobb's Bloggers

Invariants for Binary Search, Part 4: Using The Improved Abstractions

November 10, 2014

We continue our discussion of how to implement our binary search in an I-element sequence. Last week , we defined a simple abstraction, which we called vless(x, k), that yields true if

  • k is the position of an element of the sequence that is not (strictly) less than x, or
  • k is equal to n (i.e., k is the off-the-end position).

This abstraction lets us simplify our code by pretending that x is never so large that it must be inserted at the end of the sequence. In other words, when we ask the question "Where in the sequence can we insert x without disrupting the sequence's order?" we can turn that question into "What is the smallest value of k for which vless(x, k) is true?" with the confidence that there will always be such a value even if every element in the sequence is less than x. This is an assumption that simplifies how we think about our code by eliminating a special case.

It is a useful idea to simplify code by removing special cases. In fact, this is the second special case that we have removed — the first being that we do not intend to return a result that indicates whether x is in the sequence at all. If x is not in the sequence, we shall return either an off-the-end position (i.e., n) or the index of an element that is strictly greater than x. It will be up to our caller to use that policy to decide whether x is present in the sequence.

We can say with confidence that the value that we wish our binary-search function to return is in the range [0, n] (i.e., the range from 0 through n that includes both 0 and n). The reason for this claim is that if x is strictly less than every element of the sequence, it suffices to return 0; and if every element of the sequence is less than x, it suffices to return n. If the sequence has no elements, then both of these conditions is true; fortunately, in that case n is equal to 0 and we can satisfy both conditions by returning 0.

This observation suggests that we might begin our code as follows. We shall assume that we are searching an n-element array of type T:

 
    size_t binary_search(T x, const T& array, size_t n)    
{
           
         size_t begin = 0, end = n;
         // …

As usual, size_t is the type of an array index. We are adopting the usual practice of using begin and end to refer to the limits of our range. However, because of how we have defined vless, we shall feel free to work in the range [begin, end] rather than the more usual [begin,end). In particular, the existence of vless allows us to state the following loop invariant:

 // Invariant: This function will eventually return a value in the range[begin, end]

Our strategy will be to increase begin and/or decrease end until the two are equal. At that point, there is only one value that the function can return, so we shall return it. That observation allows us to write the rest of the function except for the loop body:

 
      while (begin != end) {
                          // Reduce the size of the range [begin, end] while maintaining the invariant
   }
   return begin;   // Or return end, because begin == end
}

What about that loop body? As before, we start by computing a midpoint:

 
size_t mid = (begin + end) / 2;

We now use our vless function to inspect the element at position mid:

 
if (vless(x, mid)) {
          // Do something
} else {
         // Do something else
}

What are these two things that we might do? Let's think back to the purpose for which we defined vless: We wanted the result of our function to be the smallest value k for which vless(x, k) is true. If vless(x, mid) is true, then k might be equal to mid, and k might even be less than mid. However, k cannot possibly be greater than mid and still meet the requirement of being the smallest such value. We can therefore reduce our range to [begin, mid] and still be confident in maintaining the invariant.

We can justify the phrase "reduce our range to [begin, mid]" by noting that mid must be less than end. The reason is that begin is less than end (because of the while condition), and therefore that (begin + end) / 2 is less than end (because integer division on unsigned values always truncates downward). Therefore, "Do something" translates to

 
end = mid;

and this somewhat long-winded argument shows that this simple assignment always reduces [begin, end] by at least one element while maintaining the loop invariant.

What about the case where vless(x, mid) is false? In that case, we know that our function must return a value strictly greater than mid, because all the values of k for which vless(x, k)is false are smaller than the values of k for which vless(x, k) is true. We can therefore translate "Do something else" to/

 
begin = mid + 1;

Analogously to the other case, this assignment reduces the range [begin, end] because mid cannot be less than begin, hence mid + 1 is strictly greater than begin. Moreover, because we know that begin < end, it is not possible for begin to cross end and become too large.

For convenience, here is the entire function in one place:

 
    size_t binary_search(T x, const T& array, size_t n)    
{
           
       size_t begin = 0, end = n;
 
                      // Invariant: This function will eventually return a value in the range[begin, end]
                     while (begin != end) {
                                    size_t mid = (begin + end) / 2;
                                    if (vless(x, mid)) {
                                                end = mid;
                                    } else {
                                                begin = mid + 1;
                                    }
           }
 
           return begin;   // Or return end, becausebegin == end
}
 

All that is left of the implementation is to write vless. We shall do so next week, after which we shall consider how to go about testing this code.

Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 


Video