Channels ▼

Andrew Koenig

Dr. Dobb's Bloggers

Invariants for Binary Search, Part 2: Refining The Specifications

October 22, 2014

Last week, we discussed how to use loop invariants to think about an implementation of the standard-library binary_search function. Although this function is easy to implement, it is not the most useful form of binary search. Accordingly, we continue the discussion by trying to define more carefully the problem we would like to solve before leaping into the solution.

Two separate aspects of last week's program are less useful than we might like. First, although it is useful to be able to find out quickly whether a given value appears in a sequence, it is even more useful to know where it appears. Moreover, if it does not appear, it would be nice to know where in the sequence it would be appropriate to insert the absent value. The second less-than-useful aspect is that we have quietly been assuming that the elements of the sequence are totally ordered. This assumption is not necessarily valid. For example, consider a sequence of names that are compared in a case-insensitive manner. It might be that two names might be considered equal for search purposes even though they are not equal in fact.

In fact, the C++ standard library takes pains to avoid talking about two values being "equal" for ordering purposes. Instead, it allows the user to supply a comparison function that might or might not be the same as the comparison that one uses in other contexts. This possibility leaves open the question of what properties such a comparison function might have; let's think a bit about those properties.

Perhaps the most fundamental property is that the elements of the sequence being searched must not be out of order. In other words, if the sequence is called a, and we pick any two integers i < j that are indices of elements of a, it must never be the case that a[j] < a[i].

This requirement implies that < cannot be circular. In other words, there can never be three values x, y, and z such that x < y, y < z, and z < x, for if there were, there would be no way to order x, y, and z to avoid the possibility of i < j and a[j] < a[i] at the same time.

The standard library imposes other requirements on <. Rather than try to work those out in advance, we will try to see what requirements are necessary to come up with useful invariants as we write out code.

What code shall we write? We are given an ordered sequence and an element to seek. If that element is there, we want to yield the position of that element — or, more accurately, a position, because there might be more than one such element. To be precise, we'll say that we want to yield the first such position; that is, the one with the lowest index. If that element is not there, we want to yield the position at which that element would go if it were there. In other words:

  • We are given a sequence a and a value x. The sequence must be ordered according to < (with the precise meaning of this requirement to be determined).
  • If any element of a has a value equal to x, the function will return the smallest index of such an element.
  • If no element of a has a value equal to x, the function will return the smallest integer j such that if i is the index of an element and i < j, then a[i] < a[j].

When we say that an element of a "has a value equal to x," what we really mean is that if y is such a value, then x < y and y < x are both false.

When we call this function, we will typically ask two questions about the result:

  • If the result is not the index of an element, it must be one past the end, and every element of a must be < x. The proof of this claim is left as an exercise.
  • If the result is the index of an element then either x is < that element, in which case x is not present in a, or x is not < that element, in which case the element is the first of a sequence of one or more elements that is equal (in the sense described earlier) to x.

Either way, the result will give us more useful information than the mere true or false returned by the last version would give.

We shall start writing code next week.

Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 


Video