Channels ▼
RSS

<algorithm>: find_end


October, 2005: <algorithm>: find_end

Bjorn is author of Beyond The C++ Standard Library: An Introduction to Boost. He can be contacted at [email protected]


In this series of articles, I focus on clause 25 of the C++ Standard [1], which is the Algorithms Library. These algorithms are designed to work with iterators from the standard containers [2] (std::vector, std::map, and so forth) and other sequences. The first part of this series of articles covers the nonmodifying sequence operations, the second part talks about the mutating sequence operations, and the third discusses sorting and related operations. These divisions are named after the categorization of algorithms defined in the C++ Standard. The topics of this article are three related algorithms called search, search_n, and find_end.

The purpose of search and search_n is to find the first occurrence of a subsequence in another sequence.

The purpose of find_end is to find the last occurrence of a subsequence in another sequence.

Here is how the C++ Standard defines search() and search_n():

25.1.9 Search [lib.alg.search]

template<class ForwardIterator1, class ForwardIterator2>
  ForwardIterator1
    search(ForwardIterator1 first1, 
           ForwardIterator1 last1,
           ForwardIterator2 first2, 
           ForwardIterator2 last2);

template<class ForwardIterator1, 
         class ForwardIterator2,
         class BinaryPredicate>
  ForwardIterator1
    search(ForwardIterator1 first1, 
           ForwardIterator1 last1,
           ForwardIterator2 first2, 
           ForwardIterator2 last2,
           BinaryPredicate pred);

1 Effects: Finds a subsequence of equal values in a sequence.
2 Returns: The first iterator i in the range [first1, last1 - (last2 - first2)) such that for any nonnegative integer n less than last2 - first2 the following corresponding conditions hold: *(i + n) == *(first2 + n), pred(*(i + n), *(first2 + n)) != false. Returns last1 if no such iterator is found.
3 Complexity: At most (last1 - first1) * (last2 - first2) applications of the corresponding predicate.

template<class ForwardIterator, class Size, class T>
  ForwardIterator
    search_n(ForwardIterator first, 
             ForwardIterator last, 
             Size count,
             const T& value);

template<class ForwardIterator, 
         class Size, 
         class T,
         class BinaryPredicate>
  ForwardIterator
    search_n(ForwardIterator first, 
             ForwardIterator last, 
             Size count,
             const T& value, 
             BinaryPredicate pred);

4 Requires: Type T is EqualityComparable (20.1.1), type Size is convertible to integral type (4.7, 12.3).
5 Effects: Finds a subsequence of equal values in a sequence.
6 Returns: The first iterator i in the range [first, last - count) such that for any nonnegative integer n less than count the following corresponding conditions hold: *(i + n) == value, pred(*(i + n),value) != false. Returns last if no such iterator is found.
7 Complexity: At most (last1 - first1) * count applications of the corresponding predicate.

Header: <algorithm>
Namespace: std
Category: Non-modifying sequence operations

The first form of search() expects four arguments: an iterator first1 to the first element of the sequence to search, an iterator last1 to the last element of the sequence, an iterator first2 to the first element of the subsequence to find, and an iterator end2 to the last element of the subsequence to find. search() returns an iterator to the first element in the searched sequence where the beginning of the subsequence is found, or last1 if the subsequence cannot be found.

The second form of search() expects an additional argument, a binary predicate pred used to test corresponding elements of the two ranges.

The first form of search_n() expects four arguments: an iterator first to the first element of the sequence to search, an iterator last to the last element of the sequence, an integral value count that denotes how many elements should be in the subsequence, and a value value that is used to test for element equality. search_n() returns an iterator to the first element in the searched sequence where the beginning of the subsequence (count number of value) is found, or last if the subsequence cannot be found.

The second form of search_n() expects an additional argument, a binary predicate pred used for comparing elements and value.

Here is how the C++ Standard defines find_end():

25.1.3 Find End [lib.alg.find.end]

  template<class ForwardIterator1, class ForwardIterator2>
    ForwardIterator1 find_end(
      ForwardIterator1 first1, 
      ForwardIterator1 last1,
      ForwardIterator2 first2, 
      ForwardIterator2 last2);

  template<class ForwardIterator1, 
           class ForwardIterator2, 
           class BinaryPredicate>
    ForwardIterator1 find_end(
      ForwardIterator1 first1, 
      ForwardIterator1 last1,
      ForwardIterator2 first2, 
      ForwardIterator2 last2,
      BinaryPredicate pred);

1 Effects: Finds a subsequence of equal values in a sequence.
2 Returns: The last iterator i in the range [first1, last1 - (last2-first2)) such that for any nonnegative integer n < (last2-first2), the following corresponding conditions hold: *(i+n)== *(first2+n), pred(*(i+n),*(first2+n)) != false. Returns last1 if no such iterator is found.
3 Complexity: At most (last2 - first2) * (last1 - first1 - (last2 - first2) + 1) applications of the corresponding predicate.

Header: <algorithm>
Namespace: std
Category: Non-modifying sequence operations

The first form of find_end() expects four arguments: an iterator first1 to the first element of the sequence to search, an iterator last1 to the last element of the sequence, an iterator first2 to the first element of the subsequence to find, and an iterator end2 to the last element of the subsequence to find. find_end() returns an iterator to the first element in the searched sequence where the beginning of the last subsequence is found, or last1 if the subsequence cannot be found.

The second form of find_end() expects an additional argument, a binary predicate pred used to test corresponding elements of the two ranges.

Suppose that you have a std::vector<int> containing a series of values. You need to find the first and last occurrences of a specific subsequence of values in that series. Here is how you can use std::search() and std::find_end() to solve the problem:

#include <vector>
#include <algorithm>

int main()
{
  typedef std::vector<int> value_container;
  typedef value_container::iterator value_iterator;

  // Create the sequence
  value_container sequence;

  sequence.push_back(1);
  sequence.push_back(2);
  sequence.push_back(3);
  sequence.push_back(4);
  sequence.push_back(1);
  sequence.push_back(2);
  sequence.push_back(3);
  sequence.push_back(4);

  // Create the subsequence 
  value_container subsequence;

  subsequence.push_back(1);
  subsequence.push_back(2);
  subsequence.push_back(3);

  // Use search to find the first location of subsequence in sequence
  value_iterator first=
    std::search(
      sequence.begin(),     // The beginning of the sequence 
      sequence.end(),       // The end of the sequence
      subsequence.begin(),  // The beginning of the subsequence
      subsequence.end());   // The end of the subsequence


  // Use find_end to find the last location of 
  // subsequence in sequence
  value_iterator last=
    std::find_end(
      sequence.begin(),     // The beginning of the sequence 
      sequence.end(),       // The end of the sequence
      subsequence.begin(),  // The beginning of the subsequence
      subsequence.end());   // The end of the subsequence


  if (first!=sequence.end() && last!=sequence.end() && first!=last)
  {
    // The first and last subsequence was found
  }
  else
  {
    // The first and last subsequence was not found
  }
}

In the above example, the subsequence (1,2,3) occurs twice in the container, so the call to search() will return an iterator for the first element. find_end() will ignore the first match, because there is another match closer to the end of the sequence, so find_end() returns an iterator for the fifth element in sequence. Figure 1 demonstrates how the subsequence is matched against the full sequence.

Basic Find End

We will now look in more detail at how to use find_end(). The introductory example showed how to find a subsequence of ints in a std::vector<int>. For a slightly different use case, let's say you're a meteorologist collecting temperature samples. You want to present some unusual information, so you decide to find out the previous time when the average temperature for the last three days occurred. You happen to have a std::vector<int> containing the average temperatures for each day, and here's how you'd get the last time this series (in this example, 22, 24, 25 degrees (Celsius)) occurred:

#include <vector>
#include <list>
#include <algorithm>

#include "boost/array.hpp"

// Get the average temperature:
// this function is called once
// per day
int get_temperature()
{
  // Access thermometer hardware  
  // and return value
}

int main()
{
  typedef std::vector<int> sample_container;
  typedef sample_container::iterator sample_iterator;

  sample_container samples;

  // Collect temperatures for a period of time

  // Find the last three consecutive samples with the temperatures
  // 22,24,25
  const boost::array<int,3> subsequence={22,24,25};

  sample_iterator it=
    std::find_end(
      samples.begin(),
      samples.end(),
      subsequence.begin(),
      subsequence.end());

  if (it!=samples.end())
  {
    // Yes, this has happened before!
    std::cout << "\nYeah, it happened just " << 
      std::distance(it,samples.end()) << " days ago.\n";
  }
  else
  {
    // This is the first time this has ever happened!
    std::cout << "We've never had this exact situation before\n";
  }
}

The example is very similar to the one given in the introduction. A container is created—samples—and it is populated with the average temperature for a number of days. Then, another container is created—subsequence—and its elements denote the range to find in samples. You'll note that subsequence is of the type boost::array [3], which is a simple wrapper class around arrays that makes it possible to treat arrays almost like STL containers. If find_end() locates a subsequence, it returns an iterator that points to the first element of that (last) subsequence in samples.

Of course, you may not have a series of exact values to compare with, but rather you need to find a series that fulfills certain criteria. For example, you might want to retrieve the last occurrence of five consecutive days with an average temperature below 15. To do that, you need a binary predicate (std::less will work) that is passed to find_end():

#include <vector>
#include <list>
#include <algorithm>
#include <iostream>
#include <functional>

// Get the current temperature
int get_temperature()
{
  // Access thermometer and return value
}

int main()
{
  typedef std::vector<int> sample_container;
  typedef sample_container::iterator sample_iterator;

  sample_container samples;

  // Collect temperatures for a period of time

  // Find the last give consecutive days
  // with temperatures below 15
  const boost::array<int,5> subsequence={15,15,15,15,15};

  sample_iterator it=
    std::find_end(
      samples.begin(),
      samples.end(),
      subsequence.begin(),
      subsequence.end(),
      std::less<int>());

  if (it!=samples.end())
  {
    // Yes, this has happened before!
    std::cout << "\nYeah, it happened just " << 
      std::distance(it,samples.end()) << " days ago.\n";
  }
  else
  {
    // This is the first time this has ever happened!
  }
}

This example isn't much harder to understand: just remember to include <functional>, and pass an instance of std::less<int> as the fifth argument to find_end(). The binary predicate is used to test each element in subsequence, and all comparisons must return true for the algorithm to return an iterator to the beginning of that range. Just like the non-predicate version, it is only the last matching subsequence that is returned.

Intermediate Search

You may be thinking that it is often awfully inefficient to search from the beginning of the range, when all you're interested in is the last matching subsequence. That's absolutely true, and by using reverse iterators [4], it is often much more efficient to emulate find_end() using search(), and perform the search starting from the end. You will also need to reverse the search sequence. However, it is worth noting that it's not always possible to search this way: it depends on the sequence you are operating on. The requirements for the iterators used by find_end() are very relaxed—they just need to be ForwardIterators (more powerful iterators work too, of course). Now, if you're reading the input from std::cin, for example, you won't be able to use search() with reverse iterators, but you'll still be able to use find_end(). If you have bidirectional iterators for both the sequence to search and the subsequence, searching from the end works perfectly.

"Why can't the algorithm implementations be clever enough to figure out when the sequences can be reversed and thus searched from the end?" I hear you ask. Well, they can, and some do. When compiling the example code for this article, I used the Standard Library that comes bundled with Visual Studio 2003 [5], and it didn't use this technique. I also compiled with Comeau's compiler [6] and its Standard Library implementation (libcomo), which did implement this optimization. The C++ Standard does not mandate that find_end() implementations search from the end when possible, so it's a quality-of-implementation issue.

In the following example, you will see how search_n() can be used to create a much more efficient version of the previous example that used find_end(). You'll recall that the problem was to find the last occurrence of five consecutive days with average temperature less than 15 degrees. We could perform the calculation using search(), just like in the previous example (albeit with the search starting from the other end), but search_n() works even better, because it expects an integral value that states how many occurrences of the element must match. By searching backwards in the sequence and using the predicate version of search_n(), we are able to find the last match without one single unnecessary comparison of temperatures.

#include <vector>
#include <list>
#include <algorithm>
#include <iostream>
#include <functional>

// Get the current temperature
int get_temperature()
{
  // Access thermometer and return value
}

int main()
{
  typedef std::vector<int> sample_container;
  typedef sample_container::iterator sample_iterator;
  typedef sample_container::reverse_iterator sample_reverse_iterator;

  sample_container samples;

  // Collect temperatures for a period of time

  // Find the last give consecutive days
  // with average temperature below 15
  sample_reverse_iterator rit=
    std::search_n(
      samples.rbegin(),
      samples.rend(),
      5,
      15,
      std::less<int>());

  if (rit!=samples.rend())
  {
     sample_iterator it=rit.base();
     // Because the match refers to the last element of 
     // the subsequence (we're searching backwards),
     // the iterator "it" must be decreased by 5 
     // (the size of the subsequence).
     std::advance(it,-5); 
    // Yes, this has happened before!
    std::cout << "\nYeah, it happened just " << 
      std::distance(it,samples.end()) << " days ago.\n";
    std::cout << "The values were: " 
      << (*it++) << ", " 
      << (*it++) << ", " 
      << (*it++) << ", " 
      << (*it++) << ", " 
      << (*it++) << "\n";
  }
  else
  {
    // This is the first time this has ever happened!
    std::cout << "Nah\n";
  }
}

By using reverse iterators, the sequence is searched backwards, which means that the first match is actually the last occurrence in the sequence. When you reverse the reversed iterator to get a normal iterator, which is done using the sample_reverse_iterator's member function base(), the returned iterator is actually referring to the last element of the matched subsequence. Thus, you need to move back in the sequence, and do so using the size of the subsequence to position the iterator correctly. You'll note that to get reverse iterators for a sequence, you need to call the member functions rbegin() and rend(), rather than begin() and end():

  sample_reverse_iterator rit=
    std::search_n(
      samples.rbegin(),
      samples.rend(),
      5,
      15,
      std::less<int>());

Later, when the sample_reverse_iterator rit is to be converted to a normal iterator (in order to traverse the sequence from beginning to end again), you must call the algorithm std::advance() with a negative value to position the sample_iterator it correctly, which is where the first element of the matched subsequence exists:

  std::advance(it,-5); 

One more thing to note before we end this section is that in this example, we were searching for a subsequence of equal values—or rather equal predicates—using search_n(). Had that not been the case, you would have used search() together with a subsequence that also would need to be reversed when searching, or else the only match you could possibly get would be one for the subsequence backwards.

 

Bidirectional Iterators

A bidirectional iterator can be incremented (using operator++) and decremented (using operator--). You can use a bidirectional to traverse a sequence both forwards and backwards, whereas with a forward iterator, you are only able to traverse the sequence in one direction.

The C++ Standard defines five iterator categories (input iterators, output iterators, forward iterators, bidirectional iterators, and random access iterators). However, it has been shown that this division isn't optimal, and there's a proposal for updating the C++ Standard with new iterator concepts, which you can read more about at Boost or at WG21.

As I mentioned earlier, it's not always possible to use search() or search_n() as a more efficient version of find_end(), because you will need to have at least bidirectional iterators to operate backwards on a sequence. If all you have is forward iterators, you'll need to traverse the sequence from the beginning.

Perhaps you're lucky enough to work with an STL implementation where find_end() works just like you saw in this example when invoked with bidirectional iterators. If not, you're now able to implement the optimization by hand.

Related Topics

This article has grouped together three of the algorithms that are used for searching a sequence for a subsequence. There are also other algorithms that help in searching. If you want to search for a specific element in a sequence, use std::find() [7]. std::adjacent_find() locates two equal adjacent elements in a sequence. If you operate on sorted sequences, there are several algorithms that can help you search for elements: binary_search(), lower_bound(), upper_bound(), and equal_range(). For searching for minimum and maximum values of elements, there are the algorithms min() and max(). Because so many programming problems relate to finding elements with certain properties, it is inevitable that a substantial part of the STL algorithms are related to searching.

The following table lists all of the algorithms in find_end()'s category.

Non-modifying sequence operations
for_each Applies a function to every element in a sequence
find
find_if
Finds a specific element in a sequence
find_end Finds the last occurrence of a subsequence in another sequence
find_first_of Finds an element that matches one of a set of values
adjacent_find Finds an adjacent pair of equal elements
count count_if Counts the number of elements with a specific value
mismatch Locates the first position where the elements in two sequences have different values
equal Tests if two sequences are equal
search search_n Finds the first occurrence of a subsequence in another sequence

Summary

You have seen how the algorithms search(), search_n(), and find_end() can help you to find subsequences in other sequences. You have used both search() and find_end() to locate series with specific values, and series that match a binary predicate. You have also seen search_n() used to operate backwards on a sequence for a more efficient search than the naive find_end() implementations. During the uses of these algorithms, you have also briefly seen the algorithm advance() in action, and had a look at reverse iterators. Searching for subsequences in other sequences is common, and with the help of search(), search_n(), and find(), you're well equipped to solve any such problem.

References

[1] The C++ Standard, ISO/IEC 14992-98, often referred to as "C++98," is available for purchase from the ANSI standards store at http://webstore.ansi.org/ansidocstore.
[2] To start learning about Standard Template Libray (STL) containers, I would suggest reading The C++ Standard Library : A Tutorial and Reference, by Nicolai Josuttis.
[3] Boost.Array is documented at http://www.boost.org/doc/html/array.html.
[4] If you haven't learned about reverse iterators, consult "STL Tutorial and Reference Guide: C++ Programming with the Standard Template Library (2nd Edition)", by David Musser, Gillmer Derge, and Atul Saini.s
[5] Microsoft Visual Studio's website is at http://msdn.microsoft.com/vstudio.
[5] Visit Comeau Computing's website at http://www.comeaucomputing.com.
[7] See the previous article in this series at C/C++ Users Journal's Experts Forum, http://www.cuj.com/experts.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.