Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

C/C++

C++ Set-Theoretic Operations on Virtual Containers


Oct01: C++ Set-Theoretic Operations on Virtual Containers

The authors are members of the Algorithms Group at Zapper Technologies. They can be contacted at [email protected], [email protected], and [email protected], respectively.


The C++ Standard Template Library (STL) provides the set-theoretic operations union, intersection, difference, and symmetric difference, accessible through the header file <algorithm>. These operations (or "algorithms" in STL-ese) are applicable to sets, as well as to other sorted ranges with input iterators defined. Given two ranges of input iterators and an output iterator, set algorithms construct a new range implementing the desired operation.

When the result of a set operation is needed only for some interim manipulation (for sweeping over its elements once, for instance), two scenarios are possible with the conventional STL approach:

  • The outcome range is actually constructed in a temporary data structure, filled through an insert iterator (that is, an adaptor that implements the interface of an output iterator so that any element assigned to it is inserted into the underlying container). Listing One illustrates this approach by computing an intersection of two sets of integers. An obvious drawback of this approach is the overhead to create (and eventually to deallocate) this auxiliary structure with all the element copying involved.
  • The code fragment to be applied to the outcome range is encapsulated in an iterator-like object that satisfies all the assumptions of an output iterator. This object is passed as an output parameter to set operations, and thus gets invoked on each element of the resulting range as a callback function. Listing Two implements this technique for the intersection operation. A disadvantage of this technique is that detaching the code from regular program flow and encapsulating it in an auxiliary object makes the implementation cumbersome (especially when the code fragment uses multiple external variables defined elsewhere).

Consequently, we propose an alternative approach in which the output range is built in a "lazy" manner, its elements being computed only when needed; see Listing Three. To this end, we define a virtual container featuring a constant input iterator, const_iterator (note that the use of "virtual" here has nothing to do with virtual functions). Successively incrementing this iterator virtually traverses the elements of the resulting range in the correct order without actually constructing the range. The iterator has to be constant, since otherwise modifying the elements it points to might invalidate the ordering of input ranges. (The complete source-code implementation of this technique is available electronically; see "Resource Center," page 5.)

Our approach provides template-based classes (virtual containers) set_union_online, set_intersection_online, set_difference_online, set_symmetric_difference_online, and merge_online. (The inplace_merge algorithm is not suitable for this treatment because its main purpose is to actually merge its input ranges in the same memory space where the original elements reside.) The differences in logic of these operations are realized in different imple- mentations of the increment (op++()) and dereference (op*()) operators of each container's iterator.

Apparently, the approaches of both Listings Two and Three encapsulate the algorithm logic, but the latter has the advantage of doing so only once. While the callback function needs to be specifically designed for each application, we wrap the algorithms of the various set operations in smart iterators that can later be used elsewhere without the need for adaptation.

Implementation

To perform a set operation in Listing Three, we first instantiate the appropriate virtual container, then use its iterator to run through the elements of the output range (we use an input iterator so that dereferencing it yields the desired elements as if the operation outcome range has actually been constructed). To facilitate this approach, the algorithm for computing the desired operation (set_intersection) needs to be encapsulated in the iterator itself. The iterator's increment/dereference operators are thus defined in terms of three auxiliary functions — _pre_increment(), _dereference(), and _find_next() (the latter being used to identify the next element in the output range). These auxiliary functions realize the specific algorithm in hand. To prevent code duplication, we only implement the preincrement operator, which is then used to implement the postincrement version; see Listing Four. The iterator template in Listing Four only declares these auxiliary functions, while the full implementation is given later for each particular algorithm (set_union_online, set_intersection_online, and so on).

Our goal was to make the code compatible with a number of major C++ compilers. Consequently, since Microsoft's Visual C++ does not support partial template specialization, we opted for a slightly bulky, but portable, approach. The iterator template declares the three auxiliary functions for all available algorithms, but later each algorithm only implements those functions pertinent to its iterator proper. The linker eventually gets rid of all the unused declarations. This scheme operates a variant of the tag dispatching mechanism (see Generic Programming Techniques, http://www.boost.org/more/generic_programming.html), using the definitions of tags and supplementary macros of Listing Five.

The parameters of the iterator class template are:

  • class _T, a value type.
  • class _Iter1, class _Iter2, iterator types for the two input ranges.

  • class _StrictWeakOrdering, a binary predicate for comparing the objects of the two input ranges.

  • class _Tag, an auxiliary class for distinguishing the implementations of various set operations through the tag dispatching mechanism.

Base Class for Virtual Containers

All virtual containers implementing set operations derive from the base class in Listing Six. The template parameters are mainly used to instantiate the smart iterator and are, therefore, identical to those of Listing Four. The base class contains the functionality shared by all set operations; namely, the iterator access functions begin() and end().

Set Algorithms

Listing Seven outlines the core implementation of set algorithms, using set intersection as an example. This code fragment first instantiates the container itself (using the symbolic intersection_tag), then implements the three auxiliary functions of the corresponding iterator:

  • _pre_increment(), which underlies the implementation of op++(), advances the current position in both of its input ranges, and scans them for the next element to be included in the intersection.
  • _dereference(), used in the dereference operator op*(), returns the element found earlier by the increment operator or the iterator constructor (should the dereference be invoked immediately upon the iterator construction).

  • _find_next(), which implements the essence of the set intersection algorithm, looks for the next element to be added to the resultant set.

Listing Seven uses a set of auxiliary macros to instantiate all the templates involved; these are depicted in Listing Eight. Implementation of other algorithms is mostly similar to that of set intersection.

Discussion

The implementation of set-theoretic operations for C++ we present here does not compromise computational efficiency or programming style. It also satisfies all the assumptions for STL set operations — namely time complexity, stability (where applicable), and support for multisets.

Additional functionality might be obtained by instantiating the virtual containers developed here on reverse iterators. When the two input ranges are given by reverse rather than forward iterators, it becomes possible to traverse the outcome of set operations backwards — a useful extension under certain circumstances.

Future extensions to this work can relax the requirements of uniformity on input/output ranges. For example, in some cases it may be necessary to merge sequences of records with different structures, performing a particular set operation based on (comparable) keys available in both kinds of records. Another extension can generalize over the type of output elements, creating a sequence of elements of some new type, different from that of input items.

Acknowledgment

Thanks to Alex Gontmakher for his helpful comments.

DDJ

Listing One

void temp_intersection() {
   vector<int> vec1, vec2, temp_result;
   ... // prepare input sequences in 'vec1' and 'vec2'
   set_intersection(vec1.begin(), vec1.end(), vec2.begin(), vec2.end(), 
    back_inserter(temp_result)); // build the result in a temporary container
   for (int i = 0; i < temp_result.size(); ++i) {
      // do something
   }
}

Back to Article

Listing Two

class callback {
      void do_something(int v) {
         // do something
      }
public:
      callback &operator=(int v) { do_something(v); return *this;}
      callback &operator*() { return *this; }
      // op++ for output iterators has a dummy implementation
      callback &operator++() { return *this; }
      callback &operator++(int) { return *this; }
};
void callback_intersection() {
   vector<int> vec1, vec2;
   ...
   set_intersection(vec1.begin(), vec1.end(),
 	        vec2.begin(), vec2.end(), callback());
}

Back to Article

Listing Three

void online_intersection() {
   vector<int> vec1, vec2;
   ...
   typedef set_intersection_online<int, 
                                   vector<int>::iterator> IntersectionOnline;
   IntersectionOnline res(vec1.begin(), vec1.end(), vec2.begin(), vec2.end());
   for (IntersectionOnline::const_iterator iter = 
        res.begin(); iter != res.end(); ++iter) {
      // do something
   }
}

Back to Article

Listing Four

template<class _T, class _Iter1, 
                   class _Iter2, class _StrictWeakOrdering, class _Tag>
class _const_set_online_iterator {
public:
   typedef _T value_type;     
private:
   // Iterators to the current and last elements of the input ranges
   _Iter1 _current1, _last1;
   _Iter2 _current2, _last2;
   // Possible outcomes of comparing the current elements in the input ranges:
   // FIRST      : _StrictWeakOrdering(*current1, *current2) == true
   //                   or the second range has been exhausted              
   // SECOND : _StrictWeakOrdering(*current2, *current1) == true
   //                   or the first range has been exhausted
   // EQUAL    : neither the first nor the second condition is true
   enum compare_state { FIRST, SECOND, EQUAL };

   // The comparison state is used in iterator increment and dereference
   compare_state _compare;
      
   compare_state get_compare_state() const {
      if (_current1 == _last1) return SECOND;
      if (_current2 == _last2) return FIRST;
      if (_StrictWeakOrdering()(*_current1, *_current2)) return FIRST;
      if (_StrictWeakOrdering()(*_current2, *_current1)) return SECOND;
      return EQUAL;
   }

   // These functions need to be defined with the corresponding tag parameter:
   // inline _const_set_online_iterator& _pre_increment(const _Tag&);
   // inline void _find_next(const _Tag&);
   // inline const value_type _dereference(const _Tag&) const;
   //
   // Since MS VC++ does not support partial template specialization we 
   // declare these functions for all possible tags. The implementation, 
   // however, is provided only for the pertinent tag.
   INSTANTIATE_FOR_ALL_OPERATIONS(DECLARE_PRE_INCREMENT)
   INSTANTIATE_FOR_ALL_OPERATIONS(DECLARE_DEREFERENCE)
   INSTANTIATE_FOR_ALL_OPERATIONS(DECLARE_FIND_NEXT)
   
   void find_next() { 
      _find_next(_Tag()); 
   }
public:
   // The constructor takes two boundary elements for each of the input ranges
   _const_set_online_iterator(const _Iter1& current1, const _Iter1& last1,
                   const _Iter2& current2, const _Iter2& last2) :
      _current1(current1), _last1(last1), _current2(current2), _last2(last2)
   {
      _compare = get_compare_state(); 
                                // update compare state for current elements
      find_next();  // find the next element for which the predicate is true
   }
   // Basic operators
   bool operator==(const _const_set_online_iterator& rhs) const {
      return ((_current1 == rhs._current1) && (_current2 == rhs._current2));
   }
   bool operator!=(const _const_set_online_iterator& rhs) const {
      return !operator==(rhs);
   }
   _const_set_online_iterator& operator++() {
      return _pre_increment(_Tag());
   }
   _const_set_online_iterator& operator++(int) {
      _const_set_online_iterator temp = *this;
      ++(*this);
      return temp;
   }
   const value_type operator*() const { 
      return _dereference(_Tag()); 
   }
};

Back to Article

Listing Five

struct union_tag {};
struct intersection_tag {};
struct difference_tag {};
struct symmetric_difference_tag {};
struct merge_tag {};

// Declaration of the pre-increment operator
#define DECLARE_PRE_INCREMENT(tag)               \
inline _const_set_online_iterator& _pre_increment(const tag&);

// Declaration of the dereference operator
#define DECLARE_DEREFERENCE(tag)                 \
inline const value_type& _dereference(const tag&) const;

// Declaration of 'find_next' function
#define DECLARE_FIND_NEXT(tag)                    \ 
inline void _find_next(const tag&);

#define INSTANTIATE_FOR_ALL_OPERATIONS(MACRO)     \
MACRO(union_tag)                                  \
MACRO(intersection_tag)                           \
MACRO(difference_tag)                             \
MACRO(symmetric_difference_tag)                   \
MACRO(merge_tag)

Back to Article

Listing Six

template<class _T, class _Iter1, class _Iter2, 
                                 class _StrictWeakOrdering, class _Tag>
class _set_online {
   _Iter1 _first1, _last1;
   _Iter2 _first2, _last2;
public:
   typedef _T value_type;
   typedef _const_set_online_iterator<_T, _Iter1, 
                                    _Iter2, _StrictWeakOrdering, _Tag> 
                _const_iterator;
   _set_online(_Iter1 first1, _Iter1 last1, _Iter2 first2, _Iter2 last2) :
      _first1(first1), _last1(last1), _first2(first2), _last2(last2) {}
   _const_iterator begin() const { return _const_iterator(_first1, 
                                               _last1, _first2, _last2); }
   _const_iterator end() const { return _const_iterator(_last1, 
                                               _last1, _last2, _last2); }
};

Back to Article

Listing Seven

INSTANTIATE_SET_ONLINE(set_intersection_online, intersection_tag)

INSTANTIATE_PRE_INCREMENT(intersection_tag)
{
   ++_current1;
   ++_current2;
   find_next();
   return *this;
}
INSTANTIATE_DEREFERENCE(intersection_tag)
{
   return *_current1;
}
INSTANTIATE_FIND_NEXT(intersection_tag)
{
   if (_current1 == _last1 || _current2 == _last2) {
      _current1 = _last1;
      _current2 = _last2;
      return;
   }
   while (_StrictWeakOrdering()(*_current1, *_current2) || 
    _StrictWeakOrdering()(*_current2, *_current1)) 
   {
      while ((_current1 != _last1) && 
        _StrictWeakOrdering()(*_current1, *_current2))
         ++_current1;
            
      if (_current1 == _last1) {
         _current2 = _last2;
         return;
      }
      while ((_current2 != _last2) && 
        _StrictWeakOrdering()(*_current2, *_current1))
         ++_current2;
      if (_current2 == _last2) {
         _current1 = _last1;
         return;
      }
   }
}

Back to Article

Listing Eight

// Auxiliary macro for instantiating "virtual containers"
#define INSTANTIATE_SET_ONLINE(classname, tag)          \
template<class _T,                                      \
         class _Iter1,                                  \
         class _Iter2 = _Iter1,                         \
         class _StrictWeakOrdering = std::less<_T> >    \
class classname :                                       \
   public _set_online<_T, _Iter1, _Iter2, _StrictWeakOrdering, tag>     \
{                                                       \
public:                                                 \
   typedef _T value_type;                               \
   classname (_Iter1 first1, _Iter1 last1, _Iter2 first2, _Iter2 last2) :  \
      _set_online<_T, _Iter1, _Iter2, _StrictWeakOrdering, tag> \
   (first1, last1, first2, last2)                       \
   {}                                                   \
};
// Auxiliary macro for instantiating the pre-increment operator
#define INSTANTIATE_PRE_INCREMENT(tag)                  \
template<class _T, class _Iter1, class _Iter2, class _StrictWeakOrdering,   \
               class _Tag>                              \
inline _const_set_online_iterator<_T, _Iter1, _Iter2,   \
                                         _StrictWeakOrdering, _Tag>&      \
   _const_set_online_iterator<_T, _Iter1, _Iter2,       \
                                       _StrictWeakOrdering, _Tag>::       \
_pre_increment(const tag&)
// Auxiliary macro for instantiating the dereference operator
#define INSTANTIATE_DEREFERENCE(tag)                    \
template<class _T, class _Iter1, class _Iter2, class _StrictWeakOrdering,  \
              class _Tag>                               \
inline const _const_set_online_iterator<_T, _Iter1, _Iter2,                \
                                 _StrictWeakOrdering, _Tag>::value_type    \
   &_const_set_online_iterator<_T, _Iter1, _Iter2,       \
                       _StrictWeakOrdering, _Tag>::      \
_dereference(const tag&) const

// Auxiliary macro for instantiating the "find_next" function
#define INSTANTIATE_FIND_NEXT(tag)                      \
template<class _T, class _Iter1, class _Iter2, class _StrictWeakOrdering,  \
               class _Tag>                              \
inline void _const_set_online_iterator<_T, _Iter1, _Iter2,              \
                                     _StrictWeakOrdering, _Tag>::        \
_find_next(const tag&)

Back to Article


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.