C++ Set-Theoretic Operations on Virtual Containers

The C++ Standard Template Library (STL) provides the set-theoretic operations union, intersection, difference, and symmetric difference, accessible through the header file .


October 01, 2001
URL:http://www.drdobbs.com/cpp/c-set-theoretic-operations-on-virtual-co/184404803

Oct01: C++ Set-Theoretic Operations on Virtual Containers

The authors are members of the Algorithms Group at Zapper Technologies. They can be contacted at [email protected], [email protected], and [email protected], respectively.


The C++ Standard Template Library (STL) provides the set-theoretic operations union, intersection, difference, and symmetric difference, accessible through the header file <algorithm>. These operations (or "algorithms" in STL-ese) are applicable to sets, as well as to other sorted ranges with input iterators defined. Given two ranges of input iterators and an output iterator, set algorithms construct a new range implementing the desired operation.

When the result of a set operation is needed only for some interim manipulation (for sweeping over its elements once, for instance), two scenarios are possible with the conventional STL approach:

Consequently, we propose an alternative approach in which the output range is built in a "lazy" manner, its elements being computed only when needed; see Listing Three. To this end, we define a virtual container featuring a constant input iterator, const_iterator (note that the use of "virtual" here has nothing to do with virtual functions). Successively incrementing this iterator virtually traverses the elements of the resulting range in the correct order without actually constructing the range. The iterator has to be constant, since otherwise modifying the elements it points to might invalidate the ordering of input ranges. (The complete source-code implementation of this technique is available electronically; see "Resource Center," page 5.)

Our approach provides template-based classes (virtual containers) set_union_online, set_intersection_online, set_difference_online, set_symmetric_difference_online, and merge_online. (The inplace_merge algorithm is not suitable for this treatment because its main purpose is to actually merge its input ranges in the same memory space where the original elements reside.) The differences in logic of these operations are realized in different imple- mentations of the increment (op++()) and dereference (op*()) operators of each container's iterator.

Apparently, the approaches of both Listings Two and Three encapsulate the algorithm logic, but the latter has the advantage of doing so only once. While the callback function needs to be specifically designed for each application, we wrap the algorithms of the various set operations in smart iterators that can later be used elsewhere without the need for adaptation.

Implementation

To perform a set operation in Listing Three, we first instantiate the appropriate virtual container, then use its iterator to run through the elements of the output range (we use an input iterator so that dereferencing it yields the desired elements as if the operation outcome range has actually been constructed). To facilitate this approach, the algorithm for computing the desired operation (set_intersection) needs to be encapsulated in the iterator itself. The iterator's increment/dereference operators are thus defined in terms of three auxiliary functions — _pre_increment(), _dereference(), and _find_next() (the latter being used to identify the next element in the output range). These auxiliary functions realize the specific algorithm in hand. To prevent code duplication, we only implement the preincrement operator, which is then used to implement the postincrement version; see Listing Four. The iterator template in Listing Four only declares these auxiliary functions, while the full implementation is given later for each particular algorithm (set_union_online, set_intersection_online, and so on).

Our goal was to make the code compatible with a number of major C++ compilers. Consequently, since Microsoft's Visual C++ does not support partial template specialization, we opted for a slightly bulky, but portable, approach. The iterator template declares the three auxiliary functions for all available algorithms, but later each algorithm only implements those functions pertinent to its iterator proper. The linker eventually gets rid of all the unused declarations. This scheme operates a variant of the tag dispatching mechanism (see Generic Programming Techniques, http://www.boost.org/more/generic_programming.html), using the definitions of tags and supplementary macros of Listing Five.

The parameters of the iterator class template are:

Base Class for Virtual Containers

All virtual containers implementing set operations derive from the base class in Listing Six. The template parameters are mainly used to instantiate the smart iterator and are, therefore, identical to those of Listing Four. The base class contains the functionality shared by all set operations; namely, the iterator access functions begin() and end().

Set Algorithms

Listing Seven outlines the core implementation of set algorithms, using set intersection as an example. This code fragment first instantiates the container itself (using the symbolic intersection_tag), then implements the three auxiliary functions of the corresponding iterator:

Listing Seven uses a set of auxiliary macros to instantiate all the templates involved; these are depicted in Listing Eight. Implementation of other algorithms is mostly similar to that of set intersection.

Discussion

The implementation of set-theoretic operations for C++ we present here does not compromise computational efficiency or programming style. It also satisfies all the assumptions for STL set operations — namely time complexity, stability (where applicable), and support for multisets.

Additional functionality might be obtained by instantiating the virtual containers developed here on reverse iterators. When the two input ranges are given by reverse rather than forward iterators, it becomes possible to traverse the outcome of set operations backwards — a useful extension under certain circumstances.

Future extensions to this work can relax the requirements of uniformity on input/output ranges. For example, in some cases it may be necessary to merge sequences of records with different structures, performing a particular set operation based on (comparable) keys available in both kinds of records. Another extension can generalize over the type of output elements, creating a sequence of elements of some new type, different from that of input items.

Acknowledgment

Thanks to Alex Gontmakher for his helpful comments.

DDJ

Listing One

void temp_intersection() {
   vector<int> vec1, vec2, temp_result;
   ... // prepare input sequences in 'vec1' and 'vec2'
   set_intersection(vec1.begin(), vec1.end(), vec2.begin(), vec2.end(), 
    back_inserter(temp_result)); // build the result in a temporary container
   for (int i = 0; i < temp_result.size(); ++i) {
      // do something
   }
}

Back to Article

Listing Two

class callback {
      void do_something(int v) {
         // do something
      }
public:
      callback &operator=(int v) { do_something(v); return *this;}
      callback &operator*() { return *this; }
      // op++ for output iterators has a dummy implementation
      callback &operator++() { return *this; }
      callback &operator++(int) { return *this; }
};
void callback_intersection() {
   vector<int> vec1, vec2;
   ...
   set_intersection(vec1.begin(), vec1.end(),
 	        vec2.begin(), vec2.end(), callback());
}

Back to Article

Listing Three

void online_intersection() {
   vector<int> vec1, vec2;
   ...
   typedef set_intersection_online<int, 
                                   vector<int>::iterator> IntersectionOnline;
   IntersectionOnline res(vec1.begin(), vec1.end(), vec2.begin(), vec2.end());
   for (IntersectionOnline::const_iterator iter = 
        res.begin(); iter != res.end(); ++iter) {
      // do something
   }
}

Back to Article

Listing Four

template<class _T, class _Iter1, 
                   class _Iter2, class _StrictWeakOrdering, class _Tag>
class _const_set_online_iterator {
public:
   typedef _T value_type;     
private:
   // Iterators to the current and last elements of the input ranges
   _Iter1 _current1, _last1;
   _Iter2 _current2, _last2;
   // Possible outcomes of comparing the current elements in the input ranges:
   // FIRST      : _StrictWeakOrdering(*current1, *current2) == true
   //                   or the second range has been exhausted              
   // SECOND : _StrictWeakOrdering(*current2, *current1) == true
   //                   or the first range has been exhausted
   // EQUAL    : neither the first nor the second condition is true
   enum compare_state { FIRST, SECOND, EQUAL };

   // The comparison state is used in iterator increment and dereference
   compare_state _compare;
      
   compare_state get_compare_state() const {
      if (_current1 == _last1) return SECOND;
      if (_current2 == _last2) return FIRST;
      if (_StrictWeakOrdering()(*_current1, *_current2)) return FIRST;
      if (_StrictWeakOrdering()(*_current2, *_current1)) return SECOND;
      return EQUAL;
   }

   // These functions need to be defined with the corresponding tag parameter:
   // inline _const_set_online_iterator& _pre_increment(const _Tag&);
   // inline void _find_next(const _Tag&);
   // inline const value_type _dereference(const _Tag&) const;
   //
   // Since MS VC++ does not support partial template specialization we 
   // declare these functions for all possible tags. The implementation, 
   // however, is provided only for the pertinent tag.
   INSTANTIATE_FOR_ALL_OPERATIONS(DECLARE_PRE_INCREMENT)
   INSTANTIATE_FOR_ALL_OPERATIONS(DECLARE_DEREFERENCE)
   INSTANTIATE_FOR_ALL_OPERATIONS(DECLARE_FIND_NEXT)
   
   void find_next() { 
      _find_next(_Tag()); 
   }
public:
   // The constructor takes two boundary elements for each of the input ranges
   _const_set_online_iterator(const _Iter1& current1, const _Iter1& last1,
                   const _Iter2& current2, const _Iter2& last2) :
      _current1(current1), _last1(last1), _current2(current2), _last2(last2)
   {
      _compare = get_compare_state(); 
                                // update compare state for current elements
      find_next();  // find the next element for which the predicate is true
   }
   // Basic operators
   bool operator==(const _const_set_online_iterator& rhs) const {
      return ((_current1 == rhs._current1) && (_current2 == rhs._current2));
   }
   bool operator!=(const _const_set_online_iterator& rhs) const {
      return !operator==(rhs);
   }
   _const_set_online_iterator& operator++() {
      return _pre_increment(_Tag());
   }
   _const_set_online_iterator& operator++(int) {
      _const_set_online_iterator temp = *this;
      ++(*this);
      return temp;
   }
   const value_type operator*() const { 
      return _dereference(_Tag()); 
   }
};

Back to Article

Listing Five

struct union_tag {};
struct intersection_tag {};
struct difference_tag {};
struct symmetric_difference_tag {};
struct merge_tag {};

// Declaration of the pre-increment operator
#define DECLARE_PRE_INCREMENT(tag)               \
inline _const_set_online_iterator& _pre_increment(const tag&);

// Declaration of the dereference operator
#define DECLARE_DEREFERENCE(tag)                 \
inline const value_type& _dereference(const tag&) const;

// Declaration of 'find_next' function
#define DECLARE_FIND_NEXT(tag)                    \ 
inline void _find_next(const tag&);

#define INSTANTIATE_FOR_ALL_OPERATIONS(MACRO)     \
MACRO(union_tag)                                  \
MACRO(intersection_tag)                           \
MACRO(difference_tag)                             \
MACRO(symmetric_difference_tag)                   \
MACRO(merge_tag)

Back to Article

Listing Six

template<class _T, class _Iter1, class _Iter2, 
                                 class _StrictWeakOrdering, class _Tag>
class _set_online {
   _Iter1 _first1, _last1;
   _Iter2 _first2, _last2;
public:
   typedef _T value_type;
   typedef _const_set_online_iterator<_T, _Iter1, 
                                    _Iter2, _StrictWeakOrdering, _Tag> 
                _const_iterator;
   _set_online(_Iter1 first1, _Iter1 last1, _Iter2 first2, _Iter2 last2) :
      _first1(first1), _last1(last1), _first2(first2), _last2(last2) {}
   _const_iterator begin() const { return _const_iterator(_first1, 
                                               _last1, _first2, _last2); }
   _const_iterator end() const { return _const_iterator(_last1, 
                                               _last1, _last2, _last2); }
};

Back to Article

Listing Seven

INSTANTIATE_SET_ONLINE(set_intersection_online, intersection_tag)

INSTANTIATE_PRE_INCREMENT(intersection_tag)
{
   ++_current1;
   ++_current2;
   find_next();
   return *this;
}
INSTANTIATE_DEREFERENCE(intersection_tag)
{
   return *_current1;
}
INSTANTIATE_FIND_NEXT(intersection_tag)
{
   if (_current1 == _last1 || _current2 == _last2) {
      _current1 = _last1;
      _current2 = _last2;
      return;
   }
   while (_StrictWeakOrdering()(*_current1, *_current2) || 
    _StrictWeakOrdering()(*_current2, *_current1)) 
   {
      while ((_current1 != _last1) && 
        _StrictWeakOrdering()(*_current1, *_current2))
         ++_current1;
            
      if (_current1 == _last1) {
         _current2 = _last2;
         return;
      }
      while ((_current2 != _last2) && 
        _StrictWeakOrdering()(*_current2, *_current1))
         ++_current2;
      if (_current2 == _last2) {
         _current1 = _last1;
         return;
      }
   }
}

Back to Article

Listing Eight

// Auxiliary macro for instantiating "virtual containers"
#define INSTANTIATE_SET_ONLINE(classname, tag)          \
template<class _T,                                      \
         class _Iter1,                                  \
         class _Iter2 = _Iter1,                         \
         class _StrictWeakOrdering = std::less<_T> >    \
class classname :                                       \
   public _set_online<_T, _Iter1, _Iter2, _StrictWeakOrdering, tag>     \
{                                                       \
public:                                                 \
   typedef _T value_type;                               \
   classname (_Iter1 first1, _Iter1 last1, _Iter2 first2, _Iter2 last2) :  \
      _set_online<_T, _Iter1, _Iter2, _StrictWeakOrdering, tag> \
   (first1, last1, first2, last2)                       \
   {}                                                   \
};
// Auxiliary macro for instantiating the pre-increment operator
#define INSTANTIATE_PRE_INCREMENT(tag)                  \
template<class _T, class _Iter1, class _Iter2, class _StrictWeakOrdering,   \
               class _Tag>                              \
inline _const_set_online_iterator<_T, _Iter1, _Iter2,   \
                                         _StrictWeakOrdering, _Tag>&      \
   _const_set_online_iterator<_T, _Iter1, _Iter2,       \
                                       _StrictWeakOrdering, _Tag>::       \
_pre_increment(const tag&)
// Auxiliary macro for instantiating the dereference operator
#define INSTANTIATE_DEREFERENCE(tag)                    \
template<class _T, class _Iter1, class _Iter2, class _StrictWeakOrdering,  \
              class _Tag>                               \
inline const _const_set_online_iterator<_T, _Iter1, _Iter2,                \
                                 _StrictWeakOrdering, _Tag>::value_type    \
   &_const_set_online_iterator<_T, _Iter1, _Iter2,       \
                       _StrictWeakOrdering, _Tag>::      \
_dereference(const tag&) const

// Auxiliary macro for instantiating the "find_next" function
#define INSTANTIATE_FIND_NEXT(tag)                      \
template<class _T, class _Iter1, class _Iter2, class _StrictWeakOrdering,  \
               class _Tag>                              \
inline void _const_set_online_iterator<_T, _Iter1, _Iter2,              \
                                     _StrictWeakOrdering, _Tag>::        \
_find_next(const tag&)

Back to Article

Terms of Service | Privacy Statement | Copyright © 2024 UBM Tech, All rights reserved.