A Container for a Set of Ranges

If you need to represent an ordered list of ranges, the best data structure is probably somewhere between a list and a set.


June 01, 1999
URL:http://www.drdobbs.com/a-container-for-a-set-of-ranges/184403660

June 1999/A Container for a Set of Ranges/Figure 1

Figure 1: Partial listing of range_set.h — defines the range_set class

#ifndef RANGE_SET_H
#define RANGE_SET_H

// Copyright 1998 (c) Andrew W. Phillips
// You can freely use this software for any purpose
// as long as you preserve this copyright notice.

#include <list>
#include <iterator>
#include <utility>                  // for make_pair<T1,T2>()   
#include <functional>               // for less<T>
#include <istream>                  // used in << and >>
#include <ios>
#include <cassert>                  // for assert()

template <class T, class Pred = std::less<T>,
          class Alloc = std::allocator<T> >
class range_set
{
public:
    // Define standard types for STL container
    typedef Alloc allocator_type;
    typedef Alloc::size_type size_type;
    typedef Alloc::difference_type difference_type;
    typedef Alloc::pointer pointer;
    typedef Alloc::const_pointer const_pointer;
    typedef Alloc::reference reference;
    typedef Alloc::const_reference const_reference;
    typedef Alloc::value_type value_type;
    typedef Pred value_compare;
    typedef value_type key_type;        // Value == key for sets
    typedef value_compare key_compare;

    // All iterators are const (mods via iterators not allowed)
    class const_iterator;
    typedef const_iterator iterator;

private:
    friend class const_iterator;

    // Class segment: one contiguous block from the set
    // Note: we could have just used std::pair<T,T> rather
    // than creating this class, but this may have hindered
    // future additions.
    struct segment
    {
        T sfirst;           // Bottom of segment
        T slast;            // One past the top of the segment

        // Constructor
        segment(T fst, T lst) : sfirst(fst), slast(lst) { }
    };
    typedef std::list<segment> range_t;
    mutable range_t range_; // All segments for this range_set
    Pred compare_;          // Object used for comparing elts
    Alloc allocator_;       // Object used for allocating elts
    bool increasing_;       // compare_ uses increasing order?

    // not shown: functions insert_helper(), erase_helper()

    T one_more(const T &v) const
    {
        // Since we store "half-open" ranges we need a way to
        // work out what is "one more" than a value.  If we're
        // using less<T> for comparisons this is done by adding
        // 1 but if using greater<T> then we must subtract.
        return increasing_ ? v + T(1) : v - T(1);
    }
    T one_less(const T &v) const
    {
        // We similarly obtain the value "before" another value.
        return increasing_ ? v - T(1) : v + T(1);
    }

public:
    // Iterators
    class const_iterator :
        public std::iterator<std::bidirectional_iterator_tag,
                             T, difference_type>
    {
    private:
        // Since the container does not "contain" all the actual
        // values the iterator itself must contain the current
        // value pointed to.  We must also store an iterator
        // into the list of segments so we know what the
        // valid values are.
        T value_;                      // The value "pointed" to
        mutable range_t::iterator pr_; // Segment with the value

        // We must also access a few things in the container.
        const range_set *pc_;          // Pointer to container

        const_iterator(const range_t::iterator &p,
                       const range_set *pc, const T &v)
            : pr_(p), pc_(pc), value_(v)
        {
            // Check that we are creating a valid iterator
            assert(pr_ == pc_->range_.end() || 
                   (!pc_->compare_(v, pr_->sfirst) && 
                    pc_->compare_(v, pr_->slast)));
        }

    public:
        friend class range_set<T,Pred,Alloc>;
        const_iterator() : pc_(0)   // Default constructor
        {
        }

        // Use compiler generated copy constructor,
        // copy assignment operator, and destructor

        const_reference operator*() const
        {
            // Check that the iterator is valid and
            // that we don't dereference end()
            assert(pc_ != 0);
            assert(pr_ != pc_->range_.end());

            return value_;
        }
        const_pointer operator->() const
        {
            assert(pc_ != 0);
            assert(pr_ != pc_->range_.end());

            return &value_;
        }
        const_iterator &operator++()
        {
            // Check that the iterator is valid and
            // that we don't try to move past the end()
            assert(pc_ != 0);
            assert(pr_ != pc_->range_.end());

            value_ = pc_->one_more(value_);
            if (!pc_->compare_(value_,pr_->slast) &&
                ++pr_ != pc_->range_.end())
            {
                value_ = pr_->sfirst;
            }
            return *this;
        }
        const_iterator operator++(int)
        {
            assert(pc_ != 0);           // Ensure iter is valid
            const_iterator tmp(*this);
            ++*this;
            return tmp;
        }
        const_iterator &operator--()
        {
            assert(pc_ != 0);           // Ensure iter is valid

            // Check that we don't move back before start
            assert(pc_->range_.begin() != pc_->range_.end());
            assert(pr_ != pc_->range_.begin() ||
                   pc_->compare_(pr_->sfirst, value_));

            value_ = pc_->one_less(value_);
            if (pr_ == pc_->range_.end() || 
                pc_->compare_(value_, pr_->sfirst))
            {
                --pr_;
                value_ = pc_->one_less(pr_->slast);
            }
            return *this;
        }
        const_iterator operator--(int)
        {
            assert(pc_ != 0);           // Ensure iter is valid
            const_iterator tmp(*this);
            --*this;
            return tmp;
        }
        bool operator==(const const_iterator& p) const
        {
            assert(pc_ != 0);           // Ensure iter is valid
            assert(pc_ == p.pc_);       // Same container?

            if (pr_ != p.pr_)
                // Different segments so they must be different
                return false;
            else if (pr_ == pc_->range_.end())
                // Both are at end so they compare the same
                return true;
            else
                // The iterators are now the same if their 
                // value_s are the same.  The following tests
                // for equality using compare_.
                // [Note that A == B is equivalent to
                //  A >= B && A <= B  or  !(A < B) && !(B < A)]
                return !pc_->compare_(value_, p.value_) && 
                       !pc_->compare_(p.value_, value_);
        }
        bool operator!=(const const_iterator& p) const
        {
            return !operator==(p);
        }
    };

    // Create a reverse iterator based on normal (forward) iter
    typedef std::reverse_bidirectional_iterator<const_iterator,
                                value_type, const_reference,
                                const_pointer, difference_type>
            const_reverse_iterator;
    typedef const_reverse_iterator reverse_iterator;

    // Constructors
    explicit range_set(const Pred &p = Pred(),
                       const Alloc &a = Alloc())
        : compare_(p), allocator_(a)
    {
        increasing_ = compare_(T(0), T(1));
    }

    // not shown, range_set constructor based on member templates
    // ...

    range_set(const T *p1, const T *p2,
              const Pred &p = Pred(), const Alloc &a = Alloc())
        : compare_(p), allocator_(a)
    {
        increasing_ = compare_(T(0), T(1));

        const_iterator last_pos = begin();
        while (p1 != p2)
            last_pos = insert(last_pos, *p1++);
    }
    range_set(const_iterator p1, const const_iterator &p2,
              const Pred &p = Pred(), const Alloc &a = Alloc())
        : compare_(p), allocator_(a)
    {
        assert(p1.pc_ == p2.pc_);       // Same container?
        assert(p1.pc_ != 0);            // ... and valid?

        increasing_ = compare_(T(0), T(1));

        const_iterator last_pos = begin();
        while (p1 != p2)
            last_pos = insert(last_pos, *p1++);
    }

    // Use compiler generated copy constructor,
    // copy assignment operator, and destructor

    // not shown: functions begin(), end(), rbegin(), rend(), 
    // get_allocator(), max_size(), size(), empty(), and insert()
 
    std::pair<const_iterator, const_iterator>
    insert_range(const value_type &ss, const value_type &ee)
    {
        return insert_helper(range_.begin(), ss, ee);
    }
    std::pair<const_iterator, const_iterator>
    insert_range(const const_iterator &p, const value_type &ss,
                 const value_type &ee)
    {
        assert(p.pc_ == this);          // For this container?
        return insert_helper(p.pr_, ss, ee);
    }

    // not shown: erase() functions

    const_iterator erase_range(const key_type &ss,
                               const key_type &ee)
    {
        return erase_helper(range_.begin(), ss, ee);
    }
    const_iterator erase_range(const const_iterator &p,
                       const key_type &ss, const key_type &ee)
    {
        assert(p.pc_ == this);         // For this container?
        return erase_helper(p.pr_, ss, ee);
    }

    // not shown: functions clear(), swap(), key_comp(), find(),
    // count(), lower_bound(), upper_bound(), operator>>(),
    // operator<<(), operator==(), operator!=(), operator<(),
    // operator>(), operator<=(), operator>=()
};
#endif


June 1999/A Container for a Set of Ranges

A Container for a Set of Ranges

Andrew Phillips

If you need to represent an ordered list of ranges, the best data structure is probably somewhere between a list and a set.


This article describes a template class, called range_set, that is useful for efficiently representing large sets of integers as an ordered set of ranges. Aside from a few restrictions on the contained type, it is completely compatible with the STL container std::set. It furthermore has facilities for adding and deleting ranges of values to the container.

In the right circumstances, this class can provide large benefits over std::set in memory usage. Lookup can even be faster when you have a set of integers which is mainly composed of large "clumps," or contiguous ranges of values. Since implementing the precursor of range_set, I have found that these circumstances tend to occur surprisingly often.

For example, I wrote the original C++ class on which range_set was based to keep track of the selection of up to two billion elements in a custom list-box control. Using the list-box user interface (clicking, dragging, and shift-clicking) the user can easily select a few individual elements, a few contiguous chunks of them, or even the whole lot. Unless the user decides to do something unusual, like selecting every second element (which for two billion elements would literally take a lifetime), this class is much more space efficient than std::set.

The original class, created many years ago, did not use templates because it was written before compilers implemented them, and before I had even heard of the STL. It worked only with elements of type int. So I decided to "templatize" it by changing all occurrences of int to a template parameter. On further consideration, I realised that the class was really a cut-down version std::set<int> with a few additions for adding and deleting ranges of values and converting ranges to and from C-style strings. There's nothing I would have liked more than to replace my original class with an STL container. Unfortunately, for my requirements there was no STL container that was as convenient or efficient. Instead I decided to create a new class that was as compatible with std::set as possible.

Although it took a little more work than I expected, all that was really necessary was to rearrange some of the code from the original class and to create iterators and a few utility functions, such as upper_bound, lower_bound, count, etc. The complete class is implemented in the file range_set.h, and is available online (see p. 3 for downloading instructions). A partial listing is shown in Figure 1.

Design Choices

My initial impulse was simply to create a class that behaved as much like std::set as possible. But before writing any new code, I then gave consideration to other implementation choices that might improve the flexibility, efficiency, and potential uses of the class. I first considered deriving it from std::set. But there was no advantage in doing this, and it seemed contrary to the way things were done in the STL. I also considered making it a partial specialization of std::set. However, I could not see any way to make a specialization based on the contained type being an integral type. In some circumstances it's preferable to use the real std::set<int> rather than be forced to use the specialization. Moreover, the new class was more than a specialization since it added new facilities.

Lastly, I considered making it a container adapter. This would allow adapting an STL container such as std::set or std::list, depending on the requirements. After some experimentation I decided this was not appropriate for a number of reasons. But the main reason was that the resulting class would not be as compatible with std::set as I wanted. So I returned to my original impulse and created a class called range_set.

Using range_set

Template class range_set is typically used to contain elements of type int, long, or some other integral type. Iterators and member functions necessarily assume that the contained type implements operator+ and operator-. Of course, you can create or use any class as long as it implements these operators with the semantics of an integral type.

There would probably be little profit in using range_set as a simple replacement for std::set and not take advantage of the extra facilities it provides. As you might expect, range_set has member functions to add and delete ranges of values. These are called insert_range and delete_range. There are also versions of insert_range and delete_range that take a "hint" iterator. This is similar to the version of std::set::insert that takes an iterator parameter, to provide for more efficient insertions. If the hint iterator points to the place in the range_set where the change is to take place, the operation is performed in constant time rather than linear time.

In accordance with STL and C/C++ generally, ranges passed to insert_range and delete_range are "half open." That is, they are specified with an inclusive lower bound and an exclusive upper bound. Hence a.insert_range(2, 4) adds two elements, with values 2 and 3.

The forerunner to range_set was primarily provided to allow the user to enter ranges of integers as a string, such as "1-10, 20-30." So, at no extra cost, I have provided an operator<< and an operator>> for reading and writing ranges to a stream. Note that, as users are not always programmers, these ranges are inclusive or closed ranges and not the half-open ranges mentioned above. Also, for output, operator<< uses a colon (:) rather than a minus sign (-) to separate the two ends of the range. This alleviates confusion for ranges with negative values. operator>> accepts ranges with ends separated by either character.

As an example, the following code is taken from an MFC application. In this application an edit control in a dialog box allows the user to specify a set of values using the formatting provided by range_set. The conversion process is a little convoluted, but it works. To put the value of the range_set into the edit control, it is converted to a std::ostringstream object, thence to a std::string object, thence to a C-style string, thence to an MFC CString, before finally being added to the edit box!

#include <string>
#include <sstream>
#include <range_set.h>
using namespace std;
     
    .....
// Put current range in edit box
    ostringstream ostr;
    ostr << group_[current].range;
     
    const CString s = (const char *)
                 ostr.str().c_str();
    pedit->SetWindowText(s);
    .....
     
    // Get the range from edit box
    pedit->GetWindowText(s);
     
    istringstream
        istr((const char *)s);
    istr >> group_[current].range;

This code was taken from a binary file editor I wrote for Windows that allows the user to specify sets of bytes to be displayed in different colours. The complete source for this binary file editor, including range_set.h, is available for download from the web site:

http://members.tripod.com/AndrewPhillips

(range_set.h is also available from the CUJ ftp site.)

range_set allows you to create sets of very large ranges of numbers. With the standard header <limits> you can create a set containing all valid values of the contained type. There is one restriction — because of the use of half-open ranges, you cannot add the largest value to the set. The code below will create a range_set that includes all values of type int in the implementation, except the very largest.

#include <limits>
using namespace std;
     
    .....
    range_set<int> a;
    a.insert(
        numeric_limits<int>::min(),
        numeric_limits<int>::max());

For further information on using range_set member functions, iterators, etc., refer to the description of std::set that comes with your compiler, or other STL documentation.

Assertions

I have often found, when using the STL and particularly iterators, that it is easy to misunderstand the parameters to a function, or even make a simple typo that results in undefined behaviour. For example, many functions take two iterators that specify the start and end of a range. If by mistake these iterators are not for the same container, then the program will suffer anything from a benign tumour to a quick but painful death. Worse, it may cause an insidious but undetected cancer.

range_set member functions will detect most, if not all, problems of this type. All parameters passed to range_set member functions are validated using assertions. This is done fairly easily since range_set iterators necessarily know the container they are to be used with. Hence it is simple to check that an iterator is valid for a container.

Note that assertions are only activated if the macro NDEBUG is not defined. This allows you to easily track down bugs in the debug version of your software but remove the assertions in the release version.

Unlike some people, I recommend removing (most) assertions from production code. There is no purpose in leaving assertions enabled in the final product, since the real value of assertions is in tracking down the cause of bugs, not to actually detect their presence. If you have the above sorts of bugs in your production code then you are in real trouble!

Unfortunately, range_set iterators cannot perform their magic when they are used with STL algorithms. Changes to the standard library code would be necessary. It would be an enormous boon if compiler vendors provided such a debug version of the standard C++ library.

Implementation

range_set is implemented using a linked list (std::list) of half-open ranges. For example, if a range_set object contains the numbers from 10 to 19 (inclusive), just one element in the linked list is required. It stores the pair of values 10 and 20, representing the half-open range [10, 20). The large space savings possible with range_set are obviously due to the fact that not all values "stored" in a container need actually be present in it. However, there are adverse consequences. For example, a range_set object cannot be used to store objects where the caller relies on all the objects being constructed.

Another problem is that a range_set object can be slow given the wrong sort of data. In the worst case, a range_set object will be a lot slower than an std::set object. It takes time O(n) to test if a value is in the set of ranges, compared with O(log n) for the standard set, where n is the number of elements stored. However, in the best case it will take constant time. For the uses for which range_set is designed, it will typically be faster than std::set. Moreover, its space efficiency is at least as good as std::set (O(n)), and often much greater.

The time to insert and delete a value in the set of ranges is also O(n) in the worst case compared with O(log n) for std::set. This is ameliorated by the fact that, once again for appropriate uses, range_set is faster. Also, versions of insert, insert_range, and erase_range are provided which take a hint iterator, which is used as a starting point for the insertion or deletion. Proper use of these functions results in amortised constant time for the operations.

Iterators are typically dereferenced (using operator-> or operator*) to obtain a value from a container. But as mentioned above, a range_set does not actually store all of the values it represents. So for an iterator to expose all the values of the container, we need a little sleight of hand. The trick is that every iterator stores an object of the contained type, which has the value of the current container element that it designates. It is this member object that is dereferenced by operator-> and operator*.

This trick leads to an obscure difference between range_set and std::set. If you obtain the address of the element that an iterator references, the addresses will be different for different iterators even if they point to the same container element. This is demonstrated by the following code. The assertion on the last line fails for a range_set object, but not for a std::set object:

range_set<int>::iterator i1, i2;
i1 = a.begin();
i2 = a.begin();
const int *p1, *p2;
p1 = i1.operator->();
p2 = i2.operator->();
assert(p1 == p2);

Another consequence is that iterators can be much larger than expected if the contained type is a large class. This is not a problem for small built-in types like int.

Conclusion

Template class range_set blends well with STL because it is designed to be compatible with std::set. It is also easy to use if you are already familiar with STL containers. In some situations it could even be used as a drop-in replacement for std::set, since it provides all the same functions, iterators etc. In general, range_set is not a replacement for std::set, but in particular cases it is much more efficient. It is also easier to use because it provides functions to deal with ranges of values. On the other hand, it can also be very inefficient. So it is important to know when its use is appropriate.

Fortunately, it is usually easy to decide when to use it. range_set should be used when you need to store a set of integral values (of type int, long, etc.), and there are just a few values to be stored or a few ranges of contiguous values. When these ranges are large is when range_set really shines, providing big savings in memory and speed.

Andrew Phillips has a B.Sc. in Computer Science from the University of Sydney. He has been using C and C++ professionally for 15 years. His interests include improving the maintainability, testability, and debugability of code and user interface design. He is the senior programmer with Encom Technology and can be reached at [email protected] or [email protected].

Terms of Service | Privacy Statement | Copyright © 2024 UBM Tech, All rights reserved.