Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

A Container for a Set of Ranges


June 1999/A Container for a Set of Ranges

A Container for a Set of Ranges

Andrew Phillips

If you need to represent an ordered list of ranges, the best data structure is probably somewhere between a list and a set.


This article describes a template class, called range_set, that is useful for efficiently representing large sets of integers as an ordered set of ranges. Aside from a few restrictions on the contained type, it is completely compatible with the STL container std::set. It furthermore has facilities for adding and deleting ranges of values to the container.

In the right circumstances, this class can provide large benefits over std::set in memory usage. Lookup can even be faster when you have a set of integers which is mainly composed of large "clumps," or contiguous ranges of values. Since implementing the precursor of range_set, I have found that these circumstances tend to occur surprisingly often.

For example, I wrote the original C++ class on which range_set was based to keep track of the selection of up to two billion elements in a custom list-box control. Using the list-box user interface (clicking, dragging, and shift-clicking) the user can easily select a few individual elements, a few contiguous chunks of them, or even the whole lot. Unless the user decides to do something unusual, like selecting every second element (which for two billion elements would literally take a lifetime), this class is much more space efficient than std::set.

The original class, created many years ago, did not use templates because it was written before compilers implemented them, and before I had even heard of the STL. It worked only with elements of type int. So I decided to "templatize" it by changing all occurrences of int to a template parameter. On further consideration, I realised that the class was really a cut-down version std::set<int> with a few additions for adding and deleting ranges of values and converting ranges to and from C-style strings. There's nothing I would have liked more than to replace my original class with an STL container. Unfortunately, for my requirements there was no STL container that was as convenient or efficient. Instead I decided to create a new class that was as compatible with std::set as possible.

Although it took a little more work than I expected, all that was really necessary was to rearrange some of the code from the original class and to create iterators and a few utility functions, such as upper_bound, lower_bound, count, etc. The complete class is implemented in the file range_set.h, and is available online (see p. 3 for downloading instructions). A partial listing is shown in Figure 1.

Design Choices

My initial impulse was simply to create a class that behaved as much like std::set as possible. But before writing any new code, I then gave consideration to other implementation choices that might improve the flexibility, efficiency, and potential uses of the class. I first considered deriving it from std::set. But there was no advantage in doing this, and it seemed contrary to the way things were done in the STL. I also considered making it a partial specialization of std::set. However, I could not see any way to make a specialization based on the contained type being an integral type. In some circumstances it's preferable to use the real std::set<int> rather than be forced to use the specialization. Moreover, the new class was more than a specialization since it added new facilities.

Lastly, I considered making it a container adapter. This would allow adapting an STL container such as std::set or std::list, depending on the requirements. After some experimentation I decided this was not appropriate for a number of reasons. But the main reason was that the resulting class would not be as compatible with std::set as I wanted. So I returned to my original impulse and created a class called range_set.

Using range_set

Template class range_set is typically used to contain elements of type int, long, or some other integral type. Iterators and member functions necessarily assume that the contained type implements operator+ and operator-. Of course, you can create or use any class as long as it implements these operators with the semantics of an integral type.

There would probably be little profit in using range_set as a simple replacement for std::set and not take advantage of the extra facilities it provides. As you might expect, range_set has member functions to add and delete ranges of values. These are called insert_range and delete_range. There are also versions of insert_range and delete_range that take a "hint" iterator. This is similar to the version of std::set::insert that takes an iterator parameter, to provide for more efficient insertions. If the hint iterator points to the place in the range_set where the change is to take place, the operation is performed in constant time rather than linear time.

In accordance with STL and C/C++ generally, ranges passed to insert_range and delete_range are "half open." That is, they are specified with an inclusive lower bound and an exclusive upper bound. Hence a.insert_range(2, 4) adds two elements, with values 2 and 3.

The forerunner to range_set was primarily provided to allow the user to enter ranges of integers as a string, such as "1-10, 20-30." So, at no extra cost, I have provided an operator<< and an operator>> for reading and writing ranges to a stream. Note that, as users are not always programmers, these ranges are inclusive or closed ranges and not the half-open ranges mentioned above. Also, for output, operator<< uses a colon (:) rather than a minus sign (-) to separate the two ends of the range. This alleviates confusion for ranges with negative values. operator>> accepts ranges with ends separated by either character.

As an example, the following code is taken from an MFC application. In this application an edit control in a dialog box allows the user to specify a set of values using the formatting provided by range_set. The conversion process is a little convoluted, but it works. To put the value of the range_set into the edit control, it is converted to a std::ostringstream object, thence to a std::string object, thence to a C-style string, thence to an MFC CString, before finally being added to the edit box!

#include <string>
#include <sstream>
#include <range_set.h>
using namespace std;
     
    .....
// Put current range in edit box
    ostringstream ostr;
    ostr << group_[current].range;
     
    const CString s = (const char *)
                 ostr.str().c_str();
    pedit->SetWindowText(s);
    .....
     
    // Get the range from edit box
    pedit->GetWindowText(s);
     
    istringstream
        istr((const char *)s);
    istr >> group_[current].range;

This code was taken from a binary file editor I wrote for Windows that allows the user to specify sets of bytes to be displayed in different colours. The complete source for this binary file editor, including range_set.h, is available for download from the web site:

http://members.tripod.com/AndrewPhillips

(range_set.h is also available from the CUJ ftp site.)

range_set allows you to create sets of very large ranges of numbers. With the standard header <limits> you can create a set containing all valid values of the contained type. There is one restriction — because of the use of half-open ranges, you cannot add the largest value to the set. The code below will create a range_set that includes all values of type int in the implementation, except the very largest.

#include <limits>
using namespace std;
     
    .....
    range_set<int> a;
    a.insert(
        numeric_limits<int>::min(),
        numeric_limits<int>::max());

For further information on using range_set member functions, iterators, etc., refer to the description of std::set that comes with your compiler, or other STL documentation.

Assertions

I have often found, when using the STL and particularly iterators, that it is easy to misunderstand the parameters to a function, or even make a simple typo that results in undefined behaviour. For example, many functions take two iterators that specify the start and end of a range. If by mistake these iterators are not for the same container, then the program will suffer anything from a benign tumour to a quick but painful death. Worse, it may cause an insidious but undetected cancer.

range_set member functions will detect most, if not all, problems of this type. All parameters passed to range_set member functions are validated using assertions. This is done fairly easily since range_set iterators necessarily know the container they are to be used with. Hence it is simple to check that an iterator is valid for a container.

Note that assertions are only activated if the macro NDEBUG is not defined. This allows you to easily track down bugs in the debug version of your software but remove the assertions in the release version.

Unlike some people, I recommend removing (most) assertions from production code. There is no purpose in leaving assertions enabled in the final product, since the real value of assertions is in tracking down the cause of bugs, not to actually detect their presence. If you have the above sorts of bugs in your production code then you are in real trouble!

Unfortunately, range_set iterators cannot perform their magic when they are used with STL algorithms. Changes to the standard library code would be necessary. It would be an enormous boon if compiler vendors provided such a debug version of the standard C++ library.

Implementation

range_set is implemented using a linked list (std::list) of half-open ranges. For example, if a range_set object contains the numbers from 10 to 19 (inclusive), just one element in the linked list is required. It stores the pair of values 10 and 20, representing the half-open range [10, 20). The large space savings possible with range_set are obviously due to the fact that not all values "stored" in a container need actually be present in it. However, there are adverse consequences. For example, a range_set object cannot be used to store objects where the caller relies on all the objects being constructed.

Another problem is that a range_set object can be slow given the wrong sort of data. In the worst case, a range_set object will be a lot slower than an std::set object. It takes time O(n) to test if a value is in the set of ranges, compared with O(log n) for the standard set, where n is the number of elements stored. However, in the best case it will take constant time. For the uses for which range_set is designed, it will typically be faster than std::set. Moreover, its space efficiency is at least as good as std::set (O(n)), and often much greater.

The time to insert and delete a value in the set of ranges is also O(n) in the worst case compared with O(log n) for std::set. This is ameliorated by the fact that, once again for appropriate uses, range_set is faster. Also, versions of insert, insert_range, and erase_range are provided which take a hint iterator, which is used as a starting point for the insertion or deletion. Proper use of these functions results in amortised constant time for the operations.

Iterators are typically dereferenced (using operator-> or operator*) to obtain a value from a container. But as mentioned above, a range_set does not actually store all of the values it represents. So for an iterator to expose all the values of the container, we need a little sleight of hand. The trick is that every iterator stores an object of the contained type, which has the value of the current container element that it designates. It is this member object that is dereferenced by operator-> and operator*.

This trick leads to an obscure difference between range_set and std::set. If you obtain the address of the element that an iterator references, the addresses will be different for different iterators even if they point to the same container element. This is demonstrated by the following code. The assertion on the last line fails for a range_set object, but not for a std::set object:

range_set<int>::iterator i1, i2;
i1 = a.begin();
i2 = a.begin();
const int *p1, *p2;
p1 = i1.operator->();
p2 = i2.operator->();
assert(p1 == p2);

Another consequence is that iterators can be much larger than expected if the contained type is a large class. This is not a problem for small built-in types like int.

Conclusion

Template class range_set blends well with STL because it is designed to be compatible with std::set. It is also easy to use if you are already familiar with STL containers. In some situations it could even be used as a drop-in replacement for std::set, since it provides all the same functions, iterators etc. In general, range_set is not a replacement for std::set, but in particular cases it is much more efficient. It is also easier to use because it provides functions to deal with ranges of values. On the other hand, it can also be very inefficient. So it is important to know when its use is appropriate.

Fortunately, it is usually easy to decide when to use it. range_set should be used when you need to store a set of integral values (of type int, long, etc.), and there are just a few values to be stored or a few ranges of contiguous values. When these ranges are large is when range_set really shines, providing big savings in memory and speed.

Andrew Phillips has a B.Sc. in Computer Science from the University of Sydney. He has been using C and C++ professionally for 15 years. His interests include improving the maintainability, testability, and debugability of code and user interface design. He is the senior programmer with Encom Technology and can be reached at [email protected] or [email protected].


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.