Exception Safety & Containers

By Laurence Marrie, March 01, 2004

Laurence examines how to provide more balance between exception safety and efficiency than the commonly recommended "copy, modify, and swap" idiom.

March 04: Exception Safety & Containers

All container member functions in the C++ Standard Library are "exception safe," albeit to varying degrees. In this article, I examine ways of increasing the level of exception safety of multi-element insertions into node-based containers, such as std::set. These methods sometimes provide a more practical balance between exception safety and efficiency than the commonly recommended "copy, modify, and swap" idiom.

By "exception safe," I mean that an operation offers one of three types of contracts [1]:

A basic contract guarantees that an operation will not leak resources if it is interrupted by an exception; the program remains in a valid, but not necessarily predictable, state.
A strong contract builds on the basic guarantee by also promising that changes to the program state will be reversed before an exception is propagated. This is more colorfully described as "commit-or-rollback" semantics.
A no-throw contract guarantees that an operation will always complete — callers never need to worry about having to handle exceptions.

Most operations on Standard Library containers give the strong guarantee, and some even offer the no-throw guarantee. A few (such as iterator range insertions) provide only the lowest level of exception safety, namely the basic guarantee. In those cases, it is not possible to give a better guarantee without an accompanying constant factor reduction in efficiency. By design, the C++ Standard Library places a higher priority on performance than on meeting the two higher levels of exception safety. Fortunately, the basic guarantee is frequently sufficient for error handling. Moreover, you can usually acquire the strong guarantee yourself, although the performance cost is high at times.

A Motivating Problem

The std::set member function for multi-element (iterator range) insertion offers only the basic guarantee. If a range iterator or the container's allocator, comparison object, or element copy constructor throws an exception, then some elements may still be inserted. Suppose that due to a program invariant you require the strong guarantee: Either all of the elements in a range must be inserted into the container, or else none must be added.

The usual approach for migrating from the basic to the strong guarantee is to "copy, modify, and swap" [2]. First you make a copy of the container, then you insert the range elements into the copy, and finally you use an operation offering the no-throw guarantee to exchange the contents of the copy with the contents of the original. The main idea is that the destructor of a temporary object can serve as a rollback mechanism for basic operations.

For instance, suppose that c is the container (instantiated with some arbitrary type, T) and that a range of Ts is delimited by two iterators first and last. The idiomatic method then looks like this:

<b>std::set<T> rollback_temp( c );
rollback_temp.insert( first, last );
c.swap( rollback_temp );
</b>

This approach is straightforward and works for nearly all cases — but it also suffers from some significant drawbacks.

First, existing iterators in the container become invalid. This places an unfortunate restriction on how the container can be used. Second, copy, modify, and swap is not a general solution for sorted associative containers (such as std::set). The idiom relies on the assumption that the container's swap operation provides the no-throw guarantee. This assumption is likely not valid if the comparison object has an assignment operator that may throw an exception [3]. (The same issue does not arise in the case of allocator objects; these are never swapped.)

Finally, making a temporary copy may incur significant time overhead if the container already holds a large number of elements. This is the most serious issue. If N is the number of elements already in the container, and M is the number of elements in the range, then the worst-case time complexity of the idiomatic method is O(N+M log(N+M)). By comparison, the complexity of range insertion with the basic guarantee is merely O(M log(N+M)). This means that if M is small relative to N, then the runtime for copy, modify, and swap is related more to the (large) size of the container than to the (small) size of the range.

In practice there are alternatives to copy, modify, and swap when a container is node based. These alternatives neatly address all of the concerns just raised.

The basic idea is simple, but you can implement it in several ways. An iterator range insertion is simply a sequence of single-element insertions. Instead of making a copy of the entire container, you can maintain a rollback list. As new elements are added to the container, you also add them to the rollback list. If an exception occurs, simply traverse the list and remove the corresponding elements from the container.

Listing 1 is one way to implement this. Since the container is no longer copied, existing iterators cannot be invalidated. You also do not need to worry about whether the comparison object has an assignment operator that may throw an exception. You do need to be careful to handle exceptions involving the rollback list. Suppose that an element is successfully added to the container, but that the returned iterator cannot be appended to the deque. In this situation, the inner catch block traps the out-of-memory exception and uses the iterator to remove that one element from the container. Afterward, the exception passes to the outer catch block. The outer catch block is responsible for walking the rollback list and removing the remaining range elements from the container. All other exceptions are caught by this block directly.

The cost of maintaining the rollback list is linear in the size of the range, so the time complexity is O(M log(N+M)). This solution is not as elegant as the copy, modify, and swap idiom, but it is asymptotically superior.

Intrusive Approaches

Using std::deque to maintain the rollback list obviously incurs a cost in both time and space. You can reduce this overhead by having the container manage the rollback list internally. The remaining alternatives to copy, modify, and swap are intrusive in that they involve rewriting the container's range insertion algorithm, and in that they may require modification of its node type.

First look at a typical existing implementation. As with std::map, std::multimap, and std::multiset, the underlying implementation of std::set is likely to be a balanced binary search tree. In pseudocode, iterator range insertion looks like Listing 2. Next, verify that this provides the basic guarantee. As before, there are four potential sources of exceptions:

The comparison function.
The memory allocator.
The element copy constructor.
The range iterators.

The comparison function is used when searching the tree, and the memory allocator is called to create new nodes. The iterators, of course, are used for retrieving the elements in the range. It is easy to see that if any of these throws an exception, then the tree remains in a consistent state. The final potential source of exceptions needs to be handled explicitly — if the element copy constructor throws, then memory for the failed node must be released before the exception is propagated.

Given a base implementation like that just presented, one intrusive method for providing the strong guarantee is to augment each tree node with an additional rollback pointer. As new nodes are added to the tree, they are also stored in a singly linked list formed using the additional pointers. Nodes are thus shared between the rollback list and the tree. If an exception occurs, you walk the list and remove its member nodes from the tree. The intrusive solution in Listing 3 is a straightforward modification of the typical existing implementation.

It is not difficult to show that this meets the strong guarantee. Unlike the solution involving std::deque, no new potential sources of exceptions have been introduced — the additional code uses nonthrowing operations only. Second, the outer catch block traps exceptions from all four sources, and will remove any newly added nodes from the tree. Hence, this does provide the desired commit-or-rollback semantics.

The time complexity is O(M log(N+M)), which again is superior to the bound for the copy, modify, and swap idiom. The advantage of the intrusive method is that it is extremely time efficient; maintaining the rollback list amounts to only 2M+1 additional pointer assignments.

The disadvantage of the intrusive method is that, in general, memory requirements are increased by a constant factor [4]. Memory allocators typically distribute memory in fixed-size blocks; byte requests are rounded up to the nearest size. (In my environment, for example, small memory requests are rounded up to multiples of 8 bytes.) Sometimes, the additional pointer causes node allocation requests to be answered with a slightly larger sized block, but a de facto increase may not occur in all cases. In such situations, this method offers the best of both worlds: high performance without an increase in memory footprint.

Can You Eliminate the Space Overhead Entirely?

If time is less of a concern than space, then it is worthwhile to consider methods that do not impose a space penalty. To start with a simplifying assumption, suppose that the comparison function provides the no-throw guarantee. (This is typically the case, and I expect that in practice this can nearly always be assumed.) This leads to the next solution. First you insert the range elements into a temporary tree. If this fails, then rollback constitutes calling the tree's destructor. If this succeeds, then you can move the temporary tree nodes into the existing tree without fear of exceptions; see Listing 4.

Since the motivating example specifies std::set, you need to test each range element for uniqueness. Accordingly, this solution involves more searching than the other methods. To test for uniqueness you need to search both the existing and temporary trees, and later you must search the existing tree again when moving over the nodes from the temporary tree. Compared to the previous intrusive solution, there is also a time cost associated with constructing and walking the temporary tree. Since the notation hides the constant factors, however, the time complexity of this solution is similarly O(M log(N + M)).

Now suppose that you still want space efficiency, but refuse to require that the comparison function provide the no-throw guarantee. Does an equally space-efficient solution still exist? The answer is "yes," and it is instructive to see how this slight change in interface drastically increases the complexity of implementation.

The idea is to proceed as before, and first insert the range elements into a temporary tree. If this fails, then rollback again constitutes calling the temporary tree's destructor. If this succeeds, then you are faced with the task of merging two balanced search trees in a strongly exception-safe manner. At this point the range has been sorted, and the only remaining source of exceptions is the comparison function. Depending on the details of the tree implementation, a merge operation between two trees of sizes A and B (with B<A) can be performed in O(B log(A/B)), which is optimal [5],[6]. A strongly exception-safe version of the tree merge algorithm essentially involves partitioning each tree into a linked set of smaller trees. If the comparison function throws an exception during the partitioning step, then the original tree is reconstructed via a series of strongly exception-safe concatenation operations. Otherwise, as a final step, concatenation is used to interleave the partitions together. Like our other solutions, the entire procedure can be bounded as O(M log(N+M)). However, I omit the complete details of the algorithm.

Changing Containers

Imagine substituting std::vector for std::set in the first (motivating) example. Is maintaining a rollback list still an alternative to copy, modify, and swap? Unfortunately, it isn't because I have been relying on certain features of std::set:

Single-element insertions are strongly exception safe.
Insertions don't invalidate pointers to existing elements.
Erasures provide the no-throw guarantee.

Insertions into std::vector invalidate iterators, which means that the second feature is missing. This property is also missing from std::deque, unless all insertions occur at the front or the back of the sequence. As with std::vector, single-element insertions and erasures are not strongly exception safe if the element type can throw an exception when it is copied. Opportunities to use rollback lists with std::deque are therefore likely limited to situations where the std::stack or std::queue adaptors are appropriate. However, maintaining a rollback list is always an alternative to copy, modify, and swap when the container is std::set, std::multiset, std::map, std::multimap, or std::list. The technique is also mostly applicable to the hash table containers proposed for inclusion in a future revision of the C++ Standard Library. [7]

Techniques in Action

The first alternative (involving std::deque) operates within the existing framework of the Standard Library, and so it is immediately applicable to any code base. For simplicity, the example code was wrapped in a free-standing function, but in practice a class can yield a more powerful interface:

<b>strong_insert rollback_temp( c );
rollback_temp.insert( first, last );
rollback_temp.commit();</b>

Similar to the copy, modify, and swap idiom, the idea is to have the class destructor perform a rollback unless a method is called to signal that the updating finished successfully. This interface actually allows for insertion and rollback of multiple ranges across one or more containers. However, unlike copy, modify, and swap, the std::deque alternative does not support rollback of intermixed insertions and erasures.

To adopt one of the intrusive methods, you first need to refactor the source code for an existing container into a new type (say, strong_set) [8]. The intrusive techniques are actually just as powerful as the copy, modify, and swap idiom — both methods can be extended to support rollback of intermixed insertions and deletions. Consider the method of augmenting tree nodes with an additional rollback pointer, for example. You can accommodate erasures by maintaining a second rollback list. Each time an element is inserted, you add its node to the primary rollback list, as before. Each time an element is erased, you remove its node from the tree but do not deallocate it. Instead, you prepend it to an erasure linked list formed using one of the node's child pointers. In addition, you use the node's parent pointer to record the node's successor in the tree. Unless a rollback is required, you finish by walking the erasure list and deallocating each node. Otherwise, when an exception occurs the rollback procedure constitutes a two-step process. First, you traverse the erasure list and add its nodes back into the tree (the parent pointers let you avoid searching for the insertion point). Afterward, you remove each of the nodes in the primary rollback list from the tree, and deallocate them.

Each of the proposed alternatives to copy, modify, and swap offers a slightly different tradeoff between space and time efficiency, and also implementation difficulty. A general guideline is to choose the simplest solution that meets your design, performance, and maintenance goals.

Final Thoughts

Although my focus has been limited to a specific case, the context can be viewed somewhat more broadly. To acquire the strong guarantee for a throwing sequence of updates, you need to provide for the three fundamental operations: begin, commit, and rollback. The copy, modify, and swap idiom is one well-known method. An alternative approach (for node-based containers) is to keep track of each operation performed, along with enough information to undo it. Compared to copy, modify, and swap, this promises asymptotically better performance, but additionally requires that undo operations are both possible and have the no-throw guarantee. Further, you have to be careful to manage the rollback list (or lists) in an exception-safe manner. Coupling the rollback list nodes with the container nodes can be helpful in meeting this requirement. Using stronger preconditions (such as requiring that client-supplied objects provide the no-throw guarantee) may also ease the burden considerably.

References

[1] Abrahams, David. "Exception-Safety in Generic Components," http://www.boost.org/more/generic_exception_safety.html.

[2] Sutter, Herb. Exceptional C++. Addison-Wesley Longman, 2000, pp.60.

[3] By obtaining storage for the comparison object from an allocator, the container could swap pointers rather than objects (and thus provide the no-throw guarantee). Standard C++ does not require implementors to use this method, however.

[4] You may be able to offset the increase in node size. A typical red-black tree implementation of std::set requires three pointers and one color bit per node. Researchers report that compacting the color bits into the parent pointers increases runtimes by less than 3 percent, yet (due to alignment issues) typically reduces the space overhead of each node by the size of one pointer; see Hervé Brönnimann, Frédéric Cazals, and Marianne Durand. "Randomized Jumplists: A Jump-and-Walk Dictionary Data Structure," STACS, 2003.

[5] Red-black trees (with parent pointers) are sufficient, for example.

[6] Moffat, Alistair, Ola Petersson, and Nicholas C. Wormald. "Sorting and/by Merging Finger Trees," ISSAC, 1992.

[7] According to the proposal, the containers are node-based and insertions invalidate iterators but not pointers to elements. A rollback list can be used provided that it stores pointers (rather than iterators) and the user-supplied hashing and comparison objects provide the no-throw guarantee. See Matthew Austern. "A Proposal to Add Hash Tables to the Standard Library (revision 4)," http://std.dkuug.dk/jtc1/sc22/wg21/docs/papers/2003/n1456.html.

[8] Review your license first! One standard library implementation with a permissive license is publicly available from http://www.stlport.org/.

Laurence Marrie is a software developer at a financial institution in Toronto. He can be contacted at [email protected].

1 2 3 4 5 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

Exception Safety & Containers

A Motivating Problem

Intrusive Approaches

Can You Eliminate the Space Overhead Entirely?

Changing Containers

Techniques in Action

Final Thoughts

References

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

Exception Safety & Containers

A Motivating Problem

Intrusive Approaches

Can You Eliminate the Space Overhead Entirely?

Changing Containers

Techniques in Action

Final Thoughts

References

Related Reading

News

Commentary

Slideshow

Video

Most Popular

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content