Dr. Dobb's | Generic<Programming>:Typed Buffers (I)

Generic:Typed Buffers (I)

You thought buffers were an uninteresting subject? Try to write a buffer that's at the same time generic and efficient. Hint: efficient buffers are easy, and generic buffers are easy, but buffers that are both generic and efficient are difficult. This article is the first of a two-part treatment of typed buffers, heavy-duty components that can replace std::vector in performance-demanding applications.

August 01, 2001
URL:http://www.drdobbs.com/generictyped-buffers-i/184403791

August 2001 C++ Experts Forum/Generic<Programming>

Imagine the installment of "Generic<Programming>" you're about to read starting like so: "This article treats memory buffers in C++."

As you hastily close the browser, if you kept your ear attentive, you would hear the subtle noise of tens of thousands of other mouse clicks doing the same. Because who'd be interested at all in an article that treats such a trivial subject as memory buffers?!

This article indeed treats memory buffers in C++, but with two twists: first, the buffers are generic, which means they can contain typed data (as opposed to raw bytes). Second, the buffers we'll talk about are as efficient as their hosted type and the host operating system allows, in every aspect — allocation strategy, as well as data manipulation.

As you'll soon see, writing a generic buffer is a cute exercise on templates. Writing an efficient buffer is not too complicated of a task either. Writing a buffer that is exception-safe is more challenging, but still not too hard, especially if you've read the book on exception safety [1]. But writing a buffer that's both generic and efficient is like a long climb up a steep hill. As it often happens with C++, the scenery you enjoy at the top of the hill is well worth the effort — if only you could ignore that thing you just stepped in.

The Twilight Zone

Every now and then, someone posts to the Usenet newsgroup comp.lang.c++.moderated a question like: Why doesn't auto_ptr offer you the chance to use delete[] in the destructor instead of delete? The discussion follows a specific pattern:

"You should stay away from C-style arrays and use std::vector, which is efficient and all."

"Well, std::vector is not efficient enough for what I'm trying to do."

"Why isn't it?" etc.

The thing is, std::vector is indeed a very well designed class for handling contiguous sequences of objects. When I first saw std::vector's elegant design, I wasn't compelled at all to throw it away and define my own (a compulsion felt when I saw, for example, certain MFC artifacts). However, std::vector does have some inefficiencies that can seriously bother certain users:

Unnecessary data initialization. Often, you need to create an appropriately sized vector of a primitive type (such as char) and pass it to a low-level C function that fills it with data read from a socket or file. The problem is that, although you don't need it, the whole vector will be initialized upon construction or resizing. If you have to interface with C functions, there is no way in which you can avoid this overhead.
Exponential growth. Standard vectors sport fast (constant time on average) growth. This implies that std::vector grows in multiplicative hops — most implementations double the allocated size whenever they need to grow automatically. If you want to append an object to a vector containing one million objects, a vector will allocate for a short time enough space to accommodate three million objects; that might not be what you would prefer.
"Imperialistic" memory allocation strategy. Standard vectors never shrink; they only grow. If you collected three million data samples and you decide to keep one million samples after some interpolation, you'll have slack space worth two million objects. The recommended idiom for such cases is to copy your vector into a fresh new vector and then swap the two vectors:

std::vector<double> data;
... work with data ...
// Shrink data
std::vector<double>(data.begin(), data.end()).swap(data);

The idiom is less than optimal because you must copy the data. Worse, even this idiom is not 100% guaranteed to make economic use of memory because std::vector can always allocate more memory than needed. It would be better if you could shrink the vector "in-place," that is, just tell the allocator that it can use the memory at the end of your vector. Many memory allocators allow you to do that.

Inefficient object moving. std::vector does not differentiate between copying and moving. To an std::vector, a move is a copy followed by the destruction of the source. So if you have a vector of strings and reallocate it, there's no way in which you can write a primitive that moves a couple of pointers efficiently to new memory and have std::vector use it. Every single string will be unnecessarily copied to a new string. This policy is a big source of inefficiencies for all those who are not members of CFC [2], and an often-quoted reason for which std::vector< std::vector<T> > is a no-no.
Further inefficiencies in data copying and moving. Some STL implementers don't differentiate between vectors of plain-old-data types and vectors of elaborate types. (Metrowerks is a notable exception.) This means that they use the most generic constructs to copy data, such as for loops. You'd imagine that a good compiler would replace a loop that copies a range of integers with an efficient memcpy call. Well, not quite. I tested two compilers known for generating efficient code: Microsoft Visual C++ and Metrowerks CodeWarrior. In full speed optimization mode, they both generated significantly slower code for copying arrays of integers using straight loops as compared to the equivalent memcpy call. So if you want performance, you may need to be sure the call to memcpy is right in your source code.
Unneeded exception safety. Many std::vector implementations (but not all) assume that the constructor, copy constructor, and assignment operator of the stored type might throw. This is not the case for primitive types and many other types. The generated code for these types might be unnecessarily bigger and slower.

In many applications, some or all of these problems are not annoying at all. But when you start asking for heavy-duty performance, the attractiveness of std::vector might decrease and you might be looking for a structure that is more optimized and gives you more control. The problem is that you won't find any container in the C++ Standard below std::vector — except for the C-style string. It's either Cadillac or Geo.

Proof? Well, the proof's right there in the pudding: STL uses contiguous buffers in at least three places (aside from std::vector itself): std::basic_string, std::deque's chunk management, and std::deque's chunks themselves. However, no STL implementation I've seen uses vector as a back-end for these structures. If std::vector is the one-stop-shop for your contiguous data needs, then it must be said that STL implementers shop at a specialty grocer.

There is an interstice, a twilight zone between std::vector and the C-style string. We target this niche. We will develop together a contiguous buffer that:

is generic — can store sequences of any type
is familiar — supports a subset of std::vector's syntax and semantics
offers total control — offers fine-grained primitives in addition to higher-level functions
generates pedal-to-the-metal code for primitive types and select user-defined types
allows user-controlled optimizations
supports advanced allocation facilities, such as in-place expansion and fast reallocation
most importantly, can be used as a back-end for higher-level contiguous containers — in particular, you can use it to implement vectors, strings, or deques

buffer: First Blush

Let's define an interface for a buffer class template. The buffer holds only two pointers — the beginning and the end of the controlled memory chunk. To a buffer, there's no difference between size and capacity. We thus start from std::vector's interface and eliminate functions that refer to capacity. We are left with the following set of member functions:

template <class T, class Allocator = std::allocator<T> >
class buffer 
{
public:
    ... all of std::vector's types and functions, except:
    // size_type capacity() const;
    // void reserve(size_type n);
private:
    T* beg_;
    T* end_;
};

Interestingly, although buffer has no notion of capacity, it can implement all of std::vector's primitives except capacity and reserve, with the amendment that buffer cannot satisfy the performance requirements of all those functions. For example, buffer<T>::push_back has O(N) complexity, while std::vector<T>::push_back has O(1) complexity if mediated over a large number of calls. (That is what the Standard calls "amortized constant" complexity.) As you'll see down the road, we'll improve buffer<T>::push_back's performance in certain cases, while still not supporting the notion of capacity.

Implementing buffer's interface is not very complicated; the two things that require attention are exception safety [3] and properly destroying the objects. The standard functions std::uninitialized_copy and std::uninitialized_fill are two very helpful primitives.

To allow the user to allocate a buffer without initializing it, we need a special constructor and a couple of helper functions. Then, we need a primitive grow_noinit that grows the buffer without calling the constructors. Consequently, we need the complementary function shrink_nodestroy that shrinks the vector without calling the destructors, and finally, the slightly redundant clear_nodestroy, which clears the buffer and deallocates the memory without invoking the destructors. Here they are:

template <class T, class Allocator = std::allocator<T> >
class buffer 
{
    ... as above ... 
public:
    enum noinit_t { noinit };
    buffer(size_type n, noinit_t, 
        const Allocator& a = Allocator());
    void grow_noinit(size_type growBy);
    void shrink_nodestroy(size_type shrinkBy);
    void clear_nodestroy();
};

This extended interface gives you total control over the buffer and the data that's contained in it. Beware, however, of naively using the extension functions. Consider, for example, you use grow_noinit as follows:

void Fun()
{
    buffer<Widget> someWidgets;
    ...
    // add space for 10 Widgets
    someWidgets.grow_noinit(10);
    // Initialize the Widgets
    ConstructWidgets(
        someWidgets.end() - 10, someWidgets.end());
    ...
}

The problem is that if the construction of the 10 Widgets fails in any way, all things are not good. When Fun returns, someWidgets' destructor will diligently try to destroy all of its contents — it doesn't track constructed and non-constructed Widgets because, again, unlike std::vector, buffer doesn't have the notion of capacity. If there is still any uninitialized memory in the buffer, the effect of applying Widget's destructor to that memory is, obviously, undefined.

With these caveats in mind, it seems like we have a good start. Let's see now how we can optimize buffer.

Type Traits

One key technique for optimizing generic code is to infer information about the types on which the generic code operates. Then, the generic code can dispatch work at compile time to specialized routines that make specific assumptions about the types on which they operate.

For example, in our case, an important piece of information is whether the copy constructor of buffer's hosted type throws an exception or not. For a type that doesn't throw an exception upon copying, the code is simpler as it doesn't have to take care of exception safety. In addition, some compilers generate better code if try blocks are not used.

Type traits are a known technique for deducing information about types. Boost [4] has a library that implements type traits, as well as Loki [5]. (With buffer, we will use a type traits mechanism that's slightly different from those implemented in both Boost and Loki, suggested to me by Vesa Karvonen via email.)

Let's figure out code for deducing whether the copy constructor of a type might throw an exception or not. Quite frankly, figuring this out for any type is not possible in current C++. However, we can make a few steps and let the user help us when they want optimization.

You don't have to be Sherlock Holmes to deduce that the copy constructor of any primitive type cannot throw. Therefore, a conservative assumption is any type's copy constructor might throw an exception, except for primitive types. Here's the corresponding code:

namespace TypeTraits
{
    // Enumerate all primitive types
    typedef TYPELIST_14(
        const bool,
        const char,
        const signed char, 
        const unsigned char,
        const wchar_t,
        const short int, 
        const unsigned short int,
        const int,
        const unsigned int,
        const long int, 
        const unsigned long int,
        const float,
        const double,
        const long double)
    PrimitiveTypes;

    template <typename T> struct IsPrimitive
    {
        enum { value =
             Loki::TL::IndexOf<PrimitiveTypes,
             const T>::value >= 0 };
    };

    template <typename T> struct IsPrimitive<T*>
    {
        enum { value = true };
    };

    template <typename T> struct CopyThrows
    {
        enum { value = !IsPrimitive<T>::value };
    };
}

To be more concise, the code above makes use of two amenities provided by Loki: typelists and IndexOf. Typelists allow you to create and manipulate lists of types, and Loki::TL::IndexOf finds an individual type in a typelist and returns its index. The index is negative if the type is not found in the typelist. Finally, TypeTraits::CopyThrows<T>::value contains the needed information.

The mechanism through which type information is inferred is considerably flexible. Imagine you define the following type in an application:

struct Point
{
    int x;
    int y;
    ... utility functions ...
};

The type Point above is not primitive, but it also doesn't throw an exception upon copying. You can communicate that to our type traits mechanism. All you have to do is to reopen the TypeTraits namespace and put an explicit instantiation of CopyThrows in there:

// In file "Point.h", after Point's definition
namespace TypeTraits
{
    template <> struct CopyThrows<Point>
    {
        enum { value = false };
    };
}

Better yet, you can specify CopyThrows for an entire category of types via partial template specialization. For example, recall the standard complex type, std::complex<T>. You can instantiate std::complex with primitive arithmetic types, but also with a user-defined numeric type, such as Rational or BigInt. Now, because copying an std::complex<T> object consists of copying two T objects (the real and imaginary part), it follows that std::complex<T> has the same CopyThrows trait as T. Here's how you can express that:

namespace TypeTraits
{
    template <class T> struct CopyThrows< std::complex<T> >
    {
        enum { value = CopyThrows<T>::value };
    };
}

Let's get back to buffer. How does buffer use the CopyThrows information? Dispatching upon a Boolean value at compile time is easy using the Int2Type class template, described in [5] and [6]. To recap, Int2Type has the deceptively simple definition:

template <int v>
struct Int2Type
{
    enum { value = v };
};

Here, for example, is how buffer's constructor uses Int2Type to dispatch to an exception-safe or an exception-numb initialization function:

template <class T, class Allocator =
     std::allocator<T> >
class buffer : private Allocator
{
private:
    enum { copyThrows =
        TypeTraits::CopyThrows<T>::value != 0 };

    // Exceptionless initialization
    void Init(size_type n, const T& value,
        Loki::Int2Type<false>)
    {
        ... 
    }

    // Initialization in the presence of exceptions
    void Init(size_type n, const T& value,
        Loki::Int2Type<true>)
    {
        ...
    }

public:
    explicit buffer(size_type n,
        const T& value = T(),
        const Allocator& a = Allocator())
    {
        Init(n, value, 
            Loki::Int2Type<copyThrows>());
    }
};

Other sensible pieces of information about a type that would be interesting to buffer would be:

Whether the type is MemCopyable, that is, if copying an object is the same as memcpy'ing that object's bits. Obviously, this is the case for primitive types and for POD (Plain Old Data, C-style) structures.
Whether the type is MemMoveable, that is, if copying an object from a spot to another in memory and then destroying the source object is the same as memcpy'ing the object's bits and not destroying the source. Again, this is the case for primitive types and PODs. But, as you will see soon, a surprisingly rich subset of user-defined types is MemMoveable.

The next installment of "Generic<Programming>" will define MemCopyable and MemMoveable and exploit them in a manner similar to CopyThrows. Does that put the subject to rest? Not at all. As the reallocate and inplace_reallocate artifacts in the downloadable code suggest, we'll travel through the Caudine Forks of memory allocation. Until then — comments galore!

Acknowledgement

Many thanks to Tim Sharrock for a thorough review.

Bibliography and Notes

[1] Herb Sutter. Exceptional C++ (Addison-Wesley, 2000).

[2] Acronym for COW Fan Club. In turn, COW stands for Copy-On-Write. COW is the one strategy that is friendly to the move-implemented-as-copy policy used by std::vector. However, many library writers are moving away from COW-based implementations due to the problems they raise in multithreaded programs.

[3] In general, buffer must have the same level of exception safety as vector. See Bjarne Stroustrup, Appendix E: Standard-Library Exception Safety at <www.research.att.com/~bs/3rd_safe0.html>.

[4] Boost is a collection of cutting-edge C++ libraries. See <www.boost.org>.

[5] Loki is the library featured in Modern C++ Design (Addison-Wesley, 2001) written by yours truly. You can download Loki from <www.moderncppdesign.com>.

[6] Andrei Alexandrescu. "Generic<Programming>: Mappings between Types and Values," C/C++ Users Journal C++ Experts Forum, October 2000, <www.cuj.com/experts/1810/alexandr.htm>.

Andrei Alexandrescu is a Development Manager at RealNetworks Inc. (<www.realnetworks.com>), based in Seattle, WA, and author of the acclaimed book Modern C++ Design. He may be contacted at www.moderncppdesign.com. Andrei is also one of the featured instructors of The C++ Seminar (<www.gotw.ca/cpp_seminar>).