Channels ▼
RSS

Generic: Typed Buffers (III)


December 2001 C++ Experts Forum/Generic<Programming>


This is the last installment treating typed buffers, lightweight and flexible contiguous arrays of objects of arbitrary types. Lying somewhere in between the primitive built-in arrays and the sophisticated std::vector, typed buffers are useful structures when efficiency is paramount, and, more important, handy building blocks for more elaborate structures — such as strings, vectors, queues, and more.

The previous installment [1] focused on efficient implementations of basic buffer-related operations, such as filling and copying memory. The article you are about to read takes a broader view — we will talk about copying and moving objects, as opposed to raw data.

Mor(e)on Allocators

When trying to stun people at parties by talking about policy-based design [2], I usually remark sardonically that STL allocators are a well-known experiment with a policy, an experiment that failed miserably.

Like those Freudian reports that link the mid-life angst to some little event in childhood, memory allocation in C++ has quite an interesting story that harkens back to C.

Since its inception, C has offered three memory management primitives: malloc, free, and realloc:

void* malloc(size_t size);
void free(void* p);
void* realloc(void* p, size_t newSize);

Three might be a magic number, but three memory management primitives are just not enough. Even more so when you consider the annoying overlap — realloc does it all:

  • If you pass it a null pointer, realloc does what malloc does.
  • If you pass it a zero size, realloc does what free does.

So realloc is a do-it-all function that takes care of all your memory management needs. (This is a bad sign, by the way.) Let's see what a typical realloc implementation looks like.

void* realloc(void* p, size_t newSize)
{
     // Get rid of degenerate cases
    if (p == 0) return malloc(newSize);
    if (newSize == 0) return free(p), 0;

    // Attempt in-place expansion
    if (p == _expand(p, newSize) && _msize(p) >= newSize) 
    {
        return p;
    }
 
    // Need to move memory — acquire new chunk 
    void* pNew = malloc(newSize);
    if (!pNew) return 0;</p>

    // Move data to the new location
    memcpy(pNew, p, std::min(_msize(p), newSize));
    free(p);
    return pNew;
}

Guess the realloc implementation above is quite easy to read, especially because it doesn't obey the SESE (Single Entry, Single Exit) tyranny [3]. Some details are unclear, though — what do _expand and _msize do?

These two primitives are instrumental to realloc's functioning:

void* _expand(void* p, size_t newSize);
size_t _msize(void* p);

_expand attempts to expand in place the memory block pointed to by its first argument. It is possible that when you try to reallocate a block of memory, the block just next to it is available. In that case, the memory allocator can just quickly adjust its bookkeeping structures to accommodate the reallocation at the expense of that next block. This is much faster than copying the whole chunk to a larger location. In case of success, _expand returns the pointer passed to it.

_msize simply returns the size of the chunk pointed to by its argument. (Any memory manager must keep track of the size of each chunk.)

_expand and _msize are not standard, so you only can use them indirectly via realloc.

Getting back to realloc, if in-place expansion fails, then realloc allocates a whole new block, copies the old block to the new block using memcpy, and frees the old block. Very simple. Maybe too simple.

When C++ was born, it relied heavily on C's library and infrastructure. In particular, it would have been awkward if C++ developed its own, separate, memory allocator from scratch — it made much more sense to build a new, strongly typed allocation mechanism onto Standard C's allocation primitives.

And here lies the problem.

While memcpy is so loyal to C, it is C++'s well-intended but clumsy servant. For C, moving things around with memcpy works just fine, but C++ needs more. Objects with constructors and destructors cannot be moved in memory just like that. They might encapsulate pointers to their own internals or compiler-managed data (such as pointers to base in virtual inheritance). This means that a C++ object might not be valid anymore if memcpy'd from one location to another — and righteously, the C++ Standard explicitly prohibits memcpying objects that are not POD (Plain Old Data).

If memcpy is not good, then realloc is not good — and so C++ forewent any reallocation opportunities since day one. C++ has new and delete, but not "renew" or anything similar to realloc.

It comes as no surprise that std::allocator (that is, yet another allocation thingie whose only purpose is to annoy everybody) does not provide an interface for reallocation, either. And so in the dusk of the second millennium, the most modern C++ library does not have a means to support optimal memory allocation.

Because C++ lacks efficient reallocation, by definition a C++ program uses more memory and/or is slower than it could be. To add insult to injury, neither C nor C++ expose _msize, so in practice the size of a dynamically allocated block must usually be stored twice — in the bowels of the allocator and in the program itself.

Now close your eyes for a moment. Close your eyes and imagine the multitude of C++ applications humming (and some crashing) out there right now — all those desktop apps, business apps, servers, middle-tiers, embedded systems, you name it. Now open your eyes and face the truth. They are sub-optimal: they copy memory when they shouldn't, and they manage information that they shouldn't. All this because C's original allocator design didn't provide client access to two primitives [4].

Let's now see how we can optimize buffer memory allocation. Because typed buffers use allocators, we need a solution that plays well with the std::allocator interface.

The Mallocator

No, it's not the title of a Hollywood movie — it's just an std-like allocator based on malloc.

Let's start by making an observation: realloc is not entirely unusable to C++; it's just unusable with a subset of C++'s types. So even within the confines of the scarce Standard C API, you can safely use realloc with any POD, including all primitive types. Thus, in order to reap fast reallocation for our buffer class, we need to do the following:

    A) Figure out whether an arbitrary type T is a POD or not.

    B) Define a malloc-based allocator that has the same interface as std::allocator and adds a reallocate function. For PODs, the new allocator uses realloc directly; for non-PODs, it returns "uh-uh," in which case the caller must proceed with the classic allocate-copy-discard sequence.

    C) Provide a method to interrogate an allocator: "Are you able to reallocate?" To that question, std::allocator must answer "No." Rub: you cannot touch std::allocator.

    D) If available, exploit that reallocate function within buffer. That is, if you instantiate buffer<char, std::allocator<char> >, nothing special happens, but if you say buffer<char, mallocator<char> >, then buffer uses efficient reallocation wherever possible.

Each of the issues above has a unique best solution in modern C++, and it is a good exercise to think of that before continuing to read. Deal?

Ok, here goes.

    A) If you've read [5], you know that TypeTraits is the solution. Adding to [5]'s code base, you can now write:

    namespace TypeTraits
    {
        template <typename T> struct IsPOD
        {
            enum { value = IsPrimitive<T>::value };
        };
    }

    By default, only primitive types are POD. As shown in [5], you can specialize TypeTraits::IsPOD for types that you know are PODs.

    B) Herb and Jim would write, "Prophet Austern has explained to the masses how to write a Standard-blessed allocator." Indeed, Matt Austern did it in [6], which you can find on the holy Internet.

    template <typename T>        
    struct Mallocator        
    {        
        ... allocator implementation using malloc/free, see <a href="#6">[6]</a> ...
    
        T* reallocate(T* p, size_type newSize)        
        {        
            return TypeTraits::IsPOD<T>::value        
            ? static_cast<T*>(realloc(p, newSize))        
            : static_cast<T*>(0);        
        }        
    };

    C) How to query a type's capabilities non-intrusively? Of course — traits [7]! Without further ado:

    template <typename A>        
    struct AllocatorSupportsReallocation        
    {        
        enum { value = false };        
    };
    
    template <typename T>        
    struct AllocatorSupportsReallocation< Mallocator<T> >        
    {        
        enum { value = true };        
    };

    D) Writing two versions of a function depending on a Boolean condition is easy accomplished by using Int2Type [2, 8]. The reallocation is best handled by an overloaded function:

    template <typename A>        
    typename A::pointer Reallocate(        
        A& alloc,        
        typename A::pointer p,         
        typename A::size_type oldObjectCount,        
        typename A::size_type newObjectCount,        
        Int2Type<false>)        
    {        
        ...implement reallocation for allocators         
            that DON'T support <b>reallocate</b>...        
    }
    
    template <typename A>        
    typename A::pointer Reallocate(        
        A& alloc,        
        typename A::pointer p,         
        typename A::size_type oldObjectCount,        
        typename A::size_type newObjectCount,        
        Int2Type<true>)        
    {        
        ...implement reallocation for allocators         
            that DO support <b>reallocate</b>...        
    }

Why do you need the oldObjectCount parameter? It's simple: when A::realloc returns zero and you need to allocate a new chunk and then copy objects, you need to know how many objects you had. That's why the two Reallocate implementations deal with object counts, not byte counts. Reallocate is a tad higher-level than allocators' member functions.

Now when invoking Reallocate, all buffer<T, A> has to do is to pass Int2Type<TypeTraits::IsPOD<T>::value> as the last argument, and, voilà! Fast reallocation kicks in!

Sometimes, you do have access to library extensions, such as _expand and _msize. This is the case with Microsoft's C library implementation. That means, you can implement efficient reallocation for all types, not just PODs. Of course, if you do that and you want your code to have some cross-platform dignity, you must do the usual #ifdef sorcery.

Moving Objects

Let's now move on to another topic. Even if you could reallocate memory optimally by exploiting whatever in-place expansion opportunities there are, things are still far from optimal. Consider this:

buffer<std::string> buf(1000000);        
for (size_t i = 0; i != buf.size(); ++i)        
{        
    buf[i] = GetSomeString(i);        
}        
...        
buf.reserve(buf.size() + 1000);

Now regardless of whether or not you use the Mallocator, sometimes you do need to go through the allocate-copy-destroy cycle. Assuming that happens in the example above, the reserve call involves moving around 1,000,000 string objects. In C++, moving objects is done similar to those primitive teleporting methods: clone the object at a distance and then kill the original.

Cloning all those strings can be costly — much more costly than just copying the memory they sit on. If the strings are not reference counted — which is quite likely [9] — then reserve incurs a lot of unnecessary work. reserve allocates a new memory chunk, copies each string to the new chunk, and finally, destroys the strings in the old chunk.

But why all these copies when we actually only need to smoothly reseat objects? We just need to tell a string, "Hey, here's a new place for you in memory, please reseat yourself." Because the old location of the string is irrelevant (will be discarded anyway), the string can only efficiently copy its internal pointers to the destination memory block.

It doesn't take long to figure out that to model what we're trying to do requires the concept of moving as a primitive operation, distinct from copying. Copying is duplicating the object. Moving is not duplication, the total number of objects stays the same. Absent the concept of moving, we implement it in C++ by cloning and then butchering the original. We need to be more human.

There are many ways of implementing moving. The simplest one can think of is via a function:

template <class T> void Move(T* src, void* dest);

That's not that easy, however. Consider you are implementing a string class:

class UltimateString        
{        
    char* start_;        
    char* end_;        
    char* endOfStorage_; // because there's no <b>_msize</b>, sigh        
public:        
    ... functions ...        
};

Now you'd think you specialize Move for UltimateString as follows:

template <>        
void Move(UltimateString* src, void* dest)        
{        
    UltimateString* typedDest = static_cast<UltimateString*>(dest);        
    typedDest->start_ = src->start_;        
    typedDest->end_ = src->end_;        
    typedDest->endOfStorage_ = src->endOfStorage_;        
}

If you think that's it, you're wrong. That's not it, because the Standard leaves compilers the latitude to add their own data to your classes (except when they're POD). That means, the compiler might add an int __coolnessFactor_ member variable to your string class, a variable that you don't know about, which makes it impossible for you to copy all of src into dest. The effect of Move is undefined.

C++ insists that every object that has constructors is built through a call to one of those constructors. Any other way of creating an object is prohibited. So a solution to moving objects must go through some constructor.

Let's then define a stub class like this:

template <typename T>        
struct Takeover        
{        
    explicit Takeover(T& obj) : obj_(obj) {}        
    T& Get() { return obj_; }        
private:        
    T& obj_;        
};

Takeover encapsulates a reference in an object and provides access to it.

Then, you implement a constructor of UltimateString that accepts a Takeover<UltimateString> object:

class UltimateString        
{        
    char* start_;        
     char* end_;        
    char* endOfStorage_; // because there's no <b>_msize</b>, sigh        
public:        
   UltimateString(Takeover<UltimateString> wrap)        
    {        
        UltimateString& src = wrap.Get();        
        start_ = src.start_;        
        end_ = src.end_;        
        endOfStorage_ = src.endOfStorage_;        
        // Nuke the source        
        src.start_ = src.end_ = src.endOfStorage_ = 0;        
    }        
    ...        
};

For convenience's sake, let's name the constructor above a "takeover constructor." The takeover constructor coexists with other constructors. It copies the three pointers into the constructed string and then nukes the original pointers. This last step is needed because the takeover constructor must leave the original object in a destroyable state.

So far, so good. However, how can you figure out in a piece of generic code that a type T implements a constructor accepting a Takeover<T>? Depending on that, you need to dispatch at compile time between different algorithms for moving objects.

Here's the juiciest part of the article, because I can nicely point you to [2] and [8]. Remember the Conversion template? If not, trust me, it's worth a read, so print [8] and read it, or better yet (for me at least), buy [2]. Conversion offers a way to detect at compile time whether some time U can be converted to some type T. If you replace U with Takeover<T> and T with, well, T, you found the solution to our problem. Any piece of code can figure out from the outside whether an arbitrary type implemented a takeover constructor or not.

A word on exception safety. Normally, moving an object should not throw exceptions because moving doesn't involve allocating new resources. This is not always true, however. Consider the following code:

class Widget        
{        
    ... does not implement a takeaway constructor ...        
    ... copy constructor might throw exceptions ...        
};

class Midget        
{        
    Widget w_;        
public:        
    Midget(Takeover<Midget> wrap)        
    {        
        ... oops, how do you implement it to not throw exceptions? ...        
    }        
};

This problem can be avoided in two ways. One is, simply, to replace the Widget member with a Widget* (or, if you have enough nerve, with std::auto_ptr<Widget>), and to use dynamic allocation. The pointer can be smoothly copied without having to copy Widget itself, so the constructor is easy to write.

The second way to implement a takeover constructor for Midget is to require two different capabilities from Widget:

  1. Widget must implement a non-throwing swap, as all containers in the standard library do. It also establishes a (very shaky) framework for other types to define swap as well. Most important, the standard library made a point that swapping objects without throwing is a fundamental primitive operation. swap is very useful in a variety of situations and cannot be implemented efficiently (and non-throwingly) from the outside of a class' implementation. If you are std-savvy, you likely do implement swap for all your first-class (non-polymorphic) objects. (With luck, maybe this article convinces you to implement a takeover constructor as well.)
  2. Widget's default constructor does not throw an exception. This is usually easy to accomplish.

If the two conditions above hold, then you can implement the takeover constructor like so:

Midget(Takeover<Midget> wrap)        
: w_() // make it clear that w_ is default-constructed        
{        
    w_.swap(wrap.Get().w_);        
}

That is, build an empty value for the destination and swap that empty value with the value to be taken over.

Takeover construction and swapping are two related and slightly overlapping operations. The relationship between them is as follows:

  • You can implement a takeover constructor in terms of a non-throwing swap, but only if you have a non-throwing constructor available. If all your constructors might throw, swap is useless in implementing takeover construction.
  • You could implement swap in terms of a takeover constructor, were it not for that @#!%^& alignment. Consider this:
  • void SwapViaMove(Midget& lhs, Midget& rhs)        
    {        
        char buffer[sizeof(Midget)]; // MUST BE ALIGNED TO HOLD A MIDGET        
        // Move lhs into buffer        
        Midget* lhsMoved = new (buffer) Midget(Takeover<Midget>(lhs));        
        // Move rhs over lhs        
        lhs.~Midget();        
        new (&lhs) Midget(Takeover<Midget>(rhs));        
        // Move lhs (who's now in buffer) to rhs        
        rhs.~Midget();        
        new (&rhs) Midget(Takeover<Midget>(*lhsMoved));        
        lhsMoved->~Midget();        
    }

The function is incorrect (in addition to looking quite arcane) because there's no 100-percent portable way to ensure that buffer is properly aligned to hold a Widget.

If you have access to a non-throwing constructor, you can get rid of the alignment problem like this:

void SwapViaMove(Midget& lhs, Midget& rhs)        
{        
    Midget temp; // Won't throw        
    // Move lhs into temp        
    temp.~Midget();        
    new (&temp) Midget(Takeover<Midget>(lhs));        
    // Move rhs over lhs        
    lhs.~Midget();        
    new (&lhs) Midget(Takeover<Midget>(rhs));        
    // Move temp to rhs        
    rhs.~Midget();        
    new (&rhs) Midget(Takeover<Midget>(temp));        
    // temp->~Midget(); not needed because the compiler does it        
}

(If you think of making temp static to save some cycles, you've just thrown away the reentrancy of the function.)

Corollary: any of non-throwing swap and takeover construction can be portably implemented in terms of the other if and only if you have access to a non-throwing constructor.

Now when it comes to moving ranges of objects, the most effective algorithm looks like this:

  • If the type is a POD, you use memcpy, memmove, or one of the fast copy routines described in [1]. Duff said. Pardon. 'Nuff said.
  • Else, if the type supports takeover construction, use it.
  • Else use the inhuman clone-destroy routine in a loop.

Summary and Conclusion

Memory allocation in portable C and C++ is suboptimal. The C library makes it impossible to expand a memory block in place, and consequently C++ doesn't support something similar, either. In addition, C and C++ do not provide information about the size of a memory block, information that is nonetheless available to the allocator. These two shortcomings hamper an application's speed and footprint. To what extent, I don't know; I suspect the impact is negligible to small for most applications, and quite bothersome for some.

To overcome the mentioned shortcomings, you can design an allocator (Mallocator) that uses realloc for PODs. This way at least part of your objects will benefit from fast reallocation. If your C library implementation provides extensions such as _msize and _expand, you can use those in your Mallocator implementation, thus implementing fast reallocation for all objects.

To C++ objects, non-throwing moving is as fundamental an operation as copying. Swapping objects is somewhat conceptually equivalent to moving (under special circumstances). A solid solution to moving objects is takeover construction.

However you implement it, having a means for moving objects (efficiently and without throwing exceptions) is the only way in which you can make compound containers such as vector< vector<double> > become a viable solution.

Bibliography and Notes

[1] Andrei Alexandrescu. "Generic<Programming>: Typed Buffers (II)," C/C++ Users Journal C++ Experts Forum, October 2001, <www.cuj.com/experts/1910/alexandr.htm>.

[2] Andrei Alexandrescu. Modern C++ Design (Addison-Wesley Longman, 2001). The book is not the first to come up with the concept of policy-based designs, but it documents and applies it systematically.

[3] SESE is a mantra that soared during the days of structured programming. In the Orwellian SESE-land, every function must have only one entry (by design in most languages anyway) and one exit. The "single-exit" predicament is terribly obsolete today, in the era of exceptions. Although it's been shown time and again that SESE code is longer, less maintainable, more "McCabe complex," and less efficient than the equivalent multiple-exit code, there still are poor souls who are convinced that SESE is a great thing. (After all, Aristotle himself believed that the sun was scrolled onto the sky by flying angels.) SESE zealots were formed in the Pascal and C days. Unfortunately, if you do the age progression, those people are exactly today's development managers.

[4] Microsoft's C library implementation does provide access to _expand and _msize. Say what you want, but the folks in Redmond are known for their sense of practicality.

[5] Andrei Alexandrescu. "Generic<Programming>: Typed Buffers (I)," C/C++ Users Journal C++ Experts Forum, August 2001, <www.cuj.com/experts/1908/alexandr.htm>.

[6] Austern, Matt. "The Standard Librarian: What Are Allocators Good For?" C/C++ Users Journal C++ Experts Forum, December 2000, <www.cuj.com/experts/1812/austern.htm>.

[7] Andrei Alexandrescu. "Traits: The else-if-then of Types," C++ Report, April 2000.

[8] Andrei Alexandrescu. "Generic<Programming>: Mappings between Types and Values," C/C++ Users Journal C++ Experts Forum, October 2000, <www.cuj.com/experts/1810/alexandr.htm>.

[9] Andrei Alexandrescu. "Generic<Programming>: A Policy-based basic_string implementation", C/C++ Users Journal C++ Experts Forum, June 2001, <www.cuj.com/experts/1906/alexandr.htm>.

Andrei Alexandrescu is a Ph.D. student at University of Washington in Seattle, and author of the acclaimed book Modern C++ Design. He may be contacted at www.moderncppdesign.com. Andrei is also one of the featured instructors of The C++ Seminar (<www.gotw.ca/cpp_seminar>).


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.