Channels ▼
RSS

C/C++

Discriminated Unions (I)


In conclusion, it would be best if Variant used in-situ allocation.

The search continues with the obvious idea of in-situ allocation: we could use an untyped buffer of characters, right?

template <class TList>
class Variant 
{
    // compute the needed buffer size
    enum { neededSize = ... }; 
    unsigned char buffer_[neededSize];
    ...
};

We actually can compute neededSize quite easily. Just think of implementing a recursive algorithm that gets the largest element in a typelist: you compare the head with the largest element in the tail (which you compute recursively), and you choose the larger of the two.

template <class TList> struct MaxSize;
  
template <> 
struct MaxSize<null_type>
{
    enum { result = 0 };
};

template <class Head, class Tail> 
struct MaxSize< typelist<Head, Tail> >
{
private:
    enum { tailResult = size_t(MaxSize<Tail>::result) };
public: 
    enum { result = sizeof(Head) > tailResult ? 
        sizeof(Head) : size_t(tailResult) };
};

As you can see, MaxSize recursively applies itself to compute the internal variable tailResult, and then it chooses the larger number. The limit case (null typelist) just returns the smallest possible positive number — zero.

Given MaxSize, our Variant definition becomes:

template <class TList>
class Variant 
{
    enum { neededSize = MaxSize<TList>::result };
    unsigned char buffer_[neededSize];
    ...
};

If you heard an awful sound, that's our current storage implementation hitting the wall: it doesn't work. Worse, it might work sometimes — with some types, on some machines, on some compilers, and during certain phases of the moon. Get ready for tough times — we're having alignment issues.

Indeed, the problem with our approach is that the compiler doesn't know that we plan on storing "true" data inside buffer_. For all it knows, the compiler must care about a buffer of characters.

It turns out that, in most current computer architectures, some data types must be stored at specific offsets in memory. For example, typically four-byte integers can be allocated only at addresses that are a multiple of four. That doesn't mean that all eight-byte data must be allocated at multiple-of-eight addresses — their alignment could be four as well. (No data, however, can require alignment greater than its own size, because that would break the array storage rules.)

The rationale for the notion of alignment is the physical arrangement of memory banks, the bus, and the processor. For example, certain memory banks are physically linked to the lowest-order eight bits of the memory bus, so by construction their content cannot directly reach the higher-order bits.

If you try to access data that's not properly aligned, one of several things might happen. You can get the data you wanted, only much slower, because essentially the processor simulates in software, by shuffling data around, what should be a hardware operation. (The x86 processors do this.) Many other processors, however, terminate your program immediately on grounds of an attempt to execute an illegal memory access.

The good news is that most of the time, you don't need to worry about alignment: the processor nicely takes care of it by allocating all data at properly aligned addresses. The bad news is that you do need to care exactly now, because buffer_[neededSize] really means you're taking your life in your hands.

How can we guarantee that buffer_ is properly aligned? Well, that's exactly what union does — it guarantees enough size for its largest member and alignment for its most restrictive member. This is what we need, so our storage should look something like this:

union 
{
    unsigned char buffer_[neededSize];
    Align dummy_;
};

We never use dummy_; it's there just for alignment purposes. We now only need to define the Align type properly. We'll investigate exactly how to do that in the next installment. Until then, happy coding and ideas galore!

Bibliography

[1] A. Alexandrescu. "Generic<Programming>: Typelists and Applications," C/C++ Users Journal C++ Experts Forum, February 2002, <www.cuj.com/experts/2002/alexandr.htm>.

[2] You can download Loki from <www.moderncppdesign.com>.

[3] A. Alexandrescu. Modern C++ Design (Addison-Wesley Longman, 2001).

[4] A. Alexandrescu. "An Implementation of Discriminated Unions in C++," The 2001 C++ Template Programming Workshop, held in conjunction with OOPSLA 2001, <www.oonumerics.org/tmpw01/alexandrescu.pdf>.

[5] Rumor has it that the next revised C++ Standard, fondly called C++0x by those who keep score at home, will support types with constructors as union members. Anyway, types with destructors can't be allowed, and that makes sense. Given that the union doesn't store a discriminator, how would the compiler know which destructor to call when going out of scope?


Andrei Alexandrescu is a Ph.D. student at University of Washington in Seattle, and author of the acclaimed book Modern C++ Design. He may be contacted at www.moderncppdesign.com. Andrei is also one of the featured instructors of The C++ Seminar


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video