Channels ▼
RSS

C/C++

Discriminated Unions (II)


You know the one about the syntactic sugar? It causes cancer of the semicolon [1]? Well, enough jokes for today. We have a lot of ground to cover today, so let's hit the road early.

This installment continues the implementation of discriminated unions in C++. Today we'll finish the discussion on alignment and proceed full-speed to writing some code for the actual implementation of Variant.

Before that, let's refresh our memory on the progress we made in the previous installment of "Generic<Programming>" [2].

Highlights from the Previous Episode

After putting together a wish list for implementing discriminated unions, we discussed what would be the best storage model. After stumbling upon alignment issues, we figured out that a storage model that is at the same time properly aligned, efficient, and exception-friendly is:

union 
{
  unsigned char buffer_[neededSize];
  Align dummy_;
};

where neededSize is the maximum size of the types in the union, and Align is a POD [3] type that ensures proper alignment.

Depending on how well chosen Align is, the discriminated union's storage can vary from optimal to unnecessarily large to defective.

Even with the best Align, the implementation above is still not 100-percent portable for all types. In theory, someone could implement a compiler that respects the Standard but still does not work properly with discriminated unions. This is because the Standard does not guarantee that all user-defined types ultimately have the alignment of some POD type. Such a compiler, however, would be more of a figment of a wicked language lawyer's imagination, rather than a realistic language implementation.

Implementing an Alignment Calculator

That said, several effective alignment computation algorithms can be devised. You can see a couple implemented in the Boost libraries [4]. One that works well for computing the alignment of a type T is:

  1. Start with a collection of all primitive types.
  2. Eliminate from that collection the types larger than T.
  3. The resulting Align is a C union containing each of the types in the collection computed in step 2.

The basic idea underlying this algorithm is that any user-defined type T will ultimately contain only primitive types. Consequently, T's alignment requirements will be the same as one of those types. Larger types have larger alignment requirements; hence it's reasonable to infer that an upper bound of T's alignment is the alignment of its largest member.

This algorithm might "overalign" the result. For example, if T is char[5] and sizeof(int) is four, T might have an alignment of one, but Align will have the alignment of an int (most likely, four).

Erring on the large side of alignment is not that bad in most cases. (Erring on the small side is disastrous.) On the other hand, erring on the large side of size wastes space. Our algorithm's step 2 ensures that only types of size less than or equal to T's size remain selected.

To implement the algorithm for computing Align, recall from the previous installment of "Generic<Programming>" that we have two appropriate tools at our disposal: typelists and ConfigurableUnion. Typelists allow us to manipulate collections of types, allowing us to perform steps 1 and 2. ConfigurableUnion generates a C-style union from a typelist, thus taking care of step 3.

It is worth noting that, in addition to all primitive types, the initial collection of types should include some simple structs as well. Each of those structs contains only one member of a primitive type. We generate these stub structures from a simple template:

template
  <typename U> struct Structify
  { U dummy_; };

The reason for introducing these little structures is simple. Some compilers put tougher alignment on a structure containing only one int than on the int alone. This way, the compiler writer ensures that all user-defined types have the same alignment requirements. In turn, this architectural decision makes things easy for various parts of the compiler.

We thus start with a typelist containing all primitive types and all "structified" types:

class Unknown;

typedef cons<
        char,      
        short int,
        int,
        long int,
        float,
        double,
        long double,
        char*,
        short int*,
        int*,
        long int*,
        float*,
        double*,
        long double*,
        void*,
        Unknown (*)(Unknown),
        Unknown* Unknown::*,
        Unknown (Unknown::*)(Unknown),
        Structify<char>,
        Structify<short int>,
        Structify<int>,
        Structify<long int>,
        Structify<float>,
        Structify<double>,
        Structify<long double>,
        Structify<char*>,
        Structify<short int*>,
        Structify<int*>,
        Structify<long int*>,
        Structify<float*>,
        Structify<double*>,
        Structify<long double*>,
        Structify<void*>,
        Structify<Unknown (*)(Unknown)>,
        Structify<Unknown* Unknown::*>,
        Structify<Unknown (Unknown::*)(Unknown)>
        >::type
      TypesOfAllAlignments;

Okay, so the primitive types are there; the structified types are there, too — very predictable. But some lines of the typelist above look as if the copyeditor's cat did a little Teutonic dance on the keyboard. The incriminating lines are:

        Unknown (*)(Unknown)
        Unknown* Unknown::*
        Unknown (Unknown::*)(Unknown)

Also, their structified versions appear at the bottom of the list.

Who are these guys, and where did they come from? To make them look more familiar, let's give them names:

        Unknown (*T1)(Unknown);
        Unknown* Unknown::* T2;
        Unknown (Unknown::* T3)(Unknown);

Aha! If you precede each of the lines above with typedef, they reveal themselves in all the tired glory of the C declaration syntax. T1 is a pointer to a function taking Unknown and returning Unknown; T2 is a pointer to member of an Unknown, member of type Unknown*; and T3 is a pointer to member function of Unknown, taking an Unknown as an argument and returning an Unknown as a result.

The trick here is that the cunningly named Unknown is indeed not defined. Therefore, the compiler will assume that Unknown is defined elsewhere and that it has the worst alignment requirements possible. (Otherwise, the compiler can optimize Unknown's layout. Optimization here plays against generality.)

Okay, here comes the fun part. Let's now eliminate from TypesOfAllAlignments all types that are larger than a given size, by writing the following template:

template <class TList, size_t size> struct ComputeAlignBound;

First, we treat the limited case, as a nice recursive algorithm ought to:

template <size_t size>
struct ComputeAlignBound<null_typelist, size>
{
  typedef null_typelist Result;
};

Then, we treat the general case:

template <class Head, class Tail, size_t size>
struct ComputeAlignBound<typelist<Head, Tail>, size>
{
  typedef typename ComputeAlignBound<Tail, size>::result TailResult;
  typedef typename select<  // details below
    sizeof(Head) <= size,
    typelist<Head, TailResult>,
    TailResult>::result
    Result;
};

First, ComputeAlignBound yields the result of its recursive application to the tail of the typelist in TailResult. Now, if the head's size is less than or equal to size, the result is a typelist formed by Head and TailResult. Otherwise, the result is only TailResultHead is no longer included.

This type selection is performed by a little template entitled select. To save column space, please allow me to refer you to [5]. It's a very useful little template, and if you are interested in generic programming using C++, you owe it to yourself to find out about it.

To put it all together, we need to tie up some loose ends. Let's recap what we have and what we need. We have the MaxSize template, which calculates the maximum size of all types in a typelist. We have ConfigurableUnion, which builds a union out of a typelist. Finally, we have ComputeAlignBound, which calculates a type that has the same or more stringent alignment requirements as any type in a typelist. We need a POD type that provides a properly aligned storage for any type in a typelist. Here's how to exploit what we have to obtain what we need:

template <typename TList>
class AlignedPOD
{
  enum { maxSize = MaxSize<TList>::result };
  typedef ComputeAlignBound<TypesOfAllAlignments, maxSize>::Result
    AlignTypes;
public:
  typedef ConfigurableUnion<AlignTypes> Result;
  };

You can play encapsulation games by putting ComputeAlignBound, Unknown, Structify, and ConfigurableUnion in some private namespace so that they are properly hidden.

Discriminated Unions Implementation: The Fake vtable Idiom

Let's get back to discriminated unions. At this point, half of this article's real estate has been spent on computing alignment. I hope that it was a well-invested effort; alignment is important in a host of low-level applications, as well as in the efficient implementation of higher-level abstractions.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video