Dr. Dobb's | Discriminated Unions (II)

Discriminated Unions (II)

We continue working on our Variant design and implementation. This time we focus on storage - how data is stored and accessed. This article presents techniques for computing alignment of types with reasonable accuracy and portability. Then, once the storage proper is in place, we need polymorphic means to manipulate that storage - for which Variant uses the fake vtable

June 01, 2002
URL:http://www.drdobbs.com/cpp/discriminated-unions-ii/184403828

You know the one about the syntactic sugar? It causes cancer of the semicolon [1]? Well, enough jokes for today. We have a lot of ground to cover today, so let's hit the road early.

This installment continues the implementation of discriminated unions in C++. Today we'll finish the discussion on alignment and proceed full-speed to writing some code for the actual implementation of Variant.

Before that, let's refresh our memory on the progress we made in the previous installment of "Generic<Programming>" [2].

Highlights from the Previous Episode

After putting together a wish list for implementing discriminated unions, we discussed what would be the best storage model. After stumbling upon alignment issues, we figured out that a storage model that is at the same time properly aligned, efficient, and exception-friendly is:

union 
{
  unsigned char buffer_[neededSize];
  Align dummy_;
};

where neededSize is the maximum size of the types in the union, and Align is a POD [3] type that ensures proper alignment.

Depending on how well chosen Align is, the discriminated union's storage can vary from optimal to unnecessarily large to defective.

Even with the best Align, the implementation above is still not 100-percent portable for all types. In theory, someone could implement a compiler that respects the Standard but still does not work properly with discriminated unions. This is because the Standard does not guarantee that all user-defined types ultimately have the alignment of some POD type. Such a compiler, however, would be more of a figment of a wicked language lawyer's imagination, rather than a realistic language implementation.

Implementing an Alignment Calculator

That said, several effective alignment computation algorithms can be devised. You can see a couple implemented in the Boost libraries [4]. One that works well for computing the alignment of a type T is:

Start with a collection of all primitive types.
Eliminate from that collection the types larger than T.
The resulting Align is a C union containing each of the types in the collection computed in step 2.

The basic idea underlying this algorithm is that any user-defined type T will ultimately contain only primitive types. Consequently, T's alignment requirements will be the same as one of those types. Larger types have larger alignment requirements; hence it's reasonable to infer that an upper bound of T's alignment is the alignment of its largest member.

This algorithm might "overalign" the result. For example, if T is char[5] and sizeof(int) is four, T might have an alignment of one, but Align will have the alignment of an int (most likely, four).

Erring on the large side of alignment is not that bad in most cases. (Erring on the small side is disastrous.) On the other hand, erring on the large side of size wastes space. Our algorithm's step 2 ensures that only types of size less than or equal to T's size remain selected.

To implement the algorithm for computing Align, recall from the previous installment of "Generic<Programming>" that we have two appropriate tools at our disposal: typelists and ConfigurableUnion. Typelists allow us to manipulate collections of types, allowing us to perform steps 1 and 2. ConfigurableUnion generates a C-style union from a typelist, thus taking care of step 3.

It is worth noting that, in addition to all primitive types, the initial collection of types should include some simple structs as well. Each of those structs contains only one member of a primitive type. We generate these stub structures from a simple template:

template
  <typename U> struct Structify
  { U dummy_; };

The reason for introducing these little structures is simple. Some compilers put tougher alignment on a structure containing only one int than on the int alone. This way, the compiler writer ensures that all user-defined types have the same alignment requirements. In turn, this architectural decision makes things easy for various parts of the compiler.

We thus start with a typelist containing all primitive types and all "structified" types:

class Unknown;

typedef cons<
        char,      
        short int,
        int,
        long int,
        float,
        double,
        long double,
        char*,
        short int*,
        int*,
        long int*,
        float*,
        double*,
        long double*,
        void*,
        Unknown (*)(Unknown),
        Unknown* Unknown::*,
        Unknown (Unknown::*)(Unknown),
        Structify<char>,
        Structify<short int>,
        Structify<int>,
        Structify<long int>,
        Structify<float>,
        Structify<double>,
        Structify<long double>,
        Structify<char*>,
        Structify<short int*>,
        Structify<int*>,
        Structify<long int*>,
        Structify<float*>,
        Structify<double*>,
        Structify<long double*>,
        Structify<void*>,
        Structify<Unknown (*)(Unknown)>,
        Structify<Unknown* Unknown::*>,
        Structify<Unknown (Unknown::*)(Unknown)>
        >::type
      TypesOfAllAlignments;

Okay, so the primitive types are there; the structified types are there, too — very predictable. But some lines of the typelist above look as if the copyeditor's cat did a little Teutonic dance on the keyboard. The incriminating lines are:

        Unknown (*)(Unknown)
        Unknown* Unknown::*
        Unknown (Unknown::*)(Unknown)

Also, their structified versions appear at the bottom of the list.

Who are these guys, and where did they come from? To make them look more familiar, let's give them names:

        Unknown (*T1)(Unknown);
        Unknown* Unknown::* T2;
        Unknown (Unknown::* T3)(Unknown);

Aha! If you precede each of the lines above with typedef, they reveal themselves in all the tired glory of the C declaration syntax. T1 is a pointer to a function taking Unknown and returning Unknown; T2 is a pointer to member of an Unknown, member of type Unknown*; and T3 is a pointer to member function of Unknown, taking an Unknown as an argument and returning an Unknown as a result.

The trick here is that the cunningly named Unknown is indeed not defined. Therefore, the compiler will assume that Unknown is defined elsewhere and that it has the worst alignment requirements possible. (Otherwise, the compiler can optimize Unknown's layout. Optimization here plays against generality.)

Okay, here comes the fun part. Let's now eliminate from TypesOfAllAlignments all types that are larger than a given size, by writing the following template:

template <class TList, size_t size> struct ComputeAlignBound;

First, we treat the limited case, as a nice recursive algorithm ought to:

template <size_t size>
struct ComputeAlignBound<null_typelist, size>
{
  typedef null_typelist Result;
};

Then, we treat the general case:

template <class Head, class Tail, size_t size>
struct ComputeAlignBound<typelist<Head, Tail>, size>
{
  typedef typename ComputeAlignBound<Tail, size>::result TailResult;
  typedef typename select<  // details below
    sizeof(Head) <= size,
    typelist<Head, TailResult>,
    TailResult>::result
    Result;
};

First, ComputeAlignBound yields the result of its recursive application to the tail of the typelist in TailResult. Now, if the head's size is less than or equal to size, the result is a typelist formed by Head and TailResult. Otherwise, the result is only TailResult — Head is no longer included.

This type selection is performed by a little template entitled select. To save column space, please allow me to refer you to [5]. It's a very useful little template, and if you are interested in generic programming using C++, you owe it to yourself to find out about it.

To put it all together, we need to tie up some loose ends. Let's recap what we have and what we need. We have the MaxSize template, which calculates the maximum size of all types in a typelist. We have ConfigurableUnion, which builds a union out of a typelist. Finally, we have ComputeAlignBound, which calculates a type that has the same or more stringent alignment requirements as any type in a typelist. We need a POD type that provides a properly aligned storage for any type in a typelist. Here's how to exploit what we have to obtain what we need:

template <typename TList>
class AlignedPOD
{
  enum { maxSize = MaxSize<TList>::result };
  typedef ComputeAlignBound<TypesOfAllAlignments, maxSize>::Result
    AlignTypes;
public:
  typedef ConfigurableUnion<AlignTypes> Result;
  };

You can play encapsulation games by putting ComputeAlignBound, Unknown, Structify, and ConfigurableUnion in some private namespace so that they are properly hidden.

Discriminated Unions Implementation: The Fake vtable Idiom

Let's get back to discriminated unions. At this point, half of this article's real estate has been spent on computing alignment. I hope that it was a well-invested effort; alignment is important in a host of low-level applications, as well as in the efficient implementation of higher-level abstractions.

Now that we have storage for the objects held by the union, we need the discriminator — the tag that Variant stores and that tells what type the object inside really is. The discriminator must make it possible to perform certain type-specific operations on the contents of the Variant, such as type identification and data manipulation in a type-safe manner.

We have many design options. The simplest is to store an integral discriminator:

template <class TList, class Align = AlignedPOD<TList>::Result>
class Variant
{
  union
  {
    char buffer_[size];
    Align dummy_;
  };
  int discriminator_;
public:
  ...
};

The disadvantage of this solution is that integers are not very "smart." To do different things depending on the discriminator, the switch solution rears its head, and switch means coupling. Perhaps we could use the int as an index into some table, but then why not directly use pointers into that table (a solution that we'll investigate later).

The second solution is to use a proxy polymorphic object.

template <class TList, class Align = AlignedPOD<TList>::Result>
class Variant
{
  union
  {
    char buffer_[size];
  Align dummy_;
  };
  struct ImplBase
  {
    virtual type_info& TypeId() = 0;
    virtual Variant Clone(const Variant&) = 0;
    virtual void Destroy(Variant&) = 0;
    ...
  };
  ImplBase* pImpl_;
public:
  ...
};

The idea here is to perform various operations on the same data by using a pointer to a polymorphic object. Then, depending on what actually lies in buffer_, the pointer points to a different concrete ImplBase, and the polymorphic object performs specialized operations on it.

This approach is indeed very clean, except for one thing: efficiency. When invoking a function such as Destroy, the access is:

Variant object.
Dereference pImpl_.
Dereference pImpl_'s virtual function table, the so-called "vtable" [6].
Index into the appropriate function in the vtable.

Indexing with a constant into the vtable and accessing a field in an object (which boils down to indexing as well) are not problematic. The problem is dereferencing pointers — there are two levels of indirection between issuing the call and getting to the function. However, one level of indirection should be enough to dispatch a call to the correct type handler. What to do?

(Allow me a parenthetical remark. If you wonder whether I forgot the two laws of optimizations: (1) don't do it and (2) don't do it yet — well, I still remember them. Why, then, is Variant so keen on optimization? The reason is simple. This is not application code; this is a library artifact, and more so, it is one that emulates and rivals a language feature. The more streamlined Variant is, the more you can rely on it for heavyweight duties; the more sloppily it is implemented, the more it becomes a toy that you wouldn't care to use in a real application.)

A good approach would be to simulate what the compiler does: fake a vtable and ensure only one level of indirection. This leads us to the following code:

template <class TList, class Align = AlignedPOD<TList>::Result>
class Variant
{
  union
  {
    char buffer_[size];
    Align dummy_;
  };
  struct VTable
  {
    const std::type_info& (*typeId_)();
    void (*destroy_)(const Variant&);
    void (*clone_)(const Variant&, Variant&);
  };
  VTable* vptr_;
public:
  ...
};

The VTable structure contains pointers to various functions, functions that access the Variant object. (Be wary of the syntax for defining destroy_ and clone_; they are pointers to functions stored as regular data members inside VTable. The C declaration syntax strikes again.)

The fake vtable idiom offers, at the cost of some implementation effort, only one level of indirection and great flexibility in manipulating the Variant object. Let's now see how we can initialize and use the fake vtable.

Initializing a Variant Object

Upon construction, Variant needs to initialize its vptr_ member. In addition, it should properly initialize each pointer inside the VTable. To this end, let's define a template class VTableImpl<T>. That template defines static functions that match the types of the pointers to functions in VTable.

... inside Variant ...
template <class T>
struct VTableImpl
{
  static const std::type_info& TypeId()
  {
    return typeid(T);
  }

  static void Destroy(const Variant& var)
  {
  const T& data =
        *reinterpret_cast<const T*>(&var.buffer_[0]);
     data.~T();
  }

  static void Clone(const Variant& src, Variant& dest)
  {
    new(&dest.buffer_[0]) T(
      *reinterpret_cast<const T*>(&src.buffer_[0]));
    dest.vptr_ = src.vptr_;
  }
};

Notice a couple of interesting aspects in VTableImpl's implementation:

All functions are static and take the Variant object to manipulate by reference as their first argument.
Whenever it needs access to the actual object of type T, VTableImpl obtains it by casting &buffer_[0] to T*. In other words, all functions in VTableImpl assume that there's a T seated inside buffer_.

It is easy to put the function pointers and the type stored in buffer_ in sync — this is the job of the constructor:

... inside Variant ...
template <class T>
Variant(const T& val)
{
  new(&buffer_[0]) T(val);
  static VTable vtbl =
  {
    &VTableImpl<T>::TypeId,
    &VTableImpl<T>::Destroy,
    &VTableImpl<T>::Clone,
  };
  vptr_ = &vtbl;
}

Yes, it's that easy. We create a copy of the incoming val and comfortably seat it (by using the placement new operator) inside buffer_. (We worked hard to ensure that buffer_ is properly aligned to hold a T.) Then, we create a static VTable right on the spot and initialize it with pointers to the three corresponding static functions in VTableImpl. All that is left to do is to initialize vptr_ with the address of the newly created vtable, and that's all there is to it.

Stop — that's not really all. Consider this:

typedef Variant<cons<int, double>::type> Number;
string s("Hello, World!");
Number weird(s);

The code above compiles fine because it successfully instantiates Variant's constructor with string. However, the intent of Number is obviously to accept only int or double upon construction. So we need to ensure that Variant's templated constructor cannot be instantiated with any other type than those listed in Variant's typelist.

Here two helper tools kick in: typelist manipulation, which makes it possible to compute at compile time the index of a type in a typelist; any typelist package contains such a facility. The other tool is compile time assertions — a feature that allows you to render code non-compilable if a logical condition is not met. I won't bloat the article by repeating the whole story. Both Boost [7] and Loki [8] have typelist manipulation and compile-time assertions that vary only a little in syntax.

If you want to enforce the restriction starting from the first principles, here's a little tool that will work:

template <class TList, typename T>
struct EnsureOccurence
{
  typedef EnsureOccurence<typename TList::tail, T>::Result Result;
};

template <class T, class U >
struct EnsureOccurence<typelist<T, U>, T>
{
  typedef T Result; // can be T or whatever
};

If you instantiate:

typedef EnsureOccurence<cons<int, double>::type, double>::Result Test;

then the first specialization gets instantiated and instantiates recursively the same template with the tail of the list. By this point, the second specialization matches and "extinguishes" the recursion.

If, on the other hand, you try to write:

typedef EnsureOccurence<cons<int, double>::type, long int>::Result Test;

then the second specialization of EnsureOccurence never matches; the recursive instantiation exhausts the typelist, by which point the compiler tries to instantiate null_typelist::tail, a type that does not exist.

So here's the revamped templated constructor of Variant:

... inside Variant ...
template <class T>
Variant(const T& val)
{
  typedef EnsureOccurence<TList, T>::Result Test;
  new(&buffer_[0]) T(val);
  static VTable vtbl =
  {
&VTableImpl<T>::TypeId,
&VTableImpl<T>::Destroy,
&VTableImpl<T>::Clone,
  };
  vptr_ = &vtbl;
}

How about the default constructor? There is a temptation to ban it on grounds of minimalism, but a class without a proper default constructor, is, so to say, too "minimalistic." For one thing, you can't store Variants in STL containers.

A decision that makes sense is to initialize the Variant with a default-constructed object whose type is the first type in the typelist. You can't leave the Variant uninitialized, and defining the constructor that way lets the user place the "cheapest" type in the front of the list.

... inside Variant ...
Variant()
{
  typedef typename TList::Head T;
  new(&buffer_[0]) T(val);
  static VTable vtbl =
  {
  &VTableImpl<T>::TypeId,
    &VTableImpl<T>::Destroy,
    &VTableImpl<T>::Clone,
  };
  vptr_ = &vtbl;
}

Eliminating the code duplication between the templated and the default constructor is left as an interesting exercise to the reader.

Using the Fake VTable

Let's see how to expose the functionality offered by VTableImpl to Variant's user. For example, the function that gets the type identifier of a Variant object looks like this:

template <class TList, class Align = AlignedPOD<TList>::Result>
class Variant
{
  ...
public:
  const std::type_info& TypeId() const
  {
    return (vptr_->typeId_)();
  }
  ...
};

The code accesses the typeId_ pointer to function from vptr_, and invokes the function it points to, in a single shot. Now the function that returns a typed pointer to the contents of the Variant object is only six lines of code away:

... inside Variant ...
template <typename T> T* GetPtr()
{
  return TypeId() == typeid(T)
    ? reinterpret_cast<T*>(&buffer_[0])
    : 0;
}

(Well, in the actual implementation, you would write the corresponding const function as well.)

The destructor is even easier:

... inside Variant ...
~Variant()
{
  (vptr_->destroy_)(*this);
}

By now we have a nice little core Variant template, able to create and destroy objects properly, as well as extract the data stored in a type-safe manner. The little class is hungry for extensions, as we'll see in the next installment of "Generic<Programming>".

Conclusions

There are several conclusions to draw from this article. One is that, like software projects, articles on software can run over budget. I initially planned to dedicate one column to discriminated unions. After carefully analyzing the level of detail necessary, I was sure (and advertised) two columns for the task. Now I'm forced to dedicate three columns (or more, no promise this time) to the subject.

However, my belief is that the issues are important enough to be described in detail.

Computing alignment portably is hard, but feasible. It never is 100-percent portable. The task reclaims a language feature. Alignment is important to anyone who wants to implement custom seating for generic types (as Variant does) or to deal directly with memory in various other ways.

The fake vtable idiom has you write by hand what a compiler generates automatically — a structure containing pointers to functions. It also prescribes the explicit presence of a pointer (vptr_) to such a vtable in your objects and has you initialize the pointer properly. Templates make it possible to automate vtable generation.

In exchange for your effort, the fake vtable idiom allows you to implement polymorphic behavior for in-situ objects.

Acknowledgement

Many thanks are due to Emily Winch who provided a thorough and helpful review.

Notes

[1] One of Alan Perlis' legendary one-liners.

[2] A. Alexandrescu. "Generic<Programming>: Discriminated Unions (I)," C/C++ Users Journal C++ Experts Forum, <www.cuj.com/experts/2004/alexandr.htm>.

[3] POD stands for Plain Old Data, a type umbrella that covers primitive types and essentially C-like structs/unions, namely (in all excruciating detail): having no user-declared constructors, no private or protected non-static data member, no base classes, no virtual functions, no non-static data members of type pointer to member, non-POD-struct, non-POD-union (or an array of such types) or reference, no user-defined copy assignment operator, and no user-defined destructor.

[4] Search for "align" in the Boost downloadable archive or on the developer mailing list. You can get to both from <www.boost.org>.

[5] Select template.

[6] The use of virtual function tables is not required, but virtually all C++ implementations use a variation of this mechanism.

[7] <www.boost.org>

[8] A. Alexandrescu. Modern C++ Design (Addison-Wesley Longman, 2001), Chapters 2 and 3.

Andrei Alexandrescu is a Ph.D. student at University of Washington in Seattle, and author of the acclaimed book Modern C++ Design. He may be contacted at www.moderncppdesign.com. Andrei is also one of the featured instructors of The C++ Seminar.