Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

C/C++

Discriminated Unions (II)


Now that we have storage for the objects held by the union, we need the discriminator — the tag that Variant stores and that tells what type the object inside really is. The discriminator must make it possible to perform certain type-specific operations on the contents of the Variant, such as type identification and data manipulation in a type-safe manner.

We have many design options. The simplest is to store an integral discriminator:

template <class TList, class Align = AlignedPOD<TList>::Result>
class Variant
{
  union
  {
    char buffer_[size];
    Align dummy_;
  };
  int discriminator_;
public:
  ...
};

The disadvantage of this solution is that integers are not very "smart." To do different things depending on the discriminator, the switch solution rears its head, and switch means coupling. Perhaps we could use the int as an index into some table, but then why not directly use pointers into that table (a solution that we'll investigate later).

The second solution is to use a proxy polymorphic object.

template <class TList, class Align = AlignedPOD<TList>::Result>
class Variant
{
  union
  {
    char buffer_[size];
  Align dummy_;
  };
  struct ImplBase
  {
    virtual type_info& TypeId() = 0;
    virtual Variant Clone(const Variant&) = 0;
    virtual void Destroy(Variant&) = 0;
    ...
  };
  ImplBase* pImpl_;
public:
  ...
};

The idea here is to perform various operations on the same data by using a pointer to a polymorphic object. Then, depending on what actually lies in buffer_, the pointer points to a different concrete ImplBase, and the polymorphic object performs specialized operations on it.

This approach is indeed very clean, except for one thing: efficiency. When invoking a function such as Destroy, the access is:

  • Variant object.
  • Dereference pImpl_.
  • Dereference pImpl_'s virtual function table, the so-called "vtable" [6].
  • Index into the appropriate function in the vtable.

Indexing with a constant into the vtable and accessing a field in an object (which boils down to indexing as well) are not problematic. The problem is dereferencing pointers — there are two levels of indirection between issuing the call and getting to the function. However, one level of indirection should be enough to dispatch a call to the correct type handler. What to do?

(Allow me a parenthetical remark. If you wonder whether I forgot the two laws of optimizations: (1) don't do it and (2) don't do it yet — well, I still remember them. Why, then, is Variant so keen on optimization? The reason is simple. This is not application code; this is a library artifact, and more so, it is one that emulates and rivals a language feature. The more streamlined Variant is, the more you can rely on it for heavyweight duties; the more sloppily it is implemented, the more it becomes a toy that you wouldn't care to use in a real application.)

A good approach would be to simulate what the compiler does: fake a vtable and ensure only one level of indirection. This leads us to the following code:

template <class TList, class Align = AlignedPOD<TList>::Result>
class Variant
{
  union
  {
    char buffer_[size];
    Align dummy_;
  };
  struct VTable
  {
    const std::type_info& (*typeId_)();
    void (*destroy_)(const Variant&);
    void (*clone_)(const Variant&, Variant&);
  };
  VTable* vptr_;
public:
  ...
};

The VTable structure contains pointers to various functions, functions that access the Variant object. (Be wary of the syntax for defining destroy_ and clone_; they are pointers to functions stored as regular data members inside VTable. The C declaration syntax strikes again.)

The fake vtable idiom offers, at the cost of some implementation effort, only one level of indirection and great flexibility in manipulating the Variant object. Let's now see how we can initialize and use the fake vtable.

Initializing a Variant Object

Upon construction, Variant needs to initialize its vptr_ member. In addition, it should properly initialize each pointer inside the VTable. To this end, let's define a template class VTableImpl<T>. That template defines static functions that match the types of the pointers to functions in VTable.

... inside Variant ...
template <class T>
struct VTableImpl
{
  static const std::type_info& TypeId()
  {
    return typeid(T);
  }

  static void Destroy(const Variant& var)
  {
  const T& data =
        *reinterpret_cast<const T*>(&var.buffer_[0]);
     data.~T();
  }

  static void Clone(const Variant& src, Variant& dest)
  {
    new(&dest.buffer_[0]) T(
      *reinterpret_cast<const T*>(&src.buffer_[0]));
    dest.vptr_ = src.vptr_;
  }
};

Notice a couple of interesting aspects in VTableImpl's implementation:

  • All functions are static and take the Variant object to manipulate by reference as their first argument.
  • Whenever it needs access to the actual object of type T, VTableImpl obtains it by casting &buffer_[0] to T*. In other words, all functions in VTableImpl assume that there's a T seated inside buffer_.

It is easy to put the function pointers and the type stored in buffer_ in sync — this is the job of the constructor:

... inside Variant ...
template <class T>
Variant(const T& val)
{
  new(&buffer_[0]) T(val);
  static VTable vtbl =
  {
    &VTableImpl<T>::TypeId,
    &VTableImpl<T>::Destroy,
    &VTableImpl<T>::Clone,
  };
  vptr_ = &vtbl;
}

Yes, it's that easy. We create a copy of the incoming val and comfortably seat it (by using the placement new operator) inside buffer_. (We worked hard to ensure that buffer_ is properly aligned to hold a T.) Then, we create a static VTable right on the spot and initialize it with pointers to the three corresponding static functions in VTableImpl. All that is left to do is to initialize vptr_ with the address of the newly created vtable, and that's all there is to it.

Stop — that's not really all. Consider this:

typedef Variant<cons<int, double>::type> Number;
string s("Hello, World!");
Number weird(s);

The code above compiles fine because it successfully instantiates Variant's constructor with string. However, the intent of Number is obviously to accept only int or double upon construction. So we need to ensure that Variant's templated constructor cannot be instantiated with any other type than those listed in Variant's typelist.

Here two helper tools kick in: typelist manipulation, which makes it possible to compute at compile time the index of a type in a typelist; any typelist package contains such a facility. The other tool is compile time assertions — a feature that allows you to render code non-compilable if a logical condition is not met. I won't bloat the article by repeating the whole story. Both Boost [7] and Loki [8] have typelist manipulation and compile-time assertions that vary only a little in syntax.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.