Channels ▼
RSS

Generic: Discriminated Unions (III)


August 2002 C++ Experts Forum/Generic<Programming>


Before even getting into the introduction for this article, here is some news that might be of interest to you.

A while ago, Jonathan H. Lundquist and Mat Marcus independently ported parts of Loki to Visual C++ 6. Their implementations were meant as a proof of concept and did not reach completeness. By and large, due to various compiler-related problems, at this point in time Loki is more of a source of inspiration for people's designs, rather than a "shrink-wrapped" product that you just drop in your program. The typical Loki user is a savvy, brave developer who doesn't hide under the desk at the sight of an error message — even when that message's length might "buffer overrun" some Web servers.

However, this is changing as you read right now.

It came as no little surprise to me when Rani Sharoni emailed a complete, unabridged port of Loki to Visual C++ 7.0 (sic!). The usage syntax was kept unchanged with very few exceptions, which is remarkable given that Visual C++ 7.0 (Visual C++ .NET) does not support partial template specialization.

If you use that compiler, you may want to give Loki's port a whirl [1].

Other related news — there's an ongoing action to port Loki to Boost, a move that would unite two efforts into a convergent one. Loki::SmartPtr is on its way already; as always when it comes to smart pointers, there's much discussion, this time of unprecedented productivity. Also, Joe Swatosh has proposed the mightily popular ScopeGuard [2], written by Petru Marginean and yours truly, for inclusion into Boost as well.

In a word, these are heady times for cool C++ work. Ultimately, this means you'll have less grunt work to do and more time to spend on your high-level design.

Variant: The Road Ahead

In the past two installments of "Generic<Programming>," we defined discriminated unions and defined a core Variant class. That core class offers the most basic facilities that allow its user to store an object into the Variant, query the type of an object inside a Variant, and retrieve that object in a type-safe manner. Moreover, Variant is portable and has quite an efficient foundation.

This is hardly an exciting collection of features. Sure, portability is a good thing. (But then, what's porting good for, if you have to deal with older, nonconforming compilers?) Also, efficiency certainly doesn't hurt. But these are not features by themselves — Variant's portability and efficiency doesn't render it more expressive when you write code using it. The thing is, more features would be very helpful. Today, we will focus on adding a number of powerful features to Variant — some that don't exist within similar implementations.

But before getting into that, you are due for a soapbox.

Finally, It Happened — And It Will Happen Again

It is a fact of life that no matter how hard you try you cannot please everybody. That's why, ever since Modern C++ Design [3] hit the shelves, I've been waiting for the "your book stinks" message. No doubt, it was to come sooner or later. Damocles' sword has been hanging for more than one year now, when finally it fell, in the form of a review on Amazon.

According to that review, Modern C++ Design is "typical highbrow snobbery, academic extremism by (and for) tenured people who do no actual work and must squeeze out of themselves a piece of writing for the C++ Report on a monthly basis." That's quite a comprehensive statement that covers both me, who wrote what you're reading right now, and yourself, because you're reading it. Maybe we should both stop right now.

Or maybe not. If anything, I would have hoped for a more based "your book stinks" message; this is too easy to combat. For one thing, Modern C++ Design was written entirely after hours; during the day, its author was doing real C++ work — and applying many of the concepts in the book.

Second — and this reveals why I'm bringing the whole thing up — I don't really feel being squeezed after writing any of the bimonthly installments of "Generic<Programming>." There is, however, an issue that I wanted to ask your opinion on.

To that review author's annoyance, I decided to write a second book, tentatively entitled The return of C++. (If you also add the new book that Herb Sutter and myself are writing, C++ Coding Standards, things get really depressing.)

The return of C++ is not a sequel to Modern C++ Design. Sequels can be bad, especially when they are a rehash of a successful idea, in lack of inspiration for something genuinely new. Also, what I don't want this new book to be is a regurgitation of my own articles. I believe that if you pay money to buy a book, then you deserve more than bound material that could be found online gratis.

This ultimately means it's not the flow of new ideas that's at stake, but a conflict of interests. When something cool comes up, the tough decision is, should this be part of a new article, or part of the upcoming book? That's what's stopping me from writing about some exciting things, such as Design by Contract, generic implementations of the Observer pattern, composite objects, or a complete treatment of error handling in C++.

In essence, if you have any idea on how to address this conflict of interests, don't hesitate to email me.

Fortunately, there are many cool new ideas floating around as well, for which the article form is the most suitable. For example, you'll soon see how generic sorting and searching in C++ are not quite top-notch yet, and how to achieve much better performance than many current library implementations offer. Given that programs spend about 80% of time searching (when they don't do I/O), you should have a 0.8 degree of interest in that article. When you don't do I/O, that is.

Enough about that. Otherwise, it's only more fuel to say, "Aha! No more inspiration, so here's some of this guy's existential angst for us to solve!" The thing is, good long-term programming columnists such as Al Stevens, Michael Swaine, or our dear Bobby Schmidt, do include soapboxes in their columns. And you know what? I love reading them. In proportion, it's like reading Russian novelists: it's not something sensational they're saying, but you simply can't let it slip out of your hand.

Back to Variant

Let's recap quickly where we were with our Variant implementation, with a couple of code examples.

Variant object is an object that can store exactly one object of a collection of types. Grace to a template constructor, you can directly initialize a Variant with an object of one of those types. For example:

// DatabaseField can contain a string, an int, or a double
typedef Variant<cons<string, int, double>::type>
    DatabaseField;
// Make three DatabaseField objects — a string,
// an int, and a double
DatabaseField fld1(string("Hello, world"));
DatabaseField fld2(25);
DatabaseField fld3(3.14);

Given a Variant object, you can query its type:

assert(fld2.TypeId() == typeid(int));

You can also access the actual typed value that resides inside a Variant:

string* p = fld1.GetPtr<string>();
// we know for sure fld1 contains a string
assert(p != 0); 
// now it's "Hello, world!" 
// — on a happier note, as it should be
*p += '!'; 

It goes without saying that Variant implements the traditional constructors and that it cleans up after itself upon destruction:

// default constructor, initializes with a default-
// constructed object of the first type in the typelist
DatabaseField fld5; 
assert(fld5.GetType() == typeid(string));
assert(fld5.GetPtr<string>()->empty());
DatabaseField fld4 = fld1; // copy constructor
assert(fld4.GetType() == typeid(string));
assert(*fld4.GetPtr<string>() == "Hello, world!");

Ok, now let's throw in some important utility functions.

First, let's get busy with accessing the typed value stored inside a Variant. You see, GetPtr is okay, but idioms based on it don't offer quite as good mileage. Look at this:

void ProcessField(const DatabaseField& fld)
{
    if (const string* pS = fld.GetPtr<string>())
    {
        ...
    }
    else if (const int* pI = fld.GetPtr<int>())
    {
        ...
    }
    else if (const double* pD = fld.GetPtr<double>())
    {
        ...
    }
}

For everything you need to do with a Variant, you have to specifically ask for each of the types it might contain and process that type in separation. This doesn't quite scale up — the little if-else chains will bury your actual work quite soon. Worse, when you add new types to your Variant type (such as a Memo type to the DatabaseField definition), the compiler won't tap you on the shoulder to add the new corresponding trailing else-if to each chain of tests. You're on your own, and the compiler is like the Mob — it's always better to have it on your side.

Visitation

How can we implement functions that process Variants without having to query their type ad nauseam?

One solution would be to add more functions to Variant's fake vtable, just the way we did with TypeId, Clone, etc. This would work very nicely, but it would destroy Variant's application independence. We want to build a generic, extensible Variant, not one that you have to change to use.

You also don't want to inherit Variant; it's a value type, not a reference type.

A more applicable technique is visitation. Visitor [4] is a design pattern that allows you to add a new operation to a hierarchy, without having to modify that hierarchy.

How does visitation work? A full explanation is beyond the scope of this article, but if the concept is unknown to you, it is highly recommended that you check out Visitor in [4]. The Visitor pattern has a multitude of interesting applications, and at least yours truly sees the non-visitability of some hierarchies as a fatal design mistake [5].

In Variant's case, the hierarchy is the collection of stored types. They really don't form a hierarchy, but the fake vtable that we hold allows us to define polymorphic operations on those types' behalf, without them even noticing. (They even can be primitive types as the code samples above show.)

By the way, if you are interested in the subject, but the "fake vtable" thing confuses you, don't forget that this article builds on two previous ones [6, 7].

To make Variant visitable, we need to take the following steps:

  • Define a BaseVisitor class for the Variant type. In the particular case of DatabaseField, the BaseVisitor class looks like this:
  • struct BaseVisitor
    {
        virtual void Visit(string&) = 0;
        virtual void Visit(int&) = 0;
        virtual void Visit(double&) = 0;
    };
    

  • Define an Accept function in the fake virtual table implementation, as shown in the previous article [7]:
  • ... inside Variant ...
    template <class T>
    struct VTableImpl
    {
        ... as before ...
        static void Accept(Variant& var, BaseVisitor& v)
        {
        T& data = *reinterpret_cast<const T*>
            (&var.buffer_[0]);
        v.Visit(data);
        }
    };
    

  • Add a pointer to the Accept function to the fake virtual table structure:
  • ... inside Variant ...
    struct VTable
    {
        ... as before ... 
        void (*accept_)(Variant&, BaseVisitor&);
    };
    

  • Don't forget to bind the accept_ pointer to the VTableImpl<T>::Accept function in Variant's constructor:
  • ... inside Variant ...
    template <class T>
    Variant(const T& val)
    {
        new(&buffer_[0]) T(val);
        static VTable vtbl = 
        { 
             &VTableImpl<T>::TypeId,
             &VTableImpl<T>::Destroy,
             &VTableImpl<T>::Clone,
             &VTableImpl<T>::Accept, // added
     };
        vptr_ = &vtbl;
    }

  • Provide a façade function:
  • ... inside Variant ...
    void Accept(BaseVisitor& v)
    {
        (vptr_->accept_)(*this, v);
    }
    

Admittedly, it's a little bit more work than simply adding a new function, but fortunately, we won't have to do this over and over again. Once visitation is in place, we can process DatabaseField objects by simply defining new classes derived from BaseVisitor:

struct ProcessDatabaseField : public BaseVisitor
{
    virtual void Visit(string& s)
    {
        ...
    } 
    virtual void Visit(int& i)
    {
        ...
    } 
    virtual void Visit(double& d)
    {
        ...
    } 
};
DatabaseField fld = ...;
ProcessDatabaseField processor;
// invokes the appropriate Visit() in processor
fld.Accept(processor); 

This setting buys us two things. First, you can now distribute ProcessDatabaseField's member functions across compilation units (perhaps specialized in string processing and number crunching, respectively). Second, if you later add a new type to DatabaseField's definition, say:

typedef Variant<cons<string, int, double, Memo>::type>
    DatabaseField;

and if you also add void Visit(Memo&) = 0 to BaseVisitor's definition, then the compiler will flag every derivation of BaseVisitor that doesn't handle Memo as illegal.

If you followed this article (and hopefully you did), you might be kind of like, "Good, I have the concept down. Visitation is a nice alternative to the cumbersome GetPtr." Of course, as with any concept, some details accompany it. The presentation glossed over such a detail for the sake of explaining one thing at a time, and to avoid lengthy prolegomena. If you've read closely, you might have noticed this unsettling detail: BaseVisitor is defined to handle string, int, and double only. It works fine for visiting DatabaseField objects (which can hold each of these types), but it is useless for other Variant instantiations, which might contain various collections of types. How in the world can you define a base abstract BaseVisitor class that has one pure virtual Visit(X&) function for each type X in a given typelist?

It turns out that it's entirely possible, and Modern C++ Design shows how to do that. Even nicer, Loki defines a generic visitation framework. If it's generic and it's about visitation, hey, why not use it. Here's how [8]:

template <class TList, class Align =      
    AlignedPOD<TList>::Result>      
class Variant      
{      
    typedef Loki::Visitor<void, TList> StrictVisitor;      
    ...      
};

That was easy. Now all you have to do to add new processing abilities to DatabaseField is to derive new visiting classes from DataBaseField::StrictVisitor. Sure enough, DataBaseField::StrictVisitor defines exactly three Visit pure member functions, taking a string, an int, and a double, respectively. Or whatever types you decide DatabaseField should contain. Sweet!

Flavors of Visitation

Now that the concept is in place, we can ask ourselves about refinements.

For example, consider const-correctness when talking about visitation. The BaseVisitor::Visit functions all take a non-const argument. This means that the following code won't work:

struct Streamer : public DataBaseField::StrictVisitor      
{      
    virtual void Visit(string& s)      
    {      
        cont << s;      
    }       
    virtual void Visit(int& i)      
    {      
        cout << i;      
    }       
    virtual void Visit(double& d)      
    {      
        cout << d;      
    }       
};      
const DatabaseField fld(4.5); // mind the const  
Streamer str;      
// error! No Streamer::Visit overload       
// accepts const double&      
fld.Accept(str);

Tracing back through the problem, we notice that the root of the trouble is:

... inside Variant ...      
template <class T>      
struct VTableImpl      
{      
    ... as before ...      
    static void Accept(Variant& var, BaseVisitor& v)      
    {      
        ...      
  }
};

The Accept function above takes a non-constant Variant object as its first argument.

An ancient saying (I believe it's from the Greeks) is: "When one VTableImpl<T>::Accept won't do, maybe two will." Indeed, all we've got to do is to go through the checklist of defining a new Variant member function (called Accept as well), except that this time we pass around a const Variant& instead of a plain Variant&.

The corresponding base visitor for const Variant objects is obtained by using the straight Loki::Visitor, but passing to it a transformed typelist. The transformation takes the original typelist and returns a new typelist, with a const attached to every type. Here it is:

template <class TList> struct MakeConst;      
template <> struct MakeConst<null_typelist>      
{      
    // terminator is not const      
    typedef null_typelist Result;       
};      
template <typename Head, class Tail>      
struct MakeConst< typelist<Head, Tail> >      
{      
    typedef typename MakeConst<Tail>::Result NewTail;      
    typedef typelist<const Head, NewTail> Result; 
};

The key to the MakeConst transformation is the last line, where the Head type is transformed into const Head when returned in the Result type. Given MakeConst, we define a new base visitor ConstStrictVisitor like this:

template <class TList, class Align =      
AlignedPOD<TList>::Result>      
class Variant      
{      
    typedef Loki::Visitor<void, TList> StrictVisitor;      
    typedef Loki:: Visitor<void,       
        typename MakeConst<TList>::Result>      
        ConstStrictVisitor;      
    ...
};

All right, now what about that Strict prefix in all the visitor names we've defined so far? Is there some "relaxed" visitation as opposed to strict visitation?

Consider the following task: if a DatabaseField is a string, it needs to be put between square brackets [like this]. Otherwise, there is nothing to do. The implementation is:

struct Transformer : public DataBaseField::StrictVisitor      
{      
    virtual void Visit(string& s)      
    {      
        s = '[' + s + ']';      
    }       
    virtual void Visit(int& i)      
    {      
        // Nothing to do      
    }       
    virtual void Visit(double& d)      
    {      
        // Nothing to do      
    } 
};

Notice that in this case, although Transformer doesn't care about visiting int and double at all, it still has to define the do-nothing Visit(int&) and Visit(double&) overloads. The visitor implementation is strict: all functions must be implemented, and the compiler enforces it.

For cases like Transformer, when we care only about a subset of types and do nothing for all others, non-strict visitation is the way of doing business. Therefore, Variant gets two more type definitions: non-strict visitation for non-const and const Variant objects.

template <class TList, class Align =      
    AlignedPOD<TList>::Result>      
class Variant      
{      
    typedef Loki::Visitor<void, TList> StrictVisitor;      
    typedef Loki::Visitor<void,       
        typename MakeConst<TList>::Result>      
        ConstStrictVisitor;      
    typedef Loki:: NonStrictVisitor<void,      
TList> StrictVisitorImpl;      
    typedef Loki::NonStrictVisitor<void,      
typename MakeConst<TList>::Result>      
    ConstStrictVisitor;      
    ...
};

Some improvements to Variant's visitation mechanism could include a custom return type. (Right now all visitations return void.)

Loose Ends

This concludes our Variant implementation.

Once visitation is in place, you can add any functionality you wish to Variant, at the cost of two virtual function calls (as opposed to one). This is the price to pay for Variant's generality. Fortunately, the solution scales up. (The overhead doesn't grow with the size of the typelist passed to Variant.)

An important feature that Variant implements out-of-the-box is extracting its value converted to another type, or changing its stored type in-place. For example, consider the code below:

DatabaseField fld(45);      
assert(fld.TypeId() == typeid(int));      
double d = fld.ConvertTo<double>();
assert(d == 45.0);

If the type currently stored in the Variant object (in the example above, int) can be converted to the type requested (in this case, double), then the conversion proceeds. Otherwise, ConvertTo throws an exception.

A variation would be to change the type of the object in place:

fld.ChangeType<double>();
assert(fld.TypeId() == typeid(int));
assert(*fld.GetPtr() == 45.0);

These functions are implemented through visitation, as you can see in the downloadable code (alexandr.zip). They also offer inspiration for adding other functions that work with Variant.

Conclusion

Discriminated unions are a useful modeling tool. They can naturally express that an object's type belongs to a collection of types at any given moment. Unlike polymorphic types, discriminated unions limit in advance the collection of possible types. This brings them important efficiency advantages, which are hard, but not impossible, to realize in C++. We reaped efficiency by using in-situ allocation — accompanied with alignment calculation — and by using fake vtables — which emulate polymorphism for value types.

Also, it is important to understand the connection between discriminated unions and visitation. In most functional languages, visitation is implicitly done with a "switch"-like statement that yields errors if not all possible types in an object are handled. In C++, we implemented such type-safe visitation by using an abstract base class.

Most importantly, discriminated unions might bring more coherence to your design, better enforcement of your design axioms, and ease of keeping architecture in sync with implementation.

Bibliography and Notes

[1] <www.geocities.com/rani_sharoni/LokiPort.html> (also linked from Modern C++ Design's homepage, <http://moderncppdesign.com>).

[2] Andrei Alexandrescu and Petru Marginean. "Generic<Programming>: Simplify Your Exception-Safe Code," C/C++ Users Journal C++ Experts Forum, December 2000, <www.cuj.com/experts/1812/alexandr.htm>.

[3] Andrei Alexandrescu. Modern C++ Design (Addison-Wesley, 2001).

[4] E. Gamma, et al. Design Patterns (Addison-Wesley, 1995), Chapter 5.

[5] For example, the standard exception classes are not visitable, and they certainly should be.

[6] Andrei Alexandrescu. "Generic<Programming>: Discriminated Unions (I)," C/C++ Users Journal C++ Experts Forum, April 2002, <www.cuj.com/experts/2004/alexandr.htm>.

[7] Andrei Alexandrescu. "Generic<Programming>: Discriminated Unions (II)," C/C++ Users Journal C++ Experts Forum, June 2002, <www.cuj.com/experts/2006/alexandr.htm>.

[8] Some of the Visitor artifacts shown below are not present in the original version of Loki, but rather in later additions.

Andrei Alexandrescu is a Ph.D. student at University of Washington in Seattle, and author of the acclaimed book Modern C++ Design. He may be contacted at www.moderncppdesign.com. Andrei is also one of the featured instructors of The C++ Seminar (<www.gotw.ca/cpp_seminar>).


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 

Video