Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

C/C++

Discriminated Unions (I)


When describing union, most C and C++ textbooks refer to it as of an inherently low-level, uninteresting facility — hey, if you really care about saving memory, here's a little feature for you. That's all that C's union can offer you. Discriminated unions, however, are a very useful high-level abstraction. This and the next installment of "Generic<Programing>" will try to convince you of this.

Uses of Discriminated Unions

Discriminated unions come in handy whenever you need to manipulate heterogeneous objects (in state and interface), in a homogenous way. (Polymorphism is great at manipulating objects with a heterogeneous state, but uniform interface.)

Consider a data type intended to store a database field's value. Say database fields can be integers, floating-point numbers, strings, or dates. They don't really have anything in common, except that they live together in the database. If you want to implement a data structure suitable for storing and manipulating database fields, you need to be able to hold any of the database's field types at a given moment. This is a task that fits variants like a glove.

Another example is storing data in a dynamically typed language, such as Scheme or Basic. A variable can have any type of a finite, small set of possibilities.

And yet another example would be items passed around in a parser. For example, during syntactical parsing, you obtain lexemes that each can be either a keyword, an identifier, a literal constant (integer, floating-point, string, etc.), an operator, etc. A discriminated union is best at storing such a lexical element. An object-oriented approach would allow nonsensical operations, such as obtaining the integral value of a keyword. In other words, it's not that nicely typed.

As we'll see below, when carefully implemented, discriminated unions can also have significant (a marketing person would say "overwhelming") efficiency advantages over other approaches.

Current Implementations

The obvious way of implementing discriminated unions in C and C++ is to use a "tagged union": you store a classic union comprising the types of interest together with an integer that tells what field is currently valid. Consider the DatabaseField example:

struct DatabaseField
{
    enum TypeTag { typeNull, typeInt, typeDouble, 
 typeString, typeDate }
        tag_;

   union
    {
        int fieldInt_;
        double fieldDouble_;
        char* fieldString_;
    Date fieldDate_;
    };
    ...
  };

Whenever you assign to any field* member of DatabaseField, you need to set the tag_ member appropriately as well. Conversely, before accessing the members of the union, you need to consult the tag_ first to make sure what exact type the DatabaseField cloaks.

This protocol can be encapsulated by a collection of get/set functions (GetInt, SetInt, GetString, SetString, etc.). The problem is, such an approach is clumsy, limited, and rigid. You need to maintain by hand a three-way relationship — type tag to data to get/set functions. Besides, couldn't there be a nicer interface than the clumsy get/set primitives?

The totally object-oriented approach to implementing discriminated unions is very simple: you derive all of your possible types in the union from a common class, say, DBField. So you have StringDBField and IntDBField and so on. Then, your discriminated union is simply a reference to DatabaseField. You use dynamic_cast (or whatever you call it in the OO language of choice) to retrieve the actual type.

This is one of those cases where object orientation is that square peg in a round hole. A hierarchy makes sense when your types have sensibly close operations; however, there's not much that strings, doubles, integers, and dates have in common. A hierarchy cannot effectively bring structure to such a setup. You need to either rely on casts or be obligated to write embarrassing functions, such as "give me the floating-point value of this date." Sure signs of a bad design.

A Modern Implementation

We will build together an industry-strength implementation of discriminated unions that offers almost everything that you would expect from a similar language-provided feature.

The first observation to make is that, given that discriminated unions are all about collections of types, we can nicely use the typelists we introduced in the previous column. Therefore, our first shot at implementation looks like this:

template <class TList>
class Variant // DiscriminatedUnion would be too long
{
    ...
};

You would instantiate Variant with a typelist:

typedef Variant<cons<int, double, String, Date>::type> DatabaseField;

In addition to letting the user choose the set of supported types, a good implementation should provide:

  • Value semantics: default and copy constructors, assignment, and destructor
  • Constructors that initialize Variant with a value of any type in the typelist
  • Assignment operator for all types in a typelist
  • Typesafe type interrogation and retrieval mechanisms
  • A swap primitive, useful in a variety of idioms
  • Conversions

Last but not least, Variant should provide efficient underlying plumbing. Don't forget we are talking about a primitive mechanism that is usually wired straight into the language.

Storage: It's Not Only about Efficiency

So, how do you want to store the actual value of the variant type?

The first shot at a design might just as well use a union for storage. It does exactly what we need — allows variables of different types to share the same underlying memory. With a little trick, we can take the types from a typelist and put them in a union. (Did you know that unions can be templates?)

template <class TList> union ConfigurableUnion;
  
template <class H, class T> 
union ConfigurableUnion< typelist<H, T> > 
{
    H head_;
    ConfigurableUnion<T> tail_;
};
  
template <class H, class T> 
union ConfigurableUnion< typelist<H, null_type> > 
{
    H head_;
};

ConfigurableUnion is a union in which you can nicely control the collection of types to store.

However, as soon as you try to do this:

typedef ConfigurableUnion<std::string, int> ImpossibleType;

the compiler reminds you that unions don't accept types with constructors and destructors as members, which renders ConfigurableUnion useless for any serious programs [5].

The second idea is to simply store pointers instead of straight types and to use free store allocation for the data. This idea would work, but is has two problems: a little one and a big one. (Exactly which is which depends on your views leaning more towards purism or towards performance.)

One problem is that this solution is not as efficient as the in-situ allocation that a union would offer. Using free store incurs a time overhead, and possibly a space overhead as well (due to the housekeeping data that the allocator must maintain).

One more fundamental (or less relevant, depending on your views) problem is that using free store allocation changes the outbound exception guarantees that Variant can make on construction. If using in-situ allocation, Variant's constructor could throw only the same kinds of exceptions as the copy constructor of the passed-in type. In particular, if you instantiate Variant with types whose copy doesn't throw, the Variant won't throw as well. Variant blends in nicely in any code that uses exceptions without biasing it — in a sense, Variant is to exceptions what tofu is to cooking.

If we design Variant to use free store allocation, its constructors can throw std::bad_alloc in addition to whatever behavior its stored types have. In particular, it can throw exceptions even when instantiated with "nice" types, such as int or char. This is not good because it loosens the guarantees that Variant can make to its clients, which in turn must loosen the guarantees that they can offer, and so on.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.