Dr. Dobb's | Flexible C++ #12: Imperfect enums, Part 2: Forward Declarations


This second part of the Imperfect Enums series looks at the issue of the forward declaration of enumerations. Why would one ever want to do that, I hear you cry? Certainly it's not a common need. But I have encountered situations where it's required, one of which we'll discuss later.

May 01, 2005
URL:http://www.drdobbs.com/flexible-c-12-imperfect-enums-part-2-for/184403894


Notwithstanding any need to do so, the language does not allow the forward declaration of enumerations. Why is that so? Trawling the newsgroups turns up three reasons for the prohibition:

The size of an enumeration depends on the range of values it represents, and not just on the size of (one of) the ambient architecture's type(s).
Forward declaration was put into the (C) language to cater for structures that may contain pointers to the same type.
No-one thought about it at the time, and no-one has subsequently deemed it worthwhile.

The usual argument for forward declaration of enumerations is physical decoupling. As the language has matured, and been used for larger and larger projects, this issue has gained greater prominence. (See John Lakos' seminal work [1] for more information on physical coupling than you could shake a stick at. Even this classic work, however, fails to offer a cogent and forthright tactic for dealing with enumeration coupling. It discusses using static/const members to replace class implementation constants, and discusses, though does not wholeheartedly recommend, the use of integral types instead of enumerations.)

One common scenario in which enumerations are both desirable and undesirable is return codes. We may choose to define our return code type, RC, as an enumeration, as follows:

enum RC
{
    RC_SUCCESS        = 0
  , RC_OUT_OF_MEMORY
  , RC_INVALID_ARGUMENT
  , RC_BAD_BCD_DIGIT
  // . . . etc. etc.
};

(This enumeration is not namespaced (see Part 1 [2]), perhaps because it is intended to be used from both C and C++, so it uses the RC_ prefix for symbol disambiguation.)

The advantage of defining RC as an enumeration is threefold. First, we get type-safety when assigning to instances of RC. Second, we get uniquely defined return code values by default, so long as no-one gets the bright idea of giving an explicit value to any but the zeroth element. (It's common practice to give the zeroth element the value 0 explicitly to aid readability, even though the compiler would do so automatically.) The third advantage is more prosaic: integrated development environments are more likely to show you a human-readable symbol rather than an integral value in the debugger.
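The first two advantages can be made concrete with a minimal sketch. It assumes only the four enumerators shown above (the rest are elided), and the helper check_bcd_digit() is hypothetical, invented purely for illustration.

```cpp
#include <cassert>

// The four enumerators shown above; the rest of RC is elided in the article.
enum RC
{
    RC_SUCCESS        = 0
  , RC_OUT_OF_MEMORY      // compiler assigns 1
  , RC_INVALID_ARGUMENT   // compiler assigns 2
  , RC_BAD_BCD_DIGIT      // compiler assigns 3
};

// Hypothetical helper: accepts a single binary-coded decimal digit (0-9).
inline RC check_bcd_digit(int digit)
{
    return (digit >= 0 && digit <= 9) ? RC_SUCCESS : RC_BAD_BCD_DIGIT;
}

// Type-safety: the following would not compile, because there is no
// implicit conversion from int to RC:
//
//   RC rc = 3;
//
// An explicit cast is required, which at least stands out in code review:
//
//   RC rc = static_cast<RC>(3);
```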

But there are two disadvantages to using enumerations. First, the order of the return codes thus defined may never be changed. If some order-obsessed maintenance programmer decides to move some around, or prune some now-defunct values, all manner of nasties will occur when two link-units compiled at different times are brought into play together. As discussed in Part 1 [2], the rule is that you should never remove or change the order of extant items. How your development team defines extant in this case may vary, but at a minimum it should include values that have been built into released components.
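The danger of reordering can be simulated in a single program. In this sketch the namespaces v1 and v2 are hypothetical stand-ins for the same header at two points in its history. The integer passed between the two functions stands in for a value crossing the boundary between two separately compiled link-units.

```cpp
#include <cassert>

// The RC header as it was when an older component was built.
namespace v1
{
    enum RC { RC_SUCCESS = 0, RC_OUT_OF_MEMORY, RC_INVALID_ARGUMENT, RC_BAD_BCD_DIGIT };
}

// The same header after a maintenance programmer swapped two entries.
namespace v2
{
    enum RC { RC_SUCCESS = 0, RC_INVALID_ARGUMENT, RC_OUT_OF_MEMORY, RC_BAD_BCD_DIGIT };
}

// Built against v1: reports an allocation failure. Across a binary
// boundary only the underlying integer value survives.
inline int old_component_result()
{
    return v1::RC_OUT_OF_MEMORY;    // value 1
}

// Built against v2: interprets the same integer using the new ordering.
inline v2::RC new_component_view(int raw)
{
    return static_cast<v2::RC>(raw);
}
```

The out-of-memory condition is silently reported as an invalid argument. Neither compiler nor linker can diagnose it, which is why extant enumerators must never be reordered or removed.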

The second objection to using enumerations for return codes is that they introduce physical coupling, and a lot of it. Consider a common development scenario in which the different components of a product suite share core library functionality, including a set of common return codes and their manipulating functions. Naturally, as the components evolve, they will require new return codes. As long as ordering is not disrupted that's all fine and proper, but it does mean that logically independent changes result in physically dependent rebuild requirements.

The alternative for return codes is, of course, to use an integral type and to define the values as constants (#defines in C, #defines or constants in C++). The advantages and disadvantages of this approach are the mirror image of those of enumerations. First, the coupling can be much reduced, though it does not go away entirely. This is because it is feasible, and may even be preferable, to allot specific ranges to the subsystems and to split the definitions of the return code values across separate include files. Further, partitioning into subsystems and enforcing the extant rule on a per-group basis can similarly dilute the ordering and pruning restrictions. The disadvantages are that we lose type-safety (unless we use True Typedefs [3,4]), we have to ensure manually that values are distinct (unless we auto-generate the return code headers from a database tool), and we're more likely to be looking at uninformative integral values in the debugger watch window.
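The range-partitioning idea might look something like the following sketch. The file names in the comments, the range boundaries, and the rc_subsystem() helper are all hypothetical. The constants are shown in one listing for brevity.

```cpp
#include <cassert>

// Return codes as a plain integral type. In a real code base each group of
// constants below would live in its own (hypothetical) header file.
typedef int RC;

// rc_core.h: codes 0..999 are reserved for the core library.
const RC RC_SUCCESS          = 0;
const RC RC_OUT_OF_MEMORY    = 1;
const RC RC_INVALID_ARGUMENT = 2;

// rc_bcd.h: codes 1000..1999 are reserved for the BCD subsystem.
const RC RC_BCD_BASE      = 1000;
const RC RC_BAD_BCD_DIGIT = RC_BCD_BASE + 1;

// Hypothetical helper: identifies the subsystem that owns a code.
inline int rc_subsystem(RC rc)
{
    return rc / 1000;
}

// The cost: nothing stops an arbitrary integer being used as an RC.
//   RC rc = 42;   // compiles without complaint
```

Namespace-scope const integers have internal linkage in C++, so such constants can be defined in headers without violating the one-definition rule.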

A Requirement for Forward Declaration

In recent work for a client, involving a disparate product suite, I made use of several enumerations. One was a return code type that was imaginatively called RC. With that one, we opted for enumeration rather than integer, and lived with the coupling because the overall scale of the project was not great in lines of code. (It was pretty large-scale in commercial terms, which is why we tended to err on the side of correctness.)

However, although we could live without forward declaration for the RC type, we had cause to forward declare a different enumeration: to reuse a component without compromising on type-safety. Let me explain.

The product suite was a set of networking processes that carried out multiplexing, routing, arbitration and translation, tying together legacy systems operating different communications protocols (e.g. TCP/IP) and middleware (e.g. TibCo EMS). To deal with such cheek-blanching complexity, I designed a foundation message-passing architecture piggy-backed on top of the Adaptive Communications Environment (ACE) [5]. Messages were represented by a reference-counted interface, INotification, which could carry arbitrary data and was tagged with an identifier. It was the type of this identifier, NotificationId, that we required to be an enumeration, in order to maximise robustness.

As the product suite evolved, the notification mechanism was naturally migrated to a common area within our source structure, so that each separate component (programs and dynamic libraries alike) could use it. However, the NotificationId values used by the different system components were disjoint sets. We did not want the physical coupling, and the increase in complexity, that would follow from having every component share a single enumeration containing the union of all notifications. The idea of parts of the codebase needing to be "aware" of enumerators that are not part of their respective problem spaces was not attractive. We needed forward declaration of enumerations, so that the INotification interface and the notification infrastructure classes could be defined independently of the actual values of the notification ids, while still retaining all that type-safety.

Since my client's development team was using Visual C++ (6 and 7.1), we could have taken the cheap route and used forward declaration of enumerations directly. Visual C++ is a member of the not-too-small group of compilers that support it as a proprietary extension. However, we were following one of the central messages of my book, Imperfect C++ [4]: compile your sources with a variety of compilers in order to catch as many warnings and errors as possible. Using proprietary extensions should also raise the hackles of all good engineers. For both reasons, we wanted a standards-compliant solution.

The Forward Declaration

In Part 1 of this series [2], I advocated the use of a dedicated namespace for all non-member enumerations. This removes the possibility (an all too common likelihood in the real world) of name clashes between the enumerators and other constructs (or macros!) in the compilation environment. We can take further leverage from the namespace to provide forward declaration of enumerations.

If we're going to forward declare an enumeration legally, we have to have it masquerade inside something that can be legally forward declared: a class, struct, or union. In this case, I chose a struct. Let's look at how the forward declaration is done first. Part 1 [2] taught us to avoid leakage of enumerator names by wrapping them in a namespace (the Namespace-Bound Enumeration technique). Bearing that in mind, what we're aiming to emulate is:

namespace NotificationId
{
  enum  NotificationId; // Not legal standard C++: this is the goal being emulated
} // namespace NotificationId

What the portable forward-declared enumeration actually looks like is as follows:

// Forward declaration of NotificationId::NotificationId
namespace NotificationId
{
  struct  NotificationId__type;
  typedef NotificationId__type const &NotificationId;

} // namespace NotificationId

There's no enumeration at all, just a structure and a typedef. The typedef's there so that in client code you can write NotificationId::NotificationId just as you would with a regular enumeration:

class INotification
{
public:
  // . . .

  // Valid with the declaration only: no definition needed here
  virtual NotificationId::NotificationId GetId() const = 0;

  // . . .
};

The Definition

Okay, so now we've got declaration. How do we do definition? Clearly it must involve definition of the NotificationId__type structure:

// Definition of NotificationId::NotificationId
namespace NotificationId
{
  enum NotificationId__enum
  {
      unknown = -1
    , null
    , systemShutdown
    // . . . etc. etc.
    , end // The sentinel value
  };

  struct  NotificationId__type
  {
    // Converting constructor: initialisable from any enumerator
    NotificationId__type(NotificationId__enum v)
      : m_value(v)
    {}
    // Implicit conversion back to the underlying enumeration
    operator NotificationId__enum() const
    {
       return m_value;
    }
  private:
    NotificationId__enum  m_value;
  };

} // namespace NotificationId
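To check that the two halves fit together, here is a single-file sketch combining the forward declaration and definition shown above, with the elided enumerators omitted. It declares a hypothetical function, id_value(), while NotificationId__type is still incomplete, and defines that function once the full definition is visible.

```cpp
#include <cassert>

// Forward declaration: all that a common header would need to see.
namespace NotificationId
{
    struct  NotificationId__type;
    typedef NotificationId__type const &NotificationId;
} // namespace NotificationId

// Declaration only: a reference to an incomplete type is fine here.
// (id_value is a hypothetical function used for illustration.)
int id_value(NotificationId::NotificationId id);

// Definition: visible only to code that needs the actual values.
namespace NotificationId
{
    enum NotificationId__enum
    {
        unknown = -1
      , null
      , systemShutdown
      , end // The sentinel value
    };

    struct  NotificationId__type
    {
        NotificationId__type(NotificationId__enum v)
            : m_value(v)
        {}
        operator NotificationId__enum() const
        {
            return m_value;
        }
    private:
        NotificationId__enum  m_value;
    };
} // namespace NotificationId

// Now that the type is complete, the function can use its value.
int id_value(NotificationId::NotificationId id)
{
    // Implicit conversion back to the underlying enumeration.
    NotificationId::NotificationId__enum e = id;
    return static_cast<int>(e);
}
```

Passing an enumerator such as NotificationId::systemShutdown binds the reference parameter to a temporary NotificationId__type created by the converting constructor. That is fine for parameters, but with this reference typedef a function returning NotificationId must never return such a temporary.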

Macro-Tidied Syntax

Naturally we can easily encapsulate the complex declaration within a macro. (I don't like macros as a rule, but things like this are obvious exceptions to the principle.)

#define DECLARE_FWD_ENUM(X)   class X ## __type; typedef X ## __type const &X

There's a slight complication, however. We cannot define the typedef as a reference-to-const, because the language does not allow references to references, and we'd want to be able to support signatures such as:

void f(NotificationId const &id); // would mean NotificationId__type const & const &id: a reference to a reference

So it has to be:

#define DECLARE_FWD_ENUM(X)   class X ## __type; typedef X ## __type X

This means that passing our "enum" by value results in passing copies of the NotificationId__type structure. This is not an issue, because it's a very simple structure indeed, and compilers will optimise such things in their sleep. Using the macro form we can now neatly forward declare the NotificationId enumeration (or any other, for that matter) in a fully standards-compliant and portable form:

// INotification.h - common to all projects

namespace NotificationId
{
  DECLARE_FWD_ENUM(NotificationId); // The macro supplies no trailing semicolon
}

class INotification
{
public:
  // . . .
  // Valid with the declaration only: no full definition of the enum needed here
  virtual NotificationId::NotificationId GetId() const = 0;
  // . . .
};
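A single-file sketch can stand in for the two headers. It shows that the interface compiles against the forward declaration alone, and that a component compiled with the full definition can implement it. The ShutdownNotification class and the abbreviated enumerator list are hypothetical. The full definition is written by hand here rather than with macros.

```cpp
#include <cassert>

// By-value form of the forward-declaration macro (no trailing semicolon).
#define DECLARE_FWD_ENUM(X)   class X ## __type; typedef X ## __type X

// ---- INotification.h: common to all projects ----
namespace NotificationId
{
    DECLARE_FWD_ENUM(NotificationId);
}

class INotification
{
public:
    virtual ~INotification() {}

    // Only the forward declaration is visible here: declaring a function
    // that returns an incomplete class type by value is allowed.
    virtual NotificationId::NotificationId GetId() const = 0;
};

// ---- NotificationId.h: local to one project (values abbreviated) ----
namespace NotificationId
{
    enum NotificationId__enum { unknown = -1, null, systemShutdown, end };

    // Declared with class, matching the macro's forward declaration.
    class NotificationId__type
    {
    public:
        NotificationId__type(NotificationId__enum e) : m_e(e) {}
        operator NotificationId__enum() const { return m_e; }
    private:
        NotificationId__enum m_e;
    };
}

// ---- A hypothetical component that knows the actual values ----
class ShutdownNotification
    : public INotification
{
public:
    virtual NotificationId::NotificationId GetId() const
    {
        return NotificationId::systemShutdown; // via the converting constructor
    }
};
```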

The enumeration definition can similarly be tidied by two macros, DEFINE_FWD_ENUM_BEGIN() and DEFINE_FWD_ENUM_END(), which look like this:

#define DEFINE_FWD_ENUM_BEGIN(X)                                \
                                                                \
    enum X ## __enum                                            \
    {


#define DEFINE_FWD_ENUM_END(X)                                  \
                                                                \
    };                                                          \
                                                                \
    DECLARE_FWD_ENUM(X);                                        \
                                                                \
    struct X ## __type                                          \
    {                                                           \
    public:                                                     \
        typedef X ## __enum     enum_type;                      \
    public:                                                     \
        static X ## __type cast(long l)                         \
        {                                                       \
            return X ## __type(static_cast<X ## __enum>(l));    \
        }                                                       \
                                                                \
    public:                                                     \
        X ## __type(X ## __enum e)                              \
            : m_e(e)                                            \
        {}                                                      \
                                                                \
        operator X ## __enum () const                           \
        {                                                       \
            return m_e;                                         \
        }                                                       \
                                                                \
    private:                                                    \
        X ## __enum m_e;                                        \
    };

and are used as follows:

// NotificationId.h - local to each project

namespace NotificationId
{
  DEFINE_FWD_ENUM_BEGIN(NotificationId)
      unknown = -1
    , null
    , systemShutdown
    // . . . etc. etc.
    , end // The sentinel value
  DEFINE_FWD_ENUM_END(NotificationId)
}
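For experimentation, here is a self-contained sketch that compiles the three macros together with the NotificationId definition above, omitting the elided enumerators. The helper id_from_wire() is hypothetical. It illustrates one use of the generated cast() member: converting an integer from outside the type system, such as a value read off the wire, back into a NotificationId. This version defines the generated type with class and an explicit public:, so that it matches the class forward declaration produced by DECLARE_FWD_ENUM(). Using struct is equally legal, though some compilers warn about the mismatch.

```cpp
#include <cassert>

// Forward declaration: an incomplete class plus a typedef of the enum's name.
#define DECLARE_FWD_ENUM(X)   class X ## __type; typedef X ## __type X

// Opens the real enumeration; the enumerators follow the macro invocation.
#define DEFINE_FWD_ENUM_BEGIN(X)                                \
    enum X ## __enum                                            \
    {

// Closes the enumeration, then defines the wrapper class.
#define DEFINE_FWD_ENUM_END(X)                                  \
    };                                                          \
    DECLARE_FWD_ENUM(X);                                        \
    class X ## __type                                           \
    {                                                           \
    public:                                                     \
        typedef X ## __enum     enum_type;                      \
        /* No range checking: the caller must pass a valid value */ \
        static X ## __type cast(long l)                         \
        {                                                       \
            return X ## __type(static_cast<X ## __enum>(l));    \
        }                                                       \
        X ## __type(X ## __enum e)                              \
            : m_e(e)                                            \
        {}                                                      \
        operator X ## __enum () const                           \
        {                                                       \
            return m_e;                                         \
        }                                                       \
    private:                                                    \
        X ## __enum m_e;                                        \
    };

namespace NotificationId
{
    DEFINE_FWD_ENUM_BEGIN(NotificationId)
          unknown = -1
        , null
        , systemShutdown
        , end // The sentinel value
    DEFINE_FWD_ENUM_END(NotificationId)
}

// Hypothetical helper: maps an integer from outside the type system onto
// a NotificationId, treating anything out of range as unknown.
inline NotificationId::NotificationId id_from_wire(long raw)
{
    if (raw < NotificationId::null || raw >= NotificationId::end)
    {
        return NotificationId::unknown;
    }
    return NotificationId::NotificationId::cast(raw);
}
```

The range check comes before cast() because cast() itself does no validation. Converting an integer outside an enumeration's range of values is not well-defined behaviour.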

Mechanism

So how does it all work? Well, to client code, NotificationId::NotificationId needs to act like an enumeration, so it needs to be initialisable from any of the enumerators [NotificationId::unknown, . . . , NotificationId::end]. This is achieved by giving NotificationId::NotificationId__type a converting constructor from the actual enumeration, NotificationId::NotificationId__enum. We also need to be able to use it in switch statements. A switch accepts an object of class type only if the class has a single conversion function to an integral or enumeration type. Since a switch may operate on an enumeration, we can implicitly convert to NotificationId::NotificationId__enum itself, which is nice. Now we can write code such as the following:

#include <cassert>
#include <cstdio>

void func(NotificationId::NotificationId id)
{
  switch(id) // uses the implicit conversion to NotificationId__enum
  {
    case  NotificationId::null:
      std::fprintf(stdout, "Null notification\n");
      break;
    case  NotificationId::systemShutdown:
      std::fprintf(stdout, "Time to say goodnight\n");
      break;
    default:
      assert(!"Invalid notification id");
      std::fprintf(stdout, "INVARIANT VIOLATION: Give up your day job!\n");
      break;
    case  NotificationId::end:
      assert(!"Unexpected notification id");
      std::fprintf(stdout, "INVARIANT VIOLATION: The end is nigh!\n");
      break;
  }
}
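Because func() only prints, it is awkward to check. The following sketch is a hypothetical variant, describe(), that returns the message instead. It uses a hand-written equivalent of the macro-generated types, with the elided enumerators omitted. It also uses the assert(!"message") idiom, since assert() is a macro taking exactly one argument.

```cpp
#include <cassert>
#include <string>

// Hand-written equivalent of what DEFINE_FWD_ENUM_BEGIN/END generate.
namespace NotificationId
{
    enum NotificationId__enum { unknown = -1, null, systemShutdown, end };

    class NotificationId__type
    {
    public:
        NotificationId__type(NotificationId__enum e) : m_e(e) {}
        operator NotificationId__enum() const { return m_e; }
    private:
        NotificationId__enum m_e;
    };

    typedef NotificationId__type NotificationId;
}

// Hypothetical, testable variant of func(): returns the message rather than
// printing it. The switch accepts the class object because it has a single
// implicit conversion to an enumeration type.
inline std::string describe(NotificationId::NotificationId id)
{
    switch(id)
    {
        case  NotificationId::null:
            return "Null notification";
        case  NotificationId::systemShutdown:
            return "Time to say goodnight";
        case  NotificationId::end:
            assert(!"Unexpected notification id");
            return "INVARIANT VIOLATION: The end is nigh!";
        default:
            assert(!"Invalid notification id");
            return "INVARIANT VIOLATION: Give up your day job!";
    }
}
```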

Note that the enum-size objection to forward declaration of enumerations no longer applies, since any user of the enumeration's values will see the real underlying enumeration, NotificationId__enum. The forward-declarative aspects are all encapsulated within the struct, and that's 100 percent legal. Sure, the full code's a bit verbose, but if you can bring yourself to use macros, then it's pretty straightforward. Naturally, you won't need to forward declare enumerations that often, but now you can when you need to.

Cherry on the Cake

An added bonus with this is that you cannot declare uninitialized instances of the enumeration type, addressing a potential source of bugs in C and C++ that's been there since their inception!

NotificationId::NotificationId nid; // Compile error! No default constructor
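The claim can be checked at compile time. The sketch below uses std::is_default_constructible from C++11's <type_traits>, which postdates this article and is used here only as a verification aid. The type is a hand-written equivalent of the macro-generated one, with the elided enumerators omitted.

```cpp
#include <cassert>
#include <type_traits>   // C++11; used here only to check the claim

namespace NotificationId
{
    enum NotificationId__enum { unknown = -1, null, systemShutdown, end };

    class NotificationId__type
    {
    public:
        // The only constructor requires a value, so no default
        // constructor is implicitly declared.
        NotificationId__type(NotificationId__enum e) : m_e(e) {}
        operator NotificationId__enum() const { return m_e; }
    private:
        NotificationId__enum m_e;
    };

    typedef NotificationId__type NotificationId;
}

// A plain enumeration can be constructed with no arguments (and "E e;"
// leaves it uninitialised)...
static_assert(std::is_default_constructible<NotificationId::NotificationId__enum>::value,
              "a plain enum needs no initialiser");
// ...but the forward-declarable wrapper cannot.
static_assert(!std::is_default_constructible<NotificationId::NotificationId>::value,
              "the forward-declarable enum must be initialised");
```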

STLSoft's Version

These macros are included in STLSoft (http://stlsoft.org/) from version 1.8.3 onwards, in the form of the STLSOFT_DECLARE_FWD_ENUM(), STLSOFT_DEFINE_FWD_ENUM_BEGIN(), and STLSOFT_DEFINE_FWD_ENUM_END() macros. Just #include the relevant STLSoft header, and you're away!

Acknowledgements

Thanks to my usual crew of disparate code pirates for their insightful criticisms: Bjorn Karlsson, Christopher Diggins, Chuck Allison, Garth Lancaster, Greg Peet, John Torjo, Nevin Liber, Sean Kelly, Thorsten Ottosen, Walter Bright.

About the Author

Matthew Wilson is a software development consultant for Synesis Software, and creator of the STLSoft libraries. He is the author of Imperfect C++ (Addison-Wesley, 2004), and is currently working on his next two books, one of which is not about C++. Matthew can be contacted via http://imperfectcplusplus.com/.

Notes and References

[1] Large-Scale C++ Software Design, John Lakos, Addison-Wesley, 1996.
[2] "Imperfect enums, Part 1", Matthew Wilson, C/C++ Users Journal Expert's Forum, April 2005.
[3] "True Typedefs", Matthew Wilson, C/C++ Users Journal, March 2003.
[4] Imperfect C++, Matthew Wilson, Addison-Wesley, 2004.
[5] The Adaptive Communications Environment (ACE), http://www.cs.wustl.edu/~schmidt/ACE.html.