This second part of the Imperfect Enums series looks at the issue of the forward declaration of enumerations. Why would one ever want to do that, I hear you cry? Certainly it's not a common need. But I have encountered situations where it's required, one of which we'll discuss later.
May 01, 2005
URL:http://www.drdobbs.com/flexible-c-12-imperfect-enums-part-2-for/184403894
Notwithstanding any requirements to do so, the language does not allow the forward declaration of enumerations. Why is that so? Ploughing the newsgroups seems to give three reasons for the illegality of the forward declaration of enumerations:
The usual argument for forward declaration of enumerations is physical decoupling. As the language has matured, and been used for larger and larger projects, this issue has risen to greater prominence. (See John Lakos' seminal work [1] for more information on physical coupling than you could shake a stick at; even this classic work, however, fails to offer a cogent and forthright tactic for dealing with enumeration coupling. It discusses using static/const members to replace class implementation constants, and discusses, though does not wholeheartedly recommend, the use of integral types instead of enumerations.)
One common scenario where enumerations are both desirable and undesirable is as a return code. We may choose to define our return code type, RC, as an enumeration, as follows:
enum RC
{
    RC_SUCCESS = 0
  , RC_OUT_OF_MEMORY
  , RC_INVALID_ARGUMENT
  , RC_BAD_BCD_DIGIT
  . . . // etc. etc.
};
(This enumeration is not namespaced (see Part 1 [2]), perhaps because it's intended to be used by both C and C++, and so uses the RC_ prefix for symbol disambiguation.)
The advantage of defining RC as an enumeration is threefold. First, we get type-safety in the assignment to instances of RC. Second, we get uniquely defined return code values by default, so long as no-one gets the bright idea of applying a value to any but the zeroth element. (It's common practice to explicitly give the zeroth element the value 0 to aid readability, even though the compiler would do so automatically.) The third advantage is more prosaic: Integrated development environments are more likely to render you a human-readable symbol rather than an integral value in the debugger.
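The first two advantages can be seen in a few lines. What follows is a minimal sketch rather than code from the product described here: the helper succeeded() is a hypothetical name introduced only for illustration, and the commented-out line shows the kind of assignment a C++ compiler rejects (a C compiler would accept it).

```cpp
#include <cassert>

enum RC
{
    RC_SUCCESS = 0          // explicit, purely for readability
  , RC_OUT_OF_MEMORY        // 1, assigned by the compiler
  , RC_INVALID_ARGUMENT     // 2, assigned by the compiler
};

// Hypothetical helper: accepts only an RC, not an arbitrary integer.
inline bool succeeded(RC rc)
{
    return RC_SUCCESS == rc;
}

// RC rc = 1;                  // would not compile in C++: no implicit int -> RC
// bool b = succeeded(0);      // would not compile in C++, for the same reason
```

The conversion in the other direction, from RC to int, remains implicit, which is what lets enumerators be used in switch statements and comparisons.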
But there are two disadvantages to using enumerations. First, the order of the return codes thus defined may never be changed. If some order-obsessed maintenance programmer decides to move some around, or prune some now-defunct values, all manner of nasties will occur if two link-units compiled at different times are brought into play together. As discussed in Part 1 [2], the rule is that you should never remove or change the order of extant items. How your development team defines extant in this case may vary, but at a minimum it should include values that have been built into released components.
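To make the ordering hazard concrete, here is a small sketch. The two namespaces, build_v1 and build_v2, and the function as_seen_by_v2() are hypothetical names standing in for two link-units compiled at different times; in real code both enumerations would simply be called RC and the mismatch would be invisible to the compiler.

```cpp
#include <cassert>

// The enumeration as compiled into an older, already released component.
namespace build_v1
{
    enum RC { RC_SUCCESS = 0, RC_OUT_OF_MEMORY, RC_INVALID_ARGUMENT, RC_BAD_BCD_DIGIT };
}

// The same enumeration after someone pruned RC_OUT_OF_MEMORY:
// every later enumerator silently shifts down by one.
namespace build_v2
{
    enum RC { RC_SUCCESS = 0, RC_INVALID_ARGUMENT, RC_BAD_BCD_DIGIT };
}

// Across a binary boundary only the integral value survives, so the newer
// component reinterprets the older component's value using its own table.
inline build_v2::RC as_seen_by_v2(build_v1::RC rc)
{
    return static_cast<build_v2::RC>(static_cast<int>(rc));
}
```

Under these assumptions, an "invalid argument" reported by the old component arrives in the new one as "bad BCD digit", with no diagnostic at compile time or link time.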
The second objection to using enumerations for return codes is that it introduces physical coupling, and a lot of it. Consider a common development scenario, whereby the code of different components in a given product suite share core library functionality, including a set of common return-codes and their manipulating functions. Naturally, as the components evolve, they will require the introduction of new return codes. As long as ordering is not disrupted that's all fine and proper, but it does mean that logically independent changes result in physically dependent rebuild requirements.
The converse option for return codes is, of course, to use an integral type, and to define the values as constants (#defines in C, #defines/constants in C++). The advantages and disadvantages of this approach are the mirror image of the enumerations. First, the coupling, though it does not go away entirely, can be much reduced. This is because it is feasible, and may even be preferable, to allot specific ranges to the subsystems, and split the definitions of the return code values across separate include files. Further, partitioning into sub-systems, and enforcing the extant rule on a group basis can similarly dilute the ordering and pruning restrictions. The disadvantages are that we lose type-safety (unless you use True Typedefs [3,4]), have to manually ensure that values are distinct (unless we auto-generate the return code headers from a database tool), and we're more likely to be looking at uninformative integral values in the debugger watch window.
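A sketch of the integral alternative might look like the following. The header names in the comments, the range bases, the individual codes other than RC_SUCCESS and RC_OUT_OF_MEMORY, and the helper is_net_rc() are all assumptions made for illustration; the point is only that each sub-system's codes can live in a separate file within an allotted range.

```cpp
#include <cassert>

typedef int RC; // plain integral type: any int converts silently

// core_rc.h (hypothetical): codes common to every component
const RC RC_SUCCESS       = 0;
const RC RC_OUT_OF_MEMORY = 1;

// Hypothetical range allotment: each sub-system owns a block of 1000 values
const RC RC_NET_BASE = 1000;
const RC RC_DB_BASE  = 2000;

// net_rc.h (hypothetical): changes here rebuild only the networking code
const RC RC_NET_TIMEOUT   = RC_NET_BASE + 0;
const RC RC_NET_BAD_FRAME = RC_NET_BASE + 1;

// db_rc.h (hypothetical): changes here rebuild only the database code
const RC RC_DB_LOCKED = RC_DB_BASE + 0;

// Range checks replace what the type system no longer provides.
inline bool is_net_rc(RC rc)
{
    return rc >= RC_NET_BASE && rc < RC_DB_BASE;
}
```

Nothing in this arrangement stops two headers from defining the same value, or stops a caller passing 42 where an RC is expected; that is the loss of type-safety and uniqueness described above.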
However, though we could live with the lack of forward declaration for enumerations for the RC type, we had cause to forward declare a different enumeration for another reason--to leverage re-use of a component without compromising on type-safety. Let me explain.
The product suite was a set of networking processes that carried out multiplexing, routing, arbitration and translation, tying together legacy systems operating different communications protocols (e.g. TCP/IP) and middleware (e.g. TibCo EMS). To deal with such cheek-blanching complexity, I designed a foundation message passing architecture piggy-backed on top of the Adaptive Communications Environment (ACE) [5]. The messages were represented as a reference-counted interface, INotification, which could carry arbitrary data with them, and were distinguished by an identifier. It was the type of this identifier, NotificationId, which we required to be an enumeration in order that we could maximise robustness.
As the product suite evolved, the notification mechanism was naturally migrated to a common arena within our source structure, such that each separate component--programs and dynamic libraries--could use it. However, the NotificationId values used by the different system components were disjoint sets, and we did not want to have all the physical coupling, and increase in complexity, involved were we to have all components share a common set of the union of all notifications. The idea of parts of the codebase needing to be "aware" of enumerators that are not part of their respective problem spaces was not attractive. We needed forward declaration of enumerations, such that the INotification interface and the notification infrastructure classes might be defined independently of the actual values of the notification ids, but still have all that type-safety.
Since my client's development team were using Visual C++ (6 and 7.1), we could have taken the cheap tactic, and used forward declaration of enumerations; Visual C++ is a member of the not-too-small group of compilers that supports them as a proprietary extension. However, because we were following one of the central messages of my book, Imperfect C++ [4], which is to compile your sources with a variety of compilers in order to catch as many warnings and errors as possible, and because using proprietary extensions should raise the hackles of all good engineers, we wanted a standards-compliant solution.
If we're going to forward declare an enumeration legally, we're really going to have to have it masquerade inside something that can be legally forward declared: a class/struct/union. In this case, I chose a struct. Let's look at how the forward declaration is done first. Bearing in mind our lesson from Part 1 [2] to avoid leakage of enumerator names by wrapping in a namespace--the Namespace-Bound Enumeration technique--what we're aiming at is emulating:
namespace NotificationId
{
    enum NotificationId;
} // namespace NotificationId
What the portable forward declared enumeration actually looks like is as follows (with the portable enumeration part highlighted):
// Forward declaration of NotificationId::NotificationId
namespace NotificationId
{
    struct NotificationId__type;
    typedef NotificationId__type const &NotificationId;
} // namespace NotificationId
There's no enumeration at all, just a structure and a typedef. The typedef's there so that in client code you can write NotificationId::NotificationId just as you would with a regular enumeration:
class INotification
{
    . . .
    virtual NotificationId::NotificationId GetId() const = 0; // Valid with declaration only.
                                                              // No need for definition here
    . . .
};
// Definition of NotificationId::NotificationId
namespace NotificationId
{
    enum NotificationId__enum
    {
        unknown = -1
      , null
      , systemShutdown
      . . . // etc. etc.
      , end // The sentinel value
    };
    struct NotificationId__type
    {
        NotificationId__type(NotificationId__enum v)
            : m_value(v)
        {}
        operator NotificationId__enum() const
        {
            return m_value;
        }
    private:
        NotificationId__enum m_value;
    };
} // namespace NotificationId
The forward declaration lends itself to being wrapped in a macro, which we might first write as:
#define DECLARE_FWD_ENUM(X) class X ## __type; typedef X ## __type const &X
There's a slight complication, however. We cannot declare it as a reference-to-const because the language does not allow double referencing, and we'd want to be able to support signatures such as:
void f(NotificationId const &id); // === void f(NotificationId__type const &&id);
So it has to be:
#define DECLARE_FWD_ENUM(X) class X ## __type; typedef X ## __type X
This means that passing our "enum" by value results in passing copies of the NotificationId__type structure, but this is not an issue, because it's a very simple structure indeed, and compilers will optimise such things in their sleep. Using the macro form we can now neatly forward declare the NotificationId enumeration (or any other, for that matter), in a fully standards-compliant and portable form:
// INotification.h - common to all projects
namespace NotificationId
{
    DECLARE_FWD_ENUM(NotificationId);
}
class INotification
{
    . . .
    virtual NotificationId::NotificationId GetId() const = 0; // Valid w/ declaration only.
                                                              // No need for full defn of enum here
    . . .
};
Similarly, the enumeration definition is also improved by the use of two macros DEFINE_FWD_ENUM_BEGIN() and DEFINE_FWD_ENUM_END(), which look like:
#define DEFINE_FWD_ENUM_BEGIN(X) \
 \
    enum X ## __enum \
    {

#define DEFINE_FWD_ENUM_END(X) \
 \
    }; \
 \
    DECLARE_FWD_ENUM(X); \
 \
    struct X ## __type \
    { \
    public: \
        typedef X ## __enum enum_type; \
    public: \
        static X ## __type cast(long l) \
        { \
            return X ## __type(static_cast<X ## __enum>(l)); \
        } \
 \
    public: \
        X ## __type(X ## __enum e) \
            : m_e(e) \
        {} \
 \
        operator X ## __enum () const \
        { \
            return m_e; \
        } \
 \
    private: \
        X ## __enum m_e; \
    };
and are used as follows:
// NotificationId.h - local to each project
namespace NotificationId
{
    DEFINE_FWD_ENUM_BEGIN(NotificationId)
        unknown = -1
      , null
      , systemShutdown
      . . . // etc. etc.
      , end // The sentinel value
    DEFINE_FWD_ENUM_END(NotificationId)
}
So how does it all work? Well, to client code, the NotificationId::NotificationId needs to act like an enumeration, so it needs to be initialisable from any of the values [NotificationId::first, . . . , NotificationId::third]. This is achieved by giving NotificationId::NotificationId__type a conversion constructor from the actual enumeration NotificationId::NotificationId__enum. We also need to be able to use it in switch statements, which means it needs to have an implicit conversion operator to an integral type. Since enumerations are integral types, we can implicitly convert to NotificationId::NotificationId__enum itself, which is nice. Now we can write code such as the following:
#include <assert.h>
#include <stdio.h>
void func(NotificationId::NotificationId id)
{
    switch(id) // Implicit conversion to NotificationId__enum
    {
        case NotificationId::null:
            fprintf(stdout, "Null notification\n");
            break;
        case NotificationId::systemShutdown:
            fprintf(stdout, "Time to say goodnight\n");
            break;
        default:
            assert(0 && "Invalid notification id");
            fprintf(stdout, "INVARIANT VIOLATION: Give up your day job!\n");
            break;
        case NotificationId::end:
            assert(0 && "Unexpected notification id");
            fprintf(stdout, "INVARIANT VIOLATION: The end is nigh!\n");
            break;
    }
}
Note that the enum-size objection to forward declaration of enumerations is now no longer an issue, since any user of any of the values of the enumeration will see the real underlying enumeration--NotificationId__enum. The forward declarative aspects are all encapsulated within the struct, and that's 100 percent legal. Sure, the full code's a bit verbose, but if you can bring yourself to use macros, then it's pretty straightforward. Naturally, you needn't forward declare enumerations that often, but now you can when you need to.
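Putting the pieces together, the whole arrangement can be sketched in a single file. This is an illustration under stated assumptions, not the product's code: the three "files" are marked only by comments, the wrapper is written out by hand instead of via the DEFINE_FWD_ENUM_ macros, and describe() is a hypothetical free function standing in for code that consumes an INotification's id.

```cpp
#include <cassert>
#include <string>

// The by-value form of the declaration macro.
#define DECLARE_FWD_ENUM(X) class X ## __type; typedef X ## __type X

// ---- Shared header: only the forward declaration is visible -------------
namespace NotificationId
{
    DECLARE_FWD_ENUM(NotificationId);
}

// Hypothetical consumer. Declaring it needs only the incomplete type.
std::string describe(NotificationId::NotificationId id);

// ---- Per-project header: the real enumerators and the wrapper -----------
namespace NotificationId
{
    enum NotificationId__enum
    {
        unknown = -1
      , null
      , systemShutdown
      , end // The sentinel value
    };

    class NotificationId__type
    {
    public:
        // Conversion constructor: lets an enumerator initialise the wrapper.
        NotificationId__type(NotificationId__enum e)
            : m_e(e)
        {}
        // Implicit conversion back to the enum: lets switch() accept it.
        operator NotificationId__enum() const
        {
            return m_e;
        }
    private:
        NotificationId__enum m_e;
    };
}

// ---- Implementation file: both headers have been included ---------------
std::string describe(NotificationId::NotificationId id)
{
    switch(id)
    {
        case NotificationId::null:
            return "null";
        case NotificationId::systemShutdown:
            return "systemShutdown";
        default:
            return "unexpected";
    }
}
```

A caller that has seen the per-project header can write describe(NotificationId::systemShutdown) exactly as it would with a genuine enumeration, while the declaration of describe() compiles in any translation unit that has seen only the shared header.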
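Note that the wrapper has only the conversion constructor and hence no default constructor, so, unlike a real enumeration, an instance cannot be declared without an initial value:
NotificationId::NotificationId nid; // Compile error!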
Matthew Wilson is a software development consultant for Synesis Software, and creator of the STLSoft libraries. He is the author of Imperfect C++ (Addison-Wesley, 2004), and is currently working on his next two books, one of which is not about C++. Matthew can be contacted via http://imperfectcplusplus.com/.