
Flexible C++ #11: Imperfect enums, part 1: Declarations, Definitions, and Namespace Leakage


Welcome to a series on enumerations in C and C++, in which I cover their uses, good practices for managing them, and two imperfections in the way the language handles them. In this first instalment, I give an overview of enumerations, and examine the first imperfection: the leakage of enumeration symbol names into the surrounding namespace. In the second instalment, next month, I address the issue of enumerations not being legally forward declarable, and look at a standards-compliant technique for emulating this functionality.

What's an enum?

Quoting straight from the source [1], an enumeration is "a type that can hold a set of values specified by the [programmer]". Enumerations may or may not be named, and may have one or more named values. The following are all valid enumerations:

enum Day { MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY };

enum { size = 4 };

enum Verbosity
{
    Unknown = -1
  , Silent  = 0
  , Terse   = 1
  , Chatty  = 2
  , Verbose = 3
  , Default = Terse
};

The members of an enumeration, known as "enumerators" ([2]), hold integral values. When no values are explicitly specified, the enumerators are assigned consecutive values starting at 0. Hence, in the Day enumeration, MONDAY has the value 0, and SUNDAY the value 6. In the case where some enumerators are explicitly given values and others are not, the ones that are not are automatically indexed from their preceding enumerator. For example, you might have started the Day enumeration at 1 by explicitly assigning the value 1 to MONDAY, in which case TUESDAY would have been equal to 2, WEDNESDAY equal to 3, and so on.
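
As a minimal sketch of the numbering rules just described, the following shows implicit values counting on from the preceding enumerator. The names DayFrom1, Mixed, and check_numbering() are mine, invented purely for illustration.

```cpp
#include <cassert>

// Hypothetical variant of Day that starts at 1 rather than 0.
enum DayFrom1 { MON = 1, TUE, WED, THU, FRI, SAT, SUN };

// Hypothetical enumeration mixing explicit and implicit values.
enum Mixed { A, B = 10, C, D = 3, E };

// Asserts the values that the numbering rules predict.
inline void check_numbering()
{
    assert(MON == 1 && TUE == 2 && SUN == 7); // counts on from the explicit 1
    assert(A == 0);                           // first enumerator defaults to 0
    assert(B == 10 && C == 11);               // C follows B
    assert(D == 3 && E == 4);                 // E follows D, not C
}
```

Note that nothing stops an implicit value from going backwards relative to an earlier enumerator (as D and E do here); each implicit value depends only on its immediate predecessor.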

The type of an enumerator in a named enumeration is its enumeration, i.e. the type of Silent is Verbosity. A named enumeration defines a distinct type that can participate in function overloading, as in:

1 void f(int );
2 void f(Day );
3 void f(Verbosity );

f(size);     // calls 1
f(Terse);    // calls 3
f(SATURDAY); // calls 2
f(10);       // calls 1

(Note that we can use the enumerators' names directly, since they live in the surrounding scope. This is the first imperfection, which I deal with later in this article.)

Enumerators are implicitly convertible to int, as in:

4 void g(int );
5 void g(Verbosity );

g(size);     // calls 4
g(Terse);    // calls 5
g(SATURDAY); // calls 4
g(10);       // calls 4

But the converse is not true:

6 void h(Day );
7 void h(Verbosity );

h(size);     // compile error
h(Terse);    // calls 7
h(SATURDAY); // calls 6
h(10);       // compile error

An enumerator in an unnamed, or anonymous, enumeration is treated as an int in C (in C++ it has the unnamed enumeration's own type, which promotes to int), and enumerators for named enumerations are implicitly convertible to int. Hence:

8  void i(short );
9  void i(int );
10 void i(long );

i(size);     // calls 9
i(Terse);    // calls 9
i(SATURDAY); // calls 9

There are a few other subtleties with enumerations [1], but the foregoing summarises the important aspects of their syntax.
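
The overloading and conversion rules above can be pulled together into one compilable sketch. The which() overloads and check_overloads() are hypothetical helpers of mine; each overload simply returns a tag so that the chosen overload can be checked.

```cpp
#include <cassert>

enum Day { MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY };
enum { size = 4 };
enum Verbosity { Unknown = -1, Silent, Terse, Chatty, Verbose, Default = Terse };

// Hypothetical overload set: each overload returns a tag identifying itself.
inline int which(int)       { return 1; }
inline int which(Day)       { return 2; }
inline int which(Verbosity) { return 3; }

inline void check_overloads()
{
    assert(which(size)     == 1); // unnamed enumeration: promoted to int
    assert(which(Terse)    == 3); // exact match on Verbosity
    assert(which(SATURDAY) == 2); // exact match on Day
    assert(which(10)       == 1); // plain int

    int n = SATURDAY;             // enumerator -> int is implicit
    assert(n == 5);

    Day d = static_cast<Day>(n);  // int -> enumeration needs an explicit cast
    assert(d == SATURDAY);
}
```

The static_cast is the one place where you take responsibility for the value being a valid Day; the compiler will not check it for you.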

Uses

There are three primary uses of enumerations: Named Value Enumerations; Bit Flag Enumerations; Member Constant Enumerations.

Named Value Enumerations: Strongly Valued Types

A Named Value Enumeration (NVE) is a means of defining a strongly valued type, that is to say a type in which all possible values of the type are known at compile time. Both the Day and Verbosity enumerations are examples of NVEs. NVEs add type-safety and readability, and they also help compilers, which are able to warn if, for example, a switch statement handles some, but not all, of the values in an NVE without providing a default case.
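
Here is a minimal sketch of the switch idiom that lets the compiler help. The function verbosity_name() is hypothetical, and I've left out the Default alias to keep the listing short. With GCC or Clang, the -Wswitch warning (part of -Wall) reports any enumerator that has no case label, provided there is no default label to hide the omission.

```cpp
#include <cassert>
#include <cstring>

enum Verbosity { Unknown = -1, Silent = 0, Terse = 1, Chatty = 2, Verbose = 3 };

// Hypothetical helper: maps each enumerator to a printable name.
inline char const* verbosity_name(Verbosity v)
{
    switch(v)
    {
        case Unknown: return "Unknown";
        case Silent:  return "Silent";
        case Terse:   return "Terse";
        case Chatty:  return "Chatty";
        case Verbose: return "Verbose";
        // Deliberately no default label: if an enumerator is added to
        // Verbosity later, the compiler can flag the missing case here.
    }
    return "<invalid>"; // reached only for values outside the enumerators
}

// Small self-check of the mapping.
inline void check_names()
{
    assert(0 == std::strcmp(verbosity_name(Terse), "Terse"));
}
```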

Bit Flag Enumerations: Grouping Options

A Bit Flag Enumeration (BFE) is a means of grouping related bit flags, which, when used in concert, are used to moderate the behavior of some component. For example, the recls [3] API defines the enumeration RECLS_FLAG, for use with the Recls_Search(), Recls_SearchProcess(), and Recls_Stat() functions.

enum RECLS_FLAG
{
    RECLS_F_FILES           = 0x00000001 /*!< Include files in search. Default if none specified */
  , RECLS_F_DIRECTORIES     = 0x00000002 /*!< Include directories in search */
  , RECLS_F_LINKS           = 0x00000004 /*!< Include links in search. Ignored in Win32 */
  , RECLS_F_DEVICES         = 0x00000008 /*!< Include devices in search */
  , RECLS_F_TYPEMASK        = 0x00000FFF
  , RECLS_F_RECURSIVE       = 0x00010000 /*!< Searches given directory and all sub-directories */
  , RECLS_F_NO_FOLLOW_LINKS = 0x00020000 /*!< Does not expand links */
  , RECLS_F_DIRECTORY_PARTS = 0x00040000 /*!< Fills out the directory parts */
  , RECLS_F_DETAILS_LATER   = 0x00080000 /*!< Does not fill out anything other than the path */
  , RECLS_F_PASSIVE_FTP     = 0x00100000 /*!< Passive mode in FTP */
};

RECLS_FNDECL(recls_rc_t) Recls_Search(        recls_char_t const        *searchRoot
                                            , recls_char_t const        *pattern
                                            , recls_uint32_t            flags
                                            , hrecls_t                  *phSrch);

RECLS_FNDECL(recls_rc_t) Recls_SearchProcess( recls_char_t const        *searchRoot
                                            , recls_char_t const        *pattern
                                            , recls_uint32_t            flags
                                            , hrecls_process_fn_t       pfn
                                            , recls_process_fn_param_t  param);

RECLS_FNDECL(recls_rc_t) Recls_Stat(          recls_char_t const        *path
                                            , recls_uint32_t            flags
                                            , recls_info_t              *phEntry);

A search is conducted based on the absence/presence of these various flags in combination, as in:

// Search for all C++ implementation files in current directory and any sub-directories
recls_rc_t rc = Recls_Search(".", "*.cpp", RECLS_F_FILES | RECLS_F_RECURSIVE, &hSrch);

// Search for all directories in current directory only
recls_rc_t rc = Recls_Search(".", "*", RECLS_F_DIRECTORIES, &hSrch);

// Search for all directories and files on the system under /usr/include, and fill out the directory parts for each entry found
recls_rc_t rc = Recls_Search("/usr/include", "*", RECLS_F_DIRECTORIES | RECLS_F_FILES | RECLS_F_RECURSIVE | RECLS_F_DIRECTORY_PARTS, &hSrch);

This works because enumerators implicitly convert to integers, and can therefore participate in bit-wise OR operations. In this case, I combine the bit-flag enumerators to be passed to the Recls_???() functions in the form of a 32-bit unsigned integer. Inside the implementations of such functions the combined value--now of integral type, remember, not RECLS_FLAG any more--is tested using bit-wise AND, as in:

if(RECLS_F_DIRECTORY_PARTS == (flags & RECLS_F_DIRECTORY_PARTS))
{
  numParts = count_dir_parts(dir0, end);
}
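
The combine-then-test pattern can be sketched in isolation as follows. SearchFlag, flags_t, and has_flag() are stand-ins I've made up for the purpose; they are not part of the recls API.

```cpp
#include <cassert>

// Hypothetical BFE, modelled loosely on RECLS_FLAG.
enum SearchFlag
{
    SF_FILES       = 0x00000001
  , SF_DIRECTORIES = 0x00000002
  , SF_RECURSIVE   = 0x00010000
};

typedef unsigned long flags_t; // assumption: any unsigned type of 32+ bits

// True if every bit of 'flag' is set in 'flags'.
inline bool has_flag(flags_t flags, SearchFlag flag)
{
    return static_cast<flags_t>(flag) == (flags & flag);
}

inline void check_flags()
{
    // The result of OR-ing two enumerators is an integer, not a SearchFlag.
    flags_t const flags = SF_FILES | SF_RECURSIVE;

    assert(has_flag(flags, SF_FILES));
    assert(has_flag(flags, SF_RECURSIVE));
    assert(!has_flag(flags, SF_DIRECTORIES));
}
```

Comparing the masked value against the flag itself, rather than merely against zero, also gives the right answer should a "flag" ever comprise more than one bit.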

Hybrid Enumerations: Named Values & Bit-flags

There are some occasions where an enumeration is a hybrid of NVE and BFE, as in the following:


enum FileOperations
{
    Copy    = 1
  , Delete  = 2
  , Move    = 3

  , WithVisualFeedback  = 0x1000
  , OverrideReadOnly    = 0x2000
};

In this case, the Copy, Delete, and Move enumerators form the NVE part, and represent distinct and separate operations. Conversely, the WithVisualFeedback and OverrideReadOnly enumerators are flags that are used to moderate the file operation. It's customary to see such things because gathering them all in one enumeration emphasises their relatedness, even if it does blur the enumeration concept(s) somewhat.
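
One way to pick such a hybrid value apart is sketched below. OperationMask is my own assumption (that the operation codes live in the low 12 bits), as are the operation_of() and has_option() helpers.

```cpp
#include <cassert>

enum FileOperations
{
    Copy    = 1
  , Delete  = 2
  , Move    = 3

  , WithVisualFeedback  = 0x1000
  , OverrideReadOnly    = 0x2000
};

// Assumption: operation codes occupy the low 12 bits, options the rest.
enum { OperationMask = 0x0FFF };

// Extracts the NVE part; compare the result with ==, never with &.
inline int operation_of(int request)
{
    return request & OperationMask;
}

// Tests one of the BFE options.
inline bool has_option(int request, FileOperations option)
{
    return 0 != (request & option);
}

inline void check_hybrid()
{
    int const request = Move | OverrideReadOnly;

    assert(operation_of(request) == Move);
    assert(has_option(request, OverrideReadOnly));
    assert(!has_option(request, WithVisualFeedback));
}
```

The mask matters because Move (3) has the same bit pattern as Copy | Delete; testing the operation with bit-wise AND would wrongly report a Move request as a Copy.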

Member Constant Enumerations

Given the wealth of meta-programming in modern C++ practice, it is now commonplace to see member constants in classes. As of C++98, the language supports them in the form of initialized static const member variables of integral type. For example:

class MyStuff
{
public:
  static const int  is_pointer    = 0; // Member constant
  static const int  is_reference  = 0; // Member constant
  static const int  maxLimit;          // const static member
  . . .
};

is_pointer and is_reference are member constants, and can take part in evaluation at compile-time. (And unless their addresses are taken, they do not need to occupy any storage.) Because it is not initialized in the class definition, maxLimit is just a "regular" static member variable, albeit a const one, and may not take part in compile-time evaluations. It must be separately initialized outside the class, e.g.

const int MyStuff::maxLimit = config["MyStuff.MaxLimit"];

Some older, but still widely used compilers do not support member constants (see [4]). When writing libraries for maximum portability, therefore, it can be preferable to use enumerations, as in:

class MyStuff
{
public:
  enum
  {
      is_pointer    = 0
    , is_reference  = 0
  };
  . . .
};

I prefer using unnamed enumerations because they stand out better than member constants. But this may be experiential bias, and therefore a circular argument. Who can say? What we can say is that we support a greater number of compilers by using enumerations than using constants [4].

When enumerations are used as member constants, it is a good idea to make that intent clear to users of your code by defining each "constant" within its own enumeration. Consider the following two definitions of a num_traits template specialization (8 bits per byte is assumed):

template <>
struct num_traits<uint32_t>
{
    enum { bytes = 4, bits = 8 * bytes };
};

template <>
struct num_traits<uint32_t>
{
    enum { bytes = 4         };
    enum { bits  = 8 * bytes };
};

I would suggest that the second form makes it unequivocally clear that bytes and bits are member constants that are unrelated by type. Although most people would probably not take it so, it is possible to see the first form as an NVE, which it most certainly is not.
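
To show such member constants earning their keep at compile time, here is a small sketch in which one of them sizes an array. The primary template declaration and the bit_buffer struct are my additions for illustration, and I'm assuming std::uint32_t from <cstdint> is available.

```cpp
#include <cassert>
#include <cstdint>

// Primary template: declared only, so unsupported types fail to compile.
template <typename T>
struct num_traits;

template <>
struct num_traits<std::uint32_t>
{
    enum { bytes = 4         };
    enum { bits  = 8 * bytes };
};

// Hypothetical user of the traits: one unsigned char per bit.
struct bit_buffer
{
    // An array bound must be a compile-time constant; the enumerator is one.
    unsigned char storage[num_traits<std::uint32_t>::bits];
};

inline void check_traits()
{
    assert(num_traits<std::uint32_t>::bytes == 4);
    assert(num_traits<std::uint32_t>::bits  == 32);
    assert(sizeof(bit_buffer) == 32);
}
```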

Layout

As with most other parts of C and C++, the use of whitespace within the definition of an enumeration is pretty much unrestricted. You can, therefore, use a variety of formats.

For NVEs that don't have explicit initializers, I don't see much of significance to choose between horizontal, or vertical, or mixed layout, other than following one's existing coding conventions. Hence:


enum Day { MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY, SATURDAY, SUNDAY };

enum Day
{
    MONDAY
  , TUESDAY
  , WEDNESDAY
  , THURSDAY
  , FRIDAY
  , SATURDAY
  , SUNDAY
};

enum Day { MONDAY, TUESDAY, WEDNESDAY, 
           THURSDAY, FRIDAY, SATURDAY, SUNDAY };

However, when there are initializers involved, I think a vertical layout is preferable:

enum Day
{
    MONDAY = 1
  , TUESDAY = 2
  , WEDNESDAY = 3
  , THURSDAY = 4
  , FRIDAY = 5
  , SATURDAY = 6
  , SUNDAY = 7
};

and even more so with strict vertical alignment of the enumerators and of their initializers:


enum Day
{
    MONDAY    = 1
  , TUESDAY   = 2
  , WEDNESDAY = 3
  , THURSDAY  = 4
  , FRIDAY    = 5
  , SATURDAY  = 6
  , SUNDAY    = 7
};

since it helps you to spot where you might have used the same initializer for two enumerators. With BFEs, style issues become much more important. Consider an alternative but equivalent form of the RECLS_FLAG enumeration:

enum RECLS_FLAG
{
    RECLS_F_FILES = 1, RECLS_F_DIRECTORIES = 2, RECLS_F_LINKS = 4
  , RECLS_F_DEVICES = 8, RECLS_F_TYPEMASK = 4095, RECLS_F_RECURSIVE = 65536
  , RECLS_F_NO_FOLLOW_LINKS = 131072, RECLS_F_DIRECTORY_PARTS = 262144
  , RECLS_F_DETAILS_LATER   = 524288, RECLS_F_PASSIVE_FTP     = 1048576
};

In contrast with the actual definition, this has lost a lot of the information implicitly used by humans when reading enumeration definitions. First, the use of decimals has made it unclear, for all but the greatest adepts at base conversion, that we're dealing with a BFE rather than an NVE. Sure, we can all spot a 65536 and know it's 0x10000, but who's got the hex equivalent of 1048576 imprinted in their minds?

Second, the mixed vertical/horizontal layout makes it hard to see how many enumerators there are. You've also lost the obvious relationship between the first four members, which are used to select the types of file-system entities to be retrieved, and the mask flag, RECLS_F_TYPEMASK (0xFFF). The loss of the strict vertical alignment, including that of the initializer values, means you're forced to think more, even if the values were expressed in hex.

I think there's little equivocation on the value of the use of spaces in laying out enumeration values in the cases shown. In my opinion, BFEs should always be explicitly initialized and laid out vertically, with the initializers expressed in hexadecimal and themselves vertically aligned.

Managing Initializer Values

Note that I've used hexadecimal constants in the initializers for BFEs shown thus far. There are other numbering schemes that can be useful when one needs to set up a BFE's values, using left-shift operations. Consider the simple BFE FileAttributes:

enum FileAttributes
{
    ReadOnly    = 0x0001
  , Hidden      = 0x0002
  , System      = 0x0004
  , Temporary   = 0x0008
  , Compressed  = 0x0010
};

It can be expressed in two alternate, but equivalent, forms:

enum FileAttributes
{
    ReadOnly    = 1 << 0
  , Hidden      = 1 << 1
  , System      = 1 << 2
  , Temporary   = 1 << 3
  , Compressed  = 1 << 4
};

and:


enum FileAttributes
{
    ReadOnly    = 0x0001
  , Hidden      = ReadOnly   << 1
  , System      = Hidden     << 1
  , Temporary   = System     << 1
  , Compressed  = Temporary  << 1
};
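
If you want to convince yourself that the three forms really are equivalent, the following sketch places each in its own namespace (the namespace names and schemes_agree() are mine, since the same enumeration cannot be defined three times in one scope) and compares the values.

```cpp
#include <cassert>

namespace absolute_hex
{
    enum FileAttributes
    {
        ReadOnly = 0x0001, Hidden = 0x0002, System = 0x0004
      , Temporary = 0x0008, Compressed = 0x0010
    };
}

namespace absolute_shift
{
    enum FileAttributes
    {
        ReadOnly = 1 << 0, Hidden = 1 << 1, System = 1 << 2
      , Temporary = 1 << 3, Compressed = 1 << 4
    };
}

namespace relative_shift
{
    enum FileAttributes
    {
        ReadOnly = 0x0001, Hidden = ReadOnly << 1, System = Hidden << 1
      , Temporary = System << 1, Compressed = Temporary << 1
    };
}

// Compares as int, since the three enumerations are distinct types.
inline bool same3(int a, int b, int c) { return a == b && b == c; }

inline bool schemes_agree()
{
    using namespace absolute_hex; // only to shorten the first argument
    return same3(ReadOnly,   absolute_shift::ReadOnly,   relative_shift::ReadOnly)
        && same3(Hidden,     absolute_shift::Hidden,     relative_shift::Hidden)
        && same3(System,     absolute_shift::System,     relative_shift::System)
        && same3(Temporary,  absolute_shift::Temporary,  relative_shift::Temporary)
        && same3(Compressed, absolute_shift::Compressed, relative_shift::Compressed);
}

inline void check_schemes() { assert(schemes_agree()); }
```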

Naming

There are two issues with enumeration naming: how to name the enumeration, and how to name the enumerators. Strangely, and perhaps uniquely in C++, the naming of the members of the construct, the enumerators, is more important than the name of the enumeration itself.

I'm going to look at why this is so, the problems that it causes, and a mechanism for obviating those problems later in this instalment. For now, I concentrate on the names themselves.

Enumeration Names

People have different naming schemes for types, usually LikeThis or like_this, and I'd suggest that enumeration names should follow along with whichever scheme is in use. I think the only point to be made about the naming of enumerations is whether one should use plurals or not. For example, would you call it "Day" or "Days", "Verbosity" or "Verbosities", "RECLS_FLAG" or "RECLS_FLAGS"? My preference is to go for a singular name, since many enumeration names, especially NVEs, would not be appropriate in plural form. With BFEs, there's more of a case for plurals, but I prefer to go for consistency, and be singular for all.

One thing worth noting is that, although most of us tend to do so, there's actually no syntactic purpose to providing names for BFEs, since they're intended to be used in combination, in which guise they are of integral type and not the type of the enumeration. We tend to do so out of habit, or to enhance readability, or to facilitate auto-documentation tools.

Enumerator Names

When it comes to enumerator names, it often depends on whether your enumeration is purely for C++ compilation, or for C and/or C/C++. In the latter case, it's better to use some kind of prefix for the enumerator names, as we've seen with the RECLS_FLAG enumerators, which are all prefixed with RECLS_F_ for this reason. After all, there's only one namespace in C, the global namespace, and it's pretty big. Calling an enumerator size in C is a really bad idea.

In C++ compilation enumerations, like any other type, may be defined within a specific namespace--whether the global namespace, a named namespace, or the namespace of a class/struct/union--so it's possible to avoid conflicts with arbitrary names from other namespaces. However, there are still problems.

Some textbooks [1,5] advocate or demonstrate that enumerator names (for C++ enumerators) be in UPPER_CASE_FORMAT. My personal preference is to go for lower_case_format. However, I'm going to argue here that both are ill advised (albeit the second is much less so).

The good thing about C++, and namespaces, is that one can avoid names from other namespaces. However, you cannot avoid names from the pre-processor namespace, and you cannot avoid reserved words. Hence, neither of the following enumerations is compilable:

enum Verbosity
{
    unknown = -1
  , silent  = 0
  , terse   = 1
  , chatty  = 2
  , verbose = 3
  , default = terse
};

enum STRING_STATE
{
    EMPTY
  , NULL
  , LOWERCASE
  , UPPERCASE
};

In the first case, the name default clashes with the C/C++ keyword. In the latter, the name NULL clashes with the C/C++ standard pre-processor symbol (defined in stddef.h). Well, that's fine, you may say, since you can change them, perhaps to the uglified dfault, defaultVerbosity, NULL_STATE, etc. But there are two reasons why this is the wrong tack. First, by uglifying some enumerators in an enumeration you decrease its usability, requiring users to think about how your enumeration is named when they'd be better off thinking about how it may be put to use.

Second, and of greater importance, is the fact that this is putting out a bush-fire a bucket at a time. Sure, you can readily identify the reserved words for the language(s) supported by your enumeration. However, it is, in principle, impossible to anticipate the (global) pre-processor context within which your code might find itself. Given that, it seems nothing short of foolhardy to use the same naming scheme for your enumerators that is the de-facto naming standard for the pre-processor namespace. Hence, I recommend that enumerators for C++-only compilation use the CamelCaseFormat or lowerCamelCaseFormat. An exception to this can be with MCEs, which may more properly take the natural form of the ambient coding convention for member variables.
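
By way of contrast with the two uncompilable enumerations, here is a sketch of their CamelCase equivalents. StringState is my renaming of STRING_STATE; the point is only that Null and Default collide with neither the NULL macro nor the default keyword.

```cpp
#include <cassert>
#include <cstddef> // brings in the NULL macro, to prove there is no clash

enum StringState
{
    Empty
  , Null        // fine: NULL is the macro, Null is not
  , LowerCase
  , UpperCase
};

enum Verbosity
{
    Unknown = -1
  , Silent  = 0
  , Terse   = 1
  , Chatty  = 2
  , Verbose = 3
  , Default = Terse // fine: default is the keyword, Default is not
};

inline void check_camel_case()
{
    assert(Null == 1);
    assert(UpperCase == 3);
    assert(Default == Terse);
}
```

Of course, this protects you only from macros that follow the upper-case convention; someone who defines a macro called Null deserves whatever they get.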

Maintenance: Managing Changes to Enumerators

An enumeration defines, at a given point in time, the valid values, in the form of its enumerators, for that type. However, as with all other software, it is subject to evolution. Therefore, it behoves you to consider how enumerations can be best managed in light of changes.

If the code base for a particular enumeration is entirely self-contained within your development organisation/team, you can change the constitution of enumerators with relative ease; the only constraint is that any part of the code base that utilizes the enumeration is recompiled and retested.

However, if you're writing open-source library code, or code that's going to interface at a binary level with other link-units (see chapters 7 and 8 of Imperfect C++ [4]) which may or may not be recompiled in light of such changes, it's much more important that you exercise due care in the changes. I'll consider the ramifications for the different enumeration types.

Changing NVE Enumerators

The rule for changing NVEs in such circumstances is very simple: never make any changes that will change the value of an extant enumerator. This is somewhat similar to the principle for API functions, and for COM interface definitions. Because you may compile code with the new enumerator values that may have to interact with code compiled with the old enumerator values, you must not change the values of any enumerators that the old code "knows about".

A practical example of this is the Win32 API's TOKEN_INFORMATION_CLASS enumeration, whose enumerators relate to the types of information retrievable from an access token (a Win32 security object representing the security information associated with a logon session). This enumeration is expanded according to the expansion of the Win32 security model with each evolution of the NT family operating systems. For example, an early form of the enumeration, included with the Visual C++ compiler versions 2.0-5.0, is defined as follows:

typedef enum _TOKEN_INFORMATION_CLASS {
  TokenUser = 1,
  TokenGroups,
  TokenPrivileges,
  TokenOwner,
  TokenPrimaryGroup,
  TokenDefaultDacl,
  TokenSource,
  TokenType,
  TokenImpersonationLevel,
  TokenStatistics
} TOKEN_INFORMATION_CLASS;

(Note: the use of the typedef enum X { . . .} Y; idiom is only necessary for compatibility with C. In C++, enumerations may be referenced without qualification of the enum keyword, so you can just define enum Y { . . . }; and be done. Also note that the use of a leading underscore is reserved for the implementation (see C99 [6], section 7.1.3), and you should not do the same in your own code.)

Later versions have added, up to the time of writing, the following enumerators:

  TokenRestrictedSids,
  TokenSessionId,
  TokenGroupsAndPrivileges,
  TokenSessionReference,
  TokenSandBoxInert,
  TokenAuditPolicy,
  TokenOrigin

None of the original enumerators have been removed, or reordered, or had their values explicitly changed. Hence, they all still retain the same values: e.g. TokenDefaultDacl has the value 6 now in Windows Server 2003 just as it did with Windows NT 3.51. By adding new values on to the end of the enumerator list, the old enumerators retain their value, and the enumeration maintains its integrity with respect to backwards compatibility. Were this not the case, of course, a program compiled on a later operating system (but using only features common to all versions) would not work on an earlier operating system. Naturally, such a thing would be a veritable deathblow to the Windows NT family, and Microsoft's famed backwards compatibility.

Note that inserting an enumerator into the middle of the list would change the values for all subsequent enumerators. This is a big no-no for NVEs.
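
The difference between appending and inserting can be seen in a small sketch. The InfoClass enumerations and their namespaces are hypothetical; they stand for one enumeration at three points in its history.

```cpp
#include <cassert>

// The original, as compiled into already-shipped binaries.
namespace v1
{
    enum InfoClass { User = 1, Groups, Privileges };
}

// Evolution by appending: every old enumerator keeps its value.
namespace v2_appended
{
    enum InfoClass { User = 1, Groups, Privileges, Owner };
}

// Evolution by inserting: everything after the insertion point shifts.
namespace v2_inserted
{
    enum InfoClass { User = 1, Owner, Groups, Privileges };
}

inline void check_evolution()
{
    // Appending preserves the binary contract...
    assert(int(v2_appended::Groups)     == int(v1::Groups));
    assert(int(v2_appended::Privileges) == int(v1::Privileges));

    // ...whereas inserting silently breaks it.
    assert(int(v2_inserted::Groups)     != int(v1::Groups));
    assert(int(v2_inserted::Owner)      == int(v1::Groups));
}
```

The last assertion is the nasty one: old code that passes what it believes to be Groups would be understood by new code as asking for Owner.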

Now consider what we can do if we want to deprecate an enumerator, say to represent a now-deleted feature. For example, the TokenAuditPolicy is documented as "Reserved for future use". Let's assume that no such future use is identified, and we want to get rid of it. You cannot simply remove it from the enumeration, because then the value of TokenOrigin (== 17), which is used, would change, and all hell would break loose for any program recompiled with the new definition of TOKEN_INFORMATION_CLASS. There are three options available:

1. Leave the enumerator as it is. Unfortunately, this has the effect of leaving any code using it--say a TIC2String() function--as still valid, whereas you (probably will) want to identify any such uses via compile-time errors.

2. Remove the enumerator, and explicitly correct the value of the next one in the list, as in:

typedef enum _TOKEN_INFORMATION_CLASS {
  TokenUser = 1,
  . . .
  TokenSandBoxInert,
  TokenOrigin = TokenSandBoxInert + 2
} TOKEN_INFORMATION_CLASS;

Unfortunately, this can get very hairy in the general case, should anyone have cause to remove TokenSandBoxInert or TokenOrigin in the future. If you're absolutely sure that you're only ever going to do it once, it's arguably acceptable, but even then it's something I'd never trust.

3. Leave the enumerator in place, but change its name: e.g. TokenAuditPolicy would change to TokenAuditPolicy_NOW_DEPRECATED. This is the approach I favor, since it causes the desired compilation errors but avoids the fragility of removal + manual patching of the numbering. Furthermore, it's also the most robust form when you want to allow explicitly-requested backwards compatibility:

typedef enum _TOKEN_INFORMATION_CLASS {
  TokenUser = 1,
  . . .
  TokenSandBoxInert,
#if defined(TOKEN_INFORMATION_CLASS_ALLOW_TOKENAUDITPOLICY)
  TokenAuditPolicy,
#else /* ? TOKEN_INFORMATION_CLASS_ALLOW_TOKENAUDITPOLICY */
  TokenAuditPolicy_NOW_DEPRECATED,
#endif /* ? TOKEN_INFORMATION_CLASS_ALLOW_TOKENAUDITPOLICY */
  TokenOrigin /* No need for manually specifying values here!! */
  . . .
} TOKEN_INFORMATION_CLASS;

or the somewhat less ugly:

typedef enum _TOKEN_INFORMATION_CLASS {
  TokenUser = 1,
  . . .
  TokenSandBoxInert,
  TokenAuditPolicy_NOW_DEPRECATED,
  TokenOrigin /* No need for manually specifying values here!! */
  . . .
  /* At the end of the enumeration; the leading comma keeps the list valid whether or not the symbol is defined */
#if defined(TOKEN_INFORMATION_CLASS_ALLOW_TOKENAUDITPOLICY)
  , TokenAuditPolicy = TokenAuditPolicy_NOW_DEPRECATED
#endif /* ? TOKEN_INFORMATION_CLASS_ALLOW_TOKENAUDITPOLICY */
} TOKEN_INFORMATION_CLASS;
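
Stripped of the Win32 specifics, the rename-in-place technique looks like the following sketch. InfoClass, its enumerators, and the INFOCLASS_ALLOW_AUDITPOLICY symbol are all invented for the example.

```cpp
#include <cassert>

enum InfoClass
{
    InfoUser = 1
  , InfoSandBoxInert
  , InfoAuditPolicy_NOW_DEPRECATED // was InfoAuditPolicy; still occupies 3
  , InfoOrigin                     // so this is still 4, with no manual patching
#if defined(INFOCLASS_ALLOW_AUDITPOLICY)
  , InfoAuditPolicy = InfoAuditPolicy_NOW_DEPRECATED // opt-in alias for old code
#endif /* INFOCLASS_ALLOW_AUDITPOLICY */
};

inline void check_deprecation()
{
    assert(InfoSandBoxInert == 2);
    assert(InfoAuditPolicy_NOW_DEPRECATED == 3);
    assert(InfoOrigin == 4);
#if defined(INFOCLASS_ALLOW_AUDITPOLICY)
    assert(InfoAuditPolicy == 3);
#endif
}
```

Code that still mentions InfoAuditPolicy fails to compile unless the user defines the opt-in symbol, which is exactly the nudge you want to give.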

Changing BFE Enumerators

Since BFE enumerators are (or at least should be!) explicitly initialized, addition/removal of an enumerator shouldn't be a problem. For example, you could remove the RECLS_F_DETAILS_LATER flag (which feature has never been implemented) from the RECLS_FLAG enumeration without damaging the integrity of any code that depends on the other values. (Of course, code that uses RECLS_F_DETAILS_LATER would be broken, and would require a fix, but that's outside the scope of the discussion. We're addressing the issues of avoiding the introduction of bugs, not a general philosophy of maintenance and refactoring.) RECLS_FLAG is immune to damage in this case because the value of RECLS_F_DETAILS_LATER is given an explicit and absolute initializer.

Adding new enumerators to the enumeration simply involves their insertion at an appropriate place, along with selection of the appropriate value. Note that, because BFEs are (or should be) explicitly initialized it is possible for the programmer to mistakenly use a value that is already used in another enumerator. For example, we might add RECLS_F_CALLBACK_PROGRESS with a value of 0x00080000, which is already used by RECLS_F_DETAILS_LATER.
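
One cheap defence against that mistake is a test that checks the flags for overlap. The sketch below is only a suggestion: SearchFlag mirrors part of RECLS_FLAG under made-up names, and flags_are_disjoint() is a hypothetical helper.

```cpp
#include <cassert>

// Hypothetical BFE mirroring the upper half of RECLS_FLAG.
enum SearchFlag
{
    SF_RECURSIVE       = 0x00010000
  , SF_NO_FOLLOW_LINKS = 0x00020000
  , SF_DIRECTORY_PARTS = 0x00040000
  , SF_DETAILS_LATER   = 0x00080000
  , SF_PASSIVE_FTP     = 0x00100000
};

// True if no two of the n values share a bit.
inline bool flags_are_disjoint(unsigned long const* values, int n)
{
    unsigned long seen = 0;

    for(int i = 0; i != n; ++i)
    {
        if(0 != (seen & values[i]))
        {
            return false; // this value reuses a bit already taken
        }
        seen |= values[i];
    }

    return true;
}

inline void check_disjoint()
{
    unsigned long const flags[] =
    {
        SF_RECURSIVE, SF_NO_FOLLOW_LINKS, SF_DIRECTORY_PARTS
      , SF_DETAILS_LATER, SF_PASSIVE_FTP
    };

    assert(flags_are_disjoint(flags, 5));
}
```

Mask enumerators such as RECLS_F_TYPEMASK overlap the flags they cover by design, so they must be left out of any such check.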

BFEs may use different initialization schemes. We've already looked at explicit decimal initialization, and pretty much discounted that. But there are two other alternatives to the explicit hexadecimal initialization scheme that are of far greater utility. Consider if we'd built up RECLS_FLAG using relative left-shifts, as in:

enum RECLS_FLAG
{
  . . .
  , RECLS_F_RECURSIVE       = . . .
  , RECLS_F_NO_FOLLOW_LINKS = RECLS_F_RECURSIVE       << 1
  , RECLS_F_DIRECTORY_PARTS = RECLS_F_NO_FOLLOW_LINKS << 1
  , RECLS_F_DETAILS_LATER   = RECLS_F_DIRECTORY_PARTS << 1
  , RECLS_F_PASSIVE_FTP     = RECLS_F_DETAILS_LATER   << 1
};

Now, the removal of RECLS_F_DETAILS_LATER would cause RECLS_F_PASSIVE_FTP to be broken, with the obvious ramifications. To avoid this you'd either have to remember to change the initializer for RECLS_F_PASSIVE_FTP to be RECLS_F_DIRECTORY_PARTS << 2, or leave in RECLS_F_DETAILS_LATER but change its name (e.g. RECLS_F_DETAILS_LATER_now_deprecated). Clearly this initialization scheme is really unattractive when it comes to removal of elements. By contrast, however, addition of a new element is very straightforward and low risk. You simply follow the scheme:

enum RECLS_FLAG
{
  . . .
  , RECLS_F_PASSIVE_FTP       = RECLS_F_DETAILS_LATER   << 1
  , RECLS_F_CALLBACK_PROGRESS = RECLS_F_PASSIVE_FTP     << 1
};

The one caveat to this is that it is possible to overrun the limit of the enumeration range (enums are generally limited ), without realizing it.

Using the absolute left-shift technique is much less fragile with respect to removal of elements. But I'd still caution that, in my opinion, the resultant enumeration might still precipitate an ill-advised tidying by a compulsively neat maintenance programmer [7]:

enum RECLS_FLAG
{
  . . .
  , RECLS_F_RECURSIVE       = . . .
  , RECLS_F_NO_FOLLOW_LINKS = 1 << 16
  , RECLS_F_DIRECTORY_PARTS = 1 << 17
  , RECLS_F_PASSIVE_FTP     = 1 << 19 /* Better just tidy that up to an 18 ... !!! */
};

As with the explicit integral initializer scheme, adding an element requires the programmer to select the right number, but one is much less likely to overflow, since eyebrows will be raised as soon as one sees 1 << 32 in the list.

From the foregoing analysis, I think it's clear that each scheme has its pros and cons. For my part, I'll be sticking with the absolute hexadecimal initializer form.

Changing MCE Enumerators

Since MCEs should be unnamed and, if you take my advice, contain only one enumerator each, removing, adding, or changing the value of MCE enumerators, will not, in and of itself, impact the integrity of the other enumerators (in the other MCEs) within the containing type.

More fundamentally, since MCEs are mostly for the purpose of providing compile-time characteristics of the types within which they reside, the ramifications for runtime are much reduced, if not entirely eliminated. Of course, some member constants do have runtime effects, but the ramifications of changing them are almost exactly the same as they are for static const members, and integral member constants (see Chapter 15 of Imperfect C++ [4] for discussion), and so are outside the scope of this article.

Suggested Enum Syntax Guide

Given the issues highlighted in the foregoing, I'd like to suggest a syntax guide to aid in distinguishing between the three forms of enumerations in your code:

Table 1

Feature                                  | NVE                               | BFE                                  | MCE
Enumeration has name?                    | Yes; singular                     | Optional                             | No
Has explicit initializers                | Optional                          | Yes; absolute hexadecimal preferred  | Yes; usually 0 or 1
Layout                                   | Any; vertical if has initializers | Vertical; initializers also vertical | --
Initializer base                         | Optional; usually decimal         | Hexadecimal                          | Decimal
Separate enumeration per enumerator      | No                                | No                                   | Yes
Enumerator naming convention: C++ only   | CamelCase                         | CamelCase                            | Ambient member variable naming
Enumerator naming convention: C or C/C++ | Prefixed; preferably uppercase    | Prefixed uppercase                   | --

Imperfection: Name Leakage

Section 7.2;10 of the C++ standard ([8]) states that "The enum-name and each enumerator declared by an enum-specifier is declared in the scope that immediately contains the enum-specifier." This means that the enumerators in the Day and Verbosity enumerations, along with the size enumerator in the unnamed enumeration, are all declared in the global namespace. As we saw in the function calls above, it is not necessary to qualify them with their enumeration name. Indeed, to attempt to do so is a syntax error, and will elicit a compiler error along the lines of "name followed by "::" must be a class or namespace name".

i(Verbosity::Terse); // Compile error

You can say that the names of the enumerators of an enumeration "leak" into the enclosing namespace, and this is necessary to be compatible with C. But this can be a big problem if you're defining enumerations within the global namespace: any time you've got naming ambiguities, you've got surprising code, and surprising code isn't really a nice thing to anyone other than the uber-geek who resides in the corner office. Given our discussion above (see section "Naming"), it's reasonable to narrow this debate to NVEs within exclusively C++ compilation units.

You might suggest that the solution is eminently simple: always define your enumerations, along with your other types, within namespaces. Alas, that's not sufficient.

First, there's the reality that many people write application code in the global namespace. That's perfectly valid, and to proscribe that practice to avoid enumerator name conflicts is neither reasonable nor practicable.

Second, that doesn't even cover it. Let's look at a couple of cases. First, we might have an enumerator clash with a function within a given namespace, e.g.

namespace Local
{
  enum SearchType
  {
      files       = 0x0001
    , directories = 0x0002
  };

  . . .

  vector<string> files(char const *dir); // Compile error!
};

The enumerator files clashes with the function files(). One might look at this code and say that it's easily avoided by selecting appropriate names. But this mistakes the way namespaces work. Namespaces are "open"--you can add other definitions to them at a later point--so the enumeration and the function declaration might be in different namespace blocks:


namespace Local
{
  enum SearchType
  {
      files       = 0x0001
    , directories = 0x0002
  };
};

  . . .

namespace Local
{
  vector<string> files(char const *dir); // Compile error!
}

They might even be in separate files, and have coexisted within the same source code base, without being in the same compilation unit, for a considerable time. Fixing such a circumstance would be quite a pain.
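As a rough sketch of that situation, the listing below keeps the two namespace blocks apart, as if they came from different headers, and leaves the conflicting declaration commented out so that the rest compiles. The replacement function list_files() and its placeholder body are hypothetical, standing in for whatever renaming a maintainer would be forced into.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Imagine this block lives in one header ...
namespace Local
{
  enum SearchType
  {
      files       = 0x0001
    , directories = 0x0002
  };
}

// ... and this block in another, written much later.
namespace Local
{
  // Enabling the next line breaks any compilation unit that sees both blocks,
  // because 'files' would name both an enumerator and a function in Local:
  // std::vector<std::string> files(char const *dir);

  // Hypothetical renamed function that sidesteps the clash. The body is a
  // placeholder and performs no real directory search.
  inline std::vector<std::string> list_files(char const *dir)
  {
    std::vector<std::string> result;
    result.push_back(std::string(dir) + "/example.txt");
    return result;
  }
}
```

The rename works, but only at the cost of giving up the most natural name for the function, which is exactly the compromise the rest of this article tries to avoid.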

A real scenario I encountered recently caused me enough consternation to prompt me to work towards the solution described here. I was developing a communications system for a client where the messages passed between hosts were composed of fields of different types. Some were alphanumeric; others were BCD. Some were variable length; others fixed. Furthermore, the definitions of the fields and the messages were also dependent on the protocol dialect utilised by a particular vendor. To cope with all this complexity, we naturally went for auto-generation of the field and message classes from configuration files (using Open-RJ [9], of course).

Field types were identified by the FieldType enumeration, with enumerators such as BcdFixed, AlphaNumericSpecial, and so on.

enum FieldType
{
    Unknown = 0
  , BcdVariable
  , BcdFixed
  , Track2Data
  , AlphaNumeric
  , AlphaNumericSpecial
  , AlphaNumericSpecial_Variable
};

The problem we had was that these same names, which perfectly describe the field types, are also desirable as the names of the field classes. (Thankfully my client is not among the shrinking number of organisations that use the horrible MFC class-name prefix affectation CMyClass.) One of the advantages of higher-level languages is that one can use descriptive names for constructs. So it seems undesirable to have to use an inappropriate name for a class type, or have to affect an obfuscatory syntax on enumerators. What could we do?

At this point it's instructive to see what other languages do. Java avoids the problem by not having enumerations: Ta-Da! The same goes for Python and Ruby. Not helpful. By contrast, C# and D both define the enumeration as a distinct namespace, which means that one must specify the enumerator by the enumeration.enumerator syntax, as in:

FieldType ft = FieldType.BcdFixed;

The Solution: the Namespace-Bound Enumeration technique

It was at this point that I had my Eureka moment. (As with all such "insights", it is distressingly and prosaically obvious in retrospect.) You want to isolate the enumerators in a separate namespace. So why not put the enumeration in a separate namespace? And that's exactly the technique:

namespace FieldType
{
  enum FieldType
  {
      Unknown = 0
    , BcdVariable
    , BcdFixed
    , Track2Data
    , AlphaNumeric
    , AlphaNumericSpecial
    , AlphaNumericSpecial_Variable
  };
} // namespace FieldType

Now the enumerator names do not clash with identical names in the surrounding namespace, because they're defined within the FieldType namespace. To access them outside this namespace they must be qualified, just as they would be in C# and D: FieldType::AlphaNumeric. The one complication is that the type of such an enumerator, from the perspective of the enclosing namespace (i.e. the one in which the enumeration is notionally defined, and in which the field and message types are defined), is FieldType::FieldType. Hence, the BcdFixed class would indicate its type by the function GetFieldType() declared as:

namespace FieldType
{
  enum FieldType { . . . }; // defined in full, as shown above
}

class BcdFixed
  : public Field
{
  . . .
  virtual FieldType::FieldType GetFieldType() const;
  . . .
};

FieldType::FieldType BcdFixed::GetFieldType() const
{
    return FieldType::BcdFixed;
}

And that's it! I call it the Namespace-Bound Enumeration technique. Now your enumerator names need not leak out into their enclosing namespace, and you isolate your code against hard-to-fix potential incompatibilities in large codebases sharing a single, or a small number of, namespaces. (They may also perform more nicely with IDE code completion, as shown in a similar technique described by John Torjo [10].)
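Pulling the pieces together, the following is a minimal, self-contained sketch of the technique. The Field base class shown here is a simplified stand-in (the real interface is elided in the listings above), and the isBcd() helper is a hypothetical addition that shows how calling code reads once the enumerators are qualified.

```cpp
#include <cassert>

namespace FieldType
{
  enum FieldType
  {
      Unknown = 0
    , BcdVariable
    , BcdFixed
    , Track2Data
    , AlphaNumeric
    , AlphaNumericSpecial
    , AlphaNumericSpecial_Variable
  };
} // namespace FieldType

// Simplified, hypothetical base class; the article does not show the real one.
class Field
{
public:
  virtual ~Field() {}
  virtual FieldType::FieldType GetFieldType() const = 0;
};

// The class name BcdFixed coexists with the enumerator FieldType::BcdFixed,
// because the enumerator is declared inside the FieldType namespace.
class BcdFixed
  : public Field
{
public:
  virtual FieldType::FieldType GetFieldType() const
  {
    return FieldType::BcdFixed;
  }
};

// Hypothetical helper: calling code works through the base class and must
// qualify each enumerator with the namespace name.
inline bool isBcd(Field const &field)
{
  FieldType::FieldType const ft = field.GetFieldType();
  return FieldType::BcdFixed == ft || FieldType::BcdVariable == ft;
}
```

Given an instance declared as BcdFixed field;, the expression isBcd(field) evaluates to true, and the only visible cost is the doubled FieldType::FieldType spelling of the type.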

Note that, before C++98 introduced namespaces, this technique could not have been emulated by using a class to provide the enclosing scope, since the class name is effectively reserved within a class definition as the type of the class itself and as the name of its constructor(s) and destructor:


class FieldType
{
  enum FieldType // Compile error!
  {
    . . .

Part 2 Preview

Next month I'm going to continue the discussion of Imperfect enums by looking at the issue of forward declaration, and demonstrate a technique for allowing forward declaration of enums in a 100% language-compliant manner.

Acknowledgements

Thanks to my usual crew of disparate code pirates for their insightful criticisms: Bjorn Karlsson, Christopher Diggins, Chuck Allison, Garth Lancaster, Greg Peet, John Torjo, Nevin :) Liber, Sean Kelly, Thorsten Ottosen, Walter Bright.

About the author

Matthew Wilson is a software development consultant for Synesis Software, and creator of the STLSoft libraries. He is the author of Imperfect C++ (Addison-Wesley, 2004), and is currently working on his next two books, one of which is not about C++. Matthew can be contacted via http://imperfectcplusplus.com/.

Notes and References

[1] The C++ Programming Language, Special Edition, Bjarne Stroustrup, Addison-Wesley, 2000.

[2] To those, like me, that digested way too much COM (Component Object Model) during the 1990s, the word enumerator will forever mean IEnumXXXX. You'll just have to keep reminding yourself "enumerator === named enum value". :-)

[3] recls is an open-source, platform-independent recursive search library. It's available from http://recls.org/

[4] Imperfect C++, Matthew Wilson, Addison-Wesley, 2004.

[5] C++ Coding Standards, Andrei Alexandrescu & Herb Sutter, Addison-Wesley, 2005.

[6] ISO/IEC 9899

[7] I know this because I was an ill-advised maintenance programmer with neatness compulsions in my formative years. :-)

[8] ISO/IEC 14882:1998

[9] Open-RJ is an open-source, platform-independent structured file reader library for the Record JAR format. It's available from http://openrj.org/

[10] "Simplify your coding with user-friendly enumerations", John Torjo, TechRepublic, February 2005; http://techrepublic.com.com/5100-10548_11-5463083.html

