Dr. Dobb's is part of the Informa Tech Division of Informa PLC


Tagged Unions


January, 2005: Tagged Unions

Herb Sutter (http://www.gotw.ca/) chairs the ISO C++ Standards committee and is an architect in Microsoft's Developer Division. His most recent books are Exceptional C++ Style and C++ Coding Standards (Addison-Wesley). Jim Hyslop is a senior software designer for Leitch Technology International. He can be reached at [email protected].


"Tag, you're it!" Wendy broke into my reverie.

"Hunh? Whah?" I responded deftly. Or was it daftly? "I was meditating."

"Sure you were, sport, sure. But like I said, you're it—Kerry needs deprogramming."

That woke me a little more. "Bob?"

"Bahb," she confirmed, and looked over her shoulder. "Oh no, here he comes. Gotta go!" And she disappeared around the corner.

I had no escape, and could only wait as voices got closer and finally Bob heaved into view with Kerry in tow. I mean that "heaved" almost literally; he must have been working out even less than usual, and his usual was not at all. Bob's latte sloshed in his cup as he boomed out a greeting and held forth: "Hey there, Junior. Just the boy I wanted to see."

"Hi, Bob," I muttered.

"I need you to help the kid here," he indicated Kerry, "and, ah, deprogram him a bit."

I couldn't help but lift my eyebrows a little. "You want me to deprogram him?"

"Yeah. She's getting to him. Hey, kid," he addressed Kerry, "show him your little problemo."

I was still trying to absorb this sudden interruption as Kerry grabbed my keyboard and, after a few seconds of searching, pulled up his code. Removing some intervening statements, the code looked something like this:

U u;
// ...
char ch = u.c;

I blinked. "What's wrong with that?" I asked.

"Exactly!" Bob boomed, and I had a sinking feeling. It's generally not a good thing to have Bob agree with you, even if you had only just been awoken from a mid-afternoon nap. "Her Weirdness told him the line was wrong."

Kerry started: "That's not quite what she—"

"And that line is perfectly good," Bob continued without pausing, running over Kerry's attempt to participate in his monologue. "Type safety or something, she said. Well, what's the type of u.c, kid?"

"It's a char but—"

"And the type of ch? Huh? What about that? It doesn't get simpler than this. C'mon now, out with it."

"It's a char, too," Kerry answered helplessly.

"There y'go."

"I think I can take it from here, Bob," I intervened, forcing a smile.

"Sure. But one more thing before I go. That union needs another member. When I wrote it, it only needed to hold either an int or a char. But lately, we added a Widget to the system and if you have one of those, you don't need the int or the char, so the Widget needs to be in the union, too. I'll let you guys make the change." And he left.

I looked at Kerry. "Did he say 'union'?" Kerry nodded. "Okay, let's see it." Kerry pulled up the definition:

union U {
  int i;
  char c;
};

Kerry added brightly: "And sir, while we're at it, let's make the change Bob wanted." Before I could stop him, he changed it to:

union U {
  Widget w;
  int i;
  char c;
};

"Are you sure that's such a good idea?" I frowned.

Kerry shrugged, and tried to build the file. The compiler chugged a short time, hacked, and spit out an errball. "Oh," Kerry said. "That didn't work."

"Hunh. No, it didn't. Well, let's start at the top: Is Widget a class?"

Kerry nodded.

"Is it a simple value, or does it have virtual functions or a nontrivial copy constructor or a nontrivial destructor or...?" I went on for a short while, then stopped.

Kerry blinked. After a few seconds, as he finished parsing and caught up, he said: "Uh. It has an assignment operator."

"Thought so. That's enough. Unions are bundles-o-bits. They never get constructed, not with a real constructor, and anyway you wouldn't know which member to construct even if you could. Each member is just a view onto the bytes, or some of the bytes. Clear?"

Kerry blinked again. "As mud."

"Never mind. Bob should know better than that. You just can't put real class types into unions, that's all. Take it back out again, will you?"

He did, and we were back to just plain:

union U {
  int i;
  char c;
};

"Okay, now what was the problem, that assignment from u.c? Let's take a look at that code again." As we scrolled through it, deep inside one nested conditional, I found:

u.i = 42;

"Hey."

"Yes?"

"Hey. That's i, not c. Is that the last time a u member gets assigned to before you read u.c?"

"I'm not sure." Kerry investigated further, and I watched over his shoulder. Soon the answer became apparent: "Yes. Yes, it is. So the code is really like this..." He wrote on the whiteboard:

U u;
// ...
u.i = 42;
// ...
char ch = u.c;

Snap. I know, I know, but we both jumped anyway. The Guru stood behind us, a slim tome in her hand. "Psalm 97: 'Don't use unions to reinterpret representation,'" she read from the tome, and closed it [1].

"Hi," we said. I added: "Yeah, we just found the problem."

"My apprentice, my acolyte," she greeted us in turn. "It is good. What is the problem? How would you describe it generally?"

"Kerry?" I prompted, not wanting to take a chance on it myself.

He hesitated. Finally: "I should not read c unless c was the member last set?"

"Excellent, my acolyte," the Guru said. "Apprentice: How would you detect such errors?"

I offered: "Maybe add a tag? I mean, type tags are usually bad, but in this case, since you have to know what type it is...?"

"Indeed, my apprentice." The Guru smiled, nodded, and wrote:

struct U {
  union {
    int i;
    char c;
  };
  // remember which field was last written
  char tag;  // 1 means i is active, 2 means c is active
};

"Or, if the possibly greater size of an enum member is acceptable..." She wrote:

struct U {
  union {
    int i;
    char c;
  };
  // remember which field was last written
  enum { I_ACTIVE = 1, C_ACTIVE } tag;
};

"Then you adopt the discipline: When setting a member, set the tag. When reading a member, check first that the correct tag was set."

I looked at both tag members dubiously. "That seems, well, uh, er, hard."

"Indeed," she agreed, "and potentially fragile and needful of discipline. But the safety can be worth it, and the reads and writes can be encapsulated within functions. In any event, young ones, never read from a union any member but the one that was last set."
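One way the Guru's suggestion of encapsulating the reads and writes within functions might look is sketched below. This is only an illustration under stated assumptions, not code from the column: the class name TaggedU, its member functions, the extra NONE_ACTIVE tag value, and the choice to throw std::logic_error on a mismatched read are all hypothetical.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical wrapper: TaggedU and its member names are illustrative
// and do not come from the article.
class TaggedU {
public:
    TaggedU() : tag_(NONE_ACTIVE) {}

    // Writing a member always records which member is now active.
    void set_i(int value)  { u_.i = value; tag_ = I_ACTIVE; }
    void set_c(char value) { u_.c = value; tag_ = C_ACTIVE; }

    // Reading a member first checks that it was the one last written.
    int get_i() const {
        if (tag_ != I_ACTIVE)
            throw std::logic_error("TaggedU: i is not the active member");
        return u_.i;
    }
    char get_c() const {
        if (tag_ != C_ACTIVE)
            throw std::logic_error("TaggedU: c is not the active member");
        return u_.c;
    }

    bool holds_i() const { return tag_ == I_ACTIVE; }
    bool holds_c() const { return tag_ == C_ACTIVE; }

private:
    union {
        int i;
        char c;
    } u_;
    // NONE_ACTIVE is an assumed extra state for "nothing written yet".
    enum Tag { NONE_ACTIVE = 0, I_ACTIVE = 1, C_ACTIVE } tag_;
};

// Mirrors Kerry's bug: write i, then try to read c.
// Returns true if the mismatched read is rejected.
bool mismatched_read_is_caught() {
    TaggedU u;
    u.set_i(42);
    assert(u.holds_i() && u.get_i() == 42);
    try {
        u.get_c();  // wrong member: i was the last one set
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

Because the union and the tag are private, callers cannot set one without the other, so the discipline lives in one place instead of at every read and write.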

"But I saw..." Kerry trailed off.

"What, Kerry?"

"I, uh, saw Bob use a union to treat a pointer as an int and..." he trailed off again. Then he gulped, and added: "Was that bad?"

"Bad enough, my child, and less honest than a reinterpret_cast. You must never do such a thing as this." She wrote:

union {
  long l;
  char* p;
};
l = someValue;
strcpy( p, q );  // bad: scribble into memory... somewhere...

"It is," she finished, "an abomination."
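For contrast, here is a rough sketch of the more "honest" route the Guru alludes to: if a pointer's value really must be viewed as an integer, say so with a reinterpret_cast. The helper names are hypothetical, and the sketch assumes a C++11 or later compiler on which the optional type std::uintptr_t is available. Nothing here makes an arbitrary integer safe to treat as a pointer; only a value that came from a valid pointer can be converted back and used.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not from the article. The cast is visible at
// the call site instead of hiding behind a union member access.
std::uintptr_t pointer_to_integer(const char* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

// Only meaningful for a value previously obtained from a valid pointer.
const char* integer_to_pointer(std::uintptr_t n) {
    return reinterpret_cast<const char*>(n);
}

// Round trip: pointer -> integer -> the same pointer.
bool round_trip_preserves_pointer() {
    const char text[] = "hello";
    std::uintptr_t n = pointer_to_integer(text);
    return integer_to_pointer(n) == text;
}
```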

"So you were right after all—I mean, you were right all the time, of course," I caught myself. "The problem really was in the line that read the u.c member."

"In combination with the one that last set the u.i member, yes. Both contributed to the delinquency of this minor." Then I think I saw a twinkle in her eye. "I trust you have been sufficiently...how shall I say...reprogrammed correctly?"

And she glided away...

References

[1] Sutter, H. and A. Alexandrescu. C++ Coding Standards, Addison-Wesley, 2005.

