
Conversations: Hungarian wartHogs


November 2001 C++ Experts Forum/Conversations


"Breakthrough!" one of the officers called, poking her head into the mess hall.

Several people, Jeannine and I among them, looked up. "What?" "Whose breakthrough?" "What happened?"

The sudden excitement was palpable, and no wonder: things had been worse than ever since the senior officers had made known that we'd lost contact with the orbiting station and the main surface base. Those had likely been overrun by the incoming forces, although our officers had not yet officially admitted that we were in a hostile situation. When the news came, we had already been sealed into the local base near the excavation site for two days. Progress on the alien artifacts had remained maddeningly slow, all the more so because we knew how near we must be to a breakthrough. Jeannine seemed closer than ever to figuring out the power requirements.

"Jäger says he's figured out part of the language, that's what," she informed us. The obscure writing on the ancient, and mostly defunct, equipment had been a major obstacle to progress. Until now, attempts to decipher it had been largely unsuccessful. "Something about a built-in redundancy — he's doping it out. Don't all rush in to bug him; he needs to concentrate. Thought you'd like to know." And then the officer's head disappeared as she continued elsewhere to spread the news.

"Redundant information," I sat back and mused aloud. "Does that ever sound familiar. Why, back on my first job..."


"Warts!"

"Beg pardon?" I turned at the sound of the Guru's voice.

"You have warts, my child."

Suddenly feeling rather self-conscious, I asked: "Uh, where? My medicated creams usually contr—"

"Your latest code is covered in warts!" she interrupted. "Otherwise known as Hungarian notation. Your variables are beginning to look like this." She picked up the whiteboard marker and wrote: wartHog. "In this case, the variable hog's wart has nothing to do with the pottery school [1]."

"Oh, that," I breathed a sigh of relief. "Hungarian notation? Is that all? Sure, yeah, I read a cool article about Hungarian notation, and it sounded like a good idea. Apparently it was quite the trendy thing for a while. Almost kind of a type calculus-ish direction, sort of in a way, you know. So I—"

"Gibberish!" the Guru exclaimed.

"Ah, well," I faltered. "I thought I said that rather clearly."

"Not you, my child," the Guru corrected. "The naming convention you chose to experiment with results in code that looks like gibberish. Even the prophet Petzold had his off days."

"Hmmmph," I hmmmphed, only slightly mollified. "Well, it's supposed to make code easier to write and read. For example, I can catch some type errors just by reading the code. If I write something like strcpy( szDestination, pachSource );, I can see a problem immediately — the second parameter is a pointer to an array of characters, but not necessarily null-terminated as strcpy requires. Or if I write printf( "%s", ulValue );, I can see that I'm passing an unsigned long where a null-terminated string ought to go. The code tells me that I'm doing the wrong thing."

"Of dubious practical advantage even for type-unsafe calls in C, or in environments where nearly everything is a type-challenged int or void* handle," she shook her head sadly, "and of no relevance at all in a type-safe object-oriented language like C++. This, you must unlearn. Warts are not information; they are disinformation.

"Consider your own parables. In your first example, in C++ you would use strings, which are objects, and problems like the first cannot arise because a string's semantics are well defined and encapsulated. One who writes string destination = source; just cannot get the buffer-copying semantics wrong, for they are never exposed and the calling programmer is never required to assist them; the implementation details are well and truly encapsulated and always managed for you. As for your second example, in C++ you would normally use streams or other type-safe methods to write output, and type checking along with overload resolution guarantee a type-correct result even if a conversion is needed. One who writes cout << value; just cannot get the type wrong: if value is an unsigned long, the operator<< for unsigned longs will be invoked; or, if value has a type for which no operator<< is defined but value's type can be converted unambiguously to a type for which operator<< is defined, that operator<< will be invoked; otherwise, a compile-time error occurs. Not only do the C++ language and standard library detect what would be run-time type errors with type-unsafe calls, they turn them into compile-time errors.

"In sum, the compiler already knows much more than you do about an object's type. Changing the variable's name to embed type information adds little value and is in fact brittle. And if there ever was reason to use some Hungarian notation in C-style languages, which is debatable, there certainly remains no value when using type-safe languages."

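A minimal sketch of the Guru's two points, assuming any standard C++ compiler and library. The helper functions copyName and describe, and the Celsius type, are hypothetical names invented for illustration; Celsius deliberately has no operator<< of its own but converts unambiguously to double.

```cpp
#include <sstream>
#include <string>

// Hypothetical type for illustration: it defines no operator<<,
// but it converts unambiguously to double.
struct Celsius {
    double degrees;
    operator double() const { return degrees; }
};

// Copying a std::string: no buffer sizes, no terminators, nothing to get wrong.
std::string copyName(const std::string& source) {
    std::string destination = source;  // the string manages its own storage
    return destination;
}

// Overload resolution selects each operator<< at compile time.
std::string describe(unsigned long count, const Celsius& temperature) {
    std::ostringstream out;
    out << count;        // operator<< for unsigned long
    out << ' ';
    out << temperature;  // converted to double, then operator<< for double
    return out.str();
}
```

With these definitions, describe(42UL, Celsius{21.5}) yields the string "42 21.5". A type with neither an operator<< nor a usable conversion would be rejected at compile time rather than misprinted at run time.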
"Maybe so," I acquiesced. If nothing else, she had out-soliloquized me. "But you have to admit that, once you know the rules, using Hungarian makes it easier to create variable names."

"So certain are you?" the Guru arched her eyebrows. She placed herself in the guest chair. I knew what that meant: she was warming up for a debate. "I had that in mind when I said 'brittle.' How is it easier, say you?"

"Because the types tell you what to call the variables. It's almost mechanical," I said. "The variable name is pretty much generated from its type. Once you know the type, you can generate the name," I rambled, then realized I was rambling, and stopped. When I had begun, I had somehow had the sense there would be something much more profound to say; now that I heard the words coming out of my mouth, it all sounded a bit superficial. I remember wondering why, when it had seemed so much deeper the first time.

"And if the type changes...?" she prompted me just then.

"Well, you'd have to change the variable name, I guess. But!" I exclaimed. "That then forces you to examine each usage of the variable in your program, to ensure that it is still being used properly."

"Not a bit of it," she riposted. "It 'forces' you to do no such thing. In many cases, the programmers will simply forget — or worse, not bother — to consistently and globally change the variable name, never mind check the usage. And once they do not change the name, the code is lying to you, thus violating the commandment that says: 'Speak truth each one of you with his cubicle neighbor.' This," she shook her head quietly, "is reprehensible. Such is the disinformational evil that must of necessity follow, sooner or later but probably sooner, from the deceptive wartHog style."

"Ah," I smiled, "that's what global search-and-replace is for!"

"Most assuredly not," she shook her head again. "Never mind that you could inadvertently change other similar names of objects whose types have not changed! But even if the global replacement were done correctly, what value has it added? For it leaves the programmer no better off for his troubles than he would have been otherwise. If an error has been introduced by the change in type, such as one caused by an implicit conversion, the error remains the same regardless of the object's superficial name, and you have merely added the menial and meaningless documentation work of changing the name. This too is vanity and a striving after wind."

It was time for me to fall back and regroup. "But you'd have the error no matter what naming system you used. If I have int count; and I change it to short count;, then many programmers might not bother to check the usage at all and just hope the compiler catches any range problems."
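A minimal sketch of why the name does not matter here, assuming the int-to-short change just described. The names Tally, fitsInShort, and store are hypothetical. A plain assignment from a wider integer type into the short member compiles whether the member is called count or sCount (compilers may at most warn), so it is an explicit range check, not the name, that catches the problem.

```cpp
#include <limits>

// Hypothetical counter after the change: "int count;" became "short count;".
// A warty name such as sCount would compile exactly the same way.
struct Tally {
    short count = 0;  // was: int count = 0;
};

// True if value survives conversion to short unchanged.
bool fitsInShort(long long value) {
    return value >= std::numeric_limits<short>::min() &&
           value <= std::numeric_limits<short>::max();
}

// Stores value only when it fits; otherwise leaves the tally untouched.
bool store(Tally& tally, long long value) {
    if (!fitsInShort(value)) {
        return false;
    }
    tally.count = static_cast<short>(value);
    return true;
}
```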

"That," the Guru acknowledged, "is what I just said. The problem is the same whether you uglify the variable's name or not, and by uglifying it you have merely added useless maintenance work because then you must additionally maintain the warty name. If you are not yet convinced, my child, I have one small question for you now: How would you apply Hungarian notation to templates?"

That stumped me. "Touché, I guess," I acknowledged. "A template doesn't really have a type of its own, because the template generates an unknown number of types, one for each set of parameters it's instantiated with. There's no type until it's instantiated, so you can't really create a variable name that encodes the type of the template itself."

"Well spoken, well said. Even more," the Guru added, "inside the template definition itself, how would you wartify the names of objects of a template parameter type? You do not know what they are."

"Oh, I see," I said. "You mean like this." On the whiteboard, I scribbled an offhand example:

template<typename T>
T AddOne( T wartT ) // what wart should wart be?
                    // papuch? lpsz? huh? 
                    // (handle to unbounded harm)
{
 return wartT + T(1);
}
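For comparison, here is a minimal compilable version of the whiteboard template with the parameter named for its role instead of a wart. The name value is our own choice, and the three explicit instantiations are merely examples: each produces a distinct function with its own parameter type, and no single wart could describe them all.

```cpp
#include <complex>

// The whiteboard's AddOne, with the parameter named for its role.
// Requirements on T: constructible from the int 1, and closed under +.
template<typename T>
T AddOne(T value)
{
    return value + T(1);
}

// Explicit instantiations for three unrelated types; each is a separate function.
template int AddOne<int>(int);
template double AddOne<double>(double);
template std::complex<double> AddOne<std::complex<double>>(std::complex<double>);
```

AddOne(41) returns 42, AddOne(1.5) returns 2.5, and AddOne(std::complex<double>(2.0, 3.0)) returns (3.0, 3.0).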

"I believe source code is a form of communication," the Guru pressed on. "The question is, who is that communication aimed at? The compiler? No. Source code is a medium of communication from one programmer to another. It is an expression of intent, of what is desired to happen. We must strive to keep that communication as simple and clear as possible. In order to do that, variable names should reflect the roles that those variables play. The exact type is secondary to the role. A variable name such as sz tells you only that you are looking at a C-style string. It conveys no information as to the role that string is to play."

"Well," I put in, "I don't think anyone would use just sz for a variable name."

"No? If we are to discuss Hungarian notation, we should discuss the canonical version presented by Dr. Simonyi [2]. His examples use variables named sz, pch, and so on. Such names, alas, present no useful information. If the variables were instead called xyzzy and yeti, respectively, or even merely x and y, I would still know simply by looking at their declarations that the first indicates an array of characters, and the second is a pointer to a character. Calling them sz and pch adds no useful information not already present in the code, and in particular, it adds no information not already well known to the compiler. Worse, it could be a lie if the type has changed since the wart was chosen. In any event, even if the wart lies not, the questions are still: What are the variables? What are they for? What do they do? How are they used? The wart helps not at all. I have to study the code to understand the roles they play. The code fails to communicate the intent of the programmer; therefore the names sz and pch are poor choices."

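To illustrate the point about roles, here are two hypothetical functions that the compiler treats identically. The first uses Hungarian-style names built from the types (sz, pch, ch); the second names each object for the job it does. Both function names are ours, not from the column.

```cpp
#include <cstddef>

// Hungarian-style: each name merely restates its declared type.
std::size_t countWarty(const char* sz, char ch)
{
    std::size_t c = 0;
    for (const char* pch = sz; *pch != '\0'; ++pch)
        if (*pch == ch)
            ++c;
    return c;
}

// Role-based: the declarations still show the types, and the names
// add what the types cannot say, namely what each object is for.
std::size_t countOccurrences(const char* text, char target)
{
    std::size_t occurrences = 0;
    for (const char* cursor = text; *cursor != '\0'; ++cursor)
        if (*cursor == target)
            ++occurrences;
    return occurrences;
}
```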
"In your opinion," I ventured.

"In my opinion," the Guru nodded. "Yes, this is still somewhat an area of opinion, rather than hard fact alone, although fact it is that names such as sz are next to useless and that Hungarian warts are brittle in the face of change. When we drafted the coding style guidelines several years ago, Hungarian notation was but one of many areas of lively debate. Unlike some discussions, this one was actually reasonably civilized. We examined the naming convention, listened to the experiences of those who had used it, and eventually came to the consensus that we did not like the convention. Although we did not dislike it enough to prohibit its use outright, we did consider it brittle enough to actively discourage it. Hence, to quote this team's standards:

"Avoid Hungarian notation. It will make a liar out of you.
Warts are not information, but disinformation."

"Hungarian is not only mendacious, but it is high-cholesterol; it is suspected of being fattening, and it is in all probability a flagstone on the road leading to a wasted and dissolute life. Indeed, I recall only one time when Hungarian notation was useful on a project."

This hook intrigued me. "What was that?"

"One of the programmers on the project was named Paul," the Guru explained. "Several months into the project, while still struggling to grow a ponytail and build his report-writing module, he pointed out that Hungarian notation had helped him find a sense of identity, for he now knew what he was..." She paused.

I blinked. It took me about ten seconds, and then I shut my eyes and grimaced painfully. "Pointer to array of unsigned long," I groaned.

She smiled, enjoying my pain. "True story," she said [3].

It was then that I thought I had found a way to corner her in an inconsistency. "Well," I asked innocently, having quickly recovered from the awful pun, "what about our convention of using a trailing underscore for member variables? Isn't that kind of a watered-down version of Hungarian?"

The Guru smiled pleasantly. "So it may appear to the uninitiated, but appearances are in this case deceiving. The underscore has nothing to do with type; it flags scope and privacy. It also has a small practical benefit: we realized that when passing a parameter to a member function, particularly an initialization function or a constructor, we would often want to use the same name for both the member variable and the passed parameter. For example:

class T
{
 int count_;                             // member variable: trailing underscore
public:
 T() : count_(0) {}
 void init(int count) { count_=count; }  // parameter: same role, plain name
};

"We are giving the init function a count. Why come up with a different name for the parameter, when both the parameter and the member play the same role? We actually discussed several options for such cases: prefixing member variable names with my (or with our for static data), prefixing the parameter name with given, and so on. In the end, we decided to adopt a trailing underscore for the member, and no underscore for the provided parameter."

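A minimal sketch contrasting the adopted convention with the my/given prefixes the Guru mentions. The class names Counter and PrefixedCounter and the count() accessor are hypothetical additions for illustration; only the naming patterns come from the discussion above.

```cpp
// Adopted convention: trailing underscore on the member, plain name on the parameter.
class Counter
{
    int count_;
public:
    explicit Counter(int count = 0) : count_(count) {}
    void init(int count) { count_ = count; }
    // In this sketch the plain name is also free for an accessor.
    int count() const { return count_; }
};

// Alternative that was discussed but not adopted:
// "my" prefix on the member, "given" prefix on the parameter.
class PrefixedCounter
{
    int myCount;
public:
    PrefixedCounter() : myCount(0) {}
    void init(int givenCount) { myCount = givenCount; }
    int count() const { return myCount; }
};
```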
I frowned. "Isn't this convention a little debatable?"

The Guru grinned wryly. "Wendy's personal preference is the other way around. But then no one is perfect, and she did just produce a most exceptionally beautiful child in Jeannine [4], so one must make allowances."


"Maybe you should keep your mind more on the here and now," one of the others at the table muttered. Tensions were high, and everyone was hoping for results from Jeannine's and my work, too.

I was not surprised that it was Jeannine who quickly jumped to my defense, but I was surprised at the vocabulary she applied to set the mutterer straight. A broad vocabulary, deftly applied, is far more effective than mere repetitive epithets and expletives, and Jeannine's tongue-lashing was nothing if not effective. Hmm, I said to myself, said I, now there's a gal who knows about communication...

Further Reading

Ottinger's rules for naming variables: <www.objectmentor.com/publications/naming.htm>.

References

[1] The first reader to correctly identify the source of this oblique reference, as well as why it's appropriate for this November 2001 column, will receive an autographed copy of Sutter's More Exceptional C++ when it is available in November 2001. All submissions must be sent by email to hsutter@acm.org with the subject line "I know! I know!" and must include a valid (non-munged) return email address and snail-mail postal address. Contest closes at midnight on October 15, 2001.

[2] Charles Simonyi. "Hungarian notation." Reprinted at <http://msdn.microsoft.com/library/en-us/dnvsgen/html/hunganotat.asp>. Readers are encouraged to study the naming convention and decide for themselves, rather than rely on the opinions of the authors.

[3] Indeed a true story that happened to one of the authors.

[4] Not the same Jeannine as in the framing story. See "Conversations: Back To Base-ics," C/C++ Users Journal C++ Experts Forum, September 2001, <www.cuj.com/experts/1909/hyslop.htm>.

Jim Hyslop is a senior software designer at Leitch Technology International Inc. He can be reached at jim.hyslop@leitch.com.

Herb Sutter is an independent consultant and secretary of the ISO/ANSI C++ standards committee. He is also one of the instructors of The C++ Seminar (<www.gotw.ca/cpp_seminar>). Herb can be reached at hsutter@acm.org.

