The New C: About // Comments

When former CUJ Editor-in-Chief Marc Briand first contacted me to write a column on C99, I suggested a rather obvious approach: Write a first installment [1] that gave an overview of the entire C99 Standard, and then follow with additional installments, each devoted to a single C99 feature. Even as I suggested this, in the back of my mind I worried about // comments. That feature was so simple (and widespread) that I really did not see any need to mention the feature again after the overview installment. At most, I might mention it in passing in a column on another feature.

Now that this series has entered its third year, it occurs to me that I failed to appreciate the humble // comment. Although the feature itself is simple, it touches upon several interesting topics:

  • The order in which a compiler translates a program.
  • The relationship between extensions and standards.
  • The compatibility between different versions of the C Standard.
Thus, this unlikely feature gets a column of its own, after all. There is even a brain teaser for the reader who likes a challenge.

// Comments

The // comment came to C from C++. The idea of end-of-line comments existed in many languages, but Stroustrup [2] lists the programming language BCPL as the specific inspiration for // comments in C++. As C++ became popular in the early 90s, many C compilers included // comments as an extension. I remember former CUJ Senior Editor P.J. Plauger once remarked to me (a year or two before // comments were proposed for C99) that so many C articles submitted to CUJ contained // comments that most programmers must not realize that // comments were not Standard C.

The // comment feature could not be simpler: Two adjacent / characters, except in three special contexts, begin a comment that includes all of the characters up to, but not including, the newline ending the line. For example, the following line is an assignment with an insipid comment attached:

x = 0; // clear x
The three contexts where // comments are not recognized are the same as for traditional /* */ comments: character constants, string literals, and comments. Thus, '//' and "//" are, respectively, a two-character character constant and a two-character string literal. The lack of nesting of comments means that the following three lines are all assignment statements:

/* assign to x // */ x = 1;
y = 2; // second assignment /*
z = 3; // don't need a */
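
As a small check of these rules, the sketch below wraps the three assignments in a function and measures a string literal containing two slashes. It assumes a compiler that accepts // comments (C99 or later, or a C90 compiler with the extension enabled); the function names three_assignments and slashes_in_literal are made up for this illustration.

```c
#include <string.h>

/* Returns 6 when all three assignments below are compiled as code. */
int three_assignments(void)
{
    int x = 0, y = 0, z = 0;

    /* assign to x // */ x = 1;
    y = 2; // second assignment /*
    z = 3; // don't need a */
    return x + y + z;
}

/* "//" inside a string literal is two ordinary characters, not a comment. */
size_t slashes_in_literal(void)
{
    return strlen("//");
}
```
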
The // comments have an advantage over traditional /* */ comments when "commenting out" a block of code. Consider:

// /* sum the array a into the int
//  * variable sum. */
// sum = 0;  // clear sum
// for (i = 0; i < 10; ++i) {
//   /* do the work */
//   sum += a[i];
// }
While // comments handle the aforementioned job cleanly, traditional comments would be awkward. Since comments do not nest, the */ ending the preexisting comments would also terminate a traditional comment attempting to comment out the block. While the preprocessor can easily comment out the block if you add #if 0 before the block and #endif after it, the preprocessor approach provides little visual clue that the block of code has been commented out. This might cause confusion in the future. I sometimes fear that I might comment out a block of code temporarily during development and forget to restore it before checking the code back in. // comments can help prevent such errors.
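
For comparison, here is a minimal sketch of the preprocessor approach. The function sum_array is hypothetical and only loosely follows the loop above; the point is that the disabled lines look like live code unless the reader notices the #if 0 and #endif that surround them.

```c
/* Sum the first n elements of a. The lines between #if 0 and #endif are
   skipped by the preprocessor, even though they contain their own comment. */
int sum_array(const int *a, int n)
{
    int sum = 0;  /* clear sum */
    int i;

    for (i = 0; i < n; ++i) {
#if 0
        /* do the work a second time (disabled) */
        sum += a[i];
#endif
        sum += a[i];
    }
    return sum;
}
```
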

The Phases of Translation

Here is a question for you: // comments cause all of the following characters up to newline to be ignored. On the other hand, any line can be continued by putting a \ immediately before the end of the line. Can // comments be continued like any other line, meaning that a \ before a newline is not ignored? Or can // comments not be continued, meaning that \ before a newline does not always mean continuation?

The C and C++ Standards resolve a great many ambiguities like the above by giving an order in which the language rules apply, called the "phases of translation."

A simplified version of the phases of translation is:

1. Read the source file, and if it is in a different character set, translate it into the character set used by the compiler.

2. Each sequence of \ followed by a newline in the original source is deleted, thus joining the current line with the next.

3. The source is divided into tokens (as specified by the grammar for preprocessing tokens) and sequences of whitespace. Each comment is replaced by a single space character.

4. Preprocessing directives are executed and macros are expanded. Source lines read from #include files undergo a recursive translation of phases 1 through 4. The preprocessing directives are then deleted.

5. The characters and escape sequences (such as \n) in character constants and string literals are converted to the characters they represent in the character set of the translated program. (The compiler and translated program might use different character sets.)

6. Adjacent string literals are concatenated.

7. The source is syntactically and semantically analyzed and translated (usually to an object file).

8. The parts of the program are linked to form a ready-to-run program.

I have left out some of the rules for obscure features, some error conditions, and some provisions for unusual source representations. For all the details, see [3,4]. C++ differs from C in that, after phase 7, a new phase performs template instantiation before the last phase that links the program. Note that the Standards do not require implementations to follow the phases as long as the implementations get the same results as if they did. An interpreter that translates no source file before it is needed is acceptable.
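
The ordering of phases 2, 3, 4, and 6 can be observed directly. The following is only a sketch; the names GREETING, joined_literals, and spliced_identifier are invented for the illustration.

```c
#include <string.h>

/* Phase 2: each backslash-newline is deleted, so the directive below is
   one logical line by the time directives are recognized in phase 4. */
#define GREETING "phase " \
                 "two"

/* Phase 4: the # operator stringizes its argument during expansion. */
#define STR(s) #s

/* Phase 6: the adjacent string literals are concatenated into one. */
const char *joined_literals(void)
{
    return GREETING " and " STR(six);
}

/* Phase 2 runs before tokens are formed in phase 3, so a line splice may
   even fall in the middle of an identifier. The continuation must start
   in the first column, or the whitespace would split the token. */
int spliced_identifier(void)
{
    int value = 4;
    return val\
ue;
}
```

Here joined_literals() yields the single string "phase two and six", and spliced_identifier() returns 4 because val and ue are joined before the compiler ever looks for tokens.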

The phases of translation answer the question posed at the beginning of this section. Since line continuation is handled in phase 2 and comments are not recognized until phase 3, // comments can be continued like any other line. Thus the following program fragment represents two valid assignments with a continued comment in between:

x = 0; // clear x and \
also clear y
y = 0;
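
The same rule has a less friendly side: a stray \ at the end of a // comment silently absorbs the next line of code. A minimal sketch, assuming a compiler with // comments enabled (the function name continued_comment is hypothetical):

```c
/* Returns 1: the assignment of 2 is swallowed by the continued comment. */
int continued_comment(void)
{
    int x = 1;
    // the backslash splices the next line into this comment \
    x = 2;
    return x;
}
```

Some compilers can warn about a backslash-newline inside a // comment, which is worth enabling if the option is available.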

Standards and Extensions

The C Standard has sometimes been called a "treaty" between C implementations and C programmers. The Standard spells out the requirements on implementations and informs programmers of the rules for using them. Implementations agree to provide arrays. Programmers agree not to index outside of them. If both sides follow the treaty, programs are portable.

The approach that the C Standard took in writing the treaty was to define the perfectly portable program, which the Standard calls a "strictly conforming program." Strictly conforming programs can only use the features of the Standard; they must not use any extensions. To quote the Standard, a strictly conforming program "shall not produce output dependent on any unspecified, undefined, or implementation-defined behavior, and shall not exceed any minimum implementation limit." The terms "unspecified behavior," "undefined behavior," "implementation-defined behavior," and "minimum implementation limit" are all technical terms defined in the Standard. A strictly conforming program will produce the same output under any Standard C implementation whether it is running on a massively parallel supercomputer or a VCR (assuming a Standard C implementation for the VCR).

Frequently, when programmers hear about strictly conforming programs, they resolve to write only strictly conforming programs. This is a noble, but somewhat naive, goal since the requirements are somewhat draconian. For example, the minimum implementation limit on ints means that programmers must not assume that they hold more than 16 bits worth of information. The minimum character set identifies fewer than 100 portable characters (no $, for example). The minimum limit on size of objects requires each data object be smaller than 64K bytes. Programs can be good, useful, and widely portable without meeting all of the requirements on strictly conforming programs.
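
To make the limit on ints concrete, here is a sketch of the kind of care it demands. The function milliseconds_from_seconds is a made-up example; the only facts it relies on are that <limits.h> guarantees INT_MAX is at least 32767 and LONG_MAX is at least 2147483647.

```c
#include <limits.h>

#if INT_MAX < 32767
#error "int is narrower than the Standard's minimum of 16 bits"
#endif

/* On a 16-bit int, seconds * 1000 could overflow for seconds > 32.
   Widening to long first keeps the product in range for every int
   value the Standard guarantees (up to 32767 * 1000 = 32767000). */
long milliseconds_from_seconds(int seconds)
{
    return (long)seconds * 1000L;
}
```
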

A Standard C implementation is permitted to have extensions, but those extensions must not cause a strictly conforming program to fail. For example, adding a new keyword named bigint as an extension might cause a strictly conforming program to fail because the program uses bigint as a variable name. To permit implementations some leeway in adding new keywords, predefined macros, extern variables, and such, the Standard reserves all identifiers that begin with "_" followed by an uppercase letter or another underscore. Thus, an implementation can add a keyword _Bigint or __bigint, since strictly conforming programs are not free to use such identifiers for their own purposes.

One (perhaps surprising) way to look at language extensions is that they take a program that is erroneous under the rules of the Standard and turn it into a program acceptable to the implementation. For example,

_Bigint x = 0;
is a syntax error as long as the program has not defined _Bigint as a macro or typedef, and since no strictly conforming program can do so, any attempt to write a strictly conforming program with such a line is an error. Thus, the implementation is free to make an extension (_Bigint is a new type) that makes the program acceptable to the implementation.
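
_Bigint is an invented example, but C99 itself drew on the same reserved namespace: its Boolean type is spelled _Bool, and in C99 the friendlier name bool is only a macro supplied by <stdbool.h>, so existing programs that defined their own bool kept working. A minimal sketch, assuming a C99 or later compiler (the function name truthiness is hypothetical):

```c
/* _Bool is a keyword in C99 and later. Because it begins with an
   underscore followed by an uppercase letter, no strictly conforming
   C90 program could have used the name for its own purposes. */
int truthiness(int value)
{
    _Bool flag = value;  /* any nonzero value converts to 1 */
    return flag;
}
```
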

Implementers like extensions that look like errors from the point of view of the Standard. It means the implementation can offer such an extension without failing the requirement to handle every strictly conforming program.

That is one of the reasons why C implementers were so comfortable offering // comments as an extension almost 10 years before the C99 Standard officially permitted (really required) them. They believed it was "obvious" that any // comment is a syntax error in the original C Standard (C90).

The only problem with that analysis is that it was wrong, although // comments were pretty widespread before anyone pointed it out.

// Comments as an Extension

Consider:

#include <stdio.h>
#define STR(s) #s
int main()
{
  printf("%s\n", STR(//));
  return 0;
}
The program is a strictly conforming program under the C90 Standard. It uses the unary # "stringize" operator in the STR macro to turn the macro argument into a quoted string and write it out. If the compiler does not support // comments, the program writes out //. If you compile the program with a compiler that does support // comments, you get a syntax error because the comment is recognized in translation phase 3 and thus eliminates the close parenthesis needed by the macro expansion in translation phase 4. Note that if comment processing occurred after phase 4, the program would produce the same results regardless of whether // comments were supported or not. But, such a change in translation phases would cause other programs to produce different results.

Strictly speaking, a Standard C90 compiler was not free to offer // comments, since // comments broke at least one strictly conforming program. Compilers were permitted to offer such an extension as long as there was some way to turn off // comments.

Perhaps the most important obligation that any standards committee has is to attempt to preserve the community's investment in the standard. For a programming language standard, that means avoiding any change that breaks existing programs, unless the popularity or utility of the change outweighs the cost of repairing the programs affected.

Changes come in two forms. The first is the noisy change. Such a change causes programs to fail to compile or to fail to link, and it is therefore obvious that the program has the problem and where and how to correct it. Standards committees dislike making noisy changes, and they usually even try to warn in a previous standard that a future standard might make such a change.

As bad as the noisy change is, the quiet change is far worse. A quiet change causes the program to just change its meaning. Without any error message when compiling or linking the program, the program's behavior just changes, and it starts to produce different results. Standards committees hate to make quiet changes.

Surprisingly, // comments are a quiet change. It is possible to write programs that compile and run but write different results depending on whether // comments are supported. Here is the promised brain teaser. Write a program that shows the quiet change. For extra credit, write a second program that uses a different technique to show that // comments are a quiet change. Make sure you know how to enable and disable support for // comments in the compiler that you are using. Some compilers require a pedantic, standard, or C90 mode be used to disable // comments. If you want to try the brain teaser, put down the column now.

Before discussing my two examples of the quiet change, I will discuss why the C99 Committee decided to add // comments despite the fact that // comments are a quiet change. By the time the C99 Committee considered the issue (1994), the comments were already supported by many compilers. As I have said before, C99 largely standardized features already supported by one compiler or another, and so even if your C compiler is not C99 conforming, it probably has some C99 features already in it. Also, the programs that show the quiet change look artificial, and it seems unlikely that anyone wrote such code unless attempting to show the quiet change. Given the years of lack of problems with // comments in the real world, the C Committee did not think that // comments were a real issue.

Listings 1 and 2 show the two ways the // comment quiet change can manifest itself. Listing 1 is just a variation of the previous program that showed that // comments could break a strictly conforming program. The two close parentheses are moved to the next line so that the program does not get a syntax error. The letter a before the // is there to meet a requirement of C90 that you cannot have an empty set of tokens as a macro argument. Listing 1 prints a// if // comments are not supported or a if they are supported.
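
The listing itself is not reproduced in this text, so the following is only a reconstruction from the description above, written as a function rather than a complete program. The name stringized_probe is hypothetical; STR is the macro from the earlier program.

```c
#define STR(s) #s

/* Yields "a" when // starts a comment, "a//" when it does not.
   The close parenthesis sits on the next line so that it survives
   even if the rest of the first line is treated as a comment. */
const char *stringized_probe(void)
{
    return STR(a//
    );
}
```

Passing the result to printf with a "%s\n" format would write the a or a// that the column describes, depending on the compiler mode in use.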

Listing 2 uses a different trick. C compilers recognize tokens in a left-to-right pass that is "greedy" (which means that the compiler tries to use as many characters as possible to make a longer token rather than make a smaller token). The greedy rule is why ++ is recognized as a single operator instead of two + operators. So, in Listing 2, if // comments are supported, the //* sequence is recognized as the start of a // comment that consumes the rest of the line. If // comments are not supported, //* is a divide operator followed by a traditional comment. The program prints either 1 or 0.
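
The second listing is also missing from this text. One way to build a program with the described behavior is sketched below; it is an assumption about the shape of the listing, not a copy of it, and the name greedy_probe is made up.

```c
/* Yields 1 when // starts a comment (2 - 1), and 0 when it does not,
   because the expression is then read as (2 / 2) - 1. */
int greedy_probe(void)
{
    return 2 //* */ 2
           - 1;
}
```

The expression has to continue on the following line; otherwise the version with // comments would be left with an incomplete statement and the change would be noisy rather than quiet.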

Last Thoughts

As you have read this, I hope you have gained some insights into compatibility and the evolution of standards. Although compatibility from one version of a standard to the next is always a goal, the committee also must weigh progress against the costs of compatibility. Sometimes those costs are very small (as with // comments) and justify an incompatibility.

Perhaps the greatest feature of C99 is its compatibility with the previous Standard, which was a great concern of the committee while developing the Standard. In all likelihood, if your C90 compiler were replaced with a C99 compiler, you would encounter no problems with any of your programs.

[Co-columnist's note: When Randy and I started collaborating on these columns, this article on two-slash comments had already mostly written itself in his head. I agree with the details and the point of view, but it's all his. In the next issue of the column, the collaboration will resume. -- Thomas Plum]

References

[1] Randy Meyers. "The New C," C/C++ Users Journal, October 2000.

[2] Bjarne Stroustrup. The C++ Programming Language, third edition (Addison Wesley, 1997), Page 10.

[3] ANSI/ISO/IEC 9899:1999, Programming Languages -- C, Subclause 5.1.1.2. 1999. Available in Adobe PDF format for $18 from <http://www.techstreet.com/ncitsgate.html>.

[4] ANSI/ISO/IEC 14882:1998, Programming Languages -- C++, Subclause 2.1. 1998. Available in Adobe PDF format for $18 from <http://www.techstreet.com/ncitsgate.html>.

About the Author

Randy Meyers is a consultant providing training and mentoring in C, C++, and Java. He is the current chair of J11, the ANSI C committee, and previously was a member of J16 (ANSI C++) and the ISO Java Study Group. He worked on compilers for Digital Equipment Corporation for 16 years and was Project Architect for DEC C and C++. He can be reached at [email protected].

