Practical Parsing for ANSI C

...And Trickier

Type stacks are used to process type specifiers, but type specifiers can appear both in declarations and in cast expressions. No conflict arises unless the two features are used together, as in:

1: typedef int * intp_t;
2: char pc;
3: void f() {
4: int * pi = (intp_t) & pc;
5: }

Here, line 4 contains a variable declaration with an initializer, and the initializer contains a cast expression. intp_t in line 4 must be tokenized as TYPE_NAME, but under the logic described so far it is tokenized as IDENTIFIER. The lexer should therefore suppress the "force to IDENTIFIER" behavior when it is in the middle of a variable initializer. Unfortunately, only the parser can tell whether we are in the middle of a variable initializer, and it can tell only after lexical analysis is complete. The lexical choice for intp_t would then depend on how a superset of the input text is parsed after lexical analysis: a circular dependency.
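The two situations the lexer has to tell apart can be placed side by side. The sketch below assumes that the "force to IDENTIFIER" rule exists to support redeclaring a typedef name as an ordinary variable in an inner scope; that rule is not restated in this section. The function names shadow, cast_in_initializer, and check_shadowing are hypothetical and serve only as an illustration.

```cpp
#include <cassert>

typedef int * intp_t;
char pc;

// Case 1: the type specifier "int" is complete and intp_t follows it as
// the name being declared, so the lexer must return IDENTIFIER.
int shadow()
{
    int intp_t = 7;   // legal: hides the typedef inside this block
    return intp_t;
}

// Case 2: "int" is also complete here, but intp_t appears inside a cast
// in the initializer, so the lexer must return TYPE_NAME.
int * cast_in_initializer()
{
    int * pi = (intp_t) & pc;
    return pi;
}

// Small check that the shadowing declaration really declares a variable.
void check_shadowing()
{
    assert(shadow() == 7);
}
```

In both functions the lexer reaches intp_t with a valid type already on the type stack, so the type stack alone cannot separate the two cases.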

To solve the issue, I exploit the fact that the type name in a cast expression always appears between parentheses. The lexer maintains a "parenthesis level"; that is, a counter of how many parentheses are currently open. This counter is incremented at each "(" and decremented at each ")". Type names are then not forced to IDENTIFIER when we are inside a pair of parentheses. The corresponding modification can be seen in Listing Three.

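int category()
{
  Block * blockp = Block::currentp;
  while (blockp) {
     if (blockp->symbol_table.name_is_defined(yytext)) {
        if (blockp->symbol_table.name_is_typedef(yytext)) {
           // paren_level: assumed name for the lexer's parenthesis-level counter
           if (Block::currentp->type_stack.is_valid_type() &&
               paren_level == 0)
              return IDENTIFIER;   // redeclaration outside any parentheses
           return TYPE_NAME;
        }
        return IDENTIFIER;
     }
     blockp = blockp->parentp;     // continue the search in the enclosing block
  }
  return IDENTIFIER;
}
Listing Three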
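
The parenthesis counter can also be tried in isolation. The following is a minimal sketch and not the article's implementation: LexerState, on_punctuator, type_is_valid, paren_level, and check_line_4 are hypothetical names, a flat set of typedef names stands in for the Block symbol tables, and a boolean stands in for the type stack's is_valid_type().

```cpp
#include <cassert>
#include <set>
#include <string>

// Token categories; the names mirror those used in the article.
enum Token { IDENTIFIER, TYPE_NAME };

// Hypothetical, simplified lexer state (not the article's Block class).
struct LexerState {
    std::set<std::string> typedef_names;  // names introduced by typedef
    bool type_is_valid = false;           // a complete type specifier was seen
    int paren_level = 0;                  // number of currently open "("

    // Update the counter for each parenthesis the lexer scans.
    void on_punctuator(char c) {
        if (c == '(') ++paren_level;
        else if (c == ')' && paren_level > 0) --paren_level;
    }

    // Classify a name: a typedef name is forced to IDENTIFIER only when a
    // valid type has been seen and no parenthesis is open.
    Token category(const std::string & name) const {
        if (typedef_names.count(name) == 0) return IDENTIFIER;
        if (type_is_valid && paren_level == 0) return IDENTIFIER;
        return TYPE_NAME;
    }
};

// Walks through line 4 of the example: int * pi = (intp_t) & pc;
void check_line_4() {
    LexerState lx;
    lx.typedef_names.insert("intp_t");
    lx.type_is_valid = true;                      // "int" has been scanned
    assert(lx.category("intp_t") == IDENTIFIER);  // outside parentheses
    lx.on_punctuator('(');
    assert(lx.category("intp_t") == TYPE_NAME);   // inside the cast
    lx.on_punctuator(')');
    assert(lx.paren_level == 0);                  // counter is balanced again
}
```

Because the counter does not distinguish casts from other parenthesized contexts, the rule as sketched applies to any typedef name that appears between parentheses.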
