Practical Parsing for ANSI C

Source code accompanies this article.

A Lazy Trick

Tags are optional names you can associate with a struct or union definition in order to refer to that definition later. In this example, tag_employee is a tag:

struct tag_employee {
   char     * name;
   unsigned   age;
};
/* ... */
struct tag_employee my_employee;

The peculiar thing with tags is that they live in a separate namespace from declared variables and types. Consequently, it is perfectly legal (although it may be very stupid) to use the same name both as a struct tag and as a user type name:

 1:  typedef unsigned long int ulint_t;
 2:  struct ulint_t {
 3:     char a;
 4:     float b;
 5:  } a;
 6:  void function()
 7:  {
 8:     ulint_t         my_var;
 9:     struct ulint_t  my_struct;
10:  }

Unfortunately, our current front-end fails to parse this code. Once the typedef on line 1 has been seen, the lexer tokenizes every later ulint_t as TYPE_NAME, including the ones on lines 2 and 9, where an IDENTIFIER is expected after the struct or union keyword, as the syntax rules from the C grammar show:

struct_or_union_specifier
	: struct_or_union IDENTIFIER
	| struct_or_union IDENTIFIER '{' struct_declaration_list '}'
	| struct_or_union            '{' struct_declaration_list '}'
	;
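To see why the lexer gets this wrong, it helps to look at how a typical identifier check works. The following is only a minimal sketch, not the article's actual lexer: the table, the token values, and the names add_typedef and check_type are assumptions made for illustration. The point is that the lookup depends only on the spelling of the name, never on the token that came before it.

```c
#include <assert.h>
#include <string.h>

/* Token codes; the numeric values are placeholders for this sketch. */
enum token { IDENTIFIER = 258, TYPE_NAME = 259 };

#define MAX_TYPEDEFS 64

/* Hypothetical typedef table, filled in as typedef declarations are parsed. */
static const char *typedef_names[MAX_TYPEDEFS];
static int typedef_count = 0;

/* Record a name introduced by a typedef declaration. */
void add_typedef(const char *name)
{
    assert(name != NULL && typedef_count < MAX_TYPEDEFS);
    typedef_names[typedef_count++] = name;
}

/* Classify a name by table lookup only. The function has no idea whether
   the previous token was 'struct' or 'union', so a tag spelled like a
   typedef name is reported as TYPE_NAME. */
enum token check_type(const char *name)
{
    int i;

    assert(name != NULL);
    for (i = 0; i < typedef_count; i++)
        if (strcmp(typedef_names[i], name) == 0)
            return TYPE_NAME;
    return IDENTIFIER;
}
```

With a lexer of this shape, calling add_typedef("ulint_t") after line 1 of the example makes every later check_type("ulint_t") return TYPE_NAME, which is exactly the token the struct rules above cannot accept.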

To solve this, I use a simple (though not elegant) workaround. Rather than making the lexer capable of distinguishing tags from identifiers and type names, I modify the grammar so that it no longer matters whether a tag is tokenized as IDENTIFIER or TYPE_NAME. For each alternative in the previous rules that involves IDENTIFIER, I add a copy that uses TYPE_NAME instead. The result is:

struct_or_union_specifier
	: struct_or_union IDENTIFIER
	| struct_or_union IDENTIFIER '{' struct_declaration_list '}'
	| struct_or_union TYPE_NAME
	| struct_or_union TYPE_NAME  '{' struct_declaration_list '}'
	| struct_or_union            '{' struct_declaration_list '}'
	;

The solution is simple and effective. It does not cause any ambiguity in the grammar, and it addresses all the cases where a tag can be used, including cast expressions such as pq = (struct tag_q *) ps;.
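The effect of the extended rule can be illustrated outside yacc with a small hand-written matcher. This is a sketch under stated assumptions, not the generated parser: the token enumeration, is_tag, and match_struct_specifier are hypothetical names, and the member list is skipped up to the matching brace rather than parsed as a struct_declaration_list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes for this sketch. */
enum tok {
    TOK_END, TOK_STRUCT, TOK_UNION, TOK_IDENTIFIER,
    TOK_TYPE_NAME, TOK_LBRACE, TOK_RBRACE, TOK_OTHER
};

/* The one change from the original rule: a tag may arrive as either token. */
int is_tag(enum tok t)
{
    return t == TOK_IDENTIFIER || t == TOK_TYPE_NAME;
}

/* Match a struct/union specifier at toks[0]. Returns the number of tokens
   consumed, or 0 if there is no match. The array must end with TOK_END.
   The braces are only balanced here; their content is not validated. */
size_t match_struct_specifier(const enum tok *toks)
{
    size_t i = 0;
    int has_tag;
    int depth = 0;

    assert(toks != NULL);
    if (toks[i] != TOK_STRUCT && toks[i] != TOK_UNION)
        return 0;
    i++;

    has_tag = is_tag(toks[i]);
    if (has_tag)
        i++;

    if (toks[i] == TOK_LBRACE) {
        do {
            if (toks[i] == TOK_END)
                return 0;              /* unterminated member list */
            if (toks[i] == TOK_LBRACE)
                depth++;
            if (toks[i] == TOK_RBRACE)
                depth--;
            i++;
        } while (depth > 0);
        return i;
    }

    /* Without a member list, a tag is mandatory. */
    return has_tag ? i : 0;
}
```

Given the token stream for struct ulint_t my_struct; from line 9 of the example (TOK_STRUCT, TOK_TYPE_NAME, TOK_IDENTIFIER, ...), the matcher consumes the first two tokens, so the tag is accepted even though the lexer classified it as a type name.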
