A Lazy Trick
Tags are optional names you can associate with a struct or union definition in order to refer to that definition later. In this example, tag_employee is a tag:

    struct tag_employee {
        char *name;
        unsigned age;
    };
    /* ... */
    struct tag_employee my_employee;
The peculiar thing about tags is that they live in their own name space, separate from the one shared by variables, functions, and typedef names. Consequently, it is perfectly legal (although rarely a good idea) to use the same name both as a struct tag and as a user type name:
     1: typedef unsigned long int ulint_t;
     2: struct ulint_t {
     3:     char a;
     4:     float b;
     5: } a;
     6: void function()
     7: {
     8:     ulint_t my_var;
     9:     struct ulint_t my_struct;
    10: }
Unfortunately, our current front-end fails to parse this code. After the typedef on line 1, the lexer tokenizes every occurrence of ulint_t as TYPE_NAME. That is correct on line 8, but on lines 2 and 9 the name follows the struct keyword, where the grammar expects an IDENTIFIER, as the syntax rules from the C grammar show:
    struct_or_union_specifier
        : struct_or_union IDENTIFIER
        | struct_or_union IDENTIFIER '{' struct_declaration_list '}'
        | struct_or_union '{' struct_declaration_list '}'
        ;
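The mis-tokenization is easier to see with a sketch of how such a lexer might decide between the two token kinds. The code below assumes the common approach of looking each name up in a table of typedef names; the table, the token codes, and the functions register_typedef and classify are hypothetical and are not taken from the front-end discussed here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; a yacc-generated header would normally define them. */
enum token_kind { IDENTIFIER, TYPE_NAME };

#define MAX_TYPEDEFS 64

/* Hypothetical typedef table, filled in as typedef declarations are parsed. */
static const char *typedef_names[MAX_TYPEDEFS];
static size_t typedef_count = 0;

/* Record a name introduced by a typedef.
   Returns 0 on success, -1 if the table is full. */
int register_typedef(const char *name)
{
    assert(name != NULL);
    if (typedef_count == MAX_TYPEDEFS)
        return -1;
    typedef_names[typedef_count++] = name;
    return 0;
}

/* Classify a name by table lookup only.
   The preceding token (struct, union, ...) is never consulted,
   so a tag spelled like a typedef name comes back as TYPE_NAME. */
enum token_kind classify(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < typedef_count; i++)
        if (strcmp(typedef_names[i], name) == 0)
            return TYPE_NAME;
    return IDENTIFIER;
}
```

Under these assumptions, once register_typedef("ulint_t") has been called, classify("ulint_t") returns TYPE_NAME whether or not the name follows the struct keyword.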
To solve this, I use a simple (though not elegant) workaround. I do not teach the lexer to distinguish tags from identifiers and type names. Instead, I modify the grammar so that it no longer matters whether a tag is tokenized as IDENTIFIER or TYPE_NAME. To do so, for each alternative in the previous rule that involves IDENTIFIER, I add a copy that involves TYPE_NAME. The result is:
    struct_or_union_specifier
        : struct_or_union IDENTIFIER
        | struct_or_union IDENTIFIER '{' struct_declaration_list '}'
        | struct_or_union TYPE_NAME
        | struct_or_union TYPE_NAME '{' struct_declaration_list '}'
        | struct_or_union '{' struct_declaration_list '}'
        ;
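The effect of the duplicated alternatives can be mimicked with a small hand-written check. This is only an illustration of the idea, not the parser that yacc generates; the token codes and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes for this illustration only. */
enum token_kind { KW_STRUCT, KW_UNION, IDENTIFIER, TYPE_NAME, OTHER };

/* A tag may arrive as either token kind, mirroring the duplicated alternatives. */
int is_tag_token(enum token_kind t)
{
    return t == IDENTIFIER || t == TYPE_NAME;
}

/* Returns 1 if the first two tokens form "struct tag" or "union tag",
   0 otherwise. */
int starts_tagged_specifier(const enum token_kind *tokens, size_t count)
{
    assert(tokens != NULL);
    if (count < 2)
        return 0;
    if (tokens[0] != KW_STRUCT && tokens[0] != KW_UNION)
        return 0;
    return is_tag_token(tokens[1]);
}
```

The point of the design is that the decision is made from the position of the name (right after struct or union) rather than from how the lexer happened to classify it.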
The solution is simple and effective: it does not introduce any ambiguity in the grammar, and it covers every place where a struct or union tag can be used, including cast expressions such as pq = (struct tag_q *) ps;.
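As a quick check of the cast case, here is a small C fragment in which the tag and the typedef name share one spelling and the tag appears inside a cast. It reuses the ulint_t names from the earlier listing; the functions read_b and twice are made up for the illustration. A C compiler accepts it, so a front-end with the modified rule should parse it too.

```c
#include <assert.h>

typedef unsigned long int ulint_t;      /* ordinary identifier: a typedef name */
struct ulint_t { char a; float b; };    /* tag with the same spelling */

/* Reads field b through a cast that names the tag, not the typedef. */
float read_b(void *p)
{
    assert(p != 0);
    return ((struct ulint_t *) p)->b;
}

/* Uses the typedef name in the same file, to show the two do not clash. */
ulint_t twice(ulint_t n)
{
    return 2 * n;
}
```

Inside the cast, ulint_t follows the struct keyword and is therefore a tag; in the signature of twice it stands alone and is the typedef name.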