Dr. Dobb's | C++ Theory and Practice

C++ Theory and Practice

October 01, 1996
URL:http://www.drdobbs.com/c-theory-and-practice/184403243

October 1996/C++ Theory and Practice

C++ syntax rules won't always catch invalid declarations, such as pointers to references. But we can still detect them by applying the appropriate semantic constraints.

Copyright © 1996 by Dan Saks

C++ uses declarators in many places. In some contexts, such as simple declarations or member declarations, each declarator must have a declarator-id. That is, each declarator must actually declare a name. In other contexts, such as cast expressions or template argument lists, a declarator must not have a declarator-id. In such cases, a declarator is not declaring a name, but rather specifying part of a type.

Last month, I focused on the problem of parsing a declarator in a context where the declarator-id is optional, such as in parameter declarations or exception declarations. For example, in the function declaration


int atexit(void (*func)());

func is the declarator-id of the declarator in the parameter declaration. Here, the declarator-id is optional. You can also write the function declaration as


int atexit(void (*)());

to refer to the same atexit function.

By the way, this function declaration actually has two declarators, one inside the other. (*)() is the declarator in the parameter declaration void (*)(). That parameter declaration in turn forms the bulk of the enclosing declarator atexit(void (*)()). In this outer declarator, the declarator-id atexit is not optional.

The grammar in the draft C++ standard refers to a declarator that must have a declarator-id as a plain ordinary declarator. It refers to a declarator that cannot have a declarator-id as an abstract-declarator. (These terms came from Kernighan and Ritchie's original description of C [1], and also appear in the C Standard.) The grammar uses one set of productions for declarator, and a nearly identical, but separate, set for abstract-declarator. In those places where a declarator-id is optional, the grammar presents an explicit choice between declarator and abstract-declarator, as in


parameter-declaration =
    decl-specifier-seq
( declarator | abstract-declarator ) .

Last month, I explained why using separate productions for declarators and abstract declarators leads to difficult parsing problems that require elaborate lookahead. I believe I eliminated those problems by merging the productions into a single set for all declarators. I also introduced new semantic rules so that we could still distinguish those declarators with a declarator-id from those without.

Table 1 shows the grammar for a simple-declaration with one set of productions for declarator. (As always, the grammar uses the EBNF notation, summarized in Table 2. ) In Table 1, the non-terminal declarator represents a declarator that may or may not have a declarator-id. For those contexts where the declarator must have a declarator-id, the grammar uses the non-terminal concrete-declarator defined by the production


concrete-declarator =
    declarator.

This production does not by itself convey the fact that a concrete declarator must have a declarator-id. You need a separate semantic statement for that. However, as I noted last time, there is ample precedent in C and C++ for using semantic rules to disambiguate the syntax. Such statements do not appear in the grammar, but rather in text (such as this) that accompanies the grammar.

The grammar in Table 1 uses plain declarator (which may or may not have a declarator-id) in the production for parameter-declaration. It uses concrete-declarator (which must have a declarator-id) in the production for declarator-list, which, in turn, appears directly in simple-declaration. A complete grammar for C++ also requires a production for


abstract-declarator =
    declarator.

along with a rule explaining that an abstract-declarator must not have a declarator-id. However, simple-declaration does not need this production, so it does not appear in Table 1.

By the way, I've been grappling with whether concrete and abstract are the right names for the two different flavors of declarator. Concrete is indeed an antonym for abstract. So given that both C and C++ already use the term abstract-declarator as I have, I suppose concrete-declarator is an appropriate term. On the other hand, I can't help but think that named-declarator and unnamed-declarator might be better terminology. I'm reluctant to invent too much new terminology, so for now I'll stick with concrete and abstract. But I'll keep thinking about it.

Parsing Declarators

Now, at last, it's time to put this theory into practice by applying it to my decl program, which translates C++ declarations into English. The last time I presented the parser in its entirety (in "The Column That Needs a Name: Recovering from Parsing Errors," CUJ, April 1996), it recognized function declarators, but only with empty parameter lists. The source code for a new decl parser appears in Listing 1. This program now recognizes non-empty parameter lists in function declarators. Moreover, the parameter declarations may use abstract declarators as well as concrete declarators.

As suggested in the grammar, there is a single parsing function called declarator that recognizes both concrete declarators and abstract declarators. Although a parameter declaration accepts either, a simple declaration accepts only concrete declarators. Therefore, the declarator function now has a parameter that indicates if it should only accept one kind of declarator or the other or both. That parameter has an enumeration type:


enum declarator_category
    { ABSTRACT, CONCRETE, EITHER };

defined as a member of the parser class.

The declarator function doesn't really do anything with its declarator_category parameter except pass it to direct_declarator, which recognizes the presence or absence of a declarator-id. direct_declarator doesn't let the declarator_category value influence the way it parses, only what it finally accepts. That is, when direct_declarator encounters a declarator-id (an identifier), it simply checks that the declarator is not supposed to be abstract before proceeding, using statements of the form:


if (tc == token::IDENTIFIER)
    {
    if (dc == ABSTRACT)
        error("tsk, tsk");
    // continue parsing concrete declarator
    }
else if (tc == token::LEFT_PAREN)
    ...

Similarly, when direct_declarator determines that the declarator has no declarator-id, it checks that the declarator is not supposed to be concrete before proceeding, using statements of the form:

if (dd == "")
    {
    if (dc == CONCRETE)
        error("tsk, tsk");
    // continue parsing abstract declarator
    }

Neatness Counts

As I explained earlier, parameter lists introduce the notion of declarators nested within declarators. Indenting a nested construct with respect to its enclosing construct typically yields more readable output. Therefore, decl displays the output for a parameter list indented with respect to the output of its function name and return type.

For example,


int atexit(void (*)());

translates into


atexit is ...
function with parameter(s) ...
    <unnamed> is ...
    pointer to ...
    function with no parameters ...
    returning ...
    void
returning ...
int

Notice also that <unnamed> takes the place of the parameter name for an unnamed parameter (from an abstract declarator).

The parser implements the indenting using a private data member:


string indent;

whose value is simply a sequence of zero or more spaces. (You can easily change the indent string to a single tab character if you prefer.) Various parsing functions prefix their result string with indent.

indent is initially the empty string. The parameter_list parsing function increases the indent level using


indent += tab_stop;

where += is a string concatenation operator and tab_stop is a constant string of spaces. parameter_list decreases the indent level as it leaves using


indent = string
    (indent, 0, indent.length() - tab_stop.length());

The string constructor string(s, p, n) constructs a string by copying n character from s starting at position p. (Positions start at zero.)

Checking Semantic Constraints

As with the grammar in the draft C++ Standard, the grammar in Table 1 allows declarations that, while syntactically correct, are semantically incorrect. For instance, the production


direct-declarator =
    [ declarator-id | "(" declarator ")" ]
        { array-suffix | function-suffix } .

suggests that a declarator can have any number of array and function suffixes in any order. That is, the grammar allows all of the following declarators:


x()()   - function returning function
x()[N]  - function returning array
x[N]()  - array of function
x[N][N] - array of array

when, in fact, the semantic rules of C++ allow only the last.

For example, the decl parser ( Listing 1) accepts

int f(int)();

and reports that


f is ...
function with parameter(s) ...
    <unnamed> is ...
    int
returning ...
function with no parameters ...
returning ...
int

However, C++ does not actually accept a function that returns a function. This is different from a function that returns a pointer to a function, such as


int (*f(int))();

which C++ certainly does allow.

C++ semantics also severely constrains declarators involving references. Although the grammar allows


&&r    - reference to reference
&*r    - pointer to reference
&T::*r - pointer to member to
         reference
&r[N]  - array of reference

none actually make it past a C++ compiler's semantic checks.

It's tempting to try to express these constraints in the grammar. The pattern appears to be that:

if a declarator has a function-suffix, it cannot have an array-suffix or another function-suffix.
if a declarator has an array-suffix, it can have another array-suffix (representing a multi-dimensional array), but it cannot have a function-suffix.

Were it not for grouping parentheses in declarators, rewriting the production for direct-declarator as


direct-declarator =
    [ declarator-id |
      "(" declarator ")" ]
    [ { array-suffix } |
      function-suffix ] .

would do the trick of allowing exactly those patterns for array and function suffixes that are also semantically correct. However, grouping parentheses render this production ineffective.

For example, the declarator (x())() satisfies the production just above, even though it specifies a function returning a function. The problem is that production prohibits consecutive function suffixes only at the same parenthetical level. Though it may be possible to devise a grammar to express the constraints properly, I think it's much easier and clearer to implement the constraints as semantic checks.

The CUJ online sources and code disk (see p. 3) include an implemention of the decl parser that includes additional checks to catch the semantic violations described above.

My implementation for the semantic checks maps each declarator operator into a type_category value, where type_category is a private member of the parser class defined as:


enum type_category
    {
    SIMPLE, ARRAY, FUNCTION,
    POINTER, REFERENCE
    };

The declarator and direct_declarator functions return a type_category value through a reference parameter. That value describes the type category of the outermost declarator operator just parsed. (The outermost operator is the last one bound to the declarator.) The outermost type category of a declarator with no operators is SIMPLE. The type categories ignore cv-qualifiers and regard a pointer to member as just POINTER.

For example, the outermost type category for the declarator f(int) is FUNCTION. For *f(int), the category is POINTER (because the * has lower precedence than the parameter list). For (*const f)(int), the outermost type category is once again FUNCTION.

When declarator recognizes another operator (by calling ptr_operator), it simply checks the type category of that operator against the outermost type category of the rest of the declarator. If the type categories conflict, declarator reports an error. direct_declarator has similar logic.

Compiling the Code

As in the past, the entire decl program consists of two source files (decl.cpp and scanner.cpp) and one header file (scanner.h). decl.cpp is as shown in Listing 1 with the changes for semantic checks described above. The scanner files are largely unchanged since I last presented them a few months ago. (See "C++ Theory and Practice: Abstract Declarators, Part 2," CUJ, July 1996). You can obtain the complete source code for decl (including the scanner) on CUJ's ftp site and code disk. See page 3 for ftp information.

I recently upgraded several of my compilers. I can now compile the entire program using Borland C++ 5.0, Microsoft Visual C++ 4.1, and Watcom C++ 10.6 (all under Windows 95). Both the Borland and Microsoft compilers come with versions of STL (the Standard Template Library that's part of the draft C++ standard). The Watcom compiler does not include the STL, but I found a copy specifically tailored to the Watcom compiler in Hewlett-Packard's ftp site at butler.hpl.hp.com, in the file stl/wat_stl.zip. The documentation for this version says it was developed for Watcom C++ 10.0, but it seems to work just fine when using version 10.6.

The Borland STL implementation (licensed from Rogue Wave Software) places the library components in namespace std. The C++ namespace feature is currently the sharpest point on the bleeding edge -- not all compilers support it, and there's still considerable variation among those that do. I had to add the using-directive

using namespace std;

to scanner.h, immediately after #include directives for the STL headers to use Borland's library. Since not all compilers support namespaces, I wrapped the using-directive inside a preprocessor conditional:


#ifdef NAMESPACE
using namespace std;
#endif

Thus, I could build the program using the command line


bcc32 -DNAMESPACE decl.cpp
    scanner.cpp

For some reason, Borland's 16-bit compiler, bcc, gags on the program.

The Microsoft compiler (version 4.1) supports namespaces, but the STL that comes with it does not place the components in namespace std. However, one of the compiler's README files explains how to tweak the headers to place the components in namespace std. I made the changes, and compiled decl with both the original headers, and again with the modified headers.

References

[1] Brian Kernighan and Dennis Ritchie. The C Programming Language (Prentice Hall, 1978).

Dan Saks is the president of Saks &Associates, which offers consulting and training in C++ and C. He is secretary of the ANSI and ISO C++ committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield OH, 45504-4906, by phone at (513)324-3601 (the area code changes to 937 after September 1996), or electronically at [email protected].

October 1996/C++ Theory and Practice/Listing 1

Listing 1: A declaration parser that also recognizes non-empty parameter lists (with possibly unnamed parameters) in function declarations


//
// decl.cpp - translate C++ declarations into English
// with syntactic error detection and recovery
// using exception handling
//
// Copyright (C) 1996 by Dan Saks.
// May be copied for private, non-commercial use,
// provided this copyright notice remains intact.
// All other rights reserved.
//
#include <iostream>
#include "scanner.h"
class parser
    {
public:
    parser(istream &, ostream &);
private:
    enum declarator_category { ABSTRACT, CONCRETE, EITHER };
    struct recoverable_error { };
    scanner input;
    ostream &output;
    string indent;
    void error(const string &);
    void must_be(token::category);
    string array_suffix();
    string cv_qualifier_seq();
    string declarator(declarator_category);
    string decl_specifier_seq();
    string direct_declarator(declarator_category);
    string function_suffix();
    string parameter_declaration();
    string parameter_list();
    string ptr_operator();
    string simple_declaration();
    parser(const parser &);
    parser &operator=(const parser &);
    };
void parser::error(const string &why)
    {
    output << "error: " << why << '\n';
    input.reset();
    indent = "";
    throw recoverable_error();
    }
void parser::must_be(token::category tc)
    {
    if (input.current().kind() == tc)
        input.get();
    else
        error(string("\'") + image(tc) + "' expected");
    }
//
// array-suffix =
//     "[" [ constant-name | integer-literal ] "]" .
//
string parser::array_suffix()
    {
    must_be(token::LEFT_BRACKET);
    string as = "array with ";
    token::category tc = input.current().kind();
    if (tc == token::NAME || tc == token::INT_LITERAL)
        {
        as += input.current().text() + ' ';
        input.get();
        }
    else
        as += "unspecified number of ";
    as += "elements of type...\n";
    must_be(token::RIGHT_BRACKET);
    return indent + as;
    }
//
// cv-qualifier-seq =
//     { "const" | "volatile" } .
//
string parser::cv_qualifier_seq()
    {
    bool cq = false;
    bool vq = false;
    token::category tc;
    for (;;)
        {
        tc = input.current().kind();
        if (tc == token::CONST)

{

            if (cq)
                error("redundant 'const' qualifier");
            else
                cq = true;
            }
        else if (tc == token::VOLATILE)
            {
            if (vq)
                error("redundant 'volatile' qualifier");
            else
                vq = true;
            }
        else
            break;
        input.get();
        }
    string t;
    if (cq)
        t += "const ";
    if (vq)
        t += "volatile ";
    return t;
    }
//
// declarator =
//     direct-declarator | ptr-operator declarator .
//
string parser::declarator(declarator_category dc)
    {
    token::category tc = input.current().kind();
    if (tc == token::AMPERSAND || tc == token::STAR
    || tc == token::NAME)
        {
        string p = ptr_operator();
        return declarator(dc) + p;
        }
    else
        return direct_declarator(dc);
    }
//
// decl-specifier-seq =
//     {
//     "const" | "volatile" | type-keyword | type-name
//     } .
//
string parser::decl_specifier_seq()
    {
    bool cq = false;
    bool vq = false;
    string tn;
    token::category tc;
    for (;;)
        {
        tc = input.current().kind();
        if (tc == token::NAME)
            {
            input.mark();
            tc = input.get().kind();
            input.backup();
            if (tc == token::SCOPE)
                break;
            tc = input.current().kind();
            }
        if (tc == token::CONST)
            {
            if (!cq)
                cq = true;
            else
                error("redundant 'const' qualifier");
            }
        else if (tc == token::VOLATILE)
            {
            if (!vq)
                vq = true;
            else
                error("redundant 'volatile' qualifier");
            }
        else if (tc == token::TYPE_KEYWORD
        || tc == token::NAME)
            {
            if (tn == "")
                tn = input.current().text();
            else
                break;
            }
        else
            break;
        input.get();
        }
    if (tn == "")
        {
        if (!(cq | vq))
            error("no type specifier");
        tn = "int";

}

    string t;
    if (cq)
        t += "const ";
    if (vq)
        t += "volatile ";
    return t + tn;
    }
//
// direct-declarator =
//     [ declarator-id | "(" declarator ")" ]
//         { array-suffix | function-suffix } .
//
string parser::direct_declarator(declarator_category dc)
    {
    string dd;
    token::category tc = input.current().kind();
    if (tc == token::IDENTIFIER)
        {
        if (dc == ABSTRACT)
            error("declarator-id not allowed in abstract-declarator");
        dd = indent + input.current().text() + " is ...\n";
        input.get();
        }
    else if (tc == token::LEFT_PAREN)
        {
        bool its_grouping = false;
        input.mark();
        tc = input.get().kind();
        if (tc == token::IDENTIFIER
        || tc == token::AMPERSAND || tc == token::STAR
        || tc == token::LEFT_BRACKET || tc == token::LEFT_PAREN
        || (tc == token::NAME && input.get().kind() == token::SCOPE))
            its_grouping = true;
        input.backup();
        if (its_grouping)
            {
            input.get();
            dd = declarator(dc);
            must_be(token::RIGHT_PAREN);
            }
        }
    if (dd == "")
        {
        if (dc == CONCRETE)
            error("declarator-id missing or obscured"
                " by something before it");
        dd = indent + "<unnamed> is ...\n";
        }
    for (;;)
        {
        tc = input.current().kind();
        if (tc == token::LEFT_BRACKET)
            dd += array_suffix();
        else if (tc == token::LEFT_PAREN)
            dd += function_suffix();
        else
            break;
        }
    return dd;
    }
//
// function-suffix =
//     "(" [ parameter-list ] ")" cv-qualifier-seq .
//
string parser::function_suffix()
    {
    must_be(token::LEFT_PAREN);
    string fs = "function with ";
    token::category tc = input.current().kind();
    if (tc == token::NAME || tc == token::TYPE_KEYWORD
    || tc == token::CONST || tc == token::VOLATILE)
        fs += "parameter(s) ...\n" + parameter_list();
    else
        fs += "no parameters ...\n";
    must_be(token::RIGHT_PAREN);
    fs += indent + "returning ...\n";
    string cvs = cv_qualifier_seq();
    if (cvs != "")
        fs = cvs + "member " + fs;
    return indent + fs;
    }
//
// parameter-declaration =
//     decl-specifier-seq declarator .
//

string parser::parameter_declaration()

    {
    string dss = indent + decl_specifier_seq();
    string pd = declarator(EITHER) + dss + '\n';
    return pd;
    }
//
// parameter-list =
//     parameter-declaration { "," parameter-declaration } .
//
string parser::parameter_list()
    {
    const string tab_stop = "    ";
    indent += tab_stop;
    string pl = parameter_declaration();
    while (input.current().kind() == token::COMMA)
        {
        input.get();
        pl += parameter_declaration();
        }
    indent = string(indent, 0, indent.length() - tab_stop.length());
    return pl;
    }
//
// ptr-operator =
//    "&" | [ type-name "::" ] "*" cv-qualifier-seq .
//
string parser::ptr_operator()
    {
    token t = input.current();
    if (t.kind() == token::AMPERSAND)
        {
        input.get();
        return indent + "reference to ...\n";
        }
    else
        {
        string p = "pointer to ";
        if (t.kind() == token::NAME)
            {
            p += "member of " + t.text()
                + " with type ";
            input.get();
            must_be(token::SCOPE);
            }
        must_be(token::STAR);
        return indent + cv_qualifier_seq() + p + "...\n";
        }
    }
//
// simple-declaration =
//     decl-specifier-seq declarator { "," declarator } .
//
string parser::simple_declaration()
    {
    string d = decl_specifier_seq();
    string sd = declarator(CONCRETE) + d + '\n';
    while (input.current().kind() == token::COMMA)
        {
        input.get();
        sd += declarator(CONCRETE) + d + '\n';
        }
    return sd;
    }
//
// parser =
//     { simple-declaration ";" } .
//
parser::parser(istream &is, ostream &os)
    : input(is), output(os)
    {
    for (;;)
        try
            {
            while (input.get().kind() != token::NO_MORE)
                {
                string s = simple_declaration();
                if (input.current().kind() != token::SEMICOLON)
                    error("';' expected");
                else
                    output << s;
                }
            break;
            }
        catch (const recoverable_error &)
            {
            }
    }
int main()
    {
    parser declarations(cin, cout);
    return 0;

}

//End of File

October 1996/C++ Theory and Practice/Table 1

Table 1: A simplified grammar for C++ object and
function declarations

simple-declaration =
    decl-specifier-seq declarator-list .

decl-specifier-seq =
    { decl-specifier } .

decl-specifier =
    type-specifier .

type-specifier =
    cv-qualifier | simple-type-specifier .

simple-type-specifier =
    type-keyword | type-name .

type-name =
    name .

declarator-list =
    concrete-declarator { "," concrete-declarator } .

concrete-declarator =
    declarator .

declarator =
    direct-declarator |
    ptr-operator declarator .

direct-declarator =
    [ declarator-id | "(" declarator ")" ]
        { array-suffix | function-suffix } .

declarator-id =
    identifier .

array-suffix =
    "[" [ constant-expression ] "]" .

constant-expression =
    constant-name | integer-literal .

constant-name =
    name .

function-suffix =
    "(" [ parameter-list ] ")" cv-qualifier-seq .

parameter-list =
    parameter-declaration { "," parameter-declaration } .

parameter-declaration =
    decl-specifier-seq declarator .

ptr-operator =
    "&" | [ class-name "::" ] "*" cv-qualifier-seq .
class-name =
    name .

cv-qualifier-seq =
    { cv-qualifier } .

October 1996/C++ Theory and Practice/Table 2

Table 2: A Summary of EBNF operators


X = Y . means "an X is defined to be a Y"

X | Y   means "either an X or a Y"

[ X ]   means 0 or 1 occurrence(s) of an X"

{ X }   means "0 or more occurrences of an X"

also

( )     group sub-expressions

" "     enclose a terminal symbol