C/C++

The Spirit Parser Library: Inline Parsing in C++

By Joel de Guzman and Dan Nuffer, September 01, 2003

Powerful parsing made easy via modern template techniques.

Take note that the factor rule does not explicitly use ch_p. Instead, the '(', ')', and '-' are applied directly as arguments to the >> operator. If one of the operands in a binary expression is a Spirit parser, such as the factor rule, the other operand may be a literal character. The result of such an expression is another Spirit parser. ch_p is used in the integer rule because a unary operator (!) is applied. Without the ch_p, !'-' would simply evaluate to false.
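The point about the unary operator can be checked without Spirit at all. The sketch below is a minimal illustration, assuming two hypothetical types (toy_parser and toy_optional) that stand in for a Spirit parser and the result of the ! operator. They are not the library's classes. It shows that ! applied to a bare character literal is ordinary logical negation, while ! applied to a user-defined type can select an overloaded operator.

```cpp
#include <cassert>

// Plain C++: '-' is a non-zero char, so built-in logical NOT yields false.
// No user-defined type is involved, so no overloaded operator! can be chosen.
inline bool plain_not_dash()
{
    return !'-';
}

// Hypothetical stand-in for a character parser such as the one ch_p creates.
struct toy_parser
{
    explicit toy_parser(char c) : ch(c) {}
    char ch;
};

// Hypothetical stand-in for the "optional" parser that ! would build.
struct toy_optional
{
    explicit toy_optional(toy_parser const& p) : subject(p) {}
    toy_parser subject;
};

// Because the operand is a class type, this overload is selected
// instead of the built-in logical NOT.
inline toy_optional operator!(toy_parser const& p)
{
    return toy_optional(p);
}
```

Wrapping the character in a parser object first, as ch_p does in the integer rule, is what gives the compiler a class-type operand to overload on.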

This example is not very useful, although it is valid code. All it can do is recognize a string that belongs to the grammar. Listing One contains the complete code for a calculator that evaluates the expression given to it.

Listing One: A calculator that evaluates the expression given to it.

//  Copyright (c) 2001, Joel de Guzman and Dan Nuffer
//  Permission is granted to use this code without restriction as
//  long as this copyright notice appears in all source files.

#include <boost/spirit/spirit.hpp>
#include <cstdlib>      // std::strtol
#include <iostream>
#include <stack>
#include <functional>

using namespace std;
using namespace spirit;

stack<long> evaluation_stack;

struct push_int
{
    void operator()(char const* str, char const*) const
    {
        long n = std::strtol(str, 0, 10);
        evaluation_stack.push(n);
    }
};

template <class op>
struct do_op
{
    do_op(op const& the_op) : m_op(the_op) {}
    op m_op;

    void operator()(char const*, char const*) const
    {
        long rhs = evaluation_stack.top();
        evaluation_stack.pop();
        long lhs = evaluation_stack.top();
        evaluation_stack.pop();
        evaluation_stack.push(m_op(lhs, rhs));
    }
};

template <class op>
do_op<op> make_op(op const& the_op)
{
    return do_op<op>(the_op);
}

struct do_negate
{
    void operator()(char const*, char const*) const
    {
        long lhs = evaluation_stack.top();
        evaluation_stack.pop();
        evaluation_stack.push(-lhs);
    }
};

int main()
{
    rule<> expression, term, factor, integer;

    integer =
        lexeme[ (!ch_p('-') >> +digit)[push_int()] ];

    factor =
            integer
        |   '(' >> expression >> ')'
        |   ('-' >> factor)[do_negate()];

    term =
        factor >>
            *( ('*' >> factor)[make_op(std::multiplies<long>())]
             | ('/' >> factor)[make_op(std::divides<long>())]);

    expression  =
        term >>
            *( ('+' >> term)[make_op(std::plus<long>())]
             | ('-' >> term)[make_op(std::minus<long>())]);

    char str[256];
    cin.getline(str, 256);
    if (parse(str, expression, space).full)
    {
        cout << "parsing succeeded\n";
        cout << "result = " << evaluation_stack.top() << "\n\n";
        evaluation_stack.pop();
    }
    else
    {
        cout << "parsing failed\n";
    }
}

A semantic action can be either a free function or function object (or functor in STL terminology). One may be attached to any expression within a Spirit grammar definition, using the expression p[a] where p is a parser and a is a semantic action. When a match is made, the action will be called and passed beginning and ending iterators (much like an STL algorithm) to the input that was matched.

The push_int semantic action functor converts its input from a string into a long int and then pushes it onto evaluation_stack.
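push_int relies on strtol stopping at the first non-digit character and ignores the ending iterator. As a variation, the sketch below converts only the matched range. The helper names range_to_long and demo_range_to_long are hypothetical and are not part of Spirit or of Listing One.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: convert the matched range [first, last) to a long.
// Copying the range into a std::string guarantees that strtol sees only
// the characters that were actually matched.
inline long range_to_long(char const* first, char const* last)
{
    std::string const text(first, last);
    return std::strtol(text.c_str(), 0, 10);
}

// Usage sketch: pretend the integer rule matched "-42" at the start of "-42+7".
inline long demo_range_to_long()
{
    char const* input = "-42+7";
    return range_to_long(input, input + 3);
}
```

Inside a semantic action, the two pointers passed to operator() would take the place of input and input + 3.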

The do_op semantic action template struct will apply another function to the top two values popped from evaluation_stack and then push the result back onto the stack. It is used to perform all binary arithmetic operations done in the calculator.

For each binary operation (+, -, *, and /), the appropriate do_op is created using the function objects from the standard library to do the operations. The make_op helper function deduces the template argument from the function object passed to it, so do_op objects can be created without spelling out their type.

The do_negate semantic action will be called when the unary negation - operator is invoked. do_negate pops a value from evaluation_stack, negates it, and pushes the result back onto the stack.
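To see how the actions cooperate, it may help to fire them by hand in the order the parser would. The sketch below is a simplified illustration that passes the stack explicitly instead of using a global. The names eval_stack, apply_binary, and apply_negate are hypothetical counterparts of do_op and do_negate, not code from the listing.

```cpp
#include <cassert>
#include <functional>
#include <stack>

typedef std::stack<long> eval_stack;

// Counterpart of do_op: pop the right operand, then the left, push the result.
template <class Op>
void apply_binary(eval_stack& s, Op op)
{
    long rhs = s.top(); s.pop();
    long lhs = s.top(); s.pop();
    s.push(op(lhs, rhs));
}

// Counterpart of do_negate: replace the top value with its negation.
inline void apply_negate(eval_stack& s)
{
    long value = s.top(); s.pop();
    s.push(-value);
}

// Action order for the input "1 + 2 * 3".
inline long evaluate_sum_of_product()
{
    eval_stack s;
    s.push(1);                                   // integer 1
    s.push(2);                                   // integer 2
    s.push(3);                                   // integer 3
    apply_binary(s, std::multiplies<long>());    // 2 * 3, stack: 1 6
    apply_binary(s, std::plus<long>());          // 1 + 6, stack: 7
    return s.top();
}

// Action order for the input "-(8 / 2)".
inline long evaluate_negated_quotient()
{
    eval_stack s;
    s.push(8);                                   // integer 8
    s.push(2);                                   // integer 2
    apply_binary(s, std::divides<long>());       // 8 / 2, stack: 4
    apply_negate(s);                             // stack: -4
    return s.top();
}
```

Because the right operand is popped first, non-commutative operations such as division and subtraction receive their operands in the order they appeared in the input.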

After the parser has been created, the program reads a line from cin. The expression is then parsed using the free parse function from the library. Spirit provides several parse functions for different situations. The one used in the example takes as its parameters a null-terminated string, the top-level rule of the grammar (traditionally called the start symbol), and a skip parser, which in this case is spirit::space. The skip parser tells the parser which input to skip. Using spirit::space as the skip parser simply means that all whitespace characters between symbols and words in the input will be skipped.

parse returns a parse_info struct:

template <typename IteratorT>
struct parse_info
{
    IteratorT  stop;
    bool       match;
    bool       full;
    unsigned   length;
};

The member stop points to the final parse position (i.e., parsing processed the input up to this point). match is true if parsing is successful, which may be so if the parser consumed all the input (full match) or if the parser consumed only a portion of the input (partial match). full will be true when a full match occurs, meaning the parser consumed all the input. length is the number of characters consumed by the parser and is valid only if a successful match has occurred.
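The difference between a full and a partial match can be illustrated with a toy recognizer. The sketch below repeats the parse_info struct shown above and fills it in from a hypothetical function, parse_digits, which accepts a leading run of decimal digits. It is not Spirit's parse function and only mimics how the four members relate to one another.

```cpp
#include <cassert>
#include <cctype>

template <typename IteratorT>
struct parse_info
{
    IteratorT  stop;
    bool       match;
    bool       full;
    unsigned   length;
};

// Hypothetical toy recognizer: consumes leading decimal digits of str.
inline parse_info<char const*> parse_digits(char const* str)
{
    char const* p = str;
    while (std::isdigit(static_cast<unsigned char>(*p)))
        ++p;

    parse_info<char const*> info;
    info.stop   = p;                                // final parse position
    info.length = static_cast<unsigned>(p - str);   // characters consumed
    info.match  = info.length > 0;                  // at least one digit seen
    info.full   = info.match && *p == '\0';         // nothing left over
    return info;
}
```

With this toy function, "123" gives a full match, "12ab" gives a partial match with stop pointing at 'a', and "ab" gives no match at all.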

Finally, if the parse was successful, the result of the expression is printed along with a success message.

Compiling the example is straightforward because the Spirit library consists only of headers; all of its classes are templates. The library can be used straight out of the box. You only need to include the spirit.hpp header, and there is no library to link against.

The Parser

The most fundamental concept behind Spirit's design is the parser class. A parser models a recognizer of a language from the simplest to the most complex. It has a conceptual member function:

parse(iterator& first, iterator last)

This function does the work of inspecting the iterator range and reporting success or failure. The iterator first, which is passed by reference, is advanced accordingly when a match is found.

The parse member function is conceptual, as opposed to virtual, in the sense that the base class parser does not really have any such member function. Subclasses must provide one. The conceptual base class is a template class parameterized by its subclass, which gives it access to its subclass. Whenever a parser is asked to do its task, it delegates the task to its subclass. This process is very similar to how virtual functions work, but the difference is that the member function is bound statically (at compile time) instead of dynamically (at run time). James Coplien first popularized this technique of compile-time polymorphism in an article in C++ Report entitled "Curiously Recurring Template Patterns" [1]. Listing Two shows the parser class and some examples of trivial subclasses.

Listing Two: The parser class and some trivial subclass examples.

//  Copyright (c) 2001, Joel de Guzman and Dan Nuffer
//  Permission is granted to use this code without restriction as
//  long as this copyright notice appears in all source files.

template <typename DerivedT>
struct parser
{
    DerivedT&
    derived()
    { return *static_cast<DerivedT*>(this); }

    DerivedT const&
    derived() const
    { return *static_cast<DerivedT const*>(this); }
};

template <typename DerivedT>
struct char_parser : public parser<DerivedT>
{
    template <typename IteratorT>
    match
    parse(IteratorT& first, IteratorT const& last) const
    {
        if (first != last)
            if (bool r = this->derived().test(*first))
            {
                ++first;
                return match(1);
            }
        return match();
    }
    ...
};

template <typename CharT = char>
class chlit : public char_parser<chlit<CharT> >
{
public:
    ...
    template <typename T>
    bool test(T ch_) const
    { return T(ch) == ch_; }

private:

    CharT  ch;
};

Though quite simple, the example is not contrived; it is an actual part of the library. A chlit object merely compares a single character for a match and advances the iterator one step forward when successful. The success of a parse is encoded in a match object returned by the member function parse. Aside from reporting a true or false result, the number of matching characters from the input can also be obtained from this object.
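Listing Two elides the match class and the chlit constructor, so it cannot be compiled on its own. The sketch below is a self-contained approximation under stated assumptions: toy_match is a hypothetical stand-in for match (here a negative length means failure), and the toy_ prefixed classes mirror the listing rather than reproduce the library's actual definitions.

```cpp
#include <cassert>

// Hypothetical stand-in for the library's match class.
class toy_match
{
public:
    toy_match() : len_(-1) {}                      // failed match
    explicit toy_match(int length) : len_(length) {}
    bool matched() const { return len_ >= 0; }
    int  length()  const { return len_; }
private:
    int len_;
};

// Base class parameterized by its own subclass.
template <typename DerivedT>
struct toy_parser_base
{
    DerivedT const& derived() const
    { return *static_cast<DerivedT const*>(this); }
};

// Shared parse logic; the character test is delegated to the subclass
// and resolved at compile time, with no virtual call.
template <typename DerivedT>
struct toy_char_parser : toy_parser_base<DerivedT>
{
    template <typename IteratorT>
    toy_match parse(IteratorT& first, IteratorT const& last) const
    {
        if (first != last && this->derived().test(*first))
        {
            ++first;                 // consume the matched character
            return toy_match(1);
        }
        return toy_match();          // no match; first is left unchanged
    }
};

// Subclass supplying the test for one literal character.
class toy_chlit : public toy_char_parser<toy_chlit>
{
public:
    explicit toy_chlit(char c) : ch(c) {}
    bool test(char c) const { return c == ch; }
private:
    char ch;
};

// Usage sketch: parse "xy" twice with a parser for 'x'.
inline bool demo_chlit_advances()
{
    char const* text  = "xy";
    char const* first = text;
    char const* last  = text + 2;
    toy_chlit x('x');
    toy_match hit  = x.parse(first, last);   // matches 'x'; first moves to 'y'
    toy_match miss = x.parse(first, last);   // 'y' is not 'x'; first stays put
    return hit.matched() && hit.length() == 1
        && !miss.matched() && first == text + 1;
}
```

Adding another single-character parser then only requires a new subclass with its own test member; the parse logic in the base is reused unchanged.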
