Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

C/C++

The Spirit Parser Library: Inline Parsing in C++


The Rule

Ultimately, we want to encapsulate an arbitrarily complex parser expression in a placeholder variable that can be used just like any other parser. This is the tricky part. How do you encode an arbitrarily complex type, produced by a rather complex compositional expression, into a variable without knowing its exact type beforehand?

The problem is that each expression evaluates to a unique type signature. As shown above, the expression a | b results in a type quite different from the expression a | b | c, for example. Yet, we want both rule = a | b and rule = a | b | c to work. The only similarity between any types produced by compositional expressions is that they have the parser base class. Even then, parser is parameterized by its derived class. Thus, two parser bases with different derived template parameters are essentially unique types.

The solution: encode the type in a run-time polymorphic class that is derived from an abstract_parser class and contains a virtual member function parse. Make both the copy constructor and assignment operator of the rule member templates parameterized by the type of parser argument. Whenever the copy constructor or assignment operator of the rule is invoked, a concrete instance of this abstract_parser is created. A pointer to abstract_parser is saved in a member variable.

Listing Six shows the class declaration of our abstract_parser. Also shown is concrete_parser, a subclass that knows about the type of parser and multiply inherits from parser and abstract_parser.

Listing Six: The class declaration of abstract_parser and concrete_parser

//  Copyright (c) 2001, Joel de Guzman and Dan Nuffer
//  Permission is granted to use this code without restriction as
//  long as this copyright notice appears in all source files.

template <typename IteratorT>
class abstract_parser
{
public:
                    abstract_parser() {}
    virtual         ~abstract_parser() {}
    virtual match   parse(
                        IteratorT& first,
                        IteratorT const& last) const = 0;
};

template <typename ParserT, typename IteratorT>
class concrete_parser
    : public ParserT
    , public abstract_parser<IteratorT>
{
public:

                    concrete_parser(ParserT const& parser)
                    : ParserT(parser) {}
  virtual           ~concrete_parser() {}

  virtual match
  parse(IteratorT& first, IteratorT const& last) const
  { return ParserT::parse(first, last); }
};

Listing Seven shows a rule's assignment operator. This assignment operator is a template member function parameterized by the type of the parser parameter. Any unique parser type composed using SEBNF expressions, no matter how complex, can be encapsulated this way. Now that the right-hand side (rhs) of an assignment (and copy constructor) can be encapsulated in a rule, the rule may be used anywhere just like any other parser.

Listing Seven: A rule's assignment operator

//  Copyright (c) 2001, Joel de Guzman and Dan Nuffer
//  Permission is granted to use this code without restriction as
//  long as this copyright notice appears in all source files.

template <typename ParserT>
rule& operator = (ParserT const& parser)
{
    assert(meta == 0);
    meta = new impl::concrete_parser<ParserT, IteratorT>(parser);
    return *this;
}

Conclusion

Spirit makes it easy to write parsers directly in C++. Thanks to the expressiveness of C++ operator overloading and the flexibility of template metaprogramming, grammars can be written in a high-level syntax very similar to EBNF. These grammar descriptions can mix freely with C++ code. Semantic actions, attached to any part of the grammar, seamlessly link EBNF grammar specifications to C++ code. The intricacies of the parsing engine framework are hidden beneath an intuitive C++/EBNF interface. Using Spirit is an order of a magnitude easier, compared to coding a parser by hand or perhaps even using a stand-alone parser such as YACC. Since everything is done in C++, there is no need to use an external parser generator along with all the excess baggage that comes with it. Spirit is a flexible, lightweight, and simple-to-use C++ library suitable for many parsing related programming tasks.

Notes and References

[1] James Coplien. "Curiously Recurring Template Patterns," C++ Report, February 1995.

[2] Todd Veldhuizen. "Expression Templates," C++ Report, June 1995.

[3] Peter Naur (ed.). "Report on the Algorithmic Language ALGOL 60," CACM, May 1960.

[4] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns, Elements of Reusable Object-Oriented Software (Addison-Wesley, 1995).

[5] T. J. Parr, H. G. Dietz, and W. E. Cohen. PCCTS Reference Manual, Version 1.00 (School of Electrical Engineering, Purdue University, West Lafayette, August 1991).

[6] Adrian Johnstone and Elizabeth Scott. "RDP, A Recursive Descent Compiler Compiler," Technical Report CSD TR 97 25, Dept. of Computer Science, Egham, Surrey, England, Dec. 20, 1997.

[7] Joel de Guzman. "Spirit Version 1.1," July 2001.

[8] Joel de Guzman. "Spirit Version 1.2," November 2001.


Dan Nuffer is a software engineer. His interests include parsers, compilers, C++ XML, software engineering and Linux. Joel de Guzman is a consultant and software engineer at Boost Consulting (www.boost-consulting.com) with specialization and expertise in the implementation of generic libraries and frameworks, Joel is adept at writing code using modern C++ techniques, especially, but not limited to, template metaprogramming and functional programming in C++. After work, Joel loves to play distortion-laden heavy rock guitar, play around in his home studio, and record his own compositions.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.