Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

C/C++

The Spirit Parser Library: Inline Parsing in C++


Composites

By itself, the chlit class might not seem very useful. Yet, composed to form a hierarchy, trivial parser classes can produce quite complex and elaborate parsing functionalities (see the alternative class in Listing Three for an example).

Listing Three: The alternative class.

//  Copyright (c) 2001, Joel de Guzman and Dan Nuffer
//  Permission is granted to use this code without restriction as
//  long as this copyright notice appears in all source files.

template <typename A, typename B>
struct alternative
    : public binary<A, B>
    , public parser<alternative<A, B> >
{
    alternative(A const& a, B const& b)
    : binary<A, B>(a, b) {}

    template <typename IteratorT>
    match
    parse(IteratorT& first,  IteratorT const& last) const
    {
        if (match hit = this->left().parse(first, last))
            return hit;
        return this->right().parse(first, last);
    }
};

This class is a composite parser class. It has two sub-parsers: left and right. parse returns a success if one of its subjects is successful.

Through object composition alone, we can build parsers of arbitrary complexity. The library includes a couple of predefined primitives (like chlit) as well as composites (such as alternative). Primitives for parsing include literal strings, ranges, and numerics. Composites include alternatives, sequences, intersections, and differences. These parsers can be combined to form a hierarchy in a multitude of ways.

Operators

Primitives and composites can be composed to form more complex composites using operator overloading, coded in a way that mimics EBNF. Composing an alternative, for example requires two operands and is accomplished by overloading the operator | as shown here:

template <typename A, typename B>
inline alternative<A, B>
operator | (parser<A> const& a, parser<B> const& b)
{
    return alternative<A, B>(
        a.derived(), b.derived());
}

Given two chlit objects a and b, we can form an alternative composite object using the | operator:

a | b

Assuming that both a and b have the type chlit<char>, the resulting composite is:

alternative<chlit<char>, chlit<char> >

Continuing the example, the expression:

a | b | c

where a, b, and c all have the type chlit<char>, will generate a nested composite with the type:

alternative<alternative<
     chlit<char>, chlit<char> >,
     chlit<char> >

All composite parsers follow the same style of object aggregation. The difference lies in the algorithms used by each class's parse member function. The sequence class appears in Listing Four.

Listing Four: The sequence class

//  Copyright (c) 2001, Joel de Guzman and Dan Nuffer
//  Permission is granted to use this code without restriction as
//  long as this copyright notice appears in all source files.

template <typename A, typename B>
struct sequence
    : public binary<A, B>
    , public parser<sequence<A, B> >
{
    sequence(A const& a, B const& b)
    : binary<A, B>(a, b) {}

    template <typename IteratorT>
    match parse(IteratorT& first, IteratorT const& last) const
    {
        IteratorT s = first;
        match ma, mb;
        if ((ma = this->left().parse(s, last)) &&
            (mb = this->right().parse(s, last)))
        {
            first = s;
            return ma + mb;
        }
        return match();
    }
};

The sequence class is very similar to alternative, but it has a slight twist in the implementation of its parse member function. sequence matches both its subjects (left and right) in strict sequence. Like alternative, sequence has a corresponding overloaded operator: the >> operator.

Unary and Binary Composites

So far, we have just tackled binary composites such as alternative and sequence. Unary composites work similarly to binary composites but with only a single subject. The Kleene star, used for zero or more iterations, and the positive closure, used for one or more iterations, are examples of unary composites. Listing Five shows the Kleene star class implementation.

Listing Five: The Kleene star class implementation.

//  Copyright (c) 2001, Joel de Guzman and Dan Nuffer
//  Permission is granted to use this code without restriction as
//  long as this copyright notice appears in all source files.

template <typename S>
struct kleene_star
    : public unary<S>
    , public parser<kleene_star<S> >
{
    kleene_star(S const& a)
    : unary<S>(a) {}

    template <typename IteratorT>
    match parse(IteratorT& first, IteratorT const& last) const
    {
        match hit(0);
        while (match next = this->subject().parse(first, last))
            hit += next;
        return hit;
    }
};

As is obvious by now, binary composites inherit from the binary class that does the work of storing a copy of its left and right subjects. In the same vein, unary composites inherit from the unary class, similar to the binary class but with only one subject. Most fundamentally, all parsers, whether primitive or composite, inherit from the base parser class.

All entities in the library are variations of the examples presented. There are entities that deal with set-like operations, iteration (Kleene star and its brethren), semantic actions (hooks to external C/C++ code), directives (unary composites that modify the functionality of its subjects), parser assertions, and error handlers.


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.