Implementing Semantic Actions in the Boost Spirit Parser Framework

Dave Handley

4.95/5 (23 votes)

Oct 10, 2004

10 min read

121667

1333

Using composites to implement a modular arithmetic calculator with the Boost Spirit parser framework.

Introduction

In my last Code Project article, An Introduction to the Boost Spirit Parser framework, I introduced some basic concepts of the boost::spirit parser framework. In that article, I created the necessary code to successfully parse the input for a simple modular arithmetic calculator. In this article, I will add in the necessary code to carry out semantic actions on the input, and therefore provide a fully functional modular arithmetic calculator.

Background

There are numerous ways to parse input from files, the command line, or elsewhere. Despite this, all these techniques have to do two basic functions. The first action to be performed is understanding the structure of the input, ensuring that it meets the specification laid down for that input. After that, we must carry out semantic actions. What are semantic actions? Semantics is the science of extracting meaning from something, so it follows that semantic actions involve carrying out actions based on the meaning of something. In the context of a parser, semantic actions are the code that gets called each time you have successfully figured out part of the input.

In the Spirit framework, the addition of semantic actions is very simple. After any part of any rule in the definition of the grammar, you can add a semantic action. This semantic action will be called whenever that particular part of the rule has been successfully matched. This process is a very powerful extension of what is possible in other parsers such as Yacc. For people familiar with Yacc, you will know that each rule in the grammar can be followed by some embedded pseudo-C code. Spirit goes one step further by allowing semantic actions to be attached to any part of the rule. An example used to process a comma separated list of signed integers is:

integers = int_p[ProcessInt()] >>
    *( ',' int_p[ProcessInt()] )
    ;

This section of code states that the integers rule is defined as a single integer followed by zero or more additional integers, separated with commas. After each integer is processed, the framework will call the functor ProcessInt() passing in the integer that has been processed. I will deal with the details of these functors in a later section in this article.

There are many ways in which input can be processed. In the case of a simple command line calculator, it is usually possible to simply process the input as it comes in, with an evaluation stack. In fact, this concept is used by the calculators that come as demonstration projects with the Spirit library. I decided that in this case, I would demonstrate another commonly used tool associated with parsing, the conversion of input into a composite tree.

Modeling the Input Using a Composite

Many input formats can be parsed successfully by generating a composite tree out of the parser. Obvious examples such as XML or VRML parsers immediately spring to mind, but there are other file formats that can sensibly be parsed into a tree. For example: SQL scripts or mathematical expressions. Once you have the composite tree, performing a variety of actions on the tree simply becomes a problem of writing an appropriate composite visitor. A visitor can easily re-output the tree in a new file format, or output the data after it has been edited by the program. Another visitor could be used to transform the tree into a different data structure, or to evaluate the tree. The possibilities are endless.

In this article, I have used code from a previous article that I wrote entitled: Composites and Visitors - a Templatised Approach. This article provides a generic composite base class, as well as provides various schemes for visiting the entire composite tree in a particular order.

To write my modular arithmetic calculator, I have written a composite tree that has a number of different nodes. The most important of these is a root node which sits at the root of the tree and has all other nodes underneath. The reason for writing this root node concerns a particular detail of writing parsers that convert input into composite trees. Often, when you are writing a parser that generates a composite tree, you tend to create the tree starting with the leaf nodes, rather than with the root node. I tend to solve this particular problem by growing the tree below a customized root node. As each node in the tree is created, it is added to the root node. Initially, leaf nodes are added, but as composite nodes are added, they detach certain leaf nodes from underneath the root node, and add them to their own children lists. This grows the tree from the root, gradually pushing other nodes further down the hierarchy. To those of you familiar with the process of building a balanced binary tree, this process is remarkably similar. The diagram below demonstrates the process:

Process of Constructing a Composite Tree

Following this process through, we can see that the first two steps involve adding numeric leaf nodes to the root node. The next step involves adding a plus operator which takes charge of the two children and attaches itself under the root node. The root node therefore contains a function to do this operation:

void 
CTreeRoot::AddParent( CExpressionTree *node )
{
    // Take the last 2 children from the root, add them to the new node
    CExpressionTree *last1 = NULL, *last2 = NULL;
    shallow_iterator iter = begin_shallow();
    for ( ; iter != end_shallow(); ++iter )
    {
        last2 = last1;
        last1 = *iter;
    }
    if ( last1 == NULL || last2 == NULL )
    {
        assert( false );
        delete node;
        return;
    }
    node->push_back( remove( last2 ) );
    node->push_back( remove( last1 ) );
    // Now add the new node to the tree
    push_back( node );
}

You can see that this function looks for the last two nodes under the root node, removes them from the root node, adding them to the new node, before finally adding the new node underneath the root.

The other nodes in the composite tree represent a variety of basic entities in the expression syntax, for example, integer numbers, and the various operators such as plus or minus.

I have written two visitors for the composite tree. The first is a left to right visitor that will print the expression represented by the tree. The second is a bottom to top visitor that evaluates the expression represented by the tree.

class CCalculateTreeVisitor :
    public CVisitorBase,
    public Types::CVisitorBottomToTop<CCalculateTreeVisitor, CExpressionTree>
{
public:
    CCalculateTreeVisitor( unsigned int modular ) : m_modular( modular ) {}
    virtual ~CCalculateTreeVisitor() {}

    virtual void Visit( COperatorPlus &node );
    virtual void Visit( COperatorMinus &node );
    virtual void Visit( COperatorMultiply &node );
    virtual void Visit( COperatorDivide &node );
    virtual void Visit( COperatorUnaryMinus &node );
    virtual void Visit( CValue &node );

    int pop();
    void push( int value );

protected:
    int ReturnModular( int value ) const;

private:
    unsigned int m_modular;
    std::stack<INT> m_eval;
};

The class definition for this visitor is pretty self explanatory. As the composite is visited from the bottom of the tree to the top, results from visiting each node are placed on a stack. The operator nodes work in a very similar way to a Forth interpreter; whereby the plus node, for example, pops the top two elements off the stack, calculates the sum of them, and pushes on the results:

void 
CCalculateTreeVisitor::Visit( COperatorPlus &node )
{
    int b = pop(), a = pop();
    push( ReturnModular( a + b ) );
}

Other operator nodes are largely similar.

Adding Semantic Actions to a Spirit Parser

The preparatory work of defining composites and visitors for the expression tree have prepared us for doing the important work of adding semantic actions to our existing calculator parser. As I have already stated, the process of adding semantic actions to a Spirit parser is quite easy. It relies on using function pointers, or in a more object oriented way, functors, to callback from the parser into the parent application.

Defining the Semantic Action Functors

Functors are simply classes that implement a function call operator, operator(). This means that they can act as both classes and function pointers at the same time. The Spirit framework uses either function pointers or functors to callback from the parser into your application. I personally recommend using functors because of the increased object orientation. The functor itself must follow a set format, of which the Spirit framework supports a number. Two of these are demonstrated below:

struct AddChild
{
public:
    AddChild( CParser &parser, createPtr create ) :
        m_parser( parser ), m_create( create ) {}
    template <typename IteratorT>
    void operator()( IteratorT, IteratorT ) const
    {
        m_parser.GetRoot()->AddChild( m_create() );
    }
    template <typename IteratorT>
    void operator()( IteratorT ) const
    {
        m_parser.GetRoot()->AddChild( m_create() );
    }
private:
    CParser &m_parser;
    createPtr m_create;
};

Here, we have two operator()s, one taking a single IteratorT, and the other taking a pair of IteratorTs. If you are parsing a standard string, the Spirit framework will use the char const* specialization of these. The single parameter version will be called when you attach an action to a single character parser, for example, the ch_p parser. The two parameter version will be used under most other circumstances. The iterator parameters give you access to the input just parsed, although in this example I did not need the input.

When you are designing a functor, it is important to understand the lifetime of these functors. A new functor is not created each time a semantic action is used, instead the same functor exists for the lifetime of the grammar object. For this reason, I construct the example functor using a function pointer createPtr that can be used to create the necessary new expression tree nodes, rather than passing in a new object on construction, which would cause the system to attempt to add the same object each time the functor is called. As a general rule of thumb, you should keep these functors lightweight. The Spirit framework can pass them around by value quite a lot, so don't load them down with loads of member variables unless you want to take a hit on the speed of parsing. The functor should really be seen as a point of communication between the parser and the rest of your application.

Other signatures of functor can be used when you are parsing numeric values. If you use the built in number parsers, for example, the uint_p parser will require a functor that takes a single unsigned int. This can be seen below:

struct AddNumericChild
{
public:
    AddNumericChild( CParser &parser ) : m_parser( parser ) {}
    void operator()( unsigned int num ) const
    {
        m_parser.GetRoot()->AddChild( new CValue( (int) num ) );
    }
private:
    CParser &m_parser;
};

Communicating Between the Parser and the Application

The communication between the parser and the application usually involves two stages. Firstly, you need to place the functors in suitable locations within the grammar. Secondly, you need to ensure that the functors can pass any action onto the application appropriately. If you look back at the two functors defined above, both of them take a reference to a CParser which in the case of this application is the overall managing class. By storing this reference, the functor gets a lightweight way of communicating back into the application, in both of these cases to add new children into the composite tree.

To demonstrate the process of placing functors in the grammar, I will show you a section of the grammar:

definition( Syntax const &self )
{
    // Parse a single integer, calling the AddNumericChild functor
    integer =
        uint_p[ Private::AddNumericChild( self.m_parser ) ]
        ;
    // Parse a bracketed expression, a single integer, or a unary
    // plus or minus.  The unary minus is dealt with by adding
    // appropriate nodes into the tree
    factor =
        integer |
        '(' >> expression >> ')' |
        ( ch_p('-')[ Private::AddChild( self.m_parser, &CEmpty::Create ) ] >> factor )
            [ Private::AddParent( self.m_parser, &COperatorUnaryMinus::Create ) ] |
        ( '+' >> factor )
        ;
    // Parse a multiply or divide expression, adding appropriate
    // nodes into the tree
    term =
        factor
        >> *( ( '*' >> factor )
                [ Private::AddParent( self.m_parser, &COperatorMultiply::Create ) ]
            | ( '/' >> factor )
                [ Private::AddParent( self.m_parser, &COperatorDivide::Create ) ]
            )
        ;
    // Parse a plus or minus expression, adding appropriate nodes
    // into the tree
    expression =
        term
        >> *( ( '+' >> term )
                [ Private::AddParent( self.m_parser, &COperatorPlus::Create ) ]
            | ( '-' >> term )
                [ Private::AddParent( self.m_parser, &COperatorMinus::Create ) ]
            )
        ;
}

All of the functors have been defined in the Private namespace.

Once the input has been parsed, an expression tree will exist in the m_parser object. You can visit this tree to get out the results of the expression, or to print the tree in a formatted way. For example:

CCalculateTreeVisitor value( m_parser.GetModular() );
value( m_parser.GetRoot() );
int result = value.pop();

This code sets the base of the modular arithmetic on the CCalculateTreeVisitor, runs the visitor on the expression tree, and gets the result from the top of the visitor stack.

Compatibility

The code produced for this article relies on both the boost::spirit library and the Loki library. Both of these libraries should be installed and set in your include path for the code to compile. Furthermore, due to the cutting edge language features that both these libraries rely on, the code will only compile on Visual C++ 7.1 or later. Also note that many simple compile errors will report in strange ways, even with the latest Microsoft compilers. For example, if you use a functor which has not yet been defined, you will quite often get a C1001, Internal Compiler Error.

Conclusions

Once you have created a parser in the Spirit Framework, it is pretty easy to plumb in semantic actions. So long as you follow the rules to keep your functors lightweight, and you remember the way in which the parser calls back to the main application through these functors, you shouldn't have too many problems. The Spirit framework is a very powerful way of creating highly object oriented parsers. The complex syntax and use of cutting edge language features are not for the fainthearted, but I think that the effort required to learn the framework will be well rewarded.

History

10 Oct 04: Version 1 released.