|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Announcements
Want a new Job?
Chapters
Services
Feature Zones
|
IntroductionIn my last Code Project article, An Introduction to the Boost Spirit Parser framework, I introduced some basic concepts of the BackgroundThere are numerous ways to parse input from files, the command line, or elsewhere. Despite this, all these techniques have to do two basic functions. The first action to be performed is understanding the structure of the input, ensuring that it meets the specification laid down for that input. After that, we must carry out semantic actions. What are semantic actions? Semantics is the science of extracting meaning from something, so it follows that semantic actions involve carrying out actions based on the meaning of something. In the context of a parser, semantic actions are the code that gets called each time you have successfully figured out part of the input. In the Spirit framework, the addition of semantic actions is very simple. After any part of any rule in the definition of the grammar, you can add a semantic action. This semantic action will be called whenever that particular part of the rule has been successfully matched. This process is a very powerful extension of what is possible in other parsers such as Yacc. For people familiar with Yacc, you will know that each rule in the grammar can be followed by some embedded pseudo-C code. Spirit goes one step further by allowing semantic actions to be attached to any part of the rule. An example used to process a comma separated list of signed integers is: integers = int_p[ProcessInt()] >> *( ',' int_p[ProcessInt()] ) ; This section of code states that the integers rule is defined as a single integer followed by zero or more additional integers, separated with commas. After each integer is processed, the framework will call the functor There are many ways in which input can be processed. In the case of a simple command line calculator, it is usually possible to simply process the input as it comes in, with an evaluation stack. In fact, this concept is used by the calculators that come as demonstration projects with the Spirit library. I decided that in this case, I would demonstrate another commonly used tool associated with parsing, the conversion of input into a composite tree. Modeling the Input Using a CompositeMany input formats can be parsed successfully by generating a composite tree out of the parser. Obvious examples such as XML or VRML parsers immediately spring to mind, but there are other file formats that can sensibly be parsed into a tree. For example: SQL scripts or mathematical expressions. Once you have the composite tree, performing a variety of actions on the tree simply becomes a problem of writing an appropriate composite visitor. A visitor can easily re-output the tree in a new file format, or output the data after it has been edited by the program. Another visitor could be used to transform the tree into a different data structure, or to evaluate the tree. The possibilities are endless. In this article, I have used code from a previous article that I wrote entitled: Composites and Visitors - a Templatised Approach. This article provides a generic composite base class, as well as provides various schemes for visiting the entire composite tree in a particular order. To write my modular arithmetic calculator, I have written a composite tree that has a number of different nodes. The most important of these is a root node which sits at the root of the tree and has all other nodes underneath. The reason for writing this root node concerns a particular detail of writing parsers that convert input into composite trees. Often, when you are writing a parser that generates a composite tree, you tend to create the tree starting with the leaf nodes, rather than with the root node. I tend to solve this particular problem by growing the tree below a customized root node. As each node in the tree is created, it is added to the root node. Initially, leaf nodes are added, but as composite nodes are added, they detach certain leaf nodes from underneath the root node, and add them to their own children lists. This grows the tree from the root, gradually pushing other nodes further down the hierarchy. To those of you familiar with the process of building a balanced binary tree, this process is remarkably similar. The diagram below demonstrates the process:
Following this process through, we can see that the first two steps involve adding numeric leaf nodes to the root node. The next step involves adding a plus operator which takes charge of the two children and attaches itself under the root node. The root node therefore contains a function to do this operation: void CTreeRoot::AddParent( CExpressionTree *node ) { // Take the last 2 children from the root, add them to the new node CExpressionTree *last1 = NULL, *last2 = NULL; shallow_iterator iter = begin_shallow(); for ( ; iter != end_shallow(); ++iter ) { last2 = last1; last1 = *iter; } if ( last1 == NULL || last2 == NULL ) { assert( false ); delete node; return; } node->push_back( remove( last2 ) ); node->push_back( remove( last1 ) ); // Now add the new node to the tree push_back( node ); } You can see that this function looks for the last two nodes under the root node, removes them from the root node, adding them to the new node, before finally adding the new node underneath the root. The other nodes in the composite tree represent a variety of basic entities in the expression syntax, for example, integer numbers, and the various operators such as plus or minus. I have written two visitors for the composite tree. The first is a left to right visitor that will print the expression represented by the tree. The second is a bottom to top visitor that evaluates the expression represented by the tree. class CCalculateTreeVisitor : public CVisitorBase, public Types::CVisitorBottomToTop<CCalculateTreeVisitor, CExpressionTree> { public: CCalculateTreeVisitor( unsigned int modular ) : m_modular( modular ) {} virtual ~CCalculateTreeVisitor() {} virtual void Visit( COperatorPlus &node ); virtual void Visit( COperatorMinus &node ); virtual void Visit( COperatorMultiply &node ); virtual void Visit( COperatorDivide &node ); virtual void Visit( COperatorUnaryMinus &node ); virtual void Visit( CValue &node ); int pop(); void push( int value ); protected: int ReturnModular( int value ) const; private: unsigned int m_modular; std::stack<INT> m_eval; }; The class definition for this visitor is pretty self explanatory. As the composite is visited from the bottom of the tree to the top, results from visiting each node are placed on a stack. The operator nodes work in a very similar way to a Forth interpreter; whereby the plus node, for example, pops the top two elements off the stack, calculates the sum of them, and pushes on the results: void CCalculateTreeVisitor::Visit( COperatorPlus &node ) { int b = pop(), a = pop(); push( ReturnModular( a + b ) ); } Other operator nodes are largely similar. Adding Semantic Actions to a Spirit ParserThe preparatory work of defining composites and visitors for the expression tree have prepared us for doing the important work of adding semantic actions to our existing calculator parser. As I have already stated, the process of adding semantic actions to a Spirit parser is quite easy. It relies on using function pointers, or in a more object oriented way, functors, to callback from the parser into the parent application. Defining the Semantic Action FunctorsFunctors are simply classes that implement a function call operator, struct AddChild { public: AddChild( CParser &parser, createPtr create ) : m_parser( parser ), m_create( create ) {} template <typename IteratorT> void operator()( IteratorT, IteratorT ) const { m_parser.GetRoot()->AddChild( m_create() ); } template <typename IteratorT> void operator()( IteratorT ) const { m_parser.GetRoot()->AddChild( m_create() ); } private: CParser &m_parser; createPtr m_create; }; Here, we have two When you are designing a functor, it is important to understand the lifetime of these functors. A new functor is not created each time a semantic action is used, instead the same functor exists for the lifetime of the grammar object. For this reason, I construct the example functor using a function pointer Other signatures of functor can be used when you are parsing numeric values. If you use the built in number parsers, for example, the struct AddNumericChild { public: AddNumericChild( CParser &parser ) : m_parser( parser ) {} void operator()( unsigned int num ) const { m_parser.GetRoot()->AddChild( new CValue( (int) num ) ); } private: CParser &m_parser; }; Communicating Between the Parser and the ApplicationThe communication between the parser and the application usually involves two stages. Firstly, you need to place the functors in suitable locations within the grammar. Secondly, you need to ensure that the functors can pass any action onto the application appropriately. If you look back at the two functors defined above, both of them take a reference to a To demonstrate the process of placing functors in the grammar, I will show you a section of the grammar: definition( Syntax const &self ) { // Parse a single integer, calling the AddNumericChild functor integer = uint_p[ Private::AddNumericChild( self.m_parser ) ] ; // Parse a bracketed expression, a single integer, or a unary // plus or minus. The unary minus is dealt with by adding // appropriate nodes into the tree factor = integer | '(' >> expression >> ')' | ( ch_p('-')[ Private::AddChild( self.m_parser, &CEmpty::Create ) ] >> factor ) [ Private::AddParent( self.m_parser, &COperatorUnaryMinus::Create ) ] | ( '+' >> factor ) ; // Parse a multiply or divide expression, adding appropriate // nodes into the tree term = factor >> *( ( '*' >> factor ) [ Private::AddParent( self.m_parser, &COperatorMultiply::Create ) ] | ( '/' >> factor ) [ Private::AddParent( self.m_parser, &COperatorDivide::Create ) ] ) ; // Parse a plus or minus expression, adding appropriate nodes // into the tree expression = term >> *( ( '+' >> term ) [ Private::AddParent( self.m_parser, &COperatorPlus::Create ) ] | ( '-' >> term ) [ Private::AddParent( self.m_parser, &COperatorMinus::Create ) ] ) ; } All of the functors have been defined in the Once the input has been parsed, an expression tree will exist in the CCalculateTreeVisitor value( m_parser.GetModular() ); value( m_parser.GetRoot() ); int result = value.pop(); This code sets the base of the modular arithmetic on the CompatibilityThe code produced for this article relies on both the ConclusionsOnce you have created a parser in the Spirit Framework, it is pretty easy to plumb in semantic actions. So long as you follow the rules to keep your functors lightweight, and you remember the way in which the parser calls back to the main application through these functors, you shouldn't have too many problems. The Spirit framework is a very powerful way of creating highly object oriented parsers. The complex syntax and use of cutting edge language features are not for the fainthearted, but I think that the effort required to learn the framework will be well rewarded. History10 Oct 04: Version 1 released.
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||