|
|||||||||||||||||||||
|
|||||||||||||||||||||
|
Announcements
Want a new Job?
Chapters
Services
Feature Zones
|
IntroductionThis is the first of a two part article describing the use of the BackgroundOver the years, it seems that I have been doomed to write numerous parsers. My first was a programming language parser, written directly in machine code on an 8-bit machine, written when I was about 16. Since then I've used Lex and Yacc; I've written my own generic LALR parser; done custom parsers; and pretty much faced every problem imaginable. I always liked Lex and Yacc, but have never been happy with them when I'm writing object oriented code. Even the supposedly object based variants of these tools are not completely satisfactory in my view. Last year, I decided to have a look at Spirit. Spirit had just been added to the Boost library, of which I am a huge fan, and I decided to have a go. Initially, I wrote a functional programming language, but in the article, I am using a modular arithmetic calculator. I wrote the modular arithmetic calculator to help me with a Masters course in number theory, but I don't intend to explain what modular arithmetic is, so if you can't figure out what is being returned from a divide, don't expect this article to answer the question! Spirit is a fully object oriented LL parser and lexical analyzer. Since a lexical analyzer is actually nothing more than a parser optimized to process data into token streams, Spirit treats both parts of the process virtually identically. Essentially, in Spirit, a parser is a small object that knows how to parse a particular string. For example, Spirit then uses a whole variety of overloaded operators to allow a programmer to chain together various parsers using a syntax that is remarkably similar to Extended Backus Naur Form (EBNF). EBNF is a simple extension to BNF that introduces concepts such as the Kleene Star (familiar to those who use regular expressions, it means zero or more of the preceding entity). Using the Kleene Star allows many of the recursive definitions common to BNF to be replaced with iterative definitions. Please note that this code will only compile on Visual C++ 7.1 or later, and requires the latest Boost library to be installed, and set up in your include path. Writing a ParserWriting a parser with Spirit could not be easier. A grammar must be described by inheriting off The grammar must follow a specific layout. The grammar itself inherits from the struct Syntax : public boost::spirit::grammar<Syntax> { public: Syntax( CParser &parser ); virtual ~Syntax(); template <typename ScannerT> struct definition { public: definition( Syntax const &self ) { integer = lexeme_d[ (+digit_p) ] ; factor = integer | vars | '(' >> expression >> ')' | ( '-' >> factor ) | ( '+' >> factor ) ; term = factor >> *( ( '*' >> factor) | ( '/' >> factor ) ) ; expression = term >> *( ( '+' >> term ) | ( '-' >> term ) ) ; assignment = vars >> '=' >> expression ; var_decl = lexeme_d [ ( ( alpha_p >> *( alnum_p | '_' ) ) - vars )[vars.add] ] ; declaration = "int" >> var_decl >> *( ',' >> var_decl ) ; baseExpression = str_p( "exit" )[*self.m_finishedProcessing] | str_p( "mod" ) >> integer | declaration | assignment | '?' >> expression ; } boost::spirit::symbols<int> vars; boost::spirit::rule<ScannerT> integer, factor, term, expression, assignment, var_decl, declaration, baseExpression; const boost::spirit::rule<ScannerT> &start() const { return baseExpression; } }; friend struct definition; private: Private::FinishedProcessing *m_finishedProcessing; }; As you can see, the definition term ::= factor ( ( '*' factor ) | ( '/' factor ) )* Another rule to look a little more closely at is the Semantic actions are included in the Final points of note are that the rules that have been defined in the Running the ParserRunning the parser is simplicity in itself. If we have a grammar defined called boost::spirit::parse_info<> info; Syntax parser; info = boost::spirit::parse( line.c_str(), parser, boost::spirit::space_p ); The results class Spirit ProblemsI can't say that programming in the Spirit parser has not caused me some problems. I'll explain a few of them here to try and give you an idea of what you may face if you decide to use this library. Compilation ErrorsCompilation errors produced when using Spirit often run to multiple pages of impenetrable garbage. Unfortunately, the complex templatised code used with the Spirit library causes the Microsoft compiler to report the error as each layer of templatised code is unwrapped. In order to debug a compile problem, you quite often have to gradually work through each reported layer of errors, examining each carefully, only to eventually discover that you missed a semicolon. Also, under extreme circumstances, you can get an error that results in a C1001 internal compiler error with no idea of why that has happened. I have personally seen the dreaded C1001 simply because I had forgotten to declare a functor. Ouch! The writers of Spirit are pushing the capabilities of C++ to its limits, and unfortunately, the compiler writers are only just starting to implement many of these features. Spirit will not compile on Visual C++ 7.0, because it needs the partial template specialization features (and other features) only introduced on Visual C++ 7.1. Don't think that things will be any easier on other compilers. The errors reported by GCC can be just as impenetrable. Don't take this as a reason not to use Spirit, I think you gradually get used to the errors and you can live with them, but don't expect an easy ride straight away. Compilation TimesAgain, because Spirit pushes the compiler to its limits, expect compilation times to be slow, especially on complex grammars. Be very careful about including your parser in too many CPP files, and this issue will be manageable. The Spirit documentation covers various techniques to try and simplify your parsers and reduce compilation times should things get out of hand. Minimal RebuildsAs far as I can tell, the minimal rebuilds option in Visual Studio and Spirit do not mix. I have spent quite a while trying to debug a parser which isn't working as I expect, only to find that it isn't recompiling because the minimal rebuild framework fails to pick up a change in the depths of one of the template definitions. Turn minimal rebuild off as soon as you incorporate Spirit into your application, then you should have no problems. Impenetrable GrammarAt first glance, the complex templates and huge amount of operator overloading that are all part of the Spirit library, can make the grammar impenetrable. My recommendation, at least initially, is to treat all the general plumbing code as boilerplate, and just concentrate on the main parser code. Brush up on your Extended Backus Naur Form (EBNF) and you'll see that everything looks similar. At this point, it all starts to make sense. ConclusionsThe History9 Oct 04: Version 1 released. | ||||||||||||||||||||