While reading the article on lex and yacc (see [5]) that was published a few days ago on CP, I came across a thread that mentioned Spirit. I must say I was happy to have read this thread...

Introduction

Spirit (see [1]) is a parser generator framework. It has been part of Boost (see [2]) since release 1.30. (Boost is a collection of libraries that extend the STL, a must for any C++ developer.) In this article, I will try to give a short idea of what Spirit is capable of. If I succeed, you will never approach the problem of string parsing the same way again. Moreover, I strongly recommend that readers interested in Spirit jump to the documentation page (see [1]), as it is complete, highly pedagogical and really, really well done.

Finally, I do not pretend to be a Spirit specialist, nor a grammar theorist. In fact, while playing with parsing in my previous article (see [4]), I was looking for new solutions and came across Spirit, which made me think that everything I had written before could be thrown away.

A mini but very impressive example

Before getting into any details, let me give you a good reason to continue reading this article. Consider that you want to parse a string of comma-separated real numbers and store them in a vector:

    std::string str;       // the string to parse
    std::vector<double> v; // the vector to fill

A good old C approach would tokenize the string and convert each number by hand. With Spirit, it comes down to this:

    // defining the parser rule
    // (phrase_scanner_t is required because parse() is called with a skip parser;
    //  a plain rule<> only works for character-level parsing)
    rule<phrase_scanner_t> r = real_p[append(v)] >> *(',' >> real_p[append(v)]);
    // parsing the string with the rule r
    parse(str.c_str(), r, space_p);

Pretty impressive. Personally, I was really amazed by that snippet. Let's analyse the rule:

- real_p matches a real number.
- [append(v)] is a semantic action attached to real_p: each time a number is matched, it is appended to v.
- >> means "followed by" (a sequence).
- * means "zero or more times" and applies to the parenthesized expression that follows it.
- ',' matches a comma.
- space_p, passed as the third argument of parse, is the skip parser: white space between the elements is ignored.
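Great, simply great: two lines of code replace the hand-written parsing code.

Parsers and Spirit

First, let's do our homework on parsers. To do that, we can begin by analysing the sentence at the very beginning of the Spirit documentation:

"Spirit is an object-oriented recursive-descent parser generator framework implemented using template meta-programming techniques. Expression templates allow us to approximate the syntax of Extended Backus-Normal Form (EBNF) completely in C++."

Sounds pretty scary... What does this sentence mean exactly?

- Object-oriented: parsers are C++ objects that can be combined into bigger parsers.
- Recursive-descent: a top-down parsing technique in which each grammar rule becomes a function that calls the functions of the rules it refers to, possibly recursively.
- Parser generator: you describe the grammar and the parser is built for you. Unlike yacc, there is no separate generation step; the grammar is written directly in C++.
- Template meta-programming and expression templates: operator overloading and templates let an expression such as a >> *b build the corresponding parser at compile time.
- EBNF: a notation for describing grammars, which Spirit mimics with C++ operators (>> for a sequence, | for an alternative, * for a repetition).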
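To make the "recursive-descent" part of that sentence more concrete, here is a minimal hand-written sketch of the comma-separated list grammar from the mini example, in plain C++ without Spirit. The function names (skip_spaces, parse_real, parse_real_list) are hypothetical, and strtod is only an approximation of real_p (it also accepts inputs such as "inf" or hexadecimal floats). Each grammar rule becomes one function, which is roughly what Spirit assembles for you from the rule expression.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper names; none of this is Spirit API.

// Skips white space, playing the role of the space_p skip parser.
static void skip_spaces(const char*& p) {
    while (std::isspace(static_cast<unsigned char>(*p))) ++p;
}

// real := a floating point number (conversion delegated to strtod).
// On success, appends the value to v and advances p past the number.
static bool parse_real(const char*& p, std::vector<double>& v) {
    skip_spaces(p);
    char* end = nullptr;
    const double d = std::strtod(p, &end);
    if (end == p) return false;  // nothing consumed: not a number
    v.push_back(d);              // the "semantic action"
    p = end;
    return true;
}

// list := real (',' real)*
// Returns true only if the whole input matches the grammar.
bool parse_real_list(const std::string& str, std::vector<double>& v) {
    const char* p = str.c_str();
    if (!parse_real(p, v)) return false;
    for (;;) {
        skip_spaces(p);
        if (*p != ',') break;    // end of the repetition
        ++p;
        if (!parse_real(p, v)) return false;
    }
    skip_spaces(p);
    return *p == '\0';
}
```

Calling parse_real_list("1, 2.5 ,-3e2", v) fills v with three values. As with semantic actions, v may already contain some numbers when the function reports a failure, so the caller should discard it in that case.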
Spirit is that and more: it is extremely well documented, full of working examples, and already contains parsers for C, C++, Pascal, XML, etc. If you have not switched to the Spirit home page yet (see [7]), it's time for an exercise: yet another command line parser.

Yet Another Command Line Parser

The Internet is full of command line parsers; they are usually coded in C and "hand made". We can use the power of Spirit to build a robust command line parser that stores the key-value pairs in a map.

Command line description

Let's look at a classic command line:

    some_command -key1=value1 -key2="this is the value 2"

The characteristics of the above are:

- The line starts with the command name.
- The command name is followed by zero or more key-value pairs.
- Each key is introduced by a tag character ('-' in the example).
- The key and its value are separated by '='.
- The value is either a single word or a string enclosed in double quotes, which may contain spaces.
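Key - Value parser

Let's begin to build the rules used in a key-value parser:

- equal matches the '=' separator.
- key_tag matches the character that introduces a key, either '-' or '/'.
- key is a key_tag followed by one or more alphanumeric characters, which are stored as the key.
- value is either a string enclosed in double quotes or one or more alphanumeric characters; in both cases the text is stored as the value.
- key_value is a key, followed by equal, followed by a value.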
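Before these rules are wrapped into a Spirit grammar, it may help to see what they have to accept. The following is a hand-written sketch in plain C++ (no Spirit); the function names are hypothetical, and it simplifies two things: escape sequences inside quotes are not interpreted, and no white space is skipped.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper names; this is plain C++, not Spirit.

// One or more alphanumeric characters starting at pos.
static bool parse_alnum_run(const std::string& s, std::size_t& pos,
                            std::string& out) {
    const std::size_t start = pos;
    while (pos < s.size() &&
           std::isalnum(static_cast<unsigned char>(s[pos]))) ++pos;
    if (pos == start) return false;
    out.assign(s, start, pos - start);
    return true;
}

// value := '"' one or more characters '"' | one or more alnum characters
// Simplification: escape sequences inside the quotes are not handled.
static bool parse_value(const std::string& s, std::size_t& pos,
                        std::string& out) {
    if (pos < s.size() && s[pos] == '"') {
        const std::size_t close = s.find('"', pos + 1);
        if (close == std::string::npos || close == pos + 1) return false;
        out.assign(s, pos + 1, close - pos - 1);
        pos = close + 1;
        return true;
    }
    return parse_alnum_run(s, pos, out);
}

// key_value := key_tag key '=' value, with key_tag := '-' | '/'
// On success, pos is left just past the value; on failure it is unchanged.
bool parse_key_value(const std::string& s, std::size_t& pos,
                     std::string& key, std::string& value) {
    std::size_t p = pos;
    if (p >= s.size() || (s[p] != '-' && s[p] != '/')) return false;
    ++p;
    if (!parse_alnum_run(s, p, key)) return false;
    if (p >= s.size() || s[p] != '=') return false;
    ++p;
    if (!parse_value(s, p, value)) return false;
    pos = p;
    return true;
}
```

Even for such a small grammar, the hand-written version needs explicit bounds checks and position bookkeeping at every step, which is the kind of code the Spirit rules are meant to replace.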
Now that we have those rules, we can put them together in a grammar:

    struct keyvalue_grammar : public grammar<keyvalue_grammar>
    {
        // Constructor, takes two references to strings:
        // str_key and str_val will hold the parse result
        keyvalue_grammar(std::string& str_key_, std::string& str_val_)
            : str_key(str_key_), str_val(str_val_) {}

        template <typename ScannerT>
        struct definition
        {
            definition(keyvalue_grammar const& self)
            {
                equal = ch_p('=');
                key_tag = ch_p('-') | ch_p('/');
                key = key_tag >> (+alnum_p)[ assign_string(self.str_key) ];
                value = ( confix_p( '"', (+c_escape_ch_p)[ assign_string(self.str_val) ], '"' )
                        | (+alnum_p)[ assign_string(self.str_val) ] );
                key_value = key >> equal >> value;
            }

            rule<ScannerT> key_tag, key, equal, value, key_value;
            rule<ScannerT> const& start() const { return key_value; }
        };

        std::string& str_key;
        std::string& str_val;
    };

Key value adder functor

We need a functor to add the key-value pairs to the map. The functor needs to provide operator()(IteratorT first, IteratorT last). When the grammar matches a key-value pair, the parser calls this method:

    template <typename keyvalue_container>
    class add_keyvalue_pair
    {
    public:
        // key_ and val_ should refer to the strings modified by keyvalue_grammar
        // kvc_ is the map of key-values
        add_keyvalue_pair(keyvalue_container& kvc_, std::string& key_, std::string& val_)
            : key(key_), val(val_), kvc(kvc_) {}

        // the method called by the parser
        template <typename IteratorT>
        void operator()(IteratorT first, IteratorT last) const
        {
            // typename is required because value_type is a dependent type
            kvc.insert( typename keyvalue_container::value_type(key, val) );
        }

    private:
        std::string& key;
        std::string& val;
        keyvalue_container& kvc;
    };

The parser

We are almost done. The rules to parse the command line are:

- command is one or more alphanumeric characters, stored as the application name.
- line is a command followed by zero or more key_value pairs; each matched pair is added to the container.
The full definition of the grammar is here:

    template <typename keyvalue_container>
    struct cmdline_grammar : public grammar< cmdline_grammar<keyvalue_container> >
    {
        // Constructor
        // kvc_ is a key-value container
        // str_command_ will hold the application name
        cmdline_grammar(keyvalue_container& kvc_, std::string& str_command_)
            : kvc(kvc_), str_command(str_command_) {}

        template <typename ScannerT>
        struct definition
        {
            definition(cmdline_grammar<keyvalue_container> const& self)
                : key_value(key, value)
            {
                command = (+alnum_p)[ assign_string(self.str_command) ];
                line = command
                    >> *( key_value[ add_keyvalue_pair<keyvalue_container>(self.kvc, key, value) ] );
            }

            rule<ScannerT> command, line;
            keyvalue_grammar key_value;
            std::string key;
            std::string value;

            rule<ScannerT> const& start() const { return line; }
        };

        keyvalue_container& kvc;
        std::string& str_command;
    };

As the code above shows, the key and value strings are shared: keyvalue_grammar writes the parsed text into them, and add_keyvalue_pair reads them when it inserts the pair into the container.

Encapsulating all and making the parse call

You can encapsulate the grammar in another class that hides the details. For example, the following method parses a string and returns true if the whole string was parsed successfully:

    bool parse(const std::string& str)
    {
        kvc.clear();
        cmdline_grammar<keyvalue_container> cmdline_parser(kvc, command);
        parse_info<> info = boost::spirit::parse(
            str.c_str(), cmdline_parser, boost::spirit::space_p );
        return info.full;
    }

The full source of the example is available in the downloads. Otherwise, Spirit contains numerous other examples.

Conclusions

As promised, parsing strings should no longer be done with old-fashioned, error-prone hand-written code.

History
Reference