Click here to Skip to main content
13,195,170 members (70,547 online)
Click here to Skip to main content
Add your own
alternative version


25 bookmarked
Posted 27 Jun 2005

The biscuit parser library

, 5 Jul 2005
Rate this:
Please Sign up or sign in to vote.
An object-oriented recursive-descent parser generator framework implemented using class templates.


I was looking for a light and un-strict XML parser. Boost.Spirit and Boost.Xpressive were too big and their performance at compile-time was not good. I suspected that they were not static, but YARD written by Christopher Diggins was really static, small and fast. In the meantime, I noticed that the lazy instantiation of templates could allow us to write recursive grammars, and I found that I could bind YARD and the finite state machine found at C++ Template Metaprogramming. It was named biscuit.


biscuit is an object-oriented recursive-descent parser generator framework implemented using class templates. The templates allow us to author Extended Backus-Normal Form (EBNF) in C++. Technical information about the same is available at YARD.

A simple EBNF grammar snippet:

group      ::= '(' expression ')'
factor     ::= integer | group
term       ::= factor (('*' factor) | ('/' factor))*
expression ::= term (('+' term) | ('-' term))*

is approximated using biscuit's facilities as seen in this code snippet:

struct expression ;
struct group      : seq< str<'('>, expression, str<')'> > { };
struct factor     : or_< integer, group > { };
struct term       : seq< factor, star< or_< seq< str<'*'>, 
                    factor >, seq< str<'/'>, factor > > > > { };
struct expression : seq< term, star< or_< seq< str<'+'>, 
                    term >, seq< str<'-'>, term > > > > { };

Through the magic of the lazy template instantiation, they are the perfect valid types. The production rule expression is in fact a type that has a static member function parse. As parse will be instantiated later by algorithms, all you have to do is not to define but to declare a type:

struct element; // forward declaration

struct content :
          element, // the magical recursion!
{ };

struct element :
    seq<STag, content, ETag>
{ };

Quick start

  1. Get and install the latest release of Boost C++ Libraries. (biscuit uses only their headers.)
  2. Include headers of biscuit:
    #include "biscuit/algorithm.hpp"
    #include "biscuit/parser.hpp"
    using namespace biscuit;
  3. Define your own parser type:
    typedef seq<
      star_until< any, str<'*','/'> >
    > c_comment;
  4. Call algorithms:
    if (match<c_comment>("/* hello, biscuit */")) {
  5. If you build the test project, it is required to build Boost.Test:
    bjam -sTOOLS=vc-7_1 --with-test install

Basic concepts


A parser is any type that has a static member function:

template< class State, class UserState >
static bool parse(State& s, UserState& us);

As a parser is a type, it can't have any runtime-state. You can pass any UserState object to the algorithm and the object is passed to parse.

User state

Any type.

Forward range

A Forward Range is a concept similar to STL Container. More on this concept is available at Boost.Range.

Semantic action class

A semantic action class is any class of Function Object that has a member function:

void operator()(ForwardIter first, ForwardIter last, UserState& us);

Predefined parsers

Some parser templates are predefined as a means for parser composition and embedding.


The table below lists EBNF and their equivalents in biscuit.

EBNF (or PERL)biscuitMeaning
.anyany object
A | Bor_<A, B>alternation of A and B
A Bseq<A, B>sequence of A and B
A*star<A>zero or more times, greedy
A+plus<A>one or more times, greedy
A?opt<A>zero or one time, greedy
A - Bminus<A, B>matches A, but the sub-match of A doesn't match B
A{n,m}repeat<A, n, m>between n and m times, greedy
A*? Bstar_until<A, B>zero or more As and B
^beginbeginning of the sequence
$endend of the sequence
\neolend of line
\ddigita digit
\Dnot_<digit>not a digit
\sspacea space
\Snot_<space>not a space
[0-9]char_range<'0','9'>characters in the range '0' through '9'
[abc]char_set<'a','b','c'>characters 'a','b', or 'c'
[0-9abc]or_< char_range<'0','9'>, char_set<'a','b','c'> >characters 'a','b','c' or in the range '0' though '9'
[^abc]not_< char_set<'a','b','c'> >not characters 'a','b', or 'c'
(?=A)before<A>positive look-ahead assertion
(?!A)not_< before<A> >negative look-ahead assertion

YARD and biscuit have no back-tracking on star operations. The maximum supported arity of parsers is now twenty.


actor is a parser that triggers a semantic action class object:

struct decorate_action
  template< class ForwardIter, class Stream >
  void operator()(ForwardIter first, ForwardIter last, 
                                          Stream& out)
    out << "[";
    out << std::string(first, last);
    out << "]";

struct xml_comment :
        minus< any, str<'-'> >,
          minus< any, str<'-'> >
{ };

struct xml_comment_action : actor< xml_comment, 
                                    decorate_action >
{ };

std::stringstream out;
std::string s0("<!-- xml comment no.1 -->");
match<xml_comment_action>(s0, out);
BOOST_CHECK( std::string("[<!-- xml comment no.1 -->]") == 
                                               out.str() );

You can pass a semantic action class to an actor, but not a function pointer. This trouble is fixed by the grammar below.


Directives are also parsers. Some ports of Boost.Spirit's directives:

lexeme_d[A]impossibleturn off white space skipping.
as_lower_d[A]as_lower<A>convert inputs to lower-case.
no_actions[A]no_actions<A>all semantic actions not to fire.
???definitive_actions<A>parse twice and suppress non-intended actions.
longest_d[A|B]longest<A, B>choose the longest match.
shortest_d[A|B]shortest<A, B>choose the shortest match.
limit_d[A]requires<A, PredicateClass>ensure that the result of a parser is constrained.
???transform<A, FunctorClass>convert inputs using functor.


Algorithms of biscuit work with Forward Range. Bear in mind that parsers don't know the value_type of the range. For instance, a parser str works fine, if the value_type of the range is comparable with char.


match returns true if a parser runs through the range; otherwise false:

BOOST_CHECK( match<xml_comment>("<!-- hello, xml comment -->"));
BOOST_CHECK( !match<xml_comment>
         ("<!-- not well-formed comment -- -->") );


search returns the first sub matched boost::iterator_range; otherwise an empty range:

std::string s0("  /* c comment no.1 */x int i; "
               "/* c comment no.2 */ i = 1; "
               "/* c comment no.3 */ ++i;  ");
boost::sub_range<std::string> sr = search<c_comment>(s0);
BOOST_CHECK( std::string(boost::begin(sr), 
    boost::end(sr)) == "/* c comment no.1 */" );



filter_range is a filtered boost::iterator_range by a parser:

typedef filter_range< c_comment, std::string > fr_t;
std::string s0("  /* c comment no.1 */ int i; "
               "/* c comment no.2 */ i = 1; "
               "/* c comment no.3 */ ++i;  ");
fr_t fr(s0);
BOOST_CHECK(( match< repeat< c_comment, 3 > >(fr) ));

There is no reason why chains of filter_range cannot work:

  match< str<'x','y','z'> >(
    make_filter_range< alpha >(
      make_filter_range< not_<space> >(
        make_filter_range< not_<digit> >
                     ("x & 4 y . 125 %  z")


match_results is a Forward Range of boost::iterator_range:

typedef match_results<c_comment, std::string> mrs_t;
typedef boost::range_const_iterator<mrs_t>::type iter_t;

std::string s0("  /* c comment no.1 */int i; "
               "/* c comment no.2 */i = 1; "
               "/* c comment no.3 */ ++i;  ");
mrs_t mrs(s0);
for (iter_t it = boost::const_begin(mrs); 
               it != boost::const_end(mrs); ++it)
  std::cout << std::string(boost::begin(*it), 
                boost::end(*it)) << std::endl;


/* c comment no.1 */
/* c comment no.2 */
/* c comment no.3 */


As parsers are just types, they have no runtime-state. Non-type template argument is fairly limited. What should we do, if the value_type of the Forward Range is not an Integral Constant like char? C++ Template Metaprogramming says that member function pointers can bind templates and objects. Boost.Spirit makes an expression templates object from expression templates objects, but you can make an expression type from expression templates using biscuit.

grammar binds parsers and objects:

struct my_grammar : grammar< my_grammar, 
                  std::vector<std::string> >
  std::string text0() { return "hello"; };
  std::string text1() { return "grammar"; };
  std::string text2() { return "value"; };

  struct start :
  { };

void grm_value_test()
  std::cout << "grm_value_test ing..." << std::endl;

  std::vector<std::string> texts;
  my_grammar the_grammar;
  BOOST_CHECK( match<typename grammar_start<my_grammar>::type >
                                          (texts, the_grammar));

biscuit has no limitation on the value_type of Forward Range parsed. As std::vector<std::string> is a Forward Range of std::string, it works. Keep in mind that the UserState object is now your grammar object.

Full example

Here is a port of Boost.Spirit's calculator:

struct calculator : grammar< calculator, std::string >
  void do_int(iterator_type str, iterator_type end)
    std::string s(str, end);
    std::cout << "PUSH(" << s << ')' << std::endl;

  void do_add(iterator_type, iterator_type)   
      { std::cout << "ADD\n"; }
  void do_subt(iterator_type, iterator_type)  
      { std::cout << "SUBTRACT\n"; }
  void do_mult(iterator_type, iterator_type)  
      { std::cout << "MULTIPLY\n"; }
  void do_div(iterator_type, iterator_type)   
      { std::cout << "DIVIDE\n"; }
  void do_neg(iterator_type, iterator_type)   
      { std::cout << "NEGATE\n"; }

  struct expression;
  struct term;
  struct factor;

  struct expression :
          actor_< seq< str<'+'>, term >, 
                       &calculator::do_add >,
          actor_< seq< str<'-'>, term >, 
                       &calculator::do_subt >
  { };
  struct term :
          actor_< seq< str<'*'>, factor >, 
                        &calculator::do_mult >,
          actor_< seq< str<'/'>, factor >, 
                        &calculator::do_div >
  { };

  struct factor :
      actor_< plus<digit>, &calculator::do_int >,
      seq< str<'('>, expression, str<')'> >,
      actor_< seq< str<'-'>, factor >, 
                           &calculator::do_neg >,
      seq< str<'+'>, factor >
  { };

  struct start : expression { };

actor_ makes an actor from the member function pointer. Enjoy the simplicity, compile-time performance and the small-size of the executable.


biscuit emulates Boost.Spirit's debugging:

struct start_tag { };
struct integer_tag { };
struct factor_tag { };
struct term_tag { };
struct expression_tag { };

template< class ForwardRange >
struct calculator_debug :
  grammar< calculator_debug, ForwardRange >, 
                             private boost::noncopyable
  calculator_debug(std::stack<long>& eval_) : 
                                     eval(eval_) { }

  void push_int(iterator_type first, iterator_type last)
    std::string s(first, last);
    // std::strtol(str, 0, 10);
    long n = boost::lexical_cast<long>(s); 
    std::cout << "push\t" << long(n) << std::endl;

  // ...

  struct integer;
  struct factor;
  struct term;
  struct expression;

  // struct start : debugger<start,
  //  also ok, but long...
  struct start : debugger<start_tag, 
  { };

  struct integer : debugger<integer_tag,
              actor_< plus<digit>, 
              &calculator_debug::push_int >
  { };
  // ...

debugger uses the type-name of the first argument for outputs. If your grammar is a class template like the above, type-name can be very long. I think you want to define start_tag etc. Well, the debugger automatically disappears on release-compile.


1 + 2
struct start_tag: "1+2"
  struct expression_tag: "1+2"
    struct term_tag: "1+2"
      struct factor_tag: "1+2"
        struct integer_tag: "1+2"
push    1
        /struct integer_tag: "+2"
      /struct factor_tag: "+2"
    /struct term_tag: "+2"
    struct term_tag: "2"
      struct factor_tag: "2"
        struct integer_tag: "2"
push    2
        /struct integer_tag: ""
      /struct factor_tag: ""
    /struct term_tag: ""
popped 1 and 2 from the stack. pushing 3 onto the stack.
  /struct expression_tag: ""
/struct start_tag: ""
Parsing succeeded
result = 3

Points of interest

You can find more on composing inlined algorithms at Boost.MPL TODO list. YARD and biscuit are examples of it. By the way, this article was actually a hopeless war vs. Boost.Spirit. But don't you think another war will break out?

A snippet:

using namespace cranberry;

int x = 7;
int a = apply< plus< _1, _2 > >(x)(8);
int b = apply< plus< _1, _2 > >()(7, 8);
int c = apply< plus< _1, _2 > >(7, 8)();
int d = apply< plus< _1, int_<8> > >(7)();
int e = apply< bind< _1, _2, _3 > >
                  (std::plus<int>())(7, 8);
int f = apply< bind< _1, _2, _3 > >
                  (&free_plus)(7, 8);

int ar[5] = {1,2,3,4,5};
std::transform(ar, ar+5, ar, 
   apply< plus< _1, _2 >, int const& >(x));
std::transform(ar, ar+5, ar, 
   apply< bind< _1, _2, _3 > >(std::plus<int>(), x));
std::transform(ar, ar+5, ar, 
   apply< bind< _1, _2, int_<7> > >(&free_plus));
std::for_each(ar, ar+5, apply< std_cout< _1 > >());


Release notes

  • Fixed the name confusion between limit and repeat.
  • Directives became first-class parsers.
  • Added debugger parser.


This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here


About the Author

Japan Japan
I am worried about my poor English...

You may also be interested in...


Comments and Discussions

QuestionHow can we generate a Tree from the parsing ? Pin
Nockawa13-Aug-07 4:55
memberNockawa13-Aug-07 4:55 
AnswerRe: How can we generate a Tree from the parsing ? Pin
mb2sync13-Aug-07 12:51
membermb2sync13-Aug-07 12:51 
GeneralRe: How can we generate a Tree from the parsing ? Pin
Nockawa13-Aug-07 17:55
memberNockawa13-Aug-07 17:55 
QuestionWhich version of Boost Library ??? Pin
miaoux19-Jul-05 4:50
membermiaoux19-Jul-05 4:50 
AnswerRe: Which version of Boost Library ??? Pin
mb2sync19-Jul-05 9:25
membermb2sync19-Jul-05 9:25 
GeneralRe: Which version of Boost Library ??? Pin
miaoux19-Jul-05 21:19
membermiaoux19-Jul-05 21:19 
GeneralRe: Which version of Boost Library ??? Pin
mb2sync19-Jul-05 22:09
membermb2sync19-Jul-05 22:09 
GeneralRe: Which version of Boost Library ??? Pin
miaoux19-Jul-05 22:35
membermiaoux19-Jul-05 22:35 
GeneralRe: Which version of Boost Library ??? Pin
mb2sync11-Oct-05 2:39
membermb2sync11-Oct-05 2:39 
GeneralToo complicated Pin
CP Visitor4-Jul-05 23:43
memberCP Visitor4-Jul-05 23:43 
GeneralRe: Too complicated Pin
mb2sync5-Jul-05 4:21
membermb2sync5-Jul-05 4:21 
GeneralRe: Too complicated Pin
Roger Jane5-Jul-05 4:26
memberRoger Jane5-Jul-05 4:26 
Grammatical parsing isn't something you get on a plate (however much the 'biscuits on a plate' phrase might sound attractive). When you have a scheme such as this that bullies the compiler into doing a load of work for you, you have a pay a penalty. I'm sure you'll get used to the kind of messages the compiler will throw - just take it a step at a time and think yourself lucky you're not parsing the grammar from scratch...!

QuestionHow is the parser runtime speed? Pin
Christopher Diggins2-Jul-05 18:17
memberChristopher Diggins2-Jul-05 18:17 
AnswerRe: How is the parser runtime speed? Pin
mb2sync2-Jul-05 23:08
membermb2sync2-Jul-05 23:08 
GeneralRe: How is the parser runtime speed? Pin
fuckithollyshit11-Jul-05 0:06
sussfuckithollyshit11-Jul-05 0:06 
GeneralRe: How is the parser runtime speed? Pin
mb2sync11-Jul-05 3:18
membermb2sync11-Jul-05 3:18 
GeneralExcellent work Pin
Christopher Diggins29-Jun-05 19:09
memberChristopher Diggins29-Jun-05 19:09 
GeneralRe: Excellent work Pin
mb2sync30-Jun-05 2:35
membermb2sync30-Jun-05 2:35 
GeneralRe: Excellent work Pin
Christopher Diggins2-Jul-05 8:25
memberChristopher Diggins2-Jul-05 8:25 
GeneralRe: Excellent work Pin
mb2sync2-Jul-05 17:07
membermb2sync2-Jul-05 17:07 
GeneralRight step, right direction, too much files Pin
Martin.Holzherr28-Jun-05 4:53
memberMartin.Holzherr28-Jun-05 4:53 
GeneralRe: Right step, right direction, too much files Pin
mb2sync28-Jun-05 17:03
membermb2sync28-Jun-05 17:03 
Generalspirit Pin
Goran Mitrovic28-Jun-05 0:31
memberGoran Mitrovic28-Jun-05 0:31 
GeneralRe: spirit Pin
mb2sync28-Jun-05 2:49
membermb2sync28-Jun-05 2:49 
GeneralQuestion about not well-formed comment Pin
Jurgen Eidt27-Jun-05 17:01
memberJurgen Eidt27-Jun-05 17:01 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Permalink | Advertise | Privacy | Terms of Use | Mobile
Web03 | 2.8.171019.1 | Last Updated 5 Jul 2005
Article Copyright 2005 by mb2sync
Everything else Copyright © CodeProject, 1999-2017
Layout: fixed | fluid