Click here to Skip to main content
Click here to Skip to main content

Spart, a parser generator framework 100% C#

By , 22 Dec 2003
 

Sample image

Note: If you like this tool please remember to vote, as this will get other people interested and is more likely to help improve the final product
-- ToDoList organizer, Dan.G.

Introduction

The Spart library an object oriented recursive descent parser generator framework implemented in C#. In fact, it is a partial port of the excellent Spirit library, which is written in C++ and uses meta-programming.

The Spart framework enables a target grammar to be written exclusively in C#. An EBNF grammar can be closely match using C# code. In retrospect, conventional compiler-compilers or parser-generators have to perform an additional translation step from the source EBNF code to C or C++ code.

I have takened the liberty to use the structure (and some sentence) of the Spirit documentation. Along the article, some notes are added regarding some issues about the port to C#: Spirit-2-Spart Notes (SSN).

As always, this article presents an overview of the library. For deeper details, please refer to the NDoc documentation. The library also comes with a battery of NUnit tests.

Quick Start

Spart is designed to bring you parser capabilites quickly directly into your code. While it is not suited for creating parsers for entire language like C,C++, it is very effective for building micro-grammars in your code.

When you need to build a new parser, there are existing solution: a combination of ....Parse (like int.Parse) calls, or using regular expression (Regexp class) or a combination of both. However, these tools do not scale well when attempting to write more complex parsers: maintenance and readability become ackward.

So, as for Spirit, one of the main objective of Spart is to let you build easily grammar in C#. To fix this ideas, a few simple grammars illustrate Spart usage:

Trivial example #1:

Create a parser that will parse a digit:

Prims.Digit

(This is a trivial case, Char.IsDigit already does that). Prims is a static class, Digit is a property that create a new parser for digits. In fact, Prims is a helper class that creates primitive parsers for you and hides implementation details.

SSN: This parser is equivalent to num_p.

Trivial example #2:

Create a parser that will parse a sequence of two digits

Ops.Seq( Prims.Digit, Prims.Digit )

Here you see the familiar Prims.Digit parser enclosed in a Ops.Seq call. Like Prims, Ops is a static helper class that creates combine parsers for you. The Seq method creates a parser that is a sequence of two parsers (>> in Spirit):

Ops.Seq(a,b) <=> match a and b in sequence

Note: when we combine parsers, we end up with a "bigger" parser, But it's still a parser. Parsers can get bigger and bigger, nesting more and more, but whenever you glue two parsers together, you end up with one bigger parser. This is an important concept.

SSN: The operator >> does not accept arbitrary operands, they must be of integral type which restrict it's use.

Trivial Example #3

Create a parser that will accept an arbitrary number of digit. (Arbitrary means anything from zero to infinity).

Ops.Star(Prims.Digit)

This is like the regular expression Kleene Star.

SSN: * cannot be an unary operator in C#.

Less trivial example #4

Create a parser that parse a sequence of comma separated digits and record them in a collection (note this can easily be done using String.Split).

// spirit: num_p >> *( ch_p(',') >> num_p)
Ops.Seq( Prims.Digit, Ops.Start( Ops.Seq(Prims.Ch(','), Prims.Digit)))

Note that Prims.Ch('x') will math the character x. Prims.Ch(',') will match a comma. In this case, the Kleene star matches a more complex expression, namely

Ops.Seq(Prims.Ch(','), Prims.Digit)

which matches a sequence of comma and digit.

Note: the Ops.Sep(Prims.Ch(','),... expression can be simplified to Ops.Seq(',', ... to simplify notations.

Using the parser

Once we have built our parser, we want to use it. A parser can be used directly as is:

// string to parser
String s = "1,2,3,4";
// creating the parser as above
Parser p = Ops.Seq(...);
// creating a scanner of the string
StringScanner scan = new StringScanner(s);
// parsing the string through the scanner
ParserMatch m = p.Parse(s);

Notes:

  • Parser is an abstract base class for all parsers. It contains the Parse method that can be used to parse some input.
  • The parser does not work directly on string but rather on some modified stream (scanners). Therefore, it is possible to parse directly from files or stream. StringScanner implements the scanner interface and wraps the string s.
  • ParserMatch is the parser result (see below)

Now that we have parsed the text, the ParserMatch object can help answer questions like: was the match successful, what was the match value, etc... :

if (m.Success)
    Console.Write("successfull match!");

Semantic Actions

Our parser above is nothing but a recognizer, it does no take any actions. It answers "did our data match the grammar?" but it does not record anything. Remember that we wanted to record the digits into a collection. For example, whenever we parse a digit, we wish to store the parsed number after a successful match. We now wish to extract information from the parser. Semantic actions do this. Semantic actions may be attached to any point in the grammar specification. They through events and event handler.

The Parser class has an event, Act, that is called on a successful match. We need to add a event handler on the Prims.Digit parser that records the digit into a collection. First, we write an actor that will record the digits

// A digit recorder actor
public class MyActor
{
    ArrayList digits; // digits collection
    ...
    public void RecordDigit(Object sender, ActionEventArgs args)
    {
        // record the digit into the digits collection
        digits.Add( args.TypedValue );
    }
}

Then, we add a digit recorder handler to each digit parser:

MyActor a = new MyActor();

// digit parser
Parser d = Prims.Digit;
// register actor
d.Act += new ActionEventHandler( a.RecordDigit );

// create parser
Parser p = Ops.Seq( d, Ops.Start( Ops.Seq(',', d)));

This is the same parser as above but now, MyParser.RecordDigit is called on each successfull digit match and therefore, the collection is filled.

Basic concepts

Spart follows the concepts of Spirit. There are a few fundamental concepts that need to be understood well: 1) The Parser, 2) the Match, 3) The Scanner, and 4) Semantic Actions. These basic concepts interact with each other, and the functionalities of each interweave throughout the entire framework to make it one coherent whole.

I will go quickly over those concepts since they are very well explained in the Spirit documentation and I recommend you take a look there first.

The parser

Central to the framework is the parser. The parser does the actual work of recognizing a linear input stream of data read sequentially from start to end by the scanner. The parser attempts to match the input following a well-defined set of specifications in the form of grammar rules. The parser reports the success or failure to its client through a match object. When successful, the parser calls a client-supplied semantic action. Finally, the strategically-placed semantic action extracts structural information depending on the data passed to it by the parser and the heirarchical context of the parser it is attached to.

Parsers come in a lot of flavors and usually you don't need to write your own parser. Spart has a collection of built-in parsers that you can combine to create your grammars. The built-in parsers come in two (main) flavors:

Primitives

Primitive parsers can be used to match characters, string, lower case character, digits, etc... The Prims class can be used to create such parsers.

Combination

Combination parsers can be used to combine parsers, like sequence and star in the example. The Ops class can be used to create such parsers.

The match

The ParserMatch class describes the parser match.

The Scanner

Like the parser, the scanner is also an abstract concept, represented by the IScanner interface. The task of the scanner is to feed the sequential input data stream to the parser. The scanner is of an input source and a cursor. The cursor is moved along by the parsers. Parsers extract data from the scanner and position the iterator appropriately through its member functions.

Semantic actions

A composite parser forms a hierarchy. Parsing proceeds from the topmost parent parser which delegates and apportions the parsing task to its children recursively to its childeren's children and so on until a primitive is reached. By attaching semantic actions to various points in this hierachy, in effect we can transform the flat linear input stream into a structured representation. This is essentially what parsers do.

The Rule

The Rule class represents a non-terminal parser. Basically, it is a wrapper around another parser. This aspect will be illustrated in the example below.

The Classic Calculator Example

There is still a lot to say about Spirit and Spart but I will cut to a final example. A better documentation should appear in a near future as this library is totally new!

The favorite grammar example in the Spirit documentation is a calculator grammar:

  • EBNF:
    group       ::= '(' expression ')'
    factor      ::= integer | group
    term        ::= factor (('*' factor) | ('/' factor))*
    expression  ::= term (('+' term) | ('-' term))*
  • Spirit (C++):
    group       = '(' >> expression >> ')';
    factor      = integer | group;
    term        = factor >> *(('*' >> factor) | ('/' >> factor));
    expression  = term >> *(('+' >> term) | ('-' >> term));

Let us now show how to build this grammar with Spart (you can find in the demo applciation). Passing aside some initialization details, the C# code would look as follows:

group.Parser       = Ops.Seq('(',Ops.Seq(expression,')'));
factor.Parser      = integer | group;
term.Parser        = Ops.Seq( factor, Ops.Klenee( 
                        Ops.Seq('*',factor) | Ops.Seq('/',factor) ));
expression.Parser  = Ops.Seq(term,Ops.Klenee(Ops.Seq('+',term) | 
                        Ops.Seq('-',term) ))

In the above, group, factor, term and expression are Rule object that have been previously initliazed:

// declaration 
Rule group; 
...
group = new Rule();

Note also, that the | operator can be overriden in C#, therefore we use it to create an alternative parser.

Debugging

It can be usefull to trace the scanner state and the parser matches. Similarly to Spirit, you can attach a tracer that will output a lot of interresting info:

// declaration in the Calculator class
Debugger debug;
...
// in the constructor
// create a debugger that outputs to the console
debug = new Debugger( Console.Out ); 

// setting rules name for debugging purpose
group.ID = "group"; 
...

// tracing rules
debug += group;
debug += expression;
...

The output is formated as follows:

  • the output is indented to reflect the parser tree,
  • parser name: ... reflects the start of the parsing and the current scanner input
  • / parser name: ... reflects that the parser has successfully mathed
  • # parser name: ... reflects that there was not mathch.

For example, when launching the parser on the string (5+2)*(4*2), it will output something like this:

expression: (5+2)*(4*2)
    term: (5+2)*(4*2)
        factor: (5+2)*(4*2)
            group: (5+2)*(4*2)
                expression: 5+2)*(4*2)
                    term: 5+2)*(4*2)
                        factor: 5+2)*(4*2)
                            group: 5+2)*(4*2
                            #group: 5+2)*(4*
                            integer: 5+2)*(4
                            /integer: +2)*(4
                        /factor: +2)*(4*2)
                    /term: +2)*(4*2)
                    term: 2)*(4*2)
                        factor: 2)*(4*2)
                            group: 2)*(4*2)
                            #group: 2)*(4*2)
                            integer: 2)*(4*2
                            /integer: )*(4*2
                        /factor: )*(4*2)
                    /term: )*(4*2)
                /expression: )*(4*2)
            /group: *(4*2)
        /factor: *(4*2)
        factor: (4*2)
            group: (4*2)
                expression: 4*2)
                    term: 4*2)
                        factor: 4*2)
                            group: 4*2)
                            #group: 4*2)
                            integer: 4*2)
                            /integer: *2)
                        /factor: *2)
                        factor: 2)
                            group: 2)
                            #group: 2)
                            integer: 2)
                            /integer: )
                        /factor: )
                    /term: )
                /expression: )
            /group:
        /factor:
    /term:
/expression:

ToDo List

There are still a lot of features from Spirit that are not implemented in Spart:

  • numeric parsers!
  • closures,
  • directives,
  • parsers: list_p, confix_p
  • dynamic rules,
  • etc...

References

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

About the Author

Jonathan de Halleux
Engineer
United States United States
Member
Jonathan de Halleux is Civil Engineer in Applied Mathematics. He finished his PhD in 2004 in the rainy country of Belgium. After 2 years in the Common Language Runtime (i.e. .net), he is now working at Microsoft Research on Pex (http://research.microsoft.com/pex).

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
GeneralCommentsussMårten R18 Dec '03 - 22:18 
After using boost/spirit rather extensively I have longed for something similar for C#.
 
And now we have it.
 
That's why I gave it a 5.
 
/Mårten
GeneralRe: CommentmemberJonathan de Halleux18 Dec '03 - 23:19 
Smile | :)
 
Want to contribute ?
 
Jonathan de Halleux.

www.dotnetwiki.org

GeneralURLTOTDmemberleppie18 Dec '03 - 7:49 
http://boost.sourceforget.net/ [^]
 
" Sabatier ChefStar 14-Piece Knife Block Set What a GREAT set of Knifes!
I consider myself a chef and when I was in school learning, carving and cutting, slicing and dicing, well all I can say is these knifes fit all my at home needs from cutting steak to slicing bread and even to cutting chicken into the parts I want it to be
"
 
leppie::AllocCPArticle("Zee blog");
Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.

GeneralRe: URLTOTDmemberJonathan de Halleux18 Dec '03 - 22:17 
Fixed !
 
Jonathan de Halleux.

www.dotnetwiki.org

GeneralRe: URLTOTDmemberleppie28 Dec '03 - 11:44 
Better recheck your website too, I noticed the hapy link pointed to sourceforget too Poke tongue | ;-P
 
leppie::AllocCPArticle("Zee blog");
Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.

GeneralCongratulations for platinum membership!memberxxxyyyzzz18 Dec '03 - 6:23 
Hihi!
GeneralRe: Congratulations for platinum membership!memberJonathan de Halleux18 Dec '03 - 7:08 
Not yet Smile | :) ... waiting to take out Champaign !
 
Jonathan de Halleux.

www.dotnetwiki.org

GeneralYet now! Hihi.memberxxxyyyzzz21 Dec '03 - 17:12 
I just envy you have so much free time!
GeneralRe: Yet now! Hihi.memberJonathan de Halleux21 Dec '03 - 21:45 
Do a PhD...
 
ps: your site is full a js error D'Oh! | :doh:
 
Jonathan de Halleux.

www.dotnetwiki.org

GeneralRe: Re: Yet now! Hihi.memberxxxyyyzzz23 Dec '03 - 11:32 
Yes.
If and only if I have time, I will reconstruct it.
But time is sparse.

GeneralLooks good :)memberleppie18 Dec '03 - 6:08 
I have been working on something similar, but not based on anything (in fact Relax NG looks about the closest). The Spirit parser I havent looked at though, but appears to be similar (seq, opt, choice, etc) Smile | :) . The main problem I'm current stuck at is, getting the damn selfhosting to work, as that was a main design goal... every shall change propegates thru to everything else, and currently I have to carefully change everything accordingly (painfull, but errors gets picked up quickly).
 
Anyways if you are interested, I can send you the source (with lexer(gen) and GUI(gen) too)...
 
Cheers Smile | :)
 
leppie::AllocCPArticle("Zee blog");
Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.

GeneralRe: Looks good :)memberJonathan de Halleux18 Dec '03 - 7:03 
Hi leppie,
 

leppie wrote:
getting the damn selfhosting to work, as that was a main design goal... every shall change propegates thru to everything else, and currently I have to carefully change everything accordingly (painfull, but errors gets picked up quickly).
 
I'm not sure I understand this? maybe an example ?
 

leppie wrote:
Anyways if you are interested, I can send you the source (with lexer(gen) and GUI(gen) too)...
 
Yes yes yes!!!! I'll find 4 hours more in my 35hours day Smile | :)
 
Jonathan de Halleux.

www.dotnetwiki.org

GeneralRe: Looks good :)memberleppie18 Dec '03 - 7:44 
Jonathan de Halleux wrote:
I'm not sure I understand this? maybe an example ?
 
Its a parser that can parse itself, like GCC , and Mono's C# parser, and producing the same output, whether it was compiled itself or by the initial process. Understand? Smile | :)
 
Another example is compiling a subset of your langauge just enough to compile the compiler.
 
Or stated like this. The compiler is written in its own output language.
 
leppie::AllocCPArticle("Zee blog");
Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.

GeneralRe: Looks good :)memberJonathan de Halleux18 Dec '03 - 22:16 
So basically you end with problem like these:
 
grow = "grow old";
lay = "lay an egg";
chicken = egg >> grow;
egg = chicken >> lay;
???
 
Aside form this stupid example, I may have a nice solution (maybe too trivial). Ok, here I go:
 
1. Write a EBNF parser. Since this the classical way of describing grammar, it would be the right place to start.
2. Build the Finite State Automata: Once this is done and tested, we can use the parser to build the FA of parser (using semantic action and your lib or QuickGraph)
3. Walk the automata and output code
 
Confused | :confused: Confused | :confused: Confused | :confused:
 
Jonathan de Halleux.

www.dotnetwiki.org

GeneralRe: Looks good :)memberleppie19 Dec '03 - 6:10 
Jonathan de Halleux wrote:
1. Write a EBNF parser. Since this the classical way of describing grammar, it would be the right place to start.
2. Build the Finite State Automata: Once this is done and tested, we can use the parser to build the FA of parser (using semantic action and your lib or QuickGraph)
3. Walk the automata and output code

 
1. Write a EBNF parser. Since this the classical way of describing grammar, it would be the right place to start.
2. Build the Finite State Automata: Once this is done and tested, we can use the parser to build the FA of parser (using semantic action and your lib or QuickGraph)
3. Walk the automata and output code

 
1. Funnilly enough, I dont have a chicken/egg problem Smile | :)
2. Everything happens dynamically, no building FSA. My parser i would define as a Recursive Descent Zero-Lookahead Backtracking parser taking an LL grammar.
3. This weekend I'm gonna get it going Smile | :)
 
leppie::AllocCPArticle("Zee blog");
Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.

GeneralRe: Looks good :)memberleppie18 Dec '03 - 8:46 
OK i give up, your mail server rejects me. Fix it, and you can have a look Laugh | :laugh:
 
This message was created automatically by mail delivery software (Exim).
 
A message that you sent could not be delivered to one or more of its
recipients. This is a permanent error. The following address(es) failed:
 
dehalleux at auto.ucl.ac.be
SMTP error from remote mailer after RCPT TO::
host ns1.auto.ucl.ac.be [130.104.239.135]: 550 :
Sender address rejected: Spam not accepted

 
leppie::AllocCPArticle("Zee blog");
Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.

GeneralRe: Looks good :)memberJonathan de Halleux18 Dec '03 - 22:11 
I'll forward that to my IT! Damn this s.... Could you send me your email privately ?
 
Jonathan de Halleux.

www.dotnetwiki.org

GeneralRe: Looks good :)memberleppie19 Dec '03 - 6:13 
Jonathan de Halleux wrote:
Could you send me your email privately
 
Lol, that took a few minutes to understand, I'll do it now Smile | :)
 
leppie::AllocCPArticle("Zee blog");
Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.

GeneralRe: Looks good :)memberleppie28 Dec '03 - 10:10 
Any luck yet? I have made quite a bit of progress over the last few days Smile | :)
 
leppie::AllocCPArticle("Zee blog");
Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.

GeneralRe: Looks good :)memberJonathan de Halleux28 Dec '03 - 11:21 
Smile | :) Smile | :) Smile | :) The wrapper strats to get looking good. You can download it at http://www.dotnetwiki.org[^] Smile | :)
 
Jonathan de Halleux.

www.dotnetwiki.org

GeneralRe: Looks good :)memberleppie28 Dec '03 - 11:41 
Hehe, I meant whether you have solved the email blocking issue? I really want you to look at the code of mine Wink | ;)
 
BTW: what is this wrapper strats you are talking about?
 
leppie::AllocCPArticle("Zee blog");
Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.

GeneralGot my 5memberJonathan de Halleux18 Dec '03 - 3:45 
and a beer. Thank you Jonathan.
 
Jonathan de Halleux.

www.dotnetwiki.org

GeneralRe: Got my 5memberJonathan de Halleux18 Dec '03 - 3:45 
No thank you, Jonathan.
 
Jonathan de Halleux.

www.dotnetwiki.org

GeneralBut not mine :)sussJonathan de Halleux 18 Dec '03 - 6:12 
D'Oh! | :doh:
 
leppie::AllocCPArticle("Zee blog");
Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.

GeneralBecoming szicofrenicmemberJonathan de Halleux18 Dec '03 - 7:08 
Aren't you Jonathan ?
 
Jonathan de Halleux.

www.dotnetwiki.org

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web02 | 2.6.130523.1 | Last Updated 23 Dec 2003
Article Copyright 2003 by Jonathan de Halleux
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid