Click here to Skip to main content
Email Password   helpLost your password?

Sample image

Note: If you like this tool please remember to vote, as this will get other people interested and is more likely to help improve the final product
-- ToDoList organizer, Dan.G.

Introduction

The Spart library an object oriented recursive descent parser generator framework implemented in C#. In fact, it is a partial port of the excellent Spirit library, which is written in C++ and uses meta-programming.

The Spart framework enables a target grammar to be written exclusively in C#. An EBNF grammar can be closely match using C# code. In retrospect, conventional compiler-compilers or parser-generators have to perform an additional translation step from the source EBNF code to C or C++ code.

I have takened the liberty to use the structure (and some sentence) of the Spirit documentation. Along the article, some notes are added regarding some issues about the port to C#: Spirit-2-Spart Notes (SSN).

As always, this article presents an overview of the library. For deeper details, please refer to the NDoc documentation. The library also comes with a battery of NUnit tests.

Quick Start

Spart is designed to bring you parser capabilites quickly directly into your code. While it is not suited for creating parsers for entire language like C,C++, it is very effective for building micro-grammars in your code.

When you need to build a new parser, there are existing solution: a combination of ....Parse (like int.Parse) calls, or using regular expression (Regexp class) or a combination of both. However, these tools do not scale well when attempting to write more complex parsers: maintenance and readability become ackward.

So, as for Spirit, one of the main objective of Spart is to let you build easily grammar in C#. To fix this ideas, a few simple grammars illustrate Spart usage:

Trivial example #1:

Create a parser that will parse a digit:

Prims.Digit

(This is a trivial case, Char.IsDigit already does that). Prims is a static class, Digit is a property that create a new parser for digits. In fact, Prims is a helper class that creates primitive parsers for you and hides implementation details.

SSN: This parser is equivalent to num_p.

Trivial example #2:

Create a parser that will parse a sequence of two digits

Ops.Seq( Prims.Digit, Prims.Digit )

Here you see the familiar Prims.Digit parser enclosed in a Ops.Seq call. Like Prims, Ops is a static helper class that creates combine parsers for you. The Seq method creates a parser that is a sequence of two parsers (>> in Spirit):

Ops.Seq(a,b) <=> match a and b in sequence

Note: when we combine parsers, we end up with a "bigger" parser, But it's still a parser. Parsers can get bigger and bigger, nesting more and more, but whenever you glue two parsers together, you end up with one bigger parser. This is an important concept.

SSN: The operator >> does not accept arbitrary operands, they must be of integral type which restrict it's use.

Trivial Example #3

Create a parser that will accept an arbitrary number of digit. (Arbitrary means anything from zero to infinity).

Ops.Star(Prims.Digit)

This is like the regular expression Kleene Star.

SSN: * cannot be an unary operator in C#.

Less trivial example #4

Create a parser that parse a sequence of comma separated digits and record them in a collection (note this can easily be done using String.Split).

// spirit: num_p >> *( ch_p(',') >> num_p)

Ops.Seq( Prims.Digit, Ops.Start( Ops.Seq(Prims.Ch(','), Prims.Digit)))

Note that Prims.Ch('x') will math the character x. Prims.Ch(',') will match a comma. In this case, the Kleene star matches a more complex expression, namely

Ops.Seq(Prims.Ch(','), Prims.Digit)

which matches a sequence of comma and digit.

Note: the Ops.Sep(Prims.Ch(','),... expression can be simplified to Ops.Seq(',', ... to simplify notations.

Using the parser

Once we have built our parser, we want to use it. A parser can be used directly as is:

// string to parser

String s = "1,2,3,4";
// creating the parser as above

Parser p = Ops.Seq(...);
// creating a scanner of the string

StringScanner scan = new StringScanner(s);
// parsing the string through the scanner

ParserMatch m = p.Parse(s);

Notes:

Now that we have parsed the text, the ParserMatch object can help answer questions like: was the match successful, what was the match value, etc... :

if (m.Success)
    Console.Write("successfull match!");

Semantic Actions

Our parser above is nothing but a recognizer, it does no take any actions. It answers "did our data match the grammar?" but it does not record anything. Remember that we wanted to record the digits into a collection. For example, whenever we parse a digit, we wish to store the parsed number after a successful match. We now wish to extract information from the parser. Semantic actions do this. Semantic actions may be attached to any point in the grammar specification. They through events and event handler.

The Parser class has an event, Act, that is called on a successful match. We need to add a event handler on the Prims.Digit parser that records the digit into a collection. First, we write an actor that will record the digits

// A digit recorder actor

public class MyActor
{
    ArrayList digits; // digits collection

    ...
    public void RecordDigit(Object sender, ActionEventArgs args)
    {
        // record the digit into the digits collection

        digits.Add( args.TypedValue );
    }
}

Then, we add a digit recorder handler to each digit parser:

MyActor a = new MyActor();

// digit parser

Parser d = Prims.Digit;
// register actor

d.Act += new ActionEventHandler( a.RecordDigit );

// create parser

Parser p = Ops.Seq( d, Ops.Start( Ops.Seq(',', d)));

This is the same parser as above but now, MyParser.RecordDigit is called on each successfull digit match and therefore, the collection is filled.

Basic concepts

Spart follows the concepts of Spirit. There are a few fundamental concepts that need to be understood well: 1) The Parser, 2) the Match, 3) The Scanner, and 4) Semantic Actions. These basic concepts interact with each other, and the functionalities of each interweave throughout the entire framework to make it one coherent whole.

I will go quickly over those concepts since they are very well explained in the Spirit documentation and I recommend you take a look there first.

The parser

Central to the framework is the parser. The parser does the actual work of recognizing a linear input stream of data read sequentially from start to end by the scanner. The parser attempts to match the input following a well-defined set of specifications in the form of grammar rules. The parser reports the success or failure to its client through a match object. When successful, the parser calls a client-supplied semantic action. Finally, the strategically-placed semantic action extracts structural information depending on the data passed to it by the parser and the heirarchical context of the parser it is attached to.

Parsers come in a lot of flavors and usually you don't need to write your own parser. Spart has a collection of built-in parsers that you can combine to create your grammars. The built-in parsers come in two (main) flavors:

Primitives

Primitive parsers can be used to match characters, string, lower case character, digits, etc... The Prims class can be used to create such parsers.

Combination

Combination parsers can be used to combine parsers, like sequence and star in the example. The Ops class can be used to create such parsers.

The match

The ParserMatch class describes the parser match.

The Scanner

Like the parser, the scanner is also an abstract concept, represented by the IScanner interface. The task of the scanner is to feed the sequential input data stream to the parser. The scanner is of an input source and a cursor. The cursor is moved along by the parsers. Parsers extract data from the scanner and position the iterator appropriately through its member functions.

Semantic actions

A composite parser forms a hierarchy. Parsing proceeds from the topmost parent parser which delegates and apportions the parsing task to its children recursively to its childeren's children and so on until a primitive is reached. By attaching semantic actions to various points in this hierachy, in effect we can transform the flat linear input stream into a structured representation. This is essentially what parsers do.

The Rule

The Rule class represents a non-terminal parser. Basically, it is a wrapper around another parser. This aspect will be illustrated in the example below.

The Classic Calculator Example

There is still a lot to say about Spirit and Spart but I will cut to a final example. A better documentation should appear in a near future as this library is totally new!

The favorite grammar example in the Spirit documentation is a calculator grammar:

Let us now show how to build this grammar with Spart (you can find in the demo applciation). Passing aside some initialization details, the C# code would look as follows:

group.Parser       = Ops.Seq('(',Ops.Seq(expression,')'));
factor.Parser      = integer | group;
term.Parser        = Ops.Seq( factor, Ops.Klenee( 
                        Ops.Seq('*',factor) | Ops.Seq('/',factor) ));
expression.Parser  = Ops.Seq(term,Ops.Klenee(Ops.Seq('+',term) | 
                        Ops.Seq('-',term) ))

In the above, group, factor, term and expression are Rule object that have been previously initliazed:

// declaration 

Rule group; 
...
group = new Rule();

Note also, that the | operator can be overriden in C#, therefore we use it to create an alternative parser.

Debugging

It can be usefull to trace the scanner state and the parser matches. Similarly to Spirit, you can attach a tracer that will output a lot of interresting info:

// declaration in the Calculator class

Debugger debug;
...
// in the constructor

// create a debugger that outputs to the console

debug = new Debugger( Console.Out ); 

// setting rules name for debugging purpose

group.ID = "group"; 
...

// tracing rules

debug += group;
debug += expression;
...

The output is formated as follows:

For example, when launching the parser on the string (5+2)*(4*2), it will output something like this:

expression: (5+2)*(4*2)
    term: (5+2)*(4*2)
        factor: (5+2)*(4*2)
            group: (5+2)*(4*2)
                expression: 5+2)*(4*2)
                    term: 5+2)*(4*2)
                        factor: 5+2)*(4*2)
                            group: 5+2)*(4*2
                            #group: 5+2)*(4*
                            integer: 5+2)*(4
                            /integer: +2)*(4
                        /factor: +2)*(4*2)
                    /term: +2)*(4*2)
                    term: 2)*(4*2)
                        factor: 2)*(4*2)
                            group: 2)*(4*2)
                            #group: 2)*(4*2)
                            integer: 2)*(4*2
                            /integer: )*(4*2
                        /factor: )*(4*2)
                    /term: )*(4*2)
                /expression: )*(4*2)
            /group: *(4*2)
        /factor: *(4*2)
        factor: (4*2)
            group: (4*2)
                expression: 4*2)
                    term: 4*2)
                        factor: 4*2)
                            group: 4*2)
                            #group: 4*2)
                            integer: 4*2)
                            /integer: *2)
                        /factor: *2)
                        factor: 2)
                            group: 2)
                            #group: 2)
                            integer: 2)
                            /integer: )
                        /factor: )
                    /term: )
                /expression: )
            /group:
        /factor:
    /term:
/expression:

ToDo List

There are still a lot of features from Spirit that are not implemented in Spart:

References

You must Sign In to use this message board.
 
 
Per page   
 FirstPrevNext
QuestionSuicide note:How to find an expression inside random input?
BreakableC
4:17 28 Nov '09  
Spart, clearly needs more examples. I am trying to find a Parseable expression in a sequence of random symbols.I am not able to do it no matter how hard I try. This is my latest invention. I am using the version from Palaso svn:

//Random sequence
Parser random = Ops.ZeroOrMore( Prims.AnyChar);

//String expression
Parser str = MakeStringParser("namespace");

//Random but not string
Parser randomSequenceNotStr = Ops.Difference(pp, str);

//Random with string inside
Parser randomSequenceWithStr = Ops.Sequence(Ops.Difference(pp, str), str, Ops.Difference(pp, str));

//See if string is found
str.Act += new Spart.Actions.ActionHandler(insideRandomData_Act);

//Not found and never will
ParserMatch match = randomSequenceWithStr.Parse(new StringScanner(" namespace " ));


//Nothing specia here
public static Parser MakeStringParser(string p)
{
List<Parser> parsers = new List<Parser>();
foreach (Char c in p.ToCharArray())
{
parsers.Add(Prims.Ch(c));

}
return Ops.Sequence(parsers.ToArray());
}


I don't claim to be a great programmer, I try to imitate one.

AnswerRe: Suicide note:How to find an expression inside random input?
BreakableC
5:26 29 Nov '09  
Seems I got it:

private static Rule InsideRandomSequence(Rule str)
{

Rule randomBRR = new Rule();
randomBRR.Parser = Prims.AnyChar;
randomBRR.ID = "RandomChar";

Rule randomCharOrString = new Rule();
randomCharOrString.Parser = Ops.Choice(str, randomBRR);
randomCharOrString.ID = "randomCharOrString";

Rule zeroOrMore = new Rule();
zeroOrMore.ID = "zeroOrMore";
zeroOrMore.Parser = Ops.ZeroOrMore(randomCharOrString);

//Console.Clear();
//Debugger debug = new Debugger(Console.Out);
//debug += str;
//debug += randomBRR;
//debug += randomCharOrString;
//debug += str;
//debug += zeroOrMore;
return zeroOrMore;
}


Should work for other parsers as well. Usage of Rule if just for debug. Will detect multiple instances, as well as instances within instances.

I don't claim to be a great programmer, I try to imitate one.

GeneralGood job!
Mubi | www.mrmubi.com
10:46 24 Jun '09  
cool stuff!

Mubshir Raza Ali
^^^^^^^^^^^^^^^^^^^^
http://www.mrmubi.com

GeneralUpdated version available
Patrick Fischer
0:17 11 Nov '08  
A newer version has been developed at http://code.wesay.org/Palaso/trunk/Palaso/Palaso/. It includes some minor changes and updates to take advantage of C# 2 (See also http://www.wesay.org/blogs/palaso/2007/09/30/10/)
GeneralWhy not use operator overlaoding?
Member 3453924
9:39 22 Mar '08  
I don't know if there is some limitation that prevents this library from using operator overloading, but e.g. to represent

term ::= factor (('*' factor) | ('/' factor))*

why not use

term.Parser = factor >> (('*' >> factor) | ('/' >> factor)).Klenee();

instead of

term.Parser = Ops.Seq( factor, Ops.Klenee(
Ops.Seq('*',factor) | Ops.Seq('/',factor) ));
GeneralInteresting
Astaelan
8:58 11 Nov '07  
I have recently become interested in writing a script engine for .NET conforming to similar, but not exact Lua-like syntax.

In doing so, I have found a grammar for Lua, and have been trying to implement it in Spart. I have started by modifying the basic variable declaration of Lua from using "local" to using "var", otherwise the remaining grammar has remained the same for this statement. So far I have tried to attach actions to various parsers within the statement, and the only action I can get to trigger is the "var" string parser action.

The test data I am using is a simple: var x = 123

The only action called from anything in the grammar seems to be for "var" as the value, and nothing else. I cannot get my identifierlist, or identifier indepenantly to trigger an action when parsed. As I understand it, I should be able to attach an action to any parser and when that parser is matched it will execute the attached actions. However, if I create an action for Identifier's and parse the above code, my action is not called and thus "x" is not printed for me.

It could be that my grammar is flawed, but it seems correct to me. I have never used spirit or spart before, and have only recently taken an interest in lex, yacc, and the descendant products from their influence. I would really appreciate an understanding as to why some actions are called and not others. My goal is to provide a stack implementation and have the actions work against a stack to perform their duties. In this simple example, I should be able to act upon each token, pushing operations and variables onto the stack to be processed by the stack API. This example should allow me to push a declaration operator, followed by a list of identifiers, followed by an assignment operator, followed by a list of expressions which may be further evaluated, but in this case would only push a numeric value. The stack containing these operations would allow me to execute an order of operations such that the identifiers would be associated in a dictionary as keys, to the values assigned (or null by default if the optional assignment is lacking).

Sorry for getting deep into this, but Spart looks like a viable option for my goal and I'd like to get it working before I digress to trying a combination of managed and unmanaged code using Spirit in MC++.

Thanks for your time, and thanks for your effort in porting a useful tool to the .NET world. Sure would have been nice to be able to implement all the operator overloads rather than falling back on static methods Smile
GeneralBug with DifferenceParser
leclerc9
17:33 8 Apr '07  
Heres my program:

static void Main(string[] args)
{
string s = "lint int";
StringScanner sc = new StringScanner(s);
Parser intParser = Prims.Str("int");
Parser diff = Ops.Difference(Ops.Positive(Prims.Letter), intParser);
Parser all = Ops.Positive(diff | intParser | Prims.WhiteSpace);
intParser.Act += new Spart.Actions.ActionHandler(intParser_Act);
diff.Act += new Spart.Actions.ActionHandler(diff_Act);
all.Parse(sc);
}

static void diff_Act(object sender, Spart.Actions.ActionEventArgs args)
{
Console.WriteLine("Diff.");
}

static void intParser_Act(object sender, Spart.Actions.ActionEventArgs args)
{
Console.WriteLine("Int.");
}

It prints out:

Diff.
Int.
Int.

It should print out:

Diff.
Int.

This is fixed by editing DifferenceParser.cs and changing
// doing difference
scan.Seek(offset);
ParserMatch d = SecondParser.Parse(scan);
to
// doing difference
scan.Seek(offset);
ParserMatch d = SecondParser.ParseMain(scan);
GeneralCurrent Version
b3rnd
15:45 16 Mar '07  
Hello,

does anybody know how much of the spirit implementation is covered by spart ? are there any updates available ?

thanks
GeneralVisual BNF parser generator
AdamSlosarski
19:10 13 Mar '07  
Look at this parser with testing of grammars: www.intralogic.eu

Testing your code against performance will keep you running with good scalability and maintenance for years.
NTime.exe - the free tool for real developers of high scalability applications!

Generalpb with Klenee
pado
23:57 19 Dec '06  
I have the following rules

//([aA-zZ0-9 ]+)
_opQt.Parser = Ops.Positive(Ops.Alternative(Prims.LetterOrDigit, ' '));


//CND_QT -> [OPQT] *(
_cnd_qt.Parser = Ops.Seq(
'[',
Ops.Seq(_opQt,
Ops.Seq(']',
Ops.Seq(Ops.Klenee(' '),
'('
))));


These rules are ok for parsing "[all] (" but I have an exception for "[all](" (no white space between ']' and '(').
The exception is "match resersed ?" and is throwed by the ParserMatch class.

What's wrong with my code ?
GeneralUsing Spart library to parse T-SQL
Alberto Venditti
13:33 19 Nov '06  
Hi Jonathan.
This my post is just to inform you about my article titled "DBdoc: automated CHM documentation for a SQL Server 2000 database", where I used your wonderful Spart library.

Thank you for your work!
AV
Generalbuild errors?
tbhanson
3:01 11 May '06  
hi,

this looks extremely interesting to me, so i tried to build it but get (after converting from Visual Studio 1.1 to 2.0 i guess):
Error 1 Source file 'D:\3rd_party\Spart\Spart\Debug\Debugger.cs' could not be opened ('The system cannot find the file specified. ') Spart
Error 2 Source file 'D:\3rd_party\Spart\Spart\Debug\DebugContext.cs' could not be opened ('The system cannot find the file specified. ') Spart

are these intentionally missing? is there a convention i should know about?

cheers,

Tim Hanson

GeneralRe: build errors?
rbarras
4:20 11 May '06  
hi

These are missing in the source.

I guess i deleted the reference to build it.
GeneralRe: build errors?
tbhanson
4:33 11 May '06  
Thanks. after doing that, installing and referencing NUnit, and commenting out a few references to the Debug stuff, it builds for me too.

thanks again.

GeneralRe: build errors?
phm37
14:56 11 Mar '07  
I've done the same, but it would be nice to have the source for debugging? WTF
GeneralProblem with empty match in StringParser class
rbarras
3:37 10 May '06  

public ParserMatch EmptyMatch
{
get
{
//return new ParserMatch(this, 0, 0);
return new ParserMatch(this,this.Offset,0); // modified
}
}

some problems with optional parser solved with this change Smile
GeneralAnyone have the updated source?
Curtis W
10:05 12 Jan '06  
The source available on this site does not include Ops.Start(), any of the Spart.Debug.* stuff, and lots of other things!

Thanks!
GeneralI have imporvd your TST class, here is the code
AndySoft
6:28 27 Aug '05  
internal void PartialMatchSearch(
TstDictionaryEntry p,
string key,
int index,
char wildChar,
IList matches,
bool ignoreLength
)
{
if (p==null)
return;
char c=key[index];
if (c == wildChar || c < p.SplitChar)
PartialMatchSearch(p.LowChild,key,index,wildChar,matches,ignoreLength);

if (c == wildChar || c == p.SplitChar)
{
if (index < key.Length-1)
PartialMatchSearch(p.EqChild,key,index+1,wildChar,matches,ignoreLength);
else
{
if (p.IsKey)
matches.Add(new DictionaryEntry(p.Key,p.Value));
if ( ignoreLength && index == key.Length-1 )
{
// Traverse+add all possible sub-values
GetAllKeys(p.EqChild,matches);
}
}
}
if (c == wildChar || c > p.SplitChar)
PartialMatchSearch(p.HighChild, key,index,wildChar,matches,ignoreLength);
}

internal void GetAllKeys(
TstDictionaryEntry p,
IList matches )
{
if (p==null)
return;
if (p.IsKey)
matches.Add(new DictionaryEntry(p.Key,p.Value));
GetAllKeys(p.LowChild,matches);
GetAllKeys(p.EqChild,matches);
GetAllKeys(p.HighChild,matches);
}



Andysoft
GeneralGreat Tool
alam_pune
12:37 22 Aug '05  
This is really a great tool. You get 5/5 from me.

But I think if you can add the functionality to specify the grammar in a saperate txt file like lex/yacc, it would be very useful. I am using your code but everytime there is a change in rule, I need to recompile. Is there any workaround.

Thanks in advance for your help.
GeneralRe: Great Tool
alam_pune
12:18 23 Aug '05  
Hey,

I finally solved this.

I am keeping the rules in an xml file. I am generating the c# parser grammer in-memory assembly at runtime using Reflection and Compiler Services.

Let me know if anybody has a better idea?
GeneralRuntime parser generation?
Micah Wedemeyer
11:30 18 May '05  
Hi,

Is there any way to create a parser at runtime based on input? In other words, can I write a grammar file and feed that to Spart in order to create a parser?Confused
GeneralPatch for RepetitionParser.cs
LondonFroggy
6:47 30 Dec '04  
As the new version disappeared from dotnetwiki.org, the version on this web site has a big bug on RepetitionParser.cs

Here's a corrected version for the ParserMatch method:


public override ParserMatch ParseMain(IScanner scanner)
{
// save scanner state
long offset = scanner.Offset;

ParserMatch m = null;
ParserMatch m_temp = scanner.EmptyMatch;

// execution bound
int count=0;

// lower bound, minimum number of executions
while(count < LowerBound && !scanner.AtEnd)
{
m_temp = Parser.Parse(scanner);
// stop if not successful
if (!m_temp.Success)
break;
else
{
if (m == null)
m = m_temp;
else
m.Concat(m_temp);
}
// increment count and update full match
++count;
}

if (count == LowerBound)
{
while(count < UpperBound && !scanner.AtEnd)
{
m_temp = Parser.Parse(scanner);

// stop if not successful
if (!m_temp.Success)
break;
else
{
if (m == null)
m = m_temp;
else
m.Concat(m_temp);
}

// increment count
++count;
}
}
else
m=scanner.NoMatch;

if (m == null)
{
if (LowerBound == 0)
m = scanner.EmptyMatch;
else
m = scanner.NoMatch;
}

// restoring parser failed, rewind scanner
if (!m.Success)
scanner.Seek(offset);

return m;
}



This is obviously only a quick correction. Jonathan assumed the count property would return the length of the match, which breaks when you combine operators like Ops.Klenee with subsequent Ops.Seq for example.

Enjoy!

Sebastien Lambla
http://blog.thetechnologist.net
GeneralError handling
Ivo Arabadjiev
13:10 14 Nov '04  
Is there something like error handling in Spart? Or may be just which rule or parser fails.

Thanks, in advance.
GeneralRe: Error handling
LondonFroggy
6:44 30 Dec '04  
There's an easy way to do that... Just listen for the PostParse event and look at the Match object's Success property.

Sebastien Lambla a.k.a. TheTechnologist
http://blog.thetechnologist.net
Generalwhitespace
Angel Todorov
1:37 9 Sep '04  
How is it possible to filter whitespace from the input text ? This is absolutely necessary for any kind of scanning / parsing. As far as I have seen, in the IFilter interface there only exist a method for converting * to lower case. In comparison , with JFlex you have much more flexibility regarding scanning. Otherwise, with Spart, you have to include whitespace parsing in the grammar itself, and then not associate actions for it, if you want it to be ignored, right ? Which is quite cumbersome provided you supply a grammar for a language with lots of ops and possible constructs

Regards,
Angel


Last Updated 23 Dec 2003 | Advertise | Privacy | Terms of Use | Copyright © CodeProject, 1999-2010