Click here to Skip to main content
Click here to Skip to main content

Spart, a parser generator framework 100% C#

By , 22 Dec 2003
 

Sample image

Note: If you like this tool please remember to vote, as this will get other people interested and is more likely to help improve the final product
-- ToDoList organizer, Dan.G.

Introduction

The Spart library an object oriented recursive descent parser generator framework implemented in C#. In fact, it is a partial port of the excellent Spirit library, which is written in C++ and uses meta-programming.

The Spart framework enables a target grammar to be written exclusively in C#. An EBNF grammar can be closely match using C# code. In retrospect, conventional compiler-compilers or parser-generators have to perform an additional translation step from the source EBNF code to C or C++ code.

I have takened the liberty to use the structure (and some sentence) of the Spirit documentation. Along the article, some notes are added regarding some issues about the port to C#: Spirit-2-Spart Notes (SSN).

As always, this article presents an overview of the library. For deeper details, please refer to the NDoc documentation. The library also comes with a battery of NUnit tests.

Quick Start

Spart is designed to bring you parser capabilites quickly directly into your code. While it is not suited for creating parsers for entire language like C,C++, it is very effective for building micro-grammars in your code.

When you need to build a new parser, there are existing solution: a combination of ....Parse (like int.Parse) calls, or using regular expression (Regexp class) or a combination of both. However, these tools do not scale well when attempting to write more complex parsers: maintenance and readability become ackward.

So, as for Spirit, one of the main objective of Spart is to let you build easily grammar in C#. To fix this ideas, a few simple grammars illustrate Spart usage:

Trivial example #1:

Create a parser that will parse a digit:

Prims.Digit

(This is a trivial case, Char.IsDigit already does that). Prims is a static class, Digit is a property that create a new parser for digits. In fact, Prims is a helper class that creates primitive parsers for you and hides implementation details.

SSN: This parser is equivalent to num_p.

Trivial example #2:

Create a parser that will parse a sequence of two digits

Ops.Seq( Prims.Digit, Prims.Digit )

Here you see the familiar Prims.Digit parser enclosed in a Ops.Seq call. Like Prims, Ops is a static helper class that creates combine parsers for you. The Seq method creates a parser that is a sequence of two parsers (>> in Spirit):

Ops.Seq(a,b) <=> match a and b in sequence

Note: when we combine parsers, we end up with a "bigger" parser, But it's still a parser. Parsers can get bigger and bigger, nesting more and more, but whenever you glue two parsers together, you end up with one bigger parser. This is an important concept.

SSN: The operator >> does not accept arbitrary operands, they must be of integral type which restrict it's use.

Trivial Example #3

Create a parser that will accept an arbitrary number of digit. (Arbitrary means anything from zero to infinity).

Ops.Star(Prims.Digit)

This is like the regular expression Kleene Star.

SSN: * cannot be an unary operator in C#.

Less trivial example #4

Create a parser that parse a sequence of comma separated digits and record them in a collection (note this can easily be done using String.Split).

// spirit: num_p >> *( ch_p(',') >> num_p)
Ops.Seq( Prims.Digit, Ops.Start( Ops.Seq(Prims.Ch(','), Prims.Digit)))

Note that Prims.Ch('x') will math the character x. Prims.Ch(',') will match a comma. In this case, the Kleene star matches a more complex expression, namely

Ops.Seq(Prims.Ch(','), Prims.Digit)

which matches a sequence of comma and digit.

Note: the Ops.Sep(Prims.Ch(','),... expression can be simplified to Ops.Seq(',', ... to simplify notations.

Using the parser

Once we have built our parser, we want to use it. A parser can be used directly as is:

// string to parser
String s = "1,2,3,4";
// creating the parser as above
Parser p = Ops.Seq(...);
// creating a scanner of the string
StringScanner scan = new StringScanner(s);
// parsing the string through the scanner
ParserMatch m = p.Parse(s);

Notes:

  • Parser is an abstract base class for all parsers. It contains the Parse method that can be used to parse some input.
  • The parser does not work directly on string but rather on some modified stream (scanners). Therefore, it is possible to parse directly from files or stream. StringScanner implements the scanner interface and wraps the string s.
  • ParserMatch is the parser result (see below)

Now that we have parsed the text, the ParserMatch object can help answer questions like: was the match successful, what was the match value, etc... :

if (m.Success)
    Console.Write("successfull match!");

Semantic Actions

Our parser above is nothing but a recognizer, it does no take any actions. It answers "did our data match the grammar?" but it does not record anything. Remember that we wanted to record the digits into a collection. For example, whenever we parse a digit, we wish to store the parsed number after a successful match. We now wish to extract information from the parser. Semantic actions do this. Semantic actions may be attached to any point in the grammar specification. They through events and event handler.

The Parser class has an event, Act, that is called on a successful match. We need to add a event handler on the Prims.Digit parser that records the digit into a collection. First, we write an actor that will record the digits

// A digit recorder actor
public class MyActor
{
    ArrayList digits; // digits collection
    ...
    public void RecordDigit(Object sender, ActionEventArgs args)
    {
        // record the digit into the digits collection
        digits.Add( args.TypedValue );
    }
}

Then, we add a digit recorder handler to each digit parser:

MyActor a = new MyActor();

// digit parser
Parser d = Prims.Digit;
// register actor
d.Act += new ActionEventHandler( a.RecordDigit );

// create parser
Parser p = Ops.Seq( d, Ops.Start( Ops.Seq(',', d)));

This is the same parser as above but now, MyParser.RecordDigit is called on each successfull digit match and therefore, the collection is filled.

Basic concepts

Spart follows the concepts of Spirit. There are a few fundamental concepts that need to be understood well: 1) The Parser, 2) the Match, 3) The Scanner, and 4) Semantic Actions. These basic concepts interact with each other, and the functionalities of each interweave throughout the entire framework to make it one coherent whole.

I will go quickly over those concepts since they are very well explained in the Spirit documentation and I recommend you take a look there first.

The parser

Central to the framework is the parser. The parser does the actual work of recognizing a linear input stream of data read sequentially from start to end by the scanner. The parser attempts to match the input following a well-defined set of specifications in the form of grammar rules. The parser reports the success or failure to its client through a match object. When successful, the parser calls a client-supplied semantic action. Finally, the strategically-placed semantic action extracts structural information depending on the data passed to it by the parser and the heirarchical context of the parser it is attached to.

Parsers come in a lot of flavors and usually you don't need to write your own parser. Spart has a collection of built-in parsers that you can combine to create your grammars. The built-in parsers come in two (main) flavors:

Primitives

Primitive parsers can be used to match characters, string, lower case character, digits, etc... The Prims class can be used to create such parsers.

Combination

Combination parsers can be used to combine parsers, like sequence and star in the example. The Ops class can be used to create such parsers.

The match

The ParserMatch class describes the parser match.

The Scanner

Like the parser, the scanner is also an abstract concept, represented by the IScanner interface. The task of the scanner is to feed the sequential input data stream to the parser. The scanner is of an input source and a cursor. The cursor is moved along by the parsers. Parsers extract data from the scanner and position the iterator appropriately through its member functions.

Semantic actions

A composite parser forms a hierarchy. Parsing proceeds from the topmost parent parser which delegates and apportions the parsing task to its children recursively to its childeren's children and so on until a primitive is reached. By attaching semantic actions to various points in this hierachy, in effect we can transform the flat linear input stream into a structured representation. This is essentially what parsers do.

The Rule

The Rule class represents a non-terminal parser. Basically, it is a wrapper around another parser. This aspect will be illustrated in the example below.

The Classic Calculator Example

There is still a lot to say about Spirit and Spart but I will cut to a final example. A better documentation should appear in a near future as this library is totally new!

The favorite grammar example in the Spirit documentation is a calculator grammar:

  • EBNF:
    group       ::= '(' expression ')'
    factor      ::= integer | group
    term        ::= factor (('*' factor) | ('/' factor))*
    expression  ::= term (('+' term) | ('-' term))*
  • Spirit (C++):
    group       = '(' >> expression >> ')';
    factor      = integer | group;
    term        = factor >> *(('*' >> factor) | ('/' >> factor));
    expression  = term >> *(('+' >> term) | ('-' >> term));

Let us now show how to build this grammar with Spart (you can find in the demo applciation). Passing aside some initialization details, the C# code would look as follows:

group.Parser       = Ops.Seq('(',Ops.Seq(expression,')'));
factor.Parser      = integer | group;
term.Parser        = Ops.Seq( factor, Ops.Klenee( 
                        Ops.Seq('*',factor) | Ops.Seq('/',factor) ));
expression.Parser  = Ops.Seq(term,Ops.Klenee(Ops.Seq('+',term) | 
                        Ops.Seq('-',term) ))

In the above, group, factor, term and expression are Rule object that have been previously initliazed:

// declaration 
Rule group; 
...
group = new Rule();

Note also, that the | operator can be overriden in C#, therefore we use it to create an alternative parser.

Debugging

It can be usefull to trace the scanner state and the parser matches. Similarly to Spirit, you can attach a tracer that will output a lot of interresting info:

// declaration in the Calculator class
Debugger debug;
...
// in the constructor
// create a debugger that outputs to the console
debug = new Debugger( Console.Out ); 

// setting rules name for debugging purpose
group.ID = "group"; 
...

// tracing rules
debug += group;
debug += expression;
...

The output is formated as follows:

  • the output is indented to reflect the parser tree,
  • parser name: ... reflects the start of the parsing and the current scanner input
  • / parser name: ... reflects that the parser has successfully mathed
  • # parser name: ... reflects that there was not mathch.

For example, when launching the parser on the string (5+2)*(4*2), it will output something like this:

expression: (5+2)*(4*2)
    term: (5+2)*(4*2)
        factor: (5+2)*(4*2)
            group: (5+2)*(4*2)
                expression: 5+2)*(4*2)
                    term: 5+2)*(4*2)
                        factor: 5+2)*(4*2)
                            group: 5+2)*(4*2
                            #group: 5+2)*(4*
                            integer: 5+2)*(4
                            /integer: +2)*(4
                        /factor: +2)*(4*2)
                    /term: +2)*(4*2)
                    term: 2)*(4*2)
                        factor: 2)*(4*2)
                            group: 2)*(4*2)
                            #group: 2)*(4*2)
                            integer: 2)*(4*2
                            /integer: )*(4*2
                        /factor: )*(4*2)
                    /term: )*(4*2)
                /expression: )*(4*2)
            /group: *(4*2)
        /factor: *(4*2)
        factor: (4*2)
            group: (4*2)
                expression: 4*2)
                    term: 4*2)
                        factor: 4*2)
                            group: 4*2)
                            #group: 4*2)
                            integer: 4*2)
                            /integer: *2)
                        /factor: *2)
                        factor: 2)
                            group: 2)
                            #group: 2)
                            integer: 2)
                            /integer: )
                        /factor: )
                    /term: )
                /expression: )
            /group:
        /factor:
    /term:
/expression:

ToDo List

There are still a lot of features from Spirit that are not implemented in Spart:

  • numeric parsers!
  • closures,
  • directives,
  • parsers: list_p, confix_p
  • dynamic rules,
  • etc...

References

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here

About the Author

Jonathan de Halleux
Engineer
United States United States
Member
Jonathan de Halleux is Civil Engineer in Applied Mathematics. He finished his PhD in 2004 in the rainy country of Belgium. After 2 years in the Common Language Runtime (i.e. .net), he is now working at Microsoft Research on Pex (http://research.microsoft.com/pex).

Sign Up to vote   Poor Excellent
Add a reason or comment to your vote: x
Votes of 3 or less require a comment

Comments and Discussions

 
You must Sign In to use this message board.
Search this forum  
    Spacing  Noise  Layout  Per page   
Generalnon greed / abstemiousmemberotuyama27 Dec '10 - 16:00 
There is a feature that I could not find in Spart: abstemious (non greed) match. I have to write alternative match, and a default match for anything. But when I use Ops.Klenee(Prims.AnyChar), it is greed, so it matches until the end of text. Is there some way to make it abstemious? (like regex *?)
Generaloverloadmemberotuyama27 Dec '10 - 15:58 
Spart is a great tool. But I think it should use operator overload avaiable in C#.
I wanted to write Calculator.cs like this:
 
                  group.Parser = "(" & expression & ")";
                  factor.Parser = group | integer;
                  term.Parser            = factor & +(mult      | div );
                  expression.Parser   = term & +(add | mult );
 
Where "&" is Seq, and "+" is prefix Klenee. So I changed the Parser.cs to:
 
            public static SequenceParser operator &(Parser left, Parser right) {
                  return Ops.Seq(left, right);
            }
            public static SequenceParser operator &(Parser left, String right) {
                  return Ops.Seq(left, right);
            }
            public static SequenceParser operator &(String left, Parser right) {
                  return Ops.Seq(left, right);
            }
            public static RepetitionParser operator +(Parser p) {
                  return Ops.Klenee(p);
            }
 
I fixed it for myself, but I think it would be useful for everybody.
QuestionSuicide note:How to find an expression inside random input?memberBreakableC28 Nov '09 - 3:17 
Spart, clearly needs more examples. I am trying to find a Parseable expression in a sequence of random symbols.I am not able to do it no matter how hard I try. This is my latest invention. I am using the version from Palaso svn:
//Random sequence
Parser random = Ops.ZeroOrMore( Prims.AnyChar);
 
//String expression
Parser str = MakeStringParser("namespace");
 
//Random but not string
Parser randomSequenceNotStr = Ops.Difference(pp, str);
 
//Random with string inside
Parser randomSequenceWithStr = Ops.Sequence(Ops.Difference(pp, str), str, Ops.Difference(pp, str));
 
//See if string is found
str.Act += new Spart.Actions.ActionHandler(insideRandomData_Act);
 
//Not found and never will
ParserMatch match = randomSequenceWithStr.Parse(new StringScanner(" namespace " ));
 

//Nothing specia here
        public static Parser MakeStringParser(string p)
        {
            List<Parser> parsers = new List<Parser>();
            foreach (Char c in p.ToCharArray())
            {
                parsers.Add(Prims.Ch(c));
 
            }
            return Ops.Sequence(parsers.ToArray());
        }

 
I don't claim to be a great programmer, I try to imitate one.

AnswerRe: Suicide note:How to find an expression inside random input?memberBreakableC29 Nov '09 - 4:26 
Seems I got it:
 
private static Rule InsideRandomSequence(Rule str)
{
 
Rule randomBRR = new Rule();
randomBRR.Parser = Prims.AnyChar;
randomBRR.ID = "RandomChar";
 
Rule randomCharOrString = new Rule();
randomCharOrString.Parser = Ops.Choice(str, randomBRR);
randomCharOrString.ID = "randomCharOrString";
 
Rule zeroOrMore = new Rule();
zeroOrMore.ID = "zeroOrMore";
zeroOrMore.Parser = Ops.ZeroOrMore(randomCharOrString);
 
//Console.Clear();
//Debugger debug = new Debugger(Console.Out);
//debug += str;
//debug += randomBRR;
//debug += randomCharOrString;
//debug += str;
//debug += zeroOrMore;
return zeroOrMore;
}

 
Should work for other parsers as well. Usage of Rule if just for debug. Will detect multiple instances, as well as instances within instances.
 
I don't claim to be a great programmer, I try to imitate one.

GeneralGood job!memberMubi | www.mrmubi.com24 Jun '09 - 9:46 
cool stuff!
 
Mubshir Raza Ali
^^^^^^^^^^^^^^^^^^^^
http://www.mrmubi.com

GeneralUpdated version availablememberPatrick Fischer10 Nov '08 - 23:17 
A newer version has been developed at http://code.wesay.org/Palaso/trunk/Palaso/Palaso/. It includes some minor changes and updates to take advantage of C# 2 (See also http://www.wesay.org/blogs/palaso/2007/09/30/10/)
QuestionWhy not use operator overlaoding?memberMember 345392422 Mar '08 - 8:39 
I don't know if there is some limitation that prevents this library from using operator overloading, but e.g. to represent
 
term ::= factor (('*' factor) | ('/' factor))*
 
why not use
 
term.Parser = factor >> (('*' >> factor) | ('/' >> factor)).Klenee();
 
instead of
 
term.Parser = Ops.Seq( factor, Ops.Klenee(
Ops.Seq('*',factor) | Ops.Seq('/',factor) ));
GeneralInterestingmemberAstaelan11 Nov '07 - 7:58 
I have recently become interested in writing a script engine for .NET conforming to similar, but not exact Lua-like syntax.
 
In doing so, I have found a grammar for Lua, and have been trying to implement it in Spart. I have started by modifying the basic variable declaration of Lua from using "local" to using "var", otherwise the remaining grammar has remained the same for this statement. So far I have tried to attach actions to various parsers within the statement, and the only action I can get to trigger is the "var" string parser action.
 
The test data I am using is a simple: var x = 123
 
The only action called from anything in the grammar seems to be for "var" as the value, and nothing else. I cannot get my identifierlist, or identifier indepenantly to trigger an action when parsed. As I understand it, I should be able to attach an action to any parser and when that parser is matched it will execute the attached actions. However, if I create an action for Identifier's and parse the above code, my action is not called and thus "x" is not printed for me.
 
It could be that my grammar is flawed, but it seems correct to me. I have never used spirit or spart before, and have only recently taken an interest in lex, yacc, and the descendant products from their influence. I would really appreciate an understanding as to why some actions are called and not others. My goal is to provide a stack implementation and have the actions work against a stack to perform their duties. In this simple example, I should be able to act upon each token, pushing operations and variables onto the stack to be processed by the stack API. This example should allow me to push a declaration operator, followed by a list of identifiers, followed by an assignment operator, followed by a list of expressions which may be further evaluated, but in this case would only push a numeric value. The stack containing these operations would allow me to execute an order of operations such that the identifiers would be associated in a dictionary as keys, to the values assigned (or null by default if the optional assignment is lacking).
 
Sorry for getting deep into this, but Spart looks like a viable option for my goal and I'd like to get it working before I digress to trying a combination of managed and unmanaged code using Spirit in MC++.
 
Thanks for your time, and thanks for your effort in porting a useful tool to the .NET world. Sure would have been nice to be able to implement all the operator overloads rather than falling back on static methods Smile | :)
GeneralBug with DifferenceParsermemberleclerc98 Apr '07 - 16:33 
Heres my program:
 
static void Main(string[] args)
{
string s = "lint int";
StringScanner sc = new StringScanner(s);
Parser intParser = Prims.Str("int");
Parser diff = Ops.Difference(Ops.Positive(Prims.Letter), intParser);
Parser all = Ops.Positive(diff | intParser | Prims.WhiteSpace);
intParser.Act += new Spart.Actions.ActionHandler(intParser_Act);
diff.Act += new Spart.Actions.ActionHandler(diff_Act);
all.Parse(sc);
}
 
static void diff_Act(object sender, Spart.Actions.ActionEventArgs args)
{
Console.WriteLine("Diff.");
}
 
static void intParser_Act(object sender, Spart.Actions.ActionEventArgs args)
{
Console.WriteLine("Int.");
}
 
It prints out:
 
Diff.
Int.
Int.
 
It should print out:
 
Diff.
Int.
 
This is fixed by editing DifferenceParser.cs and changing
// doing difference
scan.Seek(offset);
ParserMatch d = SecondParser.Parse(scan);
to
// doing difference
scan.Seek(offset);
ParserMatch d = SecondParser.ParseMain(scan);
GeneralRe: Bug with DifferenceParsermemberKack8 Sep '11 - 13:40 
The correct fix to the DifferenceParser.cs ParseMain function. Smile | :)
 
public override ParserMatch ParseMain(Spart.Scanners.IScanner scan)
{
            long offset = scan.Offset;
 
            ParserMatch m = FirstParser.ParseMain(scan);
            long goodOffset = scan.Offset;
 
            if (!m.Success)
            {
                scan.Seek(offset);
                return scan.NoMatch;
            }
 
            // doing difference
            scan.Seek(offset);
 
            // parse and call onaction if we where successful
            ParserMatch d = SecondParser.Parse(scan);
 
            if (d.Success)
            {
                scan.Seek(offset);
                return scan.NoMatch;
            }
            else if(m.Success)
            {
                // go back to our prev offset and call OnAction of first parser
                // we do this cause calling parse above will automaticlly call onaction if true.. but
                // if second parser is true then firstparser should have not called on action!
                FirstParser.OnAction(m);
            }
 
            // ok
            scan.Seek(goodOffset);
            return m;
}

GeneralCurrent Versionmemberb3rnd16 Mar '07 - 14:45 
Hello,
 
does anybody know how much of the spirit implementation is covered by spart ? are there any updates available ?
 
thanks
GeneralVisual BNF parser generatormemberAdamSlosarski13 Mar '07 - 18:10 
Look at this parser with testing of grammars: www.intralogic.eu
 
Testing your code against performance will keep you running with good scalability and maintenance for years.
NTime.exe - the free tool for real developers of high scalability applications!

Generalpb with Kleneememberpado19 Dec '06 - 22:57 
I have the following rules
 
//([aA-zZ0-9 ]+)
_opQt.Parser = Ops.Positive(Ops.Alternative(Prims.LetterOrDigit, ' '));

 
//CND_QT -> [OPQT] *(
_cnd_qt.Parser = Ops.Seq(
'[',
Ops.Seq(_opQt,
Ops.Seq(']',
Ops.Seq(Ops.Klenee(' '),
'('
))));
 

These rules are ok for parsing "[all] (" but I have an exception for "[all](" (no white space between ']' and '(').
The exception is "match resersed ?" and is throwed by the ParserMatch class.
 
What's wrong with my code ?
GeneralUsing Spart library to parse T-SQLmemberAlberto Venditti19 Nov '06 - 12:33 
Hi Jonathan.
This my post is just to inform you about my article titled "DBdoc: automated CHM documentation for a SQL Server 2000 database", where I used your wonderful Spart library.
 
Thank you for your work!
AV
Questionbuild errors?membertbhanson11 May '06 - 2:01 
hi,
 
this looks extremely interesting to me, so i tried to build it but get (after converting from Visual Studio 1.1 to 2.0 i guess):
Error 1 Source file 'D:\3rd_party\Spart\Spart\Debug\Debugger.cs' could not be opened ('The system cannot find the file specified. ') Spart
Error 2 Source file 'D:\3rd_party\Spart\Spart\Debug\DebugContext.cs' could not be opened ('The system cannot find the file specified. ') Spart
 
are these intentionally missing? is there a convention i should know about?
 
cheers,
 
Tim Hanson

AnswerRe: build errors?memberrbarras11 May '06 - 3:20 
hi
 
These are missing in the source.
 
I guess i deleted the reference to build it.
GeneralRe: build errors?membertbhanson11 May '06 - 3:33 
Thanks. after doing that, installing and referencing NUnit, and commenting out a few references to the Debug stuff, it builds for me too.
 
thanks again.

GeneralRe: build errors?memberphm3711 Mar '07 - 13:56 
I've done the same, but it would be nice to have the source for debugging? WTF | :WTF:
GeneralProblem with empty match in StringParser classmemberrbarras10 May '06 - 2:37 

public ParserMatch EmptyMatch
{
get
{
//return new ParserMatch(this, 0, 0);
return new ParserMatch(this,this.Offset,0); // modified
}
}
 
some problems with optional parser solved with this change Smile | :)
QuestionAnyone have the updated source?memberCurtis W12 Jan '06 - 9:05 
The source available on this site does not include Ops.Start(), any of the Spart.Debug.* stuff, and lots of other things!
 
Thanks!
GeneralI have imporvd your TST class, here is the codesussAndySoft27 Aug '05 - 5:28 
internal void PartialMatchSearch(
TstDictionaryEntry p,
string key,
int index,
char wildChar,
IList matches,
bool ignoreLength
)
{
if (p==null)
return;
char c=key[index];
if (c == wildChar || c < p.SplitChar)
PartialMatchSearch(p.LowChild,key,index,wildChar,matches,ignoreLength);

if (c == wildChar || c == p.SplitChar)
{
if (index < key.Length-1)
PartialMatchSearch(p.EqChild,key,index+1,wildChar,matches,ignoreLength);
else
{
if (p.IsKey)
matches.Add(new DictionaryEntry(p.Key,p.Value));
if ( ignoreLength && index == key.Length-1 )
{
// Traverse+add all possible sub-values
GetAllKeys(p.EqChild,matches);
}
}
}
if (c == wildChar || c > p.SplitChar)
PartialMatchSearch(p.HighChild, key,index,wildChar,matches,ignoreLength);
}
 
internal void GetAllKeys(
TstDictionaryEntry p,
IList matches )
{
if (p==null)
return;
if (p.IsKey)
matches.Add(new DictionaryEntry(p.Key,p.Value));
GetAllKeys(p.LowChild,matches);
GetAllKeys(p.EqChild,matches);
GetAllKeys(p.HighChild,matches);
}


 
Andysoft
GeneralGreat Toolmemberalam_pune22 Aug '05 - 11:37 
This is really a great tool. You get 5/5 from me.
 
But I think if you can add the functionality to specify the grammar in a saperate txt file like lex/yacc, it would be very useful. I am using your code but everytime there is a change in rule, I need to recompile. Is there any workaround.
 
Thanks in advance for your help.
GeneralRe: Great Toolmemberalam_pune23 Aug '05 - 11:18 
Hey,
 
I finally solved this.
 
I am keeping the rules in an xml file. I am generating the c# parser grammer in-memory assembly at runtime using Reflection and Compiler Services.
 
Let me know if anybody has a better idea?
QuestionRuntime parser generation?memberMicah Wedemeyer18 May '05 - 10:30 
Hi,
 
Is there any way to create a parser at runtime based on input? In other words, can I write a grammar file and feed that to Spart in order to create a parser?Confused | :confused:
GeneralPatch for RepetitionParser.csmemberLondonFroggy30 Dec '04 - 5:47 
As the new version disappeared from dotnetwiki.org, the version on this web site has a big bug on RepetitionParser.cs
 
Here's a corrected version for the ParserMatch method:
 
		public override ParserMatch ParseMain(IScanner scanner)
		{
			// save scanner state
			long offset = scanner.Offset;
 
			ParserMatch m = null;
			ParserMatch m_temp = scanner.EmptyMatch;
 
			// execution bound                                
			int count=0;
 
			// lower bound, minimum number of executions
			while(count < LowerBound && !scanner.AtEnd)
			{
				m_temp = Parser.Parse(scanner);
				// stop if not successful
				if (!m_temp.Success)
					break;
				else
				{
					if (m == null)
						m = m_temp;
					else
						m.Concat(m_temp);
				}
				// increment count and update full match
				++count;
			}
                
			if (count == LowerBound)
			{
				while(count < UpperBound && !scanner.AtEnd)
				{
					m_temp = Parser.Parse(scanner);
 
					// stop if not successful
					if (!m_temp.Success)
						break;
					else
					{
						if (m == null)
							m = m_temp;
						else
							m.Concat(m_temp);
					}
 
					// increment count
					++count;
				}
			}
			else
				m=scanner.NoMatch;
 
			if (m == null)
			{
				if (LowerBound == 0)
					m = scanner.EmptyMatch;
				else
					m = scanner.NoMatch;
			}
 
			// restoring parser failed, rewind scanner
			if (!m.Success)
				scanner.Seek(offset);
 
			return m;
		}
 
 
This is obviously only a quick correction. Jonathan assumed the count property would return the length of the match, which breaks when you combine operators like Ops.Klenee with subsequent Ops.Seq for example.
 
Enjoy!
 
Sebastien Lambla
http://blog.thetechnologist.net
GeneralError handlingmemberIvo Arabadjiev14 Nov '04 - 12:10 
Is there something like error handling in Spart? Or may be just which rule or parser fails.
 
Thanks, in advance.
GeneralRe: Error handlingmemberLondonFroggy30 Dec '04 - 5:44 
There's an easy way to do that... Just listen for the PostParse event and look at the Match object's Success property.
 
Sebastien Lambla a.k.a. TheTechnologist
http://blog.thetechnologist.net
GeneralwhitespacesussAngel Todorov9 Sep '04 - 0:37 
How is it possible to filter whitespace from the input text ? This is absolutely necessary for any kind of scanning / parsing. As far as I have seen, in the IFilter interface there only exist a method for converting * to lower case. In comparison , with JFlex you have much more flexibility regarding scanning. Otherwise, with Spart, you have to include whitespace parsing in the grammar itself, and then not associate actions for it, if you want it to be ignored, right ? Which is quite cumbersome provided you supply a grammar for a language with lots of ops and possible constructs
 
Regards,
Angel
QuestionMore parsing problems - are these bugs?memberbeep20 Apr '04 - 18:30 
Consider the following code, which is intended to parse a list of numbers.
 
Rule numList = new Rule();
numList.Parser = Comp.Seq(
	Nums.Real(),
	Comp.Star(Comp.Seq(Prims.Ch(','), Nums.Real())));
 
Debugger debug = new Debugger(Console.Out);
debug += numList;
numList.Parse("100"); // THROWS INDEX EXCEPTION
Console.WriteLine();
numList.Parse("88,d,88,9.090,"); // PARSES SUCCESSFULLY!
 
Am I doing this wrong, or is this a bug? I would expect that the first try at parsing would succeed, and the second would fail, but in fact the first throws an exception, and the second "succeeds".
 


AnswerRe: More parsing problems - are these bugs?memberJonathan de Halleux20 Apr '04 - 20:19 
There's seems to be an error with Comp.Star, so both problems are related. I'll will be looking at this soon.
 
Jonathan de Halleux - My Blog - www.dotnetwiki.org -
MbUnit - QuickGraph - NCollection

GeneralTrying to parse a number...memberbeep20 Apr '04 - 11:21 
Why does this not work?
 
integer.Parser = Comp.Seq(Prims.Digit, Comp.Star(Prims.Digit));
number.Parser = Comp.Seq(Comp.Opt(integer), Prims.Ch('.'), integer);
 
When I do "(.99+100)", it correctly finds the "number", but when I do "(5.99+100)", it will not find the number. What am I doing wrong?
 
Thanks in advance.
GeneralRe: Trying to parse a number...memberJonathan de Halleux20 Apr '04 - 11:51 
This may be a bug.
 
You can try to replace
Comp.Seq(Prims.Digit, Comp.Star(Prims.Digit))
by
Comp.Pos(Prims.Digit)

 
Jonathan de Halleux - My Blog - www.dotnetwiki.org -
MbUnit - QuickGraph - NCollection

QuestionHow to compute the expressionmemberPhilipos Sakellaropoulos15 Apr '04 - 6:21 
Thank you for this work. It is very useful.
The calculator sample shows the parsing but does not the compute the expression.
I enhanced the class with very little code to do this ..
Look the code below ( i use vb.net). It uses a NET stack object to push the numbers. When an arithmetic operation is parsed, it triggers an event to perform the operation. Since parsing operations are executed with the correct order in the stack, it works correctly!!
 

Imports Spart.Actions
Imports Spart.Scanners
Imports Spart.Parsers
Imports Spart.Parsers.NonTerminal
Imports System
Imports System.Collections
 
Public Class Calculator
Private group As Rule
Private term As Rule
Private factor As Rule
Private expression As Rule
Private integ As Rule
Private debug As Debugger
 
Public Sub New()
group = New Rule
group.ID = "Group"
term = New Rule
term.ID = "Term"
factor = New Rule
factor.ID = "Factor"
expression = New Rule
expression.ID = "Expression"
integ = New Rule
integ.ID = "Integer"
 
Dim add As Parser = Ops.Seq("+"c, term)
add.ID = "+,-"
AddHandler add.Act, AddressOf Me.SymbolCompute
Dim subt As Parser = Ops.Seq("-"c, term)
subt.ID = "+,-"
AddHandler subt.Act, AddressOf Me.SymbolCompute
Dim mult As Parser = Ops.Seq("*"c, factor)
mult.ID = "*,/"
AddHandler mult.Act, AddressOf Me.SymbolCompute
Dim div As Parser = Ops.Seq("/"c, factor)
div.ID = "*,/"
AddHandler div.Act, AddressOf Me.SymbolCompute
integ.Parser = Prims.Int
AddHandler integ.Act, AddressOf Me.NewNumber
 
group.Parser = Ops.Seq("("c, Ops.Seq(expression, ")"c))
factor.Parser = Ops.Alternative(group, integ)
term.Parser = Ops.Seq(factor, Ops.Klenee(Ops.Alternative(mult, div)))
AddHandler term.Act, AddressOf Me.ShowTerm
expression.Parser = Ops.Seq(term, Ops.Klenee(Ops.Alternative(add, subt)))
AddHandler expression.Act, AddressOf Me.ShowExpr
End Sub
 
Public Function Parse(ByVal s As String) As ParserMatch
Dim sc As StringScanner = New StringScanner(s)
Dim match As ParserMatch = expression.Parse(sc)
If Not sc.AtEnd Then
mParseErrorCol = CType(sc.Offset, Integer)
mParseError = String.Format("{0} expected at column {1}", Parser.Expected, sc.Offset)
End If
Return match
End Function
Private mParseError As String = ""
 
Public ReadOnly Property ParseError() As String
Get
Return mParseError
End Get
End Property
Private mParseErrorCol As Integer = 0
 
Public ReadOnly Property ParseErrorColumn() As Integer
Get
Return mParseErrorCol
End Get
End Property
Private scNumbers As Stack = New Stack
 
Public Sub NewNumber(ByVal sender As Object, ByVal args As ActionEventArgs)
Dim newnum As Integer = 0
Try
newnum = Integer.Parse(args.Value)
Catch
Console.Out.WriteLine("Invalid integer: {0}", args.Value)
End Try
scNumbers.Push(newnum)
End Sub
 
Public Sub SymbolCompute(ByVal sender As Object, ByVal args As ActionEventArgs)
Compute(sender, args)
End Sub
 
Public Sub ShowTerm(ByVal sender As Object, ByVal args As ActionEventArgs)
Console.Out.WriteLine("term: {0}", args.Value)
End Sub
 
Public Sub ShowExpr(ByVal sender As Object, ByVal args As ActionEventArgs)
Console.Out.WriteLine("expr: {0}", args.Value)
End Sub
 
Public Function Compute() As Integer
Return Compute(Me, Nothing)
End Function
 
Public Function Compute(ByVal sender As Object, ByVal args As ActionEventArgs) As Integer
Dim num1 As Integer
Dim num2 As Integer
Dim num As Integer = 0
If scNumbers.Count = 1 Then
num = CType(scNumbers.Pop, Integer)
Else
num2 = CType(scNumbers.Pop, Integer)
Dim op As Char = args.Value.Chars(0)
num1 = CType(scNumbers.Pop, Integer)
Select Case op
Case "+"c
num = num1 + num2
Case "-"c
num = num1 - num2
Case "*"c
num = num1 * num2
Case "/"c
num = num1 / num2
Case Else
Throw New ApplicationException("Invalid operation " + op.ToString)
End Select
Console.Out.WriteLine("{0}{3}{1}={2}", num1, num2, num, op)
scNumbers.Push(num)
End If
Return num
End Function
 
End Class
 

 
--- Remember yourself !! --------
GeneralVisitor and All things AbstractmemberBen Coding Monkey4 Jan '04 - 22:49 
I have never seen Spirit so I assume you have copied the design. I do think its interesting that there is an abstract syntax tree hierarchy for the grammar, however, when it comes to parsing the tokens nothing is really done with them when the parsing is successful. Hopefully I read the code correctly; two weeks off and monday morning its difficult sometimes.
 
I do like the oop style to parsing rather than the original way of one class. This always seems to me scalablity was a factor but the object approach of encapsulating a parse method away in its own class shows good design.
 
Top down compilers in my limited exposure to them generally build an internal representation of the parse tree, whereby pushing the concrete syntax up into the tree. The Abstract Syntax Tree then can be visited multiple times using something known as the visitor pattern. The visitor pattern structure provides hooks into the AST which can be used to traverse over the AST multiple times if requires. This allows implementation for semantic checks, optimisation and code generation.
 
This also allows for the calculation of static FOLLOW, FIRST and LOOKAHEAD sets for the Non terminals. Using this methods also means you can generate dynamic follow set allowing for follow set error recovery.
 
any thoughts
 
Ben
GeneralRe: Visitor and All things AbstractmemberJonathan de Halleux4 Jan '04 - 23:57 
Ben Coding Monkey wrote:
however, when it comes to parsing the tokens nothing is really done with them when the parsing is successful.
 
That's where the actions come into the game. When a token/parser has a succesfull match, it triggers it's action if any.
 
Ben Coding Monkey wrote:
The Abstract Syntax Tree then can be visited multiple times using something known as the visitor pattern.
 
Yes, I will be adding AST generation to Spart (in a undefinite future). It already took me quite some work to get a graph library running in C# (see QuickGraph) but it will be a good tool for building ASTs.
 
In Spirit, you handle AST generation by specifying which rule should produce a new token, or start a new branch, etc...
 
Ben Coding Monkey wrote:
This also allows for the calculation of static FOLLOW, FIRST and LOOKAHEAD sets for the Non terminals. Using this methods also means you can generate dynamic follow set allowing for follow set error recovery.
 
I'm lost. Got any reference on this ?
 
Jonathan de Halleux.

www.dotnetwiki.org

GeneralRe: Visitor and All things AbstractmemberBen Coding Monkey5 Jan '04 - 0:39 
Follow, First and Lookahead sets with respect to the grammar's non terminals so given our sample grammar.
 
E ::= T | T "x" E
T ::= F | F "+" T
F ::= D | "(" E ")"
D ::= 0 | 1 | ... | 9
 
The First sets are the first symbols that the non terminal can contain
 
E = First Set (T)
T = First Set (F)
F = First Set (D) Union [ "(" ]
D = [0,1,2,..,9]
 
The Follow set is the sets of symbols that can follow it. So in the right hand side of the non terminals what can follow it. Leaving us with the sets, I have ignored the empty word or termination symbol for clarity.
 
E = [ ")" ]
T = [ "x" ]
F = [ "+" ]
 
The lookahead sets are constructed of the First Set of a nonterminal and the follow of a nonterminal just constructing what could be expected from this nonterminal; I think.
 
These sets are static though. We have derived them from grammar analysis by looking at the grammar and evaluating them from the whole grammar. However, what if the sets where calculated on the fly ie what is immediate expected the immediate symbol the come. Could this then be used to recover from errors?
 
The answer is yes but its only an approximation. Errors recovery can perform three operation deletion, insertation or modification. Insertation is the easiest to explain. For example we have the Pascal variable declaration line
 
var i : integer;
 
however our code reads
 
var i integer;
 
The parsers has moved into the variable declaration method for parsing and has parsed the identified 'i' and is expecting the concrete syntax of the colon ':'. Its not there lets terminate throw millions of exceptions and die, or we could just put one in and continue.
 
Modification can also be used, however, tricky because this is where we are assuming something that could cause mistakes later, but here is an example.
 
var i . integer;
 
There is a full spot '.' when we expected a colon ':' so lets change it and continue. I am guessing you can see the problem with this, or the potential.
 
Deletion is the last resort because why would we want to remove anything. Well infact this is why its tricky. Basically we have constructed a set, immediate set of what can follow and thus the algorithm is to delete each token until a token is reach which matches one of the tokens in this set. There is a possibility of removing all tokens, however, an errors has already been reached so its wrong anyway. This is why sometimes you get shed loads of errors when you have made 1 or 2 mistakes because of the error recovery method, however, commercial compilers are generally bottom up.
 
So for our example grammar we calculate the immediate follow sets but this is for the right hand side symbols so even the terminals have follow sets:
 
E:
T = [ "x" ]
"x" = FirstSet (E)
T
F = [ "+" ]
"+" = FirstSet (T)
 
F
"(" = FirstSet (E)
E = ")"
 
Now we know these on accepting tokens a check can be perform to see if the next token is in the expecting set and action is perform otherwise. However, when the sub methods for non terminals are called the current immediate follow set must be passed down so that the immediate symbols of that set are not removed. So for example
 
E ::= T | T "x" E
parseExpression (Set followSet)
{
parseTerminal (followSet.Union ("x"));
if (Token.Multication == currentToken)
{
acceptToken (followSet.Union (Expression.FIRST));
parseExpression (followSet);
}
}
 
Sometimes however deletion error recovery should stop at specific tokens that are not contained in the immediate follow set, this is known as the stopping set and again is an additional set to the epilogue of the methods.
 
This method is known as Follow Set Error Recovery and was Constructed by Niklaus Wirth in the book "Algorithms + Data Structures = Programs".
 
Hope that is clear, lol.
GeneralRe: Visitor and All things AbstractmemberJonathan de Halleux5 Jan '04 - 1:42 
Ben Coding Monkey wrote:
Hope that is clear, lol.
 
I'll have a coffee before answering Smile | :)
 
Jonathan de Halleux.

www.dotnetwiki.org

GeneralRe: Visitor and All things AbstractmemberBen Coding Monkey5 Jan '04 - 1:51 
Jonathan de Halleux wrote:
I'll have a coffee before answering
 
Good Idea, I am going to finish eating my sandwich.
 
This kind of stuff is where multiple passes of the AST are required. In these passes further information is added to the tree. So decorating the AST is performed. The reason for this is the future pass may require some information etc. Semantic passing for example so that types can be distinguished.
 
Think happy thoughts
GeneralRe: Visitor and All things Abstractmemberadclay15 Mar '05 - 7:20 
An interseting alternative to the Follow Set Error Recovery is the Heuristics Based Error Recovery, based on the concepts of Synchronization Points and Weak Symbols. This method is used (for example) in Wirths Oberon compiler and is integrated in the compiler generator Coco/R.
 
Information on Heuristics Based Error Recovery and Coco/R for various platforms can be found at the university of Linz, just in case you'd like to have a look after the coffee Wink | ;)
 
PS -- A great and unorthodox approach to the parsing topic.
I am looking forward for more.
 

"Live long and prosper." - Spock
adclay

Generalv1.1 is outmemberJonathan de Halleux2 Jan '04 - 5:59 
You can get v1.1 at www.dotnetwiki.org: added numeric parsers, fixed missing files.

 
Jonathan de Halleux.

www.dotnetwiki.org

GeneralRe: v1.1 is outmemberinshua15 Jan '06 - 22:59 
hi, the page is neerly empty!!
 
asdf
GeneralZip file missing debug directorymemberandy brummer26 Dec '03 - 9:53 
I downloaded the latest version of the spart_src.zip file and the files in the debug directory are missing.
 
"Debug\DebugContext.cs"
"Debug\Debugger.cs"

GeneralRe: Zip file missing debug directorymemberJonathan de Halleux29 Dec '03 - 5:31 
Ooops, it looks like I've been doing too much cleaning... and deleted the Debug directory.
 
Right I cannot provider these files since there has been a lot of changes in the spart source and it is currently broken (I've implemented numeric parsers, symbol table, skipping, etc...).
 
Stay tuned Smile | :)
 
Jonathan de Halleux.

www.dotnetwiki.org

GeneralRe: Zip file missing debug directorymemberJonathan de Halleux2 Jan '04 - 6:00 
Hi, I've uploaded a new version at http://www.dotnetwiki.org[^] Smile | :)
 
Jonathan de Halleux.

www.dotnetwiki.org

GeneralRe: Zip file missing debug directorymemberGordo753 Nov '04 - 5:24 
Sorry, I don't seem to be able to find it on the site. Plus the codeproject source download still lacks the missing files.
 
Any ideas when this will be updated?
 
Cheers
Gordon
GeneralRe: Zip file missing debug directorymemberodsby26 Dec '04 - 6:25 
When we click on the link it says "Under construction.". Any idea as to when it might be available. Thanks
 
Surya
QuestionWhat about using an indexer?memberStephen Quattlebaum23 Dec '03 - 4:29 
Unfortunately I don't have time to implement and test this right now (job 1 + job 2 + thesis = aaaarrrrggggghhhh!!!), but here's a thought.
 
You can't overload [] in C# like you can in C++, but you can define an indexer that gives you the same syntax from the caller's point of view. The indexer can take an arbitrary object as the index and return an arbitrary object - so really the only difference from C++ in terms of signature is that the return values of all of your indexers must be the same (whereas in C++ they can differ as long as the index values are of different types). That shouldn't be a limitation for this idea, though.
 
The mechanics of an indexer *are* a little different, in that you can define separate 'get' and 'set' versions of the indexer that are called whenever the indexer is used as an r-value or l-value, respectively. One can think of the 'set' version as kind of a special-case, though, and of the 'get' version as being the direct analog of the C++ operator [], with the difference being that it can't return a variable by reference (since C# supports only a subset of the C++ reference semantics). However, as long as you don't try returning immutable objects or ValueType objects (like primitives and structs), the effect is the same, since returning a reference-type variable gets you reference semantics just as if you'd returned MyVar& in C++.
 
The question is, would it work to define an indexer on your Parser class that takes an ActionHandler and applies it? That would get you a little more spirit-like syntax, letting you attach action handlers inline instead of having to do them as separate += statements later.
 
For instance, you'd add something like this to your Parser class.
 
public virtual Parser this[ActionHandler act]
{
get
{
Act += act;
return this;
}
}
 
Then, a caller could do this, which looks much more like Spirit.
 
Parser parser = Prims.Ch('x') [ new ActionHandler(OnX) ];
 
Just a thought. Great work, btw! You have my 5. I've been using Spirit in my thesis work and have been pretty happy with it, and wished I could use it in some of my other work with .NET w/o having to write the code in managed C++ (uggh). I may end up throwing away an ANTLR grammar my .NET project uses and redoing it in Spart - that ANTLR grammar is the only Java dependency in an otherwise C++/C# project. The C# code that ANTLR generates is usable, but it's not pretty...it generates a slew of warnings so that I always have to modify the generated code by hand after changing the grammer.
 
So anyway, thanks!
AnswerRe: What about using an indexer?memberJonathan de Halleux23 Dec '03 - 4:40 
I'll try that. In the meantime there are still some rules to write to get Spart really functional:
 
- number parsing, uint, int, double
- comment_p, list_p (hence refactoring)
 


 
Jonathan de Halleux.

www.dotnetwiki.org

GeneralRe: What about using an indexer?memberStephen Quattlebaum23 Dec '03 - 9:48 
One thing I thought of but forgot to write before: when C# gets anonymous delegates in C# 2.0, you can do really cool stuff, similar to what you can do in C++ w/ a combination of spirit and boost::lambda. For instance:
 
IList myList = GetList();
Parser parser = Prims.Ch('x') [
new ActionHandler(sender, args) { myList.Add(args.Value); }
];
 
Note that there's no separate method declaration necessary - the content of the action is listed right in line. Also, the delegate has direct access to locals declared in the containing function. Anonymous delegate support will be especially useful for libraries like Spart.
GeneralRe: What about using an indexer?memberJonathan de Halleux23 Dec '03 - 22:18 
Yep, library writers are waiting for Generics (don't know why they don't call template ?).
 
Anyway, I plan to create a lot of predefined actors. For instance, you example can actually be done as follows
 
IList myList = GetList();
Parser parser = Prims.Ch('x');
parser.Act += Actions.Append(myList);

 
Jonathan de Halleux.

www.dotnetwiki.org

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Permalink | Advertise | Privacy | Mobile
Web02 | 2.6.130516.1 | Last Updated 23 Dec 2003
Article Copyright 2003 by Jonathan de Halleux
Everything else Copyright © CodeProject, 1999-2013
Terms of Use
Layout: fixed | fluid