License: The Code Project Open License (CPOL)

a Tiny Parser Generator v1.2

By Herre Kuijpers

@TinyPG is a utility that makes it easier to write and try out your own parser/compiler.
C# (C#2.0, C#3.0), VB (VB7.x, VB8.0, VB9.0), Windows (Win2K, WinXP, Win2003, Vista), .NET (.NET2.0, .NET3.0, .NET3.5), Visual-Studio (VS2005, VS2008), Dev
Posted:1 Aug 2008
Updated:1 Sep 2008

Introduction

@TinyPG stands for "a Tiny Parser Generator". This particular generator is an LL(1) recursive descent parser generator. This means that instead of generating a state machine out of a grammar like most compiler compilers, it generates source code directly; basically generating a method for each non-terminal in the grammar. Terminals are expressed using .Net's powerful regular expressions. To help the programmer create .Net regular expressions a Regex tool is embedded in TinyPG. Grammars can be written using the extended BNF notation. TinyPG v1.2 now allows you to generate a scanner, parser and parsetree file in either C# or VB code (!). These can be compiled directly into your own projects. Additionally it is now possible to generate code for your own text highlighter which you can use directly in your own text editor project. A simple example is added at the end of this article.

In this article I will not go into depth about compiler theory. Basic understanding of compilers and grammars is required. For your reference I have included a list of terms used in this article explained on Wikipedia: grammar, BNF, terminal, non-terminal, LL(1), lookahead, recursive descent, concrete syntax tree (CST), parse tree.

Nowadays, with the availability of numerous compiler compilers, it becomes hard to think of a new name for a compiler compiler. 'Yet Another Compiler Compiler' was taken so I decided to name this tiny tool after its many strengths:

  • a powerful tiny utility in which to define the grammar for the new compiler, allowing
  • syntax and semantics checking of the grammar
  • generating a tiny set of sources for the parser/scanner/parsetree (just 3 .cs or .vb files without any external dependencies)
  • while each generated file remains clearly human readable and debuggable with Visual Studio(!)
  • including an expression evaluation tool which
  • generates a traversable parse tree
  • option to include c# codeblocks inside the grammar, adding immediate functionality with just a few lines of code.
  • including a tiny regular expression tool
  • while trying to keep things as simple and tiny as possible.

Background

Since there are already a number of compiler compilers out there, you might wonder why write another one? The reasons for writing this utility are:

  • the fun factor, seriously, who doesn't want to write his/her own compiler compiler!? 8-)
  • the compiler compiler should be able to generate c# code
  • the utility should be free, allowing developers to use the generated code for whatever purpose (no restrictive licenses).
  • the generated code should be readable and relatively easy to debug if needed.
  • the generated code should be completely independent and not require any external libraries to run (e.g. libraries from the compiler compiler itself !)
  • source code available to allow developers to modify this utility as they like
  • the tool should be free
  • it should be possible to separate the semantics from the syntax (use subclassing), or combine them (inline codeblocks)

Using the tool

I will explain the usage of the tool by means of two tiny tutorials. In these tutorials we will start writing a tiny expression calculator. In the former we will define the grammar for the expression evaluator; this will allow us to parse expressions. In the latter tutorial we will add some codeblocks to implement the semantics of the expression evaluator; after this step we will be able to evaluate (=calculate) simple expressions.

Before we start the tutorial I want to explain a little about naming conventions I prefer to use. Right now there are no clear conventions for naming. Also this utility does not require you to use any specific convention. However for readability also in the generated code I propose the following:

  • For terminals use only upper case names. Also underscores are allowed, e.g. NUMBER, IDENTIFIER, WHITESPACE or BRACKET_OPEN.
  • For Non-terminals use Pascal casing. This corresponds to .Net standards. Since from each non-terminal a method is generated, the method will then also be Pascal cased. e.g. Start, Expression, MultiplyExpression

To write valid grammars you need to take the following rules into account:

  • Each production rule must be terminated with a ; character.
  • Start is a reserved word which indicates the starting symbol for the grammar
  • It is allowed to include comments using either // to comment a line, or /* ... */ to comment a section
  • When including codeblocks, the codeblock must be written between { ... }; Note that the closing bracket must be followed by the semicolon immediately (no whitespace) or it will result in errors.

Let's start the tutorial.

Simple expression calculator syntax

The goal of the expression evaluator will be to parse and evaluate simple numeric expressions consisting of (integer) numbers and the +, -, * and / symbols. To make it a bit more fun we will also allow sub-expressions using the ( ) symbols. E.g. a valid expression could be: 4*(24/2-5)+14 which should evaluate to 42. This example is included in TinyPG, by opening the "simple expression1.tpg" (or the "simple expression1 vb.tpg" for the VB variant) file.

Terminals & regular expressions

Since we have decided what symbols to allow for in the calculator we can already start by defining the terminal symbols of our grammar using terminal production rules and regular expressions:

NUMBER -> @"[0-9]+"; 
PLUSMINUS -> @"(\+|-)"; 
MULTDIV -> @"\*|/"; 
BROPEN -> @"\("; 
BRCLOSE -> @"\)"; 

Terminals can be defined using .Net's Regex expression syntax (including the @ sign if it is so required). As you may have already guessed, the terminal definitions will be directly used within the Regex by the generated parser. Using .Net's Regex saves a lot of coding and keeps the scanner.cs very small and easily readable.

So indeed, this may be a good opportunity to brush up on your regular expression skills also. In order to play around with regular expressions I have included the Regex tool within the utility. Just click on the 'Regex tool'-tab and enter your regular expression. Any matches will be highlighted in the text immediately. This way you can test if your regular expression is matching the right symbols. At the end of this article I will include a number of often used regular expressions that may be very useful if you want to support those in your own language.


The TinyPG does not have any reserved or predefined terminal symbols. E.g. some parser generators reserve the EOF terminal because it may be difficult to express. Regex can almost cope with any kind of terminal symbol, including an EOF symbol. An EOF symbol can be defined as follows:

EOF -> @"^$";

The ^ character anchors the match at the beginning of the text the regex is applied to, and the $ anchors it at the end. Because no characters were specified in between, the expression only matches when there is no text left, so this must be the end of the file (or text/string).

In order to be less restrictive about the formatting of the expression, we would like to allow whitespaces. However we do not want to check in the grammar for whitespaces. In fact we would like to have the scanner simply ignore whitespaces and continue scanning to the next symbol/token. This can be done by defining a terminal production rule for whitespace and prefixing it with the [Skip] attribute like so:

[Skip] WHITESPACE -> @"\s+";

This indicates that any whitespace character will be skipped.
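
To make the roles of the terminals, the [Skip] attribute and the EOF symbol a bit more concrete, here is a minimal sketch of a scanner loop built on the same regular expressions. This is not the scanner that TinyPG generates; the class ScannerSketch and its Tokenize method are names made up for this illustration, and the loop assumes that each regex is applied to the remaining input and only accepted when it matches at position 0.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical illustration only; this is not the Scanner class generated by TinyPG.
public static class ScannerSketch
{
    // Terminal names and patterns taken from the grammar; WHITESPACE is the [Skip] terminal.
    private static readonly string[] Names =
        { "WHITESPACE", "NUMBER", "PLUSMINUS", "MULTDIV", "BROPEN", "BRCLOSE", "EOF" };

    private static readonly Regex[] Patterns =
    {
        new Regex(@"\s+"), new Regex(@"[0-9]+"), new Regex(@"(\+|-)"),
        new Regex(@"\*|/"), new Regex(@"\("), new Regex(@"\)"), new Regex(@"^$")
    };

    // Returns the names of the tokens found in the input, separated by spaces.
    public static string Tokenize(string input)
    {
        List<string> tokens = new List<string>();
        int pos = 0;
        while (true)
        {
            string rest = input.Substring(pos);
            int matched = -1;
            int length = 0;
            for (int i = 0; i < Patterns.Length; i++)
            {
                Match m = Patterns[i].Match(rest);
                // Only accept a match that starts at the current scan position.
                if (m.Success && m.Index == 0) { matched = i; length = m.Length; break; }
            }
            if (matched < 0)
                throw new FormatException("Unexpected character at position " + pos);

            pos += length;
            if (Names[matched] == "WHITESPACE") continue; // skipped: consumed but not reported
            tokens.Add(Names[matched]);
            if (Names[matched] == "EOF") break;           // ^$ matched: nothing left to scan
        }
        return string.Join(" ", tokens.ToArray());
    }
}
```

Calling ScannerSketch.Tokenize("4 * (2+3)") yields NUMBER MULTDIV BROPEN NUMBER PLUSMINUS NUMBER BRCLOSE EOF. The whitespace is consumed but never shows up in the token list, which is the effect the [Skip] attribute is meant to have.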

Non-terminals & production rules

Now that we have defined the terminals, let's continue to define the production rules for the non-terminals. TinyPG supports the extended BNF notation allowing the following symbols to be used in the production rules: *, +, ?, (, ), |, and a whitespace. These have the following meaning:

  • * - the symbol or sub-rule can occur 0 or more times
  • + - the symbol or sub-rule can occur 1 or more times
  • ? - the symbol or sub-rule can occur 0 or 1 time.
  • | - this defines a choice between 2 sub rules.
  • whitespace - the symbol or sub-rules must occur after each other
  • ( ... ) - allows definition of a sub-rule.

The grammar must start with a Start non-terminal. Hence the Start nonterminal is a reserved word. Let's define the grammar as follows:

Start     -> (AddExpr)? EOF;
AddExpr     -> MultExpr (PLUSMINUS MultExpr)*;
MultExpr     -> Atom (MULTDIV Atom)*;
Atom     -> NUMBER | BROPEN AddExpr BRCLOSE;

The Start production rule will check if there is an AddExpr (optional), and then expects an end of file (no other tokens are expected).
The AddExpr will only add or subtract MultExpr expressions. The MultExpr will multiply or divide Atoms. By writing the grammar this way, the precedence of the * and / symbols over the + and - symbols is defined explicitly. The Atom for now can be either a number (integer) or another expression. So with only 4 simple production rules you can already parse complicated expressions like: (((5*8)/4+3*3-(7+2)) / 5)
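
Since a recursive descent parser has one method per non-terminal, the four rules can be mirrored by hand as four methods. The following is a hand-written sketch of that idea and not the code TinyPG emits: the class RecognizerSketch and its members are hypothetical, it works on single characters instead of scanner tokens, and it only accepts or rejects the input without building a parse tree.

```csharp
using System;

// Hand-written sketch; class and member names are hypothetical, not TinyPG output.
public class RecognizerSketch
{
    private readonly string text;
    private int pos;

    private RecognizerSketch(string input) { text = input; }

    // Returns true when the input matches the Start rule.
    public static bool Accepts(string input)
    {
        RecognizerSketch r = new RecognizerSketch(input);
        try { r.Start(); return true; }
        catch (FormatException) { return false; }
    }

    // One symbol of lookahead; whitespace is skipped and '\0' stands for EOF.
    private char Peek()
    {
        while (pos < text.Length && char.IsWhiteSpace(text[pos])) pos++;
        return pos < text.Length ? text[pos] : '\0';
    }

    private static bool IsDigit(char c) { return c >= '0' && c <= '9'; }

    private void Expect(char c)
    {
        if (Peek() != c) throw new FormatException("Unexpected input at position " + pos);
        pos++;
    }

    // Start -> (AddExpr)? EOF;
    private void Start()
    {
        if (IsDigit(Peek()) || Peek() == '(') AddExpr();
        if (Peek() != '\0') throw new FormatException("Expected end of input at position " + pos);
    }

    // AddExpr -> MultExpr (PLUSMINUS MultExpr)*;
    private void AddExpr()
    {
        MultExpr();
        while (Peek() == '+' || Peek() == '-') { pos++; MultExpr(); }
    }

    // MultExpr -> Atom (MULTDIV Atom)*;
    private void MultExpr()
    {
        Atom();
        while (Peek() == '*' || Peek() == '/') { pos++; Atom(); }
    }

    // Atom -> NUMBER | BROPEN AddExpr BRCLOSE;
    private void Atom()
    {
        if (IsDigit(Peek()))
        {
            while (pos < text.Length && IsDigit(text[pos])) pos++; // NUMBER
        }
        else
        {
            Expect('(');
            AddExpr();
            Expect(')');
        }
    }
}
```

RecognizerSketch.Accepts("(((5*8)/4+3*3-(7+2)) / 5)") returns true, while an incomplete input such as "4*" is rejected. Each decision (enter AddExpr, repeat the loop, pick an Atom alternative) is taken by looking at one upcoming symbol only, which is what LL(1) means in practice.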

Running the parser

Now press F6 to compile the grammar. The output pane should become visible displaying again the grammar as it is internally represented, but also displaying the 'First' symbols per non-terminal. The 'first' symbols are the symbols that the parser will make its decision on for which non-terminal or production rule to parse next. Also the generated c# code will be compiled internally and can be run. If all goes well the 'Compilation successful.' should display.

Press ctrl-e to open the Expression Evaluator pane. Now type in an expression for your grammar to evaluate, e.g. (((5*8)/4+3*3-(7+2)) / 5) and press F5 to run the parser. The expression will be evaluated. 'Parse was successful' should be displayed now. If the parse was successful the evaluator will continue to evaluate the resulting parse tree. Because we have not implemented any logic behind the production rules, you will get the following warning: 'Result: Could not interpret input; no semantics implemented.'

Simple expression calculator semantics

Adding codeblocks

Now that we have a working grammar we can start adding semantics to the production rules in terms of code blocks. Code blocks are snippets of c# (or VB) code that are almost directly inserted into the generated parser. Before it is inserted some variables are replaced first. Let's start with the first production rule:

Start -> (AddExpr)? EOF { return $AddExpr; };  

Notice the variable $AddExpr. $AddExpr corresponds to the value of the non-terminal AddExpr. During code generation, the $AddExpr is replaced by a .Net expression that will evaluate the AddExpr non-terminal. A $ variable is defined for each terminal and non-terminal in the production rule. E.g. in this example $EOF would also be a valid variable. Note that terminals and non-terminals always return values of the type object. You will need to do your own explicit casting if you want to do calculations, e.g. like in the following snippet:

AddExpr -> MultExpr (PLUSMINUS MultExpr)* 
    { 
        int Value = Convert.ToInt32($MultExpr); 
        int i = 1; 
        while ($MultExpr[i] != null) { 
            string sign = $PLUSMINUS[i-1].ToString(); 
            if (sign == "+") 
                Value += Convert.ToInt32($MultExpr[i++]); 
            else 
                Value -= Convert.ToInt32($MultExpr[i++]); 
        } 
        return Value; 
    }; 

Notice that in this production rule the term MultExpr is defined 2x. So to which value instance of MultExpr is the $MultExpr referring? Even more so, the latter MultExpr can be repeated endlessly. To refer to a specific instance of the MultExpr value, you can use indexers on the $ variable, e.g. $MultExpr[1] will refer to the second defined instance of the MultExpr of the input expression. So if we have for instance the expression '3+4*2-6' we will have 3 MultExpr non-terminals: 3, 4*2 (=8) and 6. So $MultExpr[1] will evaluate to the value 8. $MultExpr[3] however is not available and will therefore evaluate to null (= the .Net null value).

So this code evaluates the first MultExpr ($MultExpr is short for $MultExpr[0]) and assigns it to Value. We know that this one always exists according to the grammar. Then we loop through the following $MultExpr[i] and add or subtract the $MultExpr[i] to the Value until the $MultExpr[i] evaluates to null. In order to decide if a MultExpr should be added or subtracted we evaluate the $PLUSMINUS token. Hence now we can actually calculate additions and subtractions.
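
The loop can also be tried outside of TinyPG. In the sketch below the $ variables are replaced by plain arrays and a hypothetical helper At that returns null for an index past the end, mimicking the behaviour described for $MultExpr[i]. The class name AddExprSketch and the array parameters are assumptions made for this illustration; the code that TinyPG substitutes for the $ variables looks different.

```csharp
using System;

// Hypothetical stand-in for the AddExpr codeblock; arrays replace the $ variables.
public static class AddExprSketch
{
    // Mimics an indexed $ variable: an index that does not exist evaluates to null.
    private static object At(object[] values, int i)
    {
        return i < values.Length ? values[i] : null;
    }

    // multExprs[i] plays the role of $MultExpr[i], plusMinus[i] that of $PLUSMINUS[i].
    public static int Eval(object[] multExprs, object[] plusMinus)
    {
        int value = Convert.ToInt32(At(multExprs, 0));
        int i = 1;
        while (At(multExprs, i) != null)
        {
            string sign = At(plusMinus, i - 1).ToString();
            if (sign == "+")
                value += Convert.ToInt32(At(multExprs, i++));
            else
                value -= Convert.ToInt32(At(multExprs, i++));
        }
        return value;
    }
}
```

For the expression '3+4*2-6' the MultExpr values are 3, 8 and 6 and the signs are + and -, so AddExprSketch.Eval(new object[] { 3, 8, 6 }, new object[] { "+", "-" }) returns 5.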

The same approach can be used for the MultExpr production rule. I will skip this in this article but you can view the code if you open the 'simple expression2.tpg' file. Then last but not least there is the Atom production rule which can be defined as follows:

Atom -> NUMBER | BROPEN AddExpr BRCLOSE 
    { if ($NUMBER != null) 
            return $NUMBER; 
        else 
            return $AddExpr; 
    };

Because the Atom rule contains a choice between a NUMBER or a sub expression, we also need to check this choice in the code. Therefore we check if either of the sub rules is null. If not, we return this value.

Running the generated parser / compiler

Compile the grammar again by pressing F6. This should not result in any errors. Then type in the expression (((5*8)/4+3*3-(7+2)) / 5) and press F5. This time the expression should parse successfully and the outcome is calculated and should return 'Result: 2'.

Congratulations, you have written your first TinyPG compiler!

Highlighting your expressions

To make things more interesting v1.2 allows you to add text highlighting to your grammar. Adding Text highlighting is done in two steps:

  • Add the TextHighlighter directive to the grammar
  • Add the Color attribute to terminals you want to have highlighted
When the TextHighlighter directive is added, @TinyPG will generate also a TextHighlighter.cs (or .vb) file. The generated TextHighlighter makes use of the generated parser/scanner to parse the input text and apply the color from the Color attribute to any terminal it recognizes. E.g. the directives and terminals of the simple expression evaluator could look like this:
// By default the TextHighlighter is not generated. Set its Generate attribute to true
<% @TextHighlighter Generate="true" %>

// highlight numbers in red and symbols in blue
[Color(255, 0, 0)] NUMBER -> @"[0-9]+"; 
[Color(0, 0, 255)] PLUSMINUS -> @"(\+|-)"; 
[Color(0, 0, 255)] MULTDIV -> @"\*|/"; 
[Color(0, 0, 255)] BROPEN -> @"\("; 
[Color(0, 0, 255)] BRCLOSE -> @"\)"; 

When running expressions in the input pane, the expressions will be calculated and additionally highlighted in the input pane! This makes it extra simple to write your own text highlighters.

Additional remarks

Note 1: it is not allowed to include terminal characters immediately in the production rules. This is allowed for some compiler compilers, however I feel it does not add to the readability of the generated code. So each terminal must be defined explicitly as a regular expression. Non-terminal production rules may only refer to terminals or other non-terminals in an LL(1) manner. This means that the parser will only be able to look ahead 1 token. When scanning and parsing, an input token always corresponds to a terminal symbol (if it does not, you have a syntax error). So the parser will only need to look ahead 1 terminal to decide which production rule to choose.

E.g. the following grammar will result in errors because it is not LL(1):

// this is an LL(2) rule, you need to look ahead 2 symbols to determine which rule to choose 
Start -> PLUSMINUS NUMBER | PLUSMINUS Expression; 

The parser must choose the production (sub)rule based on 1 lookahead. Because both subrules start with the PLUSMINUS terminal, the parser cannot decide. TinyPG does not check whether a grammar is LL(1). It will simply generate the code. However on compilation of the code you will run into compilation errors. Luckily a rule like this one, where the alternatives share a common prefix, can be rewritten to LL(1) format by factoring out that prefix, like so:

// the rule has been rewritten to LL(1), now only 1 symbol at a time needs to be looked at to make the decision
Start -> PLUSMINUS ( NUMBER | Expression); 

By rewriting the rule like above, the production rule is LL(1) and can now be successfully generated into an LL(1) parser... or can it? The LL(k) problem is perhaps slightly more complicated. What if Expression is defined as:

Expression -> NUMBER | IDENTIFIER;

Again the parser will have the same problem, when it encounters a NUMBER token, should it choose the NUMBER rule of Start, or should it continue parsing an Expression? So here also the problem occurs. This time it is more difficult to solve the problem. In this case there is no easy solution but to rethink (partly rewrite) your grammar.
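
One way to spot such a problem before generating any code is to compare the sets of terminals with which each alternative of a choice can start (the 'First' symbols). The sketch below is only meant to illustrate that check. The class FirstSetSketch and its Conflicts method are hypothetical, the First sets are supplied by hand rather than computed from a grammar, and each set is assumed to contain no duplicates. It is not a description of how TinyPG works internally.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the First sets are passed in by hand for illustration.
public static class FirstSetSketch
{
    // Returns the terminals that occur in the First set of more than one alternative.
    // An empty result means one token of lookahead is enough to choose an alternative.
    public static List<string> Conflicts(params string[][] firstSets)
    {
        HashSet<string> seen = new HashSet<string>();
        List<string> conflicts = new List<string>();
        foreach (string[] first in firstSets)
        {
            foreach (string terminal in first)
            {
                // Add returns false when the terminal was already seen in an earlier alternative.
                if (!seen.Add(terminal) && !conflicts.Contains(terminal))
                    conflicts.Add(terminal);
            }
        }
        return conflicts;
    }
}
```

For the choice ( NUMBER | Expression ) with Expression -> NUMBER | IDENTIFIER, the alternatives start with { NUMBER } and { NUMBER, IDENTIFIER } respectively, so FirstSetSketch.Conflicts(new[] { "NUMBER" }, new[] { "NUMBER", "IDENTIFIER" }) reports NUMBER as the conflicting terminal.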

Note 2: TinyPG itself will not detect errors inside codeblocks. The .Net compiler will, of course. This can lead to .Net compilation errors that are hard to track and map onto the grammar codeblocks, so it may be difficult to see what the issue is. The best way to debug this is to open the generated source code in Visual Studio and try to compile it.

Note 3: It is also possible to separate the semantics from the syntax by not inserting codeblocks directly into the grammar. TinyPG will generate 3 source code files, the scanner, the parser and the parsetree. When parsing an input string successfully, the parser will return a filled parsetree. Normally TinyPG will insert codeblocks directly into the parsetree. The parsetree can then be evaluated separately. In this case we create a subclass of the parsetree and insert our own code there (the methods to implement can be overridden by the subclass). Then when calling the parser, you supply it with a new instance of your own parsetree. The parser will then fill this parsetree and return it again.

Of course an alternate manner is to simply evaluate the parsetree directly in the code, by traversing the tree nodes. However somehow I feel this option is less 'clean'.

Partial Context sensitive/ambiguous grammars

@TinyPG v1.2 now supports partial ambiguous grammars. Given the simple expression grammar, assume we would like to make a distinction between FACTORs and TERMs. The problem is, both FACTORs and TERMs are numbers and can be defined as:

        [Color(255, 0, 0)] FACTOR_NUMBER -> @"[0-9]+";    // mark factors in red
        [Color(0, 255, 0)] TERM_NUMBER -> @"[0-9]+";      // mark terms in green
    
This is typically an ambiguous grammar because a number as input can match both symbols, unless the grammar expects only one of the two at any given point, e.g.:
    Start -> TERM_NUMBER (MULTDIV) FACTOR_NUMBER;
    
The first input number is expected to be a TERM, the second number is expected to be a factor. So depending on the context (the rule the parser is parsing), the scanner will interpret a number as a TERM or as a FACTOR respectively. So the first number will be marked green, and the second in red.
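
The idea of letting the parser tell the scanner which terminals it expects can be sketched in a few lines. The class ContextScannerSketch and the shape of its LookAhead method below are assumptions made for this illustration; the generated scanner has its own LookAhead implementation, which is not reproduced here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical illustration of context dependent scanning; not the generated Scanner.
public static class ContextScannerSketch
{
    // Two terminals share the same pattern, which makes the grammar ambiguous on its own.
    private static readonly Dictionary<string, Regex> Terminals = new Dictionary<string, Regex>
    {
        { "TERM_NUMBER", new Regex(@"[0-9]+") },
        { "FACTOR_NUMBER", new Regex(@"[0-9]+") },
        { "MULTDIV", new Regex(@"\*|/") }
    };

    // Tries only the terminals the parser expects at this point and returns the
    // name of the first one that matches at the start of the remaining input, or null.
    public static string LookAhead(string rest, params string[] expected)
    {
        foreach (string name in expected)
        {
            Match m = Terminals[name].Match(rest);
            if (m.Success && m.Index == 0) return name;
        }
        return null;
    }
}
```

While parsing the rule above, the parser would first ask for TERM_NUMBER, then for MULTDIV and finally for FACTOR_NUMBER. The same digits are therefore classified differently depending on where the parser is in the rule: ContextScannerSketch.LookAhead("12*3", "TERM_NUMBER") returns TERM_NUMBER, whereas ContextScannerSketch.LookAhead("3", "FACTOR_NUMBER") returns FACTOR_NUMBER.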

Using the code

Once you have generated the Scanner, Parser, ParseTree and optionally the TextHighlighter classes and tested them with TinyPG, you obviously now want to use the code in your own project. This can be done by creating a new C# project with Visual Studio. Add the generated files to the project and compile, just to make sure there are no errors. To call the parser, use the following code:

using TinyPG; // add the TinyPG namespace

...

// create the scanner to use
Scanner scanner = new Scanner(); 

// create the parser, and supply the scanner it should use
Parser parser = new Parser(scanner); 

// create a text highlighter (if one was generated) and attach the RichTextbox, scanner and parser.
TextHighlighter highlighter = new TextHighlighter(richTextbox, scanner, parser);

// define the input for the parser
string input = "... your expression here ...";

// parse the input. the result is a parse tree.
ParseTree tree = parser.Parse(input);

Notice that a ParseTree object is returned. The parse tree contains the structure of the input. If the syntax of the input is not correct the ParseTree contains errors. You can check for errors by investigating the ParseTree.Errors property.

Notice also that the TextHighlighter accepts a RichTextbox control. TextHighlighter will automatically start capturing its events analyzing its content and updating the content of the RichTextbox control.

If all is well, you can go ahead and evaluate the parse tree:

// evaluate the parse tree; do not pass any additional parameters
object result = tree.Eval(null);

// write the result of the evaluation to the console:
Console.WriteLine("result: " + result.ToString());

Notice that the Eval(...) function returns a result of type object. During evaluation you are free to decide on what the return type will be. This will give you more freedom, however this also means you will have to cast types explicitly. To display the result, convert it to a string using the result.ToString() method.

To conclude, this is all that is needed to build and implement your own grammar and parser. The generated code does not have any external dependencies nor are any additional libraries required in order to work with the code.

The tiny parser generator evolvement cycle

What I have always found intriguing about compiler compilers is that they are able to compile their own grammar and hence generate a compiler from that, which in this case is a compiler compiler. Initially of course there is no compiler compiler available yet. So how to build one? In this section I will explain the evolvement cycle as I have applied it for TinyPG, as shown in the following graph:
[Figure: TinyPG evolvement cycle]

Step 1: define the grammar as input text to be parsed.
Step 2: derive a concrete parse tree by parsing the input grammar.
Step 3: transform the parse tree to an abstract parse tree: the grammar tree.
Step 4: the grammar tree contains all information about the grammar stored as a tree.
Step 4a: generate the grammar text from the grammar tree. This is a check to see if the input grammar corresponds to the grammar in the grammar tree. If they do not correspond, then most likely the transformation has gone wrong.
Step 4b: generate the parser C# source code.
Step 5: compile the sources into the actual parser.

Next steps: take the generated grammar from step 4a and use it as input for the parser, continue the cycle in step 2.

To bootstrap the whole process, start with creating the abstract grammar tree manually (in code).

The most tricky parts are the Transformation and Generation processes. Once you have that going, the rest is relatively easy.

Using TinyPG Directives

Sometimes you want to be able to set some additional parameters for the tool so it knows how and where to generate the code, for instance specifying that you want to generate C# or VB code. For this purpose I included the option to insert meta information directly into the grammar by means of directives. I was inspired by the way this is handled by aspx pages using the <% ... %> tags. I decided this would be a handy and compact format which would be strict enough to allow only for some parameters to be specified. Also this will be easy to extend at a later stage, that is, to add more directives.

Notice that the syntax highlighting for codeblocks is now also implemented in v1.2. Codeblocks will be highlighted according to the Language setting of the @TinyPG directive.

Currently the following directives are supported: @TinyPG, @Parser, @Scanner, @ParseTree and @TextHighlighter. They can be used as follows:

 // Namespace allows you to define a different namespace for the Scanner, 
//     Parser and ParseTree generated classes. By default this is set to "TinyPG"
// Language allows you to define the language to generate. Only C# (default) or VB are supported for now. 
// OutputPath allows you to define the absolute or relative outputpath for the
//     files. The relative path is relative to the grammar file. 
//     By default this is set to "./"
// Template path is relative to the TinyPG.exe file. By default this is set to 
//     "Templates"
<% @TinyPG Namespace="MyNamespace" Language="C#" OutputPath="MyGrammarPath" TemplatePath="MyParserTemplates" %>

// the Generate parameter specifies whether the file should be generated. By default this is set to True
<% @Parser Generate="True" %>
<% @ParseTree Generate="False" %>  // turn off generation of the ParseTree.cs file
<% @Scanner Generate="True" %>
<% @TextHighlighter Generate="True" %> // indicates code for the TextHighlighter should be generated also

It is required that the directives are defined before the grammar implementation.

Some handy regular expressions.

Writing your own regular expressions is not always easy, especially if you want to match tokens that are also common in programming languages. I have summarized a few regular expressions here that can be very helpful.

// codeblock will match any text between { ... }; 
Regex codeblock = new Regex(@"\s*\{[^\}]*\}([^};][^}]*\})*;"); 

//eof will match the end of an input string only
Regex eof = new Regex(@"^$");

//whitespace will match any whitespace including tabs. This one is trivial, but often required.
Regex whitespace = new Regex(@"\s+");

//regex_string will match any text that is a .Net string. It also takes the ", "", @ and \ into account
Regex regex_string = new Regex(@"@?\""(\""\""|[^\""])*\""");

//commentline will match with a single line of text that starts with // and scan it until the end of the line
//very handy if you want to support commenting by your parser
Regex commentline = new Regex(@"//[^\n]*\n?"); 

//commentblock will match any text between /* ... */ . Very handy if you want to support commenting by your parser
Regex commentblock = new Regex(@"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/", RegexOptions.Compiled); 

That's it for now for the interesting regular expressions I found on the web or wrote myself. If you have any interesting/complicated ones please drop me a line.
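
As a quick way to try out two of these expressions outside the Regex tool, the following sketch uses the commentline and commentblock patterns to strip comments from a piece of text. The class CommentRegexSketch and its StripComments method are made up for this example. Note that this naive approach also removes comment-like text inside string literals, so it is only a demonstration of the patterns and not a replacement for proper scanning.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper that exercises the commentline and commentblock patterns.
public static class CommentRegexSketch
{
    private static readonly Regex CommentLine = new Regex(@"//[^\n]*\n?");
    private static readonly Regex CommentBlock = new Regex(@"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/");

    // Removes block comments first, then line comments (including their line break).
    public static string StripComments(string source)
    {
        string withoutBlocks = CommentBlock.Replace(source, "");
        return CommentLine.Replace(withoutBlocks, "");
    }
}
```

For example, CommentRegexSketch.StripComments("x = 1; // set x\ny = 2;") returns "x = 1; y = 2;" because the commentline pattern also consumes the line break that ends the comment.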

Points of interest

Apart from the parser generator functionality, the TinyPG utility also contains a number of additional components that may be interesting. The controls/features that are new in version 1.2 are made bold:

  • C# or VB code generation. I have separated the generation code and templates inside the project and added support for VB.Net on top of C#. It is now possible to create grammars to directly generate either C# or VB code. Also inline C# or respective VB codeblocks are supported.
  • Partial support for context sensitive/ambiguous grammars. Instead of evaluating all possible terminal symbols as possible input of the grammar, I have adapted the LookAhead function to look only for the expected terminal symbols. So if for example 2 terminals are defined to match the same input (this results in an ambiguous grammar), the parser still knows which terminal to look for, depending on the rule it is parsing.
  • Code highlighting. Once you have a parsetree, it is relatively easy to highlight both terminals and nonterminals. Because code highlighting in TinyPG v1.1 became rather slow when the text became too large, I decided to make the code highlighting asynchronous in v1.2. This seems to be working rather well. However, when texts become too large (say over 500 lines of code), highlighting may still take a little time to finish. Additionally I have included code highlighting of C# and VB inline codeblocks (unfortunately no code completion yet).
  • Syntax highlighting. Because EBNF notation is a language in itself, why not have the syntax and semantics of the grammar checked as you type? Any syntactic or semantic errors found will be highlighted directly in the code by underlining the text in red. Hovering over the error with the mouse will even reveal a tooltip showing what the error is, just as you are used to in Visual Studio. The TextMarker class is responsible for underlining the erroneous words. That class can be easily reused in your own projects as it has no dependencies. Add it to your project, wire it up with the RichTextbox control and assign erroneous words to it. Those will then be automatically marked.
  • Context sensitive code completion. Depending on the section in which you are typing, code completion will appear with the relevant verbs to complete your typed word. Obviously this feature was also inspired by Visual Studio. This feature is implemented in the AutoComplete form which has no dependencies and can therefore also be reused. Just add it to your own project, wire it up with a RichTextbox control, add the keywords and you are set to go. Turning autocompletion on or off can be managed through the Enabled property of the AutoCompletion control.
  • TabControlEx. I included an extension on the standard .Net TabControl called TabControlEx. Unlike its superclass, this one does render the tabs correctly when the control is turned upside down. The only problem this one has right now is that it does not render the tabs correctly when they are positioned on the left or right side.
  • The RegexControl is a fully functional drop in to test your regular expressions. This is not too exciting though, there are numerous of these tools out there.
  • DockExtender. I reused this code from another project I posted here on CodeProject. It will allow you to drag, drop and dock the panels on the main window.
  • HeaderLabel control. This is an extension on the label. The label is given a gradient background color based on the currently selected Windows Theme. It is also possible to activate or deactivate the label, giving it an active or inactive background color. To top it off I even added the little close 'x' on the label that will activate if you hover over it. Hence creating basically the caption header much like that as used on a form.
  • Parse tree. Once you are able to compile your grammar and parse your own expressions, the parse tree will become available. When clicking on nodes in the parse tree, the corresponding tokens will be highlighted in the Expression Evaluator pane. This is a way to browse through the tree and see which tokens are parsed at what point in the parse tree. It may help you debug potential mistakes made in the grammar.

All in all, even though I named this the Tiny Parser Generator, this has become more than a tiny project. I have spent quite some effort in making this all work together nicely and now I feel this is worth sharing with the community. In a future release I would still like to have additional functionality:

  • Option to specify the namespace for the generated classes.
  • Support for LL(k) (multiple look aheads), this makes it a bit easier to write grammars; even though LL(1) will produce nicer code.
  • Better code highlighting (will probably require partly rewriting the RichTextBox control). This could be a separate project in itself. Any help from the community on this would be greatly appreciated!
  • Highlighting of codeblocks! This will of course require parsing C#. But hey, we have a parser generator available now! I just need to find the grammar for C#... anyone? This one is still high on my priority list but I am getting closer to implementing it.
  • Generate to different languages. Because this will most likely require some major reworking on some parts of the code I will stick with just c# for the time being.
  • Perhaps a nicer graphical display of the parse tree (e.g. something as is used in Antlr).
  • Display of a state machine like is done in Antlr. That's a nice feature and may also make the production rule more clear.
  • Better error handling and displaying. E.g. make it clickable to jump to the position where the error occurred.
  • Run the evaluator on a separate thread. If you insert faulty codeblocks (e.g. endless loops) the tool will hang currently. By running it in a separate thread, this can be controlled.
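
The last item could be approached roughly as follows. This is only a sketch under stated assumptions, not how TinyPG is implemented: EvaluatorRunner and TryEvaluate are hypothetical names, and the evaluate delegate stands in for whatever call runs the user's codeblocks.

```csharp
using System;
using System.Threading;

// Hypothetical helper for illustration; not part of TinyPG.
public static class EvaluatorRunner
{
    // Runs 'evaluate' on a background thread and waits at most 'timeout'.
    // Returns true and sets 'result' if the evaluation finished in time.
    public static bool TryEvaluate(Func<object> evaluate, TimeSpan timeout, out object result)
    {
        object value = null;
        Exception error = null;

        Thread worker = new Thread(() =>
        {
            try { value = evaluate(); }
            catch (Exception ex) { error = ex; }
        });
        // A background thread does not keep the process alive on exit.
        worker.IsBackground = true;
        worker.Start();

        if (!worker.Join(timeout))
        {
            // Timed out: the worker is abandoned here, not terminated.
            result = null;
            return false;
        }

        if (error != null)
            throw new InvalidOperationException("Evaluation failed.", error);

        result = value;
        return true;
    }
}
```

Note the limitation: a codeblock stuck in an endless loop keeps running on the abandoned thread until the process exits. The caller stays responsive and can report a timeout, but truly stopping runaway code would need something stronger, such as evaluating in a separate process.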

If you have any ideas for new features, comments or remarks please drop a note!

History

@TinyPG v1.0

Tiny Parser Generator Version 1.0 was released on 1st of August 2008. This version includes the basic implementation for generating simple top-down parsers.

@TinyPG v1.1

Tiny Parser Generator Version 1.1 was released on 17th of August. Version 1.1 contains additional features, making the editor easier to use:

  • Text and syntax highlighting
  • Autocompletion
  • Improved/revised grammar for the EBNF language
  • Support for directives
  • Improved FIRST algorithm

@TinyPG v1.2

Tiny Parser Generator Version 1.2 was released on 1st of September. Version 1.2 contains additional features, making the editor easier to use:

  • Generation of C# and/or VB code (!)
  • Allows for partially context-sensitive/ambiguous grammars
  • Codeblock highlighting for C# or VB code
  • Attributes now allow parameters
  • Color(R,G,B) attribute added
  • Generation of texthighlighter code
  • Asynchronous text and syntax highlighting

Special thanks go out to William A. McKee for being actively involved with @TinyPG, inspiring me to improve the tool further and helping me revise the EBNF grammar. I have implemented some of his ideas in @TinyPG v1.2, including support for (partly) ambiguous grammars.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Herre Kuijpers


Member
Herre Kuijpers has been employed at Capgemini Netherlands for over 10 years, where he has developed skills in all kinds of technologies, methodologies and programming languages, such as C#, ASP.NET, Silverlight, VC++, JavaScript, SQL, UML, RUP and WCF. Currently he fulfills the role of software architect in various projects.

Herre Kuijpers is a very experienced software architect with deep knowledge of software design and development on the Microsoft .Net platform. He has a broad knowledge of Microsoft products and knows how these, in combination with custom software, can be optimally implemented in the often complex environment of the customer.
Occupation: Architect
Company: Capgemini
Location: Netherlands

Last Updated: 1 Sep 2008
Copyright 2008 by Herre Kuijpers
Everything else Copyright © CodeProject, 1999-2010