Click here to Skip to main content
Click here to Skip to main content
Add your own
alternative version

Parsing Expression Grammar Support for C# 3.0 Part 1 - PEG Lib and Parser Generator

, 7 Oct 2008
Introduction to the parsing method PEG with library and parser generator
PEG_GrammarExplorer.zip
PEG_GrammarExplorer
Documents
PEG_GrammarExplorer_fromCP-Dateien
PEG_GrammerExplorer.jpg
mssccprj.scc
PEG Explorer
mssccprj.scc
PEG Explorer.csproj.user
Properties
Settings.settings
vssver2.scc
vssver2.scc
PegBase
mssccprj.scc
Properties
vssver2.scc
vssver2.scc
PegSamples
BasicEncodingRules
input
CDURKR2URKR125195
DefiniteLengthForm
CDURKR2URKR125195
hello
TDAUTPTEUR0100011.tap3
vssver2.scc
hello
IndefiniteLengthForm
DefiniteLengthForm
TDAUTPTEUR0100011.tap3
TDAUTPTEUR0100011.tap3
TDAUTPTEUR0100011.tap3
TDAUTPTEUR0100011_withError.tap3
vssver2.scc
vssver2.scc
calc0_direct
input
vssver2.scc
vssver2.scc
calc0_tree
input
vssver2.scc
vssver2.scc
CSharp3
docu
input
vssver2.scc
vssver2.scc
C_KernighanRitchie2
input
vssver2.scc
vssver2.scc
EMail
input
vssver2.scc
vssver2.scc
Json
input
vssver2.scc
peg_template
vssver2.scc
vssver2.scc
mssccprj.scc
PEG Samples.csproj.user
PegGenerator
input
C#
TestCases
C#
.cs
vssver2.scc
vssver2.scc
vssver2.scc
vssver2.scc
Properties
vssver2.scc
python_2_5_2
input
adwords
awapi_python_samples_1.0.0
src
decoratorators_01
vssver2.scc
Problems
Sample PEG Console Parser
PEG Console Parser
input
Properties
Sample PEG Console Parser.csproj.user
vssver2.scc
PEG_GrammarExplorer_Submission.zip
PEG_GrammarExplorer_Submission
PEG_GrammerExplorer.jpg
PEG_GrammerExplorer.zip
<html xmlns="http://www.w3.org/1999/xhtml"><head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">


<title>
	CodeProject: Article HTML. Free source code and programming help
</title><link rel="stylesheet" type="text/css" href="PEG_GrammarExplorer_fromCP-Dateien/CodeProject.css"><!-- base href="http://www.codeproject.com/KB/recipes/" -->
</head><body>
    <!--  HTML for article "Parsing Expression Grammar Support for C# 3.0  Part 1 - PEG Lib and Parser Generator" by Martin.Holzherr\nURL: http://www.codeproject.com/KB/recipes/grammar_support_1.aspx
Copyright 2008 by Martin.Holzherr\nAll formatting, additions and  alterations Copyright © CodeProject, 1999-2008 -->
	
    <p><b>Please choose 'View Source' in your browser to view the HTML, or  
File | Save to save this file to your hard drive for editing.</b></p>
<hr class="Divider subdue">
    <div>
    <span id="ArticleContent">

<ul class="Download">
<li><a href="http://www.codeproject.com/KB/recipes/grammar_support_1/PEG_GrammarExplorer_Submission.zip">Download PEG Explorer with Source - 650 KB</a> </li>
</ul>

<h2><a name="Introduction0">Introduction</a></h2>
<p>This is the first part of a series of articles which cover the parsing technique <code>Parsing Expression Grammars</code>.
This part introduces a support library and a parser generator for C# 3.0 .
The support library consists of the classes  <code>PegCharParser</code> and <code>PegByteParser</code> which 
are for parsing text and binary sources and which support user defined error handling, 
direct evaluation during parsing,  parse tree generation and abstract syntax tree generation. 
Using these base classes results in fast, easy to understand and to extend Parsers, 
which are well integrated into the hosting C# program. 

</p><p>The underlying parsing methodology — called <code>Parsing Expression Grammar</code> <a href="#References">[1][2][3]</a> — is
relatively new (first described 2004), but has already many implementations. 

<code>Parsing Expressions Grammars</code> (PEG) can be easily implemented in any programming language,
but fit especially well into languages with a rich expression syntax like functional languages
and functionally enhanced  imperative languages  (like C# 3.0)
because PEG concepts have a close relationship to mutually recursive function calls, 
short-circuit boolean expressions and in-place defined functions (lambdas).

</p><p>A new trend in parsing is integration of parsers into a host language so that the
semantic gap between grammar notation  and implementation in the host language is as small as possible
(<code>Perl 6</code> and <code>boost::sprit</code> are forerunners of this trend).
Parsing Expression Grammars are escpecially well suited when striving for this goal.
Earlier Grammars were not so easy to implement, so that one grammar rule could result
in dozens of code lines. In some parsing strategies, the relationship between grammar rule and 
implementation code was even lost.
This is the reason, that until recently generators were used to build parsers.

</p><p>This article shows how the C# 3.0 lambda facility can be used to implement a support 
library for Parsing Expression Grammars, which makes parsing with the PEG technique easy.
When using this library, a PEG grammar is mapped to a C# grammar class which 
inherits basic functionality from a PEG base class and
each PEG grammar rule is mapped to a method in the C# grammar class. 
Parsers implemented with this libary should be fast 
(provided the C# compiler inlines methods whenever possible), 
easy to understand and to extend. 
Error Diagnosis, generation of a parse tree and addition of semantic actions 
are also supported by this library. 
</p><p>The most striking property of PEG and especially this library is the small footprint and 
the lack of any administrative overhead. 
</p><p>The main emphasis of this article is on explaining the PEG framework and on studying 
concrete application samples. One of the sample applications is a PEG parser generator,
which generates C# source code. The PEG parser generator is the only sample parser which has
been written manually, all other sample parsers were generated by the parser generator.
</p><h2>Contents</h2><ul>
<li><a href="#Introduction0">Introduction</a></li>
<li><a href="#Parsing%20Expression%20Grammar%20Tutorial">Parsing Expression Grammar Tutorial</a>
<ul>
          <li><a href="#Parsing%20Expression%20Grammars%20Basics">Parsing Expression Grammars Basics</a> </li>

          <li><a href="#Parsing%20Expression%20Grammars%20particularities%20and%20idioms">Parsing Expression Grammars partiularities and idioms</a> </li>
          <li><a href="#Integrating%20semantic%20actions%20into%20a%20PEG%20framework">Integrating semantic actions into a PEG framework</a> </li>
          <li><a href="#Parsing%20Expression%20Grammars%20exposed">Parsing Expression Grammars exposed</a> </li>
      </ul>
</li>
<li><a href="#Parsing%20Expression%20Grammar%20Implementation">Parsing Expression Grammar Implementations</a>

<ul>
          <li><a href="#General%20implementation%20strategy%20for%20PEG">General implementation strategy for PEG</a> </li>
          <li><a href="#Parsing%20Expression%20Grammars%20mapped%20to%20C#1.0">Parsing Expression Grammars mapped to C#1.0</a> </li>
          <li><a href="#Parsing%20Expression%20Grammars%20mapped%20to%20C#3.0">Parsing Expression Grammars mapped to C#3.0</a> </li>
      </ul>
</li>
<li><a href="#Parsing%20Expression%20Grammar%20Examples">Parsing Expression Grammar Examples</a>

<ul>
          <li><a href="#Json%20Checker%20%28Recognize%20only%29">Json Checker (Recognize only)</a> </li>
          <li><a href="#Json%20Tree%20%20%28Build%20Tree%29">Json Tree  (Build Tree)</a> </li>
          <li><a href="#Basic%20Encode%20Rules%20%28Direct%20Evaluation%20+%20Build%20Tree%29">Basic Encode Rules (Direct Evaluation + Build Tree)</a></li>
          <li><a href="#Scientific%20Calculator%20%20%28Build%20Tree%20+%20Evaluate%20Tree%29">Scientific Calculator  (Build Tree + Evaluate Tree)</a> </li>
</ul>

</li>
<li><a href="#PEG%20Parser%20Generator">PEG Parser Generator</a>
<ul>
     <li><a href="#A%20PEG%20Generator%20implemented%20with%20PEG">A PEG Generator implemented with PEG</a></li>
     <li><a href="#PEG%20Parser%20Generator%20Grammar">PEG Parser Generator Grammar</a></li>
    <li><a href="#The%20PEG%20Parser%20Generator%27s%20handling%20of%20semantic%20blocks">The PEG Parser Generator's handling of semantic blocks</a></li>
</ul>
</li>
<li><a href="#Parsing%20Expression%20Grammars%20in%20perspective">Parsing Expression Grammars in perspective</a>

<ul>
          <li><a href="#Parsing%20Expression%20Grammars%20Tuning">Parsing Expression Grammars Tuning</a> </li>
          <li><a href="#Comparison%20of%20PEG%20parsing%20with%20other%20parsing%20techniques">Comparison of PEG parsing with other parsing techniques</a> </li>
          <li><a href="#Translating%20LR%20grammars%20to%20PEG%20grammars">Translating LR grammars to PEG grammars</a> </li>

</ul>
</li>
<li><a href="#Future%20Developments">Future Developments</a></li>
</ul>
<h2><a name="Parsing Expression Grammar Tutorial">Parsing Expression Grammar Tutorial</a></h2>
<p>Parsing Expression Grammars are a kind of executable grammars. Execution of
a PEG grammar means, that grammar patterns matching the input string advance
the current input position accordingly.
Mismatches are handled by going back to
a previous input string position where parsing eventually continues with an alternative. 
The following subchapters explain PEGs in detail and introduce the basic PEG constructs,
which have been extended by the author in order to support error diagnosis, direct evaluation and
tree generation.
</p><h3><a name="Parsing Expression Grammars Basics">Parsing Expression Grammars Basics</a></h3>
<p>The following PEG grammar rule 
</p><pre>EnclosedDigits: [0-9]+ / '(' EnclosedDigits ')' ;</pre>
<p>introduces a so called Nonterminal <strong>EnclosedDigits</strong> and a right hand side consisting
of two alternatives.
</p><p>The first alternative (<small><strong>[0-9]+</strong></small>) describes a sequence of digits, 
the second (<small><strong> '(' EnclosedDigits ')'</strong></small>) something enclosed in parentheses. 
Executing<strong> EnclosedDigits</strong> with the string <strong>((123))+5</strong> as input would result in a match 
and move the input position to just before +5. 
</p><p>This sample also shows the  potential for recursive definitions, 
since <strong>EnclosedDigits</strong> uses  itself as soon as it recognizes an opening parentheses.
The following table shows the outcome of applying the above grammar for some other input strings.
The <strong>|</strong> character is an artifical character which visualizes the input position 
before and after the match.


<table class="ArticleTable">
<thead>
	<tr>
		<td>Input</td>
		<td>Match Position</td>
		<td>Match Result</td>
	</tr>
	</thead>
	<tbody><tr>
<td><strong>|</strong>((123))+5</td>

<td>((123))<strong>|</strong>+5</td>
<td>true</td>
</tr>
<tr>
<td><strong>|</strong>123</td>
<td>123<strong>|</strong></td>
<td>true</td>
</tr>

<tr>
<td><strong>|</strong>5+123</td>
<td><strong>|</strong>5+123</td>
<td>false</td>
</tr>
<tr>
<td><strong>|</strong>((1)]</td>
<td><strong>|</strong>((1)]</td>

<td>false</td>
</tr>
</tbody></table>
</p><p>For people familiar with regular expressions, it may help to think of a parsing expression 
grammar as a generalized regular expression which always matches the beginning of 
an input string (regexp prefixed with <strong>^</strong>). Whereas
a regular expression consists of a single expression, a PEG consists of set of rules; 
each rule can use other rules to help in parsing.
The starting rule matches the whole input and uses the other rules to match subparts of the input. 
During parsing one has always a current input position and the input string starting 
at this position must match against the rest of the PEG grammar. 
Like regular expressions PEG supports the postfix operators <strong>* + ? </strong>, 
the dot <strong>.</strong> and character sets enclosed in <strong>[]</strong>. 

</p><p>Unique to PEG are the prefix operators <strong>&amp;</strong> (peek) and <strong>!</strong> (not), 
which are used to look ahead without consuming input. 
Alternatives in a PEG are not separated by <strong>|</strong> but by <strong>/</strong> to indicate 
that alternatives are strictly tried in sequential order.
What makes PEG grammars powerful and at the same time a potential memory hogg is 
unlimited backtracking, meaning that  the input position can be set back to any of
the previously visited input positions in case an alternative fails.
A good and detailed explanation of PEG can be found in the wikipedia <a href="#References">[2]</a>. 
The following table gives an overview of the PEG constructs 
(and some homegrown extensions) which are
supported by the library class described in this article.
The following terminology is used

<table class="ArticleTable" width="100%">
    <thead>
	<tr>
		<td>Notion</td>
		<td>Meaning</td>
	</tr>
	</thead>
	<tbody><tr>
	<td>Nonterminal</td>
	<td>Name of a grammar rule. In PEG, there must be exactly one 
		grammar rule having a nonterminal on the left hand
		side. The right hand side of the grammar rule
		provides the definition  of the grammar rule.
		A nonterminal on the right hand side of a grammar rule
		must reference an existing grammar rule definition.;</td>

	</tr>
	<tr>
		<td>Input string</td>
		<td>string which is parsed.</td>
	</tr>
	<tr>
		<td>Input position</td>

		<td>Indicates the next input character to be read.</td>
	</tr>
	<tr>
		<td>Match</td>
		<td>A grammar element can match a stretch of the input.
		The match starts at the current input position.</td>
	</tr>
	<tr>

		<td>Success/<br>Failure	</td>
		<td>Possible outcome of matching a PEG element against the input</td>
	</tr>
	<tr>
		<td>e, e1, e2</td>
		<td>e, e1 and e2 stand each for arbitrary PEG expressions.</td>

	</tr>
</tbody></table>
</p><p>
The extended PEG constructs supported by this library are listed in the following table
<br> (<strong>|</strong>indicates the input position, italics like in <em>name</em> indicate a placeholder):
<table class="ArticleTable" width="100%">
<thead>
	<tr>
		<td>PEG element</td>

		<td>Notation</td>
		<td>Meaning</td>
	</tr>
	</thead>
	<tbody><tr>
		<td>CodePoint</td>
		<td><table>
			<tbody><tr><td>#32</td><td>(decimal)</td></tr>

			<tr><td>#x3A0</td><td>(hex)</td></tr>
			<tr><td>#b111</td><td>(binary)</td></tr>
                    </tbody></table>
		</td><td>Match input against the specified unicode character.
		      <table>
			<tbody><tr>
				<td bgcolor="beige">PEG</td>

				<td bgcolor="palegreen">Success</td>
				<td bgcolor="pink">Failure</td></tr>
			<tr bgcolor="palegreen">
				<td bgcolor="beige">#x25</td> 
				<td>%<strong>|</strong>1</td>
				<td bgcolor="pink"><strong>|</strong>1%</td>

			</tr>
			</tbody></table>
			</td>
	</tr>
	<tr>
<td>Literal</td>
<td>'<em>literal</em>'</td>
		<td>Match input against quoted string.
		<table>

			<tbody><tr>
				<td bgcolor="beige">PEG</td>
				<td bgcolor="palegreen">Success</td>
				<td bgcolor="pink">Failure</td></tr>
			<tr bgcolor="palegreen">
				<td bgcolor="beige">'for'</td> 
				<td>for<strong>|</strong>tran</td>

				<td bgcolor="pink"><strong>|</strong>afordable</td>
			</tr>
			</tbody></table>
			 Escapes take the same form as in the "C" syntax family.</td>
	</tr>
<tr>
<td>CaseInsensitive Literal</td>
<td>'<em>literal</em>'\i</td>

<td>Same as for Literal but compares case insensitive. \i must follow a Literal
	<table>
			<tbody><tr>
				<td bgcolor="beige">PEG</td>
				<td bgcolor="palegreen">Success</td>
				<td bgcolor="pink">Failure</td></tr>
			<tr bgcolor="palegreen">

				<td bgcolor="beige">'FOR'\i</td> 
				<td>FoR<strong>|</strong>TraN</td>
				<td bgcolor="pink"><strong>|</strong>affordable</td>
			</tr>
			</tbody></table>
</td>
</tr>

<tr>
<td>CharacterSet</td>
<td>[<em>chars</em>]</td>
<td>Same meaning as in regular expressions.
 Supported are ranges as in [A-Za-z0-9], single characters and escapes sequences.</td>
</tr>
<tr>
<td>Any</td>
<td><strong>.</strong></td>
<td>increment the input position except when being at the end of the input.
<table width="300">

			<tbody><tr>
				<td bgcolor="beige">PEG</td>
				<td bgcolor="palegreen">Success</td>
				<td bgcolor="pink">Failure</td></tr>
			<tr bgcolor="palegreen">
				<td bgcolor="beige">'this is the end' <strong>.</strong></td> 
				<td bgcolor="palegreen">this is the end!<strong>|</strong></td>

				<td bgcolor="pink"><strong>|</strong>this is the end</td>
			</tr>
</tbody></table>
</td>
</tr>
<tr>
<td>BITS</td>
<td>BITS&lt;<em>bitNo</em>&gt;
<br><br>BITS&lt;<em>low</em>-<em>high</em>&gt;</td>

<td>Interprets the Bit <em>bitNo</em>/Bitsequence [<em>low</em>-<em>high</em>] of the current input byte as integer
which must match is used as input for the PegElement.
<table>
			<tbody><tr>
				<td bgcolor="beige">PEG</td>
				<td bgcolor="palegreen">Success</td>
				<td bgcolor="pink">Failure</td></tr>

			<tr bgcolor="palegreen">
				<td bgcolor="beige">&amp;BITS<br>&lt;7-8,#b11&gt;</td> 
				<td bgcolor="palegreen"><strong>|</strong>11010101</td>
				<td bgcolor="pink"><strong>|</strong>01010101</td>
			</tr>
</tbody></table></td>
</tr>

<tr><td>Sequence</td>
<td><em>e1  e2</em></td>
<td>Match input against e1 and then -in case of
			success- against e2.
			<table>
			<tbody><tr>
				<td bgcolor="beige">PEG</td>
				<td bgcolor="palegreen">Success</td>
				<td bgcolor="pink">Failure</td></tr>

			<tr bgcolor="palegreen">
				<td bgcolor="beige">'#'[0-9]</td> 
				<td>#5<strong>|</strong></td>
				<td bgcolor="pink"><strong>|</strong>#A</td>
			</tr>
			</tbody></table>

</td></tr>
<tr>
<td>Sequentially executed alternatives</td>
<td><em>e1</em> / <em>e2</em></td>
<td>Match input against e1 and then - in case of failure - against e2.
	<table>
			<tbody><tr>
				<td bgcolor="beige">PEG</td>
				<td bgcolor="palegreen">Success</td>

				<td bgcolor="pink">Failure</td></tr>
			<tr bgcolor="palegreen">
				<td bgcolor="beige">'&lt;='/'&lt;'</td> 
				<td>&lt;<strong>|</strong>5</td>
				<td bgcolor="pink"><strong>|</strong>&gt;5</td>

			</tr>
         </tbody></table>
</td>
</tr>
<tr>
<td>Greedy Option </td>
<td><em>e</em>?</td>
<td>Try to match input against e.
<table>
			<tbody><tr>
				<td bgcolor="beige">PEG</td>

				<td bgcolor="palegreen">Success</td>
				<td bgcolor="palegreen">Success</td></tr>
			<tr bgcolor="palegreen">
				<td bgcolor="beige">'-'?</td> 
				<td bgcolor="palegreen">-<strong>|</strong>42</td>
				<td bgcolor="palegreen"><strong>|</strong>+42</td>

			</tr>
			</tbody></table>
</td>
</tr>
<tr>
<td>Greedy repeat zero or more occurrences </td>
<td><em>e</em>*</td>
<td>Match input repeated against e until the match fails.
<table>
			<tbody><tr>
				<td bgcolor="beige">PEG</td>

				<td bgcolor="palegreen">Success</td>
				<td bgcolor="palegreen">Success</td></tr>
			<tr bgcolor="palegreen">
				<td bgcolor="beige">[0-9]*</td> 
				<td bgcolor="palegreen">42<strong>|</strong>b</td>
				<td bgcolor="palegreen"><strong>|</strong>-42</td>

			</tr>
			</tbody></table>
</td>
</tr>
<tr>
<td>Greedy repeat one or more occurrences</td>
<td><em>e</em>+</td>
<td>Shorthand for  e e*
<table>
			<tbody><tr>
				<td bgcolor="beige">PEG</td>

				<td bgcolor="palegreen">Success</td>
				<td bgcolor="pink">Failure</td></tr>
			<tr bgcolor="palegreen">
				<td bgcolor="beige">[0-9]*</td> 
				<td bgcolor="palegreen">42<strong>|</strong>b</td>
				<td bgcolor="pink"><strong>|</strong>-42</td>

			</tr>
			</tbody></table>
</td>
</tr>
<tr>
<td>Greedy repeat 
between minimum 
and maximum 
occurrences </td>
<td>
e{<em>min</em>}<br>
e{<em>min</em>,<em>max</em>}<br>

e{,<em>max</em>}<br>
e{<em>min</em>,}
</td>
<td>Match input at least <em>min</em> times but not more than <em>max</em> times against e.
<table>
			<tbody><tr>

				<td bgcolor="beige">PEG</td>
				<td bgcolor="palegreen">Success</td>
				<td bgcolor="pink">Failure</td></tr>
			<tr bgcolor="palegreen">
				<td bgcolor="beige">('.'[0-9]*){2,3}</td> 
				<td bgcolor="palegreen">.12.36.42<strong>|</strong>.18b</td>

				<td bgcolor="pink"><strong>|</strong>.42b</td>
			</tr>
			</tbody></table>
</td>
</tr>
<tr>
<td>Peek</td>
<td>&amp;<em>e</em></td>
<td>Match e without changing input position.
    <table>

			<tbody><tr>
				<td bgcolor="beige">PEG</td>
				<td bgcolor="palegreen">Success</td>
				<td bgcolor="pink">Failure</td></tr>
			<tr bgcolor="palegreen">
				<td bgcolor="beige">&amp;'42'</td> 
				<td bgcolor="palegreen"><strong>|</strong>42</td>

				<td bgcolor="pink"><strong>|</strong>-42</td>
			</tr>
			</tbody></table>
</td>
</tr>
<tr>
<td>Not</td>
<td>!<em>e</em></td>
<td>Like <code>Peek</code> but Success&lt;-&gt;Failure

<table>
			<tbody><tr>
				<td bgcolor="beige">PEG</td>
				<td bgcolor="palegreen">Success</td>
				<td bgcolor="pink">Failure</td></tr>
			<tr bgcolor="palegreen">
				<td bgcolor="beige">!'42'</td> 
				<td bgcolor="palegreen"><strong>|</strong>-42</td>

				<td bgcolor="pink"><strong>|</strong>42</td>
			</tr>
			</tbody></table>
</td>
</tr>
<tr>
<td>FATAL</td>
<td>FATAL<br>&lt;"<em>message</em>"&gt;</td>

<td>Prints the message and error location to the error stream and
	quits the parsing process afterwards (throws an exception).</td>
</tr>
<tr>
<td>WARNING</td>
<td>WARNING<br>&lt;"<em>message</em>"&gt;</td>
<td>WARNING&lt;"<em>message</em>"&gt; Prints the message and  location to the error stream.
					 <br>Success:&lt;always&gt;</td>

</tr>
<tr>
<td>Mandatory</td>
<td>@<em>e</em></td>
<td>Same as (<em>e</em>/FATAL&lt;"e expected"&gt; )</td>
</tr>
<tr>
<td>Tree Node</td>

<td>^^<em>e</em></td>
<td>if <em>e</em> is matched, a tree node will be added to the parse tree.
	This tree node will hold the starting and ending match positions for <em>e</em></td>
</tr>
<tr>
<td>Ast Node</td>
<td>^<em>e</em></td>
<td>like ^^<em>e</em>, but node will be replaced by the child node if there is only one child </td>

</tr>
<tr>
<td>Rule</td>
<td><em>N</em>: <em>e</em>;</td>
<td><em>N</em> is the nonterminal; <em>e</em> the right hand side which is terminated by a semicolon.</td>
</tr>

<tr>
<td>Rule with id</td>
<td>[<em>id</em>]<em>N</em>: <em>e</em>;</td>
<td><em>id</em> must be a positive integer,<br> e.g. [1] Int:[0-9]+;<br>

  The id will be assigned to the the tree/ast node id.
</td>
<td></td>
</tr>
<tr>
<td>Tree building rule</td>
<td>[<em>id</em>]&nbsp;^^<em>N</em>: <em>e</em>;</td>

<td><em>N</em> will be allocated as tree node having the id &lt;id&gt;</td>
</tr>
<tr>
<td>Ast  building rule</td>
<td>[<em>id</em>]^<em>N</em>: <em>e</em>;</td>

<td><em>N</em> will be allocated as tree node and is eventually replaced by a child if
				  the node for <em>N</em> has only one child which has no siblings.
</td>
</tr>
<tr>
<td>Parametrized Rule</td>
<td>N&lt;<em>peg1</em>,<br><em>peg2</em>,..&gt;: <em>e</em>;</td>

<td><em>N</em> takes the PEG epressions <em>peg1</em>, <em>peg2</em> ... as parameter. This parameters
cant then be used in <em>e</em>.</td>
</tr>
<tr>
<td>Into variable</td>
<td><em>e</em>:<em>variableName</em></td>

<td>Set the host language variable (a <code>string</code>, <code>byte[]</code>, <code>int</code>, <code>double</code> or <code>PegBegEnd</code>) to the matched input stretch.
			The language variable must be declared either in the semantic block of the corresponding rule
			or in the semantic block of the grammar (see below).</td>
</tr>
<tr>
<td>Bits Into variable</td>
<td>BITS&lt;<em>bitNo,<br>:<em>variable</em>&gt;
<br>BITS&lt;<em>low</em>-<em>high</em>,<br>:<em>variable</em>&gt;</td>

<td>Interpret the Bit <em>bitNo</em> or the Bitsequence [<em>low</em>-<em>high</em>] as integer and store it in the host variable.
</td>
</tr>
<tr><td>Semantic Function</td>
<td><em>f_</em></td>
<td>call host language function <em>f_</em> in a semantic block (see below).
				A semantic function has the signature <em>bool f_();</em>.
				A return value of true is handled as success whereas a return value of false
				is handled as fail.</td>

</tr>
<tr><td>Semantic Block (Grammar level)</td>
<td><em>BlockName</em>{ //host<br>
	//language<br>
	//statements<br>
	}</td>
<td>The <em>BlockName</em> can be missing in which case a local class named <em>_Top</em> will be created.
Functions and data of a grammar-level semantic block
can be accessed from any other rule-level semantic block. 
Functions in the grammar-level semantic block can be used 
as semantic functions at any place in the grammar.</td>

<td></td>
</tr>
<tr>
<td>CREATE Semantic Block<br>(Grammar level)</td>
<td>CREATE{ //host<br>
	//language<br>
	//statements<br>
	}</td>

<td>This kind of block is used in conjunction with customized tree nodes as described at the very end of this table</td>
</tr>
<tr>
<td>
Semantic Block (Rule level)</td>
<td><em>RuleName</em> { //host<br>
	//language<br>
	//statements<br>

	}: <em>e</em>;</td>
<td>Functions and data of a rule-level semantic block 
are only available from within the associated rule.
Functions in the rule associated semantic block can be used 
as semantic functions on the right hand side of the rule.</td>
</tr>
<tr>
<td>Using semantic block<br>(which is elsewhere defined) </td>
<td><em>RuleName</em><br>
using <em>NameOfSemanticBlock</em>: <em>e</em>;</td>
<td>The using directive supports reusing the same semantic block
when several rules need the same local semantic block.</td>
</tr>
<tr>
<td>Custom Node Creation
</td>
<td>^^CREATE&lt;<br><em>CreaFunName</em>&gt; <em>N</em>: <em>e</em>;<br>

    ^CREATE&lt;<br><em>CreaFuncName</em>&gt; <em>N</em>: <em>e</em>;<br></td>
<td>Custom Node creation allows to create a user defined Node (which must be derived from the library node PegNode).
 The <em>CreaFunc</em> must be defined in a CREATE semantic block (see above) and must have the following overall structure<br>
 <pre>PegNode <em>CreaFuncName</em>(ECreatorPhase phase, PegNode parentOrCreated, int id)
{
     if (phase == ECreatorPhase.eCreate || phase == ECreatorPhase.eCreateAndComplete)
     { // create and return the custom node; if phase==ECreatorPhase.eCreateAndComplete this will be the only call
     }else{
      // finish the custom node and return <code>parentOrCreated</code>; one only gets here after successful parsing of the subrules 
     }
}

</pre>
</td>
</tr>
</tbody></table>

</p><h3><a name="Parsing Expression Grammars particularities and idioms">Parsing Expression Grammars Particularities and Idioms</a></h3>
<p>PEG's behave in some respects similar to regular expressions:
The application of a PEG to an input string can be explained
by a pattern matching process which assigns matching parts of
the input string to rules of the grammar (much like with groups in regexes)
and which backtracks in case of a mismatch. The most important difference
between a PEG and regexes is the fact, that PEG support recursivenesss
and that PEG patterns are greedy.
Compared to most other traditional language parsing techniques, PEG is surprisingly different. 
The most striking differences are:
</p><ul>
<li> Parsing Expression Grammars are deterministic and never ambigous,
thereby removing a
problem of most other parsing techniques. Ambiguity means that the same
input string can be
parsed with different sets of rules of a given grammar and that there
is no policy saying which of
the competing rules should be used. This is in most cases a serious
problem, since if this gets
undetected it results in different parse trees for the same input. The
lack of ambiguity is a big plus for PEG. But the fact, that the order
of alternatives in a PEG rule matters, takes getting used to. <br>The following PEG rule e.g.
<code>rel_operator:  '&lt;' / '&lt;=' / '&gt;' / '&gt;=';</code>
will never succeed in recognizing <code>&lt;=</code> because the first alternative will be chosen. 
<br>The correct
rule is:
<code>rel_operator: '&lt;=' / '&lt;' / '&gt;=' / '&gt;';</code>
</li><li>
Parsing Expression Grammars are scannerless, whereas most other parsers
divide the parsing task
into a low level lexical parse phase called scanning and a high level
phase - proper parsing. The lexical parse phase
just parses items like numbers, identifiers and strings and presents
the information as a so called token
to the proper parser. This subdivision has its merits and its weak
points. It avoids in some cases backtracking and makes it e.g. easy to
distinguish between a keyword and an identifier. A weak point of most
scanners is the lack of context information inside the scanner so that
a given input string always results in the same token. This is not
always desirable and makes e.g. problems in C++ for the input string
<code>&gt;&gt;</code> which can be a right shift operator or the closing of two
template brackets.
</li>
<li>Parsing Expression Grammars can backtrack to an arbitrary location
at the beginning of the input string.
PEG does not require that a file which has to be parsed must be read
completely into memory, but it prohibits
to give free any part of the file which has already been parsed. This
means that a file which foreseeably will
be parsed to the end, should be read into memory completely before
parsing starts. Fortunately memory is not anymore a scarce resource. In
a direct evaluation scenario (semantic actions are executed as soon as
the corresponding
syntax element is recognized) backtracking can also cause problems,
since already executed semantic actions
are in most cases not so easily undone. Semantic actions should
therefore be placed at points where backtracking
cannot anymore occur or where backtracking would indicate a fatal
error. Fatal errors in PEG parsing are best handled by throwing an
exception.
</li>
<li>For many common problems idiomatic solution exist within the PEG framework as shown in the following table
</li>
</ul>
<table class="ArticleTable" >
<thead>
	<tr>
		<td>Goal</td>
		<td>Idiomatic solution</td>
		<td>Sample</td>
	</tr>
</thead>
	<tbody><tr>
	<td>Avoid that<br>white space scanning<br> clutters up the grammar</td>
	<td>White Space scanning<br> should be done<br> immediately after reading a terminal,<br>
	but not in any other place.
	</td>
	<td>

<pre>//to avoid
[3]prod: val S ([*/] S val S)*;
[4]val : [0-9]+ / '(' S expr ')' S;
//to prefer
[3]prod: val ([*/] S val)*;
[4]val: ([0-9]+ / '(' S expr ')') S;</pre>
	</td>
	</tr><tr>
	<td>Reuse Nonterminal<br> when only a subset is applicable</td>
	<td><em>!oneOfExceptions<br>&nbsp;&nbsp;reusedNonterminal</em></td>
	<td><strong>Java spec</strong>
SingleCharacter: InputCharacter <small>but not ' or \</small><br>

<strong>Peg spec</strong>
<pre>SingleCharacter: !['\\] InputCharacter</pre>
</td>
	</tr>
	<tr>
	<td>Test for end of input</td>
	<td>!.</td>
	<td><pre>(!./FATAL&lt;"end of input expected"&gt; )</pre>

	</td>
	</tr>
	<tr>
	<td>Generic rule<br> for quoting situation</td>
	<td><em>GenericQuote<br>&nbsp;&nbsp;&nbsp;&lt;BegQuote,QuoteContent,EndQuote&gt;:<br>
            &nbsp;&nbsp;BegQuote QuoteContent EndQuote;</em></td>
	<td><pre>GenericQuote&lt;'"',(!'"' .)*,'"'&gt;</pre>
	</td>

	</tr>
	<tr>
	<td>Order alternatives<br>having the same start </td>
	<td><em>longer_choice / shorter_choice</em></td>
	<td><pre>&lt;= / &lt;</pre>
	</td>
	</tr>

	<tr>
	<td>Integrate<br>error handling<br>into Grammar</td>
	<td>Use error handling alternative.</td>
	<td><pre>//expect a symbol
'('  expr  @')'
//same as
'('  expr  (')'/FATAL&lt;"')' expected"&gt; );
	</pre>
	</td>

	</tr>
	<tr>
	<td>Provide detailed, expressive error messages </td>
	<td>Generate better error messages by<br> peeking at next symbol</td>
	<td><pre>//poor error handling
[4]object:   '{' S members?  @'}' S;
[5]members:   (str/num)(',' S @(str/num))*;
//better error handling
[4]object: '{' S (&amp;'}'/members) @'}' S;
[5]members: @(str/num)(',' S @(str/num))*; 
	</pre>
	</td>

	</tr>
	</tbody></table>
<br>
<h4><a name="PEG applied to real word grammar problems">PEG Idioms Applied to Real Word Grammar Problems</a></h4>
<p>Most modern programming languages are based on grammars, which can be almost parsed by the predominant
parsing technique (LALR(1) parsing).  The emphasis here is on almost, meaning that there are
often grammar rules which require special handling outside of the grammar framework.
The PEG framework can handle this exceptional cases far better as  will be shown for 
the C++ and C# grammar.</p><p>
</p><p>The <a href="http://msdn.microsoft.com/en-us/vcsharp/aa336809.aspx">C# Language Specification V3.0 </a>
e.g. has the following wording for  its cast-expression/parenthized-expression disambiguation rule:

</p><pre>A sequence of one or more tokens (§2.3.3) enclosed in parentheses is considered 
the start of a cast-expression only if at least one of the following are true:
    1) The sequence of tokens is correct grammar for a type, 
       and the token immediately following the closing parentheses is 
       the token <code>~</code>, the token <code>!</code>, the token <code>(</code>, an <code>identifier</code> (§2.4.1), 
       a <code>literal</code> (§2.4.4), or any <code>keyword</code> (§2.4.3) except <code>as</code> and <code>is</code>.
    2) The sequence of tokens is correct grammar for a type, but not for an expression.</pre>

<p>This can be expressed in PEG with
</p><pre>cast_expression: 	       
/*1)*/     ('(' S   type   ')' S &amp;([~!(]/identifier/literal/!('as' B/'is' B) keyword B)
/*2)*/	/  !parenthesized_expression   '(' S type ')' ) S   unary_expression;
B: ![a-zA-Z_0-9];
S: (comment/whitespace/new_line/pp_directive )*;</pre>
<p>The C++ standard has the following wording for its expression-statement/declaration disambiguation rule
</p><pre>An expression-statement ... can be indistinguishable from a declaration ... 
In those cases the statement is a declaration.</pre>
<p>This can be expressed in PEG with
</p><pre>statement: declaration /  expression_statement;</pre>
<h3><a name="Integrating semantic actions into a PEG framework">Integrating Semantic Actions into a PEG Framework</a></h3>
<p>A PEG grammar can only recognize an input string, 
which gives you just two results, a boolean value 
indicating match success or match failure and an input position 
pointing to the end of the matched string part.
But in most cases, the grammar is only a means to give the input string 
a structure.
This structure is then used to associate the input string with a meaning 
(a semantic) and to execute statements based on this meaning. 
These statements executed during parsing are called semantic actions.
The executable nature of PEG grammars makes integration of semantic actions easy.
Assuming a sequence of grammar symbols  <em>e1 e2</em>  and a semantic action <code>es_</code> which 
should be performed after recognition of <em>e1</em> we just get the sequence <em>e1</em> <code>es_</code> <em>e2</em>

where <em>es_</em> is a function of the host language. 
</p><p>From the grammar view point <code>es_</code> has to conform to the same interface as 
<em>e1</em> and <em>e2</em> or any other PEG component, what means that <code>es_</code> is a function 
returning a <code>bool</code> value as result, where <code>true</code> means success and  <code>false</code> failure.
The semantic function <code>es_</code> can be defined either local to the rule 
which uses (calls) <code>es_</code>  or in the global environment of the grammar.
A bundling of semantic functions, into-variables, helper data values and helper functions
forms then a semantic block.
</p><p>Semantic actions face one big problem in PEG grammars, namely backtracking. 
In most cases, backtracking should not occur anymore after a 
semantic function (e.g. computation of a result of an arithemtic subexpression) 
has been performed. The simplest way to guard against backtracking
in such a case is to handle any attempt to backtrack as fatal error. 
The FATAL&lt;msg&gt; construct presented here aborts parsing (by raising an exception).
</p><p>Embedding semantic actions into the grammar enables direct evaluation of the parsed 
construct.
A typical application is the stepwise computation of an arithmetical expression 
during the parse phase.
Direct evaluation is fast but very limiting since it can only use information present at 
the current parse point.
In many cases embedded semantic actions are therefore used to collect 
information during parsing for processing
after parsing has completed. 
</p><p>The collected data can have many forms, but the most important one is a tree.
Optimizing parsers and compilers delay semantic actions until the end of the parsing 
phase and just create a physical parse tree during parsing 
(our PEG framework supports tree generating by the prefixes ^ and ^^).
A tree walking process then checks and optimizes the tree. 
Finally the tree is intrerpreted at runtime or it is
just used to generate virtual or real machine code.  
The most important evaluation options are shown below

</p><pre>Parsing -&gt; Direct Evaluation
	-&gt; Collecting Information during Parsing
		-&gt; User defined datastructure
			-&gt;User defined evaluation
		-&gt; Tree Structure
			-&gt;Interpretation of generated tree
			-&gt;Generation of VM or machine code</pre>
<p>In a PEG implementation, tree generation must cope with backtracking by deleting 
tree parts which were built after
the backtrack restore point. 
Furthermore, no tree nodes should be created when a <code>Peek</code> or <code>Not</code> production is active.
In this implementation this is handled by tree generation aware 
code in the implemenations for <code>And</code>, <code>Peek</code>, <code>Not</code> and <code>ForRepeat</code> productions.

</p><h3><a name="Parsing Expression Grammars exposed">Parsing Expression Grammars Exposed</a></h3>
<p>The following sample grammar is also taken from the wikipedia article on PEG [2] 
(but with a sligthly different notation).
</p><pre>&lt;&lt;Grammar Name="WikiSample"&gt;&gt;
Expr:    S Sum;
Sum:     Product  ([+-] S Product)*;
Product: Value ([*/] S Value)*;  
Value:   [0-9]+ ('.' [0-9]+)? S / '(' S Sum ')' S;
S:       [ \n\r\t\v]*;
&lt;&lt;/Grammar&gt;&gt;</pre>

<p>During the application of a grammar to an input string, each grammar rule is called from some parent grammar rule
and matches a subpart of the input string which is matched by the parent rule. This results in a parse tree.
The grammar rule Expr would associate the  arithmetical expressions <code> 2.5 * (3 + 5/7)</code>
with the following parse tree:
</p><pre>Expr&lt;
	S&lt;' '&gt;
	Sum&lt;
		Product&lt;
			Value&lt;'2.5' S&lt;' '&gt;&gt;
			'*'          //[*] see text 
			S&lt;' '&gt;
			Value&lt;
				'(' 
				S&lt;''&gt; 
				Sum&lt;
					Product&lt;Value&lt;'3' S&lt;' '&gt;&gt;
					'+' 
					S&lt;' '&gt; 
					Product&lt; Value&lt;'5' S&lt;''&gt;&gt; '/' S&lt;''&gt; Value&lt;'7' S&lt;''&gt;&gt; &gt;

                                &gt;
                        &gt;
               &gt;
	&gt;
&gt;</pre>	
<p>The above parse tree is not a physical tree but an implicit tree which only exists during the parse process. 
The natural implementation for a PEG parser associates each grammar rule with a method (function). 
The right hand side of the grammar rule corresponds to the function body and each 
nonterminal on  the right hand side of the rule is mapped to a function call. 
When a rule function is called, it tries to match the input string at the current input 
position against the right hand side of the rule. If it succeeds it advances the input 
position accordingly and returns true otherwise the input position is unchanged and the result is false.
The above parse tree can therefore be regarded as a stack trace. 
The location marked with [*] in the above parse tree corresponds to the 
function stack <code>Value&lt;=Product&lt;=Sum&lt;=Expr</code> with the function 

<code>Value</code> at the top of the stack and the function <code>Expr</code> at the bottom of the stack.

</p><p>The parsing process as described above just matches an input string or it 
fails to match. But it is not difficult
to add semantic actions during this parse process by inserting helper 
functions at appropriate places.
The PEG parser for arithemtical expressions could e.g. compute the result of the 
expression during parsing.
Such direct evaluation  does not significantly slow down the parsing process. 
Using into variables and semantic blocks as listed above one would get
the following enhanced PEG grammar for arithmetical expressions which directly 
evaluates the result of the expression
and prints it out to the console.
</p><pre lang="cs">&lt;&lt;Grammar Name="calc0_direct"&gt;&gt;
Top{ // semantic top level block using C# as host language
  double result;
  bool print_(){Console.WriteLine("{0}",result);return true;}
}
Expr:    S Sum   (!. print_ / FATAL&lt;"following code not recognized"&gt;);
Sum
{  //semantic rule related block using C# as host language 
	double v;
	bool save_()  {v=  result;result=0; return true;}
	bool add_()   {v+= result;result=0;return true;}
	bool sub_()   {v-= result;result=0;return true;} 
  	bool store_() {result= v; return true;}
}	:     
	Product save_
                ('+' S Product add_
		/'-' S Product sub_)* store_		;
Product
{ //semantic rule related block using C# as host language 
	double v;
	bool save_()  {v=  result;result=0; return true;}
	bool mul_()   {v*= result;result=0; return true;}
	bool div_()   {v/= result;result=0;return true;} 
	bool store_() {result= v;return true;}	
}	: 
	Value  save_
		('*' S Value mul_
		/'/' S Value div_)* store_		;
Value:   Number S / '(' S Sum ')' S	;
Number
{ //semantic rule related block using C# as host language 
	string sNumber;
	bool store_(){double.TryParse(sNumber,out result);return true;}
}
	:  ([0-9]+ ('.' [0-9]+)?):sNumber store_	;
S:	 [ \n\r\t\v]*					;
&lt;&lt;/Grammar&gt;&gt;</pre>

<p>In many cases on the fly evaluation during parsing is not sufficient and one needs a 
physical parse tree or an abstract syntax tree  (abbreviated AST). 
An AST is a parse tree shrinked to the essential nodes thereby saving space and 
providing a view better suited for evaluation.
Such physical trees typically need at least 10 times the memory space of the input 
string and reduce the parsing speed by a factor of 3 to 10.
</p><p>The following PEG grammar uses the symbol ^ to indicate an abstract snytax node 
and the symbol ^^ to indicate a parse tree node. 
The grammar presented below is furthermore enhanced with the error handling item Fatal&lt; errMsg&gt;.
Fatal leaves the parsing process immediately with the result fail but the input position 
set to the place where the fatal error occurred.

</p><pre>&lt;&lt;Grammar Name="WikiSampleTree"&gt;&gt;
[1] ^^Expr:   S Sum (!./FATAL&lt;"end of input expected"&gt;) ;
[2] ^Sum:     Product  (^[+-] S Product)*               ;
[3] ^Product: Value (^[*/] S Value)*                    ;  
[4] Value:    Number S / '(' S Sum ')' S  /
                 FATAL&lt;"number or  ( &lt;Sum&gt; )  expected"&gt;;
[5] ^^Number: [0-9]+ ('.' [0-9]+)?                      ;
[6] S:        [ \n\r\t\v]*                              ;
&lt;&lt;/Grammar&gt;&gt;</pre>

<p>With this grammar the arithmetical expression <code>2.5 * (3 + 5/7)</code> would result in the following physical tree:
</p><pre>Expr&lt;

  Product&lt;
    Number&lt;'2.5'&gt;
    &lt;'*'&gt;
    Sum&lt;&nbsp;Number&lt;'3'&gt;;&nbsp;&lt;'+'&gt;&nbsp;Product&lt;Number&lt;'5'&gt;&lt;'/'&gt;Number&lt;'7'&gt;&nbsp;&gt;&gt;
  &gt;
&gt;</pre>
<p>With a physical parse tree, much more options for evaluation are possible, e.g. one can generate code for  a virtual
machine after first optimizing the tree.
</p><h2><a name="Parsing Expression Grammar Implementation">Parsing Expression Grammar Implementation</a></h2>
<p>In this chapter I first show how to implement all the PEG constructs one by one.
This will be expressed in pseudo code. Then I will try to find the best interface for 
this basic PEG functions in C#1.0 and C#3.0.
</p><h3><a name="General implementation strategy for PEG">General Implementation Strategy for PEG</a></h3>
<p>The natural representation  of a PEG is a top down recursive parser with 
backtracking.
PEG rules are implemented as functions/methods which call each other 
when needed and return true in case of a match and false in case of a mismatch. 
Backtracking is implemented by saving the input position before
calling a parsing function and restoring the input position to the saved one
in case the parsing function returns false. 
</p><p>Backtracking can be limited to the the PEG sequence construct 
and the e&lt;min,max&gt; repetitions if the input position is only moved forward
after successful matching in all other cases.
In the following pseudo code we use strings and integer variables, 
short circuit conditional expressions 
(using &amp;&amp; for AND and || for OR) and exceptions.

<strong>s</strong> stands for the input string and  <strong>i</strong> refers to the current input position.
<strong>bTreeBuild</strong> is an instance variable which inhibits tree build operations when set to false.
<table class="ArticleTable">
<thead>
<tr>
<td>PEG construct</td><td>sample</td><td>pseudo code to implement sample</td>
</tr>
</thead>
<tbody><tr>

<td>
CodePoint<br>#&lt;dec&gt;<br>#x&lt;hex&gt;<br>#x&lt;bin&gt;
</td><td>#32 (decimal)<br>#x3A0 (hex)<br>&lt;#b111&gt; (binary)</td><td>
<pre>if i&lt;length(s) &amp;&amp; s[i]=='\u03A0'
     {i+= 1; return true;}
else {return false;}

</pre>
</td>
</tr><tr>
<td>Literal<br>'<em>literal</em>'</td>
<td>'ab'</td> 		
<td><pre>if i+2&lt;length(s) &amp;&amp; s[i]=='a' &amp;&amp; s[i+1]=='b' 
      { i+= 2; return true; }
else {return false; }

</pre>
</td></tr><tr>
<td>
CaseInsensitive Literal</td><td>'ab'\i</td>
<td><pre>			
if    i+2&lt;length(s) &amp;&amp; toupper(s[i])=='A' &amp;&amp; toupper(s[i+1] =='B' 
     { i+= 2; return true; }
else {return false; }
</pre></td>
</tr><tr>
<td>
Charset<br>[<em>charset</em>]</td>

<td>[ab]</td>
<td><pre>if i+1&lt;length(s) &amp;&amp; (s[i]=='a'|| s[i]=='b')
     {i+= 1; return true;}
else {return false;}</pre>
</td></tr><tr>
<td>CharacterSet</td>
<td>[a-z]</td>
<td><pre>if i+1&lt;length(s) &amp;&amp; s[i]&gt;='a' &amp;&amp; s[i]&lt;='z'
     {i+= 1; return true;}
else {return false;}	

</pre></td>
</tr><tr>
<td>Any</td><td><strong>.</strong></td>
<td><pre>if i+1&lt;length(s) 
     {i+=1;return true;} 
else {return false;}</pre>
</td>
</tr>
<tr>
<td>BITS</td>
<td>BITS&lt;7-8,#b11&gt;</td>

<td><pre>if i&lt;length(s) &amp;&amp; ExtractBitsAsInt(s[i],7,8)==3 
     {i+=1;return true;}
else {return false;}
</pre></td>
</tr>
<tr>
<td>
Sequence<br>
</td>
<td><em>e1 e2</em></td><td><pre>int i0= i; 
TreeState t=SaveTreeState();
if e1() &amp;&amp; e2() 
      {return true;}
else  {i=i0; RestoreTreeState(t);return false;}
</pre></td>

</tr><tr>
<td>
Alternative</td><td><em>e1</em> / <em>e2</em></td>
<td><pre>return e1() || e2();</pre></td></tr><tr>
<td>Greedy Option</td><td> e?</td>
<td><pre>return e() || true;</pre></td>

</tr><tr><td>
Greedy repeat   0+
</td>
<td><em>e</em>*</td>
<td><pre>while e() {} return true;</pre></td><td>
</td></tr><tr><td>Greedy repeat 1+</td>
<td><em>e</em>+</td><td><pre>
if !e()   { return false};
while e() {}              
return true;
</pre></td></tr><tr>
<td>Greedy repeat &gt;=low&lt;=high</td>

<td><em>e</em>{<em>low</em>,<em>high</em>}</td><td><pre>	
int c,i0=i; 
TreeState t=SaveTreeState();
for(c=0;c&lt;high;++c){if !e(){ break;} }
if c&lt;low    { i=i0; RestoreTreeState(t); return false;}
else        {return true;}
</pre></td>
</tr><tr><td>Peek</td><td>&amp;<em>e</em></td><td><pre>int i0=i; 
bool bOld= bTreeBuild; bTreeBuild= false;
bool b= e();
i=i0; bTreeBuild= bOld;
return b;
</pre></td>
</tr><tr><td>Not</td>

<td>!<em>e</em></td><td><pre>int i0=i; 
bool bOld= bTreeBuild; bTreeBuild= false;
bool b= e();
i=i0; bTreeBuild= bOld;
return !b;
</pre></td>
</tr><tr><td>
FATAL</td><td>FATAL&lt; message &gt;</td><td><pre>		
PrintMsg(message); 
throw PegException();
</pre></td></tr><tr><td>
WARNING</td><td>WARNING&lt; message &gt;</td><td><pre>
PrintMsg(message); return true;
</pre></td>
</tr>
<tr><td>Into</td><td>e :<em>variableName</em></td><td><pre>int i0=i;
bool b= e();
variableName= s.substring(i0,i-i0);
return b;</pre></td>
</tr>
<tr><td>Bits Into variable</td><td>BITS&lt;3-5,:v&gt;</td><td><pre>int i0=i;
if i&lt;length(s) {v= ExtractBitsAsInt(s[i],3,5);++i;return true;}
else           {return false;}
</pre></td>

</tr>
<tr><td>Build Tree Node</td><td>^^<em>e</em></td><td><pre>TreeState t=SaveTreeState();
AddTreeNode(...)
bool b= e();
if !b  {RestoreTreeState(savedState);}
return b;</pre></td>
</tr><tr><td>Build Ast Node</td><td>^<em>e</em></td><td><pre>TreeState t=SaveTreeState();
AddTreeNode(..)
bool b= e();
if !b {RestoreTreeState(savedState);}
else {TryReplaceByChildWithoutSibling();}
return b;</pre></td>
</tr>
</tbody></table>

</p><h3><a name="Parsing Expression Grammars mapped to C#1.0">Parsing Expression Grammars Mapped to C#1.0</a></h3>

<p>In C#1.0 we can map the PEG operators <code>CodePoint,Literal, Charset, Any, FATAL,</code>
and <code>WARNING</code> to helper functions in a base class. But the other PEG constructs,
like <code>Sequence, Repeat, Peek, Not, Into</code> and Tree building cannot be easily outsourced
to a library module.
The Grammar for integer sums  
</p><pre>&lt;&lt;Grammar Name="IntSum"&gt;&gt;
Sum:    S [0-9]+  ([+-] S [0-9]+)* S 	;
S:	 [ \n\r\t\v]*			;
&lt;&lt;/Grammar&gt;&gt;</pre>

<p>results in the following C#1.0 implementation
(<code>PegCharParser</code> is a not shown base class with the field <code>pos_</code> and the 
 methods <code>In</code> and <code>OneOfChars</code>)
</p><pre lang="cs">class InSum_C1 : PegCharParser 
{
	public InSum_C1(string s) : base(s) { }
	public bool Sum()//Sum:    S [0-9]+  (S [+-] S [0-9]+)* S 	;
	{
	    S();
	    //[0-9]+
	    if( !In('0', '9') ){return false;}
	    while (In('0', '9')) ;
	    
	    for(;;){//(S [+-] S [0-9]+)*
	        int pos= pos_;
	        if( S() &amp;&amp; OneOfChars('+','-') &amp;&amp; S() ){
	            //[0-9]+
	            if( !In('0', '9') ){pos_=pos; break;}
	            while (In('0', '9')) ;
	        }else{
	            pos_= pos; break;
	        }
	    }
	    S();
	    return true;
	}
	bool S()//S:	 [ \n\r\t\v]*			;
	{
	    while (OneOfChars(' ', '\n', '\r', '\t', '\v')) ;
	    return true;
	}
}</pre>
<p>To execute the Grammar we must just call the method <code>Sum</code> of an object of the above class.
But we cannot be happy and satisfied with this solution. 
Compared with the original grammar rule, the method <code>Sum</code> in
the above class <code>InSum_C1</code> is large and in its use of loops and helper variables
quite confusing. But it is perhaps the best of what is possible in C#1.0. 
Many traditional parser generators even produce worse code. 
</p><h3><a name="Parsing Expression Grammars mapped to C#3.0">Parsing Expression Grammars Mapped to C#3.0</a></h3>
<p>PEG operators like <code>Sequence, Repeat, Into, Tree Build, Peek</code> and <code>Not</code> 

can be regarded as operators or
functions which take a function as parameter. 
This maps in C# to a method with a delegate parameter.
The Peg Sequence operator e.g can be implemented as a function 
with the following interface
<code>public bool And(Matcher pegSequence);</code>
where <code>Matcher</code> is the following delegate 
<code>public delegate bool Matcher();</code>.

</p><p>In older C# versions, passing a function as a parameter required some code lines, but with C#3.0 this changed.
C#3.0 supports lambdas, which are anonymous functions with a very low syntactical overhead.
Lambdas enable a functional implementation of PEG in C#.
The PEG Sequence <em>e1 e2</em> can now be mapped to the C# term <code>And(()=&gt;e1() &amp;&amp; e2())</code>.

<code>()=&gt;e1()&amp;&amp; e2() </code>looks like a normal expression, 
but is in effect a fullfledged function
with zero parameters (hence <code>()=&gt;</code>) and the function body <code> {return e1() &amp;&amp; e2();}</code>.
With this facility, the Grammar for integer sums  
</p><pre>&lt;&lt;Grammar Name="IntSum"&gt;&gt;
Sum:    S [0-9]+  ([+-] S [0-9]+)* S 	;
S:	 [ \n\r\t\v]*			;
&lt;&lt;/Grammar&gt;&gt;</pre>
<p>results in the following C#3.0 implementation

(<code>PegCharParser</code> is a not shown base class with
 methods <code>And</code>,<code>PlusRepeat</code>,<code>OptRepeat</code>, <code>In</code> and <code>OneOfChars</code>)
</p><pre lang="cs">class IntSum_C3 : PegCharParser
{
   public IntSum_C3(string s) : base(s) { }
   public bool Sum()//Sum:    S [0-9]+  (S [+-] S [0-9]+)* S ;
   {
      return 
         And(()=&gt;
            S()
         &amp;&amp; PlusRepeat(()=&gt;In('0','9'))
         &amp;&amp; OptRepeat(()=&gt; S() &amp;&amp; OneOfChars('+','-') &amp;&amp; S() &amp;&amp; PlusRepeat(()=&gt;In('0','9')))
	 &amp;&amp; S());
   }
   public bool S()//S:	 [ \n\r\t\v]*			;
   {
      return OptRepeat(()=&gt;OneOfChars(' ', '\n', '\r', '\t', '\v'));
   }
}</pre>
<p>Compared to the C#1.0 implementation this parser class is a huge improvement.
We have eliminated all loops and helper variables. The correctness (accordance with the grammar rule)
is also much easier to check. The methods <code>And</code>, <code>PlusRepeat</code>, <code>OptRepeat</code>, <code>In</code> and <code>OneOfChars</code>

are all implemented in both the <code>PegCharParser</code> and<code> PegByteParser</code> base classes.
</p><p>The following table shows most of the PEG methods available in the base library delivered with this article.
<table class="ArticleTable">
<thead>
<tr>
<td>PEG element</td>
<td>C# methods</td><td>sample usage</td><td></td>
</tr>
</thead>
<tbody><tr>

<td>CodePoint</td>
<td>Char(char)</td>
<td>Char('\u0023')</td>
</tr>
<tr>
<td>Literal</td>
<td>Char(char c0,char c1,...)<br>Char(string s)</td>
<td>Char("ab")</td>
</tr>
<tr>

<td>CaseInsensitive Literal</td>
<td>IChar(char c0,char c1,...)<br>IChar(string s)</td>
<td>IChar("ab")</td>
</tr>
<tr>
<td>Char Set<br>
<em>[&lt;c0c1...&gt;]</em></td>
<td>OneOf(char c0,char c1,...)<br>OneOf(string s)</td>

<td>OneOf("ab")</td>
</tr>
<tr>
<td>Char Set<br>
<em>[&lt;c0-c1...&gt;]</em>
</td>
<td>In(char c0,char c1,...)<br>In(string s)</td>
<td>In('A','Z','a'-'z','0'-'9')</td>
</tr>
<tr>

<td>Any<br><big>.</big></td>
<td>Any()</td>
<td>Any()</td>
</tr>
<tr>
<td>BITS</td>
<td>Bits(char cLow,char cHigh,byte toMatch)</td>
<td>Bits(1,5,31)</td>
</tr>
<tr>

<td>Sequence<br><em>e1 e2</em> ...</td>
<td>And(MatcherDelegate m)</td>
<td>And(() =&gt; S() &amp;&amp; top_element())</td>
</tr>
<tr>
<td>Alternative<br><em>e1 / e2 / ...</em></td>
<td>e1 || e2 || ...</td>
<td> @object() || array()</td>

</tr>
<tr>
<td>Greedy Option<br><em>e?</em></td>
<td>Option(MatcherDelegate m)</td>
<td>Option(() =&gt; Char('-'))</td>
</tr>
<tr>
<td>Greedy repeat 0+ <br><em>e*</em></td>
<td>OptRepeat(MatcherDelegate m)</td>
<td> OptRepeat(() =&gt; OneOf(' ', '\t', '\r', '\n'))</td>

</tr>
<tr>
<td>Greedy repeat 1+ <br><em>e+</em></td>
<td>PlusRepeat(MatcherDelegate m)</td>
<td> PlusRepeat(() =&gt; In('0', '9'))</td>
</tr>
<tr>
<td>Greedy repeat n0..n1  <br><em>e{low,high}</em></td>
<td>PlusRepeat(MatcherDelegate m)</td>

<td> ForRepeat(4, 4, () =&gt; In('0', '9', 'A', 'F', 'a', 'f'))</td>
</tr>
<tr>
<td>Peek  <br><em>&amp;e</em></td>
<td>Peek(MatcherDelegate m)</td>
<td> Peek(() =&gt; Char('}'))</td>
</tr>
<tr>
<td>Not<br><em>!e</em></td>

<td>Not(MatcherDelegate m)</td>
<td> Not(()=&gt;OneOf('"','\\'))</td>
</tr>
<tr>
<td>FATAL<br><em>FATAL&lt;message&gt;</em></td>
<td>Fatal("&lt;message&gt;")</td>
<td>Fatal("&lt;&lt;'}'&gt;&gt; expected")</td>

</tr>
<tr>
<td>WARNING<br><em>WARNING&lt;message&gt;</em></td>
<td>Warning("&lt;message&gt;")</td>
<td>Warning("non-json stuff before end of file")</td>
</tr>
<tr>
<td>Into<br><em>e :<em>variableName</em></td>

<td>Into(out string varName,MatcherDelegate m)<br>
Into(out int varName,MatcherDelegate m)<br>
Into(out PegBegEnd varName,MatcherDelegate m)</td>
<td>Into(out top.n, () =&gt; Any())</td>
</tr>
<tr>
<td>Bits Into<br>BITS&lt;3-5,:v&gt;<em>variableName</em></td>
<td>BitsInto(int lowBitNo, int highBitNo,out int varName)</td>

<td>BitsInto(1, 5,out top.tag)</td>
</tr>
<tr>
<td>Build Tree Node<br><em>[id] ^^RuleName:</em></td>
<td>TreeNT(int nRuleId, PegBaseParser.Matcher toMatch);</td>
<td>TreeNT((int)Ejson_tree.json_text,()=&gt;...)</td>
</tr>
<tr>
<td>Build Ast Node<br><em>[id] ^RuleName:</em></td>
<td>TreeAST(int id,PegBaseParser.MatcherDelegate m)<br></td>

<td>TreeAST((int)EC_KernighanRitchie2.external_declaration,()=&gt;...)</td>
</tr>
<tr>
<td>Parametrized Rule<br><em>RuleName&lt;a,b,...&gt;</em></td>
<td>RuleName(MatcherDelegate a,<br> MatcherDelegate b,...)<br></td>
<td>binary(()=&gt; relational_expression(),<br>
           ()=&gt;TreeChars(<br> 
	   ()=&gt;Char('=','=') || Char('!','=') 
          )</td>

</tr>
</tbody></table>


</p><h2><a name="Parsing Expression Grammar Examples">Expression Grammar Examples</a></h2>
<p>The following examples show uses of 
the <code>PegGrammar</code> class for all supported use cases:
</p><ol>
<li>Recognition only: 			The result is just match or does not match, 
					in which case an  error message is issued.</li>
<li>Build of a physical parse tree: 	The result is a physical tree.</li>
<li>Direct evaluation:			Semantic actions executed during parsing.</li>
<li>Build tree, interpret tree:		The generated Tree is traversed and evaluated.</li>

</ol>
<h3><a name="Json Checker (Recognize only)">JSON Checker (Recognize only)</a></h3>
<p>JSON (JavaScript Object Notation) <a href="#References">[5][6]</a> is an exchange format suited for serializing/deserializing program data.
Compared to XML it is featherweight and therefore a good testing candidate for parsing techniques.
The JSON Checker presented here gives an error message and error location in case the file does not conform
to the JSON grammar. The following PEG grammar is the basis of <code>json_check</code>.
</p><pre>&lt;&lt;Grammar Name="json_check" 
	  encoding_class="unicode"  encoding_detection="FirstCharIsAscii"
	  reference="www.ietf.org/rfc/rfc4627.txt"&gt;&gt;
[1]json_text:		S top_element expect_file_end			;
[2]expect_file_end:	!./ WARNING&lt;"non-json stuff before end of file"&gt;;
[3]top_element:		object / array  /
			FATAL&lt;"json file must start with '{' or '['"&gt;	;
[4]object:  		'{' S (&amp;'}'/members)  @'}' S		;
[5]members: 		pair S (',' S pair S)*				;	
[6]pair:		@string S @':' S value				;
[7]array:  		'[' S (&amp;']'/elements)  @']' S		;
[8]elements: 		value S (','S  value S)*			;
[9]value:    		@(string / number / object / 
				array / 'true' / 'false' / 'null')	;
[10]string:	 	'"' char* @'"'					;
[11]char:  		escape / !(["\\]/control_chars)unicode_char	;
[12]escape:		'\\' ( ["\\/bfnrt] / 
			'u' ([0-9A-Fa-f]{4}/FATAL&lt;"4 hex digits expected"&gt;)/
			     FATAL&lt;"illegal escape"&gt;);
[13]number:		'-'? int frac? exp?				;
[14]int:		'0'/ [1-9][0-9]*				;
[15]frac:		'.' [0-9]+					;
[16]exp:		[eE] [-+] [0-9]+				;
[17]control_chars:	[#x0-#x1F]					;
[18]unicode_char:	[#x0-#xFFFF]					;
[19]S:  		[ \t\r\n]*					;

&lt;&lt;/Grammar&gt;&gt;</pre>
<p>The translation of the above grammar to C#3.0 is straightforward and results in the
following code (only the translation of the  first 4 rules are reproduced).
</p><pre lang="cs">public bool json_text()    
	{return And(()=&gt;    S() &amp;&amp; top_element() &amp;&amp; expect_file_end() );}
public bool expect_file_end()  
{ return   
      Not(()=&gt; Any() )
   || Warning("non-json stuff before end of file");
}
public bool top_element()    
{ return   
      @object()
   || array()
   || Fatal("json file must start with '{' or '['");
}
public bool @object()    
{
   return And(()=&gt;  
     Char('{')
  &amp;&amp; S()
  &amp;&amp; (    Peek(()=&gt; Char('}') ) || members())
  &amp;&amp; (    Char('}') || Fatal("&lt;&lt;'}'&gt;&gt; expected"))
  &amp;&amp; S() );
}</pre>
<h3><a name="Json Tree  (Build Tree)">JSON Tree (Build Tree)</a></h3>
<p>With a few changes of the JSON checker grammar we get a grammar which
generates a physical tree for a JSON file. In order to have unique nodes for
the JSON values <code>true</code>, <code>false</code>, <code>null</code> 
we add corresponding rules. Furthermore, we add a rule which matches the
content of a string (the string without the enclosing double quotes). This gives
us the following grammar:
</p><pre>&lt;&lt;Grammar Name="json_tree" 
	  encoding_class="unicode"  encoding_detection="FirstCharIsAscii"
	  reference="www.ietf.org/rfc/rfc4627.txt"&gt;&gt;
[1]^^json_text:		(object / array)				;
[2]^^object:  		S '{' S (&amp;'}'/members) S @'}' S			;
[3]members: 		pair S (',' S @pair S)*				;	
[4]^^pair:		@string S ':' S value				;
[5]^^array:  		S '[' S (&amp;']'/elements) S @']' S		;
[6]elements: 		value S (','S  @value S)*			;
[7]value:    		@(string / number / object / 
				array / true / false / null)		;
[8]string:	 	'"' string_content '"'				;
[9]^^string_content: ( '\\' 
                           ( 'u'([0-9A-Fa-f]{4}/FATAL&lt;"4 hex digits expected"&gt;)
                           / ["\\/bfnrt]/FATAL&lt;"illegal escape"&gt; 
                           ) 
                        / [#x20-#x21#x23-#xFFFF]
                        )*	;
[10]^^number:        '-'? '0'/ [1-9][0-9]* ('.' [0-9]+)? ([eE] [-+] [0-9]+)?;
[11]S:  		[ \t\r\n]*					;
[12]^^true:		'true'						;
[13]^^false:		'false'						;
[14]^^null:		'null'						;

&lt;&lt;/Grammar&gt;&gt;</pre>
<p>The following table shows on the left hand side a JSON input file and 
on the right hand side the tree generated by the <code>TreePrint</code> helper class 
of our parser library.
<table class="ArticleTable">
<thead>
<tr><td>JSON Sample File</td><td>TreePrint Output</td></tr>
</thead>
<tbody><tr><td><pre>

{
   "ImageDescription": {
      "Width":  800,
      "Height": 600,
      "Title":  "View from 15th Floor",
      "IDs": [116, 943, 234, 38793]
    }
}

</pre>
</td>

<td><pre>json_text&lt;
  object&lt;
    pair&lt;
      'ImageDescription'
      object&lt;
        pair&lt;'Width' '800'&gt;
        pair&lt;'Height' '600'&gt;
        pair&lt;'Title' 'View from 15th Floor'&gt;
        pair&lt;'IDs' array&lt;'116' '943' '234' '38793'&gt;&gt;
      &gt;
    &gt;
  &gt;
&gt;
</pre>
</td>
</tr>
</tbody></table>
</p><h3><a name="Basic Encode Rules (Direct Evaluation + Build Tree)">Basic Encode Rules (Direct Evaluation + Build Tree)</a></h3>

<p>BER (Basic Encoding Rules) is the most commonly used format
for encoding ASN.1 data. Like XML, ASN.1 serves the purpose of representing 
hierarchical data, but unlike XML, ASN.1 is traditionally encoded in compact binary formats
and BER is one of the these formats (albeit the least compact one). The Internet
standards SNMP and LDAP are examples of ASN.1 protocols using BER as encoding.
The following PEG grammar for reading a BER file into a tree representation 
uses semantic blocks to store information necessary for further parsing. 
This kind of dynamic parsing which uses data read during the parsing process to
decode data further downstreams is typical for parsing of binary formats.
The grammar rules for BER  <a href="#References">[4]</a> as shown below express the following facts:
</p><ol>
<li>BER nodes consist of the triple Tag Length Value (abbreviated as TLV)
     where Value is either a primitive value or a list of TLV nodes.</li>
<li>The Tag identifies the element (like the start tag in XML).</li>
<li>The Tag contains a flag whether the element is primitive or constructed.
  Constructed means that there are children.
</li>
<li>The Length is either the length of the Value in bytes or it is the special pattern 0x80
(only allowed for elements with children), in which case the sequence of childrens
ends with two zero bytes (0x0000). 
</li>
<li>The Value is either a primitive value or -if the constructed flag is set- 
it is a sequence of Tag Length Value triples. The sequence of TLV triples ends
when the length given in the Length part of the TLV tripple is used up or 
in the case where the length is given as 0x80, when
the end marker 0x0000 has been reached.
</li>
</ol>
<pre>&lt;&lt;Grammar Name="BER" encoding_class="binary"
          reference="http://en.wikipedia.org/wiki/Basic_encoding_rules"
          comment="Tree generating BER decoder (minimal version)"&gt;&gt;
{
   int tag,length,n,@byte;
   bool init_()     {tag=0;length=0;           return true;}
   bool add_Tag_()  {tag*=128;tag+=n;          return true;}
   bool addLength_(){length*=256;length+=@byte;return true;}
}
[1] ProtocolDataUnit: TLV;
[2] ^^TLV:  init_
     ( &amp;BITS&lt;6,#1&gt; Tag ( #x80  CompositeDelimValue  #0#0 / Length CompositeValue )
     / Tag Length PrimitiveValue
     );
[3] Tag: OneOctetTag / MultiOctetTag / FATAL&lt;"illegal TAG"&gt;;
[4] ^^OneOctetTag:   !BITS&lt;1-5,#b11111&gt; BITS&lt;1-5,.,:tag&gt;;
[5] ^^MultiOctetTag: . (&amp;BITS&lt;8,#1&gt; BITS&lt;1-7,.,:n&gt; add_Tag_)* BITS&lt;1-7,.,:n&gt; add_Tag_;
[6] Length :   OneOctetLength / MultiOctetLength 
             / FATAL&lt;"illegal LENGTH"&gt;;
[7] ^^OneOctetLength: &amp;BITS&lt;8,#0&gt; BITS&lt;1-7,.,:length&gt;;
[8]^^MultiOctetLength: &amp;BITS&lt;8,#1&gt; BITS&lt;1-7,.,:n&gt; ( .:byte addLength_){:n};
[9]^^PrimitiveValue: .{:length} / FATAL&lt;"BER input ends before VALUE ends"&gt;;
[10]^^CompositeDelimValue: (!(#0#0) TLV)*;
[11]^^CompositeValue
{
     int len;
     PegBegEnd begEnd;
     bool save_()  {len= length;return true;}
     bool at_end_(){return len&lt;=0;}
     bool decr_()  {len-= begEnd.posEnd_-begEnd.posBeg_;return len&gt;=0;} 
}
	:  save_ 
           (!at_end_ TLV:begEnd 
                (decr_/FATAL&lt;"illegal length"&gt;))*;

&lt;&lt;/Grammar&gt;&gt;</pre>
<h3><a name="Scientific Calculator  (Build Tree + Evaluate Tree)">Scientific Calculator  (Build Tree + Evaluate Tree)</a></h3>
<p>This calculator supports the basic arithmetic operations + - * /,
built in functions taking one argument like 'sin','cos',..  and assignments to variables.
The calculator expects line separated expressions and assignments. It works
as two step interpreter which first builds a tree, then evaluates the tree.
The PEG grammar for this calculator can be translated to a peg parser by the
parser generator coming with the PEG Grammar Explorer. The evaluator must be 
written by hand. It works by walking the tree and evaluating the results as it visits
the nodes.
</p><p>The grammar for the calculator is:
</p><pre>&lt;&lt;Grammar Name="calc0_tree"&gt;&gt;
[1]^^Calc:  ((^'print' / Assign / Sum) 
            ([\r\n]/!./FATAL&lt;"end of line expected"&gt;)
            [ \r\n\t\v]* )+ 
            (!./FATAL&lt;"not recognized"&gt;); 
[2]^Assign:S ident S '=' S Sum;           
[3]^Sum:   Prod  (^[+-] S @Prod)*;
[4]^Prod:  Value (^[*/] S @Value)*;  
[5] Value: (Number/'('S Sum @')'S/Call/ident) S;
[6]^Call:   ident S '(' @Sum @')' S;
[7]^Number:[0-9]+ ('.' [0-9]+)?([eE][+-][0-9]+)?;
[8]^ident: [A-Za-z_][A-Za-z_0-9]*;
[9] S:	    [ \t\v]*;
&lt;&lt;/Grammar&gt;&gt;</pre>

<h2><a name="PEG Parser Generator">PEG Parser Generator</a></h2>
<h3><a name="A PEG Generator implemented with PEG">A PEG Generator Implemented with PEG</a></h3>
<p>The library classes <code>PegCharParser</code> and <code>PegByteParser</code> are designed for manual 
Parser construction of PEG parsers. But it
is highly recommended in any case to first write the grammar on paper before implementing it.
I wrote a little parser generator (using <code>PegCharParser</code>) which translates a 'paper' Peg grammar
to a C# program. The current version of the PEG parser generator 
just generates a C# parser. It uses optimizations for huge character sets and for big sets of literal alternatives. 
Future versions will generate source code for C/C++ and other languages
and furthermore support debugging, tracing and  direct execution of the grammar without the need to translate it to
a host language. But even the current version of the PEG parser generator is quite
helpful. 

</p><p>All the samples presented in the chapter 
<a href="#Parsing%20Expression%20Grammar%20Examples">Expression Grammar Examples</a>

were generated  with it. The PEG Parser Generator is an example of a PEG parser which generates
a syntax tree. It takes a PEG grammar as input, validates the generated syntax tree 
and then writes a set of C# code files, which implement the parser described by the PEG grammar.
</p><h3><a name="PEG Parser Generator Grammar">PEG Parser Generator Grammar</a></h3>
<p>The PEG Parser Generator coming with this article expects a set of grammar rules written as described in the
chapter <a href="#Parsing%20Expression%20Grammars%20Basics">Parsing Expression Grammars Basics</a>.
These rules must be preceded by a header and terminated by a trailer as described in the following PEG Grammar:
</p><pre>&lt;&lt;Grammar Name="PegGrammarParser"&gt;&gt;
peg_module:              peg_head peg_specification peg_tail;
peg_head:                S '&lt;&lt;Grammar'\i B S  attribute+ '&gt;&gt;';
attribute:               attribute_key S '=' S attribute_value S;
attribute_key:           ident;
attribute_value:         <em>"attribute value in single or double quotes"</em>;
peg_specification:       toplevel_semantic_blocks peg_rules;
toplevel_semantic_blocks:semantic_block*;
semantic_block:          named_semantic_block / anonymous_semantic_block;
named_semantic_block:    sem_block_name S anonymous_semantic_block;
anonymous_semantic_block:'{' semantic_block_content '}' S;
peg_rules:               S peg_rule+;
peg_rule:                lhs_of_rule ':'S  rhs_of_rule ';' S;
lhs_of_rule:             rule_id? tree_or_ast? create_spec? 
                         rule_name_and_params 
                          (semantic_block/using_sem_block)?;
rule_id:                 (![A-Za-z_0-9^] .)* [0-9]+ (![A-Za-z_0-9^] .)*;
tree_or_ast:             '^^'/'^';
create_spec:             'CREATE' S '&lt;' create_method S '&gt;' S;
create_method:           ident;
ident:                   [A-Za-z_][A-Za-z_0-9]*;
rhs_of_rule:             <em>"right hand side of rule as described in 
                                <a href="http://www.codeproject.com/KB/recipes/%22#Parsing" expression="" grammars="" basics&quot;="">Parsing Expression Grammars Basics</a>"</em>;
semantic_block_content:  <em>"semantic block content as described in 
                                <a href="http://www.codeproject.com/KB/recipes/%22#Parsing" expression="" grammars="" basics&quot;="">Parsing Expression Grammars Basics</a>"</em>;
peg_tail:   '&lt;&lt;/GRAMMAR'\i; S '&gt;&gt;';
&lt;&lt;/Grammar&gt;&gt;</pre>

<p>The header of the grammar contains HTML/XML-style attributes which are used to determine
the name of the generated C# file and the input file properties. The following attributes
are used by the C# code generator:
<table class="ArticleTable" width="100%">
<thead>
<tr><td>Attribute Key</td><td>Optionality</td><td>Attribute Value</td>
</tr>
</thead>
<tbody><tr><td>Name</td><td>Mandatory</td><td>Name for the generated C# grammar file and namespace</td>
</tr>
<tr><td>encoding_class</td>

     <td>Optional</td>
      <td>Encoding of the input file. Must be one of 
      <code>binary</code>, 
      <code>unicode</code>, 
      <code>utf8</code> or 
      <code>ascii</code>.
 Default is <code>ascii</code></td>
</tr>

<tr><td>encoding_detection</td>
     <td>Optional</td>
<td>Must only be present if the <code>encoding_class</code> is set to <code>unicode</code>.
 In this case one of the values <code>FirstCharIsAscii</code> or 
 <code>BOM</code> is expected.</td>

 </tr>
</tbody></table>
</p><p>All further attributes are treated as comments.
The attribute <code>reference</code> in the following sample header
</p><pre>&lt;&lt;Grammar Name="json_check" 
	  encoding_class="unicode"  encoding_detection="FirstCharIsAscii"
	  reference="www.ietf.org/rfc/rfc4627.txt"&gt;&gt;</pre>
<p>is treated as comment.  
</p><h3><a name="The PEG Parser Generator's handling of semantic blocks">The PEG Parser Generator's Handling of Semantic Blocks</a></h3>
<p>Semantic blocks are translated to local classes. The code inside
semantic blocks must be C# source text as expected in a class body, except that access keywords
can be left out. The parser generator prepends an <code>internal</code> access keyword when necessary. Top level semantic blocks are handled differently than local semantic blocks.
</p><p>A top level semantic block is created in the grammar's constructor, wheras a local semantic
block is created each time the associated rule method is called. There is no need to define
a constructor in a local semantic block, since the parser generator creates a constructor 
with one parameter, a reference to the grammar class. 
The following sample shows a grammar excerpt with a top level and a local semantic block and
its translation to C# code.

</p><pre>&lt;&lt;Grammar Name="calc0_direct"&gt;&gt;
Top{ // semantic top level block 
  double result;
  bool print_(){Console.WriteLine("{0}",result);return true;}
}
...
Number
{ //local semantic block 
	string sNumber;
	bool store_(){double.TryParse(sNumber,out result);return true;}
} :  ([0-9]+ ('.' [0-9]+)?):sNumber store_	;</pre>
<p>These semantic blocks will be translated to the following C# source code
</p><pre lang="cs">class calc0_direct : PegCharParser 
{
   class Top{ // semantic top level block 
      internal double result;
      internal bool print_(){Console.WriteLine("{0}",result);return true;}
   }
   Top top;

   #region Constructors
   public calc0_direct(): base(){ top= new Top();}
   public calc0_direct(string src,TextWriter FerrOut): base(src,FerrOut){top= new Top();}
   #endregion Constructors
  ...
  class _Number{ //local semantic block  
	internal string sNumber;
	internal bool store_(){double.TryParse(sNumber,out parent_.top.result);return true;}
        internal _Number(calc0_direct grammarClass){parent_ = grammarClass; }
        calc0_direct parent_;
   }
        public bool Number()    
        {
             var _sem= new _Number(this);
	...
}</pre>
<p>
Quite often, several grammar rules must use the same local semantic block. To avoid code duplication,
the parser generator supports the <code>using <em>SemanticBlockName</em></code> clause.
The semantic block named <code><em>SemanticBlockName</em></code> should be defined before the 
first grammar rule at the same place where the  top level semantic blocks are defined. But because
such a block is referenced in the using clause of a rule, it is treated as local semantic block.
<p>
Local semantic blocks also support destructors. A destructor is tranlsated to a <code>IDispose</code> interface and the destructor code is placed into the corresponding <code>Dispose()</code> function. 
The grammar rule function which is generated by the parser generator will be enclosed in a using block.
This allows to execute cleanup code at the end of the rule even in the presence of exceptions.
The following sample is taken from the <code>Python 2.5.2</code> sample parser.
<pre>
Line_join_sem_{
      bool prev_;
      Line_join_sem_ (){set_implicit_line_joining_(true,out prev_);}
      ~Line_join_sem_(){set_implicit_line_joining_(prev_);}
}
...
[8] parenth_form:         '(' parenth_form_content @')' S;
[9] parenth_form_content 
    using Line_join_sem_:   S expression_list?;
...
[17]^^generator_expression: '(' generator_expression_content @')' S;
[18]generator_expression_content
    using Line_join_sem_:   S expression genexpr_for;
</pre>
<p>
The <code>Line_join_sem</code> semantic block turns Python's implicit line joining on and off (Python is
line oriented except that line breaks are allowed inside constructs which are parenthized as in
<code>(...) {...) [...]</code>. The <code>Line_join_sem</code> semantic block and rule [8] of the 
above grammar excerpt are translated to
<pre>
class Line_join_sem_ : IDisposable{
   bool prev_;
   internal Line_join_sem_(python_2_5_2_i parent)
   { 
      parent_= parent; 
      parent_._top.set_implicit_line_joining_(true,out prev_);
   }
   python_2_5_2_i parent_;
   public void Dispose(){parent_._top.set_implicit_line_joining_(prev_);}
}
public bool parenth_form_content()    /*[9]   parenth_form_content 
                                              using Line_join_sem_:    S expression_list?;*/
{
    using(var _sem= new Line_join_sem_(this)){
       return And(()=>    S() && Option(()=> expression_list() ) 
     );            
}
</pre>

<h2><a name="Parsing Expression Grammars in perspective">Parsing Expression Grammars in Perspective</a></h2>
<p>Parsing Expression Grammars narrow the semantic gap between
formal grammar and implementation of the grammar in a functional or imperative programming language.
PEGs are therefore particularly well suited for manually written parsers as well as for attempts
to integrate a grammar very closely into a programming language. As stated in [1], the elements
which form the PEG framework are not new, but are well known and commonly used techniques when implementing
parsers manually. What makes the PEG framework unique is the selection and combination of the basic elements,
namely 
<table class="ArticleTable" width="100%">
<thead>
	<tr>
		<td>PEG Feature</td>

		<td>Advantage</td>
		<td>Disadvantage</td>
	</tr>
	</thead>
	<tbody><tr>
		<td>Scannerless parsing</td>
		<td>Only one level of abstraction. No scanner means no scanner worries.</td>
		<td>Grammar sligthly cluttered up. Recognition of overlapping tokens might be inefficient
		(e.g. identifier token overlaps with keywords -&gt; ident: !keyword [A-Z]+; )</td>

	</tr>
	<tr>
		<td>Lack of ambiguity</td>
		<td>There is only one interpretation for a grammar. 
			The effect is, that PEG grammars are "executable".</td>
		<td>---</td>
	</tr>
	<tr>

		<td>Error handling by using FATAL alternative</td>
		<td>Total user control over error diagnostics</td>
		<td>Bloats the grammar. </td>
	</tr>
	<tr>
		<td>Excessive Backtracking possible</td>
		<td>Backtracking adds to the powerfulness of PEG. If the input string
is in memory, backtrackting just means resetting the input position.</td>

		<td>Potential memory hogg. Interferes with semantic actions.
			Solution: Issue a fatal error in case backtracking cannot succeed anymore.
		</td>
	</tr>
	<tr>
		<td>Greedy repetition</td>
		<td>Greedy repetition conforms to the "maximum munch rule" used in scanning and therefore
		allows scannerless parsing.
				</td>
		<td>Some patterns  are more difficult to recognize.
		</td>
	</tr>

	<tr><td>Ordered Choice</td>
		<td>The author of the grammar determines the selection strategy.</td>
		<td>Potential error source for the unexperienced. R: '&lt;' / '&lt;=' ; 
		will never find &lt;=</td>
	</tr>
	<tr><td>Lookahead operators <strong>&amp;</strong> and<strong> !</strong></td>

		<td>Supports arbitrary lookaheads. 
		Good cost/gain ratio if backtracking is anyway supported.
		Lookahead e.g. allows better reuse of grammar rules and supports 
		more expressive error diagnostics.</td>
		<td>Excessive use of <strong>&amp;</strong> and <strong>!</strong> makes the parser slow.</td>
	</tr>
</tbody></table>
</p><p>A PEG grammar can incur a serious performance penalty, when backtracking occurs frequently. This is the reason
that some PEG tools (so called packrat parsers) memoize already read input and the associated rules. It can
be proven, that appropriate memoization guarantees linear parse time even when backtracking and unlimited lookahead 
occurs. Memoization (saving information about already taken paths) on the other hand has its own overhead and
impairs performance in the average case. The far better approach to limit backtracking is rewriting the grammar
in a way, which reduces backtracking. How to do this will be shown in the next chapter.
</p><p>The ideas underlying PEG grammars are not entirely new and many
of them are regularly used to manually construct parsers.
Only in its support and encouragement for backtracking and unlimited
lookahead deviates PEG from most earlier parsing techniques.
The simplest implementation for unlimited lookahead and backtracking
requires that the input file must be read into internal
memory before parsing starts. This is not a problem nowadays, but was
not acceptable earlier when memory was a scarce resource.
</p><h3><a name="Parsing Expression Grammars Tuning">Parsing Expression Grammars Tuning</a></h3>

<p>A set of grammar rules can recognize a given language.
But the same language can be described by many different grammars even within the same formalism (e.g. PEG grammars).
Grammar modifications can be used to meet the following goals:
<table class="ArticleTable" width="100%">
<tbody><tr><td><strong>Goal</strong></td>
<td><strong>Before modification</strong></td>
<td><strong>After modification</strong></td>
</tr>
<tr><td>More informative<br> tree nodes [1]</td>
<td><pre>[1]^^string: 
&nbsp;&nbsp;'"' ('\\' ["'bfnrt\\])/!'"' .)* '"';</pre> </td>
<td><pre>[1]^^string:   '"'(escs/chars)* '"';
[3] ^^escs:  ('\\' ["'bfnrt\\])*;
[4] ^^chars: (!["\\] .)*;</pre></td>

</tr>
<tr><td>Better placed<br> error indication [2]</td>
<td><pre>
'/*'  (!'*/' . )*  
('*/' / 
 FATAL&lt;"comment not closed before end"&gt; 
);</pre>
</td>
<td><pre>
'/*'  
( (!'*/' .)*  '*/' 
/ FATAL&lt;"comment not closed before end"&gt; 
);
</pre>
</td>
</tr>
<tr><td><em>Faster</em> Grammar<br> (reduce calling depth) [3]</td>

<td><pre>[10]string:   '"' char* '"'	;
[11]char: escape 
      / !(["\\]/control)unicode 
      /!'"' FATAL&lt;"illegal character"&gt;;
[12]escape:   '\\' ["\\/bfnrt];
[17]control   [#x0-#x1F];
[18]unicode:  [#x0-#xFFFFF];</pre></td>
<td><pre>[10]string: 
'"' 
( '\\'["\\/bfnrt] 
/ [#0x20-#0x21#0x23-#0x5B#0x5D-#0xFFFF] 
/ !'"' FATAL&lt;"illegal character"&gt;
     )*
'"';</pre></td>
</tr>
<tr>
<td><em>Faster</em> Grammar,<br> Less Backtracking (left factoring)</td>
<td><pre>[1] if_stat:   
   'if' S '('expr')' stat /
   'if' S '('expr')' stat 'else' S stat;</pre></td>

<td><pre>[1] if_stat: 
   'if' S '(' expr ')' stat 
             ('else' S stat)?;</pre></td>
</tr>
</tbody></table>
</p><p>Remarks:<br> 
[1] More informative tree nodes can be obtained by syntactical grouping of grammar
elements so that postprocessing is easier. In the above example, access 
to the content of the string is improved by grouping consecutive non-escape characters
into one syntactical unit.<br>
[2] The source reference place which is given by an error message is important.
In the example of a c comment which is not closed until the end of the input, 
the error message should be given where the comment opens.<br>
[3] Reducing calling depth means inlinig of function calls, since
 each rule corresponds to one function call in our PEG implementation.
Such a transformation should only be carried out for hot spots, otherwise
the expressiveness of the grammar gets lost. Furthermore, some aggressive
inlining compilers may do this inlining for you.
Reducing calling depth may be questionable, but left factorization
is certainly not. It not only improves performance but also eliminiates potential
disruptive backtracking. When embedding semantic actions into a PEG parser,
backtracking should in many cases not occur anymore, because undoing semantic actions
may be tedious.
</p><h3><a name="Comparison of PEG parsing with other parsing techniques">Comparison of PEG Parsing with other Parsing Techniques</a></h3>
<p>Most parsing strategies currently in use are based on the notion of
a context free grammar. (The following explanations follow
-for the next 50 lines- closely the material used in the
Wikipedia on Context free grammars [3])
A context free grammar consists
of a set of rules similar to the set of rules for a PEG parser.
But context free grammars are quite differently interpreted than
PEG grammars. The main difference is the fact, that context free
grammars are nondeterministic, meaning that 
</p><ol>
<li>Alternatives in context free grammars can be chosen arbitrarily </li>

<li>Nonterminals can be substituted in an arbitrary order 
(Substitution means replacing a Nonterminal on the right hand side of a rule by the
definition of the Nonterminal).
By starting with the start rule and choosing
alternatives and substituting nonterminals in all possible orders we can generate all the 
strings which are described by the grammar 
(also called the language described by the grammar).
</li>
</ol> 
<br>
With the context free grammar 
<pre>S : 'a' S 'b' | 'ab'; </pre>
<p>e.g. we can generate the following language strings
</p><pre>ab, aabb, aaabbb,aaaabbbb,...</pre>
<p>With PEG we cannot generate a language, we can only recognize an input string.
The same grammar interpreted as PEG grammar 
</p><pre>S: 'a' S 'b' / 'ab';</pre> 
<p>would recognize any of the following input strings 
</p><pre>ab
aabb
aaabbb
aaaabbbb
</pre>

<p>It turns out, that the nondeterministic nature of context free grammars, while being 
indispensable for generating a language, can be a problem when recognizing an input string.
If an input string can be parsed in two different ways we have the problem of ambiguity,
which must be avoided by parsers.
A further consequence of nondeterminism is that a context free input string recognizer 
(a parser) must choose a strategy how to substitute nonterminals on the right hand side of a rule.
To recognize the input string 
</p><pre>1+1+a</pre> with the context free rule
<pre>S: S '+' S | '1' | 'a';</pre>
we either can e.g use the following substitutions:
<pre>S '+' S -&gt; 
(S '+' S)  '+' S -&gt; 
(('1') '+' S) '+' S -&gt; 
(('1') '+' ('1')) '+' S -&gt; 
(('1') '+' ('1')) '+' ('a')
</pre>
<p>This is called a leftmost derivation. Or we can use the substitutions:
</p><pre>S '+' S -&gt; 
S '+' (S  '+' S) -&gt; 
S '+' (S '+' ('a')) -&gt; 
S '+' (('1') '+' ('a')) -&gt; 
('1') '+' (('1') '+' ('a'));
</pre>
<p>This is called a rightmost derivation.
A leftmost derivation parsing strategy is called LL, wheras a rightmost
derivation parsing strategy is called LR
(the first L in LL and LR stands for "parse the input string from
Left", but who will try it from right?).
Most parsers in use are either LL or LR parsers. Furthermore, grammars
used for LL parsers and LR parsers must obey different rules.
A grammar for an LL parser must never use left recursive rules, whereas
a grammar for an LR parser prefers immediate left recursive rules over
right recursive ones.
The C# grammar e.g. is written for an LR parser. The rule for a list of
local variables is therefore:
</p><pre>local-variable-declarators: 
  local-variable-declarator 
| local-variable-declarators ',' local-variable-declarator;</pre>                         
<p>If we want use this grammar with an LL parser, then we must rewrite this rule to:
</p><pre>local-variable-declarators: 
   local-variable-declarator  (',' local-variable-declarator)*;</pre>
<p>Coming back to original context free rule
</p><pre>S: S '+' S | '1' | 'a';</pre>
<p>and interpreting it as a PEG rule
</p><pre>S: S '+' S / '1' / 'a';</pre>
<p>we do not have to do substitutions or/and choose a parsing strategy to recognize the input string
</p><pre>1+1+a</pre> 

<p>We only must follow the execution rules for a PEG which translates to the following steps:
</p><ol>
<li>Set the input position to the start of the input string</li>
<li>Choose the first alternative of the start rule (here:<code> S '+' S</code>)</li>
<li>Match the input against the first component of the sequence <code>S '+' S</code></li>
<li>Since the first component is the nonterminal <code>S</code>, call this nonterminal.</li>
</ol>

<p>This obviously results in infinite recursion. The rule <code>S: S '+' S / '1' / 'a';</code>
is therefore not a valid PEG rule. But almost any context free rule can be transformed into a PEG rule.
The context free rule <code>S: S '+' S | '1' | 'a';</code> translates to the valid PEG rule
</p><pre>S: ('1'/'a')('+' S)*;</pre>
<p>One of the following chapters shows how to translate a context free rule into a PEG rule.
</p><p>The following table compares the prevailing parser types.
<table class="ArticleTable">
<thead>
<tr>
<td>Parser Type</td><td>Sub Type</td><td>Scanner</td><td>Lookahead</td><td>Generality</td><td>Implementation</td><td>Examples</td>

</tr>
</thead>
<tbody><tr>
<td>Context Free</td><td>LR-Parser</td><td>yes</td><td>-</td><td>-</td><td>table driven</td><td></td>		
</tr>
<tr>
<td>Context Free</td><td>SLR-Parser</td><td>yes</td><td>1</td><td>medium</td><td>table driven</td><td>handcomputed table</td>

</tr>
<tr>
<td>Context Free</td><td>LALR(1)-Parser</td><td>yes</td><td>1</td><td>high</td><td>table driven</td><td>YACC,Bison</td>
</tr>
<tr>
<td>Context Free</td><td>LL-Parser</td><td>yes</td><td>-</td><td>-</td><td>code or table driven</td><td></td>

</tr>
<tr>
<td>Context Free</td><td>LL(1)-Parser</td><td>yes</td><td>1</td><td>low</td><td>code or table driven</td><td>Predictive parsing</td>
</tr>
<tr>
<td>Context Free</td><td>LL(k)-Parser</td><td>yes</td><td>k</td><td>high</td><td>code or table driven</td><td>ANTLR,Coco-R</td>

</tr>
<tr>
<td>Context Free</td><td>LL(*)-Parser</td><td>yes</td><td>unlimited</td><td>high+</td><td>code or table driven</td><td>boost::spirit</td>
</tr>
<tr>
<td>PEG-Parser</td><td>PEG-Parser</td><td>no</td><td>unlimited</td><td>very high</td><td>code preferred</td><td>Rats,Packrat,Pappy</td>

</tr>
</tbody></table>
</p><p>The reason, that the above table qualifies the generality and powerfulness of PEG as very high
is due to the PEG operators <strong>&amp;</strong> (peek) and <strong>!</strong> (not). It is not difficult to implement these operations,
but heavy use of them can impair the parser performance and earlier generations of parser writers
carefully avoided such features because of the implied costs.

</p><p>When it comes to runtime performance, the differences between the above parser strategies are
not so clear. LALR(1) Parser can be very fast. The same is true for LL(1) parsers (predictive parsers).
When using LL(*) and PEG-Parsers, runtime performance depends on the amount of lookahead actually used
by the grammar. Special versions of PEG-Parsers (Packrat parsers) can guarantee linear runtime behaviour
(meaning that doubling the length of the input string just doubles the parsing time).


</p><p>An important difference between LR-Parsers and LL- or PEG-Parsers is the fact that LR-Parser are
always table driven. A manually written Parser is therefore in most cases either an LL-Parser or a PEG-Parser.
Table driven parsing puts parsing into a black box which only allows limited user interaction.
This is not a problem for a one time, clearly defined parsing task, but is not ideal if one
frequently corrects/improves and extends the grammar because changing the grammar means in 
case of a table driven parser a complete table and code regeneration.

<h3><a name="Translating LR grammars to PEG grammars">Translating LR grammars to PEG Grammars</a></h3>
<p>Most specifications for popular programming languages come with a grammar suited for an LR parser.
LL and PEG parsers can not directly use such grammars because of left recursive rules.
Left recursive rules are forbidden in LL and PEG parsers because they result in idefinite recursion.
Another problem with LR grammars is that they often use alternatives with the same beginning. 
This is legal in PEG but results in unwanted backtracking.
The following table shows the necessary grammar transformations when going from an LR grammar to
a PEG grammar.
<table class="ArticleTable">
<thead>
	<td>Transformation Category</td>
	<td>LR rule</td>
	<td>~PEG rule (result of transformation)</td>
</thead>
<tbody>
<tr>
<td>
Immediate Left Recursion =><br>
Factor out non recursive alternatives</td>
<td>
<pre>
// s1, s2 are terms which are not
//left recursive
// and not empty
A: A t1 | A t2 | s1 | s2;</pre>
</td>
<td><pre>A: (s1 | s2) (t1 | t2)*</pre></td>
</tr>
<tr>
<td>Indirect Left Recursion =><br>
Transfrom to Immediate Left Recursion=><br>
Factor out non recursive alternatives</td>
<td><pre>
A: B t1 | s1 ;
B: A t2 | s3 ; </pre>
</td>
<td><pre>
// we substitute B by its right hand side
A: (A t2 | s3) t1 | s1; 
//Eliminate immediate left recursion
A: (s3 t1 | s1) (t2 t1)* ...;
</pre></td>
</tr>
<tr>
<td>Alternatives with same beginning =><br>
Merge alternatives using Left Factorization</td>
<td><pre>
A: s1 t1 | s1 t2;
</pre></td>
<td><pre>
A: s1 (t1 | t2);
</pre>
</td>
</tr>
</tbody>
</table>
<br>
The following sample shows the transformation of part of the "C" grammar from the LR grammar as presented
in Kernighan and Ritchies book on "C" to a PEG grammar (the symbol S is used to denote scanning of white space).
<table class="ArticleTable">
<thead>
	<tr>
		<td>LR grammar: "C" snippet <em>declarator stuff</em>...</td>

		<td>PEG grammar: "C" <em>declarator stuff</em>...</td>
	</tr>
	</thead>
	<tbody><tr>
		<td><pre>declarator: pointer? direct_declarator;</pre></td>
<td><pre>declarator: pointer? direct_declarator;</pre></td>
		<td><pre></pre></td>
	</tr>
	<tr><td><pre>direct_declarator:
   identifier 
| '(' declarator ')' 
| direct_declarator '[' constant_expression? ']'
| direct_declarator '(' parameter_type_list ')'
| direct_declarator '(' identifier_list? ')';</pre></td>

		<td><pre>direct_declarator: 
 (identifier / '(' S declarator ')' S)
 (   '[' S constant_expression? ']' S
   / '(' S parameter_type_list ')' S
   / '(' S identifier_list? ')' S
  )*;</pre></td>
	</tr>
	<tr><td><pre>pointer: 
   '*' type_qualifier_list?
|  '*' type_qualifier_list? pointer;;</pre></td>
		<td><pre>pointer: 
   '*' S type_qualifier_list? pointer?</pre></td>
	</tr>
	<tr>
		<td><pre>parameter_type_list: 
   parameter_list 
|  parameter_list ',' '...';</pre></td>

<td><pre>parameter_type_list: 
   parameter_list (',' S '...')?;</pre></td>
		<td><pre></pre></td>
	</tr>
	<tr>
		<td><pre>type_qualifier_list: 
   type_qualifier
|  type_qualifier_list type_qualifier;</pre></td>
<td><pre>type_qualifier_list: 
   type_qualifier+;</pre></td>
		<td><pre></pre></td>
	</tr>

	<tr>
		<td><pre>parameter_declaration:
   declaration_specifiers declarator
|  declaration_specifiers abstract_declrator?;</pre></td>
<td><pre>parameter_declaration: 
   declaration_specifiers 
      (declarator / abstract_declarator?);</pre></td>
		<td><pre></pre></td>
	</tr>
	<tr>
		<td><pre>identifier_list:
   identifier
|  identifier_list ',' identifier;</pre></td>
<td><pre>identifier_list: 
   identifier S 
      (',' S identifier)*;</pre></td>

		<td><pre></pre></td>
	</tr>
</tbody></table>

</p><h2><a name="Future Developments">Future Developments</a></h2>
<p>The planned series of articles consists of the following parts
<table class="ArticleTable">
<thead>
<tr><td>Subject</td>
<td>Planned Release date</td>
<td>Description
</td></tr>
</thead>
<tbody><tr>
<td>Parser and Parser Generator</td>
<td>october 2008</td>
<td>C# classes to support work with Parsing expression grammars</td>
</tr>
<tr>
<td>PEG-Debugger/Interpreter</td>
<td>january 2009</td>
<td>Direct interpretation and debugging of PEG grammars without need to generate C# module</td>
</tr>
<tr>

<td>Sample Applications</td>
<td>june 2009</td>
<td>More grammar samples with postprocessing</td>
</tr>
</tbody></table>

</p><h2>History</h2>
<ul>
<li>2008 October:  initial version</li>
<li>2008 October:..minor update
<ul>
<li>improved semantic block support (using clause,IDispose Interface)</li>
<li>added new sample parser <code>Python 2.5.2</code>
</ul>
</ul>

<h2>References</h2>
<a name="References"></a>
<p>[1] <a href="http://www.brynosaurus.com/pub/lang/peg-slides/img0.html">Parsing Expression Grammars,Bryan Ford, MIT, January 2004</a>
[<a target="_blank" title="New Window" href="http://www.brynosaurus.com/pub/lang/peg-slides/img0.html">^</a>]<br>
[2] <a href="http://en.wikipedia.org/wiki/Parsing_expression_grammar">Parsing expression grammar, Wikipedia.en</a>
[<a target="_blank" title="New Window" href="http://en.wikipedia.org/wiki/Parsing_expression_grammar">^</a>]<br> 

[3] <a href="http://en.wikipedia.org/wiki/Context-free_grammar">Context-free grammar, Wikipedia.en</a>
[<a target="_blank" title="New Window" href="http://en.wikipedia.org/wiki/Context-free_grammar">^</a>]<br>
[4] <a href="http://www.itu.int/ITU-T/studygroups/com17/languages/X.690-0207.pdf">ITU-T X.690: Specification of BER, CER and DER, INTERNATIONAL TELECOMMUNICATION UNION</a>
[<a target="_blank" title="New Window" href="http://www.itu.int/ITU-T/studygroups/com17/languages/X.690-0207.pdf">^</a>]<br>
[5] <a href="http://www.ietf.org/rfc/rfc4627.txt">RFC 4627:The application/json Media Type for JavaScript Object Notation (JSON)</a>

[<a target="_blank" title="New Window" href="http://www.ietf.org/rfc/rfc4627.txt">^</a>]<br>
[6] <a href="http://www.json.org/">Introducing JSON</a>
[<a target="_blank" title="New Window" href="http://www.json.org/">^</a>]

</p></span>
    </div>
 

</body></html>

By viewing downloads associated with this article you agree to the Terms of Service and the article's licence.

If a file you wish to view isn't highlighted, and is a text file (not binary), please let us know and we'll add colourisation support for it.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

Martin.Holzherr

Switzerland Switzerland
No Biography provided

| Advertise | Privacy | Mobile
Web02 | 2.8.140821.2 | Last Updated 7 Oct 2008
Article Copyright 2008 by Martin.Holzherr
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid