An LL(1) Syntax Directed Engine in C#






4.65/5 (10 votes)
This article is dealing with parsing and semantic analysis. A full LL(1) parsing engine is introduced as an example to show a possible implementation.
Introduction
This article deals with parsing and semantic analysis.
A full LL(1) parsing engine is introduced as an example to show a possible implementation.
The following samples do not describe the peculiarities of C# constructs; C# is used as an ordinary language except for its polymorphic behavior. Please check other articles for effective C# coding style.
The Basics
You may refer to a Compiler Design book for the theory: an excellent book is K. Louden [1997], see chapter 4.
If we consider Top Down Parsing, there are two equivalent methods to perform it.
We can use the familiar recursive descent parsing in which each Nonterminal is represented by a procedure and each Terminal is a match or an error. Formally, the inherited attributes are parameters to the procedures and the synthesized attributes are the function returned values.
Alternatively, we can use also LL(1) parsing in which recursion is simulated through a stack and the parser is table driven.
Here I present the two formally equivalent methods, given the same set of Productions. The first implementation just prints an inverted string of an arbitrary complex C declaration, the second builds a complete inverted type from a declaration with full semantic checking, using a stack to simulate recursion.
This method works for an LL(1) grammar, so you may need to review the constraints for this kind of grammars. The grammar must not contain left recursion, must be left factorized, constraints on First and Follow Sets are also enforced.
Recursive Descent Parsing
The first project is Recursive Descent. In a simple example of this first method, the output is just a string representation of an arbitrary C declaration.
The simple grammar is as follows:
Decl --> Pointer DirDcl ;
Pointer --> Star Pointer
Pointer --> Lambda
DirDcl --> Id DirDclTail
DirDcl --> ( Dcl ) DirDclTail
DirDclTail --> [ Num ] DirDclTail
DirDclTail --> ( ) DirDclTail
DirDclTail --> Lambda
Let's begin with the Lexer: the tokens are contained in an enum
and the returned lexeme is left in tokenValue
.
public enum TokenKind {INT, CHAR, LONG, Num, Id, LParen ,
RParen, LQParen, RQParen, Star, Comma, SemiColon, EOI , UnKnown };
public class Lexer
{
public string tokenvalue = "";
public TokenKind getToken(ref int index, string stream)
{
tokenvalue = "";
while (index < stream.Length && ( stream[index] == ' '
|| stream[index] == '\t' || stream[index] == '\n' || stream[index] == '\r'))
index++; // limit is stream length , skip white spaces
if (index == stream.Length || stream[index] == '$')
return TokenKind.EOI;
if ((stream[index] >= 'A' && stream[index] <= 'Z') || // an Id begins with a letter
(stream[index] >= 'a' && stream[index] <= 'z'))
{
tokenvalue += stream[index++];
while ((stream[index] >= 'A' && stream[index] <= 'Z') || // is followed by letters or numbers
(stream[index] >= 'a' && stream[index] <= 'z') ||
(stream[index] >= '0' && stream[index] <= '9') )
tokenvalue += stream[index++];
if (tokenvalue.ToUpper() == "INT") // add additional types if required
return TokenKind.INT;
if (tokenvalue.ToUpper() == "CHAR")
return TokenKind.CHAR;
if (tokenvalue.ToUpper() == "LONG")
return TokenKind.LONG;
return TokenKind.Id; // if no reserved keywords is an Id
}
if ((stream[index] >= '0' && stream[index] <= '9')) // a number begins with a number
{
tokenvalue += stream[index++];
while (stream[index] >= '0' && stream[index] <= '9') // collect the number until a different char
tokenvalue += stream[index++];
return TokenKind.Num;
}
switch (stream[index]) // single char token
{
case (';'):
index++;
return TokenKind.SemiColon;
case (','):
index++;
return TokenKind.Comma;
case ('*'):
index++;
return TokenKind.Star;
case ('('):
index++;
return TokenKind.LParen;
case ('['):
index++;
return TokenKind.LQParen;
case (')'):
index++;
return TokenKind.RParen;
case (']'):
index++;
return TokenKind.RQParen;
}
return TokenKind.UnKnown; // unknown Token
}
}
You may find useful to review the CodeProject article: How to interpret complex C/C++ declarations By Vikram A Punathambekar, 3 Jul 2004 . Look at the right to left rule.
The first parsing function is simple, just match the token and advance the input or signal an error and close the Form.
private void match(TokenKind token)
{
if (token == tokenKind)
tokenKind = lexx.getToken(ref index, stream);
else
{
MessageBox.Show(" unexpected token " + tokenKind);
this.Close();
}
}
To parse a declarator, parse Pointer
then a DirDcl
. This is accomplished by calling a procedure
for each nonterminal, from the right to left rule you can see that you must first go right then left
starting from the innermost Declarator.
private void _Decl()
{
int ns = 0;
_Pointer(ref ns);
_DirDcl(); // go right to generate output string
while (ns-- > 0) // this means go left at end
_out += " pointer to ";
}
To parse a Pointer, follows these rules:
a) if it is a star (*), simply count it
b) if it is a legal follow item, do nothing
c) return an error in all other cases
Please note the use of inherited attribute ns
.
private void _Pointer(ref int ns)
{
switch (tokenKind)
{
case (TokenKind.Star):
match(TokenKind.Star);
ns++;
_Pointer(ref ns);
break;
case (TokenKind.Id): // Follow items : do nothing
case (TokenKind.LParen):
break;
default:
MessageBox.Show(" syntax error");
this.Close();
break;
}
}
Match the first item or raise an error in order to parse a DirDcl
. To be noticed that ( Decl
) DirDclTail
implies an additional parsing of a
subexpression of type Decl
.
private void _DirDcl()
{
switch (tokenKind)
{
case (TokenKind.Id):
_out += " " + lexx.tokenvalue + " ";
match(TokenKind.Id);
_DirDclTail();
break;
case (TokenKind.LParen):
match(TokenKind.LParen);
_Decl();
match(TokenKind.RParen);
_DirDclTail();
break;
default:
MessageBox.Show(" syntax error");
this.Close();
return;
}
}
A DirDclTail
may be an array or a function. At the end follow items must be a Left Parenthesis or a semicolon.
private void _DirDclTail()
{
string num = null;
int nn = 0;
switch (tokenKind)
{
case (TokenKind.LQParen):
match(TokenKind.LQParen);
num = lexx.tokenvalue;
match(TokenKind.Num);
nn = Convert.ToInt16(num);
_out += " array [" + nn + "] of ";
match(TokenKind.RQParen);
_DirDclTail();
break;
case (TokenKind.LParen):
match(TokenKind.LParen);
match(TokenKind.RParen);
_out += " function returning ";
_DirDclTail();
break;
case (TokenKind.RParen):
case (TokenKind.SemiColon): // Follows items means do nothing
break;
default:
MessageBox.Show(" syntax error");
this.Close();
break;
}
}
Let's trace an execution path with this declaration: int *(*pi[5])[10];
Decl
_Pointer(ref ns)
ns++ parse *
_Pointer(ref ns)
_DirDcl()
match('(') parse (
_Decl // inner _Dcl Decl parses *pi[5]
_Pointer(ref ns)
ns++ parse *
Pointer(ref ns)
_DirDcl()
match(Id) output : pi
_DirDclTail
match('[') parse [
match(Num) parse 5
match(']') parse ] output : array[5] of
-> output : pointer to
match(')') parse )
_DirDclTail
match('[') parse [
match(Num) parse 10
match(']') parse ] output : array[10] of
--> output : pointer to
return to caller
output is : pi array[5] of pointer to array[10] of pointer to
The parsing is intuitive and the output is correct. The process is not automated and the flow of attributes can be complex. In section 5.1 of The C Programming Language, you can find a similar sample for complex declarations.
LL(1) Table Driven Parser
In a NonRecursive predictive parser, recursion is simulated through a semantic stack, parsing is guided by a predictive table. In order to create this table, you should review the concepts of First and Follow Sets: First Set represents the lookaheads that predict a particular production, the Follow Set is useful in Lambda production management. Prediction here means not recurring to BackTracking; in Algol like languages this is quite natural.
The theory behind this method is presented in Aho, Lam, Sethi and Ullman [1986] second ed., section 5.5.3 L-Attributed SDD's and LL Parsing with the use of a stack of objects. In addition, chapter 5 deals with inherited and synthesized attributes and how to manage them.
Here are the Productions used in the C complex declarations system.
Declaration
Declaration_Specifiers Declarator SemiColon
Declaration_Specifiers
Storage_Class_Specifier Sign_Specifier Type_Specifier
Storage_Class_Specifier
EXTERN STATIC AUTO REGISTER Lambda
Sign_Specifier
SIGNED UNSIGNED Lambda
Type_Specifier
VOID CHAR SHORT INT LONG
Declarator
Pointer Direct_Declarator
Direct_Declarator
Id Direct_Declarator_Tail
LParen Declarator RParen Direct_Declarator_Tail
Direct_Declarator_Tail
LSParen Integer_Constant RSParen Direct_Declarator_Tail
LParen Parameter_Type_List RParen Direct_Declarator_Tail
Lambda
Pointer
Star Type_Qualifier_List Pointer
Lambda
Parameter_Type_List
Parameter_List
Parameter_List
Parameter_Declaration Parameter_List_Tail
Parameter_List_Tail
Comma Parameter_Declaration Parameter_List_Tail
Lambda
Parameter_Declaration
Declaration_Specifiers Parameter_Declaration_Tail
Parameter_Declaration_Tail
Declarator
Here is the parser in LLParser.cs that represents the parsing state machine: it predicts productions, advance input or generate an error, it schedules actions at appropriate times. The functionality here is similar to that of the recursive version: when we predict a production, we push terminals and nonterminals in reverse order. Expanding a production is analogous to a function call: matching terminals advance input or signal an error until end of input.
Please note that synthesized attributes are pushed first in the stack and executed last. Tehy represent a final action on data computed so far.
Prediction in LL(1) parsing means to choose a production given a [NonTerminal,Terminal] pair. Execution begins with the start symbol
Declaration on top of SemanticStack
, Declaration_Specifiers
is the first NonTerminal of production number 1 that can be predicted by an
optional Storage_Class_Specifier
, an optional Sign_Specifier
and a mandatory Type_Specifier
.
The Type_Specifier
must be one of VOID
, CHAR
, SHORT
, INT
, LONG
otherwise an syntax error is raised. This means, for instance, that entry
[Type_Specifier,CHAR]
is a valid entry in the predictive table while blank entries represents error entries.
The Parse algorithm is not difficult to be understood: it checks the semantic stack looking for anything that is (derives from) a SynthesizedAttribute
and it executes the associated action popping the stack, if present. If it is another kind of object, it proceeds checking if it is a kind of
InheritedAttribute
and if it is so, it continues predicting, with a [NonTermnal,Terminal] pair, the next production (this means expanding a production), executing any function found, popping the semantic stack and pushing the contents of the production in reverse order (this is like a
function call in the recursive version).
If the symbol on top of the semantic stack is not anything like an Inherited attribute, then it must be a Terminal or EOI. If it's a terminal or EOI it compares it with the token from input: if they match, it executes a semantic action if present and get the next token; if they don't match, it signals an error and ends.
public bool Parse()
{
bool fret = false;
Production pp = null;
object sym = Stack.Top();
SynthesizedAttribute syn = sym as SynthesizedAttribute; // look for a synthesized attribute or derived on top
if (syn != null)
{
if (syn.FireAction != null)
{
fret = syn.FireAction(Stack, pp); // execute if any
if (fret == false)
return fret;
}
Stack.Pop();
return true;
}
SemanticAction sma = sym as SemanticAction;
if (sma != null)
{
Stack.Pop();
sym = Stack.Top();
}
InheritedAttribute inh = sym as InheritedAttribute; // look for an inherited attribute or derived
if (inh != null)
{
if (token != TokenLookup.TokenKind.UnKnown) // consult parsing table indexed by Nonterminal,Terminal to predict a production
pp = (Production)M[(int)inh.nTerm.Category, (int)token];
if (pp == null) // an error entry
{
MessageBox.Show("syntax error ");
return false;
}
if (sma != null) // execute action associated
{
fret = sma.FireAction(Stack, pp, lexx.TokenValue); // copy inherited top down into next production fields
if (fret == false) // pass down any parameter
return fret;
}
Stack.Pop();
for (int j = pp.rhs.Count - 1; j >= 0; j--) // here push production symbols in reversed order
{ // corresponds to a function call or production expansion
Terminal tt = pp.rhs[j] as Terminal;
if (tt != null && tt.Token == TokenLookup.TokenKind.Lambda) // eat the symbol if Lambda (do nothing)
continue;
if (pp.synth != null && pp.synth[j] != null) // push first synthesized attribute
{
if (pp.synth[j] is SDeclTypeAttribute)
{
SDeclTypeAttribute sa = (SDeclTypeAttribute)pp.synth[j] as SDeclTypeAttribute;
Stack.Push(new SDeclTypeAttribute(sa)); // copy constructor here means creating a new instance with parms passed by the semantic
} // action
pp.synth[j].Clear();
}
if (tt != null) // push terminals if defined
Stack.Push(pp.rhs[j]);
if (pp.inh != null && pp.inh[j] != null) // push inherited or derived
{
if (pp.inh[j] is IDeclTypeAttribute)
{
IDeclTypeAttribute ie = pp.inh[j] as IDeclTypeAttribute;
Stack.Push(new IDeclTypeAttribute(ie)); // note copy constructor for passing parameters
}
else
Stack.Push(new InheritedAttribute(pp.inh[j]));
pp.inh[j].Clear();
}
if (pp.action != null && pp.action[j] != null) // push any semantic action
Stack.Push(pp.action[j]);
}
}
else // is a Terminal or EOI (this is analogous to the match function in the recursive version)
{
Terminal tt = sym as Terminal;
if (token == TokenLookup.TokenKind.EOI) // at end ok
{
MessageBox.Show("stream successfully parsed ");
return false;
}
else
if (token == tt.Token) // match advance input and execute associated action if any
{
if (sma != null)
{
fret = sma.FireAction(Stack, null, lexx.TokenValue);
if (fret == false)
return fret;
}
Stack.Pop();
token = lexx.getToken();
return true;
}
else // no match means error
{
MessageBox.Show("Unexpected token : " + token + ", expected : " + tt.Token + ") ;
return false;
}
}
return true;
}
Let's analyze the structure of the semantic Engine: we have synthesized and inherited attributes represented by two parent classes and action routines. From these, you can declare classes that inherits from these basics classes and are the containers of data to be transferred by the program logic. These classes are in Semantic.cs.
public class SemanticStack
{
public object[] ss = null;
public int sp;
public SemanticStack(int nn)
{
ss = new object[nn];
sp = 0;
}
public object Top()
{
object sym = ss[sp - 1];
return sym;
}
public void Push(object sym)
{
ss[sp++] = sym;
}
public object Pop()
{
return ss[--sp];
}
}
public class SemanticAction
{
public delegate bool Action(SemanticStack Stack, Production pp, string token);
public Action FireAction;
public SemanticAction()
{
}
public SemanticAction(Action action)
{
FireAction = action;
}
public override string ToString()
{
return "a(" + FireAction.Method.Name.ToString() + ")";
}
}
public abstract class SynthesizedAttribute
{
public NonTerminal nTerm;
public delegate bool Action(SemanticStack Stack, Production pp);
public Action FireAction;
public virtual void Clear()
{
}
}
public class SDeclTypeAttribute : SynthesizedAttribute
{
public ArrayList Tuples = new ArrayList();
public SignSpecifier signed = SignSpecifier.SIGNED;
public StorageClass sclass = StorageClass.UnKnown;
public Type basety = null;
public Type tmpty = null; // temporary type
public Symbol sym = null;
public List<symbol> lsym = new List<symbol>();
public Stack<type> stype = new Stack<type>();
public int np = 0;
public string id = null;
public int size; // array types
...
public class InheritedAttribute
{
public string ErrMsg = null;
public NonTerminal nTerm;
public InheritedAttribute() { }
public InheritedAttribute(InheritedAttribute inh)
{
nTerm = inh.nTerm;
ErrMsg = inh.ErrMsg;
}
public InheritedAttribute(NonTerminal nt)
{
nTerm = nt;
}
public InheritedAttribute(NonTerminal nt, string _errmsg)
{
nTerm = nt;
ErrMsg = _errmsg;
}
public virtual void Clear()
{
}
}
public class IDeclTypeAttribute : InheritedAttribute
{
public ArrayList Tuples = new ArrayList();
public SignSpecifier signed = SignSpecifier.SIGNED;
public StorageClass sclass = StorageClass.UnKnown;
public Type basety = null; // type of variable
public Type tmpty = null; // temporary type
public Symbol sym=null;
public List<symbol> lsym = new List<symbol>(); // function attribute
public Stack<type> stype = new Stack<type>(); // pushed types
public int np = 0; // pointers number
public string id = null; // variable name
public int size; // array types
...
As you can see, we've a simple stack of objects that is pushed to and popped from continuosly by the parser. The SemanticStack
contains action routines from the class SemanticAction
, Inherited classes and derived, custom classes derived from SynthesizedAttribute
.
These classes are the basic containers of data and the data flow in the application logic is through these classes.
The action routines act on the custom data structures defined by the user based on Inherited and Synthesized custom classes.
We have a parser, we have the objects that represents the layout of the stack, and we have semantic actions that act on these objects at known and fixed positions inside the stack.
The layout on the stack of a given Production must be specified in advance all the time, that is the information that the parser must push and pop from the SemanticStack
.
The layout is specified with parallel lists, it's up to the programmer to fill these lists correctly, error checking is left to him.
BuildBaseType
is a SemanticAction
executed on matching the terminal INT
or CHAR
(for instance). We know that the parser on a match
executes a SemanticAction
if present: here BuildBaseType
is called with the token representing the basic type, it accesses the stack object
referring to the SDeclTypeAttribute
of Type_Specifier
at fixed displacement on the SemanticStack
and loads the basety field accordingly.
...
lhs = lg.FindNTermByName("Type_Specifier");
rhs = new List<syntaxsymbol>();
rhs.Add(lg.FindTermByName("CHAR"));
actions = new List<semanticaction>();
actions.Add(new SemanticAction(BuildBaseType)); // the action is linked to Terminal Char (executes on match)
lg.Prod.Add(new Production(lhs, rhs, actions, null, null)); // linking is obtained referring to same list position
lhs = lg.FindNTermByName("Type_Specifier");
rhs = new List<syntaxsymbol>();
rhs.Add(lg.FindTermByName("INT")); // same as above
actions = new List<semanticaction>();
actions.Add(new SemanticAction(BuildBaseType));
lg.Prod.Add(new Production(lhs, rhs, actions, null, null));
...
static bool BuildBaseType(SyntaxStack ss, Production pp, string token)
{
SDeclTypeAttribute sp = ss.ss[ss.sp - 2] as SDeclTypeAttribute; // Type_specifier
string ty = token.ToUpper();
switch (ty)
{
case ("CHAR"):
sp.basety = BuildType(TypeOperator.CHAR, null, 1,null);
break;
case ("INT"):
sp.basety = BuildType(TypeOperator.INT, null, 2,null);
break;
case ("LONG"):
sp.basety = BuildType(TypeOperator.LONG, null, 4,null);
break;
case ("VOID"):
sp.basety = BuildType(TypeOperator.VOID, null, 0, null);
break;
}
return true;
}
Here is ActionLoadBaseType
that gets scheduled when is on top of stack with a SDeclTypeAttribute
object of Type_Specifier
as we can see in Parse
. It loads the IDeclTypeAttribute
of Declarator
loading the basety
field.
The function is declared in advance in the layout of the Production as an Action associated with the SDeclTypeAttribute
of Type_Specifier
and gets scheduled at end of Production Type_Specifier --> INT
| CHAR
| LONG
|VOID
.
...
/*
declaration-specifiers:
storage-class-specifier declaration-specifiersopt
type-specifier declaration-specifiersopt
*/
lhs = lg.FindNTermByName("Declaration_Specifiers");
rhs = new List<syntaxsymbol>();
rhs.Add(lg.FindNTermByName("Storage_Class_Specifier"));
rhs.Add(lg.FindNTermByName("Sign_Specifier"));
rhs.Add(lg.FindNTermByName("Type_Specifier"));
inh = new List<inheritedattribute>();
inh.Add(new InheritedAttribute(lg.FindNTermByName("Storage_Class_Specifier"), null));
inh.Add(new InheritedAttribute(lg.FindNTermByName("Sign_Specifier"), null));
inh.Add(new InheritedAttribute(lg.FindNTermByName("Type_Specifier"), null));
synth = new List<synthesizedattribute>(); // these synthesized attributes executes at end of their own productions
synth.Add(new SDeclTypeAttribute(lg.FindNTermByName("Storage_Class_Specifier"), ActionLoadStorageClassModifier));
synth.Add(new SDeclTypeAttribute(lg.FindNTermByName("Sign_Specifier"), ActionLoadSignSpecifier));
synth.Add(new SDeclTypeAttribute(lg.FindNTermByName("Type_Specifier"), ActionLoadBaseType));
lg.Prod.Add(new Production( lhs, rhs, null, synth, inh));
...
static bool ActionLoadBaseType(SyntaxStack ss, Production pp)
{
IDeclTypeAttribute ip = ss.ss[ss.sp - 3] as IDeclTypeAttribute; // Declarator
SDeclTypeAttribute sp = ss.ss[ss.sp - 1] as SDeclTypeAttribute; // Type_specifier
switch (sp.basety.op)
{
case (TypeOperator.VOID):
case (TypeOperator.CHAR):
case (TypeOperator.INT):
case (TypeOperator.LONG):
ip.basety = sp.basety;
ip.basety.signed = ip.signed;
break;
}
return true;
}
And here is ActionExpandDeclarator
that executes when it is on Top of Stack: it proceeds to expand
the production Declarator-> Pointer Direct_Declarator
. It loads the IDeclTypeAttribute
of Pointer with the basic type created, accessing the
IDeclTypeAttribute
of Declarator
in the stack.
As you see, the production is accessed through the inh
field [0], referring to the IDecltypeAttribute
of Pointer, containing the right info.
This is done immediately after accessing the predictive table: the field contents are moved to the production, and next in the loop the
production contents are pushed in reverse order. Parse uses a copy constructor for each instance of the synthesized and inherited attributes.
We will have then a copy of this data on the semantic stack at known position.
...
/*
declaration:
function-definition
declaration
*/
lhs = lg.FindNTermByName("Declaration");
rhs = new List<syntaxsymbol>();
rhs.Add(lg.FindNTermByName("Declaration_Specifiers"));
rhs.Add(lg.FindNTermByName("Declarator"));
rhs.Add(lg.FindTermByName("SemiColon"));
inh = new List<inheritedattribute>();
inh.Add(new InheritedAttribute(lg.FindNTermByName("Declaration_Specifiers"), null));
inh.Add(new IDeclTypeAttribute(lg.FindNTermByName("Declarator"), null));
inh.Add(null);
actions = new List<semanticaction>();
actions.Add(null);
actions.Add(new SemanticAction(ActionOnExpandDeclarator)); // action is linked with NonTerminal Declarator at same list position
actions.Add(null);
synth = new List<synthesizedattribute>();
synth.Add(null);
synth.Add(new SDeclTypeAttribute(lg.FindNTermByName("Declarator"), ActionProcessSymbolTableDeclarator)); // executes at end of top
synth.Add(null); // Declarator production
lg.Prod.Add(new Production( lhs, rhs, actions, synth, inh));
...
static bool ActionOnExpandDeclarator(SyntaxStack ss, Production pp, string token)
{
IDeclTypeAttribute sp = ss.ss[ss.sp - 1] as IDeclTypeAttribute; //Declarator
IDeclTypeAttribute dp = null;
dp = pp.inh[0] as IDeclTypeAttribute; // Pointer
dp.basety = sp.basety;
dp.sclass = sp.sclass;
return true;
}
In this semantic engine you must declare the NonTerminals, the Terminals and the Productions in your language.
LL01.cs responsabily is to fill correctly the predictive table, once the table is created. Parsing is quite natural, you can forget about the predictive table, just add valid productions compatible with LL(1) grammar format.
To begin, we declare the grammar: actions synth and inh represents collections of data safely saved inside the Language
class.
A single action is a SemanticAction
, a synth
is a class derived from SynthesizedAttribute
and an inh
is an InheritedAttribute
or a derived class.
Giving the parser, the objects and the actions, we have all we need.
Inside DeclsGrammar.cs, we begin by declaring the basic blocks: NonTerminals
, Terminals
and Productions
to compute the First and Follow Sets
and the predictive table.
We specify then the layout of any given production. This allows the parser to work correctly.
// before executing here you must fill the TokenKind enum containing the Terminals returned by the Lexer and the Category
// enum containing the NonTerminals, that is all the left hand sides of productions added to the collection Prod.
TokenLookup.TokenKind[] tokens = (TokenLookup.TokenKind[])Enum.GetValues(typeof(TokenLookup.TokenKind));
string[] names = TokenLookup.TokenNames;
Terminal tt = null;
foreach (TokenLookup.TokenKind tok in tokens)
{
tt = new Terminal(names[idx], tok);
lg.Term.Add(tt);
if (lg.HashTerm[names[idx]] != null)
MessageBox.Show(" Terminal " + names[idx] + " already inserted");
else
lg.HashTerm.Add(names[idx], tt);
idx++;
}
Category[] categories = (Category[])Enum.GetValues(typeof(Category));
idx = 0;
NonTerminal nt = null;
foreach (Category cat in categories)
{
nt = new NonTerminal(Enum.Format(typeof(Category), cat, "G"), cat);
lg.NTerm.Add(nt);
if (lg.HashNTerm[cat] != null)
MessageBox.Show(" Nonterminal " + cat + " already inserted");
else
lg.HashNTerm.Add(cat.ToString(), nt);
}
// Productions
List<semanticaction> actions = new List<semanticaction>();
List<inheritedattribute> inh = new List<inheritedattribute>();
List<synthesizedattribute> synth = new List<synthesizedattribute>();
NonTerminal lhs = null;
List<syntaxsymbol> rhs = null;
lhs = lg.FindNTermByName("Declaration"); // here is the left hand side of this production: must be a declared NonTerminal
rhs = new List<syntaxsymbol>();
rhs.Add(lg.FindNTermByName("Declaration_Specifiers")); // this is right end side of the production both Terminals and NonTerminals
rhs.Add(lg.FindNTermByName("Declarator"));
rhs.Add(lg.FindTermByName("SemiColon"));
inh = new List<inheritedattribute>();
inh.Add(new InheritedAttribute(lg.FindNTermByName("Declaration_Specifiers"), null));
inh.Add(new IDeclTypeAttribute(lg.FindNTermByName("Declarator"), null)); // an IDeclTypeAttribute derives from InheriitedAttribute
inh.Add(null);
actions = new List<semanticaction>();
actions.Add(null);
actions.Add(new SemanticAction(ActionOnExpandDeclarator)); // this is associated to a Declarator,this way you can pass parameters top down
actions.Add(null);
synth = new List<synthesizedattribute>();
synth.Add(null);
synth.Add(new SDeclTypeAttribute(lg.FindNTermByName("Declarator"), ActionProcessSymbolTableDeclarator)); // this gets executed after
synth.Add(null); // parsing a full Declarator
lg.Prod.Add(new Production( lhs, rhs, actions, synth, inh));
...
lg.ComputeFirstSet();
lg.ComputeFollowSet();
lg.FillTable(); // fills the predictive table after the objects are declared to the Language class
Here we've added:
*) the Terminals contained in enum TokenKind
Tokens.c returned by the Lexer in a specialized collection Term
*) the NonTerminals contained in the Category enum
(in the same file) in the collection Nterm
These collections are used to compute First and Follow Sets; in addition the first production is added to the Prod
collection, used to fill the parsing table.
A production consists of action routines and the custom data structures associated.
We have basically three kinds of action routines:
a) those associated withNonTerminals
: they represent production expansion and are executed after a valid prediction b) those associated with
Terminals
: they means access to input tokens and are executed after a successful match with the input token if an action is required c) those relative to a
SynthesizedAttribute
and are specified in the right constructor.
Now we present the Language grammar with the introduction of the Declarator
subsystem. As you can imagine, it is a function with
input parameters and a return value (DeclGrammar.cs).
The Declarator
subsystem is exactly similar to the recursive version presented early.
The two functions given below executes on expansion of NonTerminals
Pointer and Direct_Declarator
. This means that in ActionOnExpandPointer
,
a Pointer is predicted by [Pointer,*]
and is also nullable: in other words, it can be canceled on the Follow Items. A Direct_Declarator
can be predicted
by an Id
and a (
.
The two actions are linked in the parallel list actions, containing the SemanticActions
, in the same position of IDeclTypeAttribute
Pointer and
Direct_Declarator
.
As we've seen, we copy the parameters in the production to be expanded, for example on Pointer -> Star Pointer, we know that the next production
has the IDeclTypeAttribute
Pointer at index 1
.
The field np
is the Pointer
count, on predicting [Pointer,*]
it increments the counter, when there are no more stars it copies the total left
to right (Inherited attribute flow) to Direct_Declarator
.
Note that while copying the data on predicting [Direct_Declarator,(]
to Direct_Declarator_Tail
it performs the right to left evaluation.
...
lhs = lg.FindNTermByName("Declarator");
rhs = new List<syntaxsymbol>();
rhs.Add(lg.FindNTermByName("Pointer"));
rhs.Add(lg.FindNTermByName("Direct_Declarator"));
inh = new List<inheritedattribute>();
inh.Add(new IDeclTypeAttribute(lg.FindNTermByName("Pointer"), null));
inh.Add(new IDeclTypeAttribute(lg.FindNTermByName("Direct_Declarator"), null));
actions = new List<semanticaction>();
actions.Add(new SemanticAction(ActionOnExpandPointer)); // this is linked with NonTerminal Pointer at same list position
actions.Add(new SemanticAction(ActionOnExpandDirectDecl)); // same as above with NonTerminal Direct_Declarator
synth = new List<synthesizedattribute>();
synth.Add(new SDeclTypeAttribute(lg.FindNTermByName("Pointer"), null));
synth.Add(null);
lg.Prod.Add(new Production( lhs, rhs, actions, synth, inh));
...
// here is simple parameter passing and count * tokens
static bool ActionOnExpandPointer(SemanticStack ss, Production pp, string token)
{
IDeclTypeAttribute sp = ss.ss[ss.sp - 1] as IDeclTypeAttribute; // Pointer
IDeclTypeAttribute ip = null;
switch (pp.prodno)
{
case (22): // Pointer -> Star Pointer
ip = pp.inh[1] as IDeclTypeAttribute;
ip.basety = sp.basety;
ip.sclass = sp.sclass;
ip.np = sp.np;
ip.np++; // counts * tokens
break;
case (23): // Pointer -> Lambda (copy rule)
ip = ss.ss[ss.sp - 4] as IDeclTypeAttribute; // Direct_Declarator copy data in left to right order
ip.basety = sp.basety;
ip.sclass = sp.sclass;
ip.np = sp.np;
break;
}
return true;
}
static bool ActionOnExpandDirectDecl(SemanticStack ss, Production pp, string token)
{
IDeclTypeAttribute sp = ss.ss[ss.sp - 1] as IDeclTypeAttribute; // Direct_Declarator
IDeclTypeAttribute ip = null;
switch (pp.prodno)
{
case (17): // Direct_Declarator -> Id Direct_Declarator_Tail
ip = pp.inh[1] as IDeclTypeAttribute;
ip.basety = sp.basety;
ip.sclass = sp.sclass;
ip.np = sp.np;
break;
case (18): // Direct_Declarator -> LParen Declarator RParen Direct_Declator_Tail
ip = pp.inh[3] as IDeclTypeAttribute; // Declarator acts as a function, ActionOnProcessTempDeclarator runs at a later time
ip.basety = sp.basety; // the output is a tmpty that will be concatenated to the current type (inner to outer)
ip.sclass = sp.sclass;
ip.np = sp.np; // note here outer copy of basics parms, recursion is simulated in this way
break;
}
return true;
}
...
The next two functions are ActionOnLoadIdToDirectDeclTail
and ActionOnProcessTempDeclarator
. ActionOnLoadIdToDirectDeclTail
, associated with Terminal Id
, eats an Id
and passes it to the IDeclTypeAttribute
of Direct_Declarator_Tail
in the stack with a fixed displacement. ActionOnProcessTempDeclarator
gets executed at the end of Declarator
production on synthesized attribute SDeclTypeAttribute
on top of stack: the function builds an inner type associated with Declarator
and passes the temporary concatenated type inner to outer to Direct_Declarator_Tail
.
The linkage is obtained inserting in the synth list the SDeclTypeAttribute
on a Declarator
with the function ActionOnProcessTempDeclarator
.
/* direct-declarator:
identifier
(declarator)
direct-declarator [ constant-expressionopt ]
direct-declarator ( parameter-type-list )
direct-declarator ( identifier-listopt )
*/
llhs = lg.FindNTermByName("Direct_Declarator");
rhs = new List<syntaxsymbol>();
rhs.Add(lg.FindTermByName("Id"));
rhs.Add(lg.FindNTermByName("Direct_Declarator_Tail"));
inh = new List<inheritedattribute>();
inh.Add(null);
inh.Add(new IDeclTypeAttribute(lg.FindNTermByName("Direct_Declarator_Tail"), null));
actions = new List<semanticaction>();
actions.Add(new SemanticAction(ActionOnLoadIdToDirectDeclTail)); // connected to Terminal Id (loads Id lexeme)
actions.Add(new SemanticAction(ActionOnExpandDirectDeclTail));
lg.Prod.Add(new Production( lhs, rhs, actions, null, inh));
lhs = lg.FindNTermByName("Direct_Declarator");
rhs = new List<syntaxsymbol>();
rhs.Add(lg.FindTermByName("LParen"));
rhs.Add(lg.FindNTermByName("Declarator"));
rhs.Add(lg.FindTermByName("RParen"));
rhs.Add(lg.FindNTermByName("Direct_Declarator_Tail"));
inh = new List<inheritedattribute>();
inh.Add(null);
inh.Add(new IDeclTypeAttribute(lg.FindNTermByName("Declarator"),null));
inh.Add(null);
inh.Add(new IDeclTypeAttribute(lg.FindNTermByName("Direct_Declarator_Tail"), null));
synth = new List<synthesizedattribute>();
synth.Add(null);
synth.Add(new SDeclTypeAttribute(lg.FindNTermByName("Declarator"),ActionOnProcessTempDeclarator)); // executes at end of inner
synth.Add(null); // Declarator Production
synth.Add(null);
actions = new List<semanticaction>();
actions.Add(null);
actions.Add(null);
actions.Add(null);
actions.Add(new SemanticAction(ActionOnExpandDirectDeclTail));
lg.Prod.Add(new Production(lhs, rhs, actions, synth, inh));
...
// this function gets executed later inside Direct_Declarator -> Id Direct_Declarator_Tail during parsing
// it loads the identifier in Direct_declarator_Tail, if you look at Parse a match with input has occurred and the token passed is the Identifier
// this is a particular usage of an action routine, just access the input token.
static bool ActionOnLoadIdToDirectDeclTail(SemanticStack ss, Production pp, string token)
{
IDeclTypeAttribute sp = ss.ss[ss.sp - 3] as IDeclTypeAttribute; // Direct_declarator_Tail
sp.id = token.ToUpper();
return true;
}
// here is the function associated to the synthesized attribute of type SDeclTypeAttribute (Declarator)
// will be executed at end of production Declarator processing,
// inside production 18: Direct_Declarator -> LParen Declarator RParen Direct_Declator_Tail
// the purpose is to create a temporary type (tmpty) that will contribute to produce the resulting inverted type
// input is an inner tmpty if present (np : star counter) and a type stack : stype.
// output is a linked tmpty with the newly created type
static bool ActionOnProcessTempDeclarator(SemanticStack ss, Production pp)
{
SDeclTypeAttribute sp = ss.ss[ss.sp - 1] as SDeclTypeAttribute; // Declarator
IDeclTypeAttribute ip = ss.ss[ss.sp - 4] as IDeclTypeAttribute; // Direct_Declarator_Tail
Type ty = null, tty = null;
for (int _j = 0; _j < sp.np; _j++) // a new type is created from np and the stacked types
tty = BuildType(TypeOperator.POINTER, tty, 4,null);
for (int _j = 0; _j < sp.stype.Count; _j++)
{
ty = sp.stype.Pop();
tty = BuildType(ty.op, tty, ty.size,ty.lsym);
}
ty = sp.tmpty;
while (ty != null && ty.type != null)
ty = ty.type;
if (sp.tmpty == null) // the type here is linked to the end of tmpty , this result into a temporary inverted type, inner to outer
sp.tmpty = tty;
else
ty.type = tty;
ip.tmpty = sp.tmpty; // new concatenated type is copied to Direct_Declarator_Tail
ip.id = sp.id;
return true;
}
...
The next function gets executed on expansion of a Direct_Declarator_Tail
. The prediction occurs on symbol [, ( and the item is nullable.
ActionOnLoadNumToDirectDeclTail
eats the Integer_Constant
returned by the Lexer inside parsing production
Direct_Declarator_Tail
-> [ Integer_Constant ] Direct_Declarator_Tail
.
lhs = lg.FindNTermByName("Direct_Declarator_Tail");
rhs = new List<syntaxsymbol>();
rhs.Add(lg.FindTermByName("LSParen"));
rhs.Add(lg.FindTermByName("Integer_Constant"));
rhs.Add(lg.FindTermByName("RSParen"));
rhs.Add(lg.FindNTermByName("Direct_Declarator_Tail"));
inh = new List<inheritedattribute>();
inh.Add(null);
inh.Add(null);
inh.Add(null);
inh.Add(new IDeclTypeAttribute(lg.FindNTermByName("Direct_Declarator_Tail"), null));
actions=new List<semanticaction>();
actions.Add(null);
actions.Add(new SemanticAction(ActionOnLoadNumToDirectDeclTail)); // connected to Termional Integer_Constant (loads size)
actions.Add(null);
actions.Add(new SemanticAction(ActionOnExpandDirectDeclTail));
lg.Prod.Add(new Production( lhs, rhs, actions, null, inh));
lhs = lg.FindNTermByName("Direct_Declarator_Tail");
rhs = new List<syntaxsymbol>();
rhs.Add(lg.FindTermByName("LParen"));
rhs.Add(lg.FindNTermByName("Parameter_Type_List"));
rhs.Add(lg.FindTermByName("RParen"));
rhs.Add(lg.FindNTermByName("Direct_Declarator_Tail"));
inh = new List<inheritedattribute>();
inh.Add(null);
inh.Add(new IDeclTypeAttribute(lg.FindNTermByName("Parameter_Type_List"), null));
inh.Add(null);
inh.Add(new IDeclTypeAttribute(lg.FindNTermByName("Direct_Declarator_Tail"), null));
actions = new List<semanticaction>();
actions.Add(null);
actions.Add(new SemanticAction(ActionOnExpandParameterTypeList));
actions.Add(null);
actions.Add(new SemanticAction(ActionOnExpandDirectDeclTail));
lg.Prod.Add(new Production( lhs, rhs, actions, null, inh));
lhs = lg.FindNTermByName("Direct_Declarator_Tail");
rhs = new List<syntaxsymbol>();
rhs.Add(lg.FindTermByName("Lambda"));
lg.Prod.Add(new Production( lhs, rhs, null, null, null));
...
static bool ActionOnExpandDirectDeclTail(SemanticStack ss, Production pp, string token)
{
IDeclTypeAttribute sp = ss.ss[ss.sp - 1] as IDeclTypeAttribute; // Direct_Declarator_Tail
IDeclTypeAttribute ip = null;
switch (pp.prodno) // this happens at production expansion
{
case (19): // Direct_Declarator_Tail -> LQParen Num RQParen Direct_Declator_Tail
ip = pp.inh[3] as IDeclTypeAttribute;
ip.basety = sp.basety;
ip.tmpty = sp.tmpty;
ip.sclass = sp.sclass;
ip.np = sp.np;
ip.id = sp.id;
__Copy_Stack(ip.stype, sp.stype);
break;
case (20): // Direct_Declarator_Tail -> LParen Parameter_Type_list RParen Direct_Declator_Tail
ip = pp.inh[3] as IDeclTypeAttribute;
ip.basety = sp.basety;
ip.tmpty = sp.tmpty;
ip.sclass = sp.sclass;
ip.np = sp.np;
ip.id = sp.id;
__Copy_Stack(ip.stype, sp.stype);
break;
case (21): // Direct_Declarator_Tail -> Lambda
SDeclTypeAttribute dp = ss.ss[ss.sp - 2] as SDeclTypeAttribute; // Declarator
dp.basety = sp.basety;
dp.tmpty = sp.tmpty;
dp.sclass = sp.sclass;
dp.np = sp.np;
dp.id = sp.id;
__Copy_Stack(dp.stype, sp.stype);
break;
}
return true;
}
// this function is executed later inside Direct_Declarator_Tail -> LQParen Num RQParen Direct_Declator_Tail production
// it stacks a ARRAY type with the given size.
// stacked types are used to build a temporary type
static bool ActionOnLoadNumToDirectDeclTail(SemanticStack ss, Production pp, string token)
{
IDeclTypeAttribute sp = ss.ss[ss.sp - 4] as IDeclTypeAttribute; // Direct_declarator_Tail
sp.stype.Push(BuildType(TypeOperator.ARRAY,null, Convert.ToInt16(token),null));
return true;
}
....
As you can see, there is a chain of static
functions that will be executed by the parser at appropriate time, expansion of productions that
will be executed at production prediction while consulting the predictive table and synthesized actions present in synthesized attributes that represent final actions, usually at end of production, to compute values or to perform a particular action (e.g. insert a fully parsed symbol
into a symbol table).
/* pointer:
* type-qualifier-listopt
* type-qualifier-listopt pointer
*/
lhs = lg.FindNTermByName("Pointer");
rhs = new List<syntaxsymbol>();
rhs.Add(lg.FindTermByName("Star"));
rhs.Add(lg.FindNTermByName("Pointer"));
inh = new List<inheritedattribute>();
inh.Add(null);
inh.Add(new IDeclTypeAttribute(lg.FindNTermByName("Pointer"), null));
actions = new List<semanticaction>();
actions.Add(null);
actions.Add(new SemanticAction(ActionOnExpandPointer)); // each time a Pointer is expanded
lg.Prod.Add(new Production( lhs, rhs, actions, null, inh));
lhs = lg.FindNTermByName("Pointer");
rhs = new List<syntaxsymbol>();
rhs.Add(lg.FindTermByName("Lambda"));
lg.Prod.Add(new Production( lhs, rhs, null, null, null));
Here is the last function that gets executed at end of Declarator
production processing. It computes a complete string
representation obtained
from partial temporary types built inner to outer.
The action here could be more significative: indeed we have a full inverted type and all associated data,
even a parameter list, if present, could have been processed, the usual action being to add the symbol to the appropriate symbol table
static bool ActionProcessSymbolTableDeclarator(SemanticStack ss, Production pp)
{
SDeclTypeAttribute sp = ss.ss[ss.sp - 1] as SDeclTypeAttribute; // Declarator
Type ty;
int nitems = 0;
for (int _j = 0; _j < sp.np; _j++) // np means how many * are present
sp.basety = BuildType(TypeOperator.POINTER, sp.basety, 4, null);
nitems = sp.stype.Count;
for (int _j = 0; _j < nitems; _j++) // here are the stacked types to form an inverted resulting type
{
ty = sp.stype.Pop();
sp.basety = BuildType(ty.op, sp.basety, ty.size, ty.lsym);
}
ty = sp.tmpty; // tmpty is input from inner Declarators
while (ty != null && ty.type != null)
ty = ty.type;
if (ty != null) // if an input tmpty is present, link the created type at end of it
{
ty.type = sp.basety;
sp.basety = sp.tmpty;
}
Symbol sym = new Symbol();
sym.name = sp.id;
sym.sclass = sp.sclass;
sym.type = sp.basety;
if (CheckInvalidType(sp.basety)) // helper function to detect semantic errors e.g. a function cannot return an array or a function
return false;
EAGrammar._type = _FormatType(sym); // formatted inverted type
return true;
}
Let's look at the parse tree for input int *(*pi[5])[10];
and trace execution. I have termed here Declarator
(Decl
), DirDecl
(Direct_Declarator
) and
DirDclTail
(Direct_Declarator_Tail
).
To follow the logic, put breakpoints inside the functions in the Declarator region:
(1) Input to Decl (Declarator)
is a basety of type int
.
(2) ActionOnExpandDeclarator
: expand Decl
and pass (basety=int
) to Pointer
(3) AncionOnExpandPointer
: first expansion of Pointer count *, np=1
(4) ActionOnExpandPointer
: second expansion of Pointer
to Lamdba, pass (np=1,basety=int
) to DirDecl (synth)
(5) ActionOnExpandDirectDecl
: expand DirDecl
with ( Decl ) DirDeclTail
pass (np=1,basety=int)
to DirDeclTail
(6) Expand Decl
(7) ActionOnExpandPointer : expand Pointer
and count np=1
(8) ActionOnExpandPointer
: expand Pointer
to Lambda and copy (np=1,basety=null
) to DirDecl (synth)
(9) ActionOnExpandDirectDecl
: expand DirDecl
to Id DirDeclTail
pass (np=1,basety=null)
to DirDeclTail
(10) ActionOnLoadIdToDirectDeclTail
: load DirDeclTail (with id=pi)
(11) ActionOnExpandDirectDeclTail :expand DirDeclTail
-> [ Num ] DirDeclTail
and load DirDeclTail
with (id=pi,basety=null,np=1
)
(12) ActionOnLoadNumToDirectDeclTail
: stack an stype
of (ARRAY 5
) to DirDeclTail
(13) ActionOnExpandDirectDeclTail
: expand DirDeclTail
to Lambda and pass (tmpty=null,basety=null,np=1,stype=(ARRAY 5)
) to synth Decl
(14) ActionOnProcessTempDeclarator
: load DirDeclTail
(with tmpty = (ARRAY 5 POINTER)
)
(15) ActionOnExpandDirectDeclTail
: expand DirDelcTail
with [ Num ] DirDeclTail
and load DirDeclTail
with (with tmpty = (ARRAY 5 POINTER)
)
(16) ActionOnLoadNumToDirectDeclTail
: stack an stype of (ARRAY 10
) to DirDeclTail
(17) ActionOnExpandDirectDeclTail
: expand DirDeclTail
to Lamdba and load synthesized Decl
with (basety=int,np=1,tmpty=(ARRAY 5 POINTER),stype=(ARRAY 10),id=pi
)
(18) ActionProcessSymbolTableDeclarator
: load EAGrammar._tyoe
with (PI: ARRAY 5 POINTER TO ARRAY 10 POINTER TO INT
)
As you can see, the output is correct and recursion is simulated with a stack.
Inherited attributes flow is left to right and top down, synthesized attributes flow is bottom up,
ActionOnProcessTempDeclarator
is associated with inner Declarators and is executed at end of production Declarator building a
temporary inverted type, ActionProcessSymbolTableDeclarator
is associated with the top level Declarator,
it is executed at end giving a full inverted type.
This flow of information conforms to the theory and results in a deterministic evaluation.
I haven't presented the Lexer because it's almost similar to the recursive version, it returns tokens and lexemes. Consult Semantic.cs for stacked objects and DataStructure.cs for common ds like Type.
The advantage of using the tabular method is that programming is modular and can be well documented, it's easy to
write a Compiler Generator Engine and common subsystems can be easily re-utilized, like the Declarator
one.
You can see from the code that the Declarator
subsystem is used without modifications also in parameter list processing.
A Declarator
builds an inverted type given some input parameters, the logic of complex C Declarations is performed at the
proper times inner expressions to outer resulting in a complete inverted type.
Bibliography
- Compiler Construction: Principles and Practice, K.C Louden [1997].
- Compilers Principles Techniques and Tools, 2nd Edition, Aho, Sethi, Lam and Ullman [1986].
- Algorithms for Compiler Design, O.G. Kakde [2002].