Click here to Skip to main content
15,881,424 members
Articles / Programming Languages / C#

LALR Parse Table Generation in C#

Rate me:
Please Sign up or sign in to vote.
4.84/5 (14 votes)
3 Jun 2015CPOL4 min read 72.3K   1.7K   21   26
LALR parse table generation using C#.

Introduction

Table based parser generation offers the possibility of both fast and flexible parser construction. This article describes an implementation of a particular method of constructing a parse table for an LR (left to right bottom up) parser called an LALR parser or Lookahead-LR parser.

Background

An LR parser consists of a parse table and a stack. A parser of this kind recognizes a string of input tokens by examining the input string from left to right and performing one of the following operations:

  • Shift the token onto the stack
  • Reduce several of the tokens on the stack by replacing the tokens on the right hand side of a grammatical production, which must exist on the stack with the token on the left hand side of the production
  • 'Accept' the input string
  • Produce an error

The LALR algorithm produces a parser table that decides on the possible reductions from a given state using a concept of look-ahead. The algorithm examines the productions that lead into and out of each state via both transitions and grammatical productions, and determines, for each production represented in the state, which tokens could come after that production.

Using the code

The code I've published creates a parse table, and formats the parse table to the console output.

C#
using System;
using CodeProject.Syntax.LALR;

namespace TestProject
{
    class MainClass
    {
        public static void Main (string[] args)
        {
            //
            // the following program produces a parse table for the following grammar
            // for infix expressions, and appropriately applies operator precedence of
            // the + - * / operators, otherwise evaluating the leftmost operations first
            //
            // S' -> e
            // e -> i
            // e -> ( e )
            // e -> e * e
            // e -> e / e
            // e -> e + e
            // e -> e - e
            //
            // 
            
            Grammar grammar = new Grammar();
            grammar.Tokens = new string[]{"S'", "e", "+", "-", "*", "/", "i", "(", ")"};
            
            grammar.PrecedenceGroups = new PrecedenceGroup[]
            {
                new PrecedenceGroup
                {
                    Derivation = Derivation.None,
                    Productions = new Production[]
                    {
                        //S' -> e
                        new Production{
                            Left = 0,
                            Right = new int[]{1}
                        },
                        //e -> i
                        new Production{
                            Left = 1,
                            Right = new int[]{6}
                        },
                        //e -> ( e )
                        new Production{
                            Left = 1,
                            Right =  new int[]{7, 1, 8}
                        }
                    }
                },
                new PrecedenceGroup
                {
                    Derivation = Derivation.LeftMost,
                    Productions = new Production[]
                    {
                        //e -> e * e
                        new Production{
                            Left = 1,
                            Right = new int[]{1, 4, 1}
                        },
                        //e -> e / e
                        new Production{
                            Left = 1,
                            Right = new int[]{1, 5, 1}
                        },
                    }
                },
                new PrecedenceGroup
                {
                    //productions are left associative and bind less tightly than * or /
                    Derivation = Derivation.LeftMost,
                    Productions = new Production[]
                    {
                        //e -> e + e
                        new Production{
                            Left = 1,
                            Right = new int[]{1, 2, 1}
                        },
                        //e -> e - e
                        new Production{
                            Left = 1,
                            Right = new int[]{1, 3, 1}
                        }
                    }
                }
            };

            // generate the parse table
            Parser parser = new Parser(grammar);
            
            // write the parse table to the screen
            Debug.DumpParseTable(parser);
        }
    }
}

And produces the following output:

+---+---+---+---+---+---+---+---+---+---+---+
|###| $ |S' | e | + | - | * | / | i | ( | ) |
|---+---+---+---+---+---+---+---+---+---+---+
|  0|   |   |S 1|   |   |   |   |S 2|S 3|   |
+---+---+---+---+---+---+---+---+---+---+---+
|  1|R 0|   |   |S 4|S 5|S 6|S 7|   |   |   |
+---+---+---+---+---+---+---+---+---+---+---+
|  2|R 1|   |   |R 1|R 1|R 1|R 1|   |   |R 1|
+---+---+---+---+---+---+---+---+---+---+---+
|  3|   |   |S 8|   |   |   |   |S 2|S 3|   |
+---+---+---+---+---+---+---+---+---+---+---+
|  4|   |   |S 9|   |   |   |   |S 2|S 3|   |
+---+---+---+---+---+---+---+---+---+---+---+
|  5|   |   |S10|   |   |   |   |S 2|S 3|   |
+---+---+---+---+---+---+---+---+---+---+---+
|  6|   |   |S11|   |   |   |   |S 2|S 3|   |
+---+---+---+---+---+---+---+---+---+---+---+
|  7|   |   |S12|   |   |   |   |S 2|S 3|   |
+---+---+---+---+---+---+---+---+---+---+---+
|  8|   |   |   |S 4|S 5|S 6|S 7|   |   |S13|
+---+---+---+---+---+---+---+---+---+---+---+
|  9|R 5|   |   |R 5|R 5|S 6|S 7|   |   |R 5|
+---+---+---+---+---+---+---+---+---+---+---+
| 10|R 6|   |   |R 6|R 6|S 6|S 7|   |   |R 6|
+---+---+---+---+---+---+---+---+---+---+---+
| 11|R 3|   |   |R 3|R 3|R 3|R 3|   |   |R 3|
+---+---+---+---+---+---+---+---+---+---+---+
| 12|R 4|   |   |R 4|R 4|R 4|R 4|   |   |R 4|
+---+---+---+---+---+---+---+---+---+---+---+
| 13|R 2|   |   |R 2|R 2|R 2|R 2|   |   |R 2|
+---+---+---+---+---+---+---+---+---+---+---+

Parser class

The Parser class encapsulates the LALR table construction algorithm, but also exposes several methods and properties useful for grammar analysis and debugging.

Parser - Constructor

Analyses the grammar producing the parse table, using the below methods, and other supporting methods.

GenerateLR0Items

Generates the LR(0) States of the parser by starting with the State S' -> .X where X is the start symbol of the grammar. The production S' -> X must be explicitly passed into the constructor above.

LR(0) items consists of the set of items {S' -> .X} and all of the states reachable by substituting the very next symbol after the '.' (X in this case) with the left hand side of any production that has X on the right hand side. This operation is called the closure of an LR(0) item. In the grammar above, State 0 consists of the following set of items.

S' -> . e
e -> . i
e -> . ( e )
e -> . e * e
e -> . e / e
e -> . e + e
e -> . e - e

We then find the states Goto(X) for any X which is a token of the grammar and find the successor states of state 0. This is done by including each item with the 'X' to the right of the '.' and putting them into the new state, then calculating the closure of that new state. On the token e in the grammar presented, the new state, state '1', has the following LR(0) items:

S' -> e .
e -> e . * e
e -> e . / e
e -> e . + e
e -> e . - e

The method repeats this process until there are no new states.

ComputeFirstSets

This method computes the set of terminals that are possible to occur after any token in the grammar. The first set of the token e is {i, (}. This construction makes it possible to compute the LALR states later.

CalculateLookAheads

Calculates the look-ahead items for each LR0 item from GenerateLR0Items above. A looka-head at the end of a production in the state tells the parser that it is safe to perform a reduction. For example, state 2 above on token ) is able to reduce by rule e -> i thus replacing an i on the stack with the non-terminal e. The parser knows it can do this because the LALR state 2 contains the LALR Item e -> i. , ) for production e -> i and look-ahead token ).

GenerateParseTable

This method constructs the actions of the parse table. It does this by combining the Goto actions from each state and the reduction actions possible depending on the look-aheads generated in the previous method. If at any state there is a conflict between either a goto or a reduction, the algorithm attempts to resolve this by choosing a rule from a higher precedence group. If there is no clear winner, then the algorithm checks whether the rule should produce left-most or right-most derivations. A left most derivation will favour a reduction, whereas a right-most derivation will favour a goto. If there is still no clear winner, the algorithm will announce either a Reduce-Reduce or a Shift-Reduce conflict error.

Debug class

The Debug class in the sample code contains several methods that might be useful for debugging a grammar or parser.

DumpParseTable

Writes a formatted parse table to the console.

DumpLR0State

Writes an LR(0) state to the console. For example, the following snippet will write State 0 above to the console.

C#
Debug.DumpLR0State(parser, parser.LR0States[0]);

DumpLALRState

Writes an LALR state to the console, including look-aheads. The following snippet will write the LALR items in State 0 of the generated parser to the console.

C#
Debug.DumpLR1State(parser, parser.LALRStates[0]);

References

The project code implements the LALR algorithm described in the dragon book "Compilers: Principles, Techniques, and Tools" (1986) by Aho, Sethi, and Ullman.
 

Feature backlog

I intend to implement the following features in later updates.

  • Generate a C# type that will parse an input grammar
  • Parse an input grammar from a file
  • ComponentModel/Reflection invocation of types/methods that perform the reduction rules of the grammar
  • Generate parsers in different languages

History

  • 2011-09-07: Initial implementation generating parse table.
  • 2015-06-03: Added link to resource

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
New Zealand New Zealand
I've spent time programming for the security industry, video games, and telephony. I live in New Zealand, and have a Bachelor of Computing and Mathematical Sciences specializing in Computer Technology, from the University of Waikato in Hamilton, New Zealand.

Comments and Discussions

 
QuestionHow to resolve error CS0246. Pin
Member 1467143728-Nov-19 7:01
Member 1467143728-Nov-19 7:01 
PraiseThanks for the code! Pin
Ken Domino17-Nov-16 7:45
professionalKen Domino17-Nov-16 7:45 
GeneralRe: Thanks for the code! Pin
phillipvoyle27-Apr-17 21:24
phillipvoyle27-Apr-17 21:24 
GeneralImplemented simple parsing algorithm Pin
sebgod2-Jan-16 18:12
sebgod2-Jan-16 18:12 
GeneralRe: Implemented simple parsing algorithm Pin
phillipvoyle3-Jan-16 10:14
phillipvoyle3-Jan-16 10:14 
Thanks for the heads up and I'm glad you liked my article. It's funny to note that out of everything that I've written this is still my most popular download despite how old it is. Happy hacking
GeneralRe: Implemented simple parsing algorithm Pin
sebgod3-Jan-16 14:11
sebgod3-Jan-16 14:11 
QuestionEntry into LALR Parsing Table Pin
Member 108031207-May-14 20:45
Member 108031207-May-14 20:45 
AnswerRe: Entry into LALR Parsing Table Pin
phillipvoyle8-May-14 0:42
phillipvoyle8-May-14 0:42 
QuestionLALR Parser Pin
Member 1067872929-Apr-14 6:33
Member 1067872929-Apr-14 6:33 
AnswerRe: LALR Parser Pin
phillipvoyle2-May-14 19:16
phillipvoyle2-May-14 19:16 
GeneralLALR Parser Pin
Member 107326897-Apr-14 23:52
Member 107326897-Apr-14 23:52 
GeneralRe: LALR Parser Pin
phillipvoyle8-Apr-14 20:55
phillipvoyle8-Apr-14 20:55 
GeneralRunning the source code Pin
Member 1067872925-Mar-14 19:32
Member 1067872925-Mar-14 19:32 
GeneralRe: Running the source code Pin
phillipvoyle1-Apr-14 21:16
phillipvoyle1-Apr-14 21:16 
QuestionProblem Pin
hadi20138-Jan-13 22:50
hadi20138-Jan-13 22:50 
AnswerRe: Problem Pin
phillipvoyle22-Oct-13 17:06
phillipvoyle22-Oct-13 17:06 
QuestionVery brief, tidy and to the point. Pin
Member 209895314-Sep-12 12:40
Member 209895314-Sep-12 12:40 
AnswerRe: Very brief, tidy and to the point. Pin
phillipvoyle14-Sep-12 18:40
phillipvoyle14-Sep-12 18:40 
QuestionThanks Pin
ahmad.haniff11-Nov-11 5:49
ahmad.haniff11-Nov-11 5:49 
AnswerRe: Thanks Pin
phillipvoyle11-Nov-11 11:01
phillipvoyle11-Nov-11 11:01 
Questionnice effort, but since dragon book more efficient algorithms appeared Pin
Roman Ivantsov12-Sep-11 10:13
professionalRoman Ivantsov12-Sep-11 10:13 
AnswerRe: nice effort, but since dragon book more efficient algorithms appeared Pin
dave.dolan12-Sep-11 16:36
dave.dolan12-Sep-11 16:36 
AnswerRe: nice effort, but since dragon book more efficient algorithms appeared Pin
phillipvoyle12-Sep-11 21:37
phillipvoyle12-Sep-11 21:37 
QuestionThis is what Gold Parser Does Pin
dave.dolan9-Sep-11 9:37
dave.dolan9-Sep-11 9:37 
AnswerRe: This is what Gold Parser Does Pin
phillipvoyle10-Sep-11 11:10
phillipvoyle10-Sep-11 11:10 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.