Click here to Skip to main content
Click here to Skip to main content

LALR Parse Table Generation in C#

, 9 Sep 2011
Rate this:
Please Sign up or sign in to vote.
LALR parse table generation using C#.

Introduction

Table based parser generation offers the possibility of both fast and flexible parser construction. This article describes an implementation of a particular method of constructing a parse table for an LR (left to right bottom up) parser called an LALR parser or Lookahead-LR parser.

Background

An LR parser consists of a parse table and a stack. A parser of this kind recognizes a string of input tokens by examining the input string from left to right and performing one of the following operations:

  • Shift the token onto the stack
  • Reduce several of the tokens on the stack by replacing the tokens on the right hand side of a grammatical production, which must exist on the stack with the token on the left hand side of the production
  • 'Accept' the input string
  • Produce an error

The LALR algorithm produces a parser table that decides on the possible reductions from a given state using a concept of look-ahead. The algorithm examines the productions that lead into and out of each state via both transitions and grammatical productions, and determines, for each production represented in the state, which tokens could come after that production.

Using the code

The code I've published creates a parse table, and formats the parse table to the console output.

using System;
using CodeProject.Syntax.LALR;

namespace TestProject
{
    class MainClass
    {
        public static void Main (string[] args)
        {
            //
            // the following program produces a parse table for the following grammar
            // for infix expressions, and appropriately applies operator precedence of
            // the + - * / operators, otherwise evaluating the leftmost operations first
            //
            // S' -> e
            // e -> i
            // e -> ( e )
            // e -> e * e
            // e -> e / e
            // e -> e + e
            // e -> e - e
            //
            // 
            
            Grammar grammar = new Grammar();
            grammar.Tokens = new string[]{"S'", "e", "+", "-", "*", "/", "i", "(", ")"};
            
            grammar.PrecedenceGroups = new PrecedenceGroup[]
            {
                new PrecedenceGroup
                {
                    Derivation = Derivation.None,
                    Productions = new Production[]
                    {
                        //S' -> e
                        new Production{
                            Left = 0,
                            Right = new int[]{1}
                        },
                        //e -> i
                        new Production{
                            Left = 1,
                            Right = new int[]{6}
                        },
                        //e -> ( e )
                        new Production{
                            Left = 1,
                            Right =  new int[]{7, 1, 8}
                        }
                    }
                },
                new PrecedenceGroup
                {
                    Derivation = Derivation.LeftMost,
                    Productions = new Production[]
                    {
                        //e -> e * e
                        new Production{
                            Left = 1,
                            Right = new int[]{1, 4, 1}
                        },
                        //e -> e / e
                        new Production{
                            Left = 1,
                            Right = new int[]{1, 5, 1}
                        },
                    }
                },
                new PrecedenceGroup
                {
                    //productions are left associative and bind less tightly than * or /
                    Derivation = Derivation.LeftMost,
                    Productions = new Production[]
                    {
                        //e -> e + e
                        new Production{
                            Left = 1,
                            Right = new int[]{1, 2, 1}
                        },
                        //e -> e - e
                        new Production{
                            Left = 1,
                            Right = new int[]{1, 3, 1}
                        }
                    }
                }
            };

            // generate the parse table
            Parser parser = new Parser(grammar);
            
            // write the parse table to the screen
            Debug.DumpParseTable(parser);
        }
    }
}

And produces the following output:

+---+---+---+---+---+---+---+---+---+---+---+
|###| $ |S' | e | + | - | * | / | i | ( | ) |
|---+---+---+---+---+---+---+---+---+---+---+
|  0|   |   |S 1|   |   |   |   |S 2|S 3|   |
+---+---+---+---+---+---+---+---+---+---+---+
|  1|R 0|   |   |S 4|S 5|S 6|S 7|   |   |   |
+---+---+---+---+---+---+---+---+---+---+---+
|  2|R 1|   |   |R 1|R 1|R 1|R 1|   |   |R 1|
+---+---+---+---+---+---+---+---+---+---+---+
|  3|   |   |S 8|   |   |   |   |S 2|S 3|   |
+---+---+---+---+---+---+---+---+---+---+---+
|  4|   |   |S 9|   |   |   |   |S 2|S 3|   |
+---+---+---+---+---+---+---+---+---+---+---+
|  5|   |   |S10|   |   |   |   |S 2|S 3|   |
+---+---+---+---+---+---+---+---+---+---+---+
|  6|   |   |S11|   |   |   |   |S 2|S 3|   |
+---+---+---+---+---+---+---+---+---+---+---+
|  7|   |   |S12|   |   |   |   |S 2|S 3|   |
+---+---+---+---+---+---+---+---+---+---+---+
|  8|   |   |   |S 4|S 5|S 6|S 7|   |   |S13|
+---+---+---+---+---+---+---+---+---+---+---+
|  9|R 5|   |   |R 5|R 5|S 6|S 7|   |   |R 5|
+---+---+---+---+---+---+---+---+---+---+---+
| 10|R 6|   |   |R 6|R 6|S 6|S 7|   |   |R 6|
+---+---+---+---+---+---+---+---+---+---+---+
| 11|R 3|   |   |R 3|R 3|R 3|R 3|   |   |R 3|
+---+---+---+---+---+---+---+---+---+---+---+
| 12|R 4|   |   |R 4|R 4|R 4|R 4|   |   |R 4|
+---+---+---+---+---+---+---+---+---+---+---+
| 13|R 2|   |   |R 2|R 2|R 2|R 2|   |   |R 2|
+---+---+---+---+---+---+---+---+---+---+---+

Parser class

The Parser class encapsulates the LALR table construction algorithm, but also exposes several methods and properties useful for grammar analysis and debugging.

Parser - Constructor

Analyses the grammar producing the parse table, using the below methods, and other supporting methods.

GenerateLR0Items

Generates the LR(0) States of the parser by starting with the State S' -> .X where X is the start symbol of the grammar. The production S' -> X must be explicitly passed into the constructor above.

LR(0) items consists of the set of items {S' -> .X} and all of the states reachable by substituting the very next symbol after the '.' (X in this case) with the left hand side of any production that has X on the right hand side. This operation is called the closure of an LR(0) item. In the grammar above, State 0 consists of the following set of items.

S' -> . e
e -> . i
e -> . ( e )
e -> . e * e
e -> . e / e
e -> . e + e
e -> . e - e

We then find the states Goto(X) for any X which is a token of the grammar and find the successor states of state 0. This is done by including each item with the 'X' to the right of the '.' and putting them into the new state, then calculating the closure of that new state. On the token e in the grammar presented, the new state, state '1', has the following LR(0) items:

S' -> e .
e -> e . * e
e -> e . / e
e -> e . + e
e -> e . - e

The method repeats this process until there are no new states.

ComputeFirstSets

This method computes the set of terminals that are possible to occur after any token in the grammar. The first set of the token e is {i, (}. This construction makes it possible to compute the LALR states later.

CalculateLookAheads

Calculates the look-ahead items for each LR0 item from GenerateLR0Items above. A looka-head at the end of a production in the state tells the parser that it is safe to perform a reduction. For example, state 2 above on token ) is able to reduce by rule e -> i thus replacing an i on the stack with the non-terminal e. The parser knows it can do this because the LALR state 2 contains the LALR Item e -> i. , ) for production e -> i and look-ahead token ).

GenerateParseTable

This method constructs the actions of the parse table. It does this by combining the Goto actions from each state and the reduction actions possible depending on the look-aheads generated in the previous method. If at any state there is a conflict between either a goto or a reduction, the algorithm attempts to resolve this by choosing a rule from a higher precedence group. If there is no clear winner, then the algorithm checks whether the rule should produce left-most or right-most derivations. A left most derivation will favour a reduction, whereas a right-most derivation will favour a goto. If there is still no clear winner, the algorithm will announce either a Reduce-Reduce or a Shift-Reduce conflict error.

Debug class

The Debug class in the sample code contains several methods that might be useful for debugging a grammar or parser.

DumpParseTable

Writes a formatted parse table to the console.

DumpLR0State

Writes an LR(0) state to the console. For example, the following snippet will write State 0 above to the console.

Debug.DumpLR0State(parser, parser.LR0States[0]);

DumpLALRState

Writes an LALR state to the console, including look-aheads. The following snippet will write the LALR items in State 0 of the generated parser to the console.

Debug.DumpLR1State(parser, parser.LALRStates[0]);

References

The project code implements the LALR algorithm described in the dragon book "Compilers: Principles, Techniques, and Tools" (1986) by Aho, Sethi, and Ullman.

Feature backlog

I intend to implement the following features in later updates.

  • Generate a C# type that will parse an input grammar
  • Parse an input grammar from a file
  • ComponentModel/Reflection invocation of types/methods that perform the reduction rules of the grammar
  • Generate parsers in different languages

History

  • 2011-09-07: Initial implementation generating parse table.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Share

About the Author

phillipvoyle
Software Developer (Senior)
New Zealand New Zealand
I've spent time programming for the security industry, video games, and telephony. I live in New Zealand, and have a Bachelor of Computing and Mathematical Sciences specializing in Computer Technology, from the University of Waikato in Hamilton, New Zealand.
Follow on   Twitter

Comments and Discussions

 
QuestionEntry into LALR Parsing Table PinmemberMember 108031207-May-14 20:45 
AnswerRe: Entry into LALR Parsing Table [modified] Pinmemberphillipvoyle8-May-14 0:42 
QuestionLALR Parser PinmemberMember 1067872929-Apr-14 6:33 
AnswerRe: LALR Parser Pinmemberphillipvoyle2-May-14 19:16 
GeneralLALR Parser PinmemberMember 107326897-Apr-14 23:52 
GeneralRe: LALR Parser Pinmemberphillipvoyle8-Apr-14 20:55 
GeneralRunning the source code PinmemberMember 1067872925-Mar-14 19:32 
GeneralRe: Running the source code Pinmemberphillipvoyle1-Apr-14 21:16 
QuestionProblem Pinmemberhadi20138-Jan-13 22:50 
AnswerRe: Problem [modified] Pinmemberphillipvoyle22-Oct-13 17:06 
QuestionVery brief, tidy and to the point. PinmemberMember 209895314-Sep-12 12:40 
AnswerRe: Very brief, tidy and to the point. Pinmemberphillipvoyle14-Sep-12 18:40 
QuestionThanks Pinmemberahmad.haniff11-Nov-11 5:49 
AnswerRe: Thanks Pinmemberphillipvoyle11-Nov-11 11:01 
You're welcome Smile | :)
Questionnice effort, but since dragon book more efficient algorithms appeared PinmemberRoman Ivantsov12-Sep-11 10:13 
AnswerRe: nice effort, but since dragon book more efficient algorithms appeared Pinmemberdave.dolan12-Sep-11 16:36 
AnswerRe: nice effort, but since dragon book more efficient algorithms appeared Pinmemberphillipvoyle12-Sep-11 21:37 
QuestionThis is what Gold Parser Does Pinmemberdave.dolan9-Sep-11 9:37 
AnswerRe: This is what Gold Parser Does Pinmemberphillipvoyle10-Sep-11 11:10 
GeneralRe: This is what Gold Parser Does Pinmemberdave.dolan10-Sep-11 19:16 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web04 | 2.8.140814.1 | Last Updated 9 Sep 2011
Article Copyright 2011 by phillipvoyle
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid