Converting InFix to PostFix by Recursive Descendent Parser in C#

Andreas Gieriet

Rate me:

5.00/5 (2 votes)

23 Apr 2012CPOL2 min read

14.2K

This is an alternative for "Converting InFix to PostFix using C#"

Introduction

This is an alternative tip to the original Converting InFix to PostFix using C#[^]. A much clearer way to implement such a simple syntax is to write a Recursive Descendent Parser[^].

My opinion is that the Shunting Yard Algortihm[^] as presumably implemented in the original post is a dead-end road regarding maintainability and extensibility. E.g. implementing an unary minus leads to complicated tricks to get it working, one has to tweak with weird precedence tables, etc.

The recurive descendent parser implements the precedence table implicitly. And further operators are easily added by inserting new unary or binary parse funcitons according to the extended syntax. More to that in a minute.

Let's start with the syntax of the language.

The Syntax

The syntax of the given language in the original post is as follows, written in EBNF[^]:

Expr    = Sum .
Sum     = Mul { ("-"|"+") Mul } .
Mul     = Factor { ("*"|"/"|"%") Factor } .
Factor  = Number | "(" Expr ")" .

This is easily imlemented as Recursive Descendent Parser - I show this just a few lines further down. But first: The parser needs tokens to parse. So the scanner provides these tokens.

The Scanner

The scanner decomposes the incomming character stream into chunks that make up the language words and punctuation, all together called tokens. We have:

Number: integer values like 1, 23, 456, etc.
-
+
*
/
%
(
)

White spaces between these shall be ignored.

A possible implementation for a scanner is a regular expression.

private IEnumerator<string> _tokens;
private bool Move() { return _tokens.MoveNext(); }
private InFixToRpn(string inFix)
{
    string syntax = @"\s*(\d+|[-+*/%()]|\S)\s*";
    _tokens = Regex.Matches(inFix, syntax, RegexOptions.Compiled)
        .Cast<Match>().Select(t => t.Groups[1].Value).GetEnumerator();
    Move();
}

Note: the regualr expression could be drastically reduced due to the fact that we only have single character operators:

...
    string syntax = @"\s*(\d+|\S)\s*";
    _tokens = Regex.Matches(inFix, syntax, RegexOptions.Compiled)
        .Cast<Match>().Select(t => t.Groups[1].Value).GetEnumerator();
...

There is more to the scanner (see further down), but this is just the basics: decomposing the character stream by use of regular expression[^].

The Parser

The parser reads these tokens one after the other and tries to make sense of it all. As said above, the EBNF can easily be translated into the Recursively Descendent Parser: implement each lefthand side into a function.

private void ParseExpr() { ParseSum(); }
private void ParseSum() { ParseBinary(ParseMul, "-", "+"); }
private void ParseMul() { ParseBinary(ParseFactor, "*", "/", "%"); }
private void ParseFactor() { if (IsNumber) ParseNumber(); else ParseNested(); }
private void ParseNumber() { _stack.Push(Curr); Move(); } // no check since may be at the end
private void ParseNested()
{
    if (Curr != "(" || !Move()) Error("(...) expected");
    ParseExpr();
    if (Curr != ")") Error("trailing ')' expected");
    Move(); // no check since may be at the end
}
private void ParseBinary(Action parse, params string[] ops)
{
    parse();
    string op;
    while ((op = CurrOpAndNext(ops)) != null) { parse(); _stack.Push(op); }
}

And again, this is not all of the parser code, but this is the core. And the strong statement: there is a clear 1:1 relation between the EBNF and the parser functions. Once you understand the EBNF, you understand the parser in general. The details may need some closer look, though.

Put it all together

Now that we have all the relevant parts together, lets make it work with all the pieces in place:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

namespace AlternativeTipForInFixToPostFixParser // nice long name, isn't it? ;-)
{
    public class InFixToRpn
    {
        // the result from parsing: Error:=yes/no and Stack of operands and operators
        public class Res
        {
            public Stack<string> Stack { get; private set; }
            public Exception Error { get; private set; }
            public bool HasError { get { return Error != null; } }
            internal Res(Stack<string> stack, Exception exc) { Stack = stack; Error = exc; }
        }
        // the only entry point to the conversion
        public static Res Parse(string inFix) { return new InFixToRpn(inFix).Parse(); }
        //
        // scanner
        //
        private IEnumerator<string> _tokens;
        private Stack<string> _stack;
        private string Curr { get { return _tokens.Current ?? string.Empty; } }
        private bool Move() { return _tokens.MoveNext(); }
        private char Ch { get { return string.IsNullOrEmpty(Curr) ? ' ' : Curr[0]; } }
        private bool IsNumber { get { return char.IsNumber(Ch); } }
        private string CurrOpAndNext(params string[] ops)
        {
            string s = ops.Contains(Curr) ? Curr : null;
            if (s != null && !Move()) Error("data expected");
            return s;
        }
        private void Error(string msg) { throw new ArgumentException("Error: " + (msg ?? "unknown error")); }
        //
        // Parser
        //
        private InFixToRpn(string inFix)
        {
            string syntax = @"\s*(\d+|[-+*/%()]|\S)\s*";
            _tokens = Regex.Matches(inFix, syntax, RegexOptions.Compiled)
                .Cast<Match>().Select(t => t.Groups[1].Value).GetEnumerator();
            Move();
        }
        // no-throw: handles the whole parsing and catches any exception.
        private Res Parse()
        {
            _stack = new Stack<string>();
            Exception exc = null;
            try { ParseExpr(); if (Curr != "") Error("too much text given"); }
            catch (Exception e) { exc = e; }
            return new Res(_stack, exc);
        }
        private void ParseExpr() { ParseSum(); }
        private void ParseSum() { ParseBinary(ParseMul, "-", "+"); }
        private void ParseMul() { ParseBinary(ParseFactor, "*", "/", "%"); }
        private void ParseFactor() { if (IsNumber) ParseNumber(); else ParseNested(); }
        private void ParseNumber() { _stack.Push(Curr); Move(); } // no check since may be at the end
        private void ParseNested()
        {
            if (Curr != "(" || !Move()) Error("(...) expected");
            ParseExpr();
            if (Curr != ")") Error("trailing ')' expected");
            Move(); // no check since may be at the end
        }
        private void ParseBinary(Action parse, params string[] ops)
        {
            parse();
            string op;
            while ((op = CurrOpAndNext(ops)) != null) { parse(); _stack.Push(op); }
        }
    }
}

The call to it:

class Program
{
    public static void Main(string[] args)
    {
        string input = "(9*(1-3))-4*(3+3*4)";
        var res = InFixToRpn.Parse(input);
        Console.WriteLine("Error:  {0}{1}", res.HasError, res.HasError ? " ("+res.Error.Message+")" : "");
        Console.WriteLine("Input:  {0}", input);
        Console.WriteLine("Output: {0}", string.Join(" ", res.Stack.Reverse()));
    }
}

This results in:

Has Error = False
Input:  (9*(1-3))-4*(3+3*4)
Output: 9 1 3 - * 4 3 3 4 * + * -

--------- That's all Folks! ---------

BTW: Checkout the links as embedded in the text.

History

V1.0 2012-04-23 Initial version.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By

Andreas Gieriet

Founder eXternSoft GmbH

Switzerland

I feel comfortable on a variety of systems (UNIX, Windows, cross-compiled embedded systems, etc.) in a variety of languages, environments, and tools.
I have a particular affinity to computer language analysis, testing, as well as quality management.

More information about what I do for a living can be found at my LinkedIn Profile and on my company's web page (German only).

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.