# LL Mathematical Parser

, 20 Jan 2010 CPOL
 Rate this:
A mathematical parser based on Formal Language Theory constructs

## Introduction

The main task of this article is to present an implementation of Formal Language Theory constructs in a real world application. In the case described below, it will be a mathematical parser (library), which will evaluate a given valid mathematical expression and return the corresponding result. In scripting languages, you'll find this algorithm already implemented (it is actually defined within the core of dynamic programming language behavior - "everything is evaluated on runtime"). Most frequently, it is defined as the `eval` method.

In the case of static programming languages, we should define an algorithm which will perform the semantic (according to symbol table) and syntactic (parsing) analysis of a given string. Mainly, there are two basic approaches in Formal Language Theory which perform this task: LL top-down parser, which parses the input from Left to right, and constructs a Leftmost derivation of the sentence; and LR parser that reads the input from Left to right and produces the Rightmost derivation. The parser which is going to be explained here is the LL parser, written in C#.

## Background (Chomsky Hierarchy)

There are infinitely many legitimate mathematical expressions. In order to be able to identify invalid sentences from valid ones, we will need to define a precise set of rules which in turn will define a formal grammar (as in any natural language, in order to build correct sentences, we should learn its grammar). This grammar, if proved that it belongs to the class of LL grammars, will be able to generate all possible valid sequences and identify invalid ones. It is worth mentioning that it does not describe the meaning of these sentences, only the rules that can be applied on them. If you are looking for detailed explanations about LL grammars, please consult material related to Chomsky hierarchy and specifically, Type-2 grammars (context free grammars). Click here for a detailed review.

## Polish Postfix Notation

After the parser (based on the formal grammar) will tell us if the input string concedes to defined rules, we'll face the problem of actually calculating the expression according to mathematical rules of priority. Following is a simple example:

```4*3^(2)  =  36 ('^' stands for power sign)
4*3^(2) !=  122```

People can easily count the value of the above written expression because they do know what the priority of each operation is. That's why we'll start counting from the end to the beginning, first raising 3 to the power of 2, and then multiplying with 4. However, machines cannot learn this; that's where Polish Postfix Notation (PPN or Reverse Polish Notation) comes in handy. It allows us to transform a given expression into an analogous one in which the operations will be performed sequentially from left to right. Imagine that the previous example is rewritten as follows:

`(2)^3*4`

In this case, the priority can be ignored because sequentially we'll obtain the correct result. That's how PPN is used. It is important to remark that we will be using the postfix notation because the main difficulty of using the infix notation is that we still may have brackets in it, meaning that the priority order problem is not solved. That's why, in an evaluator, it is more preferable to use postfix ones. Fortunately, the PPN string is built using a parsing binary tree which in turn is constructed while our parser checks whether the input expression is valid or not. After this check is done, we'll have an in-memory copy of this tree which will allow us to build the PPN sequence. Following is an example of a PPN expression:

`4*3^(2) == 2 3 ^ 4 * (PPN)`

After we build the PPN sequence, we also need an interpreter which will be able to perform the actual mathematical expression calculation. Interpreters of Reverse Polish notation are stack based. Operands are pushed into the stack until an operation sign is encountered, then the two last items are popped, the operation is performed, and the result is pushed back into the stack. We'll get to an example of an evaluator a little bit further. For more detailed explanations, please consult the Polish notation topic.

## Binary Tree

Because the LL(1) grammar helps us to build a parsing tree (which is, of course, a binary tree), now we can traverse this one using a Left - Right - Root traversal and build the needed PPN expression. If you are not familiar with binary trees, please consult any available topic related to it.

## Building the Grammar

In the mathematical parser which is going to be explained further, the set of tokens (terminals, symbols which cannot be broken in smaller units) is defined as follows:

```VT = {+, -, *,  /,  ^, ( , ), a},
where  ' ^ ' stands for power, and
'a' stand for any real constant.```

The grammar actually defines the rules of transformation (derivation) of the input expression. Before starting to write these rules, we must consider the fact that derivations which have bigger priority (mathematically) must be placed in the deepest leaves of our derivation tree; thus, we must start to write our rules from less important tokens (from a mathematical point of view, the operations '+' and '-'; the biggest priority belongs to the unary plus and minus operations).

Grammar:
 # Rule Comment 1 E->E+T Expression + Term 2 E->E-T Expression - Term 3 E->T Some expressions can be considered to be just terms 4 T->T*P Term * Power 5 T->T/P Term / Power 6 T->P Some terms can be considered to be just powers 7 P->P^F Power ^ Factor 8 P->F Some powers can be considered to be just factors 9 F-> -F This derivation stands for unary minus 10 F-> +F This derivation stands for unary plus 11 F->a Factor can be considered to be a constant 12 F->(E) Factor can be considered to be an expression in parentheses

The obvious problems with this grammar (those which make this grammar not belonging to the LL(1) class) are:

1. We have more than '1' production (derivation) which starts with the same symbol (production 1 && 2, 4 && 5). This problem can be solved easily by applying the left factoring rule.
2. Left recursion (production 1, 2, 4, 5, 7). Needs to be eliminated.

I've just pointed out those algorithms which we need to use in order for the last to belong to the LL(1) class. The actual transformation is beyond the scope of this article, so I'll just skip it (if you are still interested in it, feel free to mail me and I'll send you the actual proof). At the final stage (after applying all the required theoretical changes to the grammar), we will be able to construct the LL(1) Parsing Table (which will tell the computer precisely what derivation to apply on each specific input character). Also, it is important to mention that before constructing the LL(1) Parsing Table, we need to build the Director Symbol Set (using the First and Follow sets).

## LL(1) Parsing Table

The actual table which was developed from the grammar can be seen in the above figure. If you are not familiar with its usage, consult any available material, or take a look at the provided example in the figure after this one.

• V - stands for adVance
• \$\$ - stands for Accept

While using this predictive parsing table, we can:

1. derive the binary tree, which will help us
2. define the Polish Postfix Expression, which in turn will
3. evaluate our mathematical expression.

Let's take an example to verify our LL(1) parsing table and build the parsing tree.

`'a+a*a^-a'`

The initial state of the stack machine is E\$ (where \$ stands for the end of stack), meaning that we are deriving the expression from the axiom E (E - expression). The initial character in the input stream is a (the first character for the input string). In order to somehow signalize to the parsing table that the input string ended, we will append the \$ sign at the very end of our string (a+a*a^-a\$). If the entire input string is valid from a grammar point of view, then at the very end, our mechanism will reach the final accept state (\$ - in the stack machine, and \$ in the entrance band). The derivation works as follows:

1. Look at the intersection between the first character from the stack machine (E) and first character of the entrance band (a) in the Parsing Table. We can see that it is the TW cell. Push TW into the stack.
2. Follow the LL(1) parsing table rules until the \$\$ state is reached.
3. If an empty cell is encountered while parsing, this means that the input string is invalid.

## Parsing Tree

Up until now, we've modeled a Syntax/Semantic Parser which is only able to control if the given expression is a valid mathematical one. In case the sequence is not valid, our LL(1) table will stop on an empty cell as I've already mentioned. Using the previous example, we can build the parsing tree, which in turn will construct the Polish Postfix Notation string. Using this string, we will perform the actual evaluation which will return the value of the math function.

#### The tree

If we traverse this tree using the LRR algorithm (Left - Right - Root), grabbing characters from leafs (nodes without children), we'll build the Polish Postfix Notation string. The string after traversal will contain the following sequence of characters:

`'a a a 0 a - ^ * +'`

## Reverse Polish Notation Evaluator

Like I've explained previously, the Reverse Polish Expression evaluator is stack based. Operands are pushed into the stack until the computer finds an operator (`+`, `-`, `*`, `/`, `^`). After that, the two previous values are popped, and the evaluation is done. The result is pushed back into the stack. Assume for the previous expression that `a = 2`.

`2+2*2^-2 = 2.5`

First of all, the computer should perform the operation of unary minus. After that, `2` is raised to the power of `-2`. The result is multiplied by `2`, and finally is added to `2`.

As you can see, the evaluation is done correctly. The LL(1) Parsing table works.

## MathObj Project

In the archive which contains the actual implementation of the mathematical parser, we can find the MathParserDataStructures project. The following class diagram describes the main classes which will perform all the theoretical work described above:

The central class which will perform the real evaluation is the `MathObj` class. It gives the opportunity to its user to find the value of any given valid mathematical expression. `MathParserBinaryTree` is the class which will hold the in-memory structure of the Parsing Tree. The tree is built from `MathParserTreeNode` objects, which in turn have their own `Nonterminal` or `Operation` values. In order to visit nodes from the tree, I've used the Visitor pattern. The tree traversal is performed using the Left-Right-Root recursive algorithm.

## Using the Code

The main operation of the `MathObj` class is the `Evaluate` method. It takes the parameters as follows:

1. `mathInput:String` - the input itself (E.g.: "x+y").
2. `vars:char[]` - an array of character variables used in the expression (E.g., {'x','y'}).
3. `varsValues:double[]` - variable values (E.g., {2.5,2.6}).

The result of this calculation will be 5.1.

You can argue that a better classes design might be an architecture with the `MathObj` class defined as `static`. But I've decided differently, because most of the time, people would like to calculate a value of a function on an interval. In this case, the string input will remain the same (alongside with the `char[]` of variables in it), only the values of the variables will change. Because the biggest overhead of the theoretical algorithm states within the building of the parsing tree, I've chosen to build it only once (and persist its state in the memory). If the method is invoked a second time without a string change, the previously built Polish Postfix Notation will be used in order to calculate the value of the expression (this Persist State idea gives the algorithm a much faster evaluation speed).

```/// <summary>
/// Evaluate the mathematical expression.
/// <summary/>
/// <param name="mathInput">Mathematical expression </param>
/// <param name="vars">Character variables
///      contained within the expression[x,y,z etc.] </param>
/// <param name="varsValues">Coresponding values
///     of character variables within the expression </param>
/// <returns> The result of evaluation </returns>
/// <exception cref="MathParserException">MathParserException </exception>
/// <exception cref="ArgumentException">ArgumentException </exception>
public virtual double Evaluate(string mathInput, char[] vars, double[] varsValues)
{
if (String.IsNullOrEmpty(mathInput))
throw new ArgumentException("String mathInput is empty or null");

if (this._mathInput == mathInput)
_persistState = true;
else
{
this._mathInput = mathInput;
_persistState = false;
}

if (_varsAndValues.Count != 0)
_varsAndValues.Clear();
if (vars != null && varsValues != null)
{
if (vars.Length != varsValues.Length)
throw new ArgumentException(
"vars and varsValues arrays length are not equal");
for (int i = 0; i < vars.Length; i++)
}
double retVal;
try
{

if (!_persistState)
{
//semantic transform (building the tree)
this.SemanticTransform();
this._polishPostfixExpression =
new Operation[this._sizeOfPolishPostfixExpression]; //
this.GeneratePolishPostfixExpression();
}
retVal = this.CalculateValue();
}
catch (MathParserException)
{
throw;
}
return retVal;
}```

Please run the MathParser.v.3 sample application in order to view how `MathObj` can be used.

## Output

The main output of the project available in the download section is the MathParser.dll library. Also, we can find a sample project which is very simple. It allows drawing the graph of a mathematical function. Please note that the function is dependent on only one variable `x : f(x)`.

## Conclusion

Even though there are plenty of other projects (even here on CodeProject) which advice this particular problem of static programming language as dynamic function evaluation, none of them speak about theoretical part of the story. I've decided to post my solution in order for the reader to understand the practical implementation of Formal Language Theory in this particular problem. This article makes a clear mapping from pure theoretical constructs to a real world example, and I've chosen to keep the code as closer as I could to the abstract terms which were used in the description part. We can argue that in terms of memory-cost, I should have used matrixes in order to design the Parsing tree, instead of a real Tree structure, but as I've already written, I've tried to build the algorithm as close to theory as possible. I've used plenty of theoretical terms which you might explore further and adjust/use them for your particular problem. Here, I'll just enumerate them so that you can Google/explore them further:

• Formal Language Theory
• Chomsky Hierarchy
• LL(1) Grammar
• Context-free Grammars
• Parser (Syntax/Semantic Analyzer)
• Polish Postfix Notation
• Left Factoring Rule
• Predictive Parsing Table
• Director Symbol Set
• Binary Tree
• Binary Tree Traversal
• Left-Root-Right Traversal

## History

• 18th January, 2010: Initial post
• 20th January, 2010: Updated source code

## Share

Software Developer
Moldova (Republic Of)
Interested in computer science, math, research, and everything that relates to innovation. Fan of agnostic programming, don't mind developing under any platform/framework if it explores interesting topics. In search of a better programming paradigm.

 First Prev Next
 My vote of 3 Andreas Gieriet 11-Apr-12 13:37
 Hello Ciumac Sergiu,   I know, it's late, but since I saw your article for the first time today after two years, I only comment now.   I find it brave to mention so many theoretical terms in one article without really explaining them for the intermediate user. The expert knows them, the intermediate gets too little in my opinion (not even to talk about the beginner).   You seem to use some terms in a bit weird way - not to say "sloppy" (e.g. as others mentioned, RPN is established while PPN is not, and a syntax tree that is universally named as AST (Abstract Syntax Tree) - and for sure is not a binary tree...)   But I disagree mainly on the approach you suggest for implementing an LL(1) scanner/parser.   I hardly ever saw a hand crafted table driven LL(1) parser (you would use a tool for that if you insist on doing a table driven parser). E.g. the parser below took me about 1h to write - ok, it's not my first and I did borrow some code from my previous work.   The most natural approach for a hand crafted LL(1) parser is the recursive descending parser. It directly follows from an EBNF description: each production is modelled as a function. And you don't even mention this approach?!   E.g. your language in EBNF is: ```Expression = Term { ("+"|"-") Term } . Term = Power { ("*"|"/") Power } . Power = Factor { "^" Factor } . Factor = ("+"|"-") Factor | Primary . Primary = a | Nested . Nested = "(" Expression ")" . ```   And the associated parser: ``` 47 private double ParseExpression() { return ParseBinary(ParseTerm, "+", "-"); } 48 private double ParseTerm() { return ParseBinary(ParsePower, "*", "/"); } 49 private double ParsePower() { return ParseBinary(ParseFactor, "^"); } 50 private double ParseFactor() 51 { 52 if (CurrAndNext("+") != null) return _unOp["+"](ParseFactor()); 53 if (CurrAndNext("-") != null) return _unOp["-"](ParseFactor()); 54 return ParsePrimary(); 55 } 56 private double ParsePrimary() 57 { 58 double v; 59 if (double.TryParse(Curr, out v)) { Move(); return v; } 60 return ParseNested(); 61 } 62 private double ParseNested() 63 { 64 if (CurrAndNext("(") == null) Abort("(...) expected"); 65 double expr = ParseExpression(); 66 if (CurrOptNext(")") == null) Abort("')' expected"); 67 return expr; 68 } 69 private double ParseBinary(Func<double> parse, params string[] ops) 70 { 71 double expr = parse(); 72 string op; 73 while ((op = CurrAndNext(ops)) != null) expr = _binOp[op](expr, parse()); 74 return expr; 75 } 76 } 77 } ```   ...with the needed boiler plate code for calling the scanner...   ``` 1 namespace LL1Parser 2 { 3 public abstract class Scanner 4 { 5 private static readonly string _pattern = @"\s*(\d+(?:\.\d+)?|\S)\s*"; 6 private IEnumerator _tokens; 7 protected Scanner(string s) 8 { 9 _tokens = Regex.Matches(s, _pattern, RegexOptions.Compiled).Cast() 10 .Select(m => m.Groups[1].Value).GetEnumerator(); 11 Move(); 12 } 13 protected bool Move() { return _tokens.MoveNext(); } 14 protected void Abort(string msg) { throw new ArgumentException("Error: " + (msg ?? "unknown error")); } 15 protected string Curr { get { return _tokens.Current ?? string.Empty; } } 16 protected string CurrAndNext(params string[] ops) 17 { 18 string s = ops.Contains(Curr) ? Curr : null; 19 if (s != null && !Move()) Abort("data expected"); 20 return s; 21 } 22 protected string CurrOptNext(params string[] ops) 23 { 24 string s = ops.Contains(Curr) ? Curr : null; 25 if (s != null) Move(); 26 return s; 27 } 28 } ``` ...and the parser frame... ``` 29 public class Parser: Scanner 30 { 31 Dictionary> _binOp = new Dictionary>() 32 { 33 { "+", (a,b)=>a+b }, 34 { "-", (a,b)=>a-b }, 35 { "*", (a,b)=>a*b }, 36 { "/", (a,b)=>a/b }, 37 { "^", (a,b)=>Math.Pow(a,b) }, 38 }; 39 Dictionary> _unOp = new Dictionary>() 40 { 41 { "+", a=>a }, 42 { "-", a=>-a }, 43 }; 44 private Parser(string s): base(s) { } 45 public static double Parse(string s) { return new Parser(s).Parse(); } 46 private double Parse() { return ParseExpression(); } 47 ... // here comes the parser code from above ```   BTW: To produce an AST, define an `Expression` class, from that derive a `BinaryExpression`, a `UnaryExpression`, a `NumberLiteral`, and a `NestedExpression`. With that in hand, replace the `double` types by `Expression`, and in the dictionaries, add the construction of the respective expressions (plus in the `ParsePrimary()`, add the construction of the `NumberLiteral` and the `NestedExpression`.   It would also be helpful on give a bit more practical hints instead of lengthly evaluation of the table approach, e.g. more room for getting to a LL(1), which is most of the challenge.   E.g. How to convert the left-recusrsion into a right recursion From left-recursion: ```E -> E + T E -> E - T E -> T ``` With the given transformation (from theory): ```E -> E a E -> b ``` that produces the language `ba*`, e.g. `baaa`, and the equivalent set of productions that generate the same language (from theory again): ```E -> b X X -> a X X -> empty ```   This results in (with replacing `b` with `T` and `a` with `op T`, where `op` is one of `+` and `-`): ```E -> T X X -> op T X X -> empty ```   If this BNF is transformed into EBNF, the expression can even be simplified to: `E = T { ("+"|"-") T }` Where `{...}` is zero or more repetitions, `(...)` is grouping, and `a | b` is one of `a` or `b` (exclusive or). BTW: The original expression in EBNF could have been written as: `E = { E ("+"|"-") } T .`.   For my taste: this article was a good start, but it needs more accurate content. It does not show the "normal" approach to parse LL(1) - the easy way to manually write a recursive descending parser is probably the only reason why you would take the burdon to make a language definition LL(1) conformant. In addition: missing useful explicit links (give at least those that helped you!). Giving the "google terms" is nice, but the propblem is the huge amount of hits that is produced by these searches: pick the good ones and add them here explicitly.   Cheers Andi   PS: some links for the description above:   PPS: my vote of 3 for the usage of terms and the over engineered approach of the table driven LL(1) parser. This all sounds so complicated, even it is not so complicated at all, assuming you manage to get an LL(1) grammar.modified 11-Apr-12 18:52pm.
 Thoughts PIEBALDconsult 16-Dec-10 16:17
 Article clarification DavidCrow 8-Dec-10 10:11
 Re: Article clarification Ciumac Sergiu 8-Dec-10 22:05
 Thanks! Kebrite 19-Apr-10 13:30
 remarks Andrei Bozantan 27-Jan-10 6:16
 Re: remarks Ciumac Sergiu 27-Jan-10 6:33
 Nice Rozis 18-Jan-10 12:17
 Congratulations and good project eramax 18-Jan-10 12:00
 Re: Congratulations and good project Ciumac Sergiu 18-Jan-10 12:19
 Re: Congratulations and good project eramax 19-Jan-10 10:08
 Re: Congratulations and good project Ciumac Sergiu 20-Jan-10 1:04
 Re: Congratulations and good project eramax 21-Jan-10 0:17
 Article [modified] SidhNor 18-Jan-10 8:58
 Source code Tony Bermudez 18-Jan-10 8:46
 Re: Source code Ciumac Sergiu 18-Jan-10 9:10
 Last Visit: 31-Dec-99 19:00     Last Update: 28-Mar-15 2:19 Refresh 1