Click here to Skip to main content
15,880,725 members
Articles / Programming Languages / C#
Article

Inside the Mathematical Expressions Evaluator

Rate me:
Please Sign up or sign in to vote.
4.88/5 (43 votes)
17 Jan 2008CPOL8 min read 115.7K   3.4K   77   31
An article on mathematical expression evaluation
Screenshot - insideTheMEE0.gif

Introduction

Modern calculators solve differential equations and draw plots. They work with trigonometry, hyperbolic, logarithmic, statistic and other functions. Modern calculators contain processors that are much stronger than their predecessors in old "Apollo"-like spaceships. This article describes the easiest way to build a parser with the ability to analyze complex mathematical expressions and evaluate them. The algorithm is the one that is commonly used in the calculators.

Problem Definition

Let's decide what kind of expressions our calculator should handle.

Simple operations:(2+3)*4/5^5
Unary operators:-6
Single functions:2^3*cos(pi)
Functions with more than one parameter:Log(10,100)
Right-side operators:x!
Variables assignmentx=y=10;

Reverse Polish Notation and Shunting Yard Algorithm

A Prefix (Polish) notation was introduced in 1920 by the Polish mathematician Jan Lukasiewicz. It is a mathematical notation wherein every operator follows all of its operands. It is also known as Postfix notation. Reverse Polish notation (RPN) was invented by Australian philosopher and computer scientist Charles Hamblin in the mid-1950s, to enable zero-address memory stores. Hamblin presented his work at a conference in June 1957, and published it in 1957 and 1962.

In Reverse Polish notation, the operators follow their operands. Infix a + b changes to ab+; (a + b)*c changes to ab+c*. One of the RPN advantages is that it obviates the need for parentheses. The main advantage is that it's easy to use RPN with stack (first-in-last-out) structures.

The shunting yard algorithm is a method for parsing mathematical equations specified in infix notation. It can be used to produce output in RPN. The algorithm was invented by Edsger Dijkstra and named the "shunting yard" because its operation resembles that of a railroad shunting yard. Like the evaluation of RPN, the shunting yard algorithm is stack-based. One of the stacks contains operands (digits and variables). Another stack contains all operators (+, -, *, /, etc. and functions that are treated as operators).

Inside the Shunting Yard Algorithm

This is a brief example of how "shunting yard" works. Input expression: (2+3)*4/5;

First of all, we have to push a special "sentinel" operator into the operators stack. The sentinel is used to separate operations inside the parentheses and to indicate the bottom of the stack.

OperandsOperators
#

The parser reads the expression from left to right. The first token is "(" - right parenthesis. The sentinel is pushed into the operators stack. When the parser finishes processing the current token (speaking about any token), it reads the next token, stores it into the "token" variable and processes it.

  • Current token: "2" - an operand. Push it into the operands stack. Read the next token.
  • Current token: "+" - an operator. Push it into the operators stack. Read the next token.
  • Current token: "3" - an operand. Push it into the operands stack. Read the next token.
  • Current token: ")" - right parenthesis. Evaluate both stacks until the sentinel is reached.
OperandsOperators
3+
2#, parenthesis
#, bottom of the stack

Calculation: Pop "+" from the stack. Compare the operator's precedence. "#" has lower precedence than "+". That's why "+" is processed. "+" is a binary operator; pop two operands ("3" and "2") from the Operands stack. Calculate 2+3 and push the result into the stack. The sentinel (#) is reached. Pop the sentinel from the stack and read the next token.

OperandsOperators
5#
  • Current token: "*" - an operator. Push it into the operators stack. Read the next token.
  • Current token: "4" - an operand. Push it into the operands stack. Read the next token.
  • Current token: "/" - an operator. Compare operators' precedence. "*" has the same precedence as "/". When the token operator's precedence is lower than the stack operator's precedence, the operator in the stack is processed before the current operator.

Calculation: Pop "*" from the stack. Compare the operators' precedence. "#" has lower precedence than "*". "*" is a binary operator; pop "5" and "4" from the operands stack. Calculate 5*4 and push the result back into the stack. Push "/" into the stack. Read the next token: (2+3)*4/5.

  • Current token: "5" - an operand. Push it into the operands stack. Read the next token.
  • The last token is ";" - END indicator. We can process both stacks until the sentinel in the operators stack is reached.
OperandsOperators
5/
20#

The result is 20/5 = 4

Let's take a look at any simple or complex mathematical expression. We know what kind of token should come first in infix notation. It's a unary operator (like unary minus), left parenthesis, a number or a variable. For instance: -10; a+b; (10-x); The Primary() function converts the current token to one of the described mathematical characters. The binary operator comes after a primary token. The Binary() function evaluates binary operators. We'll also need some subroutine functions like:

  • NextToken() - Reads the next token and stores it to the token variable
  • Precedence() - Returns the operator's precedence
  • ReadOperand() - Parses the sequence of characters or digits to a number, a variable or a function

The beauty of the algorithm in its recursion. A listing below shows the basic algorithm's implementation in pseudo-C#:

C#
Stack Operands, Operators;
Token token; // current token


Parse()
{
   // Indicates the beginning of the stack

   Operators.Push(Sentinel);
   Binary();
   Expect(End)
   return Operands.Peek(); //contains the result

}

Binary()
{
   Primary();

   while token is a binary operator
   {
     PushOperator(token);
     NextToken();         // read next token

     Primary();           // parse primary

   }

   // while there are operators to evaluate

   while Operators.Peek != Sentinel
     PopOperator();
}

Primary()
{
  if token is operand //(digit or char)

     ReadOperand();
  else if token is unary
     PushOperator(ToUnary(token));
     NextToken();
     Primary();
  else if token is left parenthesis
     Operators.Push(Sentinel);
     NextToken();
     Binary();
     Expect (Right parentesis);
     Operators.Pop();
  else
     throw exception;
}

PushOperator(op)
{
   while Operators.Peek() > op // Precedence()

      PopOperator()

   Operators.Push(op)
}

PopOperator()
{
   if Operators.Peek is binary
      o2 = Operands.Pop();
      o1 = Operands.Pop();
      op = Operators.Pop();
      Operands.Push( Calculate(op, o1, o2) );
   else // unary operator

      o1 = Operands.Pop();
      op = Operators.Pop();
      Operands.Push( Calculate(op, o1) );
}

private void Expect(expectedToken)
{
    if (token == expectedToken)
    {
       NextToken();
       return;
    }
    else
       throw exception
}

As you can see, the basic implementation is very easy. This scheme is shown in the Basic calculator example.

Variables, Functions and Other Enhancements

Variables and Functions

Let's consider an expression: a+cos(pi) where a and pi are variables and cos is the cosine function. As defined above, the Primary() function is used to parse operands. In general, it should parse names and digits separately. Also, the Name parser should separate variables' and functions' names. The difference between cos(pi) and cosPI is that the first one is a function and the second one is a variable. The trick is that there is no actual "function" concept in this calculator. All functions are treated as operators. For instance, sin(x) is treated as a unary operator and log(x,y) as binary.

Each character sequence that is not a function (x, y1, sinX, etc.) is treated as a variable. The best way to store variables is in a Dictionary <string, double> collection. Therefore if the Primary() function discovers that the current token is a letter, it calls an addition function that reads the whole sequence of characters until an "escape" token is reached. In our example, a is a letter, but the next token is "+". This token cancels reading the sequence. First, the parser checks if "a" is a function or a variable. Then it retrieves "a" from memory (the dictionary) and pushes the value of the "a" variable into the stack.

In the case of a function (in our example, a+cos(pi) ), the parser reads the "c" token and does the same thing as was done with the variable: it reads "c", "o", "s", finds the escape "(" character and locates the cos function in the functions list. Then the parser calls (remember the recursion?) Primary() to parse the function's parameters. For example: cos(pi), log(10, abs( cos(pi) )), etc.

Variables Assignment

Therefore there are stored variables (pi, e) and there should be variables assignment: a=b=10^2. The "=" operator calls Parse() recursively: a=Parse( b=Parse(10^2) )

Multiplication

In mathematical expressions, multiplication is sometimes omitted. For example: the 10*x expression could be replaced with 10x. Let's define all cases in which multiplication could be omitted: x*(y), (y)*x, (y)*(x), x*f(x). When the parser finds a number, a variable or a right parenthesis, it calls the TryInsertMultiply() function that pushes the "*" operator in the stack if multiplication between the tokens was omitted.

Right-side Operators

One of the unary right-side operators in mathematics is the Factorial operator. Factorial is the product of all positive integers from one up to n and including a given integer. Factorial zero is assigned the value of one, factorial n (n!) is 1*2*...*(n-1)*n. To check if the next token is a right-side operator, the TryRightSideOperator() function is called. Argument separators in a multi-argument function are also treated as a right-side operators, i.e. log(2,4).

Power Operation

There is a basic math rule which I had forgotten. The power operator could not be evaluated the same way as multiplication or addition operators are evaluated. 3^4^5 = 3^(4^5), but 3*4*5 = (3*4)*5 or 3*(4*5). This is called associative operation. When the Compare() function compares two power operators, it returns "-1". This means that the current operator (the one that is on the top of the stack) is not processed.

Using the Code

There is one partial class Calculator divided into four files: Parser, Processor, Token and Variable.

  • Parser contains general and parse functions
  • Processor contains math operations' functions
  • Token contains tokens' (operations and functions) constants. It also contains "Is...()" functions, which determine whether the token is a digit, a function, an operator or a variable
  • Variable contains stored variables' subroutines

Points of Interest

The next step is to slightly modify the code and make the calculator able to solve logical expressions.

History

  • Power operation bug fixed - 15 January 2008
  • Release - 23 October 2007

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer
Finland Finland
I'm a Master degree student, studying at the University of Joensuu, Finland.

Comments and Discussions

 
Questioncan u do the same calculator but it must be on console application Pin
Member 1107192929-Sep-14 3:28
Member 1107192929-Sep-14 3:28 
GeneralMy vote of 5 Pin
Massimo Conti3-Jan-13 11:36
Massimo Conti3-Jan-13 11:36 
GeneralMy vote of 5 Pin
dnxit20-Oct-12 5:00
dnxit20-Oct-12 5:00 
GeneralMy vote of 1 Pin
voodooandrew29-Mar-11 11:36
voodooandrew29-Mar-11 11:36 
GeneralRe: My vote of 1 Pin
Dmitry_Savin29-Mar-11 13:12
Dmitry_Savin29-Mar-11 13:12 
GeneralRe: My vote of 1 Pin
voodooandrew29-Mar-11 13:31
voodooandrew29-Mar-11 13:31 
Thanks for reply.

But I'm not a software developer, just trying to develop one tool. To do this, I need fully working parser...

Once I found one, but these days I tried to find another one, to compare the speed
GeneralRe: My vote of 1 Pin
dnxit20-Oct-12 5:01
dnxit20-Oct-12 5:01 
GeneralParser Speed Pin
luiscoco17-Sep-10 7:23
luiscoco17-Sep-10 7:23 
GeneralExcellent work. My vote of 5 Pin
Hassan3D18-Aug-10 14:06
Hassan3D18-Aug-10 14:06 
GeneralGreat Code - Fix for bug when including negative numbers [modified] Pin
GregCaine16-Mar-10 17:25
GregCaine16-Mar-10 17:25 
GeneralSo simple, this is just what I need. Pin
Zaki Alaydrus23-Nov-09 1:20
Zaki Alaydrus23-Nov-09 1:20 
Generalเทพจริงๆพี่ Pin
new42610-Nov-08 3:23
new42610-Nov-08 3:23 
GeneralNeed Help. Its URGENT Pin
Anshul R22-Oct-08 19:01
Anshul R22-Oct-08 19:01 
QuestionOmit parentheses when applying right-side operators? Pin
Member 65995127-Sep-08 6:45
Member 65995127-Sep-08 6:45 
GeneralCool Pin
richardblaha8-Aug-08 2:46
richardblaha8-Aug-08 2:46 
GeneralPrecedence Pin
Hattmannen29-Apr-08 1:04
Hattmannen29-Apr-08 1:04 
GeneralRe: Precedence Pin
Zaur Nasibov29-Apr-08 8:00
Zaur Nasibov29-Apr-08 8:00 
GeneralThere is no evaluation "standard" Pin
ssa-ed17-Jan-08 10:08
ssa-ed17-Jan-08 10:08 
General[Message Deleted] [modified] Pin
Zaur Nasibov18-Jan-08 1:56
Zaur Nasibov18-Jan-08 1:56 
Generalimperfection Pin
ArnoKalkman30-Dec-07 7:32
ArnoKalkman30-Dec-07 7:32 
GeneralRe: imperfection Pin
Zaur Nasibov3-Jan-08 0:54
Zaur Nasibov3-Jan-08 0:54 
GeneralNice but Pin
Yoramo24-Nov-07 9:17
Yoramo24-Nov-07 9:17 
GeneralRe: Nice but Pin
Zaur Nasibov25-Nov-07 3:18
Zaur Nasibov25-Nov-07 3:18 
GeneralExcellent Pin
Sieepvvalk15-Nov-07 9:10
Sieepvvalk15-Nov-07 9:10 
GeneralVery timely Pin
Marc Clifton11-Nov-07 9:25
mvaMarc Clifton11-Nov-07 9:25 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.