Introduction
In this article, I'll describe my C# math evaluator called "HInterpreter
". I won't go deep into the algorithm in a step-by-step manner, instead I'll try to describe as much information as possible to enable you to have a clear view as to how it works and how developers can extend it.
Background
Here's what is most important - what is this for? Well - from the programmer point of view, the "brain" of the entire library is the Interpreter
class which has one public
method:
public double Calculate(string input)
By giving the input string
, which is supposed to be a math expression, the function returns the calculated value.
Before I start with the implementation details, here are some facts that can give more view about possibilities:
- Library can evaluate any math expression - there is no max-length limit
- Spaces insensitive, eg: "10 - (2*8) / (20 +1)"
- Evaluate any type of brackets, e.g.: "2+3*[7-1] / {10%2}"
- Evaluate float numbers, e.g.: "3.65 * 2 / (1-0,1)"
- Evaluate functions, e.g.: "1 + sin(4*(sqrt(9) + abs(-2))) + Exp(7)"
- Evaluate mathematical constants, e.g.: "3 + cos(pi*2) + e"
- Easy for extensions (see Extensions chapter for more info.)
To calculate the expression - the expression itself must be valid! Otherwise exception will occur.
Algorithm
To obtain such logic, I used RPN (Reverse Polish notation). It is a mathematical notation wherein every operator follows all of its operands. It is also known as Postfix notation and is parenthesis-free as long as operator arities are fixed.
That means - given the expression:
"5 + ((1 + 2) * 4) - 3"
Program transforms it to:
5 1 2 + 4 * + 3 -
The expression is evaluated left-to-right. An interesting fact is that RPN is quite simple to achieve. What needs to be done first is create the postfix notation from typical "infix". In this case - second algorithm is used, called Shunting-yard algorithm. It is quite complicated, if you are interested in details, check here.
Implementation
The entire algorithm is written within the Interpreter
class. However this is not the only class available in the library. To achieve easy extensibility, the entire library is split into logic classes:
Interpreter
class
This is the main high-level class that is used to calculate values from expression. It contains algorithm implementation. Public Calculate()
method has already been described. If you only want to use library (not extend), that's the only class you should know.
Token
class
Token
is the abstract
class. It represents any single, atomic part of expression. That means: function, operator, number value... but also left and right brackets.
TokenFactory
This class is used to recognize part of the expressions. By given string
, TokenFactory
returns the appropriate object (more information in the next chapter).
Class Hierarchy
As said, the library is built for extension. Before we get to the actual "how to extend" scenario, have a quick look at the below diagram. It shows the library class hierarchy (well, part of it - that inherits from Token abstract
class).
Token Class
As you can see, Token
is the root class for all expression parts. Every atomic part of the expression, that we want to distinguish, must inherit from Token
. Token
code implementation is very short:
public abstract class Token {
public readonly string Entry;
public Token(string entry) {
Entry=entry;
}
}
Nothing fancy here - just one readonly
field called "Entry
". That field contains a string
corresponding to a particular expression.
Example: In expression "1+sin(4)
", we can distinguish 4 tokens:
- value 1
- operator +
- sinus function
- function argument
For these tokens, their Entries are : "1", "+", "sin", "4".
NumberBase Class
This is the abstract
class for all constant values written in the expression. For instance: it has an implementation for the "pi" number (3.14), "e" number, as well as typical math value - Number
class. Number
class is the representation of any number simply written in the expression. If you want to add your own math contants, you should definitely inherit from the NumberBase
class.
And below is an example of implementation - PI number:
class Pi: NumberBase {
public override double Value {
get {
&nbs
p; return Math.PI;
}
}
public Pi(string value)
:base(value){
}
}
OperatorBase Class
No puzzle here, this class is to manage operators behaviour. Let's look on the implementation:
abstract class OperatorBase: Token{
public virtual int Precedence {
get {
&nbs
p; return 1;
}
}
public virtual Associativity
Associativity {
get {
&nbs
p; return Associativity.Left;
}
}
public OperatorBase(string value)
:base(value) {
}
public abstract double Calculate(double
left,double right);
}
As you can see, some more complicated stuff is present here. The major part is abstract Calculate
method that every operator must implement. Besides that, we see:
- Precedence. Value of that field manages operators precedence - for example: multiplication must be done before addition (talking about parameterless expression). Default value of the precedence is 1. For instance: multiplication has precedence of 2, and power has 3. Thus, such expression: "1+2*4^3" acts properly (first power, then multiplication and then addition).
- Associativity. This field is of type
Enum
with two values possible: Left and Right (default is Left). That mean - when calculating following: "1+2+3", first 1+2 is calculated, and result is added to 3. Particular operators can override it. Example: power operator has Right associativity.
Example of implementation - operator "-" (Subtraction):
class Subtraction:OperatorBase {
public Subtraction(string value): base
(value) {
}
public override double Calculate(double
left,double right) {
return left-
right;
}
}
FunctionBase Class
This is the base class for all functions (like sin, cos, round, abs, sqrt, exp, etc..). Let's look at the code:
abstract class FunctionBase: Token {
public virtual int OperandsCount {
get {
&nbs
p; return 1;
}
}
public FunctionBase(string value)
: base(value) {
}
public abstract double Calculate(params
double[] args);
}
Similar to operators, the main function that needs to be overridden is the Calculate
function. It takes 1 parameter: array of double
. This parameter represents operands of a function (although currently library can manage only functions of 1 operand, this parameter was designed for the future. That means OperandsCount
property is not implemented yet, and should not be overridden).
A Quick Look At the Sample Function
class Cosinus: FunctionBase {
public Cosinus(string value)
: base(value) {
}
public override double Calculate(params
double[] args) {
return
Math.Cos(args[0]);
}
}
Extension
If you already read the entire article above, it should be clear as to how to extend the library.
- If you want to add a new operator - inherit from
OperatorBase
class
- If you want to add a new function - inherit from
FunctionBase
class
- If you want to add a new math constant value - inherit from
NumberBase
class
But declaration of new classes is not enough. If you want them to be used in Interpreter
, one more class needs to be extended - it's the TokenFactory
class.
TokenFactory
is responsible for creation of appropriate Token
object based on part of the expression. Extending it is very simple. Here's a part of the TokenFactory
code that should explain every unknown:
RegisterToken<Power>(x => x=="^");
RegisterToken<E>(x => x=="e");
RegisterToken<Sinus>(x => x=="sin");
RegisterToken<Cosinus>(x => x=="cos");
What is shown above is the part of the static
constructor of TokenFactory
class. You don't have to go deep into the RegisterToken
function - simply add a new entry and the corresponding expression. In the example, we see registration of power operator (that is "^" char), registration of Euler number ("e" char) and function -sin and cos. That means: where "sin" expression is found - it's considered as a sinus function. Notice that you don't have to specify whether a particular token is a number, function or operator. Just specify a token type (as a generics type parameter), and expression representation as string
. That's it.
Conclusion
What do I say .. I hope you like my library. A small request before comments / rating - please don't blame my English grammar skills (or lack of it...) - it's not perfect yet.
History
- 2nd September, 2009: Initial post
- 27th March, 2010: Updated source code