In some projects we might need a proper math parser to either do simple math (a little calculator) or more complex one. CodeProject offers us a variaty of math parsers ranging from ports of C/C++ ones to ones purely written in C#. This article tries to create a powerful parser in pure C#, using the properties that come with C# and the .NET Framework. The parser itself is perfectly capable of running on other platforms as well, as it has been written on Linux.
The parser itself is Open Source. The full source code of YAMP is available on Github. A NuGet package will also be available soon.
Years ago I've written a math parser using the C# compiler and some
inheritance. The way was to provide a base class that offered all possible methods (like
exp(), ...), which serves us in the
inheritance process. Then a string has been built that represented valid C# code with a custom expression in its middle, which has then to be compiled. Of course a pre-parser was necessary to give some data types. On the other side operator overloading and other C# stuff could then be used.
The problem with this solution is that the C# compiler is kind of heavy for a single document of code. Also the resulting assembly had to be loaded first. So after our pre-parsing process we had to deal with the C# compiler and the assembly. This means that reflection was necessary as well. The minimum time for a any expression was therefore O(102) ms. This is far too long and does not provide a smooth outcome.
There are several projects on the CodeProject, which aim to provide a powerful parser. All of those projects are faster than my old solution - they are also more robust. However, they are not as complete as my project was - therefore I now try to succeed in this very old project with a new way. This time I built a kind of parser from scratch, which uses (and abused) reflection and therefore provides an easy to extend code base, that contains everything from operators to expressions over constants and functions.
What YAMP can do
Before we dig to deep into YAMP we might have a look on what YAMP offers us. First a brief look at some features:
- Assignment operators (=, +=, -=, ...)
- Complex arithmetic using i as imaginary constant.
- Matrix algebra including products, divisions.
- Trigonometric functions (sin, cos, sinh, cosh) and their inverse (arcsin, arccos, arcosh, arsinh).
- More complicated functions like Gamma and others.
- Faculty, transpose and left divide operator.
- Using indices to get and set fields.
- Complex roots and powers.
The core element of YAMP is the
ScalarValue class, which is an implementation of the abstract
Value class. This class provides everything that is required to do numerics with a complex type. Implementations of the basic trigonometric functions with complex arguments have been included as well.
MatrixValue class is basically a (two dimensional) list of
ScalarValue instances. We might think that it should be possible to make this list n dimensional, however, that would offer some complications like the need to specify a way to set a value in the n dimension. At the moment we will avoid this issue.
Now that we have a brief overview of the main features of YAMP, let's look at some of the expressions that can be parsed by YAMP:
This is a simple expression, but it already expressions that are not trivial. In order to parse this expression successfully, we need to be careful about operator levels (the multiply operator
* has a higher level than the one for subtraction
-, same goes for the division operator
/). We also have the minus sign used as a unitary operator, i.e. we do not have a heading zero here.
This is often parsed wrong. Asking Matlab for this expression we get 256. That seemed kind of low and wrong. Asking YAMP first resulted in this as well - until we include the rule that even operators of the same level will be treated
separately - from right to left. This rule does not make a difference in any expression - except with the power operator. Now we get the same result as Google shows us - 65356.
We could use brackets, but in this case we do not need them. This is a simple matrix assignment to the symbol
x. Our own assigned symbols are case sensitive, while included functions and constants are not. Therefore we could assign our own
pi and still access the original constant by using a different upper- / lowercase notation like
Again, some constants are built in like
pi. Therefore this expression is parsed successfully. Using YAMP as an external library it is easily possible to add our own constants.
This seems odd, but it will make sense. First of all we have a scalar (2) powered by a matrix (the one we assigned before). This is not defined, therefore YAMP will perform the power operation on every value in the matrix. The resulting matrix will have the same dimensions as the exponent - it will be a 2 times 3 matrix. This matrix then performs the subtraction with 5 (a scalar) on the right side. This is again not defined, therefore YAMP performs the operation on every cell again. In the end we have a 2 times 3 matrix.
We use the range operator
: to specify all available indices. The row index is set to 1 (we use rows before columns as Matlab and Fortran does - and we are now 1 based) - the first row. Therefore the result will be a matrix with dimension 1 times 3 (a transposed three vector). The entries will be 2, 3, 4.
Here we reassign every value of the matrix to a new value - given in the vector 8, 9, 10, 11, 12, 13. If we just use one index it will be a combined one. Therefore a 2 times 3 matrix might have the pair 2,3 as last entry (and the pair 1,1 as first entry), but might also have the index 6 as last entry (and the index 1 as first entry).
The adjungate operator either conjugates scalars (a projection on the real axis, i.e. changing the sign of the imaginary value) or transposes the matrix along with conjugating its entries. Therefore
xt now stores a 3 times 2 matrix. If we just want to transpose (without conjugating) we can use the
z=sin(pi * (a = eye(4)))
Here we assign the value of
eye(4) to the symbol
a. Afterwards the value is
pi and then evaluated by the
sin() function. The outcome is saved in the symbol
eye() function generates an n dimensional unit matrix. If we apply trigonometric functions on matrices we end up (as with the power function and others) with a matrix, where each value is the outcome of the former value used as an argument for the specific function.
Here two things are happening. First we create a new symbol (called
whatever) and then we assign the cell in the 10th row and 10th column to the value 8+3i. So we are able to use indices even on non-existing symbols. We can also expand matrices by assigning higher indices. This is only possible by setting values - we are not available to get higher indices. Here an exception will be thrown.
Here we will get information about all even rows (2:2:end means: start at index 2 and go to the end (a short macro) with a step of 2) with the first half of their columns. The displayed matrix will be a 5 times 5 one with rows 2,4,6,8,10 and columns 1,2,3,4,5.
YAMP does take matrices seriously. Multiplying matrices is done with respect to the proper dimensions. Let's have a look at a sample console application:
The index operator allows us to select (multiple) columns and / or rows from matrices. The following output shows the selection of the inner 3x3 matrix (from the 5x5 matrix).
This concludes our short look at YAMP. Of course YAMP is also able to perform queries like
How YAMP does it
YAMP contains a few important classes:
The most important datatype for accessing YAMP outside the library is the
Parser class. We'll introduce this concept later on.
Right now we want to have a short look at the class diagram for the whole library. First of all the class diagram in principle:
Parser generates a
ParseTree, that has full access to the singleton class
Tokens. The singleton is used to find operators, functions, expressions and constants, as well as resolving symbols.
YAMP relies heavily on reflection. The first time YAMP is used might be a little bit slow (compared to further usages). Therefore calling the static
Load() method of the
Parser should be done before any measurements or user input. This guarantees the shortest possible execution time for the given expression.
Any expression will always consist of a number of sub-expressions and operators. Operators can either be binary (like +, -, ...) or unary (like !, , ', ...). The binary ones need two operands (left and right), while unary ones just need one. An expression might be one of the following:
- A bracket expression, which contains a full
- A number - it can either be positive, negative, real or complex.
- An absolute value, which is sometime like a bracket expression that calls the
abs() function after being evaluated.
- A symbolic value - either to be resolved (could be a constant or something that has been stored previously) or to be set.
- A function expression can be viewed as a symbolic value that has a bracket expression directly attached (without an operator between the expression).
This results in the following class diagram:
Let's have a look at the class diagram for the operators next:
Here we see that the
AssignmentOperator is also nothing more than another
BinaryOperator. For simplicity and code-reuse another abstract class called
AssignmentPrefixOperator has been added as an additional layer between any combined assignment operator (like += or -=) and the original assignment operator. Overall the following assignment operators have been added: +=, -=, *=, /=, \=, ^=.
The difference between the left division (/) and right division (\) is that, while left division means A = B / C = B-1 * C, right
division means A = B \ C = B * C-1. The difference between both can be crucial for matrices, where it matters, if we multiply from left or right. The two operands are always of type
Value. This type is an abstract class, which forms the basis for the following derived datatypes:
ScalarValue for numbers (can be imaginary)
MatrixValue for matrices (can be only vectors), which consist of
StringValue for strings
ArgumentsValue for a list of arguments
The functions are also a quite important part of YAMP. The class diagram for the functions look like:
Here we create a standard type
StandardFunction. This type has to implement the interface
IFunction, as required. The only thing that will be called is the
Perform() method. The
StandardFunction contains already a framework that works only with numeric types like
ScalarValue. If the
Perform() method won't be changed, every number of a given matrix will be changed to the result of the function that is being called.
Another useful type here is the
ArgumentFunction type. This one can be used to create a fully operational overload machine. Once we derive from this class we only have to create one or more functions that are called
Function() with return type
Value. Reflection will find out how many arguments are required for each function and call the right one (or return an error to the user) for the number of given arguments. Let's consider the
class RandFunction : ArgumentFunction
static readonly Random ran = new Random();
public Value Function()
return new ScalarValue(ran.NextDouble());
public Value Function(ScalarValue dim)
var k = (int)dim.Value;
if(k <= 1)
return new ScalarValue(ran.NextDouble());
var m = new MatrixValue(k, k);
for(var i = 1; i <= k; i++)
for(var j = 1; j <= k; j++)
m[j, i] = new ScalarValue(ran.NextDouble());
public Value Function(ScalarValue rows, ScalarValue cols)
var k = (int)rows.Value;
var l = (int)cols.Value;
var m = new MatrixValue(k, l);
for(var i = 1; i <= l; i++)
for(var j = 1; j <= k; j++)
m[j, i] = new ScalarValue(ran.NextDouble());
With this code YAMP will return results for zero, one or two arguments. If the user provides more than two arguments, YAMP will throw an exception with the message that no overload has been found for the number of arguments the user provided.
Implementing new standard functions, i.e., those functions which just have one argument, is easy as well. Here the implementation of the
ceil() function is presented:
class CeilFunction : StandardFunction
protected override ScalarValue GetValue(ScalarValue value)
var re = Math.Ceiling(value.Value);
var im = Math.Ceiling(value.ImaginaryValue);
return new ScalarValue(re, im);
The implementation of the
BinaryOperator class is quite straight forward. Since any binary operator requires two expressions the
implementation of the
Evaluate() method throws an exception if the given number of expressions is different from two. Two new methods have been introduced: One that can be
Handle() and another one that has to be implemented called
Perform(). The previous one will be used by the standard implementation of the first one.
abstract class BinaryOperator : Operator
public BinaryOperator (string op, int level) : base(op, level)
public abstract Value Perform(Value left, Value right);
public virtual Value Handle(Expression left, Expression right, Hashtable symbols)
var l = left.Interpret(symbols);
var r = right.Interpret(symbols);
return Perform(l, r);
public override Value Evaluate (Expression expressions, Hashtable symbols)
if(expressions.Length != 2)
throw new ArgumentsException(Op, expressions.Length);
return Handle(expressions, expressions, symbols);
Let's discuss now how YAMP is actually generated the available objects. Most of the magic is done in the
RegisterTokens() method of the
var assembly = Assembly.GetExecutingAssembly();
var types = assembly.GetTypes();
var ir = typeof(IRegisterToken).Name;
var fu = typeof(IFunction).Name;
foreach(var type in types)
if(type.GetInterface(ir) != null)
(type.GetConstructor(Type.EmptyTypes).Invoke(null) as IRegisterToken).RegisterToken();
if(type.GetInterface(fu) != null)
AddFunction(type.Name.Replace("Function", string.Empty), (type.GetConstructor(Type.EmptyTypes).Invoke(null) as IFunction).Perform, false);
The code goes over all types that are available in the executing assembly (YAMP). If the type is abstract, we avoid it. Otherwise we look if the type implements the
IRegisterToken interface. If this is the case we will call the constructor, treat it as a the interface and call the
RegisterToken() method. This method usually looks like the following (for operators):
public void RegisterToken ()
Now we have the case that an operator is searched from our
ParseTree. In this case the
FindOperator() method of the
Tokens class will be called.
public Operator FindOperator(string input)
var maxop = string.Empty;
var notfound = true;
foreach(var op in operators.Keys)
if(op.Length <= maxop.Length)
notfound = false;
for(var i = 0; i < op.Length; i++)
if(notfound = (input[i] != op[i]))
if(notfound == false)
maxop = op;
if(maxop.Length == 0)
throw new ParseException(input);
This method will find the maximum operator for the current input. If no operator was found (the length of the operator with the maximum length is still zero) an exception is thrown. Otherwise the found operator will call the
Create() method. This method just returns another instance of the operator.
Parsing a (simple?) expression
In the following paragraphs we are going to figure out how YAMP is parsing a simple expression. The query is given in form of
3 - (2-5 * 3++(2 +- 9)/7)^2.
We actually included some whitespaces, as well as some (obvious) mistakes in the expression. First of all "++" should just be expressed as "+". Second the operation "+-" should be replaced by just "-". First of all the problem is divided into two specific tasks:
- Generate the expression tree, i.e. seperate operators from expressions and mind the operator levels.
- Interpret the expression tree, i.e. evaluate the expressions by using the operators.
The parser then ends up with an expression tree that looks like the following image.
In the second step we call the method to start the interpretation of the elements. Therefore the interpreter works and calls the
Interpret() method on each bracket. Every bracket looks at the number of operators. If no operator is available, then there has to be only one expression. The value of the expression is then returned. Otherwise the
Handle() method of the operator is called, passing the array of
Expression instances. Since the operator is either a specialized
BinaryOperator or a specialized
UnaryOperator, the method will always call the right function with the right number of arguments.
In this example our expression tree consists of five layers. Even though the interpretation starts at the top level (1), it requires the information of the lowest level (5) to be executed. Once 5 has been interpreted, level 4 can be interpreted, then level 3 and so on. The final result of the given expression is -193.
YAMP against other math parsers
We can guess that YAMP is probably slower than most parser written in C/C++. This is just the native advantage (or managed disadvantage) we have to play with. Therefore benchmarks against parser written in C++ may be unfair. Since Marshalling / Interop takes some time as well, parsing small queries and comparing the result might also be unfair (this time probably unfair for the C/C++ parser). Finally a fair test might therefore only consist of C# only parser. Having a look at CodeProject we can find the following parsers:
We set up the following four tests:
- Parse and interpret 100000 randomly generated queries*
- Parse and interpret 10000 times one pre-defined query (long with brackets)
- Parse and interpret 10000 times one pre-defined query (medium size)
- Parse and interpret 10000 times one pre-defined query (small)
* The expression random generator works like this: First we generate a number of (binary) operators, then we alternate between expression and binary operator, always randomly picking one. The expression is just a randomly picked number (integer) and the operator is a randomly chosen one from the list of operators (+, -, *, /, ^). In the end we have a complete expression with n operators and n+1 numbers.
The problem with this benchmark is that only 3 (YAMP, MathParser.NET, and LL Mathematical Parser) parsers have been able to parse all queries (even excluding the ones using the power operator). Therefore this benchmark was just able to give a speed comparison between these three. However, we can use this test to
identify that YAMP is really able to parse a lot of different queries.
If we look at the time that is required per equation in dependency of the number of operators, we see that YAMP scales actually pretty well. MFP and LLMP are significantly better here, however, MFP could not parse much at all and LLMP does not support complex arguments nor does it support matrices.
The biggest problem with the fastest implementations are the constraints given by them. They are hard to extend and less flexible than YAMP. While YAMP does contain a lot of useful features (indices, assignments, overloaded functions, ...), none of the other parsers could come even close to the level of detail.
The longer (and more complex) the benchmark, the better for YAMP. This means that YAMP performs solid in the region of short expressions (where no one will notice much about great performance), while it performs great in the region of long expressions (where most would notice delays and other annoying stuff otherwise).
In the table shown above we see that YAMP can parse all equations and does perform quite well. Considering the early stage of the development process and the set of features, YAMP is certainly a good choice for any problem involving numerical mathematics.
How to use YAMP in your code
If we use YAMP as an external library we need to hook up every chance we want to do. Luckily for us, this is not a hard task. Consider the case of adding a new constant.
This way we can override existing constants. If we remove our custom constant later one, any previously
overridden constant will be available again. This way we can also add or remove custom functions. Here our own function needs to have a special kind of signature, providing one argument (of type
Value) and returning an instance of type
Value. The following example uses a lambda expression to generate a really simple function in one line:
YAMP.Parser.AddCustomFunction("G", v => new YAMP.ScalarValue((v as YAMP.ScalarValue).Value * Math.PI) );
What does it take to actually use YAMP? Let's see some sample first:
var parser = YAMP.Parser.Parse("2+(2,2;1,2)");
var result = parser.Execute();
catch (Exception ex)
So everything we need to do is to call the static
Parse() method in the
Parser class (contained in the
YAMP namespace). The method results in a new
Parser instance that contains the full
ParseTree. If we want to obtain the result of the given expression, we only need to call the
Points of interest
YAMP makes use of reflection to make extending the code as easy as possible. This way adding an operator can be achieved by just adding a class that inherits from the abstract
UnaryOperator class or any other operator that is available.
YAMP is impendent of the set culture (the US numbering style has been set as numbering format explicitly). Strings and others are stored with UTF8 encoding. However, symbols can only start with a letter (A-Za-Z) and can then only contain letters and numbers (A-Za-z0-9).
The project is currently under heavy development, which results in a quite fast update cycle (one to two new versions per week at least). This means that you should
definitely consider taking a look at the GitHub page (and maybe this article), if you are interested in the project.
Generally YAMP will require more interested people working on it (mostly on some numeric functions). If you think you can contribute, then feel free to do so! Everyone is more than welcome to contribute to this project.
- v1.0.0 | Initial release | 19.09.2012.
- v1.0.1 | Added a table with benchmark data | 19.09.2012