Building a General Purpose Interpreter, Math Engine and Parser in C#

Kashif_Imran

4.97/5 (69 votes)

Dec 17, 2012

GPL3

13 min read

86244

8045

An easy to understand and use language interpreter, run time parser and calculation engine from scratch in C#.

October 8, 2014: Math Processor is now Open Source.

Introduction
Using the Sample Applications
Introduction to Math Processor
Using the Math Processor Library
Understanding the Core Engine

1. Introduction

First, the best thing about this article is that we will try to make everything extremely simple. You are NOT required to be a master of, or even familiar with, the details about parsers, compilers etc.

Before we get into details, let's have a look at a few examples of the commands that the parser/engine we are going to create can execute.

Commands	Result	Comment
>> primes(1000000)	2 3 5 ...	The first million prime numbers
>> gcd(24,12, 124, 136)	4	GCD of the given numbers
>> a = array(50, 10, 50) >> b = array(60, 40, 50) >> c = array(80, 60, 10) >> m = matrix(a, b, c) >> d = det(m)	-76000	Determinant of a matrix
{ t = vectorin(-16, 0.05, 16);: x = sin(8t/5) cos(t);: y = sin(8t/5) sin(t);: plot(x, y);: }		A polar curve.

You can also create more interesting/advanced graphs like the following Spirograph patterns created by one of MP users (To learn more about Spirograph math and how these patterns were created, have a look at these Spirograph math tutorials/examples):

2. Using the Sample Applications

Update August 28, 2013: This section was written for the old versions. Since a new software with complete documentation became available (top downloads), I deleted the contents of this section so as to make the article easier/smaller for new readers. The old downloads are still there at these links to preserve the history:

3. Introduction to Math Processor

I hope that by this time you have played with the Math Processor application a bit. If not, I recommend that you do so now and try to get familiar with its working.

The Math Processor engine provides a lot more functions and commands than discussed in this article. If you are interested in knowing more about the facilities provided by Math Processor, you can have a look at the Math Processor Documentation.

3.1. The Road Ahead

We will be building an interpreter for math expressions as well as a simple little language MPCL (but still more powerful than you probably think). More on MPCL and what we will be implementing in next section. Before we get into that, here is what we will NOT be doing:

Support symbolic math
Perform arbitrary precision calculations
Implement things the hard way

3.2. Math Processor Command Language (MPCL)

MPCL is a simple loosely typed language the Math Processor parser/calculating engine can interpret at run time to carry out the supported operations. MPCL supports:

Basic math expressions containing the common operations like +, -, *, /, % etc.
Boolean/logical operations
Operations on numbers (including lists of numbers)
Operations on Matrices
Conditional Processing using the if and if-else constructs
Loops (repeat and while)
Certain directives to control a few aspects of the processing environment
Inbuilt functions to expose the major facilities of the System.Math class of the .Net framework as well as a lot of other operations. For the complete list of supported functions, visit the Math Processor Functions page.
User defined functions
An extremely simple mechanism to allow programmers to write more useful functions and plug them into the main engine at run time.

4. Using the Math Processor Library

Math Processor provides a unified application storage and communication mechanism within the internal modules as well as for the programming interface through the use of class Token. A Token is capable of storing text, numbers, matrices, Boolean data and a lot more things that MP uses internally. What is contained within a Token depends upon its type, which is determined by a public property TokenType of type TokenType. This property is just an enum defined as under:

public enum TokenType 
{ 
    Void, Error, Operator, Bool, Vector, Matrix, Text, Function, UserFunction, Directive, 
    Block, LoopOrCondition, FunctionDefiner, Break 
}

To the user of Math Processor, not all members of this enum are of interest. A few are for internal use only. We will learn about the important ones later in this text. The different fields defined in class Token are declared as under:

TokenType type;
string name = "";
string data = "";
List<double> vector = new List<double>();
double extra;

And that's all that we need to support all the kinds of data we want to process. From numbers to Boolean values, from lists to matrices and all that. Amazing! Isn't it? The trick is to have different interpretations of the fields according to the type of the token. So for a Token of type Void, most fields mean nothing. It is just what it is called: Void. For a Boolean type, a 0 in the list would mean false and a 1 would mean true. For a matrix, we will make use of 'extra' to store the number of rows. We will store the actual matrix values in the List<double> field. We will compute the rows and columns for the matrix at run time.

Please note that by stuffing everything inside a Token, we have traded some efficiency (in terms of speed and memory) for simplicity. I believe the trade-off here is worth the cost, as we have avoided the need to create complex data structures. You can devise a more complex model if you wished. For me, life is much simpler without parse tables, expression trees and a thousand look up fields!

Now that we have understood the role as well as form of the class Token it is time to discover how to make use of the Math Processor library. There are two possibilities:

Directly send the raw expressions to the parsing engine, obtain the result and do whatever is needed with it.
Create instances of class Token in code and then use different public methods of the class library to manipulate them.

Both the approaches are quite simple. However, the first one is the preferred way to do things in most situations. In the sample applications we have primarily used the first approach. However, in the Console based sample we have also shown how to use the second method.

The second method is needed only when you want to extend the functionality of the application by providing more functions or when you want to directly work with the already defined functions inside the main class library. We shall see a use of this approach a bit later.

The use of the first method is just as easy as calling the following static method of MathProcessor.Calculator class:

public static Token ProcessCommand (string expression) { ... }

The parameter expression is any text that is a valid MPCL command. For a broad list of what constitutes a valid MPCL command please refer to Section 3.2. The result returned from this method would be a Token of any of the following types:

TokenType.Void The operation completed successfully but nothing remarkable was returned to the caller. Note that the operation itself may have produced variables or functions etc. depending on the nature of the operation performed.
TokenType.Error The operation resulted in an error.
TokenType.Text The resulting Token is just plain text.
TokenType.Vector The returned Token contains one or more numeric values.
TokenType.Matrix The returned Token is a matrix.
TokenType.Bool The returned Token contains Boolean data (true/false).

Another notable fact is that some operations produce intermediate results before the whole block of code is parsed and executed. For example, the following block of MPCL code produces output multiple times by use of the repeat loop (paste it in the demo application to see what it does!):

{
    format(G);:
    a = 1;:
    repeat(10)
    {
        echo("4 x ", a, "= ", 4*a):
        a = a + 1;:
    }:
}

In order to inform the front-end about the intermediate output, the Calculator class raises events using the following delegate:

public delegate void IntermediateResult(Token result);

Now that we have understood the communication mechanism, let's build a very simple front end for the MP engine. Perform the following steps:

Create a new Console application in Visual Studio.
Add a reference to "MathProcessor.dll" (available with downloads of this article)

Modify your Program class as under:

class Program
{
    static void Main(string[] args)
    {
        Calculator.IntermediateResultProduced += new IntermediateResult(IntermediateResultProduced);
        string input = "";
        Console.ForegroundColor = ConsoleColor.White;
        Console.WriteLine("Enter MP expressions to process. Enter 'exit' to quit:");
        Console.ForegroundColor = ConsoleColor.Gray;
        while (true)
        {
            input = Console.ReadLine();
            if (input.Trim().ToLower() == "exit")
            {
                break;
            } 
            Token result = Calculator.ProcessCommand(input);
            DisplayResult(result);
        }
    }

    static void IntermediateResultProduced(Token result)
    {
        DisplayResult(result);
    }

    static private void DisplayResult(Token result)
    {
        if (result.TokenType == TokenType.Error)
        {
            Console.ForegroundColor = ConsoleColor.Red;
            Console.WriteLine(">> " + result.GetString());
            Console.ForegroundColor = ConsoleColor.Gray;
        }
        else if (result.TokenType == TokenType.Matrix)
        {
            int rows = (int)result.Extra;
            if (rows > 0)
            {
                List<double> data = result.Vector.ToList();
                Token temp;
                temp = new Token(TokenType.Vector, data.GetRange(0, result.Count / rows).ToArray());
                Console.WriteLine(">> " + temp.GetString());
                for (int i = result.Count / rows; i < result.Count; i += result.Count / rows)
                {
                    temp = new Token(TokenType.Vector, data.GetRange(i, result.Count / rows).ToArray());
                    Console.WriteLine("   " + temp.GetString());
                }
            }
        }
        else if (result.TokenType != TokenType.Void)
        {
            Console.WriteLine(">> " + result.GetString());
        }
    }
}

That's it!

After writing this little program, almost all the MP functions are now available to us on the command line. However the function plot will not be available, as it creates a Windows Form and calls its Show() method, which requires a message loop not available to us in this example. The sample application uses a workaround. Please have a look at the downloadable code to see how the workaround is implemented.

4.1. Extending the Core by Adding More Command Line Functions

Another problem we have is the lack of our ability to paste multi-line code blocks in the Console application. Such an operation is not directly available. Let's write a new MP command line function to allow us to 'paste' text content directly from the Clipboard (the sample application contains this functionality). Add a reference to Windows.Forms assembly to the project and put the following method in your Program class:

public static Token Paste(string operation, List<Token> arguments)
{
    try
    {
        string text = System.Windows.Forms.Clipboard.GetText();
        return Calculator.ProcessCommand(text);
    }
    catch (Exception)
    {
        return new Token(TokenType.Error, "Could not paste text from Clipboard.");
    }
}

Now modify your Main() method as under:

[STAThread]
static void Main(string[] args)
{
    Function.AddDirective("paste", Paste);
    /* Everthing as before */
}

Please note the use of [STAThread] Attribute with the Main() method. This is needed for working with the Clipboard. After making the above changes, you will have a new directive called "paste" available to you. You can also make it a function, if you so desired, like this:

Function.AddFunction("paste", Paste);

In MPCL, the difference between a directive and a function is that a function is always followed by a pair of parentheses containing zero or more parameters. Also the parameters to functions are processed before the function is called. For a directive, the parameters are not per-processed. For more detail about directives, please refer to Math Processor Directives page.

Most of the times you will be working with functions. The directives are just a few in number and mostly deal with the configuration of the engine and are not meant for real processing.

5. Understanding the Core Engine

Let's begin with a simple diagram showing the workflow of Math Processor.

The expression from the client is first converted into tokens by the parser/tokenizer. The list of tokens is then converted from Infix to Postfix (Reverse Polish) notation. After that, the tokens are processed by the calculation engine and the result is sent back to the client.

The conversion from Infix to Postfix form is done to facilitate the evaluation phase. Due to the way expressions are represented in Postfix notation, changing the priority of operations does not require use of parentheses, which simplifies the evaluation process. You can see the conversion from Infix to Postfix notation using the postfix directive of Math Processor. Here are a couple of examples:

>> postfix 2*9+3
>> 2 9 * 3 +
>> postfix 2*(9+3)
>> 2 9 3 + *

In the above examples, we can see there is not need to use parentheses in the postfix form, as the priority of operations is inbuilt in the very expressions. In the first example the multiplication takes precedence according to general rules of mathematics. However, in the second example, the Infix form changes the priority by use of parentheses. This changed priority is reflected in the Postfix form through re-ordering of the tokens, which enables us to get rid of the parentheses.

So far so good. But how does the process start and continue and end? Actually it is done in different stages. First, we need to parse the ordinary expression to create the tokens. Then we need to arrange these tokens in a suitable way for processing. After that comes the actual calculation. Finally the result is returned back to the user.

The whole process takes place inside 7 core classes consisting of just a few hundred lines of code. The details of these classes are as under:

5.1. Token

This is the ubiquitous data holder of Math Processor. We already discussed Token in detail earlier in this text in Part-4.

5.2. Tokenizer

This class is responsible for parsing the input and converting it into Token objects. The entire process takes place inside a static function:

public static Token TokenizeString(String expression, out List<token> tokens)

The first parameter 'expression' is the expression to be tokenized. The expression is any valid MPCL command (consisting of math expressions, functions, directives and so on). The second parameter out List<Token> tokens would contain the result of what tokens were found in the input expression. Lastly, the return value will tell about the result of the parsing and can be either of type TokenType.Void in case of success or TokenType.Error in case of erroneous input expression.

5.3. Variables

This class is responsible for storing reserved words and named constants and variables. It defines a small inner class NamedToken to store the named Token objects. A variable can be created using the assignment operator. Some MP functions can also create variables. The name of a variable must start with a letter or underscore and can consist of letters and numbers. Here is how a variable can be created and used:

>> a = 10
>> 10
>> b = a + 100
>> 110

5.4. Function

This class stores all the functions that can be used using MPCL function calls. Every MP command-line function should be of the form defined by the following delegate:

public delegate Token FunctionDelegate(string operation, List<token> arguments);

Let's see the whole process in action. MP provides a function named contains, which is used to check if a number is present in a matrix or array of numbers. This is how this function is implemented:

public static Token Contains(string operation, List<Token> arguments)
{
    if (arguments.Count != 2)
        return Token.Error("Exactly two arguments expected");

    if ((arguments[0].TokenType != TokenType.Matrix && arguments[0].TokenType != TokenType.Vector) ||
        (arguments[1].TokenType != TokenType.Matrix && arguments[1].TokenType != TokenType.Vector) ||
         arguments[1].Count != 1)
        return Token.Error("First argument should be an array or matrix and second a single numeric value");

    if (arguments[0].Vector.Contains(arguments[1].FirstValue))
        return new Token(TokenType.Bool, 1, 1);
    else 
        return new Token(TokenType.Bool, 1, 0);
}

Let's now see how we can use this function:

>> a = array(1,2,3,4);
>> contains(a, 1)
>> true
>> contains(a, 5)
>> false
>> if (contains(a, 2)) { echo("Number found in the array."): }
>> Number found in the array.

5.5. FunctionDefiner

This class is responsible for creating, managing and executing User Defined Functions, which are a very important means to allow the users to create small amounts of code to extend the functionality of the engine at run time. User defined functions are not really efficient due to their interpreted nature. However, they are still useful for doing small chunks of work you need to perform repeatedly.

We already saw a very interesting example of a User Defined Function earlier in this text when we created some very interesting curves (Spirograph curves). To get more details and see other examples of User Defined functions, you can refer to the User Defined Functions page of the online MP documentation.

5.6. BlockCommands

This class is responsible for managing and executing the if/if-else, while and repeat constructs. The reason that these constructs are put together is their common syntactical form:

key_word(expression) { one or more statements }

For an explanation of each of these see the Conditional Statements and Loops pages.

5.7. Calculator

We already saw the Calculator.ProcessCommand() method in action, which was used to send all the expressions to the engine for processing. Besides providing that facility, this class also contains all the code to carry out basic arithmetic calculations involving the supported operators like +, -, * etc. It also handles the conversion from Infix to Postfix notation, determines the priority of operators and applies these operators to operands to obtain the results.

Conclusion

Finally, we have reached the end. I hope you found the article, the main engine and the sample applications helpful. I also hope I was able to deliver my promise of creating a simple parsing library and calculation engine for you to use in your own projects.