12,551,901 members (53,735 online)
alternative version

63.7K views
22 bookmarked
Posted

# A Very Simple Parser

, 12 Apr 2004
 Rate this:
An article describing how to parse simple mathematical expressions (containing only +, -, *, /, and numbers) and evaluate their values.

## Introduction

One day, when I was 15 years old, I went to a shop near my house and I saw a scientific calculator. When I used the calculator, there was something that took my attention. The calculator was not like the other simple calculators I used to use (at that time). There was a bar in the top of the display of the calculator that you type any expression in (e.g. 5*sin(pi/2)+5) and the calculator evaluates the result. I was amazed how the calculator understood the text, i.e. parsed the expression, and evaluated the result. Now, after being 18, I decided to make a series of programs that do the same thing. This is my first program in this series.

This article explains a simple program that parses simple expressions containing only +, -, *, /, and numbers (e.g. 5+6*3.3) and evaluates the result.

## Let's Do It

I dislike too much talk, so I think the best way to illustrate my idea is to begin with an example. Consider the following expression:

`1*2*3-4*5/6.6+2`

The first thing we should put in mind is that multiplication and division have precedence over addition and subtraction. We should find some way to implement this. The best method is to partition the expression into terms. Thus the result will be to evaluate the following terms:

```1*2*3
4*5/6.6
2```

After evaluating each term alone, we add them putting in mind the sign before each expression. For example, the expression 4*5/6.6 has a minus sign, so we should subtract the value of the term from the final result instead of adding it. For this example, the following sequence will occur:

1. The value of `fResult` (which we will use for the final result of the expression) will be set to zero.
2. As we are using right to left parsing, terms to the right will be evaluated before terms to the left.
3. The term 2 will be evaluated and added to `fResult`. Thus `fResult` will have the value of 2.
4. The term 4*5/6.6 (which has the value of 3.030303) will be evaluated and subtracted (because of the minus sign) from the final result. Thus `fResult` will be containing -1.030303.
5. The term 1*2*3 (which has the value of 6) will be evaluated and added to the final result. Thus `fResult` will be containing 4.969697.

Let's examine the following segment of code:

```bool EvaluateExpression(const char * strExpression, double & fResult)
{
// Evaluates the value of each term.
int nLength = strlen(strExpression);
int nEnd = nLength - 1;
double fTerm;
fResult = 0.0;
// Searches the expression from right to left for a '+'
// or '-', i.e. partition the expression
//  into terms.
for (int i = nLength - 1; i >= 0; i--)
if (strExpression[i] == '+')
{
if (!EvaluateTerm(strExpression + i + 1, nEnd - i, fTerm))
// Invalid term!!!
return false;
fResult += fTerm;
nEnd = i-1;
}
else
if (strExpression[i] == '-')
{
if (!EvaluateTerm(strExpression + i + 1, nEnd - i, fTerm))
// Invalid term!!!
return false;
fResult -= fTerm;
nEnd = i-1;
}

// The first term is often not preceded by a sign!!!
if (strExpression[0] != '+' && strExpression[0] != '-')
{
if (!EvaluateTerm(strExpression, nEnd + 1, fTerm))
// Invalid term!!!
return false;
fResult += fTerm;
nEnd = i-1;
}

return true;
}```

As we see, the function searches the string of the expression from right to left for any '+' or '-'. A term lies either between two signs ('+' or '-') or between one sign and one end of the expression. After finding a term, the function `EvaluateExpression` calls another function, which is `EvaluateTerm` to evaluate the value of the found term. In a minute, we will be seeing how the function `EvaluateTerm` works.

After the function has evaluated the value of the term, it examines its sign and adds the value of the term to the final result.

Now, let's take a look at the function `EvaluateTerm`. This function almost has the same idea as the function `EvaluateExpression`. Just like how we did with expression, the better method to find the value of a term is to partition it into numbers and evaluate the term, putting in mind the sign before each number. For example:

`4*5/6.6`

We should partition this term into:

```4
5
6.6```

and then evaluate the results considering the sign of each number. For example, for the term 4*5/6.6, the following sequence will occur:

1. The value of `fResult` (which we will use for the final result of the term) will be set to one (instead of zero).
2. As we are using right to left parsing, numbers to the right will be evaluated before terms to the left.
3. `fResult` will be divided by 6.6 to result in 0.151515.
4. `fResult` will be multiplied by 5 to result in 0.757576.
5. `fResult` will be multiplied by 4 to result in 3.030303 which is the final result of the term.

Let's examine the code of the function:

```bool EvaluateTerm(const char * strTerm, int nTermLength, double & fResult)
{
// Evaluates the value of each term.
int nEnd = nTermLength - 1;
fResult = 1.0;
double fNumber;
// Searches the term from right to left for
// a '*' or '/', i.e. partition the terms
//  into numbers.
for (int i = nEnd; i >= 0; i--)
if (strTerm[i] == '*')
{
if (!StrToFloat(strTerm + i + 1, nEnd - i, fNumber))
// Invalid number!!!
return false;
fResult *= fNumber;
nEnd = i-1;
}
else
if (strTerm[i] == '/')
{
if (!StrToFloat(strTerm + i + 1, nEnd - i, fNumber))
// Invalid number!!!
return false;
fResult /= fNumber;
nEnd = i-1;
}

// The first term should not be preceded by a sign!!!
if (strTerm[0] != '*' && strTerm[0] != '/')
{
if (!StrToFloat(strTerm, nEnd + 1, fNumber))
// Invalid number!!!
return false;
fResult *= fNumber;
}

return true;
}```

As we see, the function traces the term from right to left looking for either '*' or '/'. As is the case for a term, a number lies either between two signs (now they are '*' and '/' instead of '+' and '-') or between one sign and one end of the term.

Note: The `StrToFloat` function converts a string to the value it represents.

## Final Notes

Before I finish my article, I want to mention the following two points:

1. The user should verify that the expression doesn't contain any invalid characters. This is done in the sample using the function `VerifyExpression`. This function searches the expression string for any character other than '+', '-', '*', '/', '.', and '0'-'9'. If it finds such character(s), it will notify that the expression is invalid. It is left to the reader to see the function in the code because it's not necessary to lengthen the article too much with easy stuff.
2. In this sample, I didn't use exception handling because it is not the subject of the article. But in a real application, it is better to use exception handling instead of just returning 'false' which will not tell us more than "your expression couldn't be evaluated"!

## Questions

1. I was about to enter 5*6 in the program, but I decided, as a test to the `VerifyExpression` function which doesn't accept white spaces as valid characters, to enter the expression 5 *6 and 5* 6. I expected that the program will display "Invalid expression!", but for my surprise, the program didn't and, instead, it displayed 5 for the first test and 0 for the second test. Why?
2. The following expressions are invalid expressions but the program doesn't violate them and it may produce unexpected results:
```5**6
5//6
1+*2
5*+6```

Run the program and test them. Can you find out the reason behind the answer? Also, can you solve these problems?

3. In my program, I used right to left parsing. Why do you think I did? Try to make a similar program that uses left to right parsing. Which is easier?

Remark: If you take a look at the `VerifyExpression` function, you will notice that the function just searches for invalid characters. For example, the expression 5**6 will pass the test of `VerifyExpression` function. How can you discover such syntax errors?

Well, if you just solve the problems mentioned in question 2, you will find that such syntax errors will automatically be discovered, how come that?

A list of licenses authors might use can be found here

## Share

 Software Developer (Senior) FactSet Europe Limited United Kingdom
No Biography provided

## You may also be interested in...

 Pro Pro

 First Prev Next
 sample in csharp pba6617-Oct-07 3:27 pba66 17-Oct-07 3:27
 possible to convert to vb2005 express ? scalpa9823-Apr-07 22:12 scalpa98 23-Apr-07 22:12
 Re: possible to convert to vb2005 express ? ThePromather24-Apr-07 8:49 ThePromather 24-Apr-07 8:49
 Re: possible to convert to vb2005 express ? scalpa9825-Apr-07 2:03 scalpa98 25-Apr-07 2:03
 reverse polish notation Tweety14-Apr-04 8:17 Tweety 14-Apr-04 8:17
 Nice Attempt -- Some suggestions... KevinHall13-Apr-04 9:52 KevinHall 13-Apr-04 9:52
 About parsing expressions Anonymous13-Apr-04 9:25 Anonymous 13-Apr-04 9:25
 Interesting ideas but, Amer Gerzic13-Apr-04 3:52 Amer Gerzic 13-Apr-04 3:52
 Re: Interesting ideas but, Sérgio Alexandre Gianezini Júnior13-Apr-04 7:17 Sérgio Alexandre Gianezini Júnior 13-Apr-04 7:17
 Re: Interesting ideas but, Amer Gerzic13-Apr-04 7:33 Amer Gerzic 13-Apr-04 7:33
 Re: Interesting ideas but, Clevedon_Peanut21-Apr-04 13:22 Clevedon_Peanut 21-Apr-04 13:22
 Re: Interesting ideas but, Amer Gerzic22-Apr-04 2:32 Amer Gerzic 22-Apr-04 2:32
 Last Visit: 31-Dec-99 18:00     Last Update: 24-Oct-16 1:24 Refresh 1