
Introduction
There are a number of good articles in CodeProject on this subject. This one, however, presents the same thing in pure JavaScript. Although this expression evaluator does not come close to some of those which have been written in C#, C++, or VB, I am sure it can be useful to those who want to learn the concepts behind expression parsing and subsequent evaluation.
Basic Concepts
Before we go into the details, let's understand a few terms first:
- Infix notation is a common arithmetic and logical formula notation, in which the operators are written in infix-style, i.e., between the operands they act on. The major problem with this notation is that it is not that simple to parse, by a computer, as prefix or postfix notations. In infix notation, unlike the prefix or postfix notations, the parentheses surrounding the groups of operands and operators are necessary to indicate the intended order in which the operations are to be performed. In the absence of parentheses, certain precedence rules determine the order of operations.
- Postfix notation, also known as Reverse Polish notation (RPN), is an arithmetic formula notation, derived from the Polish notation introduced in 1920 by the Polish mathematician Jan Lukasiewicz. RPN was invented by an Australian philosopher and computer scientist Charles Hamblin in the mid-1950s, to enable zero-address memory stores. As a user interface for calculation, the notation was first used in Hewlett-Packard's desktop calculators in the late 1960s, and then in the HP-35 handheld scientific calculator launched in 1972. In RPN, the operands precede the operator, thus dispensing the need for parentheses. Why is this notation more suitable to computers? The implementations of RPN are stack-based; that is, operands are popped from a stack, and calculation results are pushed back into it. Although this concept may seem obscure at first, RPN has the advantage of being extremely easy, and therefore fast, for a computer to analyze. Let's look at the practical implications of RPN:
- Calculations proceed from left to right.
- There are no brackets or parentheses, as they are unnecessary.
- Operands precede operator, they are removed as the operation is evaluated.
- When an operation is done, the result becomes an operand itself (for later operators).
- There is no hidden state, and no need to worry about whether you have hit an operator or not.
Examples
The calculation, ((5 - 2) * 4) / 3, can be written down like this in RPN:
5 2 - 4 * 3 /
The expression is evaluated in the following way (the stack is displayed after the operation has taken place):
Input | Stack | Operation |
5 | 5 | Push operand |
2 | 5, 2 | Push operand |
- | 3 | Subtract top two operands |
4 | 3, 4 | Push operand |
* | 12 | Multiply top two operands |
3 | 12, 3 | Push operand |
/ | 4 | Divide the top two operands |
The final result, 4, lies on the top of the stack at the end of the calculation.
Converting from Infix Notation
Edsger Dijkstra invented an algorithm named "shunting yard", which converts from infix notation to RPN. The shunting yard algorithm is also stack-based.
- While there are tokens to be read:
- Read a token.
- If the token is a number, then add it to the output queue.
- If the token is a function token, then push it onto the stack.
- If the token is a function argument separator (e.g., a comma):
- Until the topmost element of the stack is a left parenthesis, pop the element onto the output queue. If no left parentheses are encountered, then either the separator was misplaced or the parentheses were mismatched.
- If the token is an operator, O1, then:
- while there is an operator O2 at the top of the stack, and either
- O1 is left-associative and its precedence is less than or equal to that of O2, or
- O1 is right-associative and its precedence is less than that of O2,
- pop O2 off the stack, onto the output queue;
- push O1 onto the operator stack.
- If the token is a left parenthesis, then push it onto the stack.
- If the token is a right parenthesis, then pop the operators off the stack, onto the output queue, until the token at the top of the stack is a left parenthesis, at which point it is popped off the stack but not added to the output queue. At this point, if the token at the top of the stack is a function token, pop it too onto the output queue. If the stack runs out without finding a left parenthesis, then there are mismatched parentheses.
- When there are no more tokens to be read, pop all the tokens, if any, off the stack, add each to the output as it is popped out, and exit. (These must be operators; if a left parenthesis is popped, then there are mismatched parentheses.)
Tokenizer
The tokanizer.js file contains the code to break the expression into various tokens. This is very much important as we get the expression in string form. Breaking this string into tokens helps in subsequent parsing and conversion. Once the tokenizer completes, we are left with an array of tokens, which is then fed to the infix2postfix converter. The tokenizer makes use of the switch
clause to distinguish the common operators and the string constants.
Stack
The stack data structure is implemented in the Stack.js file. This data structure is implemented using an array. For the purpose of debugging, the stack also provides a toString
method which basically converts the stack into a comma separated string.
Evaluator
The entire expression conversion and evaluation logic is written in the Evaluator.js file. The root object is called Expression
, which exposes two methods, namely:
ParseExpression
: To convert an infix expression into an equivalent postfix expression. EVal
: Which actually evaluates the postfix expression. This method internally makes a call to ParseExpression
, if the expression is not already parsed.
Like the tokenizer, the evaluator also makes use of the switch
clause to distinguish between the various operators. The most important part of this code is the default case handler in the EVal()
method and the Parse()
function. The evaluator also makes use of an excellent piece of code written by Mr. Matt Kruse for handling the date
data type. The rest of the code in this file simply contains some helper functions. The user defined variables can be added using the AddVar(varName, varValue)
method on the Expression
object. Similarly, the user defined variables can be cleared using the ClearVars()
method. The following table summarizes the various operators and functions supported by the expression evaluator:
Arithmetic | Logical | Comparison | Functions |
+ | ! | = | AVG |
- | & | > | ABS |
* | | | < | ACOS |
/ | | <= | ASC |
% | | >= | ASIN |
^ | | <> | ATAN |
| | | CDATE |
| | | CHR |
| | | COS |
| | | DATE |
| | | FIX |
| | | HEX |
| | | IIF |
| | | LCASE |
| | | LEFT |
| | | LOG |
| | | MAX |
| | | MID |
| | | MIN |
| | | RIGHT |
| | | ROUND |
| | | SIN |
| | | SQRT |
| | | TAN |
| | | UCASE |
License
This code is released under the GPL variant. Please see License.txt for more details.
History
- October 31st, 2005 - Initial release.
- November 04th, 2005 - Added licensing details.
- May 26th, 2006 - Fixed a small bug pointed out by Mr. Raphael Wils.
- June 21st, 2006 - Updated the JavaScript and sample page to demonstrate variable support.
- November 17th, 2008 - Fixed the small bug pointed out by CodeProject member Zephirim. Also added the reference section containing the links to reference documents pointed out by Zephirim.
References