License: The Code Project Open License (CPOL)
The expression evaluator revisited (Eval function in 100% managed .NET)
By Pascal Ganaye
This second article about evaluation in .NET introduces a parser which pre-compiles the expressions.
Windows, .NET 1.1, VS.NET 2003, Dev
Many applications need to evaluate a formula at run-time. The .NET Framework, despite its advanced compilation support, does not offer a quick and lightweight eval function. This article introduces a usable eval function with some rarely available features.
This article also attempts to explain how the whole thing works.
People often tell me that there is no place in their application for an evaluator because it is too complicated for their users. I do not agree with this view. An evaluator is a cheap way to HIDE the complexity from the average user and provide powerful features for the advanced user. Let's take an example.
In your application, you let the users choose the title of the window. This is convenient and simple; it's just a textbox where they can type what they want. The difficulty comes when some users want more. Let's say that they want to see their User IDs or the time. Then you have 3 alternatives:
The first option requires lots of work on your side and can potentially confuse the more basic users. The second option won't confuse your users, but might lose you the more advanced ones. The third option is ideal because you can keep your textbox and let power users type what they want:
Title of the window for user : %[USERID], the time is %[NOW]
And you're done. The interface is still using a regular textbox and is not complicated. On the coding side, it is really not much to add. In terms of power, you can add a new variable every day and as long as you document it all, your users stay satisfied.
Using the .NET Framework compilation capabilities seems to be the most obvious way to make an evaluator. However, in practice this technique has a nasty side effect. It looks like it creates a new DLL in memory each time you evaluate your function and it seems nearly impossible to unload the DLL. You can refer to the remarks at the end of the article "Evaluating Mathematical Expressions by Compiling C# Code at Runtime" for more details.
Using other engines or application domains is an option if you want a full VBScript or C# syntax. If you need to write classes and loops, this is probably the way to go. This evaluator neither uses CodeDOM nor tries to compile VB source. It parses an expression character-by-character and evaluates its value without using any third-party DLL.
The evaluator can be run with just two lines of code:
Dim ev As New Eval3.Evaluator
MsgBox(ev.Parse("1+2+3").value)
Eval3.Evaluator ev = new Eval3.Evaluator(
Eval3.eParserSyntax.c,/*caseSensitive*/ false);
MessageBox.Show(ev.Parse("1+2+3").value.ToString());
By default, the evaluator no longer defines any functions or variables. This way, you decide exactly which functions you want your evaluator to understand. To extend the evaluator, you need to create a class. Below is a VB sample; a C# version is available in the Zip file.
Public Class class1
    Public field1 As Double = 2.3
    Public Function method2() As Double
        Return 3.4
    End Function
    ' Declared As Double so that 4.5 is not rounded to an Integer
    Public ReadOnly Property prop3() As Double
        Get
            Return 4.5
        End Get
    End Property
End Class
Note that only public members will be visible.
Dim ev As New Eval3.Evaluator
ev.AddEnvironmentFunctions(New class1)
MsgBox(ev.Parse("field1*method2*prop3").value.ToString)
You can also use a more dynamic version. I don't really like this method, but it can be useful. Note that the value of an extension can change once parsed, but its type should not.
Public Class class2
    Implements Eval3.iVariableBag
    Public Function GetVariable( _
        ByVal varname As String) As Eval3.iEvalTypedValue _
        Implements Eval3.iVariableBag.GetVariable
        Select Case varname
            Case "dyn1"
                Return New Eval3.EvalVariable("dyn1", 1.1, _
                    "Not used yet", GetType(Double))
        End Select
        ' Any other name is unknown to this bag
        Return Nothing
    End Function
End Class
ev.AddEnvironmentFunctions(New class2)
Dim code As opCode = ev.Parse("dyn1*field1")
MsgBox(code.value & " " & code.value)
The evaluator can work with any object, but it will allow common operators (+ - * / and or) only on usual types. Internally, I use these types:
enum evalType
number     converted from integer, double, single, byte, int16...
boolean
string
date       equivalent to DateTime
object     anything else
There is a shared function in the evaluator to return all those types as a string:
Evaluator.ConvertToString(res)
This function will return every type using a default format.
If you just want to use the library, please refer to the 'Using the code' section. The following sections are just for curious people who want to know how it works. The techniques I used are rather traditional and can, I hope, be a good introduction to compilation theory.
The evaluator is made of a classic Tokenizer followed by a classic Parser. I wrote both of them in VB, without using any Lex or Bison tools. The aim was readability over speed. Tokenizing, parsing and execution are all done in one pass. This is elegant and, at the same time, quite efficient because the evaluator never looks ahead or backwards more than one character.
The first thing the evaluator needs to do is split up the string you provide into a set of Tokens. This operation is called tokenization and in my library it is done by a class called tokenizer.
The tokenizer reads the characters one by one and changes its state according to the characters it encounters. When it recognizes one of the Token types, it returns it to the parser. If it does not recognize a character, it will raise a syntax error exception. Once the class is created with this command,
tokenizer = new Tokenizer("1+2*3+V1")
...the evaluator will just access tokenizer.type to read the type of the first token of the string. The type returned is one of those listed in the chart below. Note that the tokenizer is not reading the entire string. To improve performance, it will only read a single token at a time and return its type. To access the next token, the evaluator will call the method tokenizer.nextToken(). When the tokenizer reaches the end of the string, it returns a special token end_of_formula.
enum eTokenType
operator_plus        +
operator_minus       -
operator_mul         *
operator_div         /
operator_percent     %
open_parenthesis     (
comma                ,
dot                  .
close_parenthesis    )
operator_ne          <>
operator_gt          >
operator_ge          >=
operator_eq          =
operator_le          <=
operator_lt          <
operator_and         AND
operator_or          OR
operator_not         NOT
operator_concat      &
value_identifier     any word starting with a letter or _
value_true           TRUE
value_false          FALSE
value_number         any number starting 0-9 or .
value_string         any string starting ' or "
open_bracket         [
close_bracket        ]
none                 initial state
end_of_formula       state once the last character is reached
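The one-token-at-a-time behaviour described above can be sketched as follows. This is a minimal illustration, not the library's tokenizer: MiniTokenizer, TokenType and the Kind field are hypothetical names, and only numbers, identifiers, + and * are recognized.

```vbnet
Imports System

Module TokenizerSketch
    ' Reduced, hypothetical version of the token type list
    Enum TokenType
        None
        Number
        Identifier
        Plus
        Star
        EndOfFormula
    End Enum

    ' Hypothetical stand-in for the article's tokenizer class
    Class MiniTokenizer
        Private mText As String
        Private mPos As Integer = 0
        Public Kind As TokenType = TokenType.None
        Public Value As String = ""

        Public Sub New(ByVal text As String)
            mText = text
            NextToken()   ' read only the first token
        End Sub

        ' Reads exactly one token and stops; the rest of the string is untouched
        Public Sub NextToken()
            While mPos < mText.Length AndAlso mText.Chars(mPos) = " "c
                mPos += 1
            End While
            If mPos >= mText.Length Then
                Kind = TokenType.EndOfFormula
                Value = ""
                Return
            End If
            Dim start As Integer = mPos
            Dim c As Char = mText.Chars(mPos)
            If Char.IsDigit(c) OrElse c = "."c Then
                While mPos < mText.Length AndAlso _
                      (Char.IsDigit(mText.Chars(mPos)) OrElse mText.Chars(mPos) = "."c)
                    mPos += 1
                End While
                Kind = TokenType.Number
            ElseIf Char.IsLetter(c) OrElse c = "_"c Then
                While mPos < mText.Length AndAlso _
                      (Char.IsLetterOrDigit(mText.Chars(mPos)) OrElse mText.Chars(mPos) = "_"c)
                    mPos += 1
                End While
                Kind = TokenType.Identifier
            ElseIf c = "+"c Then
                mPos += 1
                Kind = TokenType.Plus
            ElseIf c = "*"c Then
                mPos += 1
                Kind = TokenType.Star
            Else
                Throw New ArgumentException("Syntax error at position " & mPos.ToString())
            End If
            Value = mText.Substring(start, mPos - start)
        End Sub
    End Class

    Sub Main()
        Dim t As New MiniTokenizer("1+2*3+V1")
        While t.Kind <> TokenType.EndOfFormula
            Console.WriteLine(t.Kind.ToString() & " " & t.Value)
            t.NextToken()
        End While
    End Sub
End Module
```

Run on "1+2*3+V1", the sketch prints seven lines: Number 1, Plus +, Number 2, Star *, Number 3, Plus +, Identifier V1. An unknown character such as # raises an exception, mirroring the syntax error mentioned above.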
The parser has been completely rewritten in this version. The parser uses the information provided by the tokenizer (the big brown box) to build a set of objects out of it (the stack on the right). In my library, each of these objects is called an OpCode. Each OpCode returns a value and may or may not have parameters.
OpCodes in the stack: Opcode 1, Opcode 2, Opcode 3, Opcode *, and two Opcode +.
The OpCodes + and * have two parameters. The rest of the OpCodes have none. One of the more complicated concepts of the parser is that of priorities. In our expression...
1 + 2 * 3 + v1
...the evaluator has to understand that what we really mean is:
1 + (2 * 3) + v1
In other words, we need to do the multiplication first. So, how can this be done in one pass? At any time, the parser knows its level of priority:
enum ePriority
none = 0
concat = 1
or = 2
and = 3
not = 4
equality = 5
plusminus = 6
muldiv = 7
percent = 8
unaryminus = 9
When the parser encounters an operator, it recursively calls itself to get the right part. When the recursive call returns the right part, the operator can apply its operation (for example, +) and the parsing continues. The interesting part is that while parsing the right part, the parser already knows its current level of priority. Therefore, if it detects an operator with a higher priority, it continues parsing and returns only the resulting value.
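The recursion on priorities can be illustrated with a small sketch. It is not the library's parser: the names PrioritySketch, ParseExpr and Group are hypothetical, tokens are assumed to be separated by spaces, and instead of building OpCodes it returns a fully parenthesized string so the grouping is visible. Only the plusminus and muldiv levels of ePriority are reproduced.

```vbnet
Imports System

Module PrioritySketch
    ' Subset of the article's ePriority values
    Enum Priority
        None = 0
        PlusMinus = 6
        MulDiv = 7
    End Enum

    Private mTokens() As String
    Private mPos As Integer

    Function PriorityOf(ByVal op As String) As Priority
        Select Case op
            Case "+", "-"
                Return Priority.PlusMinus
            Case "*", "/"
                Return Priority.MulDiv
            Case Else
                Return Priority.None
        End Select
    End Function

    ' Parses operands and operators until it meets an operator whose
    ' priority is not higher than the caller's current priority.
    Function ParseExpr(ByVal current As Priority) As String
        Dim leftPart As String = mTokens(mPos)   ' operand: number or identifier
        mPos += 1
        While mPos < mTokens.Length
            Dim op As String = mTokens(mPos)
            Dim p As Priority = PriorityOf(op)
            If p <= current Then Exit While      ' leave this operator to the caller
            mPos += 1
            Dim rightPart As String = ParseExpr(p)   ' recursive call for the right part
            leftPart = "(" & leftPart & op & rightPart & ")"
        End While
        Return leftPart
    End Function

    Function Group(ByVal expression As String) As String
        mTokens = expression.Split(" "c)
        mPos = 0
        Return ParseExpr(Priority.None)
    End Function

    Sub Main()
        Console.WriteLine(Group("1 + 2 * 3 + v1"))   ' prints ((1+(2*3))+v1)
    End Sub
End Module
```

While parsing the right part of the first +, the sketch meets * (priority 7, higher than 6) and keeps going, but stops at the second + (priority 6, not higher than 6) and hands it back to the caller. That is what makes the multiplication bind first in a single pass.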
The last part of the evaluation process is the interpretation. This part is now running a lot faster thanks to the OpCode.
To get the result out of the stack of OpCodes, you just need to call the root OpCode value. In our sample, the root OpCode is a + operator. The property Value will in turn call the value of each of the operands and the result will be added and returned. As you can see from this picture, the speed of evaluation is now quite acceptable. The program below needs 3 full expression evaluations for every single pixel in the image. For this image, it required 196,608 evaluations and, despite that, it returned in less than a second.
The class at the core of this new project is the OpCode class. The key property in the opCode class is the property 'value'.
Public MustInherit Class opCode
    Public Overridable ReadOnly Property value() As Object _
        Implements iEvalValue.value
    MustOverride ReadOnly Property ReturnType() As evalType _
        Implements iEvalValue.evalType
    ...
End Class
Each OpCode returns its value through it. For the operator +, the value is calculated this way:
Return DirectCast(mParam1.value, Double) + DirectCast(mParam2.value, Double)
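To show how a tree of such objects evaluates itself, here is a reduced sketch. The class names MiniOpCode, ConstOpCode, AddOpCode and MulOpCode are hypothetical, and the values are typed as Double instead of Object to keep the example short; the library's real classes are richer.

```vbnet
Imports System

Module OpCodeSketch
    ' Hypothetical, reduced version of the opCode base class
    MustInherit Class MiniOpCode
        Public MustOverride ReadOnly Property Value() As Double
    End Class

    ' An OpCode without parameters: a constant
    Class ConstOpCode
        Inherits MiniOpCode
        Private mValue As Double
        Public Sub New(ByVal v As Double)
            mValue = v
        End Sub
        Public Overrides ReadOnly Property Value() As Double
            Get
                Return mValue
            End Get
        End Property
    End Class

    ' An OpCode with two parameters: addition
    Class AddOpCode
        Inherits MiniOpCode
        Private mParam1 As MiniOpCode
        Private mParam2 As MiniOpCode
        Public Sub New(ByVal p1 As MiniOpCode, ByVal p2 As MiniOpCode)
            mParam1 = p1
            mParam2 = p2
        End Sub
        Public Overrides ReadOnly Property Value() As Double
            Get
                ' Asking for the value asks each operand in turn
                Return mParam1.Value + mParam2.Value
            End Get
        End Property
    End Class

    ' An OpCode with two parameters: multiplication
    Class MulOpCode
        Inherits MiniOpCode
        Private mParam1 As MiniOpCode
        Private mParam2 As MiniOpCode
        Public Sub New(ByVal p1 As MiniOpCode, ByVal p2 As MiniOpCode)
            mParam1 = p1
            mParam2 = p2
        End Sub
        Public Overrides ReadOnly Property Value() As Double
            Get
                Return mParam1.Value * mParam2.Value
            End Get
        End Property
    End Class

    Sub Main()
        ' Tree for 1 + (2 * 3), built once by hand here
        Dim root As MiniOpCode = New AddOpCode(New ConstOpCode(1), _
            New MulOpCode(New ConstOpCode(2), New ConstOpCode(3)))
        ' Reading the root value can be repeated without parsing again
        Console.WriteLine(root.Value)   ' prints 7
    End Sub
End Module
```

Because the tree is built once and only the Value properties run on each evaluation, repeated evaluations skip tokenizing and parsing entirely.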
This version is faster if you need to evaluate an expression more than once. If you only need to evaluate it once, you might not care about speed anyway. So, I would recommend this new version in either case. As you can see from the picture above, 3 formulas are evaluated for every pixel of the image. The image being 256x256 pixels, the evaluator had to calculate 196,608 expressions. Since the whole image took less than a second, a simple expression is evaluated in roughly 5 microseconds. I think this is acceptable for most applications.
Dynamic variables are an interesting concept. The idea is that if you use several formulas in your application, you don't want to recalculate all the formulas when a variable changes. The evaluator has a built-in ability to do that. On this page, the program uses the dynamic ability:
To use this ability once you have parsed your expression:
mFormula3 = ev.Parse(tbExpression3.Text)
You only have to handle the event mFormula3.ValueChanged:
Private Sub mFormula3_ValueChanged( _
    ByVal Sender As Object, _
    ByVal e As System.EventArgs) _
    Handles mFormula3.ValueChanged
    Dim v As String = Evaluator.ConvertToString(mFormula3.value)
    lblResults3.Text = v
    LogBox3.AppendText(Now.ToLongTimeString() & ": " & v & vbCrLf)
End Sub
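One plausible way for such a notification to travel from a variable to a formula is sketched below. This is an assumption about the general mechanism, not a description of the library's internals: the classes Variable and DoubleFormula are hypothetical, and the formula is hard-coded as "x * 2".

```vbnet
Imports System

Module ValueChangedSketch
    ' Hypothetical variable that announces every change
    Class Variable
        Public Event ValueChanged As EventHandler
        Private mValue As Double
        Public Property Value() As Double
            Get
                Return mValue
            End Get
            Set(ByVal newValue As Double)
                If newValue <> mValue Then
                    mValue = newValue
                    RaiseEvent ValueChanged(Me, EventArgs.Empty)
                End If
            End Set
        End Property
    End Class

    ' Hypothetical formula "x * 2" that forwards the notification
    Class DoubleFormula
        Public Event ValueChanged As EventHandler
        Private WithEvents mSource As Variable
        Public Sub New(ByVal source As Variable)
            mSource = source
        End Sub
        Public ReadOnly Property Value() As Double
            Get
                Return mSource.Value * 2
            End Get
        End Property
        Private Sub Source_ValueChanged(ByVal sender As Object, _
            ByVal e As EventArgs) Handles mSource.ValueChanged
            ' Only formulas that depend on the variable are told about it
            RaiseEvent ValueChanged(Me, EventArgs.Empty)
        End Sub
    End Class

    Private mFormula As DoubleFormula

    Sub OnFormulaChanged(ByVal sender As Object, ByVal e As EventArgs)
        Console.WriteLine("Formula is now " & mFormula.Value.ToString())
    End Sub

    Sub Main()
        Dim x As New Variable
        mFormula = New DoubleFormula(x)
        AddHandler mFormula.ValueChanged, AddressOf OnFormulaChanged
        x.Value = 21   ' prints "Formula is now 42"
    End Sub
End Module
```

The point of the design is that a formula subscribes only to the variables it uses, so changing one variable wakes up only the formulas that depend on it.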
Yes, the evaluator supports the . operator. If you enter the expression theForm.text then the evaluator will return the title of the form. If you enter the expression theForm.left, it will return its runtime left position. This feature is only experimental and has not been tested yet. That is why I have put this code here, hoping that others will find its features valuable and submit their improvements.
In fact, object support came for free. I used System.Reflection to evaluate the custom functions. The same code is used to access the object's methods and properties. When the parser encounters an identifier that it does not otherwise recognize, it will try to reflect the CurrentObject to see if it can find a method or a property with the same name.
mi = CurrentObject.GetType().GetMethod(func, _
    Reflection.BindingFlags.IgnoreCase _
    Or Reflection.BindingFlags.Public _
    Or Reflection.BindingFlags.Instance)
If a method or a property is found, it will feed its parameters.
valueleft = mi.Invoke(CurrentObject, _
    System.Reflection.BindingFlags.Default, Nothing, _
    DirectCast(parameters.ToArray(GetType(Object)), Object()), Nothing)
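The lookup described above can be tried in isolation with the sketch below. The Sample class and the CallMember helper are hypothetical and only illustrate the pattern of resolving a name as a method first and as a property second; the library's own resolution code may differ.

```vbnet
Imports System
Imports System.Reflection

Module ReflectionSketch
    ' Hypothetical object exposed to the evaluator
    Class Sample
        Public Function Twice(ByVal x As Double) As Double
            Return x * 2
        End Function
        Public ReadOnly Property Title() As String
            Get
                Return "My form"
            End Get
        End Property
    End Class

    ' Resolves an identifier against an object: method first, then property
    Function CallMember(ByVal target As Object, ByVal name As String, _
                        ByVal args() As Object) As Object
        Dim flags As BindingFlags = BindingFlags.IgnoreCase _
            Or BindingFlags.Public Or BindingFlags.Instance
        Dim mi As MethodInfo = target.GetType().GetMethod(name, flags)
        If Not mi Is Nothing Then
            Return mi.Invoke(target, args)
        End If
        Dim pi As PropertyInfo = target.GetType().GetProperty(name, flags)
        If Not pi Is Nothing Then
            Return pi.GetValue(target, Nothing)
        End If
        Throw New ArgumentException("Unknown identifier: " & name)
    End Function

    Sub Main()
        Dim s As New Sample
        ' IgnoreCase lets the expression use any casing
        Console.WriteLine(CallMember(s, "twice", New Object() {1.5}))   ' prints 3
        Console.WriteLine(CallMember(s, "TITLE", Nothing))              ' prints My form
    End Sub
End Module
```

Only public instance members are found with these flags, which matches the earlier note that only public members are visible to the evaluator.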
The following are requests/bugs from the original project:
Someone reported that you need the option 'Compare Text' for the evaluator to work properly. I think this is fixed now. If you want the evaluator to be case-sensitive you can ask for it in the evaluator constructor.
Someone also reported that the evaluator did not like having a comma as a decimal point in the Windows international settings. This is fixed, too, I believe.
My request: If you find this library useful or interesting, don't forget to vote for me. :-)
Speed Tests: I wish I had the time to compare various eval methods. If someone wants to help, please contact me. To my knowledge, this is the only formula evaluator available on CodeProject with a separate Tokenizer, Parser and Interpreter. Extending it is extremely easy thanks to the internal use of System.Reflection.
Last Updated: 18 May 2007. Editor: Genevieve Sovereign
Copyright 2006 by Pascal Ganaye. Everything else Copyright © CodeProject, 1999-2009