Click here to Skip to main content
Click here to Skip to main content

Sprache.Calc: building yet another expression evaluator

, 10 Jul 2014
Rate this:
Please Sign up or sign in to vote.
This paper demonstrates a technique of building Sprache parsers using grammar inheritance.

Introduction

This paper demonstrates a technique of building parsers using grammar inheritance. Inheritance allows building new grammars based on existing grammars by adding new rules or overriding inherited rules, speeding up the development of new parsers. Any changes made to the base grammar become visible in inherited grammars automatically. This technique is especially helpful when a parser should support a number of language dialects or versions.

Grammar inheritance is supported in a number of parser toolkits (such as ANTLR or Nitra) and is inherently available in many tools that use object-oriented languages as DSL languages for describing grammars (such as Sprache and Irony).

Extensible expression evaluator library serves as an example code for this article. The evaluator supports arithmetic operations, custom functions and parameters. It takes string representation of an expression and converts it to a structured LINQ expression instance which can easily be compiled to an executable delegate. In contrast with interpreted expression evaluators such as NCalc, compiled expressions perform just as fast as native C# methods. Here is an example usage of a ready evaluator library:

// using variables
var s = "Sin(y/x)";
var expr = calc.ParseExpression(s, x => 2, y => 3.14);
var func = expr.Compile();
Console.WriteLine("Result = {0}", func());

// custom functions
calc.RegisterFunction("Mul", (a, b, c) => a * b * c);
expr = calc.ParseExpression("2 ^ Mul(PI, , b)", a => 2, b => 10);
Console.WriteLine("Result = {0}", func.Compile()());
Sprache.Calc Logo

Sprache.Calc library isn't just an example, it's a ready to use expression evaluator. To use it in your own projects, install Sprache.Calc NuGet package by typing the following command in the Package Manager Console:

PM> Install-Package Sprache.Calc

Sprache toolkit overview

Sprache toolkit is a lightweight functional library for constructing parsers directly in C# code. The authors claim that it fits somewhere in between regular expressions and full-featured toolsets like ANTLR and doesn't mean to compete with "industrial-strength" language workbenches.

I'd say that Sprache is an excellent tool, well suited for a wide range of parsing-related tasks. For me, the most attractive feature of Sprache is its ability to use LINQ query syntax to define parsers and strong-typed parsing results. Sprache stimulates step-by-step grammar design and encourages test-driven development of parsers. Of course, parser combinators aren't free from certain drawbacks (complicated diagnostic and poor error recovery capabilities), but these details are beyoud the scope of this paper.

Parsers in Sprache are functions transforming input string into something else. Unlike traditional parser frameworks, Sprache doesn't use code generation. Parsers are defined directly in the source code, and can be used to parse text right away. It allows writing unit tests in parallel with parsers themselves. Here is an example of a simple parser that consumes a sequence of A characters:

var parseA = Parse.Char('A').AtLeastOnce();

Simple parsers are combined into complex ones using Sprache's built-in extension-methods (such as Or, And, Many, etc), often callable via LINQ query comprehensions:

Parser<string> identifier =
    from leading in Parse.WhiteSpace.Many()
    from first in Parse.Letter.Once()
    from rest in Parse.LetterOrDigit.Many()
    from trailing in Parse.WhiteSpace.Many()
    select new string(first.Concat(rest).ToArray());

var id = identifier.Parse(" abc123  ");
Assert.AreEqual("abc123", id);

The complete rule set, or language grammar, typically looks like a static class with delegate fields. To learn more about Sprache toolkit, have a look at this introductory article by Nicholas Blumhardt.

Expression evaluator design

Expression evaluator supports three modes: simple, scientific and extensible. Simple evaluator supports arithmetic operations and decimal floating-point operands, unary minus and parentheses. Scientific mode adds support for hexadecimal and binary numbers, exponential notation and System.Math functions. Extensible mode allows using parameterized expressions and custom functions with overloading support.

Every next mode supports all features of its predecessors and adds its own features, so the grammars describing the input language of expression evaluators are organized as hierarchy.

Expression parser is a function transforming a source string into a LINQ expression which can be easily compiled to a delegate and executed just as normal .NET method:

var expr = "4*(1/1-1/3+1/5-1/7+1/9-1/11+1/13-1/15+1/17-1/19+10/401)";
var func = calc.ParseExpression(expr).Compile();
var result = func();

Simple evaluator

Simple evaluator is based on Sprache LinqyCalculator sample. The grammar is optimized in a way to simplify the instantiation of LINQ expressions during parsing:

Expr ::= Term ("+"|"-" Term)*
Term ::= InnerTerm ("*"|"/"|"%" InnerTerm)*
InnerTerm ::= Operand ("^" Operand)
Operand ::= NegativeFactor | Factor
NegativeFactor ::= "-" Factor
Factor ::= "(" Expr ")" | Constant
Constant ::= Decimal

Sprache parsers are normally defined as static lambda-functions. Static fields can't be redefined in derived classes, so we'll define them as virtual properties instead:

// Original parser:
public static readonly Parser<Expression> ExpressionInParentheses =
	from lparen in Parse.Char('(')
	from expr in Expr
	from rparen in Parse.Char(')')
	select expr;

// Modified parser:
protected virtual Parser<Expression> ExpressionInParentheses
{
	get
	{
		return
			from lparen in Parse.Char('(')
			from expr in Expr
			from rparen in Parse.Char(')')
			select expr;
	}
}

After this fairly trivial transformation, any rule can be overriden in derived classes. To enable unit testing for parsers, declare properties as protected internal.

The full grammar of the simple calculator is available here. It's effectively the same as Sprache's LinqyCalculator sample.

Scientific mode

Scientific calculator inherits from simple calculator, so it supports all features of simple calculator automatically. For binary and hexadecimal numbers support it introduces new grammar rules:

protected virtual Parser<string> Binary
{
	get
	{
		return Parse.IgnoreCase("0b").Then(x =>
			Parse.Chars("01").AtLeastOnce().Text()).Token();
	}
}

protected virtual Parser<string> Hexadecimal
{
	get
	{
		return Parse.IgnoreCase("0x").Then(x =>
			Parse.Chars("0123456789ABCDEFabcdef").AtLeastOnce().Text()).Token();
	}
}

Adding new rules is not enough because base grammar has no idea when to apply them. Binary and hexadecimal numbers are constants, so we'll add them to the Constant rule of base grammar. Note that Binary and Hexadecimal parsers return string while Constant parser returns LINQ expression, so we'll need helper methods to convert strings into LINQ expressions. Finally, Constant parser with decimal, binary and hexadecimal numbers support will look like this:

protected override Parser<Expression> Constant
{
	get
	{
		return
			Hexadecimal.Select(x => Expression.Constant((double)ConvertHexadecimal(x)))
			.Or(Binary.Select(b => Expression.Constant((double)ConvertBinary(b))))
			.Or(base.Constant);
	}
}

Now let's add support for functions. We'll need two more rules:

protected virtual Parser<string> Identifier
{
	get { return Parse.Letter.AtLeastOnce().Text().Token(); }
}

protected virtual Parser<Expression> FunctionCall
{
	get
	{
		return
			from name in Identifier
			from lparen in Parse.Char('(')
			from expr in Expr.DelimitedBy(Parse.Char(',').Token())
			from rparen in Parse.Char(')')
			select CallFunction(name, expr.ToArray());
	}
}

CallFunction helper method simply creates a LINQ expression for calling System.Math static methods:

protected virtual Expression CallFunction(string name, Expression[] parameters)
{
	var methodInfo = typeof(Math).GetMethod(name, parameters.Select(e => e.Type).ToArray());
	if (methodInfo == null)
	{
		throw new ParseException(string.Format("Function '{0}({1})' does not exist.",
			name, string.Join(",", parameters.Select(e => e.Type.Name))));
	}

	return Expression.Call(methodInfo, parameters);
}

Again, we need to override a base grammar rule to suggest when to apply the new rules. Now, finding a rule to override is not as obvious as it was with binary and hexadecimal constants. The proper rule should be chosen based on function call operation priority. It can easily be seen that it should have the highest priority, same as parentheses. For example, when evaluating Sin(2) * Cos(3) we have to calculate function values and then perform multiplication.

In base grammar parentheses are featured in the Factor rule, so we'll override it. By calling base.Factor we indicate that we still support everything supported by the base grammar:

protected override Parser<Expression> Factor
{
	get { return base.Factor.XOr(FunctionCall); }
}

Adding custom functions

The most advanced version of expression evaluator inherits from scientific calculator class. Adding custom functions doesn't introduce new syntax, so we don't need to add new grammar rules. All we need to modify is the method that generates LINQ expressions for function calls. Here is its pseudocode:

override Expression CallFunction(string name, Expression[] parameters)
{
	if there is a custom function named 'name',
	{
		return an expression to call this function
			with the given parameters;
	}

	// otherwise, fall back to System.Math method
	return base.CallFunction(name, parameters);
}

Any custom function for our calculator can be presented as a Func<double[], double> delegate. Named functions can be stored as a Dictionary<string, Func<double[], double>>, and all we need to do to support overloaded functions is to add number of arguments to the function name, i.e.:

"Min:2" -- Min function with two arguments
"Min:5" -- Min function with five arguments, etc.

The pseudocode above turns into something like that:

protected override Expression CallFunction(string name, Expression[] parameters)
{
	// try to find a custom function
	var mangledName = name + ":" + parameters.Length;
	if (CustomFuctions.ContainsKey(mangledName))
	{
		return Expression.Call(...); // call named function
	}

	// fall back to System.Math method
	return base.CallFunction(name, parameters);
}

There's an issue with Expression.Call expression we need to generate. The story is that Expression.Call requires a MethodInfo instance, so it only supports calling existing methods while custom functions are created at runtime. To work around it, we define an adapter method for calling named functions:

private double CallCustomFunction(string name, double[] parameters)
{
	return CustomFuctions[name](parameters);
}

This method exists at compile-time, and it can be used to create an Expression.Call expression. To get a MethodInfo instance for this method, we use a Func delegate constructor (it's faster and safer than using System.Reflection). List of parameters should be converted into an array using Expression.NewArrayInit as follows:

protected override Expression CallFunction(string name, Expression[] parameters)
{
	// try to find a custom function
	var mangledName = name + ":" + parameters.Length;
	if (CustomFuctions.ContainsKey(mangledName))
	{
		// prepare parameters for CallCustomFunction
		var newParameters = new List<Expression>();
		newParameters.Add(Expression.Constant(mangledName));
		newParameters.Add(Expression.NewArrayInit(typeof(double), parameters));

		// call this.CallCustomFunction(mangledName, double[]);
		var callCustomFunction = new Func<string, double>(CallCustomFunction).Method;
		return Expression.Call(Expression.Constant(this), callCustomFunction, newParameters.ToArray());
	}

	// fall back to System.Math method
	return base.CallFunction(name, parameters);
}

Adding parameters to expressions

Adding parameters requires a new syntax. A parameter is just an identifier that can take place anywhere a constant or a function call can occur:

protected virtual Parser<Expression> Parameter
{
	get { return Identifier; }
}

protected override Parser<Expression> Factor
{
	get { return Parameter.Or(base.Factor); }
}

Here we finally encounter a conflict in the grammar! Factor rule has two alternatives that both start with an identifier: Parameter and Function. When the parser finds an identifier, it has no idea what to parse next (a parameter or a function) unless it looks ahead of it. If the identifier is followed by a "(" symbol, it must be a function, otherwise it's a parameter

As far as I know, Sprache doesn't help finding such conflicts. Full-featured workbenches like ANTLR or Irony will automatically detect them while processing your grammar, but with Sprache, you'll only detect them via failed unit tests (that's why tests are so important for Sprache-based parsers).

Our conflict is quite easy to solve: we just define Parameter as "an identifier not followed by a "(" symbol". This rule will not cause the conflict because it eliminates ambiguity. Here is how it looks like:

protected virtual Parser<Expression> Parameter
{
	get
	{
		// identifier not followed by a '(' is a parameter reference
		return
			from id in Identifier
			from n in Parse.Not(Parse.Char('('))
			select GetParameterExpression(id);
	}
}

Parse.Not parser is similar to negative lookahead in regular expressions: it doesn't move the current position and it succeeds if the parser given as parameter, in this case, Parse.Char('('), fails.

Handling parameters

In contrast with custom functions which are bound to the instance of evaluator, parameters are bound to expressions. An expression can be evaluated multiple times using different parameter values. Named parameters are passed as Dictionary<string, double>.

To handle parameters, we need to change our generated expression type from Expression<Func<double>> (parameterless function returning double value) to Expression<Func<Dictionary<string, double>, double>> (a function that takes a dictionary of parameters and returns double value). Parameterized expressions are used as follows:

var function = calc.ParseFunction("y/x").Compile();
var parameters = new Dictionary<string, double>{ { "x", 2 }, { "y", 4 } };
var result = function(parameters);

Let's generate a LINQ expression for accessing named parameter. If parameter name is unknown, we can try looking at System.Math constants. This trick allows using built-in math constants such as PI an E automatically:

protected virtual Expression GetParameterExpression(string id)
{
	// TODO: try to get a named parameter
	// return an expression for accessing parameters[id]

	// try to find a constant (static field) in System.Math class
	var systemMathConstants = typeof(System.Math).GetFields(BindingFlags.Public | BindingFlags.Static);
	var constant = systemMathConstants.FirstOrDefault(c => c.Name == id);
	if (constant == null)
	{
		throw new ParseException(string.Format("Parameter or constant '{0}' does not exist.", id));
	}

	// retur System.Math constant value
	return Expression.Constant(constant.GetValue(null));
}

Generating an indexer access expression such as "parameters[name]" might look tricky unless you remember that C# compiler treats an indexer as a property named "Item". By convention property getters are named like "get_PropertyName" and setters like "set_PropertyName", so reading a named parameter is just calling "get_Item" method:

var getItemMethod = typeof(Dictionary<string, double>).GetMethod("get_Item");
return Expression.Call(ParameterExpression, getItemMethod, Expression.Constant(name));

To keep things simple, our generated expression won't check whether the parameter with the given name exists. Dictionary class will check it anyway and throw an exception in case of emergency. Here is a full body of parameter access generator code:

protected virtual Expression GetParameterExpression(string name)
{
	// try to find a constant in System.Math
	var systemMathConstants = typeof(System.Math).GetFields(BindingFlags.Public | BindingFlags.Static);
	var constant = systemMathConstants.FirstOrDefault(c => c.Name == name);
	if (constant != null)
	{
		// return System.Math constant value
		return Expression.Constant(constant.GetValue(null));
	}

	// return parameter value: Parameters[name]
	var getItemMethod = typeof(ParameterList).GetMethod("get_Item");
	return Expression.Call(ParameterExpression, getItemMethod, Expression.Constant(name));
}

Syntactic sugar for parameters

Usage example in the beginning of this article features this syntactic sugar for specifying parameter values:

var expr = calc.ParseExpression("Sin(y/x)", x => 2, y => System.Math.PI);

This syntax is much more compact and easier to read than creating and populating a Dictionary<string, double> instance. This syntax comes in handy when the list of available parameters is fixed. Although it's a bit off-topic, here's how it works:

public Expression<Func<double>> ParseExpression(string text,
	params Expression<Func<double, double>>[] parameters)
{
	var paramList = new Dictionary<string, double>();
	foreach (var p in parameters)
	{
		var paramName = p.Parameters.Single().Name;
		var paramValue = p.Compile()(0);
		paramList[paramName] = paramValue;
	}

	return ParseExpression(text, paramList);
}

A word about unit tests

Grammar inheritance adds extra burden for unit-tested Sprache parsers. Unit tests must ensure that every parser class still works with new and overridden rules. We have to test not only methods added in each class, but also all inherited methods.

The straightforward solution to this problem is using inheritance in unit tests. Base test class defines a virtual method CreateCalculator that creates an instance of expression evaluator used in unit tests. Derived test class overrides CreateCalculator method and returns a superclass instance instead, so all test methods inherited from base test class automatically test this superclass instance.

Conclusion

Grammar inheritance is a powerful technique that helps to keep complexity under control and to speed up the development of new parsers. Thanks to inheritance, most rules defined in base grammar are reused in derived grammars which allows to focus developer's efforts on new rules and features different from the base version. In Sprache-based parsers, grammar inheritance stimulates defining smaller reusable rules, step-by-step and test-driven development of parsers. Here is a quick summary:

  • Declare parsers as virtual properties instead of static fields
  • Decompose the grammar into small and reusable rules
  • Write unit tests for every single atomic parser
  • Use "protected internal" access modifier to enable unit testing
  • Unit tests must be organized in the same hierarchy as parser classes

References

History

  • 10.07.2014 Initial post

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Alexeу Yakovlev
Software Developer (Senior) ULTIMA Businessware
Russian Federation Russian Federation
No Biography provided
Follow on   Google+

Comments and Discussions

 
GeneralMy vote of 5 PinprofessionalCatchExAs10-Jul-14 8:41 
GeneralThanks for reading! PinprofessionalAlexeу Yakovlev10-Jul-14 21:49 
GeneralMy vote of 5 PinpremiumPrasad Khandekar9-Jul-14 23:54 
GeneralThanks! PinprofessionalAlexeу Yakovlev10-Jul-14 0:17 
QuestionVery good article Pinmembermartin.nedopil9-Jul-14 20:03 
AnswerRe: Very good article PinprofessionalAlexeу Yakovlev9-Jul-14 20:49 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

| Advertise | Privacy | Mobile
Web02 | 2.8.140721.1 | Last Updated 10 Jul 2014
Article Copyright 2014 by Alexeу Yakovlev
Everything else Copyright © CodeProject, 1999-2014
Terms of Service
Layout: fixed | fluid