LeMP enhances C# in many ways. In this post, we'll see how easy it is to write a program to generate C# code, or even analyze existing code.
Introduction
I planned to write an article about the new pattern-matching and "algebraic data type" features I added to C# via LeMP, but then I saw the new WuffProjects.CodeGeneration library and thought "wait a minute, LeMP has made that easy for a year now!" In fact, LeMP can do some pretty neat stuff, as you'll see!
LeMP is a macro processor for a superset of C# called "Enhanced C#". If you've ever used sweet.js, LeMP is basically the same thing for C#, just not as polished. Also, whereas sweet.js seems focused on letting you create your own macros, LeMP comes with many useful built-in macros, but creating new ones isn't as easy (yet).
So here's the scenario: you want to write a program that generates C# source code, and either runs it or analyzes it somehow. How should you do it?
In fact, this article also shows how to parse and analyze C# source code, but I'll focus first on code generation. This article also contains links to some fascinating stuff, so try to read to the end before you click off to somewhere else... this article gets more interesting (IMO) as you get farther into it!
Background: The Old Ways
First, let's touch on a couple of alternatives:
- Obviously, you can generate C# code as a string and simply write it to a file. That's okay for simple tasks, but it can be challenging to ensure that the syntax and spacing are correct. Therefore, many people use CodeDom, but that's a "lowest common denominator" technology; the constructs supported by CodeDom are limited.
- If you'd like to generate code inside Visual Studio, the most well-known approach is to use T4 templates. However, it may be better to use LeMP instead, because
- the output looks better (no ugly indentation problems)
- your input file is much more elegant
- LeMP has lots of built-in features to enhance C#, and
- your input file gets syntax highlighting, thanks to a free Visual Studio extension.
Let me know in the comments what you'd like to accomplish with LeMP; I might be able to help!
My LL(k) parser generator needed a robust way to generate C# code, and for this, it uses Loyc trees printed by the Enhanced C# printing engine in Loyc.Ecs.dll (formerly Ecs.exe), all of which is part of the Loyc repo on GitHub.
In layman's terms, you could use somewhat scary code like this to generate C# (without any LeMP goodness):
public static void Main(string[] args)
{
File.WriteAllText("helloWorld.cs", HelloWorldProgram("Hello, World!"));
}
static string HelloWorldProgram(string whatToPrint)
{
var F = new LNodeFactory(EmptySourceFile.Unknown);
var code = LNode.List(
F.Call(CodeSymbols.Import, F.Id("System")),
F.Call(CodeSymbols.Import, F.Dot(F.Id("System"), F.Id("Collections"), F.Id("Generic"))),
F.Call(CodeSymbols.Namespace, F.Id("Namespaze"), F.Missing, F.Braces(
F.Call(CodeSymbols.Class, F.Id("Klass"), F.List(), F.Braces(
F.Fn(F.Void, F.Id("Main"), F.List(), F.Braces(
F.Call(F.Dot(F.Id("Console"), F.Id("WriteLine")), F.Literal(whatToPrint))
))
))
)));
return EcsLanguageService.WithPlainCSharpPrinter.Print(code);
}
Then HelloWorldProgram("Hello, World!") returns:
using System;
using System.Collections.Generic;
namespace Namespaze
{
class Klass
{
void Main()
{
Console.WriteLine("Hello, World!");
}
}
}
It's a little easier if you ask the parser to do some of the work, and then do a find-and-replace...
static string HelloWorldProgram(string whatToPrint)
{
var F = new LNodeFactory(EmptySourceFile.Unknown);
IEnumerable<LNode> code = EcsLanguageService.Value.Parse(@"
using System;
using System.Collections.Generic;
namespace Namespaze {
class Klass {
void Main() {
Console.WriteLine(PLACEHOLDER);
}
}
}",
MessageSink.Console, ParsingService.Stmts);
code = code.Select((LNode stmt) =>
stmt.ReplaceRecursive(expr => {
if (expr.IsIdNamed("PLACEHOLDER"))
return F.Literal(whatToPrint);
return null;
}));
return EcsLanguageService.WithPlainCSharpPrinter.Print(code);
}
But this isn't really what you want, since syntax errors are not detected at compile-time, and you're wasting runtime CPU cycles on parsing code instead of generating it.
Introducing LeMP
LeMP lets you do code generation with a "literal" representation of the code (in comp-sci jargon, it makes C# pretend to be homoiconic). For example, suppose you want to generate a method called Square() that takes a parameter of a user-defined type T and squares it. You'll be able to write that as:
static LNode GetSquareFunction(LNode T) {
return quote {
public static $T Square($T x) => x*x;
};
}
But first, you'll need to install the code generator in Visual Studio.
Installing LeMP
First, download and extract the zip file, or clone the Loyc repository from GitHub, since I haven't yet figured out how to do that magic auto-installation via NuGet.
Next, browse to the Lib\LeMP folder and run Lib\LeMP\LoycFileGeneratorForVs.exe to install the LeMP and LLLPG Custom Tools (a.k.a. Single-File Generators). Make sure your version of Visual Studio is listed, and click Register (install).
Note: The custom tools run in-place; they are not copied anywhere else. Visual Studio versions 2008 through 2015 are supported.
To install syntax highlighting for .ecs and .les files, run Lib\LeMP\LoycSyntaxForVs.vsix. Visual Studio versions 2010 through 2015 are supported.
Next, create a new C# project in Visual Studio (or open an existing one), and then create a new text file named example.ecs.
Finally, open the Properties panel and change the Custom Tool option to LeMP. An output file called example.out.cs should appear under example.ecs. To make sure it's all working fine, paste a little code in the new file, e.g.:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows;
namespace Loyc.Ecs {
class Person {
public Person(public readonly string Name,
public int WeightLb, public int Age) {}
}
}
Warning: Before installing a new version of LeMP or LLLPG, you must uninstall the old syntax highlighter (Tools | Extensions and Updates | LoycSyntaxForVS | Uninstall). A version mismatch between the two will cause the LeMP or LLLPG Custom Tool to stop working (typically with a MissingMethodException or a failure to load an assembly).
As of this writing, the Loyc libraries have a version number of 1.5.*.
You can use basic features of LeMP already, but if you want to write a code generator, you'll also have to add references to the DLLs you'll be using in your program (see below).
Using LeMP to Write a Code Generator
LeMP itself is a code generator, so what I'm doing now is showing you how to use a code generator in Visual Studio (LeMP) to generate another code generator that runs outside Visual Studio. (You could then, if you wanted, reprogram your code generator to run inside Visual Studio by reading my article about Custom Tools, or better yet, by writing a macro to be called by LeMP itself.)
This may sound complicated, but it's easy to do, at least after you've installed LeMP, made an example.ecs file in your project and assigned LeMP as the Custom Tool.
You'll need to add references to the following assemblies from your copy of LeMP:
- Loyc.Essentials.dll
- Loyc.Collections.dll
- Loyc.Syntax.dll
- Loyc.Ecs.dll
Put the following code in your example.ecs file:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Loyc;
using Loyc.Collections;
using Loyc.Syntax;
using Loyc.Ecs;
namespace Loyc.Ecs {
class Example {
public static string HelloWorldProgram(string whatToPrint) {
LNode code = quote {
using System;
using System.Collections.Generic;
namespace Namespaze {
class Klass {
void Main() {
Console.WriteLine($(LNode.Literal(whatToPrint)));
}
}
}
};
return EcsLanguageService.WithPlainCSharpPrinter.Print(code.Args);
}
}
}
Then locate your Main method and add a call to Console.WriteLine(Example.HelloWorldProgram("Howdy folks!")). Run and make sure it works.
The reason we're using LeMP instead of plain C# is that LeMP includes a neat trick called "quote", which allows us to generate syntax trees inside our C# code. In this case, we've quoted an entire C# source file:
LNode code = quote {
using System;
using System.Collections.Generic;
namespace Namespaze {
class Klass {
void Main() {
Console.WriteLine($(LNode.Literal(whatToPrint)));
}
}
}
};
LNode (short for Loyc tree node) is a flexible "generic" syntax tree which could, theoretically, represent code in any programming language, but happens (at the moment) to represent C# code. An LNode is immutable (read-only).
quote is a macro: a function that transforms one syntax tree into another. It generates code to construct the syntax tree you asked for; for example, if you write:
LNode call = quote(func(12345));
You'll see code in your output file (example.out.cs) to create a syntax tree representing a call to func with 12345 as its argument list:
LNode call = LNode.Call((Symbol) "func", LNode.List(LNode.Literal(12345)));
quote allows you to insert subtrees into your tree. For example, if you write:
LNode assignment = quote(x = $call);
quote assumes that $call refers to a variable of type LNode called call, so it inserts call into the output, like this:
LNode assignment = LNode.Call(CodeSymbols.Assign,
LNode.List(LNode.Id((Symbol) "x"), call)).SetStyle(NodeStyle.Operator);
quote accepts either an (expression in parentheses) or a { statement in braces; }. When using braces, make sure to add a semicolon at the end of the statement! If you'd like to create a syntax tree that itself represents a braced block, you'll need to use double braces as in quote {{ stuff; }}.
Because the output from quote refers to data types such as Symbol, CodeSymbols, and LNode, you may need to add references to the following namespaces when using this macro:
using Loyc;
using Loyc.Syntax;
Macros themselves cannot make non-local changes, so quote itself cannot add these using directives on your behalf.
Other useful namespaces include:
using Loyc.Collections;
using Loyc.Ecs;
Generating Code in a Loop
If you're generating code, a common task is generating a sequence of similar statements, methods, or data types.
The normal data type for lists of LNode is called VList<LNode>. Note that VLists are value types, so they can only be empty, never null.
As an example, here's one way to generate a sequence of using directives from a sequence of namespaces:
VList<LNode> namespaces = quote(System, System.Text, System.Linq).Args;
VList<LNode> usings = LNode.List(namespaces.Select(ns => quote { using $ns; }));
Note: quote always produces a single LNode. If you quote multiple things, the outer node will be a call to the special identifier #splice; in this case, #splice(System, System.Text, System.Linq). By writing quote(...).Args, we are extracting the three arguments to the #splice pseudo-function.
Alternately, you could use a loop:
VList<LNode> namespaces = quote(System, System.Text, System.Linq).Args;
VList<LNode> usings = LNode.List();
foreach (var ns in namespaces)
usings.Add(quote { using $ns; });
If you have a VList<LNode> or IEnumerable<LNode>, you can "splice" it into an argument list by using $(..list) inside of a quote. For example, given a list of method arguments, we might like to splice them into a method declaration:
VList<LNode> args = quote{ string second; object third; }.Args;
LNode function = quote {
void function(int first, $(..args), long fourth) {}
};
Console.WriteLine(EcsLanguageService.Value.Print(function));
The output is:
void function(int first, string second, object third, long fourth)
{
}
As you can see, the $(..args) expression causes the nodes in args to be expanded and treated as part of the function's argument list.
A peculiar thing here is that string second; object third; are separated by semicolons, even though arguments in a method's argument list are separated by commas. In fact, if you try to separate these variables by commas, you'll get a syntax error. So what's really going on here?
There are two parts to the answer.
- First, the syntax: why are there semicolons? The Enhanced C# parser used by LeMP has no idea that the variables second and third will be used later in a function argument list. All it sees are two ordinary variable declarations inside braces. Because of the braces, a semicolon is required at the end of each statement. Note: If you change the code to quote(string second, object third), you'll actually get a syntax error because in that case, the parser treats quote() as an ordinary function call, so its arguments are not allowed to be (unassigned) variable declarations.
- Second: why does this work? You can insert what appear to be statements into the middle of an argument list because a syntax tree that represents a variable declaration like "int x;" is identical to the syntax tree that represents a method argument like "int x". This is a property of the mapping from "Enhanced C#" (the syntax accepted by LeMP) to Loyc trees, and this mapping tends to be designed in such a way that you can transplant syntax trees from one place to another and it "just works" the way you want it to.
I won't distract you with too much depth on this topic; if you want to know more, please ask.
Converting Code to Text and Compiling It
To convert an LNode to text, you can use EcsLanguageService.WithPlainCSharpPrinter.Print(lnode) as shown earlier. Using EcsLanguageService.WithPlainCSharpPrinter instead of EcsLanguageService.Value tells the printer to avoid using syntax that is not part of "plain-old" C#, if possible.
You can also simply call LNode.ToString() as in:
Console.WriteLine("{0}", quote { class Foo {} });
But this doesn't work the way you want, because the default output language is LES, not C#. You can, however, change the current output language to C# by using (LNode.PushPrinter(...)):
using (LNode.PushPrinter(EcsLanguageService.WithPlainCSharpPrinter.Printer))
Console.WriteLine("{0}", quote { class Foo {} });
That's better! Having converted your code to a string, you can run it using the CSharpCodeProvider in System.dll. Here's a demonstration:
static void CompileAndRun()
{
VList<LNode> code = quote {
using System;
namespace Example {
public class Code {
public static double Square(double x) { return x*x; }
}
}
}.Args;
string[] codeStrings = {EcsLanguageService.Value.Print(code)};
Assembly asm = CompileToAssembly(codeStrings,
new[] { "System.dll" }, MessageSink.Console);
var module = asm.GetModules()[0];
do {
if (module != null) {
Type mt = module.GetType("Example.Code");
if (mt != null) {
MethodInfo methInfo = mt.GetMethod("Square");
if (methInfo != null) {
double n = 9.0;
Console.WriteLine("The Square of {0} is {1}", n,
methInfo.Invoke(null, new object[] { n }));
break;
}
}
}
Console.WriteLine("Failed to locate method");
} while (false);
}
static Assembly CompileToAssembly(string[] sourceFiles, string[] references = null,
IMessageSink sink = null)
{
references = references ?? new[] { "System.dll" };
sink = sink ?? MessageSink.Current;
CompilerParameters CompilerParams = new CompilerParameters();
CompilerParams.GenerateInMemory = true;
CompilerParams.TreatWarningsAsErrors = false;
CompilerParams.GenerateExecutable = false;
CompilerParams.CompilerOptions = "/optimize";
CompilerParams.ReferencedAssemblies.AddRange(references);
CSharpCodeProvider provider = new CSharpCodeProvider();
CompilerResults compile = provider.CompileAssemblyFromSource(CompilerParams, sourceFiles);
StringBuilder msgs = new StringBuilder("Compiler errors:\n");
foreach (CompilerError msg in compile.Errors) {
LogMessage lmsg = new LogMessage(msg.IsWarning ? Severity.Warning : Severity.Error,
new LineAndPos(msg.Line, msg.Column), "{0}: {1}", msg.ErrorNumber, msg.ErrorText);
lmsg.WriteTo(sink);
msgs.Append(lmsg.ToString() + "\n");
}
if (compile.Errors.HasErrors)
throw new FormatException(msgs.ToString());
return compile.CompiledAssembly;
}
It's that easy, although error handling can be quite a pain if the source code doesn't exist in a file anywhere, since the locations mentioned in the error messages don't tell you much.
Analyzing and Manipulating Syntax Trees
What else can you do with a syntax tree?
Writing It to a File
That's too easy:
VList<LNode> code = quote {
using System;
class HelloWorld {
public static void Main(string[] args) {
Console.WriteLine("I'm not talking to you.");
}
}
}.Args;
string text = EcsLanguageService.WithPlainCSharpPrinter.Print(code,
MessageSink.Console, ParsingService.File);
File.WriteAllText("HelloWorld.cs", text);
Finding and Replacing
Given an LNode, you can use ReplaceRecursive to find something, and optionally change it. For example, the following code finds every literal true in a code block and changes it to !false:
node = node.ReplaceRecursive(expr => {
if (true.Equals(expr.Value))
return quote(!false);
return null;
});
Remember that LNode is immutable, so this doesn't change the existing syntax tree, it creates a new one with some part(s) changed. That's why we write node = node.ReplaceRecursive(...).
If you simply want to search for "everything that matches a certain pattern" and not change the syntax tree, you should still use the same ReplaceRecursive function, but always return null from it. If you want to avoid examining children of a particular node, simply return the same node you were given, which prevents children from being scanned without creating a new syntax tree:
node = node.ReplaceRecursive(expr => {
matchCode (expr) {
case { $_ $method($(.._)) { $(.._); } }:
return expr;
}
...
});
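To make the "null means keep looking, non-null means use this and stop" rule concrete, here is a minimal sketch in plain C# that needs no Loyc references. ToyNode is a hypothetical stand-in invented for this illustration; it is not LNode, and the real ReplaceRecursive may differ in its details. The sketch only assumes the behavior described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy tree for illustration; this is NOT the Loyc LNode class.
public sealed class ToyNode
{
    public readonly string Name;
    public readonly IReadOnlyList<ToyNode> Args;

    public ToyNode(string name, params ToyNode[] args)
    {
        Name = name;
        Args = args.ToArray(); // private copy, so the node cannot change later
    }

    // Calls selector on this node. A non-null result replaces the node and
    // its children are not visited. A null result means "no change here",
    // so the children are visited instead.
    public ToyNode ReplaceRecursive(Func<ToyNode, ToyNode> selector)
    {
        ToyNode replacement = selector(this);
        if (replacement != null)
            return replacement;

        ToyNode[] newArgs = Args.Select(a => a.ReplaceRecursive(selector)).ToArray();

        // Reuse this node when no child changed; otherwise build a new parent.
        bool changed = false;
        for (int i = 0; i < newArgs.Length; i++)
            if (!ReferenceEquals(newArgs[i], Args[i]))
                changed = true;
        return changed ? new ToyNode(Name, newArgs) : this;
    }

    public override string ToString()
    {
        return Args.Count == 0 ? Name : Name + "(" + string.Join(", ", Args) + ")";
    }
}
```

Calling ReplaceRecursive with a selector that always returns null hands back the very same object, which is why a pure search costs no new allocations in this sketch. Returning a replacement for a node builds new parents along the path to the root and leaves the original tree untouched.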
Pattern Matching Using matchCode
The matchCode macro provides pattern matching for a single LNode. You can use $ to create variables that "capture" part of the syntax tree, or use $_ for parts you don't care about. For example:
static Symbol GetLoopType(LNode code)
{
matchCode(code) {
case { while($_) $_; }: return CodeSymbols.While;
case { for($_; $_; $_) $_; }: return CodeSymbols.For;
case { do $_; while($_); }: return CodeSymbols.Do;
default: return null;
}
}
What's a Symbol?
A Symbol is a kind of singleton string. Many programming languages, including Ruby and ECMAScript 6, have a "symbol" concept, but since .NET doesn't have a standard Symbol type, the Loyc libraries have their own. A WeakValueDictionary is used to store "global" Symbols; when you do (Symbol) "mySymbol" you are looking up an existing global symbol or creating a new one if it doesn't exist yet.
The main advantage of Symbol over string is that, since Symbols are singletons, two Symbols never need to be compared for equality like strings do, which improves their performance. If two Symbol references are equal then the symbols are equal, otherwise they are not equal; there is no need to compare the Symbol.Name inside the two symbols, and in fact no operator== is defined for Symbol. I introduced Symbol in an old CodeProject article.
Common C# symbols (for keywords and operators) are defined in the CodeSymbols class.
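If the "singleton string" idea is new to you, the following sketch shows the general technique in plain C#. MiniSymbol is a hypothetical class made up for this illustration, not the Loyc Symbol type, and it uses an ordinary Dictionary where Loyc uses a weak-value dictionary, so treat it as a simplification under those assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; this is NOT the Loyc Symbol class.
public sealed class MiniSymbol
{
    // One global table maps each name to its single shared instance.
    // (A plain Dictionary keeps the sketch short; entries are never freed.)
    static readonly Dictionary<string, MiniSymbol> _table =
        new Dictionary<string, MiniSymbol>();

    public string Name { get; private set; }

    // Private constructor: instances can only be obtained through Get().
    MiniSymbol(string name) { Name = name; }

    // Returns the existing symbol for this name, or creates it on first use.
    public static MiniSymbol Get(string name)
    {
        lock (_table) {
            MiniSymbol sym;
            if (!_table.TryGetValue(name, out sym))
                _table[name] = sym = new MiniSymbol(name);
            return sym;
        }
    }

    public override string ToString() { return Name; }
}
```

Because Get() hands out one instance per name, comparing two MiniSymbol references with == is enough to tell whether the names are equal; the lookup cost is paid once, when the symbol is obtained.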
Pattern-Matching Example #2: Extract Class Name
Here's a method that finds the name of a class, struct, or enum type:
static LNode GetName(LNode type)
{
matchCode(type) {
case { class $name : $(.._) { $(.._); } },
{ struct $name : $(.._) { $(.._); } },
{ enum $name : $(.._) { $(.._); } }:
return name;
default:
return null;
}
}
The capture $(.._) contains the expression .._, which consists of two parts:
- .. which means "match any number of nodes (arguments or statements)"
- _ which means "discard the matching code". If you don't discard the result, it has type VList<LNode>.
So this thing:
class $name : $(.._) { $(.._); }
means "match a class definition with any number of base types, and any number of statements inside the braces".
This demo shows the GetName function in action:
public static void Demo()
{
using (LNode.PushPrinter(EcsLanguageService.WithPlainCSharpPrinter.Printer)) {
Console.WriteLine(GetName(quote {
public sealed class String : System.Object, IEnumerable<char> {}
}));
Console.WriteLine(GetName(quote {
public enum BinaryDigits { Zero, One }
}));
Console.WriteLine(GetName(quote {
public struct Point<T> {
public T X { get; set; }
public T Y { get; set; }
}
}));
}
}
The output should be:
String;
BinaryDigits;
Point<T>;
By the way, if you're reading this and thinking "this LeMP thing has some pretty impressive capabilities... why haven't I heard of it before?" the answer is twofold:
- This is the first article I've written about LeMP's new matchCode construct, and
- I stopped working on LeMP for a couple of months because people showed very little interest in it. Currently if you Google "LeMP", my original article about LeMP doesn't show up in the top 10 search results, because nobody blogged about it so there ain't no links anywhere about it. If you like LeMP, please say so, share it with your friends, blog about it, make a YouTube video... something!
Pattern-Matching Example #3: [notify]
You might want to look for a particular attribute on a particular construct. For example, let's say you want to find a 'notify' attribute on a property, which will help implement the standard INotifyPropertyChanged interface by calling a user-defined NotifyPropertyChanged method. In other words, suppose we want to transform:
public string CompanyName { get { return _companyName; } [notify] set; }
into this:
public string CompanyName {
get { return _companyName; }
set {
if (_companyName != null ? !_companyName.Equals(value) : value != null) {
_companyName = value;
NotifyPropertyChanged("CompanyName");
}
}
}
Here's a method that detects a property of the expected form and returns a transformed one. It returns the same node if the property needs no change, or null if the input is not a property of this form:
public static LNode MaybeTransformNotifyProperty(LNode input) {
matchCode (input) {
case {
[$(..attrs)] $Type $Name {
[$(..getAttrs)] $getter;
[$(..setAttrs)] set;
}
}:
LNode notify;
setAttrs = setAttrs.WithoutNodeNamed((Symbol) "notify", out notify);
if (notify == null)
return input;
LNode notifyMethod = quote(NotifyPropertyChanged);
matchCode(notify) {
case $_($(ref notifyMethod)):
}
LNode fieldName;
matchCode (getter) {
case { get => $(ref fieldName); },
{ get { $(.._); return $(ref fieldName); } }:
default:
return input;
}
LNode changed;
matchCode (Type) {
case int, uint, byte, sbyte, short, ushort,
float, double, decimal, string:
changed = quote(value != $fieldName);
default:
changed = quote($fieldName != null ?
!$fieldName.Equals(value) : value != null);
}
string propNameString = Name.Name.Name;
return quote {
[$(..attrs)] $Type $Name {
[$(..getAttrs)] $getter;
[$(..setAttrs)] set {
if ($changed) {
$fieldName = value;
$notifyMethod($(LNode.Literal(propNameString)));
}
};
};
};
default:
return null;
}
}
First, notice that the initial matchCode (input) looks for a property but doesn't directly match the [notify] attribute we're looking for. That's because of a limitation of matchCode in the current version: it is unable to do pattern matching on attributes, it can only grab the entire attribute list. Instead I call setAttrs.WithoutNodeNamed to search for an attribute with a particular name and remove it from the attribute list, which we need to do anyway.
Second, what does this do?
LNode notifyMethod = quote(NotifyPropertyChanged);
matchCode(notify) {
case $_($(ref notifyMethod)):
}
You can control the name of the function that should be called if the property changed by writing [notify(MethodName)]. The default method is NotifyPropertyChanged. In matchCode, $(ref X) tells matchCode to assign the matching syntax tree to X rather than to create a new variable (which would be scoped to the inside of the case handler).
You do not need a break statement at the end of each case. In fact, as shown here, you don't need any code inside a case at all!
Next, look at the matchCode (getter) construct, which figures out the name of the backing field. Notice that case can accept multiple patterns, each one optionally enclosed in braces; the braces indicate that you want to match a statement rather than an expression. In truth, Loyc trees do not distinguish between the concepts of "statement" and "expression"; what the braces really do is tell the parser to expect "statement syntax" rather than "expression syntax".
Choosing a difference-check is straightforward, but doesn't actually work right:
LNode changed;
matchCode (Type) {
case int, uint, byte, sbyte, short, ushort,
float, double, decimal, string:
changed = quote(value != $fieldName);
default:
changed = quote($fieldName != null ?
!$fieldName.Equals(value) : value != null);
}
What we really want is to use the first check if the type has a meaningful != operator, and use the second check if it's a reference type. However, LeMP doesn't have a semantic analysis engine, so it has no idea whether Type is a reference type or not. One way to solve this problem would be to somehow allow the programmer to signal what kind of equality check is desired, but I'll leave that as an exercise.
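Another possibility, sketched here as an assumption rather than anything LeMP provides, is to sidestep the question: emit a call to a small generic helper that lets the .NET runtime choose the comparison. The ChangeCheck class and its HasChanged method are hypothetical names invented for this example; EqualityComparer<T>.Default is the standard .NET comparer, which works for value types and reference types and treats two nulls as equal.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; it is not part of LeMP or Loyc.
public static class ChangeCheck
{
    // True when oldValue and newValue differ. Works for value types and
    // reference types alike, and treats two nulls as equal.
    public static bool HasChanged<T>(T oldValue, T newValue)
    {
        return !EqualityComparer<T>.Default.Equals(oldValue, newValue);
    }
}
```

With a helper like this available to the generated code, the setter could test ChangeCheck.HasChanged(_companyName, value) for any property type, so the code generator would no longer need to guess whether Type is a reference type. The trade-off is that the generated code depends on the helper being present in the target project.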
This line is a little funny:
string propNameString = Name.Name.Name;
The property name is an LNode called Name, and it has a property called Name which gets the identifier name if it is a simple identifier. The name is a Symbol, not a string, but Symbol has a Name property that gets the string stored inside it.
This line is actually wrong, because it won't work properly for explicit interface implementations like this:
T IFunky.FunkyProp<T> { get => _funkyProp; [notify] set; }
In this case, Name will hold the syntax tree for IFunky.FunkyProp<T>. If we want to extract the name FunkyProp from this, there is a method LeMP.StandardMacros.KeyNameComponentOf(Name) for doing this, but if you'd like to avoid adding LeMP.exe as a runtime reference, you could just copy the method's source code:
public static LNode KeyNameComponentOf(LNode name)
{
if (name.CallsMin(S.Of, 1))
name = name.Args[0];
if (name.CallsMin(S.Dot, 1))
name = name.Args.Last;
if (name.IsCall)
return KeyNameComponentOf(name.Target);
return name;
}
You can use this method as follows:
string propNameString = KeyNameComponentOf(Name).Name.Name;
Finally, notice how we call the notifyMethod:
$notifyMethod($(LNode.Literal(propNameString)));
Remember that quote assumes its inputs have type LNode (or VList<LNode> if you use $(..list)), so I've used $(LNode.Literal(propNameString)) to convert the string into an LNode by calling LNode.Literal().
Calling your Method
Now that we have a method to transform a property, the final task is to find the properties. Again, this can be done with ReplaceRecursive:
node = node.ReplaceRecursive(MaybeTransformNotifyProperty);
Exercises for You
First exercise: Modify the code so that...
notify public string CompanyName => _companyName;
...produces the same output as the original statement:
public string CompanyName { get { return _companyName; } [notify] set; }
This time, I've used notify instead of [notify]; this makes it a "custom word attribute", similar to partial class or yield return: partial and yield are not keywords, but the C# compiler treats them as if they were. Similarly, Enhanced C# treats notify as if it were a keyword attribute like public or virtual. The following two statements are equivalent:
notify public string CompanyName => _companyName;
[#notify, #public] string CompanyName => _companyName;
That is:
- Keyword attributes like public and sealed are treated like normal attributes, just with # prepended on the front. Note: # is not considered to be an operator and it does not necessarily mark a preprocessor directive; it's simply part of the identifier. Unless it's a preprocessor directive like #region, Enhanced C# treats # like a normal identifier character, exactly like _ or a letter of the alphabet.
- The word attribute notify is treated as a regular attribute called #notify.
Knowing this, you should be able to complete the exercise.
Second exercise: in order to speed up the search function, modify it so that it detects methods and ignores their contents, by returning n:
node = node.ReplaceRecursive(n => {
matchCode(n) {
}
return MaybeTransformNotifyProperty(n);
});
The pattern to match an arbitrary method was shown earlier in this article. Test your new code on this syntax tree:
LNode node = quote {
void Nonsensical(int _y) {
public string Y { get { return _y; } [notify] set; }
}
public string Z { get { return _z; } [notify] set; }
};
When you print the output, you should see that Z was modified but not Y.
Matching Patterns Specified at Run-Time
matchCode performs pattern-matching at compile-time, that is, whenever you save your *.ecs file. You can also do pattern matching at run-time using the MatchesPattern method. For instance, if you run this code:
LNode code = quote(this.foo(Math.PI * 2, bar + 1));
LNode pattern = rawQuote(this.$_($A, $B));
MMap<Symbol, LNode> captures;
if (code.MatchesPattern(pattern, out captures)) {
foreach (KeyValuePair<Symbol, LNode> p in captures)
Console.WriteLine("['{0}'] = {1}", p.Key,
EcsLanguageService.Value.Print(p.Value, null, ParsingService.Exprs));
} else
Console.WriteLine("DID NOT MATCH PATTERN");
Here, I've used rawQuote rather than a normal quote, which causes the $ operator to be treated literally: $A and $B will become literal parts of the syntax tree (rawQuote, unlike quote, does not treat A and B as existing variables to insert into the tree).
When you run this, the output is:
['_'] = foo
['A'] = Math.PI * 2
['B'] = bar + 1
I've used...
EcsLanguageService.Value.Print(p.Value, null, ParsingService.Exprs)
...to ensure that the node is printed as an expression, instead of the default printing mode which treats every node as a statement. As I mentioned earlier, the nodes themselves do not distinguish between statements and expressions; a node cannot tell you if it is a statement or an expression, so you have to tell the printing engine explicitly.
Note: MatchesPattern uses a completely separate pattern-matching engine from matchCode (akin to the difference between an interpreter and a compiler). One difference I noticed while writing this: MatchesPattern actually captures _ rather than ignoring it; it probably shouldn't do that. Please write a comment if you notice any other differences. By the way, MatchesPattern is used by the replace macro built into LeMP, which was described in the first article about LeMP.
Writing Macros
Finally, you can use the skills you've learned here to write your own macro DLL that LeMP will load and use; this is more convenient than having to write and maintain your own Visual Studio extension. I'll write an article about writing and using macros just as soon as someone asks for one.
Step 1 is as follows. Remember the MaybeTransformNotifyProperty method from above? You can easily change this into a macro by adding a LexicalMacro attribute and an IMacroContext parameter, like this:
[LexicalMacro("public string Name { get { return _name; } [notify(NPC)] set; }",
"Generates code for INotifyPropertyChanged. This example will call "+
"NPC(\"CompanyName\") if the new value is different from the old value. "+
"The argument on the [notify] attribute is optional; if absent, the "+
"default method, `NotifyPropertyChanged`, is called.",
"#property", "notify", Mode = MacroMode.Passive | MacroMode.Normal)]
public static LNode MaybeTransformNotifyProperty(LNode input, IMacroContext context) {
}
The first two strings in the attribute are documentation, and the third string "#property" (which is actually a params string[]) is the name of the node for which the macro function should be invoked. This function modifies properties, and it just so happens that properties are represented in a Loyc tree by the #property() pseudo-function; therefore "#property" causes this method to be called whenever LeMP encounters a property.
We're not quite done yet, but whoops, would you look at the time! This article is getting pretty long, so I will end it now.
Learn More about LeMP!
You can learn about some other LeMP capabilities in the previous article, and I plan to write another article soon, this time about pattern matching with match (as opposed to matchCode which you saw in this article).
In addition to LeMP, you'll find that the LNode class has numerous useful methods for querying and modifying Loyc trees. See the LNode class reference.
Conclusion
LeMP is a useful code generation and code analysis tool. QED. Let me know if you have any questions, and what you're doing with it!
Tip: When using LeMP, keep your Error List open. An error in your example.ecs file will often lead to more errors in your example.out.cs file, but unfortunately Visual Studio often puts errors from the *.out.cs file first, so when diagnosing errors, you'll have to look near the end of the error list first for any errors in your *.ecs file.
Help Wanted
I would like someone to help make a Roslyn back-end for LeMP. In other words, I want to convert the output of LeMP into a Microsoft Roslyn syntax tree and compile it with the Roslyn C# 6 compiler. Then I want a Visual Studio extension that introduces a new "Enhanced C# project type" that uses LeMP as the front-end and Roslyn C# as the backend. An EC# project type could allow *.ecs files to enjoy IntelliSense just like plain C#! I do not have time to do this myself, my TO-DO list is full, so if nobody else volunteers, it won't happen. If you want to do this project, I will happily teach you whatever you need to know about LeMP; learning about Roslyn will be your responsibility, and I only know the basics of writing Visual Studio extensions (having written the syntax highlighter for *.ecs).
As a simpler exercise, could someone write a test program that looks for bugs in the Enhanced C# parser? The program should:
- Recursively search a directory for *.cs files
- Parse each one with code like this, printing out all errors to the console:
var stream = File.Open(path, FileMode.Open, FileAccess.Read);
var chars = new StreamCharSource(stream);
IListSource<LNode> statements = EcsLanguageService.Value.Parse(
chars, path, MessageSink.Console, ParsingService.File);
- Copy all files to a second folder, but with all *.cs files replaced by the printed form of the parse result (something like File.WriteAllText(newPath, EcsLanguageService.Value.Print(statements, MessageSink.Console))). This way, you can try compiling the output, to verify that the printer works properly.
P.S.: A shout out to the srclib project. I wish I had time to implement the Visual Studio version!