Introduction
It’s been two months since the first publicly available version of the Roslyn Project was released. Although it’s not strictly required for understanding the content of this article, in case you haven’t heard about it I highly recommend watching Anders Hejlsberg’s presentation (the part about Roslyn begins at 35:40), reading Eric Lippert’s post, or looking at the introduction in the whitepaper.
In one sentence, its primary goal is to open up the compiler and provide easy access to the information it gathers during the different stages of the compilation process.
Results
Before I begin, I'd like to let you take a glance at the final form of the projects this article is about.
The first one is a refactoring extension. It extracts variable declarations within methods.
The second project is a code generator. It creates overloads for Enumerable extension methods that build on the compiler’s ability to infer type parameters from explicitly typed function parameters, which allows the user to omit the Enumerable.OfType call when filtering to a specific type.
Here’s a peek at the test method of the generated code:
The third one adds a simple attribute-oriented feature to C#, which has perfect integration with the source code.
Background
For readers not familiar with the Enumerable extensions of the System.Linq namespace, the corresponding MSDN page might come in handy. Some of the methods are extensively used in the projects.
Note that a few screenshots only display well at a wide resolution. I apologize to everyone who’s browsing this at a smaller one (or planned to print this text), but displaying the code in two different styles would have been a mess, and I don’t believe one size can fit everyone optimally. If you can’t read the source code on an image, click on it and it will open in a new tab. For people not satisfied with that, it really shouldn’t take much effort to reformat the code to better fit their taste. All source code is attached to the article.
I would also like to point out that the API is far from being properly documented at this time, mostly because it is subject to change based on developer feedback. I put a considerable amount of time into exploring the CTP, and I'm trying to present it to the best of my knowledge. Some mistakes are inevitable given the lack of documentation (along with mistakes caused by other factors). The reason I decided to write this article is that I think I can show Roslyn from a somewhat different angle than the sample projects provided with the CTP.
Keep in mind that this is an early introduction.
Code Refactoring
The Problem
Refactoring is one of the most straightforward things people would want to use a code analysis tool for, and Roslyn already provides excellent support for this area. While waiting for the CTP, I had some vaguely planned tools in mind that I would attempt to build with it. However, the temporary lack of support for certain language features and the unfamiliar look of the API scared me a bit and made me want to experiment with Roslyn on a more conventional project first.
A few days before the CTP was released, someone in a developer forum asked if there is a refactoring tool which can:
- Move all variable declarations within methods to the beginning of their method.
- Keep declaration order.
- Replace var keywords with corresponding types.
- Process arbitrarily many classes at once, without the need to manually click through every method.
He said this would make coworkers’ code easier to understand.
I myself am fortunate enough not to have to deal with code I consider poorly written too often, but this sounded like a perfect task for Roslyn, so I chose it as my first experimental project.
The Building Process
I found working on a single file the best way to start making a code refactoring tool. Parsing the content of a file is quick and easy: there’s no need to load a new instance of Visual Studio for every debug session, and no need to deal with the specifics of a whole solution either, while we’re still able to use the whole IDE to verify the generated code.
We can get the syntax tree of a given piece of code in the following way:
The syntax tree holds all syntactical information about the code. The parsing is lossless. We can get back the exact text we passed in when creating the tree at any time.
Since the highest-level syntactic elements we want to work on are methods, the first goal is to extract method nodes from the syntax tree.
I hope the call is self-explanatory. The only thing that’s probably worth noting is that we’re filtering BaseMethodDeclarationSyntax instead of MethodDeclarationSyntax because the latter doesn’t cover every kind of method we want to work on (e.g. constructors).
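As a rough sketch of these two steps (parsing, then collecting method nodes), the snippet below assumes the Roslyn CTP assemblies are referenced. SyntaxTree.ParseCompilationUnit is the parsing entry point as I remember it from the CTP walkthroughs, so treat that call as an assumption; the sample source string is made up for illustration.

```csharp
using System;
using System.Linq;
using Roslyn.Compilers.CSharp; // requires the Roslyn CTP assemblies

static class MethodListing
{
    static void Main()
    {
        // Hypothetical input: one constructor and one ordinary method.
        const string code =
            "class C { C() { var a = 1; } void M() { var b = 2; } }";

        // Assumption: ParseCompilationUnit is the CTP's text-to-tree entry point.
        SyntaxTree tree = SyntaxTree.ParseCompilationUnit(code);

        // Filtering on BaseMethodDeclarationSyntax also matches the constructor,
        // which MethodDeclarationSyntax alone would miss.
        var methods = tree.Root.DescendentNodes()
                               .OfType<BaseMethodDeclarationSyntax>();

        // Print the concrete node type of every method-like declaration found.
        foreach (var method in methods)
            Console.WriteLine(method.GetType().Name);
    }
}
```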
Now we can enumerate through every method node within our tree. What are we going to do with them? Syntax trees (and, of course, nodes as well) are immutable. This means there’s no way to “update” a node in place; we have to create a new one if we need something that differs in any way.
In the end, we want to create a new syntax tree that has method nodes from the old one replaced with the ones we desire, and is otherwise identical to it.
Syntax.MethodDeclaration returns a new MethodDeclarationSyntax instance. In Roslyn, using the factory methods of the Syntax class is the standard way of creating new syntax nodes (as well as tokens and trivia) instead of using the new operator.
The SyntaxTree.Create method takes a file name (not relevant for our purpose) and a root node as parameters, and returns a new tree.
SyntaxNode.ReplaceNodes takes an enumeration of nodes that are descendants of the node the method is called on, and a function that takes an old node as a parameter and returns the new one we want to replace it with. It then returns a new node in which the old descendants are replaced by the new ones. Actually, the function takes two parameters, the second one being “the same node rewritten with replaced descendents”, which I never needed in my projects. For some reason, however, there’s no predefined overload that takes a one-argument function, so I took the liberty of writing an extension overload myself.
Using this, the call can be rewritten as follows:
The return value of this line is exactly what we need: the code of a new tree that is identical to the old one above the method level.
Now we only have to create, out of an existing method node, a new one that satisfies the specification of the problem, and we’re done.
The Syntax Visualizer Extension bundled with the CTP provides an excellent way to examine what a method node looks like.
Here’s the directed syntax graph of a simple method:
void MethodName() {
    var firstVarName = 1;
    var secondVarName = "stringvalue";
}
As can hopefully be seen from the figures, the child that holds the body of the method is reached through the BaseMethodDeclarationSyntax.BodyOpt property.
SyntaxNode.ReplaceNode returns a new node with only one of the original descendants replaced.
To change a method or a method body, we just have to replace nodes. However, changing the statement list within a block is different. What directly holds the statements is not a syntax node but a SyntaxList. We can’t replace it the same way as a node; we need to update the whole block and provide it with a new list that satisfies our needs.
So we need to build a new statement list from scratch.
As you can probably also figure out from the syntax graph, variable declaration statements are of type LocalDeclarationStatementSyntax. These are the ones that we need to process; other statements can go into the new list without modification.
Of course, after we’re done with processing the list, we have to concatenate declarations and statements, and put them together into the new method.
Now we can deal with declarations.
If we try to extract declarations from local declaration statements and look at their properties, we’ll notice that each contains a list of variable declarations instead of just a single one. That’s due to the support of multi-declarations. If we also want to provide support for this feature, we need to handle every one of those. Let’s take a look at the syntax graph of a multi-declaration statement:
int firstVarName = 2, secondVarName = 3;
In order to group the variable declarations together, I created a list of local declaration statements named declarations. For every variable declaration, we should add a stand-alone declaration to declarations as well as a stand-alone initialization to the statement list.
Let's deal with the former part first.
If we just extracted the declarations like this, it would actually work well for some scenarios.
Not for implicitly typed variables, however: their type would remain var. Not only does this violate the specification of the problem, but the generated code would not even compile.
These two declarations:
Would be turned into:
We need to find the actual type of implicitly typed declarations. In many cases we could do that with just the available syntax trees and a lot of work (as long as the types are not defined in a different assembly), but fortunately Roslyn does the semantic analysis for us.
If we have access to the semantic model of a syntax tree, it can tell us the type of expressions within it directly.
To get the semantic model, we need a compilation provided with the corresponding assembly references:
var semanticModel = Compilation.Create("test").AddSyntaxTrees(tree)
    .AddReferences(new AssemblyFileReference(typeof(object).Assembly.Location))
    .GetSemanticModel(tree);
After that, the following call returns the type of declarations regardless of whether they are explicitly or implicitly typed:
semanticModel.GetSemanticInfo(declaration.Declaration.Type).Type;
Functionally, using it like this works perfectly fine. The Name property of the TypeSymbol however holds the full name of the corresponding type. This means that the aforementioned declarations will be resolved as:
This is not how people usually want to see types displayed.
For this reason, there is a handy extension method, ISymbol.ToMinimalDisplayString, that finds the shortest name of a symbol available at a certain location within a syntax tree.
.ToMinimalDisplayString(tree.GetLocation(declaration), semanticModel)
If we replace the Name property with this call, the type of the declarations will now be shown as:
As you might have noticed, replacing var keywords with explicit types introduced another problem as well.
Comments are stored as SyntaxTrivias in the tree. From the whitepaper: “In general, a token owns any trivia after it on the same line up to the next token. Any trivia after that line is associated with the following token.”
Comments before the declarations are associated with their type syntax child. When those are replaced with completely new ones, the comments are lost.
We can re-add leading trivia to our newly built node using the SyntaxNode.WithLeadingTrivia method:
Now the declarations will be correctly extracted.
This code works well as long as there are no separate blocks within the method’s main body block. It will, however, fail to extract variables from inner blocks. This should not be surprising, since every statement that’s not of type LocalDeclarationStatementSyntax goes unprocessed.
The following code:
var var1 = 0;
if (var1 == 1) {
    var var2 = 2;
    var var3 = "3";
}
else {
    var var4 = 4;
}
var var5 = 5;
Will be converted into:
To resolve this, we need to process blocks inside the method the exact same way we’re processing the main block of the method. For that purpose, I extracted the block handling code into a delegate, and called it for every BlockSyntax inside every regular statement.
The first problem with this is that blocks within inner blocks will be processed multiple times, since the call provides ReplaceNodes with every descendant of every statement. Example:
var var0 = 0;
if (var0 == 0) {
    var var1 = 1;
    while (true) {
        var var2 = 2;
    }
}
The extracted declarations then contain var2 twice:
int var0;
int var2;
int var1;
int var2;
It’s fairly simple to solve this, thanks to the fact that the SyntaxNode.DescendentNodes method takes an optional function parameter that specifies which nodes we don’t want to descend into any further. This not only fixes the error, but also improves performance by eliminating multiple visits to the same node.
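To make the effect of such a "descend into children" predicate concrete without depending on the CTP, here is a minimal sketch on a toy tree. The Node class and its Descendants method are hypothetical stand-ins, not Roslyn types; they only mimic the behaviour described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a syntax node; not a Roslyn type.
sealed class Node
{
    public string Kind { get; private set; }
    public List<Node> Children { get; private set; }

    public Node(string kind, params Node[] children)
    {
        Kind = kind;
        Children = children.ToList();
    }

    // Pre-order walk over all descendants. The children of a node are visited
    // only if descendInto returns true for it (or if no predicate is given).
    public IEnumerable<Node> Descendants(Func<Node, bool> descendInto = null)
    {
        foreach (var child in Children)
        {
            yield return child;
            if (descendInto == null || descendInto(child))
                foreach (var descendant in child.Descendants(descendInto))
                    yield return descendant;
        }
    }
}

static class DescendDemo
{
    // Models: if (...) { while (...) { } }
    public static Node Sample()
    {
        return new Node("If",
            new Node("Block",
                new Node("While",
                    new Node("Block"))));
    }

    // Counts the blocks reachable from a statement, optionally stopping at blocks.
    public static int CountBlocks(Node statement, bool stopAtBlocks)
    {
        Func<Node, bool> descend = null;
        if (stopAtBlocks) descend = n => n.Kind != "Block";
        return statement.Descendants(descend).Count(n => n.Kind == "Block");
    }

    static void Main()
    {
        var statement = Sample();
        Console.WriteLine(CountBlocks(statement, false)); // 2: the nested block is reached directly as well
        Console.WriteLine(CountBlocks(statement, true));  // 1: only the outermost block
    }
}
```

With the predicate in place, the nested block is no longer handed to the caller directly. It is reached only when the outer block is itself processed, so each block is rewritten once.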
We’re still not done, because we’ll get exceptions when processing empty blocks. We need to filter those out.
This works fine now, but the previous step introduced some blatant redundancy.
Since this is the second occasion we’re using the DescendentNodes function, I think it’s the right time to show you another little extension overload I added to make it easier to use.
With this in scope, we can use the method like this:
We can also update the call used at the beginning, because it's actually prone to the same error this one was (it throws an exception if an empty method is encountered), and it is also slower, since it enumerates every node within methods.
Okay, that much about DescendentNodes.
The next problem is caused by different variables with identical names declared within the same method. They’ll result in the same variable being declared multiple times.
if (true) {
    int q = 2;
} else {
    double q = 2.0;
}
Well, the most reasonable solution I could think of was ignoring variables with such names.
For this, first make a lookup of variable names that shouldn’t be processed:
Then check this set before processing a local declaration statement:
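The idea behind these two steps can be sketched independently of Roslyn. The example below works on plain strings; in the real tool the names would come from the variable declarators of a method. FindAmbiguous is a hypothetical helper name of my own, and the sample names are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DuplicateNames
{
    // Returns the names that occur more than once in the given sequence.
    public static HashSet<string> FindAmbiguous(IEnumerable<string> declaredNames)
    {
        return new HashSet<string>(
            declaredNames.GroupBy(name => name)
                         .Where(group => group.Count() > 1)
                         .Select(group => group.Key));
    }

    static void Main()
    {
        // Names as they might be collected from every declarator in one method.
        var names = new[] { "q", "i", "q", "result" };
        var skip = FindAmbiguous(names);

        // Declarations whose name is in the set stay where they are.
        foreach (var name in names.Distinct())
            Console.WriteLine(name + (skip.Contains(name) ? ": left in place" : ": hoisted"));
    }
}
```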
Okay, just one final thing about declarations that we have to do mainly because we’re using a CTP and the semantic analyzer doesn’t support the whole C# language yet. Due to missing language features, it's not able to resolve the type of the following call for example:
SyntaxTree tree = null;
var descendents = tree.Root.DescendentNodes();
The extracted declarations would not compile:
SyntaxTree st;
var descendents;
To keep this temporary limitation from making the refactored code uncompilable, we should always check whether the type was successfully resolved. If not, we leave the declaration in place and process it like any other statement.
(type = semanticModel.GetSemanticInfo(localDeclaration.Declaration.Type).Type).Kind != SymbolKind.ErrorType
Now the extracted declarations above look like this:
That’s it about the declaration part. Let’s take one step forward and start processing initializations. Don’t worry, those require much less fine-tuning than declarations did.
First, here's a syntax tree of a simple variable initialization:
What we need to extract are expression statements of the type BinaryExpressionSyntax and the SyntaxKind AssignExpression.
Works like a charm, as long as there are no statement blocks on the right side of expressions. Those will not be processed.
Action a = () => {
    int k = 1;
};
This becomes:
Action a;
a = () => {
    int k = 1;
};
To solve this issue, we can run the same function on every variable’s initializer that we already run on every non-declarative statement.
This actually settles the processing of effective code. The only thing left is formatting. Here’s the final form of the method:
I hope the part that puts a comment at the end of all declarations doesn’t cause too much headache.
The SyntaxNode.Format call at the end of the new root places essential whitespace and other required trivia into its descendants. Without this, we would have to insert that trivia piece by piece, which would result in much more verbose code.
Unfortunately, it currently cannot be customized and only formats according to its default settings, so if you have custom needs, you’ll probably want to run an Edit.FormatDocument command in your IDE on each document it processes. If this bothers you a lot, you can either write your own format method or just wait for a future release of Roslyn. I’m pretty sure we’ll be able to customize it in the future. Currently, opening Tools -> Options -> Text Editor -> CSharp -> Formatting while running Visual Studio with Roslyn enabled results in an exception, so the editor’s formatting cannot be customized either.
Apart from the lack of customization, the biggest issue I noticed is that it doesn’t like #region blocks. Specifically, it will erase the existing whitespace between the keyword and the region’s name, like this:
Thus making the code unable to compile.
As a quick fix, I just replaced the corresponding strings with their correct variants. This might touch the string “#region” in some place it shouldn’t, requiring a manual fix, but the odds of that are really tiny and we can’t expect a CTP to work flawlessly anyway.
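One possible shape for such a quick fix is sketched below. It uses a regular expression rather than plain string replacement, which is my own choice and not necessarily what the attached source does; the Repair name and the sample input are hypothetical. It shares the caveat mentioned above: a “#region” inside a string literal would be touched too.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegionFix
{
    // Re-inserts the space when "#region" is immediately followed by a
    // non-whitespace character, e.g. "#regionHelpers" -> "#region Helpers".
    public static string Repair(string formattedCode)
    {
        return Regex.Replace(formattedCode, @"#region(?=\S)", "#region ");
    }

    static void Main()
    {
        // "#endregion" is left alone because it does not contain "#region".
        Console.WriteLine(Repair("#regionHelpers\nclass C { }\n#endregion"));
    }
}
```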
So the method is ready. How should we use it?
If we want to process bigger entities than files, Workspace is supposed to provide us with an easy solution.
static void ProcessSolution(string fileName) {
    var workspace = Workspace.LoadSolution(fileName);
    // Run every document of every project through the refactoring.
    foreach (var d in workspace.CurrentSolution.Projects.SelectMany(p => p.Documents))
        workspace.UpdateDocumentAsync(d.Id, new StringText(ProcessCode(d.GetText())));
}
The issue with it is that currently some projects throw an InternalErrorException when the ID of a document in them is passed to Workspace.UpdateDocument. This is a very ugly exception; I couldn't even handle it within the application. After a quick search the only relevant information I found about it was this. I decided not to delay publishing this article to find a workaround till it’s fixed (or at least turned into a catchable exception). If you really need to process a solution with a faulty project and can’t wait for a future version of the CTP, I suggest just processing files one by one.
Most projects, however, don’t cause any problems. Individual projects are just as simple to process:
static void ProcessProject(string fileName) {
    var workspace = Workspace.LoadStandAloneProject(fileName);
    foreach (var d in workspace.CurrentSolution.Projects.First().Documents)
        workspace.UpdateDocumentAsync(d.Id, new StringText(ProcessCode(d.GetText())));
}
Putting the method into an extension is also simple and plays nicely. The only issue with it is that it will only work if the editor is started up with Roslyn enabled, which limits its usability a lot for the moment.
Just create a new Roslyn Code Refactoring Extension Project. The template is very intuitive to use.
The result:
Code generation
When I began working with Roslyn, the first lines of code I wrote contained many calls in the .DescendentNodes().OfType<TNode>().Where(node => /*<arbitrary condition>*/) format. This was the first time I had to rely on the OfType extension so heavily. My first attempt to reduce the complexity of the code resulted in an overload for the Where method.
It looked like this:
Since the compiler can infer type parameters from functions with explicitly typed input parameters, this allowed me to make one less call each time for the same outcome. As I showed you in the previous section, overloading DescendentNodes came with a better result in the end, but this overload made me wonder what it would be like if I had something like that for every feasible extension method. That could make code a bit simpler for several different scenarios.
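For readers who want to try the idea, here is a minimal sketch of what such a Where overload might look like. It is my reconstruction of the general shape, not the exact code from the attached project; the TypedEnumerable and WhereDemo names and the sample data are hypothetical. Note that the body calls Enumerable.Where explicitly as a static method, so the call cannot bind back to the overload itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TypedEnumerable
{
    // Filters to elements of type TResult first, then applies the predicate.
    // TResult is inferred from the explicitly typed lambda parameter.
    public static IEnumerable<TResult> Where<TSource, TResult>(
        this IEnumerable<TSource> source, Func<TResult, bool> predicate)
        where TResult : TSource
    {
        // Explicit static call to the framework method to avoid self-recursion.
        return Enumerable.Where(source.OfType<TResult>(), predicate);
    }
}

static class WhereDemo
{
    public static List<string> LongStrings(IEnumerable<object> items)
    {
        // Same result as items.OfType<string>().Where(s => s.Length > 3)
        return items.Where((string s) => s.Length > 3).ToList();
    }

    static void Main()
    {
        var items = new object[] { "tree", 42, "a", "node", 3.14 };
        Console.WriteLine(string.Join(", ", LongStrings(items))); // tree, node
    }
}
```

A lambda with an implicitly typed parameter gives the compiler nothing to infer TResult from, so this overload only comes into play when the parameter type is written out.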
One notable problem with it is that there are 86 different extensions in the Enumerable class alone that are feasible (i.e. such an overload would make sense for them). It would take quite some work to build, debug, and (for future versions of the framework) update them.
Of course, I never planned to do that in the first place.
Long story short, here’s the code that generates the extension methods:
It will only generate 66 of them currently, because nullable types aren’t supported by the CTP at this time.
I don’t feel like explaining the building process of this project the way I did with the first one, since I would be repeating myself for the most part. Almost all of the calls should be familiar by now, and the nature of the rest can be examined in similar ways to those described in the first section.
Here is a fragment of the generated code:
And the test method that demonstrates on a textbook example how these extensions can make code a bit shorter (of course, you could also generate the test methods as well… and have your code generator generate the code that tests if it's working fine):
Attribute Oriented Programming
I chose a very simple example to demonstrate how Roslyn can be used to add attribute oriented features to C#. The tool I made provides two attributes that can be applied to properties. One raises an event from the property’s setter, the other does the same from its getter. Unfortunately, it’s not fully automatic currently. It requires three additional steps from the developer after applying an attribute to make it work. It also relies on Reactive Extensions at the moment, because the CTP lacks support for standard C# style events.
Here’s what it looks like in practice:
When the appropriate attribute is applied to a property that doesn't have the corresponding event implemented, the CodeIssueProvider indicates an error.
When Ctrl+. is pressed on the error, it shows the suggested quick fix.
After applying the quickfix, the code should be formatted.
Finally, the appropriate outlining put there by the SyntaxOutliner needs to be collapsed.
The primary point of this is that redundant code will be hidden from the developer while working on the project.
Here’s a less trivial example:
The code for the CodeIssueProvider:
And for the SyntaxOutliner:
Of course, there’s plenty of space for improvement. It could properly handle errors, “clean up” after itself when the attribute is removed, accept parameters for customization, work on properties that are not auto-implemented, work on all properties of a given class, build a new completion list so the private members don’t pollute the in-class intellisense etc…
This is supposed to be a basic proof of concept. The fact that it relies on Roslyn extensions limits its usability today anyway.
Conclusion
The Roslyn CTP is an extremely exciting toy to play with. Back at PDC ’08, Anders Hejlsberg already said they had been working on it for a year, which implies that the C# and VB teams have put an enormous amount of work into it by now. Although it does have issues, both functional and structural, that keep reminding the consumer that it’s not a fully mature tool yet, I found its design very satisfying overall. In addition to the potential its public API has, the design team is also considering releasing its source code eventually.
Regardless of what happens to Roslyn and what it may hold for the future, its current release is already opening up an enormous amount of new possibilities. I hope this article succeeded in giving you a pleasant little taste of those.
About the Source Code
The non-extension projects attached will open in Visual Studio 11 or Visual Studio 2010 SP1, but they will not compile unless provided with the corresponding DLLs from the CTP. The refactoring extension requires Visual Studio 2010 SP1 with the Roslyn CTP installed, while the third project also needs a reference to Reactive Extensions on top of that.
History
- 2012-05-08: A few tiny modifications.
- 2011-12-26: Minor revision.
  - Added Index, Source Code and History sections.
  - Added screencap of generated code.
  - Changed the wording of a few sentences.
- 2011-12-22: First published version.