Resolving Symbolic References in a CodeDOM (Part 7)

KenBeckett

Dec 2, 2012

CDDL

Resolving symbolic references in a CodeDOM.

Introduction

This article is about resolving symbolic references in a codeDOM. Sources are included. This is Part 7 of a series on codeDOMs, but it may be useful for anyone who wishes to resolve symbolic references in C# source code. In the previous parts, I’ve discussed “CodeDOMs”, provided a C# codeDOM, a WPF UI and IDE, a C# parser, solution/project codeDOM classes, and covered loading type metadata.

Resolving a CodeDOM Tree

In the previous articles of this series, I’ve built up a C# codeDOM that can parse itself from existing C# source files, resulting in a tree of code objects with most symbolic references unresolved and represented by UnresolvedRef objects. In the last article, I added support for loading type metadata from referenced assemblies, and now it’s time to add the necessary logic to replace those UnresolvedRef objects with specific references using the classes shown in the table below.

CodeDOM Class	Purpose
`SymbolicRef`	Base class for all symbolic references.
`TypeRefBase`	Base class for all type references (supports generic parameters, arrays).
`UnresolvedRef`	Unresolved reference.
`UnresolvedThisRef`	Unresolved ‘this’ for an explicit interface implementation of an indexer.
`TypeRef`	References a type declaration.
`AliasRef`	References an alias (to a type or namespace).
`TypeParameterRef`	References a type parameter.
`MethodRef`	References a method.
`AnonymousMethodRef`	References an anonymous method.
`ConstructorRef`	References a constructor.
`OperatorRef`	References an operator declaration.
`GotoTargetRef`	Base class for ‘goto’ target references.
`LabelRef`	References a label.
`SwitchItemRef`	References a switch item (case or default).
`SelfRef`	Base class for current object instance references.
`BaseRef`	References the base class of the current object instance.
`ThisRef`	References the current object instance.
`VariableRef`	Base class for all variable references.
`PropertyRef`	References a property.
`IndexerRef`	References an indexer.
`EventRef`	References an event.
`EnumMemberRef`	References an enum member.
`FieldRef`	References a field.
`LocalRef`	References a local variable.
`ParameterRef`	References a parameter.
`ExternAliasRef`	References an extern alias.
`NamespaceRef`	References a namespace.
`DirectiveSymbolRef`	References a compiler directive symbol.

We need to traverse the codeDOM tree from the top down looking for UnresolvedRef objects, and then attempt to resolve each one into the appropriate specific reference object for the correct declaration by following the scoping and resolution rules in the C# language specification. One of the things that becomes apparent when thinking about how to resolve references is that the type of the reference can often be determined from the context. For example, references after a ‘using’ directive should represent namespaces, and variable types and method return types should represent types (with optional namespace prefixes), etc. The enum ResolveCategory represents the possible categories of reference types based on the current context.

The virtual method “CodeObject Resolve(ResolveCategory resolveCategory, ResolveFlags flags)” on the CodeObject base class is used to perform the top-down traversal of the tree, and is overridden as necessary by code objects to perform any special logic and to resolve all child objects. For UnresolvedRef objects, the override attempts to resolve them and returns the appropriate new reference object if successful, with parent objects assigning the result to their child properties. The ResolveFlags enum parameter is used to represent special modes, such as the 3 phases below, resolving inside documentation comments, and “unresolve” mode (used to convert resolved references back into UnresolvedRefs when necessary).
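The traversal described above can be sketched in a few lines. This is only a simplified illustration, not the actual Nova source: the enum members, the ResolvedRef and UsingDirective classes, and the KnownNames lookup are all assumptions made for the example. It shows the two key ideas: the parent chooses the category from context, and it assigns whatever Resolve() returns back to its child property.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-ins; the member names of both enums are assumptions.
public enum ResolveCategory { Expression, Type, Namespace }

[Flags]
public enum ResolveFlags { None = 0 }

public abstract class CodeObject
{
    // Default behavior: nothing to resolve, so the object stays in the tree unchanged.
    public virtual CodeObject Resolve(ResolveCategory resolveCategory, ResolveFlags flags)
    {
        return this;
    }
}

// A resolved reference (stand-in for TypeRef, NamespaceRef, and so on).
public class ResolvedRef : CodeObject
{
    public string Name;
    public ResolveCategory Category;
}

public class UnresolvedRef : CodeObject
{
    public string Name;

    // Toy symbol table standing in for the real scope search (an assumption).
    public static readonly HashSet<string> KnownNames = new HashSet<string> { "System", "String" };

    public override CodeObject Resolve(ResolveCategory resolveCategory, ResolveFlags flags)
    {
        // On success return a new reference object; on failure remain unresolved.
        if (KnownNames.Contains(Name))
            return new ResolvedRef { Name = Name, Category = resolveCategory };
        return this;
    }
}

// A 'using' directive knows from context that its child must be a namespace.
public class UsingDirective : CodeObject
{
    public CodeObject Namespace;

    public override CodeObject Resolve(ResolveCategory resolveCategory, ResolveFlags flags)
    {
        // The parent assigns the result back to its child property.
        Namespace = Namespace.Resolve(ResolveCategory.Namespace, flags);
        return this;
    }
}
```

After calling Resolve() on a UsingDirective whose child is an UnresolvedRef named "System", the Namespace property holds a ResolvedRef; a child with an unknown name is simply left as an UnresolvedRef.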

For Solution, Project, and CodeUnit objects, the Resolve() override executes in 3 phases:

Phase 1 resolves all types – all statements in CodeUnits and NamespaceDecls, and TypeDecl headers including any base lists, stopping at type bodies.
Phase 2 resolves all type members – definitions of methods, properties, fields, stopping at the bodies of methods, properties, or initializers of fields.
Phase 3 resolves all of the code – the bodies of methods, properties, and field initializers.

Note that this is not 3 full passes, but more of a breadth-first than a depth-first traversal of the tree, and it allows all references to be resolved in a single pass without order-of-evaluation dependency problems. (Another special case is that Switch needs to resolve all Case expressions before any of the bodies in order to handle possible forward references via “goto case …”.)
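A rough sketch of the phased approach follows. The flag names Phase1, Phase2, and Phase3 and the logging are assumptions for illustration only (the article states just that the phases are carried in ResolveFlags), and the classes are stripped down to the minimum. Each phase descends one level deeper than the previous one, so every type header is resolved before any member signature, and every signature before any body.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical flag names; only the existence of three phases comes from the article.
[Flags]
public enum ResolveFlags { None = 0, Phase1 = 1, Phase2 = 2, Phase3 = 4 }

public class MethodDecl
{
    public string Name;

    public void Resolve(ResolveFlags flags, List<string> log)
    {
        // Phase 2 stops at the method body; phase 3 handles the body itself.
        if (flags.HasFlag(ResolveFlags.Phase2)) log.Add("signature of " + Name);
        if (flags.HasFlag(ResolveFlags.Phase3)) log.Add("body of " + Name);
    }
}

public class TypeDecl
{
    public string Name;
    public List<MethodDecl> Methods = new List<MethodDecl>();

    public void Resolve(ResolveFlags flags, List<string> log)
    {
        // Phase 1 stops at the type body: only the header and base list are resolved.
        if (flags.HasFlag(ResolveFlags.Phase1))
        {
            log.Add("header of " + Name);
            return;
        }
        foreach (var method in Methods)
            method.Resolve(flags, log);
    }
}

public class CodeUnit
{
    public List<TypeDecl> Types = new List<TypeDecl>();

    // Runs the three phases in order and returns the order in which work was done.
    public List<string> Resolve()
    {
        var log = new List<string>();
        foreach (var phase in new[] { ResolveFlags.Phase1, ResolveFlags.Phase2, ResolveFlags.Phase3 })
            foreach (var type in Types)
                type.Resolve(phase, log);
        return log;
    }
}
```

With two types A (method M) and B (method N), the log reads: header of A, header of B, signature of M, signature of N, body of M, body of N. The body of M can therefore refer to type B or method N even though both are declared later in the file.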

Resolving an UnresolvedRef

The Resolve() override of UnresolvedRef first calls Resolve() on any type argument children, and then it attempts to resolve itself by creating an instance of the Resolver class, passing itself to the constructor, and calling Resolve() on it. The Resolver class operates according to the ResolveCategory, behaving differently when looking for specific reference types as compared to references in expressions (which can be of almost any type). Validation of the type of a possible matching object and the text of any error message is also based upon the category.

The Resolver class contains various special-case logic, but the primary functionality consists of calling ResolveRef() to look for declarations with a matching name at the current scope, and if nothing is found it calls ResolveRefUp() to continue searching at higher levels of the tree, eventually stopping if nothing is found. Depending upon the resolve category, it might stop before reaching the top of the tree if that makes sense.
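The current-scope-then-upward search can be illustrated with a minimal sketch. The Scope class, the string-based signatures, and the dictionary of declarations are assumptions made to keep the example self-contained; the real ResolveRef() and ResolveRefUp() methods live on the code objects themselves and are more involved.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for any code object that can contain declarations.
public class Scope
{
    public readonly Scope Parent;
    private readonly Dictionary<string, string> _declarations = new Dictionary<string, string>();

    public Scope(Scope parent) { Parent = parent; }

    public void Declare(string name, string description) { _declarations[name] = description; }

    // Look for a declaration with a matching name at this scope only.
    public string ResolveRef(string name)
    {
        string found;
        return _declarations.TryGetValue(name, out found) ? found : null;
    }

    // Continue the search at higher levels, stopping at the top of the tree.
    public string ResolveRefUp(string name)
    {
        for (Scope scope = Parent; scope != null; scope = scope.Parent)
        {
            string found = scope.ResolveRef(name);
            if (found != null) return found;
        }
        return null;
    }
}

public static class Resolver
{
    // Try the current scope first, and only then search upward.
    public static string Resolve(Scope current, string name)
    {
        return current.ResolveRef(name) ?? current.ResolveRefUp(name);
    }
}
```

If a type scope declares fields named "count" and "total" and a nested method scope declares a local named "count", then resolving "count" from the method scope finds the local, resolving "total" finds the field one level up, and resolving an unknown name returns null.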

When a declaration with a matching name is found, the AddMatch() method is called on the Resolver instance, which creates a MatchCandidate instance and then validates that the type of the matched object is valid for the resolve category. If the match is a method, it must then attempt to infer any omitted type arguments if the method is generic, and go through a lot of complicated overload logic to determine if the parameter types match the types of the supplied arguments. There are also checks to verify that the candidate object is static or not as appropriate, and that the access specifiers allow it to be accessed in the current scope. It’s determined whether the candidate is a “complete” match or only partial, and this in turn determines whether or not the search will continue into other higher scopes.

If this process finds a single valid match, Resolver.Resolve() creates a new reference to the matched declaration by calling CreateRef() on it, and returns it, causing the UnresolvedRef object to be replaced with it. If no matches are found, or if multiple matches are found, error messages are generated as appropriate and attached to the UnresolvedRef object (and they get propagated up to the Solution level and logged to the console or displayed in the UI).
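The outcome logic (exactly one valid match succeeds, zero or several produce an error) might look roughly like the following. This is a deliberately reduced sketch: the Declaration, DeclKind, and ResolveResult types and the message wording are assumptions, and the overload matching, type inference, static, and accessibility checks described above are left out entirely.

```csharp
using System;
using System.Collections.Generic;

public enum ResolveCategory { Type, Namespace }  // hypothetical members
public enum DeclKind { Type, Namespace }         // hypothetical

public class Declaration
{
    public string Name;
    public DeclKind Kind;
}

public class ResolveResult
{
    public Declaration Target;  // set when exactly one valid match was found
    public string Error;        // set otherwise
}

public class Resolver
{
    private readonly string _name;
    private readonly ResolveCategory _category;
    private readonly List<Declaration> _matches = new List<Declaration>();

    public Resolver(string name, ResolveCategory category)
    {
        _name = name;
        _category = category;
    }

    // Keep a candidate only if its name matches and its kind is valid for the category.
    public void AddMatch(Declaration candidate)
    {
        if (candidate.Name != _name) return;
        bool valid = (_category == ResolveCategory.Type && candidate.Kind == DeclKind.Type)
                  || (_category == ResolveCategory.Namespace && candidate.Kind == DeclKind.Namespace);
        if (valid) _matches.Add(candidate);
    }

    // One match resolves the reference; none or several yield a category-specific error.
    public ResolveResult Resolve()
    {
        if (_matches.Count == 1)
            return new ResolveResult { Target = _matches[0] };
        string problem = _matches.Count == 0 ? "Unresolved" : "Ambiguous";
        string category = _category.ToString().ToLowerInvariant();
        return new ResolveResult { Error = problem + " " + category + " reference '" + _name + "'" };
    }
}
```

In this sketch a namespace named "List" is rejected when the category is Type, so a single type named "List" still resolves cleanly, while two types with the same name produce an "Ambiguous type reference" error.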

Expression Type Evaluation

In order to determine the proper match for an overloaded method, it’s necessary to evaluate the types of the argument expressions to see if they match the parameter types. A virtual “TypeRefBase EvaluateType()” method has been added to Expression, and is overridden as necessary by subclasses to evaluate their type. Also, a virtual “TypeRefBase EvaluateTypeArgumentTypes()” method has been added to TypeRefBase to evaluate the types of any generic type arguments on a type or method reference. The AddMatch() method on Resolver uses these to evaluate the type of each argument expression passed to a method, and then calls ParameterRef.MatchParameter() to determine if the type matches the parameter type (which internally calls EvaluateParameter()).

Various other methods necessary to the type evaluation process include the following members of the TypeRef class: FindTypeArgument() used in the evaluation of type arguments, IsImplicitlyConvertibleTo() and FindUserDefinedImplicitConversion() to handle implicit conversions, and GetCommonType() to determine a common type that can represent two given types.
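To make the requirement concrete, here is ordinary C# (not Nova code) showing the language behavior that this type evaluation has to reproduce. The Overloads and OverloadDemo classes are invented for the example. For each call, the type of the argument expression must be evaluated first, and then implicit conversions decide which overload is the best match.

```csharp
using System;

public static class Overloads
{
    public static string Describe(int value) { return "int"; }
    public static string Describe(long value) { return "long"; }
    public static string Describe(double value) { return "double"; }
    public static string Describe(object value) { return "object"; }
}

public static class OverloadDemo
{
    // Returns the name of the overload chosen by the compiler for each call.
    public static string[] Run()
    {
        short small = 1;
        bool flag = true;
        return new[]
        {
            Overloads.Describe(1),              // exact match on int
            Overloads.Describe(small),          // short converts implicitly; int is the best target
            Overloads.Describe(1.5f),           // float converts implicitly to double
            Overloads.Describe("text"),         // only the object overload applies
            Overloads.Describe(flag ? 1 : 2L),  // common type of int and long is long
            Overloads.Describe(1 + 2.0)         // int + double evaluates to double
        };
    }
}
```

The conditional expression is the kind of case that a method such as GetCommonType() exists for: neither operand alone determines the result type, so a type that can represent both has to be found.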

Method Groups

Sometimes the name of an overloaded method is used by itself in code, without any parentheses or parameters. This is known as a “method group”, and it is usually assigned to a variable of delegate type or passed to a parameter of delegate type. Such method groups are represented by the UnresolvedRef class, which will have multiple match candidates in such a case. The method group is then normally resolved to a single method reference using the delegate type to which it is assigned (or passed) to determine the parameter types and thus the single matching method.
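As a plain C# illustration (the Formatter and MethodGroupDemo classes and both delegate types are invented for the example), the name Formatter.Format refers to two methods, and it is the target delegate type that selects one of them:

```csharp
using System;

public delegate string IntFormatter(int value);
public delegate string TextFormatter(string value);

public static class Formatter
{
    public static string Format(int value) { return "int:" + value; }
    public static string Format(string value) { return "string:" + value; }
}

public static class MethodGroupDemo
{
    public static string Apply(IntFormatter formatter, int value) { return formatter(value); }

    public static string Run()
    {
        // Assigned to a delegate variable: TextFormatter selects Format(string).
        TextFormatter formatText = Formatter.Format;

        // Passed to a delegate parameter: IntFormatter selects Format(int).
        string viaParameter = Apply(Formatter.Format, 42);

        return formatText("abc") + " " + viaParameter;
    }
}
```

Both uses of Formatter.Format parse to the same unresolved name with two candidates; only the delegate type on the receiving side tells them apart.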

Generated Files

In some cases, C# source files are generated at compile time with partial classes that must be combined by the compiler with “code-behind” files. These files have extensions such as “.Designer.cs” or “.g.cs”, and may be located in the output directory as temporary files. Now that we are resolving symbolic references, we need to also load and process these generated files or we’ll have many symbols which can’t be resolved. This is done by logic in the Project class detecting and including such files in the project. Logic has also been added to ignore automatic code cleanup and save attempts for such files. Nova does not yet automatically generate these files if they are missing (like VS does), so if a project hasn’t been built for a particular configuration, the generated files will show as missing from the project and resolve errors will be generated for symbolic references to declarations in them.
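Detecting such files by name could be as simple as the following sketch. The GeneratedFiles class and its IsGenerated() method are hypothetical, and the suffix list contains only the two examples mentioned above; the actual detection logic in the Project class may use other criteria.

```csharp
using System;

public static class GeneratedFiles
{
    // Suffixes taken from the examples in the text; a real list is likely longer.
    private static readonly string[] Suffixes = { ".Designer.cs", ".g.cs" };

    // Returns true if the file name ends with one of the known generated-file suffixes.
    public static bool IsGenerated(string fileName)
    {
        foreach (string suffix in Suffixes)
        {
            if (fileName.EndsWith(suffix, StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }
}
```

A check like this could also guard the cleanup and save paths, so that files the build will regenerate anyway are never modified.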

Code Inside Documentation Comments

Code inside documentation comments – inside a <code> or <c> tag, or a ‘cref’ attribute – is automatically parsed and resolved by default. However, any parse or resolve errors that occur in such code will be treated as only warnings. Parsing of content inside <code> tags can be turned off by setting the DocCode.ParseContentAsCode option to false, and if it’s not parsed then there won’t be anything to resolve, either.
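For reference, this small example (the Temperature class is invented for illustration) contains all three places where code can appear inside a documentation comment: a ‘cref’ attribute, a <c> tag, and a <code> tag.

```csharp
using System;

public static class Temperature
{
    /// <summary>
    /// Converts Celsius to Fahrenheit. The inverse is <see cref="ToCelsius"/>,
    /// and <c>ToFahrenheit(100)</c> returns 212.
    /// </summary>
    /// <example>
    /// <code>
    /// double boiling = Temperature.ToFahrenheit(100);
    /// </code>
    /// </example>
    public static double ToFahrenheit(double celsius) { return celsius * 9 / 5 + 32; }

    /// <summary>Converts Fahrenheit to Celsius.</summary>
    public static double ToCelsius(double fahrenheit) { return (fahrenheit - 32) * 5 / 9; }
}
```

Here the cref value, the call inside <c>, and the statement inside <code> all contain symbolic references (ToCelsius, ToFahrenheit, Temperature) that can be resolved like any other code.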

Nova Studio Improvements

Nova Studio now resolves all symbolic references automatically upon loading solutions or projects. So, missing references or other such issues can now cause a lot of error messages. A “Go To Declaration” option has been added to the context menu to navigate to the target of a symbolic reference. Also, any expression which evaluates to a constant value now displays the constant value in its tooltip. The screenshot below shows that references are resolved.

Performance

The Nova Studio IDE now has similar functionality to the VS IDE as far as loading solutions, projects, files, and referenced assemblies into memory and also parsing and resolving all sources and displaying error messages. Just out of curiosity, here’s a performance comparison:

Solution	Projects	Files	Code Objects	Load, Nova (secs)	Load, VS2010 (secs)	Memory, Nova (MB)	Memory, VS2010 (MB)
Nova	10	820	439,649	5	22	201	424
SubText 2.5.2	7	849	277,163	4	26	218	410
Mono Tests	1,939	1,945	157,634	9	`4,000+`	329	`3,000+`
MS EntLib Tests	70	2,445	866,065	10	43	338	583
Large Proprietary	43	4,677	1,829,942	19	72	426	615
SharpDevelop 4.2	93	5,289	2,556,585	23	76	485	631

The numbers shown are approximate, using Task Manager to check peak working sets while loading and the approximate time until the UI is highly responsive (CPU usage less than 15%). Nova isn’t optimized yet, especially not the resolving, but it seems to be performing up to 3-4 times faster than VS. I’ve noticed over the years that VS (or perhaps MSBuild, which it uses to load) seems to be doing something for each project that takes about a half to a full second, and so makes loading large numbers of projects painfully slow. Despite some improvements over the years, it’s always been and still is slow if you have dozens of projects, much less a hundred or so. The test case shown of a solution with almost 2,000 test projects from Mono is laughable – I gave up after well over ONE HOUR of waiting and with the working set over 3 GB! That seems to indicate that the performance has an exponential relationship to the number of projects. Nova does all the work that needs to be done in less than 10 seconds. I know this is not a typical use case, but it’s still very sad… because it probably wouldn’t be that hard to fix this, and if they did then VS would be much snappier loading more typical solutions. I think millions of users out there loading solutions in say 1/2 the time would be well worth the effort of a few man-months of dev work to clean up the loading process. Hey, MS, anyone listening?

Using the Attached Source Code

A new Resolving folder has been added with new classes related to the resolving support: Resolver and MatchCandidate (and a MatchCandidates collection). It also contains the ResolveCategory and ResolveFlags enums. Various resolving-related methods – such as Resolve(), ResolveRef(), ResolveRefUp() and others – have been added to most of the existing CodeDOM classes, segregated into regions with a comment of “RESOLVING”. New examples have been added to the Nova.Examples project, including demonstration of the LoadOptions.DoNotResolve flag to load solutions/projects without resolving if it’s not required. Nova Studio now resolves by default, and all of the red unresolved references should be gone (other than for post-C# 2.0 features, which I’ve implemented but not yet released as open source). Use it to load the provided source code (“Nova.sln”) and inspect the source files to see. As usual, a separate ZIP file containing binaries is provided so that you can run them without having to build them first.

Summary

My codeDOM now has support for loading, parsing, and resolving C# solutions and projects. Nova Studio is starting to look like a real IDE! Now that entire solutions can be loaded and resolved, it’s time to add some basic features for analyzing the code. Everything up to this point may have been interesting, but now it’s time to actually do something useful! In my next article, I’ll look at calculating metrics on a codeDOM and also doing various types of searches on it.