Resolving Symbolic References in a CodeDOM (Part 7)






Introduction
This article is about resolving symbolic references in a codeDOM. Sources are included. This is Part 7 of a series on codeDOMs, but it may be useful for anyone who wishes to resolve symbolic references in C# source code. In the previous parts, I’ve discussed “CodeDOMs”, provided a C# codeDOM, a WPF UI and IDE, a C# parser, solution/project codeDOM classes, and covered loading type metadata.
Resolving a CodeDOM Tree
In the previous articles of this series, I’ve built up a C# codeDOM that can parse itself from existing C# source files, resulting in a tree of code objects with most symbolic references unresolved and represented by UnresolvedRef objects. In the last article, I added support for loading type metadata from referenced assemblies, and now it’s time to add the necessary logic to replace those UnresolvedRef objects with specific references using the classes shown in the table below.
CodeDOM Class | Purpose |
SymbolicRef | Base class for all symbolic references. |
TypeRefBase | Base class for all type references (supports generic parameters, arrays). |
UnresolvedRef | Unresolved reference. |
UnresolvedThisRef | Unresolved ‘this’ for an explicit interface implementation of an indexer. |
TypeRef | References a type declaration. |
AliasRef | References an alias (to a type or namespace). |
TypeParameterRef | References a type parameter. |
MethodRef | References a method. |
AnonymousMethodRef | References an anonymous method. |
ConstructorRef | References a constructor. |
OperatorRef | References an operator declaration. |
GotoTargetRef | Base class for ‘goto’ target references. |
LabelRef | References a label. |
SwitchItemRef | References a switch item (case or default). |
SelfRef | Base class for current object instance references. |
BaseRef | References the base class of the current object instance. |
ThisRef | References the current object instance. |
VariableRef | Base class for all variable references. |
PropertyRef | References a property. |
IndexerRef | References an indexer. |
EventRef | References an event. |
EnumMemberRef | References an enum member. |
FieldRef | References a field. |
LocalRef | References a local variable. |
ParameterRef | References a parameter. |
ExternAliasRef | References an extern alias. |
NamespaceRef | References a namespace. |
DirectiveSymbolRef | References a compiler directive symbol. |
We need to traverse the codeDOM tree from the top down looking for UnresolvedRef objects, and then attempt to resolve each one into the appropriate specific reference object pointing to the correct declaration, following the various scoping and resolution rules dictated by the C# language specification. One of the things that becomes apparent when thinking about how to resolve references is that the kind of reference can often be determined from the context. For example, references after a ‘using’ directive should represent namespaces, and variable types and method return types should represent types (with optional namespace prefixes), etc. The enum ResolveCategory represents the possible categories of reference types based on the current context.
The virtual method “CodeObject Resolve(ResolveCategory resolveCategory, ResolveFlags flags)” on the CodeObject base class is used to perform the top-down traversal of the tree, and is overridden as necessary by all code objects to perform any special logic and to resolve all child objects. For UnresolvedRef objects, the override attempts to resolve them, and returns the appropriate new reference object if successful, with parent objects assigning the result to their child properties. The ResolveFlags enum parameter is used to represent special modes, such as the 3 phases below, resolving inside documentation comments, and “unresolve” mode (used to convert resolved references back into UnresolvedRefs when necessary).
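As a rough illustration of this pattern (not the actual Nova code), the sketch below shows a parent object choosing the resolve category from context and assigning the result of Resolve() back to its child property. The class shapes, the LocalDecl class, the enum members, and the tiny table of known types are all simplified assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, heavily simplified stand-ins for the article's classes.
public enum ResolveCategory { Expression, Type, Namespace }

[Flags]
public enum ResolveFlags { None = 0, InDocComment = 1, Unresolve = 2 }

public abstract class CodeObject
{
    // Default behavior: nothing to resolve, so the object stays where it is.
    public virtual CodeObject Resolve(ResolveCategory category, ResolveFlags flags) => this;
}

public sealed class TypeRef : CodeObject
{
    public string Name { get; }
    public TypeRef(string name) { Name = name; }
}

public sealed class UnresolvedRef : CodeObject
{
    // Assumption: a fixed set of names stands in for real scope and metadata lookup.
    private static readonly HashSet<string> KnownTypes = new HashSet<string> { "int", "string" };

    public string Name { get; }
    public UnresolvedRef(string name) { Name = name; }

    public override CodeObject Resolve(ResolveCategory category, ResolveFlags flags)
    {
        // On success, return the replacement object; otherwise remain unresolved.
        if (category == ResolveCategory.Type && KnownTypes.Contains(Name))
            return new TypeRef(Name);
        return this;
    }
}

public sealed class LocalDecl : CodeObject
{
    public CodeObject Type { get; private set; }
    public LocalDecl(CodeObject type) { Type = type; }

    public override CodeObject Resolve(ResolveCategory category, ResolveFlags flags)
    {
        // The context dictates the category: the declared type of a local must be a type.
        // The parent stores whatever the child returns, replacing the UnresolvedRef.
        Type = Type.Resolve(ResolveCategory.Type, flags);
        return this;
    }
}
```

Returning the replacement object, rather than mutating the tree from inside the child, keeps the child unaware of which parent property holds it.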
For Solution, Project, and CodeUnit objects, the Resolve() override executes in 3 phases:
- Phase 1 resolves all types – all statements in CodeUnits and NamespaceDecls, and TypeDecl headers including any base lists, stopping at type bodies.
- Phase 2 resolves all type members – definitions of methods, properties, fields, stopping at the bodies of methods, properties, or initializers of fields.
- Phase 3 resolves all of the code – the bodies of methods, properties, and field initializers.
Note that this is not 3 full passes, but rather more of a breadth-first rather than depth-first traversal of the tree, and it allows for all references to be resolved in a single pass without order-of-evaluation dependency problems. Another special case is that Switch needs to resolve all Case expressions before all of the bodies in order to handle possible forward references via “goto case …”.
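The ordering that the phases guarantee can be sketched as follows. This is only an illustration under assumed names (ResolvePhase, TypeSketch, and PhasedResolve are hypothetical): it records the order in which parts of each type are visited, showing that every type header is handled before any member signature, and every signature before any body.

```csharp
using System.Collections.Generic;

// Hypothetical names throughout; this is not the Nova implementation.
public enum ResolvePhase { Types = 1, Members = 2, Bodies = 3 }

public sealed class TypeSketch
{
    public string Name { get; }
    public TypeSketch(string name) { Name = name; }

    // Each call handles only the part of the declaration that belongs to the phase.
    public string Resolve(ResolvePhase phase)
    {
        switch (phase)
        {
            case ResolvePhase.Types: return Name + ": header and base list";
            case ResolvePhase.Members: return Name + ": member signatures";
            default: return Name + ": bodies and initializers";
        }
    }
}

public static class PhasedResolve
{
    // Runs each phase across all types before starting the next phase.
    public static List<string> Run(IReadOnlyList<TypeSketch> types)
    {
        var log = new List<string>();
        var phases = new[] { ResolvePhase.Types, ResolvePhase.Members, ResolvePhase.Bodies };
        foreach (ResolvePhase phase in phases)
            foreach (TypeSketch type in types)
                log.Add(type.Resolve(phase));
        return log;
    }
}
```

With this ordering, a method body in one type can refer to a member of another type declared later in the solution, because that member's signature has already been resolved by the time any body is examined.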
Resolving an UnresolvedRef
The Resolve() override of UnresolvedRef first calls Resolve() on any type argument children, and then it attempts to resolve itself by creating an instance of the Resolver class, passing itself to the constructor, and calling Resolve() on it. The Resolver class operates according to the ResolveCategory, behaving differently when looking for specific reference types as compared to references in expressions (which can be of almost any type). Validation of the type of a possible matching object and the text of any error message are also based upon the category.
The Resolver class contains various special-case logic, but the primary functionality consists of calling ResolveRef() to look for declarations with a matching name at the current scope, and if nothing is found it calls ResolveRefUp() to continue searching at higher levels of the tree, eventually stopping if nothing is found. Depending upon the resolve category, it might stop before reaching the top of the tree if that makes sense.
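The general shape of this lookup, searching the current scope and then walking up through the parents, might look like the sketch below. The Scope class and its FindLocal() and FindUp() methods are hypothetical names chosen for the example; they only mirror the roles of ResolveRef() and ResolveRefUp() and ignore categories, overloads, and early stopping.

```csharp
using System.Collections.Generic;

// Hypothetical scope model: each scope maps declared names to a short description.
public sealed class Scope
{
    private readonly Dictionary<string, string> _declarations = new Dictionary<string, string>();

    public Scope Parent { get; }
    public Scope(Scope parent = null) { Parent = parent; }

    public void Declare(string name, string description) { _declarations[name] = description; }

    // Similar in spirit to ResolveRef(): look only at this level.
    public string FindLocal(string name)
    {
        string found;
        return _declarations.TryGetValue(name, out found) ? found : null;
    }

    // Similar in spirit to ResolveRefUp(): keep searching enclosing scopes
    // until a match is found or the top of the tree is reached.
    public string FindUp(string name)
    {
        for (Scope scope = this; scope != null; scope = scope.Parent)
        {
            string found = scope.FindLocal(name);
            if (found != null)
                return found;
        }
        return null;
    }
}
```

Because the innermost scope is searched first, a local variable named the same as a field hides the field, which matches the behavior of C# name lookup.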
When a declaration with a matching name is found, the AddMatch() method is called on the Resolver instance, which creates a MatchCandidate instance and then validates that the type of the matched object is valid for the resolve category. If the match is a method, it must then attempt to infer any omitted type arguments if the method is generic, and go through a lot of complicated overload logic to determine if the parameter types match the types of the supplied arguments. There are also checks to verify that the candidate object is static or not as appropriate, and that the access specifiers allow it to be accessed in the current scope. It’s determined whether the candidate is a “complete” match or only partial, and this in turn determines whether or not the search will continue into other higher scopes.
If this process finds a single valid match, Resolver.Resolve() creates a new reference to the matched declaration by calling CreateRef() on it, and returns it, causing the UnresolvedRef object to be replaced with it. If no matches are found, or if multiple matches are found, error messages are generated as appropriate and attached to the UnresolvedRef object (and they get propagated up to the Solution level and logged to the console or displayed in the UI).
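The decision described above can be condensed into a small sketch: filter the candidates by validity, then act on whether exactly one, none, or several remain. The Kind enum, the Candidate class, and the message strings are assumptions for illustration only; the real checks (static versus instance, overloads, partial matches) are far more involved.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical candidate model for the example.
public enum Kind { Type, Method, Field }

public sealed class Candidate
{
    public string Name { get; }
    public Kind Kind { get; }
    public bool IsAccessible { get; }

    public Candidate(string name, Kind kind, bool isAccessible)
    {
        Name = name;
        Kind = kind;
        IsAccessible = isAccessible;
    }
}

public static class MatchSketch
{
    // Keeps only candidates that fit the wanted kind and are accessible,
    // then reports success only when exactly one candidate survives.
    public static string Resolve(string name, Kind wanted, IEnumerable<Candidate> found)
    {
        List<Candidate> valid = found
            .Where(c => c.Name == name && c.Kind == wanted && c.IsAccessible)
            .ToList();

        if (valid.Count == 1)
            return "resolved: " + valid[0].Kind + " " + name;
        return valid.Count == 0
            ? "error: '" + name + "' not found"
            : "error: '" + name + "' is ambiguous";
    }
}
```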
Expression Type Evaluation
In order to determine the proper match for an overloaded method, it’s necessary to evaluate the types of the argument expressions to see if they match the parameter types. A virtual “TypeRefBase EvaluateType()” method has been added to Expression, and is overridden as necessary by subclasses to evaluate their type. Also, a virtual “TypeRefBase EvaluateTypeArgumentTypes()” method has been added to TypeRefBase to evaluate the types of any generic type arguments on a type or method reference. The AddMatch() method on Resolver uses these to evaluate the type of each argument expression passed to a method, and then calls ParameterRef.MatchParameter() to determine if the type matches the parameter type (which internally calls EvaluateParameter()).
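To make the idea of matching argument types against parameter types concrete, here is a minimal sketch that uses .NET reflection instead of the codeDOM classes. It treats Type.IsAssignableFrom() as a stand-in for implicit convertibility, which is an assumption that covers reference and boxing conversions but not, for example, numeric widening or user-defined conversions. It also only finds the applicable overloads; picking the best one among them is a further step not shown. The Sample and OverloadSketch classes are hypothetical.

```csharp
using System;
using System.Linq;
using System.Reflection;

// A small overload set to search, so the example does not depend on library overloads.
public static class Sample
{
    public static string Describe(object value) => "object";
    public static string Describe(string value) => "string";
    public static string Describe(string first, string second) => "two strings";
}

public static class OverloadSketch
{
    // Returns the overloads whose parameter count matches and whose parameter
    // types can accept the evaluated argument types.
    public static MethodInfo[] FindApplicable(Type type, string name, params Type[] argumentTypes)
    {
        return type
            .GetMethods(BindingFlags.Public | BindingFlags.Static | BindingFlags.Instance)
            .Where(m => m.Name == name)
            .Where(m =>
            {
                ParameterInfo[] parameters = m.GetParameters();
                if (parameters.Length != argumentTypes.Length)
                    return false;
                for (int i = 0; i < parameters.Length; i++)
                {
                    // Simplification: assignability stands in for implicit conversion.
                    if (!parameters[i].ParameterType.IsAssignableFrom(argumentTypes[i]))
                        return false;
                }
                return true;
            })
            .ToArray();
    }
}
```

For a single string argument, both Describe(object) and Describe(string) are applicable here, which is exactly the situation where the additional "better match" logic is needed to settle on Describe(string).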
Various other methods necessary to the type evaluation process include the following members of the TypeRef class: FindTypeArgument() used in the evaluation of type arguments, IsImplicitlyConvertibleTo() and FindUserDefinedImplicitConversion() to handle implicit conversions, and GetCommonType() to determine a common type that can represent two given types.
Method Groups
Sometimes the name of an overloaded method is used by itself
in code, without any parentheses or parameters.
This is known as a “method group”, and it is usually assigned to a
variable of delegate type or passed to a parameter of delegate type. Such method groups are represented by the UnresolvedRef
class, which will have multiple match candidates in such a case. The method group is then normally resolved to
a single method reference using the delegate type to which it is assigned (or
passed) to determine the parameter types and thus the single matching method.
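For readers who haven't met the term, this short example shows a method group in ordinary C#. The name Format refers to two overloads, and the delegate type on the left-hand side decides which one is selected. The class and method names are made up for the example.

```csharp
using System;

public static class MethodGroupDemo
{
    public static string Format(int value) => "int:" + value;
    public static string Format(string value) => "string:" + value;

    public static string Run()
    {
        // "Format" alone is a method group with two candidates.
        // The target delegate type supplies the parameter types used to pick one.
        Func<int, string> formatNumber = Format;   // selects Format(int)
        Func<string, string> formatText = Format;  // selects Format(string)

        return formatNumber(42) + " " + formatText("x");
    }
}
```

Calling MethodGroupDemo.Run() returns "int:42 string:x".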
Generated Files
In some cases, C# source files are generated at compile time
with partial classes that must be combined by the compiler with “code-behind”
files. These files have extensions such
as “.Designer.cs” or “.g.cs”, and may be located in the output directory as
temporary files. Now that we are
resolving symbolic references, we need to also load and process these generated
files or we’ll have many symbols which can’t be resolved. This is done by logic in the Project
class detecting and including such files in the project. Logic has also been added to ignore automatic
code cleanup and save attempts for such files.
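A simple way to recognize such files by name might look like the sketch below. The GeneratedFiles class is hypothetical, and the suffix list contains only the two extensions mentioned above, so it should be read as a starting point rather than a complete list.

```csharp
using System;

public static class GeneratedFiles
{
    // Assumption: only the two suffixes named in the text; real projects may use others.
    private static readonly string[] Suffixes = { ".Designer.cs", ".g.cs" };

    // True if the file name ends with one of the known generated-file suffixes.
    public static bool LooksGenerated(string path)
    {
        foreach (string suffix in Suffixes)
        {
            if (path.EndsWith(suffix, StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }
}
```

A check like this could be used both when deciding which extra files to load and when skipping cleanup or save operations for them.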
Nova does not yet automatically
generate these files if they are missing (like VS does), so if a project hasn’t
been built for a particular configuration, the generated files will show as
missing from the project and resolve errors will be generated for symbolic
references to declarations in them.
Code Inside Documentation Comments
Code inside documentation comments – inside a <code> or <c> tag, or a ‘cref’ attribute – is automatically parsed and resolved by default. However, any parse or resolve errors that occur in such code will be treated as only warnings. Parsing of content inside <code> tags can be turned off by setting the DocCode.ParseContentAsCode option to false, and if it’s not parsed then there won’t be anything to resolve, either.
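As a reminder of where such code appears, the example below shows all three forms in a standard C# documentation comment: a ‘cref’ attribute, a <c> tag, and a <code> tag. The Temperature class is made up for the illustration.

```csharp
public static class Temperature
{
    /// <summary>
    /// Converts degrees Celsius to degrees Fahrenheit. The inverse operation is
    /// <see cref="ToCelsius"/>, and <c>ToFahrenheit(100)</c> returns 212.
    /// </summary>
    /// <example>
    /// <code>
    /// double bodyTemperature = Temperature.ToFahrenheit(37.0);
    /// </code>
    /// </example>
    public static double ToFahrenheit(double celsius) => celsius * 9.0 / 5.0 + 32.0;

    /// <summary>Converts degrees Fahrenheit to degrees Celsius.</summary>
    public static double ToCelsius(double fahrenheit) => (fahrenheit - 32.0) * 5.0 / 9.0;
}
```

Here the ‘cref’ value and the names inside the <c> and <code> tags are all symbolic references that a resolver can bind to the ToCelsius and ToFahrenheit declarations.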
Nova Studio Improvements
Nova Studio now resolves all symbolic references automatically upon loading solutions or projects. So, missing references or other such issues can now cause a lot of error messages. A “Go To Declaration” option has been added to the context menu to navigate to the target of a symbolic reference. Also, any expression which evaluates to a constant value now displays the constant value in its tooltip. The screenshot below shows that references are resolved.
Performance
The Nova Studio IDE now has similar functionality to the VS IDE as far as loading solutions, projects, files, and referenced assemblies into memory and also parsing and resolving all sources and displaying error messages. Just out of curiosity, here’s a performance comparison:
Solution | Projects | Files | Code Objects | Load (secs), Nova | Load (secs), VS2010 | Memory (MB), Nova | Memory (MB), VS2010 |
Nova | 10 | 820 | 439,649 | 5 | 22 | 201 | 424 |
SubText 2.5.2 | 7 | 849 | 277,163 | 4 | 26 | 218 | 410 |
Mono Tests | 1,939 | 1,945 | 157,634 | 9 | 4,000+ | 329 | 3,000+ |
MS EntLib Tests | 70 | 2,445 | 866,065 | 10 | 43 | 338 | 583 |
Large Proprietary | 43 | 4,677 | 1,829,942 | 19 | 72 | 426 | 615 |
SharpDevelop 4.2 | 93 | 5,289 | 2,556,585 | 23 | 76 | 485 | 631 |
The numbers shown are approximate, using Task Manager to check peak working sets while loading and the approximate time until the UI is highly responsive (CPU usage less than 15%). Nova isn’t optimized yet, especially not the resolving, but in these measurements it loads roughly 3 to 6 times faster than VS, leaving aside the Mono case. I’ve noticed over the years that VS (or perhaps MSBuild, which it uses to load) seems to be doing something for each project that takes about a half to a full second, and so makes loading large numbers of projects painfully slow. Despite some improvements over the years, it’s always been and still is slow if you have dozens of projects, much less a hundred or so. The test case shown of a solution with almost 2,000 test projects from Mono is laughable: I gave up after well over ONE HOUR of waiting and with the working set over 3 GB! That seems to indicate that the load time grows far faster than linearly with the number of projects. Nova does all the work that needs to be done in less than 10 seconds. I know this is not a typical use case, but it’s still very sad, because it probably wouldn’t be that hard to fix this, and if they did then VS would be much snappier loading more typical solutions. I think millions of users out there loading solutions in, say, half the time would be well worth the effort of a few man-months of dev work to clean up the loading process. Hey, MS, anyone listening?
Using the Attached Source Code
A new Resolving folder has been added with new classes related to the resolving support: Resolver and MatchCandidate (and a MatchCandidates collection). It also contains the ResolveCategory and ResolveFlags enums. Various resolving-related methods – such as Resolve(), ResolveRef(), ResolveRefUp() and others – have been added to most of the existing CodeDOM classes, segregated into regions with a comment of “RESOLVING”. New examples have been added to the Nova.Examples project, including demonstration of the LoadOptions.DoNotResolve flag to load solutions/projects without resolving if it’s not required. Nova Studio now resolves by default, and all of the red unresolved references should be gone (other than for post-C# 2.0 features, which I’ve implemented but not yet released as open source). Use it to load the provided source code (“Nova.sln”) and inspect the source files to see. As usual, a separate ZIP file containing binaries is provided so that you can run them without having to build them first.
Summary
My codeDOM now has support for loading, parsing, and resolving C# solutions and projects. Nova Studio is starting to look like a real IDE! Now that entire solutions can be loaded and resolved, it’s time to add some basic features for analyzing the code. Everything up to this point may have been interesting, but now it’s time to actually do something useful! In my next article, I’ll look at calculating metrics on a codeDOM and also doing various types of searches on it.