
Resolving Symbolic References in a CodeDOM (Part 7)

KenBeckett, 2 Dec 2012, CDDL
Resolving symbolic references in a CodeDOM.

Introduction 

This article is about resolving symbolic references in a codeDOM.  Sources are included.  This is Part 7 of a series on codeDOMs, but it may be useful for anyone who wishes to resolve symbolic references in C# source code.  In the previous parts, I’ve discussed “CodeDOMs”, provided a C# codeDOM, a WPF UI and IDE, a C# parser, and solution/project codeDOM classes, and covered loading type metadata.

Resolving a CodeDOM Tree

In the previous articles of this series, I’ve built up a C# codeDOM that can parse itself from existing C# source files, resulting in a tree of code objects with most symbolic references unresolved and represented by UnresolvedRef objects.  In the last article, I added support for loading type metadata from referenced assemblies, and now it’s time to add the necessary logic to replace those UnresolvedRef objects with specific references using the classes shown in the table below.

CodeDOM Class                 Purpose
SymbolicRef                   Base class for all symbolic references.
   TypeRefBase                Base class for all type references (supports generic parameters, arrays).
      UnresolvedRef           Unresolved reference.
         UnresolvedThisRef    Unresolved ‘this’ for an explicit interface implementation of an indexer.
      TypeRef                 References a type declaration.
         AliasRef             References an alias (to a type or namespace).
         TypeParameterRef     References a type parameter.
      MethodRef               References a method.
         AnonymousMethodRef   References an anonymous method.
         ConstructorRef       References a constructor.
         OperatorRef          References an operator declaration.
   GotoTargetRef              Base class for ‘goto’ target references.
      LabelRef                References a label.
      SwitchItemRef           References a switch item (case or default).
   SelfRef                    Base class for current object instance references.
      BaseRef                 References the base class of the current object instance.
      ThisRef                 References the current object instance.
   VariableRef                Base class for all variable references.
      PropertyRef             References a property.
         IndexerRef           References an indexer.
      EventRef                References an event.
      EnumMemberRef           References an enum member.
      FieldRef                References a field.
      LocalRef                References a local variable.
      ParameterRef            References a parameter.
   ExternAliasRef             References an extern alias.
   NamespaceRef               References a namespace.
   DirectiveSymbolRef         References a compiler directive symbol.

We need to traverse the codeDOM tree from the top down looking for UnresolvedRef objects, and then attempt to resolve each one into the appropriate specific reference object pointing to the correct declaration, following the scoping and resolution rules laid out in the C# language specification.  One thing that becomes apparent when thinking about how to resolve references is that the kind of reference can often be determined from the context.  For example, references after a ‘using’ directive should represent namespaces, and variable types and method return types should represent types (with optional namespace prefixes), etc.  The enum ResolveCategory represents the possible categories of reference types based on the current context.

The virtual method “CodeObject Resolve(ResolveCategory resolveCategory, ResolveFlags flags)” on the CodeObject base class is used to perform the top-down traversal of the tree, and is overridden as necessary by all code objects to perform any special logic and to resolve all child objects.  For UnresolvedRef objects, the override attempts to resolve them, and returns the appropriate new reference object if successful, with parent objects assigning the result to their child properties.  The ResolveFlags enum parameter is used to represent special modes, such as the 3 phases below, resolving inside documentation comments, and “unresolve” mode (used to convert resolved references back into UnresolvedRefs when necessary).
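
As a rough illustration of the "resolve returns the replacement, and the parent assigns it back" pattern described above, here is a minimal sketch in Python.  It is not Nova's C# code: the class names mirror the article, but the `declarations` dictionary and the `Block` and `ResolvedRef` classes are hypothetical stand-ins for the real scoping logic and code objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class CodeObject:
    """Base node: resolving a node returns whatever should occupy its slot."""

    def resolve(self, declarations: Dict[str, str]) -> "CodeObject":
        return self


@dataclass
class ResolvedRef(CodeObject):
    name: str
    target: str  # hypothetical: a description of the declaration found


@dataclass
class UnresolvedRef(CodeObject):
    name: str
    errors: List[str] = field(default_factory=list)

    def resolve(self, declarations: Dict[str, str]) -> CodeObject:
        target = declarations.get(self.name)
        if target is None:
            # Stay in the tree, carrying the error message.
            self.errors.append(f"Unresolved reference: {self.name}")
            return self
        return ResolvedRef(self.name, target)


@dataclass
class Block(CodeObject):
    children: List[CodeObject]

    def resolve(self, declarations: Dict[str, str]) -> CodeObject:
        # The parent assigns each child's result back into its own slot.
        self.children = [c.resolve(declarations) for c in self.children]
        return self


tree = Block([UnresolvedRef("x"), Block([UnresolvedRef("y")])])
tree.resolve({"x": "field x"})
```

After the call, the first child has been replaced by a resolved reference, while the nested `y` is still an UnresolvedRef with an error attached.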

For Solution, Project, and CodeUnit objects, the Resolve() override executes in 3 phases:

  • Phase 1 resolves all types – all statements in CodeUnits and NamespaceDecls, and TypeDecl headers including any base lists, stopping at type bodies.
  • Phase 2 resolves all type members – definitions of methods, properties, fields, stopping at the bodies of methods, properties, or initializers of fields.
  • Phase 3 resolves all of the code – the bodies of methods, properties, and field initializers.

Note that this is not 3 full passes over the tree.  It is closer to a breadth-first than a depth-first traversal, and it allows all references to be resolved in a single pass without order-of-evaluation dependency problems.  (Another special case is that Switch needs to resolve all Case expressions before any of the bodies in order to handle possible forward references via “goto case …”.)
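
To see why the phased order matters, the following Python sketch compares it with a plain depth-first walk on a toy model.  Everything here is an assumption made for illustration: types are plain dictionaries, and "resolving" a name just means checking whether it has been registered yet.

```python
def resolve_in_phases(types):
    """Visit type headers, then member signatures, then bodies."""
    known, results = set(), {}
    for t in types:                      # Phase 1: type headers only
        known.add(t["name"])
    for t in types:                      # Phase 2: member signatures
        for m in t["members"]:
            known.add(f'{t["name"]}.{m["name"]}')
    for t in types:                      # Phase 3: bodies
        for m in t["members"]:
            for ref in m["body"]:
                results[ref] = ref in known
    return results


def resolve_depth_first(types):
    """Naive alternative: finish each type completely before the next."""
    known, results = set(), {}
    for t in types:
        known.add(t["name"])
        for m in t["members"]:
            known.add(f'{t["name"]}.{m["name"]}')
            for ref in m["body"]:
                results[ref] = ref in known
    return results


# A.Run refers forward to B, which is declared later in the file.
types = [
    {"name": "A", "members": [{"name": "Run", "body": ["B", "B.Help"]}]},
    {"name": "B", "members": [{"name": "Help", "body": ["A.Run"]}]},
]
phased = resolve_in_phases(types)
naive = resolve_depth_first(types)
```

Each level of the tree is still visited only once in the phased version.  The difference is that every header is known before any body is examined, so the forward references from `A.Run` to `B` succeed, whereas the naive walk fails on them.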

Resolving an UnresolvedRef

The Resolve() override of UnresolvedRef first calls Resolve() on any type argument children, and then it attempts to resolve itself by creating an instance of the Resolver class, passing itself to the constructor, and calling Resolve() on it.  The Resolver class operates according to the ResolveCategory, behaving differently when looking for specific reference types as compared to references in expressions (which can be of almost any type).  Validation of the type of a possible matching object and the text of any error message is also based upon the category.

The Resolver class contains various special-case logic, but the primary functionality consists of calling ResolveRef() to look for declarations with a matching name at the current scope, and if nothing is found it calls ResolveRefUp() to continue searching at higher levels of the tree, eventually stopping if nothing is found.  Depending upon the resolve category, it might stop before reaching the top of the tree if that makes sense.
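
The "look in the current scope, then keep walking up" search can be sketched as follows in Python.  The `Scope` class, the `find` helper, and the `stop_at` parameter are hypothetical simplifications; only the method name `resolve_ref` is borrowed from the article's ResolveRef(), and the real methods work on code objects rather than dictionaries.

```python
from typing import Dict, Optional


class Scope:
    def __init__(self, declarations: Dict[str, str],
                 parent: Optional["Scope"] = None):
        self.declarations = declarations
        self.parent = parent

    def resolve_ref(self, name: str) -> Optional[str]:
        """Look for a matching declaration in this scope only."""
        return self.declarations.get(name)


def find(scope: Optional[Scope], name: str,
         stop_at: Optional[Scope] = None) -> Optional[str]:
    """Search the starting scope, then each enclosing scope in turn.

    `stop_at` models a resolve category that should not search all the
    way to the top of the tree (an assumption made for illustration).
    """
    while scope is not None:
        found = scope.resolve_ref(name)
        if found is not None:
            return found
        if scope is stop_at:
            return None
        scope = scope.parent
    return None


namespace = Scope({"Helper": "class Helper"})
type_body = Scope({"count": "field count"}, parent=namespace)
method = Scope({"count": "local count"}, parent=type_body)
```

Here `find(method, "count")` stops at the local variable, which shadows the field, while `find(method, "Helper")` has to climb to the namespace level before it finds anything.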

When a declaration with a matching name is found, the AddMatch() method is called on the Resolver instance, which creates a MatchCandidate instance and then validates that the type of the matched object is valid for the resolve category.  If the match is a method, it must then attempt to infer any omitted type arguments if the method is generic, and go through a lot of complicated overload logic to determine if the parameter types match the types of the supplied arguments.  There are also checks to verify that the candidate object is static or not as appropriate, and that the access specifiers allow it to be accessed in the current scope.  It’s determined whether the candidate is a “complete” match or only partial, and this in turn determines whether or not the search will continue into other higher scopes.

If this process finds a single valid match, Resolver.Resolve() creates a new reference to the matched declaration by calling CreateRef() on it, and returns it, causing the UnresolvedRef object to be replaced with it.  If no matches are found, or if multiple matches are found, error messages are generated as appropriate and attached to the UnresolvedRef object (and they get propagated up to the Solution level and logged to the console or displayed in the UI).
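
The three possible outcomes (one match, none, or several) can be sketched like this in Python.  The `Declaration` class and the `allowed_kinds` set are hypothetical stand-ins for real declarations and for the ResolveCategory validation.  The sketch deliberately leaves out overload matching, static checks, and access checks.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass
class Declaration:
    name: str
    kind: str  # e.g. "type", "namespace", "method" (hypothetical labels)


@dataclass
class Resolver:
    name: str
    allowed_kinds: FrozenSet[str]  # stands in for the resolve category
    matches: List[Declaration] = field(default_factory=list)
    errors: List[str] = field(default_factory=list)

    def add_match(self, decl: Declaration) -> None:
        # Keep only candidates whose kind is valid for the category.
        if decl.name == self.name and decl.kind in self.allowed_kinds:
            self.matches.append(decl)

    def resolve(self, declarations: List[Declaration]) -> Optional[Declaration]:
        for decl in declarations:
            self.add_match(decl)
        if len(self.matches) == 1:
            return self.matches[0]
        if not self.matches:
            self.errors.append(f"'{self.name}' not found")
        else:
            self.errors.append(
                f"'{self.name}' is ambiguous ({len(self.matches)} matches)")
        return None


decls = [
    Declaration("System", "namespace"),
    Declaration("Item", "type"),
    Declaration("Item", "type"),
    Declaration("Item", "method"),
]
using_target = Resolver("System", frozenset({"namespace"})).resolve(decls)
ambiguous = Resolver("Item", frozenset({"type"}))
ambiguous_result = ambiguous.resolve(decls)
missing = Resolver("Nope", frozenset({"type"}))
missing_result = missing.resolve(decls)
```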

Expression Type Evaluation

In order to determine the proper match for an overloaded method, it’s necessary to evaluate the types of the argument expressions to see if they match the parameter types.  A virtual “TypeRefBase EvaluateType()” method has been added to Expression, and is overridden as necessary by subclasses to evaluate their type.  Also, a virtual “TypeRefBase EvaluateTypeArgumentTypes()” method has been added to TypeRefBase to evaluate the types of any generic type arguments on a type or method reference.  The AddMatch() method on Resolver uses these to evaluate the type of each argument expression passed to a method, and then calls ParameterRef.MatchParameter() to determine if the type matches the parameter type (which internally calls EvaluateParameter()).
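
A heavily simplified Python sketch of the idea follows: each expression node reports its own type, and those types are compared against each overload's parameter list.  The `Literal` and `Add` classes, the string type names, and the exact-match rule are all assumptions made for illustration.  The real logic also has to consider conversions, generics, and much more.

```python
class Literal:
    """A constant; its type follows from the kind of value it holds."""

    _TYPE_NAMES = {int: "int", float: "double", str: "string"}

    def __init__(self, value):
        self.value = value

    def evaluate_type(self) -> str:
        return self._TYPE_NAMES[type(self.value)]


class Add:
    """A binary '+'; its type depends on the types of both operands."""

    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate_type(self) -> str:
        types = {self.left.evaluate_type(), self.right.evaluate_type()}
        if "string" in types:
            return "string"   # concatenation
        if "double" in types:
            return "double"   # the int operand is widened
        return "int"


def matching_overloads(overloads, args):
    """Return the parameter lists that exactly match the argument types."""
    arg_types = tuple(a.evaluate_type() for a in args)
    return [params for params in overloads if params == arg_types]


overloads = [("int",), ("string",), ("double", "double")]
ints = matching_overloads(overloads, [Add(Literal(1), Literal(2))])
text = matching_overloads(overloads, [Add(Literal(1), Literal("a"))])
pair = matching_overloads(
    overloads, [Literal(1.5), Add(Literal(1), Literal(2.0))])
```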

Various other methods necessary to the type evaluation process include the following members of the TypeRef class:  FindTypeArgument() used in the evaluation of type arguments, IsImplicitlyConvertibleTo() and FindUserDefinedImplicitConversion() to handle implicit conversions, and GetCommonType() to determine a common type that can represent two given types.
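
To give a feel for what the implicit conversion and common type checks involve, here is a small Python sketch.  It covers only a subset of C#'s built-in implicit numeric conversions and ignores user-defined conversions entirely.  The function names echo the article's methods, but the table-driven approach is an assumption, not how Nova implements them.

```python
# Subset of C# implicit numeric conversions (source type -> target types).
IMPLICIT = {
    "byte": {"short", "int", "long", "float", "double", "decimal"},
    "short": {"int", "long", "float", "double", "decimal"},
    "int": {"long", "float", "double", "decimal"},
    "long": {"float", "double", "decimal"},
    "float": {"double"},
}


def is_implicitly_convertible_to(source: str, target: str) -> bool:
    """Identity always works; otherwise consult the table."""
    return source == target or target in IMPLICIT.get(source, set())


def get_common_type(a: str, b: str):
    """Return a type that can represent both, or None if there is none."""
    if is_implicitly_convertible_to(a, b):
        return b
    if is_implicitly_convertible_to(b, a):
        return a
    return None
```

For example, `get_common_type("int", "double")` gives `"double"`, while `get_common_type("float", "decimal")` gives `None`, since C# has no implicit conversion between those two types in either direction.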

Method Groups

Sometimes the name of an overloaded method is used by itself in code, without any parentheses or parameters.  This is known as a “method group”, and it is usually assigned to a variable of delegate type or passed to a parameter of delegate type.  Such method groups are represented by the UnresolvedRef class, which will have multiple match candidates in such a case.  The method group is then normally resolved to a single method reference using the delegate type to which it is assigned (or passed) to determine the parameter types and thus the single matching method.
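
The selection step can be sketched in Python like this, assuming each candidate is just a dictionary with a parameter-type tuple and that the delegate's signature must match exactly.  Real C# rules are more lenient than that; the exact-match rule only keeps the example short.

```python
def resolve_method_group(candidates, delegate_params):
    """Pick the single candidate whose parameters match the delegate's."""
    matches = [m for m in candidates if m["params"] == delegate_params]
    return matches[0] if len(matches) == 1 else None


# Hypothetical method group: three overloads sharing the name "Print".
print_group = [
    {"name": "Print", "params": ()},
    {"name": "Print", "params": ("string",)},
    {"name": "Print", "params": ("int", "int")},
]

# Assigning the group to a delegate that takes one string selects one overload.
chosen = resolve_method_group(print_group, ("string",))
no_match = resolve_method_group(print_group, ("double",))
```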

Generated Files

In some cases, C# source files are generated at compile time with partial classes that must be combined by the compiler with “code-behind” files.  These files have extensions such as “.Designer.cs” or “.g.cs”, and may be located in the output directory as temporary files.  Now that we are resolving symbolic references, we need to also load and process these generated files or we’ll have many symbols which can’t be resolved.  This is done by logic in the Project class detecting and including such files in the project.  Logic has also been added to ignore automatic code cleanup and save attempts for such files.  Nova does not yet automatically generate these files if they are missing (like VS does), so if a project hasn’t been built for a particular configuration, the generated files will show as missing from the project and resolve errors will be generated for symbolic references to declarations in them. 
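
Detecting such files by name might look like the following Python sketch.  It checks only the two extensions mentioned above, and the function name and the case-insensitive comparison are assumptions.  The Project class may well use different or additional rules.

```python
GENERATED_SUFFIXES = (".designer.cs", ".g.cs")


def is_generated_file(path: str) -> bool:
    """True if the file name ends with a known generated-file suffix."""
    return path.lower().endswith(GENERATED_SUFFIXES)


files = [
    "Form1.cs",
    "Form1.Designer.cs",
    "obj/Debug/MainWindow.g.cs",
    "Log.cs",
]
generated = [f for f in files if is_generated_file(f)]
```

Note that `Log.cs` is not treated as generated: the suffix check requires the dot before the `g`.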

Code Inside Documentation Comments

Code inside documentation comments – inside a <code> or <c> tag, or a ‘cref’ attribute – is automatically parsed and resolved by default.  However, any parse or resolve errors that occur in such code will be treated as only warnings.  Parsing of content inside <code> tags can be turned off by setting the DocCode.ParseContentAsCode option to false, and if it’s not parsed then there won’t be anything to resolve, either.
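
As an illustration only, the Python sketch below pulls the code fragments out of a documentation comment with regular expressions, using a `parse_content_as_code` parameter that mimics the option described above.  Nova uses its own parser rather than regular expressions, so both functions are hypothetical.

```python
import re


def doc_comment_code(comment: str, parse_content_as_code: bool = True):
    """Collect the fragments of a doc comment that would be parsed as code."""
    found = re.findall(r'cref="([^"]*)"', comment)
    found += re.findall(r"<c>(.*?)</c>", comment, flags=re.DOTALL)
    if parse_content_as_code:
        found += re.findall(r"<code>(.*?)</code>", comment, flags=re.DOTALL)
    return found


def severity(in_doc_comment: bool) -> str:
    """Problems inside documentation comments are only warnings."""
    return "warning" if in_doc_comment else "error"


comment = ('See <see cref="Foo.Bar"/> and <c>x + 1</c>.\n'
           '<code>var y = Foo.Bar();</code>')
with_code = doc_comment_code(comment)
without_code = doc_comment_code(comment, parse_content_as_code=False)
```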

Nova Studio Improvements

Nova Studio now resolves all symbolic references automatically upon loading solutions or projects.  So, missing references or other such issues can now cause a lot of error messages.  A “Go To Declaration” option has been added to the context menu to navigate to the target of a symbolic reference.  Also, any expression which evaluates to a constant value now displays the constant value in its tooltip.   The screenshot below shows that references are resolved.

Performance 

The Nova Studio IDE now has similar functionality to the VS IDE as far as loading solutions, projects, files, and referenced assemblies into memory and also parsing and resolving all sources and displaying error messages.  Just out of curiosity, here’s a performance comparison:

Solution           Projects  Files  Code Objects  Load (secs)     Memory (MB)
                                                  Nova   VS2010   Nova   VS2010
Nova                     10    820       439,649     5       22    201      424
SubText 2.5.2             7    849       277,163     4       26    218      410
Mono Tests            1,939  1,945       157,634     9   4,000+    329   3,000+
MS EntLib Tests          70  2,445       866,065    10       43    338      583
Large Proprietary        43  4,677     1,829,942    19       72    426      615
SharpDevelop 4.2         93  5,289     2,556,585    23       76    485      631

The numbers shown are approximate, using Task Manager to check peak working sets while loading and the approximate time until the UI is highly responsive (CPU usage less than 15%).  Nova isn’t optimized yet, especially not the resolving, but in the rows where VS finished loading, Nova is roughly 3 to 6 times faster than VS.  I’ve noticed over the years that VS (or perhaps MSBuild, which it uses to load) seems to be doing something for each project that takes about a half to a full second, and so makes loading large numbers of projects painfully slow.  Despite some improvements over the years, it has always been slow if you have dozens of projects, much less a hundred or so.  The test case shown of a solution with almost 2,000 test projects from Mono is laughable: I gave up after well over ONE HOUR of waiting and with the working set over 3 GB!  That seems to indicate that load time grows much faster than linearly with the number of projects.  Nova does all the work that needs to be done in less than 10 seconds.  I know this is not a typical use case, but it’s still very sad, because it probably wouldn’t be that hard to fix this, and if they did then VS would be much snappier loading more typical solutions.  I think millions of users out there loading solutions in, say, half the time would be well worth the effort of a few man-months of dev work to clean up the loading process.  Hey, MS, anyone listening?
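
For reference, the load-time ratios implied by the table can be computed with a few lines of Python.  The Mono Tests row is left out because the VS load never finished, so its "4,000+" figure is only a lower bound.

```python
# (Nova load secs, VS2010 load secs) taken from the table above.
load_secs = {
    "Nova": (5, 22),
    "SubText 2.5.2": (4, 26),
    "MS EntLib Tests": (10, 43),
    "Large Proprietary": (19, 72),
    "SharpDevelop 4.2": (23, 76),
}

speedups = {name: round(vs / nova, 1)
            for name, (nova, vs) in load_secs.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio}x")
```

On these rows the ratio ranges from about 3.3x (SharpDevelop 4.2) to 6.5x (SubText 2.5.2).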

Using the Attached Source Code

A new Resolving folder has been added with new classes related to the resolving support: Resolver and MatchCandidate (and a MatchCandidates collection).  It also contains the ResolveCategory and ResolveFlags enums.  Various resolving-related methods – such as Resolve(), ResolveRef(), ResolveRefUp() and others – have been added to most of the existing CodeDOM classes, segregated into regions with a comment of “RESOLVING”.  New examples have been added to the Nova.Examples project, including demonstration of the LoadOptions.DoNotResolve flag to load solutions/projects without resolving if it’s not required.  Nova Studio now resolves by default, and all of the red unresolved references should be gone (other than for post-C# 2.0 features, which I’ve implemented but not yet released as open source).  Use it to load the provided source code (“Nova.sln”) and inspect the source files to see.  As usual, a separate ZIP file containing binaries is provided so that you can run them without having to build them first.

Summary

My codeDOM now has support for loading, parsing, and resolving C# solutions and projects.  Nova Studio is starting to look like a real IDE!  Now that entire solutions can be loaded and resolved, it’s time to add some basic features for analyzing the code.  Everything up to this point may have been interesting, but now it’s time to actually do something useful!  In my next article, I’ll look at calculating metrics on a codeDOM and also doing various types of searches on it.

License

This article, along with any associated source code and files, is licensed under The Common Development and Distribution License (CDDL).

About the Author

KenBeckett
Software Developer (Senior)
United States
I've been writing software since the late 70's, currently focusing mainly on C#.NET. I also like to travel around the world, and I own a Chocolate Factory (sadly, none of my employees are oompa loompas).

Last Updated 2 Dec 2012
Article Copyright 2012 by KenBeckett