Creating a CodeDOM: Modeling the Semantics of Code (Part 2)





Creating a CodeDOM for C#
Introduction
This article is about creating a code model, or codeDOM, that models the semantics of a programming language – specifically, C#. Sources for a C# codeDOM are included, along with examples of creating and editing a codeDOM tree. This is Part 2 of a series on codeDOMs. See Part 1 for a discussion of what a “CodeDOM” is (in my opinion), including design goals for building one.
Choosing a Language
A codeDOM could be created for a brand new language, or it could model an existing language. In order to have a better chance of widespread adoption, I wanted to create a codeDOM that is interoperable with an existing mainstream text-based language. I also wanted to use a managed platform. This quickly narrowed my options to Java or C#. I personally prefer C# over Java for its richer feature set, so I chose to create a codeDOM for C#, even though it’s a very complex and rather rapidly evolving language.
Why Not Support Multiple Languages?
A codeDOM that supports multiple languages (as .NET does) might seem like a good idea – most languages have common features, and this would allow for mapping the same codeDOM objects onto different text languages. However, I think it’s much more important that the class names and functionality match the language being modeled (see design goal A in my first article). A codeDOM as I define it should be the language, and so shouldn’t be corrupted in order to support multiple languages. So, I will model C# closely, and any support for other languages in the future will involve the creation of new sets of codeDOM classes. Converters could always be created to convert between codeDOMs for different languages.
What about System.CodeDom, Expression Trees, and Roslyn?
At first glance, the classes in the System.CodeDom namespace might look like exactly the sort of codeDOM that I’ve been talking about, and they’ve existed since .NET 1.1. Unfortunately, they were designed with something completely different in mind than what I’m looking for: generation of text code in multiple languages from a single set of objects. Code generation wizards can use them to manually build up a code object tree, and then spit out the equivalent C# or VB code. These classes violate my primary design goal (A), don’t have the needed functionality, and most importantly they lack support for many C# language features.
Expression trees arrived with .NET 3.5 (in System.Linq.Expressions). These classes were added as a part of the support for LINQ, and they model C# expressions. They can be used to represent single-expression lambdas only, and do not support statements. So, somewhat ironically, they provide some of what is missing from System.CodeDom, but themselves lack support for most language features other than expressions. They were extended in .NET 4.0 to provide support for some statements such as try/catch blocks and switches, but this support is for use by dynamic languages – the statement support doesn’t yet work with lambda expressions. Again, this was designed with something different in mind, and there is far too much missing for it to be useful for our purposes.
The Roslyn project isn’t in a final release state yet (while I’m writing this), but a preview is available, and it does aim to provide classes for all C# language features. Roslyn is certainly something more like the general-purpose codeDOM I’ve been talking about, but the design goals of Roslyn still seem to differ quite a bit from my codeDOM design goals – for one thing, it focuses on a typical syntax (AST) model. In any case, my code has actually been under development for years, and I think it has some significant benefits over Roslyn. So, I’m going to present my codeDOM classes and all supporting code first (in a series of articles), and then in a later article I’ll take a look at Roslyn and how it compares.
General CodeDOM Observations
A number of interesting issues arise when modeling classes to represent code.
- It becomes obvious right away that much of the text syntax exists mostly for the benefit of the parser, and is not really relevant for a tree of objects – such as semicolon terminators, comma separators, braces, many parentheses, etc. There is no need to represent such things in a code object tree.
- It is desirable for many comments to be attached to the specific code objects that they are documenting, such as a comment that documents a following line or block of code, or a preceding line of code in the case of EOL comments.
- Documentation comments are better represented with objects instead of XML, because it’s then much easier to work with them programmatically.
- Most formatting, such as tabs and other whitespace, is not really relevant for the display of a code object tree, as it can be generated using configurable settings. However, newlines are still relevant even for objects so that they can be displayed without much horizontal scrolling.
- Line and column numbers are not really relevant for the display of a code object tree, but they can still be useful for the equivalent parsed and/or generated text or messages that refer to it.
- Features of text-based languages can actually be constrained by the text syntax itself. Anything is possible when working with objects, but creating sensible and easy to parse text can be tricky.
Presenting a CodeDOM for C#
The source code that accompanies this article is a complete codeDOM for C# 2.0, minus ‘unsafe’ features. I’ve actually fully implemented support for all features through C# 5.0, but I’ve intentionally restricted the feature set for now, to make it a bit smaller and easier to understand (and also because I’m not ready to open-source all of my hard work just yet). Even without the newer features, there is still a huge amount of code to cover (over 250 classes), and their omission doesn’t really take anything away from this series of articles on codeDOMs and the many issues involved with creating and using them.
Since we have so many classes to discuss, we’ll cover them in logical groups. I have listed class names in tables with derived classes indicated by indentation, and with separate tables for related groups of classes. Furthermore, I have omitted descriptions for most concrete classes whose function is fairly obvious by the class name, showing descriptions mainly for base classes. Also, I’m not going to discuss classes or their members in much detail at all – this is just a high-level overview. The source code that accompanies this article contains documentation comments for further reference.
The major classes of the codeDOM (and one enum) are shown in Table 1. The common base class for all of the codeDOM objects is CodeObject. It includes a Parent reference, an optional collection of Annotations (comments, attributes, etc.), and a flags enum, FormatFlags, used primarily for minor formatting (such as newlines). Many helpful properties exist for the flags, such as IsFirstOnLine (true if the object starts on a new line), NewLines (the total number of newlines preceding the object), and IsSingleLine (true if the object doesn’t have any newlines within it). There are also many useful properties dealing with annotations to determine if certain kinds exist on an object (HasComments, HasAttributes, etc.), or to easily get or set certain annotations (DocComment, EOLComment, etc.). The included tests and examples will help demonstrate how to use the various available properties and methods.
Table 1 - Major Types
CodeObject | The common base class of all code objects. |
Statement | The common base class of all statements. |
BlockStatement | The common base class of all statements that can have a body of code. |
Expression | The common base class of all expressions. |
Operator | The common base class of all operations (binary, unary, or other). |
SymbolicRef | The common base class of all symbolic references. |
Annotation | The common base class of all code annotations. |
Attribute | Represents metadata associated with a code object. |
CommentBase | The common base class of all comments. |
Comment, DocComment | |
CompilerDirective | The common base class of all compiler directives. |
Message | Represents a generated message associated with a code object. |
Block | Represents the body of a BlockStatement (has 0 or more code objects). |
Namespace | Represents a namespace of type declarations and optional child namespaces. |
RootNamespace | Represents the top-level (global) namespace. |
NamespaceTypeDictionary | Represents a dictionary of namespaces and types. |
NamespaceTypeGroup | Represents a group of types and/or namespaces with the same name. |
NamedCodeObjectDictionary | Represents a dictionary of named code objects. |
NamedCodeObjectGroup | Represents a group of named code objects with the same name. |
ChildList<T> | Represents a collection of child code objects of a particular type. |
Modifiers | An enum of modifiers applicable to many declaration statements. |
Public, Protected, Internal, Private, Static, New, Abstract, Sealed, Virtual, Override, Extern, Partial, Implicit, Explicit, Const, ReadOnly, Volatile, Event | |
There is a Statement base class for all C# statements, and BlockStatement for all statements which can have child blocks of code. C# expressions all derive from Expression, and are broken up into two sub-categories for Operators and SymbolicRefs. All of the many statement and expression sub-classes will be listed and discussed in Table 2 through Table 6. The Attribute class handles C# attributes, CompilerDirective is the base for all compiler directives (see Table 8), and both of these are Annotations on code, along with regular Comments, documentation comments (Table 7), and Messages (error, warning, or informational messages associated with code).
The Block class is used for the body of a BlockStatement. The Namespace class represents a namespace full of types and child namespaces (using NamespaceTypeDictionary and NamespaceTypeGroup internally), and RootNamespace is a subclass representing the global namespace. The NamedCodeObjectDictionary and NamedCodeObjectGroup types are used by Block to index all child objects with names – these are very similar to the classes used by Namespace, but they deal with named CodeObjects instead of types and namespaces, and the differences are enough to justify separate classes. The generic ChildList<T> class is a collection of CodeObjects or any derived type, and is used anywhere a collection of child objects is needed (the Block class, the Annotation collection, etc.). This collection ensures that objects added to it have their Parent property referencing the owner of the collection (not the collection itself). The Modifiers enum is used by type and member declarations to specify their access level and other special properties.
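To make the parent-tracking behavior of a child collection concrete, here is a minimal sketch. It is not the Nova ChildList<T> source: Node, OwnedList<T>, and BlockNode are hypothetical stand-ins invented for this illustration, and the real class may work quite differently inside.

```csharp
using System;
using System.Collections.ObjectModel;

// Hypothetical stand-ins, NOT the Nova classes: just enough to show the idea.
public class Node
{
    public Node Parent { get; internal set; }
}

// A child collection that points every added item at the collection's owner.
public class OwnedList<T> : Collection<T> where T : Node
{
    private readonly Node _owner;

    public OwnedList(Node owner) { _owner = owner; }

    protected override void InsertItem(int index, T item)
    {
        item.Parent = _owner;           // the parent is the owner, not the list
        base.InsertItem(index, item);
    }

    protected override void SetItem(int index, T item)
    {
        this[index].Parent = null;      // detach the child being replaced
        item.Parent = _owner;
        base.SetItem(index, item);
    }

    protected override void RemoveItem(int index)
    {
        this[index].Parent = null;      // detach the child being removed
        base.RemoveItem(index);
    }

    protected override void ClearItems()
    {
        foreach (T item in this) item.Parent = null;
        base.ClearItems();
    }
}

// An owner that exposes its children through the owned list.
public class BlockNode : Node
{
    public OwnedList<Node> Children { get; }

    public BlockNode() { Children = new OwnedList<Node>(this); }
}
```

With these types, calling block.Children.Add(child) leaves child.Parent pointing at block, and removing the child clears it again. A fuller version would probably also detach an object from its previous owner when it is added somewhere else.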
All Statement classes are shown in Table 2. Many of them are also BlockStatements, which can have bodies of code. Most map directly to C# statements, and are self-explanatory. A C# source file is represented by CodeUnit (usually the root object unless dealing with an isolated code fragment), which is derived from NamespaceDecl because they share a lot of functionality – it behaves as a root-level NamespaceDecl with an implied “namespace global” (both can have ‘using’ directives, for example). The name CodeUnit was chosen because it might be in-memory rather than map to a file, and “compilation unit” is too specific to compiling and quite long. Although I generally avoid abbreviations in names, you’ll notice the use of “Decl” in many class names as short for “Declaration”. Do not confuse UsingDirective (which imports namespaces) with Using (the statement that uses the same keyword).
Table 2 - Statement Classes
ExternAlias | Used with compiler options to create additional root-level namespaces. |
UsingDirective | Imports the contents of a namespace into the current scope. |
Alias | Represents the declaration of a namespace or type alias (“using name = …”). |
BlockStatement | The common base class of all statements that can have a body of code. |
NamespaceDecl | Declares a namespace plus a body of declarations that belong to it. |
CodeUnit | Declares a unit of independent code that belongs to the root-level namespace. Usually is the contents of a source file, but it can also be in-memory only. |
TypeDecl | The common base class of all type declarations. |
BaseListTypeDecl | The common base class of all types with optional base type lists. |
ClassDecl, EnumDecl, InterfaceDecl, StructDecl | |
DelegateDecl | |
MethodDeclBase | The common base class of all method declaration statements. |
MethodDecl | Represents a method with a unique name and a return type. |
GenericMethodDecl | Represents a generic method with type parameters. |
AccessorDecl | The common base class of all accessors. |
AccessorDeclWithValue | The common base class of all accessors with a value parameter. |
SetterDecl | Represents a method used to write the value of a property. |
AdderDecl | Represents a method used to add a delegate to an event. |
RemoverDecl | Represents a method used to remove a delegate from an event. |
GetterDecl | Represents a method used to read the value of a property. |
OperatorDecl | The common base class of all user-defined operators. |
ConversionOperatorDecl | |
ConstructorDecl, DestructorDecl | |
PropertyDeclBase | The common base class of all property-like declaration statements. |
PropertyDecl, IndexerDecl, EventDecl | |
IfBase | The common base class of If and ElseIf statements. |
If, ElseIf | |
SwitchItem | The common base class of Case and Default statements (of a Switch). |
Case, Default | |
BlockDecl | Represents a block of code restricted to a local scope (surrounded by braces). |
Else, For, ForEach, While, Try, Catch, Finally, Using, Lock, CheckedBlock, UncheckedBlock | |
Break, Continue, Goto, Label, Return, Throw | |
VariableDecl | The common base class of all variable declaration statements. |
FieldDecl, LocalDecl, ParameterDecl, EnumMemberDecl | |
MultiFieldDecl, MultiLocalDecl, ValueParameterDecl, MultiEnumMemberDecl | |
YieldStatement | The common base class of the YieldBreak and YieldReturn statements. |
YieldBreak, YieldReturn | |
Notice that there are separate classes to declare multiple fields or local variables in a single statement. They are derived from the class that supports a single declaration – for example, MultiFieldDecl is derived from FieldDecl. This was done to keep the most common case of single declarations simple and lightweight, while supporting multi-declarations in such a way that as little code as possible needs to be specifically aware of them. The multi-declarations are actually a collection of the single declarations where the modifiers and type are forced to be the same. An EnumDecl always uses a MultiEnumMemberDecl to contain all of its EnumMemberDecl entries. The ValueParameterDecl class is used for the implied ‘value’ parameter of property setters (and event adder/removers).
Classes related to generics (types or methods) are shown in Table 3 – they model type parameters, constraint clauses, and individual constraint types.
Table 3 - Support Classes for Generics
TypeParameter | Represents a type parameter of a generic type or method declaration. |
ConstraintClause | Represents one or more constraints on a type parameter. |
TypeParameterConstraint | The common base class of all type parameter constraints. |
ClassConstraint, StructConstraint, NewConstraint, TypeConstraint | |
The largest sub-category of Expressions is Operators, which are shown in Table 4. The operators are generally self-explanatory. They are divided mainly into binary, unary, and those with arguments (either a variable number or a single argument). The Conditional (a ? b : c) operator is a special case.
Table 4 - Operator Classes
BinaryOperator | The common base class of all binary operators. |
Assignment | Assigns the right expression to the left. Also the common base class of all compound assignment operators. |
AddAssign (+=), BitwiseAndAssign (&=), BitwiseOrAssign (|=), BitwiseXorAssign (^=), DivideAssign (/=), LeftShiftAssign (<<=), ModAssign (%=), MultiplyAssign (*=), RightShiftAssign (>>=), SubtractAssign (-=) | |
BinaryArithmeticOperator | The common base class of all binary arithmetic operators. |
Add (+), Divide (/), Mod (%), Multiply (*), Subtract (-) | |
BinaryBitwiseOperator | The common base class of all binary bitwise operators. |
BitwiseAnd (&), BitwiseOr (|), BitwiseXor (^) | |
BinaryBooleanOperator | The common base class of all binary operators with a boolean result. |
And (&&), Or (||), Is | |
RelationalOperator | The common base class of all relational operators. |
Equal (==), GreaterThan (>), GreaterThanEqual (>=), LessThan (<), LessThanEqual (<=), NotEqual (!=) | |
BinaryShiftOperator | The common base class of all shift operators. |
LeftShift (<<), RightShift (>>) | |
As, Dot (.), IfNullThen (??), Lookup (::) | |
UnaryOperator | The common base class of all unary operators. |
PreUnaryOperator | The common base class of all prefix unary operators. |
Cast, Complement (~), Decrement (--), Increment (++), Negative (-), Not (!), Positive (+) | |
PostUnaryOperator | The common base class of all postfix unary operators. |
PostIncrement (++), PostDecrement (--) | |
ArgumentsOperator | The common base class of all operators with variable arguments. |
NewOperator | The common base class of the NewArray and NewObject operators. |
NewArray, NewObject | |
Call | Represents a call to a method or delegate, including any arguments. |
ConstructorInitializer | The common base class of BaseInitializer and ThisInitializer. |
BaseInitializer, ThisInitializer | |
Index ([ ]) | |
SingleArgumentOperator | The common base class of all operators with a single argument. |
CheckedOperator | The common base class of the Checked and Unchecked operators. |
Checked, Unchecked | |
RefOutOperator | The common base class of the Ref and Out operators. |
Ref, Out | |
TypeOperator | The common base class of TypeOf, SizeOf, DefaultValue operators. |
TypeOf, SizeOf, DefaultValue | |
Conditional (? :) | Represents a conditional if/then/else (“a ? b : c”) expression. |
The other big sub-category of Expressions is SymbolicRefs, which are shown in Table 5 – these represent “symbolic references”, or a reference to a named code object defined elsewhere. Notice the use of “Ref” in these class names as short for “Reference”. I have chosen to represent symbolic references to objects as direct object references. An unresolved symbolic reference is represented by an UnresolvedRef, which references by name. When such a reference is resolved to a particular object (resolving will be addressed in a future article), the UnresolvedRef object is replaced by another one of the appropriate type, such as TypeRef or MethodRef, which uses an object reference. The use of direct references makes things faster and easier, and keeps the objects smaller. Renaming any object doesn’t require any “fixups” – all references will automatically pick up the new name from the target object.
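The difference between a by-name reference and a direct object reference can be sketched in a few lines. The classes below (TypeDeclNode, RefNode, UnresolvedRefNode, TypeRefNode) are hypothetical miniatures made up for this example and are not the Nova types; they only show why a rename needs no fixups when the reference reads its name through the target.

```csharp
using System;

// Hypothetical miniature types for illustration; not the Nova API.
public class TypeDeclNode
{
    public string Name { get; set; }

    public TypeDeclNode(string name) { Name = name; }
}

public abstract class RefNode
{
    public abstract string Name { get; }
}

// Unresolved: all we have is the name that appeared in the source text.
public sealed class UnresolvedRefNode : RefNode
{
    private readonly string _name;

    public UnresolvedRefNode(string name) { _name = name; }

    public override string Name => _name;
}

// Resolved: holds the target object and reads the name through it.
public sealed class TypeRefNode : RefNode
{
    public TypeDeclNode Target { get; }

    public TypeRefNode(TypeDeclNode target) { Target = target; }

    public override string Name => Target.Name;
}
```

If a TypeRefNode is created for a declaration named "Customer" and the declaration is then renamed to "Client", the reference reports "Client" with no further work, while an UnresolvedRefNode keeps whatever name it was given.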
Most expressions in C# will evaluate to a type, which will often be represented by TypeRefBase, because although it will usually be a TypeRef, it could also be an UnresolvedRef or MethodRef. Notice that TypeRef is used to reference any C# type (whether a class, struct, interface, enum, or delegate) – there has been no need so far to create derived reference types for these, and they are easily distinguished with properties such as IsEnum.
Table 5 - SymbolicRef Classes
TypeRefBase | The common base class of TypeRef, MethodRef, and UnresolvedRef. |
MethodRef | Represents a reference to a method declaration. |
AnonymousMethodRef, ConstructorRef, OperatorRef | |
TypeRef | Represents a reference to a type declaration. |
AliasRef, TypeParameterRef | |
UnresolvedRef | Represents a symbolic reference that hasn’t been resolved to a direct reference. |
GotoTargetRef | The common base class of LabelRef and SwitchItemRef. |
LabelRef, SwitchItemRef | |
SelfRef | The common base class of ThisRef and BaseRef. |
ThisRef, BaseRef | |
VariableRef | The common base class of all variable references. |
PropertyRef | Represents a reference to a property declaration. |
IndexerRef | |
EventRef, EnumMemberRef, FieldRef, LocalRef, ParameterRef | |
ExternAliasRef, NamespaceRef, DirectiveSymbolRef | |
There are a few expressions which are not operators or symbolic references. They are shown in Table 6, with Literal being by far the most widely used. In order to retain the exact text of a literal, such as escape sequences in strings or chars, or the exact format (such as hex and/or suffix characters) of numerics, the literal value is stored as a string inside the Literal object. A GetValue() method is provided that returns the actual constant value of the appropriate type.
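As a rough illustration of keeping the source text while still being able to produce a typed value, consider the sketch below. IntLiteralNode is a hypothetical class invented for this example; it handles only int literals in decimal or hex form, with no suffixes, whereas the article's Literal covers every literal type.

```csharp
using System;
using System.Globalization;

// Hypothetical sketch, not Nova's Literal: handles only int literals written
// in decimal or hex (no suffixes), to show how the original text is preserved.
public sealed class IntLiteralNode
{
    // The literal exactly as written in source, for example "0x1F".
    public string Text { get; }

    public IntLiteralNode(string text) { Text = text; }

    // Computes the constant value on demand from the stored text.
    public int GetValue()
    {
        if (Text.StartsWith("0x", StringComparison.OrdinalIgnoreCase))
        {
            return int.Parse(Text.Substring(2), NumberStyles.AllowHexSpecifier,
                             CultureInfo.InvariantCulture);
        }
        return int.Parse(Text, CultureInfo.InvariantCulture);
    }
}
```

Because only the text is stored, rendering the literal back out reproduces "0x1F" rather than "31", which is the round-trip property described above.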
Table 6 - Other Expression Classes
AnonymousMethod | Represents an un-named method that can be assigned to a delegate. |
Initializer | Represents the initialization of an array. |
Literal | Represents a literal value of a particular type (string, integer, boolean, etc). |
Documentation comment classes are listed in Table 7. Classes are provided for all standard XML tags, so that they can be treated as objects, with no concern for dealing with the XML (which I prefer to think of as an implementation detail). Non-standard tags can be represented using DocTag. The DocComment class is both a common base class and also a concrete class that can be used to represent just some text or a collection of child objects. The DocC class can contain a code Expression, and the DocCode class can contain a code Block – so code examples inside documentation comments can be actual code objects just like “real” code. References to code by DocCodeRefBase and DocNameBase classes use SymbolicRefs to directly reference the code.
Table 7 – Documentation Comments
DocComment | Represents user documentation of code, and is also the common base class of all documentation comments. |
DocCodeRefBase | The common base class of all documentation comment tags with a ‘cref’ attribute. |
DocException, DocPermission, DocSee, DocSeeAlso | |
DocNameBase | The common base class of all documentation comment tags with a ‘name’ attribute. |
DocParam, DocParamRef, DocTypeParam, DocTypeParamRef | |
DocB, DocC, DocCDATA, DocCode, DocExample, DocI, DocInclude, DocList, DocListDescription, DocListHeader, DocListItem, DocListTerm, DocPara, DocRemarks, DocReturns, DocSummary, DocTag, DocText, DocValue | |
Compiler directives are shown in Table 8. They present a problem with modeling code as objects, because they really operate on the text form of the language at a sort of pre-processing phase, before it’s parsed into objects. The conditional directives can “split” what would otherwise logically be a single code object into multiple parts. Compiler directives have been handled by treating them as annotations that can be attached to any other code object (or they can also be standalone at the Block level, just like comments). The conditional directives in particular will be evaluated at parse time (when we add that functionality in a later article), perhaps affecting how code objects are created. Changing which symbols are defined will most easily be handled by re-parsing entire CodeUnits, although it could theoretically be done by re-parsing only the affected fragments of code – something that should eventually be supported for editing anyway.
Table 8 – Compiler Directive Classes
ConditionalDirectiveBase | The common base class of all conditional directives. |
ConditionalDirective | The common base class of ‘open’ conditional directives. |
ElseDirective | |
ConditionalExpressionDirective | The common base class of conditional directives with expressions. |
IfDirective, ElIfDirective | |
EndIfDirective | |
MessageDirective | The common base class of all directives with messages. |
RegionDirective, EndRegionDirective, ErrorDirective, WarningDirective | |
PragmaDirective | The common base class of all pragma directives. |
PragmaChecksumDirective, PragmaWarningDirective | |
SymbolDirective | The common base class of all symbol directives. |
DefineSymbol, UnDefSymbol | |
LineDirective | |
A final group of types are the interfaces shown in Table 9. These are used where common behavior exists across classes and there isn’t any logical common base class to put it in. For example, all code objects that have a Name property implement the INamedCodeObject interface, so that code (such as a name-based dictionary) that only cares about the name can work with all objects that implement the interface.
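The name-based dictionary idea can be sketched as follows. INamed, FieldNode, MethodNode, and NameIndex are hypothetical names invented for this illustration; they loosely echo INamedCodeObject and the dictionary/group classes from Table 1 but are not taken from the Nova sources.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface and classes; the names only echo the article.
public interface INamed
{
    string Name { get; }
}

public sealed class FieldNode : INamed
{
    public string Name { get; }

    public FieldNode(string name) { Name = name; }
}

public sealed class MethodNode : INamed
{
    public string Name { get; }

    public MethodNode(string name) { Name = name; }
}

// Indexes anything that has a name. Objects with the same name
// (such as method overloads) end up in the same group.
public sealed class NameIndex
{
    private readonly Dictionary<string, List<INamed>> _groups =
        new Dictionary<string, List<INamed>>();

    public void Add(INamed item)
    {
        if (!_groups.TryGetValue(item.Name, out List<INamed> group))
        {
            group = new List<INamed>();
            _groups.Add(item.Name, group);
        }
        group.Add(item);
    }

    // Returns every object registered under the name, or an empty list.
    public IReadOnlyList<INamed> Find(string name)
    {
        return _groups.TryGetValue(name, out List<INamed> group)
            ? group
            : (IReadOnlyList<INamed>)Array.Empty<INamed>();
    }
}
```

The index never needs to know whether it holds fields, methods, or anything else, which is the point of putting the shared behavior in an interface when no common base class fits.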
Table 9 – Interfaces
IBlock | Implemented by all code objects with a Block body (BlockStatement, AnonymousMethod, and DocCode). |
IModifiers | Implemented by all code objects that have Modifiers (TypeDecl, MethodDeclBase, PropertyDeclBase, FieldDecl). |
IMultiVariableDecl | Implemented by all VariableDecls that support multiple declarations in a statement. |
INamespace | Implemented by all code objects that represent a namespace (Namespace and Alias). |
INamedCodeObject | Implemented by all code objects that have a Name property. |
IParameters | Implemented by all code objects that have parameters (MethodDeclBase, IndexerDecl, DelegateDecl, and AnonymousMethod). |
ITypeDecl | Implemented by all code objects that represent type declarations (TypeDecl, TypeParameter, and Alias). |
ITypeParameters | Implemented by all code objects that have TypeParameters (TypeDecl and GenericMethodDecl). |
IVariableDecl | Implemented by all code objects that represent variable declarations (VariableDecl and PropertyDeclDecl). |
All of the codeDOM classes support manual creation of instances and assignment of child objects. It’s possible to manually build up a tree of objects to represent any C# source code. Of course, it also makes sense to be able to parse existing code into codeDOM objects – this will be covered in a later article. Editing operations are also supported. Named objects can have their names changed simply by setting their Name property. Child objects can be changed by setting properties to newly created or cloned objects. Existing objects can be moved simply by assigning them, which automatically changes their Parent property to reference their new parent.
The codeDOM can also be easily rendered as C# text. Since such a feature is rather critical and used all the time (such as when debugging), text rendering is integrated directly into the codeDOM classes using methods that start with “AsText…”. Code is formatted using the FormatFlags on each object, which default as might be expected for typical C# formatting, but can be overridden if desired. These flags determine where newlines occur, whether parentheses exist on an expression or braces on a block of code, or even whether a semicolon terminates a statement or expression.
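To show how per-object flags can drive text rendering, here is a small sketch. RenderFlags and TextNode are hypothetical and invented for this example: Nova's FormatFlags has its own members, and its AsText methods work with a CodeWriter, whereas this sketch writes to a plain StringBuilder.

```csharp
using System;
using System.Text;

// Hypothetical flags and node type; Nova's FormatFlags has its own members.
[Flags]
public enum RenderFlags
{
    None = 0,
    FirstOnLine = 1,      // emit a newline before the object
    HasParens = 2,        // wrap the code in parentheses
    HasTerminator = 4     // end with a semicolon
}

public sealed class TextNode
{
    public string Code { get; }
    public RenderFlags Flags { get; set; }

    public TextNode(string code, RenderFlags flags)
    {
        Code = code;
        Flags = flags;
    }

    // Convenience property layered over the raw flag bits.
    public bool IsFirstOnLine => (Flags & RenderFlags.FirstOnLine) != 0;

    // Appends this node to the writer, consulting the flags for punctuation.
    public void AsText(StringBuilder writer)
    {
        if (IsFirstOnLine) writer.Append('\n');
        bool parens = (Flags & RenderFlags.HasParens) != 0;
        if (parens) writer.Append('(');
        writer.Append(Code);
        if (parens) writer.Append(')');
        if ((Flags & RenderFlags.HasTerminator) != 0) writer.Append(';');
    }
}
```

The punctuation is not stored as part of the code at all. It is produced at render time from the flags, so changing a flag changes the output without touching the object's content.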
Supporting Classes
The CodeWriter class (in the Rendering folder) is used to render codeDOM classes in text format. An instance of this class is required when using the AsText() methods of any class to render it as text, and you can see examples of this in CodeUnit.SaveAs() or certain special overloaded CodeObject.AsText() methods that can render any CodeObject and all of its children as text.
There are utility classes (in the Utilities folder) that provide static helper methods for working with arrays, collections, files, strings, and also the various Reflection types. The methods aren’t implemented as extension methods because we are setting up this code base to be able to parse itself, and it won’t be supporting extension methods for now. You might find some of these useful for your own apps, and they are easily converted to extension methods.
The Configuration class is used to read entries from the “.config” file of the application. It’s very small (the main LoadSettings method is only 60 lines of code), but it’s a very handy class that you are welcome to “steal” and modify for your own apps. It works by reading entries from a special section of the config file, and it uses reflection to find and set the values of static members of classes.
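The reflection step can be sketched roughly as below. This is not the article's Configuration.LoadSettings code: RenderSettings and SettingsLoader are hypothetical names, the dictionary stands in for entries that would really come from the config file, and only public static fields of simple types are handled.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Reflection;

// Hypothetical settings holder; any class with public static fields would do.
public static class RenderSettings
{
    public static int TabSize = 4;
    public static bool UseTabs = false;
}

public static class SettingsLoader
{
    // Applies "FieldName" -> "text value" pairs to public static fields of a type.
    // Returns the number of fields that were set; unknown names are skipped.
    public static int Apply(Type target, IDictionary<string, string> entries)
    {
        int applied = 0;
        foreach (KeyValuePair<string, string> entry in entries)
        {
            FieldInfo field = target.GetField(entry.Key,
                BindingFlags.Public | BindingFlags.Static);
            if (field == null) continue;

            // Convert the text to the field's declared type (int, bool, ...).
            object value = Convert.ChangeType(entry.Value, field.FieldType,
                CultureInfo.InvariantCulture);
            field.SetValue(null, value);    // null instance because the field is static
            applied++;
        }
        return applied;
    }
}
```

Calling SettingsLoader.Apply(typeof(RenderSettings), entries) with entries for "TabSize" and "UseTabs" overwrites both static fields. A value that cannot be converted makes Convert.ChangeType throw, which a real loader would probably want to catch and report.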
The Log class is used to log messages – either to the console or to any client code that chooses to intercept them. It supports different levels of logging, and special handling of exceptions. It’s also a simple and handy class that could be re-used by other apps.
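A logger along those lines might look something like the following sketch. SimpleLog, its Level enum, and the Sink delegate are all hypothetical names chosen for this example; the actual Log class in the sources may expose a different API.

```csharp
using System;

// Hypothetical minimal logger; not the Log class from the article's sources.
public static class SimpleLog
{
    public enum Level { None = 0, Minimal = 1, Normal = 2, Detailed = 3 }

    // Messages above this level are dropped.
    public static Level LogLevel = Level.Normal;

    // Client code can replace this to intercept messages; the default is the console.
    public static Action<string> Sink = Console.WriteLine;

    public static void Write(Level level, string message)
    {
        if (level != Level.None && level <= LogLevel)
            Sink(message);
    }

    // Exceptions get a fixed prefix and are logged even at the Minimal level.
    public static void WriteException(Exception ex, string context)
    {
        Write(Level.Minimal, "EXCEPTION " + context + ": " + ex.Message);
    }
}
```

A client that wants to show messages in its own UI would assign its own delegate to SimpleLog.Sink, and everything else keeps calling Write as before.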
Using the Attached Source Code
The “code name” chosen for this project is Nova, and so the solution in the included sources is named “Nova.sln”. All of the CodeDOM classes and supporting classes described in the previous sections are located in the “Nova.CodeDOM.csproj” project. You only need to reference the resulting Nova.CodeDOM.dll (or add the project to your own solution) if you wish to experiment with your own client code.
There are also two other projects included in the solution. The Nova.Examples project contains some simple examples of using the codeDOM, and more examples will be added as we add more features. It’s a console app that you can run to see the results. The Nova.Test project contains a file “FullTest.cs” that exercises most of the C# features that the codeDOM is designed to support, and “ManualTests.cs” contains code that manually builds an equivalent codeDOM object tree. This is a console application, and running it will build the object tree, emit it into a “FullTest.generated.cs” file, and then compare the result to “FullTest.cs” (there are also a couple of very small tests in ManualTests.cs that will be run first).
Examine the code in ManualTests.cs to see how to manually create a codeDOM using all of the classes provided for the various language features. There are a number of implicit conversion operators defined to make things easier. You can just pass ‘0’ to a method parameter expecting an Expression and it will be equivalent to ‘new Literal(0)’, or pass a Type such as ‘typeof(int)’ (or other reflection object, like a MethodInfo) and it will be equivalent to ‘new TypeRef(typeof(int))’ (or other appropriate reference).
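For readers who haven’t used implicit conversion operators, the mechanism can be sketched with a few toy classes. Expr, LiteralExpr, TypeRefExpr, and ReturnNode are hypothetical types made up for this example; they mimic the idea and do not reproduce the Nova constructors or signatures.

```csharp
using System;
using System.Globalization;

// Hypothetical miniature expression types; Nova's real classes are richer.
public abstract class Expr
{
    // Lets callers pass a plain int where an Expr is expected.
    public static implicit operator Expr(int value)
    {
        return new LiteralExpr(value.ToString(CultureInfo.InvariantCulture));
    }

    // Lets callers pass a System.Type where an Expr is expected.
    public static implicit operator Expr(Type type)
    {
        return new TypeRefExpr(type);
    }
}

public sealed class LiteralExpr : Expr
{
    public string Text { get; }

    public LiteralExpr(string text) { Text = text; }
}

public sealed class TypeRefExpr : Expr
{
    public Type Target { get; }

    public TypeRefExpr(Type target) { Target = target; }
}

// A statement whose constructor takes an expression.
public sealed class ReturnNode
{
    public Expr Value { get; }

    public ReturnNode(Expr value) { Value = value; }
}
```

With these in place, new ReturnNode(0) wraps the 0 in a LiteralExpr and new ReturnNode(typeof(int)) wraps the type in a TypeRefExpr, so tree-building code stays close to how the equivalent C# would read.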
Summary
We now have a complete set of codeDOM classes that can be manually instantiated to create a tree of objects that represents just about any C# program, and we can also render any tree of these objects to C# source code. But, we’re still missing some obvious things – how about parsing existing programs? And, don’t we need Solution and Project objects so we can work with “.sln” and “.csproj” files? Both of those are on the way, but first I think we need a better way to display a codeDOM than just converting it to text.
In my next article, I’ll present code for rendering the codeDOM using WPF, making it easier to inspect and understand it. Now, things are going to start getting more interesting… or, at least more attractive. The series continues in Part 3.