Introduction
This article is about creating a code model, or codeDOM, that
models the semantics of a programming language – specifically, C#. Sources for a C# codeDOM are included, along
with examples of creating and editing a codeDOM tree. This is Part 2 of a series on codeDOMs. See Part 1 for a discussion of what a “CodeDOM”
is (in my opinion), including design goals for building one.
Choosing a Language
A codeDOM could be created for a brand new language, or it
could model an existing language. In
order to have a better chance of widespread adoption, I wanted to create a
codeDOM that is interoperable with an existing mainstream text-based language. I also wanted to use a managed platform. This quickly narrowed my options to Java or
C#. I personally prefer C# over Java for
its richer feature set, so I chose to create a codeDOM for C#, even though it’s
a very complex and rather rapidly evolving language.
Why Not Support Multiple Languages?
A codeDOM that supports multiple languages (as .NET’s System.CodeDom does)
might seem like a good idea – most languages have common features, and this would
allow for mapping the same codeDOM objects onto different text languages. However, I think it’s much more important
that the class names and functionality match the language being modeled (see
design goal A in my first article). A
codeDOM as I define it should be the
language, and so shouldn’t be corrupted in order to support multiple
languages. So, I will model C# closely,
and any support for other languages in the future will involve the creation of
new sets of codeDOM classes. Converters could
always be created to convert between codeDOMs for different languages.
What about System.CodeDom, Expression Trees, and Roslyn?
At first glance, the classes in the System.CodeDom namespace might look like exactly the sort of codeDOM that I’ve been talking about, and they’ve existed since .NET 1.0. Unfortunately, they were designed for a completely different purpose than what I’m looking for: generating text code in multiple languages from a single set of objects. Code generation wizards can use them to manually build up a code object tree, and then spit out the equivalent C# or VB code. These classes violate my primary design goal (A), don’t have the needed functionality, and, most importantly, lack support for many C# language features.
Expression trees arrived with .NET 3.5 (in System.Linq.Expressions). These classes were added as part of the support for LINQ, and they model C# expressions. In .NET 3.5 they can represent only single-expression lambdas, and they do not support statements. So, somewhat ironically, they provide some of what is missing from System.CodeDom, but they themselves lack support for most language features other than expressions. They were extended in .NET 4.0 to support some statements, such as try/catch blocks and switches, but this support is intended for dynamic languages: the C# compiler still can’t convert a lambda with a statement body into an expression tree. Again, this was designed with something different in mind, and far too much is missing for it to be useful for our purposes.
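The gap between expression lambdas and statement support can be seen with the real System.Linq.Expressions API. This is a minimal sketch; the ExpressionTreeDemo class and its method names are only an illustrative wrapper. The compiler builds the first tree directly from lambda syntax, while the second one, which needs a local variable and two statements, has to be assembled by hand using the factory methods added in .NET 4.0.

```csharp
using System;
using System.Linq.Expressions;

// Illustrative wrapper (hypothetical name) around the real System.Linq.Expressions API.
public static class ExpressionTreeDemo
{
    // A single-expression lambda: the C# compiler converts this into an expression tree.
    public static Expression<Func<int, int>> FromLambda() => x => x * 2 + 1;

    // A statement lambda such as "x => { int y = x * 2; return y + 1; }" is rejected by the
    // compiler when the target is Expression<Func<int, int>>, so the equivalent tree is
    // built by hand with the block and assignment nodes introduced in .NET 4.0.
    public static Expression<Func<int, int>> BuiltByHand()
    {
        ParameterExpression x = Expression.Parameter(typeof(int), "x");
        ParameterExpression y = Expression.Variable(typeof(int), "y");
        BlockExpression body = Expression.Block(
            new[] { y },                                                          // declares local "y"
            Expression.Assign(y, Expression.Multiply(x, Expression.Constant(2))), // y = x * 2;
            Expression.Add(y, Expression.Constant(1)));                           // last expression is the block's value
        return Expression.Lambda<Func<int, int>>(body, x);
    }
}
```

Both compiled delegates return 11 for an input of 5, but only the first can be written as an ordinary C# lambda.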
The Roslyn project isn’t in a final release state yet (while
I’m writing this), but a preview is available, and it does aim to provide
classes for all C# language features. Roslyn is certainly something more like the
general-purpose codeDOM I’ve been talking about, but the design goals of Roslyn
still seem to differ quite a bit from my codeDOM design goals – for one thing, it
focuses on a typical syntax (AST) model.
In any case, my code has actually been under development for years, and
I think it has some significant benefits over Roslyn. So, I’m going to present my codeDOM classes and
all supporting code first (in a series of articles), and then in a later
article I’ll take a look at Roslyn and how it compares.
General CodeDOM Observations
A number of interesting issues arise when modeling classes
to represent code.
- It becomes obvious right away that much of the text syntax exists only for the benefit of the parser and is not really relevant for a tree of objects: semicolon terminators, comma separators, braces, many parentheses, etc. There is no need to represent such things in a code object tree.
- It is desirable for many comments
to be attached to the specific code objects that they are documenting,
such as a comment that documents a following line or block of code, or a
preceding line of code in the case of EOL comments.
- Documentation comments are
better represented with objects instead of XML, because it’s then much
easier to work with them programmatically.
- Most formatting, such as tabs and other whitespace, is not really relevant for the display of a code object tree, since it can be generated using configurable settings. However, newlines are still relevant even for objects, so that they can be displayed without excessive horizontal scrolling.
- Line and column numbers
are not really relevant for the display of a code object tree, but they
can still be useful for the equivalent parsed and/or generated text or
messages that refer to it.
- Features of text-based
languages can actually be constrained by the text syntax itself. Anything is possible when working with
objects, but creating sensible and easy to parse text can be tricky.
Presenting a CodeDOM for C#
The source code that accompanies this article is a complete
codeDOM for C# 2.0, minus ‘unsafe’ features.
I’ve actually fully implemented support for all features through C# 5.0,
but I’ve intentionally restricted the feature set for now, to make it a bit
smaller and easier to understand (and also because I’m not ready to open-source
all of my hard work just yet). Even without the newer features, there is
still a huge amount of code to cover (over 250 classes), and their omission
doesn’t really take anything away from this series of articles on codeDOMs and
the many issues involved with creating and using them.
Since we have so many classes to discuss, we’ll cover them
in logical groups. I have listed class
names in tables with derived classes indicated by indentation, and with separate tables
for related groups of classes.
Furthermore, I have omitted descriptions for most concrete classes whose
function is fairly obvious by the class name, showing descriptions mainly for
base classes. Also, I’m not going to
discuss classes or their members in much detail at all – this is just a
high-level overview. The source code
that accompanies this article contains documentation comments for further
reference.
The major classes of the codeDOM (and one enum) are shown in Table 1. The common base class for all of the codeDOM objects is CodeObject. It includes a Parent reference, an optional collection of Annotations (comments, attributes, etc.), and a flags enum, FormatFlags, used primarily for minor formatting (such as newlines). Many helpful properties exist for the flags, such as IsFirstOnLine (true if the object starts on a new line), NewLines (the total number of newlines preceding the object), and IsSingleLine (true if the object doesn’t have any newlines within it). There are also many useful properties dealing with annotations, either to determine whether certain kinds exist on an object (HasComments, HasAttributes, etc.) or to easily get or set certain annotations (DocComment, EOLComment, etc.). The included tests and examples demonstrate how to use the various available properties and methods.
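The flag-based helper properties can be pictured with a small sketch. Everything here is an assumption for illustration: the bit layout of this FormatFlags enum (a newline count packed into the low bits, plus Parens and HasInternalNewLines bits) and the member implementations are hypothetical, not the actual Nova.CodeDOM definitions. The point is only that properties such as NewLines, IsFirstOnLine, and IsSingleLine can be thin views over one compact flags field.

```csharp
using System;

// Hypothetical bit layout; the real FormatFlags enum in Nova.CodeDOM may differ.
[Flags]
public enum FormatFlags
{
    None                = 0,
    NewLineMask         = 0x0F, // low 4 bits: number of newlines preceding the object (0-15)
    Parens              = 0x10, // the expression is wrapped in parentheses
    HasInternalNewLines = 0x20, // the object contains at least one newline
}

// Simplified stand-in for CodeObject, showing only the parent link and the flags.
public class CodeObject
{
    public CodeObject Parent { get; internal set; }
    public FormatFlags Flags { get; set; }

    // Number of newlines preceding the object, stored in the low bits of Flags.
    public int NewLines
    {
        get => (int)(Flags & FormatFlags.NewLineMask);
        set
        {
            if (value < 0 || value > 15) throw new ArgumentOutOfRangeException(nameof(value));
            Flags = (Flags & ~FormatFlags.NewLineMask) | (FormatFlags)value;
        }
    }

    // True if the object starts on a new line.
    public bool IsFirstOnLine => NewLines > 0;

    // True if the object doesn't have any newlines within it.
    public bool IsSingleLine => (Flags & FormatFlags.HasInternalNewLines) == 0;
}
```

Changing the newline count leaves the other flag bits untouched, so formatting details can be adjusted independently.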
Table 1 - Major Types
CodeObject | The common base class of all code objects.
Statement | The common base class of all statements.
BlockStatement | The common base class of all statements that can have a body of code.
Expression | The common base class of all expressions.
Operator | The common base class of all operations (binary, unary, or other).
SymbolicRef | The common base class of all symbolic references.
Annotation | The common base class of all code annotations.
Attribute | Represents metadata associated with a code object.
CommentBase | The common base class of all comments.
Comment, DocComment
CompilerDirective | The common base class of all compiler directives.
Message | Represents a generated message associated with a code object.
Block | Represents the body of a BlockStatement (has 0 or more code objects).
Namespace | Represents a namespace of type declarations and optional child namespaces.
RootNamespace | Represents the top-level (global) namespace.
NamespaceTypeDictionary | Represents a dictionary of namespaces and types.
NamespaceTypeGroup | Represents a group of types and/or namespaces with the same name.
NamedCodeObjectDictionary | Represents a dictionary of named code objects.
NamedCodeObjectGroup | Represents a group of named code objects with the same name.
ChildList<T> | Represents a collection of child code objects of a particular type.
Modifiers | An enum of modifiers applicable to many declaration statements.
Public, Protected, Internal, Private, Static, New, Abstract, Sealed, Virtual, Override, Extern, Partial, Implicit, Explicit, Const, ReadOnly, Volatile, Event
There is a Statement base class for all C# statements, and BlockStatement for all statements that can have child blocks of code. C# expressions all derive from Expression, and are broken up into two sub-categories: Operators and SymbolicRefs. All of the many statement and expression sub-classes are listed and discussed in Table 2 through Table 6. The Attribute class handles C# attributes, and CompilerDirective is the base for all compiler directives (see Table 8). Both of these are Annotations on code, along with regular Comments, documentation comments (Table 7), and Messages (error, warning, or informational messages associated with code).
The Block class is used for the body of a BlockStatement. The Namespace class represents a namespace full of types and child namespaces (using NamespaceTypeDictionary and NamespaceTypeGroup internally), and RootNamespace is a subclass representing the global namespace. The NamedCodeObjectDictionary and NamedCodeObjectGroup types are used by Block to index all child objects that have names. These are very similar to the classes used by Namespace, but they deal with named CodeObjects instead of types and namespaces, and the differences are enough to justify separate classes. The generic ChildList<T> class is a collection of CodeObjects (or any derived type), and is used anywhere a collection of child objects is needed (the Block class, the Annotation collection, etc.). This collection ensures that objects added to it have their Parent property referencing the owner of the collection (not the collection itself). The Modifiers enum is used by type and member declarations to specify their access level and other special properties.
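Here is a minimal sketch of a parent-aware child collection, assuming a bare-bones CodeObject and Block (both simplified stand-ins, not the real classes). It builds on System.Collections.ObjectModel.Collection<T>, whose protected InsertItem, SetItem, RemoveItem, and ClearItems methods exist for exactly this kind of hook. Whether the real ChildList<T> is built this way, and whether it clears Parent on removal, are assumptions.

```csharp
using System.Collections.ObjectModel;

// Simplified stand-in: only the Parent link matters here.
public class CodeObject
{
    public CodeObject Parent { get; internal set; }
}

// Hypothetical sketch of ChildList<T>: children point at the collection's owner, not the collection.
public class ChildList<T> : Collection<T> where T : CodeObject
{
    private readonly CodeObject _owner;

    public ChildList(CodeObject owner) => _owner = owner;

    protected override void InsertItem(int index, T item)
    {
        if (item != null) item.Parent = _owner;
        base.InsertItem(index, item);
    }

    protected override void SetItem(int index, T item)
    {
        if (this[index] != null) this[index].Parent = null;   // detach the replaced child
        if (item != null) item.Parent = _owner;
        base.SetItem(index, item);
    }

    protected override void RemoveItem(int index)
    {
        if (this[index] != null) this[index].Parent = null;
        base.RemoveItem(index);
    }

    protected override void ClearItems()
    {
        foreach (T item in this)
            if (item != null) item.Parent = null;
        base.ClearItems();
    }
}

// A block owns its children through the list.
public class Block : CodeObject
{
    public ChildList<CodeObject> Children { get; }
    public Block() => Children = new ChildList<CodeObject>(this);
}
```

Because every mutation path of Collection<T> funnels through these four overrides, Add, Insert, Remove, indexer assignment, and Clear all keep Parent consistent.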
All Statement classes are shown in Table 2. Many of them are also BlockStatements, which can have bodies of code. Most map directly to C# statements and are self-explanatory.
A C# source file is represented by CodeUnit (usually the root object, unless dealing with an isolated code fragment), which is derived from NamespaceDecl because they share a lot of functionality: it behaves as a root-level NamespaceDecl with an implied “namespace global” (both can have ‘using’ directives, for example). The name CodeUnit was chosen because it might be in-memory rather than map to a file, and “compilation unit” is too specific to compiling and quite long. Although I generally avoid abbreviations in names, you’ll notice the use of “Decl” in many class names as short for “Declaration”. Do not confuse UsingDirective, which imports namespaces, with Using, the statement that uses the same keyword.
Table 2 - Statement Classes
ExternAlias | Used with compiler options to create additional root-level namespaces.
UsingDirective | Imports the contents of a namespace into the current scope.
Alias | Represents the declaration of a namespace or type alias (“using name = …”).
BlockStatement | The common base class of all statements that can have a body of code.
NamespaceDecl | Declares a namespace plus a body of declarations that belong to it.
CodeUnit | Declares a unit of independent code that belongs to the root-level namespace. It usually holds the contents of a source file, but it can also exist in memory only.
TypeDecl | The common base class of all type declarations.
BaseListTypeDecl | The common base class of all types with optional base type lists.
ClassDecl, EnumDecl, InterfaceDecl, StructDecl
DelegateDecl
MethodDeclBase | The common base class of all method declaration statements.
MethodDecl | Represents a method with a unique name and a return type.
GenericMethodDecl | Represents a generic method with type parameters.
AccessorDecl | The common base class of all accessors.
AccessorDeclWithValue | The common base class of all accessors with a value parameter.
SetterDecl | Represents a method used to write the value of a property.
AdderDecl | Represents a method used to add a delegate to an event.
RemoverDecl | Represents a method used to remove a delegate from an event.
GetterDecl | Represents a method used to read the value of a property.
OperatorDecl | The common base class of all user-defined operators.
ConversionOperatorDecl
ConstructorDecl, DestructorDecl
PropertyDeclBase | The common base class of all property-like declaration statements.
PropertyDecl, IndexerDecl, EventDecl
IfBase | The common base class of If and ElseIf statements.
If, ElseIf
SwitchItem | The common base class of Case and Default statements (of a Switch).
Case, Default
BlockDecl | Represents a block of code restricted to a local scope (surrounded by braces).
Else, For, ForEach, While, Try, Catch, Finally, Using, Lock, CheckedBlock, UncheckedBlock
Break, Continue, Goto, Label, Return, Throw
VariableDecl | The common base class of all variable declaration statements.
FieldDecl, LocalDecl, ParameterDecl, EnumMemberDecl
MultiFieldDecl, MultiLocalDecl, ValueParameterDecl, MultiEnumMemberDecl
YieldStatement | The common base class of the YieldBreak and YieldReturn statements.
YieldBreak, YieldReturn
Notice that there are separate classes to declare multiple fields or local variables in a single statement. They are derived from the class that supports a single declaration; for example, MultiFieldDecl is derived from FieldDecl. This keeps the most common case of single declarations simple and lightweight, while supporting multi-declarations in such a way that as little code as possible needs to be specifically aware of them. A multi-declaration is actually a collection of single declarations whose modifiers and type are forced to be the same. An EnumDecl always uses a MultiEnumMemberDecl to contain all of its EnumMemberDecl entries.
The ValueParameterDecl class is used for the implied ‘value’ parameter of property setters (and of event adders and removers).
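The relationship between single and multiple declarations can be sketched as follows. These are simplified, hypothetical versions of the classes: types are plain strings instead of TypeRefs, only a few Modifiers values are included, and the AsText() output format is illustrative. The multi-declaration is itself a FieldDecl, and setting its type or modifiers forces the same values onto every contained declaration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical subset of the Modifiers flags enum.
[Flags]
public enum Modifiers { None = 0, Public = 1, Private = 2, Static = 4, ReadOnly = 8 }

public class FieldDecl
{
    public string Name { get; set; }
    public virtual string Type { get; set; }          // a type name stands in for a TypeRef
    public virtual Modifiers Modifiers { get; set; }

    public FieldDecl(string name, string type, Modifiers modifiers)
    {
        Name = name;
        Type = type;
        Modifiers = modifiers;
    }

    // Renders the modifiers (in enum order) followed by the type, e.g. "private static int".
    protected string Prefix()
    {
        var parts = new List<string>();
        foreach (Modifiers m in Enum.GetValues(typeof(Modifiers)))
            if (m != Modifiers.None && Modifiers.HasFlag(m))
                parts.Add(m.ToString().ToLowerInvariant());
        parts.Add(Type);
        return string.Join(" ", parts);
    }

    public virtual string AsText() => $"{Prefix()} {Name};";
}

// "private static int a, b;" is a FieldDecl whose type and modifiers are shared by its members.
public class MultiFieldDecl : FieldDecl
{
    // Field initializers run before the base constructor, so this list already exists
    // when the base constructor invokes the overridden setters below.
    private readonly List<FieldDecl> _fields = new List<FieldDecl>();

    public MultiFieldDecl(string type, Modifiers modifiers, params string[] names)
        : base(null, type, modifiers)
    {
        foreach (string name in names)
            _fields.Add(new FieldDecl(name, type, modifiers));
    }

    public IReadOnlyList<FieldDecl> Fields => _fields;

    public override string Type
    {
        get => base.Type;
        set { base.Type = value; foreach (FieldDecl f in _fields) f.Type = value; }
    }

    public override Modifiers Modifiers
    {
        get => base.Modifiers;
        set { base.Modifiers = value; foreach (FieldDecl f in _fields) f.Modifiers = value; }
    }

    public override string AsText() => $"{Prefix()} {string.Join(", ", _fields.Select(f => f.Name))};";
}
```

Code that only understands FieldDecl can still read the shared Type and Modifiers of a MultiFieldDecl without knowing it holds several names.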
Classes related to generics (types or methods) are shown in Table 3
– they model type parameters, constraint clauses, and individual constraint
types.
Table 3 - Support Classes for Generics
TypeParameter | Represents a type parameter of a generic type or method declaration.
ConstraintClause | Represents one or more constraints on a type parameter.
TypeParameterConstraint | The common base class of all type parameter constraints.
ClassConstraint, StructConstraint, NewConstraint, TypeConstraint
The largest sub-category of Expressions is Operators, which are shown in Table 4. The operators are generally self-explanatory. They are divided mainly into binary, unary, and those with arguments (either a variable number or a single argument). The Conditional (a ? b : c) operator is a special case.
Table 4 - Operator Classes
BinaryOperator | The common base class of all binary operators.
Assignment | Assigns the right expression to the left. Also the common base class of all compound assignment operators.
AddAssign (+=), BitwiseAndAssign (&=), BitwiseOrAssign (|=), BitwiseXorAssign (^=), DivideAssign (/=), LeftShiftAssign (<<=), ModAssign (%=), MultiplyAssign (*=), RightShiftAssign (>>=), SubtractAssign (-=)
BinaryArithmeticOperator | The common base class of all binary arithmetic operators.
Add (+), Divide (/), Mod (%), Multiply (*), Subtract (-)
BinaryBitwiseOperator | The common base class of all binary bitwise operators.
BitwiseAnd (&), BitwiseOr (|), BitwiseXor (^)
BinaryBooleanOperator | The common base class of all binary operators with a boolean result.
And (&&), Or (||), Is
RelationalOperator | The common base class of all relational operators.
Equal (==), GreaterThan (>), GreaterThanEqual (>=), LessThan (<), LessThanEqual (<=), NotEqual (!=)
BinaryShiftOperator | The common base class of all shift operators.
LeftShift (<<), RightShift (>>)
As, Dot (.), IfNullThen (??), Lookup (::)
UnaryOperator | The common base class of all unary operators.
PreUnaryOperator | The common base class of all prefix unary operators.
Cast, Complement (~), Decrement (--), Increment (++), Negative (-), Not (!), Positive (+)
PostUnaryOperator | The common base class of all postfix unary operators.
PostIncrement (++), PostDecrement (--)
ArgumentsOperator | The common base class of all operators with a variable number of arguments.
NewOperator | The common base class of the NewArray and NewObject operators.
NewArray, NewObject
Call | Represents a call to a method or delegate, including any arguments.
ConstructorInitializer | The common base class of BaseInitializer and ThisInitializer.
BaseInitializer, ThisInitializer
Index ([ ])
SingleArgumentOperator | The common base class of all operators with a single argument.
CheckedOperator | The common base class of the Checked and Unchecked operators.
Checked, Unchecked
RefOutOperator | The common base class of the Ref and Out operators.
Ref, Out
TypeOperator | The common base class of the TypeOf, SizeOf, and DefaultValue operators.
TypeOf, SizeOf, DefaultValue
Conditional (? :) | Represents a conditional if/then/else (“a ? b : c”) expression.
The other big sub-category of Expressions is SymbolicRefs, which are shown in Table 5. These represent “symbolic references”: references to named code objects defined elsewhere. Notice the use of “Ref” in these class names as short for “Reference”. I have chosen to represent symbolic references to objects as direct object references. An unresolved symbolic reference is represented by an UnresolvedRef, which references its target by name. When such a reference is resolved to a particular object (resolving will be addressed in a future article), the UnresolvedRef object is replaced by another one of the appropriate type, such as TypeRef or MethodRef, which uses an object reference. The use of direct references makes things faster and easier, and keeps the objects smaller. Renaming any object doesn’t require any “fixups”: all references automatically pick up the new name from the target object.
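The direct-reference design can be illustrated with a tiny model. The TypeDecl, UnresolvedRef, and TypeRef classes below are simplified stand-ins, and the Resolver helper is purely hypothetical, since resolution is the subject of a later article. Because a TypeRef holds the declaration object itself, renaming the declaration changes the rendered text of every reference without any fixup pass.

```csharp
// Simplified stand-in for a type declaration.
public class TypeDecl
{
    public string Name { get; set; }
    public TypeDecl(string name) => Name = name;
}

public abstract class SymbolicRef
{
    public abstract string AsText();
}

// Refers to its target by name only; this is what exists before resolution.
public class UnresolvedRef : SymbolicRef
{
    public string Name { get; }
    public UnresolvedRef(string name) => Name = name;
    public override string AsText() => Name;
}

// Refers to the declaration object itself, so its text always tracks the current name.
public class TypeRef : SymbolicRef
{
    public TypeDecl Reference { get; }
    public TypeRef(TypeDecl reference) => Reference = reference;
    public override string AsText() => Reference.Name;
}

// Hypothetical resolver: replaces an UnresolvedRef with a TypeRef when a matching declaration is found.
public static class Resolver
{
    public static SymbolicRef Resolve(SymbolicRef symbol, params TypeDecl[] scope)
    {
        if (symbol is UnresolvedRef unresolved)
            foreach (TypeDecl decl in scope)
                if (decl.Name == unresolved.Name) return new TypeRef(decl);
        return symbol;   // already resolved, or no match found
    }
}
```

A failed lookup simply leaves the UnresolvedRef in place, which mirrors the idea of keeping unresolved references by name.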
Most expressions in C# evaluate to a value of some type, and that type is often represented by TypeRefBase because, although it will usually be a TypeRef, it could also be an UnresolvedRef or a MethodRef. Notice that TypeRef is used to reference any C# type (whether a class, struct, interface, enum, or delegate). There has been no need so far to create derived reference types for these, and they are easily distinguished with properties such as IsEnum.
Table 5 - SymbolicRef Classes
TypeRefBase | The common base class of TypeRef, MethodRef, and UnresolvedRef.
MethodRef | Represents a reference to a method declaration.
AnonymousMethodRef, ConstructorRef, OperatorRef
TypeRef | Represents a reference to a type declaration.
AliasRef, TypeParameterRef
UnresolvedRef | Represents a symbolic reference that hasn’t been resolved to a direct reference.
GotoTargetRef | The common base class of LabelRef and SwitchItemRef.
LabelRef, SwitchItemRef
SelfRef | The common base class of ThisRef and BaseRef.
ThisRef, BaseRef
VariableRef | The common base class of all variable references.
PropertyRef | Represents a reference to a property declaration.
IndexerRef
EventRef, EnumMemberRef, FieldRef, LocalRef, ParameterRef
ExternAliasRef, NamespaceRef, DirectiveSymbolRef
There are a few expressions which are neither operators nor symbolic references. They are shown in Table 6, with Literal being by far the most widely used. In order to retain the exact text of a literal, such as escape sequences in strings or chars, or the exact format of numerics (such as hex digits and/or suffix characters), the literal value is stored as a string inside the Literal object. A GetValue() method is provided that returns the actual constant value of the appropriate type.
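A minimal sketch of the “store the text, compute the value on demand” approach follows. It is not the actual Literal implementation: it handles only true/false, string literals without escape sequences, and integer literals (decimal or hex, with optional U/L suffixes). The result type follows the standard C# rule: an unsuffixed integer literal takes the first of int, uint, long, or ulong that can hold its value, a U suffix restricts that to uint or ulong, and an L suffix to long or ulong.

```csharp
using System;
using System.Globalization;

// Simplified stand-in for Literal: the source text is kept exactly as written.
public class Literal
{
    public string Text { get; }
    public Literal(string text) => Text = text;

    public override string ToString() => Text;   // rendering reproduces the original text

    // Returns the constant value with the type C# would assign to the literal.
    public object GetValue()
    {
        if (Text == "true") return true;
        if (Text == "false") return false;
        if (Text.Length >= 2 && Text[0] == '"' && Text[Text.Length - 1] == '"')
            return Text.Substring(1, Text.Length - 2);   // escape sequences are not decoded here

        // Split off any U/L suffix characters (in any order or case).
        string digits = Text, suffix = "";
        while (digits.Length > 0 && "uUlL".IndexOf(digits[digits.Length - 1]) >= 0)
        {
            suffix = digits[digits.Length - 1] + suffix;
            digits = digits.Substring(0, digits.Length - 1);
        }

        ulong value = digits.StartsWith("0x", StringComparison.OrdinalIgnoreCase)
            ? ulong.Parse(digits.Substring(2), NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture)
            : ulong.Parse(digits, NumberStyles.None, CultureInfo.InvariantCulture);

        bool u = suffix.IndexOf('u') >= 0 || suffix.IndexOf('U') >= 0;
        bool l = suffix.IndexOf('l') >= 0 || suffix.IndexOf('L') >= 0;
        if (!u && !l && value <= (ulong)int.MaxValue) return (int)value;
        if (!l && value <= uint.MaxValue) return (uint)value;
        if (!u && value <= (ulong)long.MaxValue) return (long)value;
        return value;
    }
}
```

Since Text is never normalized, a literal such as 0x1FU renders back exactly as it was written, while GetValue() still yields a uint with the value 31.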
Table 6 - Other Expression Classes
AnonymousMethod | Represents an un-named method that can be assigned to a delegate. |
Initializer | Represents the initialization of an array. |
Literal | Represents a literal value of a particular type (string, integer, boolean, etc). |
Documentation comment classes are listed in Table 7. Classes are provided for all standard XML tags, so that they can be treated as objects, with no need to deal with the XML (which I prefer to think of as an implementation detail). Non-standard tags can be represented using DocTag. The DocComment class is both a common base class and also a concrete class that can be used to represent plain text or a collection of child objects. The DocC class can contain a code Expression, and the DocCode class can contain a code Block, so code examples inside documentation comments can be actual code objects, just like “real” code. The DocCodeRefBase and DocNameBase classes use SymbolicRefs to reference the code directly.
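The idea of documentation comments as objects, with XML treated purely as an output format, can be sketched like this. The class names mirror some of those in Table 7, but the members, the Render() method, and the simplified DocParamRef (which holds a name string where the real classes use a SymbolicRef) are assumptions for illustration. DocComment is concrete here, so it can hold plain child content, while tag classes supply a tag name.

```csharp
using System.Collections.Generic;
using System.Security;
using System.Text;

// Simplified model: a DocComment is a list of child doc objects; subclasses add a tag name.
public class DocComment
{
    public List<DocComment> Children { get; } = new List<DocComment>();
    public DocComment(params DocComment[] children) => Children.AddRange(children);

    protected virtual string TagName => null;   // a plain DocComment has no tag of its own

    public virtual void Render(StringBuilder sb)
    {
        if (TagName != null) sb.Append('<').Append(TagName).Append('>');
        foreach (DocComment child in Children) child.Render(sb);
        if (TagName != null) sb.Append("</").Append(TagName).Append('>');
    }

    public string AsXml()
    {
        var sb = new StringBuilder();
        Render(sb);
        return sb.ToString();
    }
}

public class DocText : DocComment
{
    public string Text { get; }
    public DocText(string text) => Text = text;
    // XML escaping happens only at output time, never in client code.
    public override void Render(StringBuilder sb) => sb.Append(SecurityElement.Escape(Text));
}

public class DocSummary : DocComment
{
    public DocSummary(params DocComment[] children) : base(children) { }
    protected override string TagName => "summary";
}

// Hypothetical simplification: the real codeDOM would reference the parameter via a SymbolicRef.
public class DocParamRef : DocComment
{
    public string ParameterName { get; }
    public DocParamRef(string parameterName) => ParameterName = parameterName;
    public override void Render(StringBuilder sb) =>
        sb.Append("<paramref name=\"").Append(ParameterName).Append("\"/>");
}
```

Client code builds and edits the tree with ordinary objects, and characters such as “<” in the text are escaped only when the XML is produced.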
Table 7 – Documentation Comments
DocComment | Represents user documentation of code, and is also the common base class of all documentation comments.
DocCodeRefBase | The common base class of all documentation comment tags with a ‘cref’ attribute.
DocException, DocPermission, DocSee, DocSeeAlso
DocNameBase | The common base class of all documentation comment tags with a ‘name’ attribute.
DocParam, DocParamRef, DocTypeParam, DocTypeParamRef
DocB, DocC, DocCDATA, DocCode, DocExample, DocI, DocInclude, DocList, DocListDescription, DocListHeader, DocListItem, DocListTerm, DocPara, DocRemarks, DocReturns, DocSummary, DocTag, DocText, DocValue
Compiler directives are shown in Table 8. They present a problem for modeling code as objects, because they really operate on the text form of the language, in a sort of pre-processing phase before it’s parsed into objects. The conditional directives can “split” what would otherwise logically be a single code object into multiple parts. Compiler directives have been handled by treating them as annotations that can be attached to any other code object (they can also be standalone at the Block level, just like comments). The conditional directives in particular will be evaluated at parse time (when we add that functionality in a later article), possibly affecting how code objects are created.
Changing which symbols are defined will most easily be handled by re-parsing entire CodeUnits, although it could theoretically be done by re-parsing only the affected fragments of code, something that should eventually be supported for editing anyway.
Table 8 – Compiler Directive Classes
ConditionalDirectiveBase | The common base class of all conditional directives.
ConditionalDirective | The common base class of ‘open’ conditional directives.
ElseDirective
ConditionalExpressionDirective | The common base class of conditional directives with expressions.
IfDirective, ElIfDirective
EndIfDirective
MessageDirective | The common base class of all directives with messages.
RegionDirective, EndRegionDirective, ErrorDirective, WarningDirective
PragmaDirective | The common base class of all pragma directives.
PragmaChecksumDirective, PragmaWarningDirective
SymbolDirective | The common base class of all symbol directives.
DefineSymbol, UnDefSymbol
LineDirective
A final group of types are the interfaces shown in Table 9. These are used where common behavior exists across classes that have no logical common base class to put it in. For example, all code objects that have a Name property implement the INamedCodeObject interface, so that code that only cares about the name (such as a name-based dictionary) can work with all objects that implement the interface.
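As a hypothetical illustration of why the interface helps, here is a name-based index that accepts anything implementing INamedCodeObject. The MethodDecl and FieldDecl classes are bare stand-ins, and a plain List stands in for NamedCodeObjectGroup; the real classes are more involved.

```csharp
using System;
using System.Collections.Generic;

public interface INamedCodeObject
{
    string Name { get; }
}

// Bare stand-ins for two unrelated kinds of code objects that both have names.
public class MethodDecl : INamedCodeObject
{
    public string Name { get; }
    public int ParameterCount { get; }
    public MethodDecl(string name, int parameterCount) { Name = name; ParameterCount = parameterCount; }
}

public class FieldDecl : INamedCodeObject
{
    public string Name { get; }
    public FieldDecl(string name) => Name = name;
}

// Hypothetical simplified name index; a List stands in for NamedCodeObjectGroup.
public class NamedCodeObjectDictionary
{
    private readonly Dictionary<string, List<INamedCodeObject>> _byName =
        new Dictionary<string, List<INamedCodeObject>>(StringComparer.Ordinal);   // C# names are case-sensitive

    public void Add(INamedCodeObject obj)
    {
        if (!_byName.TryGetValue(obj.Name, out List<INamedCodeObject> group))
            _byName[obj.Name] = group = new List<INamedCodeObject>();
        group.Add(obj);   // e.g. method overloads end up in the same group
    }

    public IReadOnlyList<INamedCodeObject> Find(string name) =>
        _byName.TryGetValue(name, out List<INamedCodeObject> group)
            ? group
            : (IReadOnlyList<INamedCodeObject>)Array.Empty<INamedCodeObject>();
}
```

The dictionary never needs to know which concrete classes it stores; any new named code object only has to implement the interface.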
Table 9 – Interfaces
IBlock | Implemented by all code objects with a Block body (BlockStatement, AnonymousMethod, and DocCode).
IModifiers | Implemented by all code objects that have Modifiers (TypeDecl, MethodDeclBase, PropertyDeclBase, and FieldDecl).
IMultiVariableDecl | Implemented by all VariableDecls that support multiple declarations in a statement.
INamespace | Implemented by all code objects that represent a namespace (Namespace and Alias).
INamedCodeObject | Implemented by all code objects that have a Name property.
IParameters | Implemented by all code objects that have parameters (MethodDeclBase, IndexerDecl, DelegateDecl, and AnonymousMethod).
ITypeDecl | Implemented by all code objects that represent type declarations (TypeDecl, TypeParameter, and Alias).
ITypeParameters | Implemented by all code objects that have TypeParameters (TypeDecl and GenericMethodDecl).
IVariableDecl | Implemented by all code objects that represent variable declarations (VariableDecl and PropertyDeclBase).
All of the codeDOM classes support manual creation of instances and assignment of child objects, so it’s possible to manually build up a tree of objects to represent any C# source code. Of course, it also makes sense to be able to parse existing code into codeDOM objects; this will be covered in a later article. Editing operations are also supported. Named objects can be renamed simply by setting their Name property. Child objects can be changed by setting properties to newly created or cloned objects. Existing objects can be moved simply by assigning them, which automatically changes their Parent property to reference their new parent.
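The re-parenting behavior of child-valued properties can be sketched with a hypothetical SetChild helper on a simplified CodeObject. The helper name and the choice to clear the Parent of a replaced child (only if it still points at this object) are assumptions; the text above only states that assignment updates the Parent of the assigned object.

```csharp
// Simplified stand-in for CodeObject with a hypothetical re-parenting helper.
public class CodeObject
{
    public CodeObject Parent { get; internal set; }

    // Used by child-valued property setters: the new child is re-parented to this object,
    // and a replaced child is detached only if it still belongs to this object.
    protected T SetChild<T>(T oldChild, T newChild) where T : CodeObject
    {
        if (oldChild != null && oldChild.Parent == this) oldChild.Parent = null;
        if (newChild != null) newChild.Parent = this;
        return newChild;
    }
}

public class Literal : CodeObject
{
    public string Text { get; }
    public Literal(string text) => Text = text;
}

// A "return <expression>;" statement with a single child expression.
public class Return : CodeObject
{
    private CodeObject _expression;

    public CodeObject Expression
    {
        get => _expression;
        set => _expression = SetChild(_expression, value);
    }
}
```

Moving an expression is then just an assignment to its new owner, after which the old owner can drop its reference without disturbing the new Parent link.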
The codeDOM can also easily be rendered as C# text. Since this feature is critical and used all the time (such as when debugging), text rendering is integrated directly into the codeDOM classes, using methods whose names start with “AsText”. Code is formatted using the FormatFlags on each object, which default to what you’d expect for typical C# formatting but can be overridden if desired. These flags determine where newlines occur, whether an expression has parentheses or a block of code has braces, and even whether a statement or expression is terminated by a semicolon.
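Here is a hedged sketch of flag-driven rendering. A StringBuilder stands in for CodeWriter, and the Add, Multiply, and Literal classes, the single Parens flag, and the split between AsText() and AsTextExpression() are all simplified assumptions. It shows how a per-object flag, rather than the shape of the tree alone, decides whether parentheses are emitted.

```csharp
using System;
using System.Text;

// Hypothetical single-flag subset of FormatFlags.
[Flags]
public enum FormatFlags { None = 0, Parens = 1 }

public abstract class Expression
{
    public FormatFlags Flags { get; set; }

    public bool HasParens
    {
        get => (Flags & FormatFlags.Parens) != 0;
        set => Flags = value ? Flags | FormatFlags.Parens : Flags & ~FormatFlags.Parens;
    }

    // Common rendering entry point: parentheses come from the flags, the rest from the subclass.
    public void AsText(StringBuilder writer)
    {
        if (HasParens) writer.Append('(');
        AsTextExpression(writer);
        if (HasParens) writer.Append(')');
    }

    protected abstract void AsTextExpression(StringBuilder writer);

    public override string ToString()
    {
        var writer = new StringBuilder();
        AsText(writer);
        return writer.ToString();
    }
}

public class Literal : Expression
{
    private readonly string _text;
    public Literal(string text) => _text = text;
    protected override void AsTextExpression(StringBuilder writer) => writer.Append(_text);
}

public abstract class BinaryOperator : Expression
{
    public Expression Left { get; }
    public Expression Right { get; }
    protected BinaryOperator(Expression left, Expression right) { Left = left; Right = right; }
    protected abstract string Symbol { get; }

    protected override void AsTextExpression(StringBuilder writer)
    {
        Left.AsText(writer);
        writer.Append(' ').Append(Symbol).Append(' ');
        Right.AsText(writer);
    }
}

public class Add : BinaryOperator
{
    public Add(Expression left, Expression right) : base(left, right) { }
    protected override string Symbol => "+";
}

public class Multiply : BinaryOperator
{
    public Multiply(Expression left, Expression right) : base(left, right) { }
    protected override string Symbol => "*";
}
```

This sketch trusts the flag completely, so clearing it on an inner Add changes the meaning of the rendered text; any code that sets such flags has to respect operator precedence.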
Supporting Classes
The CodeWriter class (in the Rendering folder) is used to render codeDOM classes in text format. An instance of this class is required when using the AsText() methods of any class to render it as text. You can see examples of this in CodeUnit.SaveAs(), or in certain special overloads of CodeObject.AsText() that can render any CodeObject and all of its children as text.
There are utility classes (in the Utilities folder) that
provide static helper methods for working with arrays, collections, files,
strings, and also the various Reflection types.
The methods aren’t implemented as extension methods because we are
setting up this code base to be able to parse itself, and it won’t be
supporting extension methods for now.
You might find some of these useful for your own apps, and they are
easily converted to extension methods.
The Configuration class is used to read entries from the “.config” file of the application.
It’s very small (the main LoadSettings method is only 60 lines of code),
but it’s a very handy class that you are welcome to “steal” and modify for your
own apps. It works by reading entries
from a special section of the config file, and it uses reflection to find and
set the values of static members of classes.
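The reflection technique can be sketched without the config-file plumbing. Here the hypothetical SettingsLoader.ApplySettings method takes already-read “TypeName.MemberName” = value pairs (the real class reads its entries from a special section of the .config file, whose format isn’t shown here) and uses standard reflection calls (Assembly.GetType, Type.GetField, Type.GetProperty, and Convert.ChangeType) to assign static fields or properties. RenderSettings is an invented example target.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Reflection;

// Example target class with static settings (hypothetical).
public static class RenderSettings
{
    public static int TabSize = 4;
    public static bool UseTabs = false;
}

// Hypothetical sketch: assigns "TypeName.MemberName" = "value" entries to static members.
public static class SettingsLoader
{
    private const BindingFlags StaticMembers = BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Static;

    public static void ApplySettings(IDictionary<string, string> settings, Assembly assembly)
    {
        foreach (KeyValuePair<string, string> entry in settings)
        {
            int dot = entry.Key.LastIndexOf('.');
            if (dot <= 0) continue;                                    // no type name given
            Type type = assembly.GetType(entry.Key.Substring(0, dot)); // full type name expected
            if (type == null) continue;                                // unknown types are skipped
            string memberName = entry.Key.Substring(dot + 1);

            FieldInfo field = type.GetField(memberName, StaticMembers);
            if (field != null && !field.IsInitOnly && !field.IsLiteral)
            {
                field.SetValue(null, Convert.ChangeType(entry.Value, field.FieldType, CultureInfo.InvariantCulture));
                continue;
            }

            PropertyInfo property = type.GetProperty(memberName, StaticMembers);
            if (property != null && property.CanWrite)
                property.SetValue(null, Convert.ChangeType(entry.Value, property.PropertyType, CultureInfo.InvariantCulture));
        }
    }
}
```

Because the target members are found by name at run time, any class can expose new settings just by declaring a static field, without the loader having to know about it in advance.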
The Log class is used to log messages, either to
the console or to any client code that chooses to intercept them. It supports different levels of logging, and
special handling of exceptions. It’s
also a simple and handy class that could be re-used by other apps.
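A small sketch of a logging class in the same spirit follows. The LogLevel values, the MessageLogged event used for interception, and the exception message format are all hypothetical, not the actual members of Nova’s Log class.

```csharp
using System;

// Hypothetical verbosity levels, from least to most detailed.
public enum LogLevel { None, Minimal, Normal, Detailed }

public static class Log
{
    // Messages more detailed than this level are dropped.
    public static LogLevel Level = LogLevel.Normal;

    // Client code can subscribe to intercept messages; without subscribers they go to the console.
    public static event Action<string> MessageLogged;

    public static void WriteLine(string message, LogLevel level = LogLevel.Normal)
    {
        if (level == LogLevel.None || level > Level) return;
        Action<string> handler = MessageLogged;
        if (handler != null) handler(message);
        else Console.WriteLine(message);
    }

    // Exceptions get special formatting and use the Minimal level so they are rarely filtered out.
    public static void LogException(Exception ex, string context) =>
        WriteLine($"EXCEPTION {context}: {ex.GetType().Name}: {ex.Message}", LogLevel.Minimal);
}
```

A GUI client could subscribe to MessageLogged to show messages in a window, while a console app gets console output without any setup.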
Using the Attached Source Code
The “code name” chosen for this project is Nova, so the solution in the included sources is named “Nova.sln”. All of the codeDOM classes and supporting classes described in the previous sections are located in the “Nova.CodeDOM.csproj” project. If you wish to experiment with your own client code, you only need to reference the resulting Nova.CodeDOM.dll (or add the project to your own solution).
There are also two other projects included in the solution. The Nova.Examples project contains some simple examples of using the codeDOM, and more examples will be added as we add more features. It’s a console app that you can run to see the results. The Nova.Test project contains a file “FullTest.cs” that implements most of the C# features that the codeDOM is designed to support, and “ManualTests.cs” contains code that manually builds an equivalent codeDOM object tree. This is also a console application: running it will build the object tree, emit it into a “FullTest.generated.cs” file, and then compare the result to “FullTest.cs” (a couple of very small tests in ManualTests.cs are run first).
Examine the code in ManualTests.cs to see how to manually create a codeDOM using all of the classes provided for the various language features. A number of implicit conversion operators are defined to make things easier. You can just pass ‘0’ to a method parameter expecting an Expression and it will be equivalent to ‘new Literal(0)’, or pass a Type such as ‘typeof(int)’ (or another reflection object, like a MethodInfo) and it will be equivalent to ‘new TypeRef(typeof(int))’ (or another appropriate reference).
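The implicit conversions can be sketched with simplified Expression, Literal, and TypeRef classes. The ConversionDemo.Describe() helper is hypothetical, standing in for any codeDOM method or constructor with an Expression parameter; only the conversions from int and from System.Type are shown, and the keyword mapping in TypeRef is deliberately minimal.

```csharp
using System;

public abstract class Expression
{
    // Lets a plain int be passed wherever an Expression is expected.
    public static implicit operator Expression(int value) => new Literal(value);

    // Lets a reflection Type be passed wherever an Expression is expected.
    public static implicit operator Expression(Type type) => new TypeRef(type);

    public abstract string AsText();
}

public class Literal : Expression
{
    public object Value { get; }
    public Literal(int value) => Value = value;
    public override string AsText() => Value.ToString();
}

public class TypeRef : Expression
{
    public Type Reference { get; }
    public TypeRef(Type reference) => Reference = reference;
    // Only "int" is mapped to its C# keyword in this sketch; other types use their CLR name.
    public override string AsText() => Reference == typeof(int) ? "int" : Reference.Name;
}

// Hypothetical helper with an Expression parameter, like many codeDOM constructors.
public static class ConversionDemo
{
    public static string Describe(Expression expression) =>
        expression.GetType().Name + ": " + expression.AsText();
}
```

C# permits these operators because each one converts to the type that declares it, and neither int nor System.Type is a base class or interface of Expression.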
Summary
We now have a complete set of codeDOM classes that can be
manually instantiated to create a tree of objects that represents just about
any C# program, and we can also render any tree of these objects to C# source
code. But, we’re still missing some
obvious things – how about parsing existing programs? And, don’t we need Solution and Project
objects so we can work with “.sln” and “.csproj” files? Both of those are on the way, but first I
think we need a better way to display a codeDOM than just converting it to
text.
In my next article, I’ll present code for rendering the codeDOM
using WPF, making it easier to inspect and understand it. Now, things are going to start getting more
interesting… or, at least more attractive.
I've been writing software since the late 70's, currently focusing mainly on C#.NET. I also like to travel around the world, and I own a Chocolate Factory (sadly, none of my employees are oompa loompas).