Creating a CodeDOM: Modeling the Semantics of Code (Part 2)

KenBeckett

Nov 6, 2012

CDDL

Creating a CodeDOM for C#

Download Nova.0.1.zip - 460.1 KB

Introduction

This article is about creating a code model, or codeDOM, that models the semantics of a programming language – specifically, C#. Sources for a C# codeDOM are included, along with examples of creating and editing a codeDOM tree. This is Part 2 of a series on codeDOMs. See Part 1 for a discussion of what a “CodeDOM” is (in my opinion), including design goals for building one.

Choosing a Language

A codeDOM could be created for a brand new language, or it could model an existing language. In order to have a better chance of widespread adoption, I wanted to create a codeDOM that is interoperable with an existing mainstream text-based language. I also wanted to use a managed platform. This quickly narrowed my options to Java or C#. I personally prefer C# over Java for its richer feature set, so I chose to create a codeDOM for C#, even though it’s a very complex and rather rapidly evolving language.

Why Not Support Multiple Languages?

A codeDOM that supports multiple languages (as .NET does) might seem like a good idea – most languages have common features, and this would allow for mapping the same codeDOM objects onto different text languages. However, I think it’s much more important that the class names and functionality match the language being modeled (see design goal A in my first article). A codeDOM as I define it should be the language, and so shouldn’t be corrupted in order to support multiple languages. So, I will model C# closely, and any support for other languages in the future will involve the creation of new sets of codeDOM classes. Converters could always be created to convert between codeDOMs for different languages.

What about System.CodeDOM, Expression Trees, and Roslyn?

At first glance, the classes in the System.CodeDom namespace might look like exactly the sort of codeDOM that I’ve been talking about, and they’ve existed since .NET 1.1. Unfortunately, they were designed for a completely different purpose than the one I have in mind: generating text code in multiple languages from a single set of objects. Code generation wizards can use them to manually build up a code object tree, and then emit the equivalent C# or VB code. These classes violate my primary design goal (A), don’t have the needed functionality, and, most importantly, lack support for many C# language features.
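For contrast, here is the kind of task System.CodeDom was built for: one language-neutral object graph rendered as both C# and VB. This is a sketch that assumes the System.CodeDom types are available (part of System.dll on .NET Framework, or the System.CodeDom NuGet package on newer .NET); the Demo namespace, Greeter class, Greet method, and CodeDomDemo helper are invented for illustration.

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;

public static class CodeDomDemo
{
    // Builds a language-neutral graph equivalent to:
    // namespace Demo { class Greeter { public string Greet() { return "Hello"; } } }
    public static CodeCompileUnit BuildUnit()
    {
        var method = new CodeMemberMethod
        {
            Name = "Greet",
            Attributes = MemberAttributes.Public | MemberAttributes.Final,
            ReturnType = new CodeTypeReference(typeof(string))
        };
        method.Statements.Add(new CodeMethodReturnStatement(new CodePrimitiveExpression("Hello")));

        var type = new CodeTypeDeclaration("Greeter");
        type.Members.Add(method);

        var ns = new CodeNamespace("Demo");
        ns.Types.Add(type);

        var unit = new CodeCompileUnit();
        unit.Namespaces.Add(ns);
        return unit;
    }

    // Renders the same graph as text in whatever language the provider targets.
    public static string Render(CodeDomProvider provider, CodeCompileUnit unit)
    {
        using (var writer = new StringWriter())
        {
            provider.GenerateCodeFromCompileUnit(unit, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }
}
```

Passing `new Microsoft.CSharp.CSharpCodeProvider()` or `new Microsoft.VisualBasic.VBCodeProvider()` to `Render` produces either language from the same unit. Nothing in the graph is specific to C#, which is exactly what conflicts with design goal A.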

Expression trees arrived with .NET 3.5 (in System.Linq.Expressions). These classes were added as part of the support for LINQ, and they model C# expressions. They can be used to represent single-expression lambdas only, and do not support statements. So, somewhat ironically, they provide some of what is missing from System.CodeDom, but themselves lack support for most language features other than expressions. They were extended in .NET 4.0 to support some statements, such as try/catch blocks and switches, but this support is intended for dynamic languages: the C# compiler still won’t convert a lambda with a statement body into an expression tree. Again, this was designed with something different in mind, and far too much is missing for it to be useful for our purposes.
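A short illustration of what expression trees do model, using the real System.Linq.Expressions API: the compiler can build a tree from a single-expression lambda, and the same tree can be built by hand with the Expression factory methods. The ExpressionTreeDemo class and its method names exist only for this example.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionTreeDemo
{
    // The compiler turns a single-expression lambda into a tree of Expression nodes.
    public static Expression<Func<int, int>> FromLambda()
    {
        return x => x * x + 1;
    }

    // The same tree, built by hand with the Expression factory methods.
    public static Expression<Func<int, int>> ByHand()
    {
        ParameterExpression x = Expression.Parameter(typeof(int), "x");
        BinaryExpression body = Expression.Add(
            Expression.Multiply(x, x),
            Expression.Constant(1));
        return Expression.Lambda<Func<int, int>>(body, x);
    }

    // Note: a statement lambda such as  x => { return x * x + 1; }  cannot be
    // assigned to an Expression<Func<int, int>>; the compiler rejects it, even
    // though factory methods like Expression.Block exist since .NET 4.0.
}
```

Both trees have an Add node at the root and compile to the same delegate, but neither has anywhere to put a class, a method body, or a comment.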

The Roslyn project isn’t in a final release state yet (while I’m writing this), but a preview is available, and it does aim to provide classes for all C# language features. Roslyn is certainly something more like the general-purpose codeDOM I’ve been talking about, but the design goals of Roslyn still seem to differ quite a bit from my codeDOM design goals – for one thing, it focuses on a typical syntax (AST) model. In any case, my code has actually been under development for years, and I think it has some significant benefits over Roslyn. So, I’m going to present my codeDOM classes and all supporting code first (in a series of articles), and then in a later article I’ll take a look at Roslyn and how it compares.

General CodeDOM Observations

A number of interesting issues arise when modeling classes to represent code.

It becomes obvious right away that much of the text syntax exists mostly for the benefit of the parser, and is not really relevant for a tree of objects: semicolon terminators, comma separators, braces, many parentheses, and so on. There is no need to represent such things in a code object tree.
It is desirable for many comments to be attached to the specific code objects that they are documenting, such as a comment that documents a following line or block of code, or a preceding line of code in the case of EOL comments.
Documentation comments are better represented with objects instead of XML, because it’s then much easier to work with them programmatically.
Most formatting, such as tabs and other whitespace, is not really relevant for the display of a code object tree, because it can be generated using configurable settings. However, newlines are still relevant even for objects, so that they can be displayed without much horizontal scrolling.
Line and column numbers are not really relevant for the display of a code object tree, but they can still be useful for the equivalent parsed and/or generated text or messages that refer to it.
Features of text-based languages can actually be constrained by the text syntax itself. Anything is possible when working with objects, but creating sensible, easy-to-parse text can be tricky.

Presenting a CodeDOM for C#

The source code that accompanies this article is a complete codeDOM for C# 2.0, minus ‘unsafe’ features. I’ve actually fully implemented support for all features through C# 5.0, but I’ve intentionally restricted the feature set for now, to make it a bit smaller and easier to understand (and also because I’m not ready to open-source all of my hard work just yet). Even without the newer features, there is still a huge amount of code to cover (over 250 classes), and their omission doesn’t really take anything away from this series of articles on codeDOMs and the many issues involved with creating and using them.

Since we have so many classes to discuss, we’ll cover them in logical groups. I have listed class names in tables with derived classes indicated by indentation, and with separate tables for related groups of classes. Furthermore, I have omitted descriptions for most concrete classes whose function is fairly obvious from the class name, showing descriptions mainly for base classes. Also, I’m not going to discuss classes or their members in much detail at all; this is just a high-level overview. The source code that accompanies this article contains documentation comments for further reference.

The major classes of the codeDOM (and one enum) are shown in Table 1. The common base class for all of the codeDOM objects is CodeObject. It includes a Parent reference, an optional collection of Annotations (comments, attributes, etc.), and a flags enum, FormatFlags, used primarily for minor formatting (such as newlines). Many helpful properties exist for the flags, such as IsFirstOnLine (true if the object starts on a new line), NewLines (the total number of newlines preceding the object), and IsSingleLine (true if the object doesn’t have any newlines within it). There are also many useful properties dealing with annotations, either to determine whether certain kinds exist on an object (HasComments, HasAttributes, etc.) or to easily get or set certain annotations (DocComment, EOLComment, etc.). The included tests and examples will help demonstrate how to use the various available properties and methods.
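To make the description concrete, here is a minimal sketch of a base class along these lines. The member names (Parent, Annotations, FormatFlags, NewLines, IsFirstOnLine, HasComments) come from the text, but the bit layout of this hypothetical FormatFlags enum, the property bodies, the eagerly allocated annotation list, and the stand-in Comment class are assumptions, not Nova’s actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical layout: the low 8 bits hold the newline count, higher bits are switches.
[Flags]
public enum FormatFlags
{
    None = 0,
    NewLineMask = 0xFF,
    HasParens = 0x100,
    NoTerminator = 0x200
}

public abstract class CodeObject
{
    public CodeObject Parent { get; set; }

    // Allocated eagerly here for simplicity; the text describes it as optional.
    public List<CodeObject> Annotations { get; } = new List<CodeObject>();

    public FormatFlags FormatFlags { get; set; }

    // Number of newlines preceding this object, packed into the flags.
    public int NewLines
    {
        get { return (int)(FormatFlags & FormatFlags.NewLineMask); }
        set { FormatFlags = (FormatFlags & ~FormatFlags.NewLineMask) | (FormatFlags)(value & 0xFF); }
    }

    // True if the object starts on a new line.
    public bool IsFirstOnLine => NewLines > 0;

    // True if any comment annotation is attached to this object.
    public bool HasComments => Annotations.OfType<Comment>().Any();
}

// Stand-in annotation type for the sketch.
public class Comment : CodeObject
{
    public string Text { get; }
    public Comment(string text) { Text = text; }
}
```

Packing the newline count into the flags keeps each object small, and other formatting switches can be set without disturbing it.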

Table 1 - Major Types

`CodeObject`	The common base class of all code objects.
`Statement`	The common base class of all statements.
`BlockStatement`	The common base class of all statements that can have a body of code.
`Expression`	The common base class of all expressions.
`Operator`	The common base class of all operations (binary, unary, or other).
`SymbolicRef`	The common base class of all symbolic references.
`Annotation`	The common base class of all code annotations.
`Attribute`	Represents metadata associated with a code object.
`CommentBase`	The common base class of all comments.
`Comment`, `DocComment`
`CompilerDirective`	The common base class of all compiler directives.
`Message`	Represents a generated message associated with a code object.
`Block`	Represents the body of a BlockStatement (has 0 or more code objects).
`Namespace`	Represents a namespace of type declarations and optional child namespaces.
`RootNamespace`	Represents the top-level (global) namespace.
`NamespaceTypeDictionary`	Represents a dictionary of namespaces and types.
`NamespaceTypeGroup`	Represents a group of types and/or namespaces with the same name.
`NamedCodeObjectDictionary`	Represents a dictionary of named code objects.
`NamedCodeObjectGroup`	Represents a group of named code objects with the same name.
`ChildList<T>`	Represents a collection of child code objects of a particular type.
`Modifiers`	An enum of modifiers applicable to many declaration statements.
`Public`, `Protected`, `Internal`, `Private`, `Static`, `New`, `Abstract`, `Sealed`, `Virtual`, `Override`, `Extern`, `Partial`, `Implicit`, `Explicit`, `Const`, `ReadOnly`, `Volatile`, `Event`

There is a Statement base class for all C# statements, and BlockStatement for all statements which can have child blocks of code. C# expressions all derive from Expression, and are broken up into two sub-categories for Operators and SymbolicRefs. All of the many statement and expression sub-classes will be listed and discussed in Table 2 through Table 6. The Attribute class handles C# attributes, CompilerDirective is the base for all compiler directives (see Table 8), and both of these are Annotations on code, along with regular Comments, documentation comments (Table 7), and Messages (error, warning, or informational messages associated with code).

The Block class is used for the body of a BlockStatement. The Namespace class represents a namespace full of types and child namespaces (using NamespaceTypeDictionary and NamespaceTypeGroup internally), and RootNamespace is a subclass representing the global namespace. The NamedCodeObjectDictionary and NamedCodeObjectGroup types are used by Block to index all child objects with names – these are very similar to the classes used by Namespace, but they deal with named CodeObjects instead of types and namespaces, and the differences are enough to justify separate classes. The generic ChildList<T> class is a collection of CodeObjects or any derived type, and is used anywhere a collection of child objects is needed (the Block class, the Annotation collection, etc). This collection ensures that objects added to it have their Parent property referencing the owner of the collection (not the collection itself). The Modifiers enum is used by type and member declarations to specify their access level and other special properties.
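The parent-fixing behavior of ChildList<T> can be sketched on top of the standard Collection<T> class, whose protected InsertItem, SetItem, RemoveItem, and ClearItems hooks run on every change. This is an illustrative version only; the stripped-down CodeObject and Block classes and the Body property name are hypothetical.

```csharp
using System.Collections.ObjectModel;

public class CodeObject
{
    public CodeObject Parent { get; set; }
}

// Keeps each child's Parent pointing at the owning code object, not at the list itself.
public class ChildList<T> : Collection<T> where T : CodeObject
{
    private readonly CodeObject _owner;

    public ChildList(CodeObject owner) { _owner = owner; }

    protected override void InsertItem(int index, T item)
    {
        item.Parent = _owner;
        base.InsertItem(index, item);
    }

    protected override void SetItem(int index, T item)
    {
        this[index].Parent = null;   // detach the object being replaced
        item.Parent = _owner;
        base.SetItem(index, item);
    }

    protected override void RemoveItem(int index)
    {
        this[index].Parent = null;
        base.RemoveItem(index);
    }

    protected override void ClearItems()
    {
        foreach (T item in this) item.Parent = null;
        base.ClearItems();
    }
}

// Hypothetical owner: a block whose statements live in a ChildList.
public class Block : CodeObject
{
    public ChildList<CodeObject> Body { get; }
    public Block() { Body = new ChildList<CodeObject>(this); }
}
```

Because the list is given its owner at construction, code that walks up the tree never encounters the collection as a parent.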

All Statement classes are shown in Table 2. Many of them are also BlockStatements, which can have bodies of code. Most map directly to C# statements and are self-explanatory. A C# source file is represented by CodeUnit (usually the root object, unless dealing with an isolated code fragment), which is derived from NamespaceDecl because they share a lot of functionality: it behaves as a root-level NamespaceDecl with an implied “namespace global” (both can have ‘using’ directives, for example). The name CodeUnit was chosen because it might exist only in memory rather than map to a file, and “compilation unit” is both too specific to compiling and rather long. Although I generally avoid abbreviations in names, you’ll notice the use of “Decl” in many class names as short for “Declaration”. Do not confuse UsingDirective, which imports a namespace, with Using, which represents the using statement that shares the same keyword.

Table 2 - Statement Classes

`ExternAlias`	Used with compiler options to create additional root-level namespaces.
`UsingDirective`	Imports the contents of a namespace into the current scope.
`Alias`	Represents the declaration of a namespace or type alias (“using name = …”).
`BlockStatement`	The common base class of all statements that can have a body of code.
`NamespaceDecl`	Declares a namespace plus a body of declarations that belong to it.
`CodeUnit`	Declares a unit of independent code that belongs to the root-level namespace. It is usually the contents of a source file, but it can also be in-memory only.
`TypeDecl`	The common base class of all type declarations.
`BaseListTypeDecl`	The common base class of all types with optional base type lists.
`ClassDecl`, `EnumDecl`, `InterfaceDecl`, `StructDecl`
`DelegateDecl`
`MethodDeclBase`	The common base class of all method declaration statements.
`MethodDecl`	Represents a method with a unique name and a return type.
`GenericMethodDecl`	Represents a generic method with type parameters.
`AccessorDecl`	The common base class of all accessors.
`AccessorDeclWithValue`		The common base class of all accessors with a value parameter.
`SetterDecl`		Represents a method used to write the value of a property.
`AdderDecl`		Represents a method used to add a delegate to an event.
`RemoverDecl`		Represents a method used to remove a delegate from an event.
`GetterDecl`		Represents a method used to read the value of a property.
`OperatorDecl`	The common base class of all user-defined operators.
`ConversionOperatorDecl`
`ConstructorDecl`, `DestructorDecl`
`PropertyDeclBase`	The common base class of all property-like declaration statements.
`PropertyDecl`, `IndexerDecl`, `EventDecl`
`IfBase`	The common base class of If and ElseIf statements.
`If`, `ElseIf`
`SwitchItem`	The common base class of `Case` and `Default` statements (of a `Switch`).
`Case`, `Default`
`BlockDecl`	Represents a block of code restricted to a local scope (surrounded by braces).
`Else`, `For`, `ForEach`, `While`, `Try`, `Catch`, `Finally`, `Using`, `Lock`, `CheckedBlock`, `UncheckedBlock`
`Break`, `Continue`, `Goto`, `Label`, `Return`, `Throw`
`VariableDecl`	The common base class of all variable declaration statements.
`FieldDecl`, `LocalDecl`, `ParameterDecl`, `EnumMemberDecl`
`MultiFieldDecl`, `MultiLocalDecl`, `ValueParameterDecl`, `MultiEnumMemberDecl`
`YieldStatement`	The common base class of the `YieldBreak` and `YieldReturn` statements.
`YieldBreak`, `YieldReturn`

Notice that there are separate classes to declare multiple fields or local variables in a single statement. Each is derived from the class that supports a single declaration; for example, MultiFieldDecl is derived from FieldDecl. This keeps the most common case of single declarations simple and lightweight, while supporting multi-declarations in such a way that as little code as possible needs to be specifically aware of them. A multi-declaration is actually a collection of single declarations whose modifiers and type are forced to be the same. An EnumDecl always uses a MultiEnumMemberDecl to contain all of its EnumMemberDecl entries. The ValueParameterDecl class is used for the implied ‘value’ parameter of property setters (and event adders/removers).
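The single/multi split might look roughly like the following sketch, in which a MultiFieldDecl is a FieldDecl that owns its individual declarations and pushes any change of type or modifiers down to all of them. The string-typed Type property, the trimmed Modifiers enum, and the Fields property are simplifications assumed for the example.

```csharp
using System;
using System.Collections.Generic;

[Flags]
public enum Modifiers { None = 0, Public = 1, Private = 2, Static = 4, ReadOnly = 8 }

// A single "modifiers type name;" declaration: the common, lightweight case.
public class FieldDecl
{
    protected string _type;
    protected Modifiers _modifiers;

    public FieldDecl(string name, string type, Modifiers modifiers)
    {
        Name = name;
        _type = type;
        _modifiers = modifiers;
    }

    public string Name { get; set; }
    public virtual string Type { get { return _type; } set { _type = value; } }
    public virtual Modifiers Modifiers { get { return _modifiers; } set { _modifiers = value; } }
}

// "private int a, b, c;" modeled as a FieldDecl that contains single FieldDecls
// and forces them all to share one type and one set of modifiers.
public class MultiFieldDecl : FieldDecl
{
    private readonly List<FieldDecl> _fields = new List<FieldDecl>();

    public MultiFieldDecl(string type, Modifiers modifiers, params string[] names)
        : base(null, type, modifiers)
    {
        foreach (string name in names)
            _fields.Add(new FieldDecl(name, type, modifiers));
    }

    public IReadOnlyList<FieldDecl> Fields => _fields;

    public override string Type
    {
        get { return _type; }
        set { _type = value; foreach (FieldDecl f in _fields) f.Type = value; }
    }

    public override Modifiers Modifiers
    {
        get { return _modifiers; }
        set { _modifiers = value; foreach (FieldDecl f in _fields) f.Modifiers = value; }
    }
}
```

Code that only knows about FieldDecl can change the type of a multi-declaration through the base class, and every contained declaration stays consistent.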

Classes related to generics (types or methods) are shown in Table 3 – they model type parameters, constraint clauses, and individual constraint types.

Table 3 - Support Classes for Generics

`TypeParameter`	Represents a type parameter of a generic type or method declaration.
`ConstraintClause`	Represents one or more constraints on a type parameter.
`TypeParameterConstraint`	The common base class of all type parameter constraints.
`ClassConstraint`, `StructConstraint`, `NewConstraint`, `TypeConstraint`

The largest sub-category of Expressions is Operators, which are shown in Table 4. The operators are generally self-explanatory. They are divided mainly into binary, unary, and those with arguments (either a variable number or a single argument). The Conditional (a ? b : c) operator is a special case.

Table 4 - Operator Classes

`BinaryOperator`	The common base class of all binary operators.
`Assignment`	Assigns the right expression to the left. Also the common base class of all compound assignment operators.
`AddAssign` (+=), `BitwiseAndAssign` (&=), `BitwiseOrAssign` (|=), `BitwiseXorAssign` (^=), `DivideAssign` (/=), `LeftShiftAssign` (<<=), `ModAssign` (%=), `MultiplyAssign` (*=), `RightShiftAssign` (>>=), `SubtractAssign` (-=)
`BinaryArithmeticOperator`	The common base class of all binary arithmetic operators.
`Add` (+), `Divide` (/), `Mod` (%), `Multiply` (*), `Subtract` (-)
`BinaryBitwiseOperator`	The common base class of all binary bitwise operators.
`BitwiseAnd` (&), `BitwiseOr` (|), `BitwiseXor` (^)
`BinaryBooleanOperator`	The common base class of all binary operators with a boolean result.
`And` (&&), `Or` (||), `Is`
`RelationalOperator`	The common base class of all relational operators.
`Equal` (==), `GreaterThan` (>), `GreaterThanEqual` (>=), `LessThan` (<), `LessThanEqual` (<=), `NotEqual` (!=)
`BinaryShiftOperator`	The common base class of all shift operators.
`LeftShift` (<<), `RightShift` (>>)
`As`, `Dot` ( . ), `IfNullThen` ( ?? ), `Lookup` ( :: )
`UnaryOperator`	The common base class of all unary operators.
`PreUnaryOperator`	The common base class of all prefix unary operators.
`Cast`, `Complement` (~), `Decrement` (--), `Increment` (++), `Negative` (-), `Not` (!), `Positive` (+)
`PostUnaryOperator`	The common base class of all postfix unary operators.
`PostIncrement` (++), `PostDecrement` (--)
`ArgumentsOperator`	The common base class of all operators with variable arguments.
`NewOperator`	The common base class of the `NewArray` and `NewObject` operators.
`NewArray`, `NewObject`
`Call`	Represents a call to a method or delegate, including any arguments.
`ConstructorInitializer`	The common base class of `BaseInitializer` and `ThisInitializer`.
`BaseInitializer`, `ThisInitializer`
`Index` ([ ])
`SingleArgumentOperator`	The common base class of all operators with a single argument.
`CheckedOperator`	The common base class of the `Checked` and `Unchecked` operators.
`Checked`, `Unchecked`
`RefOutOperator`	The common base class of the `Ref` and `Out` operators.
`Ref`, `Out`
`TypeOperator`	The common base class of `TypeOf`, `SizeOf`, `DefaultValue` operators.
`TypeOf`, `SizeOf`, `DefaultValue`
`Conditional` ( ? : )	Represents a conditional if/then/else (“a ? b : c”) expression.

The other big sub-category of Expressions is SymbolicRefs, which are shown in Table 5 – these represent “symbolic references”, or a reference to a named code object defined elsewhere. Notice the use of “Ref” in these class names as short for “Reference”. I have chosen to represent symbolic references to objects as direct object references. An unresolved symbolic reference is represented by an UnresolvedRef, which references by name. When such a reference is resolved to a particular object (resolving will be addressed in a future article), the UnresolvedRef object is replaced by another one of the appropriate type, such as TypeRef or MethodRef, which uses an object reference. The use of direct references makes things faster and easier, and keeps the objects smaller. Renaming any object doesn’t require any “fixups” – all references will automatically pick up the new name from the target object.

Most C# expressions evaluate to a type, and that type will often be represented by a TypeRefBase: although it will usually be a TypeRef, it could also be an UnresolvedRef or a MethodRef. Notice that TypeRef is used to reference any C# type (whether a class, struct, interface, enum, or delegate); there has been no need so far to create derived reference types for these, and they are easily distinguished with properties such as IsEnum.
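A sketch of the direct-reference approach, under simplifying assumptions (types only, a plain dictionary standing in for scope lookup, and a hypothetical Resolver helper): the UnresolvedRef carries a name, and resolution swaps it for a TypeRef that points at the declaration itself, so a later rename needs no fixups.

```csharp
using System.Collections.Generic;

public abstract class SymbolicRef
{
    public abstract string Name { get; }
}

public class TypeDecl
{
    public string Name { get; set; }
    public TypeDecl(string name) { Name = name; }
}

// Unresolved: only knows the name as it was written.
public class UnresolvedRef : SymbolicRef
{
    private readonly string _name;
    public UnresolvedRef(string name) { _name = name; }
    public override string Name => _name;
}

// Resolved: holds the declaration itself, so its name is always current.
public class TypeRef : SymbolicRef
{
    public TypeDecl Reference { get; }
    public TypeRef(TypeDecl reference) { Reference = reference; }
    public override string Name => Reference.Name;
}

public static class Resolver
{
    // Hypothetical resolution step: replace an UnresolvedRef with a TypeRef when
    // the name matches a declaration in scope; otherwise leave it unresolved.
    public static SymbolicRef Resolve(SymbolicRef reference, IDictionary<string, TypeDecl> scope)
    {
        var unresolved = reference as UnresolvedRef;
        TypeDecl target;
        if (unresolved != null && scope.TryGetValue(unresolved.Name, out target))
            return new TypeRef(target);
        return reference;
    }
}
```

Once resolved, renaming the TypeDecl is a single assignment; the TypeRef reports the new name because it never stored the old one.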

Table 5 - SymbolicRef Classes

`TypeRefBase`	The common base class of `TypeRef`, `MethodRef`, and `UnresolvedRef`.
`MethodRef`	Represents a reference to a method declaration.
`AnonymousMethodRef`, `ConstructorRef`, `OperatorRef`
`TypeRef`	Represents a reference to a type declaration.
`AliasRef`, `TypeParameterRef`
`UnresolvedRef`	Represents a symbolic reference that hasn’t been resolved to a direct reference.
`GotoTargetRef`	The common base class of `LabelRef` and `SwitchItemRef`.
`LabelRef`, `SwitchItemRef`
`SelfRef`	The common base class of `ThisRef` and `BaseRef`.
`ThisRef`, `BaseRef`
`VariableRef`	The common base class of all variable references.
`PropertyRef`	Represents a reference to a property declaration.
`IndexerRef`
`EventRef`, `EnumMemberRef`, `FieldRef`, `LocalRef`, `ParameterRef`
`ExternAliasRef`, `NamespaceRef`, `DirectiveSymbolRef`

There are a few expressions which are not operators or symbolic references. They are shown in Table 6, with Literal being by far the most widely used. In order to retain the exact text of a literal, such as escape sequences in strings or chars, or the exact format (such as hex and/or suffix characters) of numerics, the literal value is stored as a string inside the Literal object. A GetValue() method is provided that returns the actual constant value of the appropriate type.
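The idea of keeping the literal’s exact text and computing its value on demand can be sketched as follows. This hypothetical GetValue() handles only booleans and integer literals (decimal or hex, with optional U/L suffixes), choosing the first of int, uint, long, and ulong that fits, as the C# rules do for unsuffixed literals; strings, chars, and real numbers are left out.

```csharp
using System;
using System.Globalization;

// Keeps the literal exactly as written and computes the value on demand.
public class Literal
{
    public string Text { get; }
    public Literal(string text) { Text = text; }

    // Simplified sketch: booleans and integer literals only.
    public object GetValue()
    {
        if (Text == "true") return true;
        if (Text == "false") return false;

        // Split "0x1FUL" into digits "0x1F" and suffix "UL".
        string digits = Text.TrimEnd('u', 'U', 'l', 'L');
        string suffix = Text.Substring(digits.Length).ToUpperInvariant();
        bool isHex = digits.StartsWith("0x", StringComparison.OrdinalIgnoreCase);
        NumberStyles style = isHex ? NumberStyles.AllowHexSpecifier : NumberStyles.None;
        if (isHex) digits = digits.Substring(2);

        ulong raw = ulong.Parse(digits, style, CultureInfo.InvariantCulture);
        switch (suffix)
        {
            case "":
                if (raw <= int.MaxValue) return (int)raw;
                if (raw <= uint.MaxValue) return (uint)raw;
                if (raw <= long.MaxValue) return (long)raw;
                return raw;
            case "U":
                if (raw <= uint.MaxValue) return (uint)raw;
                return raw;
            case "L":
                if (raw <= long.MaxValue) return (long)raw;
                return raw;
            default: // "UL" or "LU"
                return raw;
        }
    }
}
```

Rendering simply writes Text back out, so “0x1F” never turns into “31” just because the value was inspected.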

Table 6 - Other Expression Classes

`AnonymousMethod`	Represents an un-named method that can be assigned to a delegate.
`Initializer`	Represents the initialization of an array.
`Literal`	Represents a literal value of a particular type (string, integer, boolean, etc).

Documentation comment classes are listed in Table 7. Classes are provided for all standard XML tags, so that they can be treated as objects, with no concern for dealing with the XML (which I prefer to think of as an implementation detail). Non-standard tags can be represented using DocTag. The DocComment class is both a common base class and also a concrete class that can be used to represent just some text or a collection of child objects. The DocC class can contain a code Expression, and the DocCode class can contain a code Block – so code examples inside documentation comments can be actual code objects just like “real” code. References to code by DocCodeRefBase and DocNameBase classes use SymbolicRefs to directly reference the code.

Table 7 – Documentation Comments

`DocComment`	Represents user documentation of code, and is also the common base class of all documentation comments.
`DocCodeRefBase`	The common base class of all documentation comment tags with a ‘cref’ attribute.
`DocException`, `DocPermission`, `DocSee`, `DocSeeAlso`
`DocNameBase`	The common base class of all documentation comment tags with a ‘name’ attribute.
`DocParam`, `DocParamRef`, `DocTypeParam`, `DocTypeParamRef`
`DocB`, `DocC`, `DocCDATA`, `DocCode`, `DocExample`, `DocI`, `DocInclude`, `DocList`, `DocListDescription`, `DocListHeader`, `DocListItem`, `DocListTerm`, `DocPara`, `DocRemarks`, `DocReturns`, `DocSummary`, `DocTag`, `DocText`, `DocValue`

Compiler directives are shown in Table 8. They present a problem with modeling code as objects, because they really operate on the text form of the language at a sort of pre-processing phase, before it’s parsed into objects. The conditional directives can “split” what would otherwise logically be a single code object into multiple parts. Compiler directives have been handled by treating them as annotations that can be attached to any other code object (or they can also be standalone at the Block level, just like comments). The conditional directives in particular will be evaluated at parse time (when we add that functionality in a later article), perhaps affecting how code objects are created. Changing which symbols are defined will most easily be handled by re-parsing entire CodeUnits, although it could theoretically be done by re-parsing only the affected fragments of code – something that should eventually be supported for editing anyway.

Table 8 – Compiler Directive Classes

`ConditionalDirectiveBase`	The common base class of all conditional directives.
`ConditionalDirective`	The common base class of ‘open’ conditional directives.
`ElseDirective`
`ConditionalExpressionDirective`	The common base class of conditional directives w/expressions.
`IfDirective`, `ElIfDirective`
`EndIfDirective`
`MessageDirective`	The common base class of all directives with messages.
`RegionDirective`, `EndRegionDirective`, `ErrorDirective`, `WarningDirective`
`PragmaDirective`	The common base class of all pragma directives.
`PragmaChecksumDirective`, `PragmaWarningDirective`
`SymbolDirective`	The common base class of all symbol directives.
`DefineSymbol`, `UnDefSymbol`
`LineDirective`

A final group of types are the interfaces shown in Table 9. These are used where common behavior existed across classes and there wasn’t any logical common base class to put it in. For example, all code objects that have a Name property implement the INamedCodeObject interface, so that code (such as a name-based dictionary) that only cares about the name can work with all objects that implement the interface.
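A simplified sketch of how an interface like INamedCodeObject lets one name-based index hold unrelated kinds of objects. The dictionary below groups same-named objects (such as method overloads) in a list; it is a hypothetical stand-in for what the text describes NamedCodeObjectDictionary and NamedCodeObjectGroup doing, not their actual implementation, and MethodDecl and FieldDecl are trimmed to just what the example needs.

```csharp
using System.Collections.Generic;

public interface INamedCodeObject
{
    string Name { get; }
}

public class MethodDecl : INamedCodeObject
{
    public string Name { get; set; }
    public int ParameterCount { get; set; }
}

public class FieldDecl : INamedCodeObject
{
    public string Name { get; set; }
}

// Indexes anything with a name; objects sharing a name end up in one group.
public class NamedCodeObjectDictionary
{
    private readonly Dictionary<string, List<INamedCodeObject>> _groups =
        new Dictionary<string, List<INamedCodeObject>>();

    public void Add(INamedCodeObject obj)
    {
        List<INamedCodeObject> group;
        if (!_groups.TryGetValue(obj.Name, out group))
        {
            group = new List<INamedCodeObject>();
            _groups.Add(obj.Name, group);
        }
        group.Add(obj);
    }

    // Returns every object with the given name (an empty list if there are none).
    public IReadOnlyList<INamedCodeObject> Find(string name)
    {
        List<INamedCodeObject> group;
        return _groups.TryGetValue(name, out group) ? group : new List<INamedCodeObject>();
    }
}
```

The dictionary never needs to know whether it is holding methods, fields, or anything else, only that each entry has a Name.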

Table 9 – Interfaces

`IBlock`	Implemented by all code objects with a Block body. (`BlockStatement`, `AnonymousMethod`, and `DocCode`)
`IModifiers`	Implemented by all code objects that have Modifiers. (`TypeDecl`, `MethodDeclBase`, `PropertyDeclBase`, `FieldDecl`)
`IMultiVariableDecl`	Implemented by all `VariableDecls` that support multiple declarations in a statement.
`INamespace`	Implemented by all code objects that represent a namespace (`Namespace` and `Alias`).
`INamedCodeObject`	Implemented by all code objects that have a `Name` property.
`IParameters`	Implemented by all code objects that have parameters. (`MethodDeclBase`, `IndexerDecl`, `DelegateDecl`, and `AnonymousMethod`)
`ITypeDecl`	Implemented by all code objects that represent type declarations. (`TypeDecl`, `TypeParameter`, and `Alias`)
`ITypeParameters`	Implemented by all code objects that have `TypeParameter`s. (`TypeDecl` and `GenericMethodDecl`)
`IVariableDecl`	Implemented by all code objects that represent variable declarations. (`VariableDecl` and `PropertyDeclBase`)

All of the codeDOM classes support manual creation of instances and assignment of child objects. It’s possible to manually build up a tree of objects to represent any C# source code. Of course, it also makes sense to be able to parse existing code into codeDOM objects – this will be covered in a later article. Editing operations are also supported. Named objects can have their names changed simply by setting their Name property. Child objects can be changed by setting properties to newly created or cloned objects. Existing objects can be moved simply by assigning them, which automatically changes their Parent property to reference their new parent.
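One way to make assignment double as a move is to have every child-valued property setter adopt the new child and release the old one. In this hypothetical sketch (the SetChild helper and the stripped-down Return and Expression classes are assumptions), the old parent’s property is cleared explicitly; the text does not say whether Nova performs that step automatically.

```csharp
public abstract class CodeObject
{
    public CodeObject Parent { get; set; }

    // Used by property setters: detaches the old child (if it still belongs to
    // this object) and makes this object the parent of the new child.
    protected T SetChild<T>(T oldChild, T newChild) where T : CodeObject
    {
        if (oldChild != null && oldChild.Parent == this) oldChild.Parent = null;
        if (newChild != null) newChild.Parent = this;
        return newChild;
    }
}

public class Expression : CodeObject { }

// Hypothetical "return <value>;" statement with a single child expression.
public class Return : CodeObject
{
    private Expression _value;

    public Expression Value
    {
        get { return _value; }
        set { _value = SetChild(_value, value); }
    }
}
```

Checking `oldChild.Parent == this` matters: after the expression has been assigned elsewhere, clearing the original property must not un-parent it from its new owner.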

The codeDOM can also be easily rendered as C# text. Since such a feature is rather critical and used all the time (such as when debugging), text rendering is integrated directly into the codeDOM classes using methods whose names start with “AsText”. Code is formatted using the FormatFlags on each object, which default to typical C# formatting but can be overridden if desired. These flags determine where newlines occur, whether an expression has parentheses or a block of code has braces, and even whether a semicolon terminates a statement or expression.
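A hedged sketch of how flags can drive rendering: the same expression object renders with or without parentheses and a terminating semicolon depending only on its FormatFlags. The two-flag enum, the StringBuilder-based AsText(), and the Literal and Add classes are simplified assumptions rather than Nova’s real rendering code.

```csharp
using System;
using System.Text;

[Flags]
public enum FormatFlags { None = 0, Parens = 1, Terminator = 2 }

public abstract class Expression
{
    public FormatFlags FormatFlags { get; set; }

    // Renders the expression, wrapping it in parentheses and/or appending ';'
    // as the flags dictate; subclasses only render their own tokens.
    public string AsText()
    {
        var sb = new StringBuilder();
        bool parens = (FormatFlags & FormatFlags.Parens) != 0;
        if (parens) sb.Append('(');
        AsTextExpression(sb);
        if (parens) sb.Append(')');
        if ((FormatFlags & FormatFlags.Terminator) != 0) sb.Append(';');
        return sb.ToString();
    }

    protected abstract void AsTextExpression(StringBuilder sb);
}

public class Literal : Expression
{
    private readonly string _text;
    public Literal(string text) { _text = text; }
    protected override void AsTextExpression(StringBuilder sb) { sb.Append(_text); }
}

public class Add : Expression
{
    public Expression Left { get; }
    public Expression Right { get; }
    public Add(Expression left, Expression right) { Left = left; Right = right; }

    protected override void AsTextExpression(StringBuilder sb)
    {
        sb.Append(Left.AsText()).Append(" + ").Append(Right.AsText());
    }
}
```

Parentheses here are purely presentational: the tree structure already says what is grouped, so the flag only records whether the original text had them.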

Supporting Classes

The CodeWriter class (in the Rendering folder) is used to render codeDOM classes in text format. An instance of this class is required when using the AsText() methods of any class to render it as text, and you can see examples of this in CodeUnit.SaveAs() or certain special overloaded CodeObject.AsText() methods that can render any CodeObject and all of its children as text.
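The division of labor between a writer and the objects being rendered might look like this sketch: the hypothetical CodeWriter tracks indentation and line starts, so an AsText(CodeWriter) method only has to emit tokens and newlines. Statements are plain strings here to keep the example short; the real classes render child code objects instead.

```csharp
using System.Text;

// Much-simplified, hypothetical writer: handles indentation so that
// code objects only need to emit tokens and line breaks.
public class CodeWriter
{
    private readonly StringBuilder _sb = new StringBuilder();
    private int _indent;
    private bool _atLineStart = true;

    public int TabSize { get; set; } = 4;

    public void Write(string text)
    {
        if (_atLineStart)
        {
            _sb.Append(' ', _indent * TabSize);
            _atLineStart = false;
        }
        _sb.Append(text);
    }

    public void WriteLine(string text = "")
    {
        if (text.Length > 0) Write(text);
        _sb.Append('\n');
        _atLineStart = true;
    }

    public void BeginIndent() { _indent++; }
    public void EndIndent() { _indent--; }

    public override string ToString() => _sb.ToString();
}

public class Block
{
    public string[] Statements { get; set; } = new string[0];

    // Sketch of an AsText(CodeWriter) method: braces on their own lines, body indented.
    public void AsText(CodeWriter writer)
    {
        writer.WriteLine("{");
        writer.BeginIndent();
        foreach (string statement in Statements) writer.WriteLine(statement);
        writer.EndIndent();
        writer.WriteLine("}");
    }
}
```

Because indentation lives in the writer, a nested block renders correctly no matter how deep it sits in the tree.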

There are utility classes (in the Utilities folder) that provide static helper methods for working with arrays, collections, files, strings, and also the various Reflection types. The methods aren’t implemented as extension methods because we are setting up this code base to be able to parse itself, and it won’t be supporting extension methods for now. You might find some of these useful for your own apps, and they are easily converted to extension methods.

The Configuration class is used to read entries from the “.config” file of the application. It’s very small (the main LoadSettings method is only 60 lines of code), but it’s a very handy class that you are welcome to “steal” and modify for your own apps. It works by reading entries from a special section of the config file, and it uses reflection to find and set the values of static members of classes.
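The reflection part of this approach can be sketched in a few lines. This hypothetical LoadSettings takes its “TypeName.MemberName” = value entries from a dictionary instead of reading a .config section, and converts values with Convert.ChangeType, so it handles primitive types such as int and bool but not enums. LogSettings is an invented example target.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Reflection;

public static class ConfigurationSketch
{
    // Applies "TypeName.MemberName" = "value" entries to public static fields or properties.
    public static void LoadSettings(IDictionary<string, string> entries, Assembly assembly)
    {
        foreach (KeyValuePair<string, string> entry in entries)
        {
            int dot = entry.Key.LastIndexOf('.');
            if (dot < 0) continue;
            Type type = assembly.GetType(entry.Key.Substring(0, dot));
            if (type == null) continue;   // unknown class: skip the entry
            string memberName = entry.Key.Substring(dot + 1);

            FieldInfo field = type.GetField(memberName, BindingFlags.Public | BindingFlags.Static);
            if (field != null)
            {
                field.SetValue(null, Convert.ChangeType(entry.Value, field.FieldType, CultureInfo.InvariantCulture));
                continue;
            }

            PropertyInfo property = type.GetProperty(memberName, BindingFlags.Public | BindingFlags.Static);
            if (property != null && property.CanWrite)
                property.SetValue(null, Convert.ChangeType(entry.Value, property.PropertyType, CultureInfo.InvariantCulture), null);
        }
    }
}

// Example target whose static members act as settings.
public static class LogSettings
{
    public static int Level = 1;
    public static bool ShowTimestamps { get; set; }
}
```

The appeal of the design is that adding a new setting only means adding a public static member; no parsing code has to change.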

The Log class is used to log messages – either to the console or to any client code that chooses to intercept them. It supports different levels of logging, and special handling of exceptions. It’s also a simple and handy class that could be re-used by other apps.
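A minimal logger in the same spirit, assuming a three-level LogLevel enum and a static event for interception (both hypothetical): messages above the current level are dropped, intercepted messages go to subscribers, and everything else goes to the console.

```csharp
using System;

public enum LogLevel { None, Normal, Detailed }

public static class Log
{
    public static LogLevel Level = LogLevel.Normal;

    // Clients can subscribe to intercept messages; if nobody does, they go to the console.
    public static event Action<string> Message;

    public static void WriteLine(LogLevel level, string text)
    {
        if (level > Level) return;   // more detailed than currently requested
        Action<string> handler = Message;
        if (handler != null) handler(text);
        else Console.WriteLine(text);
    }

    // Special handling for exceptions: the inner exception is only shown at Detailed level.
    public static void WriteException(Exception ex, string context)
    {
        WriteLine(LogLevel.Normal, "EXCEPTION " + context + ": " + ex.Message);
        if (ex.InnerException != null)
            WriteLine(LogLevel.Detailed, "  inner: " + ex.InnerException.Message);
    }
}
```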

Using the Attached Source Code

The “code name” chosen for this project is Nova, and so the solution in the included sources is named “Nova.sln”. All of the CodeDOM classes and supporting classes described in the previous sections are located in the “Nova.CodeDOM.csproj” project. You only need to reference the resulting Nova.CodeDOM.dll (or add the project to your own solution) if you wish to experiment with your own client code.

There are also two other projects included in the solution. The Nova.Examples project contains some simple examples of using the codeDOM, and more examples will be added to it as we add more features. It’s a console app that you can run to see the results. The Nova.Test project contains a file “FullTest.cs” that implements most of the C# features that the codeDOM is designed to support, and “ManualTests.cs” contains code that manually builds an equivalent codeDOM object tree. This is also a console application; running it will build the object tree, emit it into a “FullTest.generated.cs” file, and then compare the result to “FullTest.cs” (a couple of very small tests in ManualTests.cs are run first).

Examine the code in ManualTests.cs to see how to manually create a codeDOM using all of the classes provided for the various language features. There are a number of implicit conversion operators defined to make things easier. You can just pass ‘0’ to a method parameter expecting an Expression and it will be equivalent to ‘new Literal(0)’, or pass a Type such as ‘typeof(int)’ (or other reflection object, like a MethodInfo) and it will be equivalent to ‘new TypeRef(typeof(int))’ (or other appropriate reference).
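The conversion-operator trick can be sketched like this. The Literal, TypeRef, and Call classes are hypothetical simplifications (this Literal stores its text, as described earlier), but the mechanism, implicit operators declared on the Expression base class, is standard C#.

```csharp
using System;
using System.Globalization;

public abstract class Expression
{
    // Lets callers write 0 where an Expression is expected.
    public static implicit operator Expression(int value)
    {
        return new Literal(value.ToString(CultureInfo.InvariantCulture));
    }

    // Lets callers write typeof(int) where an Expression is expected.
    public static implicit operator Expression(Type type)
    {
        return new TypeRef(type);
    }
}

public class Literal : Expression
{
    public string Text { get; }
    public Literal(string text) { Text = text; }
}

public class TypeRef : Expression
{
    public Type Reference { get; }
    public TypeRef(Type reference) { Reference = reference; }
}

// Hypothetical call node whose arguments are Expressions.
public class Call : Expression
{
    public string MethodName { get; }
    public Expression[] Arguments { get; }

    public Call(string methodName, params Expression[] arguments)
    {
        MethodName = methodName;
        Arguments = arguments;
    }
}
```

With these operators, `new Call("Foo", 0, typeof(int))` compiles as if the arguments had been written `new Literal("0")` and `new TypeRef(typeof(int))`.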

Summary

We now have a complete set of codeDOM classes that can be manually instantiated to create a tree of objects that represents just about any C# program, and we can also render any tree of these objects to C# source code. But we’re still missing some obvious things. How about parsing existing programs? And don’t we need Solution and Project objects so we can work with “.sln” and “.csproj” files? Both of those are on the way, but first I think we need a better way to display a codeDOM than just converting it to text.

In my next article, I’ll present code for rendering the codeDOM using WPF, making it easier to inspect and understand. Now, things are going to start getting more interesting… or, at least, more attractive. Continue with Part 3.