CS CODEDOM Parser is a utility that parses C# source code and creates a CODEDOM tree of the code (CODEDOM is a set of general classes that represent code, part of the .NET Framework in the System.CodeDom namespace).
The current version (0.1) is limited: it parses code down to type members and their parameters, it has very limited support for expressions, and it does not parse the statements inside members. There are two main reasons why I have stopped at this level for now:
- First - it was enough for my needs (I wanted to do some code analysis to enforce coding standards).
- Second - CODEDOM is limited and cannot fully express C# code - for more details see the section CODEDOM Limitations below.
On the other hand, it also parses source code comments, so it can be used to analyze the interdependencies of code and comments.
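As a rough illustration of that kind of analysis, the sketch below walks a CodeDom tree and reports type members that carry no comments. It assumes the comments end up in the standard Comments collection of each CodeTypeMember; the tree here is built by hand as a stand-in for parser output, and the names CommentCheck, Widget, Open and Close are made up for the example.

```csharp
using System;
using System.CodeDom;

static class CommentCheck
{
    // Prints every type member that carries no comment statements.
    static void ReportUncommentedMembers(CodeCompileUnit unit)
    {
        foreach (CodeNamespace ns in unit.Namespaces)
            foreach (CodeTypeDeclaration type in ns.Types)
                foreach (CodeTypeMember member in type.Members)
                    if (member.Comments.Count == 0)
                        Console.WriteLine("{0}.{1} has no comment", type.Name, member.Name);
    }

    static void Main()
    {
        // Hand-built stand-in for a tree the parser would produce (assumption).
        var unit = new CodeCompileUnit();
        var ns = new CodeNamespace("Demo");
        var type = new CodeTypeDeclaration("Widget");

        var documented = new CodeMemberMethod { Name = "Open" };
        documented.Comments.Add(new CodeCommentStatement("Opens the widget."));
        var bare = new CodeMemberMethod { Name = "Close" };

        type.Members.Add(documented);
        type.Members.Add(bare);
        ns.Types.Add(type);
        unit.Namespaces.Add(ns);

        // Only Close is reported, because Open has a comment attached.
        ReportUncommentedMembers(unit);
    }
}
```

A real coding-standard check would probably also look at the type declarations themselves and at nested types, which this sketch skips.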
The stability of this version is also low; it is an alpha version. If anybody wants to help take this further, they are welcome.
The parser is based on the Mono C# compiler code. I looked around for available C# parsers and C# parser-building tools (I wanted a C# parser written in C#) and finally decided on Mono. For more details about how the Mono parser is used and the other possibilities I explored, see the section C# parser Tools.
CODEDOM Limitations
At first I thought it was a great idea to use a language-independent syntax tree, and CodeDom looked nice. If a code analysis tool is built on it, it can work for any .NET language: just change the parser and the rest stays the same. Sounds cool. But after getting into CodeDom, I found that a lot of language features (not just for C#, but for basically any language) are missing, and it is not possible to parse the source code fully. The main problem is in expressions and statements, where CodeDom has a very limited set of classes - for instance, there is no support for unary operations, and there are many more issues.
I decided to continue with CodeDom despite its limitations, because it was enough for analyzing code against coding standards (at least for what I need now - it also makes it possible to keep comments and code in one tree, which is something I liked), but this remains an open issue for future development.
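To make the missing unary operation concrete: CodeBinaryOperatorExpression exists, but there is no unary counterpart, so an expression such as !done has no direct node. The sketch below shows two workarounds a CodeDom-based tool could fall back on. Neither is claimed to be what this parser does, and the variable name done is made up for the example.

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

static class UnaryWorkaround
{
    static void Main()
    {
        var flag = new CodeVariableReferenceExpression("done");

        // Option 1: rewrite "!done" as a binary comparison against false.
        CodeExpression asBinary = new CodeBinaryOperatorExpression(
            flag,
            CodeBinaryOperatorType.ValueEquality,
            new CodePrimitiveExpression(false));

        // Option 2: keep the original text as an opaque, language-specific snippet.
        CodeExpression asSnippet = new CodeSnippetExpression("!done");

        using (var provider = new CSharpCodeProvider())
        {
            var options = new CodeGeneratorOptions();
            foreach (CodeExpression expression in new[] { asBinary, asSnippet })
            {
                // Render each expression back to C# source text.
                var writer = new StringWriter();
                provider.GenerateCodeFromExpression(expression, writer, options);
                Console.WriteLine(writer.ToString());
            }
        }
    }
}
```

Both options have a cost. The first changes the shape of the original expression, and the second gives up language independence, because a snippet is just text that only a C# generator can emit meaningfully.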
Here is a list of issues I have found (and there are more):
- CodeCompileUnit has no place for using directives or namespace members, so they are currently placed into the first default namespace
- using_alias_directive - no support found
- nested namespaces - no support found (so the parser flattens the namespace hierarchy)
- variable declaration list (int i,j,k;) - no support - transformed into individual variable declarations
- pointer_type - no support found
- "jagged" array type (array of arrays) - MS CSharpCodeProvider reverses the order of ranks
- params keyword - not supported - the keyword is dropped during parsing, so the parameter becomes an ordinary array-type parameter
- private modifier on a nested delegate is not emitted by CSharpCodeProvider (all other nested types work fine)
- unsafe modifier - no support found
- readonly modifier - no support found
- volatile modifier - no support found
- explicit interface implementation - not implemented yet (I think this can be done)
- add and remove accessors for events - no support found
- virtual and override modifiers do not work for events in MS CSharpCodeProvider
- operator members and destructors - no support found
- expressions - no unary expressions (operations) at all, only one-dimensional arrays, some operators not supported, and more
- attribute targets - no support found
- attributes on accessors - no support found
- If the CodeCompileUnit contains custom attributes in global scope, CSharpCodeProvider prints them before the global using directives (this is because the using directives have to be in the first namespace)
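The first and third items can be pictured with a small hand-built tree. This is only a sketch of the layout described above, not code taken from the parser: the using directives sit on an unnamed CodeNamespace that comes first in the compile unit, and a nested namespace is represented by a single namespace with a dotted name. The names Outer, Inner and Widget are made up.

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using Microsoft.CSharp;

static class NamespaceLayout
{
    static void Main()
    {
        var unit = new CodeCompileUnit();

        // CodeCompileUnit has no Imports collection of its own, so the
        // file-level using directives are stored on a first, unnamed namespace.
        var defaultNamespace = new CodeNamespace();
        defaultNamespace.Imports.Add(new CodeNamespaceImport("System"));
        defaultNamespace.Imports.Add(new CodeNamespaceImport("System.Collections"));
        unit.Namespaces.Add(defaultNamespace);

        // "namespace Outer { namespace Inner { ... } }" flattened to one dotted name,
        // because CodeNamespace cannot contain another CodeNamespace.
        var flattened = new CodeNamespace("Outer.Inner");
        flattened.Types.Add(new CodeTypeDeclaration("Widget"));
        unit.Namespaces.Add(flattened);

        // Print the C# source that CSharpCodeProvider generates for this tree.
        using (var provider = new CSharpCodeProvider())
        {
            provider.GenerateCodeFromCompileUnit(unit, Console.Out, new CodeGeneratorOptions());
        }
    }
}
```

The exact text printed depends on the framework version, so it is best to run the sketch and compare the output with the original source layout.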
C# parser Tools
I wanted to use an existing tool, so I looked around and found these interesting projects:
- Mono project
They are implementing a complete open source .NET platform (they modified the jay parser generator and used it to generate the parser).
- Compiler Writing Tools using C#, from Malcolm Crowe of the University of Paisley
Mr. Crowe created a parser generator and a lexer generator in C#. I played with these tools quite a bit, but when I wanted to do something bigger, I got stuck.
- C# grammar for flex/bison, written by James Power of the National University of Ireland
Contains scripts for the well-known tools bison and flex, which can generate a C parser. I thought I could use them with some C# port of those tools, but I was not able to, so I finally used the grammar from Mono.
- A port of JB Parser and Lexer Generation for Java (which itself is a port of bison and flex)
The current version is alpha, and I was not able to get even their calculator example working (which the authors claim worked).
- CsLex from Brad Merrill
It is a lexer generator.
I also looked at the MS Rotor project; the C# parser there is written in C++ (and it is not under an open source license).
So finally I decided to use the Mono source: I used their lexer, jay, and their jay grammar to generate my parser. In the jay grammar I use my own code to create the CodeDom objects.
Description of package
The CS CODEDOM Parser package consists of:
- CodeDom parser itself (/ directory)
- NUnit tests for the parser (/NUnitTests directory)
Contains a bunch of tests that I used to check the functionality of the parser. If you want to run them, you need NUnit.
- testParser (/testParser directory)
A simple command-line utility that tests the parser: it parses a file (name supplied as a command-line parameter) and writes to stdout the code generated by CSharpCodeProvider (the C# code generator for CodeDom, in the Microsoft.CSharp namespace).
- CodeTreeView (/CodeTreeView directory)
A simple Windows application that opens a file and displays the CODEDOM tree on the left (a tree view control) and the original source on the right (a text box control). When you click a tree node, the text box scrolls to show the corresponding code. It is something like a very, very simple source code viewer.
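The idea behind CodeTreeView can be sketched without any UI as a recursive walk that prints one line per tree node together with its source line. The sketch assumes the source position is available through the standard LinePragma property of CodeTypeMember; whether the parser stores positions there is an assumption, and the tree, file name and line numbers below are made up.

```csharp
using System;
using System.CodeDom;

static class TreeDump
{
    // Writes one line per node, indented by depth, with the source line if known.
    static void Dump(CodeTypeMember member, int depth)
    {
        string line = member.LinePragma != null
            ? " (line " + member.LinePragma.LineNumber + ")"
            : "";
        Console.WriteLine(new string(' ', depth * 2) + member.GetType().Name + " " + member.Name + line);

        // Type declarations are the only members that contain further members.
        var type = member as CodeTypeDeclaration;
        if (type != null)
            foreach (CodeTypeMember child in type.Members)
                Dump(child, depth + 1);
    }

    static void Main()
    {
        // Hand-built stand-in for a parsed file; positions are invented.
        var type = new CodeTypeDeclaration("Widget");
        type.LinePragma = new CodeLinePragma("Widget.cs", 3);

        var method = new CodeMemberMethod { Name = "Open" };
        method.LinePragma = new CodeLinePragma("Widget.cs", 5);
        type.Members.Add(method);

        // No LinePragma here, so no line number is printed for this node.
        type.Members.Add(new CodeMemberField(typeof(int), "count"));

        Dump(type, 0);
    }
}
```

In a viewer, the same line number could be used to scroll the text box to the matching position when a node is selected.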
CS CODEDOM Parser and the tools included in this package are distributed under the GPL licence.
You can check for the latest version at http://ivanz.webpark.cz/csparser.html.
The basic idea for future development is to extend CodeDom to support all language features, so that sources can be parsed completely. (The alternative is to abandon CodeDom and use a custom syntax tree, but I still like the idea of a language-independent tree structure that can be used for different tasks.)
Reporting of errors and warnings should be improved (unify codes and messages, unify error reporting; the Report class should store reported errors).
The parser should also be improved to indicate the location of syntax elements in the source file more exactly.
Better separation between the parser and the CODEDOM builder is also needed.
If somebody likes the tool and wants to help improve it, they are welcome.