Static Code Analysis






A static code analyzer building method call networks + sample applications
Introduction
A good overview of the service operations your applications consume and the business operations underneath them, eased communication between developers, analysts, and testers, an overview of the exceptions thrown per service operation, a summarized view of your application's security, impact analysis in a snap, a streamlined system thanks to easy dead-code detection... sounds too good to be true?
The static code analyzer presented here (including source code) can bring you a whole lot closer to those targets.
This article describes the operation of a method-based static code analyzer for .NET that constructs in-memory method call networks of compiled assemblies. Since the analyzer works on compiled MSIL code, it can be used to analyze components written in C#, VB.NET, or any other .NET language.
Second, I'll present a concrete application of static code analysis to generate a website providing easy insights on a sample application.
The download includes the code analyzer (C# source code), a website generation project, and a sample 3-tier program (using WPF/WCF/EF) to demonstrate the features of code analysis and the generated website.
A Method-Based Static Code Analyzer
At the core of the code analyzer resides a .NET disassembler for which I must thank Sorin Serban who published the implementation I used.
The analyzer will disassemble all given assemblies and construct an in-memory representation of the method call network.
For instance, to analyze the mscorlib assembly, we create a StaticCodeAnalyzerSession to which we add the mscorlib assembly; then we instantiate a StaticCodeAnalyzer to process that session. The result is a CodeModel which mainly consists of a list of ModelMethods:
// Create an AnalyzerSession:
StaticCodeAnalyzerSession session = new StaticCodeAnalyzerSession();
session.Assemblies.Add(typeof(System.Int32).Assembly);
// Process the session:
StaticCodeAnalyzer analyzer = new StaticCodeAnalyzer();
CodeModel model = analyzer.Process(session);
The code model contains a Methods collection filled with all methods of the analyzed assemblies, including constructors, getters and setters, and other special methods. We could now use the code model to query for all public constructor methods with 7 or more arguments, as follows:
var query = from m in model.Methods
            where m.IsConstructor
               && m.IsPublic
               && m.MethodBase.GetParameters().Count() >= 7
            select m;

foreach (var method in query)
    Console.WriteLine(method);
The code model contains all methods of the analyzed assemblies as ModelMethod objects. These ModelMethods have knowledge of the methods they call and the methods they're called by. This in-memory method call network can, for instance, be used to display by whom these constructor methods are called:
foreach (ModelMethod method in query)
{
    Console.Write(method);
    if (method.CalledByMethods.Count() > 0)
    {
        Console.WriteLine(" called by:");
        foreach (ModelMethod caller in method.CalledByMethods)
        {
            Console.WriteLine(" - " + caller);
        }
    }
    else
    {
        Console.WriteLine(", not called in this assembly.");
    }
}
The ModelMethod class represents a node in the network of method calls. It provides the properties CallsMethods and CalledByMethods to query the network of method calls. GetAllCallsMethods and GetAllCalledByMethods return not only the direct calls and callers, but also all indirect ones, recursively!
In addition, the ModelMethod class contains several properties to help identify and select methods. Some of them simply delegate to the underlying MethodBase instance.
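The recursive variants make a quick impact analysis a few lines of code. A sketch, assuming a model populated as above (the method picked here by its string representation is purely illustrative):

```csharp
// Sketch: list every direct and indirect caller of a given method.
ModelMethod target = model.Methods
    .First(m => m.ToString().Contains("StringBuilder.Append"));

foreach (ModelMethod caller in target.GetAllCalledByMethods())
    Console.WriteLine("impacted: " + caller);
```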
We've just seen how we could use the code analyzer to check code quality rules (for instance, a rule that constructors should have less than 7 arguments), or to help in maintenance tasks, such as identifying unused methods (methods not called; provided, of course, we include the application's top level assemblies in our search).
If we include our unit test assemblies in the code analyzer session and query for all unused methods that are not unit tests, we get all methods that are not used (tested?) in unit tests.
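Such a query could be sketched as follows, assuming the unit test methods were tagged beforehand (tagging is covered below; the "unittest" tag name is my own choice):

```csharp
// Sketch: public methods never reached from any unit test.
// Assumes a "unittest" tag was applied beforehand, e.g. by
// matching the [TestMethod] attribute with a ruleset.
var untested = from m in model.Methods
               where m.IsPublic
                  && !m.HasAnyOfTags("unittest")
                  && m.GetAllCalledByMethods()
                      .WhereTagsContains("unittest")
                      .Count() == 0
               select m;

foreach (var method in untested)
    Console.WriteLine("possibly untested: " + method);
```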
Another example: by querying for all methods that are not public or protected, are not constructors, do not override other members, do not implement an interface method, and - last but not least - are not called by any method, we obtain the list of methods that are neither accessed nor accessible, often referred to as 'dead code':
// Query for unaccessed and inaccessible methods:
var query = from m in model.Methods
            where !m.IsPublic
               && !m.IsProtected
               && !m.IsConstructor
               && !m.IsClassConstructor
               && !m.IsOverride
               && m.ImplementedInterfaceMethods.Count() == 0
               && m.CalledByMethods.Count() == 0
            select m;
If you try this query on mscorlib, you'll find a few hundred methods. Maybe some of them are truly dead code, but we should be aware that apparently dead code can also be explained as:
- The method is called only in DEBUG builds (within an #if DEBUG section).
- The method is declared internal and consumed from within another assembly granted access through 'InternalsVisibleTo' (for instance, to provide broader access to the unit test assembly).
That's also a reason I excluded constructor methods. mscorlib typically contains two types of constructors that appear never to be called: private default constructors written to work around the fact that static classes did not exist in earlier .NET Framework versions, and serialization constructors.
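As an illustration of the first case, consider this made-up class; when analyzing a release build's MSIL, the analyzer sees no call to DumpState, even though the method is not really dead:

```csharp
public class OrderService
{
    public void PlaceOrder(string orderId)
    {
#if DEBUG
        DumpState(orderId);   // this call is compiled away in release builds
#endif
        Console.WriteLine("Order placed: " + orderId);
    }

    // Appears uncalled in a release build's method call network:
    private void DumpState(string orderId)
    {
        Console.WriteLine("Debug state for " + orderId);
    }
}
```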
Later on, we'll see another example of detecting 'dead code' that is more directed to the code we would focus on.
So far, we've seen the basics of using the static code analyzer and had some samples on what to do with it. The building of an in-memory method call network is really the core of the code analyzer on which all further processing is based.
Code Analyzer Processors
The in-memory method call network connects all calling methods with their called methods, and vice versa. This concept is simple and clean, but unfortunately has some important limitations when used in a rich object-oriented environment such as the .NET Framework.
Let's take a look at the following diagram representing classes with methods and the method call network:
The method call network only consists of the green dots (the methods) and their connecting lines (the calls). Although class information is available, it is not used in building the method call network.
Based on the diagram, and assuming class A represents the only top-level application class, we could conclude that the methods C.mc3, D.mc1, E.me2, E.me3, and E.me4 (the dark and light-gray outlined ones) are not used from our application and hence can be deleted.
That's true for most of them. But the mc1 method on D, although never directly called, is in fact an override of the mc1 method on class C (as class D inherits from class C), and could possibly be called when mc1 is called on class C (as this is polymorphic behavior).
Similar problems arise when calling methods on an interface and having several classes implementing that interface but never being called directly. This would make it almost impossible to properly analyze applications using interfaces and Inversion of Dependency.
Fortunately, there's an easy solution to this problem: simply consider all overriding methods as being called by their base method, and all interface method implementations as being called by their interface method definition. It's simple, and really solves the issue. In the above diagram, this would mean we add an arrow from C.mc1 to D.mc1 (from the base method to its override). Suddenly, it appears that mc1 on D is not unused anymore but can be called from our application class A through the method ma1. Furthermore, E.me4 also becomes a used method, and only the dark gray outlined methods are really unused.
Connecting the base methods with their overriding methods can easily be done after processing the assemblies as in the following code:
// Process the session:
StaticCodeAnalyzer analyzer = new StaticCodeAnalyzer();
CodeModel model = analyzer.Process(session);

// Connect base methods to their overriding methods:
foreach (ModelMethod method in model.Methods)
{
    if (method.IsOverride)
    {
        ModelMethod baseMethod
            = model.Methods.ForMethodBase(method.BaseMethodDefinition);
        if (baseMethod != null)
        {
            baseMethod.CallsMethods.Add(method);
        }
    }
}
We loop over all methods, and for every method that is an override, we try to identify the ModelMethod instance matching the base method definition. Such a ModelMethod instance is only found if it is defined in an analyzed assembly. So, if a class overrides a method defined in another assembly, we would need to include that assembly too in the analyzer session.
If the base ModelMethod instance can be found, we add the overriding method to its CallsMethods collection.
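The interface case can be handled the same way. A sketch, assuming ImplementedInterfaceMethods yields the MethodBase definitions of the implemented interface methods (this is the job the DefaultImplementationProcessor, listed below, takes care of):

```csharp
// Sketch: connect interface method definitions to their implementations,
// so implementations count as "called" by their interface definition.
foreach (ModelMethod method in model.Methods)
{
    foreach (MethodBase itfDefinition in method.ImplementedInterfaceMethods)
    {
        ModelMethod itfMethod = model.Methods.ForMethodBase(itfDefinition);
        if (itfMethod != null)
        {
            itfMethod.CallsMethods.Add(method);
        }
    }
}
```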
For this and similar tasks, predefined model processors are available:
- VirtualMethodProcessor: Connects base methods to their overriding methods
- DefaultImplementationProcessor: Connects interface methods to their implementations
- DefaultMethodProcessor: Connects constructors to specified methods
- LeafMethodProcessor: Disconnects methods from their selected callers

Custom processors can easily be created by implementing the IProcessor interface.
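A minimal custom processor could look like the following sketch. I'm assuming IProcessor exposes a single Process(CodeModel) method; check the source in the download for the exact signature:

```csharp
// Hypothetical custom processor: tags all property getters and setters
// so later rules can exclude them. The IProcessor signature is assumed.
public class AccessorTaggingProcessor : IProcessor
{
    public void Process(CodeModel model)
    {
        foreach (ModelMethod method in model.Methods)
        {
            if (method.MethodBase.IsSpecialName
                && (method.MethodBase.Name.StartsWith("get_")
                    || method.MethodBase.Name.StartsWith("set_")))
            {
                method.Tags.Add("accessor");
            }
        }
    }
}
```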
The predefined processors only process methods carrying special markers (tags), so we can't use them just yet. We first have to see what tags and the rule based processor can do for us.
Tagging for Eased Analysis
Every ModelMethod has a Tags collection property (and, for convenience, a HasAnyOfTags() method). A tag can be any kind of object, but think of it as a string: a label we can use to mark methods.
If I wanted to recognize exception constructors, I could put a tag on all those methods, as in:
// Identify exception constructor methods:
var xcons = from m in model.Methods
            where m.IsConstructor
               && m.DeclaringType.IsA<Exception>()
            select m;

// Tag those methods with "exceptionconstructor":
foreach (ModelMethod method in xcons)
    method.Tags.Add("exceptionconstructor");
(Note that the IsA<T>() method is an extension method of mine. You could use typeof(Exception).IsAssignableFrom(m.DeclaringType) instead.)
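For completeness, such an IsA<T>() helper could be written as the following extension method (my guess at the implementation; the download contains the original):

```csharp
using System;

public static class TypeExtensions
{
    // Returns true if 'type' is T, derives from T, or implements T.
    public static bool IsA<T>(this Type type)
    {
        return typeof(T).IsAssignableFrom(type);
    }
}
```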
On the other hand, I might want to identify all public methods of mscorlib within the System.Text namespace. I can mark those methods with a "text" tag as follows:
// Identify public methods in the System.Text namespace:
var txtmem = from m in model.Methods
             where m.DeclaringType.Namespace == "System.Text"
                && m.DeclaringType.IsPublic
                && m.IsPublic
             select m;

// Tag those methods with "text":
foreach (ModelMethod method in txtmem)
    method.Tags.Add("text");
I can now easily filter all methods common to both sets based on their tags, as follows:
// Show exception constructors within the System.Text namespace:
foreach (ModelMethod method in model.Methods
    .WhereTagsContains("exceptionconstructor")
    .WhereTagsContains("text"))
{
    Console.WriteLine(method);
}
More interestingly, I can now use the method call network to show which exceptions are potentially thrown by which method. Listing the exception constructors a method calls directly is not sufficient: a method can call another method that throws an exception. What we need to do is search recursively for all calls to an exception constructor. Luckily, GetAllCallsMethods() does just that for us.
But it also introduces a small issue. The GetAllCallsMethods() method goes recursively as deep as it can, and since a constructor always calls a base constructor, every method calling an exception constructor will appear to call the exception's base type constructor as well. A simple way to solve this issue is to process the method call network and state that we don't want to see which calls are made from an exception constructor:
// Disconnect exception constructors from their base constructor:
foreach (ModelMethod method in model.Methods
    .WhereTagsContains("exceptionconstructor"))
    method.CallsMethods.Clear();
(This could also be done using the LeafMethodProcessor.)
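For the curious, the recursive expansion GetAllCallsMethods() performs can be sketched generically as an iterative graph traversal with a visited set, which also protects against cycles in the call network (this is my assumed implementation, not necessarily the analyzer's actual one):

```csharp
using System;
using System.Collections.Generic;

public static class CallGraphWalker
{
    // Yields every node reachable from 'root' through 'calls', visiting
    // each node at most once; 'root' itself is yielded only when it is
    // reachable through a cycle.
    public static IEnumerable<T> TransitiveCalls<T>(
        T root, Func<T, IEnumerable<T>> calls)
    {
        var visited = new HashSet<T>();
        var stack = new Stack<T>(calls(root));
        while (stack.Count > 0)
        {
            T current = stack.Pop();
            if (!visited.Add(current)) continue;  // cycle/duplicate guard
            yield return current;
            foreach (T callee in calls(current))
                stack.Push(callee);
        }
    }
}
```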
Now we could use the following code to show the public methods in the System.Text namespace together with the list of exceptions they potentially throw:
// Show methods in the System.Text namespace which potentially
// throw exceptions:
foreach (ModelMethod method in model.Methods
    .WhereTagsContains("text")
    .OrderBy(m => m.ToString()))
{
    Console.WriteLine(method);
    foreach (Type exception in method.GetAllCallsMethods()
        .WhereTagsContains("exceptionconstructor")
        .Select(m => m.DeclaringType)
        .Distinct()
        .OrderBy(t => t.ToString()))
    {
        Console.WriteLine(" - " + exception);
    }
}
We run over all methods having a "text" tag; then, for each of these methods, we look for called methods (using GetAllCallsMethods(), so searching recursively) that have the "exceptionconstructor" tag. Of those, we take the declaring types, make them distinct, and sort them.
The whole idea of tagging methods is to easily define subsets of methods and to ease the way we write queries. Tags also leverage a declarative way of defining subsets of methods that is described in the next paragraph about the rule based processor.
You might have noticed that we do not look for methods that 'throw' exceptions. Instead, we look for methods that call exception constructors. As the code model only knows about method calls, this is the best assumption we can make of exceptions being thrown. On the other hand, the code model cannot detect when exceptions are caught either.
The Rule Based Processor
A processor is nothing more than a piece of code that loops over all or a subset of the methods of the code model, and applies some processing on them.
The rule based processor is a processor that sets tags on selected methods, where method selection is done using rule objects.
To use the rule based processor, we need an XML definition of the rulebase. We will see later how an entire code analysis session can be defined in an XML document. For now, we will focus on the rule based processor only. Its XML definition looks like:
<rulesprocessor handler="Arebis.CodeAnalysis.Static.Processors.RulesProcessor">
<definitions>
<assembly path="bin\Arebis.CodeAnalysis.Static.dll" />
</definitions>
<rulesets>
<ruleset name="first">...</ruleset>
<ruleset name="second">...</ruleset>
<ruleset name="third">...</ruleset>
...
</rulesets>
</rulesprocessor>
As we will see later, all processor elements take a handler attribute with the name of the class that implements the processor (the name of the XML element itself is not important; it's only a convention to use "rulesprocessor" for the rule based processor).
The content of the processor element depends on the processor (the rule based processor is so far the only one whose element actually has content). The rulesprocessor element content is made of two parts: definitions and rulesets.
The definitions section merely lists the assembly files in which rule classes are found. Unless you define your own rule implementations, you can just copy-paste the reference to Arebis.CodeAnalysis.Static.dll.
The rulesets section defines all rulesets. Every ruleset has a name, and that name is the tag that will be applied to a method if it matches the rule. Basically, the rule based processor loops over all methods of the code model and tags each method with the names of the rulesets that apply to it.
For instance, take the following ruleset:
<ruleset name="exception">
<basetyperule type="System.Exception"/>
<tagrule name="constructor"/>
</ruleset>
This ruleset says that all methods which are declared on a type that is a kind of Exception, and that have the tag "constructor", get the tag "exception".
Or take the following ruleset:
<ruleset name="componentoperation">
<modifierrule modifiers="FamANDAssem" target="Method"/>
<modifierrule modifiers="Static" target="Method"/>
<namerule target="Type" like="*ComponentManager"/>
<tagrule name="constructor" reverse="true"/>
</ruleset>
This ruleset says that all methods which have the modifier FamANDAssem (as defined on System.Reflection.MethodAttributes; FamANDAssem means "internal" in C#), and are static, and whose type has a name matching "*ComponentManager" (the like argument uses simple * and ? wildcards), and that do not have the "constructor" tag (the 'has the constructor tag' rule is reversed), get the "componentoperation" tag.
Rulesets can also be chained. For instance:
<ruleset name="leafmethod">
<tagrule name="exception"/>
</ruleset>
By the first ruleset, we saw constructors of exception classes being tagged with the "exception" tag. This ruleset says that methods tagged with the "exception" tag must also be tagged with the "leafmethod" tag.
(The "leafmethod" tag is used by the LeafMethodProcessor to clear the calling method list of those methods.)
This last ruleset example shows how (WCF) attributes can be used to identify service operations. The following rule tags all methods with "serviceoperationdefinition" when their type is an interface decorated with the WCF [ServiceContract] attribute. Similarly, we can identify unit tests by their [TestMethod] attribute, or even define our own attributes to identify the layers of our architecture.
<ruleset name="serviceoperationdefinition">
<attributerule type="System.ServiceModel.ServiceContractAttribute" target="Type" />
<modifierrule modifiers="Interface" target="Type" />
</ruleset>
An overview of the rule types that can be used is given in Appendix A. It's also important to know that you can develop your own rules by inheriting from BaseMatchingRule, decorating your rule class with a CodeModelMatchingRuleAttribute, and listing the assembly containing the rule class in the definitions section of the rulesprocessor element in the XML.
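As an illustration, a custom rule might look like the sketch below. The attribute's constructor arguments and the exact member to override are assumptions on my part; use the rules shipped in Arebis.CodeAnalysis.Static.dll as a reference:

```csharp
// Hypothetical rule matching methods by parameter count; the
// CodeModelMatchingRule attribute arguments and the Matches()
// override are assumed, not taken from the actual source.
[CodeModelMatchingRule("paramcountrule")]
public class ParameterCountRule : BaseMatchingRule
{
    // Minimum number of parameters a method must have to match:
    public int Min { get; set; }

    public override bool Matches(ModelMethod method)
    {
        return method.MethodBase.GetParameters().Length >= this.Min;
    }
}
```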
The rule based processor provides a powerful way of declaratively describing rules for code analysis. One of the things we can, and will, do with it in the next chapter is tag methods according to their layer in our software architecture. Once the methods are marked by their layer, it becomes easy to verify their compliance with the coding standards and to document the application in terms of those layers.
Example - an Application Documentation Site
The download of this article contains the code of the static code analyzer together with a "ToDosSample" application. The ToDo sample application has the following architecture:
A WPF client application calls a WCF service layer that delegates execution to a business layer using Entity Framework as the data access component. The WCF service contract is defined in a contract assembly that is shared between the client and the service tier.
Based on this classic architecture, we would like to generate a website to support our functional analysts and our maintenance team. The website would list the operations defined on the client tier, those defined on the service tier, and those defined on the business component tier, and would show the relations between the operations on the different layers. In addition, it would tell us which exceptions can be thrown from the business operations, and provide a list of the unused business component operations.
The result would look like the following, and is also included, with the generation scripts, in the download to this article:
A live version of this site is available here.
To create this website, we will combine the static code analyzer, driven by an XML profile, with a 'code' generator that will in fact generate not code but the HTML of our website.
The generator I used is the one I published earlier on Code Project here.
But don't worry, I included everything in the download of this article, including the binaries of the code generator. Of course, you could use any other open .NET template based code generator, including Microsoft's T4 text templating engine, provided it supports multi-template generation projects.
The solution, which contains the ToDoSample
application (last 4 projects of the solution), also contains a "Generation" solution folder with all files used to generate the website.
The GenerationProfile.xml file contains a complete generation profile, and can be read by the StaticCodeAnalyzerSession class as follows:
var session = new StaticCodeAnalyzerSession("GenerationProfile.xml");
This profile includes the following information:
- The assemblies to include in the analysis process (all assemblies of the ToDoSample application as well as the unit tests on the application).
- Language settings (used to translate "System.Int32" into "int" and strip away known namespaces when showing method signatures).
- Processors (the list of processors to run; the order in which they run is important), including the rulesets of the rule based processor.
Tip: To edit the GenerationProfile.xml file with IntelliSense in Visual Studio, open the GenerationProfile.xml file, then in the Properties window, select the Schemas property and set "Use this schema" for "StaticCodeAnalyserProfile.xsd".
The GenerationSettings.xml file is used merely by the code generator, and tells it which template to start with ("GeneratorMain.cst"), where to find assemblies, and where to write the output. It also contains settings specific to the templates.
The first template ("GeneratorMain.cst") reads the static code analyzer session using the profile, runs the analyzer, and calls the generation templates. The Content.cst template builds the index on the left of the site, and also calls the other templates (UIOperation.cst, SOperation.cst, and COperation.cst).
The Generate.cmd file is a batch command file that will regenerate the whole website.
The generated website itself shows multiple things we can do with the static code analyzer:
- Detect which (application) exceptions are thrown by a method and its called methods, recursively.
- Detect which operations are unused (i.e., unused component operations).
- Detect which security roles are required to access an operation.
- Detect whether unit tests exist in which given operations are used/tested.
- Detect which methods are not yet implemented (see the "FindUserByName" component operation), and show method call trees that reveal which method lacks implementation.
Conclusion
Static code analysis provides for a multitude of applications despite its simple concept of solely building an in-memory method call network. Code analysis allows you and your team to write better code by checking for and enforcing coding conventions.
You can, for instance, use the code analyzer in unit tests that check coding conventions such as: service operations that update data in a database must run in a transaction. An example of this is included in the sample; see the "TransactionCodeTest" unit test. You could even write a set of custom code analysis rules to plug into the Visual Studio Code Analysis settings, or implement custom check-in policies for your source control environment.
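Such a convention-checking test could be sketched like this, reusing the earlier 'fewer than 7 constructor arguments' rule (MyApp.Program stands in for one of your own application types):

```csharp
// Hypothetical MSTest checking a coding convention with the analyzer.
[TestClass]
public class CodingConventionTests
{
    [TestMethod]
    public void ConstructorsHaveFewerThanSevenParameters()
    {
        var session = new StaticCodeAnalyzerSession();
        session.Assemblies.Add(typeof(MyApp.Program).Assembly); // your assembly

        CodeModel model = new StaticCodeAnalyzer().Process(session);

        var offenders = model.Methods
            .Where(m => m.IsConstructor
                     && m.MethodBase.GetParameters().Length >= 7)
            .ToList();

        // The test fails (and names the offenders) when the convention
        // is violated anywhere in the assembly:
        Assert.AreEqual(0, offenders.Count,
            "Constructors with 7+ parameters found: "
            + string.Join(", ", offenders.Select(m => m.ToString()).ToArray()));
    }
}
```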
Or you can use code analysis to build websites, or generate documentation to communicate on a higher level with functional analysts, software testers, developers, and maintainers.
Applying code analysis will not revolutionize your way of programming, but it will give you more control and allow you to respond quicker to impact analysis demands.
Appendix A - Overview of Rule Types
attributerule
Matches methods by the attributes they, their type, or their assembly have.
- target: Target having the attribute
- type: Type name of the attribute

namerule
Matches methods by their name, their type name, or their assembly's name.
- target: Target of the name
- like: Wildcard expression the name should match
- match: Regular expression the name should match

modifierrule
Matches methods by their modifiers or by their type modifiers.
- target: Target of the modifier
- modifiers: Comma-separated list of modifiers

tagrule
Matches methods by their tags.
- name: Name of the expected tag
- reverse: True to reverse the match

interfacerule
Matches methods by the fact that they implement a method of the given interface.
- type: Name of the interface

basetyperule
Matches methods on subtypes of the given type.
- type: Name of the base type
- includeinterfaces: Whether the type can be an interface
History
- 7th March, 2010: Initial version