Static Code Analysis






A static code analyzer building method call networks + sample applications
Introduction
A good overview of the service operations your applications consume and the business operations underneath them, eased communication between developers, analysts, and testers, an overview of the exceptions thrown per service operation, a summarized view of your application's security, impact analysis in a snap, a streamlined system thanks to easy dead-code detection... sounds too good to be true?
The static code analyzer presented here (including source code) can bring you a whole lot closer to those targets.
This article describes the operation of a method-based static code analyzer for .NET that constructs in-memory method call networks of compiled assemblies. Since the analyzer works on compiled MSIL code, it can be used to analyze components written in C#, VB.NET, or any other .NET language.
Second, I'll present a concrete application of static code analysis to generate a website providing easy insights on a sample application.
The download includes the code analyzer (C# source code), a website generation project, and a sample 3-tier program (using WPF/WCF/EF) to demonstrate the features of code analysis and the generated website.
A Method-Based Static Code Analyzer
At the core of the code analyzer resides a .NET disassembler for which I must thank Sorin Serban who published the implementation I used.
The analyzer will disassemble all given assemblies and construct an in-memory representation of the method call network.
For instance, to analyze the mscorlib assembly, we create a StaticCodeAnalyzerSession to which we add the mscorlib assembly; then we instantiate a StaticCodeAnalyzer to process that session. The result is a CodeModel which mainly consists of a list of ModelMethods:
// Create an AnalyzerSession:
StaticCodeAnalyzerSession session = new StaticCodeAnalyzerSession();
session.Assemblies.Add(typeof(System.Int32).Assembly);
// Process the session:
StaticCodeAnalyzer analyzer = new StaticCodeAnalyzer();
CodeModel model = analyzer.Process(session);
The code model contains a Methods collection filled with all methods of the analyzed assemblies, including constructors, getters and setters, and other special methods. We could now use the code model to query for all public constructor methods with 7 or more arguments, as follows:
var query = from m in model.Methods
            where m.IsConstructor
               && m.IsPublic
               && m.MethodBase.GetParameters().Count() >= 7
            select m;

foreach (var method in query)
    Console.WriteLine(method);
The code model contains all methods of the analyzed assemblies as ModelMethod objects. These ModelMethods have knowledge of the methods they call and the methods they're called by. This in-memory method call network can, for instance, be used to display by whom these constructor methods are called:
foreach (ModelMethod method in query)
{
    Console.Write(method);
    if (method.CalledByMethods.Count() > 0)
    {
        Console.WriteLine(" called by:");
        foreach (ModelMethod caller in method.CalledByMethods)
        {
            Console.WriteLine(" - " + caller);
        }
    }
    else
    {
        Console.WriteLine(", not called in this assembly.");
    }
}
The ModelMethod class represents a node in the network of method calls. It provides the properties CallsMethods and CalledByMethods to query the network of method calls. GetAllCallsMethods and GetAllCalledByMethods return not only the direct calls and callers, but also all indirect ones, recursively!
In addition, the ModelMethod class contains several properties to help identify and select methods. Some of them simply delegate to the underlying MethodBase instance.
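The recursive variants make a quick impact analysis a few lines of code. A sketch, assuming a model populated as above (the method picked here by its string representation is purely illustrative):

```csharp
// Sketch: list every direct and indirect caller of a given method.
ModelMethod target = model.Methods
    .First(m => m.ToString().Contains("StringBuilder.Append"));

foreach (ModelMethod caller in target.GetAllCalledByMethods())
    Console.WriteLine("impacted: " + caller);
```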
We've just seen how we could use the code analyzer to check code quality rules (for instance, a rule that constructors should have less than 7 arguments), or to help in maintenance tasks, such as identifying unused methods (methods not called; provided, of course, we include the application's top level assemblies in our search).
If we include our unit test assemblies in the code analyzer session and query for all unused methods that are not unit tests, we get all methods that are not used (tested?) in unit tests.
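Such a query could be sketched as follows, assuming the unit test methods were tagged beforehand (tagging is covered below; the "unittest" tag name is my own choice):

```csharp
// Sketch: public methods never reached from any unit test.
// Assumes a "unittest" tag was applied beforehand, e.g. by
// matching the [TestMethod] attribute with a ruleset.
var untested = from m in model.Methods
               where m.IsPublic
                  && !m.HasAnyOfTags("unittest")
                  && m.GetAllCalledByMethods()
                      .WhereTagsContains("unittest")
                      .Count() == 0
               select m;

foreach (var method in untested)
    Console.WriteLine("possibly untested: " + method);
```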
Another example: by querying for all methods that are not public or protected, are not constructors, do not override other members, do not implement an interface method, and - last but not least - are not called by any method, we obtain the list of methods that are neither accessed nor accessible, often referred to as 'dead code':
// Query for unaccessed and inaccessible methods:
var query = from m in model.Methods
            where !m.IsPublic
               && !m.IsProtected
               && !m.IsConstructor
               && !m.IsClassConstructor
               && !m.IsOverride
               && m.ImplementedInterfaceMethods.Count() == 0
               && m.CalledByMethods.Count() == 0
            select m;
If you try this query on mscorlib, you'll find a few hundred methods. Maybe some of them are truly dead code, but we should be aware that apparently dead code can also be explained as:
- The method is called only in DEBUG builds (within an #if DEBUG section).
- The method is declared internal and consumed from within another assembly granted access through 'InternalsVisibleTo' (for instance, to provide broader access to the unit test assembly).
That's also a reason I excluded constructor methods. mscorlib typically contains two types of constructors that appear never to be called: private default constructors written to work around the fact that static classes did not exist in earlier .NET Framework versions, and serialization constructors.
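As an illustration of the first case, consider this made-up class; when analyzing a release build's MSIL, the analyzer sees no call to DumpState, even though the method is not really dead:

```csharp
public class OrderService
{
    public void PlaceOrder(string orderId)
    {
#if DEBUG
        DumpState(orderId);   // this call is compiled away in release builds
#endif
        Console.WriteLine("Order placed: " + orderId);
    }

    // Appears uncalled in a release build's method call network:
    private void DumpState(string orderId)
    {
        Console.WriteLine("Debug state for " + orderId);
    }
}
```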
Later on, we'll see another example of detecting 'dead code' that is more directed to the code we would focus on.
So far, we've seen the basics of using the static code analyzer and had some samples on what to do with it. The building of an in-memory method call network is really the core of the code analyzer on which all further processing is based.
Code Analyzer Processors
The in-memory method call network connects all calling methods with their called methods, and vice versa. This concept is simple and clean, but unfortunately has some important limitations when used in a rich object-oriented environment such as the .NET Framework.
Let's take a look at the following diagram representing classes with methods and the method call network:
The method call network only consists of the green dots (the methods) and their connecting lines (the calls). Although class information is available, it is not used in building the method call network.
Based on the diagram, and assuming class A represents the only top-level application class, we could conclude that the methods C.mc3, D.mc1, E.me2, E.me3, and E.me4 (the dark and light-gray outlined ones) are not used from our application and hence can be deleted.
That's true for most of them. But the mc1 method on D, although never directly called, is in fact an override of the mc1 method on class C (as class D inherits from class C), and could possibly be called when mc1 is called on class C (as this is polymorphic behavior).
Similar problems arise when calling methods on an interface and having several classes implementing that interface but never being called directly. This would make it almost impossible to properly analyze applications using interfaces and Inversion of Dependency.
Fortunately, there's an easy solution to this problem: simply consider all overriding methods as being called by their base method, and all interface method implementations as being called by their interface method definition. It's simple, and really solves the issue. In the above diagram, this would mean we add an arrow from C.mc1 to D.mc1 (from the base method to its override). Suddenly, it appears that mc1 on D is not unused anymore but can be called from our application class A through the method ma1. Furthermore, E.me4 also becomes a used method, and only the dark gray outlined methods are really unused.
Connecting the base methods with their overriding methods can easily be done after processing the assemblies as in the following code:
// Process the session:
StaticCodeAnalyzer analyzer = new StaticCodeAnalyzer();
CodeModel model = analyzer.Process(session);

// Connect base methods to their overriding methods:
foreach (ModelMethod method in model.Methods)
{
    if (method.IsOverride)
    {
        ModelMethod baseMethod
            = model.Methods.ForMethodBase(method.BaseMethodDefinition);
        if (baseMethod != null)
        {
            baseMethod.CallsMethods.Add(method);
        }
    }
}
We loop over all methods, and for every method that is an override, we try to identify the ModelMethod instance matching the base method definition. Such a ModelMethod instance is only found if it is defined in an analyzed assembly. So, if a class overrides a method defined in another assembly, we would need to include that assembly too in the analyzer session.
If the base ModelMethod instance can be found, we add the overriding method to its CallsMethods collection.
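The interface case can be handled the same way. A sketch, assuming ImplementedInterfaceMethods yields the MethodBase definitions of the implemented interface methods (this is the job the DefaultImplementationProcessor, listed below, takes care of):

```csharp
// Sketch: connect interface method definitions to their implementations,
// so implementations count as "called" by their interface definition.
foreach (ModelMethod method in model.Methods)
{
    foreach (MethodBase itfDefinition in method.ImplementedInterfaceMethods)
    {
        ModelMethod itfMethod = model.Methods.ForMethodBase(itfDefinition);
        if (itfMethod != null)
        {
            itfMethod.CallsMethods.Add(method);
        }
    }
}
```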
For this and similar tasks, predefined model processors are available:
- VirtualMethodProcessor: Connects base methods to their overriding methods
- DefaultImplementationProcessor: Connects interface methods to their implementations
- DefaultMethodProcessor: Connects constructors to specified methods
- LeafMethodProcessor: Disconnects methods from their selected callers

Custom processors can easily be created by implementing the IProcessor interface.
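A minimal custom processor could look like the following sketch. I'm assuming IProcessor exposes a single Process(CodeModel) method; check the source in the download for the exact signature:

```csharp
// Hypothetical custom processor: tags all property getters and setters
// so later rules can exclude them. The IProcessor signature is assumed.
public class AccessorTaggingProcessor : IProcessor
{
    public void Process(CodeModel model)
    {
        foreach (ModelMethod method in model.Methods)
        {
            if (method.MethodBase.IsSpecialName
                && (method.MethodBase.Name.StartsWith("get_")
                    || method.MethodBase.Name.StartsWith("set_")))
            {
                method.Tags.Add("accessor");
            }
        }
    }
}
```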
The predefined processors only process methods carrying special markers (tags), so we can't use them just yet. We first have to see what tags and the rule based processor can do for us.
Tagging for Eased Analysis
Every ModelMethod has a Tags collection property (and, for convenience, a HasAnyOfTags() method). A tag can be any kind of object, but think of it as a string: a label we can use to mark methods.
If I wanted to recognize exception constructors, I could put a tag on all those methods, as in:
// Identify exception constructor methods:
var xcons = from m in model.Methods
            where m.IsConstructor
               && m.DeclaringType.IsA<Exception>()
            select m;

// Tag those methods with "exceptionconstructor":
foreach (ModelMethod method in xcons)
    method.Tags.Add("exceptionconstructor");
(Note that the IsA<T>() method is an extension method of mine. You could use typeof(Exception).IsAssignableFrom(m.DeclaringType) instead.)
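For completeness, such an IsA<T>() helper could be written as the following extension method (my guess at the implementation; the download contains the original):

```csharp
using System;

public static class TypeExtensions
{
    // Returns true if 'type' is T, derives from T, or implements T.
    public static bool IsA<T>(this Type type)
    {
        return typeof(T).IsAssignableFrom(type);
    }
}
```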
On the other hand, I might want to identify all public methods of mscorlib within the System.Text namespace. I can mark those methods with a "text" tag as follows:
// Identify public methods in the System.Text namespace:
var txtmem = from m in model.Methods
             where m.DeclaringType.Namespace == "System.Text"
                && m.DeclaringType.IsPublic
                && m.IsPublic
             select m;

// Tag those methods with "text":
foreach (ModelMethod method in txtmem)
    method.Tags.Add("text");
I can now easily filter all methods common to both sets based on their tags, as follows:
// Show exception constructors within the System.Text namespace:
foreach (ModelMethod method in model.Methods
    .WhereTagsContains("exceptionconstructor")
    .WhereTagsContains("text"))
{
    Console.WriteLine(method);
}
More interestingly, I can now use the method call network to show which exceptions are potentially thrown by which method. Listing the exception constructors a method calls directly is not sufficient: a method can call another method that throws an exception. What we need to do is search recursively for all calls to an exception constructor. Luckily, GetAllCallsMethods() does just that for us.
But it also introduces a small issue. The GetAllCallsMethods() method goes recursively as deep as it can, and since a constructor always calls a base constructor, every method calling an exception constructor will appear to call the exception's base type constructor as well. A simple way to solve this issue is to process the method call network and state that we don't want to see which calls are made from an exception constructor:
// Disconnect exception constructors from their base constructor:
foreach (ModelMethod method in model.Methods
    .WhereTagsContains("exceptionconstructor"))
    method.CallsMethods.Clear();
(This could also be done using the LeafMethodProcessor.)
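For the curious, the recursive expansion GetAllCallsMethods() performs can be sketched generically as an iterative graph traversal with a visited set, which also protects against cycles in the call network (this is my assumed implementation, not necessarily the analyzer's actual one):

```csharp
using System;
using System.Collections.Generic;

public static class CallGraphWalker
{
    // Yields every node reachable from 'root' through 'calls', visiting
    // each node at most once; 'root' itself is yielded only when it is
    // reachable through a cycle.
    public static IEnumerable<T> TransitiveCalls<T>(
        T root, Func<T, IEnumerable<T>> calls)
    {
        var visited = new HashSet<T>();
        var stack = new Stack<T>(calls(root));
        while (stack.Count > 0)
        {
            T current = stack.Pop();
            if (!visited.Add(current)) continue;  // cycle/duplicate guard
            yield return current;
            foreach (T callee in calls(current))
                stack.Push(callee);
        }
    }
}
```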
Now we could use the following code to show the public methods in the System.Text namespace together with the list of exceptions they potentially throw:
// Show methods in the System.Text namespace which potentially
// throw exceptions:
foreach (ModelMethod method in model.Methods
    .WhereTagsContains("text")
    .OrderBy(m => m.ToString()))
{
    Console.WriteLine(method);
    foreach (Type exception in method.GetAllCallsMethods()
        .WhereTagsContains("exceptionconstructor")
        .Select(m => m.DeclaringType)
        .Distinct()
        .OrderBy(t => t.ToString()))
    {
        Console.WriteLine(" - " + exception);
    }
}
We run over all methods having a "text" tag; then, for each of these methods, we look for called methods (using GetAllCallsMethods(), so searching recursively) that have the "exceptionconstructor" tag. Of those, we take the declaring types, make them distinct, and sort them.
The whole idea of tagging methods is to easily define subsets of methods and to ease the way we write queries. Tags also leverage a declarative way of defining subsets of methods that is described in the next paragraph about the rule based processor.
You might have noticed that we do not look for methods that 'throw' exceptions. Instead, we look for methods that call exception constructors. As the code model only knows about method calls, this is the best assumption we can make of exceptions being thrown. On the other hand, the code model cannot detect when exceptions are caught either.
The Rule Based Processor
A processor is nothing more than a piece of code that loops over all or a subset of the methods of the code model, and applies some processing on them.
The rule based processor is a processor that sets tags on selected methods, where method selection is done using rule objects.
To use the rule based processor, we need an XML definition of the rulebase. We will see later how an entire code analysis session can be defined in an XML document. For now, we will focus on the rule based processor only. Its XML definition looks like:
<rulesprocessor handler="Arebis.CodeAnalysis.Static.Processors.RulesProcessor">
<definitions>
<assembly path="bin\Arebis.CodeAnalysis.Static.dll" />
</definitions>
<rulesets>
<ruleset name="first">...</ruleset>
<ruleset name="second">...</ruleset>
<ruleset name="third">...</ruleset>
...
</rulesets>
</rulesprocessor>
As we will see later, all processor elements take a handler attribute with the name of the class that implements the processor (the name of the XML element itself is not important; it's only a convention to use "rulesprocessor" for the rule based processor).
The content of the processor element depends on the processor (the rule based processor is so far the only one whose element actually has content). The rulesprocessor element content is made of two parts: definitions and rulesets.
The definitions section merely lists the assembly files in which rule classes are found. Unless you define your own rule implementations, you can just copy-paste the reference to Arebis.CodeAnalysis.Static.dll.
The rulesets section defines all rulesets. Every ruleset has a name, and that name is the tag that will be applied to a method if it matches the rule. Basically, the rule based processor loops over all methods of the code model and tags each method with the names of the rulesets that apply to it.
For instance, take the following ruleset:
<ruleset name="exception">
<basetyperule type="System.Exception"/>
<tagrule name="constructor"/>
</ruleset>
This ruleset says that all methods which are declared on a type that is a kind of Exception, and that have the tag "constructor", get the tag "exception".
Or take the following ruleset:
<ruleset name="componentoperation">
<modifierrule modifiers="FamANDAssem" target="Method"/>
<modifierrule modifiers="Static" target="Method"/>
<namerule target="Type" like="*ComponentManager"/>
<tagrule name="constructor" reverse="true"/>
</ruleset>
This ruleset says that all methods which have the modifier FamANDAssem (as defined on System.Reflection.MethodAttributes; FamANDAssem means "internal" in C#), and are static, and whose type has a name matching "*ComponentManager" (the like argument uses simple * and ? wildcards), and that do not have the "constructor" tag (the 'has the constructor tag' rule is reversed), get the "componentoperation" tag.
Rulesets can also be chained. For instance:
<ruleset name="leafmethod">
<tagrule name="exception"/>
</ruleset>
By the first ruleset, we saw constructors of exception classes being tagged with the "exception" tag. This ruleset says that methods tagged with the "exception" tag must also be tagged with the "leafmethod" tag.
(The "leafmethod" tag is used by the LeafMethodProcessor to clear the calling method list of those methods.)
This last ruleset example shows how (WCF) attributes can be used to identify service operations. The following rule tags all methods with "serviceoperationdefinition" when their type is an interface decorated with the WCF [ServiceContract] attribute. Similarly, we can identify unit tests by their [TestMethod] attribute, or even define our own attributes to identify the layers of our architecture.
<ruleset name="serviceoperationdefinition">
<attributerule type="System.ServiceModel.ServiceContractAttribute" target="Type" />
<modifierrule modifiers="Interface" target="Type" />
</ruleset>
An overview of the rule types that can be used is given in Appendix A. It's also important to know that you can develop your own rules by inheriting from BaseMatchingRule, decorating your rule class with a CodeModelMatchingRuleAttribute, and listing the assembly containing the rule class in the definitions section of the rulesprocessor element in the XML.
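As an illustration, a custom rule might look like the sketch below. The attribute's constructor arguments and the exact member to override are assumptions on my part; use the rules shipped in Arebis.CodeAnalysis.Static.dll as a reference:

```csharp
// Hypothetical rule matching methods by parameter count; the
// CodeModelMatchingRule attribute arguments and the Matches()
// override are assumed, not taken from the actual source.
[CodeModelMatchingRule("paramcountrule")]
public class ParameterCountRule : BaseMatchingRule
{
    // Minimum number of parameters a method must have to match:
    public int Min { get; set; }

    public override bool Matches(ModelMethod method)
    {
        return method.MethodBase.GetParameters().Length >= this.Min;
    }
}
```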
The rule based processor provides a powerful way of declaratively describing rules for code analysis. One of the things we can, and will, do with it in the next chapter is tag methods according to their layer in our software architecture. Once the methods are marked by their layer, it becomes easy to verify their compliance with the coding standards and to document the application in terms of those layers.
Example - an Application Documentation Site
The download of this article contains the code of the static code analyzer together with a "ToDosSample" application. The ToDo sample application has the following architecture:
A WPF client application calls a WCF service layer that delegates execution to a business layer using Entity Framework as the data access component. The WCF service contract is defined in a contract assembly that is shared between the client and the service tier.
Based on this classic architecture, we would like to generate a website to support our functional analysts and our maintenance team. The website would list the operations defined on the client tier, those defined on the service tier, and those defined on the business component tier, and would show the relations between the operations on the different layers. In addition, it would tell us which exceptions can be thrown from the business operations, and provide a list of the unused business component operations.
The result would look like the following, and is also included, with the generation scripts, in the download to this article:
A live version of this site is available here.
To create this website, we will combine the static code analyzer, driven by an XML profile, with a 'code' generator that will in fact generate not code but the HTML of our website.
The generator I used is the one I published earlier on Code Project here.
But don't worry, I included everything in the download of this article, including the binaries of the code generator. Of course, you could use any other open .NET template based code generator, including Microsoft's T4 text templating engine, provided it supports multi-template generation projects.
The solution, which contains the ToDoSample
application (last 4 projects of the solution), also contains a "Generation" solution folder with all files used to generate the website.
The GenerationProfile.xml file contains a complete generation profile, and can be read by the StaticCodeAnalyzerSession class as follows:
var session = new StaticCodeAnalyzerSession("GenerationProfile.xml");
This profile includes the following information:
- The assemblies to include in the analysis process (all assemblies of the ToDoSample application as well as the unit tests on the application).
- Language settings (used to translate "System.Int32" into "int" and strip away known namespaces when showing method signatures).
- Processors (the list of processors to run; the order in which they run is important), including the rulesets of the rule based processor.
Tip: To edit the GenerationProfile.xml file with IntelliSense in Visual Studio, open the GenerationProfile.xml file, then in the Properties window, select the Schemas property and set "Use this schema" for "StaticCodeAnalyserProfile.xsd".
The GenerationSettings.xml file is used merely by the code generator, and tells it which template to start with ("GeneratorMain.cst"), where to find assemblies, and where to write the output. It also contains settings specific to the templates.
The first template ("GeneratorMain.cst") reads the static code analyzer session using the profile, runs the analyzer, and calls the generation templates. The Content.cst template builds the index on the left of the site, and also calls the other templates (UIOperation.cst, SOperation.cst, and COperation.cst).
The Generate.cmd file is a batch command file that will regenerate the whole website.
The generated website itself shows multiple things we can do with the static code analyzer:
- Detect which (application) exceptions are thrown by a method and its called methods, recursively.
- Detect which operations are unused (i.e., unused component operations).
- Detect which security roles are required to access an operation.
- Detect whether unit tests exist in which given operations are used/tested.
- Detect which methods are not yet implemented (see the "FindUserByName" component operation), and show method call trees that reveal which method lacks implementation.
Conclusion
Static code analysis provides for a multitude of applications despite its simple concept of solely building an in-memory method call network. Code analysis allows you and your team to write better code by checking for and enforcing coding conventions.
You can, for instance, use the code analyzer in unit tests that check coding conventions such as: service operations that update data in a database must run in a transaction. An example of this is included in the sample; see the "TransactionCodeTest" unit test. You could even write a set of custom code analysis rules to plug into the Visual Studio Code Analysis settings, or implement custom check-in policies for your source control environment.
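Such a convention-checking test could be sketched like this, reusing the earlier 'fewer than 7 constructor arguments' rule (MyApp.Program stands in for one of your own application types):

```csharp
// Hypothetical MSTest checking a coding convention with the analyzer.
[TestClass]
public class CodingConventionTests
{
    [TestMethod]
    public void ConstructorsHaveFewerThanSevenParameters()
    {
        var session = new StaticCodeAnalyzerSession();
        session.Assemblies.Add(typeof(MyApp.Program).Assembly); // your assembly

        CodeModel model = new StaticCodeAnalyzer().Process(session);

        var offenders = model.Methods
            .Where(m => m.IsConstructor
                     && m.MethodBase.GetParameters().Length >= 7)
            .ToList();

        // The test fails (and names the offenders) when the convention
        // is violated anywhere in the assembly:
        Assert.AreEqual(0, offenders.Count,
            "Constructors with 7+ parameters found: "
            + string.Join(", ", offenders.Select(m => m.ToString()).ToArray()));
    }
}
```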
Or you can use code analysis to build websites, or generate documentation to communicate on a higher level with functional analysts, software testers, developers, and maintainers.
Applying code analysis will not revolutionize your way of programming, but it will give you more control and allow you to respond quicker to impact analysis demands.
Appendix A - Overview of Rule Types
attributerule
Matches methods by the attributes they, their type, or their assembly have.
- target: Target having the attribute
- type: Type name of the attribute

namerule
Matches methods by their name, their type name, or their assembly's name.
- target: Target of the name
- like: Wildcard expression the name should match
- match: Regular expression the name should match

modifierrule
Matches methods by their modifiers or by their type modifiers.
- target: Target of the modifier
- modifiers: Comma-separated list of modifiers

tagrule
Matches methods by their tags.
- name: Name of the expected tag
- reverse: True to reverse the match

interfacerule
Matches methods by the fact that they implement a method of the given interface.
- type: Name of the interface

basetyperule
Matches methods on subtypes of the given type.
- type: Name of the base type
- includeinterfaces: Whether the type can be an interface
History
- 7th March, 2010: Initial version