# MSIL Decompiler Theory

21 Nov 2013 · CPOL
An attempt to develop a theory of how to decompile MSIL

## Introduction

Welcome to my journey of writing a .NET assembly decompiler. First of all, I'll try to develop a theory for decompiling MSIL. The idea is simple: I do whatever each MSIL instruction asks me to do, but I keep in mind that I am decompiling, not executing. So when an instruction asks me to push the value of a variable, I push the name of that variable on the stack instead.

## Simple Case Example

To follow the code, you need to know, or have a reference for, what each MSIL instruction actually does. Here is a sample program to test whether our concept works:

C#
```csharp
namespace DisasmIL
{
    class Math
    {
        public int add(int x, int y)
        {
            return x + y;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            Math m;
            int a, b;
            m = new Math();
            a = 20;
            b = 50;
            int p = m.add(a, b);
        }
    }
}
```

We only examine one method, `Main`. When `Main` is compiled, it takes the following form:

MSIL
```
.method private hidebysig static void Main(string[] args) cil managed
{
  .entrypoint
  // Code size 23 (0x17)
  .maxstack 3
  .locals init (
    [0] class DisasmIL.Math m,
    [1] int32 a,
    [2] int32 b,
    [3] int32 p
  )
  IL_0000: nop
  IL_0001: newobj instance void DisasmIL.Math::.ctor()
  IL_0006: stloc.0
  IL_0007: ldc.i4.s 20
  IL_0009: stloc.1
  IL_000a: ldc.i4.s 50
  IL_000c: stloc.2
  IL_000d: ldloc.0
  IL_000e: ldloc.1
  IL_000f: ldloc.2
  IL_0010: callvirt instance int32 DisasmIL.Math::'add'(int32,int32)
  IL_0015: stloc.3
  IL_0016: ret
} // end of method Program::Main
```

We parse line by line:

MSIL
`.method private hidebysig static void Main(string[] args) cil managed`

This is a method declaration, followed by the opening curly brace. Output code:

C#
```csharp
static void Main(string[] args)
{
```

Stack: [empty]

MSIL
```
.entrypoint
// Code size 23 (0x17)
.maxstack 3
.locals init (
  [0] class DisasmIL.Math m,
  [1] int32 a,
  [2] int32 b,
  [3] int32 p
)
```

These lines are self-explanatory: they declare the local variables. Output code:

C#
```csharp
DisasmIL.Math m;
int a;
int b;
int p;
```

Stack: [empty]

MSIL
`IL_0000: nop`

Does nothing (nop).
Output code: [none]
Stack: [empty]

MSIL
`IL_0001: newobj instance void DisasmIL.Math::.ctor()`

This creates a new instance of `DisasmIL.Math` using the default constructor, so we push `new DisasmIL.Math()` on our stack.
Output code: [none]
Stack: `new DisasmIL.Math()`

MSIL
`IL_0006: stloc.0`

We pop the top of the stack and assign it to local variable `0`, which is `m`. Output code:

C#
`m = new DisasmIL.Math();`

Stack: [empty]

MSIL
`IL_0007: ldc.i4.s 20`

We push the constant `20` on the stack.
Output code: [none]
Stack: `20`

MSIL
`IL_0009: stloc.1`

We pop the top value and assign it to local variable `1`, which is `a`. Output code:

C#
`a = 20;`

Stack: [empty]

MSIL
`IL_000a: ldc.i4.s 50`

We push the constant `50` on the stack.
Output code: [none]
Stack: `50`

MSIL
`IL_000c: stloc.2`

We pop the top value and assign it to local variable `2`, which is `b`.
Output code:

C#
`b = 50;`

Stack: [empty]

MSIL
```
IL_000d: ldloc.0
IL_000e: ldloc.1
IL_000f: ldloc.2
```

We push local variables `0`, `1` and `2` on the stack.
Output code: [none]
Stack: `m`, `a`, `b`

MSIL
```
IL_0010: callvirt instance int32 DisasmIL.Math::'add'(int32,int32)
IL_0015: stloc.3
```

We call the `add` method on the instance at top-`2`, passing the values at top-`1` and the top of the stack as its arguments. All three are popped. If a method returns a value, the value is left on the stack, so we check the next instruction. If it is a `stloc`, we assign the return value to that local variable, here local variable `3`.

Output code:

C#
`p=m.add(a,b);`

Stack: [empty]

MSIL
`IL_0016: ret`

The method returns void, so we emit no code except the closing curly brace. Output code:

C#
`}`

Stack: [empty]

Now if you put the output fragments together, you get code equivalent to the original C# code. This works for simple cases. We need to test whether it works in more complex situations. Let's do some more interesting things now.
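The walkthrough above can be sketched as a tiny symbolic evaluator. This is only a minimal sketch under my own assumptions, not a real decompiler: the class `SymbolicStackSketch`, the `string[]` instruction encoding and the `"add", "2"` operand form for `callvirt` are all hypothetical. It also pushes the call expression and lets the following `stloc` pop it, which gives the same output as looking ahead at the next instruction.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: handles only the opcodes used in the Main example.
// Each instruction is { opcode, operand... } as plain strings (an assumption).
static class SymbolicStackSketch
{
    public static List<string> Decompile(IList<string[]> il, string[] locals)
    {
        var stack = new Stack<string>();  // holds C# expressions, not values
        var output = new List<string>();  // emitted C# statements

        foreach (string[] ins in il)
        {
            string op = ins[0];
            string arg = ins.Length > 1 ? ins[1] : null;

            if (op == "nop" || op == "ret")
            {
                // nothing to emit (ret here is a void return)
            }
            else if (op == "newobj")
            {
                stack.Push("new " + arg + "()");
            }
            else if (op == "ldc.i4.s")
            {
                stack.Push(arg);
            }
            else if (op.StartsWith("ldloc."))
            {
                // short forms ldloc.0 .. ldloc.3 only
                stack.Push(locals[int.Parse(op.Substring(6))]);
            }
            else if (op.StartsWith("stloc."))
            {
                string name = locals[int.Parse(op.Substring(6))];
                output.Add(name + " = " + stack.Pop() + ";");
            }
            else if (op == "callvirt")
            {
                // ins = { "callvirt", methodName, argumentCount }
                int argc = int.Parse(ins[2]);
                var args = new string[argc];
                for (int i = argc - 1; i >= 0; i--) args[i] = stack.Pop();
                string instance = stack.Pop();
                stack.Push(instance + "." + arg + "(" + string.Join(", ", args) + ")");
            }
            else
            {
                throw new NotSupportedException(op);
            }
        }
        return output;
    }
}
```

Feeding it the thirteen instructions of `Main` with the locals `m`, `a`, `b`, `p` yields the four assignment statements derived by hand above. A call whose result is never stored would be silently lost in this sketch, which is one reason a real implementation needs more care.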

## The if Structure

Let's look at the most basic and useful structure, `if`-`then`. The compiler generates a conditional branch for it. We create a block of instructions for the structure, delimited by the branch instruction (like `brtrue`) and the branch target (like `IL_0019`, where the code jumps to). The condition is on the stack. If we find a branch-on-true, we negate the condition and use it as the `if` statement condition, because the branch skips the block when the condition is true. The block is initially a block of MSIL that we will convert to C# code later (we may need recursion here, since blocks can nest). Please note that the labels are not stored in MSIL. A label is just the byte offset of the instruction within the method.

We take a simple method to test our theory with the `if` structure.

C#
```csharp
public int IfStructure(int a, int b)
{
    if (a < b)
    {
        System.Console.Write("Condition is true");
    }
    return b;
}
```

Here is the MSIL code generated by the Visual Studio 2005 compiler.

MSIL
```
.method public hidebysig instance int32 IfStructure(int32 a, int32 b) cil managed
{
  // Code size 31 (0x1f)
  .maxstack 2
  .locals init ([0] int32 CS$1$0000,
                [1] bool CS$4$0001)
  IL_0000: nop
  IL_0001: ldarg.1
  IL_0002: ldarg.2
  IL_0003: clt
  IL_0005: ldc.i4.0
  IL_0006: ceq
  IL_0008: stloc.1
  IL_0009: ldloc.1
  IL_000a: brtrue.s IL_0019
  IL_000c: nop
  IL_000d: ldstr "Condition is true"
  IL_0012: call void [mscorlib]System.Console::Write(string)
  IL_0017: nop
  IL_0018: nop
  IL_0019: ldarg.2
  IL_001a: stloc.0
  IL_001b: br.s IL_001d
  IL_001d: ldloc.0
  IL_001e: ret
} // end of method ControlStructures::IfStructure
```

We start parsing now:

MSIL
```
.method public hidebysig instance int32 IfStructure(int32 a, int32 b) cil managed
{
  // Code size 31 (0x1f)
  .maxstack 2
  .locals init ([0] int32 CS$1$0000,
                [1] bool CS$4$0001)
```

We can generate output for these lines without evaluating any MSIL instructions. There is a method definition and there are local variables. The compiler-generated local names are not valid C# identifiers, so we rename them in a way that avoids conflicts. For simplicity, here we just replace '$' with '_'. One more thing: to evaluate the MSIL instructions, we must keep a map from local variable number to local variable. In this example, `CS$1$0000` is local variable `0` of type `int`. For clarity, we do not show the map here. A simple map (an STL map, for example) should work.

Output:

C#
```csharp
public int IfStructure(int a, int b)
{
    int CS_1_0000;
    bool CS_4_0001;
```

Stack: [Empty]
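The renaming and the local variable map used in this step could look roughly like the following. It is a sketch under my own assumptions: `LocalsSketch`, `BuildNameMap` and `Declare` are hypothetical names, the type table is deliberately partial, and no check is made that a renamed local collides with an existing identifier.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any library.
static class LocalsSketch
{
    // IL built-in type keywords and their C# equivalents (a partial list).
    static readonly Dictionary<string, string> CSharpTypes = new Dictionary<string, string>
    {
        { "int8", "sbyte" }, { "int16", "short" }, { "int32", "int" }, { "int64", "long" },
        { "float32", "float" }, { "float64", "double" }, { "bool", "bool" },
        { "char", "char" }, { "string", "string" }, { "object", "object" }
    };

    // The index in the returned list is the local variable number
    // that ldloc/stloc refer to.
    public static List<string> BuildNameMap(IList<string> ilNames)
    {
        var map = new List<string>();
        foreach (string name in ilNames)
        {
            map.Add(name.Replace('$', '_'));  // the simple renaming rule from the text
        }
        return map;
    }

    // Produces a C# declaration; unknown types (class names) pass through unchanged.
    public static string Declare(string ilType, string csName)
    {
        string csType;
        if (!CSharpTypes.TryGetValue(ilType, out csType)) csType = ilType;
        return csType + " " + csName + ";";
    }
}
```

A plain list is enough here because local variable numbers are dense and start at zero; a dictionary would work just as well.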

MSIL
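```
IL_0000: nop
IL_0001: ldarg.1
IL_0002: ldarg.2
```

We push method arguments `1` and `2` on the stack. Because this is an instance method, argument `0` is `this`, so arguments `1` and `2` are `a` and `b`: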

Output: [None]

Stack: `a`,`b`

MSIL
`IL_0003: clt`

This instruction checks whether stack top-`1` is less than the stack top. The two elements are popped from the stack and the result is pushed on the stack. No output, of course.

Output: [None]

Stack: `a<b`

MSIL
`IL_0005: ldc.i4.0`

Load (that is, push) the integer constant `0` on the stack.

Output: [None]
Stack: `a<b`, `0`

MSIL
`IL_0006: ceq`

Check whether stack top-`1` equals the stack top. Both are popped and the result is pushed on the stack.
Output: [None]
Stack: `a < b == 0`
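The handling of `clt` and `ceq` in the last steps can be sketched as one small routine. This is only an illustration: `CompareSketch` and `Apply` are hypothetical names, and the routine always wraps the result in parentheses so that nested comparisons stay unambiguous.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: symbolic evaluation of the comparison opcodes.
static class CompareSketch
{
    public static void Apply(string opcode, Stack<string> stack)
    {
        string symbol;
        switch (opcode)
        {
            case "clt": symbol = "<"; break;
            case "cgt": symbol = ">"; break;
            case "ceq": symbol = "=="; break;
            default: throw new NotSupportedException(opcode);
        }

        string right = stack.Pop();  // stack top
        string left = stack.Pop();   // stack top-1
        stack.Push("(" + left + " " + symbol + " " + right + ")");
    }
}
```

Starting from a stack holding `a` and `b`, applying `clt`, pushing `0` and applying `ceq` leaves the single expression `((a < b) == 0)` on the stack.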

MSIL
```
IL_0008: stloc.1
IL_0009: ldloc.1
```

Store the stack top in local variable `1` and load it on the stack again. As we decided previously, when a value is stored in a local variable we emit an assignment to that variable. For clarity, I added parentheses:

Output:

C#
`CS_4_0001 = (a < b == 0);`

Stack: `CS_4_0001`

MSIL
`IL_000a: brtrue.s IL_0019`

We have a conditional branch. We create a block from the instruction after the branch up to, but not including, `IL_0019`, and put it in curly braces. The condition is on the stack. Since this is a branch-on-true, we negate the condition and use it in an `if` statement, as I said at the beginning.

Output:

C#
```
if(!CS_4_0001)
{
    IL_000c: nop
    IL_000d: ldstr "Condition is true"
    IL_0012: call void [mscorlib]System.Console::Write(string)
    IL_0017: nop
    IL_0018: nop
}
```

Stack: [Empty]
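The block creation in this step can be sketched as follows. Everything here is hypothetical naming on my part (`IfBlockSketch`, `Instr`, `EmitIf`), and the body is left as raw MSIL text, exactly as in the output above, to be decompiled in a later (possibly recursive) step.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: wraps the instructions skipped by a forward
// conditional branch in an if block.
static class IfBlockSketch
{
    public sealed class Instr
    {
        public readonly int Offset;
        public readonly string Text;
        public Instr(int offset, string text) { Offset = offset; Text = text; }
    }

    // opcode:    the branch instruction, e.g. "brtrue.s" or "brfalse.s"
    // condition: the expression popped from the symbolic stack
    // target:    byte offset the branch jumps to
    // rest:      the instructions that follow the branch
    public static List<string> EmitIf(string opcode, string condition, int target, IList<Instr> rest)
    {
        // brtrue skips the block when the condition holds,
        // so the block runs when the condition does NOT hold.
        string test = opcode.StartsWith("brtrue") ? "!(" + condition + ")" : condition;

        var lines = new List<string> { "if (" + test + ")", "{" };
        foreach (Instr ins in rest)
        {
            if (ins.Offset >= target) break;   // the block ends at the branch target
            lines.Add("    " + ins.Text);      // still raw MSIL at this stage
        }
        lines.Add("}");
        return lines;
    }
}
```

The extra parentheses around the negated condition are a design choice: they keep the output valid even when the condition is a compound expression rather than a single variable.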

MSIL
```
IL_0019: ldarg.2
IL_001a: stloc.0
IL_001b: br.s IL_001d
IL_001d: ldloc.0
IL_001e: ret
```

We do not parse these here. They are very simple to understand, and we can parse them using the method we have seen in the simple case example (the `br.s` only jumps to the very next instruction, so it can be dropped).

OK, we can now work on slightly more complex code. The same approach also handles code generated from a `for` structure, but in a funny way. If we add a little more intelligence and emit `goto` for branches that we cannot handle with `if`, we get the following result.

Code like this...

C#
```csharp
for (int i = 0; i < 10; i++)
{
    ...
}
...
```

... will be converted to:

C#
```csharp
int i;
i = 0;
label_1:
if (i < 10)
{
    ...
    i++;
    goto label_1;
}
...
```

That is fine for now. We'll look at loops later. You may also notice that our theory generates a funny code block like:

C#
```csharp
CS_4_0001 = ((a < b) == 0);

if (!CS_4_0001)
{
    ...
}
```

This is where optimization comes into the picture. But we skip it for now.

## Loops

As a decompiler writer, I wish there were no loops and programmers simply used thousands of `goto` statements with `if` statements. But since that is not the case, I must understand how to parse the MSIL instructions generated from loops.

Of the three most common types of loops (`for`, `while`, `do`-`while`), the `while` loop is the basic one. The block generated from any of these loops usually has a conditional jump (usually `brtrue.s <label>`) as its last instruction. The difference from the `if` structure is that the instruction jumps to an offset less than the current instruction offset. The `for` and `while` loops also have an unconditional branch (`br.s <label>`) to an offset that is between the start and end of the block. The jump target is usually the beginning of the condition-checking instructions. The `do`-`while` loop lacks this branch because it does not test the condition until the end of the block.

So, we get an instruction block like the following MSIL block:

MSIL
```
IL_0010: br.s IL_005a   // a do-while loop does not have this line
IL_0012: nop

[any type and number of instructions]

IL_0059: nop

[condition check instructions - leave a boolean value on the stack]

IL_0060: brtrue.s IL_0012
```

While looking at the `if` structure, we saw how to build its boolean condition. Things are similar for loops: follow the instructions and take the top stack element when the conditional jump is found. Unlike the `if` case, a backward `brtrue.s` jumps back into the body when the condition is true, so the stack top is used as the loop condition as is. It would be negated (by adding a `!`) only for a backward `brfalse.s`. Please note that the conditional jump targets the starting instruction of the block or the one just after it.

Here, we find that we cannot have a single-pass decompiler. We must identify the code blocks in an earlier pass, before the final pass. So far, we can identify the blocks of `if`, `for`, `while` and `do`-`while` structures by using conditional jump instructions and their destinations. An `if` has a destination after the current offset, and the others have destinations before the current offset. `for` and `while` cannot be distinguished from each other very clearly, but `do`-`while` does not have a jump at the beginning. And of course, there can be nested blocks that are generated from nested loops.

There are some more complex variations of loops, like infinite loops and `foreach` loops. They are not much different. But for now, let me leave that unfinished and wait for your responses so I can fix errors before continuing. Comment on it, vote for it and show me my mistakes.
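The identification pass described above might start from something like this sketch. The names `BranchClassifierSketch`, `Branch` and `Classify` are hypothetical, and the rule for spotting the entry jump is my own simplification: an unconditional branch located before the loop body that lands between the body start and the conditional jump.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: classifies a conditional branch by jump direction.
static class BranchClassifierSketch
{
    public enum Kind { If, WhileOrFor, DoWhile }

    public sealed class Branch
    {
        public readonly int Offset;        // byte offset of the branch instruction
        public readonly int Target;        // byte offset it jumps to
        public readonly bool Conditional;  // brtrue/brfalse versus br
        public Branch(int offset, int target, bool conditional)
        {
            Offset = offset; Target = target; Conditional = conditional;
        }
    }

    public static Kind Classify(Branch cond, IEnumerable<Branch> allBranches)
    {
        // Forward jump: the skipped instructions form an if block.
        if (cond.Target > cond.Offset) return Kind.If;

        // Backward jump: a loop. Look for the unconditional entry jump
        // into the condition check that for/while loops have.
        foreach (Branch b in allBranches)
        {
            bool beforeBody = b.Offset < cond.Target;
            bool landsInLoop = b.Target > cond.Target && b.Target <= cond.Offset;
            if (!b.Conditional && beforeBody && landsInLoop) return Kind.WhileOrFor;
        }
        return Kind.DoWhile;
    }
}
```

With the offsets from the loop listing above (`br.s` at `0x10` to `0x5a`, `brtrue.s` at `0x60` to `0x12`) this reports a `while` or `for` loop; without the `br.s` it reports `do`-`while`. Nested loops and `break`/`continue` would need more careful rules than this.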

## Tools

I used Visual Studio 2005 and the MSIL Disassembler (`ildasm.exe`) that comes with the .NET platform installer.

## Load IL Assembly Bytes

Here is how you start writing your decompiler (start by adding error checks :) ):

C#
```csharp
using System;
using System.Reflection;

// Load the assembly to decompile
Assembly YourAssembly = Assembly.LoadFrom("YourAssembly.dll");

// Get all types
Type[] types = YourAssembly.GetTypes();

// Get all methods from first type
MethodInfo[] methods = types[0].GetMethods();

// Get method IL for first method
// (GetMethodBody returns null for methods without IL, such as abstract ones)
MethodBody mbody = methods[0].GetMethodBody();
byte[] methodIL = mbody.GetILAsByteArray();
```
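The byte array returned by `GetILAsByteArray` still has to be split into instructions before any of the theory above applies. Here is a minimal sketch of that step. It builds an opcode table from the public fields of `System.Reflection.Emit.OpCodes` and skips over operands without decoding them. `IlReaderSketch` and `Disassemble` are hypothetical names, and the `switch` operand length is read with `BitConverter`, which assumes a little-endian machine.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical helper: lists the opcode found at each byte offset.
static class IlReaderSketch
{
    static readonly Dictionary<short, OpCode> ByValue = BuildTable();

    // Collect every OpCode exposed as a static field of OpCodes.
    static Dictionary<short, OpCode> BuildTable()
    {
        var table = new Dictionary<short, OpCode>();
        foreach (FieldInfo f in typeof(OpCodes).GetFields(BindingFlags.Public | BindingFlags.Static))
        {
            if (f.FieldType != typeof(OpCode)) continue;
            OpCode op = (OpCode)f.GetValue(null);
            table[op.Value] = op;
        }
        return table;
    }

    // Number of operand bytes that follow the opcode.
    static int OperandSize(OpCode op, byte[] il, int operandStart)
    {
        switch (op.OperandType)
        {
            case OperandType.InlineNone: return 0;
            case OperandType.ShortInlineBrTarget:
            case OperandType.ShortInlineI:
            case OperandType.ShortInlineVar: return 1;
            case OperandType.InlineVar: return 2;
            case OperandType.InlineI8:
            case OperandType.InlineR: return 8;
            case OperandType.InlineSwitch:
                // a 4-byte count followed by that many 4-byte targets
                return 4 + 4 * BitConverter.ToInt32(il, operandStart);
            default: return 4;  // tokens, int32, float32, long branch targets
        }
    }

    public static List<string> Disassemble(byte[] il)
    {
        var lines = new List<string>();
        int pos = 0;
        while (pos < il.Length)
        {
            int start = pos;
            short value = il[pos++];
            if (value == 0xFE)  // first byte of a two-byte opcode such as clt or ceq
                value = unchecked((short)(0xFE00 | il[pos++]));
            OpCode op = ByValue[value];
            pos += OperandSize(op, il, pos);
            lines.Add("IL_" + start.ToString("x4") + ": " + op.Name);
        }
        return lines;
    }
}
```

The offsets it prints correspond to the `IL_xxxx` labels in the listings above, which is what the branch targets refer to. Decoding the operands themselves (metadata tokens, branch targets, constants) is left out of this sketch.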

## History

• 24th September, 2007: Initial post

## License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By
Software Developer Microsoft
United States
Completed a BSc in Computer Science & Engineering at Shah Jalal University of Science & Technology, Sylhet, Bangladesh (SUST).

Favorite things: story books (especially the Masud Rana series), tourism, songs and programming.

Blog:
Maruf Notes
http://blog.kuashaonline.com
