
Introduction
Through its System.Reflection namespace, .NET lets you inspect an assembly. You can get all of the types defined inside it, their fields, their properties and basically everything you need. Still, something is missing: the body of a method. When doing a thorough inspection, you would expect to find the variables used, as well as the loops and the decisions made inside a method body. Microsoft neglected this need, but they did provide us with something: the IL code. This is not enough, however, as it is just an array of bytes that means nothing to the untrained eye of a normal programmer.
What is needed is a series of objects that represent the actual instructions in that IL code. That is what I want to provide.
Background
Any programmer who has worked with reflection has heard of the awesome Reflector written by Lutz Roeder. Reflector can decompile any .NET assembly and provide the user with the equivalent code for each programming element within the given assembly.
You may have noticed that I said "equivalent." This is mainly because the reflection mechanism cannot provide you with the original code. The compilation process first removes any comments and unused variables. Only the valid and necessary code ends up in the compiled assembly. Thus, we cannot obtain the exact original code.
Reflector is a wonderful tool, but we might want to obtain similar results with our own code. How can we do that? Let us look first at the classic "hello world" example to see what we want to achieve and what the framework actually provides. This is the classic C# code:
public void SayHello()
{
    Console.Out.WriteLine("Hello world");
}
When we get the body of the SayHello method using reflection and ask for the IL code, we get an array of bytes such as:
0,40,52,0,0,10,114,85,1,0,112,111,53,0,0,10,0,42
Well, that's not very readable. What we know is that this is IL code, and we want to transform it so that we can process it. The easiest way is to transform it into the textual form of MSIL (Microsoft Intermediate Language). This is what the MSIL code of the SayHello method looks like and what my library is supposed to return:
0000 : nop
0001 : call System.IO.TextWriter System.Console::get_Out()
0006 : ldstr "Hello world"
0011 : callvirt instance System.Void System.IO.TextWriter::WriteLine()
0016 : nop
0017 : ret
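To connect the two listings, the operands in the byte array can be decoded by hand. The following is a minimal sketch, not part of SDILReader: TokenDemo and ReadToken are hypothetical names, and BitConverter is used on the assumption of a little-endian host. It reads the four bytes that follow each of the three call/ldstr/callvirt opcodes and splits the resulting metadata token into its table byte and its lower three bytes.

```csharp
using System;

public static class TokenDemo
{
    // Reads the 4-byte operand that starts at 'offset'.
    // Assumption: the host is little-endian, like the IL stream itself.
    public static int ReadToken(byte[] il, int offset)
    {
        return BitConverter.ToInt32(il, offset);
    }

    public static void Main()
    {
        // The byte array of SayHello as shown in the article
        byte[] il = { 0, 40, 52, 0, 0, 10, 114, 85, 1, 0, 112, 111, 53, 0, 0, 10, 0, 42 };

        // The opcodes with operands sit at offsets 1, 6 and 11;
        // each operand starts one byte after its opcode.
        foreach (int opcodeOffset in new[] { 1, 6, 11 })
        {
            int token = ReadToken(il, opcodeOffset + 1);
            int table = (token >> 24) & 0xFF;  // top byte: metadata table
            int index = token & 0x00FFFFFF;    // row index, or heap offset for strings
            Console.WriteLine("{0:D4} : opcode 0x{1:X2}, token 0x{2:X8} (table 0x{3:X2}, index {4})",
                opcodeOffset, il[opcodeOffset], token, table, index);
        }
    }
}
```

Under these assumptions the three operands come out as 0x0A000034, 0x70000155 and 0x0A000035. A top byte of 0x0A identifies the MemberRef table, while 0x70 marks a user string.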
Using the code
SDILReader is a library containing only three classes. To obtain the MSIL of a method body, simply create a MethodBodyReader object and pass to its constructor the MethodInfo object of the method you want to decompose.
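MethodInfo mi = null;
...
SDILReader.MethodBodyReader mr = new MethodBodyReader(mi);
// The whole body as MSIL text
string msil = mr.GetBodyCode();
// Or walk the parsed instructions one by one
for (int i = 0; i < mr.instructions.Count; i++)
{
    // process mr.instructions[i] here
}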
How it works
That is the right question. To get started, we first need to know the structure of the IL array that the .NET reflection mechanism gives us.
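For reference, here is a minimal sketch of how that array can be obtained with the standard reflection API (MethodBase.GetMethodBody and MethodBody.GetILAsByteArray). The Greeter and IlDump classes are hypothetical names used only for this illustration.

```csharp
using System;
using System.Reflection;

public class Greeter
{
    public void SayHello()
    {
        Console.Out.WriteLine("Hello world");
    }
}

public static class IlDump
{
    // Returns the raw IL bytes of a method body, or an empty array when the
    // method has no body (for example abstract or extern methods).
    public static byte[] GetIl(MethodBase method)
    {
        MethodBody body = method.GetMethodBody();
        return body == null ? new byte[0] : body.GetILAsByteArray();
    }

    public static void Main()
    {
        MethodInfo mi = typeof(Greeter).GetMethod("SayHello");
        byte[] il = GetIl(mi);
        // The exact bytes depend on the compiler and on debug/release settings.
        Console.WriteLine(string.Join(",", il));
    }
}
```

The metadata tokens embedded in the bytes differ from one assembly to another, so the output will generally not match the sample array byte for byte.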
IL code structure
The IL is in fact a sequence of operations that must be executed. An operation is a pair: <operation code, operand>. The operation code is the byte value of a System.Reflection.Emit.OpCode, while the operand, when there is one, is either an immediate value (a constant, a branch offset, a variable index) or the address of the metadata information for the entity the operator is working with, i.e. a method, type, field or string. This address is referred to as the metadata token by the .NET framework. So, in order to interpret the array, we must do something like this:
- Get the next byte and see what operator we are dealing with (the value 0xFE announces a two-byte operator, so one more byte is read).
- Depending on the operator, the operand occupies the next 0, 1, 2, 4 or 8 bytes (switch is the exception, with a variable-length operand); a metadata token always takes 4 bytes. Get the metadata token of the operand.
- Use the MethodInfo.Module object to retrieve the object the metadata token refers to.
- Store the pair <operator, operand>.
- Repeat if we are not at the end of the IL array.
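The second step above depends on knowing how many bytes each kind of operand takes. The sketch below shows one way to derive that from OpCode.OperandType; it is an illustration under my own naming (OperandSizes, SizeOf), not code taken from SDILReader.

```csharp
using System;
using System.Reflection.Emit;

public static class OperandSizes
{
    // Number of operand bytes that follow the opcode, or -1 for 'switch',
    // whose operand length depends on the number of jump targets.
    public static int SizeOf(OperandType type)
    {
        switch (type)
        {
            case OperandType.InlineNone:
                return 0;
            case OperandType.ShortInlineBrTarget:
            case OperandType.ShortInlineI:
            case OperandType.ShortInlineVar:
                return 1;
            case OperandType.InlineVar:
                return 2;
            case OperandType.InlineI8:
            case OperandType.InlineR:
                return 8;
            case OperandType.InlineSwitch:
                return -1;
            default:
                // InlineBrTarget, InlineField, InlineI, InlineMethod, InlineSig,
                // InlineString, InlineTok, InlineType and ShortInlineR: 4 bytes.
                // The obsolete InlinePhi also lands here; it should not occur in practice.
                return 4;
        }
    }

    public static void Main()
    {
        Console.WriteLine(SizeOf(OpCodes.Nop.OperandType));    // no operand
        Console.WriteLine(SizeOf(OpCodes.Br_S.OperandType));   // 1-byte branch offset
        Console.WriteLine(SizeOf(OpCodes.Call.OperandType));   // 4-byte metadata token
        Console.WriteLine(SizeOf(OpCodes.Ldc_I8.OperandType)); // 8-byte integer constant
    }
}
```

A parser that only needs instruction boundaries could advance its position by this size and skip token resolution altogether.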
ILInstruction
The ILInstruction class stores the <operator, operand> pair. It also has a simple method that transforms the stored information into a readable string.
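As a rough idea of what such a class can look like, here is a minimal sketch. The property names Code, Operand and Offset match those used by the parsing loop in this article, but the class name ILInstructionSketch and the formatting rules in ToString are my assumptions, not the library's actual implementation.

```csharp
using System;
using System.Reflection.Emit;

public class ILInstructionSketch
{
    public OpCode Code { get; set; }     // the operator
    public object Operand { get; set; }  // resolved operand, or null if none
    public int Offset { get; set; }      // position of the instruction in the IL array

    // Formats the instruction as "<offset> : <opcode name> <operand>"
    public override string ToString()
    {
        string text = Offset.ToString("D4") + " : " + Code.Name;
        if (Operand is string)
        {
            // String operands are shown in quotes
            text += " \"" + Operand + "\"";
        }
        else if (Operand != null)
        {
            text += " " + Operand;
        }
        return text;
    }
}

public static class ILInstructionSketchDemo
{
    public static void Main()
    {
        var instruction = new ILInstructionSketch
        {
            Code = OpCodes.Ldstr,
            Offset = 6,
            Operand = "Hello world"
        };
        Console.WriteLine(instruction);
    }
}
```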
MethodBodyReader
The MethodBodyReader class does all the hard work. Its constructor calls a private method, ConstructInstructions, which parses the IL array:
int position = 0;
instructions = new List<ILInstruction>();
while (position < il.Length)
{
    ILInstruction instruction = new ILInstruction();

    // Remember where this instruction starts, before consuming any bytes
    int offset = position;

    // Get the operation code of the current instruction
    OpCode code = OpCodes.Nop;
    ushort value = il[position++];
    if (value != 0xfe)
    {
        code = Globals.singleByteOpCodes[(int)value];
    }
    else
    {
        // 0xfe is the prefix of the two-byte opcodes
        value = il[position++];
        code = Globals.multiByteOpCodes[(int)value];
        value = (ushort)(value | 0xfe00);
    }
    instruction.Code = code;
    instruction.Offset = offset;
    int metadataToken = 0;

    // Get the operand of the current operation
    switch (code.OperandType)
    {
        case OperandType.InlineBrTarget:
            // The branch offset is relative to the end of the instruction
            metadataToken = ReadInt32(il, ref position);
            metadataToken += position;
            instruction.Operand = metadataToken;
            break;
        case OperandType.InlineField:
            metadataToken = ReadInt32(il, ref position);
            instruction.Operand = module.ResolveField(metadataToken);
            break;
        ....
    }
    instructions.Add(instruction);
}
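The loop relies on a ReadInt32 helper that is not listed in the article. A plausible version, written here as an assumption about what it does rather than as the library's actual code, reads four bytes in little-endian order and advances the position. The demo also shows the branch-target arithmetic of the InlineBrTarget case; IlReaderSketch is a hypothetical class name.

```csharp
using System;

public static class IlReaderSketch
{
    // Reads a little-endian 32-bit integer at 'position' and moves
    // 'position' past the four bytes that were consumed.
    public static int ReadInt32(byte[] il, ref int position)
    {
        int value = il[position]
                  | (il[position + 1] << 8)
                  | (il[position + 2] << 16)
                  | (il[position + 3] << 24);
        position += 4;
        return value;
    }

    public static void Main()
    {
        // br (0x38) followed by a relative offset of -5 (0xFFFFFFFB)
        byte[] il = { 0x38, 0xFB, 0xFF, 0xFF, 0xFF };
        int position = 1; // just past the opcode byte

        // The target is relative to the end of the instruction, so this
        // branch jumps back to the start of the array.
        int target = ReadInt32(il, ref position) + position;
        Console.WriteLine(target);
    }
}
```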
The parsing loop in ConstructInstructions looks simple, but it is not quite that simple. The switch actually has 18 cases, and I did not take into account all of the operators, only the most common ones. There are 240+ operators. The operators are loaded into two static arrays at the start of the application:
public static OpCode[] multiByteOpCodes;
public static OpCode[] singleByteOpCodes;

public static void LoadOpCodes()
{
    singleByteOpCodes = new OpCode[0x100];
    multiByteOpCodes = new OpCode[0x100];

    // Every opcode is exposed as a public static field of the OpCodes class
    FieldInfo[] infoArray1 = typeof(OpCodes).GetFields();
    for (int num1 = 0; num1 < infoArray1.Length; num1++)
    {
        FieldInfo info1 = infoArray1[num1];
        if (info1.FieldType == typeof(OpCode))
        {
            OpCode code1 = (OpCode)info1.GetValue(null);
            ushort num2 = (ushort)code1.Value;
            if (num2 < 0x100)
            {
                // One-byte opcode: indexed by its value
                singleByteOpCodes[(int)num2] = code1;
            }
            else
            {
                // Two-byte opcode: the first byte must be the 0xfe prefix
                if ((num2 & 0xff00) != 0xfe00)
                {
                    throw new Exception("Invalid OpCode.");
                }
                multiByteOpCodes[num2 & 0xff] = code1;
            }
        }
    }
}
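To illustrate what such a table gives us, the sketch below builds a compact variant and looks up a few opcodes by value. Using a single Dictionary keyed by the full 16-bit value is my own substitution for the two arrays above, and OpCodeTable is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

public static class OpCodeTable
{
    // Maps the numeric value of every opcode (one-byte or 0xFE-prefixed)
    // to its OpCode, using the same reflection trick as LoadOpCodes.
    public static Dictionary<ushort, OpCode> Build()
    {
        var table = new Dictionary<ushort, OpCode>();
        FieldInfo[] fields = typeof(OpCodes).GetFields(BindingFlags.Public | BindingFlags.Static);
        foreach (FieldInfo field in fields)
        {
            if (field.FieldType == typeof(OpCode))
            {
                OpCode code = (OpCode)field.GetValue(null);
                table[(ushort)code.Value] = code;
            }
        }
        return table;
    }

    public static void Main()
    {
        Dictionary<ushort, OpCode> table = Build();
        Console.WriteLine(table[0x28].Name);   // one-byte opcode: call
        Console.WriteLine(table[0xFE16].Name); // two-byte opcode: constrained.
    }
}
```

The two-array layout used by the library avoids hashing and mirrors the way the bytes are read, while the dictionary keeps the lookup in one place; either approach should work for a parser of this size.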
Once the object is constructed, we can use it either to walk the list of instructions or to get their string representation. That's it; have fun decompiling.
Future work
Well, what's left is to transform the MSIL into C# code.
History
9 May, 2006
Original version posted.
28 June, 2007
After a very long time, I managed to have a look at the issues reported by the readers of my article. Here are the results:
- I added support for generics.
- OperandType.InlineTok is now also correctly processed.
- Various other small issues have been fixed.
Be sure to download the sources again from the links at the start of the article.
Thanks! | Oleg Zhukov | 16:33 19 Aug '09
Thank you very much, Sorin; your code saved me several hours of work.
-- Oleg Zhukov

Contact? | Greg Ennis | 7:25 1 Oct '07
Hello, how can I contact you Sorin? Your email address in your profile bounces. Thanks

Bug Fix for Generic | Fadrian Sudaman | 17:50 1 Sep '07
First of all, thanks for the great work and for making this available.
I'm trying to use MethodBodyReader to parse the IL of methods with generic parameters and return types, and I always get a BadImageFormatException. The previous fix may have resolved the issue with InlineType, but definitely not for InlineField and InlineMethod. You can try it out on methods from the container classes in the System.Collections.Generic namespace to see the problem.
To fix the issue, I added the generic type and method argument parameters to the ResolveField, ResolveMethod and ResolveMember calls so that the generic context is known when resolving the token. I have also changed the type of the member mi from MethodInfo to MethodBase so that it can cater for constructors as well. We need to differentiate whether the MethodBase is a MethodInfo or a ConstructorInfo. We test the type using the "is" operator rather than just using the IsConstructor property, as the latter does not cater for type constructors.
case OperandType.InlineField:
    metadataToken = ReadInt32(il, ref position);
    if (mi is ConstructorInfo)
    {
        instruction.Operand = module.ResolveField(metadataToken,
            mi.DeclaringType.GetGenericArguments(), null);
    }
    else
    {
        instruction.Operand = module.ResolveField(metadataToken,
            mi.DeclaringType.GetGenericArguments(), mi.GetGenericArguments());
    }
    break;
case OperandType.InlineMethod:
    metadataToken = ReadInt32(il, ref position);
    try
    {
        if (mi is ConstructorInfo)
        {
            instruction.Operand = module.ResolveMethod(metadataToken,
                mi.DeclaringType.GetGenericArguments(), null);
        }
        else
        {
            instruction.Operand = module.ResolveMethod(metadataToken,
                mi.DeclaringType.GetGenericArguments(), mi.GetGenericArguments());
        }
    }
    catch
    {
        if (mi is ConstructorInfo)
        {
            instruction.Operand = module.ResolveMember(metadataToken,
                mi.DeclaringType.GetGenericArguments(), null);
        }
        else
        {
            instruction.Operand = module.ResolveMember(metadataToken,
                mi.DeclaringType.GetGenericArguments(), mi.GetGenericArguments());
        }
    }
    break;
Although it may be unnecessary, I don't see any harm in adding the generic type and method argument parameters to ResolveType here as well, as a defensive measure:
case OperandType.InlineTok:
    metadataToken = ReadInt32(il, ref position);
    try
    {
        if (mi is ConstructorInfo)
        {
            instruction.Operand = module.ResolveType(metadataToken,
                mi.DeclaringType.GetGenericArguments(), null);
        }
        else
        {
            instruction.Operand = module.ResolveType(metadataToken,
                mi.DeclaringType.GetGenericArguments(), mi.GetGenericArguments());
        }
    }
    catch
    {
    }
    break;
In addition, your InlineType case should probably cater for constructors as well:
case OperandType.InlineType:
    metadataToken = ReadInt32(il, ref position);
    if (this.mi is MethodInfo)
    {
        instruction.Operand = module.ResolveType(metadataToken,
            this.mi.DeclaringType.GetGenericArguments(), this.mi.GetGenericArguments());
    }
    else if (mi is ConstructorInfo)
    {
        instruction.Operand = module.ResolveType(metadataToken,
            this.mi.DeclaringType.GetGenericArguments(), null);
    }
    else
    {
        instruction.Operand = module.ResolveType(metadataToken);
    }
    break;
Fadrian

Question | Marc Clifton | 4:36 1 Jul '07
You wrote: but we might want to obtain similar results with our own code.
I can't imagine needing to do this. Why wouldn't I just use Reflector, which does generate C# code? In what scenario would I do this as part of an application (other than a Reflector-like application)?
[edit]Ah, I just read Simon Franklin's post. OK, I guess that answers my questions [/edit]
Marc

Great work | Moim Hossain | 8:59 28 Jun '07
I liked your article; it's excellent work. But what will happen with obfuscated code?
Moim Hossain Sr. Software Engineer Onirban Orion Technologies.

Generic context Exception | al011757 | 12:33 18 Feb '07
Hi, I'm getting this exception when trying to read this MethodInfo:
((System.Reflection.RuntimeMethodInfo)(mi)).DeclaringType
Name = "PairKey`2"
FullName = "Microsoft.Practices.EnterpriseLibrary.Common.Configuration.ObjectBuilder.PairKey`2"
The exception is this:
A BadImageFormatException has been thrown while parsing the signature. This is likely due to lack of a generic context. Ensure genericTypeArguments and genericMethodArguments are provided and contain enough context.
   at System.Reflection.Module.ResolveType(Int32 metadataToken, Type[] genericTypeArguments, Type[] genericMethodArguments)
   at System.Reflection.Module.ResolveType(Int32 metadataToken)
   at SDILReader.MethodBodyReader.ConstructInstructions(Module module) in C:\SDILReader\MethodBodyReader.cs:line 113
   at SDILReader.MethodBodyReader..ctor(MethodInfo mi) in
Maybe you know what's happening.
Bests, Paulo
Hi Paulo,
I think I fixed this issue along with other things
Sorin

Another suggestion | Leif Wickland | 9:00 1 Feb '07
In MethodBodyReader.ConstructInstructions() there's a comment in the InlineTok case about not knowing what to do. It seems that it is correct to replace that with a call to Module.ResolveMember(metadataToken).
Fixed, but Module.ResolveType was the solution.
Sorin

Thanks, and a bug fix suggestion | Leif Wickland | 13:57 31 Jan '07
Thanks for making this available! It really helped me.
I suggest that you change ConstructInstructions to call the overloads of Module.Resolve*() which take type arrays for the generic arguments of the examined method and its type. Otherwise, the code will throw a BadImageFormatException when it attempts to resolve a generic metadata token. I can send you code for that change, if you like.

Thanx Sorin | Simon Franklin | 7:02 8 Jan '07
Sorin, you are the man! This is more or less what I was looking for. I am currently working on an experimental project named Smf.Xml.IL, or Xil (pronounced "ZIL"). Xil is a compiler for the .NET framework that takes in XML and outputs an application or class library. In building the compiler, I found the need to speed up the process of writing the XML code, so I have also designed a decompiler to read in an assembly and output XML in the format expected by the compiler. After discovering the MethodBody.GetILAsByteArray method, I soon hit a brick wall when trying to parse the bytes returned. Sorin, you are the man! Less than 5 minutes after downloading your code, I had my decompiler outputting IL opcodes in the format expected by the Xil compiler. Although I have a long way to go to finish this project, your code will speed up the process. The first goal in this project is to get Xil to compile itself from a single XML file, and then use the output compiler to compile the same XML file. This would create a pure Xil application compiler. As I'm writing this, I am already thinking that maybe I should include the decompiler in the same assembly as the compiler. If you wish to know more about Xil, I will be more than glad to explain. In fact, I'm thinking now that maybe I should make this an open source project. Let me know what you think. Once again, thanx.
It just keeps on getting better.

constrained. | torq314 | 5:21 15 May '06
Your code looks great at first look; however, it does not handle the "constrained." opcode properly. Try to decompile:
public static void F<T>( T t ) { Console.WriteLine( t.ToString() ); }
and you get nothing, since the exception thrown by ResolveType is swallowed.
I am investigating the issue. If you come up with a fix, please update your code.
-- modified at 15:37 Monday 15th May, 2006
It works now. I also had to pass the list of generic parameters.
Sorin

Nice! | jconwell | 9:01 9 May '06
I've been writing some tools that needed the IL parsed out of an assembly, and have been using either the Mono.Cecil project or Microsoft.CCI.dll that comes with FxCop to get at the IL, but I'd much rather have a small lightweight library like this to do it.
I'm going to put this IL parsing technique into my tools and see how it works.
Thanks!
John Conwell

Amazing | NinjaCross | 7:51 9 May '06
This is really exactly what I was searching for! Thanks so much for your effort; these sources will be heavily useful for my future development, so you have got my 5. Some very useful improvements that you could make are:
1- Add proper handling to that "catch" in "btnOpenAssembly_Click", because sometimes the software says that the assembly is invalid (even if it is valid) and no more information is given. I discovered this behaviour in assemblies with dependencies that can't be resolved by your implementation.
2- Add a display/management of the methods in a hierarchical way (with a treeview, for example).
3- Introduce editing/recompiling functionality to the application, so everybody can easily modify existing assemblies (think, for example, of patching software).
(BTW, the image link in the article is broken)
-- NinjaCross www.ninjacross.com
-- modified at 12:57 Tuesday 9th May, 2006
Last Updated 28 Jun 2007
Copyright © CodeProject, 1999-2010