Introduction

.NET offers, through its reflection mechanism, the IL code of a method body only as a raw array of bytes. What is needed is a series of objects that represent the actual instructions from that IL code. That is what I want to provide.

Background

Any programmer who has worked with reflection has heard of the awesome Reflector written by Lutz Roeder. Reflector can decompile any .NET assembly and show the equivalent code for each programming element within that assembly. Note that I said "equivalent": the reflection mechanism cannot give you the original code, because compilation strips comments and unused variables and emits only the valid, necessary code. Thus, we cannot obtain the exact source.

Reflector is a wonderful tool, but we might want to obtain similar results with our own code. How can we do that? Let us look first at the classic "hello world" example to see what we want to achieve and what the framework actually provides.

This is the classic C# code:

public void SayHello()
{
    Console.Out.WriteLine("Hello world");
}

When we get the body of the SayHello method through reflection, we receive an array of bytes:

0,40,52,0,0,10,114,85,1,0,112,111,53,0,0,10,0,42
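For reference, here is a minimal sketch of how such a byte array can be obtained with the standard reflection calls MethodBase.GetMethodBody and MethodBody.GetILAsByteArray. The Greeter and IlDump names are hypothetical and exist only for this illustration. The exact bytes you get depend on the compiler, the build configuration, and the metadata tokens of your assembly, so they may differ from the array shown above.

```csharp
using System;
using System.Reflection;

// Hypothetical sample type, used only for this illustration.
public class Greeter
{
    public void SayHello()
    {
        Console.Out.WriteLine("Hello world");
    }
}

public static class IlDump
{
    // Returns the raw IL bytes of a method, or an empty array when the
    // method has no body (abstract or extern methods, for example).
    public static byte[] GetIlBytes(MethodBase method)
    {
        MethodBody body = method.GetMethodBody();
        return body == null ? new byte[0] : body.GetILAsByteArray();
    }

    // Formats the bytes as a comma-separated list of decimal values.
    public static string Format(byte[] il)
    {
        string[] parts = Array.ConvertAll(il, b => b.ToString());
        return string.Join(",", parts);
    }
}
```

A call such as IlDump.Format(IlDump.GetIlBytes(typeof(Greeter).GetMethod("SayHello"))) produces a list in the same style as the one above.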
Well, that's not very readable. What we know is that this is IL code, and we want to transform it so that we can process it. The easiest way is to transform it into readable MSIL (Microsoft Intermediate Language) text. This is what the MSIL code of the SayHello method looks like:

0000 : nop
0001 : call System.IO.TextWriter System.Console::get_Out()
0006 : ldstr "Hello world"
0011 : callvirt instance System.Void System.IO.TextWriter::WriteLine()
0016 : nop
0017 : ret
Using the code

SDILReader is a library containing only three classes. To obtain the MSIL of a method body, simply create a MethodBodyReader object, passing it the MethodInfo of the method you want to disassemble:

MethodInfo mi = null;
// obtain somehow the method info of the method we want to disassemble;
// usually you open the assembly, get the module, get the type and then the
// method from that type
//
...
// instantiate a method body reader
SDILReader.MethodBodyReader mr = new MethodBodyReader(mi);
// get the text representation of the msil
string msil = mr.GetBodyCode();
// or parse the list of instructions of the MSIL
for (int i=0; i<mr.instructions.Count;i++)
{
// do something with mr.instructions[i]
}
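The step "obtain somehow the method info" can be sketched with plain reflection calls. This is only one possible way to do it, not part of SDILReader; the HelloSample and MethodLookup names are hypothetical, and the lookup assumes the method name is not overloaded (Type.GetMethod throws AmbiguousMatchException otherwise).

```csharp
using System;
using System.Reflection;

// Hypothetical sample type, used only for this illustration.
public class HelloSample
{
    public void SayHello()
    {
        Console.Out.WriteLine("Hello world");
    }
}

public static class MethodLookup
{
    // Finds a method by name on an already loaded type, whether it is
    // public or non-public, instance or static. Returns null if not found.
    public static MethodInfo Find(Type type, string methodName)
    {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.NonPublic
                                 | BindingFlags.Instance | BindingFlags.Static;
        return type.GetMethod(methodName, flags);
    }

    // Loads an assembly from disk, resolves the type by its full name
    // (throwing if the type is missing), then finds the method on it.
    public static MethodInfo Find(string assemblyPath, string typeName, string methodName)
    {
        Assembly assembly = Assembly.LoadFrom(assemblyPath);
        Type type = assembly.GetType(typeName, true);
        return Find(type, methodName);
    }
}
```

The resulting MethodInfo, for example MethodLookup.Find(typeof(HelloSample), "SayHello"), is what would be passed to the MethodBodyReader constructor.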
How it works

Well, this is the right question. To get started, we first need to know the structure of the IL array returned by the .NET reflection mechanism.

IL code structure

The IL is in fact a sequence of operations that must be executed. An operation is a pair: <operation code, operand>. The operation code is the byte value of a System.Reflection.Emit.OpCode; it takes one byte, or two bytes when the first byte is 0xFE. The operand, when present, follows the operation code, and its size depends on the operation code.
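To make the pair structure concrete, take the bytes 40,52,0,0,10 from the hello world array: 40 (0x28) is the call opcode, and the next four bytes are its operand, a metadata token stored in little-endian order. The sketch below shows one way such an operand could be read. The parsing loop in this article calls a helper named ReadInt32 whose body is not shown, so this implementation is an assumption, and the IlOperand class name is hypothetical.

```csharp
using System;

public static class IlOperand
{
    // Reads a 4-byte little-endian integer starting at 'position' and
    // advances 'position' past it. Assumed implementation, for illustration.
    public static int ReadInt32(byte[] il, ref int position)
    {
        int value = il[position]
                  | (il[position + 1] << 8)
                  | (il[position + 2] << 16)
                  | (il[position + 3] << 24);
        position += 4;
        return value;
    }
}
```

With il = { 40, 52, 0, 0, 10 } and position = 1, the call returns 0x0A000034 and leaves position at 5. The top byte of a metadata token identifies the metadata table it points into; 0x0A is the MemberRef table, which fits a call to a method defined in another assembly.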
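ILInstruction

The ILInstruction class holds one decoded instruction: its Code, its Operand, and its Offset within the IL array.

MethodBodyReader

The MethodBodyReader class does the actual parsing. Its core is the following loop over the IL array:

int position = 0;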
instructions = new List<ILInstruction>();
while (position < il.Length)
{
ILInstruction instruction = new ILInstruction();
// get the operation code of the current instruction
OpCode code = OpCodes.Nop;
ushort value = il[position++];
if (value != 0xfe)
{
code = Globals.singleByteOpCodes[(int)value];
}
else
{
value = il[position++];
code = Globals.multiByteOpCodes[(int)value];
value = (ushort)(value | 0xfe00);
}
instruction.Code = code;
// the opcode is one or two bytes long; step back to its first byte
instruction.Offset = position - code.Size;
int metadataToken = 0;
// get the operand of the current operation
switch (code.OperandType)
{
case OperandType.InlineBrTarget:
metadataToken = ReadInt32(il, ref position);
metadataToken += position;
instruction.Operand = metadataToken;
break;
case OperandType.InlineField:
metadataToken = ReadInt32(il, ref position);
instruction.Operand = module.ResolveField(metadataToken);
break;
....
}
instructions.Add(instruction);
}
We see here the loop that parses the IL. It is not quite as simple as it looks: the switch actually has 18 cases, and I did not take every opcode into account, only the most common ones. There are more than 200 opcodes. They are loaded into two static lists at the start of the application:

public static OpCode[] multiByteOpCodes;
public static OpCode[] singleByteOpCodes;
public static void LoadOpCodes()
{
singleByteOpCodes = new OpCode[0x100];
multiByteOpCodes = new OpCode[0x100];
FieldInfo[] infoArray1 = typeof(OpCodes).GetFields();
for (int num1 = 0; num1 < infoArray1.Length; num1++)
{
FieldInfo info1 = infoArray1[num1];
if (info1.FieldType == typeof(OpCode))
{
OpCode code1 = (OpCode)info1.GetValue(null);
ushort num2 = (ushort)code1.Value;
if (num2 < 0x100)
{
singleByteOpCodes[(int)num2] = code1;
}
else
{
if ((num2 & 0xff00) != 0xfe00)
{
throw new Exception("Invalid OpCode.");
}
multiByteOpCodes[num2 & 0xff] = code1;
}
}
}
}
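To see the two tables at work, here is a self-contained sketch that builds them in the same way and then walks an IL array, printing only the offset and name of each opcode and skipping over the operands. It is a simplified illustration, not the SDILReader implementation: the OpCodeTableDemo class and its members are hypothetical, no metadata tokens are resolved, and the operand sizes are derived from OpCode.OperandType.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

public static class OpCodeTableDemo
{
    static readonly OpCode[] singleByte = new OpCode[0x100];
    static readonly OpCode[] multiByte = new OpCode[0x100];

    // Fill both tables from the public static fields of OpCodes.
    static OpCodeTableDemo()
    {
        foreach (FieldInfo field in typeof(OpCodes).GetFields(BindingFlags.Public | BindingFlags.Static))
        {
            if (field.FieldType != typeof(OpCode)) continue;
            OpCode code = (OpCode)field.GetValue(null);
            ushort value = (ushort)code.Value;
            if (value < 0x100) singleByte[value] = code;
            else multiByte[value & 0xff] = code;
        }
    }

    // Number of operand bytes that follow the opcode at 'position'.
    static int OperandSize(OpCode code, byte[] il, int position)
    {
        switch (code.OperandType)
        {
            case OperandType.InlineNone:
                return 0;
            case OperandType.ShortInlineBrTarget:
            case OperandType.ShortInlineI:
            case OperandType.ShortInlineVar:
                return 1;
            case OperandType.InlineVar:
                return 2;
            case OperandType.InlineI8:
            case OperandType.InlineR:
                return 8;
            case OperandType.InlineSwitch:
                // a 4-byte little-endian count, then one 4-byte target per case
                int count = il[position] | (il[position + 1] << 8)
                          | (il[position + 2] << 16) | (il[position + 3] << 24);
                return 4 + 4 * count;
            default:
                // tokens, 32-bit integers, 32-bit floats and long branch targets
                return 4;
        }
    }

    // Returns one "offset : name" line per instruction.
    public static List<string> Disassemble(byte[] il)
    {
        var lines = new List<string>();
        int position = 0;
        while (position < il.Length)
        {
            int offset = position;
            byte first = il[position++];
            OpCode code = first != 0xfe ? singleByte[first] : multiByte[il[position++]];
            position += OperandSize(code, il, position);
            lines.Add(offset.ToString("D4") + " : " + code.Name);
        }
        return lines;
    }
}
```

Feeding it the hello world bytes from the beginning of the article yields the offsets 0000, 0001, 0006, 0011, 0016 and 0017 with the names nop, call, ldstr, callvirt, nop and ret, which matches the MSIL listing shown earlier.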
Once the MethodBodyReader object is constructed, we can use it either to walk the list of instructions or to get their string representation. That's it; have fun decompiling.

Future work

Well, what's left is to transform the MSIL into C# code.

History

9 May, 2006: Original version posted.
28 June, 2007: After a very long time, I managed to look at the issues reported by readers of this article. Be sure to download the sources again from the links at the start of the article.