Reflecting Within a Method Body

Posted 15 May 2018
Licenced CPOL

Extends .NET reflection to decode the byte array returned by System.Reflection.MethodBody.GetILAsByteArray(), discusses the techniques to achieve this, and provides a brief primer on .NET reflection.

Background

Within any .NET assembly you have a mix of meta-data and instructions. The meta-data describes the various entities: assemblies, modules, types, fields, methods, properties, etc. For example, the meta-data for a method provides its name, return type, parameters, and calling convention. The instructions are exactly that: instructions. They tell your assembly / application what to do.

While not perfect, .NET provides a feature-rich framework for accessing meta-data (in the System.Reflection namespace). It also provides a fairly decent framework for creating new instructions (in the System.Reflection.Emit namespace). It even allows you to access existing instructions via the System.Reflection.MethodBody.GetILAsByteArray() method.

Curiously, while creating new instructions is made easy, reflecting over existing instructions is made quite difficult. All that is provided for existing instructions is a byte array. The .NET framework provides no real support (reflection-wise) for decoding that byte array.

Introduction

I thought it might be a fun exercise to try to bridge that curious gap in coverage (between System.Reflection and System.Reflection.Emit). This article will describe a framework that can be used for that purpose. It will also describe some of the techniques used to create that framework.

This framework is not intended as a disassembler. There are already some excellent tools out there for that purpose. I have little interest in re-inventing that particular wheel.

The goals for this framework were as follows:

  1. Fit seamlessly into the pre-existing .NET reflection framework. Wherever possible, use pre-existing .NET types and methods.
  2. Provide a means of visualizing instructions. Due to limitations of the .NET framework, true fidelity is not easily achieved. That said, it is possible to get pretty darn close, using what .NET reflection does provide. That is the goal of this framework: close but not perfect.
  3. Provide a decent set of test cases. This should help if anyone wants to take this code any further.

For a more extensive, robust solution, one reader suggested Mono.Cecil. While it's not reflection-based, and I haven't personally used it, I do hear mostly good things. A link is included in the Additional Reading section at the end of this article.

Using the Code

To use this framework, take advantage of the GetIL() extension method. This method extends the System.Reflection.MethodBase class so that it will return a list of instructions. An example of its usage (found in Program.cs) is as follows:

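// At the top of Program.cs:
using System;
using System.Reflection;
using Transeric.Reflection; // assumed namespace of GetIL() and IInstruction

// Get the instructions for the Main method in this Program.
// Main is often not public, so search non-public static methods as well.
var instructions = typeof(Program)
  .GetMethod("Main", BindingFlags.Static | BindingFlags.Public | BindingFlags.NonPublic)
  .GetIL();

// Display all of the instructions in the Main method
Console.WriteLine("********** Main (all instructions) **********");
foreach (IInstruction instruction in instructions)
  Console.WriteLine(instruction);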

The common interface for all instructions is IInstruction. It has the following members:

Member Description
IsTarget A value indicating if the instruction is the target of a branch or switch instruction.
Label A label for the instruction.
Offset The byte offset of the start of the instruction.
OpCode The operation code (opcode) for the instruction.
Parent The list of instructions containing the instruction.
GetOperand() The operand for the instruction.
GetValue() The resolved value of the operand for the instruction. For example, for method instructions, GetOperand() returns a meta-data token and GetValue() returns an instance of System.Reflection.MethodBase.
Resolve() INTERNAL ONLY: Where possible, resolves an operand into a more meaningful value.

If you're familiar with .NET reflection and the Common Intermediate Language (CIL), this should be all you need. If not, later in the article a brief primer on these topics is provided.

Decoding the Data

Most of the decoding is fairly straightforward.

Every instruction starts with an operation code (opcode). For example, if code is calling a method, you might expect a call opcode (System.Reflection.Emit.OpCodes.Call).

Within the data, this is currently represented as either an 8-bit or 16-bit code. If the first byte is 0xFE (System.Reflection.Emit.OpCodes.Prefix1.Value), it's a 16-bit code (two bytes); otherwise, it's an 8-bit code (one byte). At the time this article was first written, only 27 of the 226 opcodes required two bytes.

The two byte opcodes are as follows: arglist (FE 00), ceq (FE 01), cgt (FE 02), cgt.un (FE 03), clt (FE 04), clt.un (FE 05), ldftn (FE 06), ldvirtftn (FE 07), ldarg (FE 09), ldarga (FE 0A), starg (FE 0B), ldloc (FE 0C), ldloca (FE 0D), stloc (FE 0E), localloc (FE 0F), endfilter (FE 11), unaligned. (FE 12), volatile. (FE 13), tail. (FE 14), initobj (FE 15), constrained. (FE 16), cpblk (FE 17), initblk (FE 18), rethrow (FE 1A), sizeof (FE 1C), refanytype (FE 1D), readonly. (FE 1E).
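As a rough illustration of the one-byte/two-byte rule, the following sketch reads an opcode value from the IL bytes. It is not the framework's own code; the OpCodeBytes class and ReadOpCodeValue method are hypothetical names. Two-byte opcodes come back with the 0xFE prefix as the high byte, so ceq reads as 0xFE01, matching the FE 01 notation used above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Transeric.Reflection): reads an opcode
// from IL bytes, taking one byte, or two bytes when the first byte is 0xFE.
public static class OpCodeBytes
{
  private const byte TwoBytePrefix = 0xFE; // same value as OpCodes.Prefix1.Value

  // Returns the opcode value and advances offset past the opcode bytes.
  public static ushort ReadOpCodeValue(IReadOnlyList<byte> il, ref int offset)
  {
    byte first = il[offset++];
    if (first != TwoBytePrefix)
      return first; // one-byte opcode, e.g. 0x28 (call)

    // Two-byte opcode: the prefix is the high byte (big endian order).
    byte second = il[offset++];
    return (ushort)((first << 8) | second);
  }
}
```

The returned value lines up with OpCode.Value once that short is reinterpreted as an unsigned 16-bit number.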

For the many instructions whose operand type is OperandType.InlineNone, the opcode is the entirety of the data for the instruction. For instructions with other operand types, the data for the operand immediately follows the opcode. The full set of operand types, described by the System.Reflection.Emit.OperandType enumeration, is as follows:

OperandType Count Description
InlineBrTarget 14 The operand is a 32-bit integer branch target.
InlineField 6 The operand is a 32-bit metadata token.
InlineI 1 The operand is a 32-bit integer.
InlineI8 1 The operand is a 64-bit integer.
InlineMethod 6 The operand is a 32-bit metadata token.
InlineNone 147 No operand.
InlinePhi 0 The operand is reserved and should not be used.
InlineR 1 The operand is a 64-bit IEEE floating point number.
InlineSig 1 The operand is a 32-bit metadata signature token.
InlineString 1 The operand is a 32-bit metadata string token.
InlineSwitch 1 The operand is the 32-bit integer argument to a switch instruction.
InlineTok 1 The operand is a 32-bit FieldRef, MethodRef, or TypeRef metadata token.
InlineType 17 The operand is a 32-bit metadata token.
InlineVar 6 The operand is a 16-bit integer containing the ordinal of a local variable or an argument.
ShortInlineBrTarget 14 The operand is an 8-bit integer branch target.
ShortInlineI 2 The operand is an 8-bit integer.
ShortInlineR 1 The operand is a 32-bit IEEE floating point number.
ShortInlineVar 6 The operand is an 8-bit integer containing the ordinal of a local variable or an argument.
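Given an opcode's operand type, the number of operand bytes that follow it can be worked out as below. This is a hypothetical helper (OperandSizes.GetOperandSize is not part of the article's framework), written under the assumption that the IL is well formed. InlineSwitch is the only variable-length case: its size depends on the branch count stored in its first four bytes.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection.Emit;

// Hypothetical helper: number of operand bytes that follow an opcode.
public static class OperandSizes
{
  // operandStart is the offset of the first operand byte (just past the opcode).
  public static int GetOperandSize(OperandType type, IReadOnlyList<byte> il, int operandStart)
  {
    switch (type)
    {
      case OperandType.InlineNone:
        return 0;
      case OperandType.ShortInlineBrTarget:
      case OperandType.ShortInlineI:
      case OperandType.ShortInlineVar:
        return 1;
      case OperandType.InlineVar:
        return 2;
      case OperandType.InlineBrTarget:
      case OperandType.InlineField:
      case OperandType.InlineI:
      case OperandType.InlineMethod:
      case OperandType.InlineSig:
      case OperandType.InlineString:
      case OperandType.InlineTok:
      case OperandType.InlineType:
      case OperandType.ShortInlineR:
        return 4;
      case OperandType.InlineI8:
      case OperandType.InlineR:
        return 8;
      case OperandType.InlineSwitch:
        // A little-endian 4-byte count N, followed by N 4-byte branch offsets.
        int count = il[operandStart] | (il[operandStart + 1] << 8)
                  | (il[operandStart + 2] << 16) | (il[operandStart + 3] << 24);
        return 4 + 4 * count;
      default:
        // InlinePhi is reserved and not used by any opcode in OpCodes.
        throw new NotSupportedException($"Unexpected operand type {type}.");
    }
  }
}
```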

The InstructionList.TryCreate method creates IInstruction instances from the data in the byte array.

A singleton type AllOpCodes reflects over the fields in System.Reflection.Emit.OpCodes to build a table of information for all available operation codes (opcodes). This information includes the numeric value of the opcode and its operand type.
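A minimal sketch of the same idea follows. The names (OpCodeTable, TryGet, Count) are hypothetical and this is not the AllOpCodes implementation itself; it simply reflects over the public static OpCode fields of System.Reflection.Emit.OpCodes and indexes them by value.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical opcode table built by reflecting over System.Reflection.Emit.OpCodes.
public static class OpCodeTable
{
  private static readonly Dictionary<ushort, OpCode> table = Build();

  private static Dictionary<ushort, OpCode> Build()
  {
    var result = new Dictionary<ushort, OpCode>();
    foreach (FieldInfo field in typeof(OpCodes).GetFields(BindingFlags.Public | BindingFlags.Static))
    {
      if (field.FieldType != typeof(OpCode))
        continue;
      var opCode = (OpCode)field.GetValue(null);
      // OpCode.Value is a short; two-byte opcodes such as ceq are negative
      // as a short, so reinterpret the bits as ushort (0xFE01).
      result[unchecked((ushort)opCode.Value)] = opCode;
    }
    return result;
  }

  public static int Count => table.Count;

  public static bool TryGet(ushort value, out OpCode opCode) =>
    table.TryGetValue(value, out opCode);
}
```

Once an opcode value has been read, a lookup yields the OpCode, and its OperandType says how to decode whatever follows.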

It is worth noting that almost all of the data is serialized as little endian values, where the least significant byte is found at the lowest offset. However, there are a couple of notable exceptions. Both opcodes and compressed integral values are stored in big endian format. The complexity of deserialization is handled by the extension methods in the Transeric.Reflection.ReadOnlyListExtensions class.
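For the little-endian operands, a minimal sketch of the reading logic might look like the following. These are hypothetical helpers, not the ReadOnlyListExtensions methods themselves. Assembling the bytes with shifts keeps the result independent of the host's byte order, and the floating-point readers reinterpret the raw IEEE bits (BitConverter.Int32BitsToSingle assumes .NET Core 2.0 or later).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical little-endian readers over IL bytes: the least significant
// byte is at the lowest offset, regardless of the host's byte order.
public static class LittleEndian
{
  public static int ReadInt32(IReadOnlyList<byte> data, int offset) =>
    data[offset]
    | (data[offset + 1] << 8)
    | (data[offset + 2] << 16)
    | (data[offset + 3] << 24);

  public static long ReadInt64(IReadOnlyList<byte> data, int offset) =>
    (uint)ReadInt32(data, offset)                 // low 32 bits, zero-extended
    | ((long)ReadInt32(data, offset + 4) << 32);  // high 32 bits

  // Reinterpret the raw IEEE 754 bits as floating point values.
  public static float ReadSingle(IReadOnlyList<byte> data, int offset) =>
    BitConverter.Int32BitsToSingle(ReadInt32(data, offset));

  public static double ReadDouble(IReadOnlyList<byte> data, int offset) =>
    BitConverter.Int64BitsToDouble(ReadInt64(data, offset));
}
```

Applied to operand bytes like those in the examples later in this article, ReadInt32 turns 00 01 00 00 into 256 and ReadDouble turns 00 00 00 00 00 00 F0 3F into 1.0.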

There is a clear intent by the architects of Intermediate Language serialization to favor compactness. This is where most of the deserialization complexity arises.

This intent is evident in simple examples, like providing a single-byte ldloc.1 instruction, when the four-byte ldloc instruction (the two-byte opcode FE 0C followed by a 16-bit index) can provide the same functionality.

It is evident in slightly more complex examples, like metadata tokens. For example, instead of repeating a method's name and parameters in every call instruction, the instruction carries a four-byte metadata token that references that information.

This complexity reaches its highest point with the calli instruction. Here, a four-byte metadata token (for the signature) is provided. This token in turn references a compressed representation of the method's signature. The focus on compactness elsewhere is laudable. However, I wonder about the trade-off of complexity versus compactness in this particular instance. More honestly, it's a real chore to write a description of signature tokens :)

Metadata Tokens

A number of opcodes have an operand that is a "32-bit metadata token". This token is essentially a unique number that can be used to locate the metadata information. The high order 8 bits of the token indicate the type of token (field, method, type, etc.). The low order 24 bits provide a unique identity within that pool of tokens.

The Transeric.Reflection.Token type, included in the code accompanying this article, makes it easy to separate the parts of a metadata token.
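The split itself is just a shift and a mask. The struct below is a hypothetical stand-in for such a token type, not the Transeric.Reflection.Token code.

```csharp
using System;

// Hypothetical token splitter: the high 8 bits give the token type,
// the low 24 bits give the identity within that type.
public readonly struct TokenParts
{
  public TokenParts(int token) => Value = token;

  public int Value { get; }

  // High-order byte, e.g. 0x70 for a string, 0x06 for a method definition.
  public byte TokenType => (byte)((uint)Value >> 24);

  // Low-order 24 bits: the unique identity within that token type.
  public int Identity => Value & 0x00FFFFFF;

  public override string ToString() => $"0x{TokenType:X2}/{Identity}";
}
```

Reading the four operand bytes 02 00 00 70 as a little-endian integer gives 0x70000002, which splits into token type 0x70 (a string) and identity 2.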

By itself, a metadata token is not very useful. It is necessary to "resolve" the token into its associated metadata information. The System.Reflection.Module class provides the following methods to accomplish this goal: ResolveField, ResolveMember, ResolveMethod, ResolveSignature, ResolveString, and ResolveType. Consider the following example:

Offset Data Description
00 02 00 00 The unique identity is 2.
03 70 The token type indicates a string token (TokenType.String).

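Taken as a whole, this data is the string token 0x70000002. For string tokens, the low 24 bits are a byte offset into the user-string (#US) heap rather than a row number, but the principle is the same: the token locates the string literal in the metadata. The System.Reflection.Module.ResolveString method is used to resolve this token as follows: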

string text = module.ResolveString(metadataToken);
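To see ResolveString working end to end, the sketch below locates an ldstr instruction (opcode 0x72) in a tiny method's IL and resolves its token. StringTokenDemo and its members are hypothetical. It takes a shortcut that only holds for this one-line method: it assumes the first 0x72 byte in the body is the ldstr opcode. A real decoder would walk the instructions one at a time.

```csharp
using System;
using System.Reflection;

// Hypothetical demo: resolve the string token of an ldstr instruction.
public static class StringTokenDemo
{
  // A one-line method whose IL is essentially "ldstr; ret".
  public static string Greeting() => "hello";

  public static string ResolveGreetingLiteral()
  {
    MethodInfo method = typeof(StringTokenDemo).GetMethod(nameof(Greeting));
    byte[] il = method.GetMethodBody().GetILAsByteArray();

    // Shortcut for this tiny body only: the first 0x72 byte is the ldstr opcode.
    int index = Array.IndexOf(il, (byte)0x72);
    int metadataToken = il[index + 1] | (il[index + 2] << 8)
                      | (il[index + 3] << 16) | (il[index + 4] << 24);

    // The high byte of a string token is 0x70.
    if ((metadataToken >> 24) != 0x70)
      throw new InvalidOperationException("Expected a string token.");
    return method.Module.ResolveString(metadataToken);
  }
}
```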

With entities that can take generic arguments, the situation is a bit more complex. Consider the following call to the DoSomething method:

public class MyType<T1>
{
  public static void MyMethod<T2>(T2 arg) =>
    DoSomething<T1, T2>();

  // Defined here only so that the example compiles.
  public static void DoSomething<TA, TB>() { }
}

To fully resolve the token for the DoSomething method, it is necessary to know both the type arguments for the enclosing type (MyType) and the type arguments for the method containing the instruction (MyMethod). Since the method containing the instruction is known, this information is easy to obtain.

To obtain the type arguments for the enclosing type (MyType), the following call is necessary:

Type[] typeArguments = parentMethod.DeclaringType.GetGenericArguments();

To obtain the type arguments for the enclosing method (MyMethod), the following call is necessary:

Type[] methodArguments = parentMethod.GetGenericArguments();

With this information, we can resolve the token for DoSomething into its corresponding method information as follows:

MethodBase method = parentMethod.Module.ResolveMethod(metadataToken, typeArguments, methodArguments);
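Putting these pieces together, the following sketch resolves the call inside MyMethod for a specific instantiation (MyType<int>.MyMethod<string>). MyType, MyMethod, and DoSomething come from the example above; GenericTokenDemo is hypothetical. It takes a shortcut that only holds for this tiny method body: it assumes the first 0x28 byte is the call opcode.

```csharp
using System;
using System.Reflection;

// The generic example from above, repeated so this sketch compiles on its own.
public class MyType<T1>
{
  public static void MyMethod<T2>(T2 arg) =>
    DoSomething<T1, T2>();

  public static void DoSomething<TA, TB>() { }
}

// Hypothetical helper: resolves the token of the call inside a generic method.
public static class GenericTokenDemo
{
  public static MethodBase ResolveCall(MethodInfo parentMethod)
  {
    byte[] il = parentMethod.GetMethodBody().GetILAsByteArray();

    // Shortcut valid only for this tiny body: the first 0x28 byte is the call
    // opcode (at most a nop precedes it). A real decoder walks each instruction.
    int index = Array.IndexOf(il, (byte)0x28);
    int metadataToken = il[index + 1] | (il[index + 2] << 8)
                      | (il[index + 3] << 16) | (il[index + 4] << 24);

    // Supply both generic contexts, as described above.
    Type[] typeArguments = parentMethod.DeclaringType.GetGenericArguments();
    Type[] methodArguments = parentMethod.GetGenericArguments();
    return parentMethod.Module.ResolveMethod(metadataToken, typeArguments, methodArguments);
  }
}
```

With the contexts supplied, the result should be DoSomething<int, string>. Calling the single-argument ResolveMethod(metadataToken) instead is documented to throw an ArgumentException when the token is a method specification whose signature refers to type parameters, as this one does.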

InlineBrTarget

Type: BranchInstruction<int>

The operand is a 32-bit signed integer that specifies the byte offset from the end of the instruction. It is initially decoded by the ReadOnlyListExtensions.ReadInt32 method, which reads the four bytes containing this value.

Later, using Transeric.Reflection.MethodIL.ResolveInstruction, this offset is resolved into an instance of Transeric.Reflection.IInstruction. This is accomplished by conducting a binary search for the instruction that occurs at that offset. Consider the following example:

Offset Data Description
00 38 The opcode indicates a branch instruction (OpCodes.Br).
01 0F 00 00 00 The offset is 15 bytes (0F) from the end of the instruction. Here, the offset from the beginning of the data is 20: 5 (end of instruction) plus 15 (branch).
05   The end of the instruction.

Opcodes (14): br (38), brfalse (39), brtrue (3A), beq (3B), bge (3C), bgt (3D), ble (3E), blt (3F), bne.un (40), bge.un (41), bgt.un (42), ble.un (43), blt.un (44), leave (DD)
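The two steps, turning the relative operand into an absolute offset and then finding the instruction that starts there, can be sketched as follows. BranchTargets is a hypothetical helper, and the array of start offsets is assumed to be sorted (which it is if it is built while decoding the IL in order).

```csharp
using System;

// Hypothetical helpers illustrating branch resolution.
public static class BranchTargets
{
  // Branch operands are relative to the END of the branch instruction.
  public static int GetTargetOffset(int instructionOffset, int instructionSize, int relativeOffset) =>
    instructionOffset + instructionSize + relativeOffset;

  // Binary search over the sorted start offsets of all instructions; returns
  // the index of the instruction at targetOffset, or -1 if none starts there
  // (which would indicate malformed IL).
  public static int FindInstructionIndex(int[] sortedStartOffsets, int targetOffset)
  {
    int index = Array.BinarySearch(sortedStartOffsets, targetOffset);
    return index >= 0 ? index : -1;
  }
}
```

For the example above, GetTargetOffset(0, 5, 15) returns 20. The same arithmetic applies to the short branch forms, where the instruction is only two bytes long.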

InlineField

Type: FieldInstruction

The operand is a 32-bit metadata token. It is initially decoded by the ReadOnlyListExtensions.ReadToken method, which reads the four bytes containing this value. Later, using System.Reflection.Module.ResolveField, the token is resolved into an instance of System.Reflection.FieldInfo. Consider the following example:

Offset Data Description
00 7B The opcode indicates a load field instruction (OpCodes.Ldfld).
01 02 00 00 The unique identity of the metadata token is 2.
04 04 The token type indicates a field definition (TokenType.FieldDef).

Opcodes (6): ldfld (7B), ldflda (7C), stfld (7D), ldsfld (7E), ldsflda (7F), stsfld (80)

InlineI

Type: Instruction<int>

The operand is a signed 32-bit integer. It is decoded by the ReadOnlyListExtensions.ReadInt32 method, which reads the four bytes containing this value. Because no metadata is referenced, no resolution is required. Consider the following example:

Offset Data Description
00 20 The opcode indicates a load 32-bit constant instruction (OpCodes.Ldc_I4).
01 00 01 00 00 The value 256 is loaded.

Opcode (1): ldc.i4 (20)

InlineI8

Type: Instruction<long>

The operand is a signed 64-bit integer. It is decoded by the ReadOnlyListExtensions.ReadInt64 method, which reads the eight bytes containing this value. Because no metadata is referenced, no resolution is required. Consider the following example:

Offset Data Description
00 21 The opcode indicates a load 64-bit constant instruction (OpCodes.Ldc_I8).
01 00 01 00 00 00 00 00 00 The value 256 is loaded.

Opcode (1): ldc.i8 (21)

InlineMethod

Type: MethodInstruction

The operand is a 32-bit metadata token. It is initially decoded by the ReadOnlyListExtensions.ReadToken method, which reads the four bytes containing this value. Later, using System.Reflection.Module.ResolveMethod, the token is resolved into an instance of System.Reflection.MethodBase.

Offset Data Description
00 28 The opcode indicates a call instruction (OpCodes.Call).
01 02 00 00 The unique identity of the metadata token is 2.
04 06 The token type indicates a method definition (TokenType.MethodDef).

Opcodes (6): jmp (27), call (28), callvirt (6F), newobj (73), ldftn (FE 06), ldvirtftn (FE 07)

InlineNone

Types: Instruction, ParameterInstruction<byte>, or VariableInstruction<byte>

Since there is no operand, this type is the simplest to decode. It also has the largest number of opcodes (147) associated with it. Consider the following example:

Offset Data Description
00 2A The opcode indicates a return instruction (OpCodes.Ret).

While none of these opcodes have an operand, many do have an implied operand. To simplify reflection, in the cases where the instruction is associated with a parameter or local variable, the code will create an instance of ParameterInstruction<byte> or VariableInstruction<byte>, behaving as if the implied operand were present.

For example, the ldloc.1 instruction (below) implies an operand of "1". For this reason, the code will create an instance of VariableInstruction<byte>, so that the operand's value will resolve into an instance of System.Reflection.LocalVariableInfo.

Offset Data Description
00 07 The opcode indicates a load local variable instruction (OpCodes.Ldloc_1).

Similarly, the ldarg.1 instruction (below) also implies an operand of "1". For this reason, the code will create an instance of ParameterInstruction<byte>, so that the operand's value will resolve into an instance of System.Reflection.ParameterInfo.

Offset Data Description
00 03 The opcode indicates a load argument instruction (OpCodes.Ldarg_1).

For information on how local variables and parameters are resolved, see the description of the operand type ShortInlineVar.
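The mapping from these short forms to their implied operands is small enough to write out as a table. The sketch below is a hypothetical lookup (ImpliedOperands is not the framework's code) covering ldarg.0 to ldarg.3, ldloc.0 to ldloc.3, and stloc.0 to stloc.3. Keep in mind that for an instance method, argument 0 is the implicit this.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection.Emit;

// Hypothetical lookup of the implied operand for short-form argument and
// local-variable opcodes.
public static class ImpliedOperands
{
  private static readonly Dictionary<OpCode, (bool IsArgument, int Index)> Map =
    new Dictionary<OpCode, (bool IsArgument, int Index)>
    {
      [OpCodes.Ldarg_0] = (true, 0),  [OpCodes.Ldarg_1] = (true, 1),
      [OpCodes.Ldarg_2] = (true, 2),  [OpCodes.Ldarg_3] = (true, 3),
      [OpCodes.Ldloc_0] = (false, 0), [OpCodes.Ldloc_1] = (false, 1),
      [OpCodes.Ldloc_2] = (false, 2), [OpCodes.Ldloc_3] = (false, 3),
      [OpCodes.Stloc_0] = (false, 0), [OpCodes.Stloc_1] = (false, 1),
      [OpCodes.Stloc_2] = (false, 2), [OpCodes.Stloc_3] = (false, 3),
    };

  // Returns true when opCode carries an implied argument or local index.
  public static bool TryGetImplied(OpCode opCode, out bool isArgument, out int index)
  {
    bool found = Map.TryGetValue(opCode, out var entry);
    isArgument = entry.IsArgument; // (false, 0) when not found
    index = entry.Index;
    return found;
  }
}
```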

Opcodes (147): nop (00), break (01), ldarg.0 (02), ldarg.1 (03), ldarg.2 (04), ldarg.3 (05), ldloc.0 (06), ldloc.1 (07), ldloc.2 (08), ldloc.3 (09), stloc.0 (0A), stloc.1 (0B), stloc.2 (0C), stloc.3 (0D), ldnull (14), ldc.i4.m1 (15), ldc.i4.0 (16), ldc.i4.1 (17), ldc.i4.2 (18), ldc.i4.3 (19), ldc.i4.4 (1A), ldc.i4.5 (1B), ldc.i4.6 (1C), ldc.i4.7 (1D), ldc.i4.8 (1E), dup (25), pop (26), ret (2A), ldind.i1 (46), ldind.u1 (47), ldind.i2 (48), ldind.u2 (49), ldind.i4 (4A), ldind.u4 (4B), ldind.i8 (4C), ldind.i (4D), ldind.r4 (4E), ldind.r8 (4F), ldind.ref (50), stind.ref (51), stind.i1 (52), stind.i2 (53), stind.i4 (54), stind.i8 (55), stind.r4 (56), stind.r8 (57), add (58), sub (59), mul (5A), div (5B), div.un (5C), rem (5D), rem.un (5E), and (5F), or (60), xor (61), shl (62), shr (63), shr.un (64), neg (65), not (66), conv.i1 (67), conv.i2 (68), conv.i4 (69), conv.i8 (6A), conv.r4 (6B), conv.r8 (6C), conv.u4 (6D), conv.u8 (6E), conv.r.un (76), throw (7A), conv.ovf.i1.un (82), conv.ovf.i2.un (83), conv.ovf.i4.un (84), conv.ovf.i8.un (85), conv.ovf.u1.un (86), conv.ovf.u2.un (87), conv.ovf.u4.un (88), conv.ovf.u8.un (89), conv.ovf.i.un (8A), conv.ovf.u.un (8B), ldlen (8E), ldelem.i1 (90), ldelem.u1 (91), ldelem.i2 (92), ldelem.u2 (93), ldelem.i4 (94), ldelem.u4 (95), ldelem.i8 (96), ldelem.i (97), ldelem.r4 (98), ldelem.r8 (99), ldelem.ref (9A), stelem.i (9B), stelem.i1 (9C), stelem.i2 (9D), stelem.i4 (9E), stelem.i8 (9F), stelem.r4 (A0), stelem.r8 (A1), stelem.ref (A2), conv.ovf.i1 (B3), conv.ovf.u1 (B4), conv.ovf.i2 (B5), conv.ovf.u2 (B6), conv.ovf.i4 (B7), conv.ovf.u4 (B8), conv.ovf.i8 (B9), conv.ovf.u8 (BA), ckfinite (C3), conv.u2 (D1), conv.u1 (D2), conv.i (D3), conv.ovf.i (D4), conv.ovf.u (D5), add.ovf (D6), add.ovf.un (D7), mul.ovf (D8), mul.ovf.un (D9), sub.ovf (DA), sub.ovf.un (DB), endfinally (DC), stind.i (DF), conv.u (E0), prefix7 (F8), prefix6 (F9), prefix5 (FA), prefix4 (FB), prefix3 (FC), prefix2 (FD), prefix1 (FE), prefixref (FF), arglist (FE 00), ceq (FE 01), cgt (FE 02), cgt.un (FE 03), clt (FE 04), clt.un (FE 05), localloc (FE 0F), endfilter (FE 11), volatile. (FE 13), tail. (FE 14), cpblk (FE 17), initblk (FE 18), rethrow (FE 1A), refanytype (FE 1D), readonly. (FE 1E)

InlinePhi

Type: None

None of the opcodes in System.Reflection.Emit.OpCodes reference this operand type. According to the documentation for OperandType.InlinePhi: "The operand is reserved and should not be used".

InlineR

Type: Instruction<double>

The operand is a 64-bit IEEE floating point number. It is decoded by the ReadOnlyListExtensions.ReadDouble method, which reads the eight bytes containing this value. Because no metadata is referenced, no resolution is required. Consider the following example:

Offset Data Description
00 23 The opcode indicates a load 64-bit floating point constant instruction (OpCodes.Ldc_R8).
01 00 00 00 00 00 00 F0 3F The 64-bit floating point number 1.0 is loaded.

Opcode (1): ldc.r8 (23)

InlineSig

Type: SignatureInstruction

The operand is a 32-bit metadata token. It is initially decoded by the ReadOnlyListExtensions.ReadToken method, which reads the four bytes containing this value. Later, using System.Reflection.Module.ResolveSignature, the token is resolved into a byte array containing the signature's data. Since .NET does not provide much help decoding this byte array, it is further resolved by the Transeric.Reflection.MethodSignature class.

Offset Data Description
00 29 The opcode indicates an indirect call instruction (OpCodes.Calli).
01 02 00 00 The unique identity of the metadata token is 2.
04 11 The token type indicates a signature (TokenType.Signature).

Bear with me, this one is difficult to explain. It took me a very long time to figure it all out.

Here, System.Reflection.Module.ResolveSignature simply returns a byte array that is a compressed representation of the target method's signature. We're largely on our own when we try to decode this byte array into something meaningful.

At a high level, the method signature is simple. It provides: a calling convention, a parameter count, a return type, and a sequence of zero or more parameter types.

The calling convention is simple to decode. It's always a single byte, so there is no need to worry about compression. The possible values are described in the Transeric.Reflection.CilCallingConvention enumeration.

The framework described by this article further simplifies interaction by converting this value into the .NET standard System.Runtime.InteropServices.CallingConvention and System.Reflection.CallingConventions enumerations.

For the integral values in the rest of the method signature (such as the parameter count and encoded type tokens), we do need to worry about compression. To maximize compactness, the architects of IL serialization devised a fairly simple compression scheme. An integral value can be stored in one, two, or four bytes. The bytes are serialized in big endian order. The high-order bits of the first byte describe the length of the value, and the remaining bits provide data. There are three possible forms of the value, which are as follows:

Bit Pattern Description
0XXXXXXX The first bit is clear, indicating a single-byte value. The remaining 7 bits contain the data for that value.
10XXXXXX XXXXXXXX The first two bits (10) indicate a two-byte value. The remaining 14 bits contain the data for that value.
110XXXXX XXXXXXXX XXXXXXXX XXXXXXXX The first three bits (110) indicate a four-byte value. The remaining 29 bits contain the data for that value.
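A minimal decoder for these three forms might look like the following. CompressedInteger.Read is a hypothetical helper written from the bit patterns for compressed integers in ECMA-335 (Partition II, section 23.2), not the article's implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical decoder for compressed unsigned integers in signatures.
// The bytes are big endian; the lead bits of the first byte give the length.
public static class CompressedInteger
{
  public static uint Read(IReadOnlyList<byte> data, ref int offset)
  {
    byte first = data[offset];
    if ((first & 0x80) == 0)          // 0XXXXXXX: one byte, 7 data bits
    {
      offset += 1;
      return first;
    }
    if ((first & 0xC0) == 0x80)       // 10XXXXXX: two bytes, 14 data bits
    {
      uint value = ((uint)(first & 0x3F) << 8) | data[offset + 1];
      offset += 2;
      return value;
    }
    if ((first & 0xE0) == 0xC0)       // 110XXXXX: four bytes, 29 data bits
    {
      uint value = ((uint)(first & 0x1F) << 24)
                 | ((uint)data[offset + 1] << 16)
                 | ((uint)data[offset + 2] << 8)
                 | data[offset + 3];
      offset += 4;
      return value;
    }
    throw new FormatException($"Invalid compressed integer lead byte 0x{first:X2}.");
  }
}
```

For instance, 03 decodes to 3, 80 80 decodes to 0x80, and C0 00 40 00 decodes to 0x4000.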

Using the above scheme, we first decode the parameter count. This is simply the number of parameter types that are provided with the method signature.

Next we decode the return type and each of the parameter types. The process for both is identical and (regrettably) complex.

Types come in two broad flavors: simple and complex.

To decode a simple type, we first read a byte representing the type. This byte is interpreted using the enumeration Transeric.Reflection.ElementType. The enumeration recognizes the following common/simple types.

ElementType Value Description
Void 01 A "void" type (System.Void).
Boolean 02 A Boolean type (System.Boolean).
Char 03 A character type (System.Char).
SByte 04 A signed 8-bit integer type (System.SByte).
Byte 05 An unsigned 8-bit integer type (System.Byte).
Int16 06 A signed 16-bit integer type (System.Int16).
UInt16 07 An unsigned 16-bit integer type (System.UInt16).
Int32 08 A signed 32-bit integer type (System.Int32).
UInt32 09 An unsigned 32-bit integer type (System.UInt32).
Int64 0A A signed 64-bit integer type (System.Int64).
UInt64 0B An unsigned 64-bit integer type (System.UInt64).
Single 0C A 32-bit IEEE floating point number type (System.Single).
Double 0D A 64-bit IEEE floating point number type (System.Double).
String 0E A character string type (System.String).
TypedReference 16 A typed reference type (System.TypedReference).
IntPtr 18 A platform-specific signed integral type that is used to represent a pointer or a handle (System.IntPtr).
UIntPtr 19 A platform-specific unsigned integral type that is used to represent a pointer or a handle (System.UIntPtr).
Object 1C An object type that can be used to pass any type (System.Object).

For these simple types, this single byte is all that is necessary to serialize the type.

For complex types (where the byte value is ElementType.Class or ElementType.ValueType), we need to do more work. In these cases, an additional value is provided: an encoded metadata token for the type. We begin by decompressing an integral value.

Regrettably, the work doesn't end there. The integral value is further encoded. The two lowest-order bits indicate the type of token and are interpreted as follows:

Bits Description
00 A type definition metadata token (TokenType.TypeDef).
01 A type reference metadata token (TokenType.TypeRef).
10 A type specification metadata token (TokenType.TypeSpec).
11 This is not an expected, valid code.

The remaining bits (when shifted down) provide the unique identity of the token. After we have decoded the metadata token, we can then resolve it into a System.Type, by using the System.Reflection.Module.ResolveType method. Let's consider a not-so-simple example:

Offset Data Description
00 00 Indicates a standard call (CilCallingConvention.Standard).
01 02 There are two parameters / parameter types.
02 08 The return type is a signed 32-bit integer (ElementType.Int32).
03 0E The first parameter is a character string (ElementType.String).
04 12 The second parameter is a class (ElementType.Class).
05 08 The encoded metadata token for the class is 08.

To decode the token in the above example, we consider the bits in the encoded token (00001000). The two low-order bits (00) indicate that the token is a type definition (TokenType.TypeDef). The remaining bits (000010) provide the unique identity of that type definition (2). The decoded metadata token is as follows:

Offset Data Description
00 02 00 00 The unique identity of the metadata token is 2.
03 02 The token type indicates a type definition (TokenType.TypeDef).

We can then use the System.Reflection.Module.ResolveType method to resolve this metadata token into a System.Type.
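The decoding step described above can be sketched as follows. TypeDefOrRef.ToMetadataToken is a hypothetical helper; it expects the value that has already been decompressed, and it uses the standard table bytes 0x02 (TypeDef), 0x01 (TypeRef), and 0x1B (TypeSpec) for the high-order byte of the resulting token.

```csharp
using System;

// Hypothetical decoder for an encoded type token: the two low-order bits
// select the token type, and the remaining bits are the identity.
public static class TypeDefOrRef
{
  public static int ToMetadataToken(uint codedIndex)
  {
    uint identity = codedIndex >> 2;
    switch (codedIndex & 0x3)
    {
      case 0: return (int)(0x02000000u | identity); // TypeDef
      case 1: return (int)(0x01000000u | identity); // TypeRef
      case 2: return (int)(0x1B000000u | identity); // TypeSpec
      default:
        throw new FormatException("Bits 11 are not a valid type token code.");
    }
  }
}
```

For the encoded value 08 from the example, it returns 0x02000002, which could then be passed to Module.ResolveType.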

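One last bit of additional complexity comes into play. It is possible to indicate that some of the parameters are optional, extra arguments supplied only at the call site of a variable-argument (vararg) call. This is accomplished by placing a value of ElementType.Sentinel before the first optional parameter. So, modifying the previous example, we make the second parameter optional as follows (because a sentinel marks where the fixed parameters end and the variable arguments begin, the calling convention byte would normally be vararg, 0x05, rather than standard):

Offset Data Description
00 05 Indicates a variable-argument call (the vararg calling convention).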
01 02 There are two parameters / parameter types.
02 08 The return type is a signed 32-bit integer (ElementType.Int32).
03 0E The first parameter is a character string (ElementType.String).
04 41 The sentinel value (ElementType.Sentinel) indicates that all subsequent parameters are optional.
05 12 The second parameter is a class (ElementType.Class).
06 08 The encoded metadata token for the class is 08.

Opcode (1): calli (29)
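To tie the signature pieces together, here is a sketch that walks a signature blob limited to what this section covers: the calling convention byte, the compressed parameter count, the simple element types, Class/ValueType with an encoded type token, and the sentinel. SignatureWalker is hypothetical, and it deliberately rejects other element types (arrays, pointers, generic instantiations, custom modifiers, and so on) that a complete decoder would need to handle.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical signature walker covering only the cases described in this section.
public static class SignatureWalker
{
  private const byte ClassElement = 0x12, ValueTypeElement = 0x11, Sentinel = 0x41;
  private static readonly uint[] TokenTables = { 0x02, 0x01, 0x1B }; // TypeDef, TypeRef, TypeSpec

  private static readonly Dictionary<byte, Type> SimpleTypes = new Dictionary<byte, Type>
  {
    [0x01] = typeof(void), [0x02] = typeof(bool), [0x03] = typeof(char),
    [0x04] = typeof(sbyte), [0x05] = typeof(byte), [0x06] = typeof(short),
    [0x07] = typeof(ushort), [0x08] = typeof(int), [0x09] = typeof(uint),
    [0x0A] = typeof(long), [0x0B] = typeof(ulong), [0x0C] = typeof(float),
    [0x0D] = typeof(double), [0x0E] = typeof(string), [0x16] = typeof(TypedReference),
    [0x18] = typeof(IntPtr), [0x19] = typeof(UIntPtr), [0x1C] = typeof(object),
  };

  public static (byte CallingConvention, string ReturnType, List<string> Parameters, int FirstVarArg)
    Walk(byte[] blob)
  {
    int offset = 0;
    byte callingConvention = blob[offset++];
    uint count = ReadCompressed(blob, ref offset);
    string returnType = ReadType(blob, ref offset);

    var parameters = new List<string>();
    int firstVarArg = -1; // index of the first parameter after the sentinel, if any
    while (parameters.Count < count)
    {
      if (blob[offset] == Sentinel)
      {
        firstVarArg = parameters.Count;
        offset++;
        continue;
      }
      parameters.Add(ReadType(blob, ref offset));
    }
    return (callingConvention, returnType, parameters, firstVarArg);
  }

  private static string ReadType(byte[] blob, ref int offset)
  {
    byte element = blob[offset++];
    if (SimpleTypes.TryGetValue(element, out Type simple))
      return simple.FullName;

    if (element == ClassElement || element == ValueTypeElement)
    {
      uint coded = ReadCompressed(blob, ref offset);
      uint tag = coded & 0x3;
      if (tag == 3)
        throw new FormatException("Invalid type token code.");
      uint token = (TokenTables[tag] << 24) | (coded >> 2);
      return (element == ClassElement ? "class" : "valuetype") + $" 0x{token:X8}";
    }
    throw new NotSupportedException($"Element type 0x{element:X2} is not handled by this sketch.");
  }

  private static uint ReadCompressed(byte[] blob, ref int offset)
  {
    byte first = blob[offset];
    if ((first & 0x80) == 0) { offset += 1; return first; }
    if ((first & 0xC0) == 0x80)
    {
      uint two = ((uint)(first & 0x3F) << 8) | blob[offset + 1];
      offset += 2;
      return two;
    }
    uint four = ((uint)(first & 0x1F) << 24) | ((uint)blob[offset + 1] << 16)
              | ((uint)blob[offset + 2] << 8) | blob[offset + 3];
    offset += 4;
    return four;
  }
}
```

Fed the bytes from the first example, it reports a System.Int32 return type and the parameters System.String and class 0x02000002; with the sentinel inserted before the second parameter, FirstVarArg becomes 1.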

InlineString

Type: StringInstruction

The operand is a 32-bit metadata token. It is initially decoded by the ReadOnlyListExtensions.ReadToken method, which reads the four bytes containing this value. Later, using System.Reflection.Module.ResolveString, the token is resolved into an instance of System.String. Consider the following example:

Offset Data Description
00 72 The opcode indicates a load string instruction (OpCodes.Ldstr).
01 02 00 00 The unique identity of the metadata token is 2.
04 70 The token type indicates a string (TokenType.String).

Opcode (1): ldstr (72)

InlineSwitch

Type: SwitchInstruction

This is one of the few cases where the operand is of variable size and consists of multiple parts.

The first part of the operand is a 32-bit integer (unsigned, according to the CLI specification), which indicates the number of branches associated with this switch instruction. It is decoded by the ReadOnlyListExtensions.ReadInt32 method, which reads the four bytes containing the value.

After this, that number of branch offsets is provided. Each branch offset is a signed 32-bit integer that provides the byte offset from the end of the entire instruction (including all of the branch offsets). Each of these is initially decoded by the ReadOnlyListExtensions.ReadInt32 method, which reads the four bytes containing the value.

Later, using Transeric.Reflection.MethodIL.ResolveInstruction, each offset is resolved into an instance of Transeric.Reflection.IInstruction. This is accomplished by conducting a binary search for the instruction that occurs at that offset. Consider the following example:

Offset Data Description
00 45 The opcode indicates a switch instruction (OpCodes.Switch).
01 02 00 00 00 There are 2 branch offsets for this switch instruction.
05 0E 00 00 00 The first branch offset is 14 bytes (0E) from the end of the instruction. Here, the offset from the beginning of the data is 27: 13 (end of instruction) plus 14 (branch).
09 0F 00 00 00 The second branch offset is 15 bytes (0F) from the end of the instruction. Here, the offset from the beginning of the data is 28: 13 (end of instruction) plus 15 (branch).
0D   End of instruction (13 bytes from the beginning of the data).

Opcode (1): switch (45)
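The arithmetic for the example above can be sketched as follows. SwitchOperand.DecodeTargets is a hypothetical helper; it assumes the count is small enough to treat as a non-negative int and, like the example, measures each branch from the end of the whole instruction.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical decoder for a switch instruction's operand.
public static class SwitchOperand
{
  // instructionOffset is where the 0x45 opcode byte sits. Returns the absolute
  // offsets of all branch targets; each relative offset is measured from the
  // end of the whole instruction (opcode + count + all targets).
  public static int[] DecodeTargets(IReadOnlyList<byte> il, int instructionOffset, out int instructionSize)
  {
    int offset = instructionOffset + 1;          // skip the opcode
    int count = ReadInt32(il, offset);
    offset += 4;
    instructionSize = 1 + 4 + 4 * count;
    int end = instructionOffset + instructionSize;

    var targets = new int[count];
    for (int i = 0; i < count; i++)
      targets[i] = end + ReadInt32(il, offset + 4 * i);
    return targets;
  }

  private static int ReadInt32(IReadOnlyList<byte> il, int offset) =>
    il[offset] | (il[offset + 1] << 8) | (il[offset + 2] << 16) | (il[offset + 3] << 24);
}
```

For the bytes in the example, it reports an instruction size of 13 and targets at offsets 27 and 28.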

InlineTok

Types: FieldInstruction, MemberInstruction, MethodInstruction, SignatureInstruction, StringInstruction, TypeInstruction

The operand is a 32-bit metadata token. It is initially decoded by the ReadOnlyListExtensions.ReadToken method, which reads the four bytes containing this value. Depending on the metadata token type, an instance of one of the following types will be created: FieldInstruction, MemberInstruction, MethodInstruction, SignatureInstruction, StringInstruction, or TypeInstruction. The later resolution of the token into its corresponding metadata information is dependent on that type. Consider the following example:

Offset Data Description
00 D0 The opcode indicates a load token instruction (OpCodes.Ldtoken).
01 02 00 00 The unique identity of the metadata token is 2.
04 04 The token type indicates a field definition (TokenType.FieldDef).

Since the token type is TokenType.FieldDef, an instance of FieldInstruction is created.

Opcode (1): ldtoken (D0)
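Because an ldtoken operand usually names a field, method, or type, Module.ResolveMember is a convenient way to resolve it before deciding which instruction type applies. The sketch below (LdtokenDemo is hypothetical) uses it and then inspects the kind of MemberInfo returned. ResolveMember does not resolve string or signature tokens; those still need ResolveString or ResolveSignature.

```csharp
using System;
using System.Reflection;

// Hypothetical demo: resolve a member token without first checking its type.
public static class LdtokenDemo
{
  public static int SampleField;

  public static string Describe(Module module, int metadataToken)
  {
    MemberInfo member = module.ResolveMember(metadataToken);
    switch (member)
    {
      case FieldInfo field: return "field " + field.Name;
      case MethodBase method: return "method " + method.Name;
      case Type type: return "type " + type.Name;
      default: return "other " + member.MemberType;
    }
  }
}
```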

InlineType

Type: TypeInstruction

The operand is a 32-bit metadata token. It is initially decoded by the ReadOnlyListExtensions.ReadToken method, which reads the four bytes containing this value. Later, using System.Reflection.Module.ResolveType, the token is resolved into an instance of System.Type. Consider the following example:

Offset Data Description
00 8C The opcode indicates a box instruction (OpCodes.Box).
01 02 00 00 The unique identity of the metadata token is 2.
04 02 The token type indicates a type definition (TokenType.TypeDef).

Opcodes (17): cpobj (70), ldobj (71), castclass (74), isinst (75), unbox (79), stobj (81), box (8C), newarr (8D), ldelema (8F), ldelem (A3), stelem (A4), unbox.any (A5), refanyval (C2), mkrefany (C6), initobj (FE 15), constrained. (FE 16), sizeof (FE 1C)

InlineVar

Types: ParameterInstruction<ushort> or VariableInstruction<ushort>

The operand is an unsigned 16-bit integer. It is initially decoded by the ReadOnlyListExtensions.ReadUInt16 method, which reads the two bytes containing this value. Depending on the instruction, an instance of either ParameterInstruction<ushort> or VariableInstruction<ushort> will be created.

Note: It would be nice if System.Reflection.Emit.OperandType defined separate operand types (e.g. InlineArg and InlineVar) for these two distinct kinds of operand. Regrettably, it does not.

ParameterInstruction<ushort>

A ParameterInstruction<ushort> instance is created for the instructions ldarg, ldarga, and starg. Later, the operand is resolved via Transeric.Reflection.MethodIL.ResolveParameter into an instance of ParameterInfo. It accomplishes this by using the System.Reflection.MethodBase.GetParameters method. Since GetParameters does not return the this argument, the index value is interpreted according to the containing method's calling convention (notably CallingConventions.HasThis). Consider the following example:

Offset Data Description
00 FE 09 The opcode indicates a load argument instruction (OpCodes.Ldarg).
02 01 00 The zero-based index value (1) indicates the second argument (for an instance method, the first declared parameter, since argument 0 is this).
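The adjustment for the implicit this argument can be sketched as follows. ArgumentResolver is hypothetical (it is not MethodIL.ResolveParameter), and it ignores the rare CallingConventions.ExplicitThis case.

```csharp
using System;
using System.Reflection;

// Hypothetical helper: maps an IL argument index (as used by ldarg, ldarga,
// and starg) to a ParameterInfo. Returns null for the implicit 'this'.
public class ArgumentResolver
{
  public static ParameterInfo ResolveArgument(MethodBase method, int argumentIndex)
  {
    ParameterInfo[] parameters = method.GetParameters();

    // Instance methods receive 'this' as argument 0, but GetParameters()
    // does not include it, so shift the index down by one.
    if ((method.CallingConvention & CallingConventions.HasThis) != 0)
    {
      if (argumentIndex == 0)
        return null;
      argumentIndex--;
    }
    return parameters[argumentIndex];
  }

  // Sample methods used to try the resolver out.
  public void InstanceSample(int x) { }
  public static void StaticSample(int y) { }
}
```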

VariableInstruction<ushort>

A VariableInstruction<ushort> instance is created for the instructions ldloc, ldloca, and stloc. Later, the operand is resolved via Transeric.Reflection.MethodIL.ResolveVariable into an instance of LocalVariableInfo. It accomplishes this by using the MethodBody.LocalVariables property. Consider the following example:

Offset Data Description
00 FE 0C The opcode indicates a load local variable instruction (OpCodes.Ldloc).
02 01 00 The zero-based index value (1) indicates the second local variable.

Opcodes (6): ldarg (FE 09), ldarga (FE 0A), starg (FE 0B), ldloc (FE 0C), ldloca (FE 0D), stloc (FE 0E)

ShortInlineBrTarget

Type: BranchInstruction<sbyte>

The operand is an 8-bit signed integer that specifies the byte offset from the end of the instruction. It is initially decoded by the ReadOnlyListExtensions.ReadSByte method, which reads the byte containing this value.

Later, using Transeric.Reflection.MethodIL.ResolveInstruction, this offset is resolved into an instance of Transeric.Reflection.IInstruction. This is accomplished by conducting a binary search for the instruction that occurs at that offset. Consider the following example:

Offset Data Description
00 2B The opcode indicates a branch instruction (OpCodes.Br_S).
01 0F The offset is 15 bytes (0F) from the end of the instruction. Here, the offset from the beginning of the data is 17: 2 (end of instruction) plus 15 (branch).
02   The end of the instruction.
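The resolution described above (compute the target offset, then binary-search the decoded instructions) can be sketched like this. The instruction offsets are made-up example data, and BranchTarget and FindInstructionIndex are hypothetical helpers rather than the framework's ResolveInstruction.

```csharp
using System;

// Start offsets of each decoded instruction, in increasing order (as produced
// by decoding the byte array front to back). Hypothetical example data.
int[] instructionOffsets = { 0, 2, 7, 12, 17, 18 };

// A short branch is a 1-byte opcode plus a 1-byte signed operand, and the
// operand is relative to the end of the instruction, i.e. offset + 2.
int BranchTarget(int branchOffset, sbyte delta) => branchOffset + 2 + delta;

// Array.BinarySearch returns the matching index, or a negative value when
// no instruction starts at the target (which would indicate malformed IL).
int FindInstructionIndex(int[] offsets, int targetOffset)
{
    int index = Array.BinarySearch(offsets, targetOffset);
    if (index < 0)
        throw new InvalidOperationException($"No instruction starts at offset {targetOffset}.");
    return index;
}

byte[] il = { 0x2B, 0x0F }; // br.s +15, at offset 0
int target = BranchTarget(0, unchecked((sbyte)il[1]));
Console.WriteLine($"target offset {target}, instruction #{FindInstructionIndex(instructionOffsets, target)}");
// prints: target offset 17, instruction #4
```

Binary search works here because instructions are decoded in order, so their offsets are already sorted.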

Opcodes (14): br.s (2B), brfalse.s (2C), brtrue.s (2D), beq.s (2E), bge.s (2F), bgt.s (30), ble.s (31), blt.s (32), bne.un.s (33), bge.un.s (34), bgt.un.s (35), ble.un.s (36), blt.un.s (37), leave.s (DE)

ShortInlineI

Type: Instruction<byte> or Instruction<sbyte>

The operand is an 8-bit integer. Depending on the instruction, it is decoded by either ReadOnlyListExtensions.ReadByte (instruction unaligned.) or ReadOnlyListExtensions.ReadSByte (instruction ldc.i4.s), which reads the byte containing this value. Because no metadata is referenced, no resolution is required. Consider the following example:

Offset Data Description
00 1F The opcode indicates a load 8-bit constant instruction (OpCodes.Ldc_I4_S).
01 FF The value -1 is loaded.
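The signed/unsigned distinction can be illustrated with a small sketch. ReadShortInlineI is a hypothetical helper standing in for the ReadByte / ReadSByte choice described above.

```csharp
using System;
using System.Reflection.Emit;

// ShortInlineI operands are one byte. ldc.i4.s interprets the byte as signed
// (the value is then pushed as a 32-bit int); unaligned. interprets it as an
// unsigned alignment value.
int ReadShortInlineI(OpCode opCode, byte operand) =>
    opCode == OpCodes.Ldc_I4_S ? (int)unchecked((sbyte)operand) : (int)operand;

Console.WriteLine(ReadShortInlineI(OpCodes.Ldc_I4_S, 0xFF));  // -1
Console.WriteLine(ReadShortInlineI(OpCodes.Unaligned, 0x01)); // 1
```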

Opcodes (2): ldc.i4.s (1F), unaligned. (FE 12)

ShortInlineR

Type: Instruction<float>

The operand is a 32-bit IEEE 754 floating-point number. It is decoded by the ReadOnlyListExtensions.ReadSingle method, which reads the four bytes containing this value. Because no metadata is referenced, no resolution is required. Consider the following example:

Offset Data Description
00 22 The opcode indicates a load 32-bit constant instruction (OpCodes.Ldc_R4).
01 00 00 80 3F The 32-bit IEEE floating-point number 1.0 (0x3F800000, stored little-endian) is loaded.
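A sketch of decoding the example bytes, assuming .NET 5 or later for BinaryPrimitives.ReadSingleLittleEndian. On older frameworks, BitConverter.ToSingle(il, 1) gives the same result when BitConverter.IsLittleEndian is true.

```csharp
using System;
using System.Buffers.Binary;

// ldc.r4 (22) followed by a four-byte IEEE 754 single-precision operand.
// CIL stores multi-byte operands in little-endian order.
byte[] il = { 0x22, 0x00, 0x00, 0x80, 0x3F };

float value = BinaryPrimitives.ReadSingleLittleEndian(il.AsSpan(1, 4));
Console.WriteLine(value); // 1
```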

Opcodes (1): ldc.r4 (22)

ShortInlineVar

Types: ParameterInstruction<byte> or VariableInstruction<byte>

The operand is an unsigned 8-bit integer. It is initially decoded by the ReadOnlyListExtensions.ReadByte method, which reads the byte containing this value. Depending on the instruction, an instance of either ParameterInstruction<byte> or VariableInstruction<byte> is created.

Note: It would be nice if System.Reflection.Emit.OperandType defined separate operand types (e.g. ShortInlineArg and ShortInlineVar) for these two distinct kinds of operand. Regrettably, it does not.

ParameterInstruction<byte>

A ParameterInstruction<byte> instance is created for the instructions ldarg.s, ldarga.s, and starg.s. Later, the operand is resolved via Transeric.Reflection.MethodIL.ResolveParameter into an instance of ParameterInfo. It accomplishes this by using the System.Reflection.MethodBase.GetParameters method. Since GetParameters does not return the implicit this argument, the index value is interpreted according to the containing method's calling convention (notably CallingConventions.HasThis): for an instance method, index 0 refers to this, and index 1 refers to the first entry returned by GetParameters. Consider the following example:

Offset Data Description
00 0E The opcode indicates a load argument instruction (OpCodes.Ldarg_S).
01 01 The zero-based 8-bit index value (1) indicates the second argument (for an instance method, where argument 0 is this, that is the first declared parameter).

VariableInstruction<byte>

A VariableInstruction<byte> instance is created for the instructions ldloc.s, ldloca.s, and stloc.s. Later the operand is resolved via Transeric.Reflection.MethodIL.ResolveVariable into an instance of LocalVariableInfo. It accomplishes this by using the MethodBody.LocalVariables property. Consider the following example:

Offset Data Description
00 11 The opcode indicates a load local variable instruction (OpCodes.Ldloc_S).
01 01 The zero-based 8-bit index value (1) indicates the second local variable.

Opcodes (6): ldarg.s (0E), ldarga.s (0F), starg.s (10), ldloc.s (11), ldloca.s (12), stloc.s (13)

Introduction to .NET Reflection

It's a bit unlikely that a reader interested in this topic will also be a beginner with reflection. If you are, there are undoubtedly far better articles on the subject. That said, I would feel bad if I didn't at least provide a brief introduction.

As mentioned earlier in the article, .NET provides a comparatively feature-rich framework for examining the metadata associated with an assembly / application. The easiest way to explore this framework is probably to step through the Program.ReflectionPrimer method, provided with the source code for this article, in the debugger. There, I've provided examples of many common use cases.

Below we cover some of the major types in the framework:

Type Description
Assembly Each application consists of one or more assemblies. From a Visual Studio perspective, building a Project basically results in the creation of an Assembly (.exe or .dll). Some common elements of interest provided by the Assembly include: name, version, product, title, copyright information, file location, modules, and other referenced assemblies.
Module Each assembly consists of one or more modules. In most cases, there is only a single module. Some common elements of interest provided by the Module include: name, types, and a means of resolving metadata tokens.
Type Each module consists of one or more types. Every time you create a class or a struct, you create a Type. There are also a large number of system types (e.g. System.Int32 and System.String). Some common elements of interest provided by the Type include: name, base type, fields, methods, and properties.
MethodInfo Each Type may include methods. When you create a function/method, there is corresponding metadata (MethodInfo) for that method. Some common elements of interest provided by MethodInfo include: name, return type, parameters, calling convention, declaring type, and method body. In this article we extend MethodBase (from which MethodInfo is derived) to also provide the Intermediate Language instructions in the method.
PropertyInfo Each Type may also include properties. When you create a property, there is corresponding metadata (PropertyInfo) for that property. Some common elements of interest provided by PropertyInfo include: name, type, get method, and set method.
FieldInfo Each Type may also include fields. When you create a field, there is corresponding metadata (FieldInfo) for that field. Some common elements of interest provided by FieldInfo include: name and type.
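For readers who prefer code to tables, here is a short C# sketch that touches each of the types above. It is not the article's Program.ReflectionPrimer method; System.Version and Int32.MaxValue are simply convenient built-in examples, and the printed values depend on the runtime in use.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Reflection;

// Assembly: the unit produced by building a project (.exe or .dll).
Assembly assembly = typeof(Version).Assembly;
AssemblyName name = assembly.GetName();
Console.WriteLine($"Assembly: {name.Name}, version {name.Version}");

// Module: usually exactly one per assembly.
foreach (Module module in assembly.GetModules())
    Console.WriteLine($"Module: {module.Name}");

// Type: name and base type.
Type type = typeof(Version);
Console.WriteLine($"Type: {type.FullName}, base type: {type.BaseType}");

// MethodInfo: name, return type, parameters, calling convention, and body.
MethodInfo? method = type.GetMethod(nameof(Version.CompareTo), new[] { typeof(Version) });
if (method != null)
{
    string parameters = string.Join(", ",
        method.GetParameters().Select(p => $"{p.ParameterType.Name} {p.Name}"));
    int ilSize = method.GetMethodBody()?.GetILAsByteArray()?.Length ?? 0;
    Console.WriteLine($"Method: {method.ReturnType.Name} {method.Name}({parameters}), " +
                      $"{method.CallingConvention}, {ilSize} IL bytes");
}

// PropertyInfo: name, type, and the get/set accessor methods.
PropertyInfo? property = type.GetProperty(nameof(Version.Major));
if (property != null)
{
    string setter = property.SetMethod?.Name ?? "(none)";
    Console.WriteLine($"Property: {property.PropertyType.Name} {property.Name}, " +
                      $"get: {property.GetMethod?.Name}, set: {setter}");
}

// FieldInfo: name and type (Int32.MaxValue is a public constant field).
FieldInfo? field = typeof(int).GetField(nameof(int.MaxValue));
if (field != null)
    Console.WriteLine($"Field: {field.FieldType.Name} {field.Name}");
```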

Introduction to Common Intermediate Language (CIL)

The topic of Common Intermediate Language (CIL) is far too broad to cover in this article. Also, I claim no expertise on the topic. IL simply resembles an assembly language. Because I am a bit of a dinosaur (dating back to a time when understanding assembly language was a critical skill), I understand just enough IL to read it with some proficiency. It's a bit of an innate skill for me.

I assume there are entire books on this topic. Regrettably, I haven't read one and can't in good conscience personally recommend one. That said, I've noticed people in forums recommending "Expert .NET 2.0 IL Assembler" by Serge Lidin. However, since we've progressed to .NET Framework 4.7.1, I worry that this recommendation might be a bit dated. The same author seems to have more recent books.

Before we start describing some IL, for comparison, let's consider a simple HelloWorld program in C#:

using System;

namespace HelloWorld
{
  public class Program
  {
    public static void Main(string[] args) =>
      Console.WriteLine("Hello World!");
  }
}

The corresponding instructions, contained in the Main method, would be as follows:

IL_0000: ldstr "Hello World!"
IL_0005: call void [mscorlib]System.Console::WriteLine(string)
IL_000a: nop
IL_000b: ret
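To connect this listing back to MethodBody.GetILAsByteArray(), the sketch below fetches the raw bytes of a lambda with the same body as Main. Using a lambda is an assumption made so the example is self-contained. The exact bytes, including whether a nop is present, depend on the compiler and build configuration, so no output is shown.

```csharp
#nullable enable
using System;
using System.Buffers.Binary;
using System.Reflection;

// A lambda with the same body as Main above; the compiler turns it into an
// ordinary method whose IL can be fetched through Delegate.Method.
Action hello = () => Console.WriteLine("Hello World!");
MethodInfo method = hello.Method;

byte[] il = method.GetMethodBody()?.GetILAsByteArray() ?? Array.Empty<byte>();
Console.WriteLine(BitConverter.ToString(il)); // raw bytes as hyphen-separated hex

// 0x72 is ldstr; its operand is a 4-byte little-endian string token (top byte
// 0x70), which the declaring module can resolve back to the literal.
string? literal = null;
if (il.Length >= 5 && il[0] == 0x72)
{
    int token = BinaryPrimitives.ReadInt32LittleEndian(il.AsSpan(1, 4));
    literal = method.Module.ResolveString(token);
    Console.WriteLine($"ldstr token 0x{token:X8} -> \"{literal}\"");
}
```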

Let's consider the first line:

IL_0000: ldstr "Hello World!"

This instruction pushes a reference to the string literal onto the evaluation stack. Among other things, the stack is used to pass arguments to methods. The stack is central to Intermediate Language (as it is to most assembly languages), and most instructions modify it in some fashion.

The different parts of this instruction are as follows:

Part Description
IL_0000 This is simply a label for the location of the instruction within the method. It doesn't actually contribute to the byte array that stores the instructions. The IL part stands for Intermediate Language. The 0000 is the byte offset (in hexadecimal) from the start of the method. While the label can be chosen arbitrarily (within reason), most disassemblers seem to prefer this naming convention.
: This indicates that the bit before ":" is a label.
ldstr This is the opcode for this instruction (OpCodes.Ldstr).
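"Hello World!" This is the string literal that is passed to the method called on the next line. In the byte array, it is stored as a four-byte metadata token that refers to the string.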

Moving onto the second line:

IL_0005: call void [mscorlib]System.Console::WriteLine(string)

This instruction simply calls the specified method. It is assumed the arguments were previously pushed onto the stack. Notable portions of this instruction include the following:

Part Description
call This is the opcode for the instruction (OpCodes.Call).
void This is the return type for the method that is called. In this case, nothing is returned.
[mscorlib] The name of the assembly (mscorlib) that contains the method.
System. The namespace (System) that contains the method.
Console:: The name of the class (Console) that contains the method.
WriteLine The name of the method.
(string) The types of the parameters for the method. In this case, there is a single parameter of type string (System.String).

Moving onto the third line:

IL_000a: nop

This instruction does nothing ("no operation").

Moving onto the fourth and final line:

IL_000b: ret

This instruction simply returns from the current method.

To actually build the program (with ILASM) and run it, we would need a bit of extra metadata. The following minimal program can be built and will run:

.assembly HelloWorld
{
}

.method static void Main()
{
  .entrypoint
  .maxstack 1
  ldstr "Hello World!"
  call void [mscorlib]System.Console::WriteLine(string)
  ret
}

Note: If we truly disassembled the example C# program, it would include a good deal more metadata. This was omitted for simplicity.

Additional Reading

Below is a collection of links to reference materials (from Microsoft and others) on some of the concepts discussed in this article:

Metadata and Self-Describing Components
https://docs.microsoft.com/en-us/dotnet/standard/metadata-and-self-describing-components

ECMA C# and Common Language Infrastructure Standards
https://www.visualstudio.com/license-terms/ecma-c-common-language-infrastructure-standards/

ECMA Common Language Infrastructure (CLI) Partitions I to VI
http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-335.pdf

Common Intermediate Language
https://en.wikipedia.org/wiki/Common_Intermediate_Language

Mono.Cecil
http://www.mono-project.com/docs/tools+libraries/libraries/Mono.Cecil/

History

  • 5/15/2018 - The original version was uploaded
  • 5/20/2018 - Added a reference to Mono.Cecil and a couple more useful links in Additional Reading.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Eric Lynch
Software Developer (Senior)
United States


Last Updated 15 May 2018
Article Copyright 2018 by Eric Lynch
Everything else Copyright © CodeProject, 1999-2018