
.NET file format - Signatures under the hood, Part 2 of 2

28 Sep 2009, CPOL
A full description of the signatures that are part of the .NET file format.


1. Signatures (continuation)

This part continues the discussion of signatures started in the first part.


1.1 LocalVarSig

The LocalVarSig signature is also indexed by the StandAloneSig.Signature column. It stores the types of all the local variables that a method declares. The LOCAL_SIG element is the signature's prolog and has the constant value 0x07. The Count element is an unsigned integer (compressed, of course!) that stores the number of local variables the associated method has. The BYREF element is an abbreviation of the ELEMENT_TYPE_BYREF constant (see constants in the first part) and indicates that the local is a managed pointer to a variable of the given Type, rather than the variable itself. One more element is worth mentioning: the Constraint element (the PINNED constant, 0x45). It indicates that the object the local refers to will not be moved by the garbage collector while it reclaims memory. The local itself is located on the stack (where the GC does not perform any actions), so pinning only matters for what the local points to: the Type of a pinned local should be either a reference type (like System.Object, allocated on the heap) or a value type whose definition includes the BYREF element, in which case the reference is held on the stack while the variable it points to is located on the heap. You can see more on pinning here. Picture 1 below shows the full syntax diagram for this signature.

I would like to draw your special attention to the TYPEDBYREF element on the diagram below. This is the typed reference: it contains not only a managed pointer to a location (like a normal reference) but also a runtime representation of the type stored at that location. I quote its description from the specification: "The typed reference local variable signature states that the local will contain both a managed pointer to a location and a runtime representation of the type that can be stored at that location. A typed reference signature is similar to a byref constraint, but while the byref specifies the type as part of the byref constraint (and hence statically as part of the type description), a typed reference provides the type information dynamically. A typed reference is a full signature in itself and cannot be combined with other constraints. In particular, it is not possible to specify a byref whose type is typed reference." The typed reference is also very helpful for byref passing of unboxed data (i.e. value types that are not wrapped in an object on the heap) to methods that are not statically restricted to the type of data they accept. Such methods require, in addition to a managed pointer to a location, the static type of that location, and the typed reference meets these needs. Notice also that a typed reference parameter can refer to a location on the stack, and that location has a lifetime limited to the running time of the method in which it is allocated; for that reason appropriate checks are applied to the lifetime of byref and typed reference parameters, see §12.4.1.5.2 of the ECMA-335 specification. The typed reference is represented in the .NET BCL (Base Class Library) by the TypedReference structure.

Picture 1, The LocalVarSig signature syntax diagram

Example 1
This example declares a single byref local of a value type. The sample code is written in CIL and looks like this.

// Full source: LocalVarSig\1.il
// Binary: LocalVarSig\1.dll
// (...)

.method public static void TestMethod()
{ 
    .locals init(int32 &IntVarByRef)
    ret
}

The LocalVarSig signature for this sample code is explained in the table below.

Offset | Value | Meaning
0x05 | 0x04 | Signature size.
0x06 | 0x07 | Signature's prolog (LOCAL_SIG constant).
0x07 | 0x01 | The total number of variables declared in this method is one.
0x08 | 0x10 | Because the local is a managed pointer (int32&) to the actual variable, the BYREF element of value 0x10 is present.
0x09 | 0x08 | The variable's type (int32), see constants in the first part.

Example 2
The sample below illustrates what happens to the signature when we use a typed reference. First we declare the IntVar variable; in the next line we obtain a typed reference to it using the __makeref keyword (which is undocumented and not CLS-compliant) and save it in the TypedByRefVar variable.

// Full source: LocalVarSig\2.cs
// Binary: LocalVarSig\2.dll
// (...)

[CLSCompliant(false)]
public void TestMethod()
{
    int IntVar = 0;
    TypedReference TypedByRefVar = __makeref(IntVar);
}

The LocalVarSig for this sample looks as follows.

Offset | Value | Meaning
0x1E | 0x04 | Signature size.
0x1F | 0x07 | Signature's prolog (LOCAL_SIG constant).
0x20 | 0x02 | The total number of variables declared in this method is two.
0x21 | 0x08 | The first variable's type (int32), see constants in the first part.
0x22 | 0x16 | The second variable's type (TYPEDBYREF), see constants in the first part.

Example 3
Now let us move on to a slightly more difficult example. In this sample code we create the TestDataClass class, which has only one member, named StringVarToBePinned, of type string. In the TestMethod method (marked as unsafe) we instantiate the TestDataClass class, and in the next line we "pin" the StringVarToBePinned member and assign a pointer to it to FixedVar using the fixed keyword. This assures that between the { and } braces the data referenced by dataClass.StringVarToBePinned will not be moved by the garbage collector, so FixedVar remains valid inside the braces of the fixed statement. Please notice that a local variable of a value type cannot be pinned this way, because such a variable is placed on the stack and is therefore already fixed; that is why this example pins data reached through the TestDataClass instance (which is placed on the heap).

// Full source: LocalVarSig\3.cs
// Binary: LocalVarSig\3.dll
// compile with "/unsafe" switch
// (...)

public class TestDataClass
{
    public string StringVarToBePinned;
}

public class TestClass
{
    public unsafe void TestMethod()
    {
        TestDataClass dataClass = new TestDataClass();
        fixed (char* FixedVar = dataClass.StringVarToBePinned) { }
    }
}

This sample is difficult for one more reason: at some point it uses an element that has not been described yet, namely TypeDefOrRefEncoded. This element defines in which metadata table (TypeDef, TypeRef or TypeSpec) and at which row the specified type is described. We will not go into further details of this element here; if you want, you can jump directly to its description in the TypeDefOrRefEncoded subsection of the next chapter. The LocalVarSig for the above code is explained in the table below.

Offset | Value | Meaning
0x20 | 0x08 | Signature size.
0x21 | 0x07 | Signature's prolog (LOCAL_SIG constant).
0x22 | 0x03 | The total number of variables declared in this method is three.
0x23 | 0x12 | The first variable's type (CLASS, followed by the TypeDefOrRefEncoded element), see constants in the first part.
0x24 | 0x08 | The first variable's type is described in the TypeDef metadata table at row 2, which is the TestDataClass class. This is the TypeDefOrRefEncoded element, not explained in the current chapter.
0x25 | 0x0F | The second variable's type (PTR, followed by the Type element), see constants in the first part.
0x26 | 0x03 | The type the pointer from the previous byte points to (char, so the whole type is char*), see constants in the first part.
0x27 | 0x45 | The third variable is pinned, see constants.
0x28 | 0x0E | The third, pinned variable's type (string), see constants.
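The three tables above can be reproduced with a short decoder. The following is only a minimal sketch in Python (chosen here for brevity, it is not part of the original samples): the function names are hypothetical, custom modifiers are not handled, and the split of TypeDefOrRefEncoded into a two-bit table tag and a row number is taken on trust from the specification, since that element is explained only in the next chapter.

```python
# Element-type constants as listed in the constants table of the first part.
PRIMITIVES = {
    0x01: "void", 0x02: "bool", 0x03: "char", 0x04: "int8", 0x05: "unsigned int8",
    0x06: "int16", 0x07: "unsigned int16", 0x08: "int32", 0x09: "unsigned int32",
    0x0A: "int64", 0x0B: "unsigned int64", 0x0C: "float32", 0x0D: "float64",
    0x0E: "string", 0x1C: "object",
}
PTR, BYREF, VALUETYPE, CLASS, TYPEDBYREF, PINNED = 0x0F, 0x10, 0x11, 0x12, 0x16, 0x45
LOCAL_SIG = 0x07
TABLES = ("TypeDef", "TypeRef", "TypeSpec")  # selected by the low two bits


def read_compressed(blob, pos):
    """Decode a compressed unsigned integer; return (value, next position)."""
    first = blob[pos]
    if (first & 0x80) == 0:
        return first, pos + 1
    if (first & 0xC0) == 0x80:
        return ((first & 0x3F) << 8) | blob[pos + 1], pos + 2
    return (((first & 0x1F) << 24) | (blob[pos + 1] << 16)
            | (blob[pos + 2] << 8) | blob[pos + 3]), pos + 4


def read_type(blob, pos):
    """Decode the subset of Type used in the examples (no custom modifiers)."""
    tag = blob[pos]
    pos += 1
    if tag in PRIMITIVES:
        return PRIMITIVES[tag], pos
    if tag == PTR:
        inner, pos = read_type(blob, pos)
        return inner + "*", pos
    if tag in (CLASS, VALUETYPE):
        coded, pos = read_compressed(blob, pos)
        if (coded & 0x03) == 3:
            raise ValueError("invalid TypeDefOrRefEncoded tag")
        kind = "class" if tag == CLASS else "valuetype"
        return f"{kind} {TABLES[coded & 0x03]}[{coded >> 2}]", pos
    raise ValueError(f"element type 0x{tag:02X} is not handled in this sketch")


def parse_local_var_sig(blob):
    """Return a readable description of every local in a LocalVarSig blob."""
    if blob[0] != LOCAL_SIG:
        raise ValueError("not a LocalVarSig (prolog 0x07 expected)")
    count, pos = read_compressed(blob, 1)
    result = []
    for _ in range(count):
        if blob[pos] == TYPEDBYREF:      # a typed reference stands alone
            result.append("typedref")
            pos += 1
            continue
        pinned = blob[pos] == PINNED     # optional Constraint element
        pos += pinned
        byref = blob[pos] == BYREF       # optional BYREF element
        pos += byref
        name, pos = read_type(blob, pos)
        result.append(name + ("&" if byref else "") + (" pinned" if pinned else ""))
    return result


print(parse_local_var_sig(bytes([0x07, 0x03, 0x12, 0x08, 0x0F, 0x03, 0x45, 0x0E])))
```

The signature size byte is not part of the input; the decoder starts at the prolog, exactly as the tables do from their second row on.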


1.2 CustomAttrib

As you can guess, this signature stores instances of custom attributes, but it is a little different from the signatures discussed earlier. The key difference is that CustomAttrib, in contrast to, for example, the MethodRefSig signature, stores the values of the parameters (fixed and named) supplied when a custom attribute is instantiated; the types and number of the fixed parameters are not repeated in the signature. The signature is indexed by the CustomAttribute.Value column, and the Parent column indicates in which table (TypeDef for a type, MethodDef for a method, and so on) and at which row the attributed element (method, type, and so on) is described. There is also a second significant difference compared to other signatures: in the CustomAttrib signature all binary values are stored uncompressed, in little-endian byte order, except for the PackedLen item (discussed below) and the signature size. And I repeat once again, do not confuse a custom attribute with a custom modifier! The full syntax diagram consists of four parts; let us look at the first.

Picture 2a, The CustomAttrib signature syntax diagram

So far it is pretty simple. The signature starts with the Prolog, which has the constant value 0x0001 and occupies two bytes (unsigned int16, uncompressed and little-endian). Next come the fixed arguments (FixedArg is illustrated on Picture 2b); their number and types can be obtained by examining the associated constructor's row in the MethodDef or MemberRef (when the attribute's class resides in another assembly) metadata table. Note that a vararg method cannot be used as an attribute's constructor. Next follows the number of named parameters (NumNamed is a two-byte unsigned int16, also uncompressed and little-endian), and finally come the named parameters themselves, repeated NumNamed times.

Picture 2b, The CustomAttrib signature syntax diagram

This part is a little harder than the previous one, but it is still quite simple. The upper path on the diagram denotes a parameter that is not a single-dimensional, zero-based array (SZARRAY, see constants in the first part). The bottom path represents an SZARRAY parameter, i.e. the parameter is an array: the number of elements in the array is stored in the NumElem element, an uncompressed, little-endian int32 that occupies four bytes. If the SZARRAY parameter is null, NumElem is set to 0xFFFFFFFF. Arrays other than one-dimensional arrays with a lower bound of zero (SZARRAY) are completely disallowed here: a single-dimensional, zero-based array of type int32 is int32[], but neither int32[,,] nor int32[3...8]. If you want to know more about arrays in .NET, read the Array Types in .NET article from MSDN Magazine.
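To make the NumElem rule concrete, here is a minimal Python sketch that reads an SZARRAY fixed argument whose element type is int32. The function name is hypothetical, and the element type is assumed to be known already from the constructor's signature, because the blob itself does not store it.

```python
import struct


def read_int32_szarray(blob, pos):
    """Read NumElem followed by that many int32 values; return (list or None, next position)."""
    (num_elem,) = struct.unpack_from("<I", blob, pos)   # uncompressed, little-endian
    pos += 4
    if num_elem == 0xFFFFFFFF:                          # marker for a null array
        return None, pos
    values = list(struct.unpack_from(f"<{num_elem}i", blob, pos))
    return values, pos + 4 * num_elem


# new int[] {1, 2, 3} as it would be laid out in the blob
print(read_int32_szarray(bytes.fromhex("03000000010000000200000003000000"), 0))
```

A null array and an empty array are therefore distinguishable: the first is the single value 0xFFFFFFFF, the second is NumElem equal to zero with no elements after it.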

Picture 2c, The CustomAttrib signature syntax diagram

This part is probably the strangest of all four: the format that Elem takes varies depending on the following conditions (quoted from the specification).

If the parameter kind is simple (first line in the above diagram) (bool, char, float32, float64, int8, int16, int32, int64, unsigned int8, unsigned int16, unsigned int32 or unsigned int64) then the 'blob' contains its binary value (Val). (A bool is a single byte with value 0 (false) or 1 (true); char is a two-byte Unicode character; and the others have their obvious meaning.) This pattern is also used if the parameter kind is an enum -- simply store the value of the enum's underlying integer type.

If the parameter kind is string, (middle line in above diagram) then the blob contains a SerString - a PackedLen count of bytes (compressed and big-endian - added by the author), followed by the UTF8 characters. If the string is null, its PackedLen has the value 0xFF (with no following characters). If the string is empty (""), then PackedLen has the value 0x00 (with no following characters).

If the parameter kind is System.Type (see typeof keyword - added by the author of the article), (also, the middle line in above diagram) its value is stored as a SerString (as defined in the previous paragraph), representing its canonical name. The canonical name is its full type name, followed optionally by the assembly where it is defined, its version, culture and public-key-token. If the assembly name is omitted, the CLI looks first in the current assembly, and then in the system library (mscorlib); in these two special cases, it is permitted to omit the assembly-name, version, culture and public-key-token.

If the parameter kind is System.Object, (third line in the above diagram) the value stored represents the "boxed" instance of that value-type. In this case, the blob contains the actual type's FieldOrPropType (see below), followed by the argument's unboxed value. [Note: it is not possible to pass a value of null in this case. end note]
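The SerString rules quoted above (null, empty and ordinary strings) can be sketched in a few lines of Python. The function names are hypothetical; PackedLen is decoded as the compressed unsigned integer described in the first part.

```python
def read_packed_len(blob, pos):
    """Decode PackedLen (a compressed unsigned integer); return (value, next position)."""
    first = blob[pos]
    if (first & 0x80) == 0:
        return first, pos + 1
    if (first & 0xC0) == 0x80:
        return ((first & 0x3F) << 8) | blob[pos + 1], pos + 2
    return (((first & 0x1F) << 24) | (blob[pos + 1] << 16)
            | (blob[pos + 2] << 8) | blob[pos + 3]), pos + 4


def read_ser_string(blob, pos):
    """Return (text or None, next position) for a SerString starting at pos."""
    if blob[pos] == 0xFF:                 # null string, no characters follow
        return None, pos + 1
    length, pos = read_packed_len(blob, pos)
    text = blob[pos:pos + length].decode("utf-8")   # length counts bytes, not characters
    return text, pos + length


print(read_ser_string(bytes([0x06, 0x4E, 0x61, 0x6D, 0x65, 0x64, 0x31]), 0))
```

Checking for 0xFF before decoding the length matters, because 0xFF is not a valid first byte of a compressed integer.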

Picture 2d, The CustomAttrib signature syntax diagram

The last part illustrates the format of the NamedArg element, which represents a named argument (either a field or a property). Because fields and properties can have the same name, the first element is either FIELD, with the constant one-byte value 0x53, when the named parameter refers to a field, or PROPERTY, with the constant one-byte value 0x54, when it refers to a property. Next comes the FieldOrPropType element, which describes the type of the named field or property in one or two bytes. If the type of the named parameter is an unboxed simple value type (defined above) or a string, FieldOrPropType contains exactly one constant value of the associated type (BOOLEAN, CHAR, I1, U1, I2, U2, I4, U4, I8, U8, R4, R8, STRING, see the constants table in the first part). If the type of the named parameter is a boxed simple value type, FieldOrPropType is preceded by a byte containing the value 0x51, and in this case it is two bytes long. The FieldOrPropName element is a SerString (explained above) containing the name of the field or property. Finally comes a single FixedArg element, shown earlier. So, as you can see, the NamedArg element is a normal FixedArg preceded by some additional information that identifies which field or property it represents. I hope I have not scared you; as you will soon see, the signature is not as complicated as it looks.

Example 1
This example mainly shows the format of the SerString element and how CustomAttrib distinguishes between fields and properties that act as named parameters. In the example below we have the TestAttribute attribute, which requires one fixed parameter, Fixed1, of type int32; additionally we may (and we do) supply two named parameters, of type int16 and string, as shown in the code listing below.

// Full source: CustomAttrib\1.cs
// Binary: CustomAttrib\1.dll
// (...)

[AttributeUsage(AttributeTargets.Class)]
public class TestAttribute : Attribute
{
    public TestAttribute(int Fixed1) { }

    public short Named1 { get; set; }

    public string Named2;
}

[Test(1, Named1 = 1, Named2 = "Abcd")]
public class TestClass { }

The full CustomAttrib signature for this case is 33 bytes long, so at some points we have merged several bytes into one row with a single description.

Offset | Value | Meaning
0x3E | 0x21 | Signature size, stored as a compressed integer, in big-endian byte order.
0x3F-0x40 | 0x01 0x00 | Prolog, stored as an uncompressed, little-endian unsigned int16 of value 0x0001.
0x41-0x44 | 0x01 0x00 0x00 0x00 | The value of the first fixed argument of the attribute (Fixed1); the value is 0x00000001 and is stored as an uncompressed, little-endian int32. This is represented by the upper line on Picture 2b and the first path on Picture 2c.
0x45-0x46 | 0x02 0x00 | The number of named parameters supplied to the attribute, represented by the NumNamed element on Picture 2a and stored as a little-endian unsigned int16. We supplied exactly two optional parameters, so the value of this two-byte element is 0x0002.
0x47 | 0x54 | The value of this byte indicates that the target named parameter is a property (see constants in the first part); this is the PROPERTY element on Picture 2d.
0x48 | 0x06 | The type of the target property (int16, see constants in the first part). This byte is represented by the FieldOrPropType element on Picture 2d.
0x49-0x4F | 0x06 0x4E 0x61 0x6D 0x65 0x64 0x31 | The SerString that specifies the name of the target property (represented by the FieldOrPropName element on Picture 2d). A SerString is a UTF-8 string preceded by its size in bytes, and the size is stored as a compressed integer in big-endian byte order. So we have a 6-byte string (see offset 0x49), and because the name does not contain any characters beyond the ASCII table, each character occupies exactly one byte and we can easily read the text: it is Named1.
0x50-0x51 | 0x01 0x00 | The value of the first named argument of the attribute (Named1); the value is 0x0001 and is stored as an uncompressed, little-endian int16. This is represented by the upper line on Picture 2b and the first path on Picture 2c.
0x52 | 0x53 | The value of this byte indicates that the target named parameter is a field (see constants in the first part); this is the FIELD element on Picture 2d.
0x53 | 0x0E | The type of the target field (string, see constants in the first part). This byte is represented by the FieldOrPropType element on Picture 2d.
0x54-0x5A | 0x06 0x4E 0x61 0x6D 0x65 0x64 0x32 | Again a SerString, this time specifying the name of the target field (represented by the FieldOrPropName element on Picture 2d). The length of this string is 6 bytes (look at offset 0x54); the rest of the bytes differ from the previous string only in the last one, so the text is Named2 (see the ASCII table).
0x5B-0x5F | 0x04 0x41 0x62 0x63 0x64 | The value of the second named argument of the attribute (Named2); the value is Abcd (see the ASCII table) and is stored as a SerString. This is represented by the upper line on Picture 2b and the middle path on Picture 2c. Because 0x5F - 0x3E = 0x21, i.e. last offset - first offset = signature size, the signature ends here.
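The whole walk through the table can be reproduced with a small decoder. This is only a sketch in Python under stated assumptions: the function names are hypothetical, the types of the fixed arguments are passed in by the caller (in a real reader they would come from the constructor's row in MethodDef or MemberRef), and only simple types and strings with a one-byte PackedLen are handled; arrays, System.Type and boxed values are left out.

```python
import struct

# struct format for each simple element type (constants from the first part).
SIMPLE = {0x02: "<?", 0x03: "<H", 0x04: "<b", 0x05: "<B", 0x06: "<h", 0x07: "<H",
          0x08: "<i", 0x09: "<I", 0x0A: "<q", 0x0B: "<Q", 0x0C: "<f", 0x0D: "<d"}
STRING, FIELD, PROPERTY = 0x0E, 0x53, 0x54


def read_ser_string(blob, pos):
    """SerString with a one-byte PackedLen only (enough for short names)."""
    if blob[pos] == 0xFF:
        return None, pos + 1
    length = blob[pos]
    if length >= 0x80:
        raise ValueError("multi-byte PackedLen is not handled in this sketch")
    return blob[pos + 1:pos + 1 + length].decode("utf-8"), pos + 1 + length


def read_elem(blob, pos, elem_type):
    """Read one Elem of a simple type or string; return (value, next position)."""
    if elem_type == STRING:
        return read_ser_string(blob, pos)
    fmt = SIMPLE[elem_type]
    (value,) = struct.unpack_from(fmt, blob, pos)
    return value, pos + struct.calcsize(fmt)


def parse_custom_attrib(blob, fixed_types):
    """Return (fixed values, [(kind, name, value), ...]) for a CustomAttrib blob."""
    (prolog,) = struct.unpack_from("<H", blob, 0)
    if prolog != 0x0001:
        raise ValueError("bad prolog")
    pos, fixed, named = 2, [], []
    for elem_type in fixed_types:
        value, pos = read_elem(blob, pos, elem_type)
        fixed.append(value)
    (num_named,) = struct.unpack_from("<H", blob, pos)
    pos += 2
    for _ in range(num_named):
        kind = {FIELD: "field", PROPERTY: "property"}[blob[pos]]
        elem_type = blob[pos + 1]                      # FieldOrPropType
        name, pos = read_ser_string(blob, pos + 2)     # FieldOrPropName
        value, pos = read_elem(blob, pos, elem_type)   # FixedArg
        named.append((kind, name, value))
    if pos != len(blob):
        raise ValueError("unexpected trailing bytes")
    return fixed, named


# The 33 bytes from the table above (without the leading size byte).
blob = bytes.fromhex(
    "0100" "01000000" "0200"
    "54" "06" "064e616d656431" "0100"
    "53" "0e" "064e616d656432" "0441626364"
)
fixed, named = parse_custom_attrib(blob, fixed_types=[0x08])
print(fixed, named)
```

Note that the decoder cannot work without `fixed_types`: nothing in the blob says that the first four bytes after the prolog are an int32, which is exactly the difference from other signatures described at the beginning of this subsection.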

Example 2
In this example we will demonstrate signature format, when using System.Type, SZARRAY, and boxed value types as arguments of the TestAttribute attribute defined below.

// Full source: CustomAttrib\2.cs
// Binary: CustomAttrib\2.dll
// (...)

[AttributeUsage(AttributeTargets.Class)]
public class TestAttribute : Attribute
{
    public TestAttribute(object Param1, int[] Param2, Type Param3) { }
}

[Test(1, new int[] {1, 2, 3}, typeof(string))]
public class TestClass { }

As in the previous sample, the signature is very long (it has 116 bytes!), so I split it up into smaller parts.

Offset | Value | Meaning
0x2B | 0x74 | Signature size, stored as a compressed integer, in big-endian byte order.
0x2C-0x2D | 0x01 0x00 | Prolog, stored as an uncompressed, little-endian unsigned int16 of value 0x0001.
0x2E | 0x08 | The type of the first fixed argument (int32, boxed inside System.Object). This case is represented by the third path on Picture 2c, where a value is immediately preceded by its type.
0x2F-0x32 | 0x01 0x00 0x00 0x00 | The value whose type was specified in the previous byte; because the type is int32, it occupies exactly 4 bytes. It is stored in little-endian byte order, so the value is 0x00000001.
0x33-0x36 | 0x03 0x00 0x00 0x00 | Next comes the second parameter. Because the second argument is a single-dimensional, zero-based array (SZARRAY), these four bytes specify the number of elements supplied in the array (NumElem), stored as an unsigned int32 in little-endian byte order.
0x37-0x3A | 0x01 0x00 0x00 0x00 | The value of the first element of the array in the second parameter. It is four bytes long because the element type of the array is int32; the value is 0x00000001.
0x3B-0x3E | 0x02 0x00 0x00 0x00 | The value of the second element of the array in the second parameter. It is four bytes long because the element type of the array is int32; the value is 0x00000002.
0x3F-0x42 | 0x03 0x00 0x00 0x00 | The value of the third element of the array in the second parameter. It is four bytes long because the element type of the array is int32; the value is 0x00000003.
0x43-0x9D | 0x5A 0x53 0x79 0x73 0x74 0x65 0x6D 0x2E 0x53 0x74 0x72 0x69 0x6E 0x67 0x2C 0x20 0x6D 0x73 0x63 0x6F 0x72 0x6C 0x69 0x62 0x2C 0x20 0x56 0x65 0x72 0x73 0x69 0x6F 0x6E 0x3D 0x32 0x2E 0x30 0x2E 0x30 0x2E 0x30 0x2C 0x20 0x43 0x75 0x6C 0x74 0x75 0x72 0x65 0x3D 0x6E 0x65 0x75 0x74 0x72 0x61 0x6C 0x2C 0x20 0x50 0x75 0x62 0x6C 0x69 0x63 0x4B 0x65 0x79 0x54 0x6F 0x6B 0x65 0x6E 0x3D 0x62 0x37 0x37 0x61 0x35 0x63 0x35 0x36 0x31 0x39 0x33 0x34 0x65 0x30 0x38 0x39 | A SerString: the PackedLen byte 0x5A (90) followed by the 90 bytes of the canonical name of the type supplied to the third parameter, which is System.String, mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089. This is represented by the middle path on Picture 2c.
0x9E-0x9F | 0x00 0x00 | The NumNamed element (see Picture 2a), stored as an uncompressed, little-endian unsigned int16 of value 0x0000, because no named arguments were supplied to the attribute. These two bytes are not part of the previous SerString (0x9D - 0x43 = 0x5A, the length of its text), but they do belong to the CustomAttrib signature (0x9F - 0x2B = 0x74), which ends here.
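Reading the blob strictly in the order given by Picture 2a (Prolog, fixed arguments, NumNamed, named arguments) accounts for every one of the 116 bytes. The Python sketch below rebuilds the blob from the values in the table and decodes it step by step; it is a walk-through of this one example rather than a general parser, and the variable names are mine.

```python
import struct

type_name = ("System.String, mscorlib, Version=2.0.0.0, "
             "Culture=neutral, PublicKeyToken=b77a5c561934e089")

# The signature bytes from the table above, without the leading size byte 0x74.
blob = (bytes.fromhex("0100" "08" "01000000" "03000000" "010000000200000003000000")
        + bytes([len(type_name)]) + type_name.encode("utf-8")
        + bytes.fromhex("0000"))

pos = 0
(prolog,) = struct.unpack_from("<H", blob, pos)      # Prolog, always 0x0001
pos += 2

boxed_type = blob[pos]                               # object parameter: type tag first
pos += 1
(param1,) = struct.unpack_from("<i", blob, pos)      # then the unboxed int32 value
pos += 4

(num_elem,) = struct.unpack_from("<I", blob, pos)    # SZARRAY: NumElem
pos += 4
param2 = list(struct.unpack_from(f"<{num_elem}i", blob, pos))
pos += 4 * num_elem

length = blob[pos]                                   # PackedLen fits in one byte here
pos += 1
param3 = blob[pos:pos + length].decode("utf-8")      # canonical type name
pos += length

(num_named,) = struct.unpack_from("<H", blob, pos)   # the two trailing bytes
pos += 2

print(prolog, hex(boxed_type), param1, param2, param3, num_named, pos == len(blob))
```

After the SerString exactly two bytes remain, and the grammar says that the next element is NumNamed, so they decode as a named-argument count of zero.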


1.3 MethodSpec

The MethodSpec signature is straightforward. It describes each instantiation of a generic method and is indexed by the MethodSpec.Instantiation column. It begins with the GENRICINST prolog (do you see the missing "E"?) of one-byte value 0x0A (this constant has a different value than ELEMENT_TYPE_GENERICINST, defined in the constants table in the first part), followed by GenArgCount, a compressed unsigned integer holding the number of generic arguments, and then by Type, repeated GenArgCount times.

MethodSpecBlob ::=
   GENRICINST GenArgCount Type Type*

Example 1
In the sample below we instantiate the TestMethod generic method, supplying three generic arguments.

// Full source: MethodSpec\1.cs
// Binary: MethodSpec\1.dll
// (...)

public class TestClass
{
    public void TestMethod<GenArg1, GenArg2, GenArg3>() { }
}

public class TestRunClass
{
    public void TestRunMethod()
    {
        new TestClass().TestMethod<short, int, string>();
    }
}

The MethodSpec for this case looks as follows.

Offset | Value | Meaning
0x18 | 0x05 | Signature size.
0x19 | 0x0A | Prolog (GENRICINST).
0x1A | 0x03 | The number of generic arguments supplied to the generic method.
0x1B | 0x06 | The first generic argument's type (int16), see constants in the first part.
0x1C | 0x08 | The second generic argument's type (int32), see constants in the first part.
0x1D | 0x0E | The third generic argument's type (string), see constants in the first part.
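As a quick check of the table, here is a minimal Python sketch that decodes a MethodSpec blob. It is deliberately limited: the function names are hypothetical and every generic argument is assumed to be a single-byte primitive type, which is true for this example but not in general.

```python
GENRICINST = 0x0A  # MethodSpec prolog (not ELEMENT_TYPE_GENERICINST)

# Single-byte element types from the constants table in the first part.
PRIMITIVES = {
    0x02: "bool", 0x03: "char", 0x04: "int8", 0x05: "unsigned int8",
    0x06: "int16", 0x07: "unsigned int16", 0x08: "int32", 0x09: "unsigned int32",
    0x0A: "int64", 0x0B: "unsigned int64", 0x0C: "float32", 0x0D: "float64",
    0x0E: "string", 0x1C: "object",
}


def read_compressed(blob, pos):
    """Decode a compressed unsigned integer; return (value, next position)."""
    first = blob[pos]
    if (first & 0x80) == 0:
        return first, pos + 1
    if (first & 0xC0) == 0x80:
        return ((first & 0x3F) << 8) | blob[pos + 1], pos + 2
    return (((first & 0x1F) << 24) | (blob[pos + 1] << 16)
            | (blob[pos + 2] << 8) | blob[pos + 3]), pos + 4


def parse_method_spec(blob):
    """Return the list of generic argument types of a MethodSpec blob."""
    if blob[0] != GENRICINST:
        raise ValueError("not a MethodSpec (prolog 0x0A expected)")
    count, pos = read_compressed(blob, 1)          # GenArgCount
    args = []
    for _ in range(count):
        args.append(PRIMITIVES[blob[pos]])         # Type (primitive types only here)
        pos += 1
    return args


print(parse_method_spec(bytes([0x0A, 0x03, 0x06, 0x08, 0x0E])))
```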


1.4 TypeSpec

The TypeSpec signature is indexed by the TypeSpec.Signature column and is used, among other things, when instantiating a type as a multi-dimensional array, as a single-dimensional array preceded by custom modifier(s), or as a generic type, as shown in the listing below. Because some elements have not been explained yet (such as custom modifiers and array shapes), we use only limited functionality of the TypeSpec signature here; in the next chapter we will focus on the CustomMod, ArrayShape and TypeDefOrRefEncoded elements, and then we will come back to the TypeSpec signature and use the rest of its capabilities. Also notice that, in contrast to the MethodSpec signature, where the GENRICINST (missing "E") constant is used as the prolog, TypeSpec uses the ELEMENT_TYPE_GENERICINST constant, which is defined in the general constants table (in the first part of the article).

TypeSpecBlob ::=
  PTR      CustomMod*  VOID
| PTR      CustomMod*  Type
| FNPTR    MethodDefSig
| FNPTR    MethodRefSig
| ARRAY    Type  ArrayShape
| SZARRAY  CustomMod*  Type
| GENERICINST (CLASS | VALUETYPE) TypeDefOrRefEncoded GenArgCount Type Type*

Example 1
In this example we instantiate the TestClass generic type, as shown in the code listing below.

// Full source: TypeSpec\1.cs
// Binary: TypeSpec\1.dll
// (...)

public class TestClass<GenArg1, GenArg2> { }

public class TestRunClass
{
    public void TestRunMethod()
    {
        TestClass<int, string> TestVar = new TestClass<int, string>();
    }
}

The TypeSpec for this case looks as follows.

Offset | Value | Meaning
0x13 | 0x06 | Signature size.
0x14 | 0x15 | The ELEMENT_TYPE_GENERICINST constant, see the constants table in the first part.
0x15 | 0x12 | The kind of the generic type (CLASS), see the constants table in the first part.
0x16 | 0x08 | The instantiated generic type is described in the TypeDef metadata table at row 2. This is the TypeDefOrRefEncoded element, not explained in the current chapter.
0x17 | 0x02 | The number of generic arguments supplied to the type is two.
0x18 | 0x08 | The first generic argument's type (int32), see constants in the first part.
0x19 | 0x0E | The second generic argument's type (string), see constants in the first part.
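The GENERICINST branch of the grammar can be decoded along the same lines. The Python sketch below is limited to this branch and to primitive generic arguments; the function names are hypothetical, and the way TypeDefOrRefEncoded is split into a table tag (low two bits) and a row number (remaining bits) is assumed from the specification, since the element is explained only in the next chapter.

```python
GENERICINST, CLASS, VALUETYPE = 0x15, 0x12, 0x11
TABLES = ("TypeDef", "TypeRef", "TypeSpec")   # selected by the low two bits
PRIMITIVES = {
    0x02: "bool", 0x03: "char", 0x04: "int8", 0x05: "unsigned int8",
    0x06: "int16", 0x07: "unsigned int16", 0x08: "int32", 0x09: "unsigned int32",
    0x0A: "int64", 0x0B: "unsigned int64", 0x0C: "float32", 0x0D: "float64",
    0x0E: "string", 0x1C: "object",
}


def read_compressed(blob, pos):
    """Decode a compressed unsigned integer; return (value, next position)."""
    first = blob[pos]
    if (first & 0x80) == 0:
        return first, pos + 1
    if (first & 0xC0) == 0x80:
        return ((first & 0x3F) << 8) | blob[pos + 1], pos + 2
    return (((first & 0x1F) << 24) | (blob[pos + 1] << 16)
            | (blob[pos + 2] << 8) | blob[pos + 3]), pos + 4


def parse_generic_inst(blob):
    """Return (kind, table, row, generic arguments) for a GENERICINST TypeSpec."""
    if blob[0] != GENERICINST:
        raise ValueError("only the GENERICINST form is handled in this sketch")
    kind = {CLASS: "class", VALUETYPE: "valuetype"}[blob[1]]
    coded, pos = read_compressed(blob, 2)          # TypeDefOrRefEncoded
    if (coded & 0x03) == 3:
        raise ValueError("invalid TypeDefOrRefEncoded tag")
    count, pos = read_compressed(blob, pos)        # GenArgCount
    args = [PRIMITIVES[b] for b in blob[pos:pos + count]]
    return kind, TABLES[coded & 0x03], coded >> 2, args


print(parse_generic_inst(bytes([0x15, 0x12, 0x08, 0x02, 0x08, 0x0E])))
```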


1.5 MarshalSpec

The MarshalSpec signature is generated when the MarshalAs attribute is used on fields, parameters and return parameters. It specifies how data should be marshalled when calling to or from unmanaged code via Platform Invoke. The signature is indexed by the FieldMarshal.NativeType column. The name of the metadata table is slightly misleading: it does not matter whether the MarshalSpec describes a field, a parameter or a return parameter, it is always indexed by that column. In the syntax listing below, the ParamNum element identifies the parameter in the method call that provides the number of elements in the array, and the NumElem element holds the number of elements (or of additional elements, when ParamNum is also given). Both are stored in the signature as compressed integers, and their aim is to help compute the total size in bytes that the array occupies in memory. The Microsoft-specific implementation of the marshalling descriptor is richer than the one described here and makes use of additional constants and an extended syntax; if you want to know more about it, go to section §23.4 of the Partition II metadata specification.

MarshalSpec ::=
  NativeIntrinsic
| ARRAY ArrayElemType
| ARRAY ArrayElemType ParamNum
| ARRAY ArrayElemType ParamNum NumElem

ArrayElemType ::=
   NativeIntrinsic 

NativeIntrinsic ::=
  BOOLEAN | I1 | U1 | I2 | U2 | I4 | U4 | I8 | U8 | R4 | R8
| LPSTR | LPWSTR | INT | UINT | FUNC

To compute the size in bytes of an array, the following pseudo-code is used, where the @ParamNum stands for the value passed in for parameter number ParamNum.

if ParamNum = 0
   SizeInBytes = NumElem * sizeof (elem)
else
   SizeInBytes = ( @ParamNum +  NumElem ) * sizeof (elem)
endif
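The same computation, written as a small Python function. The function and parameter names are mine; `param_value` stands for @ParamNum, the value passed at run time in the parameter that ParamNum points to, and the value 5 used below is made up purely for illustration.

```python
def array_size_in_bytes(param_num, num_elem, elem_size, param_value=0):
    """Size of a marshalled array, following the pseudo-code above."""
    if param_num == 0:
        return num_elem * elem_size
    return (param_value + num_elem) * elem_size


# Ten int32 elements, no size parameter: 10 * 4 bytes.
print(array_size_in_bytes(param_num=0, num_elem=10, elem_size=4))
# Size parameter carrying 5 at run time plus 10 constant elements: (5 + 10) * 4 bytes.
print(array_size_in_bytes(param_num=2, num_elem=10, elem_size=4, param_value=5))
```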

The constants for this signature are listed in the table below. The syntax listing above and the examples in this subsection use abbreviations instead of the full names of the constants.

Name | Value
NATIVE_TYPE_BOOLEAN | 0x02
NATIVE_TYPE_I1 | 0x03
NATIVE_TYPE_U1 | 0x04
NATIVE_TYPE_I2 | 0x05
NATIVE_TYPE_U2 | 0x06
NATIVE_TYPE_I4 | 0x07
NATIVE_TYPE_U4 | 0x08
NATIVE_TYPE_I8 | 0x09
NATIVE_TYPE_U8 | 0x0A
NATIVE_TYPE_R4 | 0x0B
NATIVE_TYPE_R8 | 0x0C
NATIVE_TYPE_LPSTR | 0x14
NATIVE_TYPE_LPWSTR | 0x15
NATIVE_TYPE_INT | 0x1F
NATIVE_TYPE_UINT | 0x20
NATIVE_TYPE_FUNC | 0x26
NATIVE_TYPE_ARRAY | 0x2A
NATIVE_TYPE_MAX | 0x50

Example 1

Let us start with the simplest possible example shown on the below code listing.

// Full source: MarshalSpec\1.cs
// Binary: MarshalSpec\1.dll
// (...)

[MarshalAs(UnmanagedType.LPWStr)]
public string TestField;

This code has generated the following MarshalSpec signature.

Offset | Value | Meaning
0x1C | 0x01 | Signature size.
0x1D | 0x15 | The TestField field is marshalled to LPWSTR in the unmanaged code.

Example 2
Now it is time for a more sophisticated example. We will marshal an array of int32 as LPArray (a pointer to the first element of a C-style array). Because such an array type does not carry information about the rank and bounds of the array data, we have to specify which parameter of the method provides the number of elements the array has; this is done with the optional SizeParamIndex parameter. In addition, we also set the optional SizeConst parameter, which specifies that the Param1 array contains 10 more elements in addition to the number given by the ArraySize argument. Please notice that there is also the SafeArray array type, a self-describing array that carries the type, rank and bounds of the associated data and does not require setting any optional parameters in MarshalAsAttribute, but it is Microsoft-specific and thus is not described here.

// Full source: MarshalSpec\2.cs
// Binary: MarshalSpec\2.dll
// (...)

 public void TestMethod(
    [MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 2, SizeConst = 10)] int[] Param1,
    int ArraySize)
{
    // nop
}

The following MarshalSpec signature should be generated by the above code.

Offset | Value | Meaning
0x1B | 0x05 | Signature size.
0x1C | 0x2A | The native type of the marshalled parameter (ARRAY), see the constants table for the marshalling descriptor.
0x1D | 0x50 | The MAX constant (see the constants table for the marshalling descriptor), which indicates that this array does not provide information about the type of its elements.
0x1E | 0x02 | The ParamNum parameter, stored as a compressed integer.
0x1F | 0x0A | The NumElem parameter, stored as a compressed integer.
0x20 | 0x01 | The ElemMult parameter, stored as a compressed integer. This is a strange parameter: the whole specification mentions it only twice, saying that if the marshalled type is ARRAY then ElemMult must be set to 0x01, but it specifies neither its meaning nor its location in the MarshalSpec signature (see section §22.17 in the Partition II metadata specification).
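Both MarshalSpec examples can be decoded with the following minimal Python sketch. The function names are hypothetical, only the constants from the table in this subsection are known to it, and the trailing ElemMult value is read simply as "whatever compressed integer follows NumElem", which matches the bytes observed above but is an assumption, since the specification does not describe its position.

```python
# Abbreviated names of the NATIVE_TYPE_* constants from the table above.
NATIVE_TYPES = {
    0x02: "BOOLEAN", 0x03: "I1", 0x04: "U1", 0x05: "I2", 0x06: "U2",
    0x07: "I4", 0x08: "U4", 0x09: "I8", 0x0A: "U8", 0x0B: "R4", 0x0C: "R8",
    0x14: "LPSTR", 0x15: "LPWSTR", 0x1F: "INT", 0x20: "UINT", 0x26: "FUNC",
    0x2A: "ARRAY", 0x50: "MAX",
}


def read_compressed(blob, pos):
    """Decode a compressed unsigned integer; return (value, next position)."""
    first = blob[pos]
    if (first & 0x80) == 0:
        return first, pos + 1
    if (first & 0xC0) == 0x80:
        return ((first & 0x3F) << 8) | blob[pos + 1], pos + 2
    return (((first & 0x1F) << 24) | (blob[pos + 1] << 16)
            | (blob[pos + 2] << 8) | blob[pos + 3]), pos + 4


def parse_marshal_spec(blob):
    """Return a dict describing a MarshalSpec blob (size byte not included)."""
    native = NATIVE_TYPES[blob[0]]
    if native != "ARRAY":
        return {"type": native}
    result = {"type": "ARRAY", "elem_type": NATIVE_TYPES[blob[1]]}
    pos = 2
    # The remaining elements are optional and appear in this order.
    for field in ("param_num", "num_elem", "elem_mult"):
        if pos >= len(blob):
            break
        result[field], pos = read_compressed(blob, pos)
    return result


print(parse_marshal_spec(bytes([0x15])))
print(parse_marshal_spec(bytes([0x2A, 0x50, 0x02, 0x0A, 0x01])))
```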


2. Elements

We have discussed all the signatures, but this is not the end. Signatures consist of smaller parts that I call "elements". They are described separately because they form part of more than one signature, so there is no need to repeat the explanation of a particular element for each signature. In this chapter we will take a closer look at them.


2.1 CustomMod

This element has appeared frequently in the signatures discussed so far, which is why we start with it. Custom modifiers are similar to custom attributes, but in contrast to them, custom modifiers are part of a signature. Custom modifiers are defined in CIL using the modreq (required modifier) and modopt (optional modifier) keywords in a declaration, and both take a type (class or structure) as their "argument". Two signatures that differ only by the addition of a custom modifier (required or optional) shall not be considered to match, and, as the specification says:

The distinction between required and optional modifiers is important to tools other than the CLI that deal with the metadata, typically compilers and program analysers. A required modifier indicates that there is a special semantics to the modified item that should not be ignored, while an optional modifier can simply be ignored. For example, the const qualifier in the C programming language can be modelled with an optional modifier since the caller of a method that has a const-qualified parameter need not treat it in any special way. On the other hand, a parameter that shall be copy-constructed in C++ shall be marked with a required custom attribute since it is the caller who makes the copy.

Unfortunately, C# has some problems handling parameters that have custom modifiers attached; you can read about it in the articles "Modopt, method signatures, and incomplete specs oh my!" and "More on modopt" on CodeBetter.com.

CMOD_OPT and CMOD_REQD are just constants defined in the constants table in the first part; the TypeDefEncoded and TypeRefEncoded elements are in fact a single TypeDefOrRefEncoded element, thoroughly discussed in the next subsection. Note that zero, one or more CustomMods can be attached to a field, property, parameter or return parameter. As far as I know, there is no way to define a custom modifier in C#, apart from using System.Reflection.Emit. In the System.Runtime.CompilerServices namespace you can find several types meant to be used as custom modifiers (I call them indicators), for instance CallConvCdecl, IsConst and IsLong.

Picture 3, The CustomMod element syntax diagram

Example 1
In the example below we annotate the TestField field with the modreq modifier, hence the CustomMod lies within the FieldSig signature, depicted at the very beginning of the article, in Picture 2. The IsLong indicator distinguishes a long from an integer in C++ but, in our case, there is no special semantics behind this custom modifier; we just want to demonstrate the format of the CustomMod element in the signature. The value of the TypeDefOrRefEncoded element is shown twice, in two numeral systems, hexadecimal (subscript 16) and binary (subscript 2); in the next subsection you will see why.

// Full source: CustomMod\1.il
// Binary: CustomMod\1.dll
// (...)

.field public int64 modreq([mscorlib]System.Runtime.CompilerServices.IsLong) TestField

The table below presents the whole FieldSig signature indexed by the Field.Signature column, along with the embedded custom modifier generated by the modreq keyword.

Offset | Value | Meaning
0x01 | 0x04 | Signature size.
0x02 | 0x06 | FieldSig's prolog.
0x03 | 0x1F | Encountered custom, required modifier (modreq), see constants in the first part.
0x04 | 0x05<sub>16</sub> = 00000101<sub>2</sub> | The TypeDefOrRefEncoded element; in this case it points to the first row of the TypeRef table, that is, the IsLong class. This element is described in the next subsection.
0x05 | 0x0A | The type of the field (int64), see constants in the first part.
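
To make the layout of the table above easier to follow, here is a minimal Python sketch that walks the same five bytes. The function name parse_field_sig is hypothetical, and the sketch assumes that every compressed integer fits in a single byte and that the field type is a single-byte constant, which holds for this example but not in general.

```python
# Constants taken from the tables in this article.
FIELD_PROLOG = 0x06
CMOD_REQD = 0x1F  # modreq
CMOD_OPT = 0x20   # modopt


def parse_field_sig(entry):
    """Split a FieldSig blob entry into (custom modifiers, field type byte).

    `entry` includes the leading size byte. Simplifying assumption: every
    compressed integer fits in one byte and the field type is a single byte.
    """
    size, sig = entry[0], entry[1:]
    if size != len(sig):
        raise ValueError("size byte does not match the signature length")
    if sig[0] != FIELD_PROLOG:
        raise ValueError("not a FieldSig")

    pos = 1
    mods = []
    # CustomMod*: zero or more (CMOD_REQD | CMOD_OPT) TypeDefOrRefEncoded pairs.
    while sig[pos] in (CMOD_REQD, CMOD_OPT):
        kind = "modreq" if sig[pos] == CMOD_REQD else "modopt"
        mods.append((kind, sig[pos + 1]))
        pos += 2
    return mods, sig[pos]


mods, field_type = parse_field_sig(bytes([0x04, 0x06, 0x1F, 0x05, 0x0A]))
print(mods, hex(field_type))  # [('modreq', 5)] 0xa
```

The loop mirrors the "zero, one or more CustomMods" rule: it keeps consuming modifier pairs until it meets a byte that is neither CMOD_REQD nor CMOD_OPT, which is then taken as the field type.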


2.2 TypeDefOrRefEncoded

Now we will try to demystify what is, at this moment, the most mysterious element, namely TypeDefOrRefEncoded; fortunately, it is not as complicated as it may seem. This element determines in which metadata table, and at which row of that table, the referenced type's information resides. The two least significant bits encode the metadata table: 0 for TypeDef (the referenced type resides in the current assembly), 1 for TypeRef (the referenced type resides in a separate assembly) and 2 for TypeSpec (the referenced type is a generic type, an array, etc.; see chapter 4.9 TypeSpec). The remaining bits encode the row index. Note that indexes are one-based; in other words, the first row in every metadata table is always 1, not 0.
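
The bit layout described above can be sketched in a few lines of Python. The function names are hypothetical, and both functions work on the already decompressed integer value, not on the raw compressed bytes.

```python
# Table tags stored in the two least significant bits.
TABLES = ("TypeDef", "TypeRef", "TypeSpec")


def decode_type_def_or_ref(value):
    """Return (table name, one-based row index) for a decoded integer."""
    tag = value & 0b11
    if tag >= len(TABLES):
        raise ValueError("tag 3 is not used")
    return TABLES[tag], value >> 2


def encode_type_def_or_ref(table, row):
    """Inverse operation: pack the row index and the table tag together."""
    return (row << 2) | TABLES.index(table)


print(decode_type_def_or_ref(0x05))  # ('TypeRef', 1)
print(decode_type_def_or_ref(0x08))  # ('TypeDef', 2)
print(hex(encode_type_def_or_ref("TypeRef", 2)))  # 0x9
```

The value 0x05 is the one from the CustomMod example in the previous subsection; 0x08 and 0x09 appear in the examples that follow.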

Example 1
In this example we declare a single field with a custom, required modifier attached to it; modreq takes as its argument the TestClass type declared in the same assembly, as shown below.

// Full source: TypeDefOrRefEncoded\1.il
// Binary: TypeDefOrRefEncoded\1.dll
// (...)

.class public TestClass extends [mscorlib]System.Object { }

.field public int64 modreq(TestClass) TestField

The FieldSig for the above sample code is as follows.

Offset | Value | Meaning
0x01 | 0x04 | Signature size.
0x02 | 0x06 | FieldSig's prolog.
0x03 | 0x1F | Encountered custom, required modifier (modreq), see constants in the first part.
0x04 | 0x08<sub>16</sub> = 00001000<sub>2</sub> | The TypeDefOrRefEncoded element; this time it points to the second row of the TypeDef table. The two least significant bits stand for the table (00<sub>2</sub>, TypeDef), and bits 3 to 8 denote the row number in that table (000010<sub>2</sub> = 2), that is, TestClass. Now compare this with the TypeDefOrRefEncoded element from the previous subsection.
0x05 | 0x0A | The type of the field (int64), see constants in the first part.


2.3 Param

This element describes a single parameter supplied to a method or a property, and therefore is part of PropertySig, MethodDefSig, MethodRefSig, etc. This is the syntax diagram for the Param element:

Picture 4, The Param element syntax diagram

Example 1
In the TestMethod method illustrated below, there are two custom modifiers attached to the single parameter. The aim of this example is to demonstrate the format of the Param element and to show once again how the TypeDefOrRefEncoded element works.

// Full source: Param\1.il
// Binary: Param\1.dll
// (...)

.class public TestClass extends [mscorlib]System.Object { }

.method public static void TestMethod(int32 modopt(TestClass) modreq([mscorlib]System.Runtime.CompilerServices.IsLong) Param1) 
{
    ret
}

The associated MethodDefSig signature for this method is:

Offset | Value | Meaning
0x01 | 0x08 | Signature size.
0x02 | 0x00 | Method is static.
0x03 | 0x01 | The number of parameters.
0x04 | 0x01 | The type of the returned value (void), see constants in the first part.
0x05 | 0x1F | Encountered custom, required modifier (modreq), see constants in the first part.
0x06 | 0x09<sub>16</sub> = 00001001<sub>2</sub> | Referenced row is 2 in the TypeRef metadata table, that is, the IsLong type.
0x07 | 0x20 | Encountered custom, optional modifier (modopt), see constants in the first part.
0x08 | 0x08<sub>16</sub> = 00001000<sub>2</sub> | Referenced row is 2 in the TypeDef metadata table, that is, the TestClass type.
0x09 | 0x08 | First parameter's type (int32), see constants in the first part.
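
The same bytes can be walked programmatically. The following Python sketch is only an illustration: the function names are hypothetical, the leading size byte is left out, and it assumes single-byte compressed integers and single-byte types, as in the table above.

```python
HASTHIS = 0x20    # flag in the first byte: the method has a "this" pointer
CMOD_REQD = 0x1F  # modreq
CMOD_OPT = 0x20   # modopt (same numeric value as HASTHIS, different position)


def read_custom_mods(sig, pos):
    """Read CustomMod* starting at pos; return (modifiers, next position)."""
    mods = []
    while sig[pos] in (CMOD_REQD, CMOD_OPT):
        kind = "modreq" if sig[pos] == CMOD_REQD else "modopt"
        mods.append((kind, sig[pos + 1]))
        pos += 2
    return mods, pos


def parse_method_def_sig(sig):
    """Return (has_this, return type byte, [(modifiers, type byte), ...])."""
    has_this = bool(sig[0] & HASTHIS)
    param_count = sig[1]
    _ret_mods, pos = read_custom_mods(sig, 2)  # RetType may carry modifiers too
    ret_type = sig[pos]
    pos += 1
    params = []
    for _ in range(param_count):
        mods, pos = read_custom_mods(sig, pos)
        params.append((mods, sig[pos]))
        pos += 1
    return has_this, ret_type, params


sig = bytes([0x00, 0x01, 0x01, 0x1F, 0x09, 0x20, 0x08, 0x08])
print(parse_method_def_sig(sig))
# (False, 1, [([('modreq', 9), ('modopt', 8)], 8)])
```

Note that the modifiers come out in the order in which they are stored in the signature (modreq first), which is the reverse of the order in which they were written in the IL source.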


2.4 RetType

This element is almost identical to the Param element; it has one extra path that can include the VOID type. Because the syntax diagram below is self-explanatory, no examples are provided for this subsection.

Picture 5, The RetType element syntax diagram


2.5 Type

It is not surprising that the Type element describes... a type, and not only primitive types (such as int32, bool, string, etc.) but also arrays, generic instance types and complex types (classes and structures). The listing below presents the syntax for this element; the words written in upper case are constants whose values can be found in the constants table in the first part. You may wonder why the constant GENERICINST is part of this element, but remember that the TypeSpec, MethodSpec and MethodDefSig signatures have different aims!

Type ::=
BOOLEAN | CHAR | I1 | U1 | I2 | U2 | I4 | U4 | I8 | U8 | R4 | R8 | I | U
| ARRAY Type ArrayShape
| CLASS TypeDefOrRefEncoded
| FNPTR MethodDefSig
| FNPTR MethodRefSig
| GENERICINST (CLASS | VALUETYPE) TypeDefOrRefEncoded GenArgCount Type *
| MVAR number
| OBJECT
| PTR CustomMod* Type
| PTR CustomMod* VOID
| STRING
| SZARRAY CustomMod* Type
| VALUETYPE TypeDefOrRefEncoded
| VAR number
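
Because the Type element is recursive (a Type may contain further Types), a recursive reader is a natural fit. Below is a minimal Python sketch that covers only part of the grammar: the primitives, CLASS, VALUETYPE, VAR, MVAR, SZARRAY (without custom modifiers) and GENERICINST. The function name read_type and the textual output format are made up for illustration, compressed integers are assumed to fit in one byte, and the constant values are those of the ELEMENT_TYPE constants table.

```python
PRIMITIVES = {
    0x02: "BOOLEAN", 0x03: "CHAR", 0x04: "I1", 0x05: "U1", 0x06: "I2",
    0x07: "U2", 0x08: "I4", 0x09: "U4", 0x0A: "I8", 0x0B: "U8",
    0x0C: "R4", 0x0D: "R8", 0x0E: "STRING", 0x18: "I", 0x19: "U",
    0x1C: "OBJECT",
}
VALUETYPE, CLASS, VAR = 0x11, 0x12, 0x13
GENERICINST, SZARRAY, MVAR = 0x15, 0x1D, 0x1E


def read_type(sig, pos):
    """Decode one Type starting at sig[pos]; return (description, next pos)."""
    tag = sig[pos]
    pos += 1
    if tag in PRIMITIVES:
        return PRIMITIVES[tag], pos
    if tag in (CLASS, VALUETYPE):
        kind = "CLASS" if tag == CLASS else "VALUETYPE"
        return f"{kind}(0x{sig[pos]:02X})", pos + 1  # TypeDefOrRefEncoded
    if tag in (VAR, MVAR):
        prefix = "!" if tag == VAR else "!!"  # same notation as in IL
        return f"{prefix}{sig[pos]}", pos + 1
    if tag == SZARRAY:
        element, pos = read_type(sig, pos)  # CustomMod* is not handled here
        return element + "[]", pos
    if tag == GENERICINST:
        generic, pos = read_type(sig, pos)  # (CLASS | VALUETYPE) + token
        count = sig[pos]                    # GenArgCount
        pos += 1
        args = []
        for _ in range(count):
            arg, pos = read_type(sig, pos)
            args.append(arg)
        return f"{generic}<{', '.join(args)}>", pos
    raise NotImplementedError(f"element type 0x{tag:02X} is not covered")


# Hypothetical bytes: a generic class (token 0x08) instantiated over int, string.
print(read_type(bytes([0x15, 0x12, 0x08, 0x02, 0x08, 0x0E]), 0))
# ('CLASS(0x08)<I4, STRING>', 6)
```

ARRAY, PTR and FNPTR are deliberately left out to keep the sketch short; each of them would be one more branch that calls read_type again.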

Example 1
Let us see what happens to the MethodDefSig signature when a method accepts generic types as normal parameters.

// Full source: Type\1.cs
// Binary: Type\1.dll
// (...)

public class TestClass<GenArg1, GenArg2> { }

public class TestRunClass
{
    public void TestRunMethod()
    {
        TestMethod(new TestClass<int, string>());
    }

    public void TestMethod(TestClass<int, string> Param1) { }
}

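Dissecting the MethodDefSig signature for the TestMethod method:

Offset | Value | Meaning
0x0E | 0x09 | Signature size.
0x0F | 0x20 | The method is an instance method.
0x10 | 0x01 | The number of normal parameters.
0x11 | 0x01 | The type of the returned value (void), see constants in the first part.
0x12 | 0x15 | The first parameter's type is a generic type (GENERICINST), see constants in the first part.
0x13 | 0x12 | The generic type is a class (CLASS), see constants in the first part.

The table stops here; according to the Type syntax above, the remaining four bytes of the signature hold the TypeDefOrRefEncoded element of TestClass, the GenArgCount (2) and the two generic arguments (I4 for int and STRING for string).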


2.6 ArrayShape

I think that a lot of people who use the .NET platform know that an array can have more than one dimension, but do not know that each dimension can have a lower bound. That is probably because most developers use C#, which does not support lower bounds other than through the Array.CreateInstance method. The ArrayShape element holds the full definition of a multi-dimensional array: it stores the number of dimensions and the size and lower bound of each dimension. The syntax diagram, along with a brief description copied from the specification, is shown below.

Picture 6, The ArrayShape element syntax diagram

Rank is an integer (stored in compressed form, see §23.2) that specifies the number of dimensions in the array (shall be 1 or more). NumSizes is a compressed integer that says how many dimensions have specified sizes (it shall be 0 or more). Size is a compressed integer specifying the size of that dimension - the sequence starts at the first dimension, and goes on for a total of NumSizes items. Similarly, NumLoBounds is a compressed integer that says how many dimensions have specified lower bounds (it shall be 0 or more). And LoBound is a compressed integer specifying the lower bound of that dimension - the sequence starts at the first dimension, and goes on for a total of NumLoBounds items. None of the dimensions in these two sequences can be skipped, but the number of specified dimensions can be less than Rank.

NOTE: Please do not confuse multi-dimensional arrays with jagged arrays: a multi-dimensional array in CIL is, for example, int32[,], whereas a jagged array is int32[][]. Also note that ArrayShape is used only for arrays encoded with the ARRAY constant! A single-dimensional, zero-based array is denoted by the SZARRAY constant and nothing more (see the Type element). To learn more about arrays in .NET, see the Array Types in .NET article in the MSDN Magazine.

IMPORTANT: As we will see in the second example, the lower bounds emitted by the ILASM compiler (the LoBound field in Picture 6) look as if they were multiplied by two. This is not a compiler bug, although the description quoted above does not make it obvious: LoBound is a signed compressed integer, and that encoding rotates the value one bit to the left so that the sign bit lands in the least significant position. A non-negative lower bound n is therefore stored as 2n (later editions of ECMA-335 state explicitly that LoBound is signed). Below you can see a table copied from the specification that shows sample array declarations and their logical parameters in the ArrayShape element. Moreover, the specification does not say in which cases the NumSizes and NumLoBounds fields may be less than the Rank field. From my observation, ILASM emits NumSizes and NumLoBounds smaller than Rank in only one case, when no bounds are specified for any dimension (the second row of the table below); otherwise NumSizes and NumLoBounds are always equal to Rank, which contradicts the first, third and fifth rows of the table.

Declaration | Type | Rank | NumSizes | Size | NumLoBounds | LoBound
[0...2] | I4 | 1 | 1 | 3 | 0 | -
[,,,,,,] | I4 | 7 | 0 | - | 0 | -
[0...3, 0...2,,,,] | I4 | 6 | 2 | 4 3 | 2 | 0 0
[1...2, 6...8] | I4 | 2 | 2 | 2 3 | 2 | 1 6
[5, 3...5, , ] | I4 | 4 | 2 | 5 3 | 2 | 0 3
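
The table lists logical values. On disk, a lower bound is written in a way that makes non-negative values look doubled, and one reading that accounts for this is that LoBound is a signed compressed integer, as later editions of ECMA-335 describe it. The Python sketch below decodes that encoding under this assumption; the function names are hypothetical and the checked values are the examples given in the specification's description of compressed integers.

```python
def read_compressed_uint(data, pos=0):
    """Return (raw value, next position, number of payload bits)."""
    first = data[pos]
    if (first & 0x80) == 0x00:  # 0xxxxxxx: one byte, 7 bits
        return first, pos + 1, 7
    if (first & 0xC0) == 0x80:  # 10xxxxxx: two bytes, 14 bits
        return ((first & 0x3F) << 8) | data[pos + 1], pos + 2, 14
    if (first & 0xE0) == 0xC0:  # 110xxxxx: four bytes, 29 bits
        raw = ((first & 0x1F) << 24) | (data[pos + 1] << 16)
        raw |= (data[pos + 2] << 8) | data[pos + 3]
        return raw, pos + 4, 29
    raise ValueError("invalid compressed integer")


def read_compressed_int(data, pos=0):
    """Decode a signed compressed integer; return (value, next position).

    The sign bit was rotated into the least significant position, so the
    magnitude bits are raw >> 1 and an odd raw value means "negative".
    """
    raw, pos, bits = read_compressed_uint(data, pos)
    value = raw >> 1
    if raw & 1:
        value -= 1 << (bits - 1)
    return value, pos


print(read_compressed_int(bytes([0x08])))        # (4, 1)
print(read_compressed_int(bytes([0x7B])))        # (-3, 1)
print(read_compressed_int(bytes([0x80, 0x80])))  # (64, 2)
```

Under this reading a stored byte 0x08 is simply the lower bound 4, and a negative lower bound such as -3 becomes representable, which an unsigned encoding could not express.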

Example 1
Let us see how the ArrayShape works in action.

// Full source: ArrayShape\1.il
// Binary: ArrayShape\1.dll
// (...)

.field public int32[,,] TestField

The following FieldSig signature should be generated by the above multi-dimensional array.

Offset | Value | Meaning
0x01 | 0x06 | Signature size.
0x02 | 0x06 | FieldSig's prolog.
0x03 | 0x14 | Field's type value is ARRAY, see constants in the first part.
0x04 | 0x08 | Array's type is int32, see constants in the first part.
0x05 | 0x03 | The number of the array's dimensions (Rank field in Picture 6).
0x06 | 0x00 | Sizes of the array's dimensions are not specified (NumSizes field in Picture 6).
0x07 | 0x00 | Lower bounds of the array's dimensions are not specified (NumLoBounds field in Picture 6).

Example 2
This example shows how the ArrayShape element behaves when a multi-dimensional array is declared with lower bounds specified.

// Full source: ArrayShape\2.il
// Binary: ArrayShape\2.dll
// (...)

.field public int32[0...5,,4...6] TestField

The whole FieldSig signature looks like this.

Offset | Value | Meaning
0x01 | 0x0C | Signature size.
0x02 | 0x06 | FieldSig's prolog.
0x03 | 0x14 | Field's type value is ARRAY, see constants in the first part.
0x04 | 0x08 | Array's type is int32, see constants in the first part.
0x05 | 0x03 | The number of the array's dimensions (Rank field in Picture 6).
0x06 | 0x03 | The number of sizes for this array (NumSizes field in Picture 6).
0x07 | 0x06 | The size of the first dimension of the array (Size field in Picture 6).
0x08 | 0x00 | The size of the second dimension of the array; zero means not specified (Size field in Picture 6).
0x09 | 0x03 | The size of the third dimension of the array (Size field in Picture 6).
0x0A | 0x03 | The number of lower bounds for this array (NumLoBounds field in Picture 6).
0x0B | 0x00 | The lower bound of the first dimension of the array (LoBound field in Picture 6).
0x0C | 0x00 | The lower bound of the second dimension of the array (LoBound field in Picture 6).
0x0D | 0x08 | The lower bound of the third dimension of the array (LoBound field in Picture 6). The declared lower bound is 4, but it is stored as 0x08: LoBound is a signed compressed integer, which encodes a non-negative value n as 2n; see the important note at the beginning of the current subsection.
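
As a cross-check of the table above, the following Python sketch reads the ArrayShape part of this FieldSig. The names are hypothetical, and two assumptions are made: every compressed integer fits in one byte, and LoBound is a signed compressed integer (sign bit rotated into the least significant position).

```python
def decode_signed_byte(raw):
    """Decode a one-byte signed compressed integer (7 payload bits)."""
    value = raw >> 1
    return value - 0x40 if raw & 1 else value


def read_array_shape(sig, pos):
    """Read Rank, NumSizes, Size*, NumLoBounds, LoBound* starting at pos."""
    rank = sig[pos]
    pos += 1
    num_sizes = sig[pos]
    pos += 1
    sizes = list(sig[pos:pos + num_sizes])
    pos += num_sizes
    num_lo_bounds = sig[pos]
    pos += 1
    lo_bounds = [decode_signed_byte(b) for b in sig[pos:pos + num_lo_bounds]]
    pos += num_lo_bounds
    return {"rank": rank, "sizes": sizes, "lo_bounds": lo_bounds}, pos


# FieldSig of int32[0...5,,4...6] without the leading size byte.
field_sig = bytes([0x06, 0x14, 0x08,          # prolog, ARRAY, I4
                   0x03,                      # Rank
                   0x03, 0x06, 0x00, 0x03,    # NumSizes, Size * 3
                   0x03, 0x00, 0x00, 0x08])   # NumLoBounds, LoBound * 3
shape, end = read_array_shape(field_sig, 3)
print(shape)  # {'rank': 3, 'sizes': [6, 0, 3], 'lo_bounds': [0, 0, 4]}
```

The decoded sizes 6 and 3 match the declared ranges 0...5 and 4...6, and the third lower bound comes out as 4, as declared.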

Example 3
Now let us see what the ArrayShape element looks like in reality and compare the result with the specification.

// Full source: ArrayShape\3.il
// Binary: ArrayShape\3.dll
// (...)

.field public int32[0...2] TestField

Yes, NumLoBounds is equal to Rank, even though the specification says that NumLoBounds shall be equal to zero for this declaration.

Offset | Value | Meaning
0x01 | 0x08 | Signature size.
0x02 | 0x06 | FieldSig's prolog.
0x03 | 0x14 | Field's type value is ARRAY, see constants in the first part.
0x04 | 0x08 | Array's type is int32, see constants in the first part.
0x05 | 0x01 | The number of the array's dimensions (Rank field in Picture 6).
0x06 | 0x01 | The number of sizes for this array (NumSizes field in Picture 6).
0x07 | 0x03 | The size of the first dimension of the array (Size field in Picture 6).
0x08 | 0x01 | The number of lower bounds for this array (NumLoBounds field in Picture 6).
0x09 | 0x00 | The lower bound of the first dimension of the array (LoBound field in Picture 6).


3. Conclusion

As you can see, signatures are a complicated monstrosity, but they make .NET executables small, compact and consistent. If you have any questions, hints or requests, do not hesitate to add a comment below; constructive comments are always welcome.


4. References


5. Revision History

  • 1.0: 26th September 2009
    Initial release.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Przemyslaw Celej
Software Developer
Poland
Przemek was born in 1988 and lives in a small town near Warsaw, Poland. Currently he codes in C# and J2EE, and occasionally uses C++ for fun. Przemek is a cycling fan; weather permitting, he rides a bike.

Last Updated 28 Sep 2009
Article Copyright 2009 by Przemyslaw Celej