In this article, you will find a full description of the signatures that are part of the .NET file format.
Contents
- Signatures (continuation)
- LocalVarSig
- CustomAttrib
- MethodSpec
- TypeSpec
- MarshalSpec
- Elements
- CustomMod
- TypeDefOrRefEncoded
- Param
- RetType
- Type
- ArrayShape
- Conclusion
- References
- Revision history
1. Signatures (continuation)
Continuation of the first part.
1.1 LocalVarSig
The LocalVarSig signature is also indexed by the StandAloneSig.Signature column. It stores the types of all the local variables allocated while a method runs. The LOCAL_SIG element is the signature's prolog and has the constant value 0x07. The Count element is an unsigned integer (compressed, of course!) that stores the number of local variables the associated method has. The BYREF element is an abbreviation of the ELEMENT_TYPE_BYREF constant (see constants in the first part) and indicates that the Type element points to the actual variable. One more element is worth mentioning: the Constraint element, which indicates that the target will not be moved by the garbage collector when it reclaims memory. Local variables themselves are located on the stack, where the GC does not perform any actions, so the Type of a pinned variable is either a reference type (like System.Object, allocated on the heap) or a value type (like System.Decimal, allocated on the stack). When the pinned target is a value type, its definition should include the BYREF element; in this case the reference to the variable is held on the stack, but the variable itself is allocated on the heap. You can see more on pinning here. In Picture 1 below, you can see the full syntax diagram for this signature.
I would like to draw your special attention to the TYPEDBYREF element in the diagram below. This is the typed reference: it contains not only a managed pointer to a location (like a normal reference) but also a runtime representation of the type of the data stored there. I quote its description from the specification:
"The typed reference local variable signature states that the local will contain both a managed pointer to a location and a runtime representation of the type that can be stored at that location. A typed reference signature is similar to a byref constraint, but while the byref specifies the type as part of the byref constraint (and hence statically as part of the type description), a typed reference provides the type information dynamically. A typed reference is a full signature in itself and cannot be combined with other constraints. In particular, it is not possible to specify a byref whose type is typed reference."
The typed reference is also very helpful when passing unboxed data (i.e., data stored on the stack, which is always a value type) by reference to methods that are not statically restricted in the types they accept. Such methods require, in addition to a managed pointer to a location, the static type of that location, and the typed reference meets these needs. Notice also that a typed reference parameter can refer to a location on the stack, and that location has a lifetime limited to the execution of the method in which it is allocated; thus the CIL compiler applies appropriate checks on the lifetime of byref and typed reference parameters, see §12.4.1.5.2 in the ECMA-335 specification. The typed reference is represented in the .NET BCL (Base Class Library) by the TypedReference structure.
Picture 1: The LocalVarSig signature syntax diagram
Example 1
This example shows a byref local of a value type declared on the stack. The sample code is written in CIL and looks as follows:
.method public static void TestMethod()
{
    .locals init(int32 &IntVarByRef)
    ret
}
The LocalVarSig signature for this sample code is explored in the table below:
Offset | Value | Meaning |
0x05 | 0x04 | Signature size |
0x06 | 0x07 | Signature's prolog (LOCAL_SIG constant) |
0x07 | 0x01 | The total number of variables declared in this method is one |
0x08 | 0x10 | The local is declared as a managed pointer (int32&), so the BYREF element of value 0x10 precedes its type |
0x09 | 0x08 | The variable's type (int32), see constants in the first part |
Example 2
The sample below illustrates what happens to the signature when we use a typed reference. First we declare the IntVar variable; in the next line, we obtain a typed reference using the __makeref keyword (which is undocumented and not CLS-compliant) and save it in the TypedByRefVar variable.
[CLSCompliant(false)]
public void TestMethod()
{
    int IntVar = 0;
    TypedReference TypedByRefVar = __makeref(IntVar);
}
The LocalVarSig for this sample looks as follows:
Offset | Value | Meaning |
0x1E | 0x04 | Signature size |
0x1F | 0x07 | Signature's prolog (LOCAL_SIG constant) |
0x20 | 0x02 | The total number of variables declared in this method is two |
0x21 | 0x08 | The first variable's type (int32 ), see constants in the first part |
0x22 | 0x16 | The second variable's type (TYPEDBYREF ), see constants in the first part |
Example 3
Now let us move on to a slightly more difficult example. In this sample code, we create the TestDataClass class, which has only one member, StringVarToBePinned, of type string. In the TestMethod method (marked as unsafe), we instantiate the TestDataClass class; in the next line, we "pin" the StringVarToBePinned member and assign a pointer to it to FixedVar using the fixed keyword. This guarantees that between the { and } braces the dataClass.StringVarToBePinned member will not be moved by the garbage collector, so the FixedVar pointer to the member is always valid inside the braces of the fixed statement. Please notice that we cannot declare the variable to be pinned directly in the method, because such a value is already pinned (it is placed on the stack); therefore the variable must be wrapped in the TestDataClass class (which is placed on the heap).
public class TestDataClass
{
    public string StringVarToBePinned;
}
public class TestClass
{
    public unsafe void TestMethod()
    {
        TestDataClass dataClass = new TestDataClass();
        fixed (char* FixedVar = dataClass.StringVarToBePinned) { }
    }
}
This sample is difficult for one more reason: at some point it uses an element that has not been described yet, namely TypeDefOrRefEncoded. This element defines in which row and in which metadata table (TypeDef, TypeRef or TypeSpec) the specified type is described. We will not go into further details of this element here; if you want, you can jump directly to its description in the 2.2 TypeDefOrRefEncoded subsection in the next chapter. The LocalVarSig for the above code is explored in the table below:
Offset | Value | Meaning |
0x20 | 0x08 | Signature size |
0x21 | 0x07 | Signature's prolog (LOCAL_SIG constant) |
0x22 | 0x03 | The total number of variables declared in this method is three |
0x23 | 0x12 | The first variable's type (CLASS - followed by the TypeDefOrRefEncoded element), see constants in the first part |
0x24 | 0x08 | The first variable's type is described in the TypeDef metadata table at row 2 , which is TestDataClass class. This is the TypeDefOrRefEncoded element not explained in the current chapter. |
0x25 | 0x0F | The second variable's type (PTR - followed by Type element), see constants in the first part |
0x26 | 0x03 | The pointer's type from the previous byte (char - finally this is char* ), see constants in the first part |
0x27 | 0x45 | The third variable is pinned, see constants |
0x28 | 0x0E | The third, pinned variable's type (string ), see constants |
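The three tables above can also be read mechanically. The following Python sketch decodes the blobs of Examples 1 to 3 (without the leading size byte). It is only an illustration under stated assumptions: the function names are hypothetical, only the element types that occur in these examples are handled, and Count and the TypeDefOrRefEncoded value are assumed to fit in one byte.

```python
# Element type constants (only those used by the three examples above).
ELEMENT_TYPES = {0x03: "char", 0x08: "int32", 0x0E: "string", 0x16: "typedbyref"}
PTR, BYREF, CLASS, PINNED = 0x0F, 0x10, 0x12, 0x45
TABLES = ("TypeDef", "TypeRef", "TypeSpec")


def decode_type(blob, pos):
    """Decode one Type starting at blob[pos]; return (text, next_pos)."""
    code = blob[pos]
    if code == PTR:
        inner, pos = decode_type(blob, pos + 1)
        return inner + "*", pos
    if code == CLASS:
        token = blob[pos + 1]  # assumption: one-byte TypeDefOrRefEncoded value
        return f"class {TABLES[token & 0x03]}[{token >> 2}]", pos + 2
    return ELEMENT_TYPES[code], pos + 1


def decode_local_var_sig(blob):
    """Decode a LocalVarSig blob (size byte already stripped)."""
    if blob[0] != 0x07:
        raise ValueError("missing LOCAL_SIG prolog")
    count, pos, result = blob[1], 2, []  # assumption: Count < 0x80 (one byte)
    for _ in range(count):
        prefix = []
        while blob[pos] in (PINNED, BYREF):
            prefix.append("pinned" if blob[pos] == PINNED else "byref")
            pos += 1
        text, pos = decode_type(blob, pos)
        result.append(" ".join(prefix + [text]))
    return result


print(decode_local_var_sig(bytes([0x07, 0x01, 0x10, 0x08])))
print(decode_local_var_sig(bytes([0x07, 0x02, 0x08, 0x16])))
print(decode_local_var_sig(bytes([0x07, 0x03, 0x12, 0x08, 0x0F, 0x03, 0x45, 0x0E])))
```

For Example 3 the sketch reports a class from TypeDef row 2, a char* and a pinned string, which matches the table row by row.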
1.2 CustomAttrib
As you can guess, this signature stores instances of custom attributes, but it is a little different from the signatures discussed earlier. The key difference is that the CustomAttrib signature, in contrast to, for example, the MethodRefSig signature, stores the values of the parameters supplied to a custom attribute and does not store their types. In other words, the CustomAttrib signature stores only the values of the parameters (fixed and named) supplied when a custom attribute is instantiated; the information about the types and number of the fixed parameters is not repeated in the signature. The signature is indexed by the CustomAttribute.Value column; the Parent column indicates in which table (TypeDef for a type, MethodDef for a method, and so on) and at which row the attributed element (method, type, and so on) is described. There is also a second significant difference compared to other signatures: in the CustomAttrib signature all binary values are stored in uncompressed little-endian byte order, except the PackedLen item (discussed below) and the signature size. And I repeat once again, do not confuse a custom attribute with a custom modifier! The full syntax diagram consists of four parts; let us look at the first.
Picture 2a: The CustomAttrib signature syntax diagram
So far it is pretty simple. It starts with the Prolog, which has the constant value 0x0001 and occupies two bytes (an unsigned int16, uncompressed and little-endian). Next come the fixed arguments (FixedArg is illustrated in Picture 2b); their number and types can be obtained by examining the associated constructor's row in the MethodDef or MemberRef (when the attribute's class resides in another assembly) metadata table. Note that a vararg method cannot be used as an attribute's constructor. Next follows the number of named parameters (NumNamed is a two-byte unsigned int16, also uncompressed and little-endian), and finally the named parameters themselves, repeated NumNamed times.
Picture 2b: The CustomAttrib signature syntax diagram
This part is a little harder than the previous one, but it is still quite simple. The upper path in the diagram denotes a parameter that is not a single-dimensional, zero-based array (SZARRAY, see constants in the first part); the bottom path represents an SZARRAY parameter, i.e., the parameter is an array. The number of elements in the SZARRAY array is stored in the NumElem element of type int32 (uncompressed and little-endian), which occupies four bytes; if the SZARRAY parameter is null, then NumElem is set to the value 0xFFFFFFFF. The CLI completely disallows arrays other than one-dimensional arrays with a lower bound of zero (SZARRAY) here: a single-dimensional, zero-based array of type int32 is int32[], but not int32[,,] and also not int32[3...8]. If you want to know more about arrays in .NET, read the Array Types in .NET article from MSDN Magazine.
Picture 2c: The CustomAttrib signature syntax diagram
This part is probably the strangest of all four. The format that Elem takes varies depending on the following conditions (quoted from the specification).
If the parameter kind is simple (first line in the above diagram) (bool, char, float32, float64, int8, int16, int32, int64, unsigned int8, unsigned int16, unsigned int32 or unsigned int64) then the 'blob' contains its binary value (Val). (A bool is a single byte with value 0 (false) or 1 (true); char is a two-byte Unicode character; and the others have their obvious meaning.) This pattern is also used if the parameter kind is an enum -- simply store the value of the enum's underlying integer type.
If the parameter kind is string, (middle line in above diagram) then the blob contains a SerString - a PackedLen count of bytes (compressed and big-endian - added by the author), followed by the UTF8 characters. If the string is null, its PackedLen has the value 0xFF (with no following characters). If the string is empty (""), then PackedLen has the value 0x00 (with no following characters).
If the parameter kind is System.Type (see typeof keyword - added by the author of the article), (also, the middle line in above diagram), its value is stored as a SerString (as defined in the previous paragraph), representing its canonical name. The canonical name is its full type name, followed optionally by the assembly where it is defined, its version, culture and public-key-token. If the assembly name is omitted, the CLI looks first in the current assembly, and then in the system library (mscorlib); in these two special cases, it is permitted to omit the assembly-name, version, culture and public-key-token.
If the parameter kind is System.Object, (third line in the above diagram) the value stored represents the "boxed" instance of that value-type. In this case, the blob contains the actual type's FieldOrPropType (see below), followed by the argument's unboxed value. [Note: It is not possible to pass a value of null in this case. end note]
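The SerString rules quoted above (compressed big-endian PackedLen, 0xFF for null, 0x00 for the empty string) can be sketched in a few lines of Python. This is a minimal illustration, not a complete metadata reader; the function names are hypothetical, and the compressed-integer layout follows the description given in the first part of the article.

```python
def read_compressed_uint(blob, pos):
    """Read a compressed unsigned integer; return (value, next_pos)."""
    first = blob[pos]
    if first & 0x80 == 0:        # 0xxxxxxx: one byte
        return first, pos + 1
    if first & 0xC0 == 0x80:     # 10xxxxxx: two bytes, big-endian
        return ((first & 0x3F) << 8) | blob[pos + 1], pos + 2
    # 110xxxxx: four bytes, big-endian
    value = ((first & 0x1F) << 24) | (blob[pos + 1] << 16) \
        | (blob[pos + 2] << 8) | blob[pos + 3]
    return value, pos + 4


def read_ser_string(blob, pos=0):
    """Read a SerString; return (text or None, next_pos)."""
    if blob[pos] == 0xFF:        # null string: PackedLen is 0xFF, nothing follows
        return None, pos + 1
    length, pos = read_compressed_uint(blob, pos)
    return blob[pos:pos + length].decode("utf-8"), pos + length


print(read_ser_string(bytes([0x04, 0x41, 0x62, 0x63, 0x64])))  # ('Abcd', 5)
print(read_ser_string(b"\xff"))                                # (None, 1)
print(read_ser_string(b"\x00"))                                # ('', 1)
```

The null marker is tested before the length is decoded because 0xFF would otherwise be taken for the first byte of a four-byte length.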
Picture 2d: The CustomAttrib signature syntax diagram
The last part illustrates the format of the NamedArg element, which represents a named argument (either a field or a property). Because a field and a property can have the same name, the first element is either FIELD, with the constant one-byte value 0x53, when the named parameter refers to a field, or PROPERTY, with the constant one-byte value 0x54, when it refers to a property. Next comes the FieldOrPropType element, which describes the type of the named property or field in one or two bytes. If the type of the named parameter is an unboxed simple value type (defined above), then FieldOrPropType shall contain exactly one constant value for the associated type (BOOLEAN, CHAR, I1, U1, I2, U2, I4, U4, I8, U8, R4, R8, STRING - see the constants table in the first part); if the type of the named parameter is a boxed simple value type, then the FieldOrPropType element is preceded by a byte containing the value 0x51, and in this case FieldOrPropType is two bytes long. The FieldOrPropName element is a SerString (explained above) containing the name of the property or field. Finally comes a single FixedArg element, shown earlier. So, as you can see, the NamedArg element is a normal FixedArg preceded by some additional information that identifies which field or property it represents. I hope I did not scare you; as you will soon see, the signature is not as complicated as it looks.
Example 1
This example mainly shows the format of the SerString element and how the CustomAttrib signature distinguishes between fields and properties that act as named parameters. In the example below, the TestAttribute attribute requires one fixed parameter, Fixed1, of type int32; additionally, we may (and we do) supply two named parameters, of type int16 and string, as shown in the code listing below:
[AttributeUsage(AttributeTargets.Class)]
public class TestAttribute : Attribute
{
    public TestAttribute(int Fixed1) { }
    public short Named1 { get; set; }
    public string Named2;
}
[Test(1, Named1 = 1, Named2 = "Abcd")]
public class TestClass { }
The full CustomAttrib signature for this case is 33 bytes long, so at some points we have merged several bytes into one row with a single description.
Offset | Value | Meaning |
0x3E | 0x21 | Signature size, stored as a compressed integer, in big-endian byte order |
0x3F-0x40 | 0x01 0x00 | Prolog stored as an uncompressed, little-endian unsigned int16 of value 0x0001 |
0x41-0x44 | 0x01 0x00 0x00 0x00 | The value of the first fixed argument of the attribute (Fixed1); the value is 0x00000001 and is stored as an uncompressed, little-endian int32. This is represented by the upper line in Picture 2b and the first path in Picture 2c. |
0x45-0x46 | 0x02 0x00 | The number of named parameters supplied to the attribute, represented by the NumNamed element in Picture 2a and stored as a little-endian unsigned int16. We supplied exactly two optional parameters, so the value of this two-byte element is 0x0002. |
0x47 | 0x54 | The value of this byte indicates that the target named parameter is a property (see constants in the first part); this is the PROPERTY element in Picture 2d. |
0x48 | 0x06 | The type of the target property (int16, see constants in the first part). This byte is represented by the FieldOrPropType element in Picture 2d. |
0x49-0x4F | 0x06 0x4E 0x61 0x6D 0x65 0x64 0x31 | This is the SerString that specifies the name of the target property (represented by the FieldOrPropName element in Picture 2d). The SerString is a UTF-8 string preceded by its size in bytes; the size is stored as a compressed integer, using big-endian byte order. So we have a 6-byte string (the length byte is at offset 0x49); because the name does not contain any characters beyond the ASCII table, each character occupies exactly one byte and we can easily read the text, which is Named1. |
0x50-0x51 | 0x01 0x00 | The value of the first named argument of the attribute (Named1); the value is 0x0001 and is stored as an uncompressed, little-endian int16. This is represented by the upper line in Picture 2b and the first path in Picture 2c. |
0x52 | 0x53 | The value of this byte indicates that the target named parameter is a field (see constants in the first part); this is the FIELD element in Picture 2d. |
0x53 | 0x0E | The type of the target field (string, see constants in the first part). This byte is represented by the FieldOrPropType element in Picture 2d. |
0x54-0x5A | 0x06 0x4E 0x61 0x6D 0x65 0x64 0x32 | This is again a SerString, which specifies the name of the target field (represented by the FieldOrPropName element in Picture 2d). The length of this string is 6 bytes (look at offset 0x54); the rest of the bytes are the same as in the previous string except for the last one, and the text is Named2, see the ASCII table. |
0x5B-0x5F | 0x04 0x41 0x62 0x63 0x64 | The value of the second named argument of the attribute (Named2); the value is Abcd (see the ASCII table) and is stored as a SerString. This is represented by the upper line in Picture 2b and the middle path in Picture 2c. Because 0x5F - 0x3E = 0x21, i.e. last offset - first offset = signature size, the signature ends here. |
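To double-check the walk-through, the 33 bytes from the table can be parsed with a short Python sketch. It is an illustration only: the function names are hypothetical, the type of the fixed argument (int32) is hard-coded because in a real reader it would come from the constructor's own signature, and every PackedLen is assumed to fit in one byte.

```python
import struct

# Bytes 0x3F..0x5F from the table above (the size byte at 0x3E is omitted).
BLOB = bytes.fromhex(
    "0100" "01000000" "0200"
    "54" "06" "064e616d656431" "0100"
    "53" "0e" "064e616d656432" "0441626364"
)


def ser_string(blob, pos):
    """Read a SerString whose PackedLen fits in one byte (an assumption)."""
    length = blob[pos]
    return blob[pos + 1:pos + 1 + length].decode("utf-8"), pos + 1 + length


def parse_example(blob):
    # Prolog (uint16), Fixed1 (int32), NumNamed (uint16), all little-endian.
    prolog, fixed1, num_named = struct.unpack_from("<HiH", blob, 0)
    pos, named = 8, {}
    for _ in range(num_named):
        kind = {0x53: "field", 0x54: "property"}[blob[pos]]
        type_code = blob[pos + 1]
        name, pos = ser_string(blob, pos + 2)
        if type_code == 0x06:            # I2: two-byte little-endian value
            (value,) = struct.unpack_from("<h", blob, pos)
            pos += 2
        elif type_code == 0x0E:          # STRING: value is a SerString
            value, pos = ser_string(blob, pos)
        else:
            raise ValueError(f"type 0x{type_code:02X} not handled in this sketch")
        named[name] = (kind, value)
    return prolog, fixed1, named, pos


print(parse_example(BLOB))
```

The final position returned by the sketch equals the blob length (33), so every byte of the signature is accounted for.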
Example 2
In this example, we demonstrate the signature format when System.Type, SZARRAY, and boxed value types are used as arguments of the TestAttribute attribute defined below:
[AttributeUsage(AttributeTargets.Class)]
public class TestAttribute : Attribute
{
    public TestAttribute(object Param1, int[] Param2, Type Param3) { }
}
[Test(1, new int[] {1, 2, 3}, typeof(string))]
public class TestClass { }
As in the previous sample, the signature is very long (116 bytes!), so I have split it up into smaller parts.
Offset | Value | Meaning |
0x2B | 0x74 | Signature size, stored as a compressed integer, in big-endian byte order |
0x2C-0x2D | 0x01 0x00 | Prolog stored as an uncompressed, little-endian unsigned int16 of value 0x0001 |
0x2E | 0x08 | The type of the first fixed argument (int32, boxed inside System.Object); this case is represented by the third path in Picture 2c, where a value is immediately preceded by its type |
0x2F-0x32 | 0x01 0x00 0x00 0x00 | The value whose type was specified in the previous byte; because the type is int32, it occupies exactly 4 bytes. It is stored in little-endian byte order, so the value is 0x00000001. |
0x33-0x36 | 0x03 0x00 0x00 0x00 | Next comes the second parameter. Because the second argument is a single-dimensional, zero-based array (SZARRAY), these four bytes specify the number of elements supplied in the array; the value is stored as an unsigned int32 in little-endian byte order. |
0x37-0x3A | 0x01 0x00 0x00 0x00 | The value of the first element of the array in the second parameter; it is four bytes long because the element type is int32, and the value is 0x00000001. |
0x3B-0x3E | 0x02 0x00 0x00 0x00 | The value of the second element of the array in the second parameter; it is four bytes long because the element type is int32, and the value is 0x00000002. |
0x3F-0x42 | 0x03 0x00 0x00 0x00 | The value of the third element of the array in the second parameter; it is four bytes long because the element type is int32, and the value is 0x00000003. |
0x43-0x9D | 0x5A 0x53 0x79 0x73 0x74 0x65 0x6D 0x2E 0x53 0x74 0x72 0x69 0x6E 0x67 0x2C 0x20 0x6D 0x73 0x63 0x6F 0x72 0x6C 0x69 0x62 0x2C 0x20 0x56 0x65 0x72 0x73 0x69 0x6F 0x6E 0x3D 0x32 0x2E 0x30 0x2E 0x30 0x2E 0x30 0x2C 0x20 0x43 0x75 0x6C 0x74 0x75 0x72 0x65 0x3D 0x6E 0x65 0x75 0x74 0x72 0x61 0x6C 0x2C 0x20 0x50 0x75 0x62 0x6C 0x69 0x63 0x4B 0x65 0x79 0x54 0x6F 0x6B 0x65 0x6E 0x3D 0x62 0x37 0x37 0x61 0x35 0x63 0x35 0x36 0x31 0x39 0x33 0x34 0x65 0x30 0x38 0x39 | This SerString (a length byte of 0x5A followed by 90 bytes of text) describes the canonical name of the type supplied as the third parameter; it has the value System.String, mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089. This is represented by the middle path in Picture 2c. |
0x9E-0x9F | 0x00 0x00 | The number of named parameters (the NumNamed element in Picture 2a), stored as a little-endian unsigned int16. No named arguments were supplied to the attribute, so the value is 0x0000. These two bytes are not part of the preceding SerString (0x9D - 0x43 = 0x5A, the string length), but they complete the CustomAttrib signature (0x9F - 0x2B = 0x74). |
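The byte accounting of this example can be checked with a small Python sketch. It rebuilds the 116 bytes from their described meaning and parses them back, assuming that the two closing bytes are the NumNamed field from Picture 2a (two bytes, little-endian, zero named arguments). The function name is hypothetical and the parameter kinds are hard-coded, since a real reader would take them from the constructor's signature.

```python
import struct

TYPE_NAME = ("System.String, mscorlib, Version=2.0.0.0, "
             "Culture=neutral, PublicKeyToken=b77a5c561934e089")

# Bytes 0x2C..0x9F of the table above, rebuilt from their described meaning.
BLOB = (
    bytes.fromhex("0100")                                   # Prolog
    + bytes.fromhex("08") + struct.pack("<i", 1)            # boxed int32 argument
    + struct.pack("<I3i", 3, 1, 2, 3)                       # NumElem + three int32 elements
    + bytes([len(TYPE_NAME)]) + TYPE_NAME.encode("utf-8")   # SerString
    + bytes.fromhex("0000")                                 # the two closing bytes
)


def parse_example(blob):
    prolog, boxed_type, boxed_value, num_elem = struct.unpack_from("<HBiI", blob, 0)
    pos = struct.calcsize("<HBiI")
    elems = struct.unpack_from(f"<{num_elem}i", blob, pos)
    pos += 4 * num_elem
    length = blob[pos]                  # one-byte PackedLen (0x5A < 0x80)
    type_name = blob[pos + 1:pos + 1 + length].decode("utf-8")
    pos += 1 + length
    (num_named,) = struct.unpack_from("<H", blob, pos)  # assumed to be NumNamed
    return prolog, boxed_value, list(elems), type_name, num_named, pos + 2


print(len(BLOB) == 0x74)
print(parse_example(BLOB))
```

With that reading the parser consumes exactly 0x74 bytes, which is the size stored at offset 0x2B.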
1.3 MethodSpec
The MethodSpec signature is straightforward. It describes each instantiation of a generic method and is indexed by the MethodSpec.Instantiation column. Its syntax is as follows: it begins with the GENRICINST prolog (do you see the missing "E"?) of one-byte value 0x0A (this constant has a different value than ELEMENT_TYPE_GENERICINST defined in the constants table in the first part), followed by GenArgCount and then by Type, repeated GenArgCount times.
MethodSpecBlob ::=
GENRICINST GenArgCount Type Type*
Example 1
In the sample below, we instantiate the TestMethod generic method, supplying three generic arguments.
public class TestClass
{
    public void TestMethod<GenArg1, GenArg2, GenArg3>() { }
}
public class TestRunClass
{
    public void TestRunMethod()
    {
        new TestClass().TestMethod<short, int, string>();
    }
}
The MethodSpec for this case looks as follows:
Offset | Value | Meaning |
0x18 | 0x05 | Signature size |
0x19 | 0x0A | Prolog |
0x1A | 0x03 | The number of generic arguments supplied to the generic method |
0x1B | 0x06 | The first generic argument's type (int16), see constants in the first part |
0x1C | 0x08 | The second generic argument's type (int32), see constants in the first part |
0x1D | 0x0E | The third generic argument's type (string), see constants in the first part |
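As a quick check, the five bytes above can be decoded with the Python sketch below. The function name is hypothetical, and the sketch assumes that GenArgCount fits in one byte and that every generic argument is one of the single-byte primitive types used in this example.

```python
# Only the element types that appear in this example.
ELEMENT_TYPES = {0x06: "int16", 0x08: "int32", 0x0E: "string"}


def decode_method_spec(blob):
    """Decode a MethodSpec blob (size byte already stripped)."""
    if blob[0] != 0x0A:
        raise ValueError("missing GENRICINST prolog")
    count = blob[1]  # assumption: GenArgCount < 0x80 (one byte)
    return [ELEMENT_TYPES[code] for code in blob[2:2 + count]]


print(decode_method_spec(bytes([0x0A, 0x03, 0x06, 0x08, 0x0E])))
# ['int16', 'int32', 'string']
```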
1.4 TypeSpec
The TypeSpec signature is indexed by the TypeSpec.Signature column and is used when instantiating a type as a multi-dimensional array, instantiating a type as a single-dimensional array preceded by custom modifier(s), instantiating a generic type, and in other cases, as shown in the syntax below. Because some elements have not been explained yet (such as custom modifiers and array shapes), we use only limited functionality of the TypeSpec signature here; in the next chapter, we will focus on the CustomMod, ArrayShape and TypeDefOrRefEncoded elements, and then come back to the TypeSpec signature and use the rest of its capabilities. Also notice that, in contrast to the previous signature, where the GENRICINST (missing "E") constant is used as the prolog, the TypeSpec uses the ELEMENT_TYPE_GENERICINST constant, which is defined in the general constants table (in the first part of the article).
TypeSpecBlob ::=
PTR CustomMod* VOID
| PTR CustomMod* Type
| FNPTR MethodDefSig
| FNPTR MethodRefSig
| ARRAY Type ArrayShape
| SZARRAY CustomMod* Type
| GENERICINST (CLASS | VALUETYPE) TypeDefOrRefEncoded GenArgCount Type Type*
Example 1
In this example, we instantiate the TestClass generic type, as shown in the code listing below:
public class TestClass<GenArg1, GenArg2> { }
public class TestRunClass
{
    public void TestRunMethod()
    {
        TestClass<int, string> TestVar = new TestClass<int, string>();
    }
}
The TypeSpec for this case looks as follows:
Offset | Value | Meaning |
0x13 | 0x06 | Signature size |
0x14 | 0x15 | The ELEMENT_TYPE_GENERICINST constant, see constants table in the first part |
0x15 | 0x12 | The type of the generic type (CLASS ), see constants table in the first part |
0x16 | 0x08 | The instantiated generic type is described in the TypeDef metadata table at row 2 . This is the TypeDefOrRefEncoded element not explained in the current chapter. |
0x17 | 0x02 | The number of generic arguments supplied to the type is two. |
0x18 | 0x08 | The first generic parameter's type (int32 ), see constants in the first part |
0x19 | 0x0E | The second generic parameter's type (string ), see constants in the first part |
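The GENERICINST alternative of the TypeSpec syntax can be decoded in the same spirit. The Python sketch below handles only the shape used in this example; the function name is hypothetical, and the TypeDefOrRefEncoded value, GenArgCount and each generic argument are assumed to occupy a single byte.

```python
ELEMENT_TYPES = {0x08: "int32", 0x0E: "string"}   # only what this example needs
TABLES = ("TypeDef", "TypeRef", "TypeSpec")


def decode_generic_inst(blob):
    """Decode a GENERICINST TypeSpec blob (size byte already stripped)."""
    if blob[0] != 0x15:
        raise ValueError("not an ELEMENT_TYPE_GENERICINST TypeSpec")
    kind = {0x12: "class", 0x11: "valuetype"}[blob[1]]
    token = blob[2]                      # assumption: one-byte TypeDefOrRefEncoded
    table, row = TABLES[token & 0x03], token >> 2
    count = blob[3]                      # assumption: GenArgCount < 0x80
    args = [ELEMENT_TYPES[code] for code in blob[4:4 + count]]
    return kind, table, row, args


print(decode_generic_inst(bytes([0x15, 0x12, 0x08, 0x02, 0x08, 0x0E])))
# ('class', 'TypeDef', 2, ['int32', 'string'])
```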
1.5 MarshalSpec
The MarshalSpec signature is generated when the MarshalAs attribute is used on fields, parameters and return parameters. It specifies how data should be marshalled when calling to or from unmanaged code via Platform Invoke. The signature is indexed by the FieldMarshal.NativeType column. The name of the metadata table is slightly misleading: it does not matter whether the MarshalSpec describes a field, a parameter or a return parameter, it is always indexed by that column. The ParamNum and NumElem elements in the syntax listing below describe, respectively, the parameter in the method call that provides the number of elements in the array, and the number of elements or additional elements. Both are stored in the signature as compressed integers, and their purpose is to help compute the total size in bytes that the array occupies in memory. The Microsoft-specific implementation of the marshalling descriptor is richer than the one described here and makes use of additional constants and extended syntax; if you want to know more about the Microsoft implementation of the MarshalSpec, go to the Partition II metadata specification, section §23.4.
MarshalSpec ::=
NativeIntrinsic
| ARRAY ArrayElemType
| ARRAY ArrayElemType ParamNum
| ARRAY ArrayElemType ParamNum NumElem
ArrayElemType ::=
NativeIntrinsic
NativeIntrinsic ::=
BOOLEAN | I1 | U1 | I2 | U2 | I4 | U4 | I8 | U8 | R4 | R8
| LPSTR | LPWSTR | INT | UINT | FUNC
To compute the size of an array in bytes, the following pseudo-code is used, where @ParamNum stands for the value passed in for parameter number ParamNum.
if ParamNum = 0
    SizeInBytes = NumElem * sizeof (elem)
else
    SizeInBytes = ( @ParamNum + NumElem ) * sizeof (elem)
endif
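The same rule, written as a small Python function for illustration. The function and parameter names are hypothetical; param_value plays the role of @ParamNum, i.e. the value the caller passed for the size parameter.

```python
def array_size_in_bytes(num_elem, elem_size, param_num=0, param_value=0):
    """Mirror of the pseudo-code above.

    param_num   -- ParamNum from the signature (0 means "no size parameter")
    param_value -- the run-time value passed for that parameter (@ParamNum)
    """
    if param_num == 0:
        return num_elem * elem_size
    return (param_value + num_elem) * elem_size


# A fixed-size array of 10 int32 elements:
print(array_size_in_bytes(10, 4))                               # 40
# ParamNum = 2, NumElem = 10, and the caller passed 5 for that parameter:
print(array_size_in_bytes(10, 4, param_num=2, param_value=5))   # 60
```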
The constants for this signature are listed in the table below. In the syntax descriptions and examples in this subsection, abbreviations are used instead of the full names of the constants.
Name | Value |
NATIVE_TYPE_BOOLEAN | 0x02 |
NATIVE_TYPE_I1 | 0x03 |
NATIVE_TYPE_U1 | 0x04 |
NATIVE_TYPE_I2 | 0x05 |
NATIVE_TYPE_U2 | 0x06 |
NATIVE_TYPE_I4 | 0x07 |
NATIVE_TYPE_U4 | 0x08 |
NATIVE_TYPE_I8 | 0x09 |
NATIVE_TYPE_U8 | 0x0A |
NATIVE_TYPE_R4 | 0x0B |
NATIVE_TYPE_R8 | 0x0C |
NATIVE_TYPE_LPSTR | 0x14 |
NATIVE_TYPE_LPWSTR | 0x15 |
NATIVE_TYPE_INT | 0x1F |
NATIVE_TYPE_UINT | 0x20 |
NATIVE_TYPE_FUNC | 0x26 |
NATIVE_TYPE_ARRAY | 0x2A |
NATIVE_TYPE_MAX | 0x50 |
Example 1
Let us start with the simplest possible example shown in the below code listing:
[MarshalAs(UnmanagedType.LPWStr)]
public string TestField;
This code generates the following MarshalSpec signature:
Offset | Value | Meaning |
0x1C | 0x01 | Signature size |
0x1D | 0x15 | The TestField field is marshalled to the LPWSTR in the unmanaged code. |
Example 2
Now it is time for a more sophisticated example. We will marshal an array of int32 as an LPArray (a pointer to the first element of a C-style array). Because such an array type does not carry information about the rank and bounds of the associated array data, we have to specify which parameter of the method provides the number of elements in the array; this is done by setting the optional SizeParamIndex parameter. In addition, we also set the optional SizeConst parameter, which specifies that the Param1 array contains 10 more elements in addition to the number given by the ArraySize argument. Please notice that there is also the SafeArray array type, a self-describing array that carries the type, rank and bounds of the associated data and does not require setting any optional parameters in the MarshalAsAttribute; it is Microsoft-specific, however, and thus is not described here.
public void TestMethod(
    [MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 2, SizeConst = 10)] int[] Param1,
    int ArraySize)
{
}
The above code should generate the following MarshalSpec signature.
Offset | Value | Meaning |
0x1B | 0x05 | Signature size |
0x1C | 0x2A | Type of marshalling parameter (ARRAY ), see constants table for marshalling descriptor |
0x1D | 0x50 | The MAX constant (see constants table for marshalling descriptor) indicates that this array does not provide information about element's type of the array. |
0x1E | 0x02 | The ParamNum parameter stored as compressed integer |
0x1F | 0x0A | The NumElem parameter stored as compressed integer |
0x20 | 0x01 | The ElemMult parameter, stored as a compressed integer. This is a strange parameter: the whole specification mentions it only twice, saying that if the marshalled type is ARRAY then ElemMult must be set to 0x01, but it does not specify its meaning or its location in the MarshalSpec signature (see section §22.17 in the Partition II metadata specification). |
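Both MarshalSpec examples can be decoded with the Python sketch below. It is a simplified illustration: the function name is hypothetical, only the constants from the table in this subsection are known to it, every compressed integer is assumed to fit in one byte, and the trailing ElemMult byte is read simply because it is present in the example, not because the specification defines its position.

```python
NATIVE_TYPES = {
    0x02: "BOOLEAN", 0x03: "I1", 0x04: "U1", 0x05: "I2", 0x06: "U2",
    0x07: "I4", 0x08: "U4", 0x09: "I8", 0x0A: "U8", 0x0B: "R4", 0x0C: "R8",
    0x14: "LPSTR", 0x15: "LPWSTR", 0x1F: "INT", 0x20: "UINT",
    0x26: "FUNC", 0x2A: "ARRAY", 0x50: "MAX",
}


def decode_marshal_spec(blob):
    """Decode a MarshalSpec blob (size byte already stripped)."""
    kind = NATIVE_TYPES[blob[0]]
    if kind != "ARRAY":
        return {"type": kind}
    result = {"type": "ARRAY", "elem_type": NATIVE_TYPES[blob[1]]}
    # Optional trailing values, each assumed to be a one-byte compressed integer.
    for name, value in zip(("param_num", "num_elem", "elem_mult"), blob[2:]):
        result[name] = value
    return result


print(decode_marshal_spec(bytes([0x15])))
print(decode_marshal_spec(bytes([0x2A, 0x50, 0x02, 0x0A, 0x01])))
```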
2. Elements
We have discussed all the signatures, but this is not the end: signatures consist of smaller parts that I call "elements". They are described separately because they form part of more than one signature, so there is no need to repeat the explanation of a particular element in each signature. In this chapter, we will take a closer look at them.
2.1 CustomMod
This element has appeared frequently in the signatures discussed so far, which is why we start with it. Custom modifiers are similar to custom attributes but, in contrast to them, custom modifiers are part of a signature. Custom modifiers are defined in CIL using the modreq (required modifier) and modopt (optional modifier) keywords in a method declaration; both require a type (class or structure) as their "argument". Two signatures that differ only by the addition of a custom modifier (required or optional) shall not be considered to match and, as the specification says:
The distinction between required and optional modifiers is important to tools other than the CLI that deal with the metadata, typically compilers and program analysers. A required modifier indicates that there is a special semantics to the modified item that should not be ignored, while an optional modifier can simply be ignored. For example, the const qualifier in the C programming language can be modelled with an optional modifier since the caller of a method that has a const-qualified parameter need not treat it in any special way. On the other hand, a parameter that shall be copy-constructed in C++ shall be marked with a required custom attribute since it is the caller who makes the copy.
Unfortunately, C# has some problems handling parameters that have custom modifiers attached; you can read about it in the "Modopt, method signatures, and incomplete specs oh my!" and "More on modopt" articles on CodeBetter.com.
CMOD_OPT and CMOD_REQD are just constants defined in the constants table in the first part; the TypeDefEncoded and TypeRefEncoded elements are in fact a single TypeDefOrRefEncoded element, thoroughly discussed in the next subsection. Note that zero, one or more CustomMods can be attached to a field, property, parameter or return parameter. As far as I know, there is no way to define a custom modifier using C#, excluding of course System.Reflection.Emit. In the System.Runtime.CompilerServices namespace, you can find several indicators (as I call them) that can be applied as a custom modifier, for instance CallConvCdecl, IsConst and IsLong.
Picture 3: The CustomMod element syntax diagram
Example 1
In the example below, the TestField field is annotated with the modreq modifier, so the CustomMod lies within the FieldSig signature, depicted at the very beginning of the article in Picture 2. The IsLong indicator distinguishes a long from an integer in C++, but in our case there is no special semantics behind this custom modifier; we just want to demonstrate the format of the CustomMod element in the signature. The value of the TypeDefOrRefEncoded element is shown twice, in two numeral systems: hexadecimal (base 16) and binary (base 2). The next subsection explains why.
.field public int64 modreq([mscorlib]System.Runtime.CompilerServices.IsLong) TestField
The table below presents the whole FieldSig signature indexed by the Field.Signature column, along with the embedded custom modifier generated by the modreq keyword.
Offset | Value | Meaning |
0x01 | 0x04 | Signature size |
0x02 | 0x06 | FieldSig's prolog |
0x03 | 0x1F | Encountered custom, required modifier (modreq), see constants in the first part |
0x04 | 0x05 (base 16), 00000101 (base 2) | The TypeDefOrRefEncoded element, in this case it points to the first row of the TypeRef table, that is the IsLong class. This element is described in the next subsection. |
0x05 | 0x0A | The type of the field (int64), see constants in the first part |
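The layout in the table above can be walked mechanically. The following Python sketch is only an illustration, not part of the original article: the function name split_field_sig is hypothetical, and it assumes that the coded type token fits in a single byte, which holds only for small row numbers.

```python
# Constants as they appear in the tables of this article.
FIELD_PROLOG = 0x06
CMOD_REQD = 0x1F
CMOD_OPT = 0x20


def split_field_sig(blob: bytes):
    """Split a FieldSig blob (size byte included) into custom modifiers and type bytes."""
    size, body = blob[0], blob[1:]
    if size != len(body):
        raise ValueError("size byte does not match the blob length")
    if body[0] != FIELD_PROLOG:
        raise ValueError("not a FieldSig")
    pos = 1
    modifiers = []
    # CustomMod*: zero or more (CMOD_x, TypeDefOrRefEncoded) pairs.
    while body[pos] in (CMOD_REQD, CMOD_OPT):
        kind = "modreq" if body[pos] == CMOD_REQD else "modopt"
        # Assumption: the coded token occupies one byte (row numbers below 32).
        modifiers.append((kind, body[pos + 1]))
        pos += 2
    return modifiers, body[pos:]


# Bytes taken from the table above.
mods, type_bytes = split_field_sig(bytes([0x04, 0x06, 0x1F, 0x05, 0x0A]))
print(mods, type_bytes.hex())
```

The modifier is returned with its still-encoded token (5); what remains after the modifiers is the field's type (0x0A, int64).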
2.2 TypeDefOrRefEncoded
Now we will try to demystify what is, at this point, the most mysterious element, namely TypeDefOrRefEncoded. Fortunately, it is not as complicated as it may seem. This element determines in which metadata table, and at which row of that table, the referenced type's information resides. The two least significant bits encode the metadata table: 0 for TypeDef (the referenced type resides in the current assembly), 1 for TypeRef (the referenced type resides in a separate assembly) and 2 for TypeSpec (the referenced type is a generic type, an array, etc.; see chapter 4.9 TypeSpec). The remaining bits encode the row index. Note that indexes are one-based; in other words, the first row in every metadata table is always 1, not 0.
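The bit layout just described can be expressed in a few lines of Python. This is a minimal sketch: the function name decode_type_def_or_ref is hypothetical, and it assumes the value has already been decompressed into a plain integer.

```python
TABLES = {0: "TypeDef", 1: "TypeRef", 2: "TypeSpec"}


def decode_type_def_or_ref(coded: int):
    """Return (table name, one-based row) for an already-decompressed value."""
    tag = coded & 0b11   # the two least significant bits select the table
    row = coded >> 2     # the remaining bits hold the row index
    if tag not in TABLES:
        raise ValueError("tag 3 is not assigned to any table")
    return TABLES[tag], row


# The two values used in this article's FieldSig examples.
for value in (0x05, 0x08):
    print(f"{value:#04x} = {value:08b} -> {decode_type_def_or_ref(value)}")
```

Seeing the binary form next to the result makes it clear why 0x05 refers to row 1 of TypeRef while 0x08 refers to row 2 of TypeDef.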
Example 1
In this example, we have declared a single field with a custom, required modifier attached to it. The modreq keyword takes as its argument the TestClass type declared in the same assembly, as shown below:
.class public TestClass extends [mscorlib]System.Object { }
.field public int64 modreq(TestClass) TestField
The FieldSig
for the above sample code is as follows:
Offset | Value | Meaning |
0x01 | 0x04 | Signature size |
0x02 | 0x06 | FieldSig's prolog |
0x03 | 0x1F | Encountered custom, required modifier (modreq), see constants in the first part |
0x04 | 0x08 (base 16), 00001000 (base 2) | The TypeDefOrRefEncoded element. This time it points to the second row of the TypeDef table: the two least significant bits stand for the type of table (00 in binary, TypeDef), and bits 3 to 8 denote the row number in the table (000010 in binary, that is 2), which is TestClass. Now compare this with the TypeDefOrRefEncoded element from the previous subsection. |
0x05 | 0x0A | The type of the field (int64), see constants in the first part |
2.3 Param
This element describes a single parameter supplied to a method or a property, and therefore is part of PropertySig, MethodDefSig, MethodRefSig, etc. This is the syntax diagram for the Param
element:
Picture 4: The Param element syntax diagram
Example 1
In the TestMethod method illustrated below, there are two custom modifiers attached to the single parameter. The aim of this example is to demonstrate the format of the Param element and to show once again how the TypeDefOrRefEncoded element works.
.class public TestClass extends [mscorlib]System.Object { }
.method public static void TestMethod(int32 modopt(TestClass)
modreq([mscorlib]System.Runtime.CompilerServices.IsLong) Param1)
{
ret
}
The associated MethodDefSig
signature for this method is:
Offset | Value | Meaning |
0x01 | 0x08 | Signature size |
0x02 | 0x00 | Method is static |
0x03 | 0x01 | The number of parameters |
0x04 | 0x01 | The type of the returned value (void), see constants in the first part |
0x05 | 0x1F | Encountered custom, required modifier (modreq), see constants in the first part |
0x06 | 0x09 (base 16), 00001001 (base 2) | Referenced row is 2 in the TypeRef metadata table, that is the IsLong type |
0x07 | 0x20 | Encountered custom, optional modifier (modopt), see constants in the first part |
0x08 | 0x08 (base 16), 00001000 (base 2) | Referenced row is 2 in the TypeDef metadata table, that is the TestClass type |
0x09 | 0x08 | First parameter's type (int32), see constants in the first part |
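To make the order of the elements explicit, here is a rough Python sketch that walks the bytes from the table above. It is an illustration under stated assumptions only: the helper names read_item and parse_method_def_sig are hypothetical, only the primitive types used in this example are recognised, and every compressed value is assumed to fit in one byte.

```python
CMOD_REQD, CMOD_OPT = 0x1F, 0x20
TABLES = ("TypeDef", "TypeRef", "TypeSpec")
# Only the element types that occur in this article's examples.
PRIMITIVES = {0x01: "void", 0x08: "int32", 0x0A: "int64"}


def read_item(sig, pos):
    """Read CustomMod* followed by a primitive type; return (mods, type, new pos)."""
    mods = []
    while sig[pos] in (CMOD_REQD, CMOD_OPT):
        kind = "modreq" if sig[pos] == CMOD_REQD else "modopt"
        coded = sig[pos + 1]  # assumption: single-byte compressed value
        mods.append((kind, TABLES[coded & 3], coded >> 2))
        pos += 2
    return mods, PRIMITIVES[sig[pos]], pos + 1


def parse_method_def_sig(blob):
    sig = blob[1:]                   # skip the size byte
    has_this = bool(sig[0] & 0x20)   # 0x20 marks an instance method
    param_count = sig[1]             # assumption: fewer than 128 parameters
    ret_mods, ret_type, pos = read_item(sig, 2)
    params = []
    for _ in range(param_count):
        mods, ptype, pos = read_item(sig, pos)
        params.append((mods, ptype))
    return has_this, (ret_mods, ret_type), params


blob = bytes([0x08, 0x00, 0x01, 0x01, 0x1F, 0x09, 0x20, 0x08, 0x08])
has_this, ret, params = parse_method_def_sig(blob)
print(has_this, ret, params)
```

Note that the modifiers come out in the order stored in the signature (modreq first, then modopt), which is the reverse of the order written in the CIL declaration.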
2.4 RetType
This element is almost identical to the Param element; it has one extra path that allows the VOID type. Because the syntax diagram below is self-explanatory, no examples are provided for this subsection.
Picture 5: The RetType element syntax diagram
2.5 Type
It is not surprising that the Type element describes... a type, and not only a primitive type (such as int32, bool, string, etc.) but also arrays, generic instance types and complex types (classes and structures). The listing below presents the syntax of this element; as before, words written in upper case are constants whose values can be found in the constants table in the first part. You may wonder why the constant GENERICINST is part of this element, but remember that the TypeSpec, MethodSpec and MethodDefSig signatures have different aims!
Type ::=
BOOLEAN | CHAR | I1 | U1 | I2 | U2 | I4 | U4 | I8 | U8 | R4 | R8 | I | U
| ARRAY Type ArrayShape
| CLASS TypeDefOrRefEncoded
| FNPTR MethodDefSig
| FNPTR MethodRefSig
| GENERICINST (CLASS | VALUETYPE) TypeDefOrRefEncoded GenArgCount Type *
| MVAR number
| OBJECT
| PTR CustomMod* Type
| PTR CustomMod* VOID
| STRING
| SZARRAY CustomMod* Type
| VALUETYPE TypeDefOrRefEncoded
| VAR number
Example 1
Let us see what happens to the MethodDefSig signature when a method accepts generic types as normal parameters.
public class TestClass<GenArg1, GenArg2> { }

public class TestRunClass
{
    public void TestRunMethod()
    {
        TestMethod(new TestClass<int, string>());
    }

    public void TestMethod(TestClass<int, string> Param1) { }
}
The MethodDefSig signature for the TestMethod method is dissected below.
Offset | Value | Meaning |
0x0E | 0x09 | Signature size |
0x0F | 0x20 | The method is an instance method |
0x10 | 0x01 | The number of normal parameters |
0x11 | 0x01 | The type of the returned value (void), see constants in the first part |
0x12 | 0x15 | The first parameter's type is a generic type (GENERICINST), see constants in the first part |
0x13 | 0x12 | The first parameter's type is a generic class (CLASS), see constants in the first part |
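Because Type is defined recursively, a small recursive reader is a natural way to explore it. The Python sketch below covers only a subset of the grammar and is not a complete implementation. The function name read_type is hypothetical, and the input bytes are also hypothetical: the first two (0x15, 0x12) match the table above, while the rest (a TypeDef token for row 2, a GenArgCount of 2, then I4 and STRING) are assumed for illustration and are not taken from a real dump. All compressed values are assumed to fit in one byte.

```python
PRIMITIVES = {0x02: "bool", 0x03: "char", 0x08: "int32", 0x0A: "int64",
              0x0E: "string", 0x1C: "object"}
VALUETYPE, CLASS, VAR, GENERICINST, SZARRAY, MVAR = 0x11, 0x12, 0x13, 0x15, 0x1D, 0x1E
TABLES = ("TypeDef", "TypeRef", "TypeSpec")


def read_type(sig, pos):
    """Return (description, new position) for a subset of the Type grammar."""
    tag = sig[pos]
    pos += 1
    if tag in PRIMITIVES:
        return PRIMITIVES[tag], pos
    if tag in (CLASS, VALUETYPE):
        coded = sig[pos]  # assumption: single-byte compressed token
        return f"{TABLES[coded & 3]}[{coded >> 2}]", pos + 1
    if tag in (VAR, MVAR):
        prefix = "!" if tag == VAR else "!!"
        return f"{prefix}{sig[pos]}", pos + 1
    if tag == SZARRAY:
        # CustomMod* before the element type is not handled in this sketch.
        inner, pos = read_type(sig, pos)
        return inner + "[]", pos
    if tag == GENERICINST:
        pos += 1                 # skip the CLASS / VALUETYPE marker
        coded = sig[pos]         # TypeDefOrRefEncoded of the generic type
        count = sig[pos + 1]     # GenArgCount
        pos += 2
        args = []
        for _ in range(count):   # Type*: each argument is itself a Type
            arg, pos = read_type(sig, pos)
            args.append(arg)
        return f"{TABLES[coded & 3]}[{coded >> 2}]<{', '.join(args)}>", pos
    raise NotImplementedError(hex(tag))


# Hypothetical bytes, see the assumptions stated above.
hypothetical = bytes([0x15, 0x12, 0x08, 0x02, 0x08, 0x0E])
print(read_type(hypothetical, 0))
```

The generic arguments are read by calling read_type again, which mirrors the "Type *" tail of the GENERICINST production.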
2.6 ArrayShape
I think that many people who use the .NET platform know that an array can have more than one dimension, but do not know that each dimension can have a lower bound. That is probably because most developers use C#, which does not support lower bounds other than through the Array.CreateInstance method. The ArrayShape element holds the full definition of a multi-dimensional array: it stores the number of dimensions, and the size and lower bound of each dimension. The syntax diagram, along with a brief description copied from the specification, is shown below:
Picture 6: The ArrayShape element syntax diagram
Rank
is an integer (stored in compressed form, see §23.2) that specifies the number of dimensions in the array (shall be 1 or more). NumSizes
is a compressed integer that says how many dimensions have specified sizes (it shall be 0 or more). Size
is a compressed integer specifying the size of that dimension - the sequence starts at the first dimension, and goes on for a total of NumSizes
items. Similarly, NumLoBounds
is a compressed integer that says how many dimensions have specified lower bounds (it shall be 0 or more). And LoBound
is a compressed integer specifying the lower bound of that dimension - the sequence starts at the first dimension, and goes on for a total of NumLoBounds
items. None of the dimensions in these two sequences can be skipped, but the number of specified dimensions can be less than Rank
.
NOTE: Please do not confuse multi-dimensional arrays with jagged arrays. A multi-dimensional array in CIL is, for example, int32[,], while a jagged array is int32[][]. Also note that ArrayShape appears only in the general ARRAY form! A single-dimensional, zero-based array is denoted by the SZARRAY constant and nothing more (see the Type element). To learn more about arrays in .NET, see the "Array Types in .NET" article in the MSDN Magazine.
IMPORTANT: As we will see in the second example, the lower boundaries emitted by the ILASM compiler (the LoBound field in Picture 6) appear to be multiplied by two. Measured against the description quoted above, which calls LoBound a plain compressed integer stored without any change, this looks incorrect. It is, however, consistent with the signed compressed integer encoding, in which the value is shifted left by one bit and the sign occupies the least significant bit; later editions of the specification describe LoBound as a signed integer. Below you can see a table copied from the specification that shows sample array declarations and their correct parameters in the ArrayShape element. Moreover, the specification does not say in which case(s) the NumSizes and NumLoBounds fields may be less than the Rank field. From my observation, NumSizes and NumLoBounds are less than Rank in only one case: when no lower boundary is specified for any dimension (the second row in the table below). Otherwise NumSizes and NumLoBounds are always equal to Rank, which contradicts the third and fifth rows of the table below:
Declaration | Type | Rank | NumSizes | Size | NumLoBounds | LoBound |
[0...2] | I4 | 1 | 1 | 3 | 0 | - |
[,,,,,,] | I4 | 7 | 0 | - | 0 | - |
[0...3, 0...2,,,,] | I4 | 6 | 2 | 4 3 | 2 | 0 0 |
[1...2, 6...8] | I4 | 2 | 2 | 2 3 | 2 | 1 6 |
[5, 3...5, , ] | I4 | 4 | 2 | 5 3 | 2 | 0 3 |
Example 1
Let us see how the ArrayShape
works in action.
.field public int32[,,] TestField
The following FieldSig signature is generated for the above multi-dimensional array.
Offset | Value | Meaning |
0x01 | 0x06 | Signature size |
0x02 | 0x06 | FieldSig's prolog |
0x03 | 0x14 | Field's type value is ARRAY, see constants in the first part |
0x04 | 0x08 | Array's type is int32, see constants in the first part |
0x05 | 0x03 | The number of the array's dimensions (Rank field in Picture 6) |
0x06 | 0x00 | Size of array's dimensions not specified (NumSizes field in Picture 6) |
0x07 | 0x00 | Lower bounds of array's dimensions not specified (NumLoBounds field in Picture 6) |
Example 2
This example shows how the ArrayShape element behaves when a multi-dimensional array is declared with lower boundaries specified.
.field public int32[0...5,,4...6] TestField
The whole FieldSig
signature looks like:
Offset | Value | Meaning |
0x01 | 0x0C | Signature size |
0x02 | 0x06 | FieldSig's prolog |
0x03 | 0x14 | Field's type value is ARRAY, see constants in the first part |
0x04 | 0x08 | Array's type is int32, see constants in the first part |
0x05 | 0x03 | The number of the array's dimensions (Rank field in Picture 6) |
0x06 | 0x03 | The number of sizes for this array (NumSizes field in Picture 6) |
0x07 | 0x06 | The size of the first dimension of the array (Size field in Picture 6) |
0x08 | 0x00 | The size of the second dimension of the array, zero means not specified (Size field in Picture 6) |
0x09 | 0x03 | The size of the third dimension of the array (Size field in Picture 6) |
0x0A | 0x03 | The number of the lower bounds for this array (NumLoBounds field in Picture 6) |
0x0B | 0x00 | The lower boundary of the first dimension of the array (LoBound field in Picture 6) |
0x0C | 0x00 | The lower boundary of the second dimension of the array (LoBound field in Picture 6) |
0x0D | 0x08 | The lower boundary of the third dimension of the array (LoBound field in Picture 6). The boundary is multiplied by two, see the important note at the beginning of the current subsection |
Example 3
Now let us see what the ArrayShape element looks like in reality and compare the result with the specification.
.field public int32[0...2] TestField
Yes, NumLoBounds is equal to Rank, even though the specification says that NumLoBounds shall be equal to zero.
Offset | Value | Meaning |
0x01 | 0x08 | Signature size |
0x02 | 0x06 | FieldSig's prolog |
0x03 | 0x14 | Field's type value is ARRAY, see constants in the first part |
0x04 | 0x08 | Array's type is int32, see constants in the first part |
0x05 | 0x01 | The number of the array's dimensions (Rank field in Picture 6) |
0x06 | 0x01 | The number of sizes for this array (NumSizes field in Picture 6) |
0x07 | 0x03 | The size of the first dimension of the array (Size field in Picture 6) |
0x08 | 0x01 | The number of the lower bounds for this array (NumLoBounds field in Picture 6) |
0x09 | 0x00 | The lower boundary of the first dimension of the array (LoBound field in Picture 6) |
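The ArrayShape bytes from the examples above can be decoded with a short Python sketch. The names read_unsigned, read_signed and read_array_shape are hypothetical. The sketch assumes that LoBound uses the signed compressed integer encoding (value shifted left by one bit, sign in the lowest bit); that is an assumption here, made because it reproduces the declared lower bound of 4 from the stored byte 0x08 in Example 2.

```python
def read_unsigned(sig, pos):
    """Decode a compressed unsigned integer; return (value, new position)."""
    first = sig[pos]
    if (first & 0x80) == 0:        # 0xxxxxxx: one byte
        return first, pos + 1
    if (first & 0xC0) == 0x80:     # 10xxxxxx: two bytes
        return ((first & 0x3F) << 8) | sig[pos + 1], pos + 2
    value = ((first & 0x1F) << 24) | (sig[pos + 1] << 16) | (sig[pos + 2] << 8) | sig[pos + 3]
    return value, pos + 4          # 110xxxxx: four bytes


def read_signed(sig, pos):
    """Decode a compressed signed integer (sign kept in the lowest bit)."""
    raw, new_pos = read_unsigned(sig, pos)
    width = {1: 6, 2: 13, 4: 28}[new_pos - pos]  # magnitude bits per encoded length
    value = raw >> 1
    if raw & 1:
        value -= 1 << width
    return value, new_pos


def read_array_shape(sig, pos):
    """Read Rank, NumSizes, Size*, NumLoBounds, LoBound* in that order."""
    rank, pos = read_unsigned(sig, pos)
    num_sizes, pos = read_unsigned(sig, pos)
    sizes = []
    for _ in range(num_sizes):
        size, pos = read_unsigned(sig, pos)
        sizes.append(size)
    num_lo, pos = read_unsigned(sig, pos)
    lo_bounds = []
    for _ in range(num_lo):
        lo, pos = read_signed(sig, pos)
        lo_bounds.append(lo)
    return {"rank": rank, "sizes": sizes, "lo_bounds": lo_bounds}, pos


# ArrayShape bytes of Example 2 (offsets 0x05 to 0x0D in its table).
example2 = bytes([0x03, 0x03, 0x06, 0x00, 0x03, 0x03, 0x00, 0x00, 0x08])
shape, end = read_array_shape(example2, 0)
print(shape)
```

Under that assumption the third dimension decodes to a lower bound of 4 and a size of 3, which corresponds to the declared range 4...6.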
3. Conclusion
As you can see, signatures are a complicated monstrosity, but they make .NET executables small, compact and consistent. If you have any questions, hints or requests, do not hesitate to add a comment below; constructive comments are always welcome.
4. References
5. Revision History
- 1.0: 26th September 2009: Initial release
Przemek was born in 1988 and lives in a small town near Warsaw in Poland, Europe. Currently he codes some C# stuff and J2EE as well, and occasionally he uses C++ for fun. Przemek is a cycling fan; if the weather permits, he rides a bike.