Determining Object Layout using FieldDescs





Determining the memory layout of an object using CLR structures
Introduction
For every field in an object, the CLR allocates a special structure, FieldDesc, containing metadata for the runtime and Reflection. A FieldDesc contains information such as the field offset, whether the field is static or ThreadStatic, public or private, etc. To determine the layout of an object, we'll be looking specifically at the offset metadata.
Layout of a FieldDesc
Before we can determine the layout of an object, we of course need to know the layout of a FieldDesc. A FieldDesc contains 3 fields:
Offset | Type | Name |
0 | MethodTable* | m_pMTOfEnclosingClass |
8 | DWORD | (DWORD 1) |
12 | DWORD | (DWORD 2) |
The CLR engineers designed their structures to be as small as possible; because of that, all the metadata is actually stored as bitfields in DWORD 1 and DWORD 2.
DWORD 1
Bits | Name | Description |
24 | m_mb | MemberDef metadata. This metadata is eventually used in FieldInfo.MetadataToken after some manipulation. |
1 | m_isStatic | Whether the field is static |
1 | m_isThreadLocal | Whether the field is decorated with a ThreadStatic attribute |
1 | m_isRVA | Whether the field's data is located at an RVA (Relative Virtual Address) |
3 | m_prot | Access level |
1 | m_requiresFullMbValue | Whether m_mb needs all 24 bits, i.e. the packed mb layout is not used |
DWORD 2
Bits | Name | Description |
27 | m_dwOffset | Field offset |
5 | m_type | CorElementType of the field |
Replication in C#
We can easily replicate a FieldDesc in C# using the StructLayout and FieldOffset attributes.
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
public unsafe struct FieldDesc
{
    [FieldOffset(0)] private readonly void* m_pMTOfEnclosingClass;

    // unsigned m_mb : 24;
    // unsigned m_isStatic : 1;
    // unsigned m_isThreadLocal : 1;
    // unsigned m_isRVA : 1;
    // unsigned m_prot : 3;
    // unsigned m_requiresFullMbValue : 1;
    [FieldOffset(8)] private readonly uint m_dword1;

    // unsigned m_dwOffset : 27;
    // unsigned m_type : 5;
    [FieldOffset(12)] private readonly uint m_dword2;
    ...
Reading the bitfields themselves is easy using bitwise operations:
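/// <summary>
/// Offset in memory
/// </summary>
public int Offset => (int) (m_dword2 & 0x7FFFFFF);

// m_mb occupies the low 24 bits of DWORD 1
public int MB => (int) (m_dword1 & 0xFFFFFF);

// m_requiresFullMbValue follows the 24 + 1 + 1 + 1 + 3 = 30 bits before it,
// so it is bit 30 (counting from 0), not bit 31
private bool RequiresFullMBValue => ReadBit(m_dword1, 30);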
...
We perform a bitwise AND operation on m_dword2 to get the value of the 27 bits for m_dwOffset.
111111111111111111111111111 (27 bits) = 0x7FFFFFF
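The mask does not have to be written by hand. As a minimal sketch (the names BitfieldMasks, LowMask, OffsetOf, and TypeOf are made up for illustration and are not part of the CLR or of the code in this article), the same value can be derived from the bit width, and the 5 bits of m_type can be read by shifting the offset bits out first. It assumes the bitfields are packed starting at the least significant bit, which the 0x7FFFFFF mask already relies on.

```csharp
using System;

public static class BitfieldMasks
{
    // Builds a mask with the lowest `bits` bits set, e.g. 27 -> 0x7FFFFFF.
    public static uint LowMask(int bits) =>
        bits >= 32 ? uint.MaxValue : (1u << bits) - 1;

    // m_dwOffset: the low 27 bits of DWORD 2.
    public static int OffsetOf(uint dword2) => (int) (dword2 & LowMask(27));

    // m_type: the 5 bits directly above m_dwOffset.
    public static int TypeOf(uint dword2) => (int) ((dword2 >> 27) & LowMask(5));

    public static void Main()
    {
        // Hypothetical DWORD 2: offset 8, element type 0x08 (ELEMENT_TYPE_I4).
        uint dword2 = (0x08u << 27) | 8u;

        Console.WriteLine($"mask   = 0x{LowMask(27):X}");
        Console.WriteLine($"offset = {OffsetOf(dword2)}");
        Console.WriteLine($"type   = 0x{TypeOf(dword2):X}");
    }
}
```

Deriving the mask from the width keeps the two numbers (27 and 0x7FFFFFF) from drifting apart if the layout ever changes.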
I also made a small function for reading bits for convenience:
static bool ReadBit(uint b, int bitIndex)
{
    // Use an unsigned 1 so the mask stays a uint, even for bit 31
    return (b & (1u << bitIndex)) != 0;
}
We won't write the code for retrieving all of the bitfields' values because we're only interested in m_dwOffset, but if you're interested, you can view the code for that here. We'll also go back to MB and RequiresFullMBValue later.
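For reference, the remaining DWORD 1 fields can be read the same way. This is a sketch and not the linked code: Dword1Fields and ReadBits are hypothetical names, and the bit positions are simply the running totals of the widths in the DWORD 1 table (24, 1, 1, 1, 3, 1), again assuming the fields are packed from the least significant bit upward.

```csharp
using System;

public static class Dword1Fields
{
    // Reads `count` bits starting at bit `start` (bit 0 is the least significant).
    public static uint ReadBits(uint value, int start, int count) =>
        (value >> start) & ((1u << count) - 1);

    public static int  MB(uint d)                  => (int) ReadBits(d, 0, 24);
    public static bool IsStatic(uint d)            => ReadBits(d, 24, 1) != 0;
    public static bool IsThreadLocal(uint d)       => ReadBits(d, 25, 1) != 0;
    public static bool IsRVA(uint d)               => ReadBits(d, 26, 1) != 0;
    public static int  Protection(uint d)          => (int) ReadBits(d, 27, 3);
    public static bool RequiresFullMBValue(uint d) => ReadBits(d, 30, 1) != 0;

    public static void Main()
    {
        // Made-up DWORD 1: m_mb = 5, m_isStatic = 1, m_prot = 1.
        uint d = 5u | (1u << 24) | (1u << 27);

        Console.WriteLine($"mb={MB(d)} static={IsStatic(d)} " +
                          $"threadLocal={IsThreadLocal(d)} prot={Protection(d)}");
    }
}
```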
Retrieving a FieldDesc for a Field
Thankfully, we don't have to do anything too hacky for retrieving a FieldDesc. Reflection actually already has a way of getting a FieldDesc.
FieldInfo.FieldHandle.Value
Value points to a FieldInfo's corresponding FieldDesc, where it gets all of its metadata. Therefore, we can write a method to get a FieldInfo's FieldDesc counterpart.
public static FieldDesc* GetFieldDescForFieldInfo(FieldInfo fi)
{
    if (fi.IsLiteral) {
        throw new Exception("Const field");
    }
    FieldDesc* fd = (FieldDesc*) fi.FieldHandle.Value;
    return fd;
}
Note: I throw an Exception when the FieldInfo is a literal because you can't access the FieldHandle of a literal (const) field.
We'll wrap the above method in another method to let us get the FieldDesc more easily.
private const BindingFlags DefaultFlags =
    BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Static;

public static FieldDesc* GetFieldDesc(Type t, string name, BindingFlags flags = DefaultFlags)
{
    if (t.IsArray) {
        throw new Exception("Arrays do not have fields");
    }
    FieldInfo fieldInfo = t.GetField(name, flags);
    // GetField returns null when no field matches the name and flags
    if (fieldInfo == null) {
        throw new ArgumentException($"Field \"{name}\" not found on {t}");
    }
    return GetFieldDescForFieldInfo(fieldInfo);
}
Getting a Field's Metadata Token
Earlier in the article, I said that the bitfield m_mb is used for calculating a field's metadata token, which is used in FieldInfo.MetadataToken. However, it requires some calculation to get the proper token. If we look at field.h line 171 in the CoreCLR repo:
mdFieldDef GetMemberDef() const
{
    LIMITED_METHOD_DAC_CONTRACT;
    // Check if this FieldDesc is using the packed mb layout
    if (!m_requiresFullMbValue)
    {
        return TokenFromRid(m_mb & enum_packedMbLayout_MbMask, mdtFieldDef);
    }
    return TokenFromRid(m_mb, mdtFieldDef);
}
We can replicate GetMemberDef like so:
public int MemberDef {
    get {
        // Check if this FieldDesc is using the packed mb layout
        if (!RequiresFullMBValue)
        {
            return TokenFromRid(MB & (int) MbMask.PackedMbLayoutMbMask, CorTokenType.mdtFieldDef);
        }
        return TokenFromRid(MB, CorTokenType.mdtFieldDef);
    }
}
MbMask:
enum MbMask
{
    PackedMbLayoutMbMask = 0x01FFFF,
    PackedMbLayoutNameHashMask = 0xFE0000
}
TokenFromRid can be replicated in C# like this:
static int TokenFromRid(int rid, CorTokenType tktype)
{
    return rid | (int) tktype;
}
CorTokenType:
enum CorTokenType
{
    mdtModule = 0x00000000,
    mdtTypeRef = 0x01000000,
    mdtTypeDef = 0x02000000,
    mdtFieldDef = 0x04000000,
    ...
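To make the arithmetic concrete: a metadata token is a 32-bit value whose top byte identifies the table and whose low 24 bits are the row ID (RID). The sketch below is self-contained and only illustrative. RidFromToken and TypeFromToken are modeled on the CoreCLR macros of the same names, TokenDemo and MemberDefFromPackedMb are hypothetical names, and the sample m_mb value is made up.

```csharp
using System;

public static class TokenDemo
{
    const int mdtFieldDef  = 0x04000000; // table type for field definitions
    const int PackedMbMask = 0x01FFFF;   // low 17 bits of m_mb in the packed layout

    public static int TokenFromRid(int rid, int tokenType) => rid | tokenType;
    public static int RidFromToken(int token)  => token & 0x00FFFFFF;
    public static int TypeFromToken(int token) => token & unchecked((int) 0xFF000000);

    // Packed layout: strip the name-hash bits before building the token.
    public static int MemberDefFromPackedMb(int mb) =>
        TokenFromRid(mb & PackedMbMask, mdtFieldDef);

    public static void Main()
    {
        // Made-up packed m_mb: RID 3 in the low bits, name-hash bits above them.
        int mb = 0x2A0003;
        int token = MemberDefFromPackedMb(mb);

        Console.WriteLine($"token = 0x{token:X8}");
        Console.WriteLine($"rid   = {RidFromToken(token)}");
        Console.WriteLine($"type  = 0x{TypeFromToken(token):X8}");
    }
}
```

Masking first matters in the packed layout: without it, the name-hash bits would leak into the RID and the token would no longer match FieldInfo.MetadataToken.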
Testing It Out
Note: This was tested on 64-bit.
We'll make a struct for testing:
struct Struct
{
    private long l;
    private int i;
    public int Int => i;
}
First, we'll make sure our metadata token matches the one Reflection has:
var fd = GetFieldDesc(typeof(Struct), "l");
var fi = typeof(Struct).GetField("l", BindingFlags.NonPublic | BindingFlags.Instance);
Debug.Assert(fi.MetadataToken == fd->MemberDef); // passes!
Then we'll see how the runtime laid out Struct:
Console.WriteLine(GetFieldDesc(typeof(Struct), "l")->Offset); // 0
Console.WriteLine(GetFieldDesc(typeof(Struct), "i")->Offset); // 8
We'll verify we have the correct offset by writing an int to s's memory at the offset of i that i's FieldDesc gave us.
Struct s = new Struct();
IntPtr p = new IntPtr(&s);
Marshal.WriteInt32(p, GetFieldDesc(typeof(Struct), "i")->Offset, 123);
Debug.Assert(s.Int == 123); // passes!
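i is at offset 8 because it directly follows the 8-byte l; here the declared order gives the same result as the CLR's habit of sometimes putting the largest members first in memory. However, there are some exceptions:
Let's see what happens when we put a larger value type (and a reference type) inside Struct.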
struct Struct
{
    private decimal d;
    private string s;
    private int i;
}
This will cause the CLR to insert padding to align Struct:
Console.WriteLine(GetFieldDesc(typeof(Struct), "d")->Offset); // 16
Console.WriteLine(GetFieldDesc(typeof(Struct), "s")->Offset); // 0
Console.WriteLine(GetFieldDesc(typeof(Struct), "i")->Offset); // 8
This means there are 4 bytes of padding at offset 12.
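The gap can be found mechanically from the offsets and sizes. The sketch below is an illustration only: PaddingDemo and FindGaps are hypothetical names, and the sizes (8 for the string reference on 64-bit, 4 for int, 16 for decimal) are supplied by hand and are not read from the runtime.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PaddingDemo
{
    // Returns (offset, length) for every unused byte range between fields.
    public static List<(int Offset, int Length)> FindGaps(
        IEnumerable<(int Offset, int Size)> fields)
    {
        var gaps = new List<(int Offset, int Length)>();
        int end = 0; // first byte not yet covered by any field

        foreach (var f in fields.OrderBy(f => f.Offset))
        {
            if (f.Offset > end)
                gaps.Add((end, f.Offset - end));
            end = Math.Max(end, f.Offset + f.Size);
        }
        return gaps;
    }

    public static void Main()
    {
        var fields = new[]
        {
            (Offset: 16, Size: 16), // decimal d
            (Offset: 0,  Size: 8),  // string s (a reference, 8 bytes on 64-bit)
            (Offset: 8,  Size: 4),  // int i
        };

        foreach (var gap in FindGaps(fields))
            Console.WriteLine($"{gap.Length} bytes of padding at offset {gap.Offset}");
    }
}
```

This only finds gaps between fields; it does not account for any trailing padding the runtime may add after the last field.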
The CLR also doesn't insert padding at all if the struct is explicitly laid out:
[StructLayout(LayoutKind.Explicit)]
struct Struct
{
    [FieldOffset(0)] private decimal d;
    [FieldOffset(16)] private int i;
    [FieldOffset(20)] private long l;
}
Console.WriteLine(GetFieldDesc(typeof(Struct), "d")->Offset); // 0
Console.WriteLine(GetFieldDesc(typeof(Struct), "l")->Offset); // 20
Console.WriteLine(GetFieldDesc(typeof(Struct), "i")->Offset); // 16
What About Static Fields?
According to the FieldDescs of static fields, they still have offsets. However, their offset will be a big number, like 96. Static fields are stored in the type's MethodTable (another internal structure).
What Can We Make With This?
You can make a method identical to C's offsetof macro:
public static int OffsetOf<TType>(string fieldName)
{
    return GetFieldDesc(typeof(TType), fieldName)->Offset;
}
You may be thinking, why not just use Marshal.OffsetOf? Well, because that returns the marshaled (unmanaged) offset, and it doesn't work with unmarshalable or reference types.
You can also make a class to print the layout of an object. I wrote one which can get the layout of any object (except arrays). You can get the code for that here.
Struct s = new Struct();
ObjectLayout<Struct> layout = new ObjectLayout<Struct>(ref s);
Console.WriteLine(layout);
Output:
| Field Offset | Address | Size | Type | Name | Value |
|--------------|--------------|------|---------|-----------|-------|
| 0 | 0xD04A3FEE60 | 16 | Decimal | d | 0 |
| 16 | 0xD04A3FEE70 | 4 | Int32 | i | 0 |
| 20 | 0xD04A3FEE74 | 4 | Byte | (padding) | 0 |
| 24 | 0xD04A3FEE78 | 8 | Int64 | s | 0 |
Sources
- My GitHub
- Complete FieldDesc code
- CoreCLR: /src/vm/field.cpp, /src/vm/field.h