Determining Object Layout using FieldDescs

Ðecimation

Aug 31, 2018

CPOL

Determining the memory layout of an object using CLR structures

Introduction

For every field in a type, the CLR allocates a special structure, FieldDesc, containing metadata used by the runtime and Reflection. A FieldDesc records information such as the field's offset, whether the field is static or ThreadStatic, its access level (public, private, etc.), and its element type. To determine the layout of an object, we'll be looking specifically at the offset metadata.

Layout of a FieldDesc

Before we can determine the layout of an object, we of course need to know the layout of a FieldDesc. A FieldDesc contains 3 fields (offsets shown for a 64-bit process):

| Offset | Type           | Name                  |
|--------|----------------|-----------------------|
| 0      | `MethodTable*` | m_pMTOfEnclosingClass |
| 8      | `DWORD`        | (DWORD 1)             |
| 12     | `DWORD`        | (DWORD 2)             |

The CLR engineers designed their structures to be as small as possible; because of that, all the metadata is actually stored as bitfields in DWORD 1 and DWORD 2.

DWORD 1

| Bits | Name                  | Description |
|------|-----------------------|-------------|
| 24   | m_mb                  | `MemberDef` metadata. After some manipulation, this becomes `FieldInfo.MetadataToken`. |
| 1    | m_isStatic            | Whether the field is `static` |
| 1    | m_isThreadLocal       | Whether the field is decorated with a `ThreadStatic` attribute |
| 1    | m_isRVA               | Whether the field's data is located at a relative virtual address (RVA) in the image |
| 3    | m_prot                | Access level (public, private, etc.) |
| 1    | m_requiresFullMbValue | Whether `m_mb` needs all of its bits |

DWORD 2

| Bits | Name       | Description                     |
|------|------------|---------------------------------|
| 27   | m_dwOffset | Field offset                    |
| 5    | m_type     | `CorElementType` of the field   |

Replication in C#

We can easily replicate a FieldDesc in C# using the StructLayout and FieldOffset attributes.

[StructLayout(LayoutKind.Explicit)]
public unsafe struct FieldDesc
{
   [FieldOffset(0)] private readonly void* m_pMTOfEnclosingClass;

   // unsigned m_mb                   : 24;
   // unsigned m_isStatic             : 1;
   // unsigned m_isThreadLocal        : 1;
   // unsigned m_isRVA                : 1;
   // unsigned m_prot                 : 3;
   // unsigned m_requiresFullMbValue  : 1;
   [FieldOffset(8)] private readonly uint m_dword1;

   // unsigned m_dwOffset                : 27;
   // unsigned m_type                    : 5;
   [FieldOffset(12)] private readonly uint m_dword2;
   ...

Reading the bitfields themselves is easy using bitwise operations:

/// <summary>
///     Offset in memory
/// </summary>
public int Offset => (int) (m_dword2 & 0x7FFFFFF);

public int MB => (int) (m_dword1 & 0xFFFFFF);

// Bit 30: the 24 + 1 + 1 + 1 + 3 bits declared before it occupy bits 0 to 29
private bool RequiresFullMBValue => ReadBit(m_dword1, 30);

...

We perform a bitwise AND operation on m_dword2 to get the value of the 27 bits for m_dwOffset.

111111111111111111111111111 (27 bits) = 0x7FFFFFF

I also made a small function for reading bits for convenience:

static bool ReadBit(uint b, int bitIndex)
{
   // 1u keeps the mask unsigned, so bit 31 is handled without sign extension
   return (b & (1u << bitIndex)) != 0;
}

We won't write the code for retrieving all of the bitfields' values because we're only interested in m_dwOffset, but if you're interested, you can view the code for that here. We'll also go back to MB and RequiresFullMBValue later.
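For readers who want the remaining bitfields anyway, here is a minimal sketch of how they could be decoded from the two raw DWORDs. It assumes the compiler allocates bitfields starting at the least significant bit (as MSVC does) and uses the bit widths from the tables above. `FieldDescBits`, `Extract`, and the accessor names are illustrative only, and the sample values are made up rather than read from a real FieldDesc.

```csharp
using System;

static class FieldDescBits
{
    // Returns `width` bits of `value`, starting at bit `start` (bit 0 = least significant).
    public static uint Extract(uint value, int start, int width)
    {
        uint mask = width == 32 ? uint.MaxValue : (1u << width) - 1;
        return (value >> start) & mask;
    }

    // DWORD 1, in declaration order
    public static uint MB(uint dword1)                  => Extract(dword1, 0, 24);
    public static bool IsStatic(uint dword1)            => Extract(dword1, 24, 1) != 0;
    public static bool IsThreadLocal(uint dword1)       => Extract(dword1, 25, 1) != 0;
    public static bool IsRVA(uint dword1)               => Extract(dword1, 26, 1) != 0;
    public static uint Protection(uint dword1)          => Extract(dword1, 27, 3);
    public static bool RequiresFullMbValue(uint dword1) => Extract(dword1, 30, 1) != 0;

    // DWORD 2
    public static uint Offset(uint dword2)      => Extract(dword2, 0, 27);
    public static uint ElementType(uint dword2) => Extract(dword2, 27, 5);

    static void Main()
    {
        // Made-up sample values, not read from a real FieldDesc:
        uint dword1 = 1u | (1u << 24) | (1u << 27);  // m_mb = 1, static, m_prot = 1
        uint dword2 = 8u | (0x08u << 27);            // offset 8, element type 0x08

        Console.WriteLine(MB(dword1));                   // 1
        Console.WriteLine(IsStatic(dword1));             // True
        Console.WriteLine(Protection(dword1));           // 1
        Console.WriteLine(RequiresFullMbValue(dword1));  // False
        Console.WriteLine(Offset(dword2));               // 8
        Console.WriteLine(ElementType(dword2));          // 8
    }
}
```

Working on plain `uint` values keeps the sketch independent of the unsafe struct, so the bit arithmetic can be checked in isolation before it is wired into FieldDesc properties.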

Retrieving a FieldDesc for a Field

Thankfully, we don't have to do anything too hacky for retrieving a FieldDesc. Reflection actually already has a way of getting a FieldDesc.

FieldInfo.FieldHandle.Value

Value points to a FieldInfo's corresponding FieldDesc, where it gets all of its metadata. Therefore, we can write a method to get a FieldInfo's FieldDesc counterpart.

public static FieldDesc* GetFieldDescForFieldInfo(FieldInfo fi)
{
   if (fi.IsLiteral) {
      throw new Exception("Const field");
   }

   FieldDesc* fd = (FieldDesc*) fi.FieldHandle.Value;
   return fd;
}

Note: I throw an Exception when the FieldInfo is a literal because you can't access the FieldHandle of a literal (const) field.
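The difference between a regular field and a const field can be seen with plain Reflection, without any unsafe code. The sketch below is only an illustration: `Sample`, `FieldHandleDemo`, and `HandleOrZero` are hypothetical names, and returning `IntPtr.Zero` for literals is a choice made for this example rather than what the method above does.

```csharp
using System;
using System.Reflection;

class Sample
{
    public const int Constant = 1; // literal: has no runtime field handle to read
    public int Regular;            // ordinary instance field
}

static class FieldHandleDemo
{
    // Returns the raw handle value, or IntPtr.Zero for const fields,
    // whose FieldHandle cannot be read.
    public static IntPtr HandleOrZero(FieldInfo fi)
    {
        return fi.IsLiteral ? IntPtr.Zero : fi.FieldHandle.Value;
    }

    static void Main()
    {
        FieldInfo regular  = typeof(Sample).GetField("Regular");
        FieldInfo constant = typeof(Sample).GetField("Constant");

        Console.WriteLine(regular.IsLiteral);                     // False
        Console.WriteLine(constant.IsLiteral);                    // True
        Console.WriteLine(HandleOrZero(regular) != IntPtr.Zero);  // True
    }
}
```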

We'll wrap the above method in another method so that we can get a FieldDesc more easily.

private const BindingFlags DefaultFlags =
   BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Static;

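public static FieldDesc* GetFieldDesc(Type t, string name, BindingFlags flags = DefaultFlags)
{
   if (t.IsArray) {
      throw new Exception("Arrays do not have fields");
   }

   FieldInfo fieldInfo = t.GetField(name, flags);

   // GetField returns null when no field matches the name and flags
   if (fieldInfo == null) {
      throw new Exception("Field not found: " + name);
   }

   return GetFieldDescForFieldInfo(fieldInfo);
}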

Getting a Field's Metadata Token

Earlier in the article, I said that the bitfield m_mb is used for calculating a field's metadata token, which is used in FieldInfo.MetadataToken. However, it requires some calculation to get the proper token. If we look at field.h line 171 in the CoreCLR repo:

mdFieldDef GetMemberDef() const
{
    LIMITED_METHOD_DAC_CONTRACT;

    // Check if this FieldDesc is using the packed mb layout
    if (!m_requiresFullMbValue)
    {
        return TokenFromRid(m_mb & enum_packedMbLayout_MbMask, mdtFieldDef);
    }

    return TokenFromRid(m_mb, mdtFieldDef);
}

We can replicate GetMemberDef like so:

public int MemberDef {
   
   get {
      // Check if this FieldDesc is using the packed mb layout
      if (!RequiresFullMBValue)
      {
         return TokenFromRid(MB & (int) MbMask.PackedMbLayoutMbMask, CorTokenType.mdtFieldDef);
      }

      return TokenFromRid(MB, CorTokenType.mdtFieldDef);
   }
}

MbMask:

enum MbMask
{
   PackedMbLayoutMbMask       = 0x01FFFF,
   PackedMbLayoutNameHashMask = 0xFE0000
}

TokenFromRid can be replicated in C# like this:

static int TokenFromRid(int rid, CorTokenType tktype)
{
   return rid | (int) tktype;
}
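To see what the OR in TokenFromRid produces, here is a small worked example. The high byte of a token identifies the metadata table (0x04 for field definitions) and the low 24 bits hold the row id. `TokenDemo`, `TableOf`, and `RidOf` are illustrative helpers written for this sketch, not part of the CLR or of Reflection.

```csharp
using System;
using System.Reflection;

static class TokenDemo
{
    public const int FieldDefTable = 0x04000000; // mdtFieldDef

    // Combines a row id with a table tag, like TokenFromRid.
    public static int TokenFromRid(int rid, int tableTag) => rid | tableTag;

    // Splits a token back into its table tag and row id.
    public static int TableOf(int token) => token & unchecked((int)0xFF000000);
    public static int RidOf(int token)   => token & 0x00FFFFFF;

    class Sample { public int Value; }

    static void Main()
    {
        int token = TokenFromRid(1, FieldDefTable);
        Console.WriteLine($"0x{token:X8}");                 // 0x04000001

        // A token reported by Reflection for a real field carries the same tag.
        int real = typeof(Sample).GetField("Value").MetadataToken;
        Console.WriteLine(TableOf(real) == FieldDefTable);  // True
        Console.WriteLine(RidOf(real) > 0);                 // True
    }
}
```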

CorTokenType:

enum CorTokenType
{
   mdtModule                 = 0x00000000,
   mdtTypeRef                = 0x01000000,
   mdtTypeDef                = 0x02000000,
   mdtFieldDef               = 0x04000000,
   ...

Testing It Out

Note: This was tested on 64-bit.

We'll make a struct for testing:

struct Struct
{
   private long l;
   private int    i;
   public int Int => i;
}

First, we'll make sure our metadata token matches the one Reflection has:

var fd = GetFieldDesc(typeof(Struct), "l");
var fi = typeof(Struct).GetField("l", BindingFlags.NonPublic | BindingFlags.Instance);

Debug.Assert(fi.MetadataToken == fd->MemberDef);      // passes!

Then we'll see how the runtime laid out Struct:

Console.WriteLine(GetFieldDesc(typeof(Struct), "l")->Offset);   // 0
Console.WriteLine(GetFieldDesc(typeof(Struct), "i")->Offset);   // 8

We'll verify the offset by writing an int into s's memory at the offset that i's FieldDesc reports, then reading it back through the Int property.

Struct s = new Struct();

IntPtr p = new IntPtr(&s);
Marshal.WriteInt32(p, GetFieldDesc(typeof(Struct), "i")->Offset, 123);
Debug.Assert(s.Int == 123);    // passes!

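i is at offset 8 because l occupies the first 8 bytes. Here the runtime layout matches declaration order, but the CLR sometimes reorders fields, for example by putting the largest members first in memory. However, there are some exceptions: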

Let's see what happens when we put a larger value type inside Struct.

struct Struct
{
   private decimal d;
   private string s;
   private int    i;
}

This causes the CLR to reorder the fields and insert padding for alignment:

Console.WriteLine(GetFieldDesc(typeof(Struct), "d")->Offset);   // 16
Console.WriteLine(GetFieldDesc(typeof(Struct), "s")->Offset);   // 0
Console.WriteLine(GetFieldDesc(typeof(Struct), "i")->Offset);   // 8

This means there are 4 bytes of padding at offset 12, between the end of i (offset 8, 4 bytes) and the start of d (offset 16).
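The padding can also be found mechanically from the offsets and field sizes. The following is a minimal sketch under stated assumptions: `PaddingDemo` and `FindGaps` are hypothetical names, the offsets are the ones printed above, and the sizes (8 bytes for the string reference, 4 for the int, 16 for the decimal) assume a 64-bit process.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PaddingDemo
{
    // Returns (offset, length) for every gap between consecutive fields.
    public static List<(int Offset, int Length)> FindGaps(
        IEnumerable<(string Name, int Offset, int Size)> fields)
    {
        var gaps = new List<(int Offset, int Length)>();
        int end = 0; // first byte not yet covered by any field

        foreach (var f in fields.OrderBy(x => x.Offset))
        {
            if (f.Offset > end)
                gaps.Add((end, f.Offset - end));
            end = Math.Max(end, f.Offset + f.Size);
        }
        return gaps;
    }

    static void Main()
    {
        var fields = new[]
        {
            (Name: "d", Offset: 16, Size: 16),
            (Name: "s", Offset: 0,  Size: 8),
            (Name: "i", Offset: 8,  Size: 4),
        };

        foreach (var gap in FindGaps(fields))
            Console.WriteLine($"{gap.Length} bytes of padding at offset {gap.Offset}");
        // 4 bytes of padding at offset 12
    }
}
```

This only detects gaps between fields. Trailing padding at the end of the type would need the total size of the type as an extra input.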

The CLR also doesn't insert padding at all if the struct is explicitly laid out:

[StructLayout(LayoutKind.Explicit)]
struct Struct
{
   [FieldOffset(0)]  private decimal d;
   [FieldOffset(16)] private int     i;
   [FieldOffset(20)] private long    l;
}

Console.WriteLine(GetFieldDesc(typeof(Struct), "d")->Offset);   // 0
Console.WriteLine(GetFieldDesc(typeof(Struct), "l")->Offset);   // 20
Console.WriteLine(GetFieldDesc(typeof(Struct), "i")->Offset);   // 16

What About Static Fields?

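The FieldDescs of static fields still report offsets. However, the offset is not an offset into an instance, so it can be surprisingly large for a small type, like 96. Static fields live outside any instance, in storage the runtime reaches through the type's MethodTable (another internal structure).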

What Can We Make With This?

You can make a method comparable to C's offsetof macro:

public static int OffsetOf<TType>(string fieldName)
{
   return GetFieldDesc(typeof(TType), fieldName)->Offset;
}

You may be thinking, why not just use Marshal.OffsetOf? Well, because that's the marshaled offset and it doesn't work with unmarshalable or reference types.
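For comparison, here is a short sketch of what Marshal.OffsetOf does with two kinds of types. `Blittable`, `AutoLayoutClass`, `MarshalOffsetDemo`, and `MarshaledOffset` are names invented for this example. The expectation that the second call throws an ArgumentException follows the documented behavior for types that cannot be exported as a structure, so treat it as an assumption to verify on your runtime.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct Blittable
{
    private long l;
    private int  i;
}

// A class without an explicit or sequential layout: the marshaler has no
// unmanaged layout for it.
class AutoLayoutClass
{
    public object o;
    public int    i;
}

static class MarshalOffsetDemo
{
    public static int MarshaledOffset<T>(string fieldName)
        => (int) Marshal.OffsetOf<T>(fieldName);

    static void Main()
    {
        // For a blittable struct the marshaled and managed layouts coincide.
        Console.WriteLine(MarshaledOffset<Blittable>("i"));  // 8

        try
        {
            Console.WriteLine(MarshaledOffset<AutoLayoutClass>("i"));
        }
        catch (ArgumentException e)
        {
            // No marshaled layout exists, so no offset can be reported.
            Console.WriteLine(e.GetType().Name);
        }
    }
}
```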

You can also make a class to print the layout of an object. I wrote one which can get the layout of any object (except arrays). You can get the code for that here.

Struct s = new Struct();
ObjectLayout<Struct> layout = new ObjectLayout<Struct>(ref s);
Console.WriteLine(layout);

Output:

| Field Offset | Address      | Size | Type    | Name      | Value |
|--------------|--------------|------|---------|-----------|-------|
| 0            | 0xD04A3FEE60 | 16   | Decimal | d         | 0     |
| 16           | 0xD04A3FEE70 | 4    | Int32   | i         | 0     |
| 20           | 0xD04A3FEE74 | 4    | Byte    | (padding) | 0     |
| 24           | 0xD04A3FEE78 | 8    | Int64   | s         | 0     |

Sources

My GitHub
Complete FieldDesc code
CoreCLR: /src/vm/field.cpp, /src/vm/field.h