Fast Native Structure Reading in C# using Dynamic Assemblies

17 Feb 2009, MIT License, 3 min read
This article shows how to generate dynamic methods for fast byte-to-structure conversion.

Introduction  

This article demonstrates how to convert byte arrays into user-defined data structures using dynamically emitted code.

Sasha Goldshtein wrote an excellent article on this topic, analyzing various ways to read user-defined structs from byte arrays. This article builds on his work and proposes a faster and more generic alternative using code generation. The attached code includes both Sasha's original code and an open source toolkit that helps with the code generation. 

Background    

The fastest solution in Sasha's article uses the fixed keyword with a non-generic type:

C#
static unsafe Packet ReadUsingPointer(byte[] data)
{
    fixed (byte* packet = &data[0])
    {
        return *(Packet*)packet;
    }
} 

To make this truly useful, we need to have a generic method:

C#
static unsafe T ReadUsingPointer<T>(byte[] data) where T : struct
{
    fixed (byte* packet = &data[0])
    {
        return *(T*)packet; // Would not compile
    }
} 

Unfortunately, C# 3.0 cannot express a generic method T ReadIntoStruct<T>(byte[] data) this way: replacing Packet with a generic T does not compile, even if T is constrained to value types (struct). To take a pointer to T, T must be an unmanaged-type, as defined in §18.2 of the C# Language Specification v3.0:

An unmanaged-type is any type that isn't a reference-type and doesn't contain reference-type fields at any level of nesting. In other words, an unmanaged-type is one of the following:
•    sbyte, byte, short, ushort, int, uint, long, ulong, char, float, double, decimal, or bool.
•    Any enum-type.
•    Any pointer-type.
•    Any user-defined struct-type that contains fields of unmanaged-types only.

Note that strings are not in that list, even though you can use them in structs. Fixed-size arrays of unmanaged-types are allowed.

The proposed solution is to dynamically generate a method identical to the prototype, but for a given type, and to expose it through a generic interface ICall<T>. A static method is generated as well, so that the cost of calling static and interface methods can be compared.

To avoid undefined behavior when T does not satisfy the unmanaged-type requirements, we have to validate type T recursively against all of the rules. This is done by TypeExtensions.ThrowIfNotUnmanagedType(). I hope that some day Type will expose a simple property for this check instead of all the code I had to write, but for now it is an extension method on Type.
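The recursive validation described above can be sketched roughly as follows. This is not the article's TypeExtensions code, only an illustration of the rules quoted from the specification; the names UnmanagedCheckSketch, IsUnmanagedType and ThrowIfNotUnmanaged are hypothetical. The sketch also rejects constructed generic structs, which is an assumption based on the language version the article targets.

```csharp
using System;
using System.Reflection;

// Hypothetical helper, not the article's TypeExtensions implementation.
public static class UnmanagedCheckSketch
{
    // Returns true when 'type' appears to satisfy the unmanaged-type rules.
    public static bool IsUnmanagedType(Type type)
    {
        // Primitives (int, bool, char, double, ...), enums and pointers qualify directly.
        // This test must come first: a primitive struct such as Int32 contains a field
        // of its own type, so recursing into it would never terminate.
        if (type.IsPrimitive || type.IsEnum || type.IsPointer)
            return true;

        // Reference types never qualify. Constructed generic structs are rejected
        // too (assumption: rules as of the C# 3.0 era).
        if (!type.IsValueType || type.IsGenericType)
            return false;

        // A user-defined struct (or decimal) qualifies only if every instance field does.
        const BindingFlags flags =
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;
        foreach (FieldInfo field in type.GetFields(flags))
        {
            if (!IsUnmanagedType(field.FieldType))
                return false;
        }
        return true;
    }

    // Throws when the type cannot safely be read through a raw pointer.
    public static void ThrowIfNotUnmanaged(Type type)
    {
        if (!IsUnmanagedType(type))
            throw new ArgumentException(type.FullName + " is not an unmanaged type.");
    }
}
```

Fixed-size buffers need no special case here, because the compiler represents them as a nested struct whose single field has the element type, so the recursion covers them.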

Common Intermediate Language (CIL) is fairly complex, but a deep understanding of it is not needed to generate methods. First, I used Reflector to view the CIL generated for the prototype methods. Then, I adapted an excellent OSS library, Business Logic Toolkit for .NET, to emit CIL identical to the prototype but for a different type. A separate article gives a good introduction to the toolkit's emit functionality. In my code, I changed all the helper classes into extension methods, which makes the process much more streamlined.

Here is what the method generation looks like. Each comment shows the CIL of the prototype; note that ReadingStructureData.Packet is replaced with itemType, the type being wrapped.

C#
var emit = methodBuilder.GetILGenerator();

// .locals init (
//  [0] uint8& pinned packet,
var l0 = emit.DeclareLocal(typeof (byte).MakeByRefType(), true);
//  [1] valuetype ReadingStructureData.Packet CS$1$0000,
var l1 = emit.DeclareLocal(itemType);

var L_0012 = emit.DefineLabel();

// because this code was taken from an instance method,
// "this" was the parameter 0, but was never used
emit
    .ldarg(methodBuilder, param)  // L_0000: ldarg.0
    .ldc_i4_0()                   // L_0001: ldc.i4.0
    .ldelema(typeof (byte))       // L_0002: ldelema uint8
    .stloc(l0)                    // L_0007: stloc.0
    .ldloc(l0)                    // L_0008: ldloc.0
    .conv_i()                     // L_0009: conv.i
    .ldobj(itemType)              // L_000a: ldobj ReadingStructureData.Packet
    .stloc(l1)                    // L_000f: stloc.1
    .leave_s(L_0012)              // L_0010: leave.s L_0012
    .MarkLabelExt(L_0012)
    .ldloc(l1)                    // L_0012: ldloc.1
    .ret()                        // L_0013: ret
    ;
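For readers who do not use the toolkit's fluent extensions, the same instruction sequence can be approximated with the plain System.Reflection.Emit API. The sketch below is an assumption-laden illustration, not the article's code: it uses DynamicMethod rather than a MethodBuilder, drops the temporary local and the leave.s branch, and the names RawEmitSketch, CreateReader and SamplePacket are hypothetical.

```csharp
using System;
using System.Reflection.Emit;
using System.Runtime.InteropServices;

// Hypothetical sample struct used only to illustrate the call.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct SamplePacket
{
    public int Id;
    public short Flags;
}

public static class RawEmitSketch
{
    // Emits a method equivalent to "fixed (byte* p = &data[0]) return *(T*)p;".
    // The caller is responsible for ensuring that T is an unmanaged type and
    // that the array holds at least sizeof(T) bytes.
    public static Func<byte[], T> CreateReader<T>() where T : struct
    {
        var method = new DynamicMethod(
            "Read_" + typeof(T).Name,
            typeof(T),                       // return type
            new[] { typeof(byte[]) },        // single parameter: the source bytes
            typeof(RawEmitSketch).Module,
            true);                           // skip visibility checks

        ILGenerator il = method.GetILGenerator();

        // Pinned byte& local, so the GC cannot move the array while it is read.
        LocalBuilder pinned = il.DeclareLocal(typeof(byte).MakeByRefType(), true);

        il.Emit(OpCodes.Ldarg_0);                 // data
        il.Emit(OpCodes.Ldc_I4_0);                // index 0
        il.Emit(OpCodes.Ldelema, typeof(byte));   // &data[0]
        il.Emit(OpCodes.Stloc, pinned);           // pin it
        il.Emit(OpCodes.Ldloc, pinned);
        il.Emit(OpCodes.Conv_I);                  // managed pointer -> native int
        il.Emit(OpCodes.Ldobj, typeof(T));        // copy sizeof(T) bytes into a T
        il.Emit(OpCodes.Ret);

        return (Func<byte[], T>)method.CreateDelegate(typeof(Func<byte[], T>));
    }
}
```

Calling RawEmitSketch.CreateReader<SamplePacket>() returns a delegate that can be invoked repeatedly; since a static DynamicMethod is used, no instance parameter has to be skipped as in the listing above.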

Using the Code

The sample creates two entry points: an implementation of the ICall<T> interface, which requires an object instance, and a delegate to a static method.

C#
// Generate code
ICall<Packet> interfaceObj;
Func<byte[], Packet> staticDelegate;

WrapperFactory.Instance.CreateDynamicMethods(out interfaceObj, out staticDelegate);
var staticResult = staticDelegate(sourceData);            // Call static implementation (slower)
var interfaceResult = interfaceObj.ReadItem(sourceData);  // Interface implementation (faster)
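Because the generated code reads directly from &data[0], a null, empty or too-short array is not something it can handle gracefully. One possible way to keep the generated code minimal is to validate arguments in a thin wrapper around the delegate. The sketch below is only an illustration; GuardedReaderSketch and WithValidation are hypothetical names. It uses Marshal.SizeOf, which reports the marshaled size: this matches the in-memory size for structs made of integer and floating-point fields, but differs for bool and char fields, so treat the size check as an assumption to verify for your own types.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper, not part of the article's download.
public static class GuardedReaderSketch
{
    // Wraps a raw reader so that bad input raises an exception
    // instead of reading past the end of the array.
    public static Func<byte[], T> WithValidation<T>(Func<byte[], T> reader) where T : struct
    {
        if (reader == null) throw new ArgumentNullException("reader");

        // Computed once per wrapper; assumes marshaled size == in-memory size.
        int size = Marshal.SizeOf(typeof(T));

        return data =>
        {
            if (data == null) throw new ArgumentNullException("data");
            if (data.Length < size)
                throw new ArgumentException(
                    "Expected at least " + size + " bytes, got " + data.Length + ".", "data");
            return reader(data);
        };
    }
}
```

The wrapper adds one delegate invocation per call, so it is best reserved for untrusted input rather than tight inner loops.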

Performance Study

Although the numbers vary from run to run, the generated code is consistently close in speed to the hand-written prototypes. Note also the time it takes to emit the new code: the per-type cost drops when multiple types are wrapped, but it is still significant.

Non-Generic Solutions:

BinaryReader:   5,259.00
Pointer:        199.00

Generic Solutions:

MarshalSafe:    10,982.00
MarshalUnsafe:  6,944.00
C++/CLI:        467.00

Dynamically-generated solution:

Calling static prototype:       199.00
Calling interface prototype:    214.00
Creating dynamic methods:       14.00
Calling generated static:       213.00 (7% slower than static prototype)
Calling generated interface:    221.00 (3% slower than interface prototype)
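Since emitting code has a measurable one-time cost (the "Creating dynamic methods" row), it seems reasonable to generate each reader once and reuse it. A minimal sketch of such a cache follows; ReaderCacheSketch is a hypothetical name, and the emit callback stands in for whatever factory actually produces the delegate for a given type.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical cache, not part of the article's download.
public sealed class ReaderCacheSketch
{
    private readonly ConcurrentDictionary<Type, Delegate> _readers =
        new ConcurrentDictionary<Type, Delegate>();

    // Callback that emits a Func<byte[], T> for the requested type.
    private readonly Func<Type, Delegate> _emit;

    public ReaderCacheSketch(Func<Type, Delegate> emit)
    {
        if (emit == null) throw new ArgumentNullException("emit");
        _emit = emit;
    }

    // Returns the cached reader for T, emitting it on first use.
    // Under contention GetOrAdd may run the callback more than once,
    // but only one result is stored and returned to all callers.
    public Func<byte[], T> Get<T>() where T : struct
    {
        return (Func<byte[], T>)_readers.GetOrAdd(typeof(T), _emit);
    }
}
```

With this in place, the emission cost is paid on the first Get<T>() call for each type, and later calls reduce to a dictionary lookup.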

Points of Interest

Even though the C# specification describes the statements fixed(byte *p = array) and fixed(byte *p = &array[0]) as equivalent for a non-empty array, the IL tells a different story: the first statement generates significantly more instructions. The extra instructions handle a null or empty array, for which the pointer is set to null, whereas &array[0] throws in those cases. An issue has been created at Microsoft Connect. You can view the IL code difference there.

History

  • 2/14/2009 - Initial upload
  • 2/19/2009 - Updated to remove external code dependencies

License

This article, along with any associated source code and files, is licensed under The MIT License.


Written By
Chief Technology Officer SELA Group
Israel
Sasha Goldshtein is the CTO of SELA Group, an Israeli company specializing in training, consulting and outsourcing to local and international customers.

Sasha's work is divided across these three primary disciplines. He consults for clients on architecture, development, debugging and performance issues; he actively develops code using the latest bits of technology from Microsoft; and he conducts training classes on a variety of topics, from Windows Internals to .NET Performance.

You can read more about Sasha's work and his latest ventures at his blog: http://blogs.microsoft.co.il/blogs/sasha

Sasha writes from Jerusalem, Israel.

Written By
Architect
United States
Yuri works in a small hedge fund in New York, designing various aspects of the trading platform. In his spare time, Yuri participates in various open source initiatives, such as Wikipedia, where he designed and implemented the MediaWiki API - http://www.mediawiki.org/wiki/API

Yuri writes from New York

Comments and Discussions

 
Re: Extra CIL code for fixed(byte* packet = data)
Yuri Astrakhan, 17-Feb-09 9:07
Makes sense, although they should have made a comment about the difference in behavior. Does the pointer become null in case the array is null or empty? I will keep parameter validation outside of the dynamic method, keeping generated code to the minimum. Thanks!
