The idea of this article is not to present a complete Orca replacement but to show how the combination of Reflection.Emit and the reflection-based discovery mechanism of WPF helps when the data to be displayed has a structure that is not known at compile time.
The MSI SDK defines close to 100 tables and the columns they contain. One possible approach to building a viewer application would be to model all known tables as classes. This approach has two grave disadvantages.
The first is that it's a lot of work. The second, and more important, is that many MSI files contain custom tables that couldn't be shown with this approach.
A better solution (and the one presented here) is to use the MSI database functions to retrieve the schema of the database and create the required classes dynamically.
Using the Code
The sample project contains three classes:
TypeBuilder: a helper class for creating dynamic types on the fly. The instance method GetTypeFromPropertyList takes an array of Name/Type pairs as input and returns the type of the created class. The class internally caches created types so that each dynamic type is created only once. The cache lookup criterion is equality of all input Name/Type pairs. This is the only class that uses Reflection.Emit internally.
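The caching rule described above can be sketched as follows. This is only an illustration under assumptions, not the code from the sample project: the class name DynamicTypeCache, the method GetOrCreate and the string-based cache key are all hypothetical, and the key construction assumes that property names contain neither ':' nor '|'.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical cache, not the article's implementation.
public sealed class DynamicTypeCache
{
    private readonly Dictionary<string, Type> cache = new Dictionary<string, Type>();

    // Returns the cached type for this property list, or calls 'create'
    // once and remembers the result.
    public Type GetOrCreate(Tuple<string, Type>[] properties,
                            Func<Tuple<string, Type>[], Type> create)
    {
        // Two lists produce the same key only if every name and every type
        // match, in the same order (assuming names contain no ':' or '|').
        string key = string.Join("|",
            properties.Select(p => p.Item1 + ":" + p.Item2.AssemblyQualifiedName));

        Type result;
        if (!cache.TryGetValue(key, out result))
        {
            result = create(properties);
            cache.Add(key, result);
        }
        return result;
    }
}
```

Calling GetOrCreate twice with equal Name/Type pairs invokes the factory delegate only once; changing a single type (for example int to int?) produces a different key and therefore a new type.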
MsiReader: this class uses the automation interface of Windows Installer to retrieve the data found in an MSI database. It uses the TypeBuilder instance passed in to create dynamic types on the fly. The class exposes three public methods:
GetTableNames: returns a list of tables contained in the installer file.
GetTableContent: returns an array of dynamically created objects that are populated with the content of the table.
The method requires the name of the table as input.
GetBinaryContent: returns an array of bytes representing binary data found in the database.
Binary data is handled differently from the basic types (int and string) for reasons of size and performance. Simple types are read directly from the database and copied into the dynamically created object. Because binary data blobs can be huge, it makes no sense to read all the data and copy its entire content into the model.
Instead, a reference to the location in the database is copied into the model. This database reference is represented by a nested private class; IBinaryConentDescription is its publicly exposed representation. The reference can then be passed to the GetBinaryContent method to retrieve the actual data.
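The "reference instead of payload" idea can be illustrated with a small stand-alone sketch. Everything in it is an assumption made for illustration: IBlobHandle, BlobStore and the in-memory dictionary merely stand in for the real interface, the reader class and the MSI database, whose actual members are not shown in this article.

```csharp
using System;
using System.Collections.Generic;

// Public, opaque view of a blob location (hypothetical name).
public interface IBlobHandle
{
    string Description { get; }
}

// Hypothetical store; the dictionary stands in for the MSI database.
public sealed class BlobStore
{
    // The private nested class carries the actual location.
    private sealed class Handle : IBlobHandle
    {
        public string Key;
        public string Description { get { return "blob:" + Key; } }
    }

    private readonly Dictionary<string, byte[]> storage =
        new Dictionary<string, byte[]>();

    // Stores the data and hands out only a lightweight reference.
    public IBlobHandle Add(string key, byte[] data)
    {
        storage[key] = data;
        return new Handle { Key = key };
    }

    // Resolves a reference to the actual bytes on demand.
    public byte[] GetContent(IBlobHandle handle)
    {
        var h = handle as Handle;
        if (h == null)
            throw new ArgumentException("Handle was not created by this store.", "handle");
        return storage[h.Key];
    }
}
```

The model only ever holds the small handle object; the bytes are fetched when the user actually asks for them.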
The third class's main responsibility is displaying the content of the loaded database and reacting to user input. The ListBox (left pane) shows the tables found in the database, and the GridView (right pane) shows the content of the selected table. The class has a single private method (ProcessMsiFile) that creates an instance of the MsiReader class, retrieves the data, and sets it as the DataContext for WPF. The rest of the processing is done by WPF's data-binding magic.
The image above shows the controls that are data-bound to the model. The model is an anonymous class created within the ProcessMsiFile method.
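Data binding can work against an anonymous or dynamically emitted class because property names are resolved at run time rather than at compile time. The following sketch shows the underlying idea with plain reflection; it is not WPF's actual binding code, and the names PropertyDiscovery and GetPropertyNames are hypothetical.

```csharp
using System;
using System.Linq;

public static class PropertyDiscovery
{
    // Lists the public instance properties of the object's runtime type,
    // sorted by name so that the result does not depend on reflection order.
    public static string[] GetPropertyNames(object item)
    {
        return item.GetType()
                   .GetProperties()
                   .Select(p => p.Name)
                   .OrderBy(n => n, StringComparer.Ordinal)
                   .ToArray();
    }
}
```

Passing an anonymous object such as new { FileName = "setup.msi", TableCount = 3 } yields the two property names, although no named class for it exists in the source code.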
The following sequence diagram shows how the three classes and the Windows installer interact:
The following shows how one of the database tables (TextStyle) is mapped to C# and finally to the IL code that is emitted by the TypeBuilder class.
The generated dynamic type representing the TextStyle table looks as follows in C#:
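public class dyn_<guid>
{
    private string m_TextStyle, m_FaceName;
    private int m_Size;
    private int? m_Color, m_StyleBits;

    public dyn_<guid>(string p1, string p2, int p3, int? p4, int? p5)
    {
        m_TextStyle = p1;
        m_FaceName = p2;
        m_Size = p3;
        m_Color = p4;
        m_StyleBits = p5;
    }

    public string TextStyle { get { return m_TextStyle; } }
    public string FaceName { get { return m_FaceName; } }
    public int Size { get { return m_Size; } }
    public int? Color { get { return m_Color; } }
    public int? StyleBits { get { return m_StyleBits; } }
}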
To note, here is how the database types are mapped to the .NET types.
And finally, the same in IL (FaceName and StyleBits removed to keep the listing short):
.field private string m_TextStyle
.field private int32 m_Size
.field private valuetype [mscorlib]System.Nullable`1<int32> m_Color
.method public hidebysig specialname rtspecialname
instance void .ctor (
valuetype [mscorlib]System.Nullable`1<int32> p4
) cil managed
IL_0001: call instance void [mscorlib]System.Object::.ctor()
IL_0008: stfld string MSIExplorer.dyn_abc::m_TextStyle
IL_0016: stfld int32 MSIExplorer.dyn_abc::m_Size
IL_001c: ldarg.s p4
IL_001e: stfld valuetype [mscorlib]System.Nullable`1<int32>
.method public hidebysig specialname
instance string get_TextStyle () cil managed
IL_0001: ldfld string MSIExplorer.dyn_abc::m_TextStyle
.method public hidebysig specialname
instance int32 get_Size () cil managed
IL_0001: ldfld int32 MSIExplorer.dyn_abc::m_Size
.method public hidebysig specialname
instance valuetype [mscorlib]System.Nullable`1<int32>
get_Color () cil managed
IL_0001: ldfld valuetype [mscorlib]System.Nullable`1<int32>
.property instance string TextStyle()
.get instance string MSIExplorer.dyn_abc::get_TextStyle()
.property instance int32 Size()
.get instance int32 MSIExplorer.dyn_abc::get_Size()
.property instance valuetype [mscorlib]System.Nullable`1<int32> Color()
.get instance valuetype [mscorlib]System.Nullable`1<int32>
And this is what the type looks like in the grid:
This is also the approach I took while writing the IL code in the TypeBuilder class. I first wrote the above C# class, used ILSpy to have a look at the generated IL, and copied that to the source file. Converting the original IL code to Reflection.Emit calls is no big issue.
With the above knowledge, it should be easy to understand what the code below does:
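private Type CreateTypeFromPropertyList
    (Tuple<string, Type>[] properties, string typeName)
{
    Emit.TypeBuilder tb = mb.DefineType(typeName, TypeAttributes.Public);
    var fields = new List<Emit.FieldBuilder>();
    foreach (var prop in properties)
    {
        Emit.FieldBuilder fb = tb.DefineField("m_" + prop.Item1,
            prop.Item2, FieldAttributes.Private);
        fields.Add(fb);
        Emit.PropertyBuilder pb = tb.DefineProperty(prop.Item1,
            PropertyAttributes.HasDefault, prop.Item2, null);
        Emit.MethodBuilder mbPropGetAccessor = tb.DefineMethod("get_" + prop.Item1,
            MethodAttributes.Public | MethodAttributes.SpecialName |
            MethodAttributes.HideBySig, prop.Item2, Type.EmptyTypes);
        Emit.ILGenerator propGetIL = mbPropGetAccessor.GetILGenerator();
        propGetIL.Emit(Emit.OpCodes.Ldarg_0);   // this
        propGetIL.Emit(Emit.OpCodes.Ldfld, fb); // this.m_<name>
        propGetIL.Emit(Emit.OpCodes.Ret);
        pb.SetGetMethod(mbPropGetAccessor);
    }
    Emit.ConstructorBuilder ctor2 = tb.DefineConstructor(MethodAttributes.Public,
        CallingConventions.Standard, properties.Select(a => a.Item2).ToArray());
    Emit.ILGenerator ctor2IL = ctor2.GetILGenerator();
    ctor2IL.Emit(Emit.OpCodes.Ldarg_0);
    ctor2IL.Emit(Emit.OpCodes.Call, typeof(object).GetConstructor(Type.EmptyTypes));
    foreach (Emit.FieldBuilder fb in fields)
    {
        ctor2IL.Emit(Emit.OpCodes.Ldarg_0);
        // ldarg takes a 16-bit argument index, so the operand must be a short.
        ctor2IL.Emit(Emit.OpCodes.Ldarg, (short)(fields.IndexOf(fb) + 1));
        ctor2IL.Emit(Emit.OpCodes.Stfld, fb);
    }
    ctor2IL.Emit(Emit.OpCodes.Ret);
    return tb.CreateType();
}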
Note the changed ordering compared to the original IL. The reason for that is that I wanted to reduce the number of required loops
without compromising on readability. The first loop creates the field, the property, and the property getter (in that order),
while the second loop emits the code to set the field in the constructor body.
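To try the technique in isolation, here is a minimal, self-contained sketch that emits a class with a single field, a constructor and a read-only property, following the same field, getter, constructor pattern. It is an illustration rather than the project's code: the names EmitDemo, BuildType, DemoDynamicAssembly and DemoModule are made up, and the sketch assumes a runtime where AssemblyBuilder.DefineDynamicAssembly is available (.NET Framework 4.5 or later, or .NET Core).

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class EmitDemo
{
    // Emits a class with one private field, a constructor that sets it,
    // and a read-only property that returns it.
    public static Type BuildType(string typeName, string propName, Type propType)
    {
        AssemblyBuilder ab = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("DemoDynamicAssembly"), AssemblyBuilderAccess.Run);
        ModuleBuilder mb = ab.DefineDynamicModule("DemoModule");
        TypeBuilder tb = mb.DefineType(typeName, TypeAttributes.Public);

        FieldBuilder fb = tb.DefineField("m_" + propName, propType,
            FieldAttributes.Private);

        // Getter body: return this.m_<propName>;
        MethodBuilder getter = tb.DefineMethod("get_" + propName,
            MethodAttributes.Public | MethodAttributes.SpecialName |
            MethodAttributes.HideBySig, propType, Type.EmptyTypes);
        ILGenerator getIL = getter.GetILGenerator();
        getIL.Emit(OpCodes.Ldarg_0);
        getIL.Emit(OpCodes.Ldfld, fb);
        getIL.Emit(OpCodes.Ret);

        PropertyBuilder pb = tb.DefineProperty(propName,
            PropertyAttributes.None, propType, null);
        pb.SetGetMethod(getter);

        // Constructor body: base(); this.m_<propName> = p1;
        ConstructorBuilder ctor = tb.DefineConstructor(MethodAttributes.Public,
            CallingConventions.Standard, new[] { propType });
        ILGenerator ctorIL = ctor.GetILGenerator();
        ctorIL.Emit(OpCodes.Ldarg_0);
        ctorIL.Emit(OpCodes.Call, typeof(object).GetConstructor(Type.EmptyTypes));
        ctorIL.Emit(OpCodes.Ldarg_0);
        ctorIL.Emit(OpCodes.Ldarg_1);
        ctorIL.Emit(OpCodes.Stfld, fb);
        ctorIL.Emit(OpCodes.Ret);

        return tb.CreateType();
    }
}
```

The resulting type can be instantiated with Activator.CreateInstance(type, value), and the property can be read back with type.GetProperty(name).GetValue(instance, null), which is essentially what a consumer relying on reflection has to do.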
For initial testing, I used Reflection.Emit's capability to generate an assembly and checked its content with ILSpy and PEVerify.exe.
Points of Interest
One of the biggest surprises for me was how arrays work with regard to Reflection and data binding. My first naive approach was to just collect the data in a list and call ToArray to return it. To my surprise, WPF's data binding didn't work with this approach. Creating the array with the dynamic row type as its element type fixed it:
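public object[] GetTableContent(string tableName)
{
    var resList = new List<object>();
    // rowType is the dynamic type for this table; resList holds its instances.
    var arr = Array.CreateInstance(rowType, resList.Count);
    Array.Copy(resList.ToArray(), arr, resList.Count);
    return (object[])arr;
}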
The difference between the two arrays cannot be seen in the debugger. Both look like they are of type object[], but the one that works is actually an array of a dynamic type.
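The difference becomes visible when the runtime type of each array is inspected. The sketch below uses string as a stand-in for the dynamic row type; ArrayTypeDemo and ToTypedArray are hypothetical names. One plausible explanation for the binding behaviour, offered here only as an assumption, is that the grid looks at the array's element type rather than at the individual items.

```csharp
using System;
using System.Collections.Generic;

public static class ArrayTypeDemo
{
    // Copies the rows into an array whose element type is rowType.
    // The cast to object[] relies on array covariance for reference types.
    public static object[] ToTypedArray(List<object> rows, Type rowType)
    {
        Array typed = Array.CreateInstance(rowType, rows.Count);
        Array.Copy(rows.ToArray(), typed, rows.Count);
        return (object[])typed;
    }
}
```

For a list containing only strings, rows.ToArray().GetType() is typeof(object[]), whereas ToTypedArray(rows, typeof(string)).GetType() is typeof(string[]), even though both values have the static type object[].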
DataGrid and Nullable Types
Another annoying bug/feature is DataGrid's inability to properly handle nullable types with regard to sorting. Some of the integers are nullable in the database and are therefore represented as int? in the dynamic classes. For those columns, sorting just stops working!
LINQ and Memory Consumption
The third issue I had was one of memory consumption. When saving large binary files (> 100 MB), there is a possibility of an OutOfMemoryException being thrown.
This happens when running the x86 debug build. When running the program on x64, the limit is way above anything I could find in my MSI files.
The reason why this happens so early is probably how I convert the stream data (returned as a string from the installer API) to a byte array. I use the following LINQ code:
<string buffer>.SelectMany(c => BitConverter.GetBytes(c)).ToArray();
This is probably one of the areas where unsafe code would be justifiable to improve performance and to reduce the risk of running out of memory.
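Even without unsafe code, the conversion can be written so that exactly one result array is allocated. The following sketch is an assumption-laden alternative, not the code used in the sample: StringBytes and ToBytes are hypothetical names, and the byte order is fixed to low byte first, which matches BitConverter.GetBytes(char) only on little-endian platforms such as x86 and x64.

```csharp
using System;

public static class StringBytes
{
    // Copies each UTF-16 code unit of 'text' into two bytes, low byte first.
    // No intermediate arrays are created, unlike the SelectMany approach,
    // which allocates a small array per character and grows its result buffer.
    public static byte[] ToBytes(string text)
    {
        var bytes = new byte[text.Length * 2];
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            bytes[2 * i] = (byte)(c & 0xFF);   // low byte
            bytes[2 * i + 1] = (byte)(c >> 8); // high byte
        }
        return bytes;
    }
}
```

A plain loop is used instead of Encoding.Unicode.GetBytes because the string carries raw binary data; an encoder may replace code units that do not form valid text, such as unpaired surrogates.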
Another interesting observation is how easy it was to introduce delayed loading of the table content into the application by inserting a Lazy<T> member at the proper location.
private void ProcessMsiFile(string fileName)
{
    MsiReader msiReader = new MsiReader(builder, fileName);
    this.DataContext = new
    {
        FileName = Path.GetFileName(fileName),
        Tables = // property name assumed here
            (from tableName in msiReader.GetTableNames()
             select new
             {
                 TableName = tableName,
                 Rows = new Lazy<object>
                     (() => msiReader.GetTableContent(tableName))
             })
    };
}
The reason for this trick was the relatively slow loading of large MSI files. Note also that by inserting Lazy<T> into the ProcessMsiFile function, the sequence diagram above no longer reflects how long the MsiReader instance lives.
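The deferred behaviour of Lazy<T> can be observed with a small stand-alone sketch. The names LazyDemo, CreateRows and LoadCount are hypothetical, and the fake rows merely stand in for a call to the reader.

```csharp
using System;

public static class LazyDemo
{
    // Counts how often the (simulated) table content was actually loaded.
    public static int LoadCount;

    // Returns a lazy wrapper; the factory delegate runs on the first
    // access to Value and its result is reused afterwards.
    public static Lazy<object[]> CreateRows(string tableName)
    {
        return new Lazy<object[]>(() =>
        {
            LoadCount++;
            return new object[] { tableName + " row 1", tableName + " row 2" };
        });
    }
}
```

Creating the Lazy<object[]> leaves LoadCount at zero; the first read of Value raises it to one, and later reads return the cached array without loading again. Because the delegate captures whatever it needs to load the data, those captured objects stay alive for as long as the lazy wrapper does.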
The code shown here is by no means ready to be used in a real-world program. The goal here is to show how it's possible to solve a relatively complex
problem with a few hundred lines of code by leveraging some of the more exotic APIs available in the .NET Framework. This is also the reason why
there are a lot of features missing from the sample program to make it really useful. Some of the more obvious ones are support for modifying the
database and a find/replace facility. Although these two features would be really useful, I felt that I'd miss the goal of the article by losing
the simplicity of the current implementation.