.NET CLR Injection: Modify IL Code on Run-time

Jerry.Wang


3 Oct 2012


Modify a method's IL code at run time, even after it has been JIT-compiled. Supports Release mode and .NET versions from 2.0 to 4.0.

This is an old version of the currently published article.

Introduction

Modifying the MSIL code of .NET methods at run time is very cool: it helps to implement hooking, software protection, and other amazing things. That is why I wanted it, but there are big challenges on the road. The MSIL code may already have been compiled to native code by the JIT compiler before we get a chance to modify it. Also, the .NET CLR implementation is not documented and changes with each version, so we need a reliable and stable way that does not depend on the exact memory layout.

Anyway, after more than a week of research, I finally made it work! Here is a simple method from the demo program:

protected string CompareOneAndTwo()
{
    int a = 1;
    int b = 2;
    if (a < b)
    {
        return "Number 1 is less than 2";
    }
    else
    {
        return "Number 1 is greater than 2 (O_o)";
    }
}

Certainly it returns "Number 1 is less than 2"; let's try to make it return the incorrect result "Number 1 is greater than 2 (O_o)".

Looking at the MSIL code for this method, we can do that by changing one opcode from Bge_S to Blt_S. The branch is then taken under the opposite condition, so the method returns the wrong result, which is exactly what I need.
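The one-byte patch described above can be sketched in C++ as follows. The opcode values 0x2F (bge.s) and 0x32 (blt.s) come from the ECMA-335 opcode table. The IL byte sequence, the string tokens, the offset, and the helper names PatchOpcodeAt and MakeSampleIL are illustrative assumptions, not bytes or code taken from the demo program.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Short-form branch opcodes as listed in ECMA-335.
constexpr std::uint8_t kBgeS = 0x2F; // bge.s: branch if value1 >= value2
constexpr std::uint8_t kBltS = 0x32; // blt.s: branch if value1 <  value2

// Hypothetical helper: replaces the byte at `offset` only when it holds the
// opcode we expect, so an operand byte is never patched by accident.
// Returns false when nothing was changed.
bool PatchOpcodeAt(std::vector<std::uint8_t>& il, std::size_t offset,
                   std::uint8_t expected, std::uint8_t replacement)
{
    if (offset >= il.size() || il[offset] != expected)
        return false;
    il[offset] = replacement;
    return true;
}

// Assumed IL for a method shaped like CompareOneAndTwo. The ldstr tokens
// are placeholders, not real metadata tokens.
std::vector<std::uint8_t> MakeSampleIL()
{
    return {
        0x17,                         // ldc.i4.1
        0x0A,                         // stloc.0
        0x18,                         // ldc.i4.2
        0x0B,                         // stloc.1
        0x06,                         // ldloc.0
        0x07,                         // ldloc.1
        0x2F, 0x06,                   // bge.s +6 (jump to the second ldstr)
        0x72, 0x01, 0x00, 0x00, 0x70, // ldstr <placeholder token 1>
        0x2A,                         // ret
        0x72, 0x02, 0x00, 0x00, 0x70, // ldstr <placeholder token 2>
        0x2A                          // ret
    };
}
```

Calling PatchOpcodeAt(il, 6, kBgeS, kBltS) on the sample buffer flips the branch. Checking the expected byte first is a deliberate choice: a blind search-and-replace over the buffer could hit an operand that happens to equal 0x2F.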

If you try this in the demo application, it shows the wrong answer.

Here is the code that replaces the IL; I assume the comments between the lines explain enough.

You can download the demo program and have a try.

Supports .NET versions from 2.0 to 4.0.
Supports many kinds of methods, including dynamic methods and generic methods.
Supports .NET processes built in Release mode.

Using the code

Copy the InjectionHelper.cs file into your project. It contains the following methods.

public static class InjectionHelper
{
    // Load the unmanaged injection.dll. The initialization happens in a background thread;
    // you can check whether it has completed by calling GetStatus().
    public static void Initialize()
 
    // Unload the unmanaged injection.dll
    public static void Uninitialize()
 
    // Update the IL code of a method.
    public static void UpdateILCodes(MethodInfo method, byte[] ilCodes)
 
    // Blocks until the initialization is completed
    public static Status WaitForIntializationCompletion()
 
    // Query the current status of the unmanaged DLL; returns immediately.
    public static Status GetStatus()
}

The InjectionHelper.Initialize method loads the unmanaged injection.dll from the directory of the current assembly, so all the related files need to be there. Alternatively, you can modify the code to change the location.

Here is the file list.

File Name	Description
Injection.dll	Unmanaged DLL that does the work described in this article (x86 version; an x64 version will be out sooner or later).
EasyHook32.dll	x86 EasyHook DLL from http://easyhook.codeplex.com/ (used by Injection.dll).
EasyHook64.dll	x64 EasyHook DLL from http://easyhook.codeplex.com/ (will be used by the x64 Injection.dll).
symchk.exe, SymbolCheck.dll	Tool for downloading PDB files, copied from Windows Debug Tools.
dbg32.dll	x86 version of dbghelp.dll 6.2, used by Injection.dll and symchk.exe. I renamed the file to avoid version conflicts, and modified the PE import table of symchk.exe to link to this DLL.
PDB_symbols/*	Local cache of the PDB symbol files. It can be removed, but that slows down the initialization.
Test_x86_Net20_Release.exe	Test program for x86 / .NET 2.0 / Release mode, not required for distribution.
Test_x86_Net35_Release.exe	Test program for x86 / .NET 3.5 / Release mode, not required for distribution.
Test_x86_Net40_Release.exe	Test program for x86 / .NET 4.0 / Release mode, not required for distribution.

Background

Replace the IL code

First, take a look at how the CLR and the JIT work.

The JIT implementation DLLs (clrjit.dll for .NET 4.0+ / mscorjit.dll for .NET 2.0+) export a __stdcall function getJit, which returns the ICorJitCompiler interface.

The CLR implementation DLLs (clr.dll for .NET 4.0+ / mscorwks.dll for .NET 2.0+) invoke getJit to obtain the ICorJitCompiler interface, then call its compileMethod method to compile MSIL code to native code.

C++

CorJitResult compileMethod(ICorJitInfo * pJitInfo, CORINFO_METHOD_INFO * pMethodInfo, 
   UINT nFlags, LPBYTE * pEntryAddress, ULONG * pSizeOfCode);

This part is quite easy: just find the location of the compileMethod method and replace its entry via EasyHook.

C++

// define the interface method function pointer
typedef CorJitResult (__stdcall ICorJitCompiler::*PFN_compileMethod_V4)(ICorJitInfo * pJitInfo
	, CORINFO_METHOD_INFO * pMethodInfo
	, UINT nFlags
	, LPBYTE * pEntryAddress
	, ULONG * pSizeOfCode
	);
 
// store the address of the real compileMethod
PFN_compileMethod_V4 s_pComplieMethod_V4 = ...; 
 
// hook the compileMethod with my own compileMethod
LhInstallHook( (PVOID&)s_pComplieMethod_V4
		, &(PVOID&)CInjection::compileMethod
		, NULL
		, &s_hHookCompileMethod
		);
 
// and here is my compileMethod
CorJitResult __stdcall CInjection::compileMethod(ICorJitInfo * pJitInfo
	, CORINFO_METHOD_INFO * pCorMethodInfo
	, UINT nFlags
	, LPBYTE * pEntryAddress
	, ULONG * pSizeOfCode
	)
{
	// the hook runs in place of ICorJitCompiler::compileMethod,
	// so "this" is actually the JIT compiler instance
	ICorJitCompiler * pCorJitCompiler = (ICorJitCompiler *)this;

	// TO DO: modify IL codes here 

	// Call real compileMethod
	CorJitResult result = (pCorJitCompiler->*s_pComplieMethod_V4)( 
	  pJitInfo, pCorMethodInfo, nFlags, pEntryAddress, pSizeOfCode);

	return result;
}

Modify IL code for JIT-compiled methods

Here is the problem: the CLR does not call the compileMethod method above again for a method that has already been JIT-compiled. My idea is to restore the CLR data structures to the state they were in before JIT compilation. Then compileMethod will be called again and we can replace the IL.

Thus we have to look into the implementation of the CLR a bit. SSCLI (Shared Source Common Language Infrastructure) from Microsoft is a good reference, although it is quite out of date and we can't use it in our code.

The above diagram is a bit out of date, but the primary structure is the same. For each "class" in .NET, there is at least one MethodTable structure in memory. Each MethodTable is related to an EEClass, which stores the runtime type information for Reflection and other uses.

For each "method", there is a corresponding MethodDesc data structure in memory that contains information about the method, such as flags, slot address, and entry address.

Before a method is JIT-compiled, its slot points to a JMI thunk (prestub) that triggers JIT compilation. Once the IL code has been compiled, the slot is rewritten to point to a JMI thunk that jumps directly to the compiled native code.

To restore the data structure, first clear the flags, then change the entry address back to a temporary entry address, and so on. I successfully did that in the debugger by modifying the memory directly. But this is messy: it depends on the layout of the data structures, and the code is unreliable across different versions of .NET.
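The slot life cycle described above can be pictured with a small toy model. This is only a sketch: ToyMethod, Prestub, RunNative, ResetToPrestub, MakeMethod, and Call are hypothetical names, and nothing here mirrors the real CLR layout. It only shows why resetting the slot makes the next call go through the "JIT" again.

```cpp
#include <string>

// Toy model only; not the real CLR data structures.
struct ToyMethod;
using EntryPoint = std::string (*)(ToyMethod&);

struct ToyMethod
{
    std::string il;             // stands in for the method's IL body
    std::string native;         // stands in for the generated native code
    EntryPoint slot = nullptr;  // what callers jump through
    int compileCount = 0;       // how many times the "JIT" has run
};

// Stand-in for jumping straight to already-compiled code.
std::string RunNative(ToyMethod& m) { return m.native; }

// Stand-in for the prestub: compile, patch the slot, then run.
std::string Prestub(ToyMethod& m)
{
    m.native = "native(" + m.il + ")";
    ++m.compileCount;
    m.slot = &RunNative;        // later calls skip compilation
    return RunNative(m);
}

// Stand-in for restoring the pre-JIT state.
void ResetToPrestub(ToyMethod& m)
{
    m.native.clear();
    m.slot = &Prestub;
}

// A fresh method starts with its slot pointing at the prestub.
ToyMethod MakeMethod(const std::string& il)
{
    ToyMethod m;
    m.il = il;
    m.slot = &Prestub;
    return m;
}

std::string Call(ToyMethod& m) { return m.slot(m); }
```

In this model, changing m.il after the first call has no effect until ResetToPrestub is called, which is the same situation the article faces with a method that is already JIT-compiled.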

I was looking for a more reliable approach, and luckily I found the MethodDesc::Reset method in the SSCLI source code (vm/method.cpp).

C++

void MethodDesc::Reset()
{
    CONTRACTL
    {
        THROWS;
        GC_NOTRIGGER;
    }
    CONTRACTL_END
 
    // This method is not thread-safe since we are updating
    // different pieces of data non-atomically.
    // Use this only if you can guarantee thread-safety somehow.

    _ASSERTE(IsEnCMethod() || // The process is frozen by the debugger
             IsDynamicMethod() || // These are used in a very restricted way
             GetLoaderModule()->IsReflection()); // Rental methods                                                                 

    // Reset any flags relevant to the old code
    ClearFlagsOnUpdate();
 
    if (HasPrecode())
    {
        GetPrecode()->Reset();
    }
    else
    {
        // We should go here only for the rental methods
        _ASSERTE(GetLoaderModule()->IsReflection());
 
        InterlockedUpdateFlags2(enum_flag2_HasStableEntryPoint | enum_flag2_HasPrecode, FALSE);
 
        *GetAddrOfSlotUnchecked() = GetTemporaryEntryPoint();
    }
 
    _ASSERTE(!HasNativeCode());
}

As you can see above, it does exactly what I was doing by hand. Hence I just need to invoke this method to reset the MethodDesc to its pre-JIT state.

Of course, I can't use the MethodDesc from SSCLI. The real MethodDesc is used internally by Microsoft, and its exact implementation and layout are unknown to everyone outside Microsoft.

After endless mountains and rivers that leave doubt whether there is a path out, suddenly one encounters the shade of a willow, bright flowers, and a lovely village.

Fortunately, the address of this internal method is present in the PDB symbols from the Microsoft Symbol Server, and that solves my problem. The address of the Reset() method in the CLR DLL can be found by parsing the PDB file!

Now only one mandatory parameter is left: the this pointer of the MethodDesc. It is not hard to obtain. In fact, MethodBase.MethodHandle.Value == CORINFO_METHOD_HANDLE == MethodDesc address == this pointer of MethodDesc.

Thus, I have my MethodDesc class below defined in unmanaged code.

C++

typedef void (MethodDesc::*PFN_Reset)(void);
typedef BOOL (MethodDesc::*PFN_IsGenericMethodDefinition)(void);
typedef ULONG (MethodDesc::*PFN_GetNumGenericMethodArgs)(void);
typedef MethodDesc * (MethodDesc::*PFN_StripMethodInstantiation)(void);
typedef BOOL (MethodDesc::*PFN_HasClassOrMethodInstantiation)(void);
typedef BOOL (MethodDesc::*PFN_ContainsGenericVariables)(void);    
typedef DictionaryLayout * (MethodDesc::*PFN_GetDictionaryLayout)(void);
typedef Dictionary * (MethodDesc::*PFN_GetMethodDictionary)(void);
typedef MethodDesc * (MethodDesc::*PFN_GetWrappedMethodDesc)(void);
class MethodDesc
{
public:
    void Reset(void) { (this->*s_pfnReset)(); }
    BOOL IsGenericMethodDefinition(void) { return (this->*s_pfnIsGenericMethodDefinition)(); }
    ULONG GetNumGenericMethodArgs(void) { return (this->*s_pfnGetNumGenericMethodArgs)(); }
    MethodDesc * StripMethodInstantiation(void) { return (this->*s_pfnStripMethodInstantiation)(); }
    BOOL HasClassOrMethodInstantiation(void)  { return (this->*s_pfnHasClassOrMethodInstantiation)(); }
    BOOL ContainsGenericVariables(void) { return (this->*s_pfnContainsGenericVariables)(); }
    DictionaryLayout * GetDictionaryLayout(void) { return (this->*s_pfnGetDictionaryLayout)(); }
    Dictionary * GetMethodDictionary(void) { return (this->*s_pfnGetMethodDictionary)(); }
    MethodDesc * GetWrappedMethodDesc(void) { return (this->*s_pfnGetWrappedMethodDesc)(); }
private:
    static PFN_Reset s_pfnReset;
    static PFN_IsGenericMethodDefinition s_pfnIsGenericMethodDefinition;
    static PFN_GetNumGenericMethodArgs s_pfnGetNumGenericMethodArgs;
    static PFN_StripMethodInstantiation s_pfnStripMethodInstantiation;
    static PFN_HasClassOrMethodInstantiation s_pfnHasClassOrMethodInstantiation;
    static PFN_ContainsGenericVariables s_pfnContainsGenericVariables;
    static PFN_GetDictionaryLayout s_pfnGetDictionaryLayout;
    static PFN_GetMethodDictionary s_pfnGetMethodDictionary;
    static PFN_GetWrappedMethodDesc s_pfnGetWrappedMethodDesc;
};

The static variables above store the addresses of the internal methods of the MethodDesc implementation in the CLR DLL. They are initialized when my unmanaged DLL is loaded. The public members simply call the internal methods with the this pointer.

Now it becomes quite easy to invoke Microsoft's internal methods. For example:

C++

MethodDesc * pMethodDesc = (MethodDesc*)pMethodHandle;
pMethodDesc->Reset();
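The forwarding pattern above (static pointers filled in at load time, thin public wrappers that pass the object pointer along) can be illustrated portably. The sketch below is an assumption-laden stand-in, not the article's code: it uses plain function pointers with an explicit self argument instead of pointer-to-member typedefs, and OpaqueMethodDesc, MethodDescProxy, FakeDesc, and FakeReset are hypothetical names. Real CLR member functions use their own calling convention, so this shows the shape of the idea only.

```cpp
// Stand-in for an object whose layout we do not know.
struct OpaqueMethodDesc;

// Explicit-self function pointer (hypothetical replacement for PFN_Reset).
using ResetFn = void (*)(OpaqueMethodDesc* self);

class MethodDescProxy
{
public:
    // Called once at start-up with the address resolved elsewhere.
    static void Bind(ResetFn reset) { s_reset = reset; }
    static bool IsBound() { return s_reset != nullptr; }

    // Forwards to the resolved function. Returns false instead of
    // crashing when the address is missing or the object is null.
    static bool Reset(OpaqueMethodDesc* self)
    {
        if (!s_reset || !self)
            return false;
        s_reset(self);
        return true;
    }

private:
    static ResetFn s_reset;
};

ResetFn MethodDescProxy::s_reset = nullptr;

// Fake implementation used only to exercise the proxy in this sketch.
struct FakeDesc { int resets = 0; };

void FakeReset(OpaqueMethodDesc* self)
{
    ++reinterpret_cast<FakeDesc*>(self)->resets;
}
```

Checking IsBound() before use mirrors the s_nStatus check in the article's code: nothing should be forwarded until every address has been resolved.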

Find internal methods' addresses from the PDB Symbol file

When the unmanaged DLL is loaded, it checks the environment to see which version of the CLR/JIT is present, and it tries to look up the addresses of all the internal methods in the PDB file. If the lookup fails, it launches symchk.exe from Windows Debug Tools to download the corresponding PDB symbol files from the Microsoft Symbol Server. This procedure takes a long time, from several seconds to several minutes. One possible optimization is to cache the resolved addresses, keyed by a hash of the CLR/JIT DLL binaries.
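The caching idea mentioned above is not implemented in the article; the following is a minimal sketch of how it might look. It assumes the DLL bytes have already been read into memory, uses 64-bit FNV-1a as the hash (any stable hash would do), and stores relative virtual addresses per symbol name. Fnv1a64 and SymbolAddressCache are hypothetical names.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// 64-bit FNV-1a over a byte buffer (e.g. the bytes of clr.dll on disk).
std::uint64_t Fnv1a64(const std::vector<std::uint8_t>& bytes)
{
    std::uint64_t hash = 0xcbf29ce484222325ULL; // FNV offset basis
    for (std::uint8_t b : bytes)
    {
        hash ^= b;
        hash *= 0x100000001b3ULL;               // FNV prime
    }
    return hash;
}

// Hypothetical cache: image hash -> (symbol name -> RVA inside the image).
class SymbolAddressCache
{
public:
    void Store(std::uint64_t imageHash, const std::string& symbol,
               std::uint32_t rva)
    {
        m_entries[imageHash][symbol] = rva;
    }

    // Returns true and fills `rva` only when both the image and the
    // symbol are known; otherwise the caller falls back to the PDB.
    bool TryGet(std::uint64_t imageHash, const std::string& symbol,
                std::uint32_t& rva) const
    {
        auto image = m_entries.find(imageHash);
        if (image == m_entries.end())
            return false;
        auto entry = image->second.find(symbol);
        if (entry == image->second.end())
            return false;
        rva = entry->second;
        return true;
    }

private:
    std::map<std::uint64_t, std::map<std::string, std::uint32_t>> m_entries;
};
```

Keying on the hash of the binary rather than on its version string means a patched DLL with the same version number still gets its own entry.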

For more details, see the SearchMethodAddresses and Intialize methods in the source code of the unmanaged DLL.

Reset the MethodDesc to pre-JITted status

Now everything is ready. The unmanaged DLL exports a function for managed code, which accepts the IL code and the MethodBase.MethodHandle.Value from the managed side.

C++

BOOL CInjection::StartUpdateILCodes( MethodTable * pMethodTable
    , CORINFO_METHOD_HANDLE pMethodHandle
    , LPBYTE pBuffer
    , DWORD dwSize
    )
{
    if( s_nStatus != Status_Ready || !pMethodHandle )
        return FALSE;
 
    MethodDesc * pMethodDesc = (MethodDesc*)pMethodHandle;
    pMethodDesc->Reset();
 
    MethodDesc * pStripMethodDesc = pMethodDesc->StripMethodInstantiation();
    if( pStripMethodDesc )
        pStripMethodDesc->Reset();
 
    // this is a generic method
    if( pMethodDesc->HasClassOrMethodInstantiation() )
    {
        MethodDesc * pWrappedMethodDesc = pMethodDesc->GetWrappedMethodDesc();
        if( pWrappedMethodDesc )
        {
            pWrappedMethodDesc->Reset();
        }
    }
 
    std::map< CORINFO_METHOD_HANDLE, ILCodeBuffer>::iterator iter = s_mpILBuffers.find(pMethodHandle);
    if( iter != s_mpILBuffers.end() )
    {
        LocalFree(iter->second.pBuffer);
        s_mpILBuffers.erase(iter);
    }
 
    ILCodeBuffer tILCodeBuffer = { pBuffer, dwSize };
    s_mpILBuffers[pMethodHandle] = tILCodeBuffer;
 
    return TRUE;
}

The code above just calls the Reset() method and stores the IL code in a map, which compileMethod uses when the method gets compiled.

In compileMethod, just replace the IL code, as shown below.

C++

CorJitResult __stdcall CInjection::compileMethod(ICorJitInfo * pJitInfo
    , CORINFO_METHOD_INFO * pCorMethodInfo
    , UINT nFlags
    , LPBYTE * pEntryAddress
    , ULONG * pSizeOfCode
    )
{
    ICorJitCompiler * pCorJitCompiler = (ICorJitCompiler *)this;
    // read the original values only if the pointer is valid
    LPBYTE pOriginalILCode = pCorMethodInfo ? pCorMethodInfo->ILCode : NULL;
    unsigned int nOriginalSize = pCorMethodInfo ? pCorMethodInfo->ILCodeSize : 0;
 
    // find the method to be replaced
    std::map< CORINFO_METHOD_HANDLE, ILCodeBuffer>::iterator iter = s_mpILBuffers.end();
    if( pCorMethodInfo && GetStatus() == Status_Ready )
    {
        MethodDesc * pMethodDesc = (MethodDesc*)pCorMethodInfo->ftn;
        // assign to the outer iterator; declaring a new one here would shadow it
        // and the restore/cleanup block after the real compileMethod would never run
        iter = s_mpILBuffers.find((CORINFO_METHOD_HANDLE)pMethodDesc);
 
        // if the current method is not found, try to search its generic definition method
        if( iter == s_mpILBuffers.end() &&
            pMethodDesc->HasClassOrMethodInstantiation() )
        {
            pMethodDesc = pMethodDesc->StripMethodInstantiation();
            iter = s_mpILBuffers.find((CORINFO_METHOD_HANDLE)pMethodDesc);
        }
 
        if( iter != s_mpILBuffers.end() )
        {
            pCorMethodInfo->ILCode = iter->second.pBuffer;
            pCorMethodInfo->ILCodeSize = iter->second.dwSize;
        }
    }
 
    CorJitResult result = (pCorJitCompiler->*s_pComplieMethod_V4)( pJitInfo, 
               pCorMethodInfo, nFlags, pEntryAddress, pSizeOfCode);
 
    if( iter != s_mpILBuffers.end() )
    {
        pCorMethodInfo->ILCode = pOriginalILCode;
        pCorMethodInfo->ILCodeSize = nOriginalSize;
        LocalFree(iter->second.pBuffer);
        s_mpILBuffers.erase(iter);
    }
 
    return result;
}
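The swap, compile, and restore sequence above can be modeled without any CLR types. The sketch below is a simplified stand-in under stated assumptions: ToyMethodInfo replaces CORINFO_METHOD_INFO, the definition field replaces the StripMethodInstantiation() lookup, FakeCompile replaces the real JIT, and std::vector owns the buffers so no manual LocalFree is needed. All of these names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for CORINFO_METHOD_INFO.
struct ToyMethodInfo
{
    const void* ftn;             // method handle
    const void* definition;      // generic definition handle, or nullptr
    const std::uint8_t* ILCode;
    unsigned ILCodeSize;
};

using CompileFn = int (*)(ToyMethodInfo* info);

class ILOverrides
{
public:
    // Registering again for the same handle replaces the pending buffer.
    void Set(const void* handle, std::vector<std::uint8_t> il)
    {
        m_pending[handle] = std::move(il);
    }

    std::size_t PendingCount() const { return m_pending.size(); }

    // Swap in the pending IL (if any), call the real compiler, then
    // restore the original pointers and drop the consumed entry.
    int Compile(ToyMethodInfo* info, CompileFn realCompile)
    {
        if (!info || !realCompile)
            return -1;

        // Look up the exact handle first, then the generic definition.
        auto it = m_pending.find(info->ftn);
        if (it == m_pending.end() && info->definition)
            it = m_pending.find(info->definition);

        const std::uint8_t* originalIL = info->ILCode;
        const unsigned originalSize = info->ILCodeSize;
        if (it != m_pending.end())
        {
            info->ILCode = it->second.data();
            info->ILCodeSize = static_cast<unsigned>(it->second.size());
        }

        const int result = realCompile(info);

        if (it != m_pending.end())
        {
            info->ILCode = originalIL;
            info->ILCodeSize = originalSize;
            m_pending.erase(it);
        }
        return result;
    }

private:
    std::map<const void*, std::vector<std::uint8_t>> m_pending;
};

// Fake "real" compiler for this sketch: reports the first IL byte it saw.
int FakeCompile(ToyMethodInfo* info)
{
    return info->ILCodeSize ? info->ILCode[0] : -1;
}
```

Like the original, this model consumes an override the first time it is used, so an entry registered for a generic definition is gone after the first instantiation compiles.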

Points of interest

Compilation optimizations

I found that if a method is very simple and its IL code is only a few bytes long, the JIT may inline it into its callers. In this case, resetting the MethodDesc does not help, because execution never reaches the method's own entry point. More details can be found in CEEInfo::canInline (vm/jitinterface.cpp in SSCLI).

Dynamic method

We need to be very careful when updating the IL code of a dynamic method. Supplying incorrect IL code for other kinds of methods only causes an InvalidProgramException, but incorrect IL code in a dynamic method can crash the CLR and the whole process! The IL code of a dynamic method also differs from that of other methods. It is better to generate the IL code from another dynamic method and then copy and update it.

Generic method

I think this is the most complicated part. A generic method definition is mapped to one MethodDesc, but calling the generic method with different type arguments causes the CLR to create different instantiations of the definition. Moreover, different kinds of generic methods are implemented in different ways:

shared generic method instantiations
unshared generic method instantiations
instance methods in shared generic classes
instance methods in unshared generic classes
static methods in shared generic classes
static methods in unshared generic classes

If you look at the source code below from the demo, you may notice that it calls InjectionHelper.UpdateILCodes not only for the generic method definition, but also for the JIT-compiled generic method instantiation returned from MethodInfo.MakeGenericMethod.

MethodInfo destMethodInfo = 
  type.GetMethod("GenericMethodToBeReplaced", 
  BindingFlags.NonPublic | BindingFlags.Instance);

// reset the generic definition MethodInfo
InjectionHelper.UpdateILCodes(destMethodInfo, ilCodes);

// reset a specific instantiation  generic MethodInfo
destMethodInfo = destMethodInfo.MakeGenericMethod(new Type[] { typeof(string), typeof(int) });
InjectionHelper.UpdateILCodes(destMethodInfo, ilCodes);

This is what I am still trying to improve. The first time a generic method is called with a new set of type parameters, MethodDesc::FindOrCreateAssociatedMethodDesc is invoked, and the generic method instantiation is created and stored in InstMethodHashTable. I was trying to find all of the compiled generic method instantiations in InstMethodHashTable and then call their Reset() method, which would certainly resolve the problem above.

It sounds logical, but I am blocked by two problems.

First, there is no reliable way to get the module's InstMethodHashTable.
The Module::GetInstMethodHashTable method is excluded from the PDB. I can find the offset of the InstMethodHashTable address within the Module from the disassembly, but that is not a reliable or acceptable way.

Second, the internal methods for traversing the MethodDesc entries in InstMethodHashTable are also excluded from the PDB.

For now the way forward is unclear, and I am considering other tricks that could be played at the point where the generic method instantiation is called.

I hope you can share your ideas.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By

Jerry.Wang

Team Leader

China

Jerry is from China. He has been captivated by computer programming since the age of 13, when he first played with Q-Basic.

Windows / Linux & C++
iOS & Obj-C
.Net & C#
Flex/Flash & ActionScript
HTML / CSS / Javascript
Gaming Server programming / video, audio processing / image & graphics

Contact: vcer(at)qq.com
Chinese Blog: http://blog.csdn.net/wangjia184
