Introduction
Modifying the MSIL code of .NET methods at run time is very cool; it helps to implement hooking, software protection, and other amazing stuff.
That's why I want it, but there is a big challenge on the road: the MSIL code may already have been compiled to native code by the JIT compiler before
we have a chance to modify it. Also, the .NET CLR implementation is not documented and changes with each version, so we need a reliable and stable way that does not depend on the exact memory layout.
Anyway, after more than a week of research, I finally made it! Here is a simple method in the demo program:
protected string CompareOneAndTwo()
{
int a = 1;
int b = 2;
if (a < b)
{
return "Number 1 is less than 2";
}
else
{
return "Number 1 is greater than 2 (O_o)";
}
}
Certainly it returns "Number 1 is less than 2"; let's try to make it return the incorrect result "Number 1 is greater than 2 (O_o)".
Looking at the MSIL code for this method, we can do it by changing the opcode from Bge_S to Blt_S.
The jump then follows the opposite logic and returns the wrong result, which is what I need.
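The opcode swap described above can be sketched as a one-byte patch on a copy of the IL body. This is only an illustration: the helper name PatchOpcode, the function SampleIl, the IL bytes, and the string tokens are assumptions made up for the example and are not dumped from the demo. The opcode values 0x2F (bge.s) and 0x32 (blt.s) are the ECMA-335 encodings.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Single-byte MSIL opcodes for the short-form branches (ECMA-335).
constexpr std::uint8_t kBgeS = 0x2F; // bge.s
constexpr std::uint8_t kBltS = 0x32; // blt.s

// Hypothetical helper: returns a copy of `il` with the opcode at `offset`
// changed from `from` to `to`. Returns an empty vector when the byte at
// `offset` is not the expected opcode, so an operand byte is never patched
// by mistake.
std::vector<std::uint8_t> PatchOpcode(const std::vector<std::uint8_t>& il,
                                      std::size_t offset,
                                      std::uint8_t from,
                                      std::uint8_t to)
{
    if (offset >= il.size() || il[offset] != from)
        return {};
    std::vector<std::uint8_t> patched(il);
    patched[offset] = to;
    return patched;
}

// Illustrative IL body (an assumption, not taken from the demo binary).
std::vector<std::uint8_t> SampleIl()
{
    return {
        0x17, 0x0A,                   // ldc.i4.1, stloc.0   (a = 1)
        0x18, 0x0B,                   // ldc.i4.2, stloc.1   (b = 2)
        0x06, 0x07,                   // ldloc.0, ldloc.1
        0x2F, 0x06,                   // bge.s +6            (opcode at offset 6)
        0x72, 0x01, 0x00, 0x00, 0x70, // ldstr <placeholder token>
        0x2A,                         // ret
        0x72, 0x02, 0x00, 0x00, 0x70, // ldstr <placeholder token>
        0x2A                          // ret
    };
}
```

Calling PatchOpcode(SampleIl(), 6, kBgeS, kBltS) yields the same body with the branch inverted. Patching at a known offset, rather than scanning for the first 0x2F byte, avoids touching an operand or token byte that happens to share the opcode's value.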
If you try it in the demo application, it shows the wrong answer.
Here is the code replacing the IL, I assume there are enough comments between the lines.
You can download the demo program and have a try.
- Supports various .NET versions from 2.0 to 4.0.
- Supports various kinds of methods to be modified, including dynamic methods and generic methods.
- Supports release-mode .NET processes.
Using the code
Copy the InjectionHelper.cs file into your project; it contains several methods.
public static class InjectionHelper
{
public static void Initialize()
public static void Uninitialize()
public static void UpdateILCodes(MethodInfo method, byte[] ilCodes)
public static Status WaitForIntializationCompletion()
public static Status GetStatus()
}
The InjectionHelper.Initialize method loads the unmanaged Injection.dll from the directory of the current assembly,
so all the related files need to be there, or you can modify the code to change the location.
Here is the file list.
File Name | Description |
Injection.dll | Unmanaged DLL that does the work described in this article (x86 version; an x64 version will be out sooner or later) |
EasyHook32.dll | x86 EasyHook DLL from http://easyhook.codeplex.com/ (used by Injection.dll) |
EasyHook64.dll | x64 EasyHook DLL from http://easyhook.codeplex.com/ (will be used by the x64 Injection.dll) |
symchk.exe, SymbolCheck.dll | Tools to download PDB files, copied from Windows Debug Tools. |
dbg32.dll | x86 version of dbghelp.dll 6.2, used by Injection.dll and symchk.exe. I changed the file name to avoid a version conflict; the PE import table of symchk.exe is also modified to link to this DLL. |
PDB_symbols/* | Local cache of the PDB symbol files. Can be removed, but that will slow down initialization. |
Test_x86_Net20_Release.exe | Test program for x86 / .NET 2.0 / Release mode; not required for distribution. |
Test_x86_Net35_Release.exe | Test program for x86 / .NET 3.5 / Release mode; not required for distribution. |
Test_x86_Net40_Release.exe | Test program for x86 / .NET 4.0 / Release mode; not required for distribution. |
Background
Replace the IL code
First, take a look at how the CLR and the JIT work.
The JIT implementation DLLs (clrjit.dll for .NET 4.0+ / mscorjit.dll for .NET 2.0+) export a __stdcall function getJit, which returns the ICorJitCompiler interface.
The CLR implementation DLLs (clr.dll for .NET 4.0+ / mscorwks.dll for .NET 2.0+) invoke the getJit function to obtain the ICorJitCompiler interface, then call its compileMethod method to compile MSIL code to native code.
CorJitResult compileMethod(ICorJitInfo * pJitInfo, CORINFO_METHOD_INFO * pMethodInfo,
UINT nFlags, LPBYTE * pEntryAddress, ULONG * pSizeOfCode);
This part is quite easy: just find the location of the compileMethod method and replace the entry via EasyHook.
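typedef CorJitResult (__stdcall ICorJitCompiler::*PFN_compileMethod)(ICorJitInfo * pJitInfo
, CORINFO_METHOD_INFO * pMethodInfo
, UINT nFlags
, LPBYTE * pEntryAddress
, ULONG * pSizeOfCode
);
// Address of the original compileMethod; the hook below forwards to it
PFN_compileMethod s_pComplieMethod_V4 = ...;
// Redirect the original entry to CInjection::compileMethod
LhInstallHook( (PVOID&)s_pComplieMethod_V4
, &(PVOID&)CInjection::compileMethod
, NULL
, &s_hHookCompileMethod
);
CorJitResult __stdcall CInjection::compileMethod(ICorJitInfo * pJitInfo
, CORINFO_METHOD_INFO * pCorMethodInfo
, UINT nFlags
, LPBYTE * pEntryAddress
, ULONG * pSizeOfCode
)
{
// The hook runs in place of ICorJitCompiler::compileMethod, so "this" is the JIT compiler object
ICorJitCompiler * pCorJitCompiler = (ICorJitCompiler *)this;
// Pass-through for now: forward to the original implementation
CorJitResult result = (pCorJitCompiler->*s_pComplieMethod_V4)(
pJitInfo, pCorMethodInfo, nFlags, pEntryAddress, pSizeOfCode);
return result;
}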
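The forwarding pattern in the hook above can be tried without Windows or EasyHook. The following is a minimal portable sketch, assuming a plain function-pointer slot instead of an inline code patch; CompileFn, RealCompile, g_compileSlot, InstallHook and RemoveHook are hypothetical names invented for this illustration and are not part of EasyHook or the CLR.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the JIT entry point: takes "IL" and returns a result code.
using CompileFn = int (*)(const std::string& il);

// Plays the role of the original compileMethod.
int RealCompile(const std::string& il) { return static_cast<int>(il.size()); }

// Callers always go through this slot. A real inline hook rewrites the first
// instructions of the target function instead of swapping a pointer.
CompileFn g_compileSlot = &RealCompile;
CompileFn g_originalCompile = nullptr;   // saved original, used for forwarding
std::vector<std::string> g_seen;         // what the hook observed

// Plays the role of CInjection::compileMethod: observe, then forward.
int HookedCompile(const std::string& il)
{
    g_seen.push_back(il);
    return g_originalCompile(il);
}

// Installs the hook once; returns false if it is already installed.
bool InstallHook()
{
    if (g_originalCompile != nullptr)
        return false;
    g_originalCompile = g_compileSlot;
    g_compileSlot = &HookedCompile;
    return true;
}

// Restores the original entry if the hook is installed.
void RemoveHook()
{
    if (g_originalCompile == nullptr)
        return;
    g_compileSlot = g_originalCompile;
    g_originalCompile = nullptr;
}
```

After InstallHook(), every call through g_compileSlot is recorded in g_seen and still returns what RealCompile returns, which is the same observe-then-forward behaviour as the pass-through hook body.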
Modify IL code for JIT-compiled methods
Now here is the problem: the compileMethod method above won't be called by the CLR for
a method that has already been JIT-compiled. To solve this, my idea is to restore
the data structures in the CLR to their state before JIT compilation. In that case,
compileMethod will be called again and we can replace the IL.
Thus we have to look into the implementation of the CLR a bit. SSCLI (Shared Source Common Language
Infrastructure) from Microsoft is a good reference, although it is quite out of date and we can't use it in our code.
The above diagram is a bit out of date, but the primary structure is the same. For each "class" in .NET, there is at least one MethodTable structure in memory.
Each MethodTable is related to an EEClass, which stores the runtime type information for Reflection and other uses.
For each "method", there is a corresponding MethodDesc data structure in memory containing
information about the method such as flags, slot address, entry address, etc.
Before a method is JIT-compiled, the slot points to a JMI thunk (prestub), which triggers
JIT compilation; once the IL code is compiled, the slot is rewritten
to point to a JMI thunk that jumps directly to the compiled native code.
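The slot behaviour described above can be modelled with a toy. This sketch is not CLR code: ToyMethod, Prestub, NativeCode, Call and Reset are hypothetical names, the "IL" is reduced to a single integer, and "compiling" just copies it. It only shows why a method is compiled once, and why pointing the slot back at the prestub forces a second compilation.

```cpp
struct ToyMethod;
using EntryPoint = int (*)(ToyMethod&);

// Toy stand-in for a MethodDesc and its slot.
struct ToyMethod {
    EntryPoint slot;   // what callers jump through
    int compileCount;  // how many times the "JIT" ran
    int ilValue;       // stand-in for the IL body
    int nativeValue;   // stand-in for the generated native code
};

// The "compiled native code": returns whatever was compiled last.
int NativeCode(ToyMethod& m) { return m.nativeValue; }

// The prestub: "compiles" on first call, backpatches the slot, then runs the result.
int Prestub(ToyMethod& m)
{
    ++m.compileCount;
    m.nativeValue = m.ilValue;  // "JIT": translate IL to native code
    m.slot = &NativeCode;       // later calls skip the prestub
    return m.slot(m);
}

// A fresh method starts with its slot pointing at the prestub.
ToyMethod MakeMethod(int il) { return ToyMethod{&Prestub, 0, il, 0}; }

// Every call goes through the slot.
int Call(ToyMethod& m) { return m.slot(m); }

// Restores the pre-JIT state so the next call compiles again.
void Reset(ToyMethod& m) { m.slot = &Prestub; }
```

In this model, changing ilValue after the first call has no effect until Reset is called, because callers keep reaching the old "native code" through the slot. That is the situation the rest of this section has to solve for the real CLR.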
To restore the data structure, first clear the flags, then set the entry address back to
a temporary entry address, and so on. I successfully did that
in the debugger by modifying the memory directly. But this is messy: it depends on the layout of the data structures, and the code is unreliable across different versions of .NET.
I was looking for a reliable way, and luckily, I found the MethodDesc::Reset method in the SSCLI source code (vm/method.cpp).
void MethodDesc::Reset()
{
CONTRACTL
{
THROWS;
GC_NOTRIGGER;
}
CONTRACTL_END
_ASSERTE(IsEnCMethod() || IsDynamicMethod() || GetLoaderModule()->IsReflection());
ClearFlagsOnUpdate();
if (HasPrecode())
{
GetPrecode()->Reset();
}
else
{
_ASSERTE(GetLoaderModule()->IsReflection());
InterlockedUpdateFlags2(enum_flag2_HasStableEntryPoint | enum_flag2_HasPrecode, FALSE);
*GetAddrOfSlotUnchecked() = GetTemporaryEntryPoint();
}
_ASSERTE(!HasNativeCode());
}
As you can see above, it does exactly what I need. Hence I just need to invoke this method to reset the MethodDesc status to pre-JITted.
Certainly I can't use the MethodDesc from SSCLI, and MethodDesc is used internally by Microsoft; its exact implementation and layout
are unknown to everyone except Microsoft.
After endless mountains and rivers that leave doubt whether there is a path out,
suddenly one encounters the shade of a willow, bright flowers, and a lovely village.
Fortunately, the address of this internal method exists in the PDB symbols from the Microsoft Symbol Server, and that solves my problem.
The address of the Reset() method in the CLR DLL can be found by parsing the PDB file!
Now only one mandatory parameter is left: the this pointer of the MethodDesc. It is not hard to obtain this pointer.
Actually MethodBase.MethodHandle.Value == CORINFO_METHOD_HANDLE == MethodDesc address == this pointer of MethodDesc.
Thus, I have my MethodDesc class below, defined in unmanaged code.
typedef void (MethodDesc::*PFN_Reset)(void);
typedef BOOL (MethodDesc::*PFN_IsGenericMethodDefinition)(void);
typedef ULONG (MethodDesc::*PFN_GetNumGenericMethodArgs)(void);
typedef MethodDesc * (MethodDesc::*PFN_StripMethodInstantiation)(void);
typedef BOOL (MethodDesc::*PFN_HasClassOrMethodInstantiation)(void);
typedef BOOL (MethodDesc::*PFN_ContainsGenericVariables)(void);
typedef DictionaryLayout * (MethodDesc::*PFN_GetDictionaryLayout)(void);
typedef Dictionary * (MethodDesc::*PFN_GetMethodDictionary)(void);
typedef MethodDesc * (MethodDesc::*PFN_GetWrappedMethodDesc)(void);
class MethodDesc
{
public:
void Reset(void) { (this->*s_pfnReset)(); }
BOOL IsGenericMethodDefinition(void) { return (this->*s_pfnIsGenericMethodDefinition)(); }
ULONG GetNumGenericMethodArgs(void) { return (this->*s_pfnGetNumGenericMethodArgs)(); }
MethodDesc * StripMethodInstantiation(void) { return (this->*s_pfnStripMethodInstantiation)(); }
BOOL HasClassOrMethodInstantiation(void) { return (this->*s_pfnHasClassOrMethodInstantiation)(); }
BOOL ContainsGenericVariables(void) { return (this->*s_pfnContainsGenericVariables)(); }
DictionaryLayout * GetDictionaryLayout(void) { return (this->*s_pfnGetDictionaryLayout)(); }
Dictionary * GetMethodDictionary(void) { return (this->*s_pfnGetMethodDictionary)(); }
MethodDesc * GetWrappedMethodDesc(void) { return (this->*s_pfnGetWrappedMethodDesc)(); }
private:
static PFN_Reset s_pfnReset;
static PFN_IsGenericMethodDefinition s_pfnIsGenericMethodDefinition;
static PFN_GetNumGenericMethodArgs s_pfnGetNumGenericMethodArgs;
static PFN_StripMethodInstantiation s_pfnStripMethodInstantiation;
static PFN_HasClassOrMethodInstantiation s_pfnHasClassOrMethodInstantiation;
static PFN_ContainsGenericVariables s_pfnContainsGenericVariables;
static PFN_GetDictionaryLayout s_pfnGetDictionaryLayout;
static PFN_GetMethodDictionary s_pfnGetMethodDictionary;
static PFN_GetWrappedMethodDesc s_pfnGetWrappedMethodDesc;
};
The static variables above store the addresses of the internal methods of
the MethodDesc implementation in the CLR DLL. They are initialized when my unmanaged DLL
is loaded. The public members just call the internal methods with the this pointer.
Now it becomes quite easy to invoke Microsoft's internal methods. For example:
MethodDesc * pMethodDesc = (MethodDesc*)pMethodHandle;
pMethodDesc->Reset();
Find internal methods' addresses from the PDB Symbol file
When the unmanaged DLL is loaded, it checks the environment to see which version of the CLR/JIT is present. It then
tries to look up the addresses of all the internal methods in the PDB file.
If the lookup fails, it tries to launch symchk.exe from Windows Debug Tools to download the corresponding PDB symbol files from the Microsoft Symbol Server.
This procedure takes a long time, from several seconds to several minutes. Maybe we can optimize it by caching the addresses for the CLR/JIT
DLLs, keyed by their binary hash value.
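The caching idea above is not implemented in the article's code; the following is only a sketch of how it might look, under stated assumptions. It keys the resolved offsets by a hash of the DLL image bytes. FNV-1a is used purely because it is short; a collision-resistant hash would be a safer choice in practice. AddressCache, SymbolOffsets and the symbol name used in the usage note are hypothetical.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// 64-bit FNV-1a over the raw bytes of a DLL image.
std::uint64_t Fnv1a64(const std::vector<std::uint8_t>& bytes)
{
    std::uint64_t h = 0xcbf29ce484222325ULL;   // FNV offset basis
    for (std::uint8_t b : bytes) {
        h ^= b;
        h *= 0x100000001b3ULL;                 // FNV prime
    }
    return h;
}

// Symbol name -> offset of the symbol inside the DLL (hypothetical layout).
using SymbolOffsets = std::map<std::string, std::uint32_t>;

// Hypothetical cache: one entry per distinct DLL build, keyed by image hash.
class AddressCache {
public:
    // Returns the cached offsets for this exact image, or nullptr on a miss.
    const SymbolOffsets* Find(const std::vector<std::uint8_t>& image) const
    {
        std::map<std::uint64_t, SymbolOffsets>::const_iterator it =
            entries_.find(Fnv1a64(image));
        return it == entries_.end() ? nullptr : &it->second;
    }

    // Remembers the offsets resolved (for example from the PDB) for this image.
    void Store(const std::vector<std::uint8_t>& image, const SymbolOffsets& offsets)
    {
        entries_[Fnv1a64(image)] = offsets;
    }

private:
    std::map<std::uint64_t, SymbolOffsets> entries_;
};
```

On start-up the DLL would hash clr.dll or mscorwks.dll, call Find, and fall back to the PDB lookup (followed by Store) only on a miss. Because any change to the image changes the key, a patched or updated runtime never reuses stale addresses.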
You can see more details in the source code: the SearchMethodAddresses and Intialize methods in the unmanaged DLL.
Reset the MethodDesc to pre-JITted status
Now everything is ready. The unmanaged DLL exports a method for managed code that accepts the IL code and MethodBase.MethodHandle.Value from the managed code.
BOOL CInjection::StartUpdateILCodes( MethodTable * pMethodTable
, CORINFO_METHOD_HANDLE pMethodHandle
, LPBYTE pBuffer
, DWORD dwSize
)
{
if( s_nStatus != Status_Ready || !pMethodHandle )
return FALSE;
MethodDesc * pMethodDesc = (MethodDesc*)pMethodHandle;
pMethodDesc->Reset();
MethodDesc * pStripMethodDesc = pMethodDesc->StripMethodInstantiation();
if( pStripMethodDesc )
pStripMethodDesc->Reset();
if( pMethodDesc->HasClassOrMethodInstantiation() )
{
MethodDesc * pWrappedMethodDesc = pMethodDesc->GetWrappedMethodDesc();
if( pWrappedMethodDesc )
{
pWrappedMethodDesc->Reset();
}
}
std::map< CORINFO_METHOD_HANDLE, ILCodeBuffer>::iterator iter = s_mpILBuffers.find(pMethodHandle);
if( iter != s_mpILBuffers.end() )
{
LocalFree(iter->second.pBuffer);
s_mpILBuffers.erase(iter);
}
ILCodeBuffer tILCodeBuffer = { pBuffer, dwSize };
s_mpILBuffers[pMethodHandle] = tILCodeBuffer;
return TRUE;
}
The code above just calls the Reset() method and stores the IL code in a map, which will be used by compileMethod when
the method gets compiled.
And in compileMethod, just replace the ILCode, with code like below.
CorJitResult __stdcall CInjection::compileMethod(ICorJitInfo * pJitInfo
, CORINFO_METHOD_INFO * pCorMethodInfo
, UINT nFlags
, LPBYTE * pEntryAddress
, ULONG * pSizeOfCode
)
{
ICorJitCompiler * pCorJitCompiler = (ICorJitCompiler *)this;
// Remember the original IL so it can be restored after compilation
LPBYTE pOriginalILCode = pCorMethodInfo ? pCorMethodInfo->ILCode : NULL;
unsigned int nOriginalSize = pCorMethodInfo ? pCorMethodInfo->ILCodeSize : 0;
std::map< CORINFO_METHOD_HANDLE, ILCodeBuffer>::iterator iter = s_mpILBuffers.end();
if( pCorMethodInfo && GetStatus() == Status_Ready )
{
MethodDesc * pMethodDesc = (MethodDesc*)pCorMethodInfo->ftn;
// Assign to the outer iterator (no redeclaration) so the cleanup below sees the match
iter = s_mpILBuffers.find((CORINFO_METHOD_HANDLE)pMethodDesc);
// Generic instantiation: fall back to the stripped MethodDesc
if( iter == s_mpILBuffers.end() &&
pMethodDesc->HasClassOrMethodInstantiation() )
{
pMethodDesc = pMethodDesc->StripMethodInstantiation();
iter = s_mpILBuffers.find((CORINFO_METHOD_HANDLE)pMethodDesc);
}
if( iter != s_mpILBuffers.end() )
{
// Hand the replacement IL to the JIT
pCorMethodInfo->ILCode = iter->second.pBuffer;
pCorMethodInfo->ILCodeSize = iter->second.dwSize;
}
}
CorJitResult result = (pCorJitCompiler->*s_pComplieMethod_V4)( pJitInfo,
pCorMethodInfo, nFlags, pEntryAddress, pSizeOfCode);
if( iter != s_mpILBuffers.end() )
{
// Restore the original IL and release the replacement buffer
pCorMethodInfo->ILCode = pOriginalILCode;
pCorMethodInfo->ILCodeSize = nOriginalSize;
LocalFree(iter->second.pBuffer);
s_mpILBuffers.erase(iter);
}
return result;
}
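The bookkeeping in the two functions above (store a replacement per method handle, look it up at compile time with a fallback key, then discard it) can be isolated in a small container. This is a sketch of one possible design and not the code from Injection.dll: PendingIl, MethodKey and IlBytes are hypothetical names, and std::vector owns the bytes so that no manual LocalFree is needed.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using MethodKey = const void*;               // stand-in for CORINFO_METHOD_HANDLE
using IlBytes   = std::vector<std::uint8_t>; // owns the replacement IL

// Hypothetical container for replacement IL waiting for the next JIT compilation.
class PendingIl {
public:
    // Registers (or overwrites) the replacement IL for a method.
    void Set(MethodKey method, IlBytes il) { pending_[method] = std::move(il); }

    // Removes the pending IL for `exact`, or for `fallback` when `exact` has none
    // (for example the stripped generic definition), and moves it into `out`.
    // Returns false when neither key has pending IL.
    bool Take(MethodKey exact, MethodKey fallback, IlBytes& out)
    {
        std::map<MethodKey, IlBytes>::iterator it = pending_.find(exact);
        if (it == pending_.end() && fallback != nullptr)
            it = pending_.find(fallback);
        if (it == pending_.end())
            return false;
        out = std::move(it->second);
        pending_.erase(it);
        return true;
    }

    // Number of methods that still have replacement IL waiting.
    std::size_t Size() const { return pending_.size(); }

private:
    std::map<MethodKey, IlBytes> pending_;
};
```

A hook would call Take with the compiled method's handle and its stripped handle, point ILCode and ILCodeSize at the returned bytes for the duration of the original compileMethod call, and restore the original values afterwards. Note that entries are consumed on first use, which matches the erase in the code above: a second compilation of the same method sees the original IL unless the replacement is registered again.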
Points of interest
Compilation optimizations
I found that if the method is too simple and its IL code is only a few bytes long, the method may be compiled inline.
In that case, resetting the MethodDesc does not help, because execution never even reaches it. More details can be found in CEEInfo::canInline (vm/jitinterface.cpp in SSCLI).
Dynamic method
To update the IL code of a dynamic method we need to be very careful. Filling in incorrect IL code for other kinds of methods only causes an InvalidProgramException;
but incorrect IL code in a dynamic method can crash the CLR and the whole process! Also, the IL code of
a dynamic method is different from that of other methods.
It is better to generate the IL code from another dynamic method and then copy and update it.
Generic method
I think this is the most complicated part. A generic method definition is mapped to a MethodDesc.
But calling the generic method with different type arguments
causes the CLR to create different instantiations of the definition method. Moreover, different kinds of generic methods
are implemented in different ways.
- shared generic method instantiations
- unshared generic method instantiations
- instance methods in shared generic classes
- instance methods in unshared generic classes
- static methods in shared generic classes
- static methods in unshared generic classes
If you look at the source code below from the demo, you may notice that it calls InjectionHelper.UpdateILCodes
not only for the generic method definition, but also for the JIT-compiled generic method instantiation returned from MethodInfo.MakeGenericMethod.
MethodInfo destMethodInfo =
type.GetMethod("GenericMethodToBeReplaced",
BindingFlags.NonPublic | BindingFlags.Instance);
InjectionHelper.UpdateILCodes(destMethodInfo, ilCodes);
destMethodInfo = destMethodInfo.MakeGenericMethod(new Type[] { typeof(string), typeof(int) });
InjectionHelper.UpdateILCodes(destMethodInfo, ilCodes);
This is what I am still trying to improve. The first time the generic method is called with each different set of type parameters, MethodDesc::FindOrCreateAssociatedMethodDesc
is invoked and the generic method instantiation is created and stored in InstMethodHashTable. I was trying to find all of the compiled generic method instantiations in InstMethodHashTable and then call their Reset()
method, which would certainly resolve the problem above.
Sounds logical, but I am blocked by two problems.
First, there is no reliable way to get the module's InstMethodHashTable. The Module::GetInstMethodHashTable
method is excluded from the PDB. I can find the offset of the InstMethodHashTable address within the Module
from the disassembly, but that is not a reliable or acceptable way.
Second, the internal methods to traverse the MethodDesc entries
in InstMethodHashTable are also excluded from the PDB.
For now it's quite dark, and I am considering what other tricks could be played at the point when the generic method instantiation is called.
Hope you can share your ideas.