Introduction
Modifying the MSIL code of .NET methods at run time is very cool: it helps implement hooking, software protection, and other amazing things.
That's why I want it, but there is a big challenge along the way. The MSIL code may already have been compiled to native code by the JIT compiler before
we have a chance to modify it. Also, the .NET CLR implementation is undocumented and changes with each version, so we need a stable approach.
Anyway, after more than a week of research, I finally made it! Here is a simple method from the demo program:
protected string CompareOneAndTwo()
{
int a = 1;
int b = 2;
if (a < b)
{
return "Number 1 is less than 2";
}
else
{
return "Number 1 is greater than 2 (O_o)";
}
}
Certainly it returns "Number 1 is less than 2"; let's try to make it return the incorrect result "Number 1 is greater than 2 (O_o)".
Looking at the MSIL code for this method, we can do it by changing the opcode from Bge_S to Blt_S.
The branch then tests the opposite condition and the method returns the wrong result, which is exactly what I need.
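The patch is a single byte. In ECMA-335, bge.s is opcode 0x2F and blt.s is 0x32, and both take a 1-byte signed branch offset, so the jump target stays the same and only the condition flips. Below is a minimal C++ sketch of that patch. The IL bytes are a hypothetical encoding of CompareOneAndTwo (the string tokens are invented), and the decoder knows only the few opcodes that appear in it; it walks instruction boundaries so that a 0x2F byte inside an operand (such as the second ldstr token) is not mistaken for an opcode. In .NET, the real bytes can be read with MethodBase.GetMethodBody().GetILAsByteArray().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// ECMA-335 opcodes involved in the patch.
constexpr uint8_t OP_BGE_S = 0x2F;  // branch if greater or equal (short form)
constexpr uint8_t OP_BLT_S = 0x32;  // branch if less than (short form)

// Operand size in bytes for the handful of opcodes this sketch understands,
// or -1 for anything else (a real decoder needs the full opcode table,
// including the two-byte 0xFE-prefixed opcodes and switch).
int OperandSize(uint8_t op) {
    if (op <= 0x0D) return 0;                // nop, break, ldarg.0-3, ldloc.0-3, stloc.0-3
    if (op >= 0x14 && op <= 0x1E) return 0;  // ldnull, ldc.i4.m1 .. ldc.i4.8
    if (op == 0x2A) return 0;                // ret
    if (op >= 0x2B && op <= 0x37) return 1;  // short branches (br.s .. blt.un.s)
    if (op == 0x72) return 4;                // ldstr <string token>
    return -1;
}

// Replaces every `from` opcode with `to`, decoding instruction boundaries so
// operand bytes are never patched. Returns the number of replacements, or -1
// (leaving `il` untouched) on an unknown opcode or a truncated operand.
int ReplaceOpcode(std::vector<uint8_t>& il, uint8_t from, uint8_t to) {
    std::vector<size_t> hits;
    for (size_t pc = 0; pc < il.size();) {
        int operand = OperandSize(il[pc]);
        if (operand < 0 || pc + 1 + static_cast<size_t>(operand) > il.size()) return -1;
        if (il[pc] == from) hits.push_back(pc);
        pc += 1 + static_cast<size_t>(operand);
    }
    for (size_t pc : hits) il[pc] = to;
    return static_cast<int>(hits.size());
}

// Hypothetical IL for CompareOneAndTwo (string tokens are made up).
std::vector<uint8_t> SampleIL() {
    return {
        0x17, 0x0A,                    // ldc.i4.1; stloc.0   (a = 1)
        0x18, 0x0B,                    // ldc.i4.2; stloc.1   (b = 2)
        0x06, 0x07,                    // ldloc.0; ldloc.1
        0x2F, 0x06,                    // bge.s +6 -> offset 14 (the else branch)
        0x72, 0x01, 0x00, 0x00, 0x70,  // ldstr "Number 1 is less than 2"
        0x2A,                          // ret
        0x72, 0x2F, 0x00, 0x00, 0x70,  // ldstr "... (O_o)" (token contains 0x2F)
        0x2A                           // ret
    };
}
```

Calling ReplaceOpcode(il, OP_BGE_S, OP_BLT_S) on SampleIL() patches only offset 6 and leaves the 0x2F inside the second string token alone.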
If you try it in the demo application, it shows the wrong answer.
Here is the code that replaces the IL; I assume there are enough comments between the lines.
You can download the demo program and try it yourself.
- Supports .NET versions from 2.0 to 4.0.
- Supports various kinds of methods to be modified, including dynamic methods and generic methods.
- Supports .NET processes built in Release mode.
The demo contains the PDB files and its size exceeds 10 MB, so it can't be uploaded here. Here is the download link from
a third-party file sharing service: http://www.filefactory.com/file/kbkg95n2ciz/n/CLR_Injection_Demo.zip.
Using the code
The source code can be downloaded here: Download source.
Copy the InjectionHelper.cs file into your project; it contains the following methods.
public static class InjectionHelper
{
public static void Initialize()
public static void Uninitialize()
public static void UpdateILCodes(MethodInfo method, byte[] ilCodes)
public static Status WaitForIntializationCompletion()
public static Status GetStatus()
}
The InjectionHelper.Initialize method loads the unmanaged Injection.dll from the directory of the current assembly,
so all the related files need to be there; alternatively, you can modify the code to change the location.
Here is the file list.
File Name | Description |
Injection.dll | Unmanaged DLL that does the work described in this article (x86 version; the x64 version will be out sooner or later) |
EasyHook32.dll | x86 EasyHook DLL from http://easyhook.codeplex.com/ (used by Injection.dll) |
EasyHook64.dll | x64 EasyHook DLL from http://easyhook.codeplex.com/ (will be used by the x64 Injection.dll) |
symchk.exe, SymbolCheck.dll | Tool from Debugging Tools for Windows used to download PDB files |
dbg32.dll | x86 version of dbghelp.dll 6.2, used by Injection.dll and symchk.exe. I renamed the file to avoid version conflicts, and the PE import table of symchk.exe is modified to link to this DLL |
PDB_symbols/* | Local cache of PDB symbol files. Can be removed, but doing so slows down initialization |
Test_x86_Net20_Release.exe | Test program for x86 / .NET 2.0 / Release mode; not required for distribution |
Test_x86_Net35_Release.exe | Test program for x86 / .NET 3.5 / Release mode; not required for distribution |
Test_x86_Net40_Release.exe | Test program for x86 / .NET 4.0 / Release mode; not required for distribution |
Background
Replace the IL code
First, take a look at how the CLR and the JIT work.
The JIT implementation DLLs (clrjit.dll for .NET 4.0+, mscorjit.dll for .NET 2.0/3.5) export a __stdcall method, getJit, which returns the ICorJitCompiler interface.
The CLR implementation DLLs (clr.dll for .NET 4.0+, mscorwks.dll for .NET 2.0/3.5) invoke getJit to obtain the ICorJitCompiler interface, then call its compileMethod method to compile MSIL code to native code.
CorJitResult compileMethod(ICorJitInfo * pJitInfo, CORINFO_METHOD_INFO * pMethodInfo,
UINT nFlags, LPBYTE * pEntryAddress, ULONG * pSizeOfCode);
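In the SSCLI headers, compileMethod is the first virtual method declared in ICorJitCompiler, so with the usual MSVC object layout its address is the first entry of the vtable of the object that getJit returns (on Windows, getJit can be resolved with GetProcAddress on the clrjit.dll or mscorjit.dll module the CLR has already loaded). The sketch below shows only the vtable read, against a hypothetical MockJitCompiler so that it runs anywhere. Reading the vptr from offset 0 is a convention of the MSVC and Itanium C++ ABIs, not something the C++ standard guarantees.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for ICorJitCompiler: compileMethod is declared
// first, so it occupies vtable slot 0 (the signature is simplified).
struct MockJitCompiler {
    virtual int compileMethod(int ilSize) = 0;
    virtual void clearCache() = 0;
    virtual ~MockJitCompiler() {}
};

struct JitA : MockJitCompiler {
    int compileMethod(int ilSize) override { return ilSize + 1; }
    void clearCache() override {}
};

struct JitB : MockJitCompiler {
    int compileMethod(int ilSize) override { return ilSize * 2; }
    void clearCache() override {}
};

// Returns entry `index` of the vtable of a polymorphic object, assuming the
// vptr is the object's first pointer-sized field (MSVC / Itanium layout).
void* GetVtableSlot(const void* object, std::size_t index) {
    void* const* vtable = *static_cast<void* const* const*>(object);
    return vtable[index];
}
```

Objects of the same class share one vtable, so the slot 0 value read from any instance is the address a hook has to target.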
This part is quite easy: just find the location of the compileMethod method and replace its entry with EasyHook.
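typedef CorJitResult (__stdcall ICorJitCompiler::*PFN_compileMethod_V4)(ICorJitInfo * pJitInfo
    , CORINFO_METHOD_INFO * pMethodInfo
    , UINT nFlags
    , LPBYTE * pEntryAddress
    , ULONG * pSizeOfCode
    );
// Address of the original ICorJitCompiler::compileMethod
PFN_compileMethod_V4 s_pComplieMethod_V4 = ...;
HOOK_TRACE_INFO s_hHookCompileMethod = { NULL };
// A member-function pointer can't be cast to PVOID directly,
// so store it in a variable and reinterpret its storage
CorJitResult (__stdcall CInjection::*pfnHook)(ICorJitInfo *, CORINFO_METHOD_INFO *
    , UINT, LPBYTE *, ULONG *) = &CInjection::compileMethod;
LhInstallHook( (PVOID&)s_pComplieMethod_V4
    , (PVOID&)pfnHook
    , NULL
    , &s_hHookCompileMethod
    );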
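CorJitResult __stdcall CInjection::compileMethod(ICorJitInfo * pJitInfo
    , CORINFO_METHOD_INFO * pCorMethodInfo
    , UINT nFlags
    , LPBYTE * pEntryAddress
    , ULONG * pSizeOfCode
    )
{
    // The hook replaces a __stdcall member of ICorJitCompiler, so "this" is the JIT compiler object
    ICorJitCompiler * pCorJitCompiler = (ICorJitCompiler *)this;
    // Forward the call unchanged to the original compileMethod
    CorJitResult result = (pCorJitCompiler->*s_pComplieMethod_V4)(
        pJitInfo, pCorMethodInfo, nFlags, pEntryAddress, pSizeOfCode);
    return result;
}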
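Conceptually, the hook swaps the entry point for a detour that does its own work and then forwards to the original compileMethod. The portable C++ model below imitates that contract with a plain function pointer instead of patched machine code; RealCompile, HookedCompile, InstallHook and the int-based signature are hypothetical simplifications, not EasyHook's API.

```cpp
#include <cassert>

// Simplified, hypothetical signature standing in for compileMethod.
using CompileFn = int (*)(int ilSize);

// "Original" compiler: pretends native code is 10 bytes per IL byte.
int RealCompile(int ilSize) { return ilSize * 10; }

// The entry point every caller goes through (the hooked location).
CompileFn g_compileEntry = &RealCompile;

// Saved original, playing the role of the address the detour calls back into.
CompileFn g_originalCompile = nullptr;
int g_hookCalls = 0;

// Detour: observe the call, then forward it unchanged to the original.
int HookedCompile(int ilSize) {
    ++g_hookCalls;
    return g_originalCompile(ilSize);
}

void InstallHook() {
    g_originalCompile = g_compileEntry;
    g_compileEntry = &HookedCompile;
}

void UninstallHook() {
    g_compileEntry = g_originalCompile;
    g_originalCompile = nullptr;
}
```

The invariant to preserve is that callers see identical results with or without the hook; only the detour's side effects (here, a call counter) differ.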
Modify IL code for JIT-compiled methods
At this point there is a problem: the CLR does not call the compileMethod method above for a method that has already been JIT-compiled.
To solve this, my idea is to restore the CLR's data structures to their state before JIT compilation.
Then compileMethod will be called again and we can replace the IL.
So we have to look into the CLR implementation a bit. SSCLI (Shared Source Common Language
Infrastructure) is a good reference from Microsoft, although it is quite out of date and we can't use it in our code.
SSCLI's picture of these structures is a bit out of date, but the primary structure is the same. For each "class" in .NET, there is at least one MethodTable structure in memory.
Each MethodTable is related to an EEClass, which stores the runtime type information used by Reflection and other features.
For each "method", there is a MethodDesc data structure in memory containing information about the method, such as its flags, slot address, and entry address.
Before a method is JIT-compiled, its slot points to a JMI thunk (the prestub), which triggers
JIT compilation; once the IL code is compiled, the slot is rewritten
to point to a thunk that jumps directly to the compiled native code.
To restore the data structures, first clear the flags, then set the entry address back to
a temporary entry address, and so on. I successfully did that
in the debugger by modifying the memory directly. But this is messy: it depends on the layout of the data structures, so the code would be unreliable across different versions of .NET.
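To make the goal concrete, here is a toy C++ model of the behavior described above, not of the CLR's real data structures. A call goes through the slot; the first call lands in the prestub, which compiles the current IL and repoints the slot at native code; later calls skip the JIT entirely, so a later IL change is ignored until the slot is reset. ToyMethod, Jit, Call and Reset are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Toy model (hypothetical): hasNativeCode == false means the slot still
// points at the prestub; true means it points at the compiled code.
struct ToyMethod {
    std::string il;
    bool hasNativeCode = false;
    std::string nativeCode;
    int jitCount = 0;
};

// Stand-in for the JIT compiler: "compiles" whatever IL it is given.
std::string Jit(const std::string& il) { return "native(" + il + ")"; }

// Every call goes through the slot.
std::string Call(ToyMethod& m) {
    if (!m.hasNativeCode) {        // slot -> prestub: compile first
        m.nativeCode = Jit(m.il);
        ++m.jitCount;
        m.hasNativeCode = true;    // slot rewritten -> native code
    }
    return m.nativeCode;           // later calls run native code directly
}

// Analogue of MethodDesc::Reset: point the slot back at the prestub.
void Reset(ToyMethod& m) {
    m.hasNativeCode = false;
    m.nativeCode.clear();
}
```

Changing m.il after the first Call has no effect until Reset(m) is called; the next Call then compiles the new IL, which is exactly why the article looks for a way to reset the real MethodDesc.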
Luckily, I found the method MethodDesc::Reset in SSCLI, which does the same job.
void MethodDesc::Reset()
{
CONTRACTL
{
THROWS;
GC_NOTRIGGER;
}
CONTRACTL_END
_ASSERTE(IsEnCMethod() || IsDynamicMethod() || GetLoaderModule()->IsReflection());
ClearFlagsOnUpdate();
if (HasPrecode())
{
GetPrecode()->Reset();
}
else
{
_ASSERTE(GetLoaderModule()->IsReflection());
InterlockedUpdateFlags2(enum_flag2_HasStableEntryPoint | enum_flag2_HasPrecode, FALSE);
*GetAddrOfSlotUnchecked() = GetTemporaryEntryPoint();
}
_ASSERTE(!HasNativeCode());
}
As you can see above, it does exactly what I need. So the goal becomes invoking this method to reset the MethodDesc to its pre-JIT state.
Of course, I can't use the MethodDesc from SSCLI: the real MethodDesc is used internally by Microsoft, and its exact
implementation and layout are unknown to everyone outside Microsoft.
After endless mountains and rivers that leave doubt whether there is a path out,
suddenly one encounters the shade of a willow, bright flowers, and a lovely village.
Yes, luckily this internal method exists in the PDB symbols from the Microsoft Symbol Server, and this solves my problem.
The address of the Reset() method in the CLR DLL can be found by parsing the PDB file.
Now only one mandatory parameter is left: the this pointer of the MethodDesc. It is not hard to obtain this pointer.
In fact, MethodBase.MethodHandle.Value == CORINFO_METHOD_HANDLE == MethodDesc address == this pointer of the MethodDesc.
Thus, I define my own MethodDesc class in unmanaged code, as below.
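// Forward declarations required by the pointer-to-member typedefs below
class MethodDesc;
class DictionaryLayout;
class Dictionary;
typedef void (MethodDesc::*PFN_Reset)(void);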
typedef BOOL (MethodDesc::*PFN_IsGenericMethodDefinition)(void);
typedef ULONG (MethodDesc::*PFN_GetNumGenericMethodArgs)(void);
typedef MethodDesc * (MethodDesc::*PFN_StripMethodInstantiation)(void);
typedef BOOL (MethodDesc::*PFN_HasClassOrMethodInstantiation)(void);
typedef BOOL (MethodDesc::*PFN_ContainsGenericVariables)(void);
typedef DictionaryLayout * (MethodDesc::*PFN_GetDictionaryLayout)(void);
typedef Dictionary * (MethodDesc::*PFN_GetMethodDictionary)(void);
typedef MethodDesc * (MethodDesc::*PFN_GetWrappedMethodDesc)(void);
class MethodDesc
{
public:
void Reset(void) { (this->*s_pfnReset)(); }
BOOL IsGenericMethodDefinition(void) { return (this->*s_pfnIsGenericMethodDefinition)(); }
ULONG GetNumGenericMethodArgs(void) { return (this->*s_pfnGetNumGenericMethodArgs)(); }
MethodDesc * StripMethodInstantiation(void) { return (this->*s_pfnStripMethodInstantiation)(); }
BOOL HasClassOrMethodInstantiation(void) { return (this->*s_pfnHasClassOrMethodInstantiation)(); }
BOOL ContainsGenericVariables(void) { return (this->*s_pfnContainsGenericVariables)(); }
DictionaryLayout * GetDictionaryLayout(void) { return (this->*s_pfnGetDictionaryLayout)(); }
Dictionary * GetMethodDictionary(void) { return (this->*s_pfnGetMethodDictionary)(); }
MethodDesc * GetWrappedMethodDesc(void) { return (this->*s_pfnGetWrappedMethodDesc)(); }
private:
static PFN_Reset s_pfnReset;
static PFN_IsGenericMethodDefinition s_pfnIsGenericMethodDefinition;
static PFN_GetNumGenericMethodArgs s_pfnGetNumGenericMethodArgs;
static PFN_StripMethodInstantiation s_pfnStripMethodInstantiation;
static PFN_HasClassOrMethodInstantiation s_pfnHasClassOrMethodInstantiation;
static PFN_ContainsGenericVariables s_pfnContainsGenericVariables;
static PFN_GetDictionaryLayout s_pfnGetDictionaryLayout;
static PFN_GetMethodDictionary s_pfnGetMethodDictionary;
static PFN_GetWrappedMethodDesc s_pfnGetWrappedMethodDesc;
};
The static variables above store the addresses of the internal MethodDesc methods in the CLR DLL.
They are initialized when my unmanaged DLL is loaded, and the public members simply call the internal methods with the this pointer.
Now it becomes quite easy to invoke Microsoft's internal methods, like this:
MethodDesc * pMethodDesc = (MethodDesc*)pMethodHandle;
pMethodDesc->Reset();
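The wrapper pattern above (a public member that forwards this through a static pointer-to-member) can be tried in isolation. In the article, the static pointers hold addresses resolved from the PDB, which means reinterpreting a raw code address as a member-function pointer and relies on the compiler's member-pointer representation. The portable sketch below binds the pointer to a real private member instead, so it runs anywhere; Widget and its members are hypothetical.

```cpp
#include <cassert>

class Widget;                                // forward declaration needed by the typedef
typedef int (Widget::*PFN_GetValue)(void);

class Widget {
public:
    explicit Widget(int value) : value_(value) {}

    // Public wrapper: forwards `this` through the static pointer-to-member.
    // Calling it before Initialize() would dereference a null member pointer.
    int GetValue(void) { return (this->*s_pfnGetValue)(); }

    // Binds the static pointer once, like the DLL-load initialization.
    static void Initialize() { s_pfnGetValue = &Widget::InternalGetValue; }

private:
    int InternalGetValue(void) { return value_; }  // stands in for a CLR-internal method
    int value_;
    static PFN_GetValue s_pfnGetValue;
};

PFN_GetValue Widget::s_pfnGetValue = nullptr;
```

Usage: call Widget::Initialize() once, then Widget(42).GetValue() returns 42 through the pointer-to-member.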
Find internal methods' addresses from the PDB Symbol file
When the unmanaged DLL is loaded, it checks the environment to see which version of the CLR/JIT is present, and then
tries to look up the addresses of all the internal methods in the PDB file.
If the lookup fails, it launches symchk.exe from Debugging Tools for Windows to download the corresponding PDB symbol files from the Microsoft Symbol Server.
This can take a long time, from several seconds to several minutes. Maybe we can optimize this by caching the addresses, keyed by a hash of the CLR/JIT
DLL binaries.
You can see more details in the source code: the SearchMethodAddresses and Intialize methods of the unmanaged DLL.
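The caching idea mentioned above could look like the following sketch. It is an assumption, not code from the demo: it hashes the CLR/JIT DLL bytes with 64-bit FNV-1a (chosen for brevity; it is fast but not collision-resistant) and stores relative virtual addresses (RVAs) rather than absolute addresses, because the DLL may load at a different base address in another process. AddressCache and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// 64-bit FNV-1a over a DLL's bytes: fast, but not collision-resistant.
uint64_t Fnv1a64(const std::vector<uint8_t>& bytes) {
    uint64_t hash = 0xcbf29ce484222325ULL;  // FNV-1a offset basis
    for (uint8_t b : bytes) {
        hash ^= b;
        hash *= 0x100000001b3ULL;           // FNV prime
    }
    return hash;
}

// Hypothetical cache: DLL hash -> (internal method name -> RVA).
class AddressCache {
public:
    void Store(uint64_t dllHash, const std::string& method, uint32_t rva) {
        cache_[dllHash][method] = rva;
    }

    // On a hit, converts the cached RVA to an absolute address in this process.
    bool Lookup(uint64_t dllHash, const std::string& method,
                uintptr_t moduleBase, uintptr_t& address) const {
        auto dll = cache_.find(dllHash);
        if (dll == cache_.end()) return false;
        auto entry = dll->second.find(method);
        if (entry == dll->second.end()) return false;
        address = moduleBase + entry->second;
        return true;
    }

private:
    std::map<uint64_t, std::map<std::string, uint32_t>> cache_;
};
```

As a cheaper alternative to hashing the whole file, symbol servers index PE images by their TimeDateStamp and SizeOfImage header fields, so that pair could also serve as the cache key.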
Reset the MethodDesc to pre-JITted status
Now everything is ready. The unmanaged DLL exports a method for managed code that accepts the IL code and MethodBase.MethodHandle.Value
from the managed side.
BOOL CInjection::StartUpdateILCodes( MethodTable * pMethodTable
, CORINFO_METHOD_HANDLE pMethodHandle
, LPBYTE pBuffer
, DWORD dwSize
)
{
if( s_nStatus != Status_Ready || !pMethodHandle )
return FALSE;
MethodDesc * pMethodDesc = (MethodDesc*)pMethodHandle;
pMethodDesc->Reset();
MethodDesc * pStripMethodDesc = pMethodDesc->StripMethodInstantiation();
if( pStripMethodDesc )
pStripMethodDesc->Reset();
if( pMethodDesc->HasClassOrMethodInstantiation() )
{
MethodDesc * pWrappedMethodDesc = pMethodDesc->GetWrappedMethodDesc();
if( pWrappedMethodDesc )
{
pWrappedMethodDesc->Reset();
}
}
std::map< CORINFO_METHOD_HANDLE, ILCodeBuffer>::iterator iter = s_mpILBuffers.find(pMethodHandle);
if( iter != s_mpILBuffers.end() )
{
LocalFree(iter->second.pBuffer);
s_mpILBuffers.erase(iter);
}
ILCodeBuffer tILCodeBuffer = { pBuffer, dwSize };
s_mpILBuffers[pMethodHandle] = tILCodeBuffer;
return TRUE;
}
The code above just calls the Reset() method and stores the IL code in a map, which compileMethod uses when
the method gets compiled.
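The bookkeeping in StartUpdateILCodes (store the IL for a handle, release any older buffer for the same handle, and consume it later in compileMethod) can be sketched in portable C++ as below. Two things differ from the article's code and are my assumptions: std::vector owns the bytes instead of LocalAlloc/LocalFree, and a mutex guards the map, because the managed caller and the JIT may run on different threads. MethodHandle is a hypothetical stand-in for CORINFO_METHOD_HANDLE.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for the opaque CORINFO_METHOD_HANDLE.
using MethodHandle = const void*;

// Pending replacement IL, keyed by method handle.
class PendingILRegistry {
public:
    // Stores IL for a method; an older pending buffer for it is released.
    void Put(MethodHandle method, std::vector<uint8_t> il) {
        std::lock_guard<std::mutex> lock(mutex_);
        buffers_[method] = std::move(il);
    }

    // Removes and returns the pending IL; false if there is none.
    bool Take(MethodHandle method, std::vector<uint8_t>& il) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = buffers_.find(method);
        if (it == buffers_.end()) return false;
        il = std::move(it->second);
        buffers_.erase(it);
        return true;
    }

private:
    std::mutex mutex_;
    std::map<MethodHandle, std::vector<uint8_t>> buffers_;
};
```

Because ownership lives in the vector, overwriting an entry frees the old bytes automatically, which is what the explicit LocalFree in StartUpdateILCodes does by hand.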
In compileMethod, just replace the IL code, with code like this:
CorJitResult __stdcall CInjection::compileMethod(ICorJitInfo * pJitInfo
    , CORINFO_METHOD_INFO * pCorMethodInfo
    , UINT nFlags
    , LPBYTE * pEntryAddress
    , ULONG * pSizeOfCode
    )
{
    ICorJitCompiler * pCorJitCompiler = (ICorJitCompiler *)this;
    LPBYTE pOriginalILCode = NULL;
    unsigned int nOriginalSize = 0;
    std::map< CORINFO_METHOD_HANDLE, ILCodeBuffer>::iterator iter = s_mpILBuffers.end();

    if( pCorMethodInfo && GetStatus() == Status_Ready )
    {
        // Remember the original IL (only after the NULL check) so it can be restored
        pOriginalILCode = pCorMethodInfo->ILCode;
        nOriginalSize = pCorMethodInfo->ILCodeSize;

        MethodDesc * pMethodDesc = (MethodDesc*)pCorMethodInfo->ftn;
        // Assign, don't redeclare: the cleanup below must see the match
        iter = s_mpILBuffers.find((CORINFO_METHOD_HANDLE)pMethodDesc);

        // Generic instantiation: fall back to the IL registered for the stripped method
        if( iter == s_mpILBuffers.end() &&
            pMethodDesc->HasClassOrMethodInstantiation() )
        {
            MethodDesc * pStripMethodDesc = pMethodDesc->StripMethodInstantiation();
            if( pStripMethodDesc )
                iter = s_mpILBuffers.find((CORINFO_METHOD_HANDLE)pStripMethodDesc);
        }

        if( iter != s_mpILBuffers.end() )
        {
            pCorMethodInfo->ILCode = iter->second.pBuffer;
            pCorMethodInfo->ILCodeSize = iter->second.dwSize;
        }
    }

    CorJitResult result = (pCorJitCompiler->*s_pComplieMethod_V4)( pJitInfo,
        pCorMethodInfo, nFlags, pEntryAddress, pSizeOfCode);

    // Restore the original IL and release the replacement buffer
    if( iter != s_mpILBuffers.end() )
    {
        pCorMethodInfo->ILCode = pOriginalILCode;
        pCorMethodInfo->ILCodeSize = nOriginalSize;
        LocalFree(iter->second.pBuffer);
        s_mpILBuffers.erase(iter);
    }
    return result;
}
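The swap, compile, and restore sequence above can also be expressed with a small RAII guard, so that the original IL pointer and size are restored on every exit path. This is a sketch under simplified assumptions: MethodInfoLite keeps only the ILCode and ILCodeSize fields of CORINFO_METHOD_INFO, and FakeCompile stands in for the original compileMethod. The replacement buffer must stay alive until compilation has returned, which is why the code above frees it only afterwards.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-in for CORINFO_METHOD_INFO: only the two IL fields.
struct MethodInfoLite {
    uint8_t* ILCode;
    unsigned ILCodeSize;
};

// Points `info` at replacement IL for the guard's lifetime and restores the
// original pointer and size when the guard goes out of scope.
class ILSwapGuard {
public:
    ILSwapGuard(MethodInfoLite& info, std::vector<uint8_t>& replacement)
        : info_(info), origCode_(info.ILCode), origSize_(info.ILCodeSize) {
        info_.ILCode = replacement.data();
        info_.ILCodeSize = static_cast<unsigned>(replacement.size());
    }
    ~ILSwapGuard() {
        info_.ILCode = origCode_;
        info_.ILCodeSize = origSize_;
    }
    ILSwapGuard(const ILSwapGuard&) = delete;
    ILSwapGuard& operator=(const ILSwapGuard&) = delete;

private:
    MethodInfoLite& info_;
    uint8_t* origCode_;
    unsigned origSize_;
};

// Stand-in for the original compileMethod: reports the IL size it was given.
unsigned FakeCompile(const MethodInfoLite& info) { return info.ILCodeSize; }
```

Scoping the guard around the call to the original compiler replaces the explicit restore block at the end of compileMethod.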
Points of interest
Compilation optimizations
I found that if a method is very simple and its IL code is only a few bytes long (like property accessors), the JIT compiles it inline into its callers.
In this case, resetting the MethodDesc does not help at all, because execution never even reaches the method itself.
Marking such a method with [MethodImpl(MethodImplOptions.NoInlining)] keeps the JIT from inlining it.
Dynamic method
To update the IL code of a dynamic method, we need to be very careful. Filling in incorrect IL code for other kinds of methods only causes an InvalidProgramException,
but incorrect IL code in a dynamic method can crash the CLR and the whole process! Also, the IL code of
a dynamic method differs from that of other methods: its tokens refer to the dynamic method's own token table rather than to the module's metadata.
It is better to generate the IL code from another dynamic method, then copy and update it.
Generic method
I think this is the most complicated part. A generic method definition maps to a MethodDesc.
But calling the generic method with different type arguments
causes the CLR to create a series of instantiations of the definition. Moreover, different kinds of generic methods
are implemented in different ways:
- shared generic method instantiations
- unshared generic method instantiations
- instance methods in shared generic classes
- instance methods in unshared generic classes
- static methods in shared generic classes
- static methods in unshared generic classes
I didn't find a good way to get all the instantiations of a generic definition, so the code in the demo looks like this:
MethodInfo destMethodInfo =
type.GetMethod("GenericMethodToBeReplaced",
BindingFlags.NonPublic | BindingFlags.Instance);
InjectionHelper.UpdateILCodes(destMethodInfo, ilCodes);
destMethodInfo = destMethodInfo.MakeGenericMethod(new Type[] { typeof(string), typeof(int) });
InjectionHelper.UpdateILCodes(destMethodInfo, ilCodes);
The code above only works when you know the type arguments of the generic method. Instantiations with other type arguments
that have already been JIT-compiled won't be affected.
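The lookup order that compileMethod uses for generics (first the exact handle, then the handle returned by StripMethodInstantiation) can be modeled as below. ToyMethodDesc is hypothetical and reduces an instantiation to a pointer to its definition, which is only roughly what StripMethodInstantiation gives back; the article's code also erases the entry once a method has been compiled, which this lookup-only sketch leaves out.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, heavily simplified method descriptor: an instantiation just
// points at its generic definition (nullptr for non-generic methods and for
// definitions themselves).
struct ToyMethodDesc {
    std::string name;
    const ToyMethodDesc* definition;
};

using PendingIL = std::map<const ToyMethodDesc*, std::string>;

// Mirrors the lookup order in compileMethod: exact handle first, then the
// definition of an instantiation. Returns nullptr when nothing is pending.
const std::string* FindReplacementIL(const PendingIL& pending,
                                     const ToyMethodDesc* method) {
    auto it = pending.find(method);
    if (it == pending.end() && method->definition != nullptr)
        it = pending.find(method->definition);
    return it == pending.end() ? nullptr : &it->second;
}
```

With IL registered for both the definition and <string, int>, the <string, int> instantiation gets its own IL, an uncompiled <string, string> instantiation falls back to the definition's IL, and an unrelated method gets nothing.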
Anyway, it is not a perfect solution, but it works in most cases. I am still looking for a better way;
I hope you can give me some suggestions.