Memory Analyzer x86, 32/64-bit & a Free Detour






Detect memory leaks
Introduction
This article introduces readers to detours through an example, Memory Analyzer: a simple tool to detect memory leaks and incorrect memory deallocations (e.g. free called in place of delete).
Detours are a step up from plain API hooking: because the original function is never un-patched, they can trap recursive calls and are thread safe. They are more complicated because two JMP instructions have to be placed carefully, and they introduce the concept of a return function. This function is returned by the method MyAttach (refer to the code) and is also held in the member variable m_Return (it may be referred to as a trampoline function).
(I call them detours because I am trying to do the same thing as Microsoft Detours.)
As with my previous articles, this one aims to introduce detours, not the memory checker itself.
OutputDebugString is used to communicate between the target process and the memory checker (we build a debugger), and debug symbol files are used to show the call stack and source line of each memory operation.
Background
My previous articles are must-reads: http://www.codeproject.com/Articles/163408/APIHooking and http://www.codeproject.com/Articles/189711/Write-your-own-Debugger-to-handle-Breakpoints.
Basic Windows programming and a little assembly are also required (as mentioned in my previous article, API Hooking).
Using the code
The attached code was built with VS2012 and should be referred to throughout this article. It only supports 32-bit detours and injects a DLL into a running process; a 32-bit DLL can only be injected into a 32-bit process.
Since we are going to analyze memory, we must trap the following APIs:
HeapAlloc
HeapReAlloc (not implemented)
HeapFree
VirtualAlloc and associated APIs are not considered here; readers are free to post their own implementations.
These APIs were chosen because malloc, new, free, delete, etc. all end up calling one of them.
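The bookkeeping behind leak and mismatch detection can be sketched as follows. This is a minimal, portable C++ model and not the attached code: LeakTracker and Family are hypothetical names, each trapped allocation or free is assumed to arrive as an event carrying the block address, and how the real tool tells malloc/free apart from new/delete is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical allocator families; how they are detected (e.g. from the call
// stack) is outside the scope of this sketch.
enum class Family { Malloc, New, NewArray };

struct Allocation {
    std::size_t size;
    Family family;
};

class LeakTracker {
public:
    // Called for every successful trapped allocation (e.g. reported from MyHeapAlloc).
    void onAlloc(std::uintptr_t addr, std::size_t size, Family family) {
        assert(addr != 0);  // the trap should only report successful allocations
        live_[addr] = Allocation{size, family};
    }

    // Called for every trapped free; returns a diagnostic, or "" if the free is fine.
    std::string onFree(std::uintptr_t addr, Family family) {
        auto it = live_.find(addr);
        if (it == live_.end())
            return "free of unknown or already freed block";
        std::string msg;
        if (it->second.family != family)
            msg = "mismatched deallocation (e.g. free() on memory from new)";
        live_.erase(it);
        return msg;
    }

    // Whatever is still live when the process exits is reported as a leak.
    std::vector<std::uintptr_t> leaks() const {
        std::vector<std::uintptr_t> out;
        for (const auto& kv : live_) out.push_back(kv.first);
        return out;
    }

    std::size_t leakedBytes() const {
        std::size_t total = 0;
        for (const auto& kv : live_) total += kv.second.size;
        return total;
    }

private:
    std::unordered_map<std::uintptr_t, Allocation> live_;
};
```

The map is keyed by block address because a free only carries the pointer; size and family are looked up from the matching allocation.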
Let's Hook
Most of the code is similar to API hooking: we write a 5-byte JMP at the start of the original function to redirect calls to the trap. An additional return function (m_Return) must be built, because unlike in API hooking we do not re-patch the original function when calling it from our trap. Notice (refer to the code below) that I add the JMP instruction from m_Return[5] onwards; the bytes m_Return[0] to m_Return[4] hold the original instructions (3 complete opcodes).
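memcpy(m_Return, m_Original, sizeof(m_Original));      // build the return function: the original 5 bytes first
// The JMP sits at m_Return+5 and must land on func+5 (func = the original function):
// rel32 = (func + 5) - (m_Return + 5 + 5) = func - m_Return - 5
DWORD JmpDiff1 = ((DWORD)func - (DWORD)m_Return - 5);
memcpy(&TrapJmp[1], &JmpDiff1, 4);                      // patch the rel32 operand of the E9 JMP
memcpy(&m_Return[5], TrapJmp, sizeof(TrapJmp));         // append "JMP func+5" after the copied bytes
m_pReturnedFunc = m_Return;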
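The arithmetic in JmpDiff1 works because the JMP sits at m_Return+5 and must land on func+5, so the two +5 offsets cancel. Below is a minimal sketch under the same hardcoded 5-byte assumption, assuming func is the address of the original function. Rel32 and BuildTrampoline are hypothetical names, and the addresses are illustrative values that happen to match the MessageBoxA walkthrough in this article.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Number of original bytes moved into the trampoline (hardcoded, as in the article).
constexpr std::size_t kPatchSize = 5;

// rel32 for a 5-byte E9 JMP located at jmpAddr that should land on target.
// The CPU adds rel32 to the address of the next instruction (jmpAddr + 5).
inline std::uint32_t Rel32(std::uint32_t jmpAddr, std::uint32_t target) {
    return target - (jmpAddr + 5u);  // unsigned wrap-around yields the two's-complement value
}

// Builds the 10-byte return function for a function at func:
// bytes 0..4 = the original instructions, bytes 5..9 = JMP func+5.
// trampAddr is where these bytes will live (the address of m_Return).
inline std::array<std::uint8_t, 10> BuildTrampoline(const std::uint8_t (&original)[kPatchSize],
                                                    std::uint32_t trampAddr,
                                                    std::uint32_t func) {
    std::array<std::uint8_t, 10> t{};
    std::memcpy(t.data(), original, kPatchSize);
    t[5] = 0xE9;  // JMP rel32
    std::uint32_t rel = Rel32(trampAddr + 5u, func + kPatchSize);
    assert(rel == func - trampAddr - 5u);  // identical to the article's JmpDiff1 formula
    // store rel32 little-endian, independent of the host byte order
    t[6] = static_cast<std::uint8_t>(rel);
    t[7] = static_cast<std::uint8_t>(rel >> 8);
    t[8] = static_cast<std::uint8_t>(rel >> 16);
    t[9] = static_cast<std::uint8_t>(rel >> 24);
    return t;
}
```

For MessageBoxA (first bytes 8B FF 55 8B EC at 7526FD1E) with m_Return at 00379171, this yields 8B FF 55 8B EC E9 A8 6B EF 74.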
We call m_Return instead of re-patching the original function as we did in API hooking (refer to the code below):
LPVOID WINAPI MyHeapAlloc(__in HANDLE hHeap, __in DWORD dwFlags, __in SIZE_T dwBytes)
{
    // strictly do not use any function that may internally call HeapAlloc
    typedef LPVOID (WINAPI *PFN_HEAPALLOC)(HANDLE, DWORD, SIZE_T);
    // call the original HeapAlloc through the return function (trampoline)
    LPVOID pRet = ((PFN_HEAPALLOC)g_myDetour_HeapAlloc.m_pReturnedFunc)(hHeap, dwFlags, dwBytes);
    ::OutputDebugStringA("you are trapped");
    return pRet;
}
Notice that the calling convention and parameters are the same as those of the original API (please refer to my previous article on API hooking).
Maintain the JMP
Now that we are not restoring the original function, we must make sure that the instructions (opcodes) execute in the correct sequence and, more importantly, that we never jump into the middle of an instruction, because the processor might decode the remaining bytes as a different opcode. The variable m_Return holds the original function's opcodes (at least the first 5 bytes, which contain the first 3 complete instructions) followed by a jump to HeapAlloc+5. We skip these 5 bytes so that execution does not re-run the JMP we wrote over the first 5 bytes of the original function.
The attached code hardcodes the 5-byte offset (original function + 5). This works for HeapAlloc, HeapFree and MessageBoxA, whose first 5 bytes end exactly on an instruction boundary, but may not work for other APIs.
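To avoid the hardcoded +5, a detour has to copy whole instructions until it has covered at least 5 bytes. The sketch below shows that idea with a hypothetical, deliberately tiny length decoder (InstrLength, BytesToCopy) that recognises only a few 32-bit prologue encodings and gives up on anything else; a real implementation needs a complete x86 length disassembler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical toy decoder: returns the length of the instruction at p, or 0
// if the opcode is not one of the few prologue encodings it knows.
// avail = number of bytes that may be read starting at p.
inline std::size_t InstrLength(const std::uint8_t* p, std::size_t avail) {
    if (avail == 0) return 0;
    switch (p[0]) {
    case 0x55:  // push ebp
    case 0x90:  // nop
        return 1;
    case 0x6A:  // push imm8
        return 2;
    case 0x8B:  // mov r32, r/m32; register form only (mov edi,edi / mov ebp,esp)
        return (avail >= 2 && (p[1] & 0xC0) == 0xC0) ? 2 : 0;
    case 0x83:  // sub esp, imm8 (83 EC ib)
        return (avail >= 2 && p[1] == 0xEC) ? 3 : 0;
    default:
        return 0;
    }
}

// Smallest run of whole instructions covering at least minBytes bytes, i.e.
// how many bytes the trampoline must copy. Returns 0 on an unknown opcode.
inline std::size_t BytesToCopy(const std::uint8_t* code, std::size_t codeLen,
                               std::size_t minBytes = 5) {
    assert(code != nullptr);
    std::size_t n = 0;
    while (n < minBytes) {
        std::size_t len = InstrLength(code + n, codeLen - n);
        if (len == 0 || n + len > codeLen) return 0;
        n += len;
    }
    return n;
}
```

For the MessageBoxA prologue (8B FF 55 8B EC) this returns 5, matching the hardcoded offset. For a prologue such as 55 8B EC 83 EC 10 it returns 6, because a 5-byte copy would cut sub esp,10h in half.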
Your code calls HeapAlloc, which immediately jumps to MyHeapAlloc (the trap). The trap calls m_Return, which executes the saved first 5 bytes of the original function and then jumps to HeapAlloc+5; the original function runs to completion and returns to the trap, which returns to your code.
To see this at the opcode level, I have trapped MessageBoxA.
Without detours:
MessageBoxA@16:
7526FD1E 8B FF           mov  edi,edi
7526FD20 55              push ebp
7526FD21 8B EC           mov  ebp,esp
7526FD23 6A 00           push 0
7526FD25 FF 75 14        push dword ptr [ebp+14h]
7526FD28 FF 75 10        push dword ptr [ebp+10h]
7526FD2B FF 75 0C        push dword ptr [ebp+0Ch]
7526FD2E FF 75 08        push dword ptr [ebp+8]
7526FD31 E8 A0 FF FF FF  call MessageBoxExA
With detours:
7526FD1E E9 A5 13 10 8B  jmp  MyMessageBoxA (03710C8h)
7526FD23 6A 00           push 0
7526FD25 FF 75 14        push dword ptr [ebp+14h]
7526FD28 FF 75 10        push dword ptr [ebp+10h]
7526FD2B FF 75 0C        push dword ptr [ebp+0Ch]
7526FD2E FF 75 08        push dword ptr [ebp+8]
7526FD31 E8 A0 FF FF FF  call MessageBoxExA@20
Notice the JMP instruction that now occupies the first 5 bytes of MessageBoxA. When called, m_Return must execute the 3 instructions originally held in those 5 bytes before it jumps back to MessageBoxA+5.
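The jump bytes in the listing can be reproduced by hand: E9 is JMP rel32, and rel32 is measured from the instruction that follows the JMP (7526FD1E + 5). A small sketch with hypothetical helper names that encodes the patch and decodes it back:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Encodes the 5-byte "JMP rel32" (opcode E9) written over the start of the
// original function. rel32 is relative to the next instruction: from + 5.
inline std::array<std::uint8_t, 5> EncodeJmpRel32(std::uint32_t from, std::uint32_t to) {
    std::uint32_t rel = to - (from + 5u);  // wraps modulo 2^32, as the CPU does
    return {0xE9,
            static_cast<std::uint8_t>(rel),
            static_cast<std::uint8_t>(rel >> 8),
            static_cast<std::uint8_t>(rel >> 16),
            static_cast<std::uint8_t>(rel >> 24)};
}

// Inverse: where does an E9 JMP located at 'from' land?
inline std::uint32_t JmpTarget(std::uint32_t from, const std::array<std::uint8_t, 5>& jmp) {
    assert(jmp[0] == 0xE9);
    std::uint32_t rel = jmp[1] | (jmp[2] << 8) | (jmp[3] << 16) |
                        (static_cast<std::uint32_t>(jmp[4]) << 24);
    return from + 5u + rel;
}
```

EncodeJmpRel32(0x7526FD1E, 0x003710C8) gives E9 A5 13 10 8B, the bytes shown in the "With detours" listing. The target lies below the JMP, so rel32 is negative (0x8B1013A5 as an unsigned value) and the addition wraps around.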
The execution flow will be:
00379171 8B FF           mov  edi,edi        ; inside m_Return
00379173 55              push ebp
00379174 8B EC           mov  ebp,esp
... back to MessageBoxA ...
7526FD23 6A 00           push 0
7526FD25 FF 75 14        push dword ptr [ebp+14h]
7526FD28 FF 75 10        push dword ptr [ebp+10h]
7526FD2B FF 75 0C        push dword ptr [ebp+0Ch]
7526FD2E FF 75 08        push dword ptr [ebp+8]
7526FD31 E8 A0 FF FF FF  call MessageBoxExA@20
As you can see, this completes the flow exactly as it would without the detour.
Do not print anything
Inside the trap you cannot use any function that calls HeapAlloc, such as printf, because it would re-enter the trap and get caught in an infinite loop. I have chosen to use OutputDebugStringA instead.
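The attached code avoids the recursion by only calling functions that do not allocate. A common alternative, not used in the attached code, is a per-thread reentrancy guard: nested calls made while the trap is already active are forwarded straight to the original function without being logged. The portable sketch below simulates this with hypothetical stand-ins (OriginalAlloc for the call through m_Return, Log for an allocating function such as printf). In a real injected DLL the flag would need per-thread storage that does not itself call HeapAlloc.

```cpp
#include <cassert>
#include <cstddef>

static int g_originalCalls = 0;  // calls that reached the original allocator
static int g_logged = 0;         // allocations that were reported

void* TrappedAlloc(std::size_t n);

// Stand-in for a logging routine like printf that allocates internally,
// i.e. it calls back into the (hooked) allocator.
void Log(const char* /*msg*/) {
    void* scratch = TrappedAlloc(64);
    (void)scratch;
    ++g_logged;
}

// Stand-in for calling the original function through m_Return.
void* OriginalAlloc(std::size_t n) {
    ++g_originalCalls;
    static char buf[128];
    return n <= sizeof(buf) ? buf : nullptr;
}

// The trap: inTrap marks "already inside the trap on this thread", so the
// allocation made by Log() goes straight to the original without being logged.
void* TrappedAlloc(std::size_t n) {
    thread_local bool inTrap = false;
    void* p = OriginalAlloc(n);
    if (!inTrap) {
        inTrap = true;
        Log("allocation");
        inTrap = false;
    }
    return p;
}
```

One outer call reaches the original allocator twice (the user's allocation plus the one made inside Log) but is logged only once, and the recursion stops after one level.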
Since you are writing a debugger (in the attached code, CrashAnalyzer_v2 serves as the debugger), I recommend going through my previous article on writing a debugger.
You can use the debug symbol table (http://en.wikipedia.org/wiki/Debug_symbol) to determine the source line and the call stack (the ordered list of functions through which execution arrived) from which the allocation function was called, as a proper memory analyzer should.
A new function, GetStack, retrieves the call stack: it calls StackWalk64 (look it up on MSDN) in a loop to collect the addresses of the functions on the stack (addresses only). We can then convert these addresses, one at a time, to function names using the PDB (debug symbol) file by calling SymFromAddr. Make sure you call SymInitialize first to initialize the symbol handler.
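In essence, SymFromAddr maps an address to the nearest preceding symbol plus a displacement (the d64 output). The sketch below models only that lookup with a sorted std::map. The SymbolTable class and its entries are hypothetical, the model ignores symbol sizes, and it is no substitute for DbgHelp reading the PDB.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Simplified model of an address-to-symbol lookup: find the symbol with the
// greatest start address <= addr and report how far into it addr lies.
class SymbolTable {
public:
    void add(std::uint64_t start, std::string name) {
        assert(!name.empty());
        byStart_[start] = std::move(name);
    }

    // Returns false if addr lies below every known symbol.
    bool lookup(std::uint64_t addr, std::string& name, std::uint64_t& displacement) const {
        auto it = byStart_.upper_bound(addr);  // first symbol starting after addr
        if (it == byStart_.begin()) return false;
        --it;                                  // last symbol starting at or before addr
        name = it->second;
        displacement = addr - it->first;
        return true;
    }

private:
    std::map<std::uint64_t, std::string> byStart_;
};
```

GetStack would feed each address from StackWalk64 through such a lookup to turn a list of return addresses into readable function names.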
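// make sure that the PDB file exists; the compiler generates it in the same folder as the exe
BOOL b = SymInitialize(hProcess, NULL, TRUE);  // TRUE: load symbols for every module in the process
// d64 receives the offset from the start of the symbol; s points to a SYMBOL_INFO
// buffer whose SizeOfStruct and MaxNameLen must be set beforehand
SymFromAddr(hProcess, cc.Rip, &d64, s);
SymCleanup(hProcess);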
All this is fine, but how do we inject our code?
Unlike in my previous article on API hooking, we cannot use a Windows hook: the target application may not have a message loop, so SetWindowsHookEx will not work.
We must use CreateRemoteThread instead:
// injecting the DLL
{
    char strPath[MAX_PATH];
    GetCurrentDirectoryA(MAX_PATH, strPath);                   // get the directory
    strcat_s(strPath, MAX_PATH, "\\MemoryCheckerModule.dll");
    SIZE_T pathLen = strlen(strPath) + 1;                      // include the terminating NUL

    HANDLE hProcess = ::OpenProcess(PROCESS_ALL_ACCESS, FALSE, pid);
    if (hProcess)
    {
        // room for the DLL path inside the target process (data only, no execute needed)
        void *p = VirtualAllocEx(hProcess, NULL, pathLen, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
        SIZE_T size = 0;
        WriteProcessMemory(hProcess, p, strPath, pathLen, &size);
        // run LoadLibraryA(p) on a new thread inside the target process
        HANDLE hThread = CreateRemoteThread(hProcess, 0, 0, (LPTHREAD_START_ROUTINE)LoadLibraryA, p, 0, 0);
        if (hThread)
            CloseHandle(hThread);
        CloseHandle(hProcess);
    }
    else
    {
        printf("PID invalid / access denied");
        return 1; // cannot find process
    }
}
We allocate memory in the remote process by calling VirtualAllocEx and fill it with the argument LoadLibraryA needs (the DLL path) by calling WriteProcessMemory (the same API debuggers use to change the object code when adding a breakpoint). We then call CreateRemoteThread and pass the address of LoadLibraryA as the thread's start routine, so that LoadLibraryA runs on a separate thread inside the target process.
LoadLibraryA exists in every process because its module, kernel32.dll, is loaded at process startup. It is also mapped at the same base address in all processes of the same bitness, which is why the address obtained in the injector is valid in the target.
The DLL_PROCESS_ATTACH handler in DllMain does the rest; refer to the attached code.
Our custom 64-bit detour is a bit different and is not part of the main project.
The code below should help you understand the 64-bit detour; it is also included in the attached code (MemoryCheckerModule_with32_&_64Bit.zip).
This code was written some time ago and is not thoroughly tested, so feel free to get in touch with me about any problems.
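64-bit code passes parameters in registers (the first four in RCX, RDX, R8 and R9 under the Windows x64 calling convention), so it is best not to tamper with them.
Here we DO NOT use a JMP rel32 instruction, which can only reach targets within ±2 GB of itself. Instead we use the stack to pass the 64-bit address and execute RET, which pops it into RIP.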
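The 14-byte patch can be generated for any 64-bit target. This minimal sketch (BuildAbsJmp64 and AbsJmp64Target are hypothetical names) fills the same placeholder positions that the listing patches with memcpy: bytes 1 to 4 hold the low dword and bytes 9 to 12 hold the high dword. Because push imm32 sign-extends its operand to 64 bits, the following mov must overwrite the upper half, which is why both halves are always written.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// 68 imm32            push imm32                    ; low dword, sign-extended to 8 bytes
// C7 44 24 04 imm32   mov dword ptr [rsp+4], imm32  ; overwrite the high dword
// C3                  ret                           ; pop the 64-bit address into RIP
inline std::array<std::uint8_t, 14> BuildAbsJmp64(std::uint64_t target) {
    std::array<std::uint8_t, 14> s = {0x68, 0, 0, 0, 0,
                                      0xC7, 0x44, 0x24, 0x04, 0, 0, 0, 0,
                                      0xC3};
    for (int i = 0; i < 4; ++i) {
        s[1 + i] = static_cast<std::uint8_t>(target >> (8 * i));       // low dword
        s[9 + i] = static_cast<std::uint8_t>(target >> (32 + 8 * i));  // high dword
    }
    return s;
}

// Decodes the address such a stub jumps to (handy for verifying a patch).
inline std::uint64_t AbsJmp64Target(const std::array<std::uint8_t, 14>& s) {
    assert(s[0] == 0x68 && s[13] == 0xC3);
    std::uint64_t t = 0;
    for (int i = 0; i < 4; ++i) {
        t |= static_cast<std::uint64_t>(s[1 + i]) << (8 * i);
        t |= static_cast<std::uint64_t>(s[9 + i]) << (32 + 8 * i);
    }
    return t;
}
```

For the target 00007FF712345678 this produces 68 78 56 34 12 C7 44 24 04 F7 7F 00 00 C3.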
#include <windows.h>
#include <stdio.h>
#include <string.h>
#include <tchar.h>

// 68 78 56 34 12            push 12345678h                    ; low dword, sign-extended to 64 bits
// c7 44 24 04 54 63 72 81   mov  dword ptr [rsp+4],81726354h  ; overwrite the high dword
// c3                        ret                               ; pops the address into RIP, i.e. jumps
BYTE TrapJmp[] = {0x68,0x78,0x56,0x34,0x12,0xc7,0x44,0x24,0x04,0x54,0x63,0x72,0x81,0xc3};
BYTE TrapJmp_FromTrampoline[] = {0x68,0x78,0x56,0x34,0x12,0xc7,0x44,0x24,0x04,0x54,0x63,0x72,0x81,0xc3};
BYTE StoreOriginal[sizeof(TrapJmp)];
BYTE OriginalOpCode[sizeof(TrapJmp)*2]; // trampoline: the overwritten opcodes, then a jump back

typedef int (WINAPI *pMessageBox)(HWND, LPCWSTR, LPCWSTR, UINT);
pMessageBox pOriginal = NULL;

// this function is only used for testing the hook concept
int WINAPI myMessageBoxW(__in_opt HWND hWnd, __in_opt LPCWSTR lpText,
                         __in_opt LPCWSTR lpCaption, __in UINT uType)
{
    printf("you are trapped\n");
    // restore it (API hooking style; not thread safe / recursion safe)
    /* memcpy(pOriginal, StoreOriginal, sizeof(StoreOriginal));
       FlushInstructionCache(GetCurrentProcess(), pOriginal, sizeof(TrapJmp));
       MessageBoxW(hWnd, lpText, lpCaption, uType);
       memcpy(pOriginal, TrapJmp, sizeof(TrapJmp)); // re-patch
       FlushInstructionCache(GetCurrentProcess(), pOriginal, sizeof(TrapJmp)); */

    // call through the trampoline instead: this is like Detours, except it's free :-))
    pMessageBox org = (pMessageBox)(void*)OriginalOpCode;
    org(hWnd, L"asif", L"asiasdas", uType);
    return 0;
}

int _tmain(int argc, _TCHAR* argv[])
{
    MessageBoxW((HWND)123, L"asif", L"asif", 789);   // before hooking
    {
        pOriginal = MessageBoxW;
        DWORD dPermission = 0;
        VirtualProtect((LPVOID)pOriginal, sizeof(TrapJmp), PAGE_EXECUTE_READWRITE, &dPermission);
        memcpy(StoreOriginal, (void*)pOriginal, sizeof(TrapJmp)); // save the bytes we will overwrite

        // build the trampoline before hooking: saved opcodes + jump to pOriginal + 14
        memcpy(OriginalOpCode, StoreOriginal, sizeof(StoreOriginal));
        DWORD64 JmpDiff_trampoline = (DWORD64)pOriginal + sizeof(TrapJmp);
        memcpy(&TrapJmp_FromTrampoline[1], &JmpDiff_trampoline, sizeof(DWORD));             // low half of the 8-byte address
        memcpy(&TrapJmp_FromTrampoline[9], (char*)&JmpDiff_trampoline + 4, sizeof(DWORD)); // remaining 4 bytes
        memcpy(&OriginalOpCode[sizeof(TrapJmp)], TrapJmp_FromTrampoline, sizeof(TrapJmp_FromTrampoline));

        /* First 14 bytes of the hooked function, as disassembled by the author:
             00007FF73378A180 48 83 EC 38            sub rsp,38h
             00007FF73378A184 45 33 DB               xor r11d,r11d
             00007FF73378A187 44 39 1D 36 5B 01 00   cmp dword ptr [7FF73379FCC4h],r11d
           The cmp is RIP-relative, so its copy in the trampoline would read the wrong
           address. It is replaced with 7 NOPs; the flags from xor r11d,r11d (ZF=1) remain,
           as if the compared value were 0. This is specific to some APIs only. */
        memset(&OriginalOpCode[7], 0x90, 7);
        VirtualProtect(OriginalOpCode, sizeof(OriginalOpCode), PAGE_EXECUTE_READWRITE, &dPermission);
        FlushInstructionCache(GetCurrentProcess(), OriginalOpCode, sizeof(OriginalOpCode));

        // set the hook: overwrite the first 14 bytes with a jump to myMessageBoxW
        DWORD64 JmpDiff = (DWORD64)myMessageBoxW;
        memcpy(&TrapJmp[1], &JmpDiff, sizeof(DWORD));             // low half of the 8-byte address
        memcpy(&TrapJmp[9], (char*)&JmpDiff + 4, sizeof(DWORD)); // remaining 4 bytes
        memcpy((void*)pOriginal, TrapJmp, sizeof(TrapJmp));
        FlushInstructionCache(GetCurrentProcess(), (LPCVOID)pOriginal, sizeof(TrapJmp));
    }
    MessageBoxW(0, L"asif", L"asif", 0);
    MessageBoxW(0, L"asif", L"asif", 0);
    MessageBoxW(0, L"asif", L"asif", 0);
    return 0;
}
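The listing NOPs out the cmp because its memory operand is RIP-relative: ModRM byte 1D selects a 32-bit displacement added to the address of the next instruction. The sketch below, with hypothetical helper names and not part of the attached code, checks that arithmetic against the author's disassembly. It also shows a more general alternative, recomputing disp32 for the trampoline's address, which only works when the trampoline lies within ±2 GB of the referenced data.

```cpp
#include <cassert>
#include <cstdint>

// Effective address of a RIP-relative memory operand: the signed 32-bit
// displacement is added to the address of the next instruction.
inline std::uint64_t RipRelativeTarget(std::uint64_t instrAddr, std::uint32_t instrLen,
                                       std::int32_t disp32) {
    assert(instrLen > 0 && instrLen <= 15);  // x86 instructions are at most 15 bytes
    return instrAddr + instrLen + static_cast<std::int64_t>(disp32);
}

// Displacement a copied instruction needs at newAddr to keep referencing the
// same target; returns false if it does not fit in a signed 32-bit field.
inline bool RelocateDisp32(std::uint64_t target, std::uint64_t newAddr,
                           std::uint32_t instrLen, std::int32_t& outDisp) {
    std::int64_t d = static_cast<std::int64_t>(target - (newAddr + instrLen));
    if (d < INT32_MIN || d > INT32_MAX) return false;
    outDisp = static_cast<std::int32_t>(d);
    return true;
}
```

For the cmp at 00007FF73378A187 (7 bytes, disp32 = 0x00015B36) the operand resolves to 00007FF73379FCC4, as in the disassembly. A trampoline in a global array of the same module is usually close enough to be relocated, whereas one allocated far away is not.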
If you use our 32/64-bit detour implementation, make sure that the instruction pointer (EIP/RIP) never lands in the middle of an instruction, because the processor will decode the remaining bytes differently.
For example, sub rsp,38h is encoded as 48 83 EC 38 at address 00007FF73378A180. Jumping to address 00007FF73378A181 makes the processor decode 83 EC 38 instead, which is sub esp,38h: without the 48 (REX.W) prefix it becomes a 32-bit subtraction, which also clears the upper 32 bits of RSP.
Points of Interest
Apart from memory analysis, detours let you log API calls or reverse engineer applications (the hacker in me says he can).
You may use CrashAnalyzer_v2 to attach to any running process (in our case Debuggee_TestApp) to detect memory leaks and study the memory allocations made by other APIs.