I read an interesting article the other day about the various mechanisms a Win32 application can use to delete itself from disk once it finishes executing. The basic problem, of course, is that while the module is executing, the operating system keeps its file locked. So something like this simply will not work:
    TCHAR szModule[MAX_PATH];
    GetModuleFileName( NULL, szModule, MAX_PATH );
    DeleteFile( szModule );
Of the various options available, the author of that article suggests the following approach as the definitive one, since it has the added benefit of working correctly on all versions of Microsoft Windows (starting with Windows 95).
Now would be a good time to hop over to the article and see what it's about (and while you're there, make sure you look at some of the other articles - pretty neat site). Here's the link:
And, here's the approach, in brief:
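1. Spawn a process with CreateProcess, passing CREATE_SUSPENDED in dwCreationFlags.
2. Fetch the CONTEXT of its primary thread with GetThreadContext.
3. Make room on the remote thread's stack by adjusting ESP, the stack pointer.
4. Copy the deletion code across with WriteProcessMemory.
5. Point EIP, the instruction pointer, at the copied code.
6. Apply the modified context with SetThreadContext.
7. Let the thread run with ResumeThread.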
While this approach does get the job done, our deletion code executes in the remote process before Windows has had a chance to initialize it fully, which places some restrictions on the kind of APIs that we can invoke. As it turns out, APIs like DeleteFile and ExitProcess do work while the process is in this half-baked state. I figured I'd modify the approach somewhat so that it allows us to call any API we want from our injected code. Here's what I did:
    #pragma pack( push, 1 )

    struct coff_header
    {
        unsigned short machine;
        unsigned short sections;
        unsigned int   timestamp;
        unsigned int   symboltable;
        unsigned int   symbols;
        unsigned short size_of_opt_header;
        unsigned short characteristics;
    };

    struct optional_header
    {
        unsigned short magic;
        char           linker_version_major;
        char           linker_version_minor;
        unsigned int   code_size;
        unsigned int   idata_size;
        unsigned int   udata_size;
        unsigned int   entry_point;
        unsigned int   code_base;
    };

    #pragma pack( pop )

    //
    // get the module address
    //
    char *module = (char *)GetModuleHandle( NULL );

    //
    // get the sig
    //
    int *offset = (int*)( module + 0x3c );
    char *sig = module + *offset;

    //
    // get the coff header
    //
    coff_header *coff = (coff_header *)( sig + 4 );

    //
    // get the optional header
    //
    optional_header *opt = (optional_header *)( (char *)coff + sizeof( coff_header ) );

    //
    // get the entry point
    //
    char *entry_point = (char *)module + opt->entry_point;
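The header walk above can be tried out without Windows at all. What follows is a minimal sketch, assuming the image is available as a plain byte buffer; the names `read_le32` and `pe_entry_point_rva` are made up for this illustration and are not part of the article's download. It follows the same offsets as the listing: the 32-bit value at 0x3c, the 4-byte signature, the 20-byte COFF header, and the entry-point field 16 bytes into the optional header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value from a possibly unaligned pointer. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Returns the entry-point RVA of the PE image held in image[0..size),
 * or 0 if the buffer is too small or the "PE\0\0" signature is missing.
 * (Hypothetical helper, written for this sketch.)
 */
static uint32_t pe_entry_point_rva(const unsigned char *image, size_t size)
{
    const size_t lfanew_offset = 0x3c; /* where the header offset is stored */
    const size_t sig_size      = 4;    /* "PE\0\0"                          */
    const size_t coff_size     = 20;   /* sizeof( coff_header ), packed     */
    const size_t entry_offset  = 16;   /* entry_point in optional_header    */
    uint32_t pe_offset;

    assert(image != NULL);

    if (size < lfanew_offset + 4)
        return 0;

    pe_offset = read_le32(image + lfanew_offset);

    /* make sure the whole entry-point field lies inside the buffer */
    if (pe_offset > size ||
        size - pe_offset < sig_size + coff_size + entry_offset + 4)
        return 0;

    if (memcmp(image + pe_offset, "PE\0\0", sig_size) != 0)
        return 0;

    return read_le32(image + pe_offset + sig_size + coff_size + entry_offset);
}
```

Adding the returned value to the image's base address gives the same pointer that the listing computes as entry_point. The bounds checks are an addition of this sketch; the listing trusts its own, already loaded, image.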
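Note that the entry point found this way is not main or WinMain - it is the C runtime's startup routine (mainCRTStartup, for instance), which runs first and only later calls main. In the remote process, the same header fields have to be fetched with ReadProcessMemory, and the value of module cannot come from GetModuleHandle, since that call only describes the calling process; the image base is located instead with the help of GetThreadSelectorEntry.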
Here's the code that achieves this:
    //
    // Gets the address of the entry point routine given a
    // handle to a process and its primary thread.
    //
    DWORD GetProcessEntryPointAddress( HANDLE hProcess, HANDLE hThread )
    {
        CONTEXT         context;
        LDT_ENTRY       entry;
        TEB             teb;
        PEB             peb;
        DWORD           read;
        DWORD           dwFSBase;
        DWORD           dwImageBase, dwOffset;
        DWORD           dwOptHeaderOffset;
        optional_header opt;

        //
        // get the current thread context
        //
        context.ContextFlags = CONTEXT_FULL | CONTEXT_DEBUG_REGISTERS;
        GetThreadContext( hThread, &context );

        //
        // use the segment register value to get a pointer to
        // the TEB
        //
        GetThreadSelectorEntry( hThread, context.SegFs, &entry );
        dwFSBase = ( entry.HighWord.Bits.BaseHi << 24 ) |
                   ( entry.HighWord.Bits.BaseMid << 16 ) |
                   ( entry.BaseLow );

        //
        // read the teb
        //
        ReadProcessMemory( hProcess, (LPCVOID)dwFSBase, &teb, sizeof( TEB ), &read );

        //
        // read the peb from the location pointed at by the teb
        //
        ReadProcessMemory( hProcess, (LPCVOID)teb.Peb, &peb, sizeof( PEB ), &read );

        //
        // figure out where the entry point is located;
        //
        dwImageBase = (DWORD)peb.ImageBaseAddress;
        ReadProcessMemory( hProcess, (LPCVOID)( dwImageBase + 0x3c ), &dwOffset, sizeof( DWORD ), &read );

        dwOptHeaderOffset = ( dwImageBase + dwOffset + 4 + sizeof( coff_header ) );
        ReadProcessMemory( hProcess, (LPCVOID)dwOptHeaderOffset, &opt, sizeof( optional_header ), &read );

        return ( dwImageBase + opt.entry_point );
    }
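dwFSBase is the linear base address of the FS segment, pieced together from the BaseLow, BaseMid and BaseHi fields of the LDT_ENTRY that GetThreadSelectorEntry fills in. On 32-bit Windows, the thread's TEB lives at that address.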
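The way the three descriptor fields combine into one address is easy to check in isolation. Here is a small sketch; `descriptor_base` is a made-up name, and the parameters simply stand in for the BaseLow, BaseMid and BaseHi fields. The explicit casts to an unsigned 32-bit type are a choice made here, so that shifting a value of 0x80 or more by 24 bits never goes through a signed int.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Combines the three base-address fields of a segment descriptor
 * into a 32-bit linear address:
 *   bits  0..15  base_low
 *   bits 16..23  base_mid
 *   bits 24..31  base_hi
 * (Hypothetical helper, written for this sketch.)
 */
static uint32_t descriptor_base(unsigned base_low, unsigned base_mid,
                                unsigned base_hi)
{
    assert(base_low <= 0xFFFFu && base_mid <= 0xFFu && base_hi <= 0xFFu);

    return ((uint32_t)base_hi  << 24) |
           ((uint32_t)base_mid << 16) |
            (uint32_t)base_low;
}
```

For instance, fields of 0xE000, 0xFD and 0x7F combine to 0x7FFDE000.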
The routine that deletes our executable looks like this:
    #pragma pack(push, 1)

    //
    // Structure to inject into remote process. Contains
    // function pointers and code to execute.
    //
    typedef struct _SELFDEL
    {
        HANDLE  hParent;                // parent process handle

        FARPROC fnWaitForSingleObject;
        FARPROC fnCloseHandle;
        FARPROC fnDeleteFile;
        FARPROC fnSleep;
        FARPROC fnExitProcess;
        FARPROC fnRemoveDirectory;
        FARPROC fnGetLastError;
        FARPROC fnLoadLibrary;
        FARPROC fnGetProcAddress;

        BOOL    fRemDir;
        TCHAR   szFileName[MAX_PATH];   // file to delete
    } SELFDEL;

    #pragma pack(pop)

    //
    // Routine to execute in remote process.
    //
    void remote_thread(SELFDEL *remote)
    {
        // wait for parent process to terminate
        remote->fnWaitForSingleObject(remote->hParent, INFINITE);
        remote->fnCloseHandle(remote->hParent);

        // try to delete the executable file
        while(!remote->fnDeleteFile(remote->szFileName))
        {
            // failed - try again in one second's time
            remote->fnSleep(1000);
        }

        // finished! exit so that we don't execute garbage code
        remote->fnExitProcess(0);
    }
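Notice that remote_thread makes every call through the function pointers stored in the SELFDEL structure it is handed, and never calls an API by name.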
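The call-through-a-table idea can be exercised on its own, away from any process injection. The sketch below is only an illustration: `api_table`, `delete_with_retry`, `fake_delete` and `fake_sleep` are hypothetical names, the two fakes stand in for DeleteFile and Sleep so that the code runs anywhere, and the retry loop is bounded here whereas the routine above retries forever.

```c
#include <assert.h>
#include <stddef.h>

/* Table of entry points handed to the routine (a cut-down SELFDEL). */
typedef struct api_table
{
    int        (*delete_file)(const char *path); /* nonzero on success */
    void       (*sleep_ms)(unsigned ms);
    const char  *path;                           /* file to delete     */
} api_table;

/*
 * Tries to delete api->path, sleeping one second between attempts.
 * Every external call goes through the table. Returns the number of
 * the attempt that succeeded, or 0 if all max_tries attempts failed.
 */
static int delete_with_retry(const api_table *api, int max_tries)
{
    int attempt;

    assert(api != NULL && api->delete_file != NULL && api->sleep_ms != NULL);

    for (attempt = 1; attempt <= max_tries; ++attempt)
    {
        if (api->delete_file(api->path))
            return attempt;

        /* failed - try again in one second's time */
        api->sleep_ms(1000);
    }
    return 0;
}

/* Stand-ins for DeleteFile and Sleep, for demonstration only. */
static int      g_failures_left = 2; /* fail this many times, then succeed */
static unsigned g_slept_ms      = 0; /* total time "slept" so far          */

static int fake_delete(const char *path)
{
    (void)path;
    if (g_failures_left > 0)
    {
        --g_failures_left;
        return 0;
    }
    return 1;
}

static void fake_sleep(unsigned ms)
{
    g_slept_ms += ms;
}
```

Filling an api_table with fake_delete and fake_sleep and calling delete_with_retry shows the routine succeeding on its third attempt after two simulated failures. Because the routine names no external function, the same compiled bytes keep working wherever they are copied, as long as the table they are given holds valid addresses.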
Fortunately for us, the system DLLs (kernel32, user32 etc.) get loaded at the same virtual address in every process on a given system, so a function pointer obtained in our process is valid in the remote one too. All we need to do, then, is initialize a data structure with pointers to all the system calls we want to make from the remote process, and pass this structure along as well. But the overwritten entry point receives no argument from us, so how are we to hand it the structure? To make a long story short, I settled for the following approach.
    //
    // Routine to execute in remote process.
    //
    void remote_thread()
    {
        //
        // this will get replaced with a
        // real pointer to the data when it
        // gets injected into the remote
        // process
        //
        SELFDEL *remote = (SELFDEL *)0xFFFFFFFF;

        //
        // wait for parent process to terminate
        //
        remote->fnWaitForSingleObject(remote->hParent, INFINITE);
        remote->fnCloseHandle(remote->hParent);

        //
        // try to delete the executable file
        //
        while(!remote->fnDeleteFile(remote->szFileName))
        {
            //
            // failed - try again in one second's time
            //
            remote->fnSleep(1000);
        }

        //
        // finished! exit so that we don't execute garbage code
        //
        remote->fnExitProcess(0);
    }
Shellcode is the term used in security circles for the binary machine code that typically serves as the payload of an exploit. Here's a quick and dirty way of generating shellcode from the .obj file produced when you compile your source files. In our case, we want the shellcode for the remote_thread routine. Here's what you've got to do:
cl /nologo /c selfdel.c
dumpbin /disasm:bytes selfdel.obj > s.asm
This produces a file called s.asm that looks like this:
    Microsoft (R) COFF/PE Dumper Version 8.00.50727.42
    Copyright (C) Microsoft Corporation. All rights reserved.

    Dump of file selfdel.obj

    File Type: COFF OBJECT

    _remote_thread:
      00000000: 55                 push        ebp
      00000001: 8B EC              mov         ebp,esp
      00000003: 83 EC 10           sub         esp,10h
      00000006: 53                 push        ebx
      00000007: C7 45 F0 FF FF FF  mov         dword ptr [ebp-10h],0FFFFFFFFh
                FF
      0000000E: 8B 45 F0           mov         eax,dword ptr [ebp-10h]
      ...... more stuff like this ......
      000000D2: C3                 ret         ; <-- this marks the end of the assembly dump for _remote_thread
      ...... even more stuff like this ......
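Open s.asm and keep only the lines that follow the _remote_thread: label, up to and including the routine's closing ret. That leaves you with this: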
      00000000: 55                 push        ebp
      00000001: 8B EC              mov         ebp,esp
      00000003: 83 EC 10           sub         esp,10h
      00000006: 53                 push        ebx
      00000007: C7 45 F0 FF FF FF  mov         dword ptr [ebp-10h],0FFFFFFFFh
                FF
      0000000E: 8B 45 F0           mov         eax,dword ptr [ebp-10h]
      ...... more stuff like this ......
      000000D2: C3                 ret
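Next, run a few find-and-replace passes over those lines. The expressions use Visual Studio's regular expression syntax, where :b matches a space or a tab and {...} tags a group:
1. Find [0-9A-F]+\::b+{[0-9A-F:b]+}.* and replace it with \1, which keeps only the opcode bytes on each line.
2. Find ^:b+ and then :b+$ and replace each with nothing, which trims the leading and trailing whitespace.
3. Replace \n so that all the bytes end up on a single line, separated by spaces.
4. Replace each space with ', '\\x (that is, ', '\x once the backslash escape is resolved).
Then type this at the start of the line:
char shellcode[] = { '\x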
And finally, type this at the end of the line:
' };
After all of this, you should end up with something that looks like this:
    char shellcode[] = {
        '\x55', '\x8B', '\xEC', '\x83', '\xEC', '\x10', '\x53', '\xC7',
        '\x45', '\xF0', '\xFF', '\xFF', '\xFF', '\xFF', '\x8B',
        // ...... more stuff like this ......
        '\x5B', '\x8B', '\xE5', '\x5D', '\xC3'
    };
Phew!
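If the editor gymnastics get tedious, the byte column can also be pulled out programmatically. This is a rough sketch of an alternative, not what the article itself does: `dumpbin_line_bytes` is a made-up helper, and it assumes the column layout seen in the sample dump above, namely an optional address followed by a colon, then two-digit hex bytes separated by single spaces, then a wider gap before the mnemonic.

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/*
 * Extracts the opcode bytes from one line of "dumpbin /disasm:bytes"
 * output into out[0..cap). Returns the number of bytes stored.
 * Label lines and other text yield 0 bytes.
 */
static size_t dumpbin_line_bytes(const char *line, unsigned char *out,
                                 size_t cap)
{
    const char *p = line;
    const char *q;
    size_t n = 0;

    assert(line != NULL && out != NULL);

    /* skip leading blanks */
    while (*p == ' ' || *p == '\t') ++p;

    /* skip an optional "address:" prefix (hex digits, then a colon) */
    q = p;
    while (hex_value((unsigned char)*q) >= 0) ++q;
    if (q != p && *q == ':')
    {
        p = q + 1;
        while (*p == ' ' || *p == '\t') ++p;
    }

    /* collect two-digit hex tokens separated by single spaces */
    while (n < cap)
    {
        int hi = hex_value((unsigned char)p[0]);
        int lo = (hi >= 0) ? hex_value((unsigned char)p[1]) : -1;

        if (hi < 0 || lo < 0)
            break;
        /* a byte token must end at a blank or at the end of the line */
        if (p[2] != ' ' && p[2] != '\t' && p[2] != '\0' &&
            p[2] != '\n' && p[2] != '\r')
            break;

        out[n++] = (unsigned char)(hi * 16 + lo);
        p += 2;

        if (p[0] == ' ' && p[1] != ' ')
            ++p;        /* single space: another byte may follow */
        else
            break;      /* wider gap or end of line: bytes are done */
    }
    return n;
}
```

Feeding every line between the label and the final ret through this function, and printing each byte in the '\x..' form, should produce the same array as the manual steps. Continuation lines that carry only bytes, such as the lone FF in the dump, are handled by the same code path.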
    char shellcode[] = {
        '\x55', '\x8B', '\xEC', '\x83', '\xEC', '\x10', '\x53', '\xC7',
        '\x45', '\xF0',
        '\xFF', '\xFF', '\xFF', '\xFF',   // replace these 4 bytes
                                          // with actual address
        '\x8B', '\x45', '\xF0', '\x8B', '\x48', '\x20', '\x89', '\x4D',
        '\xF4', '\x8B', '\x55', '\xF0', '\x8B', '\x42', '\x24', '\x89',
        '\x45', '\xFC', '\x6A', '\xFF',
        ... more shell code here
As it turns out, the value 0xFFFFFFFF with which we initialized the pointer remote in remote_thread shows up verbatim in the shellcode. Since we know where the entry point lives in the remote process, and the data is going to sit right after the code, all we need to do is replace 0xFFFFFFFF in the shellcode with the actual address of the data before overwriting the entry point. Here's how this looks:
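Rather than counting bytes by hand to find where the placeholder sits, one could search for it. The following is a minimal sketch of that variation; `patch_placeholder` is a hypothetical helper, not something the article's code contains. It looks for the first little-endian occurrence of the placeholder and overwrites it with the real address, least significant byte first, as x86 expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Finds the first little-endian occurrence of `placeholder` in
 * code[0..len) and overwrites it with `value`. Returns the byte
 * offset that was patched, or -1 if the placeholder was not found.
 */
static long patch_placeholder(unsigned char *code, size_t len,
                              uint32_t placeholder, uint32_t value)
{
    size_t i;
    int k;

    assert(code != NULL);

    for (i = 0; i + 4 <= len; ++i)
    {
        uint32_t found = (uint32_t)code[i]             |
                         ((uint32_t)code[i + 1] << 8)  |
                         ((uint32_t)code[i + 2] << 16) |
                         ((uint32_t)code[i + 3] << 24);

        if (found == placeholder)
        {
            /* store the new value least significant byte first */
            for (k = 0; k < 4; ++k)
                code[i + k] = (unsigned char)(value >> (8 * k));
            return (long)i;
        }
    }
    return -1;
}
```

Run against the opening bytes of the shellcode shown earlier, the search lands on offset 10, which matches the hand-counted position. One caveat: a run of four 0xFF bytes could also occur by coincidence elsewhere in compiled code, so a more distinctive placeholder value may be the safer choice if you go this way.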
    STARTUPINFO         si = { sizeof(si) };
    PROCESS_INFORMATION pi;
    SELFDEL             local;
    DWORD               data;
    DWORD               oldProt;
    TCHAR               szExe[MAX_PATH] = _T( "explorer.exe" );
    DWORD               process_entry;

    //
    // this shellcode self-deletes and then shows a messagebox
    //
    char shellcode[] = {
        '\x55', '\x8B', '\xEC', '\x83', '\xEC', '\x10', '\x53', '\xC7',
        '\x45', '\xF0',
        '\xFF', '\xFF', '\xFF', '\xFF',   // replace these 4 bytes
                                          // with actual address
        '\x8B', '\x45', '\xF0', '\x8B', '\x48', '\x20', '\x89', '\x4D',
        ... snipped lots of meaningless shellcode here! ...
        '\xFF', '\xD0', '\x5B', '\x8B', '\xE5', '\x5D', '\xC3'
    };

    //
    // spawn the host process in a suspended state
    //
    CreateProcess( NULL, szExe, NULL, NULL, FALSE,
                   CREATE_SUSPENDED, NULL, NULL, &si, &pi );

    //
    // initialize the SELFDEL object
    //
    local.fnWaitForSingleObject = (FARPROC)WaitForSingleObject;
    local.fnCloseHandle         = (FARPROC)CloseHandle;
    local.fnDeleteFile          = (FARPROC)DeleteFile;
    local.fnSleep               = (FARPROC)Sleep;
    local.fnExitProcess         = (FARPROC)ExitProcess;
    local.fnRemoveDirectory     = (FARPROC)RemoveDirectory;
    local.fnGetLastError        = (FARPROC)GetLastError;
    local.fnLoadLibrary         = (FARPROC)LoadLibrary;
    local.fnGetProcAddress      = (FARPROC)GetProcAddress;

    //
    // Give remote process a copy of our own process handle
    // (with SYNCHRONIZE access, so that it can wait on it)
    //
    DuplicateHandle( GetCurrentProcess(), GetCurrentProcess(),
                     pi.hProcess, &local.hParent, SYNCHRONIZE, FALSE, 0 );

    GetModuleFileName( 0, local.szFileName, MAX_PATH );

    //
    // get the process's entry point address
    //
    process_entry = GetProcessEntryPointAddress( pi.hProcess, pi.hThread );

    //
    // replace the address of the data inside the
    // shellcode (bytes 10 to 13)
    //
    data = process_entry + sizeof( shellcode );
    shellcode[13] = (char)( data >> 24 );
    shellcode[12] = (char)( ( data >> 16 ) & 0xFF );
    shellcode[11] = (char)( ( data >> 8 ) & 0xFF );
    shellcode[10] = (char)( data & 0xFF );

    //
    // copy our code+data at the exe's entry-point
    //
    VirtualProtectEx( pi.hProcess,
                      (PVOID)process_entry,
                      sizeof( local ) + sizeof( shellcode ),
                      PAGE_EXECUTE_READWRITE,
                      &oldProt );
    WriteProcessMemory( pi.hProcess,
                        (PVOID)process_entry,
                        shellcode,
                        sizeof( shellcode ), 0 );
    WriteProcessMemory( pi.hProcess,
                        (PVOID)data,
                        &local,
                        sizeof( local ), 0 );

    //
    // Let the process continue
    //
    ResumeThread( pi.hThread );
There! That's all there is to it.
Please find the code for a self-deleting executable (that among other things, also displays a message box from the remote process' hijacked entry point) here.
You'll probably never have to write a program that deletes itself in your career, but there are a few nifty, if somewhat esoteric, tricks in there, eh?! One commercial program that does this sort of thing is the helper program that you download when you use the Copilot service. However, I suspect that program uses a different technique that is probably a lot more straightforward and a gazillion times less interesting ;). Briefly, an implementation note in Copilot's functional specification document states that self-deletion is a simple matter of:
passing the FILE_FLAG_DELETE_ON_CLOSE flag to CreateFile.
The FILE_FLAG_DELETE_ON_CLOSE flag should cause the OS to delete the small EXE when all open handles to it are closed. That's really neat, but hey, where's the fun in that?!
This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)