Self Deleting Executables

Rajasekharan Vengalil

Jan 7, 2007

CPOL

How to write a program that deletes itself

Download source code - 5.42 KB

Introduction
That easy, eh? Well, yeah, but...
Where art thou, Oh! Great entry point of the remote process?
Passing data to the remote process
Shellcode?
What do we do with this shellcode?
Conclusion
Revision history

Introduction

I read an interesting article the other day that spoke about the various mechanisms a Win32 application can employ for deleting itself from the disk once execution completes. The basic issue is, of course, that while the module is being executed, the operating system has the file locked. So, something like this will just not work:

TCHAR szModule[MAX_PATH];
GetModuleFileName( NULL, szModule, MAX_PATH );
DeleteFile( szModule );

Of the various options available, the author of the said article suggests the following approach as being the definitive one as it has the added benefit of functioning correctly on all versions of Microsoft Windows (starting with '95).

Now would be a good time to hop over to the article and see what it's about (and while you're there, make sure you look at some of the other articles - pretty neat site). Here's the link:

Self-Deleting Executables

And, here's the approach, in brief:

When it's time to delete ourselves, we first spawn an external process that is guaranteed to exist on all Windows computers (explorer.exe, for example) in the suspended state. We do this by calling CreateProcess, passing CREATE_SUSPENDED for the dwCreationFlags parameter. Note that when a process is launched this way, there's really no telling at what point the primary thread of the process will get suspended. But, it does appear to get suspended long before the entry point gets invoked and, in fact, it occurs even before the Win32 environment for the process has been fully initialized.
After this, we get the CONTEXT data (basically, the CPU register state) for the suspended primary thread (from the remote process) via GetThreadContext.
We then manipulate the stack pointer (ESP) to allocate some space on the remote stack for storing some of our data (like the path to the executable to be deleted). After this, we plonk the binary code for a local routine that we've written for deleting files over to the remote process (along with the data it needs), by calling WriteProcessMemory.
Next, we mess around with the instruction pointer (EIP) so that it points to the binary code we've copied to the remote process, and update the suspended thread's context (via SetThreadContext).
And finally, we resume execution of the remote process (via ResumeThread). Since the EIP in the remote thread is now pointing to our code, it executes it; which of course, happily deletes the original executable. And that's it!

That Easy, Eh? Well, Yeah, But...

While this approach does get the job done, the fact that our deletion code executes in the remote process even before Windows has had a chance to initialize it fully places some restrictions on the kind of APIs that we can invoke. As it turns out, APIs like DeleteFile and ExitProcess do work while the process is in this half-baked state, but I figured I'd modify the approach somewhat so that it allows us to call any API we want from our injected code. Here's what I did:

As before, we launch the external process in a suspended state. However, instead of plonking our code at the location that ESP happens to be pointing at when it got suspended, we put it over the executable's entry-point routine, i.e., we replace the remote process' entry point with our own injected code. And when the entry-point code executes, we can be pretty sure that the Win32 environment is fully initialized and primed for use!
Figuring out where the entry point of a module lives requires us to parse PE file format structures. In your own program, for example, the following code would give you a pointer to the entry point routine in the process' executable image:

#pragma pack( push, 1 )

struct coff_header
{
    unsigned short machine;
    unsigned short sections;
    unsigned int timestamp;
    unsigned int symboltable;
    unsigned int symbols;
    unsigned short size_of_opt_header;
    unsigned short characteristics;
};

struct optional_header
{
    unsigned short magic;
    char linker_version_major;
    char linker_version_minor;
    unsigned int code_size;
    unsigned int idata_size;
    unsigned int udata_size;
    unsigned int entry_point;
    unsigned int code_base;
};

#pragma pack( pop )

//
// get the module address
//

char *module = (char *)GetModuleHandle( NULL );

//
// get the sig
//

int *offset = (int*)( module + 0x3c );
char *sig = module + *offset;

//
// get the coff header
//

coff_header *coff = (coff_header *)( sig + 4 );

//
// get the optional header
//

optional_header *opt = (optional_header *)( (char *)coff + sizeof( coff_header ) );

//
// get the entry point
//

char *entry_point = (char *)module + opt->entry_point;

By the way, the entry point that you define (main or WinMain) isn't the actual entry point routine. The compiler inserts its own entry point, which in turn calls our function. This entry point typically does stuff like CRT initialization and cleanup. In an ANSI console app, for instance, the actual entry point routine is something called mainCRTStartup.
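The header walk shown above trusts the image completely, which is fine for our own module but less comfortable for bytes read from somewhere else. Here's a minimal sketch of the same walk over a plain byte buffer, with bounds and signature checks added. The name pe_entry_point_rva and the helper read_u32le are hypothetical and not part of the article's download; the offsets (0x3c for the PE header offset, a 4-byte signature, a 20-byte COFF header, and the entry point 16 bytes into the optional header) follow from the structures defined above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value without relying on alignment. */
static uint32_t read_u32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: returns the entry point RVA of the PE image
 * held in `image`, or 0 when the buffer is too small or the "MZ" /
 * "PE\0\0" signatures do not match.
 */
uint32_t pe_entry_point_rva(const unsigned char *image, size_t size)
{
    uint32_t pe_offset;
    size_t   opt_offset;

    assert(image != NULL);

    /* DOS header: "MZ" magic, PE header offset stored at 0x3c */
    if (size < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return 0;
    pe_offset = read_u32le(image + 0x3c);

    /* need the signature (4), the COFF header (20) and the first
       20 bytes of the optional header to be inside the buffer */
    if (pe_offset > size || size - pe_offset < 4 + 20 + 20)
        return 0;
    if (memcmp(image + pe_offset, "PE\0\0", 4) != 0)
        return 0;

    /* the entry point field sits 16 bytes into the optional header */
    opt_offset = (size_t)pe_offset + 4 + 20;
    return read_u32le(image + opt_offset + 16);
}
```

The value returned is relative to the image base, so it still has to be added to the module's load address, exactly as the last line of the snippet above does. Note that a return value of 0 is ambiguous in this sketch, since an image may legitimately have no entry point.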

Where Art Thou, Oh! Great Entry Point of the Remote Process?

It appears logical, therefore, that we should be able to find the entry point routine in remote processes also in a similar fashion, using ReadProcessMemory. While that is so, finding the equivalent of the module variable in the code given above for remote processes turns out to be trickier than anticipated. The problem is that there is no convenient GetModuleHandle routine that'll work for remote processes.
As it turns out, GetModuleHandle only looks at modules loaded into the calling process, and the virtual address it returns is meaningful only within that process' address space. ReadProcessMemory, on the other hand, needs an address that is valid in the remote process' address space. So the question is, how do we get to know the remote process' base address in memory? The solution, as it turned out, requires us to dig deep into the OS's internals! The credit for this solution goes to Ashkbiz Danehkar, whose article Injective Code Inside Import Table outlines a method for finding this.
In brief, the operating system maintains a user-mode data structure for every thread in the system, called the Thread Environment Block (TEB), which describes pretty much everything you'd want to know about the thread. This includes a pointer to another data structure called the Process Environment Block (PEB) which, as may be apparent, describes the process and holds, happily for us, a pointer to the image's base address in memory! These structures are not, however, documented (by Microsoft that is ;). But some very, very clever folks here have managed to figure out the layout for these structures all by themselves!
So, all we need to do is:

Figure out where the TEB for the primary thread lives in the remote process. On x86, the thread's FS segment register holds a selector for a segment that starts at the TEB; the base address of that segment can be looked up via the GetThreadSelectorEntry API.
Read the PEB using the pointer to it in the thread's TEB via ReadProcessMemory.
Use the pointer to the image's base address in the PEB, and parse the PE structures till we are left with a reference to the remote process' entry point routine.
Phew!

Here's the code that achieves this:

//
// Gets the address of the entry point routine given a
// handle to a process and its primary thread.
//

DWORD GetProcessEntryPointAddress( HANDLE hProcess, HANDLE hThread )
{
    CONTEXT             context;
    LDT_ENTRY           entry;
    TEB                 teb;
    PEB                 peb;
    DWORD               read;
    DWORD               dwFSBase;
    DWORD               dwImageBase, dwOffset;
    DWORD               dwOptHeaderOffset;
    optional_header     opt;

    //
    // get the current thread context
    //

    context.ContextFlags = CONTEXT_FULL | CONTEXT_DEBUG_REGISTERS;
    GetThreadContext( hThread, &context );

    //
    // use the segment register value to get a pointer to
    // the TEB
    //

    GetThreadSelectorEntry( hThread, context.SegFs, &entry );
    dwFSBase = ( entry.HighWord.Bits.BaseHi << 24 ) |
                     ( entry.HighWord.Bits.BaseMid << 16 ) |
                     ( entry.BaseLow );

    //
    // read the teb
    //

    ReadProcessMemory( hProcess, (LPCVOID)dwFSBase,
                       &teb, sizeof( TEB ), &read );

    //
    // read the peb from the location pointed at by the teb
    //

    ReadProcessMemory( hProcess, (LPCVOID)teb.Peb,
                       &peb, sizeof( PEB ), &read );

    //
    // figure out where the entry point is located;
    //

    dwImageBase = (DWORD)peb.ImageBaseAddress;
    ReadProcessMemory( hProcess, (LPCVOID)( dwImageBase + 0x3c ),
                       &dwOffset, sizeof( DWORD ), &read );

    dwOptHeaderOffset = ( dwImageBase + dwOffset + 4 + sizeof( coff_header ) );
    ReadProcessMemory( hProcess, (LPCVOID)dwOptHeaderOffset,
                       &opt, sizeof( optional_header ), &read );

    return ( dwImageBase + opt.entry_point );
}

If you're wondering what the weird code initializing dwFSBase means, all I can do is direct you to the documentation for the LDT_ENTRY data structure in MSDN. Structures of this kind are partly the reason why system programmers tend to go bald early in life!
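For what it's worth, the bit-twiddling is less mysterious than it looks: an x86 segment descriptor stores its 32-bit base address in three separate pieces, and the code simply glues them back together. Here's a small standalone sketch of that reassembly. The descriptor_base struct is a hypothetical stand-in that carries only the three base fields of LDT_ENTRY, so that the snippet compiles without the Windows headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the base-address fields of LDT_ENTRY. */
typedef struct descriptor_base
{
    uint16_t base_low;   /* bits  0..15 of the base (BaseLow) */
    uint8_t  base_mid;   /* bits 16..23 of the base (BaseMid) */
    uint8_t  base_hi;    /* bits 24..31 of the base (BaseHi)  */
} descriptor_base;

/* Rebuild the 32-bit segment base from its three pieces. */
uint32_t segment_base(const descriptor_base *d)
{
    assert(d != NULL);

    return ((uint32_t)d->base_hi  << 24) |
           ((uint32_t)d->base_mid << 16) |
            (uint32_t)d->base_low;
}
```

With base_low = 0xE000, base_mid = 0xFD and base_hi = 0x7F, the pieces combine into 0x7FFDE000, which is the sort of address a TEB tends to live at in a 32-bit process.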

Passing Data to the Remote Process

Now that we know where the entry point lives in the remote process, it should be really straightforward, right? Wrong! There still is that itsy bitsy problem of figuring out how we are to pass data to the remote process!

The routine that deletes our executable looks like this:

#pragma pack(push, 1)

//
//  Structure to inject into remote process. Contains
//  function pointers and code to execute.
//

typedef struct _SELFDEL
{
    HANDLE  hParent;                // parent process handle


    FARPROC fnWaitForSingleObject;
    FARPROC fnCloseHandle;
    FARPROC fnDeleteFile;
    FARPROC fnSleep;
    FARPROC fnExitProcess;
    FARPROC fnRemoveDirectory;
    FARPROC fnGetLastError;
    FARPROC fnLoadLibrary;
    FARPROC fnGetProcAddress;
    BOOL    fRemDir;
    TCHAR   szFileName[MAX_PATH];   // file to delete

} SELFDEL;

#pragma pack(pop)

//
//  Routine to execute in remote process.
//

void remote_thread(SELFDEL *remote)
{
    // wait for parent process to terminate

    remote->fnWaitForSingleObject(remote->hParent, INFINITE);
    remote->fnCloseHandle(remote->hParent);

    // try to delete the executable file

    while(!remote->fnDeleteFile(remote->szFileName))
    {
        // failed - try again in one second's time

        remote->fnSleep(1000);
    }

    // finished! exit so that we don't execute garbage code

    remote->fnExitProcess(0);
}

As you might have noticed, the function remote_thread makes all system calls via function pointers, instead of calling them directly. This is done because, in the normal course, the compiler generates tiny stubs whenever calls to routines in dynamically loaded DLLs are made from a program. This stub jumps to a function pointer stored in a table initialized by the operating system's loader at runtime. Since we don't want these fancy stubs generated for code that is meant to be injected into a remote process, we deal exclusively with function pointers.
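To see the pattern in isolation, here's a portable sketch of a routine that reaches the outside world only through a table of function pointers handed to it by the caller. Everything in it (api_table, make_api_table, delete_with_retry and the two fake_ functions) is hypothetical and merely mimics the shape of SELFDEL and remote_thread; the fakes exist so that the routine can be tried out without touching the disk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the article's SELFDEL. */
typedef struct api_table
{
    int  (*delete_file)(const char *path);   /* nonzero on success */
    void (*sleep_ms)(unsigned int ms);
    const char *path;                         /* file to delete */
} api_table;

/* Caller's side: fill in the table, much like `local` is initialized
   before it gets copied to the remote process. */
api_table make_api_table(int (*delete_file)(const char *),
                         void (*sleep_ms)(unsigned int),
                         const char *path)
{
    api_table t;

    assert(delete_file != NULL && sleep_ms != NULL && path != NULL);

    t.delete_file = delete_file;
    t.sleep_ms    = sleep_ms;
    t.path        = path;
    return t;
}

/* Injected side: retries the delete until it succeeds and returns the
   number of attempts. Every external call goes through `api`, so the
   body holds no direct reference to an imported function. */
unsigned int delete_with_retry(const api_table *api)
{
    unsigned int attempts = 1;

    while (!api->delete_file(api->path))
    {
        api->sleep_ms(1000);
        attempts++;
    }
    return attempts;
}

/* Test doubles: the fake delete fails twice, then succeeds. */
static int fake_calls;
int  fake_delete(const char *path) { (void)path; return ++fake_calls >= 3; }
void fake_sleep(unsigned int ms)   { (void)ms; }
```

Calling delete_with_retry with a table built from fake_delete and fake_sleep returns after the third attempt. The real routine differs mainly in that its table holds addresses that must be valid in the remote process rather than in ours.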
Fortunately for us, the system DLLs (kernel32, user32 etc.) get loaded at the same virtual address in all processes. (On versions of Windows with address space layout randomization, the load addresses change from one boot to the next but, in practice, stay the same across processes until the next reboot.) So, all we need to do is initialize a data structure with pointers to all the system calls we want to make from the remote process, and pass this structure along as well. But with our entry-point overwrite strategy, how are we to hand this structure to the injected code? To make a long story short, I settled for the following approach.
First, I modified remote_thread to look like this:

//
//  Routine to execute in remote process.
//

void remote_thread()
{
    //
    // this will get replaced with a
    // real pointer to the data when it
    // gets injected into the remote
    // process
    //

    SELFDEL *remote = (SELFDEL *)0xFFFFFFFF;

    //
    // wait for parent process to terminate
    //

    remote->fnWaitForSingleObject(remote->hParent, INFINITE);
    remote->fnCloseHandle(remote->hParent);

    //
    // try to delete the executable file
    //

    while(!remote->fnDeleteFile(remote->szFileName))
    {
        //
        // failed - try again in one second's time
        //

        remote->fnSleep(1000);
    }

    //
    // finished! exit so that we don't execute garbage code
    //

    remote->fnExitProcess(0);
}

I then converted this into shellcode. If you've never heard of the term shellcode before, then here's a quick primer on what it is. If you know what it is already, then feel free to skip the next section.

Shellcode?

Shellcode is the technical term (in security circles) for binary machine code that is typically used as the payload in exploits. Here's a quick and dirty way of generating the shellcode from the obj file generated when you compile your source files. In our case, we are interested in whipping the shellcode up for the remote_thread routine. Here's what you've got to do:

First, compile your source file (in our case, selfdel.c) using the /c command line option. This causes the compiler to skip the linking step.

cl /nologo /c selfdel.c

Now, use the utility dumpbin to disassemble your obj file like so:

dumpbin /disasm:bytes selfdel.obj > s.asm

This produces a file called s.asm that looks like this:

Microsoft (R) COFF/PE Dumper Version 8.00.50727.42
Copyright (C) Microsoft Corporation. All rights reserved.


Dump of file selfdel.obj

File Type: COFF OBJECT

_remote_thread:
  00000000: 55                 push        ebp
  00000001: 8B EC              mov         ebp,esp
  00000003: 83 EC 10           sub         esp,10h
  00000006: 53                 push        ebx
  00000007: C7 45 F0 FF FF FF  mov         dword ptr [ebp-10h],0FFFFFFFFh
            FF
  0000000E: 8B 45 F0           mov         eax,dword ptr [ebp-10h]

...... more stuff like this ......

  000000D2: C3                 ret ; <-- this marks the end of the
                                         assembly dump for _remote_thread

...... even more stuff like this ......

The line _remote_thread: marks the beginning of the assembly dump for the remote_thread routine, and the line containing the ret statement indicates the end of the routine. Open s.asm in VS.NET 2002/2003/2005, and delete everything except the stuff that falls between _remote_thread: and ret. Delete the line containing _remote_thread: as well, so you end up with something that looks like this:

  00000000: 55                 push        ebp
  00000001: 8B EC              mov         ebp,esp
  00000003: 83 EC 10           sub         esp,10h
  00000006: 53                 push        ebx
  00000007: C7 45 F0 FF FF FF  mov         dword ptr [ebp-10h],0FFFFFFFFh
            FF
  0000000E: 8B 45 F0           mov         eax,dword ptr [ebp-10h]

...... more stuff like this ......

  000000D2: C3                 ret

What you're left with is a file that can be divided into three logical columns.

All the stuff till and including the ':' is the byte-offset for that instruction. So when you see the number '00000003', it indicates that that instruction is 3 bytes away from the beginning of the routine.
This is followed by the machine code for that instruction with one or more trailing white space characters.
All the stuff after the machine code is the assembly instruction.

We are interested in the second column, which we extract out using some nifty regular expressions in Visual Studio's Find/Replace dialog. Open the Find/Replace dialog in Visual Studio, and run the following expressions in the given order:

Sl. No.	Find	Replace	Description
1	`[0-9A-F]+\::b+{[0-9A-F:b]+}.*`	`\1`	Strips the first and the third column from the file.
2	`^:b+`	nothing	Removes leading white space. Ensure that there is absolutely nothing in the "Replace with" textbox.
3	`:b+$`	nothing	Removes trailing white space. Again, ensure that there is absolutely nothing in the "Replace with" textbox.
4	`\n`	space	Removes all new line characters from the file so there's just one line in the file. Enter a single space character in the "Replace with" textbox.
5	space	`', '\\x`	Replaces all space characters with the literal `', '\x`.
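If you'd rather not depend on the editor's regular expression dialect, the second column can also be pulled out with a few lines of code. The sketch below is one possible way of doing it, not what the article's download does: the hypothetical parse_dumpbin_line skips the offset up to the first ':', then reads two-digit hex bytes separated by single spaces, and stops at the run of spaces that precedes the instruction text. It assumes the column layout shown in the dump above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hexadecimal digit; the caller has checked isxdigit(). */
static unsigned int hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return (unsigned int)(c - '0');
    return (unsigned int)(toupper((unsigned char)c) - 'A') + 10u;
}

/*
 * Hypothetical helper: copies the machine code bytes found on one
 * line of "dumpbin /disasm:bytes" output into `out` (at most `cap`
 * bytes) and returns how many were copied. Lines that carry no
 * bytes, such as labels and banners, yield 0.
 */
size_t parse_dumpbin_line(const char *line, unsigned char *out, size_t cap)
{
    const char *p, *colon;
    size_t n = 0;

    assert(line != NULL && out != NULL);

    /* skip the "offset:" column; continuation lines have none */
    colon = strchr(line, ':');
    p = colon ? colon + 1 : line;
    while (*p == ' ' || *p == '\t')
        p++;

    while (n < cap &&
           isxdigit((unsigned char)p[0]) && isxdigit((unsigned char)p[1]) &&
           (p[2] == ' ' || p[2] == '\0' || p[2] == '\r' || p[2] == '\n'))
    {
        out[n++] = (unsigned char)((hex_value(p[0]) << 4) | hex_value(p[1]));
        p += 2;
        if (p[0] == ' ' && p[1] != ' ')
            p++;        /* single space: another byte may follow */
        else
            break;      /* wider gap or end of line: bytes are done */
    }
    return n;
}
```

Feeding the lines between _remote_thread: and ret to this function one at a time, and appending the bytes it returns, produces the same sequence that the search-and-replace steps arrive at.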

Now, type this at the beginning of the line:

char shellcode[] = { '\x

And finally, type this at the end of the line:

' };

After all of this, you should end up with something that looks like this:

char shellcode[] = {
    '\x55', '\x8B', '\xEC', '\x83', '\xEC',
    '\x10', '\x53', '\xC7', '\x45', '\xF0',
    '\xFF', '\xFF', '\xFF', '\xFF', '\x8B',

    // ...... more stuff like this ......


    '\x5B', '\x8B', '\xE5', '\x5D', '\xC3'
};

Phew!

What Do We Do With this Shellcode?

On converting remote_thread into shellcode, we are left with something that looks like this (this is just representative shellcode, and not the one that got generated for the routine shown above):

char shellcode[] = {
    '\x55', '\x8B', '\xEC', '\x83', '\xEC',
    '\x10', '\x53', '\xC7', '\x45', '\xF0',
    '\xFF', '\xFF', '\xFF', '\xFF',   // replace these 4 bytes
                                      // with actual address

    '\x8B', '\x45', '\xF0', '\x8B', '\x48',
    '\x20', '\x89', '\x4D', '\xF4', '\x8B',
    '\x55', '\xF0', '\x8B', '\x42', '\x24',
    '\x89', '\x45', '\xFC', '\x6A', '\xFF', ... more shell code here

As it turns out in our case, the value 0xFFFFFFFF that we initialized the pointer remote with in remote_thread shows up in exactly the same form in the shellcode. Since we know where the entry point lives in the remote process, all we need to do is replace 0xFFFFFFFF in the shellcode with the actual pointer to the data before overwriting the entry point. Here's how this looks:

STARTUPINFO             si = { sizeof(si) };
PROCESS_INFORMATION     pi;
SELFDEL                 local;
DWORD                   data;
DWORD                   oldProt;    // receives the previous page protection
TCHAR                   szExe[MAX_PATH] = _T( "explorer.exe" );
DWORD                   process_entry;

//
// this shellcode self-deletes and then shows a messagebox
//

char shellcode[] = {
    '\x55', '\x8B', '\xEC', '\x83',
    '\xEC', '\x10', '\x53', '\xC7',
    '\x45', '\xF0',

    '\xFF', '\xFF', '\xFF', '\xFF',   // replace these 4 bytes
                                      // with actual address

    '\x8B', '\x45', '\xF0', '\x8B',
    '\x48', '\x20', '\x89', '\x4D',


    ... snipped lots of meaningless shellcode here! ...

    '\xFF', '\xD0', '\x5B', '\x8B',
    '\xE5', '\x5D', '\xC3'

};

//
// initialize the SELFDEL object
//

local.fnWaitForSingleObject     = (FARPROC)WaitForSingleObject;
local.fnCloseHandle             = (FARPROC)CloseHandle;
local.fnDeleteFile              = (FARPROC)DeleteFile;
local.fnSleep                   = (FARPROC)Sleep;
local.fnExitProcess             = (FARPROC)ExitProcess;
local.fnRemoveDirectory         = (FARPROC)RemoveDirectory;
local.fnGetLastError            = (FARPROC)GetLastError;
local.fnLoadLibrary             = (FARPROC)LoadLibrary;
local.fnGetProcAddress          = (FARPROC)GetProcAddress;

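//
// launch the host process in the suspended state
//

CreateProcess( NULL, szExe, NULL, NULL, FALSE,
               CREATE_SUSPENDED, NULL, NULL, &si, &pi );

//
// Give remote process a copy of our own process handle;
// SYNCHRONIZE is the access right WaitForSingleObject needs
//

DuplicateHandle(GetCurrentProcess(), GetCurrentProcess(),
    pi.hProcess, &local.hParent, SYNCHRONIZE, FALSE, 0);
GetModuleFileName(0, local.szFileName, MAX_PATH);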

//
// get the process's entry point address
//

process_entry = GetProcessEntryPointAddress( pi.hProcess, pi.hThread );

//
// replace the address of the data inside the
// shellcode (bytes 10 to 13)
//

data = process_entry + sizeof( shellcode );
shellcode[13] = (char)( data >> 24 );
shellcode[12] = (char)( ( data >> 16 ) & 0xFF );
shellcode[11] = (char)( ( data >> 8 ) & 0xFF );
shellcode[10] = (char)( data & 0xFF );

//
// copy our code+data at the exe's entry-point
//

VirtualProtectEx( pi.hProcess,
                  (PVOID)process_entry,
                  sizeof( local ) + sizeof( shellcode ),
                  PAGE_EXECUTE_READWRITE,
                  &oldProt );
WriteProcessMemory( pi.hProcess,
                    (PVOID)process_entry,
                    shellcode,
                    sizeof( shellcode ), 0);
WriteProcessMemory( pi.hProcess,
                    (PVOID)data,
                    &local,
                    sizeof( local ), 0);

//
// Let the process continue
//

ResumeThread(pi.hThread);

There! That's all there is to it.

Please find the code for a self-deleting executable (that among other things, also displays a message box from the remote process' hijacked entry point) here.
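A word on the hard-coded offsets 10 to 13 used above: they hold only for as long as the compiler keeps emitting the same prologue. Here's a sketch of a slightly more forgiving alternative that searches the shellcode for the placeholder instead. The helper patch_placeholder is hypothetical and not part of the article's download. It assumes that the first run of four 0xFF bytes is the placeholder, which may not be true for every routine, so it's worth checking the returned offset against the one you expect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: finds the first run of four 0xFF bytes in
 * `code`, overwrites it with `address` in little-endian byte order,
 * and returns the offset that was patched, or -1 when the buffer
 * holds no such run.
 */
long patch_placeholder(unsigned char *code, size_t size, uint32_t address)
{
    size_t i;

    assert(code != NULL);

    for (i = 0; i + 4 <= size; i++)
    {
        if (code[i]     == 0xFF && code[i + 1] == 0xFF &&
            code[i + 2] == 0xFF && code[i + 3] == 0xFF)
        {
            code[i]     = (unsigned char)( address        & 0xFF);
            code[i + 1] = (unsigned char)((address >> 8)  & 0xFF);
            code[i + 2] = (unsigned char)((address >> 16) & 0xFF);
            code[i + 3] = (unsigned char)((address >> 24) & 0xFF);
            return (long)i;
        }
    }
    return -1;
}
```

Run against the first fourteen bytes of the shellcode shown earlier, this reports offset 10, which matches the bytes patched by hand in the listing above.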

Conclusion

You'll probably never have to write a program that deletes itself in your career, but there are a few nifty, if somewhat esoteric, tricks in there, eh?! One commercial program that does this sort of thing is the helper program that you download when you use the Copilot service. However, I suspect that program uses a different technique that is probably a lot more straightforward and a gazillion times less interesting ;). Briefly, an implementation note in Copilot's functional specification document states that self-deletion is a simple matter of:

embedding a small EXE as a resource into your main program
extracting this EXE to a temporary location, passing FILE_FLAG_DELETE_ON_CLOSE to CreateFile
and having the small EXE wait till the main EXE exits, before deleting it

The FILE_FLAG_DELETE_ON_CLOSE flag should cause the OS to delete the small EXE when all open handles to it are closed. That's really neat, but hey, where's the fun in that?!

Revision history

January 07, 2007: Article first published
January 27, 2010: Fixed typo in the search/replace strings for generating shellcode
