Introduction

After publishing my last article ([2]), which explained how to emulate some missing Windows functions used for remote code execution, the next logical step was to use these functions as a framework for a library that makes remote code injection easy. Remote code injection is the technique of executing code within the address space of a process other than the current one. Because the Windows architecture isolates each process to protect it against memory overwrites and other application bugs, injecting code into a remote process is not straightforward. This library implements functions that allow direct remote code injection, remote DLL injection and remote subclassing for Win32 processes (GUI and CUI) and NT native processes. Don't expect to find any innovative code, as this library is mainly based on the techniques described by Robert Kuster in his article "Three ways to inject your code into another process" ([1]). Nevertheless, I hope that you'll find the library useful and use it in your projects.

Remote SEH (Structured Exception Handling)

All remote code execution is protected by SEH so that an exception cannot crash the remote process. The SEH code you normally find in a C/C++ application looks like the following:

__try 
{
    // try code
}
__except(filter-expression)
{
    // except code
}

You cannot use this construct in remote code because it is the compiler's implementation of SEH, which internally calls standard library functions (__except_handler3) that reside in the current process. You need to use system-level SEH ([6]) instead. System-level SEH is implemented as a per-thread linked list of callback exception handler functions. A pointer to the beginning of this list can be retrieved from the first DWORD of the TIB (Thread Information Block). The FS segment register always points to the current TIB. To implement SEH, all that is needed is to add an exception handler to the linked list. In the simplest form this can be accomplished with the following code:

push addr _exception_handler  ; Addr. of our exception handler
push dword ptr fs:[0]         ; Addr. of previous handler
mov  fs:[0], esp              ; Add it to the list

; try code goes here

pop  dword ptr fs:[0]         ; Remove our handler
add  esp, 4                   ; Clean up stack

Every time an exception in the try code block occurs, the operating system calls the _exception_handler routine. In the simplest form, only two DWORDs (which make up an EXCEPTION_REGISTRATION structure) must be pushed on the stack. Of course nothing prevents us from adding additional data fields to this structure (VC, for example, pushes an extended EXCEPTION_REGISTRATION structure containing five fields). In my implementation, I'm adding two fields to the standard SEH frame: the value of the EBP register and the address where the execution should resume after the exception occurs. The final code will look like this (You'll notice that the code is written in assembly. I used assembly for two reasons: assembly permits a greater control of the generated code and only in assembly is it possible to access the FS register):

; Set a new SEH frame
push ebp                         ; EBP at safe-place (needed for ENTER/LEAVE)
push addr _resume_at_safe_place  ; Addr. of safe-place
push addr _exception_handler     ; Addr. of our exception handler
push dword ptr fs:[0]            ; Addr. of previous handler
mov  fs:[0], esp                 ; Install new SEH handler

; ... try code ...

_resume_at_safe_place:
; Remove SEH frame
pop dword ptr fs:[0]             ; Remove SEH handler
add esp, 3*4                     ; Remove additional data from stack

EXCEPTION_DISPOSITION __cdecl _exception_handler(
         struct _EXCEPTION_RECORD                 *ExceptionRecord,
         struct _EXTENDED_EXCEPTION_REGISTRATION  *EstablisherFrame,
         struct _CONTEXT                          *ContextRecord,
         void                                     *DispatcherContext)
{
    ContextRecord->cx_Eax = ExceptionRecord->ExceptionCode;
    ContextRecord->cx_Eip = EstablisherFrame->SafeExit;
    ContextRecord->cx_Ebp = EstablisherFrame->SafeEBP;
    ContextRecord->cx_Esp = EstablisherFrame;

    return ExceptionContinueExecution;
}

The _exception_handler restores the EBP register, sets the EAX register to the exception code, and resumes execution at _resume_at_safe_place. The complete source code can be found in the file "Stub.asm".
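
To make the dispatch mechanism more concrete, here is a portable C model of the handler list. It is only a sketch: it has no access to FS:[0] or to a real CONTEXT, and the names seh_frame, seh_push(), seh_pop(), seh_dispatch() and the simplified handler signature are assumptions made for illustration, not part of the library.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the Windows types (hypothetical, illustration only). */
typedef enum {
    ExceptionContinueExecution = 0,
    ExceptionContinueSearch    = 1
} disposition_t;

typedef struct { uint32_t eax, eip, ebp; } context_t;

struct seh_frame;
typedef disposition_t (*seh_handler_t)(uint32_t code,
                                       struct seh_frame *frame,
                                       context_t *ctx);

/* The two standard fields followed by the two extra fields the article adds. */
typedef struct seh_frame {
    struct seh_frame *prev;      /* previous handler (old value of fs:[0]) */
    seh_handler_t     handler;   /* address of the exception handler       */
    uint32_t          safe_exit; /* where execution should resume          */
    uint32_t          safe_ebp;  /* EBP to restore at the safe place       */
} seh_frame;

/* Models the first DWORD of the TIB: head of the per-thread handler list. */
static seh_frame *seh_head = NULL;

static void seh_push(seh_frame *f) { f->prev = seh_head; seh_head = f; }
static void seh_pop(void)          { if (seh_head) seh_head = seh_head->prev; }

/* Same idea as _exception_handler: patch the context and resume.
   The real handler also sets ESP to the frame address; this model
   has no stack pointer, so that step is left out. */
static disposition_t resume_handler(uint32_t code, seh_frame *frame, context_t *ctx)
{
    ctx->eax = code;
    ctx->eip = frame->safe_exit;
    ctx->ebp = frame->safe_ebp;
    return ExceptionContinueExecution;
}

/* A handler that declines, so the search continues with the previous frame. */
static disposition_t pass_handler(uint32_t code, seh_frame *frame, context_t *ctx)
{
    (void)code; (void)frame; (void)ctx;
    return ExceptionContinueSearch;
}

/* Walk the list from the most recent frame until a handler accepts.
   Returns 1 if some handler resumed execution, 0 if none did. */
static int seh_dispatch(uint32_t code, context_t *ctx)
{
    for (seh_frame *f = seh_head; f != NULL; f = f->prev)
        if (f->handler(code, f, ctx) == ExceptionContinueExecution)
            return 1;
    return 0;
}
```

In this model, pushing a frame that uses resume_handler and then dispatching an exception code leaves that code in ctx.eax and the frame's safe_exit value in ctx.eip, which mirrors what the assembly stub relies on after an exception.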

GetProcessInfo()

The GetProcessInfo() function returns valuable information about a process needed to decide what type of injection can be performed in this process. The following information is returned:

OS family: Windows 9x (95, 98, Me) or Windows NT (3, 4, 2000, XP, Vista, 7)
Process is invalid: DOS, 16-bit, system, other
Process is being debugged
Process has not yet finished its initialization
Protected process

OS family

This information is necessary because the injection algorithms are different for the Windows 9x (95, 98, Me) and NT (3, 4, 2000, XP, Vista, 7) families. The information is returned directly by a call to GetVersionEx():

OSVERSIONINFO osvi;
osvi.dwOSVersionInfoSize = sizeof(OSVERSIONINFO);
GetVersionEx(&osvi);

fWin9x = (osvi.dwPlatformId == VER_PLATFORM_WIN32_WINDOWS);
fWinNT = (osvi.dwPlatformId == VER_PLATFORM_WIN32_NT);

Invalid process

NT

An NT process is considered invalid if its exit code is not equal to 259 (hex 0x103) or if it doesn't have a PEB (Process Environment Block) (i.e., a system process).

PROCESS_BASIC_INFORMATION pbi;
NtQueryInformationProcess(hProcess,
                          ProcessBasicInformation,
                          &pbi,
                          sizeof(pbi),
                          NULL);

fINVALID = ((pbi.ExitStatus != 0x103) || 
            (pbi.PebBaseAddress == NULL));

9x

A Win9x process is invalid if its exit code is not 259 (hex 0x103), if it is a DOS or 16-bit process, or if it is in a termination state.

#define fINVALIDPROCFLAGS (fTerminated | fTerminating | \
                  fNearlyTerminating | fDosProcess | fWin16Process)
PDB *pPDB = GetPDB(dwPID);

fINVALID = ((pPDB->TerminationStatus != 0x103) || 
            (pPDB->Flags & fINVALIDPROCFLAGS));

Process is being debugged

NT

If either the ProcessDebugPort or the PEB BeingDebugged field is non-zero then the NT process is being debugged.

PROCESS_BASIC_INFORMATION pbi;
DWORD_PTR                 DebugPort;  // ProcessDebugPort returns a pointer-sized value
PEB_NT                    PEB, *pPEB;

NtQueryInformationProcess(hProcess,
                          ProcessDebugPort,
                          &DebugPort,
                          sizeof(DebugPort),
                          NULL);

NtQueryInformationProcess(hProcess,
                          ProcessBasicInformation,
                          &pbi,
                          sizeof(pbi),
                          NULL);

pPEB = pbi.PebBaseAddress;
ReadProcessMemory(hProcess, pPEB, &PEB, sizeof(PEB), NULL);

fDEBUGGED = DebugPort || PEB.BeingDebugged;

9x

If the PDB (Process Database) Debug Context pointer is non-NULL or the fDebugSingle bit of the PDB flags is set, then the Win9x process is being debugged.

PDB *pPDB = GetPDB(dwPID);

fDEBUGGED =  ((pPDB->DebuggeeCB != NULL) || 
              (pPDB->Flags & fDebugSingle));

Process is not initialized

NT

If the LdrData or LoaderLock fields of the PEB are NULL then the NT process is not initialized. Both fields are set by the NT loader user-mode APC routine LdrpInitialize() while initializing the process.

fNOTINITIALIZED = (PEB.LdrData == NULL || PEB.LoaderLock == NULL);

9x

The Win9x process is initialized only if the last DWORD of the main thread stack is below 2GB (0x80000000) ([3]).

PDB *pPDB = GetPDB(dwPID);
DWORD *pThreadHead = pPDB->ThreadList;
THREADLIST *pThreadNode = (THREADLIST *)*pThreadHead;
TDB *pTDB = pThreadNode->pTDB;
void *pvStackUserTop = pTDB->tib.pvStackUserTop;
pvStackUserTop = (DWORD *)((DWORD)pvStackUserTop - sizeof(DWORD));
DWORD StackUserTopContents;
ReadProcessMemory(hProcess, pvStackUserTop, &StackUserTopContents, 
                              sizeof(StackUserTopContents), NULL);

fNOTINITIALIZED = ((int)StackUserTopContents < 0);
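
The cast to int in the last line is a compact way of asking whether the value is at or above 0x80000000. The sketch below spells out the same test in two equivalent forms; the function names are hypothetical and the code only models the comparison, not the memory reads.

```c
#include <stdint.h>

/* Returns 1 when the DWORD read from the top of the stack is at or above
   2GB (0x80000000), i.e. the process is treated as not yet initialized. */
static int is_not_initialized_9x(uint32_t stack_top_contents)
{
    return stack_top_contents >= UINT32_C(0x80000000);
}

/* The same test expressed as a sign-bit check, which is what the
   (int) cast in the article's code amounts to on a 32-bit target. */
static int is_not_initialized_9x_signbit(uint32_t stack_top_contents)
{
    return (stack_top_contents >> 31) != 0;
}
```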

Protected process

Windows Vista introduced a new type of process, called a protected process. The following operations cannot be performed on a protected process: injecting a thread, accessing the virtual memory, debugging the process, duplicating a handle, or changing the quota or working set. Therefore remote injection is not possible in protected processes. Use the following code to detect a protected process:

HANDLE                             hProcess;
PROCESS_EXTENDED_BASIC_INFORMATION ExtendedBasicInformation;
	
// Get process handle (note the PROCESS_QUERY_LIMITED_INFORMATION access !)
hProcess = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, FALSE, dwPID);

// Get process Extended Basic Info
ExtendedBasicInformation.Size = sizeof(PROCESS_EXTENDED_BASIC_INFORMATION);
NtQueryInformationProcess(hProcess,
                          ProcessBasicInformation,
                          &ExtendedBasicInformation,
                          sizeof(ExtendedBasicInformation),
                          NULL);

fPROTECTED = ExtendedBasicInformation.IsProtectedProcess;

Subsystem

This is the type of subsystem the process uses for its user interface. It's the same as the Subsystem field found in the PE Header of the file on disk (and of the Module Header in memory).

NT

In NT the subsystem type can be directly retrieved from the PEB ImageSubsystem field:

Subsystem = PEB.ImageSubsystem;

9x

The subsystem type can be retrieved from the module's header Subsystem field. To locate the module's header in memory we can use the Kernel32 GetModuleHandle() function or the MTEModTable. The pointer to the NT header is obtained from the pNTHdr field of the IMTE (Internal Module Table Entry). The IMTE address is obtained from the MTEModTable using the PDB MTEIndex field as an index ([4] chapter 3 details all these structures and explains the hack needed to obtain the address of the MTEModTable from the Kernel32 GDIReallyCares() function).

#define GDIREALLYCARES_ORDINAL 23  // 0x17

HMODULE hKernel32 = GetModuleHandle("Kernel32.dll");
void *pGDIReallyCares = _GetProcAddress(hKernel32, GDIREALLYCARES_ORDINAL);
int GDIReallyCaresLength = GetProcLength(hKernel32, GDIREALLYCARES_ORDINAL);

// Search for MOV ECX,[addr] (8B,0D,...) inside GDIReallyCares() function
BYTE *p = MemSearch(pGDIReallyCares, GDIReallyCaresLength, "\x8B\x0D", 2);
IMTE **pMTEModTable = (IMTE **)*(DWORD *)*(DWORD *)(p+2);

PDB *pPDB = GetPDB(dwPID);
IMTE *pIMTE = pMTEModTable[pPDB->MTEIndex];
PIMAGE_NT_HEADERS32 pNTHeader = pIMTE->pNTHdr;

Subsystem = pNTHeader->OptionalHeader.Subsystem;
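
The article does not show MemSearch(), so the following is only a guess at what such a helper might look like, together with a bounds-checked way of reading the 32-bit operand that follows the 8B 0D opcode. The names mem_search() and read_mov_ecx_operand() are hypothetical. Note also that a raw byte search can match inside another instruction, so a real implementation may need more care.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for MemSearch(): returns a pointer to the first
   occurrence of pattern in buf, or NULL if it does not occur. */
static const uint8_t *mem_search(const uint8_t *buf, size_t buf_len,
                                 const uint8_t *pattern, size_t pat_len)
{
    if (pat_len == 0 || pat_len > buf_len)
        return NULL;
    for (size_t i = 0; i + pat_len <= buf_len; i++)
        if (memcmp(buf + i, pattern, pat_len) == 0)
            return buf + i;
    return NULL;
}

/* Finds MOV ECX,[addr] (8B 0D imm32) and stores the little-endian imm32
   in *operand. Returns 1 on success, 0 if the opcode is missing or the
   operand would run past the end of the buffer. */
static int read_mov_ecx_operand(const uint8_t *code, size_t len, uint32_t *operand)
{
    static const uint8_t mov_ecx[] = { 0x8B, 0x0D };
    const uint8_t *p = mem_search(code, len, mov_ecx, sizeof mov_ecx);

    if (p == NULL || (size_t)(p - code) + 6 > len)
        return 0;

    *operand = ((uint32_t)p[2])
             | ((uint32_t)p[3] << 8)
             | ((uint32_t)p[4] << 16)
             | ((uint32_t)p[5] << 24);
    return 1;
}
```

Unlike the snippet above, this version checks for a failed search before dereferencing, which seems prudent given that the layout of GDIReallyCares() is not documented.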

RemoteExecute()

The RemoteExecute() function executes code in the context of a remote process. It accepts 7 parameters:

hProcess: Handle of the remote process.
ProcessFlags: Returned by GetProcessInfo(). Can be zero.
Function: Thread function that will be executed within the remote process context. The thread function is protected against exceptions by SEH.
pData: Memory block that will be copied to the remote process address space. Can be NULL.
Size: Size of the pData block. If zero is specified pData is treated as a DWORD.
dwTimeout: Timeout in milliseconds used in wait functions. Can be INFINITE.
ExitCode: Pointer to a DWORD that will receive the remote code exit status.

The following steps are executed by RemoteExecute() (see [1]):

If ProcessFlags is zero then call GetProcessInfo().
Check if the function code is safe to be relocated (no calls or absolute addressing) and calculate its length. Note that this check is not 100% reliable! You should write relocatable code and inspect the generated code.
Allocate a remote memory block and copy the function code to it.
If a data block is specified, allocate a remote memory block and copy the data to it.
Allocate a remote memory block and copy the stub code to it (see file "Stub.asm"). The stub code will set an SEH frame and call the user thread function. The special native process exit is also handled by this code.
According to the ProcessFlags it will run the remote code using one of the available methods: CreateRemoteThread(), RtlCreateUserThread() or NtQueueApcThread().
Wait for remote code to finish using WaitForSingleObject(hThread) or check the Finished flag set by the stub code.
If a data block was specified, read back the data from the remote memory block.
Cleanup and return error code.

Depending on the ProcessFlags a different remote code execution method must be used:

Win32 initialized process

Use the CreateRemoteThread() function to execute the remote code (this function doesn't exist in Win9x, so it must be emulated; see [2]). Starting with Windows Vista, CreateRemoteThread() will fail if the target process is in a different session than the calling process. The solution to this limitation is to use the undocumented NtCreateThreadEx() function on Windows Vista and 7 ([8]). Wait for the remote code to finish by calling WaitForSingleObject() on the returned thread handle, and get the remote exit code by calling GetExitCodeThread().

Win32 non-initialized process

What you can do in a non-initialized process is very limited (you cannot assume that the system internal structures are initialized, that the DLLs are loaded, and so on), so you should be extremely careful when injecting code into this type of process. It's advisable to wait until the process finishes its initialization. For GUI processes, this can be accomplished by using the WaitForInputIdle() function, but unfortunately there's no equivalent function for the other types of processes. Another possible technique involves setting a breakpoint at the process entry point (this makes it possible to detect when the system part of the process initialization has finished).

9x

Just set a bit in the dwCreationFlags parameter of CreateRemoteThread() that internally prevents the THREAD_ATTACH message from being sent before PROCESS_ATTACH (see [3]).

NT

The NtQueueApcThread() function is used to queue an APC routine (our remote code) on an existing remote thread. The APC routine will run as soon as the thread enters an alertable state. We cannot use wait functions on the thread on which the APC was queued, so to detect that the remote code has finished we poll the Finished flag set by the remote stub code. We also cannot use GetExitCodeThread() to get the remote exit code (it would return the exit status of the "hijacked" thread), so we always set the exit code to zero (of course, we could save the exit status in a variable and read it later, as we do with the Finished flag).
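
The polling of the Finished flag can be pictured with the following portable sketch. It is an assumption about the shape of such a loop, not the library's code: wait_finished(), the two callback types and the fake_remote helper are hypothetical. In a real implementation the read callback would presumably wrap ReadProcessMemory() and the sleep callback would wrap Sleep().

```c
#include <stdint.h>

typedef int  (*read_flag_fn)(void *ctx);           /* non-zero once Finished is set */
typedef void (*sleep_fn)(void *ctx, uint32_t ms);  /* pause between two polls       */

#define POLL_INFINITE 0xFFFFFFFFu  /* same value as the Win32 INFINITE constant */

/* Polls the flag until it is set or timeout_ms has elapsed.
   Returns 1 if the remote code finished, 0 on timeout. */
static int wait_finished(read_flag_fn read_flag, sleep_fn sleep_ms, void *ctx,
                         uint32_t timeout_ms, uint32_t interval_ms)
{
    uint64_t waited = 0;  /* 64-bit so the counter cannot wrap around */

    if (interval_ms == 0)
        interval_ms = 1;  /* avoid a loop that never makes progress */

    for (;;) {
        if (read_flag(ctx))
            return 1;
        if (timeout_ms != POLL_INFINITE && waited >= timeout_ms)
            return 0;
        sleep_ms(ctx, interval_ms);
        waited += interval_ms;
    }
}

/* A fake "remote process" used to exercise the loop without Windows:
   its Finished flag becomes set once enough simulated time has passed. */
typedef struct { uint32_t elapsed_ms; uint32_t finishes_at_ms; } fake_remote;

static int fake_read(void *ctx)
{
    fake_remote *r = (fake_remote *)ctx;
    return r->elapsed_ms >= r->finishes_at_ms;
}

static void fake_sleep(void *ctx, uint32_t ms)
{
    fake_remote *r = (fake_remote *)ctx;
    r->elapsed_ms += ms;
}
```

Passing the callbacks in keeps the loop itself free of any operating system dependency, which is why it can be exercised here with a simulated clock.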

NT native process

To run remote code in an NT native process, the RtlCreateUserThread() function is used. WaitForSingleObject() and GetExitCodeThread() can be used on the returned thread handle. Note that native remote code requires a different thread exit sequence, which is handled by the remote stub code. The sequence used for the native exit is the equivalent of the Kernel32 ExitThread() function, but for native processes:

Call LdrShutdownThread() to notify all DLLs on thread exit.
Release the thread stack by calling NtFreeVirtualMemory(). Note that before releasing the stack we must switch to a temporary stack. The UserReserved area within the TEB is used for this purpose.
Terminate the thread by calling NtTerminateThread().
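
The choice between the three execution methods described above can be summarized as a small decision function. The flag names and bit values below are hypothetical, since the article does not list the actual ProcessFlags constants, and the order in which the native and non-initialized cases are tested is also an assumption.

```c
/* Hypothetical flag bits; the real ProcessFlags values are not shown in the article. */
enum {
    PF_WIN9X           = 1u << 0,
    PF_INVALID         = 1u << 1,
    PF_DEBUGGED        = 1u << 2,
    PF_NOT_INITIALIZED = 1u << 3,
    PF_PROTECTED       = 1u << 4,
    PF_NATIVE          = 1u << 5
};

typedef enum {
    METHOD_NONE,                    /* injection not possible             */
    METHOD_CREATE_REMOTE_THREAD,    /* Win32 process (emulated on Win9x)  */
    METHOD_RTL_CREATE_USER_THREAD,  /* NT native process                  */
    METHOD_NT_QUEUE_APC_THREAD      /* NT Win32 process, not initialized  */
} exec_method_t;

static exec_method_t choose_method(unsigned flags)
{
    if (flags & (PF_INVALID | PF_PROTECTED))
        return METHOD_NONE;
    if (flags & PF_WIN9X)
        /* Non-initialized Win9x processes still use CreateRemoteThread(),
           only with an extra creation flag. */
        return METHOD_CREATE_REMOTE_THREAD;
    if (flags & PF_NATIVE)
        return METHOD_RTL_CREATE_USER_THREAD;
    if (flags & PF_NOT_INITIALIZED)
        return METHOD_NT_QUEUE_APC_THREAD;
    return METHOD_CREATE_REMOTE_THREAD;
}
```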

InjectDll()

The InjectDll() function loads a DLL into the address space of a remote process. It accepts 5 parameters:

hProcess: Handle of the remote process.
ProcessFlags: Returned by GetProcessInfo(). Can be zero.
szDllPath: Path of the DLL to load. ANSI/Unicode strings can be passed to InjectDllA()/InjectDllW().
dwTimeout: Timeout in milliseconds used in wait functions. Can be INFINITE.
hRemoteDll: Pointer to an HINSTANCE variable that will receive the loaded DLL handle.

InjectDll() just initializes the data block needed by the remote code and uses RemoteExecute() to run the function RemoteInjectDll() remotely.

DWORD WINAPI RemoteInjectDll(RDATADLL *pData)
{
    return (pData->hRemoteDll = pData->LoadLibrary(pData->szDll));
}

RemoteInjectDll() will run in the address space of the remote process and calls LoadLibrary() to load the specified DLL within the address space of the remote process. The handle of the loaded DLL is returned.

EjectDll()

The EjectDll() function unloads a DLL from the address space of a remote process. It accepts 5 parameters:

hProcess: Handle of the remote process.
ProcessFlags: Returned by GetProcessInfo(). Can be zero.
szDllPath: Path of the DLL to unload. ANSI/Unicode strings can be passed to EjectDllA()/EjectDllW(). Can be NULL.
hRemoteDll: If szDllPath is NULL the hRemoteDll parameter is used as the DLL handle.
dwTimeout: Timeout in milliseconds used in wait functions. Can be INFINITE.

EjectDll() initializes the data block needed by the remote function and uses RemoteExecute() to run the function RemoteEjectDll() remotely.

DWORD WINAPI RemoteEjectDll(RDATADLL *pData)
{
    if (pData->szDll[0] != '\0')
        pData->hRemoteDll = pData->GetModuleHandle(pData->szDll);

    do {
        pData->Result = pData->FreeLibrary(pData->hRemoteDll);
    } while (pData->Result);

    return 0;
}

RemoteEjectDll() runs in the address space of the remote process and calls FreeLibrary() to unload the specified DLL. FreeLibrary() is called as many times as necessary to decrease the reference count to zero. If the DLL name is specified, GetModuleHandle() is used to retrieve the DLL handle needed by FreeLibrary().
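
The behaviour of that loop is easy to see with a fake module that keeps its own reference count. The sketch below does not call the real FreeLibrary(); fake_module, fake_free_library() and eject_fully() are hypothetical, and the max_calls cap is a defensive addition that is not present in RemoteEjectDll().

```c
#include <stddef.h>

/* Fake module whose counter mimics the per-process DLL reference count. */
typedef struct { int ref_count; } fake_module;

/* Mimics the FreeLibrary() contract: non-zero on success, zero once the
   module is no longer loaded (or the handle is invalid). */
static int fake_free_library(fake_module *m)
{
    if (m == NULL || m->ref_count <= 0)
        return 0;
    m->ref_count--;
    return 1;
}

/* Same loop as RemoteEjectDll(), with an upper bound on the number of
   calls. Returns how many calls succeeded. */
static int eject_fully(fake_module *m, int max_calls)
{
    int calls = 0;
    while (calls < max_calls && fake_free_library(m))
        calls++;
    return calls;
}
```

A module loaded three times needs three successful calls before the fourth one fails and ends the loop, which is the condition the do/while in RemoteEjectDll() relies on.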

StartRemoteSubclass()

The StartRemoteSubclass() function subclasses a remote window (i.e., changes a remote process window procedure). It accepts 2 parameters:

rd: Pointer to a RDATA structure defined as:

typedef struct _RDATA {
  int             Size;              // Size of structure
  HANDLE          hProcess;          // Process handle
  DWORD           ProcessFlags;      // Process flags
  DWORD           dwTimeout;         // Timeout
  HWND            hWnd;              // Window handle
  struct _RDATA   *pRDATA;           // Pointer to RDATA structure
  WNDPROC         pfnStubWndProc;    // Address of stub window handler
  USERWNDPROC     pfnUserWndProc;    // Address of user's window
                                     // procedure handler
  WNDPROC         pfnOldWndProc;     // Address of old window handler
  LRESULT         Result;            // Result from user's
                                     // window procedure handler
  SETWINDOWLONG   pfnSetWindowLong;  // Address of SetWindowLong()
  CALLWINDOWPROC  pfnCallWindowProc; // Address of CallWindowProc()
} RDATA;

If you need to pass extra data to the new window procedure handler, it must be appended to the existing RDATA. Before calling StartRemoteSubclass(), the following fields of the RDATA structure must be initialized: Size must contain the size of the RDATA structure plus any appended data, hProcess must contain the handle of the remote process, and hWnd must contain the handle of the window to be subclassed. The extra fields of the appended data should also be initialized at this point. All the remaining fields should be considered private and not used.

WndProc: User window procedure that will handle the subclassed window messages. It's defined as:

typedef LRESULT (WINAPI* USERWNDPROC)(RDATA *, HWND, UINT, WPARAM, LPARAM);

Except for the first parameter (a pointer to the RDATA structure), the remaining parameters are the usual window handle, message type, wParam and lParam found in any window procedure handler. The new window procedure handler will be called by Windows every time a message to the window must be processed, so the function should be coded as a "normal" window procedure handler (with the switch(Msg) statement). Please note that because this function will be executed in a remote process, it must follow the same rules as any remotely executed code. Any unhandled message should be processed by the default window procedure handler; for this, the function must return FALSE. If you want to process some messages yourself, store the return value in the Result field of the RDATA structure and return TRUE from the function. This function is protected from exceptions by a remote SEH frame.

StartRemoteSubclass() initializes the remaining RDATA fields and uses RemoteExecute() to remote execute the function RemoteStartSubclass():

DWORD WINAPI RemoteStartSubclass(RDATA *pData)
{
    return (pData->pfnOldWndProc = 
            pData->pfnSetWindowLong(pData->hWnd, 
                                    GWL_WNDPROC,
                                    pData->pfnStubWndProc));
}

RemoteStartSubclass() will run in the address space of the remote process and calls SetWindowLong() with the parameter GWL_WNDPROC to change the window procedure handler to a new window handler. This handler will be called by Windows every time a message to the window must be processed. The new window procedure handler (StubWndProc() of file "Stub.asm") sets an SEH frame and calls UserWndProc(). If UserWndProc() returns FALSE a call to CallWindowProc() allows the original window procedure to handle the message.
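
The TRUE/FALSE contract between the stub and the user procedure can be modelled in portable C as follows. This is a sketch only: the type names are stand-ins chosen so that it compiles without <windows.h>, the rdata structure contains just the fields the model needs, the SEH frame is left out, and the old procedure is called directly where the real stub uses CallWindowProc().

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-ins for the Windows types (hypothetical). */
typedef void     *HWND_T;
typedef unsigned  UINT_T;
typedef uintptr_t WPARAM_T;
typedef intptr_t  LPARAM_T;
typedef intptr_t  LRESULT_T;

struct rdata;
typedef int       (*user_wndproc_t)(struct rdata *, HWND_T, UINT_T, WPARAM_T, LPARAM_T);
typedef LRESULT_T (*wndproc_t)(HWND_T, UINT_T, WPARAM_T, LPARAM_T);

/* Only the RDATA fields this model needs. */
typedef struct rdata {
    int            size;       /* size of RDATA plus appended data    */
    user_wndproc_t user_proc;  /* plays the role of pfnUserWndProc    */
    wndproc_t      old_proc;   /* plays the role of pfnOldWndProc     */
    LRESULT_T      result;     /* value returned for handled messages */
} rdata;

/* Extra data appended after RDATA, as the article describes. */
typedef struct { rdata rd; int close_requests; } my_rdata;

#define MSG_CLOSE 0x0010u  /* numeric value of WM_CLOSE */

/* User handler: swallow WM_CLOSE and count it, leave the rest alone. */
static int my_user_proc(rdata *rd, HWND_T hwnd, UINT_T msg, WPARAM_T wp, LPARAM_T lp)
{
    my_rdata *self = (my_rdata *)rd;  /* valid: rd is the first member */
    (void)hwnd; (void)wp; (void)lp;

    switch (msg) {
    case MSG_CLOSE:
        self->close_requests++;
        rd->result = 0;
        return 1;  /* TRUE: handled, result is in rd->result */
    default:
        return 0;  /* FALSE: let the original procedure handle it */
    }
}

/* Stand-in for the window's original procedure. */
static LRESULT_T old_proc(HWND_T hwnd, UINT_T msg, WPARAM_T wp, LPARAM_T lp)
{
    (void)hwnd; (void)msg; (void)wp; (void)lp;
    return 42;
}

/* Model of the dispatch done by StubWndProc(), without the SEH frame. */
static LRESULT_T stub_dispatch(rdata *rd, HWND_T hwnd, UINT_T msg, WPARAM_T wp, LPARAM_T lp)
{
    if (rd->user_proc(rd, hwnd, msg, wp, lp))
        return rd->result;
    return rd->old_proc(hwnd, msg, wp, lp);
}
```

Placing the RDATA stand-in as the first member of my_rdata is what lets the user procedure recover its appended data from the pointer it receives.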

StopRemoteSubclass()

The StopRemoteSubclass() function restores the remote process original window handler. It accepts one parameter:

rd: This is the same RDATA structure passed to StartRemoteSubclass() and contains the needed data initialized by this function.

StopRemoteSubclass() releases the allocated memory and uses RemoteExecute() to remote execute the function RemoteStopSubclass():

DWORD WINAPI RemoteStopSubclass(RDATA *pData)
{
    return (pData->pfnSetWindowLong(pData->hWnd, 
            GWL_WNDPROC, pData->pfnOldWndProc));
}

RemoteStopSubclass() will run in the address space of the remote process and calls SetWindowLong() with parameter GWL_WNDPROC to restore the original window procedure handler.

Demo

Finally to demonstrate how to use the Injection Library exported functions, I wrote an application that lets you use all the injection methods on any running process (if applicable!). The application just fills a listview control with all running processes, and according to the user choices, injects code, a DLL, or subclasses a process window. From my tests, only the following processes couldn't be injected:

Windows 9x: 16-bit processes (they are considered invalid processes).
Windows NT: idle process (PID = 0), system process (PID = 4) and protected processes.

History

September 27, 2005: version 1.0 - Windows 95 to Windows XP.
November 1, 2011: version 2.0 - Updated for Vista, Windows 7.

References

[1] "Three ways to inject your code into another process" by Robert Kuster.
[2] "Remote Library" by António Feijão.
[3] "PrcHelp" by Radim Picha.
[4] "Windows 95 System Programming Secrets" by Matt Pietrek.
[5] "Windows NT/2000 Native API Reference" by Gary Nebbett.
[6] "A Crash Course on the Depths of Win32 Structured Exception Handling" by Matt Pietrek.
[7] "Enumerating Windows Processes" by Alex Fedotov.
[8] "Remote Thread Execution in System Process using NtCreateThreadEx for Vista & Windows 7" by SecurityXploded.
[9] "Process Hacker" by wj32.