Posted 21 Apr 2004

Kernel-mode API spying - an ultimate hack

An article on kernel-mode API spying.


After I published my article about process-wide API spying, I received plenty of encouraging messages - readers have generally accepted my model of hooking function calls. In this article, we will extend the model to kernel-mode spying, and hook the API calls that are made by our target device driver. We will also introduce a new way of communication between the kernel-mode driver and the user-mode application - instead of using system services, we will implement our own mini-version of Asynchronous Procedure Calls. This task is not as complicated as it may seem - in fact, it is shockingly easy. The Windows flat memory model offers us plenty of exciting opportunities - the only thing we need is a sense of adventure (plus a good knowledge of assembly language, of course). All tips and tricks described in this article are of my own design - I am not aware of anything similar published elsewhere.

I took into account the messages saying that I should have provided readers with the source code. Therefore, the source code for this article is available for download (please read the installation instructions at the end of the article).

Important: This article is based upon the bold assumption that you have read my previous article about process-wide API spying. If you haven't read it yet, go and do it now. This is an absolute must - otherwise, you will be unable to understand absolutely anything here, and the whole article will seem totally incoherent to you.


First of all, let's look at the complications arising from the fact that our spying activity is going to take place in kernel mode. To begin with, Windows is a protected system, which means user-mode applications have no access to the kernel address space. Therefore, the spying DLL described in the previous article cannot work in kernel mode - in order to start spying in kernel mode, we have to write not a DLL but a kernel-mode spying driver.

Our spying driver has to communicate with the user-mode controller application, and do it asynchronously. A virtual address in the lower 2 GB of the address space is meaningful only in the context of the process that owns it, and a hooked call may arrive in the context of any process. Besides, kernel-mode code cannot simply call user-mode functions, so our driver cannot send a window message to its controller application the way the user-mode spying DLL does, because the SendMessage() API function resides in the user-mode address space. In fact, even if our driver could call SendMessage(), doing so would be extremely unwise - SendMessage() does not return until the message gets processed, and kernel-mode code cannot afford to wait patiently for user-mode code to accomplish its task, unless we are desperate to bring the system down. Therefore, we must find some way of asynchronous communication between the spying driver and its controller application.

Furthermore, if we spy on user-mode modules, we can save the original return address, as well as other relevant information, in the thread local storage. However, our kernel-mode driver cannot call user-mode TlsGetValue() or TlsSetValue(). TLS-related system functions that can be called by kernel-mode code don't seem to exist. If you disassemble TlsAlloc(), TlsGetValue() or TlsSetValue(), you will see that none of these functions invokes INT 2Eh, i.e. managing TLS does not involve system services. Every user-mode thread in the system has its own stack and its own copy of CPU registers, and its FS register is mapped to the beginning of the section where thread-specific data is stored. This is how kernel32.dll implements thread local storage - everything is implemented purely by user-mode code, which cannot be called by our spying driver. Therefore, we have to find some other thread-safe way of saving the original return address, as well as all other relevant information.

In addition to the above, we should not forget that our spying code may run at any IRQL. Some system services may be called at any IRQL, and some are IRQL-dependent. Therefore, we have to take the current IRQL into account when we call certain system services. Furthermore, we must make sure that, if our spying code currently runs at high IRQL, it does not go anywhere close to the paged pool. If our spying code tries to access the paged memory while running at high IRQL, or calls any function that tries to do it, the “blue screen of death”, due to the page fault, is inevitable.

We should also keep in mind that kernel-mode code is interruptible. If an interrupt occurs while our spying code runs, the system will transfer control to the interrupt handler, which may call some API function that we have hooked. In other words, the system may interrupt the execution of our spying code only in order to execute exactly the same spying code. As a result, when our interrupted code continues its execution, it may find global variables and resources in a state very different from the one they were in before the interrupt occurred. We should always remember this, and disable interrupts whenever we expect global variables and resources to stay unchanged while our code runs - otherwise, we can get an unpleasant surprise (i.e., the "blue screen", which is the system's standard reaction to any mistake we make in kernel-mode code).

As you can see, in order to work in kernel mode, Prolog() and Epilog() need quite a few adjustments. ProxyProlog() and ProxyEpilog(), however, may stay as they are - the only thing they do is save and restore CPU registers and flags, which is done the same way in both kernel and user modes. We will start with "theoretical foundations" - first, we will look at how our custom asynchronous message queue and TLS can be implemented, and proceed to the actual task of spying afterwards.

Asynchronous Message Queuing

Look at the code below (I hope that, after having read my previous article, you are able to recognize a handcrafted indirect jump instruction):

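BYTE chunk1[32]; DWORD dw;
// FF 25 <addr32> encodes "jmp dword ptr [addr32]"
chunk1[0]=0xFF;chunk1[1]=0x25;int i=(int)&chunk1[6];
memmove(&chunk1[2],&i,4); // operand: address of the slot 6 bytes above the beginning
i=(int)&chunk1[0];memmove(&chunk1[6],&i,4); // slot: address of chunk1 itself
CreateThread(0,0,(LPTHREAD_START_ROUTINE)&chunk1[0],0,0,&dw);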

What does the newly created thread do? Absolutely nothing - at the beginning of chunk1 array, it finds the instruction to jump to the location whose address is stored 6 bytes above the beginning of chunk1 array. However, the only thing stored there is the address of chunk1 array itself. Therefore, the code is instructed to jump to its current position, where it finds the instruction to jump to its current position, etc., etc., etc... As a result, the thread gets into the processor-level version of an infinite loop, and is unable to move anywhere - it spins at its start address.

What is going to happen if, at some point, the program executes the following lines (addr is the variable, containing a pointer to the function which, for the sake of simplicity, takes no arguments):

//fill the array with the machine codes

// what is going to happen???

When the last line of the above code gets executed, the thread will break out of its infinite loop, and jump to the handcrafted code in chunk2 array. Why? Because the thread is instructed to jump to the location whose address is stored 6 bytes above the beginning of chunk1 array. We wrote the address of chunk1 array there, and, as a result, made the thread spin. If we write some other address 6 bytes above the beginning of chunk1 array, the code flow will jump to the new location.

The code in chunk2 array (starting from byte 10, i.e. the location to which our thread will jump) instructs the thread to push the address of chunk2 array on the stack, and to jump to the target function. Therefore, after the target function returns, the code flow will jump to the beginning of chunk2 array, where it will find the instruction to jump to the location whose address is stored 6 bytes above the beginning of chunk2 array. Since the only thing stored there is the address of chunk2 array, the thread starts spinning again.

We can repeat this sequence again and again with chunk3, chunk4, chunk5, etc. - the thread will break out of its infinite loop, execute the target function, enter an infinite loop, break out when the next chunk of instructions arrives, execute the target function, enter an infinite loop again, and so on. Therefore, we can schedule the target function for subsequent execution simply by writing data to the array, rather than by means of system-defined APC. We can do it whenever we wish - our thread never terminates. The only thing we need to know is the location where the thread currently spins, or, if it currently executes the target function, the location where it will spin after the target function returns - we have to write the address we want it to jump to 6 bytes above this location. In practical terms, it makes sense to allocate all these chunks from the same pool, and to use an offset index as a pointer into this pool. When the pool gets fully packed, we can reuse it - all we have to do is set the index to zero, and start over again.
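The scheme above can be modeled in portable C without any machine code. The following is only a sketch under stated assumptions: the `next` field stands in for the 4-byte slot stored 6 bytes above the beginning of each chunk, `pool_step()` stands in for one pass of the spinning thread, and all names (`Pool`, `pool_post`, `pool_step`, `demo_target`) are hypothetical, not taken from the article's source code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE  4096u   /* one page, as in the article */
#define CHUNK_SIZE 32u     /* one chunk of handcrafted instructions */

typedef void (*target_fn)(uint32_t arg);

/* Model of one chunk; `next` plays the role of the slot at byte 6. */
typedef struct {
    uint32_t  next;  /* offset to jump to; own offset means "spin here" */
    target_fn fn;    /* what the chunk runs when control arrives */
    uint32_t  arg;
} Chunk;

typedef struct {
    Chunk    chunks[POOL_SIZE / CHUNK_SIZE];
    uint32_t write_off;  /* offset index kept by the posting side */
    uint32_t spin_off;   /* where the simulated thread currently spins */
} Pool;

void pool_init(Pool *p) {
    p->write_off = 0;
    p->spin_off = 0;
    p->chunks[0].next = 0;   /* chunk 0 points at itself: the thread spins */
    p->chunks[0].fn = NULL;
    p->chunks[0].arg = 0;
}

/* Post a message: prepare the new chunk first, release the thread last. */
void pool_post(Pool *p, target_fn fn, uint32_t arg) {
    uint32_t prev = p->write_off;
    uint32_t cur = (prev + CHUNK_SIZE) % POOL_SIZE;  /* wrap and reuse */
    Chunk *c = &p->chunks[cur / CHUNK_SIZE];
    assert(fn != NULL);
    c->next = cur;           /* the new chunk spins on itself */
    c->fn = fn;
    c->arg = arg;
    p->write_off = cur;
    p->chunks[prev / CHUNK_SIZE].next = cur;  /* break the old spin */
}

/* One pass of the "thread": returns 1 if it ran a function, 0 if it spun. */
int pool_step(Pool *p) {
    Chunk *here = &p->chunks[p->spin_off / CHUNK_SIZE];
    Chunk *c;
    if (here->next == p->spin_off)
        return 0;            /* the slot still points at ourselves */
    p->spin_off = here->next;
    c = &p->chunks[p->spin_off / CHUNK_SIZE];
    c->fn(c->arg);           /* after returning, we spin at this chunk */
    return 1;
}

/* Hypothetical target used only to observe the effect. */
uint32_t demo_sum = 0;
void demo_target(uint32_t v) { demo_sum += v; }
```

Just like the real pool, this model has no protection against the writer lapping the reader: if more than a full pool of messages is posted before the thread consumes them, unprocessed chunks get overwritten.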

Let's say the thread runs in the user-mode process X. It is understandable that the function we want it to execute must reside in the address space of user-mode process X as well, but what about the code that actually fills the array with the machine instructions and makes the thread break out of its infinite loop? Where should it reside? It can reside anywhere - in the same module, in a different module of the same process, in another user-mode process, or even in a kernel-mode driver. Furthermore, this code can be concurrently executed by different threads in different processes. As long as this code has write access to the address space of the process X, it can post an asynchronous message that schedules the target function for execution. In order to do so, it does not need any system services - what it needs are the addresses of the target function and of the pool (as they are known to the process X), the ability to write data to this pool, plus the current value of the offset index. This approach is generic, and can be used whenever we need to post an asynchronous message to an application, but, for some reason, are unable to use system-defined APC and asynchronous message queuing.

Now, let's look at how we can save thread-specific data without using system-defined TLS.

Storing Thread-Specific Data

Prolog and epilog of any function that uses EBP register as a base pointer to its local variables look like the following:

push ebp        ; save ebp
mov ebp,esp
sub esp,XXX     ; allocate local variables
; do actual things...
mov esp,ebp
pop ebp         ; restore ebp

It is easy to see that, as a very first step, the function must save the value of EBP register on the stack, and restore it before returning control. This is an absolute must - otherwise, the caller will be unable to keep track of its local variables after the function returns. It is also easy to see that the function saves the value of EBP on the stack just one stack entry above the function's return address (i.e. at the next lower address, because the stack grows downwards), and then copies the current value of ESP into EBP register. Therefore, the current value of EBP register always points to the location that stores the previous value of EBP.

What does it have to do with spying? Look at how our spying code flow affects the original value of EBP, i.e. the value of EBP at the time when ProxyProlog() starts execution (I hope you remember our "spying team" from the previous article):

  • ProxyProlog() - does not change EBP value before calling Prolog()
  • Prolog() - Since Prolog() is not a naked routine, the compiler generates instructions to save the value of EBP above Prolog()'s return address, and to pop this value into EBP register before Prolog() returns (provided that frame-pointer omission is not enabled).
  • ProxyProlog() - does not change EBP value after Prolog() returns.
  • Actual callee - either saves the original value of EBP and restores it before returning control, or leaves EBP register intact. In any case, the original value of EBP is of no importance for the actual callee - it is only the client code that actually uses it.
  • ProxyEpilog() - does not change EBP value before calling Epilog()
  • Epilog() - Since Epilog() is not a naked routine, the compiler generates instructions to save the value of EBP above Epilog()'s return address...

Did you get it? The value stored above Prolog()'s return address is always going to be equal to the one stored above Epilog()'s return address!!! Furthermore, this value in itself is of no importance until the program flow returns to the client code. This is what we can take advantage of.

Prolog() can save the original value of EBP, along with the original return address and all other relevant information, in the Storage structure, and write the pointer to this structure just above its return address. Since the same pointer to the Storage structure is going to be stored above Epilog()'s return address, and the original value of EBP is available from this structure, Epilog() can write this value above its return address, so that it will get popped into EBP register before Epilog() returns. As a result, at the time when the program flow jumps to the client code, the value of EBP is going to be exactly the same as it was at the time when ProxyProlog() started execution.

Therefore, we can save the pointer to the Storage structure in the location pointed to by EBP register, rather than in the thread local storage - our spying code is going to stay thread-safe anyway. This approach is suitable for both kernel-mode and user-mode spying. Needless to say, in case of kernel-mode spying, the Storage structure must be allocated from non-paged pool - our spying code may run at high IRQL.
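The hand-off described above can be modeled in portable C. This is a sketch, not the article's Prolog() and Epilog(): a single `uintptr_t` variable stands in for the value that travels through EBP from the slot above Prolog()'s return address to the slot above Epilog()'s return address, and the names `FrameRecord`, `sim_prolog` and `sim_epilog` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the Storage structure. */
typedef struct {
    int       in_use;
    uintptr_t ret_address;  /* original return address of the call */
    uintptr_t prev_ebp;     /* caller's EBP, to be restored later */
} FrameRecord;

#define RECORDS 4
static FrameRecord records[RECORDS];

/* Models Prolog(): move the caller's EBP out of the saved-EBP slot into
   a record, and leave a pointer to that record in the slot instead.
   Returns 0 on success, -1 if every record is busy. */
int sim_prolog(uintptr_t *saved_ebp_slot, uintptr_t ret_address) {
    size_t i;
    for (i = 0; i < RECORDS; i++) {
        if (!records[i].in_use) {
            records[i].in_use = 1;
            records[i].ret_address = ret_address;
            records[i].prev_ebp = *saved_ebp_slot;
            *saved_ebp_slot = (uintptr_t)&records[i];
            return 0;
        }
    }
    return -1;
}

/* Models Epilog(): find the record through the slot, put the caller's
   EBP back, free the record, and hand back the original return address. */
uintptr_t sim_epilog(uintptr_t *saved_ebp_slot) {
    FrameRecord *r = (FrameRecord *)*saved_ebp_slot;
    assert(r >= records && r < records + RECORDS && r->in_use);
    *saved_ebp_slot = r->prev_ebp;
    r->in_use = 0;
    return r->ret_address;
}
```

No per-thread lookup is needed: each thread carries the pointer to its own record in its own register and stack, which is what keeps the scheme thread-safe.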

Armed with this theoretical knowledge, we can implement it in practice, and proceed to the actual task of kernel-mode spying. I am sorry - I forgot to tell you that overwriting the addresses of functions, imported by kernel-mode driver, is a bit different from modifying user-mode module's IAT.

Kernel-Mode Spying

First of all, let's look at how the IAT is filled with the addresses of imported functions at load time. As a first step, the loader has to locate the module's import directory, i.e. the array of IMAGE_IMPORT_DESCRIPTOR structures, from which it can obtain the pointers to the Import Name and Import Address tables. After having obtained the name of the imported function from the Import Name Table, the loader can get its address from the IMAGE_EXPORT_DIRECTORY of the module that exports the given function, and write this address to the Import Address Table, so that the program can call the imported function. It is understandable that the Import Address Table is needed as long as the program runs, but what about the Import Name Table and the array of IMAGE_IMPORT_DESCRIPTOR structures? Are they needed after the loader fills the Import Address Table with the addresses of imported functions? Not really - they are needed only at load time. What is the point of keeping them in memory after the module gets loaded?

As long as they reside in pageable memory, this is not really a big deal - they will normally be swapped to the disk, and get loaded into RAM only if we want to access them, i.e. they are not going to take up space in RAM anyway. Since user-mode modules can be paged to the disk, the loader does not bother to discard the Import Name Table and the array of IMAGE_IMPORT_DESCRIPTOR structures from memory after the user module gets loaded - it is not worth the effort. Therefore, the pointer to the array of IMAGE_IMPORT_DESCRIPTOR structures, available from the target user module's IMAGE_OPTIONAL_HEADER, is always valid.

However, the kernel-mode driver, apart from its code that has been explicitly marked as pageable, has to be constantly loaded in RAM, which, compared to virtual memory, is scarce. Under such circumstances, keeping the Import Name Table and the array of IMAGE_IMPORT_DESCRIPTOR structures in memory is just an unreasonable waste of resources. In order to solve this problem, the linker may place the Import Name Table and the array of IMAGE_IMPORT_DESCRIPTOR structures in the driver's INIT section, along with DriverEntry() and DriverReinitialize() routines, i.e. the code that is needed only during the driver's initialization. After the driver is loaded, the loader simply discards its INIT section from memory, because it does not contain any information that may be needed after the driver's initialization. As a result, after the driver is loaded, the pointer to the array of IMAGE_IMPORT_DESCRIPTOR structures, available from the target driver's IMAGE_OPTIONAL_HEADER, may point right to the middle of nowhere, and, hence, should not be accessed - in order to figure it out, I had to spend two evenings in a "blue screen - reboot" loop.

However, the array of IMAGE_IMPORT_DESCRIPTOR structures contains information crucial for locating the Import Address Table. What are we going to do? We will map the target driver's file into memory, and obtain all necessary information from the file mapping. This can be done by the user-mode controller application. Look at the code below:

typedef struct {
    ULONG Reserved[2];
    PVOID Base;
    ULONG Size;
    ULONG Flags;
    USHORT Index;
    USHORT Unknown;
    USHORT LoadCount;
    USHORT ModuleNameOffset;
    CHAR ImageName[256];
} SYSTEM_MODULE_INFORMATION;

int spy(char*path)
    DWORD a,dw,x,num; 
    SYSTEM_MODULE_INFORMATION info;char buff[256]; 
    DWORD* base=0;char* fullname; char* drivername1;char* drivername2;

    //get the name of the target driver

        if(path[a]=='/'||path[a]=='\\' )

    //get the list of all loaded drivers
    typedef DWORD (__stdcall*func)(DWORD,LPVOID,DWORD,DWORD*);

    func ZwQuerySystemInformation=

    BYTE *array=new BYTE[dw];

    // check if the target driver is loaded


            "This driver is not loaded",
            "spy",MB_OK);return 0;
    // map the target driver's file into memory
    // and get the pointer to import directory
    HANDLE filehandle=CreateFile(path,GENERIC_READ|GENERIC_WRITE,
    HANDLE maphandle=CreateFileMapping(filehandle,0,
    IMAGE_DOS_HEADER * dosheader=(IMAGE_DOS_HEADER *)readbuff;


    int totalcount=0;DWORD*ptr=(DWORD*)&buff[16];

    // now we are filling the control array with offsets to IATs
    // and numbers of entries in each IAT

        DWORD firstthunk=descriptor->FirstThunk;


    // create the thread and the pool
    writebuff=(BYTE* )VirtualAllocEx(GetCurrentProcess(),
    writebuff=(BYTE* )VirtualAllocEx(GetCurrentProcess(),

    writebuff[0]=0xFF;writebuff[1]=0x25;int i=(int)&writebuff[6];
    HANDLE threadhandle=CreateThread(0,0,
        (LPTHREAD_START_ROUTINE)&writebuff[0],0,0 ,&dw);


    // keep on filling the control array with relevant info

    // open the spying driver
            "Spying driver is not loaded","spy",MB_OK);return 0;

    //create the logfile
    char namebuff[256]; 
    strcpy(&namebuff[a], "spylogfile.txt");


    // send the control array to the spying driver

    return 1;

As a first step, we find the address at which our target driver is loaded in memory. We do it by calling the ZwQuerySystemInformation() native API function, which can return information about all currently loaded drivers. ZwQuerySystemInformation() returns this information as an array of SYSTEM_MODULE_INFORMATION structures. The Base field of the SYSTEM_MODULE_INFORMATION structure indicates the address at which the given driver is loaded, and the ImageName field may contain either the name of the driver, or the whole path to the driver's file. If the ImageName field contains the path, we extract the name of the driver from the path, and compare it to the target driver's name, which has also been extracted from the path that was received by spy() as an argument. We repeat this until we locate the target driver, and save the Base field of its corresponding SYSTEM_MODULE_INFORMATION structure in a local variable.
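The name matching described above boils down to stripping everything up to the last path separator and comparing what is left. Here is a minimal portable sketch of that step; the helper names `driver_base_name` and `same_driver` are hypothetical, and the case-insensitive comparison is my assumption (Windows file names are not case-sensitive), not something taken from the listing.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns a pointer to the part of `path` that follows the last '/' or
   '\\'. If there is no separator, the whole string is the name. */
const char *driver_base_name(const char *path) {
    const char *name = path;
    const char *p;
    assert(path != NULL);
    for (p = path; *p != '\0'; p++) {
        if (*p == '/' || *p == '\\')
            name = p + 1;
    }
    return name;
}

/* Returns 1 if both paths end in the same file name, ignoring case. */
int same_driver(const char *path_a, const char *path_b) {
    const char *a = driver_base_name(path_a);
    const char *b = driver_base_name(path_b);
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;  /* equal only if both strings ended together */
}
```

The same helper works for both inputs: the ImageName field, which may or may not contain a path, and the path passed to spy().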

Then we map the target driver's file into the memory, and locate its import directory, i.e. the array of IMAGE_IMPORT_DESCRIPTOR structures. The FirstThunk field of every IMAGE_IMPORT_DESCRIPTOR structure indicates the offset to the beginning of Import Address Table that corresponds to the given imported module, and the OriginalFirstThunk field indicates the offset to the beginning of the array of IMAGE_THUNK_DATA structures. By counting the number of IMAGE_THUNK_DATA structures that have non-zero value of the Function field, we can find the number of entries in IAT that corresponds to the given imported module. For every imported module, we get the offset to the beginning of its corresponding Import Address Table, and count the number of entries in this table. We write these values into the control array, starting from control array's 16th byte. We also save pointers to the names of imported functions in global functionbuff array.
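The walk over the import directory can be sketched in portable C as follows. The 20-byte descriptor layout (OriginalFirstThunk at offset 0, FirstThunk at offset 16) and the zero-terminated arrays follow the PE format; everything else is an assumption made for the sake of a short example: the names `collect_iats`, `IatInfo`, `get_u32` and `put_u32` are hypothetical, offsets are treated as plain offsets into the buffer (a real file mapping needs RVA-to-file-offset translation first), and there is no bounds checking against the size of the image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DESCRIPTOR_BYTES 20u  /* size of one IMAGE_IMPORT_DESCRIPTOR */
#define OFF_ORIGINAL_FT   0u  /* OriginalFirstThunk */
#define OFF_FIRST_THUNK  16u  /* FirstThunk */

typedef struct {
    uint32_t iat_offset;  /* where this module's IAT starts */
    uint32_t count;       /* number of entries in that IAT */
} IatInfo;

/* Unaligned-safe 32-bit access in host byte order. */
static uint32_t get_u32(const uint8_t *image, uint32_t off) {
    uint32_t v;
    memcpy(&v, image + off, sizeof v);
    return v;
}

static void put_u32(uint8_t *image, uint32_t off, uint32_t v) {
    memcpy(image + off, &v, sizeof v);
}

/* Walks the descriptor array that starts at `import_dir_off` and fills
   `out` with one (IAT offset, entry count) pair per imported module.
   Returns the number of pairs written. */
size_t collect_iats(const uint8_t *image, uint32_t import_dir_off,
                    IatInfo *out, size_t max_out) {
    size_t n = 0;
    uint32_t d;
    for (d = import_dir_off; n < max_out; d += DESCRIPTOR_BYTES) {
        uint32_t oft = get_u32(image, d + OFF_ORIGINAL_FT);
        uint32_t ft = get_u32(image, d + OFF_FIRST_THUNK);
        uint32_t count = 0;
        if (ft == 0)
            break;  /* an all-zero descriptor terminates the array */
        /* count thunks until the terminating zero entry */
        while (get_u32(image, oft + 4u * count) != 0)
            count++;
        out[n].iat_offset = ft;
        out[n].count = count;
        n++;
    }
    return n;
}
```

The resulting pairs are exactly what goes into the control array starting from its 16th byte.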

Then we allocate a page of virtual memory, fill its first 10 bytes with the machine codes, and create the thread which will process our asynchronous messages - it is going to spin in an infinite loop until a message arrives. We also write the current value of our offset index (which is currently zero), and the address of the logging function to be invoked asynchronously by the spying driver, into structbuff array. This function is too simple to be listed here - it just takes the return value of the API function and the position of its name in functionbuff array as parameters, formats the null-terminated string in the form ApiFunctionXXX - returned YYY, and writes it to the log file. The only thing worth mentioning is that this logging function has to be declared with __stdcall calling convention, so that it pops its arguments off the stack - nobody else will do it, because the function is entered by a jump from the handcrafted chunk rather than by a regular call.
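For illustration only, here is what the formatting part of such a logging function might look like in portable C. The name `format_log_line`, the bounds check on the index and the hexadecimal rendering of the return value are my assumptions - the article only fixes the general shape of the line, and the real function also has to be __stdcall and write the string to the log file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Formats "<function name> - returned <value>" into `out`.
   `names` plays the role of the functionbuff array and `index` is the
   position received from the spying driver.
   Returns the length of the line, or -1 if the index is out of range. */
int format_log_line(char *out, size_t cap,
                    const char *const *names, size_t name_count,
                    uint32_t index, uint32_t retval) {
    assert(out != NULL && names != NULL);
    if (index >= name_count)
        return -1;
    return snprintf(out, cap, "%s - returned 0x%08lX",
                    names[index], (unsigned long)retval);
}
```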

Then we fill the first 16 bytes of the control array with the remaining relevant data. This includes the address at which the target driver is loaded, the number of functions imported by the target driver, the address of the pool, and the address of the array that stores the address of the target function and the offset index. Then we create the log file in the folder where the controller application's .exe file resides. Finally, we send the control array to the spying driver by calling DeviceIoControl() with our user-defined IOCTL_START_SPYING command.
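The layout of the control array can be sketched as a 16-byte header followed by (IAT offset, entry count) pairs. Please treat the following as an assumption-laden illustration: the order of the four header fields simply follows the order in which they are mentioned above, the number of pairs is derived from the buffer length, and the names `ControlHeader`, `IatRange`, `control_pack` and `control_unpack` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CONTROL_HEADER_BYTES 16u

typedef struct {
    uint32_t driver_base;      /* where the target driver is loaded */
    uint32_t total_functions;  /* number of imported functions */
    uint32_t pool_addr;        /* user-mode address of the chunk pool */
    uint32_t struct_addr;      /* user-mode address of index + target */
} ControlHeader;

typedef struct {
    uint32_t iat_offset;
    uint32_t entry_count;
} IatRange;

/* Controller side: returns the number of bytes used, or 0 if the
   buffer is too small. */
size_t control_pack(uint8_t *buf, size_t cap, const ControlHeader *hdr,
                    const IatRange *ranges, size_t n) {
    size_t need = CONTROL_HEADER_BYTES + n * sizeof(IatRange);
    assert(sizeof(ControlHeader) == CONTROL_HEADER_BYTES);
    if (need > cap)
        return 0;
    memcpy(buf, hdr, CONTROL_HEADER_BYTES);
    if (n > 0)
        memcpy(buf + CONTROL_HEADER_BYTES, ranges, n * sizeof(IatRange));
    return need;
}

/* Driver side: returns the number of ranges read, or -1 on bad input. */
int control_unpack(const uint8_t *buf, size_t len, ControlHeader *hdr,
                   IatRange *ranges, size_t max_ranges) {
    size_t n;
    if (len < CONTROL_HEADER_BYTES)
        return -1;
    n = (len - CONTROL_HEADER_BYTES) / sizeof(IatRange);
    if (n > max_ranges)
        return -1;
    memcpy(hdr, buf, CONTROL_HEADER_BYTES);
    if (n > 0)
        memcpy(ranges, buf + CONTROL_HEADER_BYTES, n * sizeof(IatRange));
    return (int)n;
}
```

With the 256-byte buffer used by spy(), this layout leaves room for 30 imported modules.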

Now, let's look at what happens in the kernel mode after our spying driver receives IOCTL_START_SPYING command:

typedef struct tagRelocatedFunction{
    LONG address;
    LONG function;
} RelocatedFunction,*PRelocatedFunction;

typedef struct tagStorage{
    DWORD isfree;
    DWORD retaddress;
    DWORD prevEBP;
    RelocatedFunction* ptr;
} Storage;

//global variables
char savebuff[64];KEVENT event1,event2;
long totalcount=0,base,userbuff,userstruct;
unsigned char *replacementchunks;DWORD *functionarray;
Storage storagearray[256];
BYTE retbuff[16];BOOLEAN ishooking;
DWORD * userstructptr; BYTE * userbuffptr;

NTSTATUS DrvDispatch(IN PDEVICE_OBJECT  devobject,IN PIRP irp)
    char*buff=NULL;PIO_STACK_LOCATION loc;
    DWORD thunk, count,x,a;long num,addr;DWORD * ptr;
    BYTE*byteptr; BYTE *array=0; RelocatedFunction * reloc;



        //free resources that might be allocated 
        //by our previous call to DrvDispatch

        //map the addresses of user-mode writebuff and structbuff
        //arrays into the kernel address space
        userbuffptr= (BYTE *) 
        userstructptr=(DWORD *)

        //save the remaining relevant data

        //allocate function replacement chunks
        replacementchunks=(unsigned char*)ExAllocatePool(NonPagedPool,

        //allocate the array that holds addresses of actual functions

        // overwrite IAT entries



                reloc=(RelocatedFunction *)&byteptr[6];




    return STATUS_SUCCESS;

As a first step, we map the addresses of the user-mode pool, and of the array which holds the current value of the offset index and the address of the target function, into the kernel address space. We save these pointers in userstructptr and userbuffptr global variables - we must make them available to Epilog(), which will need them. We also save their virtual addresses, as they are known to the user-mode code, in userstruct and userbuff global variables - Epilog() will need them when it fills the pool with the machine instructions.

Then we save the remaining data received in the control array in global variables. We will need this data in order to unhook the functions that we are about to hook now, so we must make sure that it is available to DrvClose(), which is going to do this job. Then we allocate the arrays that are going to hold function replacement chunks and the addresses of actual functions - we already know the number of functions we need to hook, and, hence, the number of bytes to allocate. We save these pointers in replacementchunks and functionarray global variables - we must make them available to Prolog() and DrvClose().

At this point, we can proceed to the actual task of overwriting IAT entries. We don't even have to process PE-related structures - the user-mode controller application has provided us with all information we need. I hope that, after having read my previous article, you are able to understand how we do it, so I don't go into details here - the only difference is that RelocatedFunction structure stores not the actual address of imported function, but the position at which this address can be found in functionarray table.
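The bookkeeping behind this step can be shown with a small portable model. It is a sketch, not the driver's code: plain `uintptr_t` values stand in for function addresses, the `Stub` structure stands in for a replacement chunk with its RelocatedFunction data, `saved` plays the role of functionarray, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for one replacement chunk. */
typedef struct {
    uintptr_t proxy;  /* where the chunk forwards to (ProxyProlog) */
    size_t    index;  /* position of the real address in `saved` */
} Stub;

/* Hooks one IAT: saves every original entry and points the entry at
   its stub. Returns the next free index, so that several IATs can
   share the same `stubs` and `saved` tables. */
size_t hook_table(uintptr_t *iat, size_t count, Stub *stubs,
                  uintptr_t *saved, size_t next_index, uintptr_t proxy) {
    size_t i;
    for (i = 0; i < count; i++) {
        size_t k = next_index + i;
        saved[k] = iat[i];
        stubs[k].proxy = proxy;
        stubs[k].index = k;
        iat[i] = (uintptr_t)&stubs[k];
    }
    return next_index + count;
}

/* Undoes hook_table() for one IAT (the job DrvClose() has to do). */
void unhook_table(uintptr_t *iat, size_t count,
                  const uintptr_t *saved, size_t first_index) {
    size_t i;
    for (i = 0; i < count; i++)
        iat[i] = saved[first_index + i];
}

/* What the prolog needs: from a hooked IAT value to the real callee. */
uintptr_t resolve_original(const uintptr_t *saved, uintptr_t iat_value) {
    const Stub *s = (const Stub *)iat_value;
    return saved[s->index];
}
```

Storing the index rather than the address has a useful side effect: the very same number identifies the function's name in the controller's functionbuff array, so it can be sent to user mode as is.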

Now, let's look at modifications that have to be applied to Prolog() and Epilog(). ProxyProlog() and ProxyEpilog() don't need any modifications, so we don't discuss them here.

 void __stdcall Prolog(DWORD * relocptr)

    DWORD x;DWORD *ebpptr; int a=0;
    RelocatedFunction * reloc=(RelocatedFunction*)relocptr[0];
    DWORD *retaddessptr=relocptr+1;
    KIRQL irql=KeGetCurrentIrql( );

    //find the first available Storage structure
        KeWaitForSingleObject (&event1,Executive,KernelMode,0,0);

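    _asm {
            lea ebx,storagearray      ; ebx -> first Storage entry
    start:  mov ecx,dword ptr[ebx]    ; ecx = isfree field
            cmp ecx,100               ; 100 marks an occupied entry
            jne fin
            add ebx,16                ; move to the next entry (sizeof(Storage))
            jmp start
    fin:    mov dword ptr[ebx],100    ; mark the entry as occupied
            mov storptr,ebx
    }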


    //store all relevant information in the Storage structure
    _asm mov ebpptr,ebp

    //modify the CPU stack


As a first step, Prolog() locates the first available Storage structure in storagearray table, and sets its isfree field to 100, i.e. marks it as occupied. We have to synchronize this operation, so we wait on a synchronization event that has been initialized in DriverEntry() (DriverEntry() initializes two synchronization events and fills the retbuff array with the instructions that call ProxyEpilog(), i.e. does pretty much the same things as DllMain() in the previous article). Since we are about to wait until our event is set to the signaled state, i.e. possibly for a non-zero interval, we must make sure that we do it if and only if the current IRQL is below DISPATCH_LEVEL. We also have to make sure that this operation does not get interrupted, so we disable interrupts by clearing the IF flag. Since we must re-enable them as quickly as possible, the code that locates the first available Storage structure is written in pure assembly.
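For readers who prefer C to assembly, the search can be written as below. This is a sketch with hypothetical names (`StorageSlot`, `claim_slot`, `release_slot`), and it deliberately differs from the assembly in one respect: it stops at the end of the table and returns NULL, while the assembly loop assumes that a free entry always exists. The synchronization and the disabling of interrupts are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_BUSY 100u  /* the marker value used by the article */

/* Same shape as the Storage structure on a 32-bit build: 16 bytes. */
typedef struct {
    uint32_t isfree;
    uint32_t retaddress;
    uint32_t prevEBP;
    uint32_t reloc;  /* stands in for the RelocatedFunction pointer */
} StorageSlot;

/* Claims the first entry that is not marked as busy.
   Returns NULL if the whole table is occupied. */
StorageSlot *claim_slot(StorageSlot *slots, size_t count) {
    size_t i;
    for (i = 0; i < count; i++) {
        if (slots[i].isfree != SLOT_BUSY) {
            slots[i].isfree = SLOT_BUSY;
            return &slots[i];
        }
    }
    return NULL;
}

/* What Epilog() does when it is done with the entry. */
void release_slot(StorageSlot *slot) {
    assert(slot != NULL);
    slot->isfree = 0;
}
```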

Then we store all relevant information in the Storage structure. This includes the original return address, the value pointed to by EBP register, and the pointer to RelocatedFunction structure. Finally, we modify the CPU stack the way we did it in the previous article, and write the pointer to the Storage structure to the location, pointed to by EBP register, i.e., one stack entry above Prolog()'s return address.

Now, let's look at Epilog():

void  __stdcall Epilog(DWORD*retvalptr)

    DWORD *ebpptr;
    DWORD*retaddessptr=retvalptr+1;DWORD retval=retvalptr[0];
    Storage*storptr;RelocatedFunction * reloc;
    DWORD i,a,b,pos,n;    KIRQL irql;

    // get the pointer to the Storage structure
    _asm mov ebpptr,ebp


    //modify the CPU stack

    // mark the Storage structure as free

    if (!ishooking)

    // now we are going to send data to
    // the controller application:

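mov ebx,userstructptr     ; kernel-mode view of the offset index
mov ecx,dword ptr[ebx]    ; ecx = current offset index
mov a,ecx                 ; a = offset of the previous chunk
add ecx,32                ; advance to the next 32-byte chunk
cmp ecx,4096
jl skip
sub ecx,4096              ; wrap around at the end of the 4096-byte pool
skip: mov pos,ecx         ; pos = offset of the new chunk
mov dword ptr[ebx],ecx    ; store the updated offset index

mov ebx,userbuffptr       ; kernel-mode view of the pool
add ebx,ecx
add ebx,6                 ; ebx -> location 6 bytes above the new chunk

mov edx,userbuff          ; user-mode address of the pool
add edx,ecx
mov dword ptr[ebx],edx    ; write the new chunk's own user-mode address there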


    // keep on filling the array with machine codes

    // instructions to spin 

    // instructions to push arguments

    //instruction to jump to the target function

    //finally, schedule the target function for execution

Epilog() gets the pointer to the Storage structure from the location pointed to by EBP register, obtains the original return address and the original value of EBP from this structure, and modifies the CPU stack - it stores the original value of EBP one stack entry above its return address, and replaces the address to which ProxyEpilog() would otherwise return control with the original return address. Then it sets the isfree field of the Storage structure to 0, i.e. marks it as free. Finally, it informs the controller application that the given API function has returned, and sends it the return value of this function. The way this is done requires a little bit more attention.

We schedule the thread, which runs in the controller application's process, for asynchronous execution of the target function, by writing the machine codes to the pool the way it was explained in the introduction. First of all, we need to get the current value of the offset index in order to find the address of the chunk where we are going to write data, to write the address of this chunk (as it is known to the controller application) 6 bytes above its beginning, and to update the value of the offset index. We must synchronize this operation, and make sure that it does not get interrupted. Therefore, we wait until the synchronization event is set to the signaled state (certainly, if and only if the current IRQL is below DISPATCH_LEVEL), and then disable interrupts. Since we must re-enable them as quickly as possible, the code that executes the above tasks is, again, written in pure assembly.
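The reservation step reads as follows when translated into portable C. This is a sketch with hypothetical names: `pool_view` stands for the kernel-mode mapping of the pool (userbuffptr), `user_pool_addr` for its user-mode address (userbuff), and `offset_index` for the location that userstructptr points to. The event wait and the disabling of interrupts around it are omitted.

```c
#include <assert.h>
#include <stdint.h>

#define POOL_BYTES  4096u
#define CHUNK_BYTES 32u
#define SLOT_OFFSET 6u   /* the jump target is stored 6 bytes in */

/* Reserves the next chunk and makes it spin on itself.
   Returns the offset of the new chunk; the offset of the previous
   chunk is returned through `prev_offset`, because that is where the
   final "release" write has to go later. */
uint32_t reserve_chunk(uint8_t *pool_view, uint32_t user_pool_addr,
                       uint32_t *offset_index, uint32_t *prev_offset) {
    uint32_t prev = *offset_index;
    uint32_t cur = prev + CHUNK_BYTES;
    uint32_t self;
    assert(prev < POOL_BYTES && prev % CHUNK_BYTES == 0);
    if (cur >= POOL_BYTES)
        cur -= POOL_BYTES;         /* the pool is reused from the start */
    *offset_index = cur;
    *prev_offset = prev;

    /* The chunk's own address, as the controller application sees it,
       written in little-endian byte order like the CPU would do. */
    self = user_pool_addr + cur;
    pool_view[cur + SLOT_OFFSET + 0] = (uint8_t)(self & 0xFF);
    pool_view[cur + SLOT_OFFSET + 1] = (uint8_t)((self >> 8) & 0xFF);
    pool_view[cur + SLOT_OFFSET + 2] = (uint8_t)((self >> 16) & 0xFF);
    pool_view[cur + SLOT_OFFSET + 3] = (uint8_t)((self >> 24) & 0xFF);
    return cur;
}
```

Only this part has to be atomic: once the index has been advanced, no other caller can receive the same chunk.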

After having accomplished the above, we must fill the remaining part of the chunk with handcrafted instructions to push the index of the real callee's address in functionarray table (which is available from the RelocatedFunction structure, the pointer to which was saved in the Storage structure), the real callee's return value, and the address of the current chunk (as it is known to the controller application) on the stack, and then to jump to the logging function. At this point, we no longer have to worry about either interrupts or context switches - concurrent threads are unable to overwrite our data anyway. Therefore, before proceeding to the above task, we re-enable interrupts and set the synchronization event to the signaled state, so that other threads don't need to wait until we finish filling the chunk with the machine codes. Since the remaining part of the job is not so time-critical, we can afford to do it in C, rather than in assembly.

Finally, we schedule the logging function, which resides in controller's application address space, for execution, by writing the address of the current chunk's 10th byte (as it is known to the controller application) 6 bytes above the address of the previous chunk.
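To make the byte layout of a chunk concrete, here is a portable sketch that encodes one. The opcodes are standard 32-bit x86 (FF 25 for an indirect jump through memory, 68 for push imm32, E9 for a relative jump), but the choice of E9 for the final jump is my assumption, and the names `encode_chunk` and `publish_chunk` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_BYTES  32u
#define SLOT_OFFSET   6u  /* jump target of the spin instruction */
#define ENTRY_OFFSET 10u  /* first byte of the "do the call" code */

static void put32le(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)((v >> 24) & 0xFF);
}

/* Fills one chunk. All addresses are user-mode addresses, i.e. as they
   are known to the controller application. */
void encode_chunk(uint8_t *chunk, uint32_t chunk_user_addr,
                  uint32_t index, uint32_t retval,
                  uint32_t target_user_addr) {
    uint8_t *p = chunk;
    uint32_t next_ip;

    /* bytes 0-5: jmp dword ptr [chunk+6] */
    p[0] = 0xFF; p[1] = 0x25;
    put32le(p + 2, chunk_user_addr + SLOT_OFFSET);
    /* bytes 6-9: the chunk's own address, so an arriving thread spins */
    put32le(p + SLOT_OFFSET, chunk_user_addr);

    p += ENTRY_OFFSET;
    *p++ = 0x68; put32le(p, index); p += 4;            /* second argument */
    *p++ = 0x68; put32le(p, retval); p += 4;           /* first argument */
    *p++ = 0x68; put32le(p, chunk_user_addr); p += 4;  /* return address */

    /* jmp rel32: the displacement counts from the end of the jump */
    next_ip = chunk_user_addr + (uint32_t)(p - chunk) + 5u;
    *p++ = 0xE9; put32le(p, target_user_addr - next_ip); p += 4;
    assert((size_t)(p - chunk) <= CHUNK_BYTES);
}

/* The final write: releases the thread spinning at the previous chunk. */
void publish_chunk(uint8_t *prev_chunk, uint32_t chunk_user_addr) {
    put32le(prev_chunk + SLOT_OFFSET, chunk_user_addr + ENTRY_OFFSET);
}
```

With this encoding, the spin instruction, three pushes and the jump take 30 bytes, which fits into the 32-byte chunk. The order matters: publish_chunk() must be the last write, so the thread can never jump into a half-written chunk.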

As a result, when the logging function starts execution, it will receive the index of the real callee's address in functionarray table and its return value as parameters. Since the functionbuff table in the controller application's address space stores the names of functions imported by the target driver in the same order as functionarray table in the kernel address space stores their addresses, the former parameter is sufficient for locating the name of the real callee. Therefore, the logging function formats the null-terminated string in the form ApiFunctionXXX - returned YYY, and writes it to the log file.

As you can see, kernel-mode spying is not a terribly complex task, although it makes us worry about things we don't even have to know about when spying in user mode. The funniest thing is that this model is suitable for user-mode spying as well. Therefore, the source code provided with this article applies to my previous article too.

In order to run the sample application, you have to copy the spying driver into the C:\WINNT\system32\drivers directory and create an on-demand-start service. This can be done either manually or with the following lines:


You must start the spying service manually by typing net start spyservice at the command prompt before running the application. Click the Start button in the Spy menu, choose the driver you want to spy on, relax for a while, and then either click the Stop button or just close the program. After that, you can open the log file in a text editor and examine its contents.

Warning: The spying driver has been built with the Windows 2000 DDK and tested on Windows 2000. I really don't know what happens if you run it on any other NT platform - it is your task to find out. If it does not work, you can always rebuild the spying driver for your platform - the source code should not need any modifications.

Furthermore, I would not advise you to hook Ntfs.sys with this sample. I still have to figure out why it happens, but if you try to hook Ntfs.sys, the system starts slowing down, stops responding pretty shortly, and, finally, goes down after a while. This is not an issue when hooking other drivers. I tried to hook the keyboard and mouse class drivers, i8042prt, atapi, disk, CDROM, floppy, videoport and display - it works fine everywhere. I think that all problems with Ntfs.sys arise because, since our spying code synchronizes all calls that are made by different threads in different system and user processes, the system cannot cope with the resulting slowdown - in the case of Ntfs.sys, some of these calls are made by high-priority system threads, but our spying code currently does not take priority into account. Probably, we should also take the priority of the calling thread into account when we synchronize calls and disable interrupts. However, this is only a suggestion - I am not yet in a position to draw a definite conclusion (otherwise, instead of speaking about this problem, I would just fix it, wouldn't I?)

I would highly appreciate it if you sent me an e-mail with your comments and suggestions.


In conclusion, I must say that, although we are able to hook the API calls that are made by both user-mode modules and kernel-mode drivers, our exploration of Windows is far from over. Wouldn't it be interesting to hook the calls that the system makes to the target driver? It is obvious that the system has to store the addresses of all functions exported by the driver in some kind of service table, so that it can call the driver. Therefore, we have to locate this table somehow.

Furthermore, even our process-wide API spying is far from complete - we can hook only the old-fashioned functional API. Wouldn't it be interesting to spy on COM interfaces as well? In the forthcoming articles, we will try to do the above-mentioned things.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Anton Bassov
Web Developer
Luxembourg

Comments and Discussions

General: My vote of 5 (gndnet, 25-Jul-12 10:02)
General: Help! every's advice. (songjacki, 8-Mar-09 1:25)
General: HookAPI - WriteProcessMemory (Vitoto, 1-Jan-06 9:16)
General: prevent files I/O to USB disk driver (hoangchau, 7-Dec-05 20:28)
Question: PE Graphical Area Access? (IslamianFalcon, 24-Nov-05 20:56)
Answer: Re: PE Graphical Area Access? (Vertexwahn, 4-Dec-05 21:42)
General: Re: PE Graphical Area Access? (IslamianFalcon, 5-Dec-05 20:30)
General: This code vs your other code... (Anonymous, 18-Oct-05 17:47)
General: Map (luolody, 23-Sep-05 20:49)
General: Re: Map (Anton Bassov, 24-Sep-05 7:05)
General: kernell mode keylogger (Kristof VB, 6-Aug-05 3:58)
General: Re: kernell mode keylogger (Anton Bassov, 7-Aug-05 12:46)
General: Re: kernell mode keylogger (geek, 18-Aug-05 3:18)
General: Re: kernell mode keylogger (Anton Bassov, 18-Aug-05 4:03)
General: Re: kernell mode keylogger (Medus, 21-Nov-05 9:24)
General: Re: kernell mode keylogger (Anton Bassov, 21-Nov-05 11:07)
General: Re: kernell mode keylogger (Medus, 22-Nov-05 11:18)
General: Re: kernell mode keylogger (Anton Bassov, 22-Nov-05 14:47)
General: Re: kernell mode keylogger (Medus, 23-Nov-05 6:05)
> Looks like we are talking about quite different things - I am talking about kernel-mode
> keylogging, and you seem to be talking about placing rootkits in microcontrollers, i.e.
> hardware rootkits that can survive system reinstallation, as well as hard-disk format.
> I am afraid that this topic, although EXTREMELY(!!!)interesting, has nothing to do with
> actual Windows kernel-mode programming

No, you made a sarcastic comment about me being able to read the key before it was even pressed - So, for making fun of me I decided to open your eyes a little. A gift. Me to you.

But, as I originally said, I have a rootkit (software) that WILL bypass your protection. I can say this because I write HAL rootkits (Something else you laughed at me about). Usermode coders rely on the OS as a Trusted Base .... And Driver coders rely on the abstractions below them. You DONT talk directly to hardware... Your PORT and REGISTER operations go through the HAL - So where the hell is your safetynet when the HAL ain't friendly ? Native calls ? Nope!

Try to think in terms of what occurs when the driverspace uses platform abstractions such as READ_PORT_xxx/WRITE_PORT_xxx/READ_REGISTER_xxx/WRITE_REGISTER_xxx to communicate with the hardware. HAL converts this to the appropriate CPU PORT operations (Or the standard memory LOAD/STORE 'indirect' operations if the CPU doesn't support the native port ops) - In this sense the kernel and the drivers sit on a comfortable abstraction of the hardware. Now, since this abstraction is implemented in SOFTWARE - a HAL-Based rootkit can read, write and delay these operations as long as it does so carefully. This allows us complete control over such things as the injection and interception of NIC data below even an NDIS wrapper firewall hook (Or even a vendor-driver-embedded firewall hook) ... And, despite the Apparent Voodoo, it's quite simple to profile the NIC's BUS location and vendor protocol owing to the limited categories and manufacturers of NIC chipsets.

I know this because I come from an embedded systems background and frequently have to make do WITHOUT drivers... instead, coding FPGAs in VHDL/Verilog to form hardware-specific ASICs to handle the task. So, in that sense subverting the HAL is actually a richer and more flexible playground than I'm used to. Unlike platform drivers it really is ONE layer above the onboard hardware. And IMO there's no way your code is going to get underneath this, particularly since the exposed HAL interface you see on an infected system is just a mock-up. Also, protective software is unlikely to attempt anything so dangerously invasive in the foreseeable future - not while vendors have liability concerns and clients require platform certification and driver signing.

You can consider that the natives found through KiSystemService are also compromised and associated Ki-structures in phymem are compromised too. And, just to top it all off, if the kit wants to run a ring-3 process you're NOT going to find it without exhaustive manual examination... even if you walk the PsActiveProcessHead's _EPROCESS APL structures in physical memory our process simply isn't there (Most conventional rootkits are spotted this way, but then, they normally rely on tampering the APIs to simply filter returns)

So, as you can see - I'm talking about a SOFTWARE wall that pulls your Trusted Computing Base out from under your feet. Just as a kernel-mode coder can pull the rug out from under a userland app.

The reason you think I'm talking about hardware is that you seem unable to separate what I said about 'hacked keyboards' and my comments about HAL rootkits ... You appear to think there's nothing below your kernel code except the hardware itself which, as I point out, is a little silly.

> Now look at the question that has started off our debate - the bloke asks how to write a kernel-mode filter driver in order to record keystrokes. This is why I told him that his logger has no chance.

Let's assume you are right (and you ain't)
- Does every PC have your wondrous code installed, protecting all apps systemwide ? No!
- Does it protect all apps, or just a single app like PGP or a Sharedealing client? Prolly not!
- So, Will his keylogging efforts work on the majority of systems? Yep!
- Will they work against the majority of Apps, despite your code protecting others? Yep!
- Will it be a great learning experience for the poster, and one that he should be encouraged to pursue ? I think so.

I don't see your problem, unless it's a moral objection. Still, it has no place in academia - if this guy wants to understand the anatomy of a keylogger who are we to discourage him ? If he wants to join the eternal battle between good'n'evil then fine. People grow, and it's fair to say that some of the best defenders started out as crackers.

> Concerning detection of hardware rootkits, I am afraid my software is out of luck,I admit.

No such thing as a hardware rootkit. The others are essentially databugging systems designed for embedding in keyboards/routers-consoles/terminals etc... or ethernet taps to be placed in the HCC, Wiring closet or inside networked devices such as LaserPrinters. Hell, one of the best places is dangling from the jack inside a cavity wall. But rootkits? No! Unless by hardware you are referring to firmware - anyway, it's of little importance.

I mention them in response to your dismissive attitude and flippant remarks about my apparent omniscience. You made a comment and you got a rise out of me. And yes, I should know better - so should you : )

> I am really glad that you mentioned them - you offered me really exciting puzzle which I
> will try to solve.
> Concerning the rest of your message..... to be honest, I do not like your pathetic and
> ridiculous show-off. For example, what is the point of mentioning Cray supercomputers on
> Windows forum???

You seemed to prefer to make fun of me. I've deliberately given you all the ammunition you want if that IS your aim. If you don't believe me on any of this, please call me out - And I'll publicly prove you wrong. Don't just snigger behind your hand like a little kid, it doesn't suit you.

> Now let's look at your other statements. On one hand,
> "I hope you realise the great difference between a tiger-team and a script-monkey...";
> "... no commercial software protection is going to want to get this dirty.... After all,
> the market chases the 'percieved threat' and currently thats predominantly gutter-code."
> On the other hand, "Sorry no, I ain't throwing out any example code..." Are you 100% sure
> you've got one, matie??? Me neither

Like I said, I'll quite happily let you install your product on one of my machines at a Con of your choice... And from another I will tell you everything you type to your protected app. No trick KBs, no tricked out BIOS.... I'll show you how, why and what you can do about it.

But code, you ain't getting. It's not just mine, it's a collaborative effort and it has much diminished value to us if revealed. We're a closed-code group. I notice that your claims that the original poster should give up before he even starts because you got keylogging completely solved didn't come with code either - a situation which I completely understand. Since you are obviously aware of the need to protect proprietary methods you really have no excuse to use this against me. Personally I wouldn't even reveal this particular code under a strict NDA.

> Let's face it - this site is supposed to be the place where people can share
> their code and ideas with other people. If you really want to prove your point,
> you can write an article about hardware rootkits - I will be just delighted to
> give you 5. Otherwise, you just make a fool out of yourself - you make huge
> claims, critisize the whole IT security inductry, as well as writers of publicly
> available rootkits.... without posting a single line of code

And you told the original poster not to bother coding because you've sewn up the whole 'keylogging' problem in your latest and greatest gift to the industry (code unseen). I'm telling you that you are wrong, that an existing rootkit can demonstrate this, and I'm quite willing to show you.

Most Blackhatter groups DO NOT publish their malware till they have something better... I think you'll find that all rootkits are pretty old news by the time they go public. That's not just my view, it's pretty much how the scene goes. Unreleased tools and exploits are far more useful than the brief ego-boost of publishing. So, you only publish when you have something better. Same is true of exploits.

> BTW, could you please "point me to a site with a 4-part article outlining the schematics
> firmware and various build options" - I just love stuff like that. You said that you can
> do it, but somehow failed to mention the link

OK, Here you go:
And, the same framework, applied differently...

Now, is there anything else you'd like to call me out on ? Or are you just going to snigger behind your hand some more ? Anton, you got my offer. We both have code to protect - But I'll give you proof of the concept and show you how and why it works.

Personally, I'd like to think there's not much that could be done to get under it or detect it whilst the system is running, short of snapshotting phymem and physically working through the code. But then, you might have other ideas and I'd be very happy to hear them - I don't consider our rootkit (as proud as we are of it) to be the ultimate answer either, everything evolves.

You have my respect as a coder - But don't ridicule me unless you're committed enough to put your own reputation on the line with it. You're telling people not to climb because you already conquered the mountain. Well, what's wrong with letting the new guy see how far he can take the concept ?

But telling people not to bother because you've already solved the problem? It's crass, arrogant and almost always completely wrong.

> Regards
> Anton

Now I really must apologise. I'm way off topic and certainly abusing the nature of this board... so unless you want to laugh, snigger, and tell me I don't have a clue what I'm talking about - and maintain that you've solved the great keylogging issue... then I won't write any more on the subject. I don't release code because it's not in my interests, you don't release code because it's not in your clients' interests ... but if you want to arrange to hook up and see exactly what I'm talking about I will happily do that. We'll install your client's App and you can see the depth of my rootkit firsthand. Of course, I can't let you keep the drive afterwards : )

Mail me if you like and we can hook up, and I'll buy you that beer I mentioned. Or, if you want to tell me off - well, you can do that too : ) Also, if the original poster needs any help with K-Mode keylogging he can mail me and I'll be happy to help.

Best regards,

General: Re: kernell mode keylogger (Anton Bassov, 23-Nov-05 12:22)
General: Re: kernell mode keylogger (Medus, 24-Nov-05 16:35)
General: Re: kernell mode keylogger (Anton Bassov, 24-Nov-05 20:39)
General: Re: kernell mode keylogger (Medus, 25-Nov-05 6:39)
General: Re: kernell mode keylogger (ivan françois, 12-Aug-06 13:55)
General: Re: kernell mode keylogger (_DmG_, 24-May-07 12:34)

Last Updated 22 Apr 2004
Article Copyright 2004 by Anton Bassov
Everything else Copyright © CodeProject, 1999-2017