

Posted 21 Apr 2004

Kernel-mode API spying - an ultimate hack

21 Apr 2004, CPOL
An article on kernel-mode API spying.


After I published my article about process-wide API spying, I received plenty of encouraging messages - readers have generally accepted my model of hooking function calls. In this article, we will extend that model to kernel-mode spying, and hook the API calls that are made by our target device driver. We will also introduce a new way of communicating between a kernel-mode driver and a user-mode application - instead of using system services, we will implement our own mini-version of Asynchronous Procedure Calls. This task is not as complicated as it may seem - in fact, it is shockingly easy. The Windows flat memory model offers us plenty of exciting opportunities - the only thing we need is a sense of adventure (plus a good knowledge of assembly language, of course). All the tips and tricks described in this article are of my own design - I am not aware of anything similar published elsewhere.

I took into account the messages saying that I should have provided readers with the source code. Therefore, the source code for this article is available for download (please read the installation instructions at the end of the article).

Important: This article is based upon the bold assumption that you have read my previous article about process-wide API spying. If you haven't read it yet, go and do it now. This is an absolute must - otherwise, you will be unable to understand absolutely anything here, and the whole article will seem totally incoherent to you.


First of all, let's look at the complications that arise from the fact that our spying activity is going to take place in kernel mode. To begin with, Windows is a protected system, which means user-mode applications have no access to the kernel address space. Therefore, the spying DLL described in the previous article cannot work in kernel mode - in order to start spying in kernel mode, we have to write not a DLL but a kernel-mode spying driver.

Our spying driver has to communicate with the user-mode controller application, and do it asynchronously. Kernel-mode code cannot simply call functions that live in the lower 2 GB of a process's address space, so our driver cannot send a window message to its controller application the way the user-mode spying DLL does, because the SendMessage() API function resides in the user-mode address space. In fact, even if our driver could call SendMessage(), doing so would be an extremely unwise move - SendMessage() does not return until the message gets processed, and kernel-mode code cannot afford to wait patiently for user-mode code to accomplish its task, unless we are desperate to bring the system down. Therefore, we must find some way of asynchronous communication between the spying driver and its controller application.

Furthermore, if we spy on user-mode modules, we can save the original return address, as well as other relevant information, in thread local storage. However, our kernel-mode driver cannot call the user-mode TlsGetValue() or TlsSetValue(), and TLS-related system functions that can be called by kernel-mode code don't seem to exist. If you disassemble TlsAlloc(), TlsGetValue() or TlsSetValue(), you will see that none of these functions invokes INT 2Eh, i.e. managing TLS does not involve system services. Since every user-mode thread in the system receives its own copy of the CPU stack and registers, the FS register is mapped to the beginning of the section where thread-specific data is stored. This is how kernel32.dll implements thread local storage - everything is implemented purely by user-mode code, which cannot be called by our spying driver. Therefore, we have to find some other thread-safe way of saving the original return address, as well as all other relevant information.

In addition to the above, we should not forget that our spying code may run at any IRQL. Some system services may be called at any IRQL, and some are IRQL-dependent. Therefore, we have to take the current IRQL into account when we call certain system services. Furthermore, we must make sure that, if our spying code currently runs at high IRQL, it does not go anywhere close to the paged pool. If our spying code tries to access the paged memory while running at high IRQL, or calls any function that tries to do it, the “blue screen of death”, due to the page fault, is inevitable.

We should also keep in mind that kernel-mode code is interruptible. If an interrupt occurs while our spying code runs, the system will transfer control to the interrupt handler, which may call some API function that we have hooked. In other words, the system may interrupt the execution of our spying code only in order to execute exactly the same spying code. As a result, when our interrupted code continues its execution, it may find global variables and resources in a state very different from the one they were in before the interrupt occurred. We should always remember this, and disable interrupts whenever we expect global variables and resources to stay unchanged while our code runs - otherwise, we can get an unpleasant surprise (i.e., the "blue screen", which is the system's standard reaction to any mistake we make in kernel-mode code).

As you can see, in order to work in kernel mode, Prolog() and Epilog() need quite a few adjustments. ProxyProlog() and ProxyEpilog(), however, may stay as they are - the only thing they do is save and restore CPU registers and flags, which is done the same way in both kernel and user modes. We will start with the "theoretical foundations" - first, we will look at how our custom asynchronous message queue and TLS can be implemented, and proceed to the actual task of spying afterwards.

Asynchronous Message Queuing

Look at the code below (I hope that, after having read my previous article, you are able to recognize a handcrafted indirect jump instruction):

BYTE chunk1[32];
chunk1[0]=0xFF;chunk1[1]=0x25;int i=(int)&chunk1[6];
CreateThread(0,0,(LPTHREAD_START_ROUTINE)&chunk1[0],0,0,&dw);

What does the newly created thread do? Absolutely nothing - at the beginning of the chunk1 array, it finds an instruction to jump to the address that is stored 6 bytes past the beginning of chunk1. However, the only thing stored there is the address of chunk1 itself. Therefore, the code is instructed to jump to its current position, where it finds the instruction to jump to its current position, and so on. As a result, the thread gets into a processor-level infinite loop, and is unable to move anywhere - it spins at its start address.
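The byte layout described above can be sketched in portable C. This is only an illustration of the encoding, not the article's code: the helper names and the chunk_va parameter (the 32-bit address at which the executing thread sees the chunk) are assumptions made for this sketch, and nothing here is actually executed as machine code.

```c
#include <stdint.h>
#include <string.h>

/* Writes the 10-byte "spin" header into a chunk.
   chunk_va is a hypothetical parameter: the 32-bit virtual address
   of the chunk as seen by the thread that will execute it. */
static void write_spin_header(uint8_t *chunk, uint32_t chunk_va)
{
    uint32_t slot_va = chunk_va + 6;   /* address of the jump slot */

    chunk[0] = 0xFF;                   /* FF 25 disp32 = jmp dword ptr [disp32] */
    chunk[1] = 0x25;
    memcpy(&chunk[2], &slot_va, 4);    /* operand: where the target address is read from */
    memcpy(&chunk[6], &chunk_va, 4);   /* slot: the target is the chunk itself, so it spins */
}

/* Returns the jump target currently stored in the slot at offset 6. */
static uint32_t read_jump_slot(const uint8_t *chunk)
{
    uint32_t target;
    memcpy(&target, &chunk[6], 4);
    return target;
}
```

Overwriting the four bytes at offset 6 with some other address is all it takes to send the spinning thread elsewhere.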

What is going to happen if, at some point, the program executes the following lines (addr is a variable that contains a pointer to a function which, for the sake of simplicity, takes no arguments):

//fill the array with the machine codes

// what is going to happen???

When the last line of the above code gets executed, the thread will break out of its infinite loop, and jump to the handcrafted code in the chunk2 array. Why? Because the thread is instructed to jump to the address that is stored 6 bytes past the beginning of the chunk1 array. We wrote the address of chunk1 there, and, as a result, made the thread spin. If we write some other address 6 bytes past the beginning of chunk1, the code flow will jump to the new location.

The code in the chunk2 array (starting from byte 10, i.e. the location to which our thread will jump) instructs the thread to push the address of chunk2 on the stack, and to jump to the target function. Therefore, after the target function returns, the code flow will jump to the beginning of chunk2, where it will find the instruction to jump to the address that is stored 6 bytes past the beginning of chunk2. Since the only thing stored there is the address of chunk2 itself, the thread starts spinning again.

We can repeat this sequence again and again with chunk3, chunk4, chunk5, etc. - the thread will break out of its infinite loop, execute the target function, enter an infinite loop, break out when the next chunk of instructions arrives, execute the target function, enter an infinite loop again, and so on. Therefore, we can schedule the target function for subsequent execution simply by writing data to an array, rather than by means of a system-defined APC. We can do it whenever we wish - our thread never terminates. The only thing we need to know is the location where the thread currently spins, or, if it is currently executing the target function, the location where it will spin after the target function returns - we have to write the address we want it to jump to 6 bytes past this location. In practical terms, it makes sense to allocate all these chunks from the same pool, and to use an offset index as a pointer into this pool. When the pool fills up, we can reuse it - all we have to do is set the index back to zero and start over.
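The queuing scheme itself can be modelled without any machine code. The sketch below is a single-threaded simulation under stated assumptions: each chunk's jump slot is represented by a next field, the pushed arguments by fn and arg, and all type and function names are invented for this example. The 32-byte chunk size and the 4096-byte pool match the constants that appear in the Epilog() assembly later in the article.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE 32u     /* bytes per chunk */
#define POOL_SIZE  4096u   /* one page of chunks */

typedef void (*target_fn)(uint32_t arg);

typedef struct {
    uint32_t  next;  /* models the jump slot: own offset means "spin here" */
    target_fn fn;    /* models the jump to the target function */
    uint32_t  arg;   /* models an argument pushed by the chunk */
} Chunk;

typedef struct {
    Chunk    chunks[POOL_SIZE / CHUNK_SIZE];
    uint32_t tail;   /* offset index: the chunk the consumer will spin on last */
} Pool;

/* Advances the offset index by one chunk, wrapping at the end of the pool. */
static uint32_t next_offset(uint32_t off)
{
    off += CHUNK_SIZE;
    if (off >= POOL_SIZE) off -= POOL_SIZE;
    return off;
}

static void pool_init(Pool *p)
{
    p->tail = 0;
    p->chunks[0].next = 0;      /* the first chunk spins on itself */
    p->chunks[0].fn = NULL;
    p->chunks[0].arg = 0;
}

/* Producer side: fill the next chunk, then release the consumer. */
static void pool_post(Pool *p, target_fn fn, uint32_t arg)
{
    uint32_t prev = p->tail;
    uint32_t cur = next_offset(prev);
    Chunk *c = &p->chunks[cur / CHUNK_SIZE];

    c->next = cur;              /* the new chunk spins after its call returns */
    c->fn = fn;
    c->arg = arg;
    p->tail = cur;
    p->chunks[prev / CHUNK_SIZE].next = cur;  /* last step: redirect the previous slot */
}

/* Consumer side: follow the slots until a chunk points at itself.
   Returns the number of target calls made. */
static unsigned pool_run(Pool *p, uint32_t *pos)
{
    unsigned executed = 0;
    while (p->chunks[*pos / CHUNK_SIZE].next != *pos) {
        *pos = p->chunks[*pos / CHUNK_SIZE].next;
        p->chunks[*pos / CHUNK_SIZE].fn(p->chunks[*pos / CHUNK_SIZE].arg);
        ++executed;
    }
    return executed;
}

/* Demo target used to observe the calls. */
static uint32_t g_sum;
static void add_to_sum(uint32_t v) { g_sum += v; }
```

Posting three messages and then calling pool_run() executes the target three times and leaves the consumer "spinning" on the last chunk. Note that, as in the scheme described above, nothing stops a fast producer from lapping a slow consumer once the index wraps.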

Let's say the thread runs in the user-mode process X. Clearly, the function we want it to execute must reside in the address space of process X as well, but what about the code that actually fills the array with the machine instructions and makes the thread break out of its infinite loop? Where should it reside? It can reside anywhere - in the same module, in a different module of the same process, in another user-mode process, or even in a kernel-mode driver. Furthermore, this code can be concurrently executed by different threads in different processes. As long as this code has write access to the address space of process X, it can post an asynchronous message that schedules the target function for execution. In order to do so, it does not need any system services - what it needs are the addresses of the target function and of the pool (as they are known to process X), the ability to write data to this pool, plus the current value of the offset index. This approach is generic, and can be used whenever we need to post an asynchronous message to an application, but, for some reason, are unable to use system-defined APCs and asynchronous message queuing.

Now, let's look at how we can save thread-specific data without using system-defined TLS.

Storing Thread-Specific Data

Prolog and epilog of any function that uses EBP register as a base pointer to its local variables look like the following:

push ebp          ; save ebp
mov ebp,esp
sub esp,XXX       ; allocate local variables
; ... do actual things ...
mov esp,ebp
pop ebp           ; restore ebp

It is easy to see that, as a very first step, the function must save the value of the EBP register on the stack, and restore it before returning control. This is an absolute must - otherwise, the caller will be unable to keep track of its local variables after the function returns. It is also easy to see that the function saves the value of EBP on the stack just one stack entry above the function's return address, and then copies the current value of ESP into the EBP register. Therefore, the current value of the EBP register always points to the location that stores the previous value of EBP.

What does it have to do with spying? Look at how our spying code flow affects the original value of EBP, i.e. the value of EBP at the time when ProxyProlog() starts execution (I hope you remember our "spying team" from the previous article):

  • ProxyProlog() - does not change EBP value before calling Prolog()
  • Prolog() - Since Prolog() is not a naked routine, the compiler generates instructions to save the value of EBP above Prolog()'s return address, and to pop this value into the EBP register before Prolog() returns.
  • ProxyProlog() - does not change EBP value after Prolog() returns.

  • Actual callee - either saves the original value of EBP and restores it before returning control, or leaves EBP register intact. In any case, the original value of EBP is of no importance for the actual callee - it is only the client code that actually uses it.

  • ProxyEpilog() - does not change EBP value before calling Epilog()
  • Epilog() - Since Epilog() is not a naked routine, the compiler generates instructions to save the value of EBP above Epilog()'s return address....

Did you get it? The value stored above Prolog()'s return address is always going to be equal to the one stored above Epilog()'s return address! Furthermore, this value in itself is of no importance until the program flow returns to the client code. This is what we can take advantage of.

Prolog() can save the original value of EBP, along with the original return address and all other relevant information, in the Storage structure, and write the pointer to this structure just above its return address. Since the same pointer to the Storage structure is going to be stored above Epilog()'s return address, and the original value of EBP is available from this structure, Epilog() can write this value above its return address, so that it will get popped into the EBP register before Epilog() returns. As a result, at the time when the program flow jumps to the client code, the value of EBP is going to be exactly the same as it was at the time when ProxyProlog() started execution.

Therefore, we can save the pointer to the Storage structure in the location pointed to by the EBP register, rather than in thread local storage - our spying code is going to stay thread-safe anyway. This approach is suitable for both kernel-mode and user-mode spying. Needless to say, in the case of kernel-mode spying, the Storage structure must be allocated from the non-paged pool - our spying code may run at high IRQL.
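The hand-off through EBP can be simulated in plain C. In this sketch the saved-EBP stack slots are ordinary variables and the Storage type is reduced to two fields; the function names and this reduced layout are assumptions made for illustration, not the article's definitions.

```c
#include <stdint.h>

/* Reduced stand-in for the article's Storage structure. */
typedef struct {
    uintptr_t ret_address;  /* original return address */
    uintptr_t prev_ebp;     /* caller's EBP, taken from the saved-EBP slot */
} Storage;

/* What Prolog() does with its saved-EBP slot: park the caller's EBP in
   the Storage structure and leave a pointer to that structure in the
   slot, so that "pop ebp" loads the pointer into EBP. */
static void prolog_swap(uintptr_t *saved_ebp_slot, Storage *st, uintptr_t ret_address)
{
    st->ret_address = ret_address;
    st->prev_ebp = *saved_ebp_slot;
    *saved_ebp_slot = (uintptr_t)st;
}

/* What Epilog() does with its saved-EBP slot: recover the Storage
   pointer that "push ebp" placed there, and put the caller's EBP back
   so that "pop ebp" restores it. */
static Storage *epilog_swap(uintptr_t *saved_ebp_slot)
{
    Storage *st = (Storage *)*saved_ebp_slot;
    *saved_ebp_slot = st->prev_ebp;
    return st;
}
```

Between the two calls, the pointer travels in the EBP register itself, which the actual callee either preserves or never touches.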

Armed with this theoretical knowledge, we can put it into practice, and proceed to the actual task of kernel-mode spying. I am sorry - I forgot to tell you that overwriting the addresses of functions imported by a kernel-mode driver is a bit different from modifying a user-mode module's IAT.

Kernel-Mode Spying

First of all, let's look at how the IAT is filled with the addresses of imported functions at load time. As a first step, the loader has to locate the module's import directory, i.e. the array of IMAGE_IMPORT_DESCRIPTOR structures, from which it can obtain the pointers to the Import Name and Import Address tables. After having obtained the name of the imported function from the Import Name Table, the loader can get its address from the IMAGE_EXPORT_DIRECTORY of the module that exports the given function, and write this address to the Import Address Table, so that the program can call the imported function. Clearly, the Import Address Table is needed as long as the program runs, but what about the Import Name Table and the array of IMAGE_IMPORT_DESCRIPTOR structures? Are they needed after the loader fills the Import Address Table with the addresses of imported functions? Not really - they are needed only at load time. What is the point of keeping them in memory after the module gets loaded?

As long as they reside in pageable memory, this is not really a big deal - they will normally be swapped to disk, and get loaded into RAM only if we want to access them, i.e. they are not going to take up space in RAM anyway. Since user-mode modules can be paged to disk, the loader does not bother to discard the Import Name Table and the array of IMAGE_IMPORT_DESCRIPTOR structures from memory after the user module gets loaded - it is not worth the effort. Therefore, the pointer to the array of IMAGE_IMPORT_DESCRIPTOR structures, available from the target user module's IMAGE_OPTIONAL_HEADER, is always valid.

However, a kernel-mode driver, apart from code that has been explicitly marked as pageable, has to be constantly resident in RAM, which, compared to virtual memory, is scarce. Under such circumstances, keeping the Import Name Table and the array of IMAGE_IMPORT_DESCRIPTOR structures in memory is just an unreasonable waste of resources. In order to solve this problem, the linker may place the Import Name Table and the array of IMAGE_IMPORT_DESCRIPTOR structures in the driver's INIT section, along with the DriverEntry() and DriverReinitialize() routines, i.e. the code that is needed only during the driver's initialization. After the driver is loaded, the loader simply discards its INIT section from memory, because it does not contain any information that may be needed after the driver's initialization. As a result, after the driver is loaded, the pointer to the array of IMAGE_IMPORT_DESCRIPTOR structures, available from the target driver's IMAGE_OPTIONAL_HEADER, may point right into the middle of nowhere, and, hence, should not be accessed - in order to figure this out, I had to spend two evenings in a "blue screen - reboot" loop.

However, the array of IMAGE_IMPORT_DESCRIPTOR structures contains information that is crucial for locating the Import Address Table. What are we going to do? We will map the target driver's file into memory, and obtain all the necessary information from the file mapping. This can be done by the user-mode controller application. Look at the code below:

    ULONG Reserved[2];
    PVOID Base;
    ULONG Size;
    ULONG Flags;
    USHORT Index;
    USHORT Unknown;
    USHORT LoadCount;
    USHORT ModuleNameOffset;
    CHAR ImageName[256];

int spy(char*path)
    DWORD a,dw,x,num; 
    SYSTEM_MODULE_INFORMATION info;char buff[256]; 
    DWORD* base=0;char* fullname; char* drivername1;char* drivername2;

    //get the name of the target driver

        if(path[a]=='/'||path[a]=='\\' )

    //get the list of all loaded drivers
    typedef DWORD (__stdcall*func)(DWORD,LPVOID,DWORD,DWORD*);

    func ZwQuerySystemInformation=

    BYTE *array=new BYTE[dw];

    // check if the target driver is loaded


            "This driver is not loaded",
            "spy",MB_OK);return 0;
    // map the target driver's file into memory
    // and get the pointer to import directory
    HANDLE filehandle=CreateFile(path,GENERIC_READ|GENERIC_WRITE,
    HANDLE maphandle=CreateFileMapping(filehandle,0,
    IMAGE_DOS_HEADER * dosheader=(IMAGE_DOS_HEADER *)readbuff;


    int totalcount=0;DWORD*ptr=(DWORD*)&buff[16];

    // now we are filling the control array with offsets to IATs
    // and numbers of entries in each IAT

        DWORD firstthunk=descriptor->FirstThunk;


    // create the thread and the pool
    writebuff=(BYTE* )VirtualAllocEx(GetCurrentProcess(),
    writebuff=(BYTE* )VirtualAllocEx(GetCurrentProcess(),

    writebuff[0]=0xFF;writebuff[1]=0x25;int i=(int)&writebuff[6];
    HANDLE threadhandle=CreateThread(0,0,
        (LPTHREAD_START_ROUTINE)&writebuff[0],0,0 ,&dw);


    // keep on filling the control array with relevant info

    // open the spying driver
            "Spying driver is not loadeded","spy",MB_OK);return 0;

    //create the logfile
    char namebuff[256]; 
    strcpy(&namebuff[a], "spylogfile.txt");


    // send the control array to the spying driver

    return 1;

As a first step, we find the address at which our target driver is loaded in memory. We do it by calling the ZwQuerySystemInformation() native API function, which can return information about all currently loaded drivers. ZwQuerySystemInformation() returns this information as an array of SYSTEM_MODULE_INFORMATION structures. The Base field of the SYSTEM_MODULE_INFORMATION structure indicates the address at which the given driver is loaded, and the ImageName field may contain either the name of the driver, or the whole path to the driver's file. If the ImageName field contains the path, we extract the name of the driver from the path, and compare it to the target driver's name, which has also been extracted from the path that was received by spy() as an argument. We do this until we locate the target driver, and save the Base field of its corresponding SYSTEM_MODULE_INFORMATION structure in a local variable.
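The name-matching step can be sketched as follows. The helper names are hypothetical, and the case-insensitive comparison is an assumption of this sketch (Windows file names are not case-sensitive); the article's own comparison may differ.

```c
#include <ctype.h>
#include <string.h>

/* Returns a pointer to the part of path that follows the last
   '/' or '\\', or path itself if it contains no separator. */
static const char *base_name(const char *path)
{
    const char *name = path;
    const char *p;
    for (p = path; *p; ++p)
        if (*p == '/' || *p == '\\')
            name = p + 1;
    return name;
}

/* Returns 1 if both paths name the same file, ignoring directories
   and letter case (the case rule is an assumption of this sketch). */
static int same_driver(const char *a, const char *b)
{
    a = base_name(a);
    b = base_name(b);
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        ++a;
        ++b;
    }
    return *a == *b;   /* equal only if both strings ended together */
}
```

With these helpers, an ImageName that holds a full path and one that holds only a bare driver name can be compared against the target in the same way.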

Then we map the target driver's file into memory, and locate its import directory, i.e. the array of IMAGE_IMPORT_DESCRIPTOR structures. The FirstThunk field of every IMAGE_IMPORT_DESCRIPTOR structure indicates the offset to the beginning of the Import Address Table that corresponds to the given imported module, and the OriginalFirstThunk field indicates the offset to the beginning of the array of IMAGE_THUNK_DATA structures. By counting the number of IMAGE_THUNK_DATA structures that have a non-zero value in the Function field, we can find the number of entries in the IAT that corresponds to the given imported module. For every imported module, we get the offset to the beginning of its corresponding Import Address Table, and count the number of entries in this table. We write these values into the control array, starting from the control array's 16th byte. We also save pointers to the names of imported functions in the global functionbuff array.
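The counting step might look like the sketch below. It defines its own mirror of the five-DWORD IMAGE_IMPORT_DESCRIPTOR layout so that it compiles anywhere, and it assumes, for simplicity, that every offset can be used directly as an index into the buffer (a real file mapping would first need its RVAs translated to file offsets). IatRecord and the function names are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirror of the IMAGE_IMPORT_DESCRIPTOR layout (five 32-bit fields). */
typedef struct {
    uint32_t OriginalFirstThunk;  /* offset of the thunk (name) array */
    uint32_t TimeDateStamp;
    uint32_t ForwarderChain;
    uint32_t Name;
    uint32_t FirstThunk;          /* offset of this module's IAT */
} ImportDescriptor;

/* Hypothetical record: one pair per imported module. */
typedef struct {
    uint32_t iat_offset;
    uint32_t entries;
} IatRecord;

/* Counts 32-bit thunks starting at offset until the zero terminator. */
static uint32_t count_thunks(const uint8_t *image, uint32_t offset)
{
    uint32_t n = 0, value;
    for (;;) {
        memcpy(&value, image + offset + 4u * n, 4);
        if (value == 0)
            return n;
        ++n;
    }
}

/* Walks the descriptor array at dir_offset (terminated by an all-zero
   descriptor), records each IAT offset and entry count, and adds up
   the total number of imported functions. Returns the record count. */
static size_t collect_iats(const uint8_t *image, uint32_t dir_offset,
                           IatRecord *out, size_t max_out, uint32_t *total)
{
    size_t used = 0;
    *total = 0;
    for (;; dir_offset += (uint32_t)sizeof(ImportDescriptor)) {
        ImportDescriptor d;
        memcpy(&d, image + dir_offset, sizeof d);
        if (d.FirstThunk == 0 || used == max_out)
            break;
        out[used].iat_offset = d.FirstThunk;
        out[used].entries = count_thunks(image, d.OriginalFirstThunk);
        *total += out[used].entries;
        ++used;
    }
    return used;
}
```

The resulting pairs correspond to what the text above writes into the control array, and the total is the number of functions the driver will have to hook.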

Then we allocate a page of virtual memory, fill its first 10 bytes with the machine codes, and create the thread which will process our asynchronous messages - it is going to spin in an infinite loop until a message arrives. We also write the current value of our offset index (which is currently zero), and the address of the logging function to be invoked asynchronously by the spying driver, into the structbuff array. This function is too simple to be listed here - it just takes the return value of the API function and the position of its name in the functionbuff array as parameters, formats a null-terminated string in the form ApiFunctionXXX - returned YYY, and writes it to the log file. The only thing worth mentioning is that this logging function has to be declared with the __stdcall calling convention, so that it pops its arguments off the stack.

Then we fill the first 16 bytes of the control array with the remaining relevant data. This includes the address at which the target driver is loaded, the number of functions imported by the target driver, the address of the pool, and the address of the array that stores the address of the target function and the offset index. Then we create the log file in the folder where the controller application's .exe file resides. Finally, we send the control array to the spying driver by calling DeviceIoControl() with our user-defined IOCTL_START_SPYING command.

Now, let's look at what happens in the kernel mode after our spying driver receives IOCTL_START_SPYING command:

typedef struct tagRelocatedFunction{
LONG address;
LONG function;
} RelocatedFunction,*PRelocatedFunction;

typedef struct tagStorage{
    DWORD isfree;
    DWORD retaddress;
    DWORD prevEBP;
    RelocatedFunction* ptr;

//global variables
char savebuff[64];KEVENT event1,event2;
long totalcount=0,base,userbuff,userstruct;
unsigned char *replacementchunks;DWORD *functionarray;
Storage storagearray[256];
BYTE retbuff[16];BOOLEAN ishooking;
DWORD * userstructptr; BYTE * userbuffptr;

NTSTATUS DrvDispatch(IN PDEVICE_OBJECT  devobject,IN PIRP irp)
    char*buff=NULL;PIO_STACK_LOCATION loc;
    DWORD thunk, count,x,a;long num,addr;DWORD * ptr;
    BYTE*byteptr; BYTE *array=0; RelocatedFunction * reloc;



        //free resources that might be allocated 
        //by our previous call to DrvDispatch

        //map the addresses of user -mode writebuff and structbuff
        //arrays into the kernel address space
        userbuffptr= (BYTE *) 
        userstructptr=(DWORD *)

        //save the remaining relevant data

        //allocate function replacement chunks
        replacementchunks=(unsigned char*)ExAllocatePool(NonPagedPool,

        //allocate the array that holds addresses of actual functions

        // overwrite IAT entries



                reloc=(RelocatedFunction *)&byteptr[6];




    return STATUS_SUCCESS;

As a first step, we map the addresses of the user-mode pool, and of the array which holds the current value of the offset index and the address of the target function, into the kernel address space. We save these pointers in the userstructptr and userbuffptr global variables - we must make them available to Epilog(), which will need them. We also save their virtual addresses, as they are known to the user-mode code, in the userstruct and userbuff global variables - Epilog() will need them when it fills the pool with the machine instructions.

Then we save the remaining data received in the control array in global variables. We will need this data in order to unhook the functions that we are about to hook now, so we must make sure that it is available to DrvClose(), which is going to do this job. Then we allocate the arrays that are going to hold the function replacement chunks and the addresses of the actual functions - we already know the number of functions we need to hook, and, hence, the number of bytes to allocate. We save these pointers in the replacementchunks and functionarray global variables - we must make them available to Prolog() and DrvClose().

At this point, we can proceed to the actual task of overwriting IAT entries. We don't even have to process PE-related structures - the user-mode controller application has provided us with all the information we need. I hope that, after having read my previous article, you are able to understand how we do it, so I will not go into details here - the only difference is that the RelocatedFunction structure stores not the actual address of the imported function, but the position at which this address can be found in the functionarray table.

Now, let's look at modifications that have to be applied to Prolog() and Epilog(). ProxyProlog() and ProxyEpilog() don't need any modifications, so we don't discuss them here.

 void __stdcall Prolog(DWORD * relocptr)

    DWORD x;DWORD *ebpptr; int a=0;
    RelocatedFunction * reloc=(RelocatedFunction*)relocptr[0];
    DWORD *retaddessptr=relocptr+1;
    KIRQL irql=KeGetCurrentIrql( );

    //find the first available Storage structure
        KeWaitForSingleObject (&event1,Executive,KernelMode,0,0);

    _asm {
            lea ebx,storagearray
start:                  mov ecx,dword ptr[ebx]
            cmp ecx,100
            jne fin
            add ebx,16
            jmp start
fin:                  mov dword ptr[ebx],100
            mov storptr,ebx


    //store all relevant information in the Storage structure
    _asm mov ebpptr,ebp

    //modify the CPU stack


As a first step, Prolog() locates the first available Storage structure in the storagearray table, and sets its isfree field to 100, i.e. marks it as occupied. We have to synchronize this operation, so we wait on a synchronization event that has been initialized in DriverEntry() (DriverEntry() initializes two synchronization events and fills the retbuff array with the instructions that call ProxyEpilog(), i.e. does pretty much the same things as DllMain() in the previous article). Since we are about to wait until our event is set to the signaled state, i.e. possibly for a non-zero interval, we must make sure that we do it if and only if the current IRQL is below DISPATCH_LEVEL. We also have to make sure that this operation does not get interrupted, so we disable interrupts by clearing the IF flag. Since we must re-enable them as quickly as possible, the code that locates the first available Storage structure is written in pure assembly.
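For readers who prefer C to assembly, the slot search can be sketched like this. The 16-byte structure mirrors the stride used by the assembly loop (add ebx,16); the constant and function names are invented, the last field stands in for the RelocatedFunction pointer of a 32-bit build, and the bounds check is an addition of this sketch, since the assembly shown above has none. Synchronization and interrupt masking are left out here.

```c
#include <stddef.h>
#include <stdint.h>

#define SLOT_BUSY 100u   /* value written to isfree when a slot is taken */

/* Four 32-bit fields, 16 bytes in total. */
typedef struct {
    uint32_t isfree;      /* SLOT_BUSY when occupied, anything else when free */
    uint32_t retaddress;  /* original return address */
    uint32_t prevEBP;     /* original value of EBP */
    uint32_t ptr;         /* stands in for RelocatedFunction* (32-bit pointer) */
} Storage;

/* Claims the first free slot and returns it, or NULL if every slot
   is occupied (this check is an addition of the sketch). */
static Storage *claim_storage(Storage *slots, size_t count)
{
    size_t i;
    for (i = 0; i < count; ++i) {
        if (slots[i].isfree != SLOT_BUSY) {
            slots[i].isfree = SLOT_BUSY;
            return &slots[i];
        }
    }
    return NULL;
}

/* Marks a slot as free again, as Epilog() does. */
static void release_storage(Storage *s)
{
    s->isfree = 0;
}
```

In the driver, this search is the part that must run with the event held and interrupts disabled, which is why it is kept so short.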

Then we store all relevant information in the Storage structure. This includes the original return address, the value pointed to by EBP register, and the pointer to RelocatedFunction structure. Finally, we modify the CPU stack the way we did it in the previous article, and write the pointer to the Storage structure to the location, pointed to by EBP register, i.e., one stack entry above Prolog()'s return address.

Now, let's look at Epilog():

void  __stdcall Epilog(DWORD*retvalptr)

    DWORD *ebpptr;
    DWORD*retaddessptr=retvalptr+1;DWORD retval=retvalptr[0];
    Storage*storptr;RelocatedFunction * reloc;
    DWORD i,a,b,pos,n;    KIRQL irql;

    // get the pointer to the Storage structure
    _asm mov ebpptr,ebp


    //modify the CPU stack

    // mark the Storage structure as free

    if (!ishooking)

    // now we are going to send data to
    // the controller application:

mov ebx,userstructptr
mov ecx,dword ptr[ebx]
mov a,ecx
add ecx,32
cmp ecx,4096
jl skip
sub ecx,4096
skip: mov pos,ecx
mov dword ptr[ebx],ecx

mov ebx,userbuffptr
add ebx,ecx
add ebx,6

mov edx,userbuff
add edx,ecx
mov dword ptr[ebx],edx


    // keep on filling the array with machine codes

    // instructions to spin 

    // instructions to push arguments

    //instruction to jump to the target function

    //finally, schedule the target function for execution

Epilog() gets the pointer to the Storage structure from the location pointed to by the EBP register, obtains the original return address and the original value of EBP from this structure, and modifies the CPU stack - it stores the original value of EBP one stack entry above its return address, and replaces the address to which ProxyEpilog() would otherwise return control with the original return address. Then it sets the isfree field of the Storage structure to 0, i.e. marks it as free. Finally, it informs the controller application that the given API function has returned, and sends it the return value of this function. The way this is done requires a little bit more attention.

We schedule the thread, which runs in the controller application's process, for asynchronous execution of the target function, by writing the machine codes to the pool the way it was explained earlier. First of all, we need to get the current value of the offset index in order to find the address of the chunk where we are going to write data, to write the address of this chunk (as it is known to the controller application) 6 bytes past its beginning, and to update the value of the offset index. We must synchronize this operation, and make sure that it does not get interrupted. Therefore, we wait until the synchronization event is set to the signaled state (if and only if the current IRQL is below DISPATCH_LEVEL, of course), and then disable interrupts. Since we must re-enable them as quickly as possible, the code that executes the above tasks is, again, written in pure assembly.

After having accomplished the above, we must fill the remaining part of the chunk with handcrafted instructions to push the index of the real callee's address in the functionarray table (which is available from the RelocatedFunction structure, a pointer to which was saved in the Storage structure), the real callee's return value, and the address of the current chunk (as it is known to the controller application) on the stack, and then to jump to the logging function. At this point, we no longer have to worry about either interrupts or context switches - concurrent threads are unable to overwrite our data anyway. Therefore, before proceeding to the above task, we re-enable interrupts and set the synchronization event to the signaled state, so that other threads don't need to wait until we finish filling the chunk with the machine codes. Since the remaining part of the job is not so time-critical, we can afford to do it in C, rather than assembly.

Finally, we schedule the logging function, which resides in the controller application's address space, for execution, by writing the address of the current chunk's 10th byte (as it is known to the controller application) 6 bytes past the address of the previous chunk.
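One possible byte layout for such a chunk is sketched below. The article does not show which jump encoding it uses at the end of the chunk, so the relative jump (E9 rel32) here is an assumption, chosen because the whole sequence then fits in 30 of the chunk's 32 bytes; the function and parameter names are likewise invented for this example, and all addresses are the 32-bit values known to the controller application.

```c
#include <stdint.h>
#include <string.h>

/* Stores a 32-bit value at p without alignment requirements. */
static void put32(uint8_t *p, uint32_t v)
{
    memcpy(p, &v, 4);
}

/* Fills one 32-byte chunk that calls a __stdcall logger(retval, index)
   and then spins. chunk_va and logger_va are user-mode addresses. */
static void write_log_chunk(uint8_t *chunk, uint32_t chunk_va,
                            uint32_t index, uint32_t retval,
                            uint32_t logger_va)
{
    chunk[0] = 0xFF;                       /* jmp dword ptr [chunk_va + 6] */
    chunk[1] = 0x25;
    put32(&chunk[2], chunk_va + 6);
    put32(&chunk[6], chunk_va);            /* slot: spin on this chunk */

    chunk[10] = 0x68;                      /* push imm32: index (second argument) */
    put32(&chunk[11], index);
    chunk[15] = 0x68;                      /* push imm32: return value (first argument) */
    put32(&chunk[16], retval);
    chunk[20] = 0x68;                      /* push imm32: return address = chunk start */
    put32(&chunk[21], chunk_va);

    chunk[25] = 0xE9;                      /* jmp rel32, relative to the next instruction */
    put32(&chunk[26], logger_va - (chunk_va + 30));
}

/* Releases the spinning thread: the previous chunk's slot now points
   at byte 10 of the new chunk. */
static void release_previous(uint8_t *prev_chunk, uint32_t chunk_va)
{
    put32(&prev_chunk[6], chunk_va + 10);
}
```

Because the pushed return address is the start of the chunk, the logging function returns straight into the spin instruction, and because it is __stdcall, it removes its own two arguments from the stack before doing so.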

As a result, when the logging function starts execution, it will receive the index of the real callee's address in the functionarray table and its return value as parameters. Since the functionbuff table in the controller application's address space stores the names of the functions imported by the target driver in the same order as the functionarray table in the kernel address space stores their addresses, the former parameter is sufficient for locating the name of the real callee. Therefore, the logging function formats a null-terminated string in the form ApiFunctionXXX - returned YYY, and writes it to the log file.

As you can see, kernel-mode spying is not a terribly complex task, although it makes us worry about things we don't even have to know about when spying in user mode. The funniest thing is that this model is suitable for user-mode spying as well. Therefore, the source code provided with this article applies to my previous article as well.

In order to run the sample application, you have to copy the spying driver into the C:\WINNT\system32\drivers directory and create an on-demand-start service for it. The service can be created either manually or with the following lines:


You must manually start the spying service by typing net start spyservice at the command prompt before running the application. Click the Start button in the Spy menu, choose the driver you want to spy on, relax for a while, and then either click the Stop button or just close the program. After that, you can open the log file in a text editor and examine its contents.

Warning: The spying driver has been built with the Windows 2000 DDK and tested on Windows 2000. I really don't know what happens if you run it on any other NT platform - it is your task to find out. If it does not work, you can always rebuild the spying driver for your platform - the source code should not need any modifications.

Furthermore, I would not advise you to hook Ntfs.sys with this sample. I still have to figure out why it happens, but if you try to hook Ntfs.sys, the system starts slowing down, stops responding pretty shortly, and finally goes down after a while. This is not an issue when hooking other drivers. I tried to hook the keyboard and mouse class drivers, i8042prt, atapi, disk, CDROM, floppy, videoport and display - it works fine everywhere. I think that all problems with Ntfs.sys arise because our spying code synchronizes all calls that are made by different threads in different system and user processes, and the system cannot cope with the resulting slowdown - in the case of Ntfs.sys, some of these calls are made by high-priority system threads, but our spying code currently does not take priority into account. Probably, we should also take the priority of the calling thread into account when we synchronize calls and disable interrupts. However, this is only a suggestion - I am not yet in a position to draw a definite conclusion (otherwise, instead of speaking about this problem, I would just fix it, wouldn't I?)

I would highly appreciate it if you sent me an e-mail with your comments and suggestions.


In conclusion, I must say that, although we are able to hook the API calls that are made by both user-mode modules and kernel-mode drivers, our exploration of Windows is far from being over. Wouldn't it be interesting to hook the calls that are made by the system to the target driver? It is obvious that the system has to store the addresses of all functions exported by the driver in some kind of service table, so that it can call the driver. Therefore, we have to locate this table somehow.

Furthermore, even our process-wide API spying is far from being complete - we can hook only the old-fashioned functional API. Wouldn't it be interesting to spy on COM interfaces as well? In the forthcoming articles, we will try to do the above-mentioned things.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Anton Bassov
Web Developer
Luxembourg Luxembourg

Comments and Discussions

Re: kernel mode keylogger
Medus, 25-Nov-05 7:39
> Hi mate

> For the first time ever you sound like someone with whom I can have a "civilized and orderly"
> discussion. First of all, I would like to apologize for my relatively harsh message of
> yesterday. In fact, I was just really annoyed by your attitudes - you were 100% sure that
> my software "sits on top of HAL", although you never had a chance to see it. How can someone
> make ANY(!!!) assumption about the internals of the program he has never seen???

I believe I said 'I haven't seen your code, but...' Upon reflection, and having been impressed by your approach, I'd have to say that I think there's very little room for two such apps operating correctly on one machine. I'm assuming that your code is only in place during the entry of 'sensitive' text, such as a passphrase. Since mine is aggressively testing that its IDT is in place, your hook is going to be displaced rather quickly. Your hook will therefore be taken as the new trampolining destination and you would end up in second place (and hooked before). Conversely, if we're BOTH aggressively maintaining the IDT then, well, it would be fun to watch - but ultimately, bad news for the PC operator.

> In fact, you have misinterpreted me - I never said that you don't know anything. The only
> thing I said is that you should not be so self-sure - in fact, ignorance and dismissive
> attitudes to other people's skills normally go hand in hand. After all, someone who does
> not want to listen to other people because he thinks that he knows everything is very
> unlikely to discover anything new for himself, don't you think???

Absolutely. However, talking about being self-sure... The entire reason I posted here was that you were claiming to have made systemwide hooking (using a keyboard filter driver) a worthless endeavour ... which, unless your code is going to be made a new system-wide approach in the next Microsoft service pack, probably isn't quite true.

> Now we are even, so let's proceed to our discussion. I am not going to take the piss out
> of you any more, but I still have to point you to something I disagree with.

> > And also reiterate that you are quite wrong, you haven't
> > solved the problem or damaged the effectiveness of
> > systemwide hooks - you 'may' have prevented a single
> > application from being intercepted - but that's not quite
> > the same thing, is it, and it certainly doesn't put you in
> > a position to declare all keyboard hooking irrelevant and
> > doomed to failure.
> Could you please explain this paragraph. On one hand you claim that your rootkit is
> able to record keystrokes on system-wide basis. On the other hand, you claim that
> my software is able to protect only some certain application.


> Where is the logic behind your statements??? You just don't seem to realize that my
> software is, essentially, exactly the same thing as your rootkit. The only difference
> between them is that my software removes the scan code from the port (which gives my
> software the additional task of doing the job of both i8042prt.sys and kbdclass.sys,
> i.e. translating scan code into VK and inserting data into kbd input queue), while
> your rootkit puts it back, so that keystroke gets processed by the system in a way
> it is supposed to get processed.

Exactly that. Since I copy and then resume calling the original handlers, I would expect my method not to interfere with further processing - Applications using the 'standard mechanisms' still obtain their keypresses.

I am making two possibly flawed assumptions:
- That your code is passing keystrokes directly to a particular sensitive application without standard processing (as ANY standard processing would be a possible capture vector).
- That your code is shipped WITH this sensitive application, and not as a separate (systemwide security upgrade) which would have to integrate with standard processing in order to interoperate with applications working on the standard message pump.

Now, whilst I admit it's possible that you may be attempting to eradicate ALL keylogging systemwide (thus partially accounting for your negative statement to the original poster), this would open up a capture vector at the user-land side (message delivery) ... as you still have to place the message into unsuspecting applications' message loops - and, whilst you would effectively bypass keyboard filter drivers, you'd still be allowing for the interception to take place in user mode.

To be honest, I don't much care anymore :) We both started out with some strong-minded assertions and it's quite funny that we both did this on the basis of essentially the same strong techniques. However, I still doubt highly that keystroke capture by way of kernel-mode drivers or patching has been negated (as a systemwide concept) by your code.

> Now let's proceed to "hardware rootkits". If you don't mind, let me stick to this term,
> although, strictly speaking, it would be more appropriate to call them "firmware rootkits"
> or "embedded software rootkits". First of all, let's agree on the topic of our discussion.
> I am speaking ONLY (!!!)about PC - this is what I work with. I don't know ANYTHING (!!!)
> about Cisco routers and switches, so that you don't have to mention them.

Shame though, they are far more interesting ;)

> I cannot point you to any hardware rootkit. As far as I am concerned, there is no
> publicly available hardware rootkit -this is why I say that this is brand-new development
> in the field of the IT security. Don't get me wrong - I am not speaking about compromising
> BIOS (this trick is not new)

I can show you at least one, possibly two. One was created by Lord and was crudely demonstrated at HOD in 2003/4. It purportedly works by grabbing the original image and placing vectors around certain identifiable entrypoints. It apparently doesn't directly compromise the box, but rather reseeds the compromise by causing a small loader to be placed on the drive.

That's not technically the same thing as a BIOS rootkit - but since a BIOS-only rootkit is of limited value, one could say that this is the most usable hybrid solution that is still 'hardware persistent'.

Although well received, I believe it has a number of problems. Being, by its nature, specific to certain ranges of BIOS, it is probably also OS specific (having only seen it demonstrated against NT/2k), and I don't know if it supports systems that boot from SATA or how it handles RAID volumes. Obviously, it's not going to work on 100% of boxes and would probably result in an awful lot of mobo casualties if used indiscriminately in the wild.

I've never seen the code, and as far as I am aware the code is still unavailable. However, I know I can hook you up with a demonstration. I know a few members of Lord's group quite well, although they do tend to be quite picky about which CONs they frequent.

The other I saw demonstrated was a demo of an addon for a Russian rootkit. Unfortunately, I have even fewer details, but the demonstration was most impressive. I'll try to get a handle on the author for you.

> What I am speaking about is placing a ROOTKIT(!!!) into on-chip memory. Furthermore,
> although BIOS seems to be the most obvious target, theoretically rootkit may be placed
> into any programmable chip that has on-chip memory.


> This is why I say that this is a very new and complex topic - I hope there is no need
> to explain the difference between operating in stealth mode, i.e. the way rootkits
> are supposed to work, and making the machine unbootable. You mentioned "BIOS buffers"
> in your original post, which made me believe that you claimed to be the author of
> such a rootkit. This is why I took your statement with a certain degree of scepticism,
> and asked you to post the code

Our group has never written a BIOS exploit, either as a destructive payload or a backdooring mechanism... Our core experience as regards 'firmware modification as a means of continued stealthy access' is Cisco, Juniper and CobaltRaq, mostly Cisco. It's the same principle; however, since the whole operating system resides there (in the flash), it's a great deal simpler. It is far less hardware specific, and IMHO is a far more effective cracking tool. But then, you don't want to hear about that : )

Actually, until just recently - the number of arguments I've had with network admins about the ease of cracking and compromising routers has been astounding. I guess many will be eating their words now that they are starting to become public (some 5 years late)

> In case if you have any idea in this field, I would be just delighted to hear from you
> - after all, you seem to know on-board stuff much better than I do

The best I could do is hook you up with at least one decent demonstration. I can't guarantee you're going to get any detailed discussion on HOW it works (I didn't) - but you're certainly going to come away with a firm knowledge of what is already available in some blackhat groups. Although, I doubt that's of any great help to you.

It will certainly be a good example of how the industry is chasing its own tail. Predominantly the security industry is reactive rather than proactive. You may think I'm arrogant in dismissing most developments in IT security, but it's a billion-dollar baby and I doubt it's in anyone's interests to actually try SOLVING the problems rather than incrementally REACTING to them as they occur.

On the other hand, I have a Windows-based PC on which modern exploits fail (I've never had it fall at a Capture-The-Flag competition - ever!). Every process is first bounds checked - not only profiled on file/library requirements (as in a Linux chroot jail) but down to individual imports. If any process steps beyond its bounds it can dump process memory (capturing the exploit in-situ: honeypot mode) and restart, or simply fail the call with a GUI warning (in stability mode). You simply couldn't get any service compromised in any usable way using a stock exploit, known or unknown. It's a high-granularity rights assignment that wraps the entire API at the level of individual exports (at some speed cost, granted) and even at the parameter level for certain 'risk' APIs. In fact, I ran IE4 unmodified right through all the GodMessage days and I can keep an unpatched SP1 Win2k box online indefinitely without infection.

Of course, the industry says I should use a FW and an AV subscription. This means playing the 'numbers' game and still being exploitable. There's more than one way to skin a cat, and the industry is profiting greatly by doing it hair by hair.

> Regards
> Anton

If you want to continue any aspects of this debate we can move it to a forum, and optionally we can delete the messages here to improve your page loading time. If you just want me to try to arrange a demonstration of any 0-day blackhat techniques then mail me and I'll get in touch with Ascension, DaVinci and a few other mature groups and see what CONs they will be attending in the next few months.

Please respond by mail (You should find it in the topic-reply notifications)



The reason it would be so much fun to watch your product and mine try to coexist is because we SIDT the IDTR so we can trap without disturbing the old IDTs (which is how we make the transfer to complete the ISR with the default handling). Depending on if you recheck the IDT after setting it you could end up in second place when it comes to receiving control. Also, depending on whether you LIDT each time you check, you could just be aggressively maintaining the SECOND place rather than the actual hardware handler, due to the IDT translation. Finally, the kernel is forced into using the original IDT, which makes us far more robust to any checks from above and this may have a bearing on the result.

Interesting experiment, no?

I really would like to purchase the product you coded this for. It certainly has my commendation - Care to give it a free advertisement (Or mail me privately)? ... I'd be very interested to try it out. I won't reveal the result - And you can even NDA me to that if you wish.


Last Updated 22 Apr 2004
Article Copyright 2004 by Anton Bassov
Everything else Copyright © CodeProject, 1999-2017