Hooking the kernel directly

Anton Bassov

4 Apr 2006

How to hook the kernel functions directly.

Download source files - 5.22 Kb

Introduction

Sometimes we run into a situation where we badly need to hook some kernel function, but are unable to do it via conventional PE-based hooking. This article explains how kernel functions can be hooked directly. As a sample project, we are going to present a removable USB storage device to the system as a basic disk, so that we can create and manage multiple partitions on it (for one reason or another, Windows neither allows nor recognizes multiple partitions on removable storage devices, so we are going to cheat the system). On this particular occasion, we will hook only one function, but the approach described in this article can be extended to handle multiple functions (for example, one of my projects required direct hooking of quite a few functions from the NDIS library). Please keep in mind that this article is about direct hooking and not about dealing with USB storage, so please don't tell me that the sample problem could have been solved differently.

The sample problem

The way a USB storage device is presented to the system is defined by the RemovableMedia field of the STORAGE_DEVICE_DESCRIPTOR structure that USBSTOR.SYS returns in response to an IOCTL_STORAGE_QUERY_PROPERTY request. If the device manufacturer wants the device to present itself as a basic disk, they make the driver set this field to FALSE. As a result, the device gets presented to the system as a basic disk - DISK.SYS clients will have no idea whether they are actually dealing with a hard drive or with a USB device.

Therefore, if we hook the IRP_MJ_DEVICE_CONTROL routine of USBSTOR.SYS, we can present the removable disk as a basic one to the system, simply by modifying the return value of the IOCTL_STORAGE_QUERY_PROPERTY request - luckily for us, no more checks are done. It can be done the following way:

typedef NTSTATUS (__stdcall*ProxyDispatch)
       (IN PDEVICE_OBJECT device,IN PIRP Irp);
ProxyDispatch realdispatcher;

// Proxy for the IRP_MJ_DEVICE_CONTROL dispatch routine of USBSTOR.SYS
NTSTATUS Dispatch(IN PDEVICE_OBJECT device,IN PIRP Irp)
{
    NTSTATUS status;
    PSTORAGE_PROPERTY_QUERY query;
    PSTORAGE_DEVICE_DESCRIPTOR descriptor;

    PIO_STACK_LOCATION loc= IoGetCurrentIrpStackLocation(Irp);

    if(loc->Parameters.DeviceIoControl.IoControlCode
                         ==IOCTL_STORAGE_QUERY_PROPERTY)
    {
        query=(PSTORAGE_PROPERTY_QUERY)
               Irp->AssociatedIrp.SystemBuffer;
        if(query->PropertyId==StorageDeviceProperty)
        {
            // the query and the reply share the same system buffer,
            // so save the pointer before passing the IRP down
            descriptor=(PSTORAGE_DEVICE_DESCRIPTOR)
                        Irp->AssociatedIrp.SystemBuffer;
            status=realdispatcher(device,Irp);
            // patch the reply only if the request has
            // already completed successfully
            if(status==STATUS_SUCCESS)
                descriptor->RemovableMedia=FALSE;
            return status;
        }
    }
    // everything else goes straight to the real handler
    return realdispatcher(device,Irp);
}

// somewhere in the code...
realdispatcher=(ProxyDispatch)
  driver->MajorFunction[IRP_MJ_DEVICE_CONTROL];
driver->MajorFunction[IRP_MJ_DEVICE_CONTROL]=Dispatch;

As you can see, a removable USB device can be presented to the system as a basic disk pretty easily. However, there is a "small" complication - USBSTOR.SYS gets loaded only when you plug a device into the USB port, and stays loaded only until you unplug it. Therefore, we cannot hook USBSTOR.SYS in advance - we have to plug in a device first. If we hook USBSTOR.SYS after it has already handled the IOCTL_STORAGE_QUERY_PROPERTY request, it is already too late to do anything. We cannot plug in the device, hook USBSTOR.SYS, unplug the device, and then plug it in again either - when you unplug the device, USBSTOR.SYS gets unloaded, so our hooking would be just a waste of effort. The most appropriate time to hook USBSTOR.SYS is the moment when it is about to create its device objects - on one hand, we know that USBSTOR.SYS has already been loaded, and, on the other hand, we know that the IOCTL_STORAGE_QUERY_PROPERTY request has not been handled yet. If we manage to capture the calls to IoCreateDevice() that USBSTOR.SYS makes, our task gets simplified dramatically - IoCreateDevice() takes a pointer to the caller's DRIVER_OBJECT as its first argument. At this point, we can replace the pointer in the driver's MajorFunction[IRP_MJ_DEVICE_CONTROL], and that's it.

In order to be able to do the above, we are going to hook IoCreateDevice() directly by inserting instructions into its executable code, i.e., do the so-called "hooking-by-overwriting". In fact, we could have done it by hooking the export directory of ntoskrnl.exe, but this article is about direct hooking, so we are going to hook IoCreateDevice() directly. However, first of all, we have to learn about something that, at first glance, seems to be unrelated to our task - interrupt hooking.

Dealing with interrupts and exceptions

In response to hardware interrupts or exceptions, the CPU saves the execution context of the currently running thread, and transfers the execution to a special kernel-mode procedure, called a handler. The way the execution context is saved depends on the privilege level of the interrupted code. If the interrupted code is non-privileged, the processor has to switch to the privileged stack and the code segments in order to be able to execute a kernel-mode handler procedure. Therefore, the CPU pushes the values of the user-mode SS, ESP, EFLAGS, and CS registers, plus the return address (all pushes occur in the above described order) on the kernel stack before it transfers the execution to the appropriate handler. In the case of some exceptions, the CPU may also push an error code on top of the stack above the return address. If the interrupted code is privileged, the stack switch is not needed. Therefore, in such cases, only EFLAGS, CS, the return address, and, possibly, the error code are pushed on the stack - SS and ESP registers are not saved on the stack if the interrupted code is privileged.
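
The two frame shapes described above can be sketched as plain C structures. This is only an illustration for 32-bit protected mode; the names `KernelFrame`, `UserFrame` and `frame_bytes` are made up for this sketch and do not come from the sample driver.

```c
#include <stdint.h>
#include <stddef.h>

/* Frame pushed when the interrupted code was already privileged.
   Fields are listed from the top of the stack (lowest address) down.
   Hypothetical names, for illustration only. */
typedef struct {
    uint32_t eip;      /* return address */
    uint32_t cs;
    uint32_t eflags;
} KernelFrame;

/* Frame pushed after the CPU has switched from the user stack
   to the kernel stack. */
typedef struct {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    uint32_t esp;      /* user-mode stack pointer */
    uint32_t ss;       /* user-mode stack segment */
} UserFrame;

/* Number of bytes the CPU pushes before the handler gets control. */
size_t frame_bytes(int from_user_mode, int has_error_code)
{
    size_t n = from_user_mode ? sizeof(UserFrame) : sizeof(KernelFrame);
    if (has_error_code)
        n += sizeof(uint32_t);   /* the error code ends up on top */
    return n;
}
```

Since both IoCreateDevice() and its callers run in kernel mode, the handlers later in this article only ever have to deal with the shorter of the two frames.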

Each interrupt and exception has its own associated number, called a vector. There are 256 interrupt vectors. The addresses of all the interrupt and exception handlers are stored in a kernel-mode data structure called the Interrupt Descriptor Table (IDT), which is nothing more than an array of 256 8-byte entries, called Gate Descriptors. On an SMP machine, each processor has its own IDT, although the addresses of all the interrupt and exception handlers are, certainly, the same for all CPUs in the system. Each IDT entry is associated with its corresponding vector. An IDT can hold Interrupt Gate Descriptors, Trap Gate Descriptors, and Task Gate Descriptors. The binary layout of the interrupt and trap gate descriptors is described by the following structure:

struct GATE
{
  WORD    OffsetLow;      
  WORD    Selector;       
  WORD Unused:8;
  WORD Type:5;
  WORD DPL:2;
  WORD Present:1;
  WORD    OffsetHigh;    
} ;

As you can see, the binary layout of the interrupt and trap gate descriptors is quite similar to that of call gate descriptors, presented in my article Entering the kernel without a driver and getting the interrupt information from APIC. The difference between an interrupt and a trap gate lies in the state of the IF flag in the EFLAGS register at the time when the interrupt or the exception handler starts execution. If the interrupt or the exception is vectored via an interrupt gate, the IF flag gets automatically cleared by the processor. If the interrupt or the exception is vectored via a trap gate, the IF flag does not get affected. In all other respects, interrupt and trap gates are the same - no wonder they are described by the same structure. The binary layout of task gate descriptors is different. Although, for performance reasons, all user processes run in the context of a single task under Windows NT, there are a few task gate descriptors in IDT. They are mainly reserved for "exceptional circumstances", like a system crash - their task is to make sure that the system is able to operate long enough to throw a blue screen before the CPU resets itself. They are of no interest to us anyway, so I won't present their binary layout.
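
Because the handler address is split between the first and the last word of a gate descriptor, reading or replacing it always takes two 16-bit accesses. The following is a minimal user-mode sketch of that arithmetic. `GateRaw` and the helper functions are hypothetical names; the three bit-fields of GATE are folded into a single 16-bit word here, so that the sketch does not depend on the compiler's bit-field ordering.

```c
#include <stdint.h>

/* Same 8-byte layout as GATE above (hypothetical name). */
typedef struct {
    uint16_t offset_low;    /* bits 0-15 of the handler address  */
    uint16_t selector;      /* code segment selector             */
    uint16_t flags;         /* bits 8-12 type, 13-14 DPL, 15 P   */
    uint16_t offset_high;   /* bits 16-31 of the handler address */
} GateRaw;

/* Reassemble the 32-bit handler address from its two halves. */
uint32_t gate_get_handler(const GateRaw *g)
{
    return ((uint32_t)g->offset_high << 16) | g->offset_low;
}

/* Split a 32-bit handler address into the two halves. */
void gate_set_handler(GateRaw *g, uint32_t handler)
{
    g->offset_low  = (uint16_t)(handler & 0xFFFFu);
    g->offset_high = (uint16_t)(handler >> 16);
}

int gate_is_present(const GateRaw *g) { return (g->flags >> 15) & 1; }
int gate_dpl(const GateRaw *g)        { return (g->flags >> 13) & 3; }
```

The inline assembly later in this article performs the same split by hand: it stores BX at offset 0 of the entry, shifts EBX right by 16, and stores BX again at offset 6.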

The way Windows maps hardware interrupts to interrupt vectors is presented in my article: Entering the kernel without a driver and getting the interrupt information from APIC, so we are not going to discuss hardware interrupts here. Instead, we will concentrate on exceptions. The first 32 entries of the IDT deal with the exception handlers (their mapping to particular vectors is pre-defined by Intel). Exceptions can be classified as traps, faults, and aborts. Exceptions of the abort class do not allow the failing task to be resumed. A typical example of an abort-class exception is the Machine-Check exception (INT 0x12). Traps and faults allow the failing task to continue its execution after the exception has been dealt with. The difference between a trap and a fault lies in the return address that is saved on the stack. In the case of a fault-class exception, this address points to the instruction that caused the exception, i.e., one more attempt to execute the failing instruction will be made after the exception handler returns control. A typical example of a fault-class exception is the Page Fault exception (INT 0xE). In the case of a trap-class exception, the return address points to the instruction following the one that caused the exception. A typical example of a trap-class exception is the Debug Breakpoint exception (INT 3).

A Debug Exception (INT 1) is a quite interesting exception in itself - depending on the reason for the exception, it may be raised as either a trap or a fault. A Debug Exception may be raised for any of the following reasons:

1. Breakpoint on execution.
2. Breakpoint on memory access.
3. Breakpoint on IO port access.
4. General detect condition.
5. The TF flag in the EFLAGS register is set. In such a case, a Debug Exception is raised upon every instruction's execution.
6. Task switch (irrelevant under Windows).
7. INT 1 instruction.

In cases 1 and 4, INT 1 is raised as a fault, in all other cases it is raised as a trap. The reason why an exception has been raised can be discovered by the INT 1 handler from the DR6 register. A Debug Exception may be raised for more than one reason - for example, a breakpoint on execution may be reached at the time when the TF flag is set. In such a case, a breakpoint on execution has a higher priority than the TF flag, so that INT 1 is raised as a fault, rather than as a trap.
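
As a rough illustration of how a handler could tell these cases apart, here is a sketch that classifies a DR6 value. The bit positions (B0-B3, BD, BS, BT in DR6 and the R/W fields in DR7) are the ones documented by Intel; the function names are made up for this example, and the sketch assumes that DR6 reports only breakpoints that are actually enabled in DR7.

```c
#include <stdint.h>

#define DR6_BD (1u << 13)   /* general detect: debug register access */
#define DR6_BS (1u << 14)   /* single step: TF was set               */
#define DR6_BT (1u << 15)   /* task switch                           */

/* The R/W field of breakpoint n occupies DR7 bits 16+4n and 17+4n:
   00 = instruction execution, 01 = data write,
   10 = I/O access (only when CR4.DE is set), 11 = data read or write. */
static unsigned dr7_rw(uint32_t dr7, unsigned n)
{
    return (dr7 >> (16 + 4 * n)) & 3u;
}

/* Returns 1 if at least one reported condition is fault-class
   (execution breakpoint or general detect), 0 if all are trap-class. */
int debug_exception_is_fault(uint32_t dr6, uint32_t dr7)
{
    unsigned n;
    if (dr6 & DR6_BD)
        return 1;
    for (n = 0; n < 4; n++)
        if ((dr6 & (1u << n)) && dr7_rw(dr7, n) == 0)
            return 1;
    return 0;
}

/* Returns 1 if the exception was (also) caused by the TF flag. */
int debug_exception_is_single_step(uint32_t dr6)
{
    return (dr6 & DR6_BS) != 0;
}
```

The sample driver in this article does not read DR6 at all. It relies on the TF bit in the saved EFLAGS image instead, which is enough for its purposes.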

So, what does all of the above have to do with hooking functions? We are going to copy the first few bytes (8 bytes is more than enough) from the beginning of the target function into an array that we are going to allocate from the non-paged pool, hook the INT 1 and INT 3 handlers, and write a 0xCC opcode (which represents the INT 3 instruction) to the beginning of the target function. As a result, when the target function tries to execute its very first instruction, our proxy INT 3 handler will get invoked. The stack layout at the time when our proxy INT 3 handler enters execution can be described by the following structure:

struct INTTERUPT_STACK
{
    ULONG InterruptReturnAddress;
    ULONG SavedCS;
    ULONG SavedFlags;
    ULONG FunctionReturnAddress;
    ULONG Argument;
};

On top of the stack, there is a frame that has been set up by the CPU in response to an INT 3 instruction (i.e., the address, to which the INT 3 handler is supposed to return control, plus the CS and EFLAGS registers); the address, to which the target function is supposed to return control, comes next; and the array of function arguments is below the return address on the stack (I think that, for practical reasons, it makes sense to treat all arguments as ULONGs - we can always cast them to their actual type if it is needed). At this point, we can do whatever we want - we can inspect and/or modify function arguments, change its return address, i.e., do everything we normally do when we hook functions. For the purpose of our task, we are interested only in the first argument, i.e., PDRIVER_OBJECT that has been passed to IoCreateDevice(), so we have presented the structure as if the target function had only one argument.

Before our proxy INT 3 handler returns, it will change the InterruptReturnAddress field of the structure on top of the stack to that of the array with the instructions that we have copied, and set the TF flag in the SavedFlags field. After our proxy INT 3 handler returns, the InterruptReturnAddress and SavedFlags fields of the structure that is saved on the stack will get popped into, respectively, the EIP and EFLAGS registers. As a result, execution will be resumed at the beginning of the array with the instructions that we have copied, and, once we have modified the TF flag, it will be resumed in single-step mode, i.e., INT 1 will get raised upon every instruction's execution.

If INT 1 gets raised because of the TF flag, it is processed as a trap. Therefore, after the very first instruction in the array gets executed, our proxy INT 1 handler will get invoked, and the EIP that is saved on the stack will point to the second instruction in the array. At this point, by subtracting the address of our array from the return address that is saved on top of the stack, we will be able to discover the size of the instruction that has just been executed. Therefore, before our proxy INT 1 handler returns, it will change the return address to (beginning of the target function + size of the executed instruction), and clear the TF flag in the EFLAGS image that is saved on the stack. As a result, after our proxy INT 1 handler returns, execution will be resumed at the location of the target function's second instruction, with the TF flag cleared. In other words, the target function will continue its execution as if nothing had ever happened.
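
The address arithmetic of the two stages can be tried out in user mode before touching the kernel. The sketch below works on a plain structure instead of a real interrupt frame. `Frame`, `on_breakpoint` and `on_single_step` are hypothetical names, and the addresses in the usage note are arbitrary.

```c
#include <stdint.h>

#define EFLAGS_TF 0x100u   /* trap flag, bit 8 of EFLAGS */

/* The part of the interrupt frame that both handlers edit; it mirrors
   the first three fields of INTTERUPT_STACK (hypothetical name). */
typedef struct {
    uint32_t return_address;
    uint32_t saved_cs;
    uint32_t saved_flags;
} Frame;

/* INT 3 stage: the saved EIP points just past the 1-byte 0xCC opcode. */
int on_breakpoint(Frame *f, uint32_t real_code, uint32_t proxy_code)
{
    if (f->return_address != real_code + 1)
        return 0;                        /* not our breakpoint        */
    f->return_address = proxy_code;      /* run the copied code...    */
    f->saved_flags |= EFLAGS_TF;         /* ...in single-step mode    */
    return 1;
}

/* INT 1 stage: the saved EIP points past the copied first instruction. */
int on_single_step(Frame *f, uint32_t real_code, uint32_t proxy_code,
                   uint32_t copied_bytes)
{
    uint32_t size;
    if (!(f->saved_flags & EFLAGS_TF))
        return 0;                        /* not a single-step trap    */
    if (f->return_address <= proxy_code ||
        f->return_address >= proxy_code + copied_bytes)
        return 0;                        /* somebody else's stepping  */
    size = f->return_address - proxy_code;   /* instruction length    */
    f->return_address = real_code + size;    /* back to the original  */
    f->saved_flags &= ~EFLAGS_TF;
    return 1;
}
```

For example, with the target function at 0x804F0000, the copy at 0x81000000 and a 2-byte first instruction, the breakpoint stage redirects a frame whose return address is 0x804F0001 to 0x81000000, and the single-step stage then turns 0x81000002 into 0x804F0002.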

Admittedly, our approach may seem rather convoluted at first glance - we could do things differently. For example, we could copy a few instructions from the beginning of the target function into our array, and then overwrite the beginning of the target function with a JMP instruction, so that execution would jump to our hooking code. In such a case, we would have to figure out the offset within the target function at which execution has to be resumed after our hooking code executes. Therefore, we would have to figure out the instruction size. However, this is easier said than done - in order to do something like that, we would have to write a full-fledged disassembler. To make things even more complex, an instruction may reference memory relative to its own location, and, in such a case, we would have to adjust the instruction's operand after we have relocated it. In other words, if we chose to overwrite the beginning of the function with a JMP, rather than an INT 3 instruction, our program would be huge, and around 95% of our code would deal with disassembly, rather than with hooking in itself. Therefore, I think that hooking INT 1 and INT 3 is a much more reasonable thing to do - by taking advantage of INT 1 and INT 3, we can make the CPU do all the "dirty" jobs for us.
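
To see why relocated code may need fixing up, consider the most common location-relative instruction, the 5-byte "JMP rel32" (opcode 0xE9). Its operand is the distance from the end of the instruction to the destination, so the same five bytes lead to a different place as soon as they are moved. A minimal sketch, with hypothetical helper names:

```c
#include <stdint.h>

/* Writes "JMP rel32" into buf. 'from' is the address at which the
   five bytes will be executed, 'to' is the destination. */
void encode_jmp_rel32(uint8_t buf[5], uint32_t from, uint32_t to)
{
    uint32_t rel = to - (from + 5);   /* relative to the next instruction */
    buf[0] = 0xE9;
    buf[1] = (uint8_t)(rel);
    buf[2] = (uint8_t)(rel >> 8);
    buf[3] = (uint8_t)(rel >> 16);
    buf[4] = (uint8_t)(rel >> 24);
}

/* Where a "JMP rel32" would land if it were executed at address 'at'. */
uint32_t decode_jmp_rel32(const uint8_t buf[5], uint32_t at)
{
    uint32_t rel = (uint32_t)buf[1] | ((uint32_t)buf[2] << 8) |
                   ((uint32_t)buf[3] << 16) | ((uint32_t)buf[4] << 24);
    return at + 5 + rel;
}
```

A jump encoded for address 0x1000 with destination 0x2000 lands at 0x4000 when the same bytes are executed at 0x3000. A JMP-based hook would have to detect and correct every such operand in the code it relocates.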

Now, let's proceed to the actual work.

Solving the sample problem

For the purpose of our particular project, we can do all the hooking-related work right in DriverEntry(). Let's look at the code:

// this routine hooks the IDT of the current CPU.
// We have to make sure that it stays on one CPU,
// so we disable interrupts throughout
// its execution in order to avoid context
// switches
void HookIDT()
{
    ULONG handler1,handler2,idtbase,tempidt,a;
    UCHAR idtr[8];

    //get the addresses that we have to write to IDT
    handler1=(ULONG)&replacementbuff[0];
    handler2=(ULONG)&replacementbuff[32];

    //allocate temp. memory. This should be our first 
    //step - from the moment we disable interrupts
    //till return we don't risk to call any code 
    //that has not been written by ourselves
    //(theoretically this code may re-enable 
    //interrupts without our knowledge, and then.....)
    tempidt=(ULONG)ExAllocatePool(NonPagedPool,2048);

    _asm
    {
        cli
        
        sidt idtr
        lea ebx,idtr
        mov eax,dword ptr[ebx+2]
        mov idtbase,eax
    }

    //check whether our IDT has already been hooked. 
    //If yes, re-enable interrupts and return
    for(a=0;a<IdtsHooked;a++)
    {
        if(idtbases[a]==idtbase)
        {
            _asm sti
            ExFreePool((void*)tempidt);
            KeSetEvent(&event,0,0);
            PsTerminateSystemThread(0);
        }
    }

    _asm
    {
        //now we are going to load the copy of IDT into IDTR register
        // in my experience, modifying memory,
        //pointed to by IDTR register, is unsafe
        mov edi,tempidt
        mov esi,idtbase
        mov ecx,2048
        rep movsb

        lea ebx,idtr
        mov eax,tempidt
        mov dword ptr[ebx+2],eax
        lidt idtr

        //now we can safely modify IDT. Get ready
        mov ecx,idtbase

        //hook INT 1
        add ecx,8
        mov ebx,handler1

        mov word ptr[ecx],bx
        shr ebx,16
        mov word ptr[ecx+6],bx

        ///hook INT 3
        add ecx,16
        mov ebx,handler2

        mov word ptr[ecx],bx
        shr ebx,16
        mov word ptr[ecx+6],bx

        //reload the original idt
        lea ebx,idtr
        mov eax,idtbase
        mov dword ptr[ebx+2],eax
        lidt idtr
        sti
    }

    //now add the address of IDT we just 
    //hooked to the list of hooked IDTs
    idtbases[IdtsHooked]=idtbase;
    IdtsHooked++;
    ExFreePool((void*)tempidt);
    KeSetEvent(&event,0,0);
    PsTerminateSystemThread(0);
}

NTSTATUS DriverEntry(IN PDRIVER_OBJECT driver,IN PUNICODE_STRING path)
{
    ULONG a;PUCHAR pool=0;
    UCHAR idtr[8];HANDLE threadhandle=0;

    //fill the array with machine codes
    replacementbuff[0]=255;replacementbuff[1]=37;
    a=(long)&replacementbuff[6];
    memmove(&replacementbuff[2],&a,4);
    a=(long)&INT1Proxy;
    memmove(&replacementbuff[6],&a,4);

    replacementbuff[32]=255;replacementbuff[33]=37;
    a=(long)&replacementbuff[38];
    memmove(&replacementbuff[34],&a,4);
    a=(long)&BPXProxy;
    memmove(&replacementbuff[38],&a,4);

    //save the original addresses of INT 1 and INT 3 handlers
    _asm
    {
        sidt idtr
        lea ebx,idtr
        mov ecx,dword ptr[ebx+2]

        /////save INT1
        add ecx,8
        mov ebx,0
        mov bx,word ptr[ecx+6]
        shl ebx,16
        mov bx,word ptr[ecx]
        mov Int1RealHandler,ebx

        /////save INT3
        add ecx,16
        mov ebx,0
        mov bx,word ptr[ecx+6]
        shl ebx,16
        mov bx,word ptr[ecx]
        mov BPXRealHandler,ebx
    }

    //hook INT 1 and INT 3 handlers - it has 
    //to be done before overwriting the target function.
    //Run HookIDT() as a separate 
    //thread until all IDTs get hooked
    KeInitializeEvent(&event,SynchronizationEvent,0);

    RtlZeroMemory(&idtbases[0],64);
    a=KeNumberProcessors[0];
    while(1)
    {
        PsCreateSystemThread(&threadhandle,
                (ACCESS_MASK) 0L,0,0,0,
                (PKSTART_ROUTINE)HookIDT,0); 
        KeWaitForSingleObject(&event, 
           Executive,KernelMode,0,0);
        if(IdtsHooked==a)
            break;
    }

    KeSetEvent(&event,0,0);

    //fill the structure...
    a=(ULONG)&IoCreateDevice;
    HookedFunctionDescriptor.RealCode=a;
    pool=ExAllocatePool(NonPagedPool,8);
    memmove(pool,(void*)a,8);
    HookedFunctionDescriptor.ProxyCode=(ULONG)pool;

    //now let's proceed to overwriting memory
    _asm
    {
        //remove protection before overwriting
        mov eax,cr0
        push eax
        and eax,0xfffeffff
        mov cr0,eax

        //insert breakpoint (0xCC opcode)

        mov ebx,a
        mov al,0xcc

        mov byte ptr[ebx],al

        //restore protection
        pop eax
        mov cr0,eax
    }

    return 0;
}

Quite a few explanations are needed here. First of all, we fill two chunks of memory with an indirect jump instruction - we are going to use them when we hook the IDT. I've got no logical explanation for this phenomenon, but when I try to write the address of the function itself into the IDT, I always get a blue screen. However, if I write the address of the array with an indirect jump instruction into the IDT, i.e., make the execution jump to my function, everything works fine - it looks like computers have their own understanding of things. Then, we save the addresses of the actual handlers of INT 1 and INT 3 in global variables, and proceed to hooking the IDT. The way it is done requires a bit more attention.
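
The bytes 255 and 37 that DriverEntry() writes into replacementbuff are 0xFF 0x25, i.e., the opcode of "JMP dword ptr [address]". The next four bytes hold the address of a pointer slot, and the slot itself (placed right after the instruction) holds the address of the real proxy function. The helper below builds the same 10-byte stub in an ordinary buffer. `build_indirect_jmp` is a hypothetical name, and the sketch assumes 32-bit addresses.

```c
#include <stdint.h>
#include <string.h>

/* Fills a 10-byte stub: FF 25 <address of slot> <slot = target>.
   stub_address is the address the stub will be executed from. */
void build_indirect_jmp(uint8_t stub[10], uint32_t stub_address,
                        uint32_t target)
{
    uint32_t slot = stub_address + 6;   /* slot follows the 6-byte JMP */
    stub[0] = 0xFF;                     /* 255                         */
    stub[1] = 0x25;                     /* 37                          */
    memcpy(&stub[2], &slot, 4);         /* operand: where the slot is  */
    memcpy(&stub[6], &target, 4);       /* slot: where to jump         */
}
```

In the driver, stub_address corresponds to (ULONG)&replacementbuff[0] for the INT 1 stub and to (ULONG)&replacementbuff[32] for the INT 3 stub.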

As I have already said, on an SMP machine, each processor has its own IDT, and, with the advent of HT technology (as far as I know, Intel does not produce CPUs without HT support any more), we live in the age of SMP machines - a single CPU with support for HT is treated as two independent CPUs by the system. Therefore, we have to hook all IDTs in the system, so we create threads that run HookIDT() until all IDTs in the system get hooked.

First of all, HookIDT() allocates memory so that it can copy the contents of IDT to it - all my experience shows that writing to memory, pointed to by the IDTR register, is unsafe, even if interrupts are disabled. Therefore, we copy the IDT to the memory that we have allocated, and load a pointer to this memory into the IDTR register by using the LIDT instruction. At this point, we can safely modify the original IDT. After this job is done, we will reload IDTR with the address of the original IDT. It is understandable that, from the moment HookIDT() discovers that IDT has not yet been hooked, until it modifies and reloads IDT, it has to run on the same CPU, so that we disable interrupts in order to avoid context switches. However, this should be done only after having allocated memory for the temporary IDT. Why? Because, in our situation, calling any(!!!) code that we have not written ourselves is an unwise thing to do - if this code re-enables interrupts, we are more than likely to make a mess. Therefore, we avoid calling any code that we have not written ourselves - as you can see, even the contents of the original IDT are copied to the memory that we have allocated by the REP MOVS instruction, rather than by the conventional memcpy().

After having hooked the INT 1 and INT 3 handlers in the IDT, we copy the first eight bytes from the beginning of the target function (i.e., IoCreateDevice()) to memory that we allocate from the non-paged pool, and write the 0xCC opcode to the beginning of the target function. The target function's executable code resides in read-only memory. Therefore, we have to either change the page protection in the page table, or clear the WP flag in the CR0 register before we can overwrite the function (for the sake of simplicity, we choose to clear the WP flag). As a result of all our manipulations, our code that hooks INT 3 gets executed every time a call to IoCreateDevice() is made.
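
The constant 0xfffeffff used in the assembly above is simply "all bits except bit 16", and bit 16 of CR0 is the WP flag. The sketch below spells that out and also shows the byte patch on an ordinary buffer. Both helper names are hypothetical, and nothing here touches a real control register.

```c
#include <stdint.h>

#define CR0_WP (1u << 16)   /* write protect flag of CR0 */

/* The value to load into CR0 while read-only pages are being patched. */
uint32_t cr0_clear_wp(uint32_t cr0)
{
    return cr0 & ~CR0_WP;   /* same as: and eax,0xfffeffff */
}

/* Replaces the first opcode byte with INT 3 and returns the old byte. */
uint8_t plant_breakpoint(uint8_t *code)
{
    uint8_t original = code[0];
    code[0] = 0xCC;
    return original;
}
```

Note that the driver keeps the first eight original bytes in the pool copy, so the byte returned by plant_breakpoint() in this sketch is what one would write back in order to remove the hook.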

Now, let's look at the code that hooks INT 1 and INT 3.

//this function is needed in order to make our hooking work 
ULONG __stdcall INT1check(INTTERUPT_STACK * savedstack)
{
    ULONG offset=0,stepping=savedstack->SavedFlags&0x100;

    //if INT 1 has been raised for some 
    //reason other than single-stepping, return 0,
    //so that execution will eventually 
    //go to the real handler of INT 1
    if(!stepping)return 0;

    //check if single-stepping is somehow 
    //related to our hooking. If not,return 0

    if(savedstack->InterruptReturnAddress<=
            HookedFunctionDescriptor.ProxyCode)
        return 0;
    if(savedstack->InterruptReturnAddress>=
            HookedFunctionDescriptor.ProxyCode+8)
        return 0;

    //change the return address on the stack, and clear TF flag
    offset=savedstack->InterruptReturnAddress-
              HookedFunctionDescriptor.ProxyCode;
    savedstack->InterruptReturnAddress=
              HookedFunctionDescriptor.RealCode+offset;
    savedstack->SavedFlags &=0xfffffeff;

    //clear DR6
    _asm
    {
        mov eax,0
        mov dr6,eax
    }

    return 1;
}

ULONG __stdcall BPXcheck(INTTERUPT_STACK * savedstack)
{ 
    PDRIVER_OBJECT driver;char buff[1024]; HANDLE handle=0;
    PUNICODE_STRING unistr=(PUNICODE_STRING)&buff[0];ULONG a=0;

    //if breakpoint has nothing to do 
    //with our hooking,return 0
    if(savedstack->InterruptReturnAddress!=
       HookedFunctionDescriptor.RealCode+1)return 0;
    
    //make the INT 3 handler return to the code that 
    //we have copied, and set TF flag
    savedstack->SavedFlags|=0x100;
    savedstack->InterruptReturnAddress=
       HookedFunctionDescriptor.ProxyCode;

    //All x86-related stuff has been done above. 
    //Now let's proceed to actual job

    driver=(PDRIVER_OBJECT)savedstack->Argument;

    if(ObOpenObjectByPointer(driver,0, NULL, 0, 
                0,KernelMode,&handle))return 1;
    ZwQueryObject(handle,1,buff,256,&a);
    if(!unistr->Buffer){ZwClose(handle);return 1;}
    if(_wcsicmp(unistr->Buffer,L"\\Driver\\USBSTOR"))
        {ZwClose(handle);return 1;}

    ZwClose(handle);

    a=(ULONG)driver->MajorFunction[IRP_MJ_DEVICE_CONTROL];

    if(a==(ULONG)Dispatch)return 1;

    realdispatcher=(ProxyDispatch)a;
    driver->MajorFunction[IRP_MJ_DEVICE_CONTROL]=Dispatch;
    return 1;
}

_declspec(naked) void INT1Proxy()
{
    _asm    
    {
        pushfd
        pushad
        mov ebx,esp
        add ebx,36
        push ebx
        call INT1check

        cmp eax,0
        je fin

        popad
        popfd
        iretd

        fin: popad
             popfd
             jmp Int1RealHandler
    }
}

_declspec(naked) void BPXProxy()
{

    _asm    
    {
        pushfd
        pushad
        mov ebx,esp
        add ebx,36
        push ebx
        call BPXcheck

        cmp eax,0
        je fin
    
        popad
        popfd
        iretd

        fin: popad
             popfd
             jmp BPXRealHandler

    }
}

When a call to IoCreateDevice() is made, BPXProxy() gets invoked. BPXProxy() saves the registers and flags, pushes the value ESP had at the time when BPXProxy() started execution, and calls BPXcheck(). Therefore, BPXcheck() receives a pointer to the INTTERUPT_STACK structure that we mentioned above as an argument. First of all, by matching the InterruptReturnAddress field of the structure against the address of the target function, BPXcheck() checks whether the INT 3 invocation is related to our hooking. If not, it returns 0. Otherwise, it changes the InterruptReturnAddress field of the structure to the address of the array with the instructions that we have copied, and sets the TF flag in the SavedFlags field. At this point, we can do the job that is related to the hooking itself. In our particular case, we check whether the PDRIVER_OBJECT that is passed to IoCreateDevice() is that of \Driver\USBSTOR (which means USBSTOR.SYS has already been loaded), and replace the IRP_MJ_DEVICE_CONTROL handler with the address of our function (certainly, only if it has not been done already). As a result, we are able to monitor all IRP_MJ_DEVICE_CONTROL requests that are sent to USBSTOR by the system, i.e., accomplish our original goal. The way an interrupt is processed after BPXcheck() returns depends on its return value. If INT 3 was raised because of our hooking, BPXcheck() returns 1, otherwise it returns 0. If it returns 0, we transfer control to the actual INT 3 handler, otherwise we just return with the IRETD instruction, so that execution will be resumed at the beginning of the array with the instructions that we have copied. Since we have set the TF flag, it will be resumed in single-step mode, i.e., INT1Proxy() gets invoked.

The implementation of INT1Proxy() is almost identical to that of BPXProxy() - the only difference is that it calls INT1check(), rather than BPXcheck(). First of all, INT1check() examines the TF flag in the EFLAGS image that is saved on the stack, and, if it discovers that INT 1 has been raised for some reason other than single-stepping, it returns 0 (as you may remember, INT 1 may get raised for various reasons). Otherwise, it checks whether the return address is somewhere in the array that holds the instructions we have copied. If not, it returns 0 - after all, the TF flag may be on just because some program is being debugged. Otherwise, by subtracting the address of the array from the return address on the stack, it discovers the size of the target function's first instruction (i.e., the one that just got executed), changes the return address on the stack to (beginning of the target function + size of its first instruction), clears the TF flag in the EFLAGS image that is saved on the stack, clears the DR6 register, and returns 1. As a result, if INT 1 has been raised for some reason unrelated to our hooking, INT1Proxy() transfers control to the actual handler of INT 1; otherwise, it returns with the IRETD instruction, so that the target function (i.e., IoCreateDevice()) resumes its execution as if nothing had happened.

In order to run the sample driver, you have to create an on-demand-start service, and start it manually from the command prompt. If you plug in a USB storage device while the service is running, you will see a basic disk, rather than removable storage, in My Computer. Therefore, if you start the Disk Management tool from the Control Panel, you will be able to create multiple partitions on the device.

Notice: This driver has been built using the Windows 2000 DDK so that it treats the KeNumberProcessors exported symbol as a pointer. If you use the XP DDK, the KeNumberProcessors has to be treated as a variable - otherwise, you will be unable to compile the sample. However, the above applies only to compilation - the sample is going to work on both Windows 2000 and Windows XP, regardless of the DDK version that you use.

Conclusion

In conclusion, I must say that, although in our particular case we have hooked just one function, the same approach can be extended to dealing with multiple functions. Furthermore, we made a bold assumption that the target function's first instruction is not a JMP. In practical terms, I think it makes sense to do this check, and, if the target function's first instruction is a JMP, to adjust the sample (all you have to do is figure out the location to which execution is about to jump, and hook the kernel at this location). In other words, you can easily adjust this sample to the particular needs of your project.
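
The check suggested above could look roughly like this. It is only a sketch under simplifying assumptions: it recognizes just the two direct forms of JMP (0xEB with an 8-bit displacement and 0xE9 with a 32-bit displacement), it follows a single jump, and `resolve_hook_address` is a hypothetical name. An indirect JMP (0xFF 0x25) would additionally require reading the pointer it refers to.

```c
#include <stdint.h>

/* Returns the address at which the breakpoint should be planted:
   the function entry itself, or the jump destination if the very
   first instruction is a direct JMP. 'code' points to the bytes
   found at 'address'. */
uint32_t resolve_hook_address(const uint8_t *code, uint32_t address)
{
    if (code[0] == 0xEB) {              /* JMP rel8, 2 bytes long */
        int32_t rel8 = code[1] >= 0x80 ? (int32_t)code[1] - 256
                                       : (int32_t)code[1];
        return address + 2 + (uint32_t)rel8;
    }
    if (code[0] == 0xE9) {              /* JMP rel32, 5 bytes long */
        uint32_t rel = (uint32_t)code[1] | ((uint32_t)code[2] << 8) |
                       ((uint32_t)code[3] << 16) |
                       ((uint32_t)code[4] << 24);
        return address + 5 + rel;
    }
    return address;                     /* ordinary first instruction */
}
```

The returned value would then take the place of the address of IoCreateDevice() that DriverEntry() stores in HookedFunctionDescriptor.RealCode.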

I would highly appreciate it if you sent me an e-mail with your comments and suggestions.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
