The author assumes that you have decent knowledge of Windows drivers, Windows debugger symbols, Windows internals, driver I/O, C and asm. No example program will be available for download, as all details will be explained here.
As we have all long been aware, with the release of 64-bit Windows (starting with the x64 edition of Windows XP), Microsoft included a technology known as Kernel Patch Protection (KPP, also called PatchGuard). It runs integrity checks on various kernel structures in order to ensure the security of the system. We won't get into detail as to the real reasons Microsoft implemented PatchGuard: some believe it was truly a security measure, while others insist that KPP, along with driver signing, is a step toward easier facilitation of DRM.
This article also will not cover methods for bypassing PatchGuard (even though it's pretty simple). Besides, Microsoft has stated that production drivers that do bypass PatchGuard will eventually be met with an ever-looming kernel update, which in the end will bug check all of your users and make YOU look incompetent.
PatchGuard, as you may or may not know, aims to ensure the integrity of the following structures:
- The interrupt descriptor table
- The global descriptor table
- Certain MSRs (LSTAR, for example)
- Code and data sections of ntoskrnl, hal and kdcom (expanded in Windows 8)
- System service descriptor table
Before PatchGuard, one of the most interesting targets was the system service table, and it is the scope of our discussion. As you probably already know, on x86 Windows this is an array of function pointers, while on x64 it is an array of offsets. Each offset is the distance from the base of the service table to the first byte of the function. Hooking system services in this fashion was quite popular, with users ranging from rootkits to Symantec anti-virus software and even Sony's DRM software.
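The following minimal sketch shows how a table entry maps to a routine address on each architecture. It assumes the encoding commonly observed on Vista-and-later x64 kernels. In that encoding, each 32-bit entry holds the routine's offset from the table base shifted left by four, and the low four bits hold the number of stack-passed arguments. The function names are hypothetical. The code only decodes values that have already been read from the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x86: each service table entry is simply the absolute address of the routine. */
uint32_t ssdt_target_x86(const uint32_t *service_table, size_t index)
{
    assert(service_table != NULL);
    return service_table[index];
}

/* x64 (assumed Vista+ encoding): bits 31..4 are a signed offset from the
   table base, bits 3..0 are the count of arguments passed on the stack. */
uint64_t ssdt_target_x64(uint64_t table_base, const int32_t *service_table, size_t index)
{
    assert(service_table != NULL);
    int32_t entry = service_table[index];
    /* Clearing the low nibble makes the division exact, so this equals an
       arithmetic right shift by 4 without relying on implementation-defined
       shifts of negative values. */
    int64_t offset = (int64_t)(entry & ~0xF) / 16;
    /* Negative offsets wrap modulo 2^64, i.e. they subtract from the base. */
    return table_base + (uint64_t)offset;
}

/* Number of stack-passed arguments recorded in the low nibble (assumed encoding). */
unsigned ssdt_stack_args_x64(const int32_t *service_table, size_t index)
{
    assert(service_table != NULL);
    return (unsigned)(service_table[index] & 0xF);
}
```

On x64, an inline hook therefore changes a 32-bit relative value rather than a pointer. This is one reason SSDT hooks on x64 had to place their handler (or a trampoline) close to the table.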
This article will explain how we can work alongside PatchGuard to hook these services in a less invasive way while still retaining the most powerful aspect of such hooks. Those hooks catch code running at a CPL of 3 (user mode) that executes the SYSCALL instruction from any location, not just from the stubs provided by ntdll.
Issuing SYSCALL directly in this way is commonly known as a manual system call.
Manual system calls are common in various anti-debugging and anti-tampering schemes, as well as in malware. Hence the motivation for kernel-level code when a user-mode debugger just isn't enough.
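A manual system call is easiest to see at the instruction level. The sketch below emits the core instruction sequence of ntdll's x64 system call stubs on the builds discussed here: mov r10, rcx; mov eax, <service number>; syscall; ret. The helper name is hypothetical. The service number is an assumption you must supply, because service numbers change between Windows builds. Actually executing the bytes would also require copying them into executable memory, which is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SYSCALL_STUB_SIZE 11

/* Hypothetical helper: writes "mov r10, rcx; mov eax, imm32; syscall; ret". */
void build_syscall_stub(unsigned char *out, uint32_t service_number)
{
    /* mov r10, rcx: the kernel takes the first argument from r10 because
       SYSCALL itself overwrites rcx with the return address. */
    static const unsigned char mov_r10_rcx[3] = { 0x4C, 0x8B, 0xD1 };

    assert(out != NULL);
    memcpy(out, mov_r10_rcx, sizeof mov_r10_rcx);
    out[3] = 0xB8;                                            /* mov eax, imm32 */
    out[4] = (unsigned char)(service_number & 0xFFu);         /* little-endian immediate */
    out[5] = (unsigned char)((service_number >> 8) & 0xFFu);
    out[6] = (unsigned char)((service_number >> 16) & 0xFFu);
    out[7] = (unsigned char)((service_number >> 24) & 0xFFu);
    out[8] = 0x0F;                                            /* syscall (0F 05) */
    out[9] = 0x05;
    out[10] = 0xC3;                                           /* ret */
}
```

Code that places these bytes anywhere in its own memory and calls them never touches ntdll's stubs. User-mode hooks on ntdll therefore miss such calls.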
We will also see how this method lets us monitor system calls made from user mode on a per-process basis. It also covers every instance in which the kernel returns to user mode via IRET/SYSRET, giving our debugger even more control over the target. These instances include:
- LdrInitializeThunk: the starting point for thread creation, including the initial thread of a process.
- KiUserExceptionDispatcher: the kernel exception dispatcher will IRET here under one of two conditions:
  - the process has no debug port, or
  - the process has a debug port, but the debugger chose not to handle the exception.
- KiRaiseUserExceptionDispatcher: control flow lands here in certain cases during a system service when, instead of returning a bad status code, the kernel simply invokes the user exception chain. For instance, CloseHandle() with an invalid handle value.
- KiUserCallbackDispatcher: control flow lands here for Win32k window- and thread-message-based operations. It then calls into the function table (KernelCallbackTable) contained in the process PEB.
- KiUserApcDispatcher: this is where user-mode APCs are dispatched.
Putting it to use
Our journey begins in the KPROCESS structure, and to be precise its InstrumentationCallback member, which we see at offset 0x100 in the dump below. I had been using this method for a little over a year in my own debugger, which relied on a driver for this feature. Until just recently I hadn't discovered that this member can be set from user mode with NtSetInformationProcess (we will get to the fun in a bit).
As you can imagine, the driver made an excellent debugging tool. A simple IOCTL to our driver sets up an instrumentation callback for our target process, and a simple userland debugger becomes a god-mode debugger. Getting even more creative, you can eliminate the need for an actual debug port altogether. You simply handle exceptions invisibly before they land at KiUserExceptionDispatcher. This can bypass a plethora of anti-debug techniques.
The x64 _KPROCESS layout, as dumped by the debugger:
+0x000 Header : _DISPATCHER_HEADER
+0x018 ProfileListHead : _LIST_ENTRY
+0x028 DirectoryTableBase : Uint8B
+0x030 ThreadListHead : _LIST_ENTRY
+0x040 ProcessLock : Uint8B
+0x048 Affinity : _KAFFINITY_EX
+0x070 ReadyListHead : _LIST_ENTRY
+0x080 SwapListEntry : _SINGLE_LIST_ENTRY
+0x088 ActiveProcessors : _KAFFINITY_EX
+0x0b0 AutoAlignment : Pos 0, 1 Bit
+0x0b0 DisableBoost : Pos 1, 1 Bit
+0x0b0 DisableQuantum : Pos 2, 1 Bit
+0x0b0 ActiveGroupsMask : Pos 3, 4 Bits
+0x0b0 ReservedFlags : Pos 7, 25 Bits
+0x0b0 ProcessFlags : Int4B
+0x0b4 BasePriority : Char
+0x0b5 QuantumReset : Char
+0x0b6 Visited : UChar
+0x0b7 Unused3 : UChar
+0x0b8 ThreadSeed : [4] Uint4B
+0x0c8 IdealNode : [4] Uint2B
+0x0d0 IdealGlobalNode : Uint2B
+0x0d2 Flags : _KEXECUTE_OPTIONS
+0x0d3 Unused1 : UChar
+0x0d4 Unused2 : Uint4B
+0x0d8 Unused4 : Uint4B
+0x0dc StackCount : _KSTACK_COUNT
+0x0e0 ProcessListEntry : _LIST_ENTRY
+0x0f0 CycleTime : Uint8B
+0x0f8 KernelTime : Uint4B
+0x0fc UserTime : Uint4B
+0x100 InstrumentationCallback : Ptr64 Void
+0x108 LdtSystemDescriptor : _KGDTENTRY64
+0x118 LdtBaseAddress : Ptr64 Void
+0x120 LdtProcessLock : _KGUARDED_MUTEX
+0x158 LdtFreeSelectorHint : Uint2B
+0x15a LdtTableLength : Uint2B
Let's see how this works.
Each time the kernel encounters a situation in which it returns to user-level code (such as the transitions described above), it checks the InstrumentationCallback member of the KPROCESS structure of the process the processor is currently executing. If it is not NULL, and assuming it points to valid memory, the kernel swaps out the RIP on the trap frame and replaces it with the value contained in InstrumentationCallback.
You may now be wondering how our injected debugger code will know which transition it originated from. The answer lies in r10, which holds the RIP the kernel would otherwise have returned to. For instance, if an exception occurred, r10 will contain the linear address of KiUserExceptionDispatcher. For a user APC, r10 will contain the linear address of KiUserApcDispatcher. For a system call (meaning the system call has already been dispatched), r10 will contain the return address, which is the address following the SYSCALL instruction.
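The helpers below sketch how a callback might use r10. They classify a transition by comparing r10 against the dispatcher entry points in ntdll. They also flag a system call as manual when its return address lies outside ntdll's image. The structure and function names are hypothetical. In practice, the addresses would be resolved at runtime (for example with GetProcAddress on ntdll). Treating anything other than a dispatcher entry point as a system call return is an assumption based on the list of transitions above. The callback itself would have to be a small assembly stub that preserves registers and eventually resumes execution at r10. That part is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum transition_kind {
    TRANSITION_SYSCALL_RETURN,        /* r10 = address following SYSCALL */
    TRANSITION_THREAD_START,          /* r10 = LdrInitializeThunk */
    TRANSITION_EXCEPTION,             /* r10 = KiUserExceptionDispatcher */
    TRANSITION_RAISE_USER_EXCEPTION,  /* r10 = KiRaiseUserExceptionDispatcher */
    TRANSITION_USER_CALLBACK,         /* r10 = KiUserCallbackDispatcher */
    TRANSITION_APC                    /* r10 = KiUserApcDispatcher */
};

/* Hypothetical container for entry points resolved from ntdll at runtime. */
struct ntdll_dispatchers {
    uint64_t ldr_initialize_thunk;
    uint64_t ki_user_exception_dispatcher;
    uint64_t ki_raise_user_exception_dispatcher;
    uint64_t ki_user_callback_dispatcher;
    uint64_t ki_user_apc_dispatcher;
};

enum transition_kind classify_transition(uint64_t r10, const struct ntdll_dispatchers *d)
{
    assert(d != NULL);
    if (r10 == d->ldr_initialize_thunk)               return TRANSITION_THREAD_START;
    if (r10 == d->ki_user_exception_dispatcher)       return TRANSITION_EXCEPTION;
    if (r10 == d->ki_raise_user_exception_dispatcher) return TRANSITION_RAISE_USER_EXCEPTION;
    if (r10 == d->ki_user_callback_dispatcher)        return TRANSITION_USER_CALLBACK;
    if (r10 == d->ki_user_apc_dispatcher)             return TRANSITION_APC;
    return TRANSITION_SYSCALL_RETURN;  /* assumption: everything else is a syscall return */
}

/* True if the two bytes before the return address encode SYSCALL (0F 05). */
bool follows_syscall(const unsigned char *return_address)
{
    return return_address[-2] == 0x0F && return_address[-1] == 0x05;
}

/* True if the SYSCALL was issued from outside ntdll's image, i.e. a manual system call. */
bool is_manual_syscall(uint64_t r10, uint64_t ntdll_base, uint64_t ntdll_size)
{
    return r10 < ntdll_base || r10 - ntdll_base >= ntdll_size;
}
```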
An important thing to note is that the thread's Dr7 value affects whether InstrumentationCallback is used to reroute control flow for certain transitions. For LdrInitializeThunk, it does not matter whether Dr7 is active or NULL: dispatches to LdrInitializeThunk are always rerouted to the callback. However, unless Dr7 is active, system calls and all of the remaining kernel-to-user transitions are not rerouted to the InstrumentationCallback.
You can also see how this could work in favor of the debuggee, as a pretty interesting anti-debug mechanism.
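The rerouting rule described above, including the Dr7 condition, can be summarized as a small model. This is not kernel code. The trap frame type is a hypothetical stand-in for the real KTRAP_FRAME, and "Dr7 active" is interpreted simply as a nonzero value. The validity check on the callback address is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two trap frame fields that matter here. */
struct trap_frame_model {
    uint64_t rip;  /* where the kernel is about to return in user mode */
    uint64_t r10;
};

enum return_target {
    RETURN_TO_LDR_INITIALIZE_THUNK,
    RETURN_TO_OTHER  /* syscall returns and the remaining dispatchers */
};

/* Models the behavior described in the text; returns true if the return was rerouted. */
bool model_return_to_user(struct trap_frame_model *tf,
                          uint64_t instrumentation_callback,
                          enum return_target target,
                          uint64_t dr7)
{
    if (instrumentation_callback == 0)
        return false;                 /* member is NULL: normal return */
    if (target != RETURN_TO_LDR_INITIALIZE_THUNK && dr7 == 0)
        return false;                 /* only LdrInitializeThunk ignores Dr7 */
    tf->r10 = tf->rip;                /* callback learns the original destination */
    tf->rip = instrumentation_callback;
    return true;
}
```

Seen this way, the anti-debug angle is clear: a debuggee that installs its own callback and arms Dr7 gets to inspect every return to user mode before any of its regular code runs.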
How do we set this on our current process or a target debugee process?
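NtSetInformationProcess takes four arguments: a handle to the target process, an information class, a pointer to the input buffer, and the length of that buffer.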
The input buffer must hold the callback pointer, which must be a valid linear address within your own process address space or within the target's.
SeDebugPrivilege is also required. Since we are only passing a pointer, and this is only for x64 Windows, the size is of course 8 bytes. The info class is 0x28.
Bear in mind that this functionality was only added to x64 versions of Windows. As you know, 32-bit processes running in the WoW64 thunk layer still make use of system services, though not always in a direct manner.
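Putting the pieces together, a user-mode sketch for installing the callback might look like the following. It follows the parameters given above for the x64 builds this article describes: information class 0x28 and an 8-byte buffer holding the callback address. Newer Windows releases may expect a different input structure for this class, so treat the layout as an assumption. The constant and helper names are hypothetical. NtSetInformationProcess is resolved at runtime from ntdll with GetProcAddress, which avoids depending on a header declaration or import library. The caller is assumed to already hold SeDebugPrivilege.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the text for the x64 builds it describes (assumption for other builds). */
#define INSTRUMENTATION_CALLBACK_INFO_CLASS  0x28u
#define INSTRUMENTATION_CALLBACK_INFO_LENGTH 8u

/* The mechanism is x64-only, and the 8-byte length assumes 64-bit pointers. */
static_assert(sizeof(void *) == INSTRUMENTATION_CALLBACK_INFO_LENGTH,
              "instrumentation callback sketch assumes a 64-bit build");

#ifdef _WIN32
#include <windows.h>

typedef LONG (NTAPI *NtSetInformationProcessFn)(HANDLE ProcessHandle,
                                                 ULONG ProcessInformationClass,
                                                 PVOID ProcessInformation,
                                                 ULONG ProcessInformationLength);

/* Hypothetical helper: installs `callback` (an address valid in `process`)
   as that process's InstrumentationCallback. Returns an NTSTATUS value. */
LONG set_instrumentation_callback(HANDLE process, void *callback)
{
    HMODULE ntdll = GetModuleHandleW(L"ntdll.dll");
    NtSetInformationProcessFn set_info = NULL;

    if (ntdll != NULL)
        set_info = (NtSetInformationProcessFn)GetProcAddress(ntdll, "NtSetInformationProcess");
    if (set_info == NULL)
        return (LONG)0xC000007AL;  /* STATUS_PROCEDURE_NOT_FOUND */

    /* The input buffer holds the callback address itself, hence &callback and 8 bytes. */
    return set_info(process, INSTRUMENTATION_CALLBACK_INFO_CLASS,
                    &callback, INSTRUMENTATION_CALLBACK_INFO_LENGTH);
}
#endif
```

For example, set_instrumentation_callback(GetCurrentProcess(), my_stub) would target the current process. Here, my_stub is a hypothetical assembly routine that saves state, inspects r10 and finally jumps to r10. The Windows-specific part is guarded with _WIN32, so the constants and the size check still compile on other 64-bit toolchains.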
There are several scenarios in which InstrumentationCallback will have no effect.
The first is NtTerminateThread (when called on the calling thread itself). This is because the caller never returns from such calls.
The second is NtContinue. This function takes the supplied context argument and applies it directly to the current trap frame, then performs an IRET without consulting InstrumentationCallback.
The third (but catchable) is NtRaiseException. This function also takes the supplied context argument and applies it to the current trap frame. However, if the exception is not handled, KiUserExceptionDispatcher will be invoked, granting us a chance at an intercept.
In conclusion, monitoring system calls, whether manual or not, was an excellent debugging feature before PatchGuard and still is. Windows has left us this loophole to keep working alongside PatchGuard and analyze native system services in a way that works out even better than on 32-bit Windows.
I hope you found this information useful, and that you use it to extend the debugging features of your own personal debuggers or of existing ones.