How to Use a Virtual CPU to Emulate and Analyze Suspicious Code Execution at Runtime

20 Nov 2018, CPOL
Approaches you can use to improve runtime algorithms for zero-day threat detection

Introduction

In this article, we describe approaches you can use to improve runtime algorithms for zero-day threat detection, focusing on our own solution: the Diana Dasm disassembler, developed by the lead of our network security team, Victor A. Milokum.

We describe the main pros and cons of using Diana Dasm and Diana Processor for shellcode analysis and suspicious activity detection with the help of partial emulation of functions at runtime.

We also describe two methods of implementing breakpoints in memory to monitor read, write, and execute access to user-mode memory and provide a practical comparison of these methods.

Table of contents:

Function hooking for memory access monitoring

Breakpoints in memory

Implementing fast breakpoints in memory at runtime using a virtual CPU

Thread-safe implementation of breakpoints in memory

Choosing the best approach

Conclusion

Function hooking for memory access monitoring

Function hooking is a popular technique for monitoring the execution of code or particular functions without changing program source code. There are several well-known libraries that provide basic APIs for function interception, including the open source Mhook library and Microsoft Detours.

While it’s possible to use function hooks to influence the course of program execution, this technique is nearly useless when it comes to tracking access to certain memory regions. Function hooks can’t be used for tracking memory access from an injected DLL or shellcode.

Also, when monitoring and analyzing shellcode it’s crucial to keep the impact on performance as low as possible. There are two things that help us achieve this goal:

  • Partial emulation of function execution
  • The use of a lightweight virtual CPU such as Diana Processor

Diana Processor is a lightweight open source emulator of processor commands that can help us better understand and analyze the nature of zero-day threats. Using both Diana Processor and partial execution of functions, we can improve the well-known zero-day threat detection algorithms that are included in the Enhanced Mitigation Experience Toolkit (EMET) developed by Microsoft.

We’ve created our own solution, Memory Access Monitor (MAM), which tracks access to particular memory regions and restricts access to certain regions of process memory from other processes or from the Windows kernel. MAM is based on the breakpoints in memory approach and plays a significant part in detecting suspicious code execution at runtime and preventing exploitation.

Breakpoints in memory

Breakpoints in memory are an effective tool for monitoring read, write, and execute access requests to specific memory regions. They provide you with a notification mechanism and the ability to track and control any memory access requests at runtime.

Breakpoints in memory can be used to

  • monitor or restrict access to certain memory regions;
  • restrict other processes from accessing process memory;
  • monitor and log modules or functions that access process memory;
  • analyze program execution threads and monitor memory changes without a debugger;
  • virtualize memory access.

There are two common types of breakpoints in memory:

  • Hardware breakpoints
  • Software breakpoints

Let’s look closer at each of these two types.

Hardware breakpoints

Some CPUs offer hardware breakpoints in memory so that the CPU itself monitors memory access and reports on it. CPUs that support this functionality contain eight special debug registers (DR0 to DR7) to control these breakpoints.

This method was used in an early version of Microsoft EMET to implement Export Address Filtering. Hardware breakpoints in memory have a number of constraints that can be considered both advantages and disadvantages depending on the task at hand.

When it comes to Memory Access Monitor, hardware breakpoints have three main limitations:

  1. Only four of the debug registers (DR0–DR3) hold breakpoint addresses, so only four breakpoints can be used simultaneously.
  2. The size of the memory region to be monitored is limited to either 1, 2, 4, or 8 bytes.
  3. Each thread of the process has its own set of DR0–DR7 registers, so you need to set a breakpoint in memory for each thread separately. This involves implementing
       • a mechanism for tracking the creation of new threads;
       • a mechanism for setting the values of the DR0–DR7 registers for running threads.
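
To make the first two limitations concrete, here is a minimal sketch of how the DR7 control bits for one breakpoint slot can be computed. It only builds the bit pattern and does not touch any real debug register. The names BreakOn, EncodeLength, and MakeDr7Bits are hypothetical helpers introduced for this illustration, and the bit positions follow the x86/x64 debug register layout (local enable bits at 0, 2, 4, 6 and a 4-bit R/W plus LEN field per slot starting at bit 16).

```cpp
#include <cstdint>
#include <stdexcept>

// Access types as encoded in the R/W field of DR7.
enum class BreakOn : std::uint64_t { Execute = 0, Write = 1, ReadWrite = 3 };

// Returns the LEN field encoding for a watched size of 1, 2, 4, or 8 bytes.
inline std::uint64_t EncodeLength(unsigned sizeInBytes)
{
    switch (sizeInBytes)
    {
    case 1: return 0;
    case 2: return 1;
    case 8: return 2;
    case 4: return 3;
    default: throw std::invalid_argument("size must be 1, 2, 4, or 8 bytes");
    }
}

// Builds the DR7 bits that locally enable breakpoint `slot` (0..3, i.e. DR0..DR3).
inline std::uint64_t MakeDr7Bits(unsigned slot, BreakOn type, unsigned sizeInBytes)
{
    if (slot > 3)
        throw std::invalid_argument("only DR0..DR3 hold breakpoint addresses");

    std::uint64_t bits = 0;
    bits |= 1ull << (slot * 2);                                   // L0..L3: local enable
    bits |= static_cast<std::uint64_t>(type) << (16 + slot * 4);  // R/W0..R/W3: access type
    bits |= EncodeLength(sizeInBytes) << (18 + slot * 4);         // LEN0..LEN3: region size
    return bits;
}
```

For example, MakeDr7Bits(0, BreakOn::Write, 4) yields 0xD0001. A fifth slot or a 16-byte region is rejected, which mirrors the hardware limits listed above. Actually applying such a value would still have to be done separately for every thread.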

Meanwhile, this method provides us with two important advantages:

  1. Hardware breakpoints in memory function at the CPU level, so the overhead while working with memory is minimal.
  2. Hardware breakpoints in memory are thread-safe, apart from the delay between the creation of a thread and setting the values of the DR0–DR7 registers. All threads can access memory simultaneously, and at the same time, each of these memory access requests will be monitored.

You can see an example of implementing hardware breakpoints in memory here.

While this method is suitable for debugging purposes, it can’t be used for threat detection algorithms because it is limited to four breakpoints of at most 8 bytes each. Besides, it can’t restrict access to process memory from other processes.

Software breakpoints

When implementing software breakpoints, you need to ensure that you can

  • get notifications about access requests to protected memory (before the memory is actually accessed);
  • allow execution of memory operations once all required checks have been passed.

In Windows, there’s a special page attribute, PAGE_GUARD, that can help you accomplish both of these tasks. You can use this attribute along with vectored exception handling to get notifications about all access requests to protected memory.

The main advantage of using the PAGE_GUARD attribute is that once the attribute is cleared, access to the memory is no longer intercepted, so the pending operation can proceed without further exceptions.

On the other hand, this method has several drawbacks:

  • The PAGE_GUARD attribute can be applied only to a whole memory page. So when you need to monitor access to, say, a 10-byte structure, you’ll need to monitor an entire page with the PAGE_SIZE of 4096 bytes. And if the 10 bytes you need to monitor are spread across two adjacent memory pages, both of these pages, 8192 bytes in total, will need to be monitored.
  • Notifications are received only once, after the first attempt to access an address within a guarded memory page. After that, the system clears the PAGE_GUARD modifier and lifts the guarded status from the monitored page.
  • You have to execute only the current instruction and then immediately restore the PAGE_GUARD attribute.
  • Until the PAGE_GUARD attribute is restored, other threads can access the protected memory region without any restrictions.
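
The page granularity drawback comes down to simple address arithmetic. The sketch below assumes a 4096-byte page, as in the text, and uses hypothetical names (kPageSize, PageRange, PagesCovering) that are not part of any Windows API.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::uintptr_t kPageSize = 4096; // assumed page size, as in the text

struct PageRange
{
    std::uintptr_t firstPage; // address of the first page that must be guarded
    std::size_t    pageCount; // number of pages that must be guarded
};

// Computes the pages covering [address, address + size). size must be > 0.
inline PageRange PagesCovering(std::uintptr_t address, std::size_t size)
{
    const std::uintptr_t first = address & ~(kPageSize - 1);
    const std::uintptr_t last  = (address + size - 1) & ~(kPageSize - 1);
    return { first, static_cast<std::size_t>((last - first) / kPageSize + 1) };
}
```

A 10-byte structure at 0x1000 needs one guarded page, while the same structure at 0x1FFA crosses a page boundary and needs two pages, so every access to those 8192 bytes triggers the handler.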

The one-shot nature of PAGE_GUARD can be handled with the help of a trap flag. A trap flag allows you to execute only the current processor instruction and then generates an EXCEPTION_SINGLE_STEP exception. This exception will be intercepted by the same vectored exception handler which, in turn, will restore the PAGE_GUARD attribute for the memory page.

Figure 1 below illustrates how PAGE_GUARD and trap flag work together.

Figure 1. PAGE_GUARD and trap flag workflow

Here are the main steps of the PAGE_GUARD and trap flag workflow:

  1. One of the threads requests read, write, or execute access to protected memory.
  2. As the protected page has the PAGE_GUARD attribute, the CPU generates a memory access exception.
  3. The system processes this exception and calls all registered vectored exception handlers with the EXCEPTION_GUARD_PAGE status.
  4. Our registered vectored exception handler checks whether this page belongs to our protected memory pages. If it doesn’t, the vectored exception handler returns EXCEPTION_CONTINUE_SEARCH (the exception isn’t handled) and lets the system regain control.
  5. The vectored exception handler calls the registered callback, notifying external code about the memory access event.
  6. With this callback, the external code can perform certain checks, for instance analyze the call stack or the thread context.
  7. After these checks, the callback gives control back to the vectored exception handler.
  8. The vectored exception handler saves the attributes of the original page so that the PAGE_GUARD attribute can be added to them later.
  9. The vectored exception handler sets a trap flag to execute only the current processor instruction that requests access to memory.
  10. The vectored exception handler returns EXCEPTION_CONTINUE_EXECUTION (the exception is processed and the system can continue execution).
  11. The system applies all changes to the current thread context and resumes thread execution starting from the same instruction that triggered EXCEPTION_GUARD_PAGE.
  12. Since memory access is temporarily allowed for everyone, the CPU has no obstacles and processes the instruction successfully.
  13. Thanks to the trap flag, the CPU generates an exception after processing one instruction. The system calls the vectored exception handler with the EXCEPTION_SINGLE_STEP status.
  14. The vectored exception handler restores the PAGE_GUARD attribute using the information that was saved in step 8.
  15. The vectored exception handler returns EXCEPTION_CONTINUE_EXECUTION, which means the exception has been processed and the system can continue execution.
  16. The program continues its normal execution.
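
The steps above can be condensed into a small state machine. The following is a simplified, portable model of the handler logic and not real Windows code: GuardPageModel, ExceptionKind, and Disposition are hypothetical stand-ins for the page attributes, the exception codes, and the handler return values. It is only meant to show that one guarded access costs two exceptions and that the page is unguarded while the instruction runs.

```cpp
enum class ExceptionKind { GuardPage, SingleStep };
enum class Disposition   { ContinueExecution, ContinueSearch };

// Toy stand-in for one protected page and the flags of one thread.
struct GuardPageModel
{
    bool pageGuarded     = true;   // PAGE_GUARD is currently set on the page
    bool trapFlag        = false;  // trap flag is set in the thread context
    bool pageIsMonitored = true;   // page belongs to our protected pages
    int  callbackCalls   = 0;      // how many times the access callback ran
    int  exceptionsSeen  = 0;      // how many exceptions reached the handler

    Disposition OnException(ExceptionKind kind)
    {
        ++exceptionsSeen;
        if (kind == ExceptionKind::GuardPage)
        {
            pageGuarded = false;                      // the system has already cleared PAGE_GUARD
            if (!pageIsMonitored)
                return Disposition::ContinueSearch;   // step 4: not our page
            ++callbackCalls;                          // steps 5-7: notify external code
            trapFlag = true;                          // step 9: single-step the instruction
            return Disposition::ContinueExecution;    // step 10
        }
        if (!trapFlag)
            return Disposition::ContinueSearch;       // single step we did not request
        trapFlag = false;
        pageGuarded = true;                           // step 14: restore PAGE_GUARD
        return Disposition::ContinueExecution;        // step 15
    }

    // Simulates one guarded access. Returns true if the page was unguarded
    // while the instruction ran (false if the handler declined the exception).
    bool AccessOnce()
    {
        if (OnException(ExceptionKind::GuardPage) != Disposition::ContinueExecution)
            return false;
        const bool unguardedWhileRunning = !pageGuarded; // step 12 happens here
        OnException(ExceptionKind::SingleStep);          // step 13
        return unguardedWhileRunning;
    }
};
```

In this model, a single AccessOnce call reports an open window, records two exceptions and one callback, and leaves the page guarded again.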

This scheme describes the processing of one processor instruction that attempts to access a protected memory region.

During the execution of steps 3 through 14, the protected memory page does not have the PAGE_GUARD attribute. This gives an opportunity for other threads to get uncontrolled access to this memory.

This approach is implemented in the latest version of EMET for the Export Address Filtering and Export Address Filtering Plus protections.

In the next section, we describe how to implement fast breakpoints in memory with the help of a virtual CPU. The following approach is our own solution and, depending on the researcher’s purposes, can be used for both memory access monitoring and emulating suspicious actions in an isolated environment based on shellcode analysis.

Implementing fast breakpoints in memory at runtime using a virtual CPU

To address the issue of uncontrolled memory access, we need to remove the PAGE_GUARD attribute and substitute it with another attribute, such as PAGE_NOACCESS. In this case, every access from any thread raises an EXCEPTION_ACCESS_VIOLATION exception, which can be handled by the same vectored exception handler.

Now the main question is how we can allow actual memory access after passing all the checks. If we decide to restore the original page attributes, we’ll get the same issue with uncontrolled memory access from the parallel threads. If we decide to pause all other threads except the current one, we’ll get a significant deterioration in performance.

A possible solution is to create a shadow memory page containing all the attributes of the original page. Let’s look closer at this method.

Creating shadow memory pages

A shadow memory page is basically a duplicate of an original memory page (see Figure 2). Shadow memory pages are created when a memory region is added to the monitoring list. This has to happen before the PAGE_NOACCESS attribute is set for the original memory pages.

Likewise, when a memory region is deleted from the monitoring list, the original page data is restored from the shadow pages right after PAGE_NOACCESS is cleared.

Figure 2. Shadow memory page

In this way, the shadow page stores all original page data including page attributes, but the program will refer to this data via the address of the original memory page. All the vectored exception handler has to do is substitute the original page address with the shadow page address, but only for processing one instruction.
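
The address substitution itself is a base-plus-offset translation. Below is a minimal sketch under the assumption that each monitored region is recorded together with the base of its shadow copy. MonitoredRegion and ShadowMap are hypothetical names for illustration and are not taken from Memory Access Monitor.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One entry of a (hypothetical) monitoring list.
struct MonitoredRegion
{
    std::uintptr_t originalBase; // start of the PAGE_NOACCESS region
    std::uintptr_t shadowBase;   // start of its shadow copy
    std::size_t    size;         // region size in bytes
};

class ShadowMap
{
public:
    void Add(const MonitoredRegion& region) { regions_.push_back(region); }

    // Translates an address inside a monitored region to the matching shadow
    // address. Returns false if the address is not monitored.
    bool TryToShadow(std::uintptr_t address, std::uintptr_t& shadowAddress) const
    {
        for (const MonitoredRegion& r : regions_)
        {
            if (address >= r.originalBase && address - r.originalBase < r.size)
            {
                shadowAddress = r.shadowBase + (address - r.originalBase);
                return true;
            }
        }
        return false;
    }

private:
    std::vector<MonitoredRegion> regions_;
};
```

With a region of 0x2000 bytes at 0x10000 shadowed at 0x50000, address 0x10010 translates to 0x50010, while 0x12000 is outside the region and is left alone. A linear search is used here only for brevity.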

However, using a trap flag to redirect access to the shadow page is challenging, as you need a fitting disassembler to search for the right register or memory with the original page address.

In the next section, we describe how to use a virtual processor, Diana Processor, for this purpose.

Diana Dasm and Diana Processor

Diana Dasm is a small and fast disassembler that can be used by Windows kernel developers. It’s a lightweight C disassembler with a flexible architecture that has its own full processor instruction emulator called Diana Processor and supports emulation of both x86 and x64 instructions.

The execution of one processor instruction looks pretty simple:

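// Diana Dasm headers are omitted here; <stdexcept> is needed for std::runtime_error.
ReadWriteStreamAdapter memoryStream;                             // services the emulator's memory reads and writes
DianaRandomReadWriteStream * randomReaderWriter = &memoryStream;
DianaProcessor processor;
DianaMAllocator allocator;

Diana_InitMAllocator(&allocator);

// Initialize the virtual CPU once, in 64-bit mode.
int res = DianaProcessor_Init(&processor
                    , randomReaderWriter
                    , &allocator.m_parent
                    , DIANA_MODE64);

if (DI_SUCCESS != res)
{
    throw std::runtime_error("Diana Processor init failed");
}

// Emulate the single instruction that the virtual RIP/EIP points to.
res = DianaProcessor_ExecOnce(&processor);

if (DI_SUCCESS != res)
{
    throw std::runtime_error("Diana Processor exec once failed");
}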

Here, the DianaProcessor_Init function is called only once, when Diana Processor is initialized. DianaProcessor_ExecOnce is then called to emulate the one processor instruction that the virtual RIP/EIP register of Diana Processor points to.

Diana Processor has its own set of virtual processor registers, and emulated instructions operate on these registers rather than on the real CPU. All read and write requests, including the reading of the current instruction that the virtual RIP/EIP register points to, are processed via the DianaRandomReadWriteStream interface:

typedef struct _dianaRandomReadWriteStream
{ 
    DianaRandomRead_fnc pReadFnc;
    DianaRandomWrite_fnc pWriteFnc;
} DianaRandomReadWriteStream;

Here are the three main steps required to emulate the execution of one processor instruction from the vectored exception handler:

1. Load the current thread context (CPU registers at the moment of exception generation) into Diana Processor. This information is passed to the vectored exception handler function through the ExceptionInfo parameter, whose ContextRecord the handler can both read and modify:

LONG CALLBACK VectoredHandler(
    _In_ PEXCEPTION_POINTERS ExceptionInfo
);
    
typedef struct _EXCEPTION_POINTERS {
    PEXCEPTION_RECORD ExceptionRecord;
    PCONTEXT ContextRecord; // <= Thread context
} EXCEPTION_POINTERS, *PEXCEPTION_POINTERS;

2. Emulate the execution of the current processor instruction via DianaProcessor_ExecOnce.

3. Apply the execution results by rewriting the CPU registers in ExceptionInfo->ContextRecord with the ones received from Diana Processor.

The entire process seems quite simple. Now let’s see how it works in practice.

Thread-safe implementation of breakpoints in memory

Let’s see if we can use Diana Processor for a thread-safe implementation of breakpoints in memory. Figure 3 below shows a basic scheme for this process.

Figure 3. PAGE_NOACCESS and Diana Processor workflow

The first seven steps of the process are similar to the regular PAGE_GUARD and trap flag workflow described in one of the previous sections. So we’ll start from the eighth step of the workflow:

  8. The vectored exception handler loads the current thread context into Diana Processor, executes one processor instruction, and applies the execution results by changing the thread context (i.e. by modifying the ContextRecord passed to the vectored exception handler through its ExceptionInfo parameter).
  9. The vectored exception handler returns the EXCEPTION_CONTINUE_EXECUTION status, which means the exception has been processed successfully and the program can continue executing.
  10. The operating system applies the modified thread context and passes control to the address in RIP/EIP.
  11. The program continues execution starting from the next instruction, as the previous instruction has been successfully processed via Diana Processor by the vectored exception handler.
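
To illustrate the load, emulate, and write-back sequence without depending on the Diana API, here is a toy model. It is not Diana Processor: ToyContext, RedirectingReader, ToyExecOnce, and HandleAccessViolation are hypothetical names, the toy emulator understands only one instruction (mov rax, [rbx], encoded as 48 8B 03), and the idea of redirecting reads of the protected range to the shadow copy inside the memory reader is an assumption about how the pieces could fit together.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy register file standing in for the thread context (not the Windows CONTEXT).
struct ToyContext { std::uint64_t rip, rax, rbx; };

// Memory reader that serves the protected range from its shadow copy.
struct RedirectingReader
{
    std::uintptr_t      protectedBase; // start of the inaccessible range (never dereferenced)
    std::size_t         protectedSize;
    const std::uint8_t* shadow;        // shadow copy holding the real data

    void Read(std::uint64_t address, void* out, std::size_t size) const
    {
        const std::uint64_t offset = address - protectedBase;
        if (address >= protectedBase && offset < protectedSize && size <= protectedSize - offset)
            std::memcpy(out, shadow + offset, size);       // redirected to the shadow page
        else
            std::memcpy(out, reinterpret_cast<const void*>(
                                 static_cast<std::uintptr_t>(address)), size);
    }
};

// Emulates exactly one instruction at ctx.rip. Only `mov rax, [rbx]` is supported.
inline bool ToyExecOnce(ToyContext& ctx, const RedirectingReader& mem)
{
    std::uint8_t code[3];
    mem.Read(ctx.rip, code, sizeof(code));
    if (code[0] != 0x48 || code[1] != 0x8B || code[2] != 0x03)
        return false;                                      // unsupported instruction

    std::uint64_t value = 0;
    mem.Read(ctx.rbx, &value, sizeof(value));              // the monitored memory access
    ctx.rax = value;
    ctx.rip += sizeof(code);                               // advance to the next instruction
    return true;
}

// Models steps 8-11: load the context, emulate one instruction, write it back.
inline bool HandleAccessViolation(ToyContext& threadContext, const RedirectingReader& mem)
{
    ToyContext virtualCpu = threadContext;                 // load into the virtual CPU
    if (!ToyExecOnce(virtualCpu, mem))
        return false;                                      // leave the context untouched
    threadContext = virtualCpu;                            // execution resumes at the new rip
    return true;
}
```

Only one exception is needed per access in this scheme, and the protected range itself is never made accessible, which is what removes the window for other threads.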

As you can see, this approach provides us with exactly what we were looking for – a thread-safe implementation of breakpoints in memory. In addition, all protected memory now contains the PAGE_NOACCESS attribute, making it inaccessible from the Windows kernel and from other processes.

Of course, you can still change the page attributes either from kernel mode or via VirtualProtectEx, but the page will contain irrelevant data as all changes are applied only to the shadow page.

Limitations of the approach

It’s noteworthy that implementing Memory Access Monitor with the help of Diana Processor has a number of limitations. For instance, while the approach with the PAGE_NOACCESS attribute and Diana Processor works fine with common debuggers, two caveats apply:

  • The debugger has to be connected before memory pages come under protection.
  • The protected pages will be inaccessible when being viewed via the Memory window.

Also, the current implementation can monitor access to executable memory, but only for code that isn’t used in the vectored exception handler. Therefore, if we decide to set protection for a page that contains the EnterCriticalSection function, we’ll get infinite recursion.

Diana Processor can emulate only one processor instruction per call of the vectored exception handler. Otherwise, the emulation could make too many changes to the stack pointer (RSP/ESP), and data could be overwritten, as both the emulator and the emulated code run on the same thread stack.

When the VirtualQuery function is called for an address that belongs to a protected memory region, it returns the PAGE_NOACCESS attribute. In this case, it would be more appropriate to return the original page attributes, but this feature hasn’t been implemented so far.

When the VirtualProtect function is called for an address that belongs to a protected memory region, the new attributes are applied to the shadow page instead of the original, but only if the NtProtectVirtualMemory hook is set and control is passed to mam::hooks::NtProtectVirtualMemory.

Finally, when an access violation exception is raised for the shadow page, the reported address points to the shadow page rather than to the original page that the code referred to.

Now let’s look at some test results to see which approach works more efficiently.

Choosing the best approach

To see which approach is better, we ran benchmark speed tests for both the PAGE_GUARD and trap flag implementation and the PAGE_NOACCESS and Diana Processor implementation. Performance measurements were carried out on Windows 10 (64-bit) with an Intel Core i5-6400 CPU (4 cores) for 1 MB of protected memory.

First, we tested the average speed of memory access of one thread for both approaches (see Table 1).

Table 1. Average speed of memory access of one thread within five minutes

                                    read memcpy   read 8 bytes   write memcpy   write 8 bytes
PAGE_NOACCESS and Diana Processor   5732 KB/s     2815 KB/s      5601 KB/s      2806 KB/s
PAGE_GUARD and trap flag            2805 KB/s     1414 KB/s      2799 KB/s      1416 KB/s

As you can see, read and write operations take the same amount of time to execute. However, the approach with Diana Processor is two times faster due to processing only one exception per instruction instead of two exceptions.
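
The "two times faster" claim can be checked directly against the figures in Table 1. The short sketch below only divides the published numbers; Throughput, kTable1, and Speedup are hypothetical names used for this calculation.

```cpp
// Single-thread throughput from Table 1, in KB/s.
struct Throughput
{
    double noAccessDiana; // PAGE_NOACCESS and Diana Processor
    double guardTrap;     // PAGE_GUARD and trap flag
};

// Order: read memcpy, read 8 bytes, write memcpy, write 8 bytes.
constexpr Throughput kTable1[] = {
    { 5732.0, 2805.0 },
    { 2815.0, 1414.0 },
    { 5601.0, 2799.0 },
    { 2806.0, 1416.0 },
};

// Ratio of the Diana Processor throughput to the PAGE_GUARD throughput.
inline double Speedup(const Throughput& t)
{
    return t.noAccessDiana / t.guardTrap;
}
```

All four ratios fall between roughly 1.98 and 2.05, which is consistent with handling one exception per access instead of two.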

Now let’s see how fast these approaches can handle two threads (see Table 2).

Table 2. Average speed of memory access of two threads within five minutes

                                    read memcpy      read 8 bytes
PAGE_NOACCESS and Diana Processor   9453 KB/s        4584 KB/s
PAGE_GUARD and trap flag            9,009,056 KB/s   6,181,597 KB/s

As we mentioned earlier, the approach with the PAGE_GUARD attribute and trap flag is prone to uncontrolled access in the event of multithreaded memory access requests. Our tests confirmed this, showing a significant increase in the memory read speed. The approach with PAGE_NOACCESS and Diana Processor, on the other hand, is thread-safe and allows you to control memory access at any moment. However, while working with multiple threads, the memory read speed also decreases because of Diana Processor synchronization.

Conclusion

By employing a virtual CPU, you can significantly improve and at the same time simplify your runtime algorithms. Diana Processor is an efficient tool for partial emulation and analysis of code execution.

While the approach we’ve introduced using shadow pages and implementing Diana Processor and the PAGE_NOACCESS attribute still has a number of limitations, it shows better results when compared to a more common implementation of the PAGE_GUARD attribute and a trap flag.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Authors

Apriorit Inc
Chief Technology Officer, Apriorit Inc.
United States
Apriorit is a software research and development company specializing in cybersecurity and data management technology engineering. We work for a broad range of clients from Fortune 500 technology leaders to small innovative startups building unique solutions.

As Apriorit offers integrated research and development services for software projects in such areas as endpoint security, network security, data security, embedded systems, and virtualization, we have strong kernel and driver development skills, huge system programming expertise, and are real fans of research projects.

Our specialty is reverse engineering, which we apply in security testing and security-related projects.

A separate department of Apriorit works on large-scale business SaaS solutions, handling tasks from business analysis, data architecture design, and web development to performance optimization and DevOps.

Official site: https://www.apriorit.com
Clutch profile: https://clutch.co/profile/apriorit
Artem K.
Software Developer (Senior), Apriorit
Ukraine

Comments and Discussions

 
PAGE_GUARD auto-removal - What were they thinking?
Chad3F, 24-Nov-18 6:48

Re: PAGE_GUARD auto-removal - What were they thinking?
Artem K., 22-Jan-19 2:28
I think it works as the Windows architects intended; PAGE_GUARD is meant for other tasks than the ones the article talks about. Anyway, PAGE_GUARD is not designed for visualization of access to memory.
