(Now with a Visual Studio solution and automatic compilation/ISO generation for Bochs)
The Infamous Trilogy: Part 2
This article targets readers who have already read my ASM tutorial (http://www.codeproject.com/Articles/45788/The-Real-Protected-Long-mode-assembly-tutorial-for) and want to learn how virtualization works. Do you want to create your own VMware Workstation? Let's go!
Background
Required items:
- A complete understanding of how the CPU works in protected and long mode - read my article: http://www.codeproject.com/KB/system/asm.aspx.
- Bochs source, recompiled with the VMX extensions enabled. VMware (or other virtualizers) won't work. Chances are also that my code uses features not found in your CPU model - but you can try. Oh, and you can test it on a raw DOS PC if you are really brave. The GitHub source includes Bochs for your convenience.
- Very good assembly knowledge
- Flat Assembler (http://flatassembler.net/)
- FreeDOS (or any other DOS you might have licensed). The source includes a FreeDOS floppy image for your convenience.
- LOTS OF PATIENCE
If you are a beginner programmer, quit right now.
If you are an advanced programmer, quit right now.
If you are an expert programmer, quit right now. When you start reading this article, you will feel like a beginner anyway.
BUT, since I am a beginner too, you will eventually be able to follow what I have to say, because I felt the same after I read the virtualization manuals. So keep reading!
Startup
We will create an application that prepares the CPU for virtualization, creates a guest, enters it and exits. All this will be done in x64 mode for simplicity. It is also possible in x86, but we will focus on the x64 architecture to avoid unnecessary overhead in the code.
The code demonstrates only the basic VMX features and it might not work on your own CPU. However, you can use Bochs with virtualization enabled and then you will be able to test my code.
Terminology
- VMM (Virtual Machine Monitor): The hosting application
- VM (Virtual Machine): The guest application
- Root Operation: The code/context the VMM runs in
- Non Root Operation: The code/context the VM runs in
- VMX Transition: Going from host to guest (VMEntry) or from guest to host (VMExit)
- VMCS: A structure to control a VM and VMX transitions.
- VM Entry: A transition from the host application to the guest.
- VM Exit: A transition from the guest to the host due to some reason.
Life Cycle of VMX Operations
- VMM checks for CPU virtualization support (CPUID) and enables it (CR4 and VMXON).
- VMM initializes a control structure, called VMCS, for each VM. It prepares the region with VMCLEAR and tells the CPU which VMCS is the current one with VMPTRLD (VMPTRST stores the current pointer back to memory). It then reads/writes the VMCS fields with VMREAD and VMWRITE.
- VMM enters a VM using VMLAUNCH or VMRESUME.
- VM exits to the VMM with a VMExit (an event, not an instruction).
- Do all the above over and over again.
- VMM eventually shuts itself down with VMXOFF.
Does My CPU have Virtualization Support?
Yes (or you wouldn't be reading this by now anyway), but if you still want to verify, check bit 5 of ECX after a CPUID with EAX = 1:
mov eax,1
cpuid
bt ecx,5
jc VMX_Supported
jmp VMX_NotSupported
After you know that your CPU supports VMX operations, you should read the IA32_VMX_BASIC MSR (index 0x480) to get implementation-specific information for your CPU:
mov ecx, 0480h
rdmsr
This 64-bit MSR has a lot of information, but at the moment, we are interested in 2 fields:
- Bits 0 - 30: 31-bit VMX revision identifier (bit 31 always reads as 0)
- Bits 32 - 44: Number of bytes (up to 4096) that a VMXON region or a VMCS should be.
The VMX revision (4 bytes) should be put in every VMCS/VMXON structure so the processor knows the format that should be used to store data in it. The size of each VMCS/VMXON structure should be exactly the number of bytes indicated by bits 32-44 (max 4096).
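The two fields can be split out of the RDMSR result with a couple of masks. This is a minimal FASM sketch rather than code from VMX.ASM: the routine name `VMX_ReadBasic` and its register convention are made up for illustration, and it has to run at ring 0 because RDMSR is privileged.

```asm
use64

IA32_VMX_BASIC = 0480h

; Hypothetical helper (not part of VMX.ASM).
; Out: EAX = revision identifier, ECX = VMXON/VMCS region size in bytes.
; Clobbers RDX.
VMX_ReadBasic:
    mov ecx, IA32_VMX_BASIC
    rdmsr                   ; EDX = bits 32-63 of the MSR, EAX = bits 0-31
    and eax, 7FFFFFFFh      ; bits 0-30: revision identifier
    mov ecx, edx
    and ecx, 1FFFh          ; MSR bits 32-44 are bits 0-12 of EDX: region size
    ret
```

The size mask is 13 bits wide because the field has to be able to hold the value 4096 itself.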
Enabling VMX Operations
- Enter Long Mode.
- Set CR4's bit 13 (VMXE) to 1. This bit enables the VMX operations.
- Set CR0's bit 5 (NE) to 1 - this is required for the VMXON to succeed.
- Initialize a VMXON region.
- Execute the VMXON instruction.
A VMCS is a 4-KB aligned memory area used to support VM operations. It consists of 3 parts: 4 bytes that hold the revision number (the value returned by MSR 0x480), 4 bytes that are used for VMX Abort data (more on this later), and the rest, which is a collection of six groups of fields that control the VM operations.
A VMXON region has the same size and alignment as a VMCS, but you only need to initialize its revision number: put the correct revision (first 4 bytes), as returned by MSR 0x480 above.
The VMXON instruction takes a memory operand (e.g., VMXON [rdi]). That memory location should contain the 64-bit physical address of the VMXON region (4-KB aligned), and the first 4 bytes of that region should contain the VMX revision.
File: VMX.ASM
Func: VMX_Enable
CR4 bit set for VMX operations:
mov rax,cr4
bts rax,13
mov cr4,rax
Enable VMX:
mov [rdi],ebx ; Put the revision. RDI holds the address of the VMXON region and EBX holds the revision
VMXON [rsi] ; RSI points to a qword that holds the 64-bit physical address of the VMXON region
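Put together, the steps above might look like the sketch below. It is not the article's VMX_Enable: the routine name, the `VmxonPhys` variable and the calling convention are assumptions, and it presumes ring 0, long mode and identity-mapped memory (so a linear address can be used as a physical one). The IA32_FEATURE_CONTROL check is an addition for real hardware, where VMXON faults unless the firmware has set the lock bit and the "VMX outside SMX" bit in that MSR.

```asm
use64

IA32_FEATURE_CONTROL = 03Ah
IA32_VMX_BASIC       = 0480h

; Hypothetical routine.
; In:  RDI = address of a zeroed, 4-KB aligned VMXON region
; Out: CF = 0 on success, CF = 1 on failure
VMX_EnableSketch:
    mov ecx, IA32_FEATURE_CONTROL
    rdmsr
    and eax, 5              ; keep bit 0 (lock) and bit 2 (VMX outside SMX)
    cmp eax, 5
    jne .fail               ; VMX has not been enabled and locked by firmware

    mov rax, cr4
    bts rax, 13             ; CR4.VMXE
    mov cr4, rax
    mov rax, cr0
    bts rax, 5              ; CR0.NE
    mov cr0, rax

    mov ecx, IA32_VMX_BASIC
    rdmsr
    mov [rdi], eax          ; revision goes into the first 4 bytes of the region

    mov [VmxonPhys], rdi    ; VMXON reads the physical address from memory
    vmxon [VmxonPhys]
    jbe .fail               ; CF = 1 or ZF = 1 means VMXON failed
    clc
    ret
.fail:
    stc
    ret

VmxonPhys dq 0
```

Testing both flags with a single `jbe` works because the VMX instructions report failure through either CF or ZF.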
The VMCS Groups
It was easy so far, but here starts your hell. The rest of the VMCS (that is, after the first 8 bytes: revision + VMX Abort) is divided into 6 subgroups:
- Guest State
- Host State
- Non root controls
- VMExit controls
- VMEntry controls
- VMExit information
Each of the above groups contains important information about how the VM starts (state after a VMEntry), what the host state is after a VMExit, when a VMExit will occur, and more.
File: VMX.ASM
Func: VMX_TryGuest and VMX_TryGuest2
The Guest State
This contains the following information (in parentheses, the size in bits):
- CR0, CR3, CR4, DR7, RSP, RIP, RFLAGS (64 each)
- For each of CS, SS, DS, ES, FS, GS, LDTR, TR:
  - Selector (16)
  - Base address (64)
  - Segment limit (32)
  - Access rights (32)
- For GDTR and IDTR:
  - Base address (64)
  - Limit (32)
- IA32_DEBUGCTL (64)
- IA32_SYSENTER_CS (32)
- IA32_SYSENTER_ESP (64)
- IA32_SYSENTER_EIP (64)
- IA32_PERF_GLOBAL_CTRL (64)
- IA32_PAT (64)
- IA32_EFER (64)
- SMBASE (32)
- Activity State (32) - 0 Active, 1 Inactive (HLT executed), 2 Shutdown (triple fault occurred), 3 Waiting for startup IPI (SIPI).
- Interruptibility state (32) - a state that defines some features that should be blocked in the VM - more on that later.
- Pending debug exceptions (64) - to facilitate hardware breakpoints with DR7 - more on that later.
- VMCS Link pointer (64) - reserved, set to 0xFFFFFFFFFFFFFFFF.
- *VMX Preemption timer value (32) - more on this later.
- *Page Directory pointer table entries (4x64) - pointers to pages - more on this later.
The guest state describes the values of the registers that the CPU has after a VMEntry. Because you can totally control the registers, you can start a VM in any mode (real, protected, long, etc.). But even if you are to start a real mode VM (as my code does), you have to initialize the segment registers as normal p-mode selectors, with proper limits, access rights, etc.
The values that are used for the segment registers (limits, base address, selector, access rights and flags) are the same as those used in ordinary protected mode, so for example, you will see my code adding a 0x92 access flag for a DS read/write data segment.
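As an illustration, one guest segment register could be described like this. It is a sketch under assumptions, not the code from VMX_TryGuest: the macro name, the routine name and the selector value 10h are placeholders, and a current VMCS is assumed. The field encodings are the ones listed for guest DS in the Intel manual. The access rights value here has the accessed bit set (93h instead of 92h) because the VM entry checks described in the Intel manual appear to require it for usable data segments; in the VMCS layout the G and D/B flags sit in bits 15 and 14.

```asm
use64

; VMCS field encodings for the guest DS register
GUEST_DS_SELECTOR = 0806h
GUEST_DS_LIMIT    = 4806h
GUEST_DS_AR       = 481Ah
GUEST_DS_BASE     = 680Ch

; Hypothetical helper macro: write one VMCS field (clobbers RAX and RBX)
macro vmcs_put field, value
{
    mov rax, field
    mov rbx, value
    vmwrite rax, rbx
}

; Hypothetical routine: flat 4-GB ring 0 data segment for the guest DS.
; Error checks after each VMWRITE are omitted for brevity.
VMX_WriteGuestDS:
    vmcs_put GUEST_DS_SELECTOR, 10h       ; placeholder selector
    vmcs_put GUEST_DS_BASE, 0
    vmcs_put GUEST_DS_LIMIT, 0FFFFFFFFh   ; 4 GB, consistent with G = 1 below
    vmcs_put GUEST_DS_AR, 0C093h          ; 93h: present, DPL 0, data, read/write, accessed
                                          ; C000h: G (bit 15) and D/B (bit 14)
    ret
```

The other segment registers follow the same pattern with their own encodings.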
The Host State
This contains the following information (in parentheses, the size in bits):
- CR0, CR3, CR4, RSP, RIP (64 each)
- CS, SS, DS, ES, FS, GS, TR selectors (16 each)
- FS, GS, TR, GDTR, IDTR base addresses (64 each)
- IA32_SYSENTER_CS (32)
- IA32_SYSENTER_ESP (64)
- IA32_SYSENTER_EIP (64)
- *IA32_PERF_GLOBAL_CTRL (64)
- *IA32_PAT (64)
- *IA32_EFER (64)
The host state tells the CPU how to return to the VMM after a VMExit.
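Most of the host state can simply be copied from what the VMM is running with right now. The sketch below fills only part of it (control registers, CS, RSP and RIP); the remaining selectors and base addresses are written the same way. The routine name, the `VMX_ExitEntry` label and the stack buffer are hypothetical, and a current VMCS is assumed.

```asm
use64

; VMCS field encodings (host-state area)
HOST_CS_SELECTOR = 0C02h
HOST_CR0         = 6C00h
HOST_CR3         = 6C02h
HOST_CR4         = 6C04h
HOST_RSP         = 6C14h
HOST_RIP         = 6C16h

; Hypothetical routine. Error checks after VMWRITE are omitted.
VMX_WriteHostSketch:
    mov rbx, HOST_CR0
    mov rax, cr0
    vmwrite rbx, rax
    mov rbx, HOST_CR3
    mov rax, cr3
    vmwrite rbx, rax
    mov rbx, HOST_CR4
    mov rax, cr4
    vmwrite rbx, rax

    xor eax, eax
    mov ax, cs
    and eax, 0FFF8h             ; host selectors need RPL = 0 and TI = 0
    mov rbx, HOST_CS_SELECTOR
    vmwrite rbx, rax

    mov rbx, HOST_RSP
    lea rax, [HostStackTop]     ; stack the CPU loads on every VM exit
    vmwrite rbx, rax
    mov rbx, HOST_RIP
    lea rax, [VMX_ExitEntry]    ; where the VMM continues after a VM exit
    vmwrite rbx, rax
    ret

VMX_ExitEntry:                  ; placeholder exit routine
    hlt
    jmp VMX_ExitEntry

align 16
HostStack: times 4096 db 0
HostStackTop:
```

Giving the exit routine its own stack keeps it independent of whatever the VMM had on its stack when it launched the guest.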
Execution Control Fields
These fields essentially tell the CPU what is allowed to be executed in the VM and what is not. Everything not allowed causes a VMExit. The sections are:
- Pin-Based (32b): Interrupts
- Processor-Based (2x32b)
  - Primary: Single Step, TSC, HLT, INVLPG, MWAIT, CR3, CR8, DR access, I/O Bitmaps
  - Secondary: EPT, Descriptor Table Change, Unrestricted Guest and others
- Exception bitmap (32b): One bit for each exception. If a bit is 1, the exception causes a VMExit.
- I/O bitmap addresses (2x64b): Control when IN/OUT cause a VMExit.
- Time Stamp Counter offset
- CR0/CR4 guest/host masks
- CR3 Targets
- APIC Access
- MSR Bitmaps
My code only uses the pin-based and the processor-based controls for simplicity, but these fields are your real Swiss army knife; you can control entirely what the VM is and is not allowed to perform.
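Not every control bit can be chosen freely: each control field has a capability MSR whose low 32 bits report the bits that must be 1 and whose high 32 bits report the bits that may be 1. A common way to handle this is to clamp the wanted value against both halves before writing it. The sketch below does that for the pin-based controls; the routine name and register convention are assumptions, and it presumes ring 0 and a current VMCS.

```asm
use64

IA32_VMX_PINBASED_CTLS = 0481h
PIN_BASED_CONTROLS     = 4000h     ; VMCS field encoding

; Hypothetical routine.
; In: EBX = pin-based control bits the VMM would like to set
VMX_WritePinControls:
    mov ecx, IA32_VMX_PINBASED_CTLS
    rdmsr                   ; EAX = bits that must be 1, EDX = bits that may be 1
    or  ebx, eax            ; force the mandatory bits on
    and ebx, edx            ; drop whatever this CPU does not allow
    mov rax, PIN_BASED_CONTROLS
    vmwrite rax, rbx        ; the 32-bit operations above cleared the top of RBX
    ret
```

The same two instructions work for the processor-based, VM-exit and VM-entry controls with their own MSRs (0x482, 0x483, 0x484).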
VM-Exit Control Fields
These fields tell the CPU what to load and what to discard in case of a VMExit:
- VMExit Controls (32b)
- VMExit Controls for MSRs
VM-Entry Control Fields
- VMEntry Controls (32b)
- VMEntry Controls for MSRs
- VMEntry Controls for event injection
This event injection is your second weapon. After a VM exits, you can inject an event on the next VMEntry, so the VM believes that the exception was generated by its own code. Yes, a VMM can become really mighty.
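To make the injection concrete, here is a sketch that queues a general protection fault (#GP, vector 13) with error code 0 for the next VM entry. The routine and constant names are made up; the field encodings and the bit layout of the interruption-information field (vector in bits 0-7, type in bits 8-10, "deliver error code" in bit 11, "valid" in bit 31) follow the Intel manual. A current VMCS is assumed.

```asm
use64

VM_ENTRY_INTR_INFO  = 4016h     ; VM-entry interruption-information field
VM_ENTRY_ERROR_CODE = 4018h     ; VM-entry exception error code

; valid (bit 31) + deliver error code (bit 11)
; + type 3 = hardware exception (bits 8-10) + vector 13
INJECT_GP = 80000B0Dh

; Hypothetical routine: the guest will see a #GP(0) right after the next entry.
VMX_InjectGP:
    mov rax, VM_ENTRY_ERROR_CODE
    xor ebx, ebx                ; error code 0
    vmwrite rax, rbx
    mov rax, VM_ENTRY_INTR_INFO
    mov ebx, INJECT_GP
    vmwrite rax, rbx
    ret
```

Call it from the exit routine before VMRESUME; the CPU clears the valid bit again on the following VM exit, so the event is delivered only once.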
VM-Exit Information (Read only) Fields
- Basic information
  - Exit Reason (32)
  - Exit Qualification (64)
  - Guest Linear Address (64)
  - Guest Physical Address (64)
- Vectored exit information
- Event delivery exits
- Instruction execution exits
- Error field
The VMCS Initialization
To make a VMCS available for reading/writing with VMREAD or VMWRITE, you first initialize its first 4 bytes to the revision (as with the VMXON structure above), execute VMCLEAR on it, and then execute VMPTRLD with its address. Like VMXON, both instructions take a memory operand that holds the 64-bit physical address of the region.
Appendix H of the 3B Intel Manual has a list of all indices. For example, the index of the RIP of the guest is 0x681e. To write the value 0 to that field, we would use:
mov rax,0681eh
mov rbx,0
vmwrite rax,rbx
This means that, after a successful VM Entry, the guest will start with RIP set to 0.
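The whole preparation, with the error checks that the short example leaves out, could be sketched as follows. The routine name, `VmcsPhys` and the register convention are assumptions; ring 0, VMX operation already entered with VMXON, and identity-mapped memory are presumed.

```asm
use64

IA32_VMX_BASIC = 0480h
GUEST_RIP      = 681Eh

; Hypothetical routine.
; In:  RDI = address of a zeroed, 4-KB aligned VMCS region
; Out: CF = 0 on success, CF = 1 on failure
VMX_PrepareVmcs:
    mov ecx, IA32_VMX_BASIC
    rdmsr
    mov [rdi], eax          ; revision in the first 4 bytes

    mov [VmcsPhys], rdi     ; the instructions read the physical address from memory
    vmclear [VmcsPhys]      ; put the VMCS into the "clear" launch state
    jbe .fail
    vmptrld [VmcsPhys]      ; make it the current VMCS
    jbe .fail

    mov rax, GUEST_RIP
    xor ebx, ebx
    vmwrite rax, rbx        ; guest RIP = 0
    jbe .fail               ; CF or ZF set: the write was rejected
    clc
    ret
.fail:
    stc
    ret

VmcsPhys dq 0
```

A VMWRITE to an unsupported field sets ZF, so checking the flags at least once while developing saves a lot of guessing later.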
Giving Memory to your VM
You think you are done? Hahahah. Not so fast. You have to give your new Virtual Machine some memory to work with, and you have to configure the EPT (Extended Page Tables). EPT is a mechanism that translates guest physical addresses to host physical addresses. Fortunately for you, its table structure is very similar to the known long mode paging mechanism, so you can review it in my article.
Originally, the VMX capabilities of the CPU required guests to start in paged protected mode, and VMM applications usually put the virtual CPU into VM86 mode, to allow OSes (which expect a clean real mode boot) to work. Soon Intel introduced the "Unrestricted Guest" flag (bit 7 in the secondary processor-based execution controls) that allows a guest to start in real mode. However, putting the virtual CPU in real mode means we have to map the lower 640 KBytes, so we have to use EPT.
If your CPU doesn't allow the "unrestricted guest" mode, then you can set up a protected mode guest using similar code, because my code creates protected mode style segments anyway. The GitHub project, which uses Bochs, automatically creates a protected mode guest.
Of course, depending on your guest's initial state (for example, if you'd want to start a guest in long mode), you would also need to configure Guest PAE, paging, proper CR4 and stuff. But our little application will configure a real mode guest, so it needs to map a region of host memory to guest physical addresses. EPT translation uses the lower 48 bits of the address (as today's CPUs actually do - not the entire 64-bit range is used).
The code currently tests a protected mode guest. Initialization for this VM is in VMX_Initialize_Guest2. This time, CR0 is set to be in protected paged mode, CR3 is loaded with the page directory (the very same used for normal protected mode since our EPT is a see-through). File guest32.asm has the entry point for our protected mode guest and this time, the selectors are ready to go. It merely sets a flag and exits to the VMM with VMCall.
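A see-through EPT can be quite small. The sketch below identity-maps the first 1 GB of guest physical memory with 2-MB pages and stores the resulting EPT pointer in the VMCS. It is an illustration under several assumptions, not the article's code: the labels are made up, the tables are assumed to be identity-mapped and loaded at a 4-KB aligned address, the CPU is assumed to support 2-MB EPT pages and the write-back memory type (both reported in IA32_VMX_EPT_VPID_CAP), and the "enable EPT" bit in the secondary controls still has to be set elsewhere.

```asm
use64

EPT_POINTER = 201Ah             ; VMCS field encoding

; Hypothetical routine: guest-physical 0..1 GB -> the same host-physical range.
VMX_BuildEptSketch:
    lea rax, [EptPdpt]
    or  rax, 7                  ; read + write + execute
    mov [EptPml4], rax          ; PML4 entry 0 -> PDPT
    lea rax, [EptPd]
    or  rax, 7
    mov [EptPdpt], rax          ; PDPT entry 0 -> page directory

    lea rdi, [EptPd]
    mov eax, 0B7h               ; address 0, RWX (7), write-back (6 shl 3), 2-MB page (bit 7)
    mov ecx, 512
.next:
    mov [rdi], rax
    add rax, 200000h            ; next 2-MB frame, flag bits stay untouched
    add rdi, 8
    dec ecx
    jnz .next

    lea rbx, [EptPml4]
    or  rbx, 1Eh                ; write-back (6) + page-walk length 4 ((4-1) shl 3)
    mov rax, EPT_POINTER
    vmwrite rax, rbx
    ret

align 4096
EptPml4: times 512 dq 0
EptPdpt: times 512 dq 0
EptPd:   times 512 dq 0
```

Unlike ordinary page tables, EPT entries carry separate read, write and execute permissions in bits 0-2, and leaf entries carry the memory type in bits 3-5.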
Launch It!
Having initialized the VMCS properly (ok, that's a joke, but I have to say "properly" anyway - prepare for LOTS of failures here), the VMLAUNCH opcode will start the execution of the virtual machine (from the CS:XIP set in the VMCS guest state). If the entry fails, the carry flag (no current VMCS) or the zero flag (an error number is available in the VMCS) will be set immediately after execution of VMLAUNCH.
This is where Bochs will help you. After VMLAUNCH fails, the Bochs debugger window will show you a message depending on what went wrong, so you will get an idea of what to fix in the VMCS.
If VMLAUNCH succeeds, control will not return to the host until a VM Exit occurs. When a VM exit occurs, control is transferred to the VMM's exit routine (as configured in the VMCS host state fields). My exit routine merely checks the flag set by the guest code to know if the guest code was successfully executed.
Note that, even if VMLAUNCH succeeds, starting the VM might immediately cause a VMExit due to any fault (page faults, EPT misconfigurations, etc). In that case, VMLAUNCH will succeed but control will immediately return to your exit routine without any guest code having been executed.
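When you are not running under the Bochs debugger, the CPU itself can tell you a little about a failed launch through the VM-instruction error field. A sketch, with a hypothetical routine name and return convention, assuming a fully prepared current VMCS:

```asm
use64

VM_INSTRUCTION_ERROR = 4400h    ; VMCS field encoding

; Hypothetical routine. It returns only if the entry failed:
; RAX = 0 if there is no current VMCS (CF was set),
; otherwise RAX = the VM-instruction error number (ZF was set).
VMX_LaunchSketch:
    vmlaunch
    ; Reaching this line means the entry failed. After a successful launch
    ; the CPU runs the guest and later continues at the host RIP of the VMCS.
    jc .no_vmcs
    mov rbx, VM_INSTRUCTION_ERROR
    vmread rax, rbx
    ret
.no_vmcs:
    xor eax, eax
    ret
```

The error numbers are listed in the Intel manual next to the VMX instruction reference.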
Launched, Now What?
Nothing. The VM executes as if nothing is present, unless you make something present. You now need to implement your own BIOS, copy it into the virtual memory at the proper address (so execution starts from 0xF000:0xFFF0), and write the drivers that transfer data between the actual hardware and memory and the virtual hardware and memory you may have allowed within the VM. Yup, that's why VMware Workstation is some 500 MB in size; it contains a BIOS, drivers and communication protocols to allow, e.g., a virtual screen (which is seen as an actual device by the guest) to be shown on your actual screen within a window. The same goes for USB hardware, which is duplicated from the actual system to the virtual system.
For a simple test, one might think that it should be easy to copy the actual BIOS to the virtual memory so, for example, DOS can boot. Right, and from what device will DOS boot, since there is none in the VM? That's why you have to duplicate an actual device into the VM using your custom BIOS in order to communicate with the host with a specific protocol, then emulate the allowed devices in order for the VM to function properly.
My application simply forwards memory in a see-through style, so calling BIOS and DOS from the VM is possible. But in real life, you don't want to do that, as then the VM can ruin the VMM because they share the same memory.
In real life also, in case the "unrestricted guest" feature isn't available, you have to start the guest in VM86 (paged protected) mode, and if the guest wants to switch itself to protected/long mode (like an OS does), you must catch the VMExit (which would occur when the guest software attempts to execute LGDT) and emulate all the operations that would otherwise fail (LGDT, LIDT, CR0 writes, paging initialization, etc.) so the guest can assume that its operations were successful.
VM Exits
A VMExit can occur for various reasons: because you requested it through the VMCS execution control fields, because the VM actually entered a shutdown state (for example, a ring 0 crash) that would reset the CPU if it ran in an actual, non virtualized state, or for anything else. Execution resumes at the CS:XIP saved in the VMCS host state, and you can read the VMCS exit information (read only) fields to detect the reason for the exit.
Use VMRESUME to resume the VM after an exit.
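A minimal exit routine might be sketched like this. It is not the exit routine from VMX.ASM: the labels are hypothetical, it handles only a VMCALL exit (basic exit reason 18) by stepping over the instruction and resuming, and it shuts VMX down for anything else. Note that the CPU does not save the guest's general-purpose registers on a VM exit, so a real handler would have to save and restore them itself; this sketch simply clobbers RAX, RBX and RCX.

```asm
use64

VM_EXIT_REASON     = 4402h
VM_EXIT_INSTR_LEN  = 440Ch
GUEST_RIP          = 681Eh
EXIT_REASON_VMCALL = 18

; Hypothetical exit routine; its address is what goes into the host RIP field.
VMX_ExitSketch:
    mov rbx, VM_EXIT_REASON
    vmread rax, rbx
    and eax, 0FFFFh             ; bits 0-15: basic exit reason
    cmp eax, EXIT_REASON_VMCALL
    jne .stop

    ; Step over the VMCALL, or the guest would execute it again forever
    mov rbx, VM_EXIT_INSTR_LEN
    vmread rcx, rbx
    mov rbx, GUEST_RIP
    vmread rax, rbx
    add rax, rcx
    vmwrite rbx, rax
    vmresume                    ; falls through only if the entry fails
.stop:
    vmxoff                      ; leave VMX operation
.halt:
    hlt
    jmp .halt
```

VMRESUME is used instead of VMLAUNCH here because the VMCS has already been launched once.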
VMCall
Some systems know that they run under virtualization (for example, VMware drivers) and they do want to jump back to their host in order to exchange information. The VMCall opcode causes a VMExit to the host, so the virtualized system can exchange information with the host. My code also uses VMCall to exit to the host.
Of course, if VMCall is executed outside VMX operation (that is, with no VMM active), an invalid opcode exception (#UD) is raised.
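On the guest side this takes very little code. The fragment below only illustrates the idea and is not the content of guest32.asm; `GuestEntry`, `GuestFlag` and the value 1 are placeholders, and a flat data segment is assumed.

```asm
use32

; Hypothetical protected mode guest code
GuestEntry:
    mov dword [GuestFlag], 1    ; leave a mark the VMM can check after the exit
    vmcall                      ; unconditionally causes a VM exit to the host
.resumed:
    jmp .resumed                ; reached only if the VMM resumes past the VMCALL

GuestFlag dd 0
```

A guest that wants to pass parameters to the host can simply load them into registers before the VMCALL; which registers mean what is purely a convention between the two sides.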
Control MSRs
For simplicity, my code doesn't check for all features (that's the most probable reason it won't work in your raw DOS), but you should check the VMX MSRs for available features before using them. Intel's 3B Appendix G contains all these MSRs. To read an MSR, you put its number in ECX and execute the rdmsr opcode. The result is in EDX:EAX (high 32 bits in EDX, low 32 bits in EAX).
- IA32_VMX_BASIC (0x480): Basic VMX information including revision, VMCS size, memory types and others.
- IA32_VMX_PINBASED_CTLS (0x481): Allowed settings for pin-based VM execution controls.
- IA32_VMX_PROCBASED_CTLS (0x482): Allowed settings for processor-based VM execution controls.
- IA32_VMX_PROCBASED_CTLS2 (0x48B): Allowed settings for secondary processor-based VM execution controls.
- IA32_VMX_EXIT_CTLS (0x483): Allowed settings for VM Exit controls.
- IA32_VMX_ENTRY_CTLS (0x484): Allowed settings for VM Entry controls.
- IA32_VMX_MISC (0x485): Miscellaneous data, such as RDTSC options, unrestricted guest availability, activity state and others.
- IA32_VMX_CR0_FIXED0 (0x486) and IA32_VMX_CR0_FIXED1 (0x487): Indicate the CR0 bits that must be 1 (FIXED0) and the CR0 bits that are allowed to be 1 (FIXED1) in VMX operation.
- IA32_VMX_CR4_FIXED0 (0x488) and IA32_VMX_CR4_FIXED1 (0x489): Same for CR4.
- IA32_VMX_VMCS_ENUM (0x48A): Enumerator helper for VMCS.
- IA32_VMX_EPT_VPID_CAP (0x48C): Provides information on capabilities regarding VPIDs and EPT.
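For example, before relying on a real mode guest you could ask these MSRs whether EPT and the unrestricted guest feature may be enabled at all. The routine name and its return convention are assumptions; the bit positions (bit 31 of the primary controls activates the secondary ones, bits 1 and 7 of the secondary controls are "enable EPT" and "unrestricted guest") follow the Intel manual. Ring 0 is required.

```asm
use64

IA32_VMX_PROCBASED_CTLS  = 0482h
IA32_VMX_PROCBASED_CTLS2 = 048Bh

; Hypothetical routine.
; Out: CF = 1 if both EPT and unrestricted guest may be enabled, CF = 0 otherwise
VMX_CanUseUnrestrictedGuest:
    mov ecx, IA32_VMX_PROCBASED_CTLS
    rdmsr
    bt  edx, 31             ; may "activate secondary controls" be set to 1?
    jnc .no                 ; if not, MSR 0x48B does not exist and RDMSR would fault
    mov ecx, IA32_VMX_PROCBASED_CTLS2
    rdmsr
    and edx, 82h            ; bit 1 = enable EPT, bit 7 = unrestricted guest
    cmp edx, 82h
    jne .no
    stc
    ret
.no:
    clc
    ret
```

Only the high half (EDX) is examined because it holds the bits that are allowed to be 1.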
Creating the Hypervisor Virus
So far, we are interested in the VM science all right, but the programmer's soul will always contain notorious feelings like killing, revenging, cheating, cracking and all sort of that stuff.
Now, we'll take into account the fact that you are evil (or you wouldn't have made it up to here) and discuss Blue Pill. Blue Pill is a hypervisor virus that controls the entire OS. For this to work, you would simply map the entire memory as a see-through and start the VM with Windows in it, while configuring almost anything to cause a VMExit. Now, whatever Windows tries will be reported to your hypervisor via VMExits, and using the injection technology, you can fake any response - and since Intel doesn't have a (known) way to detect if an application is running under virtualization, you will never get caught. Never? Who knows - but if you ever get caught, let me assure you that I know nothing about it. :)
But wait! CR4's bit 13 should be 1 inside a Virtual Machine, so if that bit is 0, you definitely know you are not virtualized! But if this bit is 1, do you really know if you are under a VMM? Who knows. If anybody gets the Windows Loader source and finds out that a mov eax,cr4 - test eax,0x2000 - jnz WE_ARE_OWNED sequence is there, let me know.
Another possible option to test if you are owned is VMCall, which would raise an exception that you can catch. However, did anyone assure you that there wasn't an exit and that your host didn't inject an exception for you to catch, so you assume you are free?
Another possible option is to test if the CPU does not support unrestricted guests and you started in VM86 mode. If you see that you are running in VM86 mode, then chances are that you are virtualized. But whoops - did we forget EMM386.EXE? Still, Windows NT-based OSes do not load any DOS drivers, so if the NT loader checks for VM86 and finds it enabled, it may assume it is under virtualization.
Conclusion
As you saw, virtualization is not initially a very complex subject, but to make something that really works, you need to implement a BIOS, drivers, etc. That's why not many programmers really try such a thing and that's why only a few applications support virtualization. VMWare has completed a great deal of work to make their Workstation actually do the job.
The code is imported from my previous article and organized in 6 files. It is rather dirty, but it works.
Try it and tell me. If it doesn't work, tell me and help me to improve it. Either way, the fact that you are reading up to here is appreciated.
If you aren't disappointed by now, I urge you to apply to a Virtualization Software company like VMWare for a job - you will do very well. And tell them you have read my article, they might hire me as well. :)
GOOD LUCK!
History
- 31-12-2018: Happy new year, include VMX code in the main github project
- 11-01-2015: Happy new year, some formatting
- 02-07-2012: Added protected mode guest and fixed some minimal bugs
- 26-06-2011: First release
I'm working in C++, PHP, Java, Windows, iOS, Android and Web (HTML/Javascript/CSS).
I've a PhD in Digital Signal Processing and Artificial Intelligence and I specialize in Pro Audio and AI applications.
My home page: https://www.turbo-play.com