Introduction
This article targets the user who wants to know how the CPU works. I explain some assembly basics, then real mode, protected mode and long mode. Working assembly code is included so you can test for yourself how the processor works in real mode, how protected mode is entered, how we get to 64-bit mode and, finally, how to exit from all of them and go back to DOS.
Do you dare to follow? Let's go!
Requirements
- Assembly knowledge. While you will not be writing code, basic knowledge such as registers, memory access, basic commands and stuff is helpful.
- Flat Assembler, a modern assembler which can make executables for win32, x64, and DOS.
- A clean DOS installation (see below). You can try the excellent FreeDOS.
DOS is not the easiest environment to work in, so you can run it inside one of these 2 tools:
- VMware, the best virtualization software.
- Bochs, the must-have tool for system developers. I recommend it, not only because it is free, but because it has a debugger that traps any exception your program generates and tells you what happened.
In the far past, Borland's Turbo Assembler (TASM) was used because it was the first one to allow 32-bit segments. But since then, quite a few things have changed, and TASM is dead.
All the code here is configured to compile with the modern FASM , which can create executables for both DOS and Windows, 16/32/64 bit.
Visual Studio includes ML.EXE and ML64.EXE, the newer MASM versions. However, these assemblers only output Windows executables, and each only for its own bitness: ML.EXE only outputs 32-bit flat code for Win32, and ML64.EXE only outputs 64-bit flat code for Win64.
NASM is also good, but it does not have as many output options as FASM.
Debugger
Yes, you have figured it out already. Your real mode debugger will NOT work once you step over the LIDT instruction (goodbye, protected mode debugging). However, there is a way around it: it's called 386SWAT, and it is both a real and protected mode debugger which you can invoke while in protected mode. For this to work, you must initialize some GDT entries for it. My code does that so you can see how.
However, Bochs has its own hardware debugger that can step into anything, so you can do your stuff there!
Assembly in General
Introduction
The assembly language is basically a collection of low level instructions (opcodes) to do useful stuff and to access memory. In order to make things easier, there are "registers", i.e. places to get and set data.
Segmentation
Memory is not treated as a contiguous array of bytes (like a C array). It is divided into segments. A segment has a different meaning depending on the CPU mode (Real, Protected or Long mode). Each memory address is referred to by a segment register, which holds the segment value, and an offset, indicating the distance from the start of the segment.
Stack
In order to make it easier for functions to have local variables (like in C++) and transfer data between them, each application sets up a special "stack segment", which identifies the memory used for the stack. The stack is a "LIFO" structure: the last thing you push to it is the first thing you get from it.
Pointer Access
It is like pointers in C. If you have a variable that contains a pointer in C, you can use * to access the data. In assembly, you do the same with []:
mov si,000fh
mov dx,[si]
What if you try to access something that doesn't exist? In DOS, you can trash yourself or the OS or both. In protected mode systems you will not be allowed to access non-existent memory, so the exception handler will be called (or else your program will be sacked).
Function calls/Interrupts
Just as in C++, there are function calls in assembly (no, not everything is implemented with goto :). Depending on the programming model (you will learn this later) , a function call can be near (in which the current IP is pushed to the stack and the function exits with a RET) , far (in which the IP and CS are pushed, and the function exits with a RETF) , or interrupt.
An Interrupt is basically a handler that gets executed when something calls it. Many times, these are software interrupts, which means that the program calls it using the INT instruction. In fact, all DOS/BIOS services are provided through interrupts, so if the programmer does:
mov ah,9
mov dx,msg
int 21h
then the programmer knows that function 9 of INT 21h has been called, which is a DOS function that displays the $-terminated string pointed to by DS:DX. The address of each interrupt handler (there are 256) is stored in the Interrupt Table, whose location can be read with the SIDT instruction and whose format depends on the processor mode (you will learn about this later). When an interrupt is called, the flags, CS and IP are pushed to the stack, and the handler exits with IRET.
An interrupt can also happen when something else occurs (usually, an exception). For example, when your code divides something by zero, INT 00h is automatically called. The code for INT 00h sees that you have tried to divide something by zero and that you therefore can't continue. So Windows displays a nice message and closes your application.
If you install an INT 00h handler yourself (via DOS function 25h), then the exception gets to your code before making it to Windows (that's basically what structured exception handling does), but you must still fix the error. If you can't fix the error, then you must abort, which is pretty much what Windows does.
In DOS, the default exception handlers do little more than block the CPU from further processing, so you can only press Ctrl+Alt+Del to restart.
Registers
16-bit registers
- AX
- BX
- CX
- DX
- SI
- DI
- BP
- SP
- IP
- 16-bit Flag Register
IP holds the current execution point. As commands are executed, IP changes its value automatically.
AX,BX,CX,DX can be accessed either entirely:
mov ax,1
mov cx,ax
or using their low 8 bits (al,bl,cl,dl) or high 8 bits (ah,bh,ch,dh)
xor ax,ax
mov ah,1
mov al,2
SI and DI are always accessed as 16-bit registers (there is nothing like sl or sh) and they are generally used as pointers to data. BP can also be used as a general-purpose register (although it is usually used to access the stack), and SP holds the pointer to the current stack entry. So let's see what happens when you put something on the stack:
mov ax,3
push ax
mov dx,5
push dx
; the last value pushed comes out first: bx = 5, then cx = 3
pop bx
pop cx
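The push/pop sequence above can be traced with a small model. This is only a sketch in Python, not real CPU behavior in every detail: the class name `Stack16` and the starting SP value of 1000h are assumptions made for illustration.

```python
# Sketch of a 16-bit stack: PUSH decrements SP by 2 and stores a word,
# POP reads the word at SP and increments SP by 2.
class Stack16:
    def __init__(self, sp=0x1000):
        self.sp = sp      # hypothetical initial stack pointer
        self.mem = {}     # offset -> 16-bit word stored there

    def push(self, value):
        self.sp = (self.sp - 2) & 0xFFFF
        self.mem[self.sp] = value & 0xFFFF

    def pop(self):
        value = self.mem[self.sp]
        self.sp = (self.sp + 2) & 0xFFFF
        return value


def run_example():
    s = Stack16()
    s.push(3)       # mov ax,3 / push ax
    s.push(5)       # mov dx,5 / push dx
    bx = s.pop()    # pop bx
    cx = s.pop()    # pop cx
    return bx, cx, s.sp
```

`run_example()` returns the values that end up in BX and CX, plus the final SP, which is back where it started because every push was matched by a pop.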
What happens when we push more than the current stack can hold? Boom, stack overflow. You have probably encountered it in your C++ recursive functions.
The flags register is a set of 16 bits (not all are actually used) that change their value depending on the result of each instruction. The conditional variants of the JMP instruction (JZ, JAE, JB etc.) jump depending on those flags. For example, ZF (the Zero Flag) is set to 1 when the result of an operation is zero:
mov ax,bx
or ax,ax
je AxIsZero
jmp AxIsNotZero
You can use pushf and popf to read the flags into a register or to set them from one, for example:
pushf
pop ax
; set bit 0 (the Carry Flag)
or al,1
push ax
popf
16-bit segment registers
These registers hold a value that identifies the current segment. The way this value is interpreted depends on the current CPU mode (Real/Protected/Long).
CS always holds the segment of the currently executing code. You cannot set CS by using, say, mov cs,ax. When you call a function that resides in another segment (FAR call), or when you jump to another segment (FAR jump), CS changes.
DS holds the default data segment. That means that if you do this:
mov si,0
mov ax,[si]
mov bx,[1000h]
Then ax gets its value from the segment pointed by DS, with the offset specified by si, and bx gets its value from the segment pointed by DS with an offset of 1000h. If you want to use another segment, you must do that explicitly:
mov di,0
mov ax,[fs:di]
mov bx,[es:1000h]
In string instructions (such as MOVSB or STOSB), the destination pointed to by DI always uses ES. When BP is used as a base, SS is the default segment. In all other cases, DS is the default segment. Note that not every register can be used as an index in real mode; for example, mov ax,[dx] is not valid in real mode.
ES, FS and GS are general-purpose auxiliary segment registers. SS holds the value of the stack segment.
32-bit registers
32 bit registers are available in all of the modes (real,protected and long).
- EAX
- EBX
- ECX
- EDX
- ESI
- EDI
- EBP
- ESP
- EIP
- 32-bit Flag Register
Each of them is an extension of its respective 16-bit register. For example, after the following instructions EAX holds 0FFFF0001h:
mov eax,0
mov ax,1
or eax,0FFFF0000h
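To see how the 16-bit and 8-bit names are just views into the same 32-bit register, here is a short Python sketch of the example above. The helper names (`set_ax`, `get_ah`, `get_al`) are made up for this illustration.

```python
# Sketch: model EAX as one 32-bit integer; AX, AH and AL are views into it.
MASK32 = 0xFFFFFFFF

def set_ax(eax, value):
    """Write the low 16 bits (AX); the upper 16 bits of EAX stay untouched."""
    return (eax & 0xFFFF0000) | (value & 0xFFFF)

def get_ah(eax):
    """Bits 8-15 of EAX."""
    return (eax >> 8) & 0xFF

def get_al(eax):
    """Bits 0-7 of EAX."""
    return eax & 0xFF

def example():
    eax = 0                             # mov eax,0
    eax = set_ax(eax, 1)                # mov ax,1
    eax = (eax | 0xFFFF0000) & MASK32   # or eax,0FFFF0000h
    return eax
```

Note that there is no name for the upper 16 bits of EAX on their own; they can only be reached through the full 32-bit register.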
The 32-bit registers are usable in real mode, but indexes (EDI and ESI) cannot be used unless their upper 16 bits are zero (that is, you can use a max index of 65535 only).
64-bit registers
64 bit registers are only available when the processor is in 64-bit mode.
- RAX
- RBX
- RCX
- RDX
- RSI
- RDI
- RBP
- RSP
- RIP
- 64-bit Flag Register
In addition, x64 defines 8 more 64-bit registers (r8,r9,r10,r11,r12,r13,r14,r15) to be used as auxiliary registers, and some 128-bit registers to be used when programming multimedia.
Control Registers
These registers hold information about the current state of the cpu.
- CR0 is mostly used to set the CPU to protected mode (bit 0) and to enable paging (bit 31).
- CR1 is reserved.
- CR2 holds the Page Fault Linear Address when a page fault exception is triggered.
- CR3 holds the address of the paging table.
- CR4 defines some other flags, like Physical Address Extension and the VM86 mode extensions.
For more on these, see Control Registers in Wikipedia.
Debug Registers
These registers hold information for hardware debugging. DR0-DR3 hold the linear address of 4 breakpoints, and DR6-DR7 some flags to use them. See Debug Registers for more.
Test Registers
These registers held information for CPU testing (they have since been removed from the register set). See Test Registers for more.
Real Mode
Addressing and segmentation
In real mode, everything is 16 bits. Memory is not accessed with an absolute index from 0; it is divided into segments. The segment value, multiplied by 16, gives the absolute address at which the segment starts. To this, an offset value is added to refer to a distance from the start of the segment. Together, these two values (segment:offset) tell the CPU the absolute memory address we need to access. For example:
- 0000h : 0000h : Indicates segment 0, offset 0. 0*16 + 0 = 0 actual memory address
- 0100h : 000Fh : Indicates segment 100h, offset 0Fh. 100h*16 + 0Fh = 100Fh = actual memory address
- 0002h : 0000h : 2h*16 + 0 = 20h actual memory address
- 0001h : 0010h : 1h*16 + 10h = 20h actual memory address. As you can see, memory addresses can overlap
Because the segment and the offset are only 16-bit values, the memory addressable by this method ends just below 0ffffh:0010h = 1MB. Specifying the 0ffffh segment and an offset of 0010h or larger results in wrapping to the start of memory (see protected mode, A20 line). And because the area from 0a000h:0000h upward is reserved for the system (screen, etc), only 640KB remains for DOS applications.
In addition, all segments have read/write/execute access from anywhere (that is, any program can read, write or execute code within any segment). Because this is how the CPU sees memory under a 16-bit real mode OS, any application can read from or write to any part of memory, including the part in which the OS resides. That is why a real mode OS is a single tasking OS.
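The segment:offset arithmetic from the examples above fits in one line of code. The following Python sketch (the function name `physical` is my own) reproduces it and shows that two different pairs can name the same byte.

```python
def physical(segment, offset):
    """Real mode address: the 16-bit segment times 16, plus the 16-bit offset."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


# The examples from the list above
examples = {
    (0x0000, 0x0000): physical(0x0000, 0x0000),
    (0x0100, 0x000F): physical(0x0100, 0x000F),
    (0x0002, 0x0000): physical(0x0002, 0x0000),
    (0x0001, 0x0010): physical(0x0001, 0x0010),
}
```

The last two entries both evaluate to 20h, which is the overlap mentioned above, and `physical(0xFFFF, 0x0010)` is exactly 100000h, the 1MB mark.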
In real mode, CS:IP holds the current execution point, DS holds the default data segment, and SS holds the stack segment. Any application that has more than 64k of a code or data segment must break it into multiple segments.
Interrupts
Interrupts are simply special functions that are called when something happens (a hardware interrupt or an exception, like a division by zero), or when called by software (by using the INT instruction - software interrupt). In real mode, there are 256 interrupts. The table that holds the segment:offset for each interrupt is initially located at absolute address 0, but (in 286+) it may be put elsewhere by using the LIDT instruction (use SIDT to get the table address).
In real mode, the OS provides features to the application via software interrupts, for example, DOS provides a range of functions in int 21h.
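As a sketch of how the real mode table is laid out: each of the 256 entries is 4 bytes, the handler offset first and the handler segment second. The Python below works on a plain byte buffer standing in for the first 1024 bytes of memory; the function names and the sample handler address are illustrative assumptions.

```python
import struct

IVT_ENTRIES = 256
IVT_ENTRY_SIZE = 4   # 2 bytes offset + 2 bytes segment


def ivt_entry_address(vector):
    """Absolute address of the entry for a vector, with the table at address 0."""
    return vector * IVT_ENTRY_SIZE


def set_vector(memory, vector, segment, offset):
    """Store segment:offset for a vector (offset word first, then segment word)."""
    struct.pack_into("<HH", memory, ivt_entry_address(vector), offset, segment)


def get_vector(memory, vector):
    """Read back (segment, offset) for a vector."""
    offset, segment = struct.unpack_from("<HH", memory, ivt_entry_address(vector))
    return segment, offset


memory = bytearray(IVT_ENTRIES * IVT_ENTRY_SIZE)
set_vector(memory, 0x21, 0x5678, 0x1234)   # hypothetical handler at 5678h:1234h
```

This is what DOS function 25h does for you when you install a handler: it rewrites one 4-byte slot in this table.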
Program Execution
The program gets loaded by DOS into a memory segment , and execution starts at the offset that is specified in the EXE's header (or at 0100h if it is a COM file which has no header). After that, the application is free to do anything, to completely trash the memory. This is a real mode "feature": an application owns the entire machine. In addition, the application is allowed to communicate directly with any hardware (via in/out opcodes) , thus bypassing any limitations or security restrictions the OS might have. And if the application crashes, the entire system crashes and you have to reboot.
Code
Here is an easy "Hello World" sample for 16-bit EXE that uses multiple segments
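FORMAT MZ
ENTRY CODE16:Main
STACK STACK16:stackdata
SEGMENT CODE16_2 USE16
ShowMsg:
; Point DS at the data segment, then call DOS function 09h
mov ax,DATA16
mov ds,ax
mov ax,0900h
mov dx,Msg
int 21h
retf
SEGMENT CODE16 USE16
ORG 0
Main:
; Direct far call into the other code segment
call CODE16_2:ShowMsg
; DOS function 4Ch: terminate the program
mov ax,4c00h
int 21h
SEGMENT DATA16 USE16
Msg db "Hello World!$"
SEGMENT STACK16 USE16
dw 1024 dup(0)
; The initial SP points to the end of the reserved area, because the stack grows downward
stackdata: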
How does the assembler know the actual value of the "data16", "code16", "code16_2" and "stack16" segments? It doesn't. It writes placeholder values and creates entries in the EXE file (known as "relocations") so that the loader, once it has copied the code to memory, writes the true segment values at the specified addresses. And because this relocation table lives in the EXE header, which COM files do not have, COM files cannot have multiple segments even if they sum to less than 64KB in total.
This program calls a function "ShowMsg" in another segment via a far call, which uses a DOS function (09h, int 21h) to display text. However, it could just as well write directly into the video buffer (which in color text mode resides in segment 0b800h, or 0b000h on monochrome adapters), bypassing the OS and any security that function 09h might implement. Therefore, multitasking is not possible, because each application can easily write anywhere and destroy the data of another application or of the OS.
Here is an easy "Hello World" sample for 16-bit COM.
org 100h
use16
mov ax,0900h
mov dx,Msg
int 21h
mov ax,4c00h
int 21h
Msg db "Hello World!$"
What are the differences here? Everything (code, data, stack) must reside in one segment. Code must start at offset 100h (DOS uses the first 100h bytes for its own information), and no separate stack or data segment can be defined. COM files are plain memory images and are limited to 64KB. For that reason, COM files are rarely used.
Generally, a DOS program consists of some code segments, some data segments and a stack segment like above. A DOS program calls DOS and BIOS functions (through Interrupts) and accomplishes its task.
Programming models
Because segments are limited to 64KB, there are many programming models depending on the applications' requirements:
- Tiny, when everything has to fit in one single segment (COM files)
- Small , when there is one code segment and one data segment. Calls and jumps are near.
- Medium, when there is one data segment but more code segments. Calls and jumps are far.
- Compact, when there is one code segment but more data segments. Calls and jumps are near.
- Large, when there are more code and data segments. Calls and jumps are far.
- Huge, when the data structures exceed 64KB in size and thus have to be split into segments programmatically.
The most common models are the Small and the Large.
Protected Mode
Segments
In 32-bit protected mode (we are not discussing 16-bit protected mode here because it is very rare), a segment can have any size, from 1 byte to 4GB. The OS defines the size of each segment, and each segment can now have access restrictions (read, write, execute on or off). This allows the OS to "protect" the memory. In addition, there are 4 privilege levels (0 to 3, 0 = highest), so, for example, a user application running at level 3 cannot touch the OS, which runs at level 0.
In addition, if a 32-bit protected mode task crashes, the OS catches the exception and terminates the program safely, without crashing any other application or the OS itself. This way, true multitasking can occur.
Multitasking
Many people believe that multitasking is the art of running applications at the same time. This is not true, for one CPU core can only execute one instruction stream at a time. What is really happening is that the OS permits Task #1 to run for X time, switches to Task #2, permits it to run for X time, switches to Task #3, and so on. This happens so fast that the tasks appear to run simultaneously.
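The time slicing described above can be sketched as a round-robin loop. This Python model is a deliberate simplification (a real OS switches tasks on a timer interrupt and saves CPU state); the function name and the task names are assumptions for illustration.

```python
from collections import deque


def round_robin(tasks, time_slice):
    """Run tasks in turn, each for at most `time_slice` units.

    `tasks` maps a task name to the time it still needs.
    Returns the list of (name, time_run) slices in execution order.
    """
    queue = deque(tasks.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        ran = min(time_slice, remaining)
        timeline.append((name, ran))
        if remaining > ran:
            # Not finished: go to the back of the queue and wait for the next turn
            queue.append((name, remaining - ran))
    return timeline
```

For example, `round_robin({"Task1": 3, "Task2": 1}, 2)` runs Task1 for 2 units, then Task2 for 1, then Task1 for its last unit. Only one task runs at any moment, yet both make progress.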
A-20 line
Enabling the A20 line is the first step to reaching memory above 1MB. This trick (available in 286+) is also the way to gain 0FFF0h bytes of RAM (the range 0ffffh:0010h through 0ffffh:0ffffh) that are accessible in real mode. Enabling the line (via the keyboard controller) stops addresses from wrapping at 1MB. This memory (known as the High Memory Area, HMA) is used by HIMEM.SYS to load parts of DOS into it and therefore make more low memory available for applications.
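The wrapping and the size of the HMA can be checked with a few lines of arithmetic. This Python sketch models the A20 line as a simple mask on bit 20 of the address; the function and constant names are my own.

```python
def linear_address(segment, offset, a20_enabled):
    """Real mode address; with A20 disabled, bit 20 is forced to 0 (wrap at 1MB)."""
    address = (segment << 4) + offset
    return address if a20_enabled else address & 0xFFFFF


# First and last byte reachable through segment 0ffffh with A20 enabled
HMA_FIRST = linear_address(0xFFFF, 0x0010, a20_enabled=True)
HMA_LAST = linear_address(0xFFFF, 0xFFFF, a20_enabled=True)
HMA_SIZE = HMA_LAST - HMA_FIRST + 1
```

With A20 disabled, `linear_address(0xFFFF, 0x0010, False)` comes out as 0: the access lands at the very start of memory instead of at the 1MB mark.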
The following code enables A20 (writing 0ddh instead of 0dfh would disable it). Note that if HIMEM.SYS is installed, A20 is enabled by default, and HIMEM.SYS should be asked to change the A20 status instead of doing it directly.
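WaitKBC:
; Wait until the keyboard controller input buffer is empty (status bit 1 clear), with a timeout
mov cx,0ffffh
A20L:
in al,64h
test al,2
loopnz A20L
ret
ChangeA20:
call WaitKBC
; Command 0d1h: write to the controller output port
mov al,0d1h
out 64h,al
call WaitKBC
; 0dfh sets the A20 gate bit
mov al,0dfh
out 60h,al
ret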
The following code checks A20 and returns 1 to CF if it is enabled, 0 otherwise.
CheckA20:
PUSH ax
PUSH ds
PUSH es
XOR ax,ax
MOV ds,ax
NOT ax
MOV es,ax
MOV ah,[ds:0]
CMP ah,[es:10h]
JNZ A20_ON
CLI
INC ah
MOV [ds:0],ah
CMP [es:10h],ah
PUSHF
DEC ah
MOV [ds:0],ah
STI
POPF
JNZ A20_ON
CLC
POP es
POP ds
POP ax
RET
A20_ON:
STC
POP es
POP ds
POP ax
RET
Global Descriptor Table Type 1 : Application Entries
The Global Descriptor Table is a table that contains all the globally visible segments. Each segment has properties like:
- Size
- Base address (physical address in memory)
- Access restrictions
For Protected Mode, the system maintains the GDTR register (accessible via SGDT / LGDT), which contains 6 bytes of data:
- 2 bytes - size of the entire array in bytes, minus 1. Because each GDT entry is an 8-byte entry, a maximum of 8192 entries may be specified.
- 4 bytes - physical address of the GDT array in memory.
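The 6-byte value that LGDT reads can be sketched as follows. This Python snippet only builds the bytes; the function name and the sample base address are assumptions for illustration. The 16-bit field is treated as a limit, that is, the offset of the last valid byte of the table.

```python
import struct

DESCRIPTOR_SIZE = 8   # every GDT entry is 8 bytes


def pack_gdtr(entry_count, base):
    """Build the 6-byte GDTR image: 16-bit limit, then 32-bit base, little-endian."""
    limit = entry_count * DESCRIPTOR_SIZE - 1
    return struct.pack("<HI", limit, base)


# A 16-bit limit can describe at most 10000h bytes, hence 8192 entries
MAX_ENTRIES = 0x10000 // DESCRIPTOR_SIZE
```

For a table with 7 entries at the hypothetical address 12345h, `pack_gdtr(7, 0x12345)` yields the limit 37h followed by the base.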
There are 2 types for a GDT entry. An entry for the application (S flag == 1, see below), and an entry for the OS (S flag == 0).
The definition for a GDT for an application entry as a C++ structure is this:
struct GDT_STR
{
unsigned short seg_length_0_15;
unsigned short base0_15;
unsigned char base16_23;
unsigned char flags;
unsigned char seg_length_16_19:4;
unsigned char access:4;
unsigned char base24_31;
};
Although this seems easy, it's more complicated than you might think. Let's examine the fields.
- seg_length
- A 20-bit value describing the length of the segment. If the G flag (see below) is not set, this value represents the actual segment length in bytes. If the G flag is set, this value is multiplied by 4096 to give the segment length. So if you set it to FFFFFh (20 bits) and G is set, it is 100000h * 4096 = 4GB.
- base
- A 32-bit value indicating the start of the segment in physical memory.
- flags
- Flags for the segment
- Bit 3: Type. Set to 1 for a code segment, 0 for a data segment.
- Bit 2: Subtype.
- For Code Segment (Bit 3 == 1)
- 0 - Not conforming.
- 1 - Conforming. A conforming segment can be called from any segment that has equal or lower privilege, and it runs at the privilege level of the caller. So if a conforming segment has privilege level 0, you can call it from a privilege level 1, 2 or 3 segment. If the segment is not conforming, then it can only be called directly from a segment with the same privilege level.
- For Data Segment (Bit 3 == 0)
- 0 - Expand up. The segment starts at its base address and ends at its limit.
- 1 - Expand down. The segment starts at its limit and ends at its base, with the addresses going the reverse way. This flag was created so that a stack segment could be easily expanded, but it is not used by today's OSes.
- Bit 1: Accessibility
- For Code Segment (Bit 3 == 1)
- 0 - Not readable. Any code that tries to read memory from this segment will generate an exception.
- 1 - Readable.
Note that a code segment is not writable. However, because segment base addresses can overlap, you can create a writable data segment with the same base address and limit as a code segment.
- For Data Segment (Bit 3 == 0)
- 0 - Not writable. Any code that tries to write to this data segment will generate an exception. Data segments are always readable.
- 1 - Writable.
- Bit 0: Access
- 0 - Segment is not accessed.
- 1 - Segment is accessed. The CPU sets this bit when the segment is accessed, so the OS gets an idea of how frequently the segment is used and knows whether it can swap it out to disk or not.
- Bit 4: S
- 0 - This descriptor is for the OS.
- 1 - This descriptor is for the application.
- Bit 5-6 : DPL
- The privilege level of this segment, from 00b (0, highest) to 11b (3, lowest)
- Bit 7 : P
- Set to 1 to indicate that the segment is present in memory. If the OS swaps this segment out to disk, it sets P to 0. Any attempt to access the removed segment causes an exception. The OS catches this exception and reloads the segment from disk, setting P to 1 again.
- access
- Bit 0: AVL
- Bit 1: L
- Set to 0 for 32-bit segments. If set to 1, it indicates 64-bit segments used in long mode.
- Bit 2: D
- When D is not set, the default for opcodes is 16-bit. The segment can still execute 32-bit commands by putting the 0x66 or 0x67 prefix to them.
- When D is set, the default for opcodes is 32-bit. The segment can still execute 16-bit commands by putting the 0x66 or 0x67 prefix to them.
Real mode segments are always 16-bit default.
- Bit 3: G
- Set to 1 to multiply the seg_length by 4096 to find the true segment length as discussed above.
As you saw, the segment might not be present in memory at all, which allows the OS to cache the segment to the disk and reload it only when it is needed.
The first entry in the GDT table is always 0. CPU does not read information from the entry #0 and thus it is considered the "dummy" entry. This allows the programmer to put the 0 value to a segment register (DS,ES,FS,GS) without causing an exception.
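To make the field layout concrete, here is a Python sketch that packs the fields described above into the 8 bytes of a descriptor and computes the resulting segment size. The function names are my own, and the size calculation assumes the 20-bit value is a limit (the last valid offset), so the size is one more than the limit.

```python
import struct


def make_descriptor(base, limit, flags, access):
    """Pack an application GDT entry.

    base   - 32-bit start address of the segment
    limit  - 20-bit seg_length value
    flags  - the 8-bit flags byte (P, DPL, S, type bits)
    access - the 4-bit access field (G, D, L, AVL)
    """
    return struct.pack(
        "<HHBBBB",
        limit & 0xFFFF,                               # seg_length_0_15
        base & 0xFFFF,                                # base0_15
        (base >> 16) & 0xFF,                          # base16_23
        flags & 0xFF,                                 # flags
        ((access & 0xF) << 4) | ((limit >> 16) & 0xF),  # access + seg_length_16_19
        (base >> 24) & 0xFF,                          # base24_31
    )


def segment_size(limit, g_flag):
    """Size in bytes; with G set the unit is 4096 bytes instead of 1 byte."""
    return (limit + 1) * 4096 if g_flag else limit + 1
```

`make_descriptor(0, 0xFFFFF, 0x9A, 0xC)` produces a flat 4GB 32-bit code segment: limit FFFFh, base 0, flags 9Ah, and a combined byte of 0CFh (G and D set, upper limit bits Fh).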
The following code creates some GDT entries, then loads them:
struc GDT_STR s0_15,b0_15,b16_23,flags,access,b24_31 {
.s0_15 dw s0_15
.b0_15 dw b0_15
.b16_23 db b16_23
.flags db flags
.access db access
.b24_31 db b24_31
}
gdt_start dw gdt_size - 1
gdt_ptr dd 0
dummy_descriptor GDT_STR 0,0,0,0,0,0
code32_descriptor GDT_STR 0ffffh,0,0,9ah,0cfh,0
data32_descriptor GDT_STR 0ffffh,0,0,92h,0cfh,0
stack32_descriptor GDT_STR 0ffffh,0,0,92h,0cfh,0
code16_descriptor GDT_STR 0ffffh,0,0,9ah,0,0
data16_descriptor GDT_STR 0ffffh,0,0,92h,0,0
stack16_descriptor GDT_STR 0ffffh,0,0,92h,0,0
gdt_size = $-(dummy_descriptor)
; Set the base of the 32-bit code descriptor to the linear address of CODE32
xor eax,eax
mov ax,CODE32
shl eax,4
mov [ds:code32_descriptor.b0_15],ax
shr eax,16
mov [ds:code32_descriptor.b16_23],al
mov [ds:code32_descriptor.b24_31],ah
; Store the linear address of the table in the pointer, then load GDTR
xor eax,eax
mov ax,ds
shl eax,4
add eax,dummy_descriptor
mov [gdt_ptr],eax
mov bx,gdt_start
lgdt [bx]
Note that you have to create entries for your current real mode segments if you want to access the data at them.
Selectors
In real mode, the segment registers (CS,DS,ES,SS,FS,GS) specify a real mode segment. And you can put anything to them, no matter where it points. And you can read and write and execute from that segment. In protected mode, these registers are loaded with selectors.
Selector
- Bit 0 - 1 : RPL
- Requested Privilege Level. It must be equally or less privileged than the segment's DPL
- Bit 2 : TI
- If this bit is set to 1, the selector selects an entry from the LDT instead of the GDT (See below for LDT)
- Bits 3 - 15:
- Zero based index to the table (GDT or LDT)
So, to load ES with the code32 segment (index 1 in the GDT, RPL 0) we would do
mov ax,0008h
mov es,ax
In protected mode, you can't just put random values into the segment registers like in real mode. You must load valid selectors or you will get an exception.
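The selector layout above translates directly into shifts and masks. A small Python sketch, with function names chosen for this illustration:

```python
def make_selector(index, ti=0, rpl=0):
    """Build a selector: index in bits 3-15, TI in bit 2, RPL in bits 0-1."""
    return ((index & 0x1FFF) << 3) | ((ti & 1) << 2) | (rpl & 3)


def parse_selector(selector):
    """Split a selector back into (index, ti, rpl)."""
    return selector >> 3, (selector >> 2) & 1, selector & 3
```

`make_selector(1)` gives 0008h, the value used above for the second GDT entry. Because the index starts at bit 3, a selector with TI and RPL both zero is simply the byte offset of the entry inside the table.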
Interrupts
The OS uses the LIDT instruction to load the interrupt table. The IDTR contains 6 bytes of data: 2 for the length of the table and 4 for its physical address in memory.
Each entry in it is now 8 bytes, describing the location of the interrupt handler.
struc IDT_STR ofs0_15,sel,zero,flags,ofs16_31 {
.ofs0_15 dw ofs0_15
.sel dw sel
.zero db zero
.flags db flags
.ofs16_31 dw ofs16_31
}
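The same entry layout can be sketched in Python to show how a 32-bit handler offset is split across the first and last fields. The function name and the sample handler address are assumptions; the flags value 8Eh (present, privilege level 0, 32-bit interrupt gate) is a commonly used value, not something this article's code prescribes.

```python
import struct


def make_idt_entry(handler_offset, selector, flags):
    """Pack one 8-byte IDT entry in the field order of IDT_STR."""
    return struct.pack(
        "<HHBBH",
        handler_offset & 0xFFFF,           # ofs0_15
        selector & 0xFFFF,                 # sel: code segment selector of the handler
        0,                                 # zero: must be 0
        flags & 0xFF,                      # flags: P, DPL, gate type
        (handler_offset >> 16) & 0xFFFF,   # ofs16_31
    )


entry = make_idt_entry(0x12345678, 0x0008, 0x8E)   # hypothetical handler address
```

Note that the handler address is an offset inside the code segment named by the selector, not a segment:offset pair as in real mode.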
Let's see some code to define only 1 interrupt:
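SEGMENT CODE32 USE32
intr00:
; Minimal handler: return immediately
IRETD
...
SEGMENT DATA16 USE16
idt_PM_start dw idt_size - 1
idt_PM_ptr dd 0
interruptsall:
interrupt0:
; Offset bits 0-15, selector, zero byte, flags, offset bits 16-31
dw 0
dw 0008h
db 0
db 8Eh
dw 0
idt_size=$-(interruptsall)
...
SEGMENT CODE16 USE16
mov ax,DATA16
mov ds,ax
; Store the handler offset (relative to the base of the code32 segment) in the entry
mov eax,intr00
mov [interrupt0],ax
shr eax,16
mov [interrupt0 + 6],ax
; Store the linear address of the table in the pointer
xor eax,eax
mov ax,DATA16
shl eax,4
add eax,interruptsall
mov [idt_PM_ptr],eax
...
mov bx,idt_PM_start
cli
lidt [bx]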
Do not try to debug across the LIDT instruction. Once the IDT has been replaced, a real mode debugger cannot work, so if you try to step into LIDT, you will crash. And no, you cannot call DOS or BIOS interrupts from protected mode. See below on how to use 386SWAT to debug protected mode code.
Preparing for crash
It is very rare that your first protected mode application won't crash. When this happens, the CPU triple-faults and resets. To regain control after the reset, you can register a real mode routine to be executed:
MOV ax,40h
MOV es,ax
MOV di,67h
MOV al,8fh
OUT 70h,al
MOV ax,ShutdownProc
STOSW
MOV ax,cs
STOSW
MOV al,0ah
OUT 71h,al
MOV al,8dh
OUT 70h,al
If the CPU resets, your routine will be executed. That routine must reinitialize all registers and the stack, then exit to DOS.
Entering protected mode
cli
mov eax,cr0
or eax,1
mov cr0,eax
After that you must execute a far jump to a protected mode code segment. If this code segment is a 16-bit code segment, you must do:
db 0eah
dw StartPM
dw 018h
If this code segment is a 32-bit code segment, you must do:
db 66h
db 0eah
dd StartPM
dw 08h
Before enabling interrupts, you must set up the stack and the other segment registers:
mov ax, data_selector
mov ds,ax
mov ax, stack_selector
mov ss,ax
mov esp,1000h
sti
...
Exiting protected mode
cli
mov eax,cr0
and eax,0fffffffeh
mov cr0,eax
mov ax,data16
mov ds,ax
mov ax,stack16
mov ss,ax
mov sp,1000h
mov bx,RealMemoryInterruptTableSavedWithSidt
lidt [bx]
sti
(You can debug here) ...
Unreal mode
Because protected mode code cannot call DOS or BIOS interrupts, it is generally not useful to DOS applications. However, a 'bug' in the 386+ processor turned out to be a feature called unreal mode. Unreal mode is a method to access the entire 4GB of memory from real mode. This trick is undocumented, yet a large number of applications (including HIMEM.SYS) use it.
- Enable A20.
- Enter protected mode.
- Load a segment register (ES or FS or GS) with a 4GB data segment.
- Return to real mode.
As long as the register does not change its value, it still points to a 4GB data segment, so it is possible to use it along with EDI to access the entire address space. After returning from protected mode, you can easily do:
mov edi,1048576
mov byte [fs:edi],0
286 lacks this capability because to exit protected mode the CPU has to be reset, so all registers are destroyed.
The following routine puts your CPU in unreal mode and gives DS and ES a 4GB limit.
SetUnreal:
jmp UnrealCode
gdt:
; Null descriptor
dw 0
dw 0
db 0
db 0
db 0
db 0
DATA_SEL = $-gdt
; Writable data segment, base 0, 4GB limit
dw 0FFFFh
dw 0
db 0
db 92h
db 0CFh
db 0
gdt_end:
gdt_ptr:
dw gdt_end - gdt - 1
dd 0
UnrealCode:
pushad
push ds
push es
mov ax,code16
mov ds,ax
; Compute the linear address of the GDT and store it in the pointer
xor eax,eax
mov ax,ds
shl eax,4
add eax,gdt
mov [gdt_ptr + 2],eax
cli
mov bx,gdt_ptr
lgdt [bx]
; Enter protected mode, load the 4GB selector, then leave again
mov eax,cr0
or al,1
mov cr0,eax
mov bx,DATA_SEL
mov ds,bx
mov es,bx
dec al
mov cr0,eax
pop es
pop ds
popad
sti
ret
286 lacks this very useful mode because it can't exit protected mode without all the registers destroyed.
Global Descriptor Table Type 2 : OS Entries
When the S flag is set to 0, the meaning of a GDT entry is quite different.
flags - Flags for the segment
- Bits 3 2 1 0 : Type of the entry
- 0000 - Reserved
- 0001 - Available 16-bit TSS
- 0010 - Local Descriptor Table (LDT)
- 0011 - Busy 16-bit TSS
- 0100 - 16-bit Call Gate
- 0101 - Task Gate
- 0110 - 16-bit Interrupt Gate
- 0111 - 16-bit Trap Gate
- 1000 - Reserved
- 1001 - Available 32-bit TSS
- 1010 - Reserved
- 1011 - Busy 32-bit TSS
- 1100 - 32-bit Call Gate
- 1101 - Reserved
- 1110 - 32-bit Interrupt Gate
- 1111 - 32-bit Trap Gate
More on gates later in this article.
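As a sketch of how the flags byte of any descriptor can be read, the Python below splits it into P, DPL, S and the 4-bit type, and looks the type up in the table above when S is 0. The names `SYSTEM_TYPES`, `decode_flags` and `describe` are my own.

```python
SYSTEM_TYPES = {
    0x1: "Available 16-bit TSS",
    0x2: "Local Descriptor Table (LDT)",
    0x3: "Busy 16-bit TSS",
    0x4: "16-bit Call Gate",
    0x5: "Task Gate",
    0x6: "16-bit Interrupt Gate",
    0x7: "16-bit Trap Gate",
    0x9: "Available 32-bit TSS",
    0xB: "Busy 32-bit TSS",
    0xC: "32-bit Call Gate",
    0xE: "32-bit Interrupt Gate",
    0xF: "32-bit Trap Gate",
}


def decode_flags(flags):
    """Split the flags byte into its fields."""
    return {
        "present": bool(flags & 0x80),   # bit 7: P
        "dpl": (flags >> 5) & 3,         # bits 5-6: DPL
        "system": not (flags & 0x10),    # bit 4: S == 0 means an OS entry
        "type": flags & 0x0F,            # bits 0-3
    }


def describe(flags):
    """Name an OS entry; application entries and reserved types are reported as such."""
    fields = decode_flags(flags)
    if not fields["system"]:
        return "Application segment"
    return SYSTEM_TYPES.get(fields["type"], "Reserved")
```

For instance, `describe(0x8E)` is a present 32-bit interrupt gate, while `describe(0x9A)` reports an application segment because its S bit is set.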
Local Descriptor Table
Local Descriptor Table (LDT) is a method for each application to have a private set of segments, loaded with the LLDT assembly instruction. The TI bit in the selector specifies whether the segment is taken from the GDT or from the LDT. Although originally helpful, the LDT is not used in modern OSes because of paging.
Paging
There are a number of problems that occur in a multitasking OS when the above setups are used:
- A task has to be loaded in memory entirely
- DOS applications think that they always access the lowest MB of ram, so they can't be put outside it.
- An application must manage its own segments, which must differ from those of other applications, making the use of dynamic link libraries costly.
Paging is the method to redirect an address to another address. The address that the application uses is called the "linear address" and the actual address is the "physical address".
There are several ways to set up paging, which we need not discuss here. What you have to do is create 2 tables in memory: the Page Directory, an array of pointers to page tables, and the Page Table, an array of pointers to physical memory. My code has a sample that creates a 64-bit paging table.
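To illustrate the two-level lookup, here is a Python sketch that assumes the classic 32-bit scheme with 4KB pages (10 bits of directory index, 10 bits of table index, 12 bits of offset). That choice, the use of dictionaries in place of in-memory tables, and the frame addresses in the usage example are all assumptions for illustration. Real entries also carry flag bits, which are left out here.

```python
def split_linear(address):
    """Split a 32-bit linear address into (directory index, table index, offset)."""
    return (address >> 22) & 0x3FF, (address >> 12) & 0x3FF, address & 0xFFF


def translate(page_directory, address):
    """Walk directory -> table -> frame and add the offset inside the page."""
    dir_index, table_index, offset = split_linear(address)
    page_table = page_directory[dir_index]   # directory entry selects a page table
    frame = page_table[table_index]          # table entry gives the physical frame
    return frame + offset


# Two tasks use the same linear address but are mapped to different frames
task_a = {1: {0x147: 0x00200000}}
task_b = {1: {0x147: 0x009A3000}}
```

`translate(task_a, 0x00547D45)` and `translate(task_b, 0x00547D45)` return different physical addresses for the same linear address, which is exactly the redirection described above.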
Physical Address Extension (PAE)
PAE is the ability of x86 to use 36 address bits instead of 32. This increases the available memory from 4GB to 64GB. The 32-bit applications still see only a 4GB address space, but the OS can map (via paging) memory from the high area to the lower 4GB address space. This extension was added to the x86 to cope with the (nowadays not enough) limit of 4GB, before 64-bit software came to the foreground.
Flat Mode
Initially, the Local Descriptor Table was used so each application could have a local array of segments. But because of paging, modern 32-bit OSes now use the "flat" mode. This way the applications receive the entire 4GB address space to hold code,data and stack, but this portion of the address space is mapped into different physical memory. So the applications can use same memory addresses which are mapped to different physical addresses.
For example, see these 2 C++ programs running under 32-bit Windows:
// Program 1
int main()
{
// CS:EIP at this point is (say) 010Ch : 00004000h.
int flags = MB_OK;
const char* msg = "Hello there";
const char* title = "Title";
MessageBox(0,msg,title,flags); // Address of MessageBox is (say) 00547D45h
}
// Program 2
int main()
{
// CS:EIP at this point is the same as in the previous program. However, paging actually
// maps it to a different physical address, so these two programs do not interfere with each other's memory.
// This is transparent to the application.
int flags = MB_OK;
const char* msg = "Hello there";
const char* title = "Title";
MessageBox(0,msg,title,flags); // Address of MessageBox is (say) 00547D45h, and this value is mapped to the same
// memory as in the previous application, so the shared function "MessageBox" exists only once in physical memory.
}
This allows the application programmer to never consider segmentation. All pointers are near, there are no segments to manage (all segment registers describe the same address space) and thus creating applications is easier. There is no such thing as a "small/large" model, because everything is within the same segment.
Because of its simplicity, the "flat mode" is now the mode used by most common 32-bit OSes, and it is also the only one that exists in 64-bit mode.
VM86 mode
So far, so good with protected mode, but many existing applications were real-mode applications at that time. Even today, many of them (mostly games) are played under Windows. To force these applications (which think they own the machine) to cooperate, a special mode had to be created.
VM86 mode is enabled by a flag in the EFLAGS register (VM, bit 17). It gives the program a normal 16-bit DOS memory map of 640KB, which is of course forwarded via paging to the actual memory. This makes it possible to run multiple DOS applications at the same time without risking that one application overwrites another. EMM386.EXE, the well-known old memory manager, puts the processor in that state. The OS watches the process step by step, making sure that it won't execute something illegal (so don't expect to enter protected mode when EMM386.EXE is loaded, because once you try to set the GDT with LGDT you will be sacked :).
Once the VM flag is set, you can load a normal real-mode "segment" value into a segment register. Interrupt calls made by DOS applications are caught by the OS and emulated by it, if possible. Also, some instructions are not executed literally. For example, if you do a CLI, the interrupts are not actually disabled. The OS sees that you prefer not to be interrupted and acts accordingly, but interrupts are still there.
All VM86 code executes at PL 3, the lowest privilege level. INs/OUTs to ports are also captured and emulated if possible. The interesting thing about VM86 is that there are 2 interrupt tables, one for real mode and one for protected mode, but only the protected mode interrupt handlers are executed by the CPU.
VM86 was removed from 64-bit mode, so a 64-bit OS can no longer execute 16-bit DOS code. In order to execute such code, you need an emulator such as DosBox.
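As a small illustration of the description above, the following C sketch tests the VM flag in a saved EFLAGS image and computes the linear address that a segment:offset pair refers to in VM86 mode. The function names are hypothetical and are not part of the article's code. The only architectural facts assumed are that VM is bit 17 of EFLAGS and that, as in real mode, the segment value is shifted left by 4 and added to the offset.

```c
#include <stdint.h>

#define EFLAGS_VM (1u << 17)   /* virtual-8086 mode flag in EFLAGS */

/* Returns nonzero when a saved EFLAGS image has the VM flag set. */
static int eflags_is_vm86(uint32_t eflags)
{
    return (eflags & EFLAGS_VM) != 0;
}

/* Linear address of segment:offset, computed the real-mode way.
   Paging then decides which physical page this linear address hits. */
static uint32_t vm86_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

For example, the text-mode video segment B800h with offset 0 gives linear address B8000h. The OS is free to map that page to a private buffer for each DOS box, which is how several DOS programs can "own" the screen at once.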
HIMEM.SYS
HIMEM is the generic extended memory manager for DOS. At that time, extended memory was mostly, if not exclusively, used to cache data from the disk, especially by big applications. HIMEM puts the CPU in unreal mode and provides a simple interface to applications that want more memory without messing with the protected mode details. By enabling the A20 line, HIMEM also allows a portion of DOS and COMMAND.COM to reside in the high memory area. Because unreal mode is still real mode, your protected mode application can do the stuff we have discussed even if HIMEM.SYS is loaded.
EMM386.EXE
At that time there was a form of memory that has since disappeared: "expanded" memory. Many applications were written to take advantage of it, although the modern standard was protected mode. EMM386 puts the CPU in VM86 mode and uses paging to map memory above 1MB into real mode segments (above 0xA0000), so an application that wants expanded memory can get it through EMM386.EXE. In addition, EMM386 made the "devicehigh" (CONFIG.SYS) and "loadhigh" (AUTOEXEC.BAT) commands possible, allowing drivers and applications to be loaded into these high segments when there is room.
Because VM86 mode is protected mode, your protected mode application cannot do the stuff we have discussed if EMM386 is loaded.
DPMI
The DOS Protected Mode Interface (DPMI) is a system that allows DOS applications to run 32-bit code. Unreal mode was not enough, because it only allows data to be moved but not code to be executed. A DPMI server takes care of the nasty tables we have discussed above, allowing the executable to contain 32-bit code directly. When the executable calls DOS, the DPMI server catches the call, switches to real mode, calls DOS, and then switches back to protected mode.
Debugging Protected Mode
As you have figured out by now, your real mode debugger won't work. However, 386SWAT will. What you have to do is reserve around 50 entries in the GDT for it, then call it (via an interrupt, while still in real mode) to fill these entries. Once you enter protected mode and set up the stack, you can place an int 3 in your code and 386SWAT will pop up.
See my code for further details.
What is missing from 286 protected mode? 32-bit segments (the 286 only has 16-bit segments), VM86 and paging. Also, because the 286 lacks CR0, it uses SMSW/LMSW to read/write the machine status word (MSW) in order to enter protected mode, but LMSW cannot be used to exit from protected mode. Because of this, the processor has to be reset (for example by forcing a triple fault) to return to real mode, which makes unreal mode impossible.
Long Mode
An x64 CPU has 3 operating modes:
- Real mode, same as in DOS.
- Legacy mode, same as 32-bit protected mode.
- Long mode.
Long mode has 2 submodes:
- Compatibility mode, same as 32-bit protected mode. This allows a 64-bit OS to run 32-bit applications.
- 64-bit mode, for 64-bit applications.
To work in Long Mode, the programmer must take into consideration the facts below:
- Unlike protected mode, which could run without paging or PAE, long mode absolutely needs both PAE and paging. That is, you cannot leave paging out even if your map is "see-through". You have to create PAE-style page tables, and "flat" mode is the only valid mode in long mode. There is almost no segmentation.
- The AMD docs say that, in order to enter long mode, you first have to enter protected mode. However, this has proven not to be strictly true: you can get into long mode directly from real mode by enabling protected mode and paging with a single instruction (this works because the control registers are accessible from real mode).
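To make the "see-through" map mentioned above concrete, here is a minimal C sketch that fills three PAE-style tables so that the first 1GB of linear addresses maps to the same physical addresses, using 2MB pages. The structure name, the function name and the phys_base parameter are assumptions made for illustration. The article's own code builds its tables in assembly and may lay them out differently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PG_PRESENT 0x001ull   /* P: entry is valid */
#define PG_WRITE   0x002ull   /* R/W: page is writable */
#define PG_LARGE   0x080ull   /* PS: page directory entry maps a 2MB page */

/* Three 4KB tables stored back to back. */
typedef struct {
    uint64_t pml4[512];   /* top level; its physical address goes into CR3 */
    uint64_t pdpt[512];   /* page directory pointer table */
    uint64_t pd[512];     /* page directory holding 2MB entries */
} PageTables;

/* phys_base is the (assumed) physical address where *t is stored. */
static void build_identity_map(PageTables *t, uint64_t phys_base)
{
    assert((phys_base & 0xFFF) == 0);          /* tables must be 4KB aligned */
    memset(t, 0, sizeof *t);
    t->pml4[0] = (phys_base + 0x1000) | PG_PRESENT | PG_WRITE;  /* -> pdpt */
    t->pdpt[0] = (phys_base + 0x2000) | PG_PRESENT | PG_WRITE;  /* -> pd   */
    for (uint64_t i = 0; i < 512; i++)         /* 512 entries * 2MB = 1GB */
        t->pd[i] = (i << 21) | PG_PRESENT | PG_WRITE | PG_LARGE;
}
```

With tables like these, CR3 would be loaded with phys_base. Using 2MB pages keeps the sketch at three tables; a map built from 4KB pages needs a fourth level of page tables under each page directory entry.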
Global Descriptor Table
- Creating a 64-bit segment
- A segment marked as 64-bit is pretty much the same as a 32-bit segment with a limit of 4GB, but with the L bit set to 1 and the D bit set to 0. D = 0 normally marks a 16-bit segment, but when the L bit is set, it indicates a 64-bit segment.
- 64-bit segments always start at 0 and always end at 0xFFFFFFFFFFFFFFFF.
If your GDT resides in the lower 4GB of memory, you need not change it after entering long mode. However, if you plan to call SGDT or LGDT while in long mode, you must now deal with the 10-byte GDTR, which holds 2 bytes for the length of the GDT and 8 bytes for its linear address.
Any selector you might load into a data segment register to access a 64-bit segment is ignored, and DS, ES and SS are not used at all. End of an era ;)
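The difference between a 32-bit and a 64-bit code segment is easier to see in the raw descriptor bits. The following C sketch packs a standard 8-byte GDT descriptor. The helper name make_descriptor and the FLAG_ macros are hypothetical and are not taken from the article's code; the bit positions are the ordinary x86 descriptor layout.

```c
#include <assert.h>
#include <stdint.h>

/* Upper flag nibble of a segment descriptor (bits 52..55). */
#define FLAG_L 0x2   /* 64-bit code segment */
#define FLAG_D 0x4   /* 32-bit default operand size */
#define FLAG_G 0x8   /* limit is counted in 4KB units */

/* Packs base, 20-bit limit, access byte and flag nibble into 8 bytes. */
static uint64_t make_descriptor(uint32_t base, uint32_t limit,
                                uint8_t access, uint8_t flags)
{
    assert(limit <= 0xFFFFF && flags <= 0xF);
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFF);              /* limit 0..15   */
    d |= (uint64_t)(base & 0xFFFFFF) << 16;       /* base 0..23    */
    d |= (uint64_t)access << 40;                  /* type, DPL, P  */
    d |= (uint64_t)((limit >> 16) & 0xF) << 48;   /* limit 16..19  */
    d |= (uint64_t)flags << 52;                   /* AVL, L, D, G  */
    d |= (uint64_t)(base >> 24) << 56;            /* base 24..31   */
    return d;
}
```

With an access byte of 9Ah (present, ring 0, readable code), make_descriptor(0, 0xFFFFF, 0x9A, FLAG_G | FLAG_D) gives the flat 32-bit code segment 00CF9A000000FFFFh, while replacing FLAG_D with FLAG_L gives the 64-bit code segment 00AF9A000000FFFFh. Only the L and D bits differ.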
Interrupts
You have to rebuild the IDT with 64-bit descriptors.
Each entry in it is now 16 bytes long and describes the location of an interrupt handler in 64-bit mode.
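struc IDT_STR ofs0_15,sel,ist,flags,ofs16_31,ofs32_63
{
.ofs0_15 dw ofs0_15 ; handler offset, bits 0..15
.sel dw sel ; code segment selector
.ist db ist ; bits 0..2: Interrupt Stack Table index, other bits zero
.flags db flags ; P, DPL and gate type (8Eh = present interrupt gate)
.ofs16_31 dw ofs16_31 ; handler offset, bits 16..31
.ofs32_63 dd ofs32_63 ; handler offset, bits 32..63
.zero dd 0 ; reserved, must be zero
}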
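For readers who prefer to see the split of the handler address spelled out, here is a C sketch of the 16-byte long mode gate as the architecture defines it: two offset words, a selector, one byte for the Interrupt Stack Table index, one byte of type and attribute flags, the upper 32 bits of the offset and a reserved dword. The type Gate64 and the function make_gate64 are hypothetical names; the field names mirror the structure above.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t ofs0_15;    /* handler offset, bits 0..15 */
    uint16_t sel;        /* 64-bit code segment selector */
    uint8_t  ist;        /* Interrupt Stack Table index (0 = none) */
    uint8_t  flags;      /* P, DPL, gate type */
    uint16_t ofs16_31;   /* handler offset, bits 16..31 */
    uint32_t ofs32_63;   /* handler offset, bits 32..63 */
    uint32_t zero;       /* reserved */
} Gate64;

_Static_assert(sizeof(Gate64) == 16, "a long mode IDT entry is 16 bytes");

/* Splits a 64-bit handler address across the three offset fields. */
static Gate64 make_gate64(uint64_t handler, uint16_t sel,
                          uint8_t ist, uint8_t flags)
{
    assert(ist <= 7);
    Gate64 g;
    g.ofs0_15  = (uint16_t)(handler & 0xFFFF);
    g.sel      = sel;
    g.ist      = ist;
    g.flags    = flags;
    g.ofs16_31 = (uint16_t)((handler >> 16) & 0xFFFF);
    g.ofs32_63 = (uint32_t)(handler >> 32);
    g.zero     = 0;
    return g;
}
```

A typical call would be make_gate64(handler_address, code64_selector, 0, 0x8E), where 8Eh means "present, DPL 0, interrupt gate".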
Entering long mode
- Enable PAE by setting bit 5 of CR4.
mov eax, cr4
bts eax, 5
mov cr4, eax
- Create the new page tables and load CR3 with their address. Because CR3 is still a 32-bit register before entering long mode, the top-level page table must reside in the lower 4GB.
- Enable long mode (note, this does not enter long mode, it just enables it)
mov ecx, 0c0000080h ; EFER MSR
rdmsr
bts eax, 8 ; LME (long mode enable) bit
wrmsr
- Enable paging. Enabling paging activates and enters long mode.
mov eax, cr0
or eax,80000000h
mov cr0, eax
Because the rdmsr/wrmsr opcodes are also available in real mode, you should be able to activate long mode from real mode directly.
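The three register updates above touch one bit each. The following C sketch names those bits and models the updates as pure functions on register values, which makes the sequence easy to check on paper. The function names are hypothetical; the sketch does not touch real control registers (that requires ring 0 and assembly). The LMA bit is included because the CPU sets it by itself once long mode is really active, so it can be read back from EFER to confirm the switch.

```c
#include <stdint.h>

#define CR4_PAE   (1u << 5)      /* physical address extension */
#define EFER_MSR  0xC0000080u    /* MSR number loaded into ECX for rdmsr/wrmsr */
#define EFER_LME  (1u << 8)      /* long mode enable (set by software) */
#define EFER_LMA  (1u << 10)     /* long mode active (set by the CPU) */
#define CR0_PG    (1u << 31)     /* paging */

/* Step 1: the value to write back to CR4. */
static uint32_t cr4_enable_pae(uint32_t cr4)      { return cr4 | CR4_PAE; }

/* Step 3: the low dword (EAX) to write back to EFER. */
static uint32_t efer_enable_lme(uint32_t efer_lo) { return efer_lo | EFER_LME; }

/* Step 4: the value to write back to CR0; this write activates long mode. */
static uint32_t cr0_enable_paging(uint32_t cr0)   { return cr0 | CR0_PG; }

/* Nonzero when EFER reports that long mode is active. */
static int long_mode_active(uint32_t efer_lo)
{
    return (efer_lo & EFER_LMA) != 0;
}
```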
Entering 64-bit
Now you are in compatibility mode. Enter 64-bit mode by jumping to a 64-bit code segment:
db 0eah
dd LinearAddressOfStart64
dw code64_idx
The initial 64-bit code must reside in the lower 4GB, because compatibility mode cannot see 64-bit addresses.
Note that you must use the linear address, because 64-bit segments always start at 0. Note also that if the current compatibility mode segment is a 16-bit (default) segment, you have to add the 066h prefix.
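The hand-assembled jump above is just seven bytes: the opcode 0EAh, a 32-bit offset and a 16-bit selector, all little-endian. As a sketch of that encoding, the C function below writes those bytes into a buffer and optionally adds the 066h prefix needed when the jump is issued from a 16-bit segment. The function name and its parameters are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes "jmp far selector:offset32" into out (at least 8 bytes)
   and returns the number of bytes written. */
static size_t encode_far_jmp32(uint8_t *out, uint32_t offset,
                               uint16_t selector, int from_16bit_segment)
{
    size_t n = 0;
    if (from_16bit_segment)
        out[n++] = 0x66;                     /* operand size override */
    out[n++] = 0xEA;                         /* far jump, ptr16:32 */
    out[n++] = (uint8_t)(offset & 0xFF);     /* offset, low byte first */
    out[n++] = (uint8_t)((offset >> 8) & 0xFF);
    out[n++] = (uint8_t)((offset >> 16) & 0xFF);
    out[n++] = (uint8_t)((offset >> 24) & 0xFF);
    out[n++] = (uint8_t)(selector & 0xFF);   /* selector, low byte first */
    out[n++] = (uint8_t)(selector >> 8);
    return n;
}
```

The offset passed in corresponds to LinearAddressOfStart64 and the selector to code64_idx in the snippet above.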
The only thing you have to do in 64-bit mode is to reset the RSP:
mov rsp,stack64_end
SS, DS and ES are not used in 64-bit mode. That is, if you want to access data in another segment, you cannot load DS with that segment's selector and access the data; you must specify the linear address of the data. "Flat" mode is not only the default, it is the only one available in 64-bit mode.
Once you are in 64-bit mode, the default operand size of the opcodes (except for jmp/call) is still 32-bit. A REX prefix (0x40 to 0x4F) is therefore required to mark a 64-bit operation; more precisely, one with its W bit set, i.e. 0x48 to 0x4F. Your assembler handles that automatically if it supports a "code64" segment.
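To show what the assembler does behind the scenes, here is a small C sketch that builds a REX prefix from its four bits. The function name rex_prefix is hypothetical. The layout is the architectural one: the high nibble is always 4, followed by W (64-bit operand size), R, X and B (extensions that reach registers R8 to R15).

```c
#include <assert.h>
#include <stdint.h>

/* Builds a REX prefix byte: 0100WRXB. Each argument must be 0 or 1. */
static uint8_t rex_prefix(int w, int r, int x, int b)
{
    assert(((w | r | x | b) & ~1) == 0);
    return (uint8_t)(0x40 | (w << 3) | (r << 2) | (x << 1) | b);
}
```

For example, "mov eax,ebx" assembles to the bytes 89 D8, and "mov rax,rbx" is the same two bytes preceded by rex_prefix(1, 0, 0, 0), that is 48 89 D8.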
In addition, a 64-bit interrupt table must now be set with a new LIDT instruction, this time taking a 10-byte operand (2 bytes for the length and 8 for the location), and each entry in the IDT takes 16 bytes, which hold the selector, the flags and the 8-byte offset of the handler.
Returning to compatibility mode
Because 0eah is not a valid jump opcode in 64-bit mode, you have to use the RETF trick to get back to a compatibility mode segment.
push code32_idx
xor rcx,rcx
mov ecx,Back32
push rcx
retf
This gets you back to compatibility mode. 64-bit OSes keep jumping between 64-bit mode and compatibility mode in order to be able to run both 64-bit and 32-bit applications.
Why do Windows drivers have to be 64-bit on a 64-bit OS? Because no WOW64 exists for driver (ring 0) code. They could have created one if they wanted to. I guess they wanted to force manufacturers to finally move to 64-bit. Nice decision, I must admit.
Exiting from long mode
You have to set up all the segment registers again with 32-bit selectors. Back to segmentation! After that, disable paging and then clear the long mode enable bit.
mov ax,stack32_idx
mov ss,ax
mov esp,stack32_end
mov ax,data32_idx
mov ds,ax
mov es,ax
mov ax,data16_idx
mov gs,ax
mov fs,ax
mov eax, cr0
and eax,7fffffffh ; clear PG: disabling paging leaves long mode
mov cr0, eax
mov ecx, 0c0000080h ; EFER MSR
rdmsr
btr eax, 8 ; clear LME
wrmsr
Unreal mode in 64-bit
I am sorry if the title got your hopes up for a moment. There is no such thing that would allow you to access over 4GB of RAM from real mode (unless AMD has an easter egg in its CPU). In addition, although the 32-bit registers EAX, EBX etc. are available in real mode, the 64-bit registers RAX, RBX etc. are not even available in compatibility mode, only in 64-bit mode.
Virtual 86 mode in 64-bit
Once the CPU enters long mode, V86 is no longer supported. That is the reason why 64-bit OSes cannot run 16-bit applications. However, emulators like DosBox will run your old 16-bit game just fine.
DPMI for 64-bit
In your dreams only. Or rather, in my dreams only. Perhaps I will make one some day. Hey, is there any DOS game out there that needs more than 4GB of RAM?
Debugging Long Mode
Unfortunately, protected mode debuggers do not work in long mode, just like real mode debuggers do not work in protected mode.
However, Bochs with my GUI debugger will work nicely.
The Code
The code presented here exercises everything we have discussed so far. It still has some rough functions, but it works. Have fun with it!
History
- 02/12/2009 - First release.