Anyone who has already read my "infamous trilogy" would want to combine all of it in one nice application. Here is such a combination, along with some new tips and techniques not discussed in the previous articles. It is implemented as a TSR that other apps can call for true multithreading in real, protected or long mode in raw DOS.
Using this code, you can create a DOS app that can:
- Use all your CPUs together
- Lock/Unlock mutexes
- Start threads in real, protected, long and virtualized mode
You need Flat Assembler (FASM) and a FreeDOS installation in a virtualization environment that can expose multiple cores. VMware works for everything up to, but not including, virtualization. DOSBox doesn't work because it doesn't expose ACPI. Bochs works in its special SMP edition for real, protected and long mode with virtualization. VirtualBox support is not yet complete. My GitHub project includes all these setups for your convenience.
- 1024 assembly books
- 4.023 x 10^23 C++ lines written
- 1 << 62 free space in your mind. The upper bits are reserved for the kernel.
- Lots of patience and humor :)
Locking the Mutex
Yes, in Win32 you have the nice Mutex functions. But what about raw DOS?
First, a word about spin loops. When a Win32 thread calls WaitForSingleObject, the kernel checks whether the object is signaled and, if it is not, does not schedule the thread to resume. If there is no thread to be scheduled, the kernel halts the CPU with the HLT instruction until later. In our little program, we own the system and there is no scheduler. So the code will simply spin in a loop until the mutex is available.
Therefore, one would expect code like this:
Not so. The problem is that, when the mutex is released, another CPU might lock the variable before this code does. That is, another CPU might run after the JZ instruction but before the MOV instruction.
Therefore, we have to use some atomic operation to achieve the lock:
LOCK CMPXCHG [shared_val],BL
The magic here is simple. We use the CMPXCHG instruction which, along with the LOCK prefix, atomically tests whether shared_val is still 0xFF (the value in AL) and, if so, writes BL to it and sets ZF. If another CPU has grabbed the mutex, ZF is cleared and BL is not written to shared_val. Most convenient.
Another interesting thing is the PAUSE opcode, a hint to the CPU that we are inside a spin loop. This improves performance because the CPU can avoid the memory-order mis-speculation penalty it would otherwise pay when the loop finally exits, and it also reduces power consumption while spinning.
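The test-then-CMPXCHG pattern described above can be modelled on a normal host with C11 atomics. This is only a sketch, not the driver's code: the type name spin_mutex_t, the helper names and the "locked" value 0xFE are assumptions made for illustration, and only the "free" value 0xFF comes from the text.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MUTEX_FREE   0xFFu  /* "unlocked" value used in the text */
#define MUTEX_LOCKED 0xFEu  /* assumed "locked" value, chosen for illustration */

typedef _Atomic uint8_t spin_mutex_t;  /* hypothetical type name */

/* Spin-loop hint: PAUSE on x86 with GCC/Clang, nothing elsewhere. */
static inline void cpu_relax(void) {
#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
    __builtin_ia32_pause();
#endif
}

static void spin_mutex_init(spin_mutex_t *m) { atomic_store(m, MUTEX_FREE); }

/* One LOCK CMPXCHG-style attempt: succeeds only if the byte is still 0xFF. */
static bool spin_mutex_try_lock(spin_mutex_t *m) {
    uint8_t expected = MUTEX_FREE;  /* plays the role of AL */
    /* MUTEX_LOCKED plays the role of BL */
    return atomic_compare_exchange_strong(m, &expected, MUTEX_LOCKED);
}

static void spin_mutex_lock(spin_mutex_t *m) {
    for (;;) {
        /* Read-only wait while another CPU holds the lock. */
        while (atomic_load(m) != MUTEX_FREE)
            cpu_relax();
        /* Looks free: grab it atomically, retry if another CPU won the race. */
        if (spin_mutex_try_lock(m))
            return;
    }
}

static void spin_mutex_unlock(spin_mutex_t *m) { atomic_store(m, MUTEX_FREE); }
```

The plain read in the inner loop keeps the waiting CPUs from hammering the bus with locked operations; the atomic compare-and-exchange is attempted only when the mutex looks free.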
Waking the CPUs
As we saw in the trilogy, we send the INIT and the SIPI. The CPU must start at a 4096-aligned address, so I've filled an array with NOPs and adjusted the startup address accordingly. The CPU starts in real mode.
Therefore, a "SipiStart" routine would look like this:
db 4096 dup (144)
lidt fword [ds:RealIDT]
call FAR CODE16:EnterUnreal
jmp far [ds:di]
Anyway, to access the APIC, I have to enter unreal mode, so I call EnterUnreal. Note the FAR call: the segment in which EnterUnreal begins is not the same as the CS that is loaded during the SIPI. A newly awoken CPU must also enable the spurious vector and the software APIC, as we have seen earlier. Finally, the code jumps far to the 'startup' address for the CPU, depending on the CPU index.
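The 4096-byte alignment requirement follows from how the SIPI encodes the start address: the vector byte is the physical page number, so the CPU begins at vector * 4096. A small host-side sketch of that arithmetic, assuming the commonly used ICR encodings for INIT and SIPI (the helper names are hypothetical and not taken from the driver):

```c
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Round an address up to the next 4096-byte boundary. With a 4096-byte NOP
   pad in front of the code, such a boundary always falls inside the pad. */
static uint32_t align_up_4k(uint32_t addr) {
    return (addr + PAGE_SIZE - 1u) & ~(PAGE_SIZE - 1u);
}

/* SIPI vector: the target CPU starts at physical address vector * 4096. */
static uint8_t sipi_vector(uint32_t start_addr) {
    return (uint8_t)(start_addr >> 12);
}

/* ICR low dword for INIT: delivery mode 101b, level assert. */
static uint32_t icr_init(void) {
    return 0x00004500u;
}

/* ICR low dword for SIPI: delivery mode 110b, level assert, plus the vector. */
static uint32_t icr_sipi(uint32_t start_addr) {
    return 0x00004600u | sipi_vector(start_addr);
}
```

For example, a startup routine placed at 0x8000 gives vector 8 and an ICR value of 0x4608. Because the vector is a single byte, the start address has to lie below 1MB.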
The APIC provides us a way to send a message to another CPU. Apart from SIPI, which we saw earlier, the local APIC can be used to send a 'normal' interrupt, i.e., merely executing INT XX in the context of the target CPU. We have to take into consideration the following:
- If the CPU is in HLT state, the interrupt awakes it, and when the interrupt returns the CPU resumes with the instruction after the HLT opcode. If there is also a CLI, then we must send an NMI (a delivery mode flag in the APIC Interrupt Command Register) to wake the CPU.
- If the CPU is in HLT state and we send an INIT and a SIPI again, the CPU starts all over again from real mode.
- The interrupt must exist in the target processor. For example, in protected mode, the interrupt must have been defined in
- The Local APIC is common to all CPUs (memory-wise); therefore, we must lock for write access (mutex) before we can issue the interrupt.
- Because the registers cannot be passed from CPU to CPU, we have to write all the registers (that will be used for the interrupt, if any) to a separate memory area.
- The interrupt might fail. I don't know why, but that's what they say. So, you have to rely on some inter-CPU communication (via shared memory and mutexes) to verify the delivery. I'm doing that in my code with a simple flag.
- Finally, the handler of the interrupt must tell its own Local APIC that there is an "End of Interrupt". Remember out 020h,al in the past? Now we write to the EOI register (
LocalApic + 0xB0) the value
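The points above can be sketched as a pair of helpers that take the Local APIC base as a pointer, so they can be exercised against a plain array on a normal host. This is a sketch under stated assumptions rather than the driver's code: the function names are hypothetical, the ICR offsets (0x300, 0x310) and bit layout are the standard xAPIC ones as I understand them, and the EOI value of 0 follows Intel's documentation rather than the text. The caller is assumed to hold the mutex mentioned in the list.

```c
#include <stdint.h>

/* Register offsets from the Local APIC base, in bytes. */
#define LAPIC_EOI      0x0B0u
#define LAPIC_ICR_LOW  0x300u
#define LAPIC_ICR_HIGH 0x310u

#define ICR_DELIVERY_FIXED 0x000u      /* ordinary interrupt, uses the vector */
#define ICR_DELIVERY_NMI   0x400u      /* NMI, wakes a CPU halted with CLI */
#define ICR_SEND_PENDING   (1u << 12)  /* delivery status bit */

static void lapic_write(volatile uint32_t *lapic, uint32_t off, uint32_t v) {
    lapic[off / 4u] = v;
}

static uint32_t lapic_read(volatile uint32_t *lapic, uint32_t off) {
    return lapic[off / 4u];
}

/* Send an interrupt to the CPU whose APIC ID is apic_id. */
static void lapic_send_ipi(volatile uint32_t *lapic, uint8_t apic_id,
                           uint32_t delivery, uint8_t vector) {
    /* Wait until any previous IPI has left the APIC. */
    while (lapic_read(lapic, LAPIC_ICR_LOW) & ICR_SEND_PENDING)
        ;
    /* Destination first; writing the low dword is what sends the IPI. */
    lapic_write(lapic, LAPIC_ICR_HIGH, (uint32_t)apic_id << 24);
    lapic_write(lapic, LAPIC_ICR_LOW, delivery | vector);
}

/* Called at the end of the interrupt handler on the target CPU. */
static void lapic_eoi(volatile uint32_t *lapic) {
    lapic_write(lapic, LAPIC_EOI, 0u);
}
```

On real hardware the pointer would be the mapped Local APIC base; delivery still has to be confirmed through shared memory, as the list notes.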
CPU Real Mode
If a CPU will be running in real mode, you may want to call DOS. It will work, provided that no other CPU calls DOS at the same time, which of course cannot be assumed in our simple app. Therefore, you have to use int 0xF0 function 5 to manage mutexes. The thread starts automatically in unreal mode and with stack and FS stored. The thread terminates with retf. If you call DOS through interrupt 0xF0 function 4, then locking is automatically provided.
This is the code in dmmic.asm real mode thread:
CPU Protected Mode
This thread runs in 32-bit full 4GB protected mode. GS points to base-0 32-bit data. It uses int 0xF0 to call DOS, then exits:
SEGMENT T32 USE32
CPU Long Mode
As I said in the trilogy, long mode can be entered directly from real mode, because instructions such as WRMSR are available there. This is also implemented in two pieces. One prepares long mode by:
- Loading the GDT.
- Preparing a see-through page table for the first 1GB and mapping the Local APIC to a fixed position (1GB - 2MB), because the Local APIC is usually located at 0xFEE00000, which means it won't be visible in our 1GB see-through mapping, OR preparing a 4GB page table with 1GB pages, if your system supports 1GB pages. Most do.
PSE, and long mode.
The other piece enters long mode by enabling paging, enabling interrupts with int 0xf0 accessible, then jumping to the code. Remember that long mode is flat 64-bit and SS has no meaning. Or so they say; I still had to set the page64_idx in Bochs. Perhaps a Bochs bug?
SEGMENT T64 USE64
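The two page table options described above can be sketched as follows. This is a host-side model that only fills in the entries; it assumes the standard x86-64 entry flags, and the use of the cache-disable bit for the APIC page is my assumption, not something the text states. The function names are hypothetical.

```c
#include <stdint.h>

#define PTE_PRESENT  (1ull << 0)
#define PTE_WRITABLE (1ull << 1)
#define PTE_NOCACHE  (1ull << 4)  /* PCD, assumed here for the APIC page */
#define PTE_LARGE    (1ull << 7)  /* PS: 2MB page in a PD, 1GB page in a PDPT */

#define SIZE_2MB   0x200000ull
#define SIZE_1GB   0x40000000ull
#define LAPIC_PHYS 0xFEE00000ull  /* usual Local APIC address, as in the text */

/* Option 1: see-through map of 0..1GB with 2MB pages, except that the last
   slot (linear 1GB - 2MB) is pointed at the Local APIC instead. */
static void fill_pd_1gb(uint64_t pd[512]) {
    for (uint64_t i = 0; i < 512; i++)
        pd[i] = (i * SIZE_2MB) | PTE_PRESENT | PTE_WRITABLE | PTE_LARGE;
    pd[511] = LAPIC_PHYS | PTE_PRESENT | PTE_WRITABLE | PTE_LARGE | PTE_NOCACHE;
}

/* Option 2: see-through map of 0..4GB with four 1GB pages, which already
   covers 0xFEE00000. Needs CPU support for 1GB pages. */
static void fill_pdpt_4gb(uint64_t pdpt[512]) {
    for (uint64_t i = 0; i < 512; i++)
        pdpt[i] = 0;
    for (uint64_t i = 0; i < 4; i++)
        pdpt[i] = (i * SIZE_1GB) | PTE_PRESENT | PTE_WRITABLE | PTE_LARGE;
}
```

With option 1, code running in long mode would reach the Local APIC at linear address 0x3FE00000 instead of 0xFEE00000.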
CPU Virtualized Protected Mode
This thread runs in 32-bit full 4GB virtualized protected mode. It can still call DOS. This mode is very useful since, whatever your thread might do, it can never crash the entire PC, only exit with a VMEXIT procedure.
I've called it DOS Multicore Mode Interface. It is a driver which helps you develop 32 and 64 bit applications for DOS, using int 0xF0. This interrupt is accessible from real, protected and long mode. Put the function number to
To check for existence, check the vector for INT 0xF0. It should not point to 0 or to an IRET; ES:BX+2 should point to the dword 'dmmi'.
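The existence check can be modelled on a host as a function over the handler's bytes. This is a sketch only: dmmi_present is a hypothetical name, the handler pointer stands for the address the text calls ES:BX, and the len parameter exists only so the model never reads past its buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OPCODE_IRET 0xCFu

/* handler: first byte of the INT 0xF0 handler, or NULL if the vector is 0.
   len: number of readable bytes at handler. */
static bool dmmi_present(const uint8_t *handler, size_t len) {
    if (handler == NULL || len < 6)
        return false;               /* vector points to 0 */
    if (handler[0] == OPCODE_IRET)
        return false;               /* dummy handler that just returns */
    /* The signature sits two bytes into the handler. */
    return memcmp(handler + 2, "dmmi", 4) == 0;
}
```

In a real mode program the same test would be made on the far pointer read from the interrupt vector table; calling function 0 afterwards confirms the driver and returns the CPU count.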
Int 0xF0 provides the following functions to all modes (
- AH = 0, verify existence. Return values: AX = 0xFACE if the driver exists, DL = total CPUs. This function is accessible from
- AH = 1, begin thread. BL is the CPU index (max-1). The function creates a thread, depending on
  - 0, begin (un)real mode thread. ES:DX = new thread seg:ofs. The thread is run with FS capable of unreal mode addressing, must use RETF to return.
  - 1, begin 32 bit protected mode thread. EDX is the linear address of the thread. The thread must return with
  - 2, begin 64 bit long mode thread. EDX holds the linear address of the code to start in 64-bit long mode. The thread must terminate with
  - 3, begin virtualized thread. BH contains the virtualization mode (currently only mode 2 = protected mode virtualization is supported), and EDX the virtualized linear stack. The thread must return with
- AH = 5, mutex functions.
  - AL = 0 => initialize mutex to ES:DI (real), EDI linear (protected), RDI linear (long).
  - AL = 1 => Lock mutex
  - AL = 2 => Unlock mutex
  - AL = 3 => Wait for mutex
- AH = 4, execute real mode interrupt. AL is the interrupt number, BP holds the AX value and BX,CX,DX,SI,DI are passed to the interrupt. ES are loaded from the high 16 bits of
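To make the register convention of functions 4 and 5 concrete, here is a small host-side model that builds the register values for a call. The struct dmmi_regs_t and the helper names are hypothetical and exist only for illustration; the driver itself takes these values in CPU registers.

```c
#include <stdint.h>

/* Hypothetical register image for an int 0xF0 call; not a driver structure. */
typedef struct {
    uint16_t ax, bx, cx, dx, si, di, bp;
} dmmi_regs_t;

enum { DMMI_FN_REAL_INT = 4, DMMI_FN_MUTEX = 5 };
enum { DMMI_MUTEX_INIT = 0, DMMI_MUTEX_LOCK = 1,
       DMMI_MUTEX_UNLOCK = 2, DMMI_MUTEX_WAIT = 3 };

/* AH = function number, AL = sub-function or interrupt number. */
static uint16_t dmmi_ax(uint8_t ah, uint8_t al) {
    return (uint16_t)((ah << 8) | al);
}

/* Wrap a real mode interrupt call for function 4: the caller's AX moves to
   BP, AX becomes 0x04NN, and BX, CX, DX, SI, DI pass through unchanged. */
static dmmi_regs_t dmmi_wrap_real_int(uint8_t intno, dmmi_regs_t in) {
    dmmi_regs_t out = in;
    out.bp = in.ax;
    out.ax = dmmi_ax(DMMI_FN_REAL_INT, intno);
    return out;
}
```

Wrapping an INT 0x21 call this way yields AX = 0x0421, and a mutex lock request is AX = 0x0501. Note that the caller's own BP cannot be passed through, since BP carries the AX value.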
If you have more than one CPU, your DOS game can now directly access all 2^64 bytes of memory and all your CPUs, while still being able to call DOS directly. Isn't that fun?
INT 0x21 Redirection
In order to avoid calling int 0xF0 directly from assembly and to make the driver compatible with higher level languages, an INT 0x21 redirection handler is installed. If you call INT 0x21 from the main thread, INT 0x21 is executed directly. If you call INT 0x21 from a long mode thread, then INT 0xF0 function AX = 0x0421 is executed automatically.
So with a bit of luck, you can use your favorite stdio functions from a C function in another thread directly!
Once you run entry.exe with /r, the library installs as a
int 0xf0 is available. DMMIC.asm shows example calls.
- Add more virtualization modes
- 08-1-2018: Added virtualization capabilities
- 07-1-2018: Fixed Long mode int 0xF0 call
- 06-1-2018: Updated DMMI to my new github project
- 22-5-2015: Thanks to Brendan for the synchronization tip
- 18-5-2015: Fixed multiple call bug with End of Interrupt write
- 17-5-2015: First release