GitHub link: https://github.com/WindowsNT/asm, now with a VS solution and automatic compilation, ISO generation and Bochs launching.
The Infamous Trilogy: Part 3
In my previous 2 gory articles about CPU internals and Virtualization, I explained various CPU internals, but not how multiprocessors actually work. Here is a small, working (if dirty) piece of code that will help you catch up with multicore processing.
In my next "Low Level M3ss" article, I use this to implement a DOS multicore interface.
4x1 = 1x4
Basically, it's simple. Each of the CPUs has its own set of registers and modes. Only the memory is shared between them. That means that, in order to put the 8 cores of an i7 into long mode, we have to execute the very same procedure for each of the cores, because each core has its own register set, GDT, LDT, etc. Therefore, we are able to start a CPU in real mode and keep it there, while directing another CPU to long mode.
The same occurs in virtualization. In my Virtualization Article, I explain how to put the CPU in this state and, in order to put the entire machine in Virtualization, each one of the CPUs must be directed into Virtualization, which means setting VMCS, VMX regions, etc. for all of them.
As you already know, the CPU starts from 0xF000:0xFFF0, but this is only true for the first CPU; all other CPUs stay "asleep" until woken up, in a special state called Wait-for-SIPI. The main CPU awakes the other CPUs by sending a SIPI (Startup Inter-Processor Interrupt), which contains the startup address for that CPU. Later on, other Inter-Processor Interrupts are used to communicate between the CPUs.
Therefore, in a really weird driver I'm creating, I should be able to take one processor from Windows, get it back to real mode, then virtualize the rest of the CPUs to create an inside debugger. Ha! Not so easy, eh.
Preparing the Party
Because multicore programming has nothing to do with CPU modes, you can do it in any mode. Actually, you could do it in real mode, but the memory we need to access is above 1MB, which means that you have to enter protected mode. The updated version of this article is compatible with both Bochs (ACPI 1.0) and VMware (ACPI 2.0+); it is written in FASM and enters unreal mode in order to access the structures. My code sets up a FreeDOS installation with a live CD, and you can launch it directly from Visual Studio. Support for VirtualBox is pending.
ACPI and APIC
All this stuff is done by the APIC (Advanced Programmable Interrupt Controller). It is basically a set of memory-mapped registers: the controller reacts when we modify the registers at their memory offsets. You find out more about the APIC by searching the ACPI (Advanced Configuration and Power Interface) tables. After verifying that we actually have an APIC somewhere (CPUID with EAX = 1, then check EDX bit 9), the first thing we must do is find where the ACPI root pointer is in memory. It is in one of the following locations:
- In the area whose real-mode segment is stored at memory address 0x040E, the Extended BIOS Data Area (I've never seen it there myself).
- In BIOS memory somewhere between physical 0xE0000 and 0xFFFFF.
We locate the ACPI by searching for its 8-byte signature 0x2052545020445352 (the ASCII string "RSD PTR " read as a little-endian number). If this signature is not found in memory, then we don't have ACPI and therefore cannot enumerate multiple CPU cores this way.
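The search described above can be sketched in C. This is only an illustration, not the FASM code from the repository: FindRsdp and ByteSum are hypothetical names, and the buffer passed in stands in for a copy of the physical range 0xE0000 to 0xFFFFF. It relies on two facts from the ACPI specification: the structure sits on a 16-byte boundary, and its first 20 bytes (the ACPI 1.0 part) must sum to zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sum of 'len' bytes truncated to 8 bits; a valid ACPI structure sums to 0. */
static unsigned char ByteSum(const unsigned char* p, size_t len)
{
    unsigned char sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (unsigned char)(sum + p[i]);
    return sum;
}

/* Scan a copy of a memory region for the RSDP (hypothetical helper).
   Returns the offset of the structure inside 'mem', or -1 if none is found. */
static long FindRsdp(const unsigned char* mem, size_t size)
{
    assert(mem != NULL);
    /* The RSDP always starts on a 16-byte boundary. */
    for (size_t off = 0; off + 20 <= size; off += 16)
    {
        if (memcmp(mem + off, "RSD PTR ", 8) != 0)
            continue;                    /* signature does not match */
        if (ByteSum(mem + off, 20) == 0) /* ACPI 1.0 part must sum to zero */
            return (long)off;
    }
    return -1;
}
```

In real or unreal mode you would scan the physical range in place instead of a copy; the checksum test filters out stray occurrences of the signature string.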
As the RSDP (Root System Description Pointer) definition states, this is merely the signature of a larger structure. We may have ACPI 1.0 or ACPI 2.0, and we save the structure data for further use. Each ACPI table has a checksum: the sum of all the bytes in an ACPI table must have its lower byte equal to zero:
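int ChecksumValid(unsigned char* addr, int cnt)
{
    unsigned long a1 = 0;
    for (int i = 0; i < cnt; i++)
        a1 += *(addr + i);      // add up every byte of the table
    if ((a1 & 0xFF) == 0)
        return 1;               // lower byte is zero: checksum is valid
    return 0;
}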
Having found the RSDP, we take the address of the starting ACPI table in memory from its fields. The starting table contains pointers to all the other tables. This physical address is above 1MB (in ACPI 2.0 it is actually a 64-bit address, but it is always in the lower 4GB area to allow 32-bit systems to work) and hence it is only accessible from protected mode or, in our little program, from unreal mode. There are many ACPI tables and we are only interested in a few of them.
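struct ACPISDTHeader
{
    char Signature[4];              // table name, e.g. "APIC" for the MADT
    unsigned long Length;           // total size in bytes (32-bit field), header included
    unsigned char Revision;
    unsigned char Checksum;         // makes all bytes of the table sum to 0
    char OEMID[6];
    char OEMTableID[8];
    unsigned long OEMRevision;
    unsigned long CreatorID;
    unsigned long CreatorRevision;
};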
All ACPI tables start with this structure as a header, and the Length member tells us the total number of bytes that the structure has. For the starting table, the rest of the information is a list of 32-bit (or 64-bit for ACPI 2) addresses of the supported tables.
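To make the layout concrete, here is a minimal sketch of how a table can be looked up in the 32-bit starting table. FindAcpiTable and ReadU32 are hypothetical names, and the sketch assumes that physical memory is visible as one flat buffer, so an "address" is simply an offset into it. The only layout facts used are the ones above: a 36-byte header with Length at offset 4, followed by 4-byte table addresses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ACPI_HEADER_SIZE 36u  /* size of the common table header */

/* Read a little-endian 32-bit value without alignment assumptions. */
static uint32_t ReadU32(const unsigned char* p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Look up a table by its 4-character signature in a 32-bit starting table.
   'mem' stands in for physical memory (assumption of this sketch).
   Returns the address of the table, or 0 if it is not listed. */
static uint32_t FindAcpiTable(const unsigned char* mem, uint32_t rsdt,
                              const char* sig)
{
    assert(mem != NULL && sig != NULL);
    uint32_t length = ReadU32(mem + rsdt + 4);        /* header Length */
    if (length < ACPI_HEADER_SIZE)
        return 0;                                     /* malformed table */
    uint32_t count = (length - ACPI_HEADER_SIZE) / 4; /* 4 bytes per entry */
    for (uint32_t i = 0; i < count; i++)
    {
        uint32_t addr = ReadU32(mem + rsdt + ACPI_HEADER_SIZE + 4 * i);
        if (memcmp(mem + addr, sig, 4) == 0)
            return addr;
    }
    return 0;
}
```

For the ACPI 2 variant the entries are 8 bytes each, so the division and the read would change accordingly.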
How Many CPUs Do I Have?
This is the easy part. You have to find the "MADT" ACPI table in memory (its table signature is "APIC"); my code then passes that memory to DumpMadt. Note that the MADT also informs us of the Local APIC address (which by default is at physical address 0xFEE00000).
Each CPU has its own Local APIC. This APIC handles interrupts for the CPU. It contains various stuff, such as a Local Vector Table (LVT), which translates local interrupts (such as the clock) to an actual interrupt vector. There is also an I/O APIC, which provides multiprocessor interrupt management. The MADT also tells us the address of the I/O APIC, which by default is at physical address 0xFEC00000. The Local APIC location can be changed through the IA32_APIC_BASE MSR, but in our program we will leave both addresses at their default values.
Note that the CPU does not know how much memory you have. Even if you only have 4MB of RAM, the Local APIC address is still at physical address 0xFEE00000.
Examining the MADT will give us all the above information.
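As a rough illustration of what examining the MADT involves, the sketch below walks its variable-length entries. It is not the author's DumpMadt: ParseMadt, MadtInfo and ReadLe32 are hypothetical names. The layout used comes from the ACPI specification: after the 36-byte header come the 32-bit Local APIC address and 32-bit flags, then entries that each start with a type byte and a length byte. Type 0 is a Processor Local APIC entry (APIC ID at byte 3, flags at byte 4, bit 0 = enabled) and type 1 is an I/O APIC entry (address at byte 4).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    uint32_t LocalApicAddress; /* usually 0xFEE00000 */
    uint32_t IoApicAddress;    /* usually 0xFEC00000, 0 if no entry was found */
    int      CpuCount;         /* enabled Processor Local APIC entries */
    uint8_t  ApicIds[64];      /* Local APIC ID of each counted CPU */
} MadtInfo;

static uint32_t ReadLe32(const uint8_t* p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk a MADT that has already been located and checksummed. */
static void ParseMadt(const uint8_t* madt, MadtInfo* out)
{
    assert(madt != NULL && out != NULL);
    uint32_t length = ReadLe32(madt + 4);       /* header Length */
    out->LocalApicAddress = ReadLe32(madt + 36);
    out->IoApicAddress = 0;
    out->CpuCount = 0;
    uint32_t off = 44;                          /* header + address + flags */
    while (off + 2 <= length)
    {
        uint8_t type = madt[off];
        uint8_t len  = madt[off + 1];
        if (len < 2 || off + len > length)
            break;                              /* malformed entry: stop */
        if (type == 0 && len >= 8)              /* Processor Local APIC */
        {
            uint32_t flags = ReadLe32(madt + off + 4);
            if ((flags & 1u) && out->CpuCount < 64)
                out->ApicIds[out->CpuCount++] = madt[off + 3];
        }
        else if (type == 1 && len >= 12)        /* I/O APIC */
        {
            out->IoApicAddress = ReadLe32(madt + off + 4);
        }
        off += len;
    }
}
```

The collected APIC IDs are what later goes into the destination field when sending an IPI to a specific processor.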
Configuring the Local APIC
To prepare the APIC to manage interrupts, we have to set the enable bit (bit 8) in the "Spurious Interrupt Vector Register", located at offset 0xF0 from the Local APIC address.
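A minimal C sketch of that step follows, under the assumption that the Local APIC registers are reachable through a pointer (on real hardware that would be physical 0xFEE00000 in a flat address space; here any array of 32-bit words works, which keeps the sketch testable). The helper names are hypothetical. Bit 8 of the register is the software enable bit and the low 8 bits hold the spurious vector.

```c
#include <assert.h>
#include <stdint.h>

#define LAPIC_SVR        0xF0u   /* Spurious Interrupt Vector Register */
#define LAPIC_SVR_ENABLE 0x100u  /* bit 8: APIC software enable */

/* Local APIC registers are 32 bits wide and 16 bytes apart, so a byte
   offset maps to index offset/4 in an array of uint32_t. */
static void LapicWrite(volatile uint32_t* lapic, uint32_t offset, uint32_t value)
{
    lapic[offset / 4] = value;
}

static uint32_t LapicRead(volatile uint32_t* lapic, uint32_t offset)
{
    return lapic[offset / 4];
}

/* Enable the Local APIC and set the vector used for spurious interrupts. */
static void EnableLocalApic(volatile uint32_t* lapic, uint8_t spuriousVector)
{
    assert(lapic != 0);
    uint32_t svr = LapicRead(lapic, LAPIC_SVR);
    svr &= ~0xFFu;            /* clear the old vector */
    svr |= spuriousVector;    /* install the new one */
    svr |= LAPIC_SVR_ENABLE;  /* turn the APIC on */
    LapicWrite(lapic, LAPIC_SVR, svr);
}
```

The read-modify-write keeps the other bits of the register as the firmware left them.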
After that, we are ready to send IPIs. An IPI (Interprocessor Interrupt) is sent by using the Interrupt Command Register of the Local APIC. This consists of two 32-bit registers, one at offset 0x300 and one at offset 0x310 (All Local APIC registers are aligned to 16 bytes):
- The register at 0x310 is the one we write first. Its bits 24 - 31 hold the Local APIC ID of the processor we want to send the interrupt to (older processors use only bits 24 - 27).
- The register at 0x300 has the following structure:
struct ICR_LOW
{
    unsigned int VectorNumber:8;
    unsigned int DestinationMode:3;      // 5 = INIT, 6 = SIPI
    unsigned int DestinationModeType:1;  // 0 = physical, 1 = logical
    unsigned int DeliveryStatus:1;       // 1 while the IPI is still pending
    unsigned int R1:1;
    unsigned int InitDeAssertClear:1;
    unsigned int InitDeAssertSet:1;
    unsigned int R2:2;
    unsigned int DestinationType:2;      // 0 = use the target in 0x310
    unsigned int R3:12;
};
Writing to register 0x300 will actually send the IPI (that is why you must write to 0x310 first). Note that if DestinationType is not 0, the destination target in register 0x310 is ignored. Under Windows, IPIs are sent at IRQL level 29.
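Because bit-field layout is compiler dependent, it can be safer to build the two 32-bit values with shifts. The sketch below does that; MakeIcrLow and MakeIcrHigh are hypothetical names, and the bit positions follow the structure above (bits 8-10 are DestinationMode, bit 14 is the level/assert bit, bits 18-19 are DestinationType).

```c
#include <assert.h>
#include <stdint.h>

/* Build the value for the register at 0x300. */
static uint32_t MakeIcrLow(uint8_t vector, unsigned deliveryMode,
                           int levelAssert, unsigned shorthand)
{
    assert(deliveryMode < 8 && shorthand < 4);
    return (uint32_t)vector                    /* bits 0-7: VectorNumber */
         | ((uint32_t)deliveryMode << 8)       /* bits 8-10: DestinationMode */
         | ((levelAssert ? 1u : 0u) << 14)     /* bit 14: level (assert) */
         | ((uint32_t)shorthand << 18);        /* bits 18-19: DestinationType */
}

/* Build the value for the register at 0x310: target APIC ID in bits 24-31. */
static uint32_t MakeIcrHigh(uint8_t apicId)
{
    return (uint32_t)apicId << 24;
}
```

For example, MakeIcrLow(0x80, 6, 1, 0) gives 0x00004680 and MakeIcrLow(0, 5, 1, 3) gives 0x000C4500, a DestinationMode 5 IPI broadcast to all processors except the sender.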
To awake the processor, we send two special IPIs. The first is the "INIT" IPI, DestinationMode 5, which resets the target CPU and leaves it in the Wait-for-SIPI state. The second is the SIPI, DestinationMode 6, which carries the starting address and starts the CPU. Because the processor starts in real mode, we have to give it a real-mode memory address; it is stored in VectorNumber as a page number (the address divided by 4096). By convention, 2 SIPIs are sent with a delay between them.
Because the starting address must be aligned to 4096 bytes, my code copies the startup code from the ASM source to the hardcoded address 0x80000 as a quick solution.
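Put together, the wake-up sequence can be sketched as below. This is a C illustration, not the FASM code from the repository. WakeProcessor, SendIpi, SipiVector and DelayMicroseconds are hypothetical names, the delay helper is an empty placeholder, and the 10 ms and 200 microsecond waits are the commonly cited values, treated here as an assumption. With the start address 0x80000 the vector is 0x80.

```c
#include <assert.h>
#include <stdint.h>

#define LAPIC_ICR_LOW  0x300u
#define LAPIC_ICR_HIGH 0x310u

/* Placeholder: real code would wait on a timer (hypothetical helper). */
static void DelayMicroseconds(unsigned us) { (void)us; }

/* Write the target first, then the command; the second write sends the IPI. */
static void SendIpi(volatile uint32_t* lapic, uint8_t apicId, uint32_t icrLow)
{
    lapic[LAPIC_ICR_HIGH / 4] = (uint32_t)apicId << 24;
    lapic[LAPIC_ICR_LOW / 4]  = icrLow;
    while (lapic[LAPIC_ICR_LOW / 4] & (1u << 12))
    {
        /* DeliveryStatus still set: the IPI is pending */
    }
}

/* A SIPI vector is the 4096-byte page number of a start address below 1MB. */
static uint8_t SipiVector(uint32_t startAddress)
{
    assert((startAddress & 0xFFFu) == 0 && startAddress < 0x100000u);
    return (uint8_t)(startAddress >> 12);
}

static void WakeProcessor(volatile uint32_t* lapic, uint8_t apicId,
                          uint32_t startAddress)
{
    uint8_t vector = SipiVector(startAddress);
    SendIpi(lapic, apicId, 0x00004500u);              /* INIT, mode 5 */
    DelayMicroseconds(10000);
    for (int i = 0; i < 2; i++)                       /* two SIPIs */
    {
        SendIpi(lapic, apicId, 0x00004600u | vector); /* SIPI, mode 6 */
        DelayMicroseconds(200);
    }
}
```

On real hardware lapic would point at the Local APIC address reported by the MADT; in the sketch it can be any array of 32-bit words.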
Finally, you need to write the value 0 to the "End of Interrupt" register (Local APIC + 0xB0) to indicate that another interrupt can be delivered.
Ha! As you can guess, no DOS function is thread safe. That means that, to call DOS from other CPUs, you must perform proper synchronization. My code implements a fast mutex which allows a thread to call int 21h successfully.
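The idea behind such a mutex can be illustrated with a spinlock built on an atomic test-and-set. This is a generic C11 sketch with hypothetical names, not the FASM mutex from the repository; on x86 the same effect is typically achieved with a locked instruction such as xchg.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct
{
    atomic_flag locked;  /* set while some CPU owns the mutex */
} SpinMutex;

static void SpinInit(SpinMutex* m)
{
    assert(m != 0);
    atomic_flag_clear(&m->locked);
}

/* Try once; returns true if the caller now owns the mutex. */
static bool SpinTryLock(SpinMutex* m)
{
    /* test_and_set returns the previous value: false means it was free */
    return !atomic_flag_test_and_set_explicit(&m->locked,
                                              memory_order_acquire);
}

/* Spin until the mutex is acquired. */
static void SpinLock(SpinMutex* m)
{
    while (!SpinTryLock(m))
    {
        /* busy-wait: another CPU is inside the protected call */
    }
}

static void SpinUnlock(SpinMutex* m)
{
    atomic_flag_clear_explicit(&m->locked, memory_order_release);
}
```

A CPU would call SpinLock before issuing int 21h and SpinUnlock after it returns, so only one processor is inside DOS at a time.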
It's not very tough, as you saw. The hard part is synchronizing all of this. More in the next article!
- 26-12-2018: Update with only fasm code and bochs support
- 27-03-2015: Some typos and INIT IPI
- 25-03-2015: First release