An x86 assembler with register allocation

Dávid Kocsis

4.84/5 (22 votes)

Jan 2, 2014

CPOL

6 min read

30546

436

It is made for a machine code generator, but it can be used separately.

Download source - 1.03 MB

Introduction

This article is about an x86 assembler I made for my programming language project. It consists of some separated libraries. A register allocator is integrated into it that can help much when writing bigger x86 assembly or making a code emitter. Currently it only supports the general instruction set, but my plan is to extend it soon with SSE instruction sets.

Writing x86 assembly difficulties

If you know x86 well, you may skip this section. The first problem is that only one operand can contain a memory operand. It is because the instruction encoding only allows one.

mov eax, dword[esp + 8]
mov dword[edx] eax

And the problem is increased by the fixed amount of registers (eax, edx, ..) which is normal. Register allocation means to assign registers to variables. It's not easy, because if you choose x variables to be in registers, the others are in memory, and can cause a temporary register need, so now you have less than x registers.

The second problem is that there are a lot of instruction with exceptions, some only allow register operands as first operand, some only memory etc. It really looks like that instruction set is obsolete, it has been being developed for a long time. Intel has tried to replace it several times, but it was unsuccessful.

The third problem is pre-coloring, it means that some instruction doesn't allow selecting the operand, it will use only given registers for inputs or outputs. For example a 32 bit div instruction takes the dividend from eax and edx, and outputs the result to eax and the remainder is stored in edx.

mov eax, (dividend)
mov edx, 0              ; unsigned extension to 8 bytes
div (divisor)
mov (quotient), eax
mov (remainder), edx

The command line tool

It is implemented in the x86Assembler project in the download file:

Bird x86 assembler
Usage: x86Assembler -i<input> -o<output>

  i, input      Required. Input file to be assembled.
  s, asm_out    Assembly created by register allocation output file.
  o, obj_out    Required. The generated object file name.
  arch          (Default: x86_32) Architecture mode for the assembly.
  help          Display this help screen.

Syntax

Basics

I have used a Flat assembler before, so the syntax builds on it. The elements have to be inside a section. To enable the register allocation and other transformations to the code you need to place the function between function and endfunction:

section "code" code executable readable

_Fib:
function
    save    ebp, ebx, edi, esi
    def     x, general, 4
    def     _1, general, 4
    def     _2, general, 4

    fenter
    mov     x, eax
    cmp     x, 2
    jge     _3
    mov     eax, 1
    fleave
_3:
    mov     _1, x
    dec     _1
    mov     eax, _1
    read    eax
    call    _Fib
    write   eax, edx, ecx
    mov     _2, eax
    mov     _1, x
    sub     _1, 2
    mov     eax, _1
    read    eax
    call    Fib
    write   eax, edx, ecx
    add     eax, _2
    fleave
endfunction

The function starts with some definitions. It needs to know which registers need to be saved by the fenter and fleave instructions. The def creates a symbol that can be used for data storage. First parameter is name, second type, third is size. The general means that it will be a general register (or a stack location).

The assembler does live range analysis, but some it needs some additional information for call instructions. read marks that the value will be used, write means that the original value has been overwritten.

The previous code will be transformed to this:

_Fib:
function
    push    ebp
    push    ebx
    mov     ebp, eax
    cmp     ebp, 2
    jge     _3
    mov     eax, 1
    pop     ebx
    pop     ebp
    ret 
_3:
    mov     eax, ebp
    dec     eax
    call    _Fib
    mov     ebx, eax
    mov     eax, ebp
    sub     eax, 2
    call    _Fib
    add     eax, ebx
    pop     ebx
    pop     ebp
    ret 
endfunction

This is how exporting and importing symbols work, these definitions need to be placed in the beginning of the assembly file:

public _Func2
public _Fib
extern _Func

section "code" code executable readable

Using stack

A function has two type of stack space. The space for the locals and the parameters. locals_size and parameters_size can specify their size. For user defined symbols you don't need to create local stack, when a symbol doesn't fits in registers, it'll automatically increase the local stack size to create a place for it. If the function uses a calling convention that needs to clean the stack, the callee_cleanup keyword can be used.

The stack can be accessed by register aliases. It is needed, because sometimes it can refer to different registers. eps is for parameter stack, els refers to the local stack. The e character means that it is 32 bit register. ps would be 16 bit, rps 64 bit.

_Function:
function
    locals_size 8
    parameters_size 4
    callee_cleanup
...

    mov eax, dword[eps]
    mov dword[els + 4]
...
endfunction

You may have seen that compilers copies esp to ebp. By default my assembler will do this if there is a local stack. In my experience it can boost performance. There are some keywords that can be used to overwrite this: frame_pointer_force, frame_pointer_disable, frame_pointer_default.

Stack alignment

Sometimes you need to store aligned data on the stack, e.g. SSE instructions requires 16 byte alignment:

_Func:
function
    def x, general, 4
    stack_align 16
    parameters_size 4

    fenter
    mov x, dword[eps]
    add x, 34
    fleave
endfunction

After extecuting the assembler:

_Func:
function
    push ebx
    mov ebx, esp
    and esp, 4294967280
    mov eax, dword[ebx + 8]
    add eax, 34
    mov esp, ebx
    pop ebx
    ret 
endfunction

The alignment is done by the and instruction on the stack pointer. It decreases the pointer, so it allocates some extra byte. The problem is that it is unknown what will be the offset to parameters and the caller function stack. Thus it needs to save that to the parameter pointer (I just call it that). It is always the ebx register. As well as for the frame pointer it can be forced or disabled: parameter_pointer_force, parameter_pointer_disable, parameter_pointer_default.

Using the code

Encoding an instruction

The project Bird.Architecture.x86 handles the instruction encoding task. I learnt machine code at http://www.c-jump.com/CIS77/CPU/x86/lecture.html. I will encode the add ax, cx instruction. Firstly a context should be created:

var context = new x86Context(ArchitectureMode.X86_32);

Next I create InstructionToEncode structure. It selects the instruction and operands. Mnemonic, ConditionTest and Register are enum types. ConditionTest can be only have a value if the instruction is conditional (Mnemonic.Jc, Mnemonic.Setcc, etc.).

var instructionToEncode =
    new InstructionToEncode(
        Mnemonic.Add,
        ConditionTest.Unknown,
        new Operand[]
        {
            Operand.Register(Register.AX),
            Operand.Register(Register.CX),
        });

And I call the x86CodeGeneration.Encode function:

EncodedInstruction encodedInstruction;
EncodingError error = x86CodeGeneration.Encode(
    context,
    instructionToEncode,
    out encodedInstruction);
    
if (error != EncodingError.Succeeded)
    throw new Exception("Failed to encode instruction");

It created a EncodedInstruction structure which contains prefixes, memory operand bytes, etc:

[Flags]
public enum EncodedInstructionPrefixes
{
    AddressSizePrefix = 1,
    OperandSizePrefix = 2
}

public struct EncodedInstruction
{
    public EncodedInstructionPrefixes   Prefixes { get; set; }
    public REXPrefix?                   REXPrefix { get; set; }
    public InstructionOpcode            Opcode { get; set; }
    public ModRegRMByte?                ModRegRMByte { get; set; }
    public SIBByte?                     SIBByte { get; set; }
    public VariableSizeSigned?          Displacement { get; set; }
    public VariableSizeSigned?          Immediate { get; set; }
}

By calling ToByteArray method, it can be converted to machine code bytes:

var bytes = encodedInstruction.ToByteArray();
foreach (var b in bytes)
    Console.WriteLine(b);

The x86Assembler class

The Operand.Immediate only allows a number value, labels are not allowed. That's because in machine code the assembler needs to emit relative or direct addresses. Each instruction has an address, direct means the destionation address, relative means the difference between destination and current address. It needs to know all the instruction sizes, and the instruction can grow bigger as the relative addresses are farther. The solution is to use multiple passes.

mov ecx, 0
_label:

    inc ecx
    cmp ecx, 10
    jl _label

This is a small for loop assembly code without anything in it. I've created a context, assembler and a symbol. There are 3 types of symbol: Local symbols only visible in the current assembly, public symbols are exported so other obj files can use it, external will be used from other file.

var context = new x86Context(ArchitectureMode.X86_32);
var assembler = new x86Assembler(context);
var label = new x86AssemblerSymbol("_label", CodeObjectSymbolType.Local);

The x86Assembler class functions to make members, I haven't made the data structure public. It uses LabelledOperand structure for operand, the only difference is that it allows labels too.

assembler.BeginCodeSection();

assembler.AddInstruction(
    Mnemonic.Mov,
    ConditionTest.Unknown,
    new LabelledOperand[]
    {
        LabelledOperand.Register(Register.ECX),
        LabelledOperand.Immediate(0),
    });

assembler.AddLabel(label);

assembler.AddInstruction(
    Mnemonic.Inc,
    ConditionTest.Unknown,
    new LabelledOperand[]
    {
        LabelledOperand.Register(Register.ECX),
    });

assembler.AddInstruction(
    Mnemonic.Cmp,
    ConditionTest.Unknown,
    new LabelledOperand[]
    {
        LabelledOperand.Register(Register.ECX),
        LabelledOperand.Immediate(10),
    });

assembler.AddInstruction(
    Mnemonic.Jcc,
    ConditionTest.Less,
    new LabelledOperand[]
    {
        LabelledOperand.Immediate(label),
    });

assembler.EndSection();

Next I'll create a CodeObject that contains the assembly binary. It can be saved to file by CoffObjectSaver class. I use http://onlinedisassembler.com/ to verify that the result is correct.

CodeObject codeObject;
var errorList = assembler.CreateCodeObject(out codeObject);
if (errorList.Count > 0)
    throw new Exception("Failed to create code object");

CoffObjectSaver.Instance.Save("test.obj", codeObject);

The next layer

Bird.Architecture.x86.Assembly allows register allocation, parsing etc. It has a tree that contains an immutable class for every element in the assembly. I'll show how to add two numbers that are stored in undefined storages. UndefinedStorage, ConstantImmediate, RegisterStorage, etc classes inherits the DataStorage class. TrackedString structure is used for the parser, its purpose is to store the offset to the original string while getting substrings. So here I created the instructions:

var context = new x86Context(ArchitectureMode.X86_32);

// parameters: name, type, size, alignment, priority
var undefinedA = new UndefinedSymbol("a", RegisterType.General, 4, 4, 0);
var undefinedB = new UndefinedSymbol("b", RegisterType.General, 4, 4, 0);

var elements = new AssemblyElement[]
{
    new AssemblyInstruction(
        Mnemonic.Mov,
        ConditionTest.Unknown,
        ImmutableArray.Create<DataStorage>(
            new UndefinedStorage(undefinedA, TrackedString.Invalid),
            new ConstantImmediate(2, TrackedString.Invalid)),
        TrackedString.Invalid),
        
    new AssemblyInstruction(
        Mnemonic.Mov,
        ConditionTest.Unknown,
        ImmutableArray.Create<DataStorage>(
            new UndefinedStorage(undefinedB, TrackedString.Invalid),
            new ConstantImmediate(3, TrackedString.Invalid)),
        TrackedString.Invalid),
        
    new AssemblyInstruction(
        Mnemonic.Add,
        ConditionTest.Unknown,
        ImmutableArray.Create<DataStorage>(
            new UndefinedStorage(undefinedA, TrackedString.Invalid),
            new UndefinedStorage(undefinedB, TrackedString.Invalid)),
        TrackedString.Invalid),
};

I use ImmutableArray because System.Collections.Immutable only has an ImmutableList class and it hasn't got as good indexer performance. Because of design issues they removed ImmutableArray for now.

Here is how to construct an AssemblyFunction. It has elements and local symbols. The FunctionProperties class can be used to specify the saved registers, local stack size etc. I just leave it empty now.

var function = new AssemblyFunction(
    ImmutableArray.CreateRange(elements),
    ImmutableArray.Create<AssemblySymbol>(undefinedA, undefinedB),
    new FunctionProperties(),
    TrackedString.Invalid);

Next I create the section. It has the same CodeObjectSectionProperties structure as created by x86Assembler.BeginCodeSection.

var section = new AssemblySection(
    ImmutableArray.Create<AssemblyElement>(function),
    "code",
    new CodeObjectSectionProperties(
        CodeObjectSectionType.Code,
        CodeObjectSectionFlags.Readable |
            CodeObjectSectionFlags.Writeable),
    TrackedString.Invalid);

The AssemblyTree class holds all sections and symbols that are outside of the function.

var tree = new AssemblyTree(
    ImmutableArray.Create<AssemblySection>(section),
    ImmutableArray.Create<AssemblySymbol>(),
    TrackedString.Invalid);

And some transformations should be run and the CodeObject can be created similarly as before:

var error = AssemblyActions.DoDefaultTransform(context, ref tree);
if (!error.IsSucceeded) 
    throw new Exception("Failed to do default transformation");

CodeObject codeObject;
var errorList = AssemblyActions.CreateCodeObject(
    context, tree, out codeObject);

if (errorList.Count > 0)
    throw new Exception("Failed to create code object");

CoffObjectSaver.Instance.Save("test2.obj", codeObject);