Table of Contents
Introduction
I developed this language because I don't find the other languages perfect for the things that I want to do. My problem with C++ is that it's slow to develop in it, and doesn't have such features that C# has, but .NET languages are not native and use CIL. This doesn't allow to use standard x86 assembly and maybe also slower.
I have been working since March 2010 in C#. Its performance is competitive with C++ while it will be a high level language. The syntax and libraries are similar to C#, C++, so I don't think it would be hard to learn, but I also change syntax if I think it could be better.
Requirements
Samples have a C++ equivalent code to compare performance. In order to compile them you need to install
MinGW, Clang and set the PATH variable, but it's optional.
The Visual C++ usage requires the path in "Run - VC++.bat" files to be set.
Creating Programs with Bird
The compiler can be run from command line by “Bird.exe” which is in “Binaries” directory:
Bird.exe -x -nodefaultlib -lBirdCore Something.bird -out Something.exe
I've made .bat files for the samples, so you won't need to use command line for them.
The -x means that the compiler should run the output file immediately. The input files
can be Bird files, C, C++ or Object files.
Libraries can be specified by the -l option. Currently the only available libraries are
BirdCore and BlitzMax, that are included by default. The -nodefaultlib
disables them. BlitzMax is another programming language,
its functions are needed for graphics since I haven't implemented it yet.
Object and archive files also can be the output file to use it in other languages. It can be specified
with the -format attribute. These are its possible values:
| app |
Executable file |
| arc |
Archive file, it doesn't contain the libraries, so linking them is also needed |
| obj |
Object file, only contains the .bird files' assembly. Both of libraries and other object files (e.g. compiled C++ files) has to be linked. |
1st Sample Program: Squares
|
This sample draws fading squares. When the mouse is over the square, it appears. I have created some constants and an array that contains the fading rate. The Width and Height variables contain how many squares should be in the graphics window:
using System
using BlitzMax
const var SquareSize = 32,
Width = 24,
Height = 20
float[Width, Height] Array
The type doesn't have to be specified, use can use var unlike C# where you can't use var for globals and class members and only one declaration is allowed.
|
 |
There are two helper functions. The first decreases a value, the second returns true if the mouse is in the specified area:
float Decrease(float Val)
return if Val < 0.02: 0 else Val - 0.02
bool InArea(int x, y, w, h)
return x <= MouseX() < x + w and y <= MouseY() < y + h
The Update function updates the squares and draws them. If the mouse is on the square, it draws a white square and sets the fading rate to 1 which means it's fully visible, otherwise it draws with a color based on square's position:
void Update()
for var x, y in 0 .. Width, 0 .. Height
var Value = Array[x, y]
var XPos = x * SquareSize
var YPos = y * SquareSize
if InArea(XPos, YPos, SquareSize, SquareSize)
Value = 1
SetColor 255, 255, 255
DrawRect XPos, YPos, SquareSize, SquareSize
else if Value > 0
var Color = 255 * Value
var Red = (int)(((float)x / Width) * Color)
var Greed = (int)(((1 - (float)y / Height)) * Color)
var Blue = (int)(((float)y / Height) * Color)
SetColor Red, Greed, Blue
DrawRect XPos, YPos, SquareSize, SquareSize
Array[x, y] = Decrease(Value)
The Reset function sets the array to zero, because it can contain any data. The Main function creates a graphics window, and updates it until the user hits the escape key or clicks the exit button:
void Reset()
for var x, y in 0 .. Width, 0 .. Height
Array[x, y] = 0
public stdcall void Main()
Reset
Graphics SquareSize * Width, SquareSize * Height
while !KeyHit(Keys.Escape) and !AppTerminate()
Cls
Update
Flip
The for until loop is different from for to loop. It doesn't reach the value at the right side, so subtracting one can be saved in some cases.
2nd Sample Program: Circles
|
In this sample, I calculate the color of each pixel. This sample's purposes is to compare the performance with C++. It calculates the distance from the middle point of the window and it takes the sine and cosine of it. To make it faster, it creates a smaller image that it draws to, and then it's scaled up.
I made some helper functions. The ArgbColor function combines the four components of a color to a single number. The GetDistance function simply calculates a distance, and the Wrap function cuts down parts of a integer that cannot fit in a byte.
The UpdateImagefunction does the main task of the program. It works as I wrote it in the beginning of the sample.
|
|
using System
using BlitzMax
int ArgbColor(byte a, r, g, b)
return (int)a << 24 | (int)r << 16 | (int)g << 8 | b
double GetDistance(double x1, y1, x2, y2)
var x = x2 - x1, y = y2 - y1
return Math.Sqrt(x * x + y * y)
byte Wrap(int x)
if x < 0: x = 0
if x > 255: x = 255
return (byte)x
void UpdateImage(IntPtr Image)
var Pixmap = LockImage(Image, 0, true, true)
var Width = ImageWidth(Image)
var Height = ImageHeight(Image)
var Time = MilliSecs()
for var x = 0 until Width
for var y = 0 until Height
var RelX = (float)x / Width
var RelY = (float)y / Height
var Distance = GetDistance(RelX, RelY, 0.5, 0.5) * 3 - (float)Time / 1000
var Light = (int)((RelY * 100) * Math.Abs(Math.Sin(Distance / 1.5)))
var Red = (int)((RelX * 255) * Math.Abs(Math.Cos(Distance))) + Light
var Green = (int)(((1 - RelY) * 255) * Math.Abs(Math.Sin(Distance))) + Light
var Blue = (int)((RelY * 255) * Math.Abs(Math.Cos(Distance / 3))) + Light
var Color = ArgbColor(255, Wrap(Red), Wrap(Green), Wrap(Blue))
WritePixel Pixmap, x, y, Color
UnlockImage Pixmap, 0
public stdcall void Main()
Graphics 1024, 720
var Image = CreateImage(512, 360, 1, -1)
while not KeyHit(Keys.Escape) and not AppTerminate()
Cls
UpdateImage Image
DrawImageRect Image, 0, 0, 1024, 720, 0
DrawFrameStats
Flip
Here are the performance results (frames per second) on my Intel Core 2 Dou 2.2 Ghz processor:
| Image Size |
GNU C++ compiler |
Clang C++ (LLVM) |
Visual C++ |
Bird |
| 512x360 |
8 |
9 |
14 |
18 |
| 256x180 |
31 |
35 |
53 |
60 |
Analyzing the Created Assembly Code
The ArgbColor Function
The ArgbColor function takes 4 byte parameters. These are passed in the al, ah, dl, dh registers, in that order. al is the first byte, ah is the second byte of the eax register. The result will be stored in the eax register too.
The ah register will be overwritten, so it copies it to the cl register. Then it clears the high 3 bytes of the eax register by doing a bitwise and on them and shifts the last byte from blue color place to the alpha. It does almost the same with the others.
_ArgbColor:
mov cl, ah
and eax, 0xFF
shl eax, 24
and ecx, 0xFF
shl ecx, 16
or eax, ecx
movzx ecx, dl
shl ecx, 8
or eax, ecx
movzx ecx, dh
or eax, ecx
ret
The GetDistance Function
The first three parameter is stored in the first three SSE register, the other is on the stack. The return value is stored in xmm0. The xmm4, xmm5 and xmm6 registers are saved, because in ascall, unlike other 32 bit calling conventions, the xmm4-7 registers must be unchanged after a function call. Bird doesn't save the stack pointer to ebp register, it rather uses it to store data, but it makes assembly less readable, because you may don't know the variables' positions. It even don't subtract 16 from esp, because no functions are called by GetDistance function.
_GetDistance:
movdqu dqword[esp - 16], xmm4
movsd xmm3, xmm2
subsd xmm3, xmm0
movsd xmm4, qword[esp + 4]
subsd xmm4, xmm1
movsd xmm1, xmm3
mulsd xmm1, xmm3
movsd xmm2, xmm4
mulsd xmm2, xmm4
addsd xmm1, xmm2
sqrtsd xmm0, xmm1
movdqu xmm4, dqword[esp - 16]
ret 8
The Wrap Function
This is the most simple function. It uses conditional moves. The disadvantage of this is that the cmov instruction's second parameter must be a memory position, and reading memory is slow. But I read the conditional jumps are also slow if we do it a lot because of branch prediction.
_Wrap:
cmp eax, 0
cmovl eax, dword[_11]
cmp eax, 255
cmovg eax, dword[_18]
ret
The Other Functions
I think describing them doesn't make much sense. The only thing I would mention is that in the future, I will add function inlining to improve performance. I suppose G++ does the same.
3rd Sample Program: Fire
|
This sample increases values based on
the distance to the mouse. These values are stored in a global array called Array.
The colors assigned to the them are stored in the Colors array. It uses
tuples extensively to reduce code size.
The sample uses some of functions from the previous sample. The GetDistance function
is vectorized, it takes two tuples as parameters. The GetValue function returns the value of
Array safely, if the parameters are outside of ImageSize, it returns 0.
The Update function firstly blurs the image by getting the avarage of nearby pixels.
It divides with 7.1 instead of 7 to make it fade out. Then it increases the
values in Array based on distance to the mouse.
UpdateImage function updates the image which will be scaled up to the whole window.
In this sample almost all of the four compilers have the same performance, but it can't be really measured because
FPS sometimes drops down really much.
| |
using System
using BlitzMax
const var
WindowSize = (1024, 720),
ImageSize = (512, 360),
RectSize = 70,
ColorNumber = 768
float[ImageSize.0, ImageSize.1] Array
int[ColorNumber] Colors
int ArgbColor(byte a, r, g, b)
return (int)a << 24 | (int)r << 16 | (int)g << 8 | b
float GetDistance((float, float) P1, P2)
var x = P1.0 - P2.0, y = P1.1 - P2.1
return (float)Math.Sqrt(x * x + y * y)
byte Wrap(int x)
if x < 0: x = 0
if x > 255: x = 255
return (byte)x
float GetValue(int x, y)
if 0 <= x < ImageSize.0 and 0 <= y < ImageSize.1
return Array[x, y]
return 0f
void Update()
for var x, y in 0 .. ImageSize.0, 0 .. ImageSize.1
var Value = Array[x, y] * 5
Value += GetValue(x + 1, y + 1) * 0.5
Value += GetValue(x - 1, y + 1) * 0.75
Value += GetValue(x - 2, y + 2) * 0.5
Value += GetValue(x - 3, y + 3) * 0.25
Array[x, y] = Value / 7.1
var Mouse = ((float)MouseX(), (float)MouseY()) * ImageSize / WindowSize
var P1 = (int, int)Mouse - RectSize, P2 = (int, int)Mouse + RectSize
if P1.0 < 0: P1.0 = 0
if P1.1 < 0: P1.1 = 0
if P2.0 >= ImageSize.0: P2.0 = ImageSize.0 - 1
if P2.1 >= ImageSize.1: P2.1 = ImageSize.1 - 1
for var x, y in P1.0 ... P2.0, P1.1 ... P2.1
var Distance = 1 - GetDistance(Mouse, (x, y)) / RectSize
if Distance > 0f
var NewVal = Array[x, y] + Distance / 10f
if NewVal > 1f: NewVal = 1f
Array[x, y] = NewVal
void UpdateImage(IntPtr Image)
var Pixmap = LockImage(Image, 0, true, true)
for var x, y in 0 .. ImageSize.0, 0 .. ImageSize.1
var Color = Colors[(int)(Array[x, y] * (ColorNumber - 1))]
WritePixel Pixmap, x, y, Color
void Initialize()
for var x, y in 0 .. ImageSize.0, 0 .. ImageSize.1
Array[x, y] = 0f
for var i in 0 .. ColorNumber
var Value = (float)i / (ColorNumber - 1)
var Red = (int)((Math.Min(Value, 1f / 3)) * 3 * 255)
var Green = (int)((Math.Min(Value, 1.65f / 3) - 0.65f / 3) * 3 * 255)
var Blue = (int)((Math.Min(Value, 1f) - 2f / 3) * 3 * 255)
Colors[i] = ArgbColor(255, Wrap(Red), Wrap(Green), Wrap(Blue))
public stdcall void Main()
Initialize
Graphics WindowSize
var Image = CreateImage(ImageSize, 1, -1)
while not KeyHit(Keys.Escape) and not AppTerminate()
Cls
Update
UpdateImage Image
DrawImageRect Image, (0, 0), WindowSize, 0
DrawFrameStats
Flip
Some Basic Syntax
Literals
The type of integer literals can be specified by a suffix that is a lower case short form of the type name. The hexadecimal numbers have to be upper case in order to distinguish them:
$FFb '' An unsigned byte
-$1Asb '' A signed byte
%100 '' A binary number
For loops
for var x, y in 0 ... Width, 0 ... Height
C# equivalent:
for (var x = 0; x <= Width; x++)
for (var y = 0; y <= Height; y++)
In certain cases you don't want to reach the value at right side. This can be indicated by 2 dots instead of 3:
for var x, y in 0 .. Width, 0 .. Height
C# equivalent:
for (var x = 0; x < Width; x++)
for (var y = 0; y < Height; y++)
Strings and Arrays
I've implemented the most important string methods that are found in .NET so far:
using System
using BlitzMax
public void Main()
Print "adfdfgh".PadRight(10) + "Valami"
Print "adfdh".PadRight(10) + "Valami"
Print
Print "adfdfgh".Contains("fdf")
Print "adfdfgh".Contains("fdfh")
Print
Print "adfdfgh".Replace('d', 'f')
Print "adfdfgh".Replace("d", "ddd")
Print "adfdfghléáuúsdf".ToUpper()
This is how the reference array can be created and initialized:
public void Main()
var Array1D_1 = new int[234]
var Array1D_2 = new int[]: 1, 2, 3
var Array1D_3 = new int[]:
1
2
3
var Array2D_2 = new int[,]: (1, 2), (3, 4)
var Array2D_2 = new int[,]:
1, 2
3, 4
var Array2D_3 = new int[,]:
(1000, 1001, 1002, 1003, 1004, 1005
1006, 1007, 1008, 1009, 1010, 1011)
(2000, 2001, 2002, 2003, 2004, 2005
2006, 2007, 2008, 2009, 2010, 2011)
The implementation of these functions can be found in "Bird\Libraries\BirdCore\String.bird" and "Bird\Libraries\BirdCore\Array.bird". Because there is no gc yet, these objects will be in memory until the program stops.
Structures and Enumerations
Basically, these are the same as C#:
struct Rect
public float X, Y, Width, Height
public Rect(float X, Y, Width, Height)
this.X = X
this.Y = Y
this.Width = Width
this.Height = Height
public Rect Copy()
return new Rect(X, Y, Width, Height)
public Rect Copy_2()
return new Rect():
X = this.X
Y = this.Y
Width = this.Width
Height = this.Height
public float Area:
get return Width * Height
public float GetValue(int x)
switch x
case 0: return X
case 1: return Y
case 2: return Width
case 3: return Height
default return 0
static class Console
public static void PrintInt(int x)
public static void PrintLong(long x)
public static void PrintDouble(double x)
enum Keys
A = 65, B, C, D, E, F, G, H, I, J, K, L, M, N, O
P, Q, R, S, T, U, V, W, X, Y, Z,
F1 = 112, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12
flag Modifiers
None = 0
Shift = 1, Control, Option, System
Alt = Option, Menu = Option, Apple = System, Windows = System
For the flag type, the automatic value is always multiplied by 2 unlike C#. It adds 1 even if it's declared with Flags attribute. The switch command doesn't need case statements to jump out from the switch.
Parameters with ref, out
I made a small addition to C#, it's possible to declare a variable after the out keyword:
using System
using BlitzMax
public void PrintFunc(ref int x)
Print x
x++
public void Func(out int x)
x = 10
public stdcall void Main()
Func out var x
PrintFunc ref x
PrintFunc ref x
PrintFunc ref x
A variable passed with ref must have a value, out parameters must set a value before
leaving the function. These checks can be avoided by using unsafe_ref.
Named and Optional Parameters
It's similar to C#. This is how BlitzMax.Grapics function looks in Bird:
public extern cdecl asmname("_brl_graphics_Graphics")
IntPtr Graphics(int Width, Height, Depth = 0, Hertz = 60, Flags = 0)
To call it, it's possible to miss the last parameters, or by specifying each parameter individually:
Graphics 800, 600
Graphics Width: 800, Height: 600
Graphics 800, 600, Hertz: 75
Properties and Indexers
class Class
int _Something
public int Something:
get return _Something
set _Something = value
public int AutomaticallyImplementedProperty:
get
set
public int this[int Index]:
get return Index * 2
public int NamedIndexer[int Index]:
get return this[Index]
public void Main()
var Obj = new Class()
PrintInt Obj[3]
PrintInt Obj.NamedIndexer[4]
Operator Functions
class Class
int _Something
public static void operator ++(Class Obj)
Obj._Something++
public void Main()
var Obj = new Class()
Obj++
Tuples
The tuples allow storing multiple values in a variable with different types. You can write the multiplication of vectors in Bird language like this:
type float3 = (float x, y, z)
float3 Cross(float3 a, b)
return a.y * b.z – a.z * b.y,
a.z * b.x – a.x * b.z,
a.x * b.y – a.y * b.x
If I don’t define the float3 type or name the members, they can be identified by numbers:
float, float, float Cross((float, float, float) a, b)
return a.1 * b.2 – a.2 * b.1,
a.2 * b.0 – a.0 * b.2,
a.0 * b.1 – a.1 * b.0
For the simple vector operations (addition, subtraction), you don’t have to call functions, it's enough to write a + b.
If I want to manage the members of the tuple, it's enough to separate the components by commas at the left of the assignment:
float x, y, z
x, y, z = Cross(a, b)
It’s also possible to declare and give values to the variable, but in this case you have to write out the types to be unambiguous:
(var x, var y, var z) = Cross(a, b)
Swapping of Variables
It swaps two or more data simple without used temporary variable:
a, b = b, a
Without this, the only way is to declare new variables or create a function that does the swapping, but this is also not perfect because the parameter can be only type. In order to be able to output values in parameters, I use the ref types. When you pass a parameter with ref, you pass the address of the variable, not its value:
void SwapInts(ref int a, b)
var c = a
a = b
b = c
'' Usage:
SwapInts ref a, ref b
In C, pointers are also needed:
void SwapInts(int* a, int* b)
{
int c = *a;
*a = *b;
*b = c;
}
SwapInts(&a, &b);
Function Return Value
Returning with a variable that is created at the beginning of a function is a common case. This function calculates the length of a zero-terminated string where the return command is not necessary:
public static UInt_PtrSize StringLength(char* Str)
: Ret = 0
while Str[Ret] != '~0'
Ret++
It's possible to create multiple variables, in this case the return type will be a tuple.
Referencing Functions Written in Other Languages
You can reference them with the extern modifier. The asmname enables you to specify the name that this identifier will have in the compiled code. You can also give the calling convention of the function. Currently, my language supports cdecl, stdcall, and ascall:
public extern cdecl asmname("_brl_polledinput_KeyHit") bool KeyHit(Keys key)
In C, there wouldn't be asmname setting, you need to use it with the full name or declare a macro:
void brl_polledinput_KeyHit(uint KeyCode);
#define KeyHit brl_polledinput_KeyHit
The default function call is ascall that uses eax, edx and ecx for parameters, but passes pointers, classes, etc. first, then the numbers (except the floating point numbers because it's not possible to calculate if they are in general registers). It can also handle high and low byte of the registers that have that variant to store two data in the same register.
Including Assembly in Source Code
Here is an example (System.Math.Truncate from BirdCore library):
public static asm double Truncate(double x)
fldcw $[_x86Helper.TruncateFPUControlWord]
fld qword[esp + 4]
frndint
fldcw $[_x86Helper.DefaultFPUControlWord]
ret 8
As you can see, it's possible to use globals with $[ ]. For function parameters direct location is needed, if I implement it, the code would be less optimal, because not all of the instructions' operand data types are allowed.
Getting the address of a r value
When the address is queried from something that is not a variable, it's copied to variable that only lives in the scope of the command:
using System
rem This constructors from the array class will be called:
public Array(int DimensionCount, Int_PtrSize ItemSize, Int_PtrSize* Dimensions)
int[,] CreateIntArray2D(int Width, Height)
var Class = new Array(2, 4, [Width, Height])
return reinterpret_cast(:int[,])(Class)
int[] CreateIntArray1D()
var Class = new Array(1, 4, &16)
return reinterpret_cast(:int[])(Class)
The type of [Width, Height] is int[2] (that is a fixed size array), so when it casted to int* the address queried.
reinterpret_cast basically does nothing, just change the type of an expression node like casting a pointer.
Overview of Bird's Structure
Identifiers
These can be types, variable or functions. All variables and functions have types. The latter is determined by the return value and the parameter’s type. The variables are also divided into ground. They can be local, global or member of a structure.
Modifiers
They can be found before the declaration (e.g., cdecl, asmname, etc.). They are isolated and stored in a list that will receive a function that does the declaration. For example, if a variable is marked with static, then it will create a global identifier.
Expressions
These are parsed by different recognizers. It’s possible to create a new or replace one without a large modification in the language. An expression recognizer can recognize more operations. E.g., The Multiplicative recognizes multiplication, division and modulus.
If it succeeded, then it should call the plugin that the recognizer received by parameter.
Plugins
The plugins can modify the recognized expression. The MultiPlugin class manages multiple plugin joint work. The type manager plugin’s task is to determine the type of the expressions and do the conversions. The calculator plugin calculates a value if we want to operate with constants.
If the expression recognizer wants to declare a variable, it should call a plugin’s function. If it can’t do this, it gives an error message. The multi plugin redirects the declaration to the compiler plugin, if it exists.
Preprocessor
It processes all statements starting with #. You can create a macro or if a condition is true, run another code. There’s a preprocessor plugin that replaces the expressions created by call recognizer to the given expression:
#define min(x, y) if x < y then x else y
#define max(x, y) if x > y then x else y
int Something(int i, j)
return max(i, j) / min(i, j)
The preprocessor produces this from it:
int Something(int i, j)
return (if i > j: i else j) / (if i < j: i else j)
Scopes
In Bird language, statements that have the same number of whitespace characters belong to a scope. First the compiler creates a global scope, then calls the preprocessor, processes the types and then the functions and variables. These are processed multi-threaded.
In the code, scopes can contain commands that can also contain things. E.g., an if command has an expression which is the condition and one or more code scopes. A function scope is the outermost scope in a function that is also a code scope.
Architectures
It’s possible to do more architecture that create assembly code. If necessary, it can use a plugin or attach data to identifiers, identifier containers. Currently the x86 architecture is implemented.
Basic Insight into Code Parsing
The language consists of recognizers, and supporting new languages can be done by replacing recognizers. The functions, types, variables will be available between Bird and the new languages. I don't use parser generators, because I think using other software could prevent from implementing some of the functionalities or I would have to do things in a different way.
The CompilerState class has a Language object that contains all recognizers. Here is the Language class:
public abstract class Language : LanguageNode
{
public IList<IExprRecognizer> ExprRecognizers;
public IList<IIdRecognizer> IdRecognizers;
public IList<ICommRecognizer> CommRecognizers;
public IList<IModRecognizer> ModRecognizers;
public IList<INameGenerator> NameGenerators;
public IList<IResultSkippingHandler> GlobalHandlers;
public IArgRecognizer ArgRecognizer;
public IGenericRecognizer GenericRecognizer;
public IGroupRecognizer GroupRecognizer;
public IRetValLessRecognizer RetValLessRecognizer;
public IInnerScopeRecognizer InnerScopeRecognizer;
public IVarDeclRecognizer VarDeclRecognizer;
public ITypeDeclRecognizer TypeDeclRecognizer;
public IConstDeclRecognizer ConstDeclRecognizer;
public INamespaceDeclRecognizer NamespaceDeclRecognizer;
public IDeclarationRecognizer DeclarationRecognizer;
public IGlobalScopeProcessor GlobalScopeProcessor;
public ICodeFileProcessor CodeFileProcessor;
public ICodeProcessor CodeProcessor;
...
}
String Handling
Bird uses StringSlice to store strings to avoid a new string allocation when it calls string.Substring. This structure also has functions for bracket handling and other operations. The CodeString structure consists of a StringSlice and a reference to CodeFile class that can also manage lines and indents of them. CodeString has a Line field and when its substring is needed, it updates the Line member. Most of the CodeString's methods are just wrapper around StringSlice's functions.
The ISkippingHandler interface is used to skip results that are in a string constant or bracket. The recognizers can store data in ResultSkippingManager.Datas, if it's needed. In some cases, it's necessary to make sure that the brackets are not skipped. A bracket recognizer shouldn't skip brackets if ResultSkippingManager.DoNotSkipBrackets is true.
public class BracketRecognizer : LanguageNode, IExprRecognizer,
IIdRecognizer, IResultSkippingHandler
{
...
public int SkipResult(ref ResultSkippingManager RSM)
{
if (!RSM.DoNotSkipBrackets && Helper.GetBracket(RSM.CurrentChar, RSM.Back))
{
var Handlers = RSM.SkippingHandlers;
if (RSM.Back)
{
var NewString = RSM.String.Substring(0, RSM.Current + 1);
var Pos = NewString.GetBracketPos(true, Handlers);
if (Pos != -1) return Pos;
}
else
{
var NewString = RSM.String.Substring(RSM.Current);
var Pos = NewString.GetBracketPos(false, Handlers);
if (Pos != -1) return Pos + RSM.Current;
}
}
return -1;
}
}
So this is the BracketRecognizer class. It has to return an index where the string search should be continued or -1. The GetBracketPos method returns the position of the bracket's pair which the string ends or starts with. If the first parameter is false, the string has to begin with a bracket that looks to the right.
LanguageNode Class and Recognizers
The LanguageNode class contains an array of strings called Operators that it recognizes and another array of strings (LanguageNode.Skip) that it should skip, because the operators, that it has to recognise, is the part of them. E.g., an addition recognizer has +, - operators then the skip array should contain ++, --. The NewLine strings contain which operators need the previous or the next line. E.g., if the line ends with +, then it will continue with the next:
public abstract class LanguageNode
{
public DataList Data = new DataList();
public Language Root;
public LanguageNode Parent;
public IList<LanguageNode> Children;
public IList<IResultSkippingHandler> SkippingHandlers;
public string[] Operators;
public string[] SkipFind;
public string[] Skip;
public string[] NewLineLeft;
public string[] NewLineRight;
public string[] OnlyLeft;
public string[] OnlyRight;
...
}
Recognizations are done in a recursive way. I think it's simple but maybe not the fastest way to parse a code file. To decrease the time spent on string operations, I use my own functions which are optimized for these things and significantly faster, although far more time spent on code generation. Currently my compiler can process 10 000 lines of code in about 1 second.
An expression recognizer should inherit from this class and IExprRecognizer interface. Recognizers can use Find2 function to determine which operators aren't unary. If the left side of the result ends with an operator or there's nothing, it will ignore that result. When it's not enough to use the LanguageNode.Operators, for example the cast recognizer needs to implement the IFind2Handler interface. The following code is the addition recognizer:
public class AdditiveRecognizer : LanguageNode, IExprRecognizer
{
public AdditiveRecognizer()
{
Operators = new string[] { "+", "-" };
NewLineLeft = NewLineRight = Operators;
}
public ExprRecognizerRes Recognize
(CodeString Code, ExprPlugin Plugin, ref ExpressionNode Ret)
{
var Result = ExprRecognizerHelper.Find2(this, Plugin.Container, Code.String);
if (Result.Position != -1)
{
var Ch = ExprRecognizerHelper.TwoParamOpNode(Code, Result, Plugin);
if (Ch == null) return ExprRecognizerRes.Failed;
Operator Op;
if (Result.Index == 0) Op = Operator.Add;
else if (Result.Index == 1) Op = Operator.Subract;
else throw new ApplicationException();
Ret = new OpExpressionNode(Op, Ch, Code);
return ExprRecognizerRes.Succeeded;
}
return ExprRecognizerRes.Unknown;
}
}
This still wouldn't be compatible with the NumberRecognizer, because floating point numbers can have e notation (2e+2 = 2 * 102). To solve this problem, the NumberRecognizer adds a result skipper to LanguageNode.SkippingHandlers. ExprRecognizerHelper.Find2 takes a LanguageNode parameter first, which is used to get the skipping handlers from.
The Code Generator
This is most complex part of Bird. It is about the third of the whole project. These are major steps it does to compile a function:
- Processes expressions
- Calculates the expression result storage locations
- Calculates the locals' locations
- Checks whether there is a need for temporary register (in x86 assembly most instructions can only have one memory operand, some must be either memory location or general register, etc.)
- If a temporary register needed restarts the data location allocation, allocates the temporary registers and continues at 2nd point.
- Generates assembly (without jumps)
- Jump optimizations
- Jump replacements
The Jump Optimizations and Replacements
int Fib(int x)
if x < 2: return 1
else return Fib(x - 1) + Fib(x - 2)
This function would have the following assembly without jump optimizations:
_Fib:
push ebx
push ebp
mov ebp, eax
cmp ebp, 2
jge _26
mov eax, 1
jmp _2
jmp _24
_26:
mov eax, ebp
dec eax
call _Fib
mov ebx, eax
mov eax, ebp
sub eax, 2
call _Fib
add eax, ebx
jmp _2
_24:
_2:
pop ebp
pop ebx
ret
The label _26 and _24 are for the condition and the label _2 is for function return. There are some unnecessary jumps and labels that should be deleted:
_Fib:
push ebx
push ebp
mov ebp, eax
cmp ebp, 2
jge _26
mov eax, 1
jmp _2
_26:
mov eax, ebp
dec eax
call _Fib
mov ebx, eax
mov eax, ebp
sub eax, 2
call _Fib
add eax, ebx
_2:
pop ebp
pop ebx
ret
It could save a jump if it does the same as where it jumps to. Unless it is too long, it replaces the jump:
_Fib:
push ebx
push ebp
mov ebp, eax
cmp ebp, 2
jge _26
mov eax, 1
pop ebp
pop ebx
ret
_26:
mov eax, ebp
dec eax
call _Fib
mov ebx, eax
mov eax, ebp
sub eax, 2
call _Fib
add eax, ebx
pop ebp
pop ebx
ret
Temporary Register Handling
The IsEqual function needs 3 temporary register to fulfil that the mov instruction can only have one memory operand, and the memory location can't have a memory address:
long* g_Pointers
int g_Index
long g_Value
bool IsEqual()
return g_Value == g_Pointers[g_Index]
_IsEqual:
mov eax, dword[_g_Pointers]
mov edx, dword[_g_Index]
mov ecx, dword[eax + edx * 8]
cmp dword[_g_Value], ecx
jne _27
mov ecx, dword[eax + edx * 8 + 4]
cmp dword[_g_Value + 4], ecx
jne _27
mov al, 1
ret
_27:
xor al, al
ret
Condition Assembly Generation
This is the main function for condition assembly generation (Bird.CodeGenerator.GetConditionCode):
void GetConditionCode(ExpressionNode Node, int Then, int Else, bool ElseAfterCondition)
The Then and Else are the index of labels. If ElseAfterCondition is
True, then the condition is followed by the else scope and the jump's condition is the same as the operator and it jumps to the Then label
False, then the condition is followed by the then scope and the jump's condition is negated and it jumps to the Else label
Here is an example:
void WhileTest(bool Loop)
var a = 0
while Loop
a++
void DoWhileTest(bool Loop)
var a = 0
do
a++
while Loop
; This is the assembly without jump replacements
_WhileTest:
xor edx, edx
_90:
test eax, eax
jnz _18
inc edx
jmp _90
_18:
ret
_DoWhileTest:
xor edx, edx
_93:
inc edx
test eax, eax
jz _93
ret
; After jump replacements
_WhileTest:
xor edx, edx
test eax, eax
jnz _18
_89:
inc edx
test al, al
jnz _89
_18:
ret
_DoWhileTest:
xor edx, edx
_93:
inc edx
test eax, eax
jz _93
ret
In the WhileTest function, the condition is followed by the then scope, ElseAfterCondition is false. In the DoWhileTest it's followed by the return command (which happens if the condition is false), the ElseAfterCondition is true.
This is how the and and or operators work:
void While_OrTest(bool x, y, z)
var a = 0
while x or y or z
a++
void DoWhile_OrTest(bool x, y, z)
var a = 0
do
a++
while x or y or z
void While_AndTest(bool x, y, z)
var a = 0
while x and y and z
a++
void DoWhile_AndTest(bool x, y, z)
var a = 0
do
a++
while x and y and z
_While_OrTest:
xor ecx, ecx
_45:
test al, al
jnz _44
test ah, ah
jnz _44
test dl, dl
jz _7
_44:
inc ecx
jmp _45
_7:
ret
_DoWhile_OrTest:
xor ecx, ecx
_50:
inc ecx
test al, al
jnz _50
test ah, ah
jnz _50
test dl, dl
jnz _50
ret
_While_AndTest:
xor ecx, ecx
_57:
test al, al
jz _9
test ah, ah
jz _9
test dl, dl
jz _9
inc ecx
jmp _57
_9:
ret
_DoWhile_AndTest:
xor ecx, ecx
_62:
inc ecx
test al, al
jz _10
test ah, ah
jz _10
test dl, dl
jnz _62
_10:
ret
In all the while and do-while tests, the last condition remains the same, the ElseAfterCondition is only changed for the first two conditions. For the and conditions, the ElseAfterCondition is false, for the or conditions it's true.
Here is an example that contains both operators:
void While_AndOrTest(bool x, y, z, w)
var a = 0
while (x or y) and (z or w)
a++
void DoWhile_AndOrTest(bool x, y, z, w)
var a = 0
do
a++
while (x or y) and (z or w)
_While_AndOrTest:
xor ecx, ecx
_71:
test al, al
jnz _72
test ah, ah
jz _11
_72:
test dl, dl
jnz _70
test dh, dh
jz _11
_70:
inc ecx
jmp _71
_11:
ret
_DoWhile_AndOrTest:
xor ecx, ecx
_77:
inc ecx
test al, al
jnz _79
test ah, ah
jz _12
_79:
test dl, dl
jnz _77
test dh, dh
jnz _77
_12:
ret
Plans and Ideas
Currently the most important is to make the language stable and usable. I'm planning to make exceptions, reflection, interfaces, etc. and gc soon.
Inlined Functions Inside Expressions
This could help to create IEnumerable<T> instances. I would create a recognizer to find commands, and then it would create an inline lambda
like function and call it. C# doesn't support even the yield keyword in lamdas.
void Something(int* pInt, int Count)
AnotherFunction for var i = 0 until Count
yield pInt[i]
void AnotherFunction(IEnumerable(:int) list)
History
- 19/5/2012: Arrays, object initializations, address can be taken of r values, ref, out parameters, added Visual C++ compilation of samples
- 2/5/2012: Improved performance, identifier aliases instead of typedefs, strings, reference equality operator (===, !==), binary files can be linked into the assembly, changed the name from Anonymus to Bird
- 20/4/2012: Implemented indexers and operator functions
- 30/3/2012: Simplified command line usage and some bug fixes
- 24/3/2012: Finished SSE support for scalar floating point calculations and added a new data location manager,
that made the Circles sample run much faster. The Fire sample now has a C++ version to compare performance.