Click here to Skip to main content
15,662,426 members
Articles / Programming Languages / VC++
Posted 5 May 2015


8 bookmarked

Compiler and language 2

Rate me:
Please Sign up or sign in to vote.
5.00/5 (3 votes)
5 May 2015CPOL5 min read


A continuation of my previous article related to Compiler and language ,


As this is a continuation, it would be best if you go through my previous article related to this topic. As before, the input program is provided using program.txt

A series of programs are provided in the attachment to give you a fair idea of this language's syntax. They exploit its function calling, array, auto variable capabilities

You will require a strong back ground in assembly programming since the output of a language is an assembly, though I have generated a C file with inline assembly to make it easier for us to view the results of the variables.

We will extend the language to support simple concepts (that we take for granted). As before the syntax must support both versions: compiler and interpreter, this allows users of this language to maintain code coherence.


Array access via array index is supported using the following syntax:


Here we can see that array 'a1' is written to at index 1+z, the value written is 24+3*s



Here we read the value of array a1 at index z*2

AUTO (variable on stack)

These variables have their scope limited to the function call , i.e. their scope is limited from the LABEL under which they are defined till the RETURN the program will encounter.

Usage of AUTO variables:


This will create an auto variable 'a' and associate value 200 to it


This will read the value from AUTO variable 'a', this variable must exists in the scope of the function call, else the asm code generated will be incorrect, (we can always place a simple check for this).

One optimization:- tail call :

This optimization is enabled for VS2012 release build and gcc -o3 optimization

It is a simple optimization that checks if additional stack frame creation can be ignored and the code generated for a 'call' can be replaced by 'jmp' (this saves the trouble of pushing return address to stack), a 'return' from such optimization forces the code to return to a previous function.

Using the code

As with my previous articles, the code must be referred to at all times and debugged to get a better understanding of how it works.

Let's look at array support

We start by declaring arrays via DIM:a1:10, this creates a one dimension array of size 10, this is required during the compilation stage since this array size has to be declared globally, for the interpreter, this is ignored

For working with arrays, we need 2 temporary variables: __TarrayIndexProcessing_ , __TarrayValueProcessing_. These variables are used for calculating and holding array index and value at that index. You will notice that index processing for compilers have one additional step: to multiply by 4 since the size of int is 4 bytes to correctly access the required element . This is not required for interpreter since it holds the values using:

struct arrayvar{
 string m_name; 
 int index;
 DWORD sz;  //maintains the size of the whole array (only one instance is actually required per name)

map<arrayvar,int> arrayList;

Notice that arrayvar (used by interpreter) uses index and name to uniquely identify an element, please refer to code to view the comparator function required by the map.

AUTOs  (variables on stack)

Variables on stack for interpreter is fairly straight forward, we must maintain the stack frame for every function call

This is done using the following line of code

stack< map<string,pair<UINT,int>> *> stackFrame; 

map<string,pair<UINT,int>> &autoStack=*; //maintains the current stack frame

Stack frame is created at the start of the program and at all times we point to the top of the stack, when a function call is encountered, a new stack frame is created and top of stack is updated, this ensures that the current stack is always maintained.

Please refer to CALL functionality implementation for the interpreter.

Similarly RETURN: functionality pops the stack and updates the top of stack.

Stack implementation for compiler is not very straight forward as the code has to parsed multiple times to find the definition of this variable in current LABEL scope.

If variable is not found within LABEL scope, then a variable is created by calling push element

The below function looks up auto variables from the source code, since we are in compilation phase the source code is available to find the index with respect to LABEL keyword.

int getVariablesIndex(string varName, UINT uiLineNumber)

It returns -1 if variable is not found else the index on stack frame if variable is found



Auto variable 'a' will be at index 1
Auto variable 'c' will be at index 2
Accessing these variables will be via EBP (this register is used for holding the frame pointer)
We never use ESP since its value can change when a push/pop/call is encountered
a will be accessed using EBP-4,
c will be used accessing EBP-8

Tail call:-

This optimization is similar for compilers and interpreter.

We parse through the file after the CALL is processed to look for an immediate RETURN, we ignore comments generated using '#'. If RETURN is found immediately after the CALL (ignoring comments) , we know that this code can be tail call optimized by placing a 'JMP' instead of a 'call', if AUTOs are used then additional code is required before RETURN for stack clean up and will not perform tail call optimization.

If RETURN is not immediately encountered , then the code has to return to a point after CALL to execute the statements found before RETURN, hence a 'call' will be used and not a 'JMP'.

This optimization is coded in place where 'CALL' language keyword is being evaluated, please refer to the code.


The above program will output the below displayed assembly, notice that JMP f1 is used in place of call f1

jmp f1

The above program will output the below displayed assembly, notice that call f1 was used since tail call optimization was not used

sub EBP,4
call f1
add EBP,4

Code usage

The following input program:-

# c will hold the value 200

Notice that a call has been made to 'f' , an AUTO variable 'a' has been used and then a call to 'f1', and a new AUTO variable 'a' has been created , notice in the compile output that the stack will get cleaned up and when 'f1' returns , AUTO:a will correctly point to the variable belonging to its scope, therefore c will hold value '200' and not '300'.

Points of Interest

This article introduces the implementation of ARRAYs and AUTOs in compilers and interpreters. We also see the active use of EBP for maintaining stack frames so as to correctly access stack variables.


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By
Instructor / Trainer
India India
I have been working with computers since my eight grade, programming the ZX Spectrum. I have always had an interest in assembly language and computer theory (and is still the reason for taking tons of online courses), actively code using C/C++ on Windows (using VS) and Linux (using QT).

I also provide training on data structures, algorithms, parallel patterns library , Graphics (DX11), GPGPUs (DX11-CS,AMP) and programming for performance on x86.
Feel free to call me at 0091-9823018914 (UTC +5:30)

(All views expressed here do not reflect the views of my employer).

Comments and Discussions

QuestionNice Pin
Kerem Guemruekcue29-Aug-15 12:22
Kerem Guemruekcue29-Aug-15 12:22 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.