Compiler in action- C/C++ to Machine

By Maruf Maniruzzaman

What the compiler generates from C/C++ source code
C++
Posted:6 Jun 2008

Introduction

What happens when I hand my C/C++ code to a compiler? It generates machine code, but I want to see exactly what machine code it generates. I use the compiler that ships with Visual C++ 2008, targeting 32-bit x86. Other versions should produce similar, if not identical, output.

Producing Assembly Output

With Visual Studio we can produce assembly language output with the following settings:

Project Property Pages > Configuration Properties > C/C++ > Output Files
Assembler Output: Assembly With Source Code (/FAs)

The compiler then writes an assembly listing in which each block of generated code is preceded by the C/C++ source line it came from. It is very useful for understanding how the compiler works.
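The same listing can also be requested from the command line. The following is a minimal sketch, assuming the Visual C++ command-line compiler (cl) is available in a Visual Studio command prompt. The file name listing_demo.cpp and the functions Square and SumOfSquares are made up for illustration.

```cpp
// listing_demo.cpp (hypothetical file name)
//
// Assumed command line, run from a Visual Studio command prompt:
//     cl /c /FAs listing_demo.cpp
// /c compiles without linking; /FAs asks for an assembly listing with the
// source lines interleaved as comments (normally listing_demo.asm).
#include <cassert>

int Square(int n)
{
    return n * n;
}

// A caller, so that the listing also contains a CALL instruction.
int SumOfSquares(int a, int b)
{
    int result = Square(a) + Square(b);
    assert(result >= 0);   // squares cannot be negative (absent overflow)
    return result;
}
```

Opening the generated .asm file and searching for the source comments is usually the quickest way to find the code for a particular line.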

Function

A compiled function consists of a prolog, its body, an epilog and a ret instruction. The prolog and epilog maintain the stack frame and the storage for local variables.

Prolog and Epilog

The prolog is a set of instructions that the compiler generates at the beginning of a function, and the epilog is generated at the end of a function. These two maintain the stack, local variables, registers and unwind information.

Every function that allocates stack space, calls other functions, saves nonvolatile registers, or uses exception handling must have a prolog whose address limits are described in the unwind data associated with the respective function table entry. The prolog saves argument registers in their home addresses if required, pushes nonvolatile registers on the stack, allocates the fixed part of the stack for locals and temporaries, and optionally establishes a frame pointer. The associated unwind data must describe the action of the prolog and must provide the information necessary to undo the effect of the prolog code [MSDN].

Let us see what is generated as prolog and epilog. We have a function named add like this:

int add(int x, int y)
{
    int p=x+y;
    return p;
}
And the generated assembly listing:
_p$ = -4      ; size = 4
_x$ = 8       ; size = 4
_y$ = 12      ; size = 4
?add@@YAHHH@Z PROC     ; add, COMDAT
; 12   : {
;Prolog
push ebp
mov ebp, esp
push ecx

; 13   :  int p=x+y;
mov eax, DWORD PTR _x$[ebp]
add eax, DWORD PTR _y$[ebp]
mov DWORD PTR _p$[ebp], eax
; 14   :  return p;
mov eax, DWORD PTR _p$[ebp]
; 15   : }

;Epilog
mov esp, ebp
pop ebp

ret 0 ; pops only the return address; the caller removes the arguments (__cdecl)
?add@@YAHHH@Z ENDP     ; add

Not much work is involved. In the prolog the compiler saves the caller's EBP register, copies ESP into EBP and from then on uses EBP as the frame pointer. Here push ecx simply reserves four bytes for the local variable p. In the epilog it restores ESP and EBP. When a function needs more local storage, the prolog subtracts the required size from ESP instead. The two instructions ENTER and LEAVE can be used in place of these push/mov/pop sequences.

Function Parameters/Local Variables

Function parameters sit at positive offsets from the frame pointer (EBP) and local variables at negative offsets. The caller pushes the parameters on the stack before the call, and the function itself may initialize its local variables. In the previous listing, parameters x and y are at offsets 8 and 12, and the local variable p is at offset -4 from EBP. Offset 0 holds the saved EBP and offset 4 holds the return address.
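To make the offsets concrete, here is a toy model of the stack that replays the listing for add one instruction at a time. It is only a sketch: ToyStack, SimulatedAdd and CallSimulatedAdd are hypothetical names, the return address is a dummy value, and the caller side assumes the default __cdecl behaviour in which the caller removes the arguments.

```cpp
#include <cassert>
#include <vector>

// Toy model of a 32-bit x86 stack. Addresses are byte offsets into `mem`
// and the stack grows toward lower addresses, as on real hardware.
struct ToyStack
{
    std::vector<unsigned int> mem;  // one element per 4-byte stack slot
    unsigned int esp;               // stack pointer
    unsigned int ebp;               // frame pointer

    explicit ToyStack(unsigned int slots)
        : mem(slots, 0u), esp(slots * 4u), ebp(0u) {}

    unsigned int& at(unsigned int addr)
    {
        assert(addr % 4u == 0u && addr / 4u < mem.size());
        return mem[addr / 4u];
    }
    void push(unsigned int v) { assert(esp >= 4u); esp -= 4u; at(esp) = v; }
    unsigned int pop() { unsigned int v = at(esp); esp += 4u; return v; }
};

// Mirrors the listing for add() and returns the value left in EAX.
unsigned int SimulatedAdd(ToyStack& s)
{
    s.push(s.ebp);                        // push ebp
    s.ebp = s.esp;                        // mov  ebp, esp
    s.push(0u);                           // push ecx (value irrelevant: reserves 4 bytes for p)
    unsigned int eax = s.at(s.ebp + 8u);  // mov  eax, _x$[ebp]
    eax += s.at(s.ebp + 12u);             // add  eax, _y$[ebp]
    s.at(s.ebp - 4u) = eax;               // mov  _p$[ebp], eax
    eax = s.at(s.ebp - 4u);               // mov  eax, _p$[ebp]
    s.esp = s.ebp;                        // mov  esp, ebp
    s.ebp = s.pop();                      // pop  ebp
    s.pop();                              // ret 0 (pops the return address)
    return eax;
}

// What a caller does around the call (assumed __cdecl).
unsigned int CallSimulatedAdd(ToyStack& s, unsigned int x, unsigned int y)
{
    s.push(y);                            // arguments are pushed right to left
    s.push(x);
    s.push(0u);                           // call: pushes a return address (dummy here)
    unsigned int result = SimulatedAdd(s);
    s.esp += 8u;                          // add esp, 8 (caller removes x and y)
    return result;
}
```

After CallSimulatedAdd returns, esp and ebp hold the values they had before the call, which is exactly what the prolog and epilog are there to guarantee.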

Function call

The CALL instruction is used to invoke a function. Before issuing it, the caller pushes the parameter values on the stack or loads them into registers (for example, the this pointer in ECX). After the function returns, the caller may need to adjust the stack pointer, depending on the calling convention used. We discuss this in the next subsection.

Calling conventions

There are several calling conventions. A calling convention tells the compiler how parameters are passed, who cleans up the stack and how function names are decorated in object files. The following table shows the basics at a glance:

| Calling Convention | Argument Passing | Stack Maintenance | Name Decoration (C only) | Notes |
|---|---|---|---|---|
| __cdecl | Right to left. | Calling function pops arguments from the stack. | Underscore prefixed to function names. Ex: _Foo. | |
| __stdcall | Right to left. | Called function pops its own arguments from the stack. | Underscore prefixed to function name, @ appended followed by the number of bytes (in decimal) in the argument list. Ex: _Foo@12. | |
| __fastcall | First two DWORD arguments are passed in ECX and EDX; the rest are passed right to left. | Called function pops its own arguments from the stack. | @ prefixed to the name, @ appended followed by the number of bytes (in decimal) in the argument list. Ex: @Foo@12. | Only applies to Intel CPUs. This is the default calling convention for Borland compilers. |
| thiscall | this pointer put in ECX, arguments passed right to left. | Called function pops its own arguments from the stack (see the ret 8 in the SetNumber listing below). | None. | Used automatically by C++ code. |
| naked | Right to left. | Calling function pops arguments from the stack. | None. | Only used by VxDs. |

Source: Debugging Applications by John Robbins
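As a rough illustration of how the keywords are written in source code, the sketch below declares the same function under three conventions. The CC_* macros and the Sum* functions are hypothetical. The macros expand to the Visual C++ keywords only for 32-bit x86 builds and to nothing elsewhere, so the sketch still compiles with other compilers.

```cpp
#include <cassert>

// Hypothetical macros: use the MSVC keywords only when building 32-bit x86
// code with Visual C++, and expand to nothing on other compilers/targets.
#if defined(_MSC_VER) && defined(_M_IX86)
#  define CC_CDECL    __cdecl
#  define CC_STDCALL  __stdcall
#  define CC_FASTCALL __fastcall
#else
#  define CC_CDECL
#  define CC_STDCALL
#  define CC_FASTCALL
#endif

// extern "C" selects the C decoration rules described in the table. With
// three int arguments (12 bytes) a 32-bit MSVC build should emit names along
// the lines of _SumCdecl, _SumStdcall@12 and @SumFastcall@12.
extern "C" {
int CC_CDECL    SumCdecl(int a, int b, int c)    { return a + b + c; }
int CC_STDCALL  SumStdcall(int a, int b, int c)  { return a + b + c; }
int CC_FASTCALL SumFastcall(int a, int b, int c) { return a + b + c; }
}

// The convention changes the generated call sequence, not the result.
inline int CallAll(int a, int b, int c)
{
    int r = SumCdecl(a, b, c);
    assert(r == SumStdcall(a, b, c));
    assert(r == SumFastcall(a, b, c));
    return r;
}
```

Compiling this with /FAs and comparing the three call sites in CallAll is a quick way to see the differences summarized in the table.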

StdCall

int _stdcall StdCallFunction(int x, int y)
{
    return x;
}
The generated code is like this:
_x$ = 8       ; size = 4
_y$ = 12      ; size = 4
?StdCallFunction@@YGHHH@Z PROC    ; StdCallFunction, COMDAT
; 7    : {
push ebp
mov ebp, esp
; 8    :  return x;
mov eax, DWORD PTR _x$[ebp]
; 9    : }
pop ebp
ret 8
?StdCallFunction@@YGHHH@Z ENDP    ; StdCallFunction

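To call the function, the compiler generates the code below. No stack adjustment follows the CALL, because the callee has already removed the 8 bytes of arguments with ret 8: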
; 26   :  r=StdCallFunction(p, q);

mov eax, DWORD PTR _q$[ebp]
push eax
mov ecx, DWORD PTR _p$[ebp]
push ecx
call ?StdCallFunction@@YGHHH@Z ; StdCallFunction
mov DWORD PTR _r$[ebp], eax

Cdecl

The function declaration uses the _cdecl keyword.
int _cdecl CDeclCallFunction(int x, int y)
{
    return x;
}
The compiler generates the following assembly listing:
_x$ = 8       ; size = 4
_y$ = 12      ; size = 4
?CDeclCallFunction@@YAHHH@Z PROC   ; CDeclCallFunction, COMDAT

; 12   : {

push ebp
mov ebp, esp

; 13   :  return x;

mov eax, DWORD PTR _x$[ebp]

; 14   : }

pop ebp
ret 0
?CDeclCallFunction@@YAHHH@Z ENDP   ; CDeclCallFunction
To call the function, the compiler generates the following code. Here the caller removes the arguments itself with add esp, 8:
; 27   :  r=CDeclCallFunction(p, q);

mov edx, DWORD PTR _q$[ebp]
push edx
mov eax, DWORD PTR _p$[ebp]
push eax
call ?CDeclCallFunction@@YAHHH@Z  ; CDeclCallFunction
add esp, 8
mov DWORD PTR _r$[ebp], eax
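The difference between the two cleanup styles comes down to simple bookkeeping on ESP. The sketch below tracks ESP relative to its value before the call sequence. The names Cleanup, EspAfterReturn and EspAfterCall are hypothetical, and every argument is assumed to occupy 4 bytes.

```cpp
#include <cassert>

enum Cleanup { CallerCleans, CalleeCleans };  // __cdecl vs __stdcall style

// Tracks ESP relative to its value before the call sequence and returns
// ESP right after the callee's RET has executed.
int EspAfterReturn(int argCount, Cleanup who)
{
    const int argBytes = 4 * argCount;  // each int argument takes 4 bytes
    int esp = 0;
    esp -= argBytes;                    // push the arguments
    esp -= 4;                           // call: push the return address
    esp += 4;                           // ret: pop the return address
    if (who == CalleeCleans)
        esp += argBytes;                // ret 8 also discards the arguments
    return esp;
}

// ESP once the whole call statement has completed.
int EspAfterCall(int argCount, Cleanup who)
{
    int esp = EspAfterReturn(argCount, who);
    if (who == CallerCleans)
        esp += 4 * argCount;            // add esp, 8 emitted by the caller
    assert(esp == 0);                   // the stack must end up balanced
    return esp;
}
```

With two arguments, EspAfterReturn yields -8 for the caller-cleans case and 0 for the callee-cleans case. Either way the stack is balanced once the call statement is finished. Caller cleanup is what allows functions with a variable number of arguments, since only the caller knows how many bytes it pushed.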

Fastcall

int _fastcall FastCallFunction(int x, int y)
{
    return x;
}
The generated code:
_y$ = -8      ; size = 4
_x$ = -4      ; size = 4
?FastCallFunction@@YIHHH@Z PROC    ; FastCallFunction, COMDAT
; _x$ = ecx
; _y$ = edx
; 17   : {
push ebp
mov ebp, esp
sub esp, 8
mov DWORD PTR _y$[ebp], edx
mov DWORD PTR _x$[ebp], ecx
; 18   :  return x;
mov eax, DWORD PTR _x$[ebp]
; 19   : }
mov esp, ebp
pop ebp
ret 0
?FastCallFunction@@YIHHH@Z ENDP    ; FastCallFunction
And to call the function:
; 28   :  r=FastCallFunction(p, q);

mov edx, DWORD PTR _q$[ebp]
mov ecx, DWORD PTR _p$[ebp]
call ?FastCallFunction@@YIHHH@Z  ; FastCallFunction
mov DWORD PTR _r$[ebp], eax

Thiscall

This convention is used for class member functions. It is discussed in detail in the sections on classes below.

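Naked

This calling convention is used for VxD drivers. The compiler generates no prolog or epilog for a naked function, so the function body must supply its own.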

Representation of a class

A class is just a structure of variables with associated functions. When an object is created, the compiler reserves storage for it (on the stack for a local object, on the heap for one created with new) and calls the constructor of the class. A class with virtual functions also holds a pointer to a table of functions (the vtable) as its first member, which is used to call the virtual functions. Class member functions are treated much like normal C functions, except that they receive the this pointer as an extra parameter in the ECX register.

Class Member Functions

Here is a simple class that demonstrates a member function.
class Number
{
    int m_nMember;
    public:
    void SetNumber(int num, int base)
    {
        m_nMember = num;
    }
};
SetNumber in class Number generates the following listing:
_this$ = -4      ; size = 4
_num$ = 8      ; size = 4
_base$ = 12      ; size = 4
?SetNumber@Number@@QAEXHH@Z PROC   ; Number::SetNumber, COMDAT
; _this$ = ecx

; 30   :  {

push ebp
mov ebp, esp
push ecx
mov DWORD PTR _this$[ebp], ecx

; 31   :   m_nMember = num;

mov eax, DWORD PTR _this$[ebp]
mov ecx, DWORD PTR _num$[ebp]
mov DWORD PTR [eax], ecx

; 32   :  }

mov esp, ebp
pop ebp
ret 8
?SetNumber@Number@@QAEXHH@Z ENDP   ; Number::SetNumber
Here is a call to the member function SetNumber. The thiscall convention is used: the this pointer is passed in the ECX register.
; 42   :  Number nObject;
; 43   :  nObject.SetNumber(r, p);

mov ecx, DWORD PTR _p$[ebp]
push ecx
mov edx, DWORD PTR _r$[ebp]
push edx
lea ecx, DWORD PTR _nObject$[ebp]
call ?SetNumber@Number@@QAEXHH@Z  ; Number::SetNumber
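One way to picture this is to write the member function as a free function that takes the object's address as an explicit first parameter. The sketch below does that. NumberData, Number_SetNumber, GetNumber and Demo are hypothetical additions for the demonstration, and the real compiler passes this in ECX rather than on the stack.

```cpp
#include <cassert>

// The class from the article, plus a hypothetical accessor and a default
// constructor so the demo can read the member back.
class Number
{
    int m_nMember;
public:
    Number() : m_nMember(0) {}
    void SetNumber(int num, int /*base*/) { m_nMember = num; }
    int GetNumber() const { return m_nMember; }
};

// Roughly what the compiler sees: plain data plus a function that receives
// the object's address as an extra argument.
struct NumberData
{
    int m_nMember;
};

void Number_SetNumber(NumberData* self, int num, int /*base*/)
{
    assert(self != 0);
    self->m_nMember = num;   // mov eax, this / mov ecx, num / mov [eax], ecx
}

// Both forms store the value at offset 0 of the object.
int Demo()
{
    Number n;
    n.SetNumber(7, 10);

    NumberData d = { 0 };
    Number_SetNumber(&d, 7, 10);

    assert(n.GetNumber() == d.m_nMember);
    return d.m_nMember;
}
```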

Virtual Functions

With virtual functions, the compiler does not call a class's function directly. Instead it maintains a table of function pointers (the vtable) for each class and, when an object is created, stores a pointer to the corresponding class's vtable as the first member of the object. The function call is then made indirectly through the table's entry.
Let us create two classes with virtual functions.

//A class with 2 virtual functions
class VirtualClass
{
public:
     VirtualClass()
    {
    }
    virtual int TheVirtualFunction()
    {
         return 1;
    }
    virtual int TheVirtualFunction2()
    {
        return 2;
    }
};


//Subclass
class SubVirtualClass: public VirtualClass
{
public:
    SubVirtualClass()
    { 
    }

    virtual int TheVirtualFunction()
    {
        return 3;
    }
};
Here is the vtable of class VirtualClass.
CONST SEGMENT
??_7VirtualClass@@6B@ DD FLAT:??_R4VirtualClass@@6B@ ; VirtualClass::`vftable'
 DD FLAT:?TheVirtualFunction@VirtualClass@@UAEHXZ
 DD FLAT:?TheVirtualFunction2@VirtualClass@@UAEHXZ
 DD FLAT:__purecall
CONST ENDS
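Please note that the table has three function entries, the last of which is __purecall. The first DD (the ??_R4 symbol) refers to run-time type information rather than to a virtual function. The listing was evidently generated from a version of VirtualClass that also declares a pure virtual function (PureVirtualFunction, as the subclass's table shows), which is not included in the source above. __purecall is not NULL: it is a runtime routine that reports an error if a pure virtual function is ever called. The slot is given a real function in the subclass. Without the pure virtual function, we could create an object of the base class and call its two virtual functions, which would resolve to the base class's versions.
And the SubVirtualClass class's vtable looks like this: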
CONST SEGMENT
??_7SubVirtualClass@@6B@ DD FLAT:??_R4SubVirtualClass@@6B@ ; SubVirtualClass::`vftable'
 DD FLAT:?TheVirtualFunction@SubVirtualClass@@UAEHXZ
 DD FLAT:?TheVirtualFunction2@VirtualClass@@UAEHXZ
 DD FLAT:?PureVirtualFunction@SubVirtualClass@@UAEHXZ
CONST ENDS
All three virtual functions are assigned here. Because we did not override TheVirtualFunction2, the subclass's vtable holds the base class's function pointer for that slot, as expected.
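The class source shown earlier does not include the function behind the third vtable slot. The sketch below is a guess at what the classes may have looked like, based only on the decorated name ?PureVirtualFunction@SubVirtualClass@@UAEHXZ, which suggests a public virtual member that returns int and takes no arguments. The body returning 4 and the Demo function are placeholders, not the author's code.

```cpp
#include <cassert>

class VirtualClass
{
public:
    virtual int TheVirtualFunction()  { return 1; }
    virtual int TheVirtualFunction2() { return 2; }
    virtual int PureVirtualFunction() = 0;   // assumed declaration
};

class SubVirtualClass : public VirtualClass
{
public:
    virtual int TheVirtualFunction()  { return 3; }
    virtual int PureVirtualFunction() { return 4; }   // placeholder body
};

// Calls through a base reference are dispatched via the object's vtable,
// so the dynamic type (SubVirtualClass) decides which function runs.
int Demo()
{
    SubVirtualClass sub;
    VirtualClass& base = sub;
    assert(base.TheVirtualFunction() == 3);    // overridden in the subclass
    assert(base.TheVirtualFunction2() == 2);   // inherited from the base
    assert(base.PureVirtualFunction() == 4);   // slot filled by the subclass
    return base.TheVirtualFunction() * 100
         + base.TheVirtualFunction2() * 10
         + base.PureVirtualFunction();
}
```

Because VirtualClass has a pure virtual function in this version, it cannot be instantiated on its own. Only SubVirtualClass objects can be created.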
OK, but a pointer to the table must be stored as the first member of each object, right? That is done in the constructor. Here is the constructor of the subclass:
; 64   :  SubVirtualClass()
 push ebp
 mov ebp, esp
 push ecx
 mov DWORD PTR _this$[ebp], ecx
 mov ecx, DWORD PTR _this$[ebp]     ; load the this pointer into ECX
 ; call the base class constructor here
 call ??0VirtualClass@@QAE@XZ   ; VirtualClass::VirtualClass
 mov eax, DWORD PTR _this$[ebp]
 mov DWORD PTR [eax],OFFSET ??_7SubVirtualClass@@6B@  ; the vtable pointer is set now
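The same mechanism can be imitated by hand with plain structs and function pointers. The sketch below is a simplified model, not what the compiler literally emits: all names are hypothetical, only the two ordinary virtual slots are modelled, and the run-time type information entry and the pure virtual slot are left out.

```cpp
#include <cassert>

struct Object;                           // forward declaration
typedef int (*VirtualFn)(Object* self);  // every slot takes an explicit `this`

struct Object
{
    const VirtualFn* vptr;   // first member: pointer to the class's table
};

int Base_TheVirtualFunction(Object*)  { return 1; }
int Base_TheVirtualFunction2(Object*) { return 2; }
int Sub_TheVirtualFunction(Object*)   { return 3; }

// One table per class. The subclass reuses the base entry for the
// function it does not override.
const VirtualFn BaseVTable[] = { Base_TheVirtualFunction, Base_TheVirtualFunction2 };
const VirtualFn SubVTable[]  = { Sub_TheVirtualFunction,  Base_TheVirtualFunction2 };

void Base_Construct(Object* self)
{
    self->vptr = BaseVTable;
}

void Sub_Construct(Object* self)
{
    Base_Construct(self);     // call the base class constructor first
    self->vptr = SubVTable;   // then overwrite the pointer with the subclass table
}

// An indirect call through the table, as a virtual call does.
int CallSlot(Object* self, int slot)
{
    assert(slot >= 0 && slot < 2);
    return self->vptr[slot](self);
}
```

The order in Sub_Construct matches the listing: the base constructor runs first and installs the base table, and only then is the pointer replaced with the subclass's table.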

Conclusion

That's all for now. In the future I want to add inheritance, polymorphism, operator overloading, event mechanisms, templates, COM programming and exception handling.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

About the Author

Maruf Maniruzzaman


Completed a BSc in Computer Science & Engineering at Shah Jalal University of Science & Technology, Sylhet, Bangladesh (SUST).

Working as a Software Engineer at KAZ Software Ltd., Dhaka, Bangladesh.

Favorites: story books (especially the Masud Rana series), tourism, songs and programming.

Homepage: http://www.kuashaonline.com

Occupation: Software Developer
Company: KAZ Software Ltd. Bangladesh.
Location: Bangladesh

Last Updated: 6 Jun 2008
Copyright 2008 by Maruf Maniruzzaman