C++ Reverse Disassembly
By Opcodevoid
This article's aim is to provide material for modern-day decompiling of applications written in C++. We assume you have a solid understanding of C++, x86 assembly, and Windows.
Every compiler is different. Their CrtlStartUp routines, the code they generate for statements (switch, if, while), and numerous other details all vary, so compiling the same C++ code with two compilers gives two different results. Because of this I will stick with one and only one compiler: Visual C++.
Visual C++ is produced by Microsoft and generates fast, well-optimized code. That is not to say that all the information in this book applies only to Visual C++; I am just saying that some of it may only work on Visual C++.
If you don't have Visual C++, that is fine. There are many other compilers available, and most of this information is accurate for them as well.
I have been asked many times whether C++ decompiling is even possible, not only because of the complexity of a compiler but because of the mass of information lost in compiling: comments, include files, and macros, just to name a few. So one often wonders whether it is even worth pursuing. I want to start with what is totally lost when you compile a program and what stays; refer to table 1.1.1 to see what we lose and what remains.
| What is lost | What remains |
|---|---|
| Templates | Function calls |
| Classes | Dynamic linking calls |
| Macros | Switch statements |
| Include files | Local variables |
| Comments | Parameters |
This is not to say that everything in the "What remains" column is 100% there; it just means it is relatively simple and practical to reverse engineer. Because of this I chose to deal with the "What remains" column first, since it is much easier.
As we progress through this book, keep in mind that reverse engineering is rarely straightforward and takes lots of practice. It is harder to reverse engineer something than to create it in the first place.
A good way to start out with reverse engineering is to decompile your own programs and see how each C++ construct specifically works, then apply that knowledge in other areas, because looking at thousands of lines of unfamiliar assembly code is not really fun.
As you read this book you might start to think that "anything translated into a different language can be translated back into the original language", right? Well, this is not the case in reverse engineering: a lot of things are lost, and a lot of things you must make up (assume) along the way.
So I wanted to make sure I provided some practical examples of reverse engineering at the beginning of the book, to give you a sense of hope.
To begin reverse engineering, I decided to start with the main C++ statement
int main(int argc, char * argv[])
We might expect to find this statement easily in any executable file, because the PE format tells us where the executable starts: we could simply read the PE header of a specific executable and get its start address. Or can we?
This is where the C runtime library (CRTL) comes in. When you compile a C++ program, most compilers (this is compiler-specific) arrange for code to execute in the following order:
CrtlStartUp();
int main(int argc, char * argv[])
CrtlCleanUp();
This means we can't look into the PE file and get the start of our own code; we can only get the start of the CrtlStartUp() code. We have two choices: reverse engineer the CrtlStartUp code or skip over it. I prefer the latter, and we will deal with the C runtime library later.
One reason C++ is so approachable for reverse engineers is that compiled code follows strict conventions. For example, an integer or pointer return value is always placed in the EAX register, and function calls almost always pass their arguments on the stack. Reverse engineers can rely on these fixed patterns to get a head start.
The first thing we should deal with is global variables, because if you're coming from high-level languages you might have some misconceptions.
Many books say that memory is placed more or less at random on the computer. That is true for the most part, but the memory for your application's global variables is quite static. Each time you run your program, your statically allocated variables end up in the same place (as long as the image loads at its preferred base address).
Another interesting fact is that a pointer variable doesn't hold the data itself; it holds the address where the data is stored.
Here is a C++ example:
#include "stdafx.h"
#include "windows.h"

char * globalvar = "Whats Up";

int APIENTRY WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow)
{
    // TODO: Place code here.
    globalvar = (char *)0x400000;
    return 0;
}
Here is an in-depth look at the disassembly:
00405030: global_var dd 405034h
00405034: global_var_value db 'Whats Up',0
mov global_var,400000h
This shows that the pointer variable does not hold the string data: the compiler automatically initializes global_var with the address of global_var_value.
Since the variable is just a pointer to the value, can we change where it points? Yes, with mov global_var, 400000h. From then on, whenever the program reads global_var, it looks at the value stored at 405030h and finds 400000h.
If you're confused, remember that global_var is stored at 405030h, and refer to picture 2.

This picture is pretty self-explanatory. If you're still confused about how everything works, I suggest you get a good assembly book and learn what indirect addressing is.
We have just dealt with a pointer variable; now let's deal with a plain variable, which is much simpler.
#include "stdafx.h"
#include "windows.h"

char globalvar[] = "Whats Up";

int APIENTRY WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow)
{
    globalvar[0] = 'A';
    globalvar[5] = 'U';
    return 0;
}
Which when compiled becomes
00405030 global_var db 'Whats Up',0
mov global_var, 'A'
mov global_var + 5, 'U'
We instantly see that regular variables are a lot simpler than pointer variables: all we have to do is refer to the address in memory that holds our data. Of course, in machine code we can't see pretty names like global_var, so here is a pure disassembly:
00405030 'Whats Up',0
mov byte ptr [00405030], 'A'
mov byte ptr [00405035], 'U'
As you can see, we aren't doing anything special, just modifying the values stored at 00405030 and 00405035.
You should now have variables and pointer variables down pat, since this information will not be explained again. If there is something you don't understand, read it over.
As we all know, C++ has a near-English syntax to program in. x86 assembly code doesn't. For example, take a look at the following statement
int s = 3 + 4 + 1 + 5 + 9;
How can we calculate this in assembly? Simple; look at the following C++ example
#include "stdafx.h"
#include "windows.h"

int s1 = 3;
int s2 = 4;
int s3 = 1;
int s4 = 1;

int APIENTRY WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow)
{
    // TODO: Place code here.
    s1 = s2 + s3 - s4 + 34;
    return 0;
}
Which when compiled becomes
00405030 s1 dd 3
00405034 s2 dd 4
00405038 s3 dd 1
0040503C s4 dd 1
mov eax, s2
00401008 add eax, s3
0040100E sub eax, s4
00401014 add eax, 34
00401017 mov s1, eax
The compiler optimizes the code a little bit, but it's still very easy to understand.
First we load eax with the value of s2 using mov eax, s2, so eax holds 4.
Next we add s3 to eax, so eax holds 5.
After that we subtract s4 from eax, so eax holds 4.
After that we add 34 to eax, so eax holds 38.
Finally we store eax, which holds 38, into s1, so now s1 holds 38. You will often see the compiler use registers instead of variables in expressions because registers are faster.
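The register trace above can be mirrored step by step in C++. This is only a sketch: the function name `evaluate_with_accumulator` and the local variable `eax` are illustrative stand-ins for the register, not something the compiler produces.

```cpp
// Mirrors the disassembly one instruction per line.
// "eax" is an ordinary local that plays the role of the EAX register.
int evaluate_with_accumulator(int s2, int s3, int s4) {
    int eax = s2;   // mov eax, s2
    eax += s3;      // add eax, s3
    eax -= s4;      // sub eax, s4
    eax += 34;      // add eax, 34
    return eax;     // mov s1, eax (the caller stores the result in s1)
}
```

Calling `evaluate_with_accumulator(4, 1, 1)` reproduces the trace in the text and yields 38.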
From this we can conclude that the compiler maps each mathematical operator to a specific x86 instruction. Here is a table:
| C++ Operator | X86 Instruction |
|---|---|
| * (Multiply) | Mul (imul for signed integers, fmul for floating point) |
| / (Division) | Div (idiv for signed integers, fdiv for floating point) |
| - (Subtraction) | Sub |
| + (Addition) | Add |
As you can see, we can easily decipher most statements in C++ using the table above.
For a test we will look at a sample disassembly dump and decompile it by hand to C++.
00000000 2
00000001 3
00000002 4
00000003 0
00000004 1
00000005 mov al, [00000000]
         add al, [00000001]
         mov ch, [00000002]
         mul ch
         mov [00000003], ax
The first thing we do is try to figure out what types of variables are being used.
From what we can see, the code uses al and ch, which are 8-bit registers, so whenever a memory location is accessed through an 8-bit register, the variable is a char type.
Further down there is a mov [00000003], ax, and since ax is a 16-bit register that variable's type is short int.
Here is a small table, so you can map registers to variable types
| X86 Registers | C++ Variable Type |
|---|---|
| 8 bit registers (AL) | char |
| 16 bit registers (AX) | short int |
| 32 bit registers (EAX) | int |
So far we see 4 references to memory addresses, so we know we have 4 variables. The first one, [00000000], is obviously a char type variable, since we see
mov al, [00000000]
and al is an 8-bit register.
So let's give [00000000] the name s1. We also see that [00000000] through [00000002] are all referenced through 8-bit registers, meaning they are also char type, and the last one, [00000003], which is used in mov [00000003], ax, is a short int type, since ax is 16 bits.
Let's create another table, which will hold variable names, or aliases, for the addresses.
Although we can never get the original variable names back, we can always create our own.
| Addresses | Variable names/aliases | Variable type |
|---|---|---|
| 0000:0000 | S1 | char |
| 0000:0001 | S2 | char |
| 0000:0002 | S3 | char |
| 0000:0003 | S4 | short int |
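Note that S4 occupies the two bytes at 00000003 and 00000004. x86 is a little-endian architecture, which stores the least significant byte of a multi-byte value at the lowest address, so a 16-bit 1 is laid out as 01 00, not 00 01 as the dump above suggests. The initial value does not matter here, because the code overwrites S4.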
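To make the byte order concrete, here is a small sketch that lays out a 16-bit value the way a little-endian machine such as x86 stores it. The helper name `store_le16` is hypothetical and exists only for this illustration; it computes the layout by hand, so it does not depend on the machine it runs on.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: returns the two bytes of a 16-bit value in the
// order a little-endian machine writes them to memory.
std::array<std::uint8_t, 2> store_le16(std::uint16_t value) {
    std::array<std::uint8_t, 2> bytes{};
    bytes[0] = static_cast<std::uint8_t>(value & 0xFF);         // lowest address: low byte
    bytes[1] = static_cast<std::uint8_t>((value >> 8) & 0xFF);  // next address: high byte
    return bytes;
}
```

With this layout, the value 1 appears as the bytes 01 00, while the bytes 00 01 correspond to the value 0100h.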
The next thing we should do is rewrite the code above with the aliases we created
s1 db 2
s2 db 3
s3 db 4
s4 dw 1
mov al, s1
add al, s2
mov ch, s3
mul ch
mov s4, ax
The first thing we do is mov al, s1, so al now holds a value of 2. The next thing we do is add al, s2; al now has a value of 5, since s2 had a value of 3 in it. Next comes mov ch, s3, so ch has a value of 4. After that we mul ch, so ax has the value of al * ch. Since al had a value of 5 in it and ch had a value of 4 in it, ax has the value of 20.
Now we can start to decipher the C++ expression. The addition happens before the multiplication, so it is
(s1 + s2) * s3
After that we see mov s4, ax, so the complete C++ statement is
s4 = (s1 + s2) * s3;
As you can see, we just went through a whole lot of work to come up with a simple C++ statement, and this only works for global variables, not local variables or structure members. Things will only get harder, so I suggest you read carefully, and if you don't understand something, read it over and over until you do.
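The hand decompilation above can be checked with a short sketch that models the 8-bit and 16-bit registers as fixed-width integers. The function name `decompiled_expression` and the register-named locals are assumptions made for this illustration only.

```cpp
#include <cstdint>

// Models the four instructions from the dump with fixed-width integers.
std::uint16_t decompiled_expression(std::uint8_t s1, std::uint8_t s2, std::uint8_t s3) {
    std::uint8_t al = s1;                                     // mov al, s1
    al = static_cast<std::uint8_t>(al + s2);                  // add al, s2 (wraps at 8 bits)
    std::uint8_t ch = s3;                                     // mov ch, s3
    std::uint16_t ax = static_cast<std::uint16_t>(al * ch);   // mul ch: ax = al * ch
    return ax;                                                // mov s4, ax
}
```

With the values from the dump (2, 3, 4) the result is 20. The sketch also shows a detail the C++ statement hides: the add works on an 8-bit register, so a sum above 255 wraps before the multiplication.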
One of the major fundamentals of C++ is returning values from function calls. This is actually a very simple procedure, because it simply involves placing a value into the eax register.
So when you have a statement like this
c = (char *) malloc (0xFF);
the first thing the compiler does is call malloc, and then it assigns eax to c, as in mov c, eax.
For example, if you have a statement that returns 5, what you are really saying is
__asm
{
    mov eax, 5
    ret
}
Let's have a little practice with a full disassembly dump
Mov eax,5
Add eax,2
Sub eax,1
Ret
And the C++ equivalent is
return 5 + 2 - 1;
This, although simple, is one of the most important concepts a C++ reverse engineer can learn.
Now it's time to get to the blood and guts of C++: function calls.
Function calls are fairly simple for the most part, because to an assembly programmer they are just labels. For example,
int func () {return 1 ;}
func();
would compile into
Func:
Mov eax, 1
Ret
Call Func
From this we can conclude the following:
Function names are like variables: they are just references to some address, which is the same as a label.
Here is a full disassembly dump for practice
0000:0000 0
0000:0001 0
0000:0002 0
0000:0003 0
0000:0005 mov eax,1
0000:0009 ret
0000:0010 call 0000:0005 ;code starts here
0000:0015 mov [0000:0000],eax
The first thing we see is that at address 0000:0015 we are assigning the value of a 32-bit register to a memory address, which means that we have a 32-bit variable at hand, or an int type variable to be more exact.
So let's create an alias for the addresses 0000:0000 - 0000:0003, which will be s1.
Now let's create a new disassembly with this added information
S1 dd 0
0000:0005 mov eax,1
0000:0009 ret
0000:0010 call 0000:0005 ;code starts here
0000:0015 mov S1,eax
The second thing we see is that the code starts at 0000:0010 and the first instruction is call 0000:0005.
At 0000:0005 we can see that the code is moving a value into eax and then returning. Now we are back at address 0000:0015, where eax is moved into s1.
So we can now reverse engineer this whole program back into C++
int s1 = 0; // dd 0
int some_function()
{
    return 1; // mov eax, 1 : ret
}
s1 = some_function(); // mov s1, eax
What do we do when functions have parameters? Things get more complicated, because the compiler uses the stack to handle parameters.
It pushes the parameters right to left, meaning the last parameter goes in first and the first parameter goes in last.
For example, the C++ function call
func (1, 2);
would compile into
Push 2
Push 1
Call func
Now let's imagine a stack of 32 bytes, so that ESP = 32 before the call. Keep in mind that push first subtracts 4 from ESP and then stores the value at the new ESP. With that in mind, look at the table below.
| X86 Instruction | Memory address stored at | Stack pointer value afterwards |
|---|---|---|
| Push 2 | [28] | ESP = 28 |
| Push 1 | [24] | ESP = 24 |
| Call func | [20] (return address) | ESP = 20 |
| Push ebp | [16] | ESP = 16 |
Remember, when you issue a call instruction on an x86 machine, the processor pushes the return address onto the stack so it knows where to return to.
Now that the parameters are on the stack, let's look at the function itself
int func (int a, int b) { return a + b; }
Leaving out the push ebp for now, ESP equals 20 on entry to func. The first thing the function does is
mov eax, [esp + 4]
since the first parameter is stored at [24]. Next comes
add eax, [esp + 8]
since the second parameter is stored at [28]. Finally the function returns with ret. So the full compilation would be
Func:
Mov eax, [ESP + 4]
Add eax, [ESP + 8]
Ret
A neat little reverse engineering tip: since every stack slot has a fixed width of 4 bytes, you can easily tell which parameter is being accessed. Once the function has executed push ebp and mov ebp, esp:
[EBP] = Saved EBP
[EBP + 4] = Return address
[EBP + 8] = First parameter
[EBP + 12] = Second parameter
[EBP + 16] = Third parameter
[EBP + 20] = Fourth parameter
And so on.
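The EBP offsets listed above can be checked with a toy model of the stack. This is a sketch under stated assumptions: it uses real x86 push semantics (ESP is decreased by 4 before the value is stored), a 32-byte stack, and placeholder values for the return address and the saved EBP. `TinyStack` and `call_func_1_2` are hypothetical names invented for this example.

```cpp
#include <cstdint>
#include <vector>

// Toy stack: one 32-bit slot per 4 bytes, addressed by byte offset.
struct TinyStack {
    std::vector<std::int32_t> mem;
    int esp;
    explicit TinyStack(int size) : mem(size / 4 + 1, 0), esp(size) {}
    void push(std::int32_t v) { esp -= 4; mem[esp / 4] = v; }  // decrement, then store
    std::int32_t at(int addr) const { return mem[addr / 4]; }
};

// Simulates "push 2 / push 1 / call func" followed by the function prologue,
// then reads both parameters through EBP and adds them.
int call_func_1_2() {
    TinyStack s(32);
    s.push(2);          // push 2        -> stored at [28]
    s.push(1);          // push 1        -> stored at [24]
    s.push(0x1234);     // call func     -> return address at [20] (placeholder value)
    s.push(0);          // push ebp      -> saved EBP at [16] (placeholder value)
    int ebp = s.esp;    // mov ebp, esp  -> EBP = 16
    return s.at(ebp + 8) + s.at(ebp + 12);  // first parameter + second parameter
}
```

Here `[EBP + 8]` reads the 1 and `[EBP + 12]` reads the 2, so the simulated func(1, 2) returns 3.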
We just learned that parameters are stored on the stack. Now it's time to learn about local variables, which are also stored on the stack, but quite differently.
Here is an example
int func ()
{
    int a = 5;
    return a;
}
To compile this code, the compiler must first reserve space on the stack with sub esp, 4, since 4 bytes is the size of an int variable. Before that, the compiler must back up the esp register, which it does with mov ebp, esp. But wait: before that, it must back up ebp, which it does with push ebp. So the very first thing the compiler does is
; Setting up the stack frame
push ebp ; back up ebp
mov ebp, esp ; back up esp in ebp
sub esp, 4 ; reserve some space on the stack
Note: In unoptimized builds the compiler emits this "Setting up the stack frame" code in every function, whether or not you use local variables, and it uses ebp to reference parameters and local variables.
In the "Function Calls and the Stack" section I used esp to reference parameters and left out the stack frame setup code for clarity's sake.
The second thing the compiler does is
mov dword ptr [ebp - 4], 5
mov eax, [ebp - 4]
If we had a second local variable, we could simply write mov dword ptr [ebp - 8], 5; of course the compiler would then use sub esp, 8 instead of sub esp, 4.
The last thing the compiler does is restore the stack frame and return
; Cleaning up the stack frame
mov esp, ebp ; restore stack pointer
pop ebp ; restore ebp
ret
Note: Without optimization, the compiler emits the "Cleaning up the stack frame" code in every function, so we can detect a function by looking for similar code. I also skipped this in the "Function Calls and the Stack" section for clarity's sake.
Here is a full disassembly dump, for practice
0000:0000 0
0000:0002 push ebp
0000:0003 mov ebp,esp
0000:0005 sub esp, 8
0000:0010 mov [ebp -4], 5
0000:0015 add [ebp - 4] , [ebp + 8]
0000:0016 mov eax,[ebp - 4]
0000:0018 mov esp, ebp
0000:0020 pop ebp
0000:0021 ret
0000:0022 push ebp
0000:0023 mov ebp,esp
0000:0025 add [ebp + 8] , [0000:0000]
0000:0030 add [ebp + 8] , [ebp + 12]
0000:0031 mov eax,[ebp +8]
0000:0032 mov esp, ebp
0000:0035 pop ebp
0000:0036 ret
0000:0037 push ebp ;code start
0000:0038 mov ebp, esp
0000:0040 push 1
0000:0044 call 0000:0002
0000:0049 mov [0000:0000],eax
0000:0050 push 4
0000:0051 push 3
0000:0052 call 0000:0022
0000:0056 add [0000:0000],eax
0000:0058 mov esp, ebp
0000:0059 pop ebp
The first thing we see is that memory address [0000:0000] is being written from eax a lot, meaning we have a 32-bit variable of type int. The next thing we notice is that the stack frame is set up 3 times and cleaned up 3 times, which means we have 3 functions (and yes, int main(...) also sets up the stack frame and cleans it up).
So we have
Func1 ()
Func2 ()
Main ()
Next we see that Func1 is at address 0000:0002 and accepts one 32-bit parameter, because at address 0000:0040 we push 1 onto the stack and then at address 0000:0044 we call 0000:0002. So we can set up the func1 declaration
0000:0002 Func1 (int a)
Now whenever func1 does anything to [ebp + 8], we know that it is doing something to its first parameter. Looking into the func1 code, we also see that it has 1 local variable, because it references [ebp - 4].
Now let's take a look at address 0000:0049, which is mov [0000:0000], eax, so we know that the original C++ code is something like
[0000:0000] = func1 (1);
Next we see at addresses 0000:0050 and 0000:0051 that we push 4 onto the stack, then 3, and then we call 0000:0022.
Now we can set up the Func2 declaration
0000:0022 Func2(int a, int b)
At address 0000:0056 we see add [0000:0000], eax, which means the original C++ code is something like
[0000:0000] += Func2(3,4)
Remember that we pushed 4 onto the stack first and 3 second, because parameters are passed right to left.
Now that we have a lot of information, let's make a new disassembly with aliases for all local variables and parameters in Func1 and Func2. We know that code like [ebp + ...] refers to a parameter, and code like [ebp - ...] refers to a local variable.
0000:0000 s1 dd 0
0000:0002 func1(int param_1): push ebp
{ local : local_var_1 }
0000:0003 mov ebp,esp
0000:0005 sub esp, 8
0000:0010 mov local_var_1, 5
0000:0015 add local_var_1 , param_1
0000:0016 mov eax,local_var_1
0000:0018 mov esp, ebp
0000:0020 pop ebp
0000:0021 ret ;func1 exits here
0000:0022 func2 (int param_1 , int param_2) :
push ebp
0000:0023 mov ebp,esp
0000:0025 add param_1, s1
0000:0030 add param_1, param_2
0000:0031 mov eax,param_1
0000:0032 mov esp, ebp
0000:0035 pop ebp
0000:0036 ret ;func2 exits here
0000:0037 push ebp ;code start
0000:0038 mov ebp, esp
0000:0040 push 1
0000:0044 call func1
0000:0049 mov s1,eax ; s1 = func1(1);
0000:0050 push 4
0000:0051 push 3
0000:0052 call func2
0000:0056 add s1,eax ; s1 += func2(3,4);
0000:0058 mov esp, ebp
0000:0059 pop ebp
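I know, I made up a little assembly syntax, such as func1(int param_1) and { local : local_var_1 }. This is for clarity's sake, that's all. The same goes for instructions like add local_var_1, param_1: real x86 cannot add one memory operand to another directly, so a compiler would go through a register.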
Now let's start with func1. At address 0000:0010 we see that it is moving 5 into local_var_1, which in C++ is saying
int local_var_1 = 5;
Next we see add local_var_1, param_1, which in C++ is saying
local_var_1 += param_1;
The last thing we see before we clean up the stack is mov eax, local_var_1, which in C++ is saying
return local_var_1;
So the full reverse engineered function is
int func1(int param_1)
{
    int local_var_1 = 5;
    local_var_1 += param_1;
    return local_var_1;
}
Now let's go to func2. At address 0000:0025 we see add param_1, s1, which in C++ is saying
param_1 += s1;
After that we see add param_1, param_2, which in C++ is saying
param_1 += param_2;
The last thing we see before we clean up the stack is mov eax, param_1, which in C++ is saying
return param_1;
So the full reverse engineered function is
int func2(int param_1, int param_2)
{
    param_1 += s1;
    param_1 += param_2;
    return param_1;
}
Now we are able to reverse engineer the whole program
int s1 = 0;

int func1(int param_1)
{
    int local_var_1 = 5;
    local_var_1 += param_1;
    return local_var_1;
}

int func2(int param_1, int param_2)
{
    param_1 += s1;
    param_1 += param_2;
    return param_1;
}

int main()
{
    s1 = func1(1);
    s1 += func2(3,4);
}
This chapter might be a little hard to comprehend at first, since I presented a lot of "straight to the point" information. Again, if you don't understand something, read it over, and if you still don't understand, email vbmew@hotmail.com with your question.
What we have been doing so far is the easy stuff. It's time to deal with C++ keywords, complex expressions, and some practical real-world examples.
One of the main statements people use is the if statement, which logically compares values. Using this statement we can choose which path of execution our program should take.
If statements can be very simple or very, very complex.
Take a look at the following examples.
if(i == 0)
    //do function
//continue
Now what if we had something like this
if(i == 0)
{
    int i2 = 0;
}
i2 = 3; //error: can't access i2 because it's not in your scope
        //it's in the if statement's scope
Because of this we know that the compiler generates a stack frame for each if statement with brackets, right? Wrong!
In reality i2 lives in the stack frame of main and is accessible there; the compiler simply keeps it hidden. The reason I'm telling you this is that to reverse engineer if statements you must completely understand them.
The second example is
if( (i == 0) || ( (i2 == 1) && (i3 == 2) ) )
The logic for this is: if i = 0, or if i2 = 1 and i3 = 2.
Another example would be
if( (c = (char *) malloc(0xFF) ) == NULL)
This is saying c = malloc(0xFF), and if malloc returns NULL this condition is true.
Yet another example is
if(malloc(0xFF))
//this is saying call malloc(0xFF) and if it returns
//anything not equal to 0 then this condition is true
The last but not least example is
if(!malloc(0xFF))
//this is saying call malloc(0xFF) and if its return
//value is equal to zero then this condition is true
Thankfully all these if statements can be reverse engineered back into (almost) exactly the way they were.
The if statement maps directly to the x86 instruction cmp followed by a conditional jump. With this in mind, take a look at the following C++ program
int main()
{
    int i = 0;
    if(i == 34)
        i += 23;
    return 1;
}
This compiles into the following
push ebp
mov ebp,esp ;setup the stack frame
sub esp, 4
mov [ebp - 4],0
cmp [ebp - 4], 34 ;
jnz continue_program
add [ebp - 4],23
continue_program:
mov eax,1
mov esp, ebp ;restore the stack frame
pop ebp
ret
Yes, I decided to give you a complete disassembly to see whether you remember the stack frame, and that [ebp - 4] means the first local variable created. And yes, int main has to set up the stack frame like every other function.
Now let's learn how to turn this program back into C++.
The first thing we look at is the instruction mov [ebp - 4], 0, which tells us that the program initializes a variable to 0.
Next we see a cmp instruction comparing [ebp - 4] with 34; because of this we know the program is using an if statement, as in "if [ebp - 4] == 34". What we should do now is create an alias for [ebp - 4]; we will use local_var_1. Next we see the instruction jnz, which is the same as jne, and which says: if local_var_1 is not 34, skip over the body of the if statement and jump to continue_program.
add [ebp - 4], 23, or add local_var_1, 23, is saying local_var_1 += 23; After that we mov eax, 1, clean up the stack frame, and return.
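The cmp and jnz pair can be written in C++ with an explicit goto, which makes the inverted test visible. This is an illustrative sketch, not compiler output: the function name `run_if_with_jumps` is hypothetical, and unlike the original program it takes the variable as a parameter and returns it, so that the effect of the branch can be observed.

```cpp
// Mirrors the compiled if statement: the jump is taken when the
// condition is FALSE, skipping the body of the if.
int run_if_with_jumps(int local_var_1) {
    if (local_var_1 != 34) goto continue_program;  // cmp [ebp-4], 34 / jnz continue_program
    local_var_1 += 23;                             // add [ebp-4], 23
continue_program:
    return local_var_1;
}
```

Only an input of 34 reaches the add; every other value jumps straight to the label. Notice that the compiled code tests the opposite of the condition written in the source, which is the usual shape to look for when recovering an if statement.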
Now let's look at an if statement with multiple logical conditions
if( (i == 0) || (i2 == 23) && (i3 == 21) )
If_block_check1:
Cmp i,0
Jne if_block_check2
Jmp do_if
If_block_check2:
Cmp i2,23
Jne skip_if
Cmp i3,21
Jne skip_if
Do_if:
; actions here
skip_if:
The first thing we see is that in an if statement with multiple conditions, when one side of an || fails, the code jumps to the next logical expression to see whether that one evaluates to true, as shown in figure 3.2.1

So within an && expression, when one part succeeds we continue to evaluate the remaining parts until something is false.
Of course this is only true for the && operator.
For the || operator, as soon as one part of the expression is true we stop evaluating that entire expression and the if statement evaluates as true.
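The jump structure for the compound condition can be compared against the plain C++ expression. The sketch below is an assumption-labeled illustration: `if_with_jumps` follows the labels of the listing, `if_direct` is the source-level condition, and both names are made up for this example.

```cpp
// Follows the labels of the disassembly; returns true when do_if is reached.
bool if_with_jumps(int i, int i2, int i3) {
    if (i != 0) goto if_block_check2;   // cmp i, 0 / jne if_block_check2
    goto do_if;                         // jmp do_if (|| short-circuits to true)
if_block_check2:
    if (i2 != 23) goto skip_if;         // cmp i2, 23 / jne skip_if
    if (i3 != 21) goto skip_if;         // cmp i3, 21 / jne skip_if (&& needs both)
do_if:
    return true;                        // ; actions here
skip_if:
    return false;
}

// The condition as written in the C++ source.
bool if_direct(int i, int i2, int i3) {
    return (i == 0) || ((i2 == 23) && (i3 == 21));
}
```

Both functions agree for every input, which shows why && groups more tightly than || here: the two checks after if_block_check2 form one unit that is skipped entirely when i is 0.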
The for statement
The interesting feature of the for loop is its ability to evaluate 3 expressions
for( <expression 1>; <expression 2>; <expression 3>)
The expressions are usually
for( <assignment>; <conditional>; <increment | decrement>)
Reverse engineering the for statement is not hard, because in most cases it is really an if statement
if(i < 4)
{
    i++;
    //do actions
}
Now for the for loop equivalent
for(int i = 0; i < 4; i++)
{
    //do actions
}
Let's look at a simple disassembly of the for loop
Mov [ebp - 4],0 ;initialize the local variable
Jmp condition
Increment:
Add [ebp - 4],1
Condition:
Cmp [ebp - 4],4
Jge done
Loop:
;do actions
Jmp increment
Done:
As you can see, the for loop is nothing more than a high-level if statement. The first thing we do is initialize the local variable on the stack; after that we check the condition. Then we run the loop body, jump back to increment, and fall through to the condition label again, repeating until the condition is false.
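The same control flow can be reproduced in C++ with labels and goto, which makes it easy to count how often the loop body runs. This is a sketch for illustration; `for_loop_with_jumps` and the `actions` counter are hypothetical names, and the limit is passed in as a parameter instead of the fixed 4.

```cpp
// Follows the labels of the disassembly and counts executions of the loop body.
int for_loop_with_jumps(int limit) {
    int actions = 0;
    int i = 0;                    // mov [ebp-4], 0
    goto condition;               // jmp condition
increment:
    i += 1;                       // add [ebp-4], 1
condition:
    if (i >= limit) goto done;    // cmp [ebp-4], 4 / jge done
    ++actions;                    // ;do actions
    goto increment;               // jmp increment
done:
    return actions;
}
```

With a limit of 4 the body runs 4 times, exactly like `for(int i = 0; i < 4; i++)`. The initial jump over the increment label is the detail that distinguishes a for loop from a plain if in a disassembly.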
Structures are very useful in C++ because of their ability to contain members. A structure lets you define a variable of any size, for example
struct test1
{
    int member1;
    int member2;
};
This creates a 64-bit (8-byte) variable in memory. So in a sense structures are regular variables that allow us to access certain parts of the variable independently of the others.
This makes them very useful, because if you were to use char test1[8]; you would create exactly the same thing in memory as struct test1, only it would be much harder to access the 4-byte members individually.
Here is an example of using test1 as a local variable
sub esp, 8 ; reserve 8 bytes on the local stack
mov dword ptr [ebp - 4], 45 ; set member2 to 45
mov dword ptr [ebp - 8], 12 ; set member1 to 12
As you can see, the members are laid out at increasing addresses: you might think that member1 would sit closest to ebp, but it turns out to be at the lowest address, [ebp - 8], with member2 above it at [ebp - 4].
For a global variable the compiler would simply reserve 8 bytes in the executable and reference each member individually, based on the member you have chosen.
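The layout described above can be confirmed from within C++ using `offsetof` and `sizeof`. This is a small sketch; the wrapper functions `member1_offset` and `member2_offset` are hypothetical helpers added only so the offsets can be queried.

```cpp
#include <cstddef>

struct test1 {
    int member1;
    int member2;
};

// Byte offset of each member from the start of the structure.
std::size_t member1_offset() { return offsetof(test1, member1); }
std::size_t member2_offset() { return offsetof(test1, member2); }
```

member1 sits at offset 0 and member2 directly after it, so when the structure starts at [ebp - 8] the members land at [ebp - 8] and [ebp - 4] on a compiler with 4-byte ints.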
I am providing some examples to prove, and help you understand, some of the theory I presented in this book.
The following example proves that variables inside an if block are truly accessible to the whole function.
#include "stdafx.h"
#include "iostream.h"

int main(int argc, char* argv[])
{
    __asm mov dword ptr [ebp - 4], 23
    if(true)
    {
        int i;
        cout << i << endl;
    }
    return 1;
}
With Visual C++ the output should be 23 even though we never initialize i. If you're confused, remember that since i is the first and only local variable, its location is [ebp - 4].
This next example proves that structures are just regular variables with the added ability to be accessed in parts instead of as a whole.
#include "stdafx.h"
#include "iostream.h"

struct test1
{
    int member1;
    int member2;
    int member3;
};

int main(int argc, char* argv[])
{
    test1 local_struct;
    local_struct.member1 = 1;
    local_struct.member2 = 1;
    local_struct.member3 = 1;
    __asm
    {
        add dword ptr [ebp - 12], 55  ; member1
        add dword ptr [ebp - 8], 100  ; member2
        add dword ptr [ebp - 4], 23   ; member3
    }
    cout << "member 1: " << local_struct.member1 << endl;
    cout << "member 2: " << local_struct.member2 << endl;
    cout << "member 3: " << local_struct.member3 << endl;
    return 1;
}
Output should be
member 1: 56
member 2: 101
member 3: 24
This chapter aims to provide knowledge of practical decompiling. In it we will learn to use a disassembler and to decompile real-world applications.
4.1 Intro to Windows decompiling
Windows decompiling is not that difficult, since Windows programmers follow a fairly strict programming pattern built on calls such as CreateWindowEx or CreateDialog, and all windows have message loops, which you can easily find. Before we really start getting into decompiling, let's go over the basics. In the vast world of Windows there are many types of applications, and many more types of technology.
All of it is too much to cover in one tutorial. On top of that, this information only applies to applications that use the basic window
functions, such as CreateWindowEx, and CreateDialog. Applications made in visual basic, or
1. Create the window class
From this we can get the window procedure, in which all messages are handled. The lpfnWndProc member of the WNDCLASSEX structure contains the address of the window procedure.
2. Create the window itself.
We can retrieve every single constant by name, and most of the time the exact C/C++ equivalent.
3. The message loop
All we have to do is look for a reference to GetMessage(...).
We start with the basic skeletons first, then move on to more complex stuff. It's important to learn the basics first because they give you an idea of how the application is designed. We will be using PVdasm, which you can get from my site -
This is a very nice free disassembler.
4.2 Decompiling a sample application
First load up PvDasm, and your screen should look similar to Figure 4.2.1

(Figure 4.2.1)
Grab CreateWindow2 (the program we are going to decompile by hand) and Open it in the disassembler, your screen should look similar to figure 4.2.2

(Figure 4.2.2)
We see our entry point, but this is CRTL code (C runtime library). How can we find the WinMain function? By references. We know that a WinMain function contains a call to CreateWindowEx or RegisterClassEx; if we can find where the program calls these functions, we can then begin to map out the program. When you compile a program, a linker links it with libraries or DLLs (dynamic link libraries). The functions you get from these DLLs are called imports. PVdasm can list all the imports a program has and show you the addresses from which they are called. To use this feature press Ctrl+N or press the import button. Your screen should look similar to figure 4.2.3

CreateWindowEx. Now we must find the start of the function. This is pretty easy if we follow these rules: the start of a function
1. Consists of
push ebp
mov ebp, esp
sub esp, <X>
2. Comes right after a
mov esp, ebp
pop ebp
ret <X>
If we scroll up to address 0040104C, we should see
0040104C push ebp
0040104D mov ebp, esp
0040104F sub esp, 50h
After that we see
mov dword ptr ss:[ebp-30], 00000030
mov dword ptr ss:[ebp-2c], 00000003
So we know we have local variables, and it mostly looks like a structure. To find the WNDCLASSEX structure we need a reference point. A good reference to look for is LoadCursor. Just about every application uses this call, so simply press the import button or Ctrl+N, and select LoadCursor.
Once you have selected LoadCursor you should see something similar to
00401092 call ds:LoadCursorA
00401098 mov [ebp-14], eax
We know that the return value of a function is stored in the eax register, and we know that the hCursor member of WNDCLASSEX is being set (because we are loading a cursor). So what position is hCursor at in memory? It is at ebp-14h (yes, that's 14 hex, not decimal). With this information we can figure out where all the other members are too. Take a quick look at the WNDCLASSEX structure
typedef struct tagWNDCLASSEX {
    UINT      cbSize;        // ebp-30h
    UINT      style;         // ebp-2Ch
    WNDPROC   lpfnWndProc;   // ebp-28h
    int       cbClsExtra;    // ebp-24h
    int       cbWndExtra;    // ebp-20h
    HINSTANCE hInstance;     // ebp-1Ch
    HICON     hIcon;         // ebp-18h
    HCURSOR   hCursor;       // ebp-14h  <-- start the calculation here
    HBRUSH    hbrBackground; // ebp-10h
    LPCSTR    lpszMenuName;  // ebp-0Ch
    LPCSTR    lpszClassName; // ebp-08h
    HICON     hIconSm;       // ebp-04h
} WNDCLASSEX;
As you can see, it is easy to calculate the structure member addresses: starting from the member you know, add the member size (4 bytes here) to the displacement for each member above it, and subtract the member size for each member below it. Now that we know the memory location of every structure member, we can begin to really understand how the program is built. The first thing we do is get the value of every member in the structure, starting with cbSize.
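The same calculation can be written down as code. This is a small sketch under the assumption of a 32-bit target, where every WNDCLASSEX member is 4 bytes wide; the mirror structure WndClassEx32 and the helper ebp_displacement are hypothetical names, and the only measured input is the anchor hCursor at ebp-14h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirror of the 32-bit WNDCLASSEX layout. Assumption: handles and pointers are
// 4 bytes each, so fixed-width integers stand in for the real Windows types.
struct WndClassEx32 {
    std::uint32_t cbSize;
    std::uint32_t style;
    std::uint32_t lpfnWndProc;
    std::uint32_t cbClsExtra;
    std::uint32_t cbWndExtra;
    std::uint32_t hInstance;
    std::uint32_t hIcon;
    std::uint32_t hCursor;
    std::uint32_t hbrBackground;
    std::uint32_t lpszMenuName;
    std::uint32_t lpszClassName;
    std::uint32_t hIconSm;
};
static_assert(sizeof(WndClassEx32) == 0x30, "expected the 48-byte (30h) layout");

// The one displacement read from the disassembly: hCursor lives at ebp-14h.
const int kHCursorDisp = -0x14;

// Returns the signed displacement from ebp of the member at `member_offset`.
int ebp_displacement(std::size_t member_offset) {
    assert(member_offset < sizeof(WndClassEx32));
    const int anchor_offset = static_cast<int>(offsetof(WndClassEx32, hCursor));
    return kHCursorDisp + (static_cast<int>(member_offset) - anchor_offset);
}
```

For example, ebp_displacement(offsetof(WndClassEx32, cbSize)) gives -0x30, which matches the first mov we saw in the function.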
1. cbSize
The first thing we see is mov dword ptr ss:[ebp-30],00000030, and we know that ebp-30h is the location of cbSize. So what we are really saying is mov dword ptr ss:[cbSize],30h. Of course we can go a step further: we know that 30h is the size of WNDCLASSEX, and cbSize is supposed to hold the size of WNDCLASSEX, so we can fully decompile this line to
wc.cbSize = sizeof(WNDCLASSEX);
2. style
mov dword ptr ss:[ebp-2C],00000003
OK, which style is the program using? To figure this out we need to look into windows.h and get all the style values. We could do a bit-by-bit comparison by hand, but we don't have time for that, so I made a small program called WinDasmRef. All we need to do is choose the type of section we want to look up (in our case, style from WNDCLASSEX) and enter a value, and it returns the constants the original programmer wrote.
Refer to Figure 4.2.5 for more information.

You can get this program from http://www.crackingislife.com/modules.php?name=Downloads&d_op=getit&lid=1
This program is nowhere near finished, but it is more than enough for this book.
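If you do not have WinDasmRef at hand, the bit-by-bit comparison is easy to automate. The sketch below is not WinDasmRef's code; it is a hypothetical decode_class_style helper that uses the CS_ values from the Windows headers and lists every named bit set in a value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

struct StyleFlag {
    std::uint32_t bit;
    const char* name;
};

// Window class style bits as defined in the Windows headers.
const StyleFlag kClassStyles[] = {
    {0x0001u, "CS_VREDRAW"},         {0x0002u, "CS_HREDRAW"},
    {0x0008u, "CS_DBLCLKS"},         {0x0020u, "CS_OWNDC"},
    {0x0040u, "CS_CLASSDC"},         {0x0080u, "CS_PARENTDC"},
    {0x0200u, "CS_NOCLOSE"},         {0x0800u, "CS_SAVEBITS"},
    {0x1000u, "CS_BYTEALIGNCLIENT"}, {0x2000u, "CS_BYTEALIGNWINDOW"},
    {0x4000u, "CS_GLOBALCLASS"},
};

// Hypothetical helper: turn a raw style value into "A | B | ..." form.
std::string decode_class_style(std::uint32_t value) {
    if (value == 0) return "0";
    std::string out;
    for (std::size_t i = 0; i < sizeof(kClassStyles) / sizeof(kClassStyles[0]); ++i) {
        const StyleFlag& flag = kClassStyles[i];
        if ((value & flag.bit) != 0) {
            if (!out.empty()) out += " | ";
            out += flag.name;
            value &= ~flag.bit;  // clear the bit so leftovers can be reported
        }
    }
    if (value != 0) {  // bits that have no name in the table, printed in decimal
        if (!out.empty()) out += " | ";
        out += std::to_string(value);
    }
    return out;
}
```

For the value 3 found at ebp-2C this yields CS_VREDRAW | CS_HREDRAW. The order in which the original source wrote the two flags cannot be recovered, since the compiler folds them into a single constant.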
3. lpfnWndProc
mov dword ptr ss:[ebp-28],00401000
This is the most important and interesting member, because it holds the address of the window procedure, the function that handles the window's messages. From this we can tell that the window procedure is located at address 00401000 (in hex, of course).
4. cbClsExtra
mov dword ptr ss:[ebp-24],00000000
We are simply setting wc.cbClsExtra to 0.
5. cbWndExtra
mov dword ptr ss:[ebp-20],00000000
We are simply setting wc.cbWndExtra to 0.
6. hInstance
mov eax,dword ptr ss:[ebp+8] // parameter hInstance
mov dword ptr ss:[ebp-1C],eax // wc.hInstance
Remember the declaration of the main function is
WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpCmdLine, int nCmdShow)
and the first parameter (hInstance) is stored at ebp+8, while the second parameter (hPrevInstance) is stored at ebp+12 (ebp+0Ch).
Now that eax holds the value of hInstance, we simply transfer that value to [ebp-1C], which is wc.hInstance. In other words, we are saying wc.hInstance = hInstance
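The parameter positions follow a fixed pattern that is worth writing down once. This sketch assumes a standard push ebp / mov ebp, esp frame on 32-bit x86, where [ebp+0] is the saved ebp, [ebp+4] is the return address, and each parameter takes one 4-byte slot; both helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Displacement from ebp of the parameter with the given zero-based index.
// Assumption: every parameter occupies exactly one 4-byte stack slot.
int param_displacement(std::size_t index) {
    return 8 + 4 * static_cast<int>(index);
}

// Number of bytes a stdcall function removes from the stack on return,
// which is the operand of its final "ret <X>" instruction.
unsigned stdcall_cleanup_bytes(std::size_t param_count) {
    return 4u * static_cast<unsigned>(param_count);
}
```

With the four parameters of WinMain, stdcall_cleanup_bytes(4) is 16, or 10h, which agrees with the ret 10 at the end of this function.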
7. hIcon
mov dword ptr ss:[ebp-18],00000000
we are simply setting wc.hIcon to 0
8. hCursor
push 00007F00
mov ecx,DWORD ptr SS:[ebp+08]
push ecx
call USER32!LoadCursorA
mov dword ptr ss:[ebp-14],eax
OK, the first thing we do is look at the declaration of LoadCursorA, which is
LoadCursor(HINSTANCE hInstance, LPCSTR cursorname);
The last parameter is pushed first, so cursorname is the first value pushed, which is 7F00.
If the program is not using a custom cursor (most don't), we can look the value up in WinDasmRef. And yes, you can enter hex values in WinDasmRef; just make sure you enter 0x7F00, not 7F00.
Refer to Figure 4.2.6.

(Figure 4.2.6)
Note: If you're wondering why LoadCursor.cursorname wasn't in the first picture, it is because I'm writing this program as I'm typing this book.
mov ecx,DWORD ptr SS:[ebp+08]
push ecx
Next we move SS:[ebp+8], which is hInstance, into ecx, and then we push ecx onto the stack. The stack now contains hInstance on top, with 7F00 below it.
Then we see call USER32!LoadCursorA, which we can turn back into the complete original line of source:
LoadCursor(hInstance,IDC_ARROW);
We know that LoadCursor returns the handle to the cursor in the eax register, so in
mov dword ptr ss:[ebp-14],eax
ebp-14 is the position of hCursor. Now let's decompile the entire statement:
wc.hCursor = LoadCursor(hInstance,IDC_ARROW);
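Since the last argument is pushed first, rebuilding a call is just a matter of reversing the pushes. A minimal sketch, assuming a calling convention that pushes arguments right to left (as stdcall and cdecl do) and that the pushed values have already been translated into names; rebuild_call is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// `pushed` lists the operands in the order the push instructions appear.
// The result is the call written in source order.
std::string rebuild_call(const std::string& function, std::vector<std::string> pushed) {
    std::reverse(pushed.begin(), pushed.end());  // last pushed is the first argument
    std::string call = function + "(";
    for (std::size_t i = 0; i < pushed.size(); ++i) {
        if (i != 0) call += ", ";
        call += pushed[i];
    }
    return call + ")";
}
```

Feeding it the two pushes above (IDC_ARROW first, then hInstance) gives LoadCursor(hInstance, IDC_ARROW). The same approach applies to any later call in this function, such as MessageBoxA.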
9. hbrBackground
push 01
CALL GDI32!GetStockObject
mov dword ptr ss:[ebp-10],eax
OK, first we push 01 onto the stack and call GetStockObject. If we look at the declaration of GetStockObject, which is GetStockObject(int brush), and remember that the result ends up in hbrBackground, we know that the 01 is specifying a brush. So load up WinDasmRef and type 1 in; refer to Figure 4.2.7 for more information.

So we know the call is GetStockObject(LTGRAY_BRUSH). After that we see mov dword ptr ss:[ebp-10],eax; eax holds the handle to the brush returned by GetStockObject, and ebp-10 is the memory location of hbrBackground, so the full decompiled statement is
wc.hbrBackground = (HBRUSH) GetStockObject(LTGRAY_BRUSH);
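The stock brush lookup is a plain index into a short list. The sketch below covers only the brush indices 0 through 5 from the Windows headers; GetStockObject also accepts pen, font and palette indices, which this hypothetical stock_brush_name helper deliberately does not handle.

```cpp
#include <cassert>
#include <string>

// Maps a GetStockObject index to the name of the stock brush constant.
std::string stock_brush_name(int index) {
    static const char* const kNames[] = {
        "WHITE_BRUSH",   // 0
        "LTGRAY_BRUSH",  // 1
        "GRAY_BRUSH",    // 2
        "DKGRAY_BRUSH",  // 3
        "BLACK_BRUSH",   // 4
        "NULL_BRUSH",    // 5
    };
    const int count = static_cast<int>(sizeof(kNames) / sizeof(kNames[0]));
    if (index < 0 || index >= count) return "(not a stock brush)";
    return kNames[index];
}
```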
10. lpszMenuName
mov dword ptr ss:[ebp-0C],0000000
we simply set lpszMenuName to 0
11. lpszClassName
mov edx,dword ptr ds:[0040603C]
mov dword ptr ss:[ebp-08],edx
The address 0040603C holds a pointer to our class name. How can I tell? Easy: the address is surrounded by brackets, so the instruction is reading a value from 0040603C. We can use any hex editor to look at address 0040603C, as long as we know the image base.
The image base is the address at which the program is loaded into memory. To see the image base, press Ctrl+P in PVDasm; a window similar to Figure 4.2.8 should come up.

(Figure 4.2.8)
We subtract the image base, which is 400000 in hex, from 0040603C, and we are left with 603C. (Strictly speaking this is a relative virtual address, which must be translated to a file offset through the PE section table; in this sample the two coincide.) If we go to offset 603C in the file we will see 30. We must read 3 more bytes because the x86 uses 32-bit addresses, so the four bytes are 30 60 40 00.
These bytes are in little-endian order, which the x86 uses, so we reverse them byte by byte to get the address 00406030. If we subtract the image base from that we get 6030, and at offset 6030 we will see a 'D'. If we keep reading up to the null terminator, we will see DECOMPILE.
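The hex-editor steps can be reproduced in a few lines of code. This is a sketch that carries over the simplification used in the text, namely that the file offset equals the virtual address minus the image base; for an arbitrary executable the offset has to be looked up in the PE section table instead. All three function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Reads a 32-bit little-endian value: the byte at the lowest offset is the
// least significant one, so 30 60 40 00 becomes 00406030h.
std::uint32_t read_u32_le(const std::vector<std::uint8_t>& file, std::size_t offset) {
    assert(offset + 4 <= file.size());
    return static_cast<std::uint32_t>(file[offset])
         | (static_cast<std::uint32_t>(file[offset + 1]) << 8)
         | (static_cast<std::uint32_t>(file[offset + 2]) << 16)
         | (static_cast<std::uint32_t>(file[offset + 3]) << 24);
}

// Reads bytes up to (not including) the null terminator or the end of the file.
std::string read_c_string(const std::vector<std::uint8_t>& file, std::size_t offset) {
    std::string text;
    while (offset < file.size() && file[offset] != 0) {
        text += static_cast<char>(file[offset]);
        ++offset;
    }
    return text;
}

// Follows a pointer stored at virtual address `pointer_va` to the string it names.
// Assumption (from the text): file offset == virtual address - image base.
std::string string_via_pointer(const std::vector<std::uint8_t>& file,
                               std::uint32_t image_base, std::uint32_t pointer_va) {
    assert(pointer_va >= image_base);
    const std::uint32_t string_va = read_u32_le(file, pointer_va - image_base);
    assert(string_va >= image_base);
    return read_c_string(file, string_va - image_base);
}
```

Given the bytes of CreateWindow2 as described above, string_via_pointer(file, 0x400000, 0x40603C) would follow the pointer at offset 603C to offset 6030 and return the class name.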
Now that we have the name of our class, we can fully decompile the statement like this:
static char * szClass = "DECOMPILE";
wc.lpszClassName = szClass;
This follows because mov dword ptr ss:[ebp-08],edx stores edx, which holds the value of szClass (the address of the string), and ebp-8 is the memory location of lpszClassName.
12. hIconSm
mov dword ptr ss:[ebp-4],00000000
This is simply setting hIconSm to 0.
Now that we are done with our whole window class, let's have an overview of all the values:
WNDCLASSEX wc; // we don't know the exact name, but it has to be something
wc.cbSize = sizeof(WNDCLASSEX);
wc.style = CS_HREDRAW | CS_VREDRAW;
wc.lpfnWndProc = WndProc;
wc.cbClsExtra = 0;
wc.cbWndExtra = 0;
wc.hInstance = hInstance;
wc.hIcon = 0;
wc.hCursor = LoadCursor(hInstance,IDC_ARROW);
wc.hbrBackground = (HBRUSH) GetStockObject(LTGRAY_BRUSH);
wc.lpszMenuName = NULL;
wc.lpszClassName = szClass;
wc.hIconSm = NULL;
As you can see, we have decompiled this practically back to the exact source code.
Next we see the following code:
lea eax,dword ptr ss:[ebp-30]
push eax
call USER32!RegisterClassExA
and eax,0000FFFF
test eax,eax
jnz 004010E4
push 0
push 00406054 ; ASCIIZ Crap
push 0040605C ; ASCIIZ Can't register class
push 0
Call USER32!MessageBoxA
xor eax,eax
jmp 00401172
Let's begin with
lea eax,dword ptr ss:[ebp-30]
push eax
call USER32!RegisterClassExA
The lea loads eax with the address of the WNDCLASSEX structure, because ebp-30 is the address of the first member of the structure, cbSize. Now that eax holds the address of the structure, we push it onto the stack and call USER32!RegisterClassExA. If we look at the declaration of RegisterClassEx,
ATOM WINAPI RegisterClassExA(CONST WNDCLASSEX *);
we see that it returns the type ATOM, which is 16 bits wide. That is why we see and eax,0000FFFF, which masks off the upper 16 bits so that only the 16-bit value is tested. After that we see
test eax,eax
jnz 004010E4
This simply says that if eax is not zero, jump to 004010E4. The equivalent C++ code is
if(!RegisterClassEx(&wc))
{
//bad code here
}
//else continue (004010E4)
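To convince yourself that the and / test / jnz sequence really is the "!" test, it helps to model it. The sketch below treats eax as a plain 32-bit integer and mirrors the three instructions; register_class_failed is a hypothetical name, not something present in the binary.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors:  and eax,0000FFFF / test eax,eax / jnz 004010E4
// Returns true when the jump is NOT taken, that is, when the failure code runs.
bool register_class_failed(std::uint32_t eax) {
    // ATOM is 16 bits wide, so only the low word of eax carries the result.
    const std::uint16_t atom = static_cast<std::uint16_t>(eax & 0x0000FFFFu);
    return atom == 0;  // same condition as !RegisterClassEx(&wc)
}
```

Note that whatever is left in the upper half of eax does not matter: a value such as ABCD0000h still counts as a failed registration because the low 16 bits are zero.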
Remember, the "!" says that if RegisterClassEx returns 0, the bad code is executed. As we continue, we see that the program displays a message box if registration fails:
push 0
push 00406054 ; ASCIIZ Crap
push 0040605C ; ASCIIZ Can't register class
push 0
Call USER32!MessageBoxA
and if we look at the declaration of MessageBox
MessageBoxA(HWND hWnd, LPCSTR lpText, LPCSTR lpCaption, UINT uType);
Let's crack open WinDasmRef to look up the uType value.
Refer to Figure 4.2.9 for more information.

So we can decompile the whole line into
MessageBox(NULL,"Can't register class","Crap",MB_OK);
after that we see
xor eax,eax
jmp 00401172
xor eax,eax sets eax to 0, and if we look at what is at address 00401172, we will find
mov esp,ebp
pop ebp
ret 10
which is the function's exit code, so we can decompile these lines to return 0. The full original code is
if(!RegisterClassEx(&wc))
{
MessageBox(NULL,"Can't register class","Crap",MB_OK);
return 0;
}
As you can see, decompiling is quite simple for this basic Windows stuff, so I am not going to bore you with the rest. If you have any questions, please check out our forums at http://www.eliteproxy.com/modules.php?name=Forums
Visual Basic 6.0 is next.
This paper is made possible by a grant from your donation, if you would like to continue to support Opcodevoid, then please donate.
This book is provided as is; no warranty is applied nor granted information. What is presented in this book is copyrighted by Opcodevoid with all rights respected. All information, algorithms can not be copied, reproduce nor distributed in anyway, without written permission from Opcodevoid or Opcodevoid Inc.
Last Updated: 25 Aug 2004. Editor: Nishant Sivakumar
Copyright 2003 by Opcodevoid. Everything else Copyright © CodeProject, 1999-2009.