The era of 32-bit, old-school stack buffer overflows seems to have ended with the introduction of memory randomization (ASLR), canary variables and 64-bit addresses (which make it harder to avoid bad bytes in shellcode). Even so, if we ever want to work in the field of security and ethical hacking, we need to know some of the techniques that were very common in that bygone era. Buffer overflows are one of the biggest ones, and they will help you learn to think the way a black hat hacker would think. In this tutorial, we will be taking a peek at 64-bit buffer overflows.
To catch a criminal, you must think like a criminal. It is important that you do have some command of C or Assembly or are willing to find resources online that will further what I am teaching here.
My background ranges from ERP development to E-commerce development which soon led me into computer security. As soon as I learnt a few things, I was fascinated by how small holes in a programmer's code can be devastating for millions if not billions of users.
Some Understanding of Memory and How It Works
Memory in modern-day computers is divided into several segments. Each has its own primary purpose, which helps keep things clean. These are:
- Text
- Data
- BSS
- Heap (growing downward from smaller memory addresses to larger ones)
- Stack (growing upward from larger memory addresses to smaller ones)
All of these segments have a specific function. For example:
- Text [fixed size, read only] contains the compiled and linked machine code to be executed.
- Data [fixed size, pre-initialized, writable] contains initialized static and global variables, e.g. static int i = 9;
- BSS [contains placeholders] contains uninitialized static and global variables, e.g. static int i;
- Heap [variable size, read, write] contains data whose size is decided at runtime. This memory is allocated using malloc(). Simply put: a memory control structure for memory allocated at runtime.
- Stack [size set on initialization, read, write] contains function arguments and local variables. These live within a preallocated stack bound set by the operating system.
The stack variables that we will be overflowing today have fixed bounds, which are set by the operating system once the program is started.
Variables on the stack can be arguments sent to a function as well as variables declared within the function itself. For context, every such variable belongs to a 'stack frame', that is, to a function call.
Starting Off With a Simple C Program (simplec.c)
You might think that, out of all languages, C might be the safest. Yet C is highly vulnerable to a multitude of exploits. Let's create a simple exploitable program whose behaviour may be altered beyond its normal function:
We will be using two header files in our program:
Here is the complete program simplec.c:
int main(int argc, char *argv[])
As you can see, we created a simple program that just copies the argument sent in the argv array into the buffer character array using the strcpy() function. At first glance, nothing seems to be wrong. Yet there is one part that was missed:
- Checking that the value we are copying does not exceed the maximum length of buffer, which is 64 char(s), or 64 bytes, as each char is 1 byte.
Without a length check before copying, anything that has been passed as an argument in argv, irrespective of its length, will be copied into buffer. Now we all know what happens when we try to stuff something into a container that is too small for it. In the same way that a full glass of water overflows if you try to pour more water into it, the buffer array will overflow. In the case of the glass, water will overflow onto the surface holding the glass, its support structure, and likewise buffer will overflow into other areas in memory adjacent to it in the same stack frame (the area adjacent to it in main()). By doing so, we may be able to find areas in memory that we can write to simply by overflowing into them, changing the execution of our program or even the variables in it.
Here are some simple diagrams to show you what would happen if argv held a 97-byte string such as:
- "Hi My name is terry malanovoic, I like to overflow buffers, well it's only 64 bytes so whatever."
The first example portrays what happens when we do a length check on argv before copying, so that it fits into buffer and anything extra is truncated:
char buffer with a length check algorithm, or a function that writes to buffer with a length option:
["Hi My name is terry malanovoic, I like to overflow buffers, wel"] [0xffffdc2a]
char buffer without a length check algorithm. No length checking was performed and argv was copied as is:
["Hi My name is terry malanovoic, I like to overflow buffers, wel"]
["l it"] ["'s o"] ["nly "] ["64 b"] ["ytes"] ...
As you can see, the memory after buffer is overwritten when we do not check the length of argv before copying it into buffer. It is generally discouraged to use strcpy(); other functions such as strncpy() or memcpy(), which take a length argument, mitigate this issue when they are given the size of the destination.
Starting with Our Overflow (OS & Compilation)
To create our first overflow in Linux, let's boot up into a flavour of Linux of our choice (a recommendation is to use a Linux flavour that is either a live USB or in a virtualized machine).
As our first step, we must now switch off memory randomization. Memory randomization helps programs protect themselves against buffer overflows and similar memory-based attacks. To switch it off manually, set the value to 0 in the following file (in the future, it would be wise to read more about ASLR, or memory randomization):
We can now go ahead and compile our C program from before using the following command:
gcc -g -fno-stack-protector -z execstack simplec.c -o simplec.out
We use a few flags in our compilation to aid our first exploitation of the binary. Although there are many techniques that work without these flags, we will use them this time as it is our first buffer overflow exploit. Generally, a black hat hacker will not have the information and the relaxed protections that these flags provide, as release builds of programs such as Libre Office are not compiled with flags that would so easily allow for abuse.
Here is a simplistic teardown of the flags that we have used:
-g includes debug symbols. This allows us to see debug information about the executable, such as files, line numbers, C code, etc.
-fno-stack-protector removes stack protection for the executable. Generally, if a buffer is overflowed in C with stack protection enabled, the overwritten canary is detected when the function returns and the program is terminated.
-z execstack tells the linker to mark the stack as executable. This is usually off for security reasons. With this option on, we can execute values that we write into the stack using our overflow exploit.
-o tells our compiler the output file for our binary.
Starting With Our Overflow (GDB & Assembly)
Now that we have compiled our program, let's get started by investigating how our program's memory is laid out. In order to investigate our memory, we will use the GNU Debugger, also known as GDB. We can open our executable in GDB through the following command:
gdb -q ./simplec.out
We are now greeted by the GDB screen:
The -q flag is used in order to skip any splash information such as authors, name of program, version, etc. when starting GDB. It basically means 'quiet'.
We can now set up our environment. Please remember that there are two syntaxes for viewing assembly, usually the default is AT&T:
- AT&T assembly
- INTEL assembly
For this tutorial, I will be using the INTEL syntax. We can set INTEL assembly as our preferred syntax by using the following line in GDB:
set disassembly-flavor intel
If we want this syntax to be used by default every time we start GDB, we can write a configuration file named .gdbinit to our home directory with the above line written in it.
Now that we have set up our environment, let's start investigating. Let's type in the command:
We need to create some breakpoints in order to stop the flow of execution and examine memory. For that, let's view the code of our program so that we can determine any line numbers that we would like to put a breakpoint at. We can see our program's code by using the list command:
This is the output we get:
Here, we can see the individual lines of code with line numbers. We will be using the line numbers in order to create breakpoints. We can also use function names or memory addresses of instructions to create breakpoints. To see the memory addresses of our code in assembly, use the disassemble command followed by a function name. main() can be replaced by any function you would like to disassemble.
We can now see the disassembled code for the main() function. We may set breakpoints at any of the text segment memory addresses, such as the instruction at +47.
Function Memory & Stack Frames
A function's memory is organized in what we call stack frames. These get pushed onto the stack one on top of the other, which helps in keeping context thanks to the FILO (first in, last out) structure. Variables sent to the function are pushed onto the stack before the frame. This includes the return address, which allows us to return to the previous stack frame. Here is how a stack frame is organized (for context, I will be using the code below to represent our stack frame):
void do(int a, int b)
void main(int argc, char* argv)
Here, it is represented in stack frames:
<< Lower memory addresses
Function do() @ 0x007fffffffc6
|-------------------------------| ← Start of frame for do(). Referenced by $RSP
| //function prologue |
| //code | ← Code
| int b |
| int a |
|-------------------------------| ← End of frame. Referenced by $RBP
| return address which is the ‘next instruction’ line in main() 0x007fffffffc2 |
Function main() @ 0x007fffffffc2
|-------------------------------| ← Start of frame for main(). Referenced by $RSP
| //function prologue |
| do() @ 0x007fffffffc6 | ← Code
| next instruction | ← Return address from do() returns here.
| char* argv | ← Function arguments
| int argc |
|-------------------------------| ← End of frame. Referenced by $RBP
| return address |
Higher memory addresses >>
As you may notice, variables are pushed in a first in, last out order (things are pushed in the opposite order to the one they are read in). This is due to the stack's FILO way of working, which helps with context. For example, the function main(), which calls do(), will be pushed onto the stack before do(). Remember that the stack grows upward, from higher memory addresses to lower ones, which means we must remove do() from the stack before we can remove main().
With all this in mind, if we overflowed buffer, we could reach a part of memory from which the CPU reads an address to execute code at. As you see in the previous diagram, all variables are above the return address. The return address points to the memory address where execution continues. What if we manipulated it to point to a different memory address than it was designed to? Maybe a memory address that contains a variable we have control over. What if we put executable code into buffer and, at the same time, overflowed buffer to write into the return address, so that the contents of buffer are read as executable code?
As we see in the previous stack frame diagram, there is something called $RBP, the base pointer. $RBP and $RSP are some of the many pointers and registers used by the CPU to point to memory, hold values for mathematical purposes and so forth. $RBP and $RSP help with context, for example, in which part of memory a function starts and ends. You could say they are delimiters, so that context is kept and no memory addresses before or after the stack frame are used during the function call. Think of it as a book. There is a front cover, which $RSP points to, and a back cover, which $RBP points to. And yes, we will need to know where we currently are in the book. For that, there is $RIP, the instruction pointer. Here is a basic list of the 3 pointers on the CPU that we will be using:
$RBP = Base pointer/Where the function, or stack frame, ends
$RSP = Stack pointer/Where the function, or stack frame, starts
$RIP = Instruction pointer/Which instruction is about to be executed.
You may see the current status of these registers at any moment by typing in info registers.
Starting With Our Overflow (Investigation)
Now that we have a grasp of a few Assembly and GDB concepts, let's go and fire up our first buffer overflow exploit. To do so, let's create a breakpoint during execution:
Let's break at line 7. Breaking execution will allow us to examine registers, pointers and memory while the program is executing.
Run the program using the run command.
Let's now figure out where $RBP is located. The return address is usually located right after this pointer. As shorthand, instead of typing info registers, we can also type i r followed by the pointer or register we want, such as:
i r $rbp
We can now see that $RBP points to 0x7fffffffde80, which means that our return address is located right after it. Perfect! But how can we see it? After all, the return address doesn't have a name. Here is where Examine comes into play.
Examine commands allow us to view memory at different locations in as many units as we like. Let's say that $RBP is our start point and we tell the Examine command to view 20 addresses after $RBP. The examine command will display 20 addresses after $RBP:
On the row of 0x7fffffffde80, we can see a second value, 0x00007ffff7dea09b. This is our return address. buffer will be used to overwrite it with our own custom address pointing to our executable code. The question is: if we can use buffer to overwrite the return address, can't we use buffer to store the malicious code too?
Let's first try overwriting our return address. Let's make it point to the address of buffer, which will contain our malicious code, or 'shellcode'. We can figure out how far away the return address is by determining the distance between buffer and $RBP (buffer is above $RBP in the stack):
print $rbp - buffer
The difference between $RBP and buffer is 64 bytes. Let's run our program with an argument that is 64 bytes long and create a breakpoint at line 8. This will make sure that strcpy() has executed and buffer has been copied into, so that we can see the result of our hard work. If your program is still executing, you may use the continue command to resume execution and allow the program to end.
You can pass the program argument without quotes, or use any argument that is 64 characters long, considering that a char is 1 byte long. We can now go and examine 20 addresses, but first let's examine buffer as a string:
We can now see that buffer contains our string. Let's go ahead and examine the addresses in hexadecimal:
$RBP now points to 0x7fffffffde30. Yes, memory addresses can vary between executions, especially when you supply new argument values or recompile the program. Memory is pretty flexible, but turning off memory randomization helps us keep things roughly where they are between executions of our program.
We can see various values. Remember that two hexadecimal digits, say 0x7f, are one byte! The bytes of every word you see in GDB must be read backwards, so 0x3837363534333231 must be read as 31 32 33 34 35 36 37 38. In ASCII, 0x31 and 0x32 are equivalent to 1 and 2, just like the 1 and 2 in the string we supplied as an argument to the program. The next word, read backwards, is 39 30 2d 3d 71 77 65 72, of which the first four bytes correspond to "90-=", as you can see in the argument string as the 9th, 10th, 11th and 12th characters.
How you read multi-byte values in memory is determined by endianness. In this case, we are using Little Endian. This depends on the manufacturer of your hardware, the protocol or, in this case, the architecture of your processor.
Starting With Our Overflow (Execution)
Earlier we mentioned a special element called 'shellcode'. This is machine code that we can write into a memory location such as buffer and then have executed by changing the function's return address so that it points to buffer instead of its original target. If we change the return address to the beginning of buffer, any machine code in buffer will be read as normal execution instructions. To write our shellcode, we will be using two specific instructions:
0x90 represents a NOP, or no-operation instruction: nothing happens and execution passes along to the next memory address.
0xcc is what is called a hard breakpoint (the int3 instruction). It will cause our program to halt with a SIGTRAP when it is executed. Shellcode allows us to do many things, such as spawn a root command shell (the application needs SetUID enabled), but we are using 0xcc as our shellcode because it is the simplest one-byte instruction for showing you how buffer overflows work. As we know, memory addresses sometimes change. To cope with that, we create something called a NOP sled.
A NOP sled is a padding of NOP, or 0x90, bytes around our instruction 0xcc. $RIP will slide through all the NOPs and eventually hit the 0xcc instruction, which makes our life much easier than having to find the exact address of 0xcc in case the memory arrangement changes.
We will use the Perl language to output our shellcode as an argument, which will be processed and sent to the program when the run command is executed. It makes creating NOP sleds much easier. The same can be done in any other language such as Python or C:
run "$(perl -e 'print "\x90" x 31 . "\xcc" .
"\x90" x 40 . "\xfa\xdd\xff\xff\xff\x7f"')"
Note that the total number of bytes needed to reach $RBP is 64, but in order to overwrite $RBP and get to the return address, we must add 8 more bytes, which is the number of bytes stored in $RBP. That is why 31 + 1 + 40 = 72 bytes come before the address. The final 00s of the address do not need to be written, as the stored return address already contains zeros in its upper bytes, and hence the new address is represented by six bytes.
But what does it all mean? The $() represents command substitution. You can also use `` to do the same. The command inside $() is executed first and its output is sent as the first argument of simplec.out. The -e flag tells perl to execute the code in the following single quotation marks. We then print a NOP sled by printing NOP bytes before our shellcode, which, as we discussed before, is \xcc. We then write another NOP sled right after it. As a sled this is not necessary unless we have another shellcode right after our first one, but here it also serves as the padding that carries us up to the return address. The final string contained in our double quotation marks is the address, in bytes, with which we would like to overwrite the return address. It points to some address in the buffer array, from which execution will run through our NOP sled and eventually reach our shellcode. As we are using Little Endian, the bytes of the address are written backwards. Where can we point the return address to for a positive hit? 0x7fffffffddfa is just a few bytes after the start of buffer. This makes sure that in case buffer moves (unless drastically, of course), our shellcode is still executed. Let's execute our new program argument:
As you may see, when the return address after the $RBP of the main() function was used, the program ended in a SIGTRAP. The return address is used when $RIP leaves the function and returns to the previous function in the stack. In this case, $RIP would normally return to the system. The SIGTRAP means that our shellcode has executed successfully. If you get a segmentation fault instead, it means that some part of your argument string is wrong. Sometimes, it could just be due to changes in memory addresses.
Buffer overflows are one of the most basic exploitation techniques. Although they have slowly become less prevalent, they are still widely possible, especially when programmers do not pay attention to the code they write. By using this technique, we can redirect execution to a memory address of our choosing, just by writing that address into the return address of some function in a program.
All information in this tutorial must be used for White hat hacking and under the law. Do not use this material to break the law. Only conduct such exploits on a machine you have written permission or recorded permission for.