How to follow along

Two basic docker commands are required to follow along with this lesson -

docker pull learnreverseengineering/lesson4
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined learnreverseengineering/lesson4 bash

Introduction

In the previous lesson, "The stack", we covered the concept of the stack, and how assembly programs can use the stack to store data as part of execution.

In reality, the kinds of C programs which we're going to be disassembling as part of this course are more complicated than the basic example ASM programs I wrote and provided in the last lesson. Rather than just arbitrarily writing data to the stack whenever, modern (and complex) applications use the concept of a stack frame to keep track of the current function's context.

What's a stack frame?

Essentially, a stack frame 'divides' the stack up into small blocks which belong to the function which is currently executing. What this means is that instead of an application's functions being able to address the entire stack and store data wherever they please, they instead (by convention) interact with only their assigned block (frame!) on the stack. At compile time, the compiler works out how much space is going to be required on the stack for the function's variables and the stack frame size is hardcoded based upon that information!

This probably seems quite confusing without any examples, so we'll be digging into some example code (and reversing our first proper C application 🥳) in a few paragraphs.

Last lesson we learned about and interacted with the RSP (RIDICULOUSLY EXTENDED STACK POINTER) register which points to the current top of the stack. There is another register waiting in the wings which we've not actually touched on yet called RBP, or RIDICULOUSLY EXTENDED BASE POINTER (base pointer for short).

The base pointer register points to the bottom of the stack frame, and the stack pointer points to the top of the stack frame. What this means is that we have a fixed position on the stack to reference from, no matter how large the stack grows to be as part of our function's execution.

Let's look at some code to see this in action.

The C code

This code can be found in the lesson's Docker container under /lesson/basicStackFrameExample.c


        #include <stdio.h>

        void printNumber(int number){
            printf("Number is - %d\n", number);
        }
        
        int main(int argc, char** argv){
            int variableOne = 0x35;           
            printNumber(variableOne);
        }

First and foremost is a function which accepts a number as an argument and then prints it to the terminal. Secondly is the main function, which declares an integer variable, assigns it a hexadecimal value (53 in decimal) and then prints it with printNumber. Straightforward stuff, but it's enough to highlight how stack frames work believe it or not!

The disassembly

You're strongly encouraged to follow along with this in the lesson's Docker container by running gdb basicStackFrameExample, then setting a breakpoint on the main method (b main) and on the printNumber method (b printNumber). Run the application using r to pause at the main method.

The disassembly for the above C code looks as follows -


        pwndbg> disassemble main
        Dump of assembler code for function main:
           0x000055555555517d <+0>:     endbr64
           0x0000555555555181 <+4>:     push   rbp
           0x0000555555555182 <+5>:     mov    rbp,rsp
           0x0000555555555185 <+8>:     sub    rsp,0x20
           0x0000555555555189 <+12>:    mov    DWORD PTR [rbp-0x14],edi
           0x000055555555518c <+15>:    mov    QWORD PTR [rbp-0x20],rsi
           0x0000555555555190 <+19>:    mov    DWORD PTR [rbp-0x4],0x35
           0x0000555555555197 <+26>:    mov    eax,DWORD PTR [rbp-0x4]
           0x000055555555519a <+29>:    mov    edi,eax
           0x000055555555519c <+31>:    call   0x555555555149 <printNumber>
           0x00005555555551a1 <+36>:    mov    eax,0x0
           0x00005555555551a6 <+41>:    leave
        => 0x00005555555551a7 <+42>:    ret
        End of assembler dump.

Function Prologue

The first three instructions appear at the start of almost every function in a C application compiled without optimisations, and they are referred to as the "Function Prologue" (see here for the nitty-gritty details). Generally speaking the function prologue does three things -

Execute endbr64, a marker instruction that is part of an exploit mitigation called Control-flow Enforcement Technology (currently outside of the scope of this course)
PUSH the RBP register onto the stack, which has the effect of saving it for later!
Copy RSP into RBP, so that RBP now points to the top of the stack

Creating the stack frame

The next instruction, sub RSP, 0x20, needs a little explanation. sub is the "subtract" instruction, so by subtracting 0x20 (32 decimal) bytes from RSP we are making it point to an address 32 bytes lower on the stack, which has the effect of creating a "block" (stack frame!) between RBP and RSP.

Let's compare how the stack would look before and after the sub rsp, 0x20 instruction executes -

Before SUB RSP.
Stack Address	Contents
0x10000	RBP and RSP point to this address
0xfff8	empty stack space
0xfff0	empty stack space
0xffe8	empty stack space
0xffe0	empty stack space
0xffd8	empty stack space

After SUB RSP.
Stack Address	Contents
0x10000	RBP points to this address
0xfff8	empty stack space
0xfff0	empty stack space
0xffe8	empty stack space
0xffe0	RSP points to this address (space is still empty though)
0xffd8	empty stack space

What's happened is that we have created a block (stack frame!!!) which can accommodate up to four 8 byte variables (in 0xfff8, 0xfff0, 0xffe8 and 0xffe0). We can now reference any variables in our function's stack frame from the fixed location at the base of the frame (AKA RBP!) using instructions like MOV RAX, QWORD PTR [RBP-8]
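The pointer arithmetic in the two tables above can be sketched in C. This is a toy model under stated assumptions, not real CPU state: the Regs struct and the run_prologue function are hypothetical names invented for illustration, and only the movement of RSP and RBP is tracked (the saved RBP value itself is not stored anywhere).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model of the two registers involved. */
typedef struct {
    uint64_t rsp;  /* stack pointer: lowest stack address currently in use */
    uint64_t rbp;  /* base pointer: fixed reference point for the frame */
} Regs;

/* Models `push rbp; mov rbp, rsp; sub rsp, frame_bytes`. */
Regs run_prologue(Regs r, uint64_t frame_bytes) {
    assert(r.rsp >= 8 + frame_bytes);  /* keep the toy model from wrapping */
    r.rsp -= 8;            /* push rbp: the stack grows down by one quadword */
    r.rbp = r.rsp;         /* mov rbp, rsp: both registers now agree */
    r.rsp -= frame_bytes;  /* sub rsp, 0x20: open up the frame */
    return r;
}
```

Starting from a hypothetical RSP of 0x10008 and calling run_prologue with 0x20 leaves RBP at 0x10000 and RSP at 0xffe0, matching the "After SUB RSP" table.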

You're probably asking yourself 'well where did 0x20 come from? why that value? should I buy the author of this course a coffee? (yes please I'm very tired)'. These are all valid questions. Looking at the main method in the C code above we can identify three variables -

argc (int, 4 bytes)
argv (char**, 8 bytes)
variableOne (int, 4 bytes)

The compiler has correctly identified that we will need 16 (0x10 in hexadecimal) bytes of storage on the stack for these variables. It has then reserved 0x20 bytes rather than 0x10, which gives us a little leeway inside of our stack frame (GCC typically sizes frames in multiples of 16 bytes so that the stack stays 16-byte aligned).
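To sanity-check the arithmetic, here is a small sketch in C. The helper names locals_bytes and round_up_16 are hypothetical, and the sizes assume a 64-bit Linux target like the lesson's container. Note that rounding 16 up to a multiple of 16 still gives 16, so the sum alone doesn't account for 0x20. The deepest slot used in the disassembly is RBP-0x20 (argv), so the frame has to reach at least that far.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for main's three variables on a 64-bit target. */
size_t locals_bytes(void) {
    assert(sizeof(char **) == 8);  /* the lesson assumes 64-bit pointers */
    return sizeof(int)       /* argc */
         + sizeof(char **)   /* argv */
         + sizeof(int);      /* variableOne */
}

/* Round n up to the next multiple of 16 (hypothetical helper). */
size_t round_up_16(size_t n) {
    return (n + 15) & ~(size_t)15;
}
```

With these definitions, locals_bytes() is 16, round_up_16(16) is still 16, and round_up_16(17) is 32. The extra space in main's frame therefore comes from where the compiler chose to place each slot, not from the total size of the variables.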

As a fun exercise, modify the code in basicStackFrameExample.c to include a number of additional integers in the main method, compile the code with gcc -masm=intel basicStackFrameExample.c -o basicStackFrameExample and observe how the sub rsp, 0x20 instruction changes! Also, remove the variables entirely and just call printNumber(0x35); and observe what changes.

Variable assignment

Let's tackle the next four lines of assembly

mov DWORD PTR [rbp-0x14],edi
mov QWORD PTR [rbp-0x20],rsi
mov DWORD PTR [rbp-0x4],0x35
mov eax,DWORD PTR [rbp-0x4]

This code should look fairly familiar if you completed the last lesson, although you'll observe that rather than referencing things on the stack relative to the RSP register, we'll now use RBP (our base pointer, which always points to the base of the stack frame!). Observe that because the stack frame has already been reserved for the main method, we don't need to PUSH our variables onto the stack in these instructions; we can simply copy data in arbitrarily using the MOV instruction.

Firstly we store the 4 byte EDI register (which contains argc) onto the stack frame at RBP-0x14. In unoptimised builds like this one, the compiler copies main's argc and argv out of their registers and into the stack frame. In the final examples in the last lesson where we wrote assembly code and opened it in GDB we saw that argc and argv were already at the top of the stack when the main method started.

It shouldn't come as a surprise that the next instruction stores the 8 byte (quadword!) argv onto the stack at stack frame offset 0x20.

The next instruction stores 0x35 onto the stack frame at offset 0x4, this is our 4 byte integer called variableOne from the C code above! From the assembly, we can tell that the variable is 4 bytes wide because the application writes it to a DWORD PTR location on the stack rather than a QWORD PTR location (which would be 8 bytes wide).

The next instruction takes the DWORD which is pointed to by RBP-0x4 and places it into EAX. At this point, from reading and understanding the assembly code we've established that the DWORD variable at stack frame offset 0x4 is our variableOne variable. Ergo, EAX now contains variableOne.

argc lives at RBP-0x14, argv lives at RBP-0x20, variableOne lives at RBP-0x4. This is important, because it shows that while there is not the concept of a variable in ASM, we can still easily track where variables are by monitoring stack offsets and register values!

For fun, let's break the stack frame up into 4 byte chunks and see where everything lives at this point. Follow along at home by running dd $rbp 1, dd $rbp-4 1, dd $rbp-8 1 etc. in GDB.

Stackframe offset	Value	Description
RBP	00000000	The saved RBP register from the second instruction
RBP-0x4	00000035	variableOne
RBP-0x8	00000000	Empty space
RBP-0xC	00007fff	Irrelevant, already on stack
RBP-0x10	ffffe6c0	Irrelevant, already on stack
RBP-0x14	00000001	argc
RBP-0x18	55555060	Irrelevant, already on stack
RBP-0x1C	00007fff	The upper half (high 32 bits) of argv's address
RBP-0x20	ffffe6c8	The lower half (low 32 bits) of argv's address

All very logical I hope. The things marked "Irrelevant, already on stack" are noteworthy - stackframes aren't guaranteed to be empty when they're created, they can contain all kinds of random data which can be safely overwritten by the application.
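The way argv's address splits across two 4 byte chunks can be reproduced with a small C model of the frame. This is a sketch under assumptions: the Frame struct and the helper names are hypothetical, the values are the ones from the table above, the machine is assumed to be little-endian (as x86-64 is), and the slots that hold leftover data in the real frame are simply zeroed here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_BYTES 0x20

/* Hypothetical model: the end of the array stands in for the address in RBP,
   so the slot at RBP-off starts at index FRAME_BYTES - off. */
typedef struct {
    uint8_t bytes[FRAME_BYTES];
} Frame;

/* mov DWORD PTR [rbp-off], value */
void store_dword(Frame *f, unsigned off, uint32_t value) {
    assert(off >= sizeof value && off <= FRAME_BYTES);
    memcpy(&f->bytes[FRAME_BYTES - off], &value, sizeof value);
}

/* mov QWORD PTR [rbp-off], value */
void store_qword(Frame *f, unsigned off, uint64_t value) {
    assert(off >= sizeof value && off <= FRAME_BYTES);
    memcpy(&f->bytes[FRAME_BYTES - off], &value, sizeof value);
}

/* Read back one dword, similar to `dd $rbp-off 1`. */
uint32_t load_dword(const Frame *f, unsigned off) {
    uint32_t value;
    assert(off >= sizeof value && off <= FRAME_BYTES);
    memcpy(&value, &f->bytes[FRAME_BYTES - off], sizeof value);
    return value;
}

/* Replays main's three stores using the values from the table. */
Frame build_main_frame(void) {
    Frame f;
    memset(&f, 0, sizeof f);
    store_dword(&f, 0x14, 1);                      /* argc */
    store_qword(&f, 0x20, 0x00007fffffffe6c8ULL);  /* argv */
    store_dword(&f, 0x04, 0x35);                   /* variableOne */
    return f;
}
```

Reading the result back with load_dword gives 0x35 at offset 0x4, 1 at offset 0x14, 0xffffe6c8 at offset 0x20 and 0x00007fff at offset 0x1c. The single 8 byte store of argv shows up as two dwords because the low half of the value sits at the lower address.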

Call to printNumber

The next instructions are -

mov edi,eax
call 0x555555555149 <printNumber>

The first instruction should be both familiar and expected. We discussed in the second lesson that the first argument to a function is placed within RDI prior to a call instruction. So in this case we put variableOne into EDI (the lower 32 bits of RDI, which is all a 4 byte int needs) and then call printNumber.

Press si on the call instruction to Step Into the printNumber function. Look at the stack at this point -

The state of the stack at the start of the printNumber function

Observe that a new quadword has been PUSHED onto the top of the stack, outside of the main function's stack frame. This quadword is the return address: the address of the next instruction in the main function after the call to printNumber. We can confirm this by running disassemble main and looking for 0x5555555551a1; we'll observe that it's directly after the call to printNumber.

pwndbg also helpfully gives us all of this information too, telling us that it's the address of main+36, which is mov eax, 0!
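What the call instruction does to the stack can be sketched with a toy machine in C. Everything here is a simplified assumption for illustration: the Machine struct, do_call and do_ret are hypothetical names, and the instruction length of 5 bytes is read off the disassembly above (the call sits at main+31 and the next instruction at main+36).

```c
#include <assert.h>
#include <stdint.h>

#define STACK_SLOTS 16

/* Hypothetical toy machine: a tiny stack of quadwords plus RIP. */
typedef struct {
    uint64_t stack[STACK_SLOTS];
    unsigned sp;   /* index of the current top slot; STACK_SLOTS means empty */
    uint64_t rip;  /* address of the instruction being executed */
} Machine;

/* Models `call target` for an instruction that is insn_len bytes long. */
void do_call(Machine *m, uint64_t target, uint64_t insn_len) {
    assert(m->sp > 0);
    m->stack[--m->sp] = m->rip + insn_len;  /* push the return address */
    m->rip = target;                        /* jump to the callee */
}

/* Models `ret`: pop the return address back into RIP. */
void do_ret(Machine *m) {
    assert(m->sp < STACK_SLOTS);
    m->rip = m->stack[m->sp++];
}
```

With RIP at 0x55555555519c, do_call(&m, 0x555555555149, 5) leaves 0x5555555551a1 on top of the toy stack, and a later do_ret sends RIP back to that address, which is main+36.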

Looking at the disassembly in printNumber, we observe that there's another function prologue which is going to save a copy of RBP (which is the base of main's stack frame!) onto the stack, followed by creating a new stack frame by putting RSP into RBP and then making 0x10 bytes of space on the stack for the stack frame (0x10 bytes of space because there is only one variable - the function argument!)

Use n to step through the code until after the call to printf. We get to learn about two new instructions! lea rdi, [rip + 0xea0], or Load Effective Address, computes an address and places it in a register without reading the memory at that address. In this case we're loading the address of the "Number is - %d\n" string and putting it into RDI ready for the call to printf! The other new instruction is nop. NOP stands for "No Operation" and it does literally nothing. It's unclear why the compiler has added a NOP instruction here, but we can safely ignore it because it performs no operations!

OK here is the big pay off. Take a close look at the stack and at the registers right now before we execute the LEAVE instruction -

Press n to step over the leave instruction and observe that multiple things happened -

The RBP and RSP registers were updated
The stack is now identical to what it was at the start of the function
GDB has helpfully highlighted where the RET instruction is going to return to

We call what the LEAVE instruction has done "unwinding the stack". It's part of the "function epilogue", and it's responsible for deleting the current stackframe and putting the RBP / RSP registers back to where they were before the function prologue occurred! Under the hood, LEAVE copies RBP into RSP (discarding the frame) and then POPs the saved base pointer back into RBP. Press n to step over the ret instruction and observe that we're back in the main method, and the stack frame for main has been restored, as if nothing ever happened.
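The prologue and the LEAVE instruction mirror each other, which a small C model can make concrete. This is a sketch with assumptions stated up front: the Cpu struct, the cpu_ helper names and the addresses (STACK_TOP, 0x10008, 0x10040) are all hypothetical, chosen only to resemble the tables earlier in the lesson.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_TOP   0x10010ULL  /* hypothetical address just above the toy stack */
#define STACK_SLOTS 16

typedef struct {
    uint64_t mem[STACK_SLOTS];  /* mem[i] models the quadword at STACK_TOP - 8*(i+1) */
    uint64_t rsp;
    uint64_t rbp;
} Cpu;

/* Translate a toy stack address into a slot of mem. */
uint64_t *cpu_slot(Cpu *c, uint64_t addr) {
    assert(addr % 8 == 0 && addr < STACK_TOP && addr >= STACK_TOP - 8 * STACK_SLOTS);
    return &c->mem[(STACK_TOP - addr) / 8 - 1];
}

void cpu_push(Cpu *c, uint64_t value) {
    c->rsp -= 8;
    *cpu_slot(c, c->rsp) = value;
}

uint64_t cpu_pop(Cpu *c) {
    uint64_t value = *cpu_slot(c, c->rsp);
    c->rsp += 8;
    return value;
}

/* push rbp; mov rbp, rsp; sub rsp, frame_bytes */
void cpu_prologue(Cpu *c, uint64_t frame_bytes) {
    cpu_push(c, c->rbp);
    c->rbp = c->rsp;
    c->rsp -= frame_bytes;
}

/* leave behaves like: mov rsp, rbp; pop rbp */
void cpu_leave(Cpu *c) {
    c->rsp = c->rbp;      /* throw away the current frame */
    c->rbp = cpu_pop(c);  /* restore the caller's base pointer */
}
```

Starting with RSP at 0x10008 and the caller's RBP at 0x10040, cpu_prologue(&c, 0x10) moves RBP to 0x10000 and RSP to 0xfff0. A following cpu_leave(&c) puts both registers back to 0x10008 and 0x10040, leaving RSP pointing at the return address that ret will consume.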

Conclusion

We covered a large amount of fundamentally important information in this lesson. We now understand how stack frames work and we have gained the ability to reverse engineer a decent amount of basic C applications!

Lesson 4 - Stack Frames.