MOV YOUR_HEART,ASM....
Two basic Docker commands are required to follow along with this lesson:
docker pull learnreverseengineering/lesson2
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined learnreverseengineering/lesson2 bash
Start the lesson by running cat /lesson/movInstruction.c to print the first code sample that we're going to reverse engineer.
This resource will be immensely useful during this lesson and subsequent ones. It teaches the most important GDB / PwnDBG commands to make you a reverse engineering deity.
MOV is one of the most fundamentally important instructions in any assembly language, in my opinion. It enables an application to MOVe a piece of data (a constant, an address, a bitmask, anything) from one location to another. Strictly speaking it copies rather than moves, because the source is left unchanged. The instruction looks as follows: MOV destination, source

Generally speaking, the source can be a constant (an immediate value such as 0x35), a CPU register or a location in memory such as the stack. The stack is an additional area of data storage which we're going to tackle in the next lesson. A constant can never be the destination. If the source is a constant or a register, the destination can be either a register or a location in memory. If the source is a location in memory, the destination must be a register, because memory-to-memory MOVs (such as stack to stack) are not permitted.
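The operand rules above can be summarised as a small decision function. This is a simplified C model, and the enum and function names are hypothetical. It ignores operand sizes, segment registers and other special cases of the real instruction, and only captures the rules just described.

```c
#include <stdbool.h>

/* Hypothetical names for the three kinds of operand discussed above. */
enum operand_kind {
    OPERAND_IMMEDIATE, /* a constant such as 0x35 */
    OPERAND_REGISTER,  /* e.g. RAX, RSI */
    OPERAND_MEMORY     /* e.g. a slot on the stack */
};

/* Simplified model: returns true if "MOV dst, src" is a legal combination.
 * Operand sizes and special registers are deliberately ignored. */
bool mov_is_legal(enum operand_kind dst, enum operand_kind src)
{
    /* A constant cannot be written to, so it is never a valid destination. */
    if (dst == OPERAND_IMMEDIATE)
        return false;
    /* No memory-to-memory copies: the value must pass through a register. */
    if (dst == OPERAND_MEMORY && src == OPERAND_MEMORY)
        return false;
    return true;
}
```

For instance, mov_is_legal(OPERAND_MEMORY, OPERAND_MEMORY) is false, which is the stack-to-stack restriction. Code that needs such a copy first loads the value into a register and then stores it.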
Consider the simple code in movInstruction.c -
```c
#include <stdio.h>
#include <stdlib.h> // needed for exit()
int main(int argc, char** argv){
    int variableOne = 0x35;
    int variableTwo = 0x2a;
    int variableThree = 0x0;
    // Putting one variable into another
    variableThree = variableOne;
    printf("Number is: %d\n",variableOne);
    printf("Number is: %d\n",variableTwo);
    printf("Number is: %d\n",variableThree);
    exit(0);
}
```
The main function declares some integer variables, assigns them hexadecimal values (53, 42 and 0 in decimal, respectively) and then prints them with printf.
Without even disassembling the app at /lesson/movInstruction, we can already guess where the MOV instructions are. Every instance of something = somethingElse will generally create a MOV instruction in the resulting ASM.
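To connect the C source with the MOVs it produces, here is a cut-down, hypothetical copy_variable() function that uses the same assignments as movInstruction.c. The comments show the kind of MOV instructions an unoptimised (-O0) GCC build for x86-64 typically emits in Intel syntax. Treat them as an illustration only, because exact stack offsets and register choices vary between compilers and versions. Note that the copy goes through EAX, because memory-to-memory MOVs aren't allowed.

```c
/* Hypothetical cut-down version of movInstruction.c. The asm in the comments
 * is illustrative of a typical -O0 x86-64 GCC build, not exact output. */
int copy_variable(void)
{
    int variableOne = 0x35;     /* mov DWORD PTR [rbp-4], 0x35 */
    int variableThree = 0x0;    /* mov DWORD PTR [rbp-8], 0x0  */

    /* No memory-to-memory MOV exists, so the copy goes via a register:
     *   mov eax, DWORD PTR [rbp-4]
     *   mov DWORD PTR [rbp-8], eax                                   */
    variableThree = variableOne;

    return variableThree;       /* mov eax, DWORD PTR [rbp-8] */
}
```

copy_variable() returns 0x35 (53). To compare against real compiler output, objdump -d -M intel /lesson/movInstruction disassembles the prebuilt binary, assuming objdump is installed in the container.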
I initially thought about just showing the disassembly for the above C application, but it contained so much use of the stack (which is covered next lesson!) that it felt more appropriate to just manually write a pure (fully functional) assembly application here instead which performs the same operations, without relying so heavily on the stack.
As such, the code snippet below (found in /lesson/movInstruction.asm) is the simplest accurate representation of the original C source code.
```nasm
section .data
msg: db "Number is:%d", 13, 10, 0 ; equivalent to "Number is:%d\r\n\0"
section .text
extern printf
extern exit
global _start, main
main:
_start:
mov RDI, msg ; put the "Number is:%d" message to print into RDI
mov RSI, 0x35 ; put 0x35 into RSI
call printf; x64 calling convention is that the first argument to the function is inside of RDI, then RSI
mov RDI, msg ; again, message into RDI
mov RSI, 0x2A ; put 0x2A into RSI ready to print it
call printf ; call printf, which will print 0x2A (42 in decimal)
mov RAX, 0x0 ; put zero into RAX (variableThree)
mov RBX, 0x35 ; put 0x35 into RBX
mov RAX, RBX ; overwrite RAX with RBX (0x35), this is equivalent to variableThree = variableOne eh?
mov RSI, RAX ; put the 0x35 into RSI ready to print it
mov RDI, msg ; same again, message into RDI
call printf
mov RDI, 0 ; exit code will be '0'
call exit ; quit the app
```
The assembly above was prebuilt inside the lesson's Docker container and can be found at /lesson/movInstructionAsm. If you'd like to build the assembly code manually as a learning exercise (which I greatly encourage), run nasm -felf64 -o temp -l temp.lst movInstruction.asm followed by ld temp -o myAssembly -lc --dynamic-linker /lib64/ld-linux-x86-64.so.2. These two steps assemble and link the code into an executable called "myAssembly".
Let's work through this code line by line to help make the material stick!
| Assembly code | Description |
|---|---|
| `section .data` | Define the 'data' section of the executable |
| `msg: db "Number is:%d", …` | Create a string constant called msg. The `13, 10, 0` part is the decimal representation of `\r\n\0` (carriage return, line feed, null character) |
| `section .text` | Define the 'text' (code) section of the executable |
| `extern printf` | Declare printf as an external symbol; the linker resolves it from glibc (the -lc flag) |
| `extern exit` | Declare exit as an external symbol, also resolved from glibc |
| `global _start, main` | Export the _start and main labels. Both point at the same code; _start is needed because it is the default entry point that ld looks for |
| `main:` | Start of the main function |
| `_start:` | And of the _start function |
| `mov RDI, msg` | Put the address of our "Number is:" string constant into RDI, the first argument to printf |
| `mov RSI, 0x35` | Put 0x35 into RSI. This is our 'variableOne' variable, passed as the second argument to printf |
| `call printf` | This is how we call functions in assembly: with the CALL instruction. Call printf with RDI and RSI as arguments, directly equivalent to `printf("Number is:%d\r\n", 0x35);` |
| `mov RDI, msg` | RDI is not preserved across calls, so printf may have overwritten it; set it to the message address again |
| `mov RSI, 0x2A` | Put variableTwo (0x2A) into RSI, the second argument to printf |
| `call printf` | Call printf again |
| `mov RAX, 0x0` | Imagine that the RAX register is our variableThree; put 0 into it |
| `mov RBX, 0x35` | Put 0x35 into RBX |
| `mov RAX, RBX` | Copy RBX into RAX, the equivalent of variableThree = variableOne |
| `mov RSI, RAX` | Copy RAX into RSI, the second argument to printf |
| `mov RDI, msg` | Put the message address into RDI again, the first argument to printf |
| `call printf` | Call printf |
| `mov RDI, 0` | Put 0, the exit code, into RDI |
| `call exit` | Call exit with RDI as its argument |
You were just inadvertently introduced to the CALL instruction in the previous code snippet. This instruction is how x64 (and x86) ASM applications make function calls such as printf(), exit(), fgets(), socket() etc.
When we call functions in high-level languages like C, we simply pass the arguments in parentheses (printf("something: %s", somethingElse); for example). In assembly, 'calling conventions' dictate how functions are called: which registers or stack slots hold the arguments, and where the return value ends up.
I won't go into too much detail here, as there are... quite a lot... of calling conventions. Thankfully, when writing x64 ASM on Linux you only need to worry about one, the System V AMD64 ABI, which is explained excellently here: https://www.ired.team/miscellaneous-reversing-forensics/windows-kernel-internals/linux-x64-calling-convention-stack-frame
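Generally speaking, what you need to care about as a reverse engineer is that integer and pointer arguments to functions are passed as follows:
- 1st argument: RDI
- 2nd argument: RSI
- 3rd argument: RDX
- 4th argument: RCX
- 5th argument: R8
- 6th argument: R9
- 7th argument onwards: pushed onto the stack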
Obviously this is straightforward enough, but an example is always welcome! Consider the following C code -
```c
void exampleFunction(int arg1, int arg2, int arg3, int arg4, int arg5, int arg6, int arg7){
    // do something cool with arg1-arg7
}
int main(int argc, char** argv){
    exampleFunction(1, 2, 3, 4, 5, 6, 7);
    return 0;
}
```
The above C code translates to something like the following ASM snippet (simplified and truncated for brevity):
```nasm
main: ; start of main function
mov RDI, 1
mov RSI, 2
mov RDX, 3
mov RCX, 4
mov R8, 5
mov R9, 6
push 7 ; as mentioned above, beyond 6 args we start pushing args to the stack
call exampleFunction
add RSP, 8 ; the caller removes the pushed 7th argument after the call returns
exampleFunction: ; start of exampleFunction function
............ ; do something cool with arg1-arg7
```
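The register order used in the snippet above can also be expressed as a lookup. This is a sketch with hypothetical names (enum arg_location, arg_location_names, int_arg_location()). It only covers integer and pointer arguments, because floating point arguments travel in the XMM registers under the same convention.

```c
/* Hypothetical helper: where the System V AMD64 calling convention puts
 * the Nth integer or pointer argument, counting from 1. */
enum arg_location {
    ARG_RDI, ARG_RSI, ARG_RDX, ARG_RCX, ARG_R8, ARG_R9, /* register order */
    ARG_STACK,   /* 7th argument onwards */
    ARG_INVALID  /* positions below 1 */
};

/* Human-readable names, indexed by enum arg_location. */
const char *const arg_location_names[] = {
    "RDI", "RSI", "RDX", "RCX", "R8", "R9", "stack", "invalid"
};

enum arg_location int_arg_location(int position)
{
    if (position < 1)
        return ARG_INVALID;
    if (position <= 6)
        /* The first six arguments map directly onto the register list. */
        return (enum arg_location)(ARG_RDI + (position - 1));
    return ARG_STACK;
}
```

For example, int_arg_location(3) returns ARG_RDX, and arg_location_names[int_arg_location(7)] is "stack", which matches the push 7 above.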
To really cement how this stuff works and to wrap this lesson up, we're going to step through the movInstructionAsm program from earlier, line by line, in GDB! From the lesson's Docker container, run gdb /lesson/movInstructionAsm, then b main (to set a breakpoint at the start of the main function), then run. The app starts and then immediately stops at the start of main, as we requested.
As we work through this disassembly, press n and return to step through the code line by line as we perform our analysis of what the program does. The caret (►) will show which line is about to be executed when you next press n.
The UI can be a little overwhelming initially. The top pane contains the state of the CPU registers at the time the app hit the breakpoint. The second pane contains the disassembly, the third contains the stack, and the bottom pane contains the call stack. All of this will be covered in more detail soon.
GDB is paused at the first instruction, movabs rdi, msg. We can inspect the contents of msg by running telescope &msg 1, which dereferences the msg pointer and prints the text that it points to. Don't be alarmed that the instruction has changed from mov to movabs. No compiler is involved here: movabs is simply the name the disassembler uses for the form of MOV that carries a full 64-bit immediate value (in this case, the address of msg). It is still the same MOV instruction we wrote.
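Looking at the bytes shows why the disassembler uses a different mnemonic. The hypothetical encode_movabs() below builds the 10-byte encoding of MOV with a 64-bit immediate. That encoding is a REX.W prefix (0x48), then the opcode 0xB8 plus the register number, then the 8-byte immediate in little-endian order. This sketch only handles the eight original registers (RAX to RDI), not R8 to R15, which need an extra REX bit.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch: encode "mov r64, imm64" (displayed as movabs) into out[10].
 * reg is 0-7: RAX=0, RCX=1, RDX=2, RBX=3, RSP=4, RBP=5, RSI=6, RDI=7. */
size_t encode_movabs(uint8_t reg, uint64_t imm, uint8_t out[10])
{
    size_t len = 0;
    out[len++] = 0x48;                          /* REX.W: 64-bit operand size */
    out[len++] = (uint8_t)(0xB8 + (reg & 7));   /* opcode B8+r: MOV r64, imm64 */
    for (int i = 0; i < 8; i++)                 /* 8-byte immediate, little-endian */
        out[len++] = (uint8_t)(imm >> (8 * i));
    return len;                                 /* always 10 bytes */
}
```

For RDI (register number 7) and an example address of 0x403028, this produces 48 BF 28 30 40 00 00 00 00 00. GDB decodes that kind of byte sequence as movabs rdi, 0x403028.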
Pressing n advances to the next instruction in the disassembly. Observe that RDI in the registers pane at the top of the screen changes to an address like 0x403028 as a result of the MOV operation. Step forward again with n to place 0x35 into RSI. The disassembly says ESI rather than RSI because the assembler picked the shorter 32-bit encoding for such a small value. On x86-64, writing a 32-bit register such as ESI automatically zeroes the upper 32 bits of RSI, so the end result is identical to mov rsi, 0x35.
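The ESI/RSI behaviour can be modelled in a few lines. The two functions below are hypothetical helpers. Each takes a register's old 64-bit value and returns its new value after a partial write, following the x86-64 rules: 32-bit writes zero-extend, while 16-bit writes leave the upper bits alone.

```c
#include <stdint.h>

/* Writing a 32-bit register (e.g. ESI) zero-extends into the full 64-bit
 * register (RSI): whatever the upper half held before is discarded. */
uint64_t write_low32(uint64_t old_value, uint32_t new_low)
{
    (void)old_value;               /* upper 32 bits are cleared, not kept */
    return (uint64_t)new_low;
}

/* Writing a 16-bit register (e.g. SI) only replaces the low 16 bits. */
uint64_t write_low16(uint64_t old_value, uint16_t new_low)
{
    return (old_value & ~(uint64_t)0xFFFF) | new_low;
}
```

Whatever RSI held before, write_low32(old, 0x35) yields 0x35, which is why mov esi, 0x35 and mov rsi, 0x35 end up identical. write_low16() shows the contrast: writing SI keeps the upper 48 bits of RSI.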
Step forward again. Now that the call to printf is highlighted, GDB helpfully shows you what the arguments to the function will be.
Continue to step through the code with n, paying close attention to the state of the registers after each instruction in the program. Observe how after each call to printf, the RAX register changes to the number of characters that were printed. This is because, by convention, functions store their return value in the RAX register!
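The character count in RAX is easy to predict from C. The hypothetical print_number() below uses the same format string as movInstruction.asm. It returns what printf returns: the number of characters written, or a negative value on error.

```c
#include <stdio.h>

/* Prints a number using the same format string as movInstruction.asm and
 * returns printf's result. This is the value that ends up in RAX after each
 * call to printf in the assembly version. */
int print_number(int value)
{
    return printf("Number is:%d\r\n", value);
}
```

print_number(0x35) prints "Number is:53" and returns 14. That is 10 characters for "Number is:", two digits, then \r and \n. After the first call to printf in the assembly, you should therefore see 0xe in RAX.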
Continue to step through the code until the call to exit is reached; GDB should then inform you that the application has exited. It can be restarted if you wish with run (or r for short).
We covered a load of dry theory again this lesson, but we are rapidly approaching the point where we can reverse engineer some more challenging executables. Next lesson we'll cover the stack, which is fundamentally important for more complex ASM applications.