Literally just the various JMP instructions again but ✨f a n c i e r✨.
Two basic docker commands are required to follow along with this lesson -
docker pull learnreverseengineering/lesson8
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined learnreverseengineering/lesson8 bash
In this lesson we'll cover how high-level language looping constructs like for, while, and do/while loops look in x64 ASM. This lesson will be significantly easier to follow now that we have knowledge of conditional statements, covered in the previous lesson.
Loops in high level languages are used to perform the same sequence of instructions over and over again until a particular condition is reached. This makes programs shorter and more efficient, while enabling all kinds of complicated processing tasks like reading files line by line and acting on their contents.
Let's look at a basic example, available in the lesson's docker container at /lesson/forLoop.c. The executable lives at /lesson/forLoopCompiled -
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char** argv) {
    for(int i = 0; i < argc; i++){
        printf("argv[%d] is %s\n", i, argv[i]);
    }
    exit(0);
}
The above code simply iterates over the contents of the argv vector and prints every value. Imagine if we weren't able to use loops: we'd need to somehow guess how many arguments would be provided to the application and then add a print statement per argument. Inefficient and flaky.
Test this application out by supplying it with as many arguments as you like, and observe that the values are printed to the terminal.
Here's the corresponding ASM for this application -
pwndbg> disassemble main
Dump of assembler code for function main:
=> 0x0000555555555169 <+0>: endbr64
0x000055555555516d <+4>: push rbp
0x000055555555516e <+5>: mov rbp,rsp
0x0000555555555171 <+8>: sub rsp,0x20
0x0000555555555175 <+12>: mov DWORD PTR [rbp-0x14],edi
0x0000555555555178 <+15>: mov QWORD PTR [rbp-0x20],rsi
0x000055555555517c <+19>: mov DWORD PTR [rbp-0x4],0x0
0x0000555555555183 <+26>: jmp 0x5555555551b6 <main+77>
0x0000555555555185 <+28>: mov eax,DWORD PTR [rbp-0x4]
0x0000555555555188 <+31>: cdqe
0x000055555555518a <+33>: lea rdx,[rax*8+0x0]
0x0000555555555192 <+41>: mov rax,QWORD PTR [rbp-0x20]
0x0000555555555196 <+45>: add rax,rdx
0x0000555555555199 <+48>: mov rdx,QWORD PTR [rax]
0x000055555555519c <+51>: mov eax,DWORD PTR [rbp-0x4]
0x000055555555519f <+54>: mov esi,eax
0x00005555555551a1 <+56>: lea rdi,[rip+0xe5c] # 0x555555556004
0x00005555555551a8 <+63>: mov eax,0x0
0x00005555555551ad <+68>: call 0x555555555060 <printf@plt>
0x00005555555551b2 <+73>: add DWORD PTR [rbp-0x4],0x1
0x00005555555551b6 <+77>: mov eax,DWORD PTR [rbp-0x4]
0x00005555555551b9 <+80>: cmp eax,DWORD PTR [rbp-0x14]
0x00005555555551bc <+83>: jl 0x555555555185 <main+28>
0x00005555555551be <+85>: mov edi,0x0
0x00005555555551c3 <+90>: call 0x555555555070 <exit@plt>
The parts of the ASM above that relate to looping are the jmp at main+26, the add at main+73, and the mov / cmp / jl sequence from main+77 to main+83. The code kicks off with an unconditional jump to main+77, which allows the application to check whether the for loop should execute at all. Imagine a situation where a developer has created a for loop like for(int i = 0; i == 53; i++){...}. In this (contrived, absurd) example the code inside the loop would never execute because 0 doesn't equal 53. The unconditional jump at the start of the code allows the application to handle this kind of weird edge case.
The JMP lands us at mov eax, DWORD PTR [rbp-0x4], which loads a value from the stack into EAX. The next instruction, cmp, compares that value against another value on the stack at rbp-0x14. Together these two instructions load the variable i into EAX and compare it against the variable argc. If the comparison determines that i < argc then the JL main+28 instruction takes the jump, bringing us back to the start of the loop body (the instruction after the unconditional JMP), and execution runs through the code which prints the current argv string!
After the values are printed, we see an add DWORD PTR [rbp-0x4], 0x1 instruction. This simply increments the i variable by one, and then the familiar mov / cmp / jl sequence executes again.. and again.. until such time as i is no longer less than argc, at which point the jl is not taken, execution falls through to the next instruction, and the application exits.
Straightforward and logical, I hope.
A second kind of loop in high-level languages is the while loop. When I was first being taught to code (OG Visual Basic in 2003 if I remember correctly..) I distinctly remember my teacher telling me that anything which can be implemented as a 'for' loop can also be implemented as a while loop without compromising functionality. For fun I'm going to demonstrate this by converting the first application to a while loop and then disassembling it. The code can be found in /lesson/whileLoop.c, with the compiled executable in /lesson/whileLoopCompiled.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char** argv) {
    int i = 0;
    while(i < argc){
        printf("argv[%d] is %s\n", i, argv[i]);
        i++;
    }
    exit(0);
}
Disassembling main in the compiled executable shows that the ASM is, quite literally, identical to the 'for' loop's assembly, proving my teacher in school to be entirely correct (for this program at least).
For those who are unaware, a do while loop is a specific type of loop which is guaranteed to execute at least once. The 'do' portion is executed before the 'while' condition is evaluated. Recall the contrived example above about a for loop which would never execute? A 'do while' loop avoids that exact situation, because its body runs once before the condition is ever checked.
The C code (found in /lesson/doWhile.c as usual) is as follows -
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char** argv) {
    FILE * fp;
    char * line = NULL;
    size_t len = 0;
    size_t read;
    int i = 0;
    fp = fopen("/lesson/doWhile.c", "r");
    if (fp == NULL)
        exit(1);
    do{
        read = getline(&line, &len, fp);
        printf("Line number %d is %s", i, line);
        i++;
    }
    while ( read != -1);
    fclose(fp);
    exit(0);
}
This code opens /lesson/doWhile.c and then reads through the file line by line, printing the contents to the terminal. The application breaks out of the do/while loop when no line is read from the file (when the read variable is -1). Basically the program reads itself and writes itself. Let's take a look at the ASM -
pwndbg> disassemble main
Dump of assembler code for function main:
=> 0x00005555555551c9 <+0>: endbr64
0x00005555555551cd <+4>: push rbp
0x00005555555551ce <+5>: mov rbp,rsp
0x00005555555551d1 <+8>: sub rsp,0x40
0x00005555555551d5 <+12>: mov DWORD PTR [rbp-0x34],edi
0x00005555555551d8 <+15>: mov QWORD PTR [rbp-0x40],rsi
0x00005555555551dc <+19>: mov rax,QWORD PTR fs:0x28
0x00005555555551e5 <+28>: mov QWORD PTR [rbp-0x8],rax
0x00005555555551e9 <+32>: xor eax,eax
0x00005555555551eb <+34>: mov QWORD PTR [rbp-0x28],0x0
0x00005555555551f3 <+42>: mov QWORD PTR [rbp-0x20],0x0
0x00005555555551fb <+50>: mov DWORD PTR [rbp-0x2c],0x0
0x0000555555555202 <+57>: lea rsi,[rip+0xdfb] # 0x555555556004
0x0000555555555209 <+64>: lea rdi,[rip+0xdf6] # 0x555555556006
0x0000555555555210 <+71>: call 0x5555555550b0 <fopen@plt>
0x0000555555555215 <+76>: mov QWORD PTR [rbp-0x18],rax
0x0000555555555219 <+80>: cmp QWORD PTR [rbp-0x18],0x0
0x000055555555521e <+85>: jne 0x55555555522a <main+97>
0x0000555555555220 <+87>: mov edi,0x1
0x0000555555555225 <+92>: call 0x5555555550d0 <exit@plt>
0x000055555555522a <+97>: mov rdx,QWORD PTR [rbp-0x18]
0x000055555555522e <+101>: lea rcx,[rbp-0x20]
0x0000555555555232 <+105>: lea rax,[rbp-0x28]
0x0000555555555236 <+109>: mov rsi,rcx
0x0000555555555239 <+112>: mov rdi,rax
0x000055555555523c <+115>: call 0x5555555550c0 <getline@plt>
0x0000555555555241 <+120>: mov QWORD PTR [rbp-0x10],rax
0x0000555555555245 <+124>: mov rdx,QWORD PTR [rbp-0x28]
0x0000555555555249 <+128>: mov eax,DWORD PTR [rbp-0x2c]
0x000055555555524c <+131>: mov esi,eax
0x000055555555524e <+133>: lea rdi,[rip+0xdc3] # 0x555555556018
0x0000555555555255 <+140>: mov eax,0x0
0x000055555555525a <+145>: call 0x5555555550a0 <printf@plt>
0x000055555555525f <+150>: add DWORD PTR [rbp-0x2c],0x1
0x0000555555555263 <+154>: cmp QWORD PTR [rbp-0x10],0xffffffffffffffff
0x0000555555555268 <+159>: jne 0x55555555522a <main+97>
0x000055555555526a <+161>: mov rax,QWORD PTR [rbp-0x18]
0x000055555555526e <+165>: mov rdi,rax
0x0000555555555271 <+168>: call 0x555555555090 <fclose@plt>
0x0000555555555276 <+173>: mov edi,0x0
0x000055555555527b <+178>: call 0x5555555550d0 <exit@plt>
End of assembler dump.
OK so once again there are a couple of noteworthy things here. The first is the jne main+97 instruction at main+85, which corresponds with the 'if' statement in the C code that checks to make sure the file was opened successfully.
The jump is taken because the file was opened successfully with fopen(). Immediately afterwards the application executes the code from within the 'do' block of the 'do while' loop. Notice that, unlike the for loop, there is no unconditional jump down to a condition check first. The first line is read from the file with getline(), and it's then printed to the terminal with printf().
After this, the application increments the i variable, then compares the variable at rbp-0x10 (the read variable) with 0xffffffffffffffff, which is how -1 is represented as a 64-bit two's complement value. If the value isn't -1 then the application jumps back up to main+97, ready to read another line from the file.
It should be clear by now that loops in ASM are basically just intelligent uses of conditional jump instructions. Code which jumps to lower addresses is a solid indicator of the presence of a loop, because there aren't many other reasons for an application to jump backwards in code.
The next lesson is going to cover a couple of small gaps in our knowledge, floating point numbers specifically. After that lesson you'll be ready to work through the x64 challenge applications which I've provided!