Lesson 8 - Loops.

Literally just the various JMP instructions again but ✨f a n c i e r✨.

How to follow along

Two basic docker commands are required to follow along with this lesson -

Introduction

In this lesson we'll cover how high-level language looping constructs like for, while, and do/while loops look in x64 ASM. This lesson will be significantly easier to follow now that we understand conditional statements, which were covered in the previous lesson.

Loops in high-level languages are used to perform the same sequence of instructions over and over again until a particular condition is reached. This makes programs shorter and easier to write, and it enables all kinds of complicated processing tasks, like reading a file line by line and acting on its contents.

How 'for' loops are implemented in ASM

Let's look at a basic example, available in the lesson's docker container at /lesson/forLoop.c. The executable lives at /lesson/forLoopCompiled -


        #include <stdio.h>
        #include <stdlib.h>
            
        int main(int argc, char** argv) {
            for(int i = 0; i < argc; i++){
                printf("argv[%d] is %s\n", i, argv[i]);
            }
        
            exit(0);
        }
    

The above code simply iterates over the contents of the argv vector and prints every value. Without loops, we'd need to somehow guess how many arguments would be provided to the application and then add one print statement per argument. Inefficient and flaky.

Test this application by supplying it with as many arguments as you like, and observe that each value is printed to the terminal -

A demo of the application which prints 'learnreverseengineering.com is pretty swell'

Here's the corresponding ASM for this application -


    pwndbg> disassemble main
    Dump of assembler code for function main:
    => 0x0000555555555169 <+0>:     endbr64
    0x000055555555516d <+4>:     push   rbp
    0x000055555555516e <+5>:     mov    rbp,rsp
    0x0000555555555171 <+8>:     sub    rsp,0x20
    0x0000555555555175 <+12>:    mov    DWORD PTR [rbp-0x14],edi
    0x0000555555555178 <+15>:    mov    QWORD PTR [rbp-0x20],rsi
    0x000055555555517c <+19>:    mov    DWORD PTR [rbp-0x4],0x0
    0x0000555555555183 <+26>:    jmp    0x5555555551b6 <main+77>
    0x0000555555555185 <+28>:    mov    eax,DWORD PTR [rbp-0x4]
    0x0000555555555188 <+31>:    cdqe
    0x000055555555518a <+33>:    lea    rdx,[rax*8+0x0]
    0x0000555555555192 <+41>:    mov    rax,QWORD PTR [rbp-0x20]
    0x0000555555555196 <+45>:    add    rax,rdx
    0x0000555555555199 <+48>:    mov    rdx,QWORD PTR [rax]
    0x000055555555519c <+51>:    mov    eax,DWORD PTR [rbp-0x4]
    0x000055555555519f <+54>:    mov    esi,eax
    0x00005555555551a1 <+56>:    lea    rdi,[rip+0xe5c]        # 0x555555556004
    0x00005555555551a8 <+63>:    mov    eax,0x0
    0x00005555555551ad <+68>:    call   0x555555555060 <printf@plt>
    0x00005555555551b2 <+73>:    add    DWORD PTR [rbp-0x4],0x1
    0x00005555555551b6 <+77>:    mov    eax,DWORD PTR [rbp-0x4]
    0x00005555555551b9 <+80>:    cmp    eax,DWORD PTR [rbp-0x14]
    0x00005555555551bc <+83>:    jl     0x555555555185 <main+28>
    0x00005555555551be <+85>:    mov    edi,0x0
    0x00005555555551c3 <+90>:    call   0x555555555070 <exit@plt>
    

The loop-related instructions above are the jmp at main+26, the add at main+73, and the mov / cmp / jl sequence from main+77 to main+83. The code kicks off with an unconditional jump to main+77, which lets the application check whether the body of the for loop should execute at all. Imagine a developer has written a for loop like for(int i = 0; i == 53; i++){...}. In this (contrived, absurd) example the code inside the loop would never execute, because 0 doesn't equal 53. The unconditional jump at the start of the loop handles this kind of edge case by testing the condition before the body runs even once.

The JMP lands us at mov eax, DWORD PTR [rbp-0x4], which loads a value from the stack into EAX. The next instruction compares it against another stack value at rbp-0x14. Together, these two instructions put the variable i into EAX and compare it against the variable argc. If the comparison determines that i < argc, the JL main+28 jump is taken, sending execution back to the start of the loop body (the instruction right after the unconditional JMP), which prints the current argv string!

After the values are printed, we see an add DWORD PTR [rbp-0x4], 0x1 instruction, which simply increments i by one, and then the familiar mov / cmp / jl sequence executes again... and again... until i is no longer less than argc. At that point the jl falls through instead of jumping, and the application exits.
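To make the mapping concrete, here's a sketch of the same loop rewritten by hand with goto, so that each label lines up with a location in the disassembly above. The function name print_args_goto and its return value (the number of iterations) are hypothetical additions for illustration; this isn't the compiler's output, just C with the same shape.

```c
#include <stdio.h>

/* Hypothetical helper: the lesson's for loop rewritten with goto so each
   label lines up with a location in main's disassembly. Returns the
   number of times the loop body ran. */
int print_args_goto(int argc, char **argv) {
    int i = 0;                    /* main+19: mov DWORD PTR [rbp-0x4],0x0 */
    goto check;                   /* main+26: jmp main+77, test the condition first */

body:                             /* main+28: loop body */
    printf("argv[%d] is %s\n", i, argv[i]);
    i++;                          /* main+73: add DWORD PTR [rbp-0x4],0x1 */

check:                            /* main+77/+80: mov eax,[rbp-0x4]; cmp eax,[rbp-0x14] */
    if (i < argc)
        goto body;                /* main+83: jl main+28, a backwards jump */

    return i;
}
```

Calling print_args_goto(0, args) returns 0 without printing anything, because the first goto skips straight to the check, which is exactly the edge case the initial jmp exists for.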

Straightforward and logical, I hope.

How 'while' loops are implemented in ASM

A second kind of loop in high-level languages is the while loop. When I was first being taught to code (OG Visual Basic in 2003 if I remember correctly..) I distinctly remember my teacher telling me that anything which can be implemented as a 'for' loop can also be implemented as a while loop without compromising functionality. For fun I'm going to demonstrate this by converting the first application to a while loop and then disassembling it. The code can be found in /lesson/whileLoop.c, and the compiled binary is /lesson/whileLoopCompiled.


    #include <stdio.h>
    #include <stdlib.h>
    
    int main(int argc, char** argv) {
        int i = 0;
        while(i < argc){
            printf("argv[%d] is %s\n", i, argv[i]);
            i++;
        }
    
        exit(0);
    }
    

The ASM looks as follows -

A demo of the while loop application; it's identical to the for loop application at the ASM level.

The ASM is, quite literally, identical to the 'for' loop's assembly, which backs up my teacher's claim, at least for this example.
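The conversion itself is mechanical: the initialiser moves above the loop, the condition becomes the while test, and the increment moves to the end of the body. A minimal sketch of that rewrite, using two hypothetical functions that total the string lengths in argv instead of printing them:

```c
#include <string.h>

/* Hypothetical example: total the lengths of the strings in argv
   using a for loop. */
size_t total_len_for(int argc, char **argv) {
    size_t total = 0;
    for (int i = 0; i < argc; i++) {
        total += strlen(argv[i]);
    }
    return total;
}

/* The same loop converted to a while loop. */
size_t total_len_while(int argc, char **argv) {
    size_t total = 0;
    int i = 0;                  /* the for loop's initialiser moves above the loop */
    while (i < argc) {          /* the for loop's condition becomes the while test */
        total += strlen(argv[i]);
        i++;                    /* the for loop's increment moves to the end of the body */
    }
    return total;
}
```

Both functions should return the same value for any input. One wrinkle when converting by hand: a continue inside the while version jumps straight to the condition and skips the i++, so a loop that uses continue needs the increment handled before it.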

How 'do..while' loops are implemented in ASM

For those who are unaware, a do while loop is a type of loop whose body is guaranteed to execute at least once: the 'do' block runs before the 'while' condition is evaluated. Recall the contrived example above of a for loop that would never execute? A 'do while' loop avoids that exact situation.
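Here's a small sketch of that difference using the contrived i == 53 condition from earlier. The function names count_for and count_do_while are hypothetical; each simply returns how many times its loop body ran.

```c
/* Hypothetical demo: the contrived i == 53 condition in a for loop.
   Returns how many times the loop body ran. */
int count_for(void) {
    int runs = 0;
    for (int i = 0; i == 53; i++) {
        runs++;
    }
    return runs;            /* 0: the condition is false before the first pass */
}

/* The same condition in a do while loop. */
int count_do_while(void) {
    int runs = 0;
    int i = 0;
    do {
        runs++;
        i++;
    } while (i == 53);
    return runs;            /* 1: the body runs once before the condition is checked */
}
```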

The C code (found in /lesson/doWhile.c as usual) is as follows -


    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char** argv) {
        FILE * fp;
        char * line = NULL;
        size_t len = 0;
        ssize_t read;   // getline() returns a signed ssize_t; -1 means EOF or error
        int i = 0;

        fp = fopen("/lesson/doWhile.c", "r");
        if (fp == NULL)
            exit(1);

        do{
            read = getline(&line, &len, fp);
            printf("Line number %d is %s", i, line);
            i++;
        }
        while ( read != -1);

        fclose(fp);
        exit(0);
    }

    

This code opens /lesson/doWhile.c, stores the resulting file pointer, and reads through the file line by line, printing each line to the terminal. The application exits the do while loop once getline() fails to read a line (when the read variable is -1). In other words, the program prints its own source code. Let's take a look at the ASM -


    pwndbg> disassemble main
    Dump of assembler code for function main:
    => 0x00005555555551c9 <+0>:     endbr64
    0x00005555555551cd <+4>:     push   rbp
    0x00005555555551ce <+5>:     mov    rbp,rsp
    0x00005555555551d1 <+8>:     sub    rsp,0x40
    0x00005555555551d5 <+12>:    mov    DWORD PTR [rbp-0x34],edi
    0x00005555555551d8 <+15>:    mov    QWORD PTR [rbp-0x40],rsi
    0x00005555555551dc <+19>:    mov    rax,QWORD PTR fs:0x28
    0x00005555555551e5 <+28>:    mov    QWORD PTR [rbp-0x8],rax
    0x00005555555551e9 <+32>:    xor    eax,eax
    0x00005555555551eb <+34>:    mov    QWORD PTR [rbp-0x28],0x0
    0x00005555555551f3 <+42>:    mov    QWORD PTR [rbp-0x20],0x0
    0x00005555555551fb <+50>:    mov    DWORD PTR [rbp-0x2c],0x0
    0x0000555555555202 <+57>:    lea    rsi,[rip+0xdfb]        # 0x555555556004
    0x0000555555555209 <+64>:    lea    rdi,[rip+0xdf6]        # 0x555555556006
    0x0000555555555210 <+71>:    call   0x5555555550b0 <fopen@plt>
    0x0000555555555215 <+76>:    mov    QWORD PTR [rbp-0x18],rax
    0x0000555555555219 <+80>:    cmp    QWORD PTR [rbp-0x18],0x0
    0x000055555555521e <+85>:    jne    0x55555555522a <main+97>
    0x0000555555555220 <+87>:    mov    edi,0x1
    0x0000555555555225 <+92>:    call   0x5555555550d0 <exit@plt>
    0x000055555555522a <+97>:    mov    rdx,QWORD PTR [rbp-0x18]
    0x000055555555522e <+101>:   lea    rcx,[rbp-0x20]
    0x0000555555555232 <+105>:   lea    rax,[rbp-0x28]
    0x0000555555555236 <+109>:   mov    rsi,rcx
    0x0000555555555239 <+112>:   mov    rdi,rax
    0x000055555555523c <+115>:   call   0x5555555550c0 <getline@plt>
    0x0000555555555241 <+120>:   mov    QWORD PTR [rbp-0x10],rax
    0x0000555555555245 <+124>:   mov    rdx,QWORD PTR [rbp-0x28]
    0x0000555555555249 <+128>:   mov    eax,DWORD PTR [rbp-0x2c]
    0x000055555555524c <+131>:   mov    esi,eax
    0x000055555555524e <+133>:   lea    rdi,[rip+0xdc3]        # 0x555555556018
    0x0000555555555255 <+140>:   mov    eax,0x0
    0x000055555555525a <+145>:   call   0x5555555550a0 <printf@plt>
    0x000055555555525f <+150>:   add    DWORD PTR [rbp-0x2c],0x1
    0x0000555555555263 <+154>:   cmp    QWORD PTR [rbp-0x10],0xffffffffffffffff
    0x0000555555555268 <+159>:   jne    0x55555555522a <main+97>
    0x000055555555526a <+161>:   mov    rax,QWORD PTR [rbp-0x18]
    0x000055555555526e <+165>:   mov    rdi,rax
    0x0000555555555271 <+168>:   call   0x555555555090 <fclose@plt>
    0x0000555555555276 <+173>:   mov    edi,0x0
    0x000055555555527b <+178>:   call   0x5555555550d0 <exit@plt>
    End of assembler dump.
    

OK so once again there are a couple of noteworthy instructions here. The first, the jne main+97 at main+85, corresponds to the 'if' statement in the C code, which checks that the file was opened successfully.

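The jump is taken because fopen() opened the file successfully, so the call to exit(1) is skipped. Execution then falls straight into the code from the 'do' block of the 'do while' loop. The first line is read from the file with getline(), and it's then printed to the terminal with printf(). After this, the application increments the i variable, then compares the variable at rbp-0x10 (the read variable) with 0xffffffffffffffff, which is -1 as a 64-bit value. If the value isn't -1, the second noteworthy instruction, jne main+97, jumps back up to main+97, ready to read another line from the file. Notice that, unlike the 'for' loop, there's no unconditional jump to the condition before the loop body; the check sits only at the bottom of the loop, which is exactly what guarantees at least one iteration.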
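The same bottom-tested shape can be written out with goto, as a sketch. The helper name do_while_passes is hypothetical, and taking a FILE * (rather than opening /lesson/doWhile.c itself) is an assumption made to keep it self-contained. It prints getline()'s return value instead of the line so the final pass is visible: because the test sits at the bottom, the body runs one extra time, on the pass where getline() returns -1. That extra pass also happens in the lesson's program, which prints one extra 'Line number' message before exiting.

```c
#define _POSIX_C_SOURCE 200809L   /* make getline() and ssize_t visible */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: the lesson's do..while loop shape written with goto.
   Returns the number of passes through the body, including the final
   pass where getline() returns -1. */
int do_while_passes(FILE *fp) {
    char *line = NULL;
    size_t len = 0;
    ssize_t read;
    int i = 0;

body:                                      /* main+97: body entered directly, no initial jmp */
    read = getline(&line, &len, fp);       /* main+115: call getline@plt */
    printf("pass %d: getline returned %ld\n", i, (long)read);
    i++;                                   /* main+150: add DWORD PTR [rbp-0x2c],0x1 */
    if (read != -1)                        /* main+154: cmp QWORD PTR [rbp-0x10],-1 */
        goto body;                         /* main+159: jne main+97, a backwards jump */

    free(line);                            /* getline() allocated this buffer */
    return i;
}
```

For a file containing two lines, this should report three passes: two successful reads and a final pass where getline() returns -1.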

Conclusion

It should be clear by now that loops in ASM are basically just intelligent uses of conditional jump statements. Code which jumps to lower addresses is a solid indicator of the presence of a loop, because there aren't many other reasons for an application to jump backwards in code.

The next lesson will fill a couple of small gaps in our knowledge, specifically floating point numbers. After that lesson you'll be ready to work through the x64 challenge applications I've provided!