Lesson 7 - Conditional statements.

So get out JA seat and JMP around! JMP around! JMP around!

How to follow along

Two basic docker commands are required to follow along with this lesson -

Introduction

Alright so if you've been following along with this material so far then you may have noticed that all of the little ASM programs that we've worked through have something in common - they start at the beginning of the code (either the main() function or the _start label), they execute a few instructions and then they end. There is no branching (aside from function calls) depending on the outcome of conditional statements and there is no iteration (e.g. looping, running the same piece of code over and over until a condition is met)!

Generally speaking, every application worth its salt uses conditional statements and probably some form of iteration - it's what allows applications to do things like check for errors and provide rich functionality that responds to events / input / external stimulus / stuff.

Imagine if we wrote a program which accepted a number from the user, but we had no way of conditionally doing something with that number if, for example, it turned out to be a string like goat - the application would try to use the word 'goat' as an integer and it would crash ungracefully.

Conditional statements in ASM

There are an absurd number of conditional jump mnemonics in x64 ASM. We only need to care about 4 or 5 of them though and the rest will be intuitive when we see them. I've made the decision to jump straight into some C code and see how that gets mapped to ASM, so we can learn by example here.

if / else

If you're used to programming in high level languages then a conditional statement is just like an if(){} statement in your favorite language. Essentially, if (this condition is true) then { do these things } otherwise { do these things } - it's already clear how this will make for more interesting applications and examples I hope!

Consider the following C code (available in the lesson's docker container under /lesson/if.c) -


    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>

    int main(int argc, char** argv) {
        if(argc == 7) {
            printf("You've successfully established the correct number of arguments to access this application.\n");
            printf("Pretend that the application is now exposing some awesome functionality, please.\n");
        } else {
            printf("This application requires a specific number of command line arguments for it to function.\n");
            printf("Usage: '/lesson/elseIfCompiled argument1 argument2.........argumentZ'\n");
        }

        exit(0);
    }

    

Pretty simple stuff I hope. The application uses the if statement to check if the number of arguments passed to the program is 7. If it's not precisely 7 then the else block executes and tells the user to do better.

Test the application out by running /lesson/ifCompiled 1 2 3 4 5 and observe the response, then run /lesson/ifCompiled 1 2 3 4 5 6 to observe that we've satisfied the requirement for 'argc' to be 7 (6 arguments + the name of the executable). Let's open this up in GDB and see how it looks.


    pwndbg> disassemble main
    Dump of assembler code for function main:
    => 0x0000555555555169 <+0>:     endbr64
        0x000055555555516d <+4>:     push   rbp
        0x000055555555516e <+5>:     mov    rbp,rsp
        0x0000555555555171 <+8>:     sub    rsp,0x10
        0x0000555555555175 <+12>:    mov    DWORD PTR [rbp-0x4],edi
        0x0000555555555178 <+15>:    mov    QWORD PTR [rbp-0x10],rsi
        0x000055555555517c <+19>:    cmp    DWORD PTR [rbp-0x4],0x7
        0x0000555555555180 <+23>:    jne    0x55555555519c <main+51>
        0x0000555555555182 <+25>:    lea    rdi,[rip+0xe7f]        # 0x555555556008
        0x0000555555555189 <+32>:    call   0x555555555060 <puts@plt>
        0x000055555555518e <+37>:    lea    rdi,[rip+0xed3]        # 0x555555556068
        0x0000555555555195 <+44>:    call   0x555555555060 <puts@plt>
        0x000055555555519a <+49>:    jmp    0x5555555551b4 <main+75>
        0x000055555555519c <+51>:    lea    rdi,[rip+0xf1d]        # 0x5555555560c0
        0x00005555555551a3 <+58>:    call   0x555555555060 <puts@plt>
        0x00005555555551a8 <+63>:    lea    rdi,[rip+0xf71]        # 0x555555556120
        0x00005555555551af <+70>:    call   0x555555555060 <puts@plt>
        0x00005555555551b4 <+75>:    mov    edi,0x0
        0x00005555555551b9 <+80>:    call   0x555555555070 <exit@plt>
    End of assembler dump.
    

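Alright so at this point in the course I hope that there aren't any surprises in the above ASM (aside from maybe LEA which I'll cover in a second). We can see three new instructions though: CMP, JNE and JMP.

  1. CMP compares two values by subtracting the second from the first. The result is thrown away, but the flags in the EFLAGS register are updated to describe it.
  2. JNE (Jump if Not Equal) jumps to the given address only if the previous comparison found that the two values were different.
  3. JMP jumps to the given address unconditionally.

Maybe a touch confusing but hopefully not too bad. With the context above, here's a breakdown of what the assembly is doing (starting at the lines just above the cmp instruction) -

  1. main+12 and main+15 save argc (EDI) and argv (RSI) onto the stack at rbp-0x4 and rbp-0x10.
  2. main+19 compares the saved argc against 7.
  3. main+23 jumps to main+51 (the else block) if argc is not 7, otherwise execution falls through to main+25.
  4. main+25 to main+44 print the two strings from the if block (the compiler has swapped our printf() calls for puts() because the strings contain nothing to format), then main+49 jumps over the else block to main+75.
  5. main+51 to main+70 print the two strings from the else block.
  6. main+75 puts 0 into EDI and calls exit().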

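As noted above, the LEA instruction is new to us. LEA (Load Effective Address) works out an address and puts it into a register without reading the memory at that address. Here it's used to put the address of a string into RDI, ready for functions like puts() which accept string pointers as arguments. GDB has helpfully done the rip+0xe7f arithmetic for us in the comment on the right, so in this case the effect is the same as mov rdi, 0x555555556008. Confirm this yourself in GDB with x/s 0x555555556008.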

Another note, in case it's unclear: if the result of a cmp instruction doesn't satisfy a conditional jump's condition then the jump is simply not taken, and execution carries on with the instruction immediately after it.

Observe above that we have a JNE instruction. It shouldn't surprise you to learn that there is a JE instruction too, which jumps only if two values (compared with CMP) are equal. There are also JZ / JNZ instructions for jump if zero and jump if not zero. These are identical to JE and JNE (they are different names for the same instructions), and all four operate on the value of the Z flag in the EFLAGS register.

Just for fun, to really cement how this stuff works, we can use the examine instruction command ( x/i ) in GDB to look at the code which will be executed by the JNE instruction and the JMP instructions respectively -

Using the examine instruction command to look at the addresses which the jump instructions jump to.

if / else if / else

There is an additional form of if statement called else if. This allows developers to write code which first checks whether one condition is true, then checks whether another condition is true, as many times as the developer wishes, before eventually falling into an else block and breaking out of the conditional.

Consider the following snippet (which can be found in the docker container under /lesson/elseIfRedacted.c) -


    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>
    
    
    int main(int argc, char** argv) {
        if(argc<=1) {
            printf("In order to unlock this application you must supply the correct numeric 'code' as an argument to the application.\n");
            printf("Usage: '/lesson/elseIfCompiled 53'\n");
            exit(1);
        } 
    
        long int result;
        char *pend;
    
        errno = 0;

        // strtol, string to long, takes a string and returns the 'long' int representation of that string.
        // Arguments are as follows - 
        // 1. The string to 'cast' to a long, in our case it's the argument to our program
        // 2. a pointer which strtol points at any extra 'stuff' on the end of the number (in case the user does something silly)
        // 3. the 'base' of the number (10 == base 10 == decimal)
        result = strtol (argv[1], &pend, 10);
    
        if(errno != 0){ // Check if strtol returned an error
            printf("Something bad has happened, exiting the program.\n");

        } else if(result == REDACTED) {  // Check if the code was correct for a low privileged user
            printf("Welcome low privileged user, the application was unlocked. Shame you're not an administrator though.\n");

        } else if(result == REDACTED){ // Check if the code was correct for an administrator
            printf("Welcome admin, the application was unlocked with all privileges enabled.\n");
            printf("Please pretend that some useful functionality was enabled.\n");

        } else {
            printf("That code is incorrect. Please try again.\n");
        }
        exit(0);
    }
    

This is clearly the largest and most complicated C code that we've seen so far in this course, so let's break it down and explain what it's doing.

I've unhelpfully redacted the numeric access codes from the C source code, and I've hidden the original C source file in the container somewhere so you can't just peek at it to see what the correct numbers are. We're reverse engineers after all, eh? 🙂

You probably won't find the file either, so I wouldn't waste valuable seconds hunting for it when you could just open /lesson/elseIfCompiled in GDB and work out the code from there!

The corresponding ASM

The ASM code below was obtained by disassembling /lesson/elseIfCompiled in the lesson's container.


    pwndbg> disassemble main
    Dump of assembler code for function main:
    => 0x00005555555551a9 <+0>:     endbr64
        0x00005555555551ad <+4>:     push   rbp
        0x00005555555551ae <+5>:     mov    rbp,rsp
        0x00005555555551b1 <+8>:     sub    rsp,0x30
        0x00005555555551b5 <+12>:    mov    DWORD PTR [rbp-0x24],edi
        0x00005555555551b8 <+15>:    mov    QWORD PTR [rbp-0x30],rsi
        0x00005555555551bc <+19>:    mov    rax,QWORD PTR fs:0x28
        0x00005555555551c5 <+28>:    mov    QWORD PTR [rbp-0x8],rax
        0x00005555555551c9 <+32>:    xor    eax,eax
        0x00005555555551cb <+34>:    cmp    DWORD PTR [rbp-0x24],0x1
        0x00005555555551cf <+38>:    jg     0x5555555551f3 <main+74>
        0x00005555555551d1 <+40>:    lea    rdi,[rip+0xe30]        # 0x555555556008
        0x00005555555551d8 <+47>:    call   0x555555555090 <puts@plt>
        0x00005555555551dd <+52>:    lea    rdi,[rip+0xe9c]        # 0x555555556080
        0x00005555555551e4 <+59>:    call   0x555555555090 <puts@plt>
        0x00005555555551e9 <+64>:    mov    edi,0x1
        0x00005555555551ee <+69>:    call   0x5555555550b0 <exit@plt>
        0x00005555555551f3 <+74>:    call   0x555555555080 <__errno_location@plt>
        0x00005555555551f8 <+79>:    mov    DWORD PTR [rax],0x0
        0x00005555555551fe <+85>:    mov    rax,QWORD PTR [rbp-0x30]
        0x0000555555555202 <+89>:    add    rax,0x8
        0x0000555555555206 <+93>:    mov    rax,QWORD PTR [rax]
        0x0000555555555209 <+96>:    lea    rcx,[rbp-0x18]
        0x000055555555520d <+100>:   mov    edx,0xa
        0x0000555555555212 <+105>:   mov    rsi,rcx
        0x0000555555555215 <+108>:   mov    rdi,rax
        0x0000555555555218 <+111>:   call   0x5555555550a0 <strtol@plt>
        0x000055555555521d <+116>:   mov    QWORD PTR [rbp-0x10],rax
        0x0000555555555221 <+120>:   call   0x555555555080 <__errno_location@plt>
        0x0000555555555226 <+125>:   mov    eax,DWORD PTR [rax]
        0x0000555555555228 <+127>:   test   eax,eax
        0x000055555555522a <+129>:   je     0x55555555523a <main+145>
        0x000055555555522c <+131>:   lea    rdi,[rip+0xe75]        # 0x5555555560a8
        0x0000555555555233 <+138>:   call   0x555555555090 <puts@plt>
        0x0000555555555238 <+143>:   jmp    0x555555555282 <main+217>
        0x000055555555523a <+145>:   cmp    QWORD PTR [rbp-0x10],0x703cd8
        0x0000555555555242 <+153>:   jne    0x555555555252 <main+169>
        0x0000555555555244 <+155>:   lea    rdi,[rip+0xe95]        # 0x5555555560e0
        0x000055555555524b <+162>:   call   0x555555555090 <puts@plt>
        0x0000555555555250 <+167>:   jmp    0x555555555282 <main+217>
        0x0000555555555252 <+169>:   cmp    QWORD PTR [rbp-0x10],0xcc07c9
        0x000055555555525a <+177>:   jne    0x555555555276 <main+205>
        0x000055555555525c <+179>:   lea    rdi,[rip+0xee5]        # 0x555555556148
        0x0000555555555263 <+186>:   call   0x555555555090 <puts@plt>
        0x0000555555555268 <+191>:   lea    rdi,[rip+0xf29]        # 0x555555556198
        0x000055555555526f <+198>:   call   0x555555555090 <puts@plt>
        0x0000555555555274 <+203>:   jmp    0x555555555282 <main+217>
        0x0000555555555276 <+205>:   lea    rdi,[rip+0xf5b]        # 0x5555555561d8
        0x000055555555527d <+212>:   call   0x555555555090 <puts@plt>
        0x0000555555555282 <+217>:   mov    edi,0x0
        0x0000555555555287 <+222>:   call   0x5555555550b0 <exit@plt>
        End of assembler dump.
    

Well, it was our most complete and complicated C example so far so it only makes sense for this to be our most complete and complicated ASM sample too. I'm absolutely certain that this looks like quite a scary listing as a budding reverse engineer so I've taken the liberty of highlighting the lines which correspond with if/else if statements, and we'll start working through each one of them below.

OK starting at the first two highlighted lines above -

The first chunk of highlighted assembly.

The cmp instruction checks a value on the stack (argc!) against the number one. Immediately afterwards is a new instruction, JG, which stands for "Jump if Greater". The jump is taken if the cmp instruction left the 'Z' flag in the EFLAGS register set to zero AND left the 'S' and 'O' flags holding the same value as each other. This is very confusing, so I want to say two things.

  1. This resource explains all of the states that the EFLAGS register might be in after a cmp instruction
  2. You don't need to memorize the values of the EFLAGS register or what they mean. All you need to know is that if the left hand value in a cmp statement is larger than the right value (with both treated as signed numbers) then a JG will be taken.

One last note before we move on, there is also -

Back to the analysis anyway. If argc is greater than '1' then we make a jump into another location in the code. If we don't jump then a few instructions ahead is a big, inevitable call to exit() which will clearly exit the application. Based on this information we know that we need argc to be greater than 1 in order to continue the application's execution.

Skipping on to the point in the code where the JG lands then, in the case where argc is greater than 1.

The second chunk of highlighted assembly.

The above code isn't particularly scary, it sets up a number of registers and stack locations for a call to strtol() which converts the string pointed to by RDI into a long value. There are a couple of noteworthy instructions here which I'd like to dive into quickly.

I'll quickly clarify that last point. Consider that the argv vector looks like the following in memory -

    Addresses in RAM        0 (argv[0])    8 (argv[1])    16 (argv[2])    24 (argv[3])
    Data at that address    1000           1240           1180            2248

Ignoring the obviously fictional numbers here, observe how address 8 in memory contains "1240". "1240" is the location in memory where the first argument to the application lives. The mov rax, QWORD PTR [rax] instruction changes the value in RAX from "8" (because it was pointing at the second element of argv) to "1240", which is the direct pointer to the first argument to the application! Nothing too scary I hope, but it was worth clarifying.

After the call to strtol, the following code executes -

The third chunk of highlighted assembly.

The call to strtol completes, the return value from the function is inside of RAX immediately after the function call, and it's then stored on the stack at RBP-0x10. The next line contains something interesting, a call to __errno_location() which we never wrote ourselves. In glibc, errno is not a plain global variable. It is a macro which expands to a call to __errno_location(), and that function returns a pointer to the current thread's errno value. So, when our C code reads errno we (automagically) call __errno_location(), the return value is inside of RAX as usual, and that value is dereferenced (because a pointer is returned) and placed into EAX. The same thing happened earlier at main+74 for errno = 0, except that time a zero was written through the pointer.

The next instruction is new. test EAX, EAX is a fairly common and efficient way to check if a value is zero or not. TEST performs a bitwise AND of its two operands, throws the result away and sets the flags, so the Z flag ends up set only when EAX is zero. If EAX is zero (indicating that no errors were returned from strtol()) then the JE jump will be taken to another location in code. If EAX is not zero then we can see that something gets printed with puts() and there is an unconditional jump with JMP down to the end of the code, which calls exit().

We've successfully worked out that if strtol() fails or raises an error for whatever reason then the application will print a message and terminate immediately. This is very common error handling! If you'd like to see the failure in action then run the following: /lesson/elseIfCompiled 9223372036854775808. This number is one larger than the biggest value which a 64-bit long integer can hold, so the strtol() call sets errno to report an error.

Let's continue disassembling from main+145 then, the location that the JE instruction jumps to if strtol doesn't raise an error.

The fourth chunk of highlighted assembly.

OK so this piece of code performs a comparison operation on the stack location RBP-0x10 (which we know is the returned value from strtol()) against 0x703cd8. If the user supplied value is equal to 0x703cd8 (or 7355608 in decimal, hit me up on Mastodon if you got this reference) then the application will display a message to the user and then jump down to the call to exit(). If it does not match that value then it will jump to another location in the code.

The fifth chunk of highlighted assembly.

Observe how, if the value isn't 7355608 and we jump to this location in code, this block of code is virtually identical to the above block? This is precisely how if/else if looks in ASM! If this condition is not met then jump somewhere, if this condition isn't met then jump somewhere..

The above block of code checks to see if the user supplied argument is 0xcc07c9 (or 13371337 in decimal) and prints two statements to the user if the value matches. If the value doesn't match 13371337 then we see that there is a jump to main+205 -

The sixth chunk of highlighted assembly.

Observe how the above code puts a string address into RDI, prints the string to the user and then immediately exits? This is how else blocks are represented after an if/else if block. There are no more conditional statements, we've landed in the default code block and execution will continue (or terminate in our case).

We've reverse engineered the application and found out the two access codes to gain access to different pieces of functionality in the app -

The culmination of our reverse engineering efforts.

Switch case

One quick footnote before I wrap this lengthy lesson up. C (and other high level languages) have the concept of a switch case statement, which provides an elegant and aesthetically pleasing way of performing many different comparisons on a value.

I've provided an example in the lesson's Docker container under /lesson/switchCase.c and /lesson/switchCaseCompiled.


    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>
    
    
    int main(int argc, char** argv) {
        if(argc<=1) {
            printf("This application is a guessing game. Provide a single character as an argument.\n");
            printf("Usage: '/lesson/elseIfCompiled Z'\n");
            exit(1);
        }

        long int result;
        char *pend;

        errno = 0;
        // strtol, string to long, takes a string and returns the 'long' int representation of that string.
        // Arguments are as follows - 
        // 1. The string to 'cast' to a long, in our case it's the argument to our program
        // 2. a pointer which strtol points at any extra 'stuff' on the end of the number (in case the user does something silly)
        // 3. the 'base' of the number (16 == base 16 == hexadecimal)
        result = strtol (argv[1], &pend, 16);

        switch(result){
            case 0x414141:
                printf("Not 'A' bad guess.\n");
                break;
            case 0x464646:
                printf("'F'eels incorrect.\n");
                break;
            case 0x525252:
                printf("'R'eally close..\n");
                printf("Sort of.\n");
                break;
            case 0x44:
            case 0x45:
            case 0x43:
                printf("Demonstrating switch case fall through.\n");
                break;
            case 0x474747:
                printf("'G'ood work, you 'G'uessed the value correctly. 😊\n");
                break;
            default:
                printf("Incorrect letter provided.\n");
                break;
        }

        exit(0);
    }
    

And here is the relevant and interesting part of the corresponding ASM code -


        0x0000555555555218 <+111>:   call   0x5555555550a0 <strtol@plt>
        0x000055555555521d <+116>:   mov    QWORD PTR [rbp-0x10],rax
        0x0000555555555221 <+120>:   cmp    QWORD PTR [rbp-0x10],0x525252
        0x0000555555555229 <+128>:   je     0x55555555529f <main+246>
        0x000055555555522b <+130>:   cmp    QWORD PTR [rbp-0x10],0x525252
        0x0000555555555233 <+138>:   jg     0x5555555552d5 <main+300>
        0x0000555555555239 <+144>:   cmp    QWORD PTR [rbp-0x10],0x474747
        0x0000555555555241 <+152>:   je     0x5555555552c7 <main+286>
        0x0000555555555247 <+158>:   cmp    QWORD PTR [rbp-0x10],0x474747
        0x000055555555524f <+166>:   jg     0x5555555552d5 <main+300>
        0x0000555555555255 <+172>:   cmp    QWORD PTR [rbp-0x10],0x464646
        0x000055555555525d <+180>:   je     0x555555555291 <main+232>
        0x000055555555525f <+182>:   cmp    QWORD PTR [rbp-0x10],0x464646
        0x0000555555555267 <+190>:   jg     0x5555555552d5 <main+300>
        0x0000555555555269 <+192>:   cmp    QWORD PTR [rbp-0x10],0x45
        0x000055555555526e <+197>:   jg     0x555555555279 <main+208>
        0x0000555555555270 <+199>:   cmp    QWORD PTR [rbp-0x10],0x43
        0x0000555555555275 <+204>:   jge    0x5555555552b9 <main+272>
        0x0000555555555277 <+206>:   jmp    0x5555555552d5 <main+300>
        0x0000555555555279 <+208>:   cmp    QWORD PTR [rbp-0x10],0x414141
        0x0000555555555281 <+216>:   jne    0x5555555552d5 <main+300>
        0x0000555555555283 <+218>:   lea    rdi,[rip+0xe2e]        # 0x5555555560b8
        0x000055555555528a <+225>:   call   0x555555555090 <puts@plt>
        0x000055555555528f <+230>:   jmp    0x5555555552e2 <main+313>
        0x0000555555555291 <+232>:   lea    rdi,[rip+0xe33]        # 0x5555555560cb
        0x0000555555555298 <+239>:   call   0x555555555090 <puts@plt>
        0x000055555555529d <+244>:   jmp    0x5555555552e2 <main+313>
        0x000055555555529f <+246>:   lea    rdi,[rip+0xe38]        # 0x5555555560de
        0x00005555555552a6 <+253>:   call   0x555555555090 <puts@plt>
        0x00005555555552ab <+258>:   lea    rdi,[rip+0xe3d]        # 0x5555555560ef
        0x00005555555552b2 <+265>:   call   0x555555555090 <puts@plt>
        0x00005555555552b7 <+270>:   jmp    0x5555555552e2 <main+313>
        0x00005555555552b9 <+272>:   lea    rdi,[rip+0xe38]        # 0x5555555560f8
        0x00005555555552c0 <+279>:   call   0x555555555090 <puts@plt>
        0x00005555555552c5 <+284>:   jmp    0x5555555552e2 <main+313>
        0x00005555555552c7 <+286>:   lea    rdi,[rip+0xe52]        # 0x555555556120
        0x00005555555552ce <+293>:   call   0x555555555090 <puts@plt>
        0x00005555555552d3 <+298>:   jmp    0x5555555552e2 <main+313>
        0x00005555555552d5 <+300>:   lea    rdi,[rip+0xe79]        # 0x555555556155
        0x00005555555552dc <+307>:   call   0x555555555090 <puts@plt>
        0x00005555555552e1 <+312>:   nop
        0x00005555555552e2 <+313>:   mov    edi,0x0
        0x00005555555552e7 <+318>:   call   0x5555555550b0 <exit@plt>
        End of assembler dump.
    

This lesson is getting quite long now, so I won't do a deep dive into this code like I've done with the other examples above. At this point you likely have enough skills to read every line of this code. The interesting thing to note about the above ASM is the structure of it. This pattern of cmp -> je -> cmp -> jg -> cmp -> je -> .... followed by a long list of do_something -> jmp -> do_something -> jmp to the same address -> .... is a telltale sign of a switch case statement. If you see this big obvious repetitive structure in a disassembled application then there's a very good chance that you're looking at a switch case (a long if / else if chain which tests a single value can compile to something similar).

Let's do a little bit of analysis then, because I can't help myself. I'll run through to the 'successful guess' block of the switch case.

The code begins by taking the user supplied value and parsing it as a hexadecimal number using our friend strtol(). Afterwards, the application performs a comparison between the supplied number and 0x525252. If they are equal then a jump is made to the call to printf("'R'eally close..\n");, followed by a jump to exit(). Next there is a check to see if the supplied value is greater than 0x525252. If it is larger than that number then a jump is taken to the 'default' block of the switch case (because the compiler has cleverly determined that none of the case statements check for a number larger than 0x525252).

Next up is a check to see if the value is equal to 0x474747 (this is the correct guess), if so then the application jumps to main+286 which prints the 'success' message and then jumps unconditionally to the end of the application. We have successfully reverse engineered the application to work out the correct secret code -

Successfully working out the secret code using reverse engineering.

Conclusion

This was probably the longest and most demanding lesson so far, but by working through this material we've finally enabled ourselves to reverse engineer the vast majority of applications, and that's pretty exciting.

In the next lesson we'll finally cover how loops are implemented in ASM, at which time we will have the skills and knowledge to reverse engineer a huge number of non-trivial desktop applications and finally dig into some really fun and interesting applications.