So get out JA seat and JMP around! JMP around! JMP around!
Two basic docker commands are required to follow along with this lesson -
docker pull learnreverseengineering/lesson7
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined learnreverseengineering/lesson7 bash
Alright so if you've been following along with this material so far then you may have noticed that all of the little ASM programs that we've worked through have something in common: they start at the beginning of the code (either the main() method or the _start label), they execute a few instructions and then they end. There is no branching (aside from function calls) depending on the outcome of conditional statements, and there is no iteration (e.g. looping, running the same piece of code over and over until a condition is met)!
Generally speaking, every application worth its salt has some form of conditional statement usage and probably some form of iteration - it's what allows applications to perform things like error checking and providing rich functionality that is able to respond to events / input / external stimulus / stuff.
Imagine if we wrote a program which accepted a number from the user, but we had no way of conditionally doing something with that number if, for example, it turned out to be a string like 'goat' - the application would try to use the word 'goat' as an integer and it would crash ungracefully.
There is an absurd number of conditional jump mnemonics in x64 ASM. We only need to care about 4 or 5 of them, though, and the rest will be intuitive when we see them. I've decided to jump straight into some C code and see how it gets mapped to ASM, so we can learn by example here.
If you're used to programming in high level languages then a conditional statement is just like an if(){} statement in your favorite language. Essentially,
if (this condition is true) then { do these things } otherwise { do these things }
I hope it's already clear how this will make for more interesting applications and examples!
Consider the following C code (available in the lesson's docker container under /lesson/if.c) -
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
int main(int argc, char** argv) {
    if(argc == 7) {
        printf("You've successfully established the correct number of arguments to access this application.\n");
        printf("Pretend that the application is now exposing some awesome functionality, please.\n");
    } else {
        printf("This application requires a specific number of command line arguments for it to function.\n");
        printf("Usage: '/lesson/elseIfCompiled argument1 argument2.........argumentZ'\n");
    }
    exit(0);
}
Pretty simple stuff I hope. The application uses the if statement to check whether argc, the number of arguments passed to the program (including the program's own name), is 7. If it's not precisely 7 then the else block executes and tells the user to do better.
Test the application out by running /lesson/ifCompiled 1 2 3 4 5 and observe the response, then run /lesson/ifCompiled 1 2 3 4 5 6 to observe that we've satisfied the requirement for 'argc' to be 7 (6 arguments + the name of the executable). Let's open this up in GDB and see how it looks:
pwndbg> disassemble main
Dump of assembler code for function main:
=> 0x0000555555555169 <+0>: endbr64
0x000055555555516d <+4>: push rbp
0x000055555555516e <+5>: mov rbp,rsp
0x0000555555555171 <+8>: sub rsp,0x10
0x0000555555555175 <+12>: mov DWORD PTR [rbp-0x4],edi
0x0000555555555178 <+15>: mov QWORD PTR [rbp-0x10],rsi
0x000055555555517c <+19>: cmp DWORD PTR [rbp-0x4],0x7
0x0000555555555180 <+23>: jne 0x55555555519c <main+51>
0x0000555555555182 <+25>: lea rdi,[rip+0xe7f] # 0x555555556008
0x0000555555555189 <+32>: call 0x555555555060 <puts@plt>
0x000055555555518e <+37>: lea rdi,[rip+0xed3] # 0x555555556068
0x0000555555555195 <+44>: call 0x555555555060 <puts@plt>
0x000055555555519a <+49>: jmp 0x5555555551b4 <main+75>
0x000055555555519c <+51>: lea rdi,[rip+0xf1d] # 0x5555555560c0
0x00005555555551a3 <+58>: call 0x555555555060 <puts@plt>
0x00005555555551a8 <+63>: lea rdi,[rip+0xf71] # 0x555555556120
0x00005555555551af <+70>: call 0x555555555060 <puts@plt>
0x00005555555551b4 <+75>: mov edi,0x0
0x00005555555551b9 <+80>: call 0x555555555070 <exit@plt>
End of assembler dump.
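Alright so at this point in the course I hope that there aren't any surprises in the above ASM. The possible exceptions are LEA, which I'll cover in a second, and the calls to puts: the compiler has swapped our printf calls for puts because each string ends in a newline and contains no format specifiers. We can see three new instructions though: JMP, JNE and CMP.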
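CMP - usage: CMP something, something_else. Compares two values together. Under the hood it subtracts the second value from the first, throws the result away and records the outcome in flags in the EFLAGS register, which conditional jumps then read.
JNE - usage: JNE an_address. Jump if Not Equal. Usually follows a CMP or a TEST instruction, and jumps to the address only if the values compared were not equal.
JMP - usage: JMP an_address. An unconditional jump to an address in the executable, much like a goto statement in C and other languages.
Maybe a touch confusing but hopefully not too bad. With the context above, here's a breakdown of what the assembly is doing (starting at the cmp instruction):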
cmp DWORD PTR [rbp-0x4],0x7 - compares the stack value at RBP-0x4 (argc) with 0x7
jne 0x55555555519c <main+51> - jumps to an address (0x55555555519c) if they're not equal (this is the failure case)
lea rdi,[rip+0xe7f] # 0x555555556008 - execution simply continues here if the values were equal. Load the Effective Address of our 'success' string into RDI
call 0x555555555060 <puts@plt> - print the success string
lea rdi,[rip+0xed3] # 0x555555556068 - load the Effective Address of our second 'success' string into RDI
call 0x555555555060 <puts@plt> - print it
jmp 0x5555555551b4 <main+75> - jump to the end of the program to call exit()
lea rdi,[rip+0xf1d] # 0x5555555560c0 - THIS location is where the JNE jumps to; it is the start of the failure case
call 0x555555555060 <puts@plt> - print the failure string
lea rdi,[rip+0xf71] # 0x555555556120 - load the second failure string into RDI
call 0x555555555060 <puts@plt> - print it
mov edi,0x0 - put the return code into EDI
call 0x555555555070 <exit@plt> - gracefully exit the program.
As noted above, the LEA (Load Effective Address) instruction is new to us. It calculates an address and puts that address itself into a register without reading any memory. Here the address is that of a string, to be used by functions like puts() which accept string pointers as arguments. The [rip+0xe7f] operand is relative to RIP, which at that point holds the address of the next instruction (0x555555555189), so the result is 0x555555555189 + 0xe7f = 0x555555556008. In this case it's directly equivalent to mov rdi, 0x555555556008. Confirm this yourself in GDB with x/s 0x555555556008.
Another note, in case it's unclear: if the result of a cmp instruction doesn't satisfy a conditional jump's condition, the jump simply isn't taken and execution falls through to the next instruction.
Observe above that we have a JNE instruction. It shouldn't surprise you to learn that there is a JE instruction too, which jumps only if the two values compared with CMP are equal. There are also JZ / JNZ instructions, for jump if zero and jump if not zero. These are functionally identical to JE and JNE (they're alternative names for the same opcodes): all four look only at the Z (zero) flag in the EFLAGS register, which CMP sets when the two values it compared were equal.
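Just for fun, to really cement how this stuff works, we can use the examine instruction command (x/i) in GDB to look at the code which will be executed after the JNE and JMP instructions respectively: x/i 0x55555555519c and x/i 0x5555555551b4. These should show the lea rdi,[rip+0xf1d] at main+51 and the mov edi,0x0 at main+75 from the listing above.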
An if statement can also be extended with else if. This lets developers write code which first checks whether one condition is true, then checks whether another condition is true (as many times as they wish), before eventually falling into an else block if none of the conditions held.
Consider the following snippet (which can be found in the docker container under /lesson/elseIfRedacted.c) -
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
int main(int argc, char** argv) {
    if(argc<=1) {
        printf("In order to unlock this application you must supply the correct numeric 'code' as an argument to the application.\n");
        printf("Usage: '/lesson/elseIfCompiled 53'\n");
        exit(1);
    }
    long int result;
    char *pend;
    errno = 0;
    // strtol, string to long, takes a string and returns the 'long' int representation of that string.
    // Arguments are as follows -
    // 1. The string to 'cast' to a long, in our case it's the argument to our program
    // 2. the address of a char pointer, which strtol points at any extra 'stuff' left on the end of the number (in case the user does something silly)
    // 3. the 'base' of the number (10 == base 10 == decimal)
    result = strtol (argv[1], &pend, 10);
    if(errno != 0){ // Check if strtol reported an error via errno
        printf("Something bad has happened, exiting the program.\n");
    } else if(result == REDACTED) { // Check if the code was correct for a low privileged user
        printf("Welcome low privileged user, the application was unlocked. Shame you're not an administrator though.\n");
    } else if(result == REDACTED){ // Check if the code was correct for an administrator
        printf("Welcome admin, the application was unlocked with all privileges enabled.\n");
        printf("Please pretend that some useful functionality was enabled.\n");
    } else {
        printf("That code is incorrect. Please try again.\n");
    }
    exit(0);
}
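This is clearly the largest and most complicated C code that we've seen so far in this course, so let's break it down and explain what it's doing.
- If the user didn't supply an argument (argc is 1 or less), it prints a usage message and exits (the first if statement).
- It converts the first argument to a long with strtol(). If strtol() reported an error through errno, it prints an error message (the second if statement).
- If the number matches the first redacted code, it welcomes a low privileged user (the first else if statement).
- If the number matches the second redacted code, it welcomes an administrator (the second else if statement).
- Otherwise (the else statement) it prints a generic 'you failed' message and ends the if statement.
I've unhelpfully redacted the numeric access codes from the C source code, and I've hidden the original C source file in the container somewhere so you can't just peek at it to see what the correct numbers are. We're reverse engineers after all, eh? 🙂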
You probably won't find the file either, so I wouldn't waste valuable seconds hunting for it when you could just open /lesson/elseIfCompiled in GDB and work out the code from there!
The ASM code below was established by disassembling /lesson/elseIfCompiled in the lesson's container.
pwndbg> disassemble main
Dump of assembler code for function main:
=> 0x00005555555551a9 <+0>: endbr64
0x00005555555551ad <+4>: push rbp
0x00005555555551ae <+5>: mov rbp,rsp
0x00005555555551b1 <+8>: sub rsp,0x30
0x00005555555551b5 <+12>: mov DWORD PTR [rbp-0x24],edi
0x00005555555551b8 <+15>: mov QWORD PTR [rbp-0x30],rsi
0x00005555555551bc <+19>: mov rax,QWORD PTR fs:0x28
0x00005555555551c5 <+28>: mov QWORD PTR [rbp-0x8],rax
0x00005555555551c9 <+32>: xor eax,eax
0x00005555555551cb <+34>: cmp DWORD PTR [rbp-0x24],0x1
0x00005555555551cf <+38>: jg 0x5555555551f3 <main+74>
0x00005555555551d1 <+40>: lea rdi,[rip+0xe30] # 0x555555556008
0x00005555555551d8 <+47>: call 0x555555555090 <puts@plt>
0x00005555555551dd <+52>: lea rdi,[rip+0xe9c] # 0x555555556080
0x00005555555551e4 <+59>: call 0x555555555090 <puts@plt>
0x00005555555551e9 <+64>: mov edi,0x1
0x00005555555551ee <+69>: call 0x5555555550b0 <exit@plt>
0x00005555555551f3 <+74>: call 0x555555555080 <__errno_location@plt>
0x00005555555551f8 <+79>: mov DWORD PTR [rax],0x0
0x00005555555551fe <+85>: mov rax,QWORD PTR [rbp-0x30]
0x0000555555555202 <+89>: add rax,0x8
0x0000555555555206 <+93>: mov rax,QWORD PTR [rax]
0x0000555555555209 <+96>: lea rcx,[rbp-0x18]
0x000055555555520d <+100>: mov edx,0xa
0x0000555555555212 <+105>: mov rsi,rcx
0x0000555555555215 <+108>: mov rdi,rax
0x0000555555555218 <+111>: call 0x5555555550a0 <strtol@plt>
0x000055555555521d <+116>: mov QWORD PTR [rbp-0x10],rax
0x0000555555555221 <+120>: call 0x555555555080 <__errno_location@plt>
0x0000555555555226 <+125>: mov eax,DWORD PTR [rax]
0x0000555555555228 <+127>: test eax,eax
0x000055555555522a <+129>: je 0x55555555523a <main+145>
0x000055555555522c <+131>: lea rdi,[rip+0xe75] # 0x5555555560a8
0x0000555555555233 <+138>: call 0x555555555090 <puts@plt>
0x0000555555555238 <+143>: jmp 0x555555555282 <main+217>
0x000055555555523a <+145>: cmp QWORD PTR [rbp-0x10],0x703cd8
0x0000555555555242 <+153>: jne 0x555555555252 <main+169>
0x0000555555555244 <+155>: lea rdi,[rip+0xe95] # 0x5555555560e0
0x000055555555524b <+162>: call 0x555555555090 <puts@plt>
0x0000555555555250 <+167>: jmp 0x555555555282 <main+217>
0x0000555555555252 <+169>: cmp QWORD PTR [rbp-0x10],0xcc07c9
0x000055555555525a <+177>: jne 0x555555555276 <main+205>
0x000055555555525c <+179>: lea rdi,[rip+0xee5] # 0x555555556148
0x0000555555555263 <+186>: call 0x555555555090 <puts@plt>
0x0000555555555268 <+191>: lea rdi,[rip+0xf29] # 0x555555556198
0x000055555555526f <+198>: call 0x555555555090 <puts@plt>
0x0000555555555274 <+203>: jmp 0x555555555282 <main+217>
0x0000555555555276 <+205>: lea rdi,[rip+0xf5b] # 0x5555555561d8
0x000055555555527d <+212>: call 0x555555555090 <puts@plt>
0x0000555555555282 <+217>: mov edi,0x0
0x0000555555555287 <+222>: call 0x5555555550b0 <exit@plt>
End of assembler dump.
Well, it was our most complete and complicated C example so far, so it only makes sense for this to be our most complete and complicated ASM sample too. I'm absolutely certain that this looks like quite a scary listing to a budding reverse engineer, so let's focus on the lines which correspond with the if/else if statements: the cmp or test instructions and the conditional jumps that follow them (main+34/+38, main+127/+129, main+145/+153 and main+169/+177). We'll work through each of them below.
OK, starting with the first pair of lines:
cmp DWORD PTR [rbp-0x24],0x1
jg 0x5555555551f3 <main+74>
The cmp statement checks a value on the stack (argc!) against the number one. Immediately afterwards is a new instruction, JG, which stands for "Jump if Greater Than". The jump will be taken if the cmp instruction cleared the 'Z' (zero) flag in the EFLAGS register, meaning the values weren't equal, AND left the 'S' (sign) and 'O' (overflow) flags set to the same value as each other.
This is very confusing, so here's the practical rule: if the left value in the cmp statement is larger than the right value (treating both as signed numbers), then a JG will be taken.
One last note before we move on, there is also -
JL - jump if less than
JB - jump if below, the unsigned counterpart of JL (it reads the 'C' (carry) flag rather than the S and O flags)
JA - jump if above, the unsigned counterpart of JG
JGE - jump if greater than or equal to
JLE - jump if less than or equal to
Back to the analysis anyway. If argc is greater than 1 then we make a jump to another location in the code. If we don't jump then a few instructions ahead is a big, inevitable call to exit() which will clearly exit the application. Based on this information we know that we need argc to be greater than 1 in order to continue the application's execution.
Skipping on to the point in the code where the JG lands then (main+74), in the case where argc is greater than 1. The code from main+74 to main+111 in the listing above isn't particularly scary. It zeroes errno (main+74 and main+79), then sets up a number of registers and stack locations for a call to strtol(), which converts the string pointed to by RDI into a long value (RSI holds the address of pend and EDX holds the base, 0xa, i.e. 10). There are a couple of noteworthy instructions here which I'd like to dive into quickly.
mov rax,QWORD PTR [rbp-0x30] - puts the argv vector (the address of its first string pointer) into RAX
add rax,0x8 - adds 8 bytes to RAX's value, which has the effect of making RAX point at the second string pointer in the argv vector (pointers are 8 bytes each on x86-64)
mov rax,QWORD PTR [rax] - this is the interesting line. argv is a list of string pointers, and right now RAX contains a pointer to a string pointer. This line dereferences that pointer to get the direct address of the string.
I'll quickly clarify that last point. Consider that the argv vector looks like the following in memory -
Addresses in RAM | 0 (argv[0]) | 8 (argv[1]) | 16 (argv[2]) | 24 (argv[3]) |
---|---|---|---|---|
Data at that address | 1000 | 1240 | 1180 | 2248 |
Ignoring the obviously fictional numbers here, observe how address 8 in memory contains "1240". "1240" is the
location in memory where the first argument to the application lives. The mov rax, QWORD PTR [rax]
instruction changes the value in RAX from "8" (because it was pointing at the second element of argv) to "1240",
which is the direct pointer to the first argument to the application! Nothing too scary I hope,
but it was worth clarifying.
After the call to strtol, the code from main+116 to main+143 in the listing above executes.
The call to strtol completes, the return value from the function is inside of RAX immediately after the function call, and it's then stored on the stack at RBP-0x10. The next line contains something interesting: a call to __errno_location(), a function we never wrote ourselves. In glibc, errno isn't a plain global variable; it's a macro that expands to (*__errno_location()), a function which returns a pointer to the current thread's errno value. strtol() stores an error code there if it fails for whatever reason, so reading errno in our C code turns into a call to __errno_location(). The return value (a pointer) is inside of RAX as usual, and that pointer is dereferenced and the value placed into EAX.
The next instruction is new. test eax,eax is a fairly common and efficient way to check whether a value is zero. TEST performs a bitwise AND of its two operands, throws the result away and sets the flags; since EAX AND EAX is just EAX, the Z flag ends up set exactly when EAX is zero. If EAX is zero (indicating that strtol() didn't report an error) then the JE jump is taken to another location in the code. If EAX is not zero then we can see that something gets printed with puts() and there is an unconditional jump with JMP down to the end of the code, which calls exit().
We've successfully worked out that if strtol() fails for whatever reason, the application prints an error message and terminates. This is very common error handling! If you'd like to see the failure in action then run /lesson/elseIfCompiled 9223372036854775808. This number is one larger than the biggest value a (64-bit) long integer can hold, so strtol() reports the failure by setting errno to ERANGE.
Let's continue disassembling from main+145 then, the location that the JE instruction jumps to if strtol doesn't report an error.
OK so this piece of code (main+145 to main+167) compares the value at the stack location RBP-0x10 (which we know is the value returned from strtol()) against 0x703cd8. If the user supplied value is equal to 0x703cd8 (or 7355608 in decimal, hit me up on Mastodon if you got this reference) then the application will display a message to the user and then jump down to the call to exit(). If it does not match that value then it will jump to another location in the code (main+169).
Observe how, if the value isn't 7355608 and we jump to main+169, the block of code there is virtually identical to the previous block? This is precisely how if/else if looks in ASM! If this condition is not met then jump somewhere, if that condition isn't met then jump somewhere else...
This block of code (main+169 to main+203) checks to see if the user supplied argument is 0xcc07c9 (or 13371337 in decimal) and prints two messages to the user if the value matches. If the value doesn't match 13371337 then we see that there is a jump to main+205.
Observe how the code at main+205 puts a string address into RDI, prints the string to the user and then runs straight into the call to exit()? This is how else blocks are represented after an if/else if block. There are no more conditional statements; we've landed in the default code block and execution will continue (or terminate, in our case).
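We've reverse engineered the application and found the two access codes which unlock different pieces of functionality in the app: 7355608 unlocks the low privileged user functionality, and 13371337 unlocks the administrator functionality (for example, /lesson/elseIfCompiled 13371337).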
One quick footnote before I wrap this lengthy lesson up. C and other high level languages have the concept of a switch case statement, which provides an elegant and aesthetically pleasing way of performing many different comparisons on a value.
I've provided an example in the lesson's Docker container under /lesson/switchCase.c and /lesson/switchCaseCompiled.
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
int main(int argc, char** argv) {
    if(argc<=1) {
        printf("This application is a guessing game. Provide a single character as an argument.\n");
        printf("Usage: '/lesson/elseIfCompiled Z'\n");
        exit(1);
    }
    long int result;
    char *pend;
    errno = 0;
    // strtol, string to long, takes a string and returns the 'long' int representation of that string.
    // Arguments are as follows -
    // 1. The string to 'cast' to a long, in our case it's the argument to our program
    // 2. the address of a char pointer, which strtol points at any extra 'stuff' left on the end of the number (in case the user does something silly)
    // 3. the 'base' of the number (16 == base 16 == hexadecimal)
    result = strtol (argv[1], &pend, 16);
    switch(result){
        case 0x414141:
            printf("Not 'A' bad guess.\n");
            break;
        case 0x464646:
            printf("'F'eels incorrect.\n");
            break;
        case 0x525252:
            printf("'R'eally close..\n");
            printf("Sort of.\n");
            break;
        case 0x44:
        case 0x45:
        case 0x43:
            printf("Demonstrating switch case fall through.\n");
            break;
        case 0x474747:
            printf("'G'ood work, you 'G'uessed the value correctly. 😊\n");
            break;
        default:
            printf("Incorrect letter provided.\n");
            break;
    }
    exit(0);
}
And here is the relevant and interesting part of the corresponding ASM code -
0x0000555555555218 <+111>: call 0x5555555550a0 <strtol@plt>
0x000055555555521d <+116>: mov QWORD PTR [rbp-0x10],rax
0x0000555555555221 <+120>: cmp QWORD PTR [rbp-0x10],0x525252
0x0000555555555229 <+128>: je 0x55555555529f <main+246>
0x000055555555522b <+130>: cmp QWORD PTR [rbp-0x10],0x525252
0x0000555555555233 <+138>: jg 0x5555555552d5 <main+300>
0x0000555555555239 <+144>: cmp QWORD PTR [rbp-0x10],0x474747
0x0000555555555241 <+152>: je 0x5555555552c7 <main+286>
0x0000555555555247 <+158>: cmp QWORD PTR [rbp-0x10],0x474747
0x000055555555524f <+166>: jg 0x5555555552d5 <main+300>
0x0000555555555255 <+172>: cmp QWORD PTR [rbp-0x10],0x464646
0x000055555555525d <+180>: je 0x555555555291 <main+232>
0x000055555555525f <+182>: cmp QWORD PTR [rbp-0x10],0x464646
0x0000555555555267 <+190>: jg 0x5555555552d5 <main+300>
0x0000555555555269 <+192>: cmp QWORD PTR [rbp-0x10],0x45
0x000055555555526e <+197>: jg 0x555555555279 <main+208>
0x0000555555555270 <+199>: cmp QWORD PTR [rbp-0x10],0x43
0x0000555555555275 <+204>: jge 0x5555555552b9 <main+272>
0x0000555555555277 <+206>: jmp 0x5555555552d5 <main+300>
0x0000555555555279 <+208>: cmp QWORD PTR [rbp-0x10],0x414141
0x0000555555555281 <+216>: jne 0x5555555552d5 <main+300>
0x0000555555555283 <+218>: lea rdi,[rip+0xe2e] # 0x5555555560b8
0x000055555555528a <+225>: call 0x555555555090 <puts@plt>
0x000055555555528f <+230>: jmp 0x5555555552e2 <main+313>
0x0000555555555291 <+232>: lea rdi,[rip+0xe33] # 0x5555555560cb
0x0000555555555298 <+239>: call 0x555555555090 <puts@plt>
0x000055555555529d <+244>: jmp 0x5555555552e2 <main+313>
0x000055555555529f <+246>: lea rdi,[rip+0xe38] # 0x5555555560de
0x00005555555552a6 <+253>: call 0x555555555090 <puts@plt>
0x00005555555552ab <+258>: lea rdi,[rip+0xe3d] # 0x5555555560ef
0x00005555555552b2 <+265>: call 0x555555555090 <puts@plt>
0x00005555555552b7 <+270>: jmp 0x5555555552e2 <main+313>
0x00005555555552b9 <+272>: lea rdi,[rip+0xe38] # 0x5555555560f8
0x00005555555552c0 <+279>: call 0x555555555090 <puts@plt>
0x00005555555552c5 <+284>: jmp 0x5555555552e2 <main+313>
0x00005555555552c7 <+286>: lea rdi,[rip+0xe52] # 0x555555556120
0x00005555555552ce <+293>: call 0x555555555090 <puts@plt>
0x00005555555552d3 <+298>: jmp 0x5555555552e2 <main+313>
0x00005555555552d5 <+300>: lea rdi,[rip+0xe79] # 0x555555556155
0x00005555555552dc <+307>: call 0x555555555090 <puts@plt>
0x00005555555552e1 <+312>: nop
0x00005555555552e2 <+313>: mov edi,0x0
0x00005555555552e7 <+318>: call 0x5555555550b0 <exit@plt>
End of assembler dump.
This lesson is getting quite long now, so I won't do a deep dive into this code like I've done with the other examples above; at this point you likely have enough skills to read every line of it. The interesting thing to note about the above ASM is its structure. This pattern of
cmp -> je -> cmp -> jg -> cmp -> je -> cmp -> jg -> ....
followed by a long list of
do_something -> jmp -> do_something -> jmp to the same address -> ....
is a telltale sign of a switch case statement. If you see this big, obviously repetitive structure in a disassembled application then there's a very good chance you're looking at a switch case. Be aware that compilers can also turn a switch whose case values are close together into a jump table, which looks quite different.
Let's do a little bit of analysis then, because I can't help myself. I'll run through to the 'successful guess' block of the switch case.
The code begins by taking the user supplied value and parsing it as a hexadecimal (base 16) number using our friend strtol(). Afterwards, the application performs a comparison between the supplied number and 0x525252. If they are equal then a jump is made to main+246, which prints "'R'eally close.." and "Sort of.", followed by a jump to exit(). Next there is a check to see if the supplied value is greater than 0x525252. If it is, a jump is taken to the 'default' block of the switch case (because the compiler has cleverly determined that none of the case statements check for a number larger than 0x525252).
Next up is a check to see if the value is equal to 0x474747 (this is the correct guess). If so, the application jumps to main+286, which prints the 'success' message and then jumps unconditionally to the end of the application. We have successfully reverse engineered the application to work out the correct secret code: because strtol() parses the argument as base 16, running /lesson/switchCaseCompiled 474747 prints the 'G'ood work message.
This was probably the longest and most demanding lesson so far, but by working through this material we've finally enabled ourselves to reverse engineer the vast majority of applications, and that's pretty exciting.
In the next lesson we'll finally cover how loops are implemented in ASM, at which point we'll have the skills and knowledge to reverse engineer a huge number of non-trivial desktop applications and dig into some really fun and interesting ones.