One of the difficulties of creating a reverse engineering course like this one, in which attendees are required to write C programs and then decompile them again, is that everyone needs to have precisely the same environment (same GLIBC version, same compiler version, same GDB version, same OS, etc.). Otherwise inconsistencies will arise; for example, the assembly produced on one machine can look vastly different from the assembly produced on another.
To resolve this constraint, I settled on creating and distributing Dockerfiles for each lesson, mostly because they're significantly more lightweight to distribute than a virtual machine. They are also more portable and more likely to run on equipment with lower specifications, which makes this a more accessible and inclusive course.
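To see which toolchain a given environment actually provides, a small C check along the following lines can be compiled inside it. This is only a sketch and not part of the course material: the function names (compiler_version, libc_version, print_toolchain) are hypothetical, and it assumes a GCC- or Clang-style compiler that defines __VERSION__ and, for the libc part, a glibc system.

```c
#include <stdio.h>   /* on glibc this also pulls in the header that defines __GLIBC__ */

#ifdef __GLIBC__
#include <gnu/libc-version.h>   /* gnu_get_libc_version() */
#endif

/* Version string of the compiler that built this file.
 * GCC and Clang define __VERSION__; fall back to "unknown" elsewhere. */
const char *compiler_version(void) {
#ifdef __VERSION__
    return __VERSION__;
#else
    return "unknown";
#endif
}

/* Version of the C library the program is running against,
 * or a placeholder string on non-glibc systems. */
const char *libc_version(void) {
#ifdef __GLIBC__
    return gnu_get_libc_version();
#else
    return "unknown (not glibc)";
#endif
}

/* Print both values, one per line, so two machines can be compared at a glance. */
void print_toolchain(void) {
    printf("compiler: %s\n", compiler_version());
    printf("libc:     %s\n", libc_version());
}
```

Calling print_toolchain() from a main function on two different machines should make any mismatch in compiler or GLIBC version visible immediately.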
Instructions can be found here on how to install Docker, and an excellent tutorial for learning the basics of Docker can be found here.
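Verify that the installation was successful with
docker run hello-world
and then clean up again with
docker rm $(docker ps -a -q -f status=exited); docker rmi $(docker images -a -q)
Note that the clean-up command removes every exited container and every local image on the machine, not only hello-world.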
To run a lesson's Docker container, run
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined learnreverseengineering/lesson1 bash
replacing lesson1 with the appropriate lesson.
Folks with a background in security might feel a bit uncomfortable running their Docker containers with
--security-opt seccomp=unconfined
This instruction came directly from the PwnDBG documentation. I've done some tinkering and didn't identify any issues or inconsistencies in the course material when omitting that flag, so feel free to omit it if that makes you feel more comfortable!
In order to perform some reverse engineering, we're going to need some way of disassembling the compiled applications which we write as part of this course. We'll use GDB because it's lightweight, runs on virtually any platform, and can be turned into a truly beautiful reverse engineering tool with the addition of PwnDBG, GEF, or PEDA. No local installation of GDB is required; I've installed and configured it inside each of the Docker containers for this course. Some instructions on how to use GDB with PwnDBG can be found here.
Not everyone who is taking this course has encountered GDB (or indeed any disassembler), so I felt it appropriate to include some basic commands here which will be used as part of the course. Start GDB by passing it the binary to inspect:
oliver@krankenhaus> gdb nameOfBinaryFile
Inside GDB, run info functions to list all functions in an unstripped binary:
(gdb) info functions
All defined functions:
Non-debugging symbols:
0x0000000000001000 _init
0x0000000000001030 printf@plt
0x0000000000001040 __cxa_finalize@plt
0x0000000000001050 _start
0x0000000000001080 deregister_tm_clones
0x00000000000010b0 register_tm_clones
0x00000000000010f0 __do_global_dtors_aux
0x0000000000001130 frame_dummy
0x0000000000001139 main
0x0000000000001170 __libc_csu_init
0x00000000000011d0 __libc_csu_fini
0x00000000000011d4 _fini
When GDB starts, the target executable is paused. It's possible to run the executable inside GDB (useful for seeing where it crashes, analyzing it dynamically, etc.) with run:
(gdb) run
Starting program: /home/oliver/0xff-hello-world/0xff-hello-world
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
learnreverseengineering.com is the best![Inferior 1 (process 524301) exited normally]
Set a breakpoint to pause execution of the application at a certain point with break FUNCTION_NAME or break *ADDRESS:
(gdb) break main
Breakpoint 1 at 0x113d
(gdb) break *0x1139
Breakpoint 2 at 0x1139
(gdb) break *main+1
Breakpoint 3 at 0x113a
Inside GDB, run disassemble FUNCTION_NAME to disassemble a specific function:
(gdb) disassemble main
Dump of assembler code for function main:
0x0000000000001139 <+0>: push rbp
0x000000000000113a <+1>: mov rbp,rsp
0x000000000000113d <+4>: sub rsp,0x10
0x0000000000001141 <+8>: mov DWORD PTR [rbp-0x4],edi
0x0000000000001144 <+11>: mov QWORD PTR [rbp-0x10],rsi
0x0000000000001148 <+15>: lea rax,[rip+0xeb9] # 0x2008
0x000000000000114f <+22>: mov rdi,rax
0x0000000000001152 <+25>: mov eax,0x0
0x0000000000001157 <+30>: call 0x1030
0x000000000000115c <+35>: mov eax,0x0
0x0000000000001161 <+40>: leave
0x0000000000001162 <+41>: ret
End of assembler dump.
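For orientation, here is a guess at C source that would compile to a listing shaped like the one above. The original source file is not shown in this section, so treat this as an assumption rather than the course's actual code. The function is named greet (a hypothetical name) instead of main so that it can be dropped into an existing test program; the string is taken from the program output shown earlier.

```c
#include <stdio.h>

/*
 * Hypothetical stand-in for main, mirroring the listing above:
 *  - argc arrives in edi and argv in rsi (System V AMD64 calling
 *    convention); unoptimized code stores both on the stack even
 *    though neither is used afterwards.
 *  - the address of the string literal is loaded with lea and passed in rdi.
 *  - eax is zeroed before the call because printf is variadic and
 *    al carries the number of vector registers used (none here).
 *  - the final mov eax,0x0 is the return value.
 */
int greet(int argc, char **argv) {
    printf("learnreverseengineering.com is the best!");
    return 0;
}
```

Compiling this without optimization and disassembling greet should give a sequence very close to the one shown for main, although exact offsets and addresses depend on the compiler version.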
If execution is paused because of a breakpoint, then it's possible to simply run disassemble to disassemble the function that's currently executing:
(gdb) b main
Breakpoint 8 at 0x55555555513d
(gdb) run
Starting program: /home/oliver/0xff-hello-world/0xff-hello-world
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 8, 0x000055555555513d in main ()
(gdb) disassemble
Dump of assembler code for function main:
0x0000555555555139 <+0>: push rbp
0x000055555555513a <+1>: mov rbp,rsp
=> 0x000055555555513d <+4>: sub rsp,0x10
0x0000555555555141 <+8>: mov DWORD PTR [rbp-0x4],edi
0x0000555555555144 <+11>: mov QWORD PTR [rbp-0x10],rsi
0x0000555555555148 <+15>: lea rax,[rip+0xeb9] # 0x555555556008
0x000055555555514f <+22>: mov rdi,rax
0x0000555555555152 <+25>: mov eax,0x0
0x0000555555555157 <+30>: call 0x555555555030
0x000055555555515c <+35>: mov eax,0x0
0x0000555555555161 <+40>: leave
0x0000555555555162 <+41>: ret
End of assembler dump.
(gdb)
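Observe that execution has paused at *main+4, which is where break main placed the breakpoint: GDB sets function breakpoints just after the function prologue (the push rbp and mov rbp,rsp instructions).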
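The addresses in this listing differ from the earlier, pre-run listing only by a constant, which is what one would expect from a position-independent executable that the loader has placed at a base address. The arithmetic can be sketched in C as follows. The function names are hypothetical, and the base 0x555555554000 is simply the value implied by subtracting the two listings (0x555555555139 - 0x1139), not something stated by the course.

```c
#include <stdint.h>

/* Runtime address of code in a position-independent executable:
 * the base chosen by the loader plus the offset stored in the file. */
uint64_t runtime_address(uint64_t load_base, uint64_t file_offset) {
    return load_base + file_offset;
}

/* Target of a RIP-relative operand such as lea rax,[rip+0xeb9]:
 * the address of the NEXT instruction plus the signed displacement. */
uint64_t rip_relative_target(uint64_t next_insn_address, int32_t displacement) {
    return next_insn_address + (uint64_t)(int64_t)displacement;
}
```

With these helpers, runtime_address(0x555555554000, 0x113d) gives 0x55555555513d, the breakpoint address reported by GDB, and rip_relative_target(0x114f, 0xeb9) gives 0x2008, matching the comment GDB prints next to the lea instruction.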
This course uses the C language for reverse engineering. The reasoning for this is that C is a very low-level and lightweight language, which means that the disassembly of a compiled C program maps closely back to the original source, because there are very few layers of abstraction between C and assembly.
I've written all of the C examples which will be used in this course, and I analyze/explain them heavily on the assumption that the reader isn't some kind of C language god/goddess. If you're not very well versed in the C language then don't worry, you can still follow along with this course without much trouble.
All of the code in this course is compiled with a very basic GCC configuration: essentially, all of the code is compiled without optimization enabled and the binaries are not stripped (which means that they contain symbols, AKA function names and variable names, things which make reverse engineering significantly easier when learning).
Again, there is no requirement to know any special GCC flags, syntax, or internals; I'll always provide Bash scripts to do the compilation where necessary.