Docker Setup

As discussed on the prerequisites page, this course uses Docker. I've pushed a pre-built image to Docker Hub for every lesson, so all you need to do to follow along is run the following commands:

docker pull learnreverseengineering/lesson1
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined learnreverseengineering/lesson1 bash

These two commands pull down the Docker image for lesson 1 (which builds on a customized Ubuntu image with pwndbg preinstalled) and then run it. The extra --cap-add and --security-opt flags allow pwndbg to trace the running executable and to disable ASLR on it (this will be discussed later!).

NOTE: The first pull and run will take a long time (5+ minutes), because Ubuntu is pulled down, updated, and upgraded before pwndbg is installed. Subsequent pulls and runs will be significantly quicker.

SECOND NOTE: All code and executables for this lesson can be found under /lesson inside of the container.

What are registers?

Consider the following C application (available at /lesson/variables.c in the container):


            #include <stdio.h>

            int main(int argc, char** argv){
                int int_variable = 65;
                char char_variable = 'A';
                short short_variable = 0x41;
            }

We can clearly see the data types (int, char, short) and the variable names (int_variable, char_variable, short_variable), which gives us a huge amount of context about what the application does.

Unfortunately, assembly language has no concept of variables or even data types (to an extent), which is part of the reason why reverse engineering is so difficult: you, as a reverse engineer, need to infer what the original developer was thinking by reading ASM code without any contextual hints.

Instead of using variables to hold a piece of data, all flavors of ASM (x86, x64, ARM, MIPS) use CPU registers to temporarily hold data. On x64, each general purpose register is a small (8 byte) area of storage which can hold almost any data that fits within a 64 bit integer, for example:

Absurdly large numbers and small numbers (e.g. 0x0123456789abcdef and 0x0000000000000001)
Addresses in RAM (e.g. 0x55555555464e)
Booleans (e.g. 0x0000000000000000, 0x00000000000000FF)
Characters (e.g. 0x494C4F564541534D, which is the hex representation of "ILOVEASM")

x64 ASM (the flavor which we care about for this course) has the following registers, all of which are general purpose apart from RIP:

NAME	PURPOSE
RAX	The “Accumulator”. Multi-purpose, nowadays
RBP	The Base Pointer. This register stores the address of the beginning of the current stack frame. This will be explained in more detail shortly.
RBX	The “Base”. Multi-purpose, nowadays
RCX	The “Counter”. Historically held the current iteration of a loop, for example. Multi-purpose, nowadays
RDX	The “Data” register. Multi-purpose*
RIP	The Instruction Pointer. This register stores the address of the next instruction which the CPU is going to execute.
RSP	The Stack Pointer. This register stores the address of the current top of the stack. This will be explained in more detail shortly.
RDI	Multi-purpose*
RSI	Multi-purpose*
R8	Multi-purpose*
R9	Multi-purpose*
R10	Multi-purpose*
R11	Multi-purpose
R12	Multi-purpose
R13	Multi-purpose
R14	Multi-purpose
R15	Multi-purpose

Registers marked with an asterisk above are general purpose, but with a caveat that they are occasionally used for a specific purpose which we'll cover in the next lesson.

What does the 'R' prefix mean?

The truth of the matter is that the R doesn't have an official definition, but it is noteworthy because each of the registers above can also be addressed in smaller pieces using alternative names. (R8-R15 follow a different scheme which uses suffixes rather than prefixes: for example R8D, R8W and R8B are the low 32, 16 and 8 bits of R8.)

Each of the above registers is 64 bits wide (it can hold 8 bytes of data), but it's also possible to address the bottom 32 bits (using an E prefix), the bottom 16 bits (using no prefix at all), and, for some of them, the high 8 bits and low 8 bits of that bottom 16 bit portion. The following table explains that a little better:

64 BITS	LOW 32 BITS	LOW 16 BITS	HIGH 8 BITS	LOW 8 BITS
RAX	EAX	AX	AH	AL
RBX	EBX	BX	BH	BL
RCX	ECX	CX	CH	CL
RDX	EDX	DX	DH	DL
RDI	EDI	DI	–	DIL
RSI	ESI	SI	–	SIL
RBP	EBP	BP	–	BPL
RSP	ESP	SP	–	SPL
RIP	EIP	IP	–	–

As mentioned, the "R" doesn't really have a definition, but the "E" prefix in 32 bit assembly originally stood for "extended".

I like to remember the system with: RIDICULOUSLY-Extended AX -> Extended AX -> AX -> A HIGH and A LOW

Here's an example to really solidify what this means. Imagine that the RAX register contained 0xcafebabebadc0ffe.

EAX, the bottom 32 bits, would contain 0xbadc0ffe.
AX, the bottom 16 bits, would contain 0x0ffe.
AH, the high 8 bits of AX, would contain 0x0f.
AL, the low 8 bits of AX, would contain 0xfe.

The same is true for the other registers outlined above.

What are CPU Flags?

There is a special register called the EFLAGS register (RFLAGS in 64 bit mode, where the upper 32 bits are reserved). This register is managed by the CPU, and is used to track the outcome of certain instructions.

The EFLAGS register is 32 bits wide, with each individual bit being used as a boolean for a particular flag. For example, bit number 6 is the Zero Flag. The Zero Flag is set to 1 when the result of certain instructions (such as an arithmetic operation or a comparison) is zero, and is set to 0 otherwise. Conditional jump instructions (we'll cover these in lesson 5) then read flags like this one to decide whether or not the jump is taken.

As a fledgling ASM writer or reverse engineer, you won't need to mess with the EFLAGS register too often, but it's worth knowing what the fields are and when they're set. Refer to this page for some extra information about the flags register, but don't worry too much if it goes over your head for now. That's normal.

In GDB, execute set show-flags on to force GDB to show you the EFLAGS register in the registers pane at the top of the screen.

Closing Thoughts

I realize that theory like this can be quite dry without concrete examples, and it's hard to make any of it stick. In the next lesson we're going to dive into some actual 64 bit assembly and see these registers in action.

Lesson 1 - Registers and CPU Flags.
