Lesson 1 - Registers and CPU Flags.

Fun with flags 🏴 (and registers)....

Docker Setup

As discussed on the prerequisites page, Docker is going to be used for this course. I've pushed a pre-built image to Docker Hub for every lesson, so all you need to do to follow along is run the following commands -

These two commands first pull down the Docker image for lesson 1 (which builds on a customized Ubuntu image with pwndbg preinstalled) and then run it, with seccomp flags that allow pwndbg to disable ASLR for the running executable (this will be discussed later!).

NOTE: The first pull and run combination will take a long time (5+ minutes), because Ubuntu is pulled down / updated / upgraded before pwndbg is installed. Subsequent pulls and runs will be significantly quicker.

SECOND NOTE: All code and executables for this level can be found under /lesson inside of the container.

What are registers?

So, consider the following C application (available at /lesson/variables.c in the container) -


            #include <stdio.h>

            int main(int argc, char** argv){
                int int_variable = 65;
                char char_variable = 'A';
                short short_variable = 0x41;
            }
        

We can clearly see the data types (int, char, short) and the variable names (int_variable, char_variable, short_variable) and this gives us a huge amount of context about what the application does.

Unfortunately, assembly language has no concept of variables, and only a limited concept of data types, which is part of the reason why reverse engineering is so difficult - you, as a reverse engineer, need to infer what the original developer was thinking by reading ASM code without any contextual hints.

Instead of using variables to hold a piece of data, all flavors of ASM (x86, x64, ARM, MIPS) use CPU registers to temporarily hold data. On x64, each register is a small (8 byte) area of storage which can hold almost any data type that fits within a 64 bit integer, for example an integer, a single character or a memory address.

x64 ASM (the standard which we care about for this course) has the following registers, most of which are general purpose -

NAME  PURPOSE
RAX   The “Accumulator”. Multi-purpose, nowadays.
RBP   The Base Pointer. This register stores the address of the beginning of the current stack frame. This will be explained in more detail shortly.
RBX   The “Base”. Multi-purpose, nowadays.
RCX   The “Counter”. Used to be used to hold the current iteration of a loop, for example. Multi-purpose, nowadays.
RDX   The “Data” register. Multi-purpose*
RIP   The Instruction Pointer. This register stores the address of the next instruction which the CPU is going to execute.
RSP   The Stack Pointer. This register stores the address of the current top of the stack. This will be explained in more detail shortly.
RDI   Multi-purpose*
RSI   Multi-purpose*
R8    Multi-purpose*
R9    Multi-purpose*
R10   Multi-purpose*
R11   Multi-purpose
R12   Multi-purpose
R13   Multi-purpose
R14   Multi-purpose
R15   Multi-purpose

Registers marked with an asterisk above are general purpose, but with a caveat that they are occasionally used for a specific purpose which we'll cover in the next lesson.

What does the 'R' prefix mean?

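The truth of the matter is that the R doesn't have an official definition, but it is noteworthy because each of the registers above can also be addressed using alternative names. R8-R15 are the exception to the naming scheme described here: their smaller portions are addressed with suffixes instead (R8D, R8W and R8B, for example).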

Each of the above registers is 64 bits wide (they can hold 8 bytes of data), but for some of them it's also possible to address the bottom 32 bits (using an E prefix), the bottom 16 bits (using no prefix at all), and the top 8 bits and bottom 8 bits of that 16 bit portion. The following table explains that a little better -

64 BITS  LOW 32 BITS  LOW 16 BITS  HIGH 8 BITS  LOW 8 BITS
RAX      EAX          AX           AH           AL
RBX      EBX          BX           BH           BL
RCX      ECX          CX           CH           CL
RDX      EDX          DX           DH           DL
RDI      EDI          DI           -            DIL
RSI      ESI          SI           -            SIL
RBP      EBP          BP           -            BPL
RSP      ESP          SP           -            SPL
RIP      EIP          IP           -            -

The "E" prefix, on the other hand, does have a meaning: it was introduced with 32 bit assembly and stands for extended.

I like to remember the system with - RIDICULOUSLY-Extended AX -> Extended AX -> AX -> A HIGH and A LOW

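Another example, to really solidify what this means. Imagine that the RAX register contained 0xcafebabebadc0ffe. The smaller names would then read the following values -

REGISTER  VALUE
RAX       0xcafebabebadc0ffe
EAX       0xbadc0ffe
AX        0x0ffe
AH        0x0f
AL        0xfe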

The same is true for the other registers outlined above.

What are CPU Flags?

There is a special register called the EFLAGS register (RFLAGS in 64 bit mode, where the extra upper 32 bits are reserved). This register is managed by the CPU, and is used to track the outcome of certain instructions.

The EFLAGS register is 32 bits wide, with each individual bit being used as a boolean for a particular flag. For example, bit number 6 is the Zero Flag. The Zero Flag is set to 1 when the result of an arithmetic or comparison instruction is zero (for example, when a comparison finds that its two operands are equal), and is cleared to 0 otherwise. Conditional jump instructions (we'll cover these in lesson 5) then read flags like this one to decide whether or not the jump is taken.

As a fledgling ASM writer or reverse engineer, you won't need to mess with the EFLAGS register too often, but it's worth knowing what the fields are and when they're set. Refer to this page for some extra information about the flags register, but don't worry too much if it goes over your head for now. That's normal.

In GDB, execute set show-flags on to force GDB to show you the EFLAGS register in the registers pane at the top of the screen.

Closing Thoughts

I realize that theory like this can be quite dry without concrete examples, and it's hard to make any of it stick. In the next lesson we're going to dive into some actual 64 bit assembly and see these registers in action.