How to follow along

Two basic docker commands are required to follow along with this lesson -

docker pull learnreverseengineering/lesson6
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined learnreverseengineering/lesson6 bash

If you're working through this course on Windows then the windows calculator in programmer mode will make this lesson a little easier to visualize.

Introduction

So in the last lesson we learned how to perform arithmetic operations using instructions like ADD, SUB, MUL, IDIV etc. This is often the easiest and most straightforward way to perform mathematical operations, but there are occasions where a high level language compiler will remove one of these instructions for an ultra-efficient instruction which performs binary maths instead.

In this lesson we'll learn about binary numbers, some efficient ways that they can be manipulated to perform mathematical operations and we'll learn about something called logical operators, which will come up quite frequently during your reverse engineering adventures.

What is a binary number?

Imagine you have a byte value, 0x06. This single byte is made up of 8 bits. In the case of the byte value 0x06 the bits look as follows -

This is how bits work, they are either a 1 (indicating that the bit is set) or a 0 indicating that the bit is unset. So how does 0x06 become 0b00000110?

The binary number counting scheme is very simple, elegant, efficient and powerful. Essentially, the rightmost bit (which we call the least significant bit) can be thought of as the number one. The following bit represents the number two, the following bit is four, the following is eight... sixteen, thirty two, sixty four, one hundred and twenty eight.. In this way, any possible value can be represented by setting individual binary bits! In case you're wondering, we call the leftmost bit the most significant bit

So looking at the original example above of 0x6, we can see that 'bit 2' is set (the second bit position) and so is 'bit 4' (the third bit position). These bits both being set indicate that the byte's value is 6, because 2+4=6!

Here is a series of numbers demonstrating how any value can be made using binary -

#	128	64	32	16	8	4	2	1
6	0	0	0	0	0	1	1	0
25	0	0	0	1	1	0	0	1
251	1	1	1	1	1	0	1	1
255	1	1	1	1	1	1	1	1
119	0	1	1	1	0	1	1	1
120	0	1	1	1	1	0	0	0

All of the examples above represent 1 byte, and are only able to create a value up to 255 (0xFF in hex, 0b11111111 in binary). The registers in a 64 bit CPU (RAX, RBX, RCX etc.) are all 8 bytes wide, which means that in the case where RAX is (for example) 0x1 then the binary representation of the register would be 0b0000000000000000000000000000000000000000000000000000000000000001. If RAX was 9,223,372,036,854,775,807 (0x7fffffffffffffff) then the binary representation would be 0b0111111111111111111111111111111111111111111111111111111111111111.

Binary mathematics

ASM contains four instructions (which we care about) which are able to quickly multiply and divide numbers if a certain set of conditions are met.

SHL
SHR
ROR
ROL

SHL - Shift Left

This instruction allows an an application to take a binary number and shift every bit X places to the left, which has the effect of adding a 0 to the least significant bit position and shifting a bit off of the end at the most significant bit (the leftmost bit). Let's see some code -


        mov rax, 0x1 ; rax is now 0b00000001
        shl rax, 1 ; rax is now 0b00000010 or 2
        shl rax, 1 ; rax is now 0b00000100 or 4
        shl rax, 1 ; rax is now 0b00001000 or 8
        shl rax, 3 ; rax is now 0b01000000 or 64
        shl rax, 2 ; rax is now 0b0000000100000000 or 256
        mov rax, 0x7 ; rax is now 0b00000111 or 7
        shl rax, 2 ; rax is now 0b00011100 or 28

If we were to execute, for example, mov rax, 1 followed by shl rax, 62 then RAX would become 0b01000000000000000000000000000000. If we shifted it left again with shl rax, 1 then RAX would become negative because of two's compliment. If we shifted it left one more time then RAX would become 0 (0b00000000000000000000000000000000) because the set bit was shifted off of the end and lost forever. This is an imporant element of SHL (and SHR, more commonly).

I hope that it's clear from the above example that when the SHL instruction is used with '1' as an argument it has the effect of doubling the value of a register, which is a more efficient operation than IMUL RAX, 2 for example.

What may not be clear though is that supplying a value other than '1' to SHL has the effect of multiplying by powers of two. For example, SHL RAX, 3 is directly equivalent to the following C pseudocode - int number = 2*2*2; rax = rax * number;. SHL RAX, 16 is int number = 1*2*2*2*2*2*2*2*2*2*2*2*2*2*2*2*2; rax = rax * number;. Test this out in a programmer's calendar and you'll see that this is correct.

Basically left shift is an absurdly efficient way of multiplying numbers by arbitrary powers of two!

SHR - Shift Right

It probably won't come as a surprise to learn that there's an equal but opposite instruction named SHR which is able to shift a number to the right by an arbitrary number of places. This has the effect of dividing numbers by powers of two! Let's see the code:


        mov rax, 128 ; 0b10000000
        shr rax, 1 ; 0b01000000, or 64
        shr rax, 1 ; 0b00100000 or 32
        shr rax, 4 ; 0b00000010 or 2
        shr rax, 1 ; 0b00000001 or 1
        mov rax, 5 ; 0b00000101 
        shr rax

Every time a number is shifted right, the least significant bit is lost (it can be thought of as 'falling off' of the right hand side). This looks like the following -


        mov rax, 5 ; 0b00000101
        shr rax, 1 ; 0b00000010, rax is now 2 because the least significant bit was shifted out.
        shr rax, 2 ; 0b00000000, rax is now zero because every bit was shifted out.

In summary, shifting right by one place has the effect of dividing any number by 2. Shifting to the right by any other value has the effect of dividing it by a power of two. Again, shifting right offers a very efficient way of dividing numbers in particular situations.

ROL and ROR (Rotate Left, Rotate Right)

I won't spend too long on these instructions as they're very similar to SHL and SHR in both functionality and instruction structure! The key difference, though, is that where SHL and SHR lose data off of the most and least significant bits respectively, the 'rotate' instructions preserve that data and simply rotate it to the opposite end of the bits. Let's see an example of ROL


        mov AX, 16385 ; AX is now 0b0100000000000001
        rol AX, 1     ; AX is now 0b1000000000000010 or -32,766 because of Two's Compliment
        rol AX, 1     ; AX is now 0b0000000000000101 or 5
        rol AX, 3     ; AX is now 0b0000000000101000 or 40

Notice how there is no loss of data with the rol instruction, the 'dropped bits' are simply teleported (nothing personnel kid) to the least significant bit position instead.

The ror instruction performs the same operation, except the dropped bits are teleported to the most significant bit position instead. Let's see an example:


        mov AX, 5 ; AX is now 0b0000000000000101
        ror AX, 1 ; AX is now 0b1000000000000010 or -32,766 because of Two's Compliment
        ror AX, 1 ; AX is now 0b0100000000000001 or 16,385
        ror AX, 3 ; AX is now 0b0010100000000000 or 10,240

Generally speaking we most commonly see the ROL and ROR instructions for cryptography when reverse engineering, they're not typically used for general purpose operations like SHL and SHR are.

Logical operators

There are 4 logical operators which I'd like to cover in this little section. AND, OR, XOR, NOT. These four instructions allow developers (and compilers) to perform logic on the binary bits within a byte. Let's jump into OR first because it's easy to visualize.

OR instruction

The OR instruction (commonly called inclusive OR) takes two pieces of data, a source and a destination. It iterates over every bit in both pieces of data and compares them together to see if the bit is set in either. If one of the pieces of data has the bit set (AKA it is 1) then the OR instruction will set that bit in the destination register. This is a logical disjunction in computer science.

For example consider the following two bytes -

0b10001000
0b01000100

If those two values were OR'd together then the result would be 0b11001100. It's almost like they've been merged together! An ASM example of this has been provided in the lesson's Docker container under /lesson/inclusiveOR.asm (compiled to /lesson/inclusiveORAsm). Open the compiled code up in GDB, set a breakpoint on the main method and step through the code until the OR instruction is about to be hit.

Observe that RAX is 0xd (0b00001101) and RBX is 0x2 (0b00000010). Step over the OR instruction using n and observe that RAX is now 0x0F or 0b00001111, because the OR instruction has essentially merged the two bitmasks together.

AND instruction

The AND instruction is similar to the OR instruction in functionality and instruction structure, but the difference here is that as it iterates over bits within bytes, if the source and the destination do not have the bit set then the destination will have that bit set to 0. This can be thought of as a logical conjunction in computer science topics!

For example consider the following two bytes -

0b10001001
0b11000101

The result of an AND operation here would be 0b10000001 because only the first and last bits were set to 1 in the same locations in both the source and the destination. An example has been provided under /lesson/logicalAnd.asm and /lesson/logicalAndAsm. Open the compiled binary in GDB, set a breakpoint on the main method and step through the code until the AND instruction is about to be hit.

Observe that AX is 45 (0b0000000000101101) and BX is 35 (0b0000000000100011) before the AND instruction, step over the and instruction with n and observe that RAX is now 33, which is 0b00100001. This has happened because only the 6th bit and the 1st bit were set in both registers when the AND instruction executed.

You can check the binary representation of any value in GDB by running, for example, p/t $rax. This can be helpful when debugging and reversing complex cryptography operations.

XOR instruction

XOR, or Exclusive OR, is a common operation in computer science. It is the opposite of an and instruction basically, in that it will only set a bit to 1 in the destination register if the bit is NOT already set in both the source and destination.. For example, consider the following two bytes:

Source	`0b10101001`
Destination	`0b11000101`
XOR result	`0b01101100`

After a XOR instruction the resulting byte would be 0b01101100, where the '1's are only set in positions where the source OR the destination already had a 1, but not where they BOTH had a 1 set.

One of the common uses for XOR instructions is to quickly clear a register's value back to zero! For example, xor rax, rax will XOR the RAX register with itself and the result is that RAX will become 0. Imagine that RAX is 85 (0b01010101), well 0b01010101^0b01010101 == 0b00000000 because the bits are all set in the same locations in the source and destination operands.

There's a fun quirk with XOR instructions that makes it commonly used for basic cryptography. Number1 XOR'd with Number2 makes Number3, right? Well, Number3 XOR'd with Number2 makes Number1 again! We can see this in /lesson/xorDemo.asm inside of the container. Running /lesson/xorDemoAsm in GDB, we can see the operations in real time

Before the XOR -


        pwndbg> p/t $ax
        $1 = 1010101010101010
        pwndbg> p/t $bx
        $2 = 0101010101010101
        pwndbg> n

After the first XOR -


        pwndbg> p/t $ax
        $3 = 1111111111111111
        pwndbg> n

After the second XOR -


        pwndbg> p/t $ax
        $3 = 1010101010101010

Observe that XORing RAX with RBX again put RAX back to it's original value! XOR is a strange operation, but it pops up very frequently during reverse engineering and it's worthwhile knowing what it does and why it does it.

NOT instruction

The not instruction is the simplest of all of the logical operators, it simply takes a register and flips every one of the register value's bits. If the register is 0x1 (0b00000001) then it will become 254 (0b11111110) after a NOT operation. Pretty straight forward. There's an example under /lesson/notInstruction.asm, compiled to /lesson/notInstructionAsm - open it in GDB, break on main like usual and run it.

Step the code until the NOT instruction is about to execute and look at the binary representation of AX using p/t $rax -


        pwndbg> p/t $rax
        $2 = 101101

Step over the NOT instruction and look at the binary representation of AX again -


        pwndbg> p/t $rax
        $3 = 1111111111010010

As expected, the bits got flipped.

Conclusion

This lesson got pretty far down into the theory weeds, I hope it was easy to follow along though and I hope that you learned something about binary numbers and binary maths!

#	128	64	32	16	8	4	2	1
6	0	0	0	0	0	1	1	0
25	0	0	0	1	1	0	0	1
251	1	1	1	1	1	0	1	1
255	1	1	1	1	1	1	1	1
119	0	1	1	1	0	1	1	1
120	0	1	1	1	1	0	0	0

#	128	64	32	16	8	4	2	1
6	0	0	0	0	0	1	1	0
25	0	0	0	1	1	0	0	1
251	1	1	1	1	1	0	1	1
255	1	1	1	1	1	1	1	1
119	0	1	1	1	0	1	1	1
120	0	1	1	1	1	0	0	0

Lesson 6 - binary mathematics and the logical operators.