Rockin'. Rockin' AND
ROL
lin'. Down to the beach I'm
SHR
trollin'🎶....
Two basic docker commands are required to follow along with this lesson -
docker pull learnreverseengineering/lesson6
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined learnreverseengineering/lesson6 bash
If you're working through this course on Windows then the windows calculator in programmer mode will make this lesson a little easier to visualize.
So in the last lesson we learned how to perform arithmetic operations using instructions like
ADD, SUB, MUL, IDIV
etc. This is often the easiest and most straightforward way to perform
mathematical operations, but there are occasions where a high level language compiler will remove one of these
instructions for an ultra-efficient instruction which performs binary maths instead.
In this lesson we'll learn about binary numbers, some efficient ways that they can be manipulated to perform mathematical operations and we'll learn about something called logical operators, which will come up quite frequently during your reverse engineering adventures.
Imagine you have a byte value, 0x06. This single byte is made up of 8 bits. In the case of the byte value 0x06 the bits look as follows -
0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
This is how bits work, they are either a 1 (indicating that the bit is set) or a 0 indicating
that the bit is unset. So how does 0x06 become 0b00000110
?
The binary number counting scheme is very simple, elegant, efficient and powerful. Essentially, the rightmost bit (which we call the least significant bit) can be thought of as the number one. The following bit represents the number two, the following bit is four, the following is eight... sixteen, thirty two, sixty four, one hundred and twenty eight.. In this way, any possible value can be represented by setting individual binary bits! In case you're wondering, we call the leftmost bit the most significant bit
So looking at the original example above of 0x6, we can see that 'bit 2' is set (the second bit position) and so
is 'bit 4' (the third bit position). These bits both being set indicate that the byte's value is 6, because
2+4=6
!
Here is a series of numbers demonstrating how any value can be made using binary -
# | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
---|---|---|---|---|---|---|---|---|
6 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
25 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 |
251 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
255 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
119 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
120 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
All of the examples above represent 1 byte, and are only able to create a value up to 255 (0xFF in hex,
0b11111111
in binary). The registers in a 64 bit CPU (RAX, RBX, RCX etc.) are all 8 bytes wide,
which means that in the case where RAX is (for example) 0x1 then the binary representation of the register would
be 0b0000000000000000000000000000000000000000000000000000000000000001
. If RAX was
9,223,372,036,854,775,807 (0x7fffffffffffffff) then the binary representation would be
0b0111111111111111111111111111111111111111111111111111111111111111
.
ASM contains four instructions (which we care about) which are able to quickly multiply and divide numbers if a certain set of conditions are met.
SHL
SHR
ROR
ROL
This instruction allows an an application to take a binary number and shift every bit X places to the left, which has the effect of adding a 0 to the least significant bit position and shifting a bit off of the end at the most significant bit (the leftmost bit). Let's see some code -
mov rax, 0x1 ; rax is now 0b00000001
shl rax, 1 ; rax is now 0b00000010 or 2
shl rax, 1 ; rax is now 0b00000100 or 4
shl rax, 1 ; rax is now 0b00001000 or 8
shl rax, 3 ; rax is now 0b01000000 or 64
shl rax, 2 ; rax is now 0b0000000100000000 or 256
mov rax, 0x7 ; rax is now 0b00000111 or 7
shl rax, 2 ; rax is now 0b00011100 or 28
If we were to execute, for example, mov rax, 1
followed by shl rax, 62
then RAX would
become 0b01000000000000000000000000000000
. If we shifted it left again with shl rax, 1
then RAX would become negative because of two's
compliment. If we shifted it left one more time then RAX would become 0
(0b00000000000000000000000000000000
) because the set bit was shifted off of the end and
lost forever. This is an imporant element of SHL (and SHR, more commonly).
I hope that it's clear from the above example that when the SHL
instruction is used with '1' as an
argument it has the effect of doubling the value of a register, which is a more efficient operation than
IMUL RAX, 2
for example.
What may not be clear though is that supplying a value other than '1' to SHL has the effect of multiplying by
powers of two. For example, SHL RAX, 3
is directly equivalent to the following C
pseudocode - int number = 2*2*2; rax = rax * number;
. SHL RAX, 16
is
int number = 1*2*2*2*2*2*2*2*2*2*2*2*2*2*2*2*2; rax = rax * number;
. Test this out in a
programmer's calendar and you'll see that this is correct.
Basically left shift is an absurdly efficient way of multiplying numbers by arbitrary powers of two!
It probably won't come as a surprise to learn that there's an equal but opposite instruction named
SHR
which is able to shift a number to the right by an arbitrary number of places. This has the
effect of dividing numbers by powers of two! Let's see the code:
mov rax, 128 ; 0b10000000
shr rax, 1 ; 0b01000000, or 64
shr rax, 1 ; 0b00100000 or 32
shr rax, 4 ; 0b00000010 or 2
shr rax, 1 ; 0b00000001 or 1
mov rax, 5 ; 0b00000101
shr rax
Every time a number is shifted right, the least significant bit is lost (it can be thought of as 'falling off' of the right hand side). This looks like the following -
mov rax, 5 ; 0b00000101
shr rax, 1 ; 0b00000010, rax is now 2 because the least significant bit was shifted out.
shr rax, 2 ; 0b00000000, rax is now zero because every bit was shifted out.
In summary, shifting right by one place has the effect of dividing any number by 2. Shifting to the right by any other value has the effect of dividing it by a power of two. Again, shifting right offers a very efficient way of dividing numbers in particular situations.
I won't spend too long on these instructions as they're very similar to SHL and SHR
in both
functionality and instruction structure! The key difference, though, is that where SHL and SHR
lose
data off of the most and least significant bits respectively, the 'rotate' instructions preserve that data and
simply rotate it to the opposite end of the bits. Let's see an example of ROL
mov AX, 16385 ; AX is now 0b0100000000000001
rol AX, 1 ; AX is now 0b1000000000000010 or -32,766 because of Two's Compliment
rol AX, 1 ; AX is now 0b0000000000000101 or 5
rol AX, 3 ; AX is now 0b0000000000101000 or 40
Notice how there is no loss of data with the rol instruction, the 'dropped bits' are simply teleported (nothing personnel kid) to the least significant bit position instead.
The ror
instruction performs the same operation, except the dropped bits are teleported to the most
significant bit position instead. Let's see an example:
mov AX, 5 ; AX is now 0b0000000000000101
ror AX, 1 ; AX is now 0b1000000000000010 or -32,766 because of Two's Compliment
ror AX, 1 ; AX is now 0b0100000000000001 or 16,385
ror AX, 3 ; AX is now 0b0010100000000000 or 10,240
Generally speaking we most commonly see the ROL and ROR instructions for cryptography when reverse engineering, they're not typically used for general purpose operations like SHL and SHR are.
There are 4 logical operators which I'd like to cover in this little section. AND, OR, XOR, NOT
.
These four instructions allow developers (and compilers) to perform logic on the binary bits within a byte.
Let's jump into OR
first because it's easy to visualize.
The OR instruction (commonly called inclusive OR) takes two pieces of data, a source and a destination. It iterates over every bit in both pieces of data and compares them together to see if the bit is set in either. If one of the pieces of data has the bit set (AKA it is 1) then the OR instruction will set that bit in the destination register. This is a logical disjunction in computer science.
For example consider the following two bytes -
If those two values were OR'd together then the result would be 0b11001100
. It's almost like they've
been merged together! An ASM example of this has been provided in the lesson's Docker container under
/lesson/inclusiveOR.asm (compiled to /lesson/inclusiveORAsm). Open the
compiled code up in GDB, set a breakpoint on the main method and step through the code until the OR instruction
is about to be hit.
Observe that RAX is 0xd (0b00001101
) and RBX is 0x2 (0b00000010
). Step over the OR
instruction using n and observe that RAX is now 0x0F or 0b00001111
, because the OR
instruction has essentially merged the two bitmasks together.
The AND instruction is similar to the OR instruction in functionality and instruction structure, but the difference here is that as it iterates over bits within bytes, if the source and the destination do not have the bit set then the destination will have that bit set to 0. This can be thought of as a logical conjunction in computer science topics!
For example consider the following two bytes -
0b10001001
0b11000101
The result of an AND operation here would be 0b10000001
because only the first and last bits were
set to 1 in the same locations in both the source and the destination. An example has been provided under
/lesson/logicalAnd.asm and /lesson/logicalAndAsm. Open the compiled binary in GDB, set a
breakpoint on the main method and step through the code until the AND instruction is about to be hit.
Observe that AX is 45 (0b0000000000101101
) and BX is 35 (0b0000000000100011
) before the
AND instruction, step over the and instruction with n and observe that RAX is now 33, which is
0b00100001
. This has happened because only the 6th bit and the 1st bit were set in both registers
when the AND instruction executed.
You can check the binary representation of any value in GDB by running, for example, p/t $rax. This can be helpful when debugging and reversing complex cryptography operations.
XOR, or Exclusive OR, is a common operation in computer science. It is the opposite of an and
instruction basically, in that it will only set a bit to 1 in the destination register if the bit is NOT already
set in both the source and destination.. For example, consider the following two bytes:
Source | 0b10101001 |
Destination | 0b11000101 |
XOR result | 0b01101100 |
After a XOR instruction the resulting byte would be 0b01101100
, where the '1's are only set in
positions where the source OR the destination already had a 1, but not where they BOTH had a 1 set.
One of the common uses for XOR instructions is to quickly clear a register's value back to zero! For example,
xor rax, rax
will XOR the RAX register with itself and the result is that RAX will become 0.
Imagine that RAX is 85 (0b01010101
), well 0b01010101^0b01010101 == 0b00000000
because
the bits are all set in the same locations in the source and destination operands.
There's a fun quirk with XOR instructions that makes it commonly used for basic cryptography. Number1 XOR'd with Number2 makes Number3, right? Well, Number3 XOR'd with Number2 makes Number1 again! We can see this in /lesson/xorDemo.asm inside of the container. Running /lesson/xorDemoAsm in GDB, we can see the operations in real time
Before the XOR -
pwndbg> p/t $ax
$1 = 1010101010101010
pwndbg> p/t $bx
$2 = 0101010101010101
pwndbg> n
After the first XOR -
pwndbg> p/t $ax
$3 = 1111111111111111
pwndbg> n
After the second XOR -
pwndbg> p/t $ax
$3 = 1010101010101010
Observe that XORing RAX with RBX again put RAX back to it's original value! XOR is a strange operation, but it pops up very frequently during reverse engineering and it's worthwhile knowing what it does and why it does it.
The not
instruction is the simplest of all of the logical operators, it simply takes a register and
flips every one of the register value's bits. If the register is 0x1 (0b00000001
) then it will
become 254 (0b11111110
) after a NOT operation. Pretty straight forward. There's an example under
/lesson/notInstruction.asm, compiled to /lesson/notInstructionAsm
- open it in
GDB, break on main
like usual and run it.
Step the code until the NOT instruction is about to execute and look at the binary representation of AX using p/t $rax -
pwndbg> p/t $rax
$2 = 101101
Step over the NOT instruction and look at the binary representation of AX again -
pwndbg> p/t $rax
$3 = 1111111111010010
As expected, the bits got flipped.
This lesson got pretty far down into the theory weeds, I hope it was easy to follow along though and I hope that you learned something about binary numbers and binary maths!