x86 Disassembly Flashcards
What is the difference between a 32-bit and 64-bit machine?
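The main difference is the width of the registers and memory addresses. A 32-bit system uses 32-bit addresses, so a process can address at most 2^32 bytes, i.e., 4 GB. Techniques such as PAE let the operating system use more physical RAM, but each process still sees a 32-bit address space. A 64-bit system uses 64-bit registers and pointers, which in theory allows 2^64 bytes (2^32 times the 32-bit range, i.e., 4G x 4 GB, or 16 EB). In practice, current x86-64 processors implement only 48-bit (or, with 5-level paging, 57-bit) virtual addresses. A 64-bit x86 processor can still run 32-bit programs in compatibility mode, and the 32-bit registers such as EAX are the lower 32 bits of the 64-bit registers such as RAX.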
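The address-space sizes follow from a little arithmetic. This Python sketch only computes powers of two; the helper name address_space_bytes is made up for illustration.

```python
GIB = 2 ** 30  # gibibyte
EIB = 2 ** 60  # exbibyte


def address_space_bytes(bits: int) -> int:
    """Number of distinct byte addresses with an address of `bits` bits."""
    return 2 ** bits


for bits in (32, 48, 64):
    print(f"{bits}-bit addresses: {address_space_bytes(bits):,} bytes")

print(address_space_bytes(32) // GIB, "GiB")  # 4 GiB
print(address_space_bytes(64) // EIB, "EiB")  # 16 EiB

# The 64-bit space is the 32-bit space multiplied by itself (4G x 4G).
assert address_space_bytes(64) == address_space_bytes(32) ** 2
```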
Why do malware analysts usually work with the assembly produced by a disassembler instead of a higher-level programming language?
Higher-level languages are much further removed from machine code. Compilation throws away information such as variable names, types, and the original control structures, so a tool that reconstructs high-level source (a decompiler) has to guess, and its output can be wrong. Assembly, by contrast, maps almost one-to-one onto machine code, so a disassembler can translate the binary with high accuracy. In short, assembly is far more reliable, and analysts accept it despite its reduced readability.
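To see why disassembly is so mechanical, here is a toy decoder in Python that maps a handful of real 32-bit x86 encodings to mnemonics by table lookup. It is a hypothetical sketch that only knows these few byte patterns; a real disassembler also has to decode prefixes, ModRM/SIB bytes, displacements and immediates. Note that nothing in the bytes records variable names or types, which is exactly what a decompiler has to guess.

```python
# Toy table-driven decoder for a few real 32-bit x86 encodings.
KNOWN_ENCODINGS = {
    bytes([0x55]): "push ebp",
    bytes([0x89, 0xE5]): "mov ebp, esp",
    bytes([0x8B, 0xEC]): "mov ebp, esp",  # same instruction, alternate encoding
    bytes([0x5D]): "pop ebp",
    bytes([0xC3]): "ret",
    bytes([0x90]): "nop",
}


def toy_disassemble(code: bytes) -> list[str]:
    """Greedily match known encodings from the start of `code`."""
    out = []
    i = 0
    while i < len(code):
        for length in (2, 1):  # try the longer encodings first
            chunk = code[i:i + length]
            if len(chunk) == length and chunk in KNOWN_ENCODINGS:
                out.append(KNOWN_ENCODINGS[chunk])
                i += length
                break
        else:
            raise ValueError(f"unknown byte 0x{code[i]:02x} at offset {i}")
    return out


# A minimal function: prologue, epilogue, return.
print(toy_disassemble(bytes.fromhex("55 8B EC 5D C3")))
# ['push ebp', 'mov ebp, esp', 'pop ebp', 'ret']
```

The two table entries for mov ebp, esp show that one instruction can have more than one encoding; either way, the bytes decode back to the same assembly.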
How is the memory allocated to a process laid out?
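The memory allocated to a process has four main sections. The order below (highest addresses first) is one common picture; the exact layout varies between operating systems. On Linux, for example, the stack usually sits above the heap, near the top of the user address space.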
The data section is at the top (highest addresses). It holds global and static values, including read-only constants, that are put in place when the program is loaded.
Below the data we have the code itself, i.e., the instructions.
This is followed by the heap, which is used as a dynamic area where the process can allocate and free memory at will. As more data is allocated, the heap grows upward, toward the code section.
Finally, the stack stores local variables and function parameters. It also holds return addresses and therefore controls program flow, which makes it essential for malware analysts. In this picture the stack occupies the lowest addresses and grows downward (from high to low addresses), away from the heap.
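The growth directions can be modeled in a few lines of Python. All addresses here are hypothetical, chosen only to match the ordering above, and the heap is simplified to a bump allocator (real heaps also reuse freed blocks).

```python
# Hypothetical addresses matching the ordering above:
# stack lowest, then heap, code and data at the top.
STACK_TOP = 0x0013_0000  # initial ESP; the stack grows toward lower addresses
HEAP_BASE = 0x0014_0000  # heap blocks are handed out toward higher addresses
CODE_BASE = 0x0040_1000
DATA_BASE = 0x0040_8000


class ProcessModel:
    def __init__(self) -> None:
        self.esp = STACK_TOP        # current top of the stack
        self.heap_next = HEAP_BASE  # next free heap address

    def push_dword(self) -> None:
        """A 4-byte PUSH moves ESP down."""
        self.esp -= 4

    def heap_alloc(self, size: int) -> int:
        """Simplified bump allocation: returns a block and moves the heap end up."""
        block = self.heap_next
        self.heap_next += size
        return block


p = ProcessModel()
p.push_dword()
p.push_dword()
p.heap_alloc(64)
print(hex(p.esp), hex(p.heap_next))  # 0x12fff8 0x140040

# Stack and heap moved in opposite directions, away from each other.
assert p.esp < STACK_TOP < HEAP_BASE < p.heap_next < CODE_BASE < DATA_BASE
```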
How does x86 let 32-bit machines work with 16- and 8-bit values?
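On 32-bit x86, the general-purpose register names start with the letter E (EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP), which stands for "extended": they extend the 16-bit registers of earlier x86 processors. EAX holds 32 bits, but AX refers only to its least significant 16 bits. AX is further split into AH (A high), the upper 8 bits of AX, and AL (A low), the lower 8 bits. EBX, ECX, and EDX are divided the same way (BX/BH/BL and so on), while in 32-bit mode ESI, EDI, EBP, and ESP only have 16-bit parts (SI, DI, BP, SP). This lets an instruction read or write just part of a register without touching the rest.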
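The register views are just bit masks and shifts over the 32-bit value. This Python sketch (the function names are made up) extracts AX, AH and AL from an EAX value and shows that writing AL leaves the other 24 bits alone.

```python
def reg_views(eax: int) -> dict[str, int]:
    """Return the AX, AH and AL views of a 32-bit EAX value."""
    eax &= 0xFFFF_FFFF     # keep only 32 bits
    ax = eax & 0xFFFF      # low 16 bits of EAX
    ah = (ax >> 8) & 0xFF  # high byte of AX (bits 8-15 of EAX)
    al = ax & 0xFF         # low byte of AX (bits 0-7 of EAX)
    return {"EAX": eax, "AX": ax, "AH": ah, "AL": al}


def write_al(eax: int, value: int) -> int:
    """Writing AL replaces bits 0-7 and leaves the rest of EAX unchanged."""
    return (eax & 0xFFFF_FF00) | (value & 0xFF)


views = reg_views(0x12345678)
print({name: hex(v) for name, v in views.items()})
# {'EAX': '0x12345678', 'AX': '0x5678', 'AH': '0x56', 'AL': '0x78'}
print(hex(write_al(0x12345678, 0xFF)))  # 0x123456ff
```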
What endianness does x86 use?
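x86 is little endian: the least significant byte of a multi-byte value is stored at the lowest address. The exception malware analysts run into is network data, which uses big-endian "network byte order". Values such as IP addresses and port numbers therefore have to be byte-swapped (for example with htons or htonl) when moving between a register and a packet.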
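Here is the byte order of a 32-bit value in memory on x86 compared with network byte order, using only Python's built-in int.to_bytes and the standard ipaddress module. The IP address is an arbitrary example.

```python
import ipaddress

value = 0x12345678

little = value.to_bytes(4, "little")  # how x86 stores it in memory
big = value.to_bytes(4, "big")        # network byte order
print(little.hex(" "))  # 78 56 34 12
print(big.hex(" "))     # 12 34 56 78

# An IPv4 address as it appears in a packet (big endian) ...
packed_ip = ipaddress.IPv4Address("192.168.1.10").packed
# ... and the same 4 bytes loaded into a register on x86 (little endian).
as_x86_int = int.from_bytes(packed_ip, "little")
print(hex(as_x86_int))  # 0xa01a8c0
```

The last line is why an IP address read from a packet into a register looks reversed in a debugger: 192.168.1.10 shows up as 0x0A01A8C0.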
What is the difference between the LEA and MOV instructions?
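MOV copies data from a source (an immediate value, a register, or a memory location) to a destination (a register or a memory location); the two operands cannot both be memory locations. When the source is an address in brackets, e.g., mov eax, [ebx+esi*4], the CPU reads the value stored at that address into the destination register.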
LEA (load effective address) calculates the address written in the brackets and stores that address itself, without reading memory. lea eax, [ebx+esi*4] puts the value ebx+esi*4 in eax, whereas mov eax, [ebx+esi*4] puts the contents of that address in eax. LEA is therefore the natural way to compute pointers, such as the address of an array element or a local variable, and compilers also use it for plain arithmetic, e.g., lea eax, [ebx+ebx*2] to compute ebx*3.
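The difference can be simulated with a toy model where memory is a Python dict. The array contents and the base address 0x1000 are hypothetical, and ECX is used for the MOV so both results can be compared side by side.

```python
# Toy memory: a dword array at the hypothetical address 0x1000.
memory = {0x1000 + 4 * i: v for i, v in enumerate([10, 20, 30, 40])}
regs = {"ebx": 0x1000, "esi": 2}


def effective_address(base: str, index: str, scale: int) -> int:
    """Compute [base + index*scale], wrapped to 32 bits."""
    return (regs[base] + regs[index] * scale) & 0xFFFF_FFFF


def lea(dst: str, base: str, index: str, scale: int) -> None:
    regs[dst] = effective_address(base, index, scale)  # the address, no memory access


def mov_load(dst: str, base: str, index: str, scale: int) -> None:
    regs[dst] = memory[effective_address(base, index, scale)]  # reads memory


lea("eax", "ebx", "esi", 4)       # lea eax, [ebx+esi*4]
mov_load("ecx", "ebx", "esi", 4)  # mov ecx, [ebx+esi*4]
print(hex(regs["eax"]), regs["ecx"])  # 0x1008 30
```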
How are function calls handled?
Function calls rely on the stack. When a program calls a function, execution jumps to a different location in memory, so the caller's state must be saved and later restored. This bookkeeping is usually split into the prologue, which sets up the stack frame and saves registers at the start of the function, and the epilogue, which restores the stack and registers before returning.
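There are a number of calling conventions that specify how arguments, return values, and stack cleanup are handled. Some of the most common are: cdecl (C declaration), stdcall (standard call, used by the Windows API), fastcall (passes the first arguments in registers; Microsoft's compiler uses ECX and EDX), safecall, and thiscall (C++ member functions; Microsoft's compiler passes the this pointer in ECX).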
cdecl and stdcall are very similar: the caller pushes the arguments onto the stack from right to left, and the called function typically saves the caller's frame pointer (push ebp) and establishes a new stack frame (mov ebp, esp). Return values are placed in EAX, with EDX:EAX used for 64-bit values. The main difference is stack cleanup: in cdecl the caller removes the arguments from the stack after the call returns, while in stdcall the callee removes them before returning.
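The right-to-left push order is what puts the first argument closest to the return address. This Python sketch models a 32-bit stack as a dict (the addresses are hypothetical) and assumes the common compiler prologue push ebp; mov ebp, esp, which the calling convention itself does not mandate.

```python
class Stack:
    """Minimal 32-bit stack: ESP starts high and moves down on each push."""

    def __init__(self, esp: int) -> None:
        self.esp = esp
        self.mem: dict[int, int] = {}

    def push(self, value: int) -> None:
        self.esp -= 4               # make room ...
        self.mem[self.esp] = value  # ... then store at the new top


# Hypothetical values for illustration.
CALLER_EBP = 0x0012_FFA0
RETURN_ADDR = 0x0040_1234
s = Stack(esp=0x0012_FF80)

# Caller, for f(1, 2, 3): push 3; push 2; push 1; call f
for arg in reversed([1, 2, 3]):
    s.push(arg)
s.push(RETURN_ADDR)  # pushed by CALL

# Callee prologue: push ebp; mov ebp, esp
s.push(CALLER_EBP)
ebp = s.esp

# Saved EBP at [ebp], return address at [ebp+4], arguments from [ebp+8] up.
args = [s.mem[ebp + 8 + 4 * i] for i in range(3)]
print(hex(s.mem[ebp + 4]), args)  # 0x401234 [1, 2, 3]
```

This is why, when reading such functions, [ebp+8] is usually the first argument and [ebp+12] the second.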
What are the 4 main stack instructions?
PUSH decrements ESP by 4 bytes (for a 32-bit operand) and stores the value at the new top of the stack.
POP copies the value at the top of the stack (the address in ESP) into the specified register and then increments ESP by 4 bytes.
CALL transfers control to another function. It first pushes the return address (the address of the instruction following the CALL) so execution can resume there when the function returns, and then sets EIP (the instruction pointer, i.e., the program counter) to the address of the function.
RET returns control to the caller by popping the return address that CALL pushed into the EIP register.
The return sequence differs between cdecl and stdcall because of who cleans up the stack. In cdecl, where the caller cleans up, the callee ends with a plain RET, which only pops the return address; the caller then discards the arguments itself, typically with add esp, N. In stdcall, where the callee cleans up, the callee ends with RET N, which pops the return address into EIP and then adds N bytes to ESP, moving the stack pointer back to where it was before the arguments were pushed.
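The four instructions and the two cleanup styles can be modeled together. In this Python sketch (hypothetical addresses; only ESP, EIP and dword stack slots are modeled) a function taking two 4-byte arguments is called under each convention, and both end with EIP at the return address and ESP back where it started.

```python
class CPU:
    """Tiny model of ESP, EIP and dword stack slots (addresses are hypothetical)."""

    def __init__(self, esp: int) -> None:
        self.esp = esp
        self.eip = 0
        self.mem: dict[int, int] = {}

    def push(self, value: int) -> None:
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self) -> int:
        value = self.mem[self.esp]
        self.esp += 4
        return value

    def call(self, target: int, return_addr: int) -> None:
        self.push(return_addr)  # address of the instruction after CALL
        self.eip = target

    def ret(self, n: int = 0) -> None:
        self.eip = self.pop()  # RET pops the return address into EIP
        self.esp += n          # RET n (stdcall) also discards n bytes of arguments


def call_two_args(convention: str) -> tuple[int, int]:
    """Call f(1, 2) and return (EIP after return, change in ESP)."""
    cpu = CPU(esp=0x0012_FF80)
    start = cpu.esp
    for arg in reversed([1, 2]):  # arguments pushed right to left
        cpu.push(arg)
    cpu.call(target=0x0040_2000, return_addr=0x0040_1005)
    if convention == "stdcall":
        cpu.ret(8)    # callee: ret 8
    else:
        cpu.ret()     # callee: ret
        cpu.esp += 8  # caller: add esp, 8
    return cpu.eip, cpu.esp - start


for conv in ("cdecl", "stdcall"):
    eip, delta = call_two_args(conv)
    print(conv, hex(eip), delta)  # both print 0x401005 and 0
```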
How does branching work in x86?
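There are several ways to achieve conditional execution on x86 (conditional moves such as CMOVcc and SETcc among them), but the most common is the jump instruction, which makes execution continue at an address other than the next instruction. A jump can be unconditional (JMP) or conditional (JZ, JNE, JG, ...). A conditional jump inspects one or more flags in the EFLAGS register and jumps only if its condition holds; otherwise execution falls through to the next instruction.
The flags must therefore be set by an earlier instruction, most often TEST or CMP. TEST performs a bitwise AND of its operands and sets the flags from the result without storing it; test eax, eax is the usual way to check whether a value is zero. CMP subtracts the second operand from the first and sets the flags the same way SUB would, again discarding the result, so it supports equal, less-than, and greater-than checks (signed or unsigned), not just equality. Arithmetic and logic instructions such as ADD, SUB, and AND also update the flags.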
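This Python sketch models how CMP and TEST set the flags that conditional jumps read. Only ZF, SF, CF and OF are modeled (the real instructions also affect PF and AF), and the function names and the JUMPS table are illustrative, not a real API.

```python
MASK = 0xFFFF_FFFF  # 32-bit registers


def _sign(x: int) -> int:
    """Top bit of a 32-bit value (1 means negative when read as signed)."""
    return (x >> 31) & 1


def flags_after_cmp(a: int, b: int) -> dict[str, bool]:
    """Flags set by `cmp a, b`: computes a - b and discards the result."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    return {
        "ZF": result == 0,
        "SF": _sign(result) == 1,
        "CF": a < b,  # unsigned borrow
        "OF": _sign(a) != _sign(b) and _sign(result) != _sign(a),  # signed overflow
    }


def flags_after_test(a: int, b: int) -> dict[str, bool]:
    """Flags set by `test a, b`: computes a & b and discards the result."""
    result = a & b & MASK
    return {"ZF": result == 0, "SF": _sign(result) == 1, "CF": False, "OF": False}


# A few conditional jumps and the flag conditions they check.
JUMPS = {
    "je": lambda f: f["ZF"],
    "jne": lambda f: not f["ZF"],
    "jb": lambda f: f["CF"],  # unsigned below
    "jl": lambda f: f["SF"] != f["OF"],  # signed less
    "jg": lambda f: not f["ZF"] and f["SF"] == f["OF"],  # signed greater
}

print(JUMPS["jl"](flags_after_cmp(-1, 1)))  # True: -1 < 1 as signed values
print(JUMPS["jb"](flags_after_cmp(-1, 1)))  # False: 0xFFFFFFFF > 1 unsigned
print(JUMPS["je"](flags_after_test(0, 0)))  # True: like test eax, eax with eax = 0
```

Comparing -1 with 1 shows why the same CMP can lead to different branches: JL treats the operands as signed, JB as unsigned.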