8: Smashing the Stack For Fun and Profit Flashcards

1
Q

What does smash the stack mean?

A

Code (on many C implementations) that

  • corrupts the execution stack by writing past the end of an array declared auto in a routine
  • can cause the return from the routine to jump to a random address
2
Q

What is a buffer?

A

A buffer is simply a contiguous block of computer memory that holds multiple instances of the same data type.

  • mostly associated w/ character arrays
3
Q

When and where are static variables allocated?

A

Static variables are allocated at load time in the data segment

4
Q

When are dynamic variables allocated and where?

A

Dynamic variables are allocated at run time on the stack

5
Q

What does overflow mean?

A

To overflow is to flow, or fill, over the top, brims, or bounds. We will focus only on the overflow of dynamic buffers, also known as stack-based buffer overflows

6
Q

How are processes divided in memory?

A

Processes are divided into three regions: Text, Data, and Stack

7
Q

What are the properties of the text region of process memory?

A

The text region

  • is fixed by the program
  • includes code (instructions) and read-only data
  • corresponds to the text section of the executable file
  • is normally marked read-only; any attempt to write to it results in a segmentation violation
8
Q

What are the properties of the data region?

A

The data region

  • corresponds to the data-bss sections of the executable file
  • its size can be changed with the brk(2) system call
  • if the expansion of the bss data or the user stack exhausts available memory, the process is blocked and is rescheduled to run again with a larger memory space
  • new space is added between the data and stack segments
9
Q

What is a stack?

A
  • a stack is an abstract data type frequently used in CS
  • LIFO
  • has operation push that adds an element at the top of the stack
  • has operation pop that reduces the stack size by one by removing the last element at the top of the stack
10
Q

Why do we use a stack?

A
  • modern computers are designed with the needs of high-level languages in mind
  • procedures/functions structure programs in high-level languages
  • a procedure call alters the flow of control just as a jump does, but unlike a jump, when finished performing its task, a function returns control to the statement or instruction following the call
  • this high-level abstraction is implemented with the help of the stack
  • the stack is also used to dynamically allocate the local variables used in functions, to pass parameters to functions, and to return values from the function
11
Q

What are the properties of the stack region of process memory?

A
  • stack is a contiguous block of memory containing data
  • register called the stack pointer (SP) points to the top of the stack
  • bottom of the stack is a fixed address
  • its size is dynamically adjusted by the kernel at run time
  • CPU implements instructions to PUSH onto and POP off of the stack
  • stack consists of logical stack frames that are pushed when calling a function and popped when returning
  • depending on the implementation, the stack will either grow down (towards lower memory addresses) or up
12
Q

What does a stack frame contain?

A

A stack frame contains

  • the parameters to a function
  • its local variables
  • the data necessary to recover the previous stack frame
    • including the value of the instruction pointer at the time of the function call
13
Q

The stack grows down towards lower memory addresses in which processors?

A

Intel (x86), Motorola, SPARC and MIPS processors

14
Q

Is the stack pointer implementation dependent? If so, how?

A
  • yes, the SP is implementation dependent
  • it may point to the last address on the stack, or to the next free available address after the stack
15
Q

In x86, where does the SP point?

A

the last address on the stack (the top of the stack and the lowest numerical address)

16
Q

Where does the stack pointer point in ARM?

A

The top of the stack

17
Q

What direction does the stack grow in ARM?

A

It is selectable, default grows to lower memory like x86

18
Q

What is the FP?

A
  • The frame pointer, which points to a fixed location within a frame
  • some texts also refer to it as a local base pointer (LB)
19
Q

What is the problem with referencing local variables from the SP?

A
  • local variables could be referenced by giving their offsets from SP
  • however, as words are pushed onto and popped off the stack, these offsets change
20
Q

What does accessing a variable at a known distance from SP require in an Intel-based processor?

A

On some machines, such as Intel-based processors, accessing a variable at a known distance from SP requires multiple instructions.

21
Q

What is the BP (EBP) on Intel CPUs used for?

A
  • many compilers use a second register, FP, for referencing both local variables and parameters because their distances from FP do not change with PUSHes and POPs
  • On Intel CPUs, BP (EBP) is used for this purpose
  • On Motorola CPUs, any address register except A7 (the stack pointer) will do
22
Q

How are local variables referenced from the FP on x86 and Motorola CPUs?

A
  • because of the way the stack grows, actual parameters have positive offsets and local variables have negative offsets from FP
23
Q

What is the procedure prolog?

A

The first thing a procedure must do when called

  1. first save previous FP
    * so it can be restored at procedure exit
  2. then it copies SP into FP to create the new FP
  3. then advances SP to reserve space for the local variables
24
Q

What is the procedure epilog?

A
  • Upon procedure exit, the stack must be cleaned up again, something called the procedure epilog.
25
Q

How are procedure prolog and epilog handled in Intel and Motorola CPUs?

A

The Intel ENTER and LEAVE instructions and the Motorola LINK and UNLINK instructions have been provided to do most of the procedure prolog and epilog work efficiently.

26
Q

What does this program do to call function()?

A
27
Q

What causes buffer overflow?

A

A buffer overflow is the result of stuffing more data into a buffer than it can handle

28
Q

What is happening in this code?

A
  • the program causes a segmentation violation because strcpy() copies the contents of *str (large_string[]) into buffer[] until a null character is found in the string
  • buffer[] is much smaller than *str
  • buffer[] is 16 bytes and large_string[] is 256 bytes, so all 250 [240] bytes after buffer on the stack are overwritten
  • this includes SFP, RET, and even *str
  • large_string was filled with the character 'A', which has hex value 0x41
  • the return address is now 0x41414141
  • this is outside the process address space
  • thus, when the function returns and tries to read the next instruction from that address, a segmentation violation occurs
29
Q

How can we change this code so that it overwrites the return address?

A

What we have done is add 12 to buffer1[]’s address. This new address is where the return address is stored. We want to skip past the assignment to the printf call. How did we know to add 8 [should be 10] to the return address? We used a test value first (for example 1), compiled the program, and then started gdb:

30
Q

What we have done is add 12 to buffer1[]’s address. This new address is where the return address is stored. We want to skip past the assignment to the printf call. How did we know to add 8 [should be 10] to the return address?

A
  • use GDB
31
Q

So now that we know that we can modify the return address and the flow of execution, what program do we want to execute?

A

In most cases we’ll simply want the program to spawn a shell. From the shell we can then issue other commands as we wish.

32
Q

Now that we know that we can modify the return address and the flow of execution, how can we place arbitrary instruction into its address space?

A
  • the answer is to place the code (shellcode) that you are trying to execute in the buffer we are overflowing, and overwrite the return address so it points back into the buffer.
33
Q

What is the code to spawn a shell in C?

A
34
Q

How do we see the assembly of this code?

A
35
Q

The beginning of the shellcode does what?

A
36
Q

What does the line at 0x80000136 do?

A
37
Q

What does the code do at 0x800013d?

A
38
Q

What does the last line of code do?

A
39
Q

What does the last line of code do?

A
40
Q

What does the last line of code do?

A
41
Q

What do the last 3 lines of code do?

A
42
Q

What happens to control when the syscall execve() is called on an Intel-based Linux system?

A

Now execve(). Keep in mind we are using an Intel-based Linux system. The syscall details will change from OS to OS, and from CPU to CPU. Some will pass the arguments on the stack, others in registers. Some use a software interrupt to jump to kernel mode, others use a far call. Linux passes its arguments to the system call in registers, and uses a software interrupt to jump into kernel mode.

43
Q

execve() is called from main. What do these lines of code do?

A

The procedure prelude

44
Q

What does the code after the procedure prelude do?

A
45
Q

Generally, what needs to be done in the execve() system call to have the address point to shellcode without exit?

A

As we can see there is not much to the execve() system call. All we need to do is:

46
Q

But what if the execve() call fails for some reason?

A

The program will continue fetching instructions from the stack, which may contain random data! The program will most likely core dump. We want the program to exit cleanly if the execve syscall fails. To accomplish this we must then add an exit syscall after the execve syscall. What does the exit syscall look like?

47
Q

How does the assembly of this code work?

A
48
Q

Generally, what needs to be done to have the address point to shellcode with the exit code?

A
49
Q

Trying to put this together in assembly language, placing the string after the code, and remembering we will place the address of the string, and null word after the array, what does this look like?

A
50
Q

How do we get around the problem of not knowing where in the memory space of the program we are trying to exploit the code(and the string that follows it) will be placed?

A

The problem is that we don’t know where in the memory space of the program we are trying to exploit the code (and the string that follows it) will be placed. One way around it is to use a JMP and a CALL instruction. The JMP and CALL instructions can use IP-relative addressing, which means we can jump to an offset from the current IP without needing to know the exact address of where in memory we want to jump to. If we place a CALL instruction right before the “/bin/sh” string, and a JMP instruction to it, the string’s address will be pushed onto the stack as the return address when CALL is executed. All we need then is to copy the return address into a register.

51
Q

What is the assembly code to use jmp and call to find where the code will be placed?

A
52
Q

What is the assembly code to use jmp and call to find where the code will be placed accounting for offsets?

A
53
Q

How do we get around this problem?

A

To get around this restriction we must place the code we wish to execute in the stack or data segment, and transfer control to it. To do so we will place our code in a global array in the data segment. We first need a hex representation of the binary code. Let’s compile it first, and then use gdb to obtain it.

54
Q

What is the assembly of this code?

A
55
Q

What is the problem with this shellcode?

A
56
Q

What is the full, correct code to point a program to shellcode?

A
57
Q

[Exploit 1] Let’s try to pull all our pieces together. We have the shellcode. We know it must be part of the string which we’ll use to overflow the buffer. We know we must point the return address back into the buffer. This example will demonstrate these points. What does it do?

A

What we have done above is fill the array large_string[] with the address of buffer[], which is where our code will be. Then we copy our shellcode into the beginning of the large_string string. strcpy() will then copy large_string onto buffer without doing any bounds checking, and will overflow the return address, overwriting it with the address where our code is now located. Once we reach the end of main and it tries to return, it jumps to our code and execs a shell.

The problem we face when trying to overflow the buffer of another program is figuring out at what address the buffer (and thus our code) will be. The answer is that for every program the stack will start at the same address. Most programs do not push more than a few hundred or a few thousand bytes onto the stack at any one time. Therefore, by knowing where the stack starts, we can try to guess where the buffer we are trying to overflow will be.

58
Q

How do we know where the stack pointer is?

A

Here is a little program that will print its stack pointer:

59
Q

Let’s assume this is the program we are trying to overflow, vulnerable.c:

void main(int argc, char *argv[]) {
  char buffer[512];

  if (argc > 1)
    strcpy(buffer,argv[1]);
}

We can create a program that takes as a parameter a buffer size, and an offset from its own stack pointer (where we believe the buffer we want to overflow may live). We’ll put the overflow string in an environment variable so it is easy to manipulate.

What is wrong with this exploit?

A

We’d have to guess what the buffer and offset should be.

Trying to guess the offset even while knowing where the beginning of the stack lives is nearly impossible. We would need at best a hundred tries, and at worst a couple of thousand. The problem is we need to guess *exactly* where the address of our code will start. If we are off by one byte more or less we will just get a segmentation violation or an invalid instruction.

60
Q

How do we increase our chances of guessing the offset?

A

One way to increase our chances is to pad the front of our overflow buffer with NOP instructions. Almost all processors have a NOP instruction that performs a null operation. It is usually used to delay execution for purposes of timing. We will take advantage of it and fill half of our overflow buffer with them. We will place our shellcode at the center, and then follow it with the return addresses. If we are lucky and the return address points anywhere in the string of NOPs, they will just get executed until they reach our code. In the Intel architecture the NOP instruction is one byte long and it translates to 0x90 in machine code. Assuming the stack starts at address 0xFF, that S stands for shell code, and that N stands for a NOP instruction the new stack would look like this:

61
Q

What does the code to pad the front of our overflow buffer with NOP instructions look like?

A
62
Q

What is a good selection for a buffer size with NOPs?

A

A good selection for our buffer size is about 100 bytes more than the size of the buffer we are trying to overflow. This will place our code at the end of the buffer we are trying to overflow, giving a lot of space for the NOPs, but still overwriting the return address with the address we guessed. The buffer we are trying to overflow is 512 bytes long, so we’ll use 612.

63
Q

What is an example of using NOPs for buffer overflow on the Xt library?

A
64
Q

What do you do to cause a buffer overflow when:

There will be times when the buffer you are trying to overflow is so small that either the shellcode won’t fit into it, and it will overwrite the return address with instructions instead of the address of our code, or the number of NOPs you can pad the front of the string with is so small that the chances of guessing their address are minuscule.

A

What we will do is place our shellcode in an environment variable, and then overflow the buffer with the address of this variable in memory. This method also increases your chances of the exploit working, as you can make the environment variable holding the shellcode as large as you want. The environment variables are stored at the top of the stack when the program is started; any modifications by setenv() are then allocated elsewhere. The stack at the beginning then looks like this:

<strings><argv>NULL<envp>NULL<argc><argv><envp>

65
Q

What we will do is place our shellcode in an environment variable, and then overflow the buffer with the address of this variable in memory. This method also increases your chances of the exploit working, as you can make the environment variable holding the shellcode as large as you want. The environment variables are stored at the top of the stack when the program is started; any modifications by setenv() are then allocated elsewhere.

How is this accomplished in code?

A

Our new program will take an extra variable, the size of the variable containing the shellcode and NOPs. Our new exploit now looks like this:

Test it like this:

[aleph1]$ ./exploit4 768
Using address: 0xbffffdb0
[aleph1]$ ./vulnerable $RET
$

How does it work in xterm?

[aleph1]$ export DISPLAY=:0.0
[aleph1]$ ./exploit4 2148
Using address: 0xbffffdb0
[aleph1]$ /usr/X11R6/bin/xterm -fg $RET
Warning: Color name

…°¤ÿ¿°¤ÿ¿°¤ …

Warning: some arguments in previous message were lost
$

  • Experiment both with positive and negative offsets.
66
Q

Where can buffer overflow occur in C?

A
  • The standard C library provides a number of functions for copying or appending strings, that perform no boundary checking
    • strcat(), strcpy(), sprintf(), and vsprintf()
      • These functions operate on null-terminated strings, and do not check for overflow of the receiving string
    • gets() is a function that reads a line from stdin into a buffer until either a terminating newline or EOF
      • It performs no checks for buffer overflows
    • scanf() family of functions
      • can also be a problem if you are matching a sequence of non-white-space characters (%s), or matching a non-empty sequence of characters from a specified set (%[]), and the array pointed to by the char pointer is not large enough to accept the whole sequence of characters, and you have not defined the optional maximum field width
    • If the target of any of these functions is a buffer of static size, and its other argument was somehow derived from user input, there is a good possibility that you might be able to exploit a buffer overflow
  • while loop to read one character at a time into a buffer from stdin or some file until the end of line, end of file, or some other delimiter is reached
    • getc(), fgetc(), or getchar()
    • If there are no explicit checks for overflows in the while loop, such programs are easily exploited
67
Q

Why is grep(1) your friend?

A

The sources for free operating systems and their utilities are readily available. This fact becomes quite interesting once you realize that many commercial operating system utilities were derived from the same sources as the free ones. Use the source d00d.

68
Q

[Appendix A] What does shellcode for different OS/Architectures look like?

A