Basic Model of Locality Flashcards
In the memory hierarchy, what happens as we move from disk to processor?
The memory at each level of the hierarchy gets faster, but it also gets smaller
Describe the structure of the von Neumann model (its parts and their relationships)
The von Neumann model is composed of an infinite but slow memory, a small but fast memory, and a sequential processor, i.e., it's like disk, cache, and processor.
The fast memory has a maximum storage of Z “words”.
The processor can only operate on data stored in the small, fast memory.
Describe the rules of the von Neumann model
1) Local storage rule: The processor can't work with data unless it is stored in fast memory.
2) Block-transfer rule: Transfers of data between slow and fast memory occur in blocks of L words.
Block-transfer implies that when you want to move one word x into fast memory, you must move the entire L-word block it belongs to. Therefore the other data stored in its block affects efficiency (does the block contain what you'll need next, or do you need to bring in another block?). This is where "data alignment" matters.
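A minimal Python sketch of the block-transfer rule (the LRU cache model, variable names, and numbers are my own illustration, not part of the cards): it counts Q for a stream of word addresses, assuming an ideal fast memory of Z words organized into Z/L blocks. Touching words with stride L pays one transfer per word, while unit stride pays one transfer per L words.

    from collections import OrderedDict

    def count_transfers(addresses, Z, L):
        cache = OrderedDict()              # block id -> present, in LRU order
        num_blocks = Z // L                # fast memory holds Z/L blocks
        Q = 0
        for a in addresses:
            b = a // L                     # block that word a belongs to
            if b in cache:
                cache.move_to_end(b)       # hit: no transfer needed
            else:
                Q += 1                     # miss: load the whole L-word block
                cache[b] = True
                if len(cache) > num_blocks:
                    cache.popitem(last=False)   # evict least recently used block
        return Q

    n, Z, L = 1024, 64, 8
    print(count_transfers(range(n), Z, L))           # unit stride: n/L = 128 transfers
    print(count_transfers(range(0, n*L, L), Z, L))   # stride L: 1024 transfers, one per word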
What is data alignment?
Data is aligned when the words we want to load into fast memory together belong to the same block, so a single block transfer brings them all in at once.
What are the costs in the von Neumann model?
1) Work: W(n) = the # of operations
2) Transfers: Q(n; Z, L) = # of L-sized slow-fast transfers (loads and stores)
n = input size
Z = max # of words stored in fast memory
L = # of words in a transfer block
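As a worked example (my illustration, not from the cards): summing an n-word array that starts in slow memory costs W(n) = n - 1 additions and Q(n; Z, L) = ceil(n/L) block loads, since each block is read exactly once and the running sum stays in fast memory (Z only needs to hold one block plus the accumulator, so it doesn't appear).

    import math

    def sum_costs(n, L):
        W = n - 1                # work: n - 1 additions
        Q = math.ceil(n / L)     # transfers: each L-word block is loaded once
        return W, Q

    print(sum_costs(n=1_000_000, L=16))   # (999999, 62500)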
What are our algorithmic design goals for a hierarchical memory system?
1) Work optimality (of course!): W(n) = Theta(W*(n)), where W*(n) is the work of the optimal sequential algorithm. i.e., maintain the work of the best sequential algorithm.
2) High computational intensity: Maximize I(n; Z, L) = W(n) / (L * Q(n; Z, L)), i.e. ops per word transferred to fast memory, i.e. the data reuse of your algorithm.
Do not optimize intensity at the expense of work optimality!
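Continuing the array-sum example (an illustration of the definition, not from the cards): its intensity is I = W / (L * Q) ≈ 1 op per word, i.e., essentially no data reuse, so it will be memory-bound on most machines.

    def intensity(W, L, Q):
        return W / (L * Q)       # ops per word transferred

    W, Q, L = 999_999, 62_500, 16
    print(intensity(W, L, Q))    # ~1.0 op/word: streaming, no reuse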
What is W(n)?
Work as a function of input size
What is Q(n; Z, L)?
Q(n; Z, L) = # of L-sized slow-fast transfers (loads and stores)
What is Z?
Z = max # of words stored in fast memory
What is L?
L = # of words in a transfer block
Let tau be the time per operation. How much time does it take just to do computations (no transfers)?
T_comp = tau*W
Let alpha be the time to transfer one word. How much time does it take just to do transfers (no computations)?
T_mem = alpha * L * Q
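Plugging made-up machine parameters into the array-sum example (tau and alpha are purely illustrative, not from the cards):

    tau   = 1e-9              # seconds per operation (assumed)
    alpha = 1e-8              # seconds per word transferred (assumed)
    W, Q, L = 999_999, 62_500, 16

    T_comp = tau * W          # compute-only time: ~1.0e-3 s
    T_mem  = alpha * L * Q    # transfer-only time: ~1.0e-2 s
    print(T_comp, T_mem)      # memory time dominates here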
Assuming perfect overlap between the time to compute and time to transfer data, what is the minimum execution time for program?
T >= max{T_comp, T_mem} = max{tau * W, alpha * L * Q} = tau * W * max{1, (alpha/tau) / (W/(L*Q))}
- tau * W is the IDEAL running time (assumes data movement is free)
- the max term is the PENALTY you pay for moving data
- Note that W/(L*Q) = ops/word = I (intensity!)
- alpha/tau is "machine balance", which we can replace with B
So the penalty is max{1, B/I} and our lower bound is T >= tau * W * max{1, B/I}.
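A sketch of that lower bound with the same made-up numbers as above (tau and alpha remain assumptions for illustration):

    tau, alpha = 1e-9, 1e-8
    W, Q, L = 999_999, 62_500, 16

    I = W / (L * Q)                 # intensity, ops per word (~1.0)
    B = alpha / tau                 # machine balance, ops per word (10.0)
    penalty = max(1.0, B / I)       # ~10x penalty for data movement
    T_lower = tau * W * penalty     # >= ~1.0e-2 s, matching T_mem above
    print(I, B, penalty, T_lower)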
What is machine balance?
B = alpha/tau is "machine balance": the time to move one word divided by the time per operation. It's a ratio of parameters that depend only on the machine.
It tells you how many operations can execute in the time it takes to move a word of data.
Note that it has units of ops/word, just like intensity (I).
What is ideal running time, assuming data movement is free?
tau*W, i.e. time per operation * number of operations