Basic Model of Locality Flashcards

1
Q

In the memory hierarchy, what happens as we move from disk to processor?

A

The memory at each level of the hierarchy gets faster, but it also gets smaller

2
Q

Describe the structure of the von Neumann model (its parts and their relationships)

A

The von Neumann model is composed of an infinite but slow memory, a small but fast memory, and a sequential processor; i.e., it is analogous to disk, cache, and processor.

The fast memory has a maximum storage of Z “words”.

The processor can only operate on data stored in the small, fast memory.

3
Q

Describe the rules of the von Neumann model

A

1) Local storage rule: The processor can’t work with data unless it is stored in fast memory.
2) Block-transfer rule: Transfers of data between slow and fast memory occur in blocks of L words.

The block-transfer rule implies that when you want to move one word x into fast memory, you must move the entire block it belongs to. Therefore the other data stored in its block affects efficiency (does the block contain what you’ll need next, or do you need to bring in another block?). This concern is called “data alignment” (see the sketch below).
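A minimal sketch of the alignment effect, assuming word-addressed memory with blocks that start at multiples of L (all numbers here are purely illustrative):

# Count how many L-word blocks a run of m consecutive words touches,
# starting at word address `start` (blocks begin at multiples of L).
def blocks_touched(start, m, L):
    first_block = start // L
    last_block = (start + m - 1) // L
    return last_block - first_block + 1

L = 8
print(blocks_touched(0, 8, L))   # aligned: the 8 words fit in 1 block -> 1 transfer
print(blocks_touched(5, 8, L))   # misaligned: the same 8 words straddle 2 blocks -> 2 transfers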

4
Q

What is data alignment?

A

When data that we want to use together is laid out in the same block, so it can all be loaded into fast memory in a single transfer (instead of requiring extra block transfers).

5
Q

What are the costs in the von Neumann model?

A

1) Work: W(n) = the # of operations
2) Transfers: Q(n; Z, L) = # of L-sized slow-fast transfers (loads and stores)

n = input size
Z = max # of words stored in fast memory
L = # of words in a transfer block
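As a hypothetical worked example (assuming each input word is streamed through fast memory and loaded exactly once), summing an array of n words costs W(n) = n - 1 additions and Q(n; Z, L) = ceil(n / L) block transfers:

import math

# Hypothetical example: costs of summing n words streamed through fast memory once.
def sum_costs(n, L):
    W = n - 1              # work: one addition per additional element
    Q = math.ceil(n / L)   # transfers: one per L-word block loaded
    return W, Q

print(sum_costs(n=1_000_000, L=8))   # -> (999999, 125000)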

6
Q

What are our algorithmic design goals for a hierarchical memory system?

A

1) Work optimality (of course!): W(n) = Theta(W_star(n)), where W_star(n) is the work of the optimal sequential algorithm. i.e., match the work of the best sequential algorithm.

2) High computational intensity: Maximize I(n; Z, L) = W(n) / (L * Q(n; Z, L)), i.e., ops per word transferred to fast memory, i.e., the data reuse of your algorithm (see the worked numbers below).

Do not optimize intensity at the expense of work optimality!
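Continuing the hypothetical summing example from above, its intensity works out to roughly 1 op per word, i.e., essentially no data reuse:

import math

# Intensity I = W / (L * Q) for the summing sketch above (illustrative numbers).
def intensity(W, Q, L):
    return W / (L * Q)

n, L = 1_000_000, 8
W = n - 1
Q = math.ceil(n / L)
print(intensity(W, Q, L))   # ~1.0 op per word transferred -> essentially no reuse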

7
Q

What is W(n)?

A

Work as a function of input size

8
Q

What is Q(n; Z, L)?

A

Q(n; Z, L) = # of L-sized slow-fast transfers (loads and stores)

9
Q

What is Z?

A

Z = max # of words stored in fast memory

10
Q

What is L?

A

L = # of words in a transfer block

11
Q

Let tau be the time per operation. How much time does it take just to do computations (no transfers)?

A

T_comp = tau*W

12
Q

Let alpha be the time to transfer one word. How much time does it take just to do transfers (no computations)?

A

T_mem = alpha * L * Q

13
Q

Assuming perfect overlap between the time to compute and time to transfer data, what is the minimum execution time for program?

A

T >= max{T_comp, T_mem} = max{tau * W, alpha * L * Q} = tau * W * max{1, (alpha/tau) / (W/(L*Q))}

  • tau * W is the IDEAL running time (assumes data movement is free)
  • the max term is the PENALTY you pay for moving data
  • Note that W/(L * Q) = ops/word = I (intensity!)
  • alpha/tau is “machine balance”; we can replace it with B

So the penalty is max{1, B/I} and our lower bound is T >= tau * W * max{1, B/I}.
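A minimal sketch of this bound with made-up machine parameters (tau and alpha below are purely illustrative; W and Q reuse the earlier summing example):

# Lower bound T >= tau * W * max(1, B / I), using assumed (illustrative) parameters.
tau   = 1e-9      # seconds per operation (assumed)
alpha = 8e-9      # seconds per word transferred (assumed)
L     = 8         # words per block
W     = 999_999   # operations (summing example)
Q     = 125_000   # block transfers (summing example)

B = alpha / tau          # machine balance: 8 ops per word
I = W / (L * Q)          # intensity: ~1 op per word
T_lower = tau * W * max(1.0, B / I)
print(B, I, T_lower)     # B/I ~ 8, so the bound is ~8x the ideal time tau * W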

14
Q

What is machine balance?

A

B = alpha/tau is “machine balance”, the time per word divided by time per operation. It’s a ratio of parameters that depend on just the machine.

It tells you how many operations can execute in the time it takes to move a word of data.

Note that it has units of ops/word, just like intensity (I).

15
Q

What is ideal running time, assuming data movement is free?

A

tau*W, i.e. time per operation * number of operations

16
Q

What is tau?

A

Time per operation

17
Q

What is alpha?

A

Time per word (transfers)

18
Q

What is the maximum execution time (assuming no overlap between data movement and computation)?

A

tau * W * (1 + B / I), i.e., the sum of computation time and data movement time rather than the max.
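With the same illustrative numbers as the earlier sketch (tau = 1e-9 s/op, alpha = 8e-9 s/word, W ≈ 10^6, L * Q = 10^6, so B/I ≈ 8), this works out to tau * W + alpha * L * Q ≈ 0.001 s + 0.008 s = 0.009 s, i.e., about 9x the ideal time tau * W, versus about 8x for the overlapped lower bound.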

19
Q

What is normalized performance?

A

R := (tau * W_star) / T <= (W_star / W) * min{1, I / B}

This performance measure is inversely proportional to time, so higher values are better
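For instance, with the illustrative numbers from the earlier sketches and a work-optimal algorithm (W = W_star), I/B ≈ 1/8, so R <= min{1, 1/8} = 1/8: the algorithm can reach at most about 12.5% of the machine's ideal performance because it is memory-bound.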

20
Q

What is intensity?

A

I(n; Z, L) = W(n) / (L * Q(n; Z, L)), i.e. ops per word transferred to fast memory, i.e. data reuse of your algorithm.

21
Q

How do you exploit memory hierarchy in algorithm design?

A

Organize data accesses so that they maximize data reuse

22
Q

What do you do to ensure your algorithm scales well to future memory hierarchies?

A

Design for intensity that equals or exceeds machine balance (I >= B). Machine balance B = alpha/tau tends to grow over time because processor speed improves faster than memory speed, so an algorithm whose intensity keeps up with (or exceeds) B stays compute-bound rather than memory-bound on future machines.