WK 7: Caches Flashcards

1
Q

Approx how much faster is SRAM than DRAM? What is DRAM used for vs. SRAM?
Are DRAM / SRAM volatile or non-volatile?

A

Roughly 10x.

DRAM is typically used for main memory and frame buffers.

SRAM is used for caches.

Both are volatile.

2
Q

Explain the CPU-Memory Gap. What does this look like on a graph?

A

Memory speed is not increasing at the same rate as processor speed. Processors have always been faster than memory, and the gap keeps widening: on a graph of performance vs. year, the memory curve has flattened out while the CPU curve keeps climbing (see envelope).

3
Q

Explain temporal locality.

Explain Spatial locality

A

Temporal locality: recently referenced items are likely to be referenced again in the near future.

Spatial locality: items near those just referenced are likely to be referenced in the near future.

4
Q

What examples of data and instruction references can be made of a simple loop?

A

sum = 0;
for (i = 0; i < n; i++)
    sum += a[i];
return sum;

Data references: a[i] is accessed with stride 1 (spatial locality), and sum is referenced on every iteration (temporal locality). Instruction references: the loop instructions are fetched in sequence (spatial locality) and executed over and over (temporal locality).

5
Q

Does this have good locality wrt a?

int sum_array_rows(int a[M][N])
{
    int i, j, sum = 0;
    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}
A

Yes: it follows a through memory. C is row-major, and the column index varies fastest here, so the stride is 1. If the array were stored column-major (à la Fortran), this traversal would have a stride of N instead.

The same long stride appears if the loops are swapped (for j outside, for i inside); see the sketch below.
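
For contrast, a minimal sketch of the swapped-loop version (the function name sum_array_cols is mine): the inner loop now jumps N ints through memory on each step, so spatial locality is poor.

    int sum_array_cols(int a[M][N])
    {
        int i, j, sum = 0;
        for (j = 0; j < N; j++)      /* outer loop walks columns */
            for (i = 0; i < M; i++)  /* each inner step jumps N ints ahead */
                sum += a[i][j];
        return sum;
    }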

6
Q

Why don’t we make the entire memory a cache?

A

Too expensive. By using a hierarchy of caches, we can get performance such that, to the processor, the whole memory system looks like one big fast cache, even though it isn't.

7
Q

What is the analogy of how cache hierarchies look to each other?

A
registers are to cache
as 
cache is to memory
as 
memory is to disk.

Moving down the hierarchy (toward disk), each level gets larger, slower, and cheaper per byte; moving up (toward registers), each level gets smaller, faster, and more expensive per byte.

8
Q

What two questions do we ask wrt cache misses?

A

Which block do I kick out?
(replacement)

Where does the new block that comes in go?
(placement)

9
Q

What are 3 types of cache misses?

A

Cold
(first access to a block; nothing is in the cache yet)
Conflict
(the cache is large enough, but multiple blocks map to the same set and evict each other)
Capacity
(the working set of blocks is larger than the cache)

10
Q

What are 3 cache types and why would you want to use a particular one?

What are the differences of each one? What are the similarities?

Draw diagram of each type of cache.

A

Direct Mapped
(one tall column)
has many sets, but only one line per set (E = 1). Cheapest and fastest to check, but most prone to conflict misses.

Fully Associative
(one wide row)
has only one set, containing all E lines. No conflict misses, but checking every tag is expensive.

Set Associative
(in between the two others)
has S sets of E lines each; a compromise between cost and conflict misses.

All three are read the same way: find the candidate line(s), compare tags, check the valid bit.

11
Q

Explain S, E, B of cache organization

A

There are S sets in the cache.

Each set has E lines (each line holds one block).

Each block is B = 2^b bytes, where b is the number of address bits used to select a byte within the block, a.k.a. the block offset.
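
A worked example with assumed parameters (the cache geometry here is hypothetical): a 32 KB, 8-way set-associative cache with 64-byte blocks has B = 64 = 2^6 bytes (so b = 6) and E = 8 lines per set. It holds 32768 / 64 = 512 lines total, giving S = 512 / 8 = 64 = 2^6 sets (so s = 6). With 32-bit addresses, the tag is t = 32 - 6 - 6 = 20 bits.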

12
Q

Explain t, s, b of address

Draw what this looks like

A

t is the tag (high bits), s is the set index (middle bits), b is the block offset (low bits): [ tag (t bits) | set index (s bits) | block offset (b bits) ]. See white paper hand notes.
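
A minimal sketch in C of pulling the three fields out of an address, assuming the s = 6, b = 6 split from the worked example above (field widths and names are hypothetical):

    #include <stdint.h>

    #define B_BITS 6   /* b: block-offset bits (64-byte blocks) */
    #define S_BITS 6   /* s: set-index bits (64 sets) */

    /* Split an address into tag | set index | block offset. */
    void split_address(uint64_t addr,
                       uint64_t *tag, uint64_t *set, uint64_t *off)
    {
        *off = addr & ((1u << B_BITS) - 1);              /* low b bits */
        *set = (addr >> B_BITS) & ((1u << S_BITS) - 1);  /* next s bits */
        *tag = addr >> (B_BITS + S_BITS);                /* remaining t bits */
    }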

13
Q

What are steps of a cache read?

A
  1. Locate the set using the set-index bits.
  2. Check whether any line in the set has a matching tag.
  3. If yes and the line is valid, it is a hit.
  4. Locate the data in the block, starting at the block offset.

(A sketch of these steps in C follows.)
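
A minimal sketch of those four steps (the struct layout is hypothetical; tag, set, and off come from an address split like the one above):

    #include <stdbool.h>
    #include <stdint.h>

    #define E 8   /* lines per set (hypothetical 8-way cache) */

    struct line {
        bool     valid;
        uint64_t tag;
        uint8_t  data[64];   /* the block itself, B = 64 bytes */
    };

    struct set { struct line lines[E]; };

    /* Returns a pointer to the requested byte on a hit, NULL on a miss. */
    uint8_t *cache_read(struct set *sets, uint64_t tag, uint64_t set, uint64_t off)
    {
        struct set *s = &sets[set];               /* 1. locate set */
        for (int i = 0; i < E; i++) {
            struct line *ln = &s->lines[i];
            if (ln->valid && ln->tag == tag)      /* 2-3. valid line with matching tag: hit */
                return &ln->data[off];            /* 4. data starts at the offset */
        }
        return NULL;                              /* miss */
    }
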
14
Q

Why would I ever want to use a Fully Associative Cache?

A

For something very small, such as a TLB, since checking all the tags in parallel is expensive in hardware.

15
Q

How does a cache line relate to a cache block?

A

A cache line includes a valid bit, a tag showing where in memory the block came from, and the block itself; the block is just the data. A line is selected by the set-index bits of the address, and if the tag matches and the valid bit is set, the block offset locates the actual data desired.

16
Q

What drawback could I have with a set associative cache?

A

The cache could still have empty lines, yet I'm getting conflict misses because of restrictions on where incoming blocks can be placed.

17
Q

When would I expect cache thrashing? How would I fix?

A

Expect thrashing with large data structures whose sizes are powers of 2: blocks can be evicted and replaced on the same loop iteration in which they were loaded. Fix it by padding the structure by the size of a cache line. This is especially a factor in low-associativity caches.
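
A minimal sketch of the classic case (array sizes and cache geometry are hypothetical): if both arrays are power-of-2 sized and happen to map to the same sets of a small direct-mapped cache, x[i] and y[i] evict each other on every iteration, and padding one array by a cache line breaks the collision.

    /* 4 KB each: in a small direct-mapped cache, x[i] and y[i] can land
     * in the same set and thrash. */
    float dotprod(const float x[1024], const float y[1024])
    {
        float sum = 0.0f;
        for (int i = 0; i < 1024; i++)
            sum += x[i] * y[i];   /* each pair may evict the other's block */
        return sum;
    }

    /* One fix: pad x by one cache line (64 bytes = 16 floats) so the two
     * arrays no longer collide set-for-set. */
    float x[1024 + 16], y[1024];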

18
Q

What is a typical L1 cache miss rate? What about L2?

A

L1 : 3-10%

L2 : 1%

19
Q

What is a typical hit time for L1 and L2 caches? What additional time is required for a miss?

A

L1 hit: 1-2 cycles
L2 hit: 5-20 cycles
Miss to main memory: 50-200 cycles

20
Q

Why is 99% hits better than 97% hits?

A

Because the miss penalty can be on the order of 100x the hit time, even a few extra misses can double our time; see the worked numbers below.
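
Worked numbers (the hit time and miss penalty are assumed): with a 1-cycle hit and a 100-cycle miss penalty, average access time = hit time + miss rate × miss penalty. At 97% hits: 1 + 0.03 × 100 = 4 cycles. At 99% hits: 1 + 0.01 × 100 = 2 cycles. Cutting misses from 3% to 1% roughly doubles effective memory speed.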

21
Q

Explain Write-Hit Options and the pluses / minuses. Which are most common and with what types of caches?

A

Write-through
- write immediately through to memory. Simple, and memory is always up to date, but every write generates memory traffic.

Write-back
- defer the write to memory until the line is replaced (requires a dirty bit). Far fewer memory writes, but more complex to implement.

Write-back is most common, usually paired with write-allocate.

22
Q

Explain Write-Miss Option and the pluses / minuses.

Which are most common and with what types of caches?

A

Write-allocate
- load the block into the cache and update it there.
No-write-allocate
- write directly to memory without loading the block.

Write-allocate is more common; it makes sense to take advantage of locality if we expect more accesses to that data.

Write-allocate goes with write-back. I wouldn't mix them: if I'm saving up my writes via write-back, I don't want to add extra time with a write-through on a miss; bringing the block into the cache and letting it be written back with the rest saves time. If I'm doing write-through, no-write-allocate is the natural pairing, since the write goes to memory immediately either way and loading the block buys little.
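
A minimal sketch of the write-back + write-allocate pairing in C (the struct layout and the memory-transfer stubs are hypothetical):

    #include <stdbool.h>
    #include <stdint.h>

    struct wline {
        bool     valid, dirty;
        uint64_t tag;
        uint8_t  data[64];
    };

    /* Stubs standing in for the actual bus traffic. */
    static void write_block_to_memory(struct wline *ln) { (void)ln; }
    static void fetch_block_from_memory(struct wline *ln, uint64_t tag)
    {
        ln->valid = true;
        ln->dirty = false;
        ln->tag   = tag;
    }

    /* Write-back + write-allocate: a hit just marks the line dirty; a miss
     * first flushes a dirty victim, then fetches the block (allocate). */
    void cache_write(struct wline *ln, uint64_t tag, uint64_t off, uint8_t byte)
    {
        if (!(ln->valid && ln->tag == tag)) {   /* write miss */
            if (ln->valid && ln->dirty)
                write_block_to_memory(ln);      /* flush dirty victim */
            fetch_block_from_memory(ln, tag);   /* write-allocate */
        }
        ln->data[off] = byte;                   /* update in cache only */
        ln->dirty = true;                       /* memory write deferred */
    }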

23
Q

How does mem blocking work? What are the effects? How big do I want my block size to be?

A

Blocking keeps the working set of references in the cache and can improve performance by a large constant factor. Make the block size B as large as possible while the blocks in use still fit in the cache; for matrix multiply that means one B x B tile from each of the three matrices, so we need 3B^2 elements to fit in the cache.
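
A minimal sketch of blocked matrix multiply under that 3B^2 constraint (the function name, and the assumption that n is a multiple of bs, are mine):

    /* c += a * b, tiled so that one bs x bs block from each of the three
     * matrices (3 * bs^2 doubles) fits in the cache at once. */
    void bmm(int n, int bs, double a[n][n], double b[n][n], double c[n][n])
    {
        for (int ii = 0; ii < n; ii += bs)
            for (int jj = 0; jj < n; jj += bs)
                for (int kk = 0; kk < n; kk += bs)
                    /* multiply one tile; all three tiles stay cached */
                    for (int i = ii; i < ii + bs; i++)
                        for (int j = jj; j < jj + bs; j++) {
                            double sum = c[i][j];
                            for (int k = kk; k < kk + bs; k++)
                                sum += a[i][k] * b[k][j];
                            c[i][j] = sum;
                        }
    }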

24
Q

What do we want to do in loops to make sure we're taking advantage of locality?

A

Stride as little as possible; iterate in the smart order (rows before columns in C). Move repeated memory references out of the inner loop by using local accumulators; see the sketch below.
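
A minimal sketch of the accumulator trick (the names are mine): the slow version references b[i] in memory on every inner iteration; the fast one keeps a local accumulator, so the inner loop only touches a with stride 1.

    void row_sums_slow(int n, double a[n][n], double b[n])
    {
        for (int i = 0; i < n; i++) {
            b[i] = 0;
            for (int j = 0; j < n; j++)
                b[i] += a[i][j];    /* memory reference to b[i] every pass */
        }
    }

    void row_sums_fast(int n, double a[n][n], double b[n])
    {
        for (int i = 0; i < n; i++) {
            double acc = 0;         /* accumulator can live in a register */
            for (int j = 0; j < n; j++)
                acc += a[i][j];
            b[i] = acc;
        }
    }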

25
Q

How does a memory mountain get affected by temporal and spatial locality?

A

As my problem size goes up, my temporal locality goes down, and I hit a drop-off each time the working set gets bigger than a cache level. As my stride increases, whether through poor programming or intentionally as a test, my spatial locality goes down; this drives a fairly constant downward slope until every access is a miss, at which point it levels off. (See memory mountain note card.)

26
Q

If I read nothing else from Bryant Chapter 6 on memory, what should I remember about writing programs?

A

Look closely at inner loops, especially the stride of memory accesses; try to stride by 1 and reference memory sequentially, as it is stored, to take advantage of spatial locality.

Maximize temporal locality by using a data object as often as possible as soon as it has been read from memory.