WK 7 : Caches Flashcards
Approx how much faster is SRAM than DRAM? What is DRAM used for vs. SRAM?
Are DRAM / SRAM volatile or non-volatile?
SRAM is roughly 10x faster.
DRAM is typically used for main memory and frame buffers.
SRAM is used for caches.
Both are volatile.
Explain the CPU-Memory Gap. What does this look like on a graph?
Memory speed is not increasing at the same rate as processor speed. Processors have always been faster than memory, and the gap keeps widening. On a graph, memory performance has flattened out while CPU performance keeps climbing (see envelope)
Explain temporal locality.
Explain Spatial locality
Recently referenced items are likely to be referenced again in the future.
Items near those recently referenced are likely to be accessed in the near future.
What examples of data and instruction references can be seen in a simple loop?
sum = 0;
for (i = 0; i < n; i++)
    sum += a[i];
Data: a[i] is accessed with stride 1 (spatial locality); sum is referenced every iteration (temporal locality). Instructions: the loop body executes repeatedly (temporal) and instructions are fetched in sequence (spatial).
Does this have good locality wrt a?
int sum_array_rows(int a[M][N]) {
    int i, j, sum = 0;
    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}
Yes. It follows a in memory: C is row-major and the column index varies fastest, so the stride is 1. If the array were column-major (a la FORTRAN), this loop would have a longer stride. The same would happen if the loops were swapped (for j outer, for i inner).
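For comparison, a hedged sketch of the poor-locality version (M = 4 and N = 8 are made-up example sizes):

```c
#define M 4   /* example row count */
#define N 8   /* example column count */

/* Same computation, but the outer loop walks columns, so successive
 * accesses to a are N elements (one full row) apart: stride N,
 * poor spatial locality in row-major C. */
int sum_array_cols(int a[M][N]) {
    int i, j, sum = 0;
    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            sum += a[i][j];
    return sum;
}
```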
Why don’t we make the entire memory a cache?
Too expensive. With a cache hierarchy, the processor gets the illusion of one big, fast memory, even though it isn't.
What is the analogy of how cache hierarchies look to each other?
registers are to cache as cache is to memory as memory is to disk.
Moving toward disk, each level gets larger, slower, and cheaper per byte; moving toward the registers, each level gets smaller, faster, and more expensive per byte.
What two questions do we ask wrt cache misses?
Which block do I kick out?
(replacement)
Where does the incoming block go?
(placement)
What are 3 types of cache misses?
Cold (compulsory)
(first access to the block; nothing is in the cache yet)
Conflict
(the cache is large enough, but too many blocks map to the same set)
Capacity
(the working set is larger than the cache)
What are 3 cache types and why would you want to use a particular one?
What are the differences of each one? What are the similarities?
Draw diagram of each type of cache.
Direct Mapped
(one big column)
has many sets (large S), but only E = 1 line per set.
Fully Associative
(one big row)
has only S = 1 set, but many lines (large E).
Set Associative
(in between the two others)
Explain S, E, B of cache organization
There are S sets in the cache.
Each set contains E lines.
Each block is B = 2^b bytes; b is the number of address bits used to index within the block, a.k.a. the block offset.
Explain t, s, b of address
Draw what this looks like
t is the tag, s is the set index, b is the block offset. Layout, MSB to LSB: [ tag (t bits) | set index (s bits) | block offset (b bits) ]. See white paper hand notes.
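The t/s/b split can be sketched with a few bit operations (a minimal sketch; s = 4 and b = 6 are made-up example values, i.e. 16 sets of 64-byte blocks):

```c
#include <stdint.h>

#define S_BITS 4   /* s: 2^4 = 16 sets (example value) */
#define B_BITS 6   /* b: 2^6 = 64-byte blocks (example value) */

/* Address layout, MSB to LSB: [ tag | set index | block offset ] */
uint64_t block_offset(uint64_t addr) { return addr & ((1ULL << B_BITS) - 1); }
uint64_t set_index(uint64_t addr)    { return (addr >> B_BITS) & ((1ULL << S_BITS) - 1); }
uint64_t tag(uint64_t addr)          { return addr >> (B_BITS + S_BITS); }
```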
What are steps of a cache read?
- Locate set
- Check whether any line in the set has a matching tag
- If yes and the line is valid, it's a hit
- Locate the data starting at the block offset
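The steps above can be sketched as a toy lookup within one set (hypothetical struct and names; hardware checks the E tags in parallel, but a linear scan shows the same logic):

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

#define E 4  /* lines per set (example value) */

struct line {
    bool     valid;
    uint64_t tag;
    /* the B-byte data block itself is omitted in this sketch */
};

/* Given the set already located from the address's set-index bits,
 * return the matching line on a hit, or NULL on a miss. */
struct line *cache_read(struct line set[E], uint64_t tag) {
    for (int i = 0; i < E; i++)                 /* check each line in the set */
        if (set[i].valid && set[i].tag == tag)  /* valid bit set + tag match = hit */
            return &set[i];
    return NULL;                                /* miss */
}
```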
Why would I ever want to use a Fully Associative Cache?
Only for something very small, for instance a TLB, because checking all the tags in parallel is expensive.
How does a cache line relate to a cache block?
A cache line includes a valid bit, a tag identifying where in memory the block came from, and the block itself; the block is just the data. A line is selected by the set-index bits of the address; if the tag matches and the valid bit is set, the block offset selects the desired data within the block.
What drawback could I have with a set associative cache?
The cache could still have empty lines yet suffer conflict misses, because each block can only go into one particular set.
When would I expect cache thrashing? How would I fix?
With large data structures whose sizes are powers of 2, blocks can be evicted and replaced in the same loop iteration in which they were loaded. Fix it by padding the structure by the size of a cache line. This is especially a factor in low-associativity caches.
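A common padding fix, sketched with made-up sizes (N = 1024 doubles per row, a 64-byte line): when a power-of-2 row length makes consecutive rows map to the same sets, pad each row by one cache line.

```c
#define N   1024  /* power-of-2 row length (example) */
#define PAD 8     /* one 64-byte cache line of doubles (example) */

/* In a_bad, rows are exactly N doubles apart, so a_bad[0][j],
 * a_bad[1][j], ... can all map to the same set and thrash on a
 * column walk. Padding each row by one line staggers the mappings. */
double a_bad[N][N];
double a_padded[N][N + PAD];

double sum_col(int j) {
    double s = 0;
    for (int i = 0; i < N; i++)
        s += a_padded[i][j];  /* same code; only the row stride changed */
    return s;
}
```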
What is a typical L1 cache miss rate? What about L2?
L1 : 3-10%
L2 : 1%
What is a typical hit time for L1 and L2 caches? What additional time is required on a miss?
1-2 cycles : L1
5-20 cycles : L2
50-200 cycles : Miss to MM
Why is 99% hits better than 97% hits?
Because the miss penalty can be ~100x the hit time, even a few extra misses can double our time. E.g., with a 1-cycle hit and a 100-cycle miss penalty: 97% hits gives 1 + 0.03 * 100 = 4 cycles on average, while 99% hits gives 1 + 0.01 * 100 = 2 cycles, twice as fast.
Explain Write-Hit Options and the pluses / minuses. Which are most common and with what types of caches?
Write-through
- write to the cache and directly through to memory (+ simple, memory always up to date; - every write pays memory latency and bus traffic)
Write-back
- defer the write to memory until the line is evicted, tracked with a dirty bit (+ fast, absorbs repeated writes to the same line; - more complex, memory is stale until eviction)
Write-back is most common, usually paired with write-allocate.
Explain Write-Miss Option and the pluses / minuses.
Which are most common and with what types of caches?
Write-allocate
- load the block into the cache and update it there (+ exploits locality on later accesses; - a write miss costs a block fetch)
No-write-allocate
- write directly to memory, bypassing the cache (+ no fetch on a write miss; - later accesses to that data will miss)
Write-allocate is more common; it makes sense to exploit locality if we expect more accesses to that data.
Write-allocate goes with write-back: if I'm deferring all my writes via write-back, I don't want to add extra time with a write-through on a miss; bringing the block into the cache and letting it get written back with the rest saves time. If I'm doing write-through, then no-write-allocate makes sense, since the write goes straight to memory now anyway.
How does mem blocking work? What are the effects? How big do I want my block size to be?
Blocking restructures loops so that a chunk of each data structure stays in the cache across many references, improving performance by a large constant factor. Make the block as large as possible while the working blocks still fit in the cache: for matrix multiply, the three B x B blocks (3B^2 elements) must fit, where B is the block dimension.
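A minimal blocking sketch for matrix multiply (assumed sizes: N = 32 matrices, block dimension B = 8 chosen so the three B x B blocks fit in a small cache, and B assumed to divide N):

```c
#define N 32  /* matrix dimension (example; assume B divides N) */
#define B 8   /* block dimension: want 3*B*B elements to fit in cache */

/* c += a * b, one B x B block at a time, so each block of a, b, and c
 * is reused many times while it is resident in the cache. */
void matmul_blocked(double a[N][N], double b[N][N], double c[N][N]) {
    for (int ii = 0; ii < N; ii += B)
        for (int jj = 0; jj < N; jj += B)
            for (int kk = 0; kk < N; kk += B)
                /* multiply block (ii,kk) of a by block (kk,jj) of b */
                for (int i = ii; i < ii + B; i++)
                    for (int j = jj; j < jj + B; j++) {
                        double sum = c[i][j];
                        for (int k = kk; k < kk + B; k++)
                            sum += a[i][k] * b[k][j];
                        c[i][j] = sum;
                    }
}
```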
What do we want to do in loops to make sure we're taking advantage of locality?
Stride as little as possible: iterate in the order the data is laid out (row-major in C, so rows outer, columns inner). Hoist repeated memory references out of the loop by using local accumulators.