Caches Flashcards

1
Q

The memory wall

A

processor speed improves faster than memory speed, so the gap between them keeps growing

2
Q

temporal locality

A

recently referenced data is likely to be referenced again soon. Reactive

3
Q

spatial locality

A

more likely to reference data near recently referenced data. Proactive

4
Q

Are temporal and spatial locality used for both data and instructions?

A

Yes

5
Q

How to find average memory access time

A

latency_avg = latency_hit + %_miss*latency_miss
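
A minimal Python sketch of this formula (the hit latency, miss rate, and miss penalty below are hypothetical, chosen only for illustration):

def avg_access_time(t_hit, miss_rate, t_miss):
    # latency_avg = latency_hit + %_miss * latency_miss
    return t_hit + miss_rate * t_miss

# hypothetical L1: 1 ns hit, 5% miss rate, 20 ns miss penalty
print(avg_access_time(1.0, 0.05, 20.0))  # -> 2.0 (ns)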

6
Q

Primary caches

A

split into instruction (I$) and data (D$) caches; on chip (with the CPU); made of SRAM (same circuit type as the CPU)

7
Q

2nd level caches

A

on chip (with the CPU), made of SRAM; unified (holds both instructions and data)

8
Q

How large are primary caches?

A

8KB to 64KB

9
Q

How large are second level caches?

A

typically 512KB to 16MB

10
Q

4th level cache = main memory

A

Made of DRAM

11
Q

How large is fourth level cache?

A

1GB to 4GB for desktops; servers can have much more

12
Q

5th level cache

A

disk/SSD (swap and files)

13
Q

How much of a processor's area is cache?

A

30-70%

14
Q

Static RAM (SRAM)

A

6 transistors per bit
optimized for speed first, density second
fast (sub-nanosecond latency for small SRAMs)
latency grows with capacity (roughly proportional to sqrt(capacity))
integrates well with standard processor logic

15
Q

Dynamic RAM (DRAM)

A

1 transistor + 1 capacitor per bit
optimized for density
slow (>40ns internal access, ~100ns pin-to-pin)
requires fabrication steps different from standard processor logic

16
Q

Nonvolatile storage

A

magnetic disk, flash, STT-RAM, ReRAM, PCM

17
Q

Cache Lookup Algorithm

A

Read frame indicated by index bits

“Hit” if tag matches and valid bit is set, otherwise miss
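
A minimal direct-mapped lookup sketch in Python (the block size, frame count, and frame structure are hypothetical, for illustration only):

# hypothetical geometry: 32B blocks, 128 frames
OFFSET_BITS = 5   # log2(32)
INDEX_BITS = 7    # log2(128)

def lookup(cache, addr):
    # cache is a list of 128 frames: {'valid': bool, 'tag': int, 'data': ...}
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    frame = cache[index]
    return frame['valid'] and frame['tag'] == tag   # True = hit, False = miss

cache = [{'valid': False, 'tag': 0, 'data': None} for _ in range(128)]
print(lookup(cache, 0x1234))  # -> False (cold cache)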

18
Q

Fill path also called what?

A

backside

19
Q

Cache controller

A

finite state machine - remembers miss address, accesses next level, waits for response, writes data and tag in proper locations
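
A minimal sketch of that state machine in Python (the state names and the next_level interface are hypothetical):

from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    WAIT = auto()   # waiting for the next level to respond

class CacheController:
    def __init__(self, next_level):
        self.state = State.IDLE
        self.miss_addr = None
        self.next_level = next_level

    def on_miss(self, addr):
        self.miss_addr = addr            # remember the miss address
        self.next_level.request(addr)    # access the next level
        self.state = State.WAIT

    def on_response(self, cache, index, block):
        cache[index] = block             # write data and tag into the proper frame
        self.state = State.IDLE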

20
Q

%miss (miss rate)

A

# misses / # accesses

21
Q

t_hit (hit time)

A

time to read data from (write data to) cache

22
Q

t_miss (miss penalty)

A

time to read data into cache

23
Q

Average access time: t_avg

A

t_hit + %miss * t_miss
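
For example, with hypothetical numbers t_hit = 1 ns, %miss = 10%, and t_miss = 10 ns:

t_avg = 1 + 0.10 * 10 = 2 ns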

24
Q

what roughly determines t_hit

A

cache capacity and circuits

25
Q

what roughly determines t_miss

A

lower level memory structures

26
Q

How to measure %_miss?

A

hardware performance counters, simulation, paper-and-pencil simulation

27
Q

how to find offset

A

log_2(block size)

28
Q

how to find index

A

log_2(number of sets)
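
A small sketch of how these fields carve up an address, assuming a hypothetical 64-byte block and 256-set cache:

import math

BLOCK_SIZE = 64   # bytes -> offset = log2(64) = 6 bits
NUM_SETS = 256    #       -> index  = log2(256) = 8 bits

OFFSET_BITS = int(math.log2(BLOCK_SIZE))
INDEX_BITS = int(math.log2(NUM_SETS))

def split_address(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

print(split_address(0xDEADBEEF))  # prints the (tag, index, offset) fields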

29
Q

How to reduce %miss?

A

increase capacity

increase block size

30
Q

What happens if you increase cache capacity?

A

reduce % miss, but t_hit increases

31
Q

What is t_hit latency proportional to?

A

sqrt(capacity)

32
Q

What are the advantages of increasing block size?

A

reduce %miss

reduce tag overhead

33
Q

What are the disadvantages of increasing block size?

A

potentially useless data transfer

premature replacement of useful data

34
Q

For same size cache, will increasing the block size increase or reduce the tag overhead?

A

Increasing the block size will reduce the tag overhead
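
A worked example with hypothetical parameters (32-bit addresses, 32KB direct-mapped cache, ignoring valid/dirty bits): index + offset together cover log_2(32KB) = 15 bits, so the tag is 32 - 15 = 17 bits per block at either block size. With 32B (256-bit) blocks that is 17/256 ≈ 6.6% overhead; doubling to 64B (512-bit) blocks halves it to 17/512 ≈ 3.3%.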

35
Q

Effects of block size on miss rate

A

spatial prefetching

interference

36
Q

Spatial prefetching

A

good; for accesses to adjacent addresses that now share a (larger) block, turns miss/miss into miss/hit

37
Q

Interference

A

bad; blocks with non-adjacent addresses can map to the same frame, turning hits into misses by disallowing simultaneous residence

38
Q

What offsets the time to read/transfer/fill a larger block?

A

critical word first/early restart

39
Q

Can critical word first/early restart help with a cluster of misses?

A

No. Reads/transfers/fills of two misses can’t happen simultaneously

40
Q

Name for a frame group

A

set

41
Q

Each frame in set

A

way

42
Q

Pros and cons of increasing set associativity

A

pro: reduces conflicts
con: increases t_hit (additional tag match and muxing)

43
Q

Lookup algorithm for multi-way set associative cache

A

Use index bit to find set
read data/tags in all frames in that set in parallel
if match and valid bit, hit
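
A minimal sketch of this lookup in Python (a hypothetical 2-way cache with 64B blocks and 128 sets; the loop models what hardware does in parallel):

OFFSET_BITS = 6   # log2(64)
INDEX_BITS = 7    # log2(128)

def lookup(sets, addr):
    # sets[i] is a list of ways, each {'valid': bool, 'tag': int, 'data': ...}
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return any(w['valid'] and w['tag'] == tag for w in sets[index])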

44
Q

NMRU and miss handling

A

Add MRU bits to each set; a hit updates the MRU, a miss replaces any way except the MRU
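
A sketch of NMRU in Python (the per-set structure and the choice of a random victim among non-MRU ways are hypothetical details):

import random

def access(cache_set, tag):
    # cache_set = {'ways': [{'valid': bool, 'tag': int}, ...], 'mru': int}
    for i, way in enumerate(cache_set['ways']):
        if way['valid'] and way['tag'] == tag:
            cache_set['mru'] = i          # hit updates MRU
            return True
    # miss: victim may be any way except the current MRU
    candidates = [i for i in range(len(cache_set['ways'])) if i != cache_set['mru']]
    victim = random.choice(candidates)
    cache_set['ways'][victim] = {'valid': True, 'tag': tag}
    cache_set['mru'] = victim             # the filled block becomes MRU
    return False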

45
Q

Data and tags can be split into two arrays and accessed in parallel. Why are multi-way set-associative caches still slower than direct-mapped caches?

A

Still more logic in the critical path than a direct-mapped cache (an additional multiplexer), so t_hit is higher

46
Q

Pros and cons of higher associative caches

A

Pro: have better (lower) % miss
Con: t_hit increases; the more associative, the slower

47
Q

Why are instruction caches smaller/simpler?

A

they don't have to handle writes (stores)

48
Q

Why are writes slower than reads?

A

For reads, tag and data can be read in parallel (and the data discarded on a mismatch); for writes, the tag must match before the data can safely be written

49
Q

Stages of write pipeline

A

1) match tag
2) write to matching way
bypass to avoid load stalls, may introduce structural hazards

50
Q

Two options for when to propagate new value to lower level memory

A

1) write through

2) write back

51
Q

Write Through

A

on hit, update cache

immediately send the write to the next level

52
Q

Write Back

A

write to lower level when block is replaced

requires an extra dirty bit per block

53
Q

Writeback buffer (WBB)

A

keeps writes off the critical path

1) send fill request to next level
2) while waiting, write dirty block to buffer
3) when new block arrives, put it into cache
4) write buffer sends contents to the next level
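
A rough Python sketch of that sequence (the block dictionaries and the next_level methods request_fill/wait_for_fill/write_back are hypothetical names):

def handle_miss(cache, wbb, next_level, index, addr):
    next_level.request_fill(addr)                  # 1) send fill request
    victim = cache[index]
    if victim['valid'] and victim['dirty']:
        wbb.append(dict(victim))                   # 2) park the dirty block in the WBB
    cache[index] = next_level.wait_for_fill(addr)  # 3) install the arriving block
    while wbb:                                     # 4) drain the WBB off the critical path
        next_level.write_back(wbb.pop(0))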

54
Q

Disadvantages of write through

A

requires additional bus bandwidth

without a write buffer, must wait for writes to complete to memory

55
Q

Advantages of write through

A

Easier to implement, no need for dirty bits in cache
Don’t have to deal with coherence traffic at this cache level
Simplifies miss handling (no write back buffer step)

56
Q

Advantage of Write back

A

Uses less bandwidth since some writes don’t go to memory (also saves power)

57
Q

Read vs Write Miss

A

Read miss: load can’t go on w/o data, must stall

Write miss: no instruction waiting for data, so don’t need to stall

58
Q

Store buffer

A

writes to D$ in background
eliminates stalls on write misses
loads must search store buffer in addition to D$

59
Q

Store vs. writeback buffer

A

store buffer: in front of D$, hides store misses

writeback buffer: behind D$, hides write backs

60
Q

Write allocate is used with what type of write (write back or write through)?

A

write back

61
Q

Write-allocate

A

when a write miss occurs, allocate a frame in the cache for the miss data

62
Q

Advantage of write allocate

A

decreases read misses

63
Q

No write allocate

A

when a write miss occurs, just write to next level, no need to allocate a cache frame for the miss data

64
Q

Pros/cons of no-write-allocate

A

potentially more read misses, but doesn’t use a frame in the cache

65
Q

4 types of cache miss

A

compulsory, capacity, conflict, coherence

66
Q

compulsory cache miss

A

never seen this address before, would miss in infinite cache

67
Q

capacity cache miss

A

miss caused because cache is too small (would miss in fully associative cache)

68
Q

conflict cache miss

A

miss caused because cache associativity is too low

69
Q

coherence cache miss

A

miss due to external invalidations in shared memory multiprocessors and multicores

70
Q

How does larger block size affect the 3 C's and latency?

A

decreases compulsory misses (spatial locality)
increases conflict misses (fewer frames)
can increase t_miss (reading more bytes from the next level)
no significant effect on t_hit

71
Q

How does a larger cache affect the 3 C's and latency?

A

decreases capacity miss

increases t_hit

72
Q

How does higher associativity affect the 3 C's and latency?

A

decreases conflict misses

increases t_hit

73
Q

local hit/miss rate

A

percent of accesses to this cache that hit
local miss rate = # misses / # accesses to this cache = 100% - local hit rate

74
Q

global hit/miss rate

A

# misses / total # of memory references issued by the program
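
A worked example with hypothetical numbers: the program issues 1000 memory references; the L1 misses 100 of them (local and global miss rate 10%), and the L2 misses 20 of those 100. The L2's local miss rate is 20/100 = 20%, but its global miss rate is 20/1000 = 2%.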

75
Q

inclusive caches

A

a block in the L1 is always also in the L2
works well with write-through L1s
coherence traffic only needs to check the L2

76
Q

exclusive caches

A

block is either in L1 or L2 (never both)
holds more data
coherence traffic must check both L1 and L2

77
Q

Give reads priority over writes

A

a read must check the WBB contents, since the WBB could hold the value being read
reduces write cost in a writeback cache: if a read miss will replace a dirty block, write the dirty block to the WBB, read memory, then write the WBB contents to memory