Caches Flashcards
The memory wall
processor speed is improving at a faster rate than memory speed
temporal locality
recently referenced data likely to be referenced again soon. Reactive
spatial locality
data near recently referenced data is likely to be referenced soon. Proactive
Are temporal and spatial locality used for both data and instructions?
Yes
How to find average memory access time
latency_avg = latency_hit + %_miss*latency_miss
Primary caches
split into instruction (I$) and data (D$) caches. On chip (with CPU), made of SRAM (same circuit type as CPU)
2nd level caches
on chip (with CPU), made of SRAM; unified (holds both I and D)
How large are primary caches?
8KB to 64KB
How large are second level caches?
typically 512KB to 16MB
4th level cache = main memory
Made of DRAM
How large is fourth level cache?
1GB to 4GB for desktops; servers can have much more
5th level cache
disk/SSD (swap and files)
Processors are how much cache by area?
30-70%
Static RAM (SRAM)
6 transistors per bit
optimized for speed and density
fast (sub-nanosecond latency for small SRAM)
access latency roughly proportional to area (larger SRAM arrays are slower)
integrates well with standard processor logic
Dynamic RAM (DRAM)
1 transistor + 1 capacitor per bit
optimized for density
slow (>40ns internal access, ~100ns pin-to-pin)
different fabrication steps
Nonvolatile storage
magnetic disk, flash, STT, Re-RAM, PCM
Cache Lookup Algorithm
Read frame indicated by index bits
“Hit” if tag matches and valid bit is set, otherwise miss
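The lookup steps above can be sketched for a direct-mapped cache. This is a minimal illustration, not a definitive implementation: the class and method names are hypothetical, frame count and block size are assumed to be powers of two, and filling the frame on a miss is an added assumption so that repeated lookups show hits.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that tracks only valid bits and tags (no data)."""

    def __init__(self, num_frames, block_size):
        # Assumption: both values are powers of two, so log2 is bit_length() - 1.
        self.num_frames = num_frames
        self.offset_bits = block_size.bit_length() - 1
        self.index_bits = num_frames.bit_length() - 1
        self.valid = [False] * num_frames
        self.tags = [0] * num_frames

    def lookup(self, addr):
        """Return True on a hit, False on a miss (the frame is filled on a miss)."""
        # Read the frame indicated by the index bits.
        index = (addr >> self.offset_bits) & (self.num_frames - 1)
        tag = addr >> (self.offset_bits + self.index_bits)
        # Hit if the tag matches and the valid bit is set, otherwise miss.
        hit = self.valid[index] and self.tags[index] == tag
        if not hit:
            self.valid[index] = True
            self.tags[index] = tag
        return hit


cache = DirectMappedCache(num_frames=4, block_size=16)
results = [cache.lookup(a) for a in (0x00, 0x04, 0x40, 0x00)]
print(results)  # [False, True, False, False]
```

Addresses 0x00 and 0x04 share a block, so the second access hits. Address 0x40 maps to the same frame with a different tag, so it evicts the block and the final access to 0x00 misses again.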
Fill path also called what?
backside
Cache controller
finite state machine - remembers miss address, accesses next level, waits for response, writes data and tag in proper locations
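The controller behavior described above can be sketched as a small state machine. The state names, the `MissController` class, and the `step` signature are all hypothetical, chosen only to mirror the four actions in the card; a real controller is hardware and handles many more cases.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()     # no miss outstanding
    REQUEST = auto()  # about to access the next level
    WAIT = auto()     # waiting for the next level to respond
    FILL = auto()     # writing data and tag into the cache


class MissController:
    """Toy cache-miss controller; advances one state per call to step()."""

    def __init__(self):
        self.state = State.IDLE
        self.miss_addr = None
        self.log = []  # records the actions taken, for illustration only

    def step(self, miss_addr=None, response_ready=False):
        if self.state is State.IDLE:
            if miss_addr is not None:
                self.miss_addr = miss_addr  # remember the miss address
                self.state = State.REQUEST
        elif self.state is State.REQUEST:
            self.log.append(("request", self.miss_addr))  # access next level
            self.state = State.WAIT
        elif self.state is State.WAIT:
            if response_ready:  # stay here until the response arrives
                self.state = State.FILL
        elif self.state is State.FILL:
            self.log.append(("fill", self.miss_addr))  # write data and tag
            self.miss_addr = None
            self.state = State.IDLE
        return self.state


ctrl = MissController()
ctrl.step(miss_addr=0x40)       # IDLE -> REQUEST
ctrl.step()                     # REQUEST -> WAIT
ctrl.step()                     # still WAIT, no response yet
ctrl.step(response_ready=True)  # WAIT -> FILL
ctrl.step()                     # FILL -> IDLE
print(ctrl.log)  # [('request', 64), ('fill', 64)]
```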
%miss (miss rate)
#misses / #accesses
t_hit (hit time)
time to read data from (write data to) cache
t_miss (miss penalty)
time to read data into cache from the next lower level
Average access time: t_avg
t_hit + %miss * t_miss
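A short worked example of the formula above. The function name and the latency numbers are made up for illustration; the second part assumes a two-level hierarchy in which the L1 miss penalty equals the average access time of the L2.

```python
def average_access_time(t_hit, miss_rate, t_miss):
    """t_avg = t_hit + %miss * t_miss, with miss_rate given as a fraction in [0, 1]."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be a fraction between 0 and 1")
    return t_hit + miss_rate * t_miss


# Single level: 1 ns hit time, 5% miss rate, 100 ns miss penalty.
single = average_access_time(t_hit=1.0, miss_rate=0.05, t_miss=100.0)
print(single)  # about 6 ns

# Two levels (illustrative numbers): the L1 miss penalty is the L2 average.
l2_avg = average_access_time(t_hit=10.0, miss_rate=0.2, t_miss=100.0)  # about 30 ns
l1_avg = average_access_time(t_hit=1.0, miss_rate=0.05, t_miss=l2_avg)
print(l1_avg)  # about 2.5 ns
```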
what roughly determines t_hit
cache capacity and circuits
what roughly determines t_miss
lower level memory structures
How to measure %_miss?
hardware performance counters, simulation, paper simulation
how to find the number of offset bits
log_2(block size)
how to find the number of index bits
log_2(number of sets)
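The two log_2 rules above can be sketched as an address split into tag, index, and offset. The function name is hypothetical, block size is assumed to be in bytes, and both parameters are assumed to be powers of two.

```python
def address_fields(addr, block_size, num_sets):
    """Split addr into (tag, index, offset) for a cache with the given geometry."""
    for name, value in (("block_size", block_size), ("num_sets", num_sets)):
        if value <= 0 or value & (value - 1):
            raise ValueError(f"{name} must be a power of two")
    offset_bits = block_size.bit_length() - 1  # log_2(block size)
    index_bits = num_sets.bit_length() - 1     # log_2(number of sets)
    offset = addr & (block_size - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)   # whatever bits remain
    return tag, index, offset


# 32-byte blocks -> 5 offset bits; 64 sets -> 6 index bits.
print(address_fields(0x1234, block_size=32, num_sets=64))  # (2, 17, 20)
```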
How to reduce %miss?
increase capacity
increase block size
What happens if you increase cache capacity?
%miss decreases, but t_hit increases
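A small simulation can illustrate how capacity and block size affect %miss. This is a sketch under stated assumptions: a direct-mapped cache, a synthetic trace that walks a 1 KB array in 4-byte steps four times, and hypothetical function names. Real workloads will behave differently, and the model says nothing about t_hit.

```python
def miss_rate(trace, num_frames, block_size):
    """Fraction of accesses in trace that miss in a direct-mapped cache."""
    tags = [None] * num_frames  # None means the frame is invalid
    misses = 0
    for addr in trace:
        block = addr // block_size
        index = block % num_frames
        tag = block // num_frames
        if tags[index] != tag:
            misses += 1
            tags[index] = tag  # fill the frame on a miss
    return misses / len(trace)


# Synthetic trace: four sequential passes over a 1 KB array, 4 bytes at a time.
trace = [addr for _ in range(4) for addr in range(0, 1024, 4)]

base = miss_rate(trace, num_frames=16, block_size=16)        # 256 B cache
bigger_cache = miss_rate(trace, num_frames=64, block_size=16)  # 1 KB cache
bigger_block = miss_rate(trace, num_frames=8, block_size=32)   # 256 B, larger blocks

print(base, bigger_cache, bigger_block)  # 0.25 0.0625 0.125
```

On this trace, quadrupling capacity lets the whole array stay resident after the first pass, and doubling the block size at the same capacity halves the misses by exploiting spatial locality.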