Lectures 9 & 10 - The Memory Hierarchy Flashcards
Which caches are separate?
L1 data and L1 instruction caches. This allows data memory and instruction memory to be accessed in the same cycle, and each cache's characteristics can be optimised for its contents.
Which caches are unified?
L2/L3 caches are typically unified. This allows the proportion of the cache dedicated to instructions and data to be adjusted dynamically to match program requirements.
Blocks or Cache Lines
Blocks help us exploit spatial locality of data. On a cache miss we must load a minimum of one block of data from main memory into the cache.
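A minimal illustration of the spatial locality blocks capture, assuming 64-byte blocks and 4-byte ints (sizes are illustrative, not from the cards):

```c
#include <stddef.h>

/* Summing an array touches consecutive addresses, so after the miss
 * that loads a block, the following elements hit in the cache. With
 * 64-byte blocks and 4-byte ints, one miss services 16 elements. */
int sum(const int *a, size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];   /* hits until the next block boundary */
    return total;
}
```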
Block Size - Optimal
Larger blocks exploit spatial locality better, which reduces the miss rate.
However, for a fixed cache size, larger blocks mean fewer distinct blocks can be stored, and more data must be fetched on each miss.
So beyond an optimal point, increasing the block size increases the miss rate and the miss penalty.
Direct Mapped Cache
Each block has exactly one place it can be stored in the cache. Access time is low, but collisions may cause cache lines to be repeatedly evicted even while other entries sit unused.
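A sketch of how an address picks its unique slot, assuming a geometry not given in the cards (64-byte blocks, 256 sets, 16 KiB total):

```c
#include <stdint.h>

#define BLOCK_BITS 6            /* log2(64-byte block)  */
#define SET_BITS   8            /* log2(256 sets)       */

/* Low bits select the byte within the block, the next bits select
 * the set (the one candidate line), the rest form the tag. */
static inline uint32_t offset_of(uint32_t addr) {
    return addr & ((1u << BLOCK_BITS) - 1);
}
static inline uint32_t index_of(uint32_t addr) {
    return (addr >> BLOCK_BITS) & ((1u << SET_BITS) - 1);
}
static inline uint32_t tag_of(uint32_t addr) {
    return addr >> (BLOCK_BITS + SET_BITS);
}
```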
Set Associative Cache
Each block can be stored in any of n different places (ways) within one set of the cache.
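A minimal lookup sketch for one set, assuming 4 ways (the cards leave n open); hardware compares all ways in parallel, software models it as a loop:

```c
#include <stdbool.h>
#include <stdint.h>

#define WAYS 4   /* assumed associativity */

struct line { bool valid; uint32_t tag; };

/* Check the tag against every way of the selected set. */
bool lookup(const struct line set[WAYS], uint32_t tag) {
    for (int w = 0; w < WAYS; w++)
        if (set[w].valid && set[w].tag == tag)
            return true;    /* hit in way w */
    return false;           /* miss: a victim in this set is chosen */
}
```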
Fully associative cache (using CAMs; also known as a CAM-tag or CAM-RAM cache)
Highly associative caches are often seen in embedded systems where energy consumption is a major concern: they reduce miss rates, avoiding the high energy cost of accessing main memory.
Block Replacement Policies
Prefer invalid lines in the set if any exist; otherwise use an LRU scheme, or approximate LRU with FIFO or Not-Last-Used replacement.
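One possible victim-selection sketch, assuming per-line age counters as the LRU mechanism (the cards name the policies but not an implementation):

```c
#include <stdbool.h>
#include <stdint.h>

struct lru_line { bool valid; uint32_t tag; unsigned age; };

/* Prefer an invalid (free) line; otherwise evict the line with the
 * greatest age, i.e. the least recently used. Ages would be reset
 * on access and incremented over time elsewhere. */
int choose_victim(const struct lru_line *set, int ways) {
    int victim = 0;
    for (int w = 0; w < ways; w++) {
        if (!set[w].valid)
            return w;                     /* free entry: use it */
        if (set[w].age > set[victim].age)
            victim = w;                   /* older = less recently used */
    }
    return victim;
}
```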
Write-Allocate
Allocate a block in the cache and then perform the write.
No-write allocate
Just write to the lower-level memory; no block is allocated in the cache.
Write through
Write to both cache and lower level of memory when we perform a write.
Write Back
Only write dirty cache blocks to lower level of memory when they are replaced.
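A write-back sketch under assumed structures; the helper write_block_to_memory is hypothetical, standing in for the bus transaction:

```c
#include <stdbool.h>
#include <stdint.h>

struct wb_line { bool valid, dirty; uint32_t tag; };

/* Hypothetical helper standing in for the write to the lower level. */
static void write_block_to_memory(const struct wb_line *l) { (void)l; }

/* A store hit only marks the line dirty; the slow write to the
 * lower level happens once, at eviction time. */
static void store_hit(struct wb_line *l) {
    l->dirty = true;                  /* updated in cache only */
}

static void evict(struct wb_line *l) {
    if (l->valid && l->dirty)
        write_block_to_memory(l);     /* write the dirty block back */
    l->valid = false;
    l->dirty = false;
}
```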
Average Memory Access Time
AMAT = Hit Time + Miss Rate × Miss Penalty
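Worked example with assumed figures (not from the cards): with a 1-cycle hit time, 5% miss rate and 100-cycle miss penalty, AMAT = 1 + 0.05 × 100 = 6 cycles.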
Compulsory Cache Miss
Generated when a block is brought into the cache for the first time.
Capacity
Capacity misses are produced when the number of blocks required by a program exceeds the capacity of the cache.
Conflict
Misses caused by too many blocks mapping to the same set; they would not occur in a fully associative cache of the same capacity.
Write Buffer
Hides the lower throughput of the underlying memory system; particularly useful for a write-through cache. It allows loads to bypass buffered stores.
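A sketch of the bypass check, with assumed structures and a 4-entry depth (none of these details are from the cards): a load may only slip past buffered stores if its address matches none of them.

```c
#include <stdbool.h>
#include <stdint.h>

#define WBUF_ENTRIES 4   /* assumed buffer depth */

struct wbuf_entry { bool valid; uint32_t addr; };

/* A load can safely bypass the buffered stores only when its
 * address conflicts with none of them; otherwise it must wait
 * for (or forward from) the matching store. */
bool load_can_bypass(const struct wbuf_entry buf[WBUF_ENTRIES],
                     uint32_t load_addr) {
    for (int i = 0; i < WBUF_ENTRIES; i++)
        if (buf[i].valid && buf[i].addr == load_addr)
            return false;   /* conflict with a pending store */
    return true;
}
```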
L1 Cache
Optimised for fast access time, as cache hits are very common. They tend to be small (and therefore fast) with lower associativity.
L2/L3 Cache
Optimised for a low miss rate, to minimise off-chip/DRAM accesses. They are larger, with higher associativity.
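For a two-level hierarchy the AMAT formula nests, using each level's local miss rate: AMAT = HitTime_L1 + MissRate_L1 × (HitTime_L2 + MissRate_L2 × MissPenalty_L2).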
Multi-Level Inclusion
Include the contents of L1 in L2, etc. Desirable as consistency between I/O and the caches can be determined by checking just the L2.
Drawbacks:
Levels may want different block sizes; inclusion is still possible, though harder to maintain.
If L2 is not much larger than L1, exclusion may be the better policy.
Not required for instruction caches.
Critical Word First
The processor often requires only a single word from a cache block, so the required word is read first, allowing the processor to continue execution while the rest of the block arrives.
Sub-block placement
Divides each cache block into sub-blocks, each with its own valid bit, so only the required sub-block need be transferred on a miss. This allows the tag RAM to be kept small without incurring the large miss penalties that bigger blocks cause.
Victim Cache
Small fully associative cache used in conjunction with a direct-mapped cache. Holds lines displaced from the L1 cache. The processor checks both L1 and the victim cache; on a victim-cache hit the lines are swapped.
Aims to reduce conflict misses (20-40% of direct-mapped cache misses).
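A sketch of the swap on a victim-cache hit, with illustrative structures (the 64-byte block size is an assumption):

```c
#include <stdint.h>

struct cline { uint32_t tag; uint8_t data[64]; };  /* assumed 64-byte blocks */

/* On a victim-cache hit, the hitting VC line and the L1 line it
 * would displace trade places, so a conflicting pair of lines
 * ping-pongs on-chip instead of going back to memory. */
void vc_hit_swap(struct cline *l1_slot, struct cline *vc_slot) {
    struct cline tmp = *l1_slot;
    *l1_slot = *vc_slot;
    *vc_slot = tmp;
}
```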
Assist Cache
Small fully-associative cache used to assist direct-mapped caches.
Data enters the assist cache when L1 misses; lines are moved out of the assist cache in FIFO order.
Helps eliminate the thrashing behaviour associated with direct-mapped caches.