Cache Review Flashcards
Locality principle
“Things that will happen soon are likely to be close to things that just happened.” - lecture
The processor will tend to access the same set of memory locations repeatedly over a short period. So the next location accessed is likely to be one that was accessed recently, or one near it.
This is the concept that the cache is built on. Also relevant to branch prediction.
Two kinds of locality of memory references
If we recently accessed location X:
- Temporal: likely to access the same address soon
- Spatial: likely to access addresses near X
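A small sketch of both kinds in C (the array and loop are just a made-up example): summing an array reuses `sum` every iteration (temporal locality) and walks through `data[]` one element at a time (spatial locality).

```c
#include <stdio.h>

int main(void) {
    int data[1024];
    for (int i = 0; i < 1024; i++) data[i] = i;

    int sum = 0;
    for (int i = 0; i < 1024; i++) {
        /* temporal locality: sum (and i) are reused on every iteration      */
        /* spatial locality: data[i+1] sits right next to data[i] in memory, */
        /* so it is likely already cached after data[i] has been read        */
        sum += data[i];
    }
    printf("%d\n", sum);
    return 0;
}
```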
What is the cache?
It’s a small amount of fast memory on the processor that stores items we recently got from main memory. It’s like borrowing a book from the library instead of walking back for every page. It exploits temporal locality (a recently used address is still sitting in the cache when we ask for it again) and spatial locality (we bring in a whole block at a time, so nearby addresses come along with the one we asked for).
Average memory access time (AMAT)
AMAT = Hit Time + (Miss Rate * Miss Penalty), where…
Hit Time = The time it takes to retrieve data when it’s in the cache
Miss Rate = The rate at which we have cache misses
Miss Penalty = The additional time (above hit time) needed to retrieve data from main memory
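A minimal sketch of the formula as code; the numbers (1-cycle hit time, 5% miss rate, 100-cycle miss penalty) are hypothetical, purely for illustration.

```c
#include <stdio.h>

/* AMAT = hit time + miss rate * miss penalty */
double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}

int main(void) {
    /* hypothetical numbers: 1-cycle hit, 5% miss rate, 100-cycle penalty */
    printf("AMAT = %.1f cycles\n", amat(1.0, 0.05, 100.0));  /* 6.0 cycles */
    return 0;
}
```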
How do we optimize Average Memory Access Time (AMAT)?
We want to balance a low Hit Time with a low Miss Rate.
Hit Time: Keeping this low requires a small and fast cache.
Miss Rate: Keeping THIS low requires a larger cache, or a smarter one. A smarter cache will do a better job of choosing what to keep in the cache, but lookups will take longer.
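Worked example with hypothetical numbers: with a 100-cycle miss penalty, a small cache with a 1-cycle hit time and 10% miss rate gives AMAT = 1 + 0.10 × 100 = 11 cycles, while a larger cache with a 2-cycle hit time and 5% miss rate gives 2 + 0.05 × 100 = 7 cycles. Which design wins depends on the actual hit times and miss rates for the workload.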
How is the L1 cache organized?
The cache is a table. Each entry holds a tag, which identifies which block is currently stored there (we compare it against the address to decide whether we have a hit), and then the stored data.
The tag might be the block number, or a subset of the block number.
The size of stored data is called the “block size” or “line size”. Typically 32 - 128 bytes. This is large enough to take advantage of spatial locality, but small enough that we aren’t making the cache needlessly large/slow.
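A sketch of one cache line as a C struct; the 64-byte block size and field widths are assumptions for illustration.

```c
#include <stdint.h>

#define BLOCK_SIZE 64   /* bytes per block/line (assumed) */

struct cache_line {
    uint8_t  valid;              /* is this real data? (see the valid-bit card below) */
    uint32_t tag;                /* identifies which block is stored here             */
    uint8_t  data[BLOCK_SIZE];   /* the cached block itself                           */
};
```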
When we fetch a block of memory to add to the cache, how do we choose a cache block start address?
If we let blocks begin anywhere (0 to 63, 1 to 64, 2 to 65, etc.) then they will overlap, and the same address will map to many blocks in the cache. We would have to keep them all up to date when we write, which gets complicated!
We can get around this by only using block-aligned start addresses: 0 to 63, 64 to 127, etc.
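A sketch of computing the block-aligned start address by masking off the low bits; the 64-byte block size and the example address are assumptions.

```c
#include <stdio.h>

#define BLOCK_SIZE 64   /* must be a power of 2 */

int main(void) {
    unsigned int addr = 0x12345;                         /* arbitrary example address */
    unsigned int block_start = addr & ~(BLOCK_SIZE - 1); /* clear the low 6 bits      */
    printf("0x%x is the start of the block containing 0x%x\n", block_start, addr);
    return 0;
}
```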
Blocks and lines
Memory stores data as blocks (many short “rows”) and the cache stores the same data as a line (one long row). Corresponding blocks and lines are the same size.
Why should block sizes be a power of 2?
We index from addresses to the cache line by using the part of the address that applies to the entire line/block. If the block size is a power of 2, we can accomplish this by just dividing the address by the block size, which amounts to just throwing away some number of least significant bits (because binary).
If the block size isn't a power of 2, we would have to actually perform the division, which would be much more expensive in hardware.
Block offset and block number
We index into the cache using an address. The block number is the (upper, more significant) part of the address that is shared by all data within the block. The block offset is the remaining (less significant) part of the address, which tells us where within the block to find our data.
If we have a 32-bit address and 16-byte blocks, we need the least significant four bits (0-3) for the offset (because 16 = 2^4). The remaining bits (4-31) are the block number.
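A sketch matching this card: 32-bit address, 16-byte blocks, so 4 offset bits and 28 block-number bits. Because 16 is a power of 2, the "division" is just a shift. The example address is arbitrary.

```c
#include <stdio.h>

int main(void) {
    unsigned int addr = 0xDEADBEEF;        /* arbitrary 32-bit example address       */
    unsigned int offset = addr & 0xF;      /* low 4 bits: where inside the block     */
    unsigned int block_number = addr >> 4; /* remaining 28 bits: same as addr / 16   */
    printf("block number = 0x%x, offset = 0x%x\n", block_number, offset);
    return 0;
}
```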
Valid bit
An extra bit stored alongside the tag to indicate whether the tag and data are valid (as opposed to whatever garbage happened to be there when the computer booted, not real data).
So a hit means the tag matches the block # (or the relevant part of it) AND valid bit == TRUE
Types of caches
Fully associative: any block can be stored in any line. Have to search all lines.
Set associative: each block can be stored in any of N lines (one set), where 1 < N < # of lines. Have to search N lines.
Direct mapped: each block can only be stored in one line. Have to search only one line.
How does a direct mapped cache work?
Each block can only map to one line, though multiple blocks can map to the same line. So the first few (least significant) bits of the address are the offset as usual, the next few index into the cache (pick the line), and the rest are the tag, telling us exactly which block is stored there.
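A minimal direct-mapped lookup sketch, assuming 64-byte blocks and 256 lines (so 6 offset bits, 8 index bits, and the rest is the tag); all sizes are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 64    /* 6 offset bits (assumed) */
#define NUM_LINES  256   /* 8 index bits  (assumed) */

struct line {
    bool     valid;
    uint32_t tag;
    uint8_t  data[BLOCK_SIZE];
};

static struct line cache[NUM_LINES];

/* A hit means the single possible line is valid and its tag matches. */
bool lookup(uint32_t addr) {
    uint32_t index = (addr / BLOCK_SIZE) % NUM_LINES;  /* which line to check       */
    uint32_t tag   = addr / BLOCK_SIZE / NUM_LINES;    /* rest of the block number  */
    return cache[index].valid && cache[index].tag == tag;
}

int main(void) {
    printf("hit? %d (cold cache, so this is a miss)\n", lookup(0x1A2B3C));
    return 0;
}
```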
Pros and cons of direct mapped cache
PROS
You only have to check one line for a matching block number and valid bit. This makes them:
- Fast (short hit time)
- Cheap
- Energy efficient
CONS
Blocks HAVE to go in one place. If we frequently access two blocks that map to the same line, we have to keep evicting one and loading the other back in from memory.
- Conflicts! (high miss rate)
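A sketch of the conflict problem, reusing the assumed 64-byte blocks and 256 lines from above (a 16 KiB cache): two hypothetical addresses exactly 16 KiB apart land on the same line, so alternating between them misses every time.

```c
#include <stdio.h>

#define BLOCK_SIZE 64
#define NUM_LINES  256   /* 64 B * 256 = 16 KiB of cache */

int main(void) {
    unsigned int a = 0x00010000;            /* hypothetical addresses            */
    unsigned int b = 0x00014000;            /* exactly 16 KiB (cache size) apart */
    unsigned int idx_a = (a / BLOCK_SIZE) % NUM_LINES;
    unsigned int idx_b = (b / BLOCK_SIZE) % NUM_LINES;
    printf("index(a)=%u index(b)=%u -> %s\n", idx_a, idx_b,
           idx_a == idx_b ? "conflict: they keep evicting each other" : "no conflict");
    return 0;
}
```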
How does a set-associative cache work?
The cache is divided into sets, and each set holds N lines, or “ways.” This is called an N-way set-associative cache.
Each block maps to exactly one set, but within that set it can go in any of the N lines.
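A minimal sketch of an N-way set-associative lookup, assuming 2 ways, 128 sets, and 64-byte blocks (all made-up parameters): the address picks exactly one set, and we search every way within it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 64
#define NUM_SETS   128
#define NUM_WAYS   2     /* 2-way set associative */

struct line {
    bool     valid;
    uint32_t tag;
    uint8_t  data[BLOCK_SIZE];
};

static struct line cache[NUM_SETS][NUM_WAYS];

/* The address selects one set; within it we check all NUM_WAYS lines. */
bool lookup(uint32_t addr) {
    uint32_t set = (addr / BLOCK_SIZE) % NUM_SETS;
    uint32_t tag = addr / BLOCK_SIZE / NUM_SETS;
    for (int way = 0; way < NUM_WAYS; way++) {
        if (cache[set][way].valid && cache[set][way].tag == tag)
            return true;   /* hit in one of the ways */
    }
    return false;          /* miss: not in any way of this set */
}

int main(void) {
    printf("hit? %d\n", lookup(0x12345));  /* cold cache: miss */
    return 0;
}
```

Compared with the direct-mapped version, the only change is the inner loop over the ways, which is what trades a bit of hit time for fewer conflict misses.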