Cache Review Flashcards
Locality principle
“Things that will happen soon are likely to be close to things that just happened.” - lecture
The processor will tend to access the same set of memory locations repeatedly over a short period. So the next location accessed is likely to be one that was accessed recently, or one near it.
This is the concept that the cache is built on. Also relevant to branch prediction.
Two kinds of locality of memory references
If we recently accessed location X:
- Temporal: likely to access the same address soon
- Spatial: likely to access addresses near X
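A small sketch of both kinds in C (the array and loop are just a made-up example): summing an array reuses `sum` every iteration (temporal locality) and walks through `data[]` one element at a time (spatial locality).

```c
#include <stdio.h>

int main(void) {
    int data[1024];
    for (int i = 0; i < 1024; i++) data[i] = i;

    int sum = 0;
    for (int i = 0; i < 1024; i++) {
        /* temporal locality: sum (and i) are reused on every iteration      */
        /* spatial locality: data[i+1] sits right next to data[i] in memory, */
        /* so it is likely already cached after data[i] has been read        */
        sum += data[i];
    }
    printf("%d\n", sum);
    return 0;
}
```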
What is the cache?
It’s a small amount of fast memory on the processor that stores items we recently got from main memory. It’s like borrowing a book from the library instead of walking back for every page. It exploits temporal locality (a recently used address is still sitting in the cache when we ask for it again) and spatial locality (we bring in a whole block at a time, so nearby addresses come along with the one we asked for).
Average memory access time (AMAT)
AMAT = Hit Time + (Miss Rate * Miss Penalty), where…
Hit Time = The time it takes to retrieve data when it’s in the cache
Miss Rate = The rate at which we have cache misses
Miss Penalty = The additional time (above hit time) needed to retrieve data from main memory
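A minimal sketch of the formula as code; the numbers (1-cycle hit time, 5% miss rate, 100-cycle miss penalty) are hypothetical, purely for illustration.

```c
#include <stdio.h>

/* AMAT = hit time + miss rate * miss penalty */
double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}

int main(void) {
    /* hypothetical numbers: 1-cycle hit, 5% miss rate, 100-cycle penalty */
    printf("AMAT = %.1f cycles\n", amat(1.0, 0.05, 100.0));  /* 6.0 cycles */
    return 0;
}
```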
How do we optimize Average Memory Access Time (AMAT)?
We want to balance a low Hit Time with a low Miss Rate.
Hit Time: Keeping this low requires a small and fast cache.
Miss Rate: Keeping THIS low requires a larger cache, or a smarter one. A smarter cache will do a better job of choosing what to keep in the cache, but lookups will take longer.
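Worked example with hypothetical numbers: with a 100-cycle miss penalty, a small cache with a 1-cycle hit time and 10% miss rate gives AMAT = 1 + 0.10 × 100 = 11 cycles, while a larger cache with a 2-cycle hit time and 5% miss rate gives 2 + 0.05 × 100 = 7 cycles. Which design wins depends on the actual hit times and miss rates for the workload.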
How is the L1 cache organized?
The cache is a table. Each entry holds a tag, which identifies which block is currently stored there (we compare it against the address to decide whether we have a hit), and then the stored data.
The tag might be the block number, or a subset of the block number.
The size of stored data is called the “block size” or “line size”. Typically 32 - 128 bytes. This is large enough to take advantage of spatial locality, but small enough that we aren’t making the cache needlessly large/slow.
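A sketch of one cache line as a C struct; the 64-byte block size and field widths are assumptions for illustration.

```c
#include <stdint.h>

#define BLOCK_SIZE 64   /* bytes per block/line (assumed) */

struct cache_line {
    uint8_t  valid;              /* is this real data? (see the valid-bit card below) */
    uint32_t tag;                /* identifies which block is stored here             */
    uint8_t  data[BLOCK_SIZE];   /* the cached block itself                           */
};
```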
When we fetch a block of memory to add to the cache, how do we choose a cache block start address?
If we let blocks begin anywhere (0 to 63, 1 to 64, 2 to 65, etc.) then they will overlap, and the same address will map to many blocks in the cache. We would have to keep them all up to date when we write, which gets complicated!
We can get around this by only using block-aligned start addresses: 0 to 63, 64 to 127, etc.
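A sketch of computing the block-aligned start address by masking off the low bits; the 64-byte block size and the example address are assumptions.

```c
#include <stdio.h>

#define BLOCK_SIZE 64   /* must be a power of 2 */

int main(void) {
    unsigned int addr = 0x12345;                         /* arbitrary example address */
    unsigned int block_start = addr & ~(BLOCK_SIZE - 1); /* clear the low 6 bits      */
    printf("0x%x is the start of the block containing 0x%x\n", block_start, addr);
    return 0;
}
```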
Blocks and lines
Memory stores data as blocks (many short “rows”) and the cache stores the same data as a line (one long row). Corresponding blocks and lines are the same size.
Why should block sizes be a power of 2?
We index from addresses to the cache line by using the part of the address that applies to the entire line/block. If the block size is a power of 2, we can accomplish this by just dividing the address by the block size, which amounts to just throwing away some number of least significant bits (because binary).
If the block size isn't a power of 2, we would have to actually perform the division, which would be much more expensive in hardware.
Block offset and block number
We index into the cache using an address. The block number is the (upper, more significant) part of the address that is shared by all data within the block. The block offset is the remaining (less significant) part of the address, which tells us where within the block to find our data.
If we have a 32-bit address and 16-byte blocks, we need the least significant four bits (0-3) for the offset (because 16 = 2^4). The remaining bits (4-31) are the block number.
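A sketch matching this card: 32-bit address, 16-byte blocks, so 4 offset bits and 28 block-number bits. Because 16 is a power of 2, the "division" is just a shift. The example address is arbitrary.

```c
#include <stdio.h>

int main(void) {
    unsigned int addr = 0xDEADBEEF;        /* arbitrary 32-bit example address       */
    unsigned int offset = addr & 0xF;      /* low 4 bits: where inside the block     */
    unsigned int block_number = addr >> 4; /* remaining 28 bits: same as addr / 16   */
    printf("block number = 0x%x, offset = 0x%x\n", block_number, offset);
    return 0;
}
```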
Valid bit
An extra bit stored alongside the tag to indicate whether the tag and data are valid (as opposed to whatever garbage happened to be there when the computer booted, not real data).
So a hit means the tag matches the block # (or the relevant part of it) AND valid bit == TRUE
Types of caches
Fully associative: any block can be stored in any line. Have to search all lines.
Set associative: each block can be stored in any of N lines (one set), where 1 < N < # of lines. Have to search N lines.
Direct mapped: each block can only be stored in one line. Have to search only one line.
How does a direct mapped cache work?
Each block can only map to one line, though multiple blocks can map to the same line. So the first few (least significant) bits of the address are the offset as usual, the next few index into the cache (pick the line), and the rest are the tag, telling us exactly which block is stored there.
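A minimal direct-mapped lookup sketch, assuming 64-byte blocks and 256 lines (so 6 offset bits, 8 index bits, and the rest is the tag); all sizes are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 64    /* 6 offset bits (assumed) */
#define NUM_LINES  256   /* 8 index bits  (assumed) */

struct line {
    bool     valid;
    uint32_t tag;
    uint8_t  data[BLOCK_SIZE];
};

static struct line cache[NUM_LINES];

/* A hit means the single possible line is valid and its tag matches. */
bool lookup(uint32_t addr) {
    uint32_t index = (addr / BLOCK_SIZE) % NUM_LINES;  /* which line to check       */
    uint32_t tag   = addr / BLOCK_SIZE / NUM_LINES;    /* rest of the block number  */
    return cache[index].valid && cache[index].tag == tag;
}

int main(void) {
    printf("hit? %d (cold cache, so this is a miss)\n", lookup(0x1A2B3C));
    return 0;
}
```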
Pros and cons of direct mapped cache
PROS
You only have to check one line for a matching block number and valid bit. This makes them:
- Fast (short hit time)
- Cheap
- Energy efficient
CONS
Blocks HAVE to go in one place. If we frequently access two blocks that map to the same line, we have to keep evicting one and loading the other back in from memory.
- Conflicts! (high miss rate)
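A sketch of the conflict problem, reusing the assumed 64-byte blocks and 256 lines from above (a 16 KiB cache): two hypothetical addresses exactly 16 KiB apart land on the same line, so alternating between them misses every time.

```c
#include <stdio.h>

#define BLOCK_SIZE 64
#define NUM_LINES  256   /* 64 B * 256 = 16 KiB of cache */

int main(void) {
    unsigned int a = 0x00010000;            /* hypothetical addresses            */
    unsigned int b = 0x00014000;            /* exactly 16 KiB (cache size) apart */
    unsigned int idx_a = (a / BLOCK_SIZE) % NUM_LINES;
    unsigned int idx_b = (b / BLOCK_SIZE) % NUM_LINES;
    printf("index(a)=%u index(b)=%u -> %s\n", idx_a, idx_b,
           idx_a == idx_b ? "conflict: they keep evicting each other" : "no conflict");
    return 0;
}
```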
How does a set-associative cache work?
The cache is divided into sets, and each set holds N lines, or “ways.” This is called an N-way set-associative cache.
Each block maps to exactly one set, but within that set it can go in any of the N lines.
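A minimal sketch of an N-way set-associative lookup, assuming 2 ways, 128 sets, and 64-byte blocks (all made-up parameters): the address picks exactly one set, and we search every way within it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 64
#define NUM_SETS   128
#define NUM_WAYS   2     /* 2-way set associative */

struct line {
    bool     valid;
    uint32_t tag;
    uint8_t  data[BLOCK_SIZE];
};

static struct line cache[NUM_SETS][NUM_WAYS];

/* The address selects one set; within it we check all NUM_WAYS lines. */
bool lookup(uint32_t addr) {
    uint32_t set = (addr / BLOCK_SIZE) % NUM_SETS;
    uint32_t tag = addr / BLOCK_SIZE / NUM_SETS;
    for (int way = 0; way < NUM_WAYS; way++) {
        if (cache[set][way].valid && cache[set][way].tag == tag)
            return true;   /* hit in one of the ways */
    }
    return false;          /* miss: not in any way of this set */
}

int main(void) {
    printf("hit? %d\n", lookup(0x12345));  /* cold cache: miss */
    return 0;
}
```

Compared with the direct-mapped version, the only change is the inner loop over the ways, which is what trades a bit of hit time for fewer conflict misses.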