Memory and Caches Flashcards
DEFINE: SRAM.
Static Random Access Memory. Memory made out of transistors that retains data as long as power is applied, and runs at (or near) the speed of the processor (i.e. clock rate).
DEFINE: DRAM.
Dynamic Random Access Memory. Memory made out of capacitors (i.e. a lot of electrical and chemical engineering) whose charge leaks and must be periodically refreshed. Synonymous with “main memory.”
TRUE/FALSE: DRAM is faster than SRAM (as the name implies) because it runs at processor speed.
FALSE, the opposite is true.
TRUE/FALSE: Reading from DRAM requires reading whole rows (also called pages), and is therefore quite slow.
TRUE.
What are the 5 (common) levels of the memory hierarchy, starting with 1) the processor core?
1) Processor core (registers)
2) L1 Cache
3) L2 Cache
4) L3 Cache
5) DRAM/Main Memory
TRUE/FALSE: As caches get larger, they also get slower.
TRUE.
What is the L1 cache’s goal?
To prevent pipeline bubbles by having “memory” immediately available.
TRUE/FALSE: L2 and L3 caches are good, but pipeline bubbles are inevitable when the L1 cache misses.
TRUE.
DEFINE: Cache hit.
When the L1 cache contains the requested memory, thus preventing pipeline bubbles.
DEFINE: Cache miss.
When the memory has to be pulled from L2, L3, or DRAM, thus creating pipeline bubbles.
DEFINE: Cache block.
The unit of storage and transfer within a cache (also called a cache line); data moves between levels one block at a time.
DEFINE: Temporal Locality.
“If we notice a memory object has been used recently, we think it’ll be used again soon.”
DEFINE: Spatial Locality.
“If a given memory address has been used recently, we think nearby addresses will be used soon.”
What are the two main trends that caching exploits?
Temporal Locality and Spatial Locality.
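Both patterns are easy to see in code. A hypothetical Python sketch (not from the cards; names and sizes are made up for illustration):

```python
data = list(range(1_000_000))

# Spatial locality: a sequential pass touches consecutive addresses,
# so each cache block fetched services many of the following reads.
total = 0
for x in data:
    total += x        # temporal locality: 'total' is reused every iteration

# Poor spatial locality: a large stride touches a new cache block on
# (nearly) every access, wasting the rest of each fetched block.
strided = sum(data[i] for i in range(0, len(data), 4096))
```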
DEFINE: Miss penalty.
The time (or cycle) cost incurred when a cache miss occurs. Tends to be outsized compared to the hit time.
DEFINE: Direct Mapped Cache.
A cache that maps each address to exactly one slot (by taking the address modulo the number of blocks) and stores one cache block in that slot.
DEFINE: Fully Associative Cache.
A cache in which any block may occupy any slot, regardless of address; it simply stores the most recently used blocks. To find a cached item, the entire cache must be searched.
Why aren’t Direct Mapped caches ideal?
Conflict misses become a common issue: whenever two hot addresses modulo to the same slot, they repeatedly evict each other.
Why aren’t Fully Associative caches ideal?
The circuit overhead is immense, and searching through the entire cache is slow. The upside is that eviction occurs only when the cache is at full capacity.
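The conflict-miss problem is easy to demonstrate with a toy simulator (a hypothetical Python sketch, not how real hardware is built):

```python
# Toy direct-mapped cache: address % num_blocks picks the one legal slot.
class DirectMapped:
    def __init__(self, num_blocks):
        self.slots = [None] * num_blocks

    def access(self, addr):
        i = addr % len(self.slots)
        hit = self.slots[i] == addr
        self.slots[i] = addr          # on a miss, evict whatever was there
        return hit

dm = DirectMapped(8)
# 0 and 8 modulo to the same slot, so they evict each other forever:
print([dm.access(a) for a in [0, 8, 0, 8]])  # [False, False, False, False]
```

A fully associative cache avoids this by letting any block sit anywhere, but then every access must compare against every slot, which is exactly the circuit overhead described above.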
What are the three (common) types of cache misses we have to consider?
Compulsory misses: The first access to a block has no choice but to come from DRAM. Unavoidable, not worth profiling.
Capacity misses: When the data was in the cache, but got evicted by another block due to capacity limits.
Conflict/Collision misses: When the data was in the cache, but got evicted by another block occupying the same slot.
In a 2-way set-associative cache capable of holding 256 cache blocks, how many caches are in parallel? How many blocks do those caches modulo over?
2 and 128: two direct-mapped caches (the “ways”) run in parallel, and each covers 256 / 2 = 128 sets, so each modulos over 128.
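As a quick sanity check (a hypothetical snippet using the card’s numbers):

```python
total_blocks = 256   # whole-cache capacity, in blocks
ways = 2             # 2-way set associative: 2 direct-mapped caches in parallel
sets = total_blocks // ways
print(ways, sets)    # 2 ways, each indexed modulo 128 sets
```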
You are designing a CPU with a 37-bit address bus, a 29-stage pipeline, and a Harvard architecture with separate 13-bit instruction and 19-bit data buses. The CPU uses a three-level cache hierarchy where L1 cache has a block size of 37 bytes, L2 cache is fully associative with 23-way set associativity, and L3 cache uses a victim cache policy that dynamically reconfigures its 5-level prefetcher based on the Fibonacci sequence.
If the CPU’s branch predictor mispredicts on prime-numbered cycles and the instruction decoder can only decode instructions with Hamming weights greater than 5, what is the expected CPI (Cycles Per Instruction) when executing a Fibonacci sequence generator written in Brainfuck, assuming each ‘+’ operation results in a TLB miss?
It depends.
DEFINE: Replacement policy.
The strategy a cache uses to decide which of the candidate blocks (the parallel ways) to evict on a miss.
What is the optimal (if impossible) replacement policy?
Evict whichever block will next be used furthest in the future (Bélády’s optimal policy; impossible because it requires knowing the future).
What is a replacement policy that isn’t perfect but works well, thanks to temporal locality?
Least Recently Used (LRU): evict the block whose most recent use is furthest in the past.
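A minimal LRU sketch using Python’s OrderedDict (hypothetical; real hardware approximates LRU with a few status bits per set):

```python
from collections import OrderedDict

class LRUCache:
    """Fully associative cache with least-recently-used eviction."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # oldest (least recently used) first

    def access(self, addr):
        if addr in self.blocks:
            self.blocks.move_to_end(addr)    # mark as most recently used
            return True                      # hit
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the LRU block
        self.blocks[addr] = True
        return False                         # miss
```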
Caches often employ large blocks for storage. Why is this?
DRAM already delivers a whole row (page) per burst, so fetching a large block costs little extra, and large blocks exploit spatial locality. We can also subdivide blocks as we please.
What is the equation for AMAT?
Average Memory Access Time (AMAT) = Hit Time + Miss Rate × Miss Penalty.
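In code form (a hypothetical helper; units are whatever you feed in, typically ns or cycles):

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# e.g. a 1-cycle hit, 5% miss rate, 20-cycle miss penalty (assumed numbers):
print(amat(1, 0.05, 20))   # 2.0 cycles on average
```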
Given a CPU with a CPI of 1 and a clock rate of 4 GHz, if main memory takes 100ns to access, L3 cache takes 25ns to access, L2 cache takes 5ns to access, and L1 cache takes 0ns to access…
If 5% of instructions are L1 misses, 2% are L2 misses, and 0% are L3 misses, what is the effective CPI of this processor’s memory instructions, or AMAT?
4 GHz converts to 0.25ns clock period.
The miss penalties are:
L1 miss: 5ns/0.25ns = 20c
L2 miss: 25ns/0.25ns = 100c
L3 miss: 100ns/0.25ns = 400c
CPI = 1 + 0.05 * 20 + 0.02 * 100 + 0 * 400 = 4 CPI
Given a CPU with a CPI of 1 and a clock rate of 4 GHz, if main memory takes 100ns to access, L3 cache takes 25ns to access, L2 cache takes 5ns to access, and L1 cache takes 0ns to access…
If I got rid of all three caches, what is the effective CPI of this processor’s memory instructions, or AMAT?
4 GHz converts to 0.25ns clock period.
The miss penalties are:
No matter what: 100ns/0.25ns = 400c
CPI = 1 + 1.00 * 400 = 401 CPI
Given a CPU with a CPI of 1 and a clock rate of 4 GHz, if main memory takes 100ns to access, L3 cache takes 25ns to access, L2 cache takes 5ns to access, and L1 cache takes 0ns to access…
If 10% of instructions are L1 misses, but I don’t have an L2 or L3 cache, what is the effective CPI of this processor’s memory instructions, or AMAT?
4 GHz converts to 0.25ns clock period.
The miss penalties are:
L1 miss: 100ns/0.25ns = 400c
CPI = 1 + 0.10 * 400c = 41 CPI
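The three cards above all follow the same recipe; a short script to check them (latencies and miss rates taken straight from the cards, with miss rates as fractions of all instructions):

```python
CLOCK_NS = 1 / 4           # 4 GHz -> 0.25 ns per cycle

def cycles(ns):
    return ns / CLOCK_NS   # convert an access latency into clock cycles

# Card 1: three-level hierarchy.
cpi1 = 1 + 0.05 * cycles(5) + 0.02 * cycles(25) + 0.00 * cycles(100)
# Card 2: no caches, so every access pays the full DRAM latency.
cpi2 = 1 + 1.00 * cycles(100)
# Card 3: L1 only, with a 10% miss rate straight to DRAM.
cpi3 = 1 + 0.10 * cycles(100)
print(cpi1, cpi2, cpi3)    # 4.0 401.0 41.0
```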
DEFINE: Cache coherency.
When reading from an address, we must always get the most recently written value, even if another core performed the write. This is important for multicore systems, where each core has its own caches.
To maintain cache coherency, when a cache line is written by a core, what needs to happen?
All other instances of that cache line need to be invalidated.
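A toy write-invalidate sketch (hypothetical Python; real hardware does this with a coherence protocol such as MESI over a snooping bus or directory):

```python
# Each core keeps a private set of cached addresses.
caches = {core: set() for core in ("core0", "core1", "core2")}

def read(core, addr):
    caches[core].add(addr)       # pull the line into this core's cache

def write(core, addr):
    # Invalidate every other core's copy before writing, so a stale
    # value can never be read by another core afterward.
    for other, lines in caches.items():
        if other != core:
            lines.discard(addr)
    caches[core].add(addr)

read("core0", 0x40)
read("core1", 0x40)
write("core0", 0x40)
print(0x40 in caches["core1"])   # False -- core1's copy was invalidated
```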