Memory and Caches Flashcards

1
Q

DEFINE: SRAM.

A

Static Random Access Memory. Memory built from transistors that retains its data without refreshing and can be accessed at processor speed (i.e. at the clock rate).

2
Q

DEFINE: DRAM.

A

Dynamic Random Access Memory. Memory built from capacitors (i.e. a lot of electrical and chemical engineering) that must be periodically refreshed. Synonymous with “main memory.”

3
Q

TRUE/FALSE: DRAM is faster than SRAM (as the name implies) because it runs at processor speed.

A

FALSE; the opposite is true. SRAM runs at processor speed, while DRAM is considerably slower.

4
Q

TRUE/FALSE: Reading from DRAM requires reading from whole rows or columns (also called pages), and therefore is quite slow.

A

TRUE.

5
Q

What are the 5 (common) levels of cache hierarchy, starting with 1) Core Processing?

A

1) Core Processing
2) L1 Cache
3) L2 Cache
4) L3 Cache
5) DRAM/Main Memory

6
Q

TRUE/FALSE: As caches get larger, they also get slower.

A

TRUE.

7
Q

What is the L1 cache’s goal?

A

To prevent pipeline bubbles by having “memory” immediately available.

8
Q

TRUE/FALSE: L2 and L3 caches are good, but pipeline bubbles are inevitable when the L1 cache misses.

A

TRUE.

9
Q

DEFINE: Cache hit.

A

When the L1 cache contains the requested memory, thus preventing pipeline bubbles.

10
Q

DEFINE: Cache miss.

A

When the memory has to be pulled from L2, L3, or DRAM, thus creating pipeline bubbles.

11
Q

DEFINE: Cache block.

A

The fixed-size unit of data a cache stores and transfers (also called a cache line).

12
Q

DEFINE: Temporal Locality.

A

“If we notice a memory object has been used recently, we think it’ll be used again soon.”

13
Q

DEFINE: Spatial Locality.

A

“If a given memory address has been used recently, we think nearby addresses will be used soon.”

14
Q

What are the two main trends that caching exploits?

A

Temporal Locality and Spatial Locality.

15
Q

DEFINE: Miss penalty.

A

The time (or cycle) penalty incurred when a cache miss occurs. Tends to be large relative to the hit time.

16
Q

DEFINE: Direct Mapped Cache.

A

A cache in which each address maps to exactly one slot, typically by taking the address modulo the number of slots; each slot stores one cache block.
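The modulo mapping above can be sketched in a few lines. This is a toy model, not a real simulator; the slot count, block size, and `access` helper are all assumptions for illustration.

```python
# Minimal sketch of a direct-mapped cache lookup. Each memory block maps to
# exactly one slot (block number modulo slot count); a stored tag tells us
# whether the slot currently holds *this* address's block.

NUM_SLOTS = 8        # toy cache size (assumed)
BLOCK_SIZE = 4       # bytes per cache block (assumed)

cache = [None] * NUM_SLOTS   # each entry holds the tag of its cached block, or None

def access(address):
    """Return True on a cache hit, False on a miss (filling the slot)."""
    block = address // BLOCK_SIZE   # which memory block this byte lives in
    slot = block % NUM_SLOTS        # the single slot it may occupy
    tag = block // NUM_SLOTS        # disambiguates blocks sharing that slot
    if cache[slot] == tag:
        return True                 # hit
    cache[slot] = tag               # miss: evict whatever was there
    return False

access(0)                        # compulsory miss
access(0)                        # hit: same block
access(NUM_SLOTS * BLOCK_SIZE)   # conflict: maps to slot 0, evicts block 0
```

Note how the last access evicts block 0 even though the cache is mostly empty: that is the conflict-miss weakness described on card 18.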

17
Q

DEFINE: Fully Associative Cache.

A

A cache in which any block can occupy any entry, so the cache simply stores the most recent blocks regardless of address. Finding a cached item requires searching (comparing tags across) every entry.

18
Q

Why aren’t Direct Mapped caches ideal?

A

Conflict misses become a common issue: whenever two active addresses modulo to the same slot, they repeatedly evict each other.

19
Q

Why aren’t Fully Associative caches ideal?

A

The circuit overhead is immense, and searching through the entire cache makes lookups slow. (Eviction, at least, occurs only when the cache reaches full capacity.)

20
Q

What are the three (common) types of cache misses we have to consider?

A

Compulsory misses: Upon initially filling the cache, we have no choice but to take from DRAM. Unavoidable, not worth profiling.

Capacity misses: When the data was in the cache, but got evicted by another block due to capacity limits.

Conflict/Collision misses: When the data was in the cache, but got evicted by another block occupying the same slot.

21
Q

In a 2-way set-associative cache capable of holding 256 cache blocks, how many caches are in parallel? How many blocks do those caches modulo over?

A

2 and 128.
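The arithmetic behind this answer is just a division; a quick sanity check (variable names are my own):

```python
# An N-way set-associative cache with B total blocks behaves like N
# direct-mapped "ways" in parallel, each indexing (modulo) over B / N sets.

total_blocks = 256
ways = 2
sets = total_blocks // ways   # each way modulos addresses over this many sets
print(ways, sets)             # 2 parallel ways, 128 sets
```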

22
Q

You are designing a CPU with a 37-bit address bus, a 29-stage pipeline, and a Harvard architecture with separate 13-bit instruction and 19-bit data buses. The CPU uses a three-level cache hierarchy where L1 cache has a block size of 37 bytes, L2 cache is fully associative with 23-way set associativity, and L3 cache uses a victim cache policy that dynamically reconfigures its 5-level prefetcher based on the Fibonacci sequence.

If the CPU’s branch predictor mispredicts on prime-numbered cycles and the instruction decoder can only decode instructions with Hamming weights greater than 5, what is the expected CPI (Cycles Per Instruction) when executing a Fibonacci sequence generator written in Brainfuck, assuming each ‘+’ operation results in a TLB miss?

A

It depends.

23
Q

DEFINE: Replacement policy.

A

The strategy the cache uses to decide which of the candidate cache blocks (e.g. the parallel ways within a set) to evict.

24
Q

What is the optimal (if impossible) replacement policy?

A

Evict whichever block will be used furthest in the future (Bélády's optimal policy). Impossible in practice, since it requires knowing the future.

25
Q

What is a replacement policy that isn’t perfect but works well, thanks to temporal locality?

A

Least Recently Used (LRU): Evict the block whose most recent use is furthest in the past.
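LRU can be sketched with an ordered map that tracks recency. A minimal sketch, assuming a toy capacity of 2; the class and names are illustrative, not a real cache controller:

```python
from collections import OrderedDict

# Tiny LRU model: on each access, move the block to the "most recent" end;
# when full, evict from the "least recent" end.

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # ordered from least to most recently used

    def access(self, block):
        hit = block in self.blocks
        if hit:
            self.blocks.move_to_end(block)       # refresh recency on a hit
        else:
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used
            self.blocks[block] = True
        return hit

c = LRUCache(2)
c.access("A"); c.access("B")   # cache now holds A, B
c.access("A")                  # hit; A becomes most recent
c.access("C")                  # full: evicts B (least recently used), not A
```

Because temporal locality says recently used blocks will likely be used again, evicting the least recently used one is usually a good bet.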

26
Q

Caches often employ large blocks for storage. Why is this?

A

Reading DRAM bursts out at least a whole row (page) anyway, so storing a large block costs little extra and exploits spatial locality. We can also subdivide blocks as we please.

27
Q

What is the equation for AMAT?

A

Average Memory Access Time =
Hit Time + Miss Penalty * Miss Ratio.

28
Q

Given a CPU with a CPI of 1 and a clock rate of 4 GHz, if main memory takes 100ns to access, L3 cache takes 25ns to access, L2 cache takes 5ns to access, and L1 cache takes 0ns to access…

If 5% of instructions are L1 misses, 2% are L2 misses, and 0% are L3 misses, what is the effective CPI of this processor’s memory instructions, or AMAT?

A

4 GHz converts to 0.25ns clock period.

The miss penalties are:
L1 miss: 5ns/0.25ns = 20c
L2 miss: 25ns/0.25ns = 100c
L3 miss: 100ns/0.25ns = 400c

CPI = 1 + 0.05 * 20 + 0.02 * 100 + 0 * 400 = 4 CPI

29
Q

Given a CPU with a CPI of 1 and a clock rate of 4 GHz, if main memory takes 100ns to access, L3 cache takes 25ns to access, L2 cache takes 5ns to access, and L1 cache takes 0ns to access…

If I got rid of all three caches, what is the effective CPI of this processor’s memory instructions, or AMAT?

A

4 GHz converts to 0.25ns clock period.

The miss penalties are:
Every access: 100ns/0.25ns = 400c

CPI = 1 + 1.00 * 400 = 401 CPI

30
Q

Given a CPU with a CPI of 1 and a clock rate of 4 GHz, if main memory takes 100ns to access, L3 cache takes 25ns to access, L2 cache takes 5ns to access, and L1 cache takes 0ns to access…

If 10% of instructions are L1 misses, but I don’t have an L2 or L3 cache, what is the effective CPI of this processor’s memory instructions, or AMAT?

A

4 GHz converts to 0.25ns clock period.

The miss penalties are:
L1 miss: 100ns/0.25ns = 400c

CPI = 1 + 0.10 * 400 = 41 CPI
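The worked examples above all apply the same AMAT formula from card 27. A quick check of the two cached scenarios (the helper function and its names are my own):

```python
# Effective CPI = base CPI + sum(miss_rate * miss_penalty) over each level.
# Penalties in cycles at 4 GHz (0.25 ns clock period):
#   L1 miss -> 5ns  = 20c, L2 miss -> 25ns = 100c, L3 miss -> 100ns = 400c.

def effective_cpi(base_cpi, misses):
    """misses: list of (miss_rate, miss_penalty_in_cycles) pairs."""
    return base_cpi + sum(rate * penalty for rate, penalty in misses)

# Full hierarchy: 5% L1 misses, 2% L2 misses, 0% L3 misses.
full = effective_cpi(1, [(0.05, 20), (0.02, 100), (0.00, 400)])

# L1 only: 10% of instructions miss straight to DRAM (400 cycles).
l1_only = effective_cpi(1, [(0.10, 400)])

print(full, l1_only)
```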

31
Q

DEFINE: Cache coherency.

A

The property that a read from an address must return the most recently written value, even when multiple cores keep their own cached copies. This is important for multicore systems.

32
Q

To maintain cache coherency, when a cache line is written by a core, what needs to happen?

A

All other instances of that cache line need to be invalidated.