Lectures 9 & 10 - The Memory Hierarchy Flashcards

1
Q

Which caches are separate?

A

The L1 data and L1 instruction caches. This allows data memory and instruction memory to be accessed in the same cycle, and each cache's characteristics can be optimised for its contents.

2
Q

Which caches are unified?

A

L2/L3 caches are typically unified. This allows the proportion of cache capacity dedicated to instructions and data to be adjusted dynamically according to program requirements.

3
Q

Blocks or Cache Lines

A

Blocks help us exploit the spatial locality of data. On a cache miss we must load at least one block of data from main memory into the cache.

4
Q

Block Size - Optimal

A

A larger block size allows us to better exploit spatial locality, which reduces the miss rate.

However, for a fixed cache size, increasing the block size decreases the number of different blocks that can be stored, which increases the miss rate.

So a block size that is too large increases the miss rate: the optimal block size balances these two effects.

5
Q

Direct Mapped Cache

A

Each block has only one place it can be stored in the cache. Low access time, but it may suffer from many collisions, meaning cache lines may be repeatedly evicted even while other entries sit free.
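
A rough C sketch of the address arithmetic behind this mapping; the geometry (64-byte lines, 256 sets) is an illustrative assumption, not a figure from the lectures.

#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 64   /* bytes per cache line (assumed) */
#define NUM_SETS   256  /* lines in the direct-mapped cache (assumed) */

/* Each block maps to exactly one line: */
static uint32_t cache_index(uint32_t addr)
{
    return (addr / BLOCK_SIZE) % NUM_SETS;
}

/* The tag distinguishes blocks that share an index: */
static uint32_t cache_tag(uint32_t addr)
{
    return addr / (BLOCK_SIZE * NUM_SETS);
}

int main(void)
{
    /* 0x12340 and 0x16340 differ only in their tags, so they
     * collide on the same line and would evict each other. */
    printf("%u %u\n", cache_index(0x12340), cache_index(0x16340));
    printf("%u %u\n", cache_tag(0x12340), cache_tag(0x16340));
    return 0;
}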

6
Q

Set Associative Cache

A

Each block can be stored in any of n different places in the cache (the n ways of its set).

7
Q

Fully associative cache (using CAMs; also known as a CAM-tag or CAM-RAM cache)

A

Highly associative caches are often seen in embedded systems where energy consumption is a major concern. They reduce miss rates, avoiding the high cost of accessing main memory.

8
Q

Block Replacement Policies

A

Use an invalid line in the set if one exists. Otherwise use an LRU scheme, or approximate LRU with FIFO or not-last-used replacement.

9
Q

Write-Allocate

A

Allocate a block in the cache and then perform the write (typically paired with write-back).

10
Q

No-write allocate

A

Just write to the lower-level memory, without allocating a block in the cache (typically paired with write-through).

11
Q

Write through

A

Write to both cache and lower level of memory when we perform a write.

12
Q

Write Back

A

Only write dirty cache blocks to lower level of memory when they are replaced.

13
Q

Average Memory Access Time

A

AMAT = Hit Time + Miss Rate × Miss Penalty
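
For example, assuming a 1-cycle hit time, a 5% miss rate and a 100-cycle miss penalty (illustrative figures, not from the lectures): AMAT = 1 + 0.05 × 100 = 6 cycles.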

14
Q

Compulsory Cache Miss

A

Generated when a block is brought into the cache for the first time.

15
Q

Capacity

A

Capacity misses are produced when the number of blocks required by a program exceeds the capacity of the cache.

16
Q

Conflict

A

Misses caused by many blocks mapping to the same set.

17
Q

Write Buffer

A

Hides the lower throughput of the underlying memory system; particularly useful for a write-through cache. It allows loads to bypass buffered stores.

18
Q

L1 Cache

A

Optimised for fast access times, as cache hits will be very common. L1 caches tend to be small (hence fast) with lower associativity.

19
Q

L2/L3 Cache

A

Optimised for a low miss rate, to minimise off-chip/DRAM accesses. They are larger, with higher associativity.

20
Q

Multi-Level Inclusion

A

All data held in the L1 is also held in the L2, etc. Desirable because consistency between I/O and the caches can be determined by checking just the L2.

Drawbacks:

We may want different block sizes at each level; inclusion is still possible.

If the L2 is not much larger than the L1, exclusion may be the better policy.

Not required for instruction caches.

21
Q

Critical Word First

A

The processor often requires only a single word from a cache block, so the required word is read first, allowing the processor to continue execution while the rest of the block is fetched.

22
Q

Sub-block placement

A

Divides each cache line into sub-blocks that can be transferred individually. This allows the size of the tag RAM to be kept small without incurring the large miss penalties that larger blocks cause.

23
Q

Victim Cache

A

A small fully associative cache used in conjunction with a direct-mapped cache. It holds lines displaced from the L1 cache. The processor checks both the L1 and the victim cache; data is swapped between them if the victim cache hits.

Aims to reduce conflict misses (20-40% of direct mapped cache misses).

24
Q

Assist Cache

A

Small fully-associative cache used to assist direct-mapped caches.

Data enters the assist cache when the L1 misses; lines are moved out of the assist cache in FIFO order.

Helps eliminate thrashing behaviour associated with direct mapped caches.

25
Q

Non-blocking caches

A

If out-of-order execution is permitted, the processor can continue to execute even while waiting for a cache miss to be serviced. This can effectively reduce the load miss penalty. To implement this we need to keep track of outstanding memory requests by adding special registers to store this information: the miss status holding registers (MSHRs).

26
Q

Programming for caches - simple optimisations

A

Ensure the working set is small enough to fit in the cache.

Reposition the program in memory to optimise instruction cache performance.

27
Q

Loop Interchange

A

Improve spatial locality by reordering nested loops to favour sequential (unit-stride) accesses over non-unit strides.
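
A minimal C sketch of the idea, assuming a row-major 2-D array (names and sizes are illustrative):

#define N 1024

double a[N][N];

/* Before: the inner loop strides down a column, so consecutive
 * iterations touch addresses N * sizeof(double) apart. */
double sum_column_order(void)
{
    double total = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            total += a[i][j];
    return total;
}

/* After interchange: unit-stride accesses consume each cache
 * line in full before moving to the next. */
double sum_row_order(void)
{
    double total = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            total += a[i][j];
    return total;
}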

28
Q

Loop fusion

A

Fuse together loops that access the same data to improve temporal locality.
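
A minimal C sketch (array names and sizes are illustrative):

#define N 4096

double a[N], b[N], c[N];

/* Before: two passes over a[]; by the time the second loop
 * runs, the start of a[] may already have been evicted. */
void separate(void)
{
    for (int i = 0; i < N; i++) b[i] = 2.0 * a[i];
    for (int i = 0; i < N; i++) c[i] = a[i] + 1.0;
}

/* After fusion: a[i] is reused while it is still in the cache. */
void fused(void)
{
    for (int i = 0; i < N; i++) {
        b[i] = 2.0 * a[i];
        c[i] = a[i] + 1.0;
    }
}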

29
Q

Array Merging

A

Guarantee spatial locality by merging separate arrays into a single array of structures. Works well if the data items are related and accessed together.
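
A minimal C sketch, assuming two arrays whose elements are always accessed together (names are illustrative):

#define N 1024

/* Before: parallel arrays; key[i] and val[i] sit in different
 * cache lines. */
int key[N];
int val[N];

/* After merging: each key sits beside its value, so a single
 * cache line fill brings in both. */
struct pair { int key; int val; };
struct pair merged[N];

int sum_entry(int i)
{
    return merged[i].key + merged[i].val;
}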

30
Q

Array Padding

A

Shifts the position of arrays in memory to reduce problematic conflict misses.
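
A minimal C sketch of the idea; the sizes are illustrative assumptions, and whether two globals actually conflict depends on where the linker places them:

/* If a[] and b[] end up a multiple of the cache size apart,
 * a[i] and b[i] map to the same set on every iteration and
 * thrash a direct-mapped cache. */
#define N 8192

double a[N];
double pad[8];   /* one 64-byte line of padding shifts b[]
                    onto different cache sets */
double b[N];

double dot(void)
{
    double s = 0.0;
    for (int i = 0; i < N; i++)
        s += a[i] * b[i];
    return s;
}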

31
Q

Cache Blocking

A

Organise computation so we access sub-matrices that fit into the cache.
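
A minimal C sketch using tiled matrix multiplication, the classic example; the tile size is an illustrative assumption, chosen so that the tiles in use stay cache-resident:

#define N 512
#define T 32   /* tile size; T divides N */

double A[N][N], B[N][N], C[N][N];

/* Each (ii, jj, kk) step multiplies T x T sub-matrices that
 * are small enough to remain in the cache while in use. */
void matmul_blocked(void)
{
    for (int ii = 0; ii < N; ii += T)
        for (int jj = 0; jj < N; jj += T)
            for (int kk = 0; kk < N; kk += T)
                for (int i = ii; i < ii + T; i++)
                    for (int j = jj; j < jj + T; j++) {
                        double s = C[i][j];
                        for (int k = kk; k < kk + T; k++)
                            s += A[i][k] * B[k][j];
                        C[i][j] = s;
                    }
}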

32
Q

Sequential prefetcher

A

One Block Lookahead

On a cache miss: fetch the required block and prefetch the next sequential block into a prefetch buffer; on a subsequent miss, check this buffer first.

33
Q

Tagged sequential prefetcher

A

Adds a tag bit per cache line, which is set when a block is prefetched into the cache. If the bit is set on a cache hit, the next cache line is prefetched.

34
Q

Strided prefetch

A

Detect regular (strided) access patterns and prefetch data accordingly.

35
Q

Reference Prediction Table

A

Indexed by the address of the load instruction. It stores the last address referenced by the load, which is used to calculate and store the delta between subsequent references. If the stride between three such misses is constant, a strided access pattern is assumed and the delta is used to prefetch cache blocks.
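
A simplified C sketch of one entry's update rule as described above; real designs add a small state machine per entry, and the field names here are assumptions:

#include <stdint.h>

struct rpt_entry {
    uint64_t last_addr;  /* last address referenced by this load */
    int64_t  delta;      /* stride between the last two references */
    int      confirmed;  /* consecutive times the delta repeated */
};

/* Called on each reference by the load that indexes this entry.
 * Returns a prefetch address, or 0 while no stable stride exists. */
uint64_t rpt_update(struct rpt_entry *e, uint64_t addr)
{
    int64_t d = (int64_t)(addr - e->last_addr);
    e->confirmed = (d == e->delta) ? e->confirmed + 1 : 0;
    e->delta = d;
    e->last_addr = addr;
    /* Three references with a constant stride give two equal
     * deltas: assume a strided pattern and prefetch ahead. */
    return (e->confirmed >= 1) ? addr + (uint64_t)d : 0;
}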

36
Q

Helper Thread

A

A microthread run in parallel with the main thread to initiate memory requests ahead of time. It can implement a complex prefetch algorithm based on the addresses of loads that miss.