Intro Caches Theory Flashcards

1
Q

What is the main goal of a cache memory?

A

To increase the performance of a computer through the memory system, in order to:

  • Give the user the illusion of a memory that is simultaneously large and fast
  • Provide data to the processor at high frequency
2
Q

Why does memory hierarchy design become more crucial with multicore processors?

A

Due to the growth in aggregate peak bandwidth demand as the number of cores increases

3
Q

What is temporal locality?

A

When a memory element is referenced, it tends to be referenced again soon

4
Q

What is spatial locality?

A

When a memory element is referenced, other memory elements whose addresses are close by tend to be referenced soon

5
Q

How do caches exploit both types of locality?

A

They exploit temporal locality by keeping the contents of recently accessed memory locations,

and spatial locality by fetching blocks of data around recently accessed memory locations

6
Q

What is the minimum unit of data that can be copied into the cache?

A

The block, or cache line

7
Q

What is the relationship between the block size in the cache and the word size in main memory, and what is the rationale behind it?

A

The block size in the cache must be a multiple of the word size in main memory. The rationale is to better exploit spatial locality.

8
Q

How do we calculate the number of blocks in a cache?

A

We divide the cache size by the block size: Number of blocks = Cache Size / Block Size
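
A minimal sketch of this calculation in Python (the sizes below are illustrative, not from the source):

cache_size = 64 * 1024   # 64 KByte
block_size = 16          # 16-Byte blocks (cache lines)

num_blocks = cache_size // block_size   # Number of blocks = Cache Size / Block Size
print(num_blocks)                       # 4096 blocks (4K)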

9
Q

Define a cache hit

A

It is when the requested data is found in one of the cache blocks

10
Q

Define a cache miss and describe how it is handled

A

It is when the requested data is not found in any of the cache blocks. In case of a miss, we stall the CPU, request the block from main memory, copy the block into the cache, and repeat the cache access

11
Q

What is the hit rate and how do we calculate it?

A

It is the fraction of memory accesses that find the data in the upper level, with respect to the total number of memory accesses.

It is calculated as the number of hit accesses divided by the total number of memory accesses

12
Q

What is the hit time?

A

It is the time to access the data in the upper level of the memory hierarchy

13
Q

Define a miss rate

A

It is the number of memory accesses that do not find the data in the upper level, with respect to the total number of memory accesses. It is therefore calculated as the number of misses divided by the total number of memory accesses

14
Q

Define the miss time and explain how it is calculated

A

The miss time is the sum of the hit time and the miss penalty, where the miss penalty is the time needed to access the lower level and to replace the block in the upper level

15
Q

What is the average memory access time and how is it calculated?

A

It is the sum of the hit time and the miss time, each weighted by the corresponding hit rate and miss rate. It simplifies to the hit time plus the miss rate times the miss penalty: AMAT = Hit Time + Miss Rate * Miss Penalty
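
As a sketch, the formula translates directly into Python (the numeric values below are illustrative assumptions, not from the source):

def amat(hit_time, miss_rate, miss_penalty):
    # Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty
    return hit_time + miss_rate * miss_penalty

# Illustrative: 1-cycle hit time, 5% miss rate, 100-cycle miss penalty
print(amat(hit_time=1, miss_rate=0.05, miss_penalty=100))   # 6.0 cycles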

16
Q

Based on how the average memory access time is calculated, how can we improve cache performance?

A

We can improve cache performance by reducing the hit time, the miss rate, or the miss penalty

17
Q

What is the rationale behind the division of the cache into an instruction cache and a data cache?

A

Since instructions exhibit higher spatial locality, the miss rate of an instruction cache is much lower than the miss rate of a data cache

18
Q

Define the valid bit (and its relation to the bootstrap process)

A

The valid bit indicates whether the content of the cache block (line) is valid or not. At bootstrap, all the entries in the cache are marked as invalid.

19
Q

Define the cache tag

A

It contains the value that uniquely identifies the memory address corresponding to the stored data

20
Q

Where can a block be placed in the upper level? Given the address of a block in main memory, where can the block be placed in the cache?

A

The correspondence between the memory address and the cache location depends on the cache structure, which can be direct mapped, fully associative, or N-way set-associative

21
Q

Describe the direct mapped cache and its functionality. Describe how the memory address is composed.

A

In a direct mapped cache, each memory location corresponds to one and only one cache location.

The block address is composed of a tag and an index. The full memory address also contains a block offset to select the specific data requested within the block.

22
Q

Describe how the lookup of a given address is done in a direct mapped cache structure

A

First, the cache uses the index of the requested memory address, which corresponds to one and only one cache line. It then compares the tag to check whether the stored data corresponds to the requested address, checks the validity of the data, and finally uses the block offset to extract the specific requested data from the memory block.
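
The lookup steps can be sketched in Python as a simplified model (the default field widths match the direct mapped example in card 41; the (valid, tag, block) cache representation is an assumption for illustration):

def dm_lookup(cache, addr, offset_bits=4, index_bits=12):
    # cache is a list of (valid, tag, block) entries, one per index
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)   # step 1: select the line
    tag = addr >> (offset_bits + index_bits)
    offset = addr & ((1 << offset_bits) - 1)

    valid, stored_tag, block = cache[index]
    if valid and stored_tag == tag:   # steps 2-3: tag match and valid bit
        return block[offset]          # step 4: extract the requested data
    return None                       # miss: go to the lower level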

23
Q

Describe a fully associative cache and its memory address composition

A

In a fully associative cache, a memory block can be placed in any position of the cache; therefore, all the cache blocks must be checked when searching for a block.

The memory address is composed of the block address, which contains only the tag used to verify the correspondence, and the block offset

24
Q

Describe how the lookup of a given address is done in a fully associative cache structure

A

The fully associative cache first checks the tag of every cache line against the tag of the requested memory address, then checks the validity of the matching data, and finally uses the block offset to extract the specific data requested

25
Q

Describe the N-way set associative cache and its memory address composition

A

The cache is composed of sets, each set containing N blocks; the number of sets depends on the cache size, the block size, and N. A memory block can be placed in any block of its set, so the search must be done on all the blocks of the set. The memory address is composed of a tag, an index that selects the set, and a block offset.

26
Q

Describe how the lookup of a given address is done in an N-way set associative cache structure

A

The cache uses the index of the requested block address to identify the set in which that memory address could be stored. Inside this set it compares the tag of every cache line with the requested tag, then checks the validity of the line, and finally uses the block offset to extract the specific data requested
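
Extending the previous sketch to a set-associative organization (again a simplified, assumed model; the default field widths match the 4-way example in card 43):

def sa_lookup(cache_sets, addr, offset_bits=2, index_bits=8):
    # cache_sets is a list of sets; each set is a list of (valid, tag, block) ways
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)   # select the set
    tag = addr >> (offset_bits + index_bits)
    offset = addr & ((1 << offset_bits) - 1)

    for valid, stored_tag, block in cache_sets[index]:   # search all N ways of the set
        if valid and stored_tag == tag:                  # tag match + valid bit
            return block[offset]                         # extract the requested data
    return None                                          # miss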

27
Q

How does increasing the associativity impact cache performance? What are the advantages and disadvantages?

A

Advantages: reduction of the miss rate

Disadvantages: higher implementation cost, increased hit time

28
Q

How is a block found if it is in the upper level?

A

To address the problem of block identification, we need to compare tag bits.

In a direct mapped cache, we identify the block position by the index (since every memory address corresponds to a specific cache location), then compare the tag of the block and verify the valid bit

In set-associative mapping, we identify the set in which the memory address could be placed, then compare the tags of the blocks in the set and check the valid bit

In fully associative mapping, we compare the tags of every block in the cache and verify the valid bit

29
Q

Which block should be replaced on a miss? Describe the differences in rationale between the possible cache structures

A

In case of a miss in a fully associative cache, we need to decide which block to replace: any block is a potential candidate for replacement

If the cache is set associative, we need to select among the blocks in the given set

For a direct mapped cache there is only one candidate to be replaced; therefore, there is no need for any block replacement strategy.

30
Q

What are the main strategies for block replacement?

A

Random, least recently used (LRU), and FIFO

31
Q

What happens on a write?

A

There are two possibilities:

Write-through, in which the information is written both to the block in the cache and to the block in the lower-level memory

Write-back, in which the information is written only to the block in the cache. The modified cache block is written to the lower-level memory only when it is replaced due to a miss.
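
A minimal sketch contrasting the two policies (the Line class and the write-hit assumption are illustrative, not from the source):

class Line:
    def __init__(self):
        self.data = {}
        self.dirty = False

def write(line, memory, addr, value, policy="write-back"):
    # Toy write handler, assuming the addressed block is already in cache (a write hit)
    line.data[addr] = value
    if policy == "write-through":
        memory[addr] = value   # lower-level memory updated immediately
    else:                      # write-back
        line.dirty = True      # block becomes dirty; memory updated only on replacement

def evict(line, memory):
    # On replacement, a write-back cache flushes the dirty block to the lower level
    if line.dirty:
        memory.update(line.data)
        line.dirty = False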

32
Q

The question “is a block clean or dirty?” applies to which write policy, and what does it mean?

A

It applies to the write-back policy. After a write in cache, the cache block becomes dirty (modified) and main memory contains a different value with respect to the one in cache: main memory and cache are not coherent

33
Q

Compare the two write policies

A

Write-through is simpler to implement, but to be effective it requires a write buffer, so that the CPU does not wait for the lower level of the memory hierarchy. Read misses are cheaper because they never require a write to the lower level, and memory is always up to date.

With a write-back policy, instead, blocks can be written by the processor at the frequency at which the cache, and not the main memory, can accept them. Another positive aspect: multiple writes to the same block require only a single write to main memory

34
Q

For which write policy would a write buffer be necessary, and how would it work?

A

A write buffer would be necessary in case of a write-through policy.

The basic idea is to insert a buffer so that the CPU does not wait for lower-level memory accesses. This way, the processor writes data to the cache and to the write buffer, and the memory controller writes the contents of the buffer to memory.

35
Q

What is a problem of a write buffer?

A

Write buffer saturation: if the CPU generates writes faster than the memory can absorb them, the buffer fills up and the CPU must stall

36
Q

What are the possible options a cache can take on a write miss?

A

It can use the write allocate policy, where it allocates a new cache line in the cache and then performs the write (the allocation is followed by the write in cache).

Or it can use a no-write-allocate policy: simply send the write data to the lower-level memory and do not allocate a new cache line

37
Q

What are the most common combinations of write policies?

A

Write-back caches use the write allocate option, hoping that the next writes to the block will again be done in cache

Write-through caches use the no-write-allocate option, since the next writes to the block will again go to memory

38
Q

How can we reduce the miss penalty?

A

By adding a second-level cache.

This way we would have an L1 cache small enough to match the fast CPU cycle time,

and an L2 cache large enough to capture many accesses that would otherwise go to main memory, reducing the effective miss penalty

39
Q

How would we calculate the average memory access time for a memory hierarchy with two cache levels?

A

It would be calculated as the hit time of L1, plus the miss rate of L1 times the miss penalty of L1.

But the miss penalty of L1 is equal to the hit time of L2, plus the miss rate of L2 times the miss penalty of L2.

So the final result is:

AMAT = Hit Time L1 + Miss Rate L1 * (Hit Time L2 + Miss Rate L2 * Miss Penalty L2)
     = Hit Time L1 + Miss Rate L1 * Hit Time L2 + Miss Rate L1 * Miss Rate L2 * Miss Penalty L2
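
The nested formula translates directly into Python (the numeric values are illustrative assumptions):

def amat_two_level(hit_l1, miss_rate_l1, hit_l2, miss_rate_l2, miss_penalty_l2):
    # AMAT = Hit Time L1 + Miss Rate L1 * (Hit Time L2 + Miss Rate L2 * Miss Penalty L2)
    return hit_l1 + miss_rate_l1 * (hit_l2 + miss_rate_l2 * miss_penalty_l2)

# Illustrative: 1-cycle L1 hit, 5% L1 miss rate, 10-cycle L2 hit,
# 20% L2 local miss rate, 100-cycle L2 miss penalty
print(amat_two_level(1, 0.05, 10, 0.20, 100))   # 2.5 cycles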

40
Q

What is the difference between a local and global miss rate?

A

The local miss rate is the number of misses in the cache divided by the total number of memory accesses to that cache.

The global miss rate, instead, is the number of misses in the cache divided by the total number of memory accesses generated by the CPU.

For the first-level cache, the global miss rate equals the local one; for the lower levels, the global miss rate is the product of the local miss rates of the upper levels and the current level.

In other words, the global miss rate of the last cache level indicates what fraction of the memory accesses from the CPU go all the way to main memory.
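
A small numeric sketch of the relationship (the counts are illustrative assumptions):

cpu_accesses = 1000   # accesses generated by the CPU (all reach L1)
l1_misses = 50        # L2 sees only these accesses
l2_misses = 20

local_l1  = l1_misses / cpu_accesses   # 0.05, equal to the global L1 miss rate
local_l2  = l2_misses / l1_misses      # 0.40
global_l2 = l2_misses / cpu_accesses   # 0.02 = local_l1 * local_l2
# global_l2 is the fraction of CPU accesses that go all the way to main memory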

41
Q

Direct Mapped Cache Example

• Memory address composed of N = 32 bit

• Cache Size 64 K Byte

• Block Size 128 bit = 16 Byte

Calculate the Structure of the Memory Address

A

Block Size 128 bit = 16 Byte => 4 bit Block Offset

Number of blocks = Cache Size / Block Size = 64 KByte / 16 Byte = 4K blocks = 2^12 blocks => 12-bit Index

Structure of the memory address:
• Byte Offset: BO = 2 bit
• Word Offset: WO = 2 bit
• Index: 12 bit
• Tag: (32 - 12 - 4) bit = 16 bit

42
Q

Fully Associative Cache Example

• Memory address composed of N = 32 bit

• Cache size 256 Byte

• Block size 128 bit = 16 Byte

Calculate the Structure of the Memory Address

A

Block size 128 bit = 16 Byte => 4 bit Block Offset

Number of blocks = Cache Size / Block Size = 256 Byte / 16 Byte = 16 blocks

Structure of the memory address:
• Byte Offset: BO = 2 bit
• Word Offset: WO = 2 bit
• Tag: (32 - 4) bit = 28 bit

43
Q

4-way Set Associative Cache Example

• Memory address: N = 32 bit

• Cache Size 4KByte

• Block size 32 bit = 4 Byte

Calculate the Structure of the Memory Address

A

Block size 32 bit = 4 Byte => 2 bit Block Offset

Number of blocks = Cache Size / Block Size = 4 KByte / 4 Byte = 1K blocks

Number of sets = Cache Size / (Block Size x N) = 4 KByte / (4 Byte x 4) = 256 sets = 2^8 sets => 8-bit Index

Structure of the memory address:
• Byte Offset: BO = 2 bit
• Word Offset: WO = 0 bit
• Index: 8 bit
• Tag: (32 - 8 - 2) bit = 22 bit
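
The three worked examples above share the same arithmetic; a Python sketch that reproduces all of them (ways=1 means direct mapped, ways=None means fully associative):

from math import log2

def address_fields(addr_bits, cache_size, block_size, ways):
    # Returns the (tag, index, block offset) bit widths
    offset = int(log2(block_size))
    num_blocks = cache_size // block_size
    num_sets = 1 if ways is None else num_blocks // ways
    index = int(log2(num_sets))
    return addr_bits - index - offset, index, offset

print(address_fields(32, 64 * 1024, 16, ways=1))   # (16, 12, 4) direct mapped
print(address_fields(32, 256, 16, ways=None))      # (28, 0, 4)  fully associative
print(address_fields(32, 4 * 1024, 4, ways=4))     # (22, 8, 2)  4-way set associative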

44
Q

What are the categories of cache misses?

A

Compulsory misses, capacity misses, and conflict misses

45
Q

What are compulsory misses?

A

Misses caused by the first access to a block, which cannot be in the cache yet; they are also called cold start misses or first reference misses

46
Q

What are capacity misses?

A

If the cache cannot contain all the blocks needed during the execution of a program, capacity misses will occur due to blocks being replaced and later retrieved.

They can be reduced by increasing the cache size

47
Q

What are conflict misses?

A

If the block placement strategy is set-associative or direct mapped, conflict misses will occur because a block can be replaced and later retrieved when other blocks map to the same location in the cache.

Conflict misses decrease as associativity increases

48
Q

What are the ways to reduce the miss rate, and what are their consequences?

A

Increase cache capacity, which in return increases hit time, area, power consumption, and cost.

Increase block size, which reduces the miss rate up to the point where the block size becomes too large with respect to the cache size; it reduces compulsory misses thanks to spatial locality.

Higher associativity, which comes at the price of complexity.

By introducing multibanked caches

By introducing victim caches

Via pseudo-associativity and way prediction

By hardware prefetching of instructions and data

By software prefetching of data

By compiler optimizations

49
Q

How can we reduce the miss rate via a larger block size, and what are the drawbacks of this technique?

A

Increasing the block size reduces the miss rate up to the point where the block size becomes too large with respect to the cache size.

A larger block size reduces compulsory misses by taking advantage of spatial locality.

Main drawbacks:
• Larger blocks increase the miss penalty
• Larger blocks reduce the number of blocks in the cache, so they increase conflict misses (and even capacity misses) if the cache is small.

50
Q

How can we reduce the miss rate via higher associativity?

A

Higher associativity decreases conflict misses.

Main drawback:
• The added complexity increases hit time, area, power consumption, and cost.

51
Q

How can we reduce the miss rate using multibanked caches?

A

Multibanked caches introduce a sort of associativity:

• Organize the cache as independent banks to support simultaneous accesses and increase the cache bandwidth

• Interleave the banks according to the block address, so as to access each bank in turn (sequential interleaving)

52
Q

How can we reduce the miss rate using victim caches?

A

A victim cache is a small fully associative cache used as a buffer to hold data discarded from the cache, to better exploit temporal locality.

• The victim cache is placed between the cache and its refill path towards the next lower level in the hierarchy

• The victim cache is checked on a miss to see if it has the required data before going to the lower-level memory

• If the block is found in the victim cache, the victim block and the cache block are swapped

53
Q

How can we reduce the miss rate using hardware prefetching?

A

Basic idea: to exploit locality, pre-fetch the next instructions (or data) before they are requested by the processor.
• Pre-fetching can be done into the cache or into an external stream buffer

54
Q

Give an example of how hardware prefetching works

A

Instruction pre-fetch in the Alpha AXP 21064:
• Fetch two blocks on a miss: the requested block (i) and the next consecutive block (i+1)
• The requested block is placed in the cache, while the next block goes into the instruction stream buffer
• On a miss in the cache that hits in the stream buffer, move the stream buffer block into the cache and pre-fetch the next block (i+2)

55
Q

How can we reduce the miss rate using software prefetching?

A

Compiler-controlled pre-fetching (the compiler can help in reducing useless pre-fetches): the compiler inserts pre-fetch LOAD instructions to load data into registers/cache before they are needed

56
Q

How can we reduce the miss rate by compiler optimizations?

A

Basic idea: apply profiling to software applications, then use the profiling information to apply code transformations.

Managing instructions:
• Reorder instructions in memory so as to reduce conflict misses
• Use profiling to look at instruction conflicts

57
Q

How can we reduce the miss rate by applying way prediction?

A

Way prediction is a technique used in cache memory systems to reduce the miss rate and improve access times. It helps in predicting the “way” in which a particular piece of data might be found, thus enhancing the efficiency of cache lookups. Here’s how way prediction can reduce the miss rate:

Way Prediction: Modern caches are often set-associative, meaning each set in the cache can have multiple ways (slots) where data can be stored. Way prediction involves guessing which way within the set will hold the requested data, aiming to reduce the time and energy needed to search through all possible ways.

Faster Access Times: By predicting the correct way:
- If the prediction is correct, the cache can quickly access the data, reducing the time spent on cache lookups.
- Faster access times can reduce the penalty of cache misses, indirectly contributing to performance improvement.

Reduced Conflict Misses: Way prediction can help mitigate conflict misses:
- Conflict misses occur when multiple pieces of data compete for the same cache line or set.
- By predicting and distributing data across different ways more effectively, way prediction can reduce the chances of such conflicts, thereby lowering the miss rate.

58
Q

How can we reduce the miss rate using pseudo-associativity?

A

Pseudo-associativity in direct mapped caches: divide the cache into two banks, introducing a sort of associativity.
• If the bank prediction is correct => hit time
• If there is a misprediction on the first bank, check the other bank: if the block is there, we have a pseudo hit (slow hit) and the bank predictor is changed
• Otherwise, go to the lower level of the hierarchy (miss penalty)

59
Q

What are some of the instruction- and data-managing optimizations that compilers do?

A

Managing data:

a) Merging arrays: improve spatial locality by using a single array of compound elements instead of two separate arrays (to operate on data in the same cache block)

b) Loop interchange: improve spatial locality by changing the loop nesting so as to access data in the order they are stored in memory; the re-ordering maximizes re-use of the data in a cache block (see the sketch after this list)

c) Loop fusion: improve spatial locality by combining two independent loops that have the same looping structure and overlap in some variables

d) Loop blocking: improve temporal locality by repeatedly accessing “sub-blocks” of data instead of accessing entire columns or rows
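
A sketch of loop interchange in Python (the array and sizes are illustrative; Python lists are not laid out contiguously like C arrays, so this only illustrates the access pattern):

N = 1024
x = [[0.0] * N for _ in range(N)]

# Before: column-major traversal, poor spatial locality in a row-major layout
for j in range(N):
    for i in range(N):
        x[i][j] = 2 * x[i][j]

# After loop interchange: row-major traversal matches the storage order,
# so consecutive accesses fall within the same cache block
for i in range(N):
    for j in range(N):
        x[i][j] = 2 * x[i][j]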

60
Q

How can we reduce the miss penalty?

A

Give priority to read misses over writes

Sub-block placement

Early restart and critical word first

Non-blocking caches (hit under miss, miss under miss)

Introducing a second-level cache

Merging write buffers; victim caches

61
Q

How can we give priority to read misses over write misses, and what are the drawbacks of this technique?

A

Basic idea: give higher priority to read misses over writes to reduce the miss penalty. This is done by introducing a write buffer between the CPU and the main memory.

Drawback: this approach can complicate memory accesses, because the write buffer might hold the updated value of a memory location needed on a read miss => a RAW hazard through memory.

62
Q

How does sub-block placement work, and what are the drawbacks of this technique?

A

We don’t have to load the full block on a miss: we move sub-blocks instead.

We need a valid bit per sub-block to indicate validity.

Drawback: it does not exploit spatial locality enough

63
Q

How do the early restart and critical word first techniques work?

A

Usually the CPU needs just one word of the block on a miss.

Basic idea: Don’t wait for full block to be loaded before restarting CPU (by sending the requested missed word)

Early restart: Request the words in normal order from memory, but as soon as the requested word of the block arrives, send it to the CPU to let the CPU continue execution, while filling in the rest of the words in the cache block;

Critical Word First: Request the missed word first from memory and send it to the CPU as soon as it arrives to let the CPU continue execution, while filling the rest of the words in the cache block. This approach is also called requested word first.

64
Q

How do the non-blocking techniques work, and what are their types?

A

A non-blocking cache (or lockup-free cache) allows the data cache to continue to supply cache hits during a previous miss (“hit under miss”).

“Hit under miss” reduces the effective miss penalty by doing useful work during a miss instead of stalling the CPU:
• It requires an out-of-order execution CPU, since the CPU must not stall on a cache miss. For example, the CPU can continue fetching instructions from the I-cache while waiting for the D-cache to return the missing data.
• This approach is a sort of “out-of-order” pipelined memory access.

“Miss under miss” may further lower the effective miss penalty by overlapping multiple misses:
• It requires multiple memory banks to serve multiple misses.
• It significantly increases the complexity of the cache controller, as there can be multiple simultaneous memory accesses.
• The Pentium Pro allows 4 outstanding memory misses.

65
Q

How do we improve the hit time?

A

Via a small and simple L1 cache

By avoiding address translation during cache indexing: if the index is a physical part of the address, the tag access can start in parallel with the address translation, so that the comparison is done against the physical tag

Via pipelined writes. Basic idea: pipeline the tag check and the cache data update as separate stages, so that the tag check of the current write overlaps with the cache update of the previous write. The resulting “delayed write buffer” must be checked on reads: either the write completes or the read is served from the write buffer

Via small sub-blocks for write-through caches