Intro Caches Theory Flashcards

1
Q

What is the main goal of a cache memory?

A

To increase the performance of a computer through the memory system, in order to:

  • Give the user the illusion of a memory that is simultaneously large and fast
  • Provide data to the processor at high frequency
2
Q

Why does memory hierarchy design become more crucial with multicore processors?

A

Due to the growth in aggregate peak bandwidth demand as the number of cores increases

3
Q

What is temporal locality?

A

When a memory element is referenced, it tends to be referenced again soon

4
Q

What is spatial locality?

A

When a memory element is referenced, other memory elements whose addresses are close by tend to be referenced soon

5
Q

How do caches exploit both types of locality?

A

They exploit temporal locality by keeping the contents of recently accessed memory locations,

and spatial locality by fetching blocks of data around recently accessed memory locations

6
Q

What is the minimum unit of data that can be copied into the cache?

A

The block, or cache line

7
Q

What is the relationship between the block size in the cache and the word size in main memory, and what is the rationale behind it?

A

The block size in the cache must be a multiple of the word size in main memory. The rationale is to better exploit spatial locality.

8
Q

How do we calculate the number of blocks in a cache?

A

We divide the cache size by the block size: Number of blocks = Cache Size / Block Size
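
A minimal sketch of this calculation in Python (the sizes below are illustrative, not from the source):

cache_size = 64 * 1024   # 64 KByte
block_size = 16          # 16-Byte blocks (cache lines)

num_blocks = cache_size // block_size   # Number of blocks = Cache Size / Block Size
print(num_blocks)                       # 4096 blocks (4K)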

9
Q

Define a cache hit

A

It is when the requested data is found in one of the cache blocks

10
Q

Define a cache miss and describe how it is handled

A

It is when the requested data is not found in any of the cache blocks. In case of a miss, we stall the CPU, request the block from main memory, copy the block into the cache, and repeat the cache access

11
Q

What is the hit rate and how do we calculate it?

A

It is the fraction of memory accesses that find the data in the upper level, with respect to the total number of memory accesses.

It is calculated as the number of hit accesses divided by the total number of memory accesses

12
Q

What is the hit time?

A

It is the time to access the data in the upper level of the memory hierarchy

13
Q

Define a miss rate

A

It is the number of memory accesses that do not find the data in the upper level, with respect to the total number of memory accesses. It is therefore calculated as the number of misses divided by the total number of memory accesses

14
Q

Define the miss time and explain how it is calculated

A

The miss time is the sum of the hit time and the miss penalty, where the miss penalty is the time needed to access the lower level and to replace the block in the upper level

15
Q

What is the average memory access time and how is it calculated?

A

It is the sum of the hit time and the miss time, each weighted by the corresponding hit rate and miss rate. It simplifies to the hit time plus the miss rate times the miss penalty: AMAT = Hit Time + Miss Rate * Miss Penalty
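
As a sketch, the formula translates directly into Python (the numeric values below are illustrative assumptions, not from the source):

def amat(hit_time, miss_rate, miss_penalty):
    # Average Memory Access Time = Hit Time + Miss Rate * Miss Penalty
    return hit_time + miss_rate * miss_penalty

# Illustrative: 1-cycle hit time, 5% miss rate, 100-cycle miss penalty
print(amat(hit_time=1, miss_rate=0.05, miss_penalty=100))   # 6.0 cycles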

16
Q

Based on how the average memory access time is calculated, how can we improve cache performance?

A

We can improve cache performance by reducing the hit time, the miss rate, or the miss penalty

17
Q

What is the rationale behind the division of the cache into an instruction cache and a data cache?

A

Since instructions exhibit higher spatial locality, the miss rate of an instruction cache is much lower than the miss rate of a data cache

18
Q

Define the valid bit (and its relation to the bootstrap process)

A

The valid bit indicates whether the content of the cache block (line) is valid or not. At bootstrap, all the entries in the cache are marked as invalid.

19
Q

Define the cache tag

A

It contains the value that uniquely identifies the memory address corresponding to the stored data

20
Q

Where can a block be placed in the upper level? Given the address of a block in main memory, where can the block be placed in the cache?

A

The correspondence between the memory address and the cache location depends on the cache structure, which can be direct mapped, fully associative, or N-way set-associative

21
Q

Describe the direct mapped cache and its functionality. Describe how the memory address is composed.

A

In a direct mapped cache, each memory location corresponds to one and only one cache location.

The block address is composed of a tag and an index. The full memory address also contains a block offset to select the specific data requested within the block.

22
Q

Describe how the lookup of a given address is done in a direct mapped cache structure

A

First, the cache uses the index of the requested memory address, which corresponds to one and only one cache line. It then compares the tag to check whether the stored data corresponds to the requested address, checks the validity of the data, and finally uses the block offset to extract the specific requested data from the memory block.
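
The lookup steps can be sketched in Python as a simplified model (the default field widths match the direct mapped example in card 41; the (valid, tag, block) cache representation is an assumption for illustration):

def dm_lookup(cache, addr, offset_bits=4, index_bits=12):
    # cache is a list of (valid, tag, block) entries, one per index
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)   # step 1: select the line
    tag = addr >> (offset_bits + index_bits)
    offset = addr & ((1 << offset_bits) - 1)

    valid, stored_tag, block = cache[index]
    if valid and stored_tag == tag:   # steps 2-3: tag match and valid bit
        return block[offset]          # step 4: extract the requested data
    return None                       # miss: go to the lower level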

23
Q

Describe a fully associative cache and its memory address composition

A

In a fully associative cache, a memory block can be placed in any position of the cache; therefore, all the cache blocks must be checked when searching for a block.

The memory address is composed of the block address, which contains only the tag used to verify the correspondence, and the block offset

24
Q

Describe how the lookup of a given address is done in a fully associative cache structure

A

The fully associative cache first checks the tag of every cache line against the tag of the requested memory address, then checks the validity of the matching data, and finally uses the block offset to extract the specific data requested

25
Q

Describe the N-way set associative cache and its memory address composition

A

The cache is composed of sets, each set containing N blocks; the number of sets depends on the cache size, the block size, and N. A memory block can be placed in any block of its set, so the search must be done on all the blocks of the set. The memory address is composed of a tag, an index that selects the set, and a block offset.

26
Q

Describe how the lookup of a given address is done in an N-way set associative cache structure

A

The cache uses the index of the requested block address to identify the set in which that memory address could be stored. Inside this set it compares the tag of every cache line with the requested tag, then checks the validity of the line, and finally uses the block offset to extract the specific data requested
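
Extending the previous sketch to a set-associative organization (again a simplified, assumed model; the default field widths match the 4-way example in card 43):

def sa_lookup(cache_sets, addr, offset_bits=2, index_bits=8):
    # cache_sets is a list of sets; each set is a list of (valid, tag, block) ways
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)   # select the set
    tag = addr >> (offset_bits + index_bits)
    offset = addr & ((1 << offset_bits) - 1)

    for valid, stored_tag, block in cache_sets[index]:   # search all N ways of the set
        if valid and stored_tag == tag:                  # tag match + valid bit
            return block[offset]                         # extract the requested data
    return None                                          # miss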

27
Q

How does increasing the associativity impact cache performance? What are the advantages and disadvantages?

A

Advantages: reduction of the miss rate

Disadvantages: higher implementation cost, increased hit time

28
Q

How is a block found if it is in the upper level?

A

To address the problem of block identification, we need to compare tag bits.

In a direct mapped cache, we identify the block position by the index (since every memory address corresponds to a specific cache location), then compare the tag of the block and verify the valid bit

In set-associative mapping, we identify the set in which the memory address could be placed, then compare the tags of the blocks in the set and check the valid bit

In fully associative mapping, we compare the tags of every block in the cache and verify the valid bit

29
Q

Which block should be replaced on a miss? Describe the differences in rationale between the possible cache structures

A

In case of a miss in a fully associative cache, we need to decide which block to replace: any block is a potential candidate for replacement

If the cache is set associative, we need to select among the blocks in the given set

For a direct mapped cache there is only one candidate to be replaced; therefore, there is no need for any block replacement strategy.

30
Q

What are the main strategies for block replacement?

A

Random, least recently used (LRU), and FIFO

31
Q

What happens on a write?

A

There are two possibilities:

Write-through, in which the information is written both to the block in the cache and to the block in the lower-level memory

Write-back, in which the information is written only to the block in the cache. The modified cache block is written to the lower-level memory only when it is replaced due to a miss.
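
A minimal sketch contrasting the two policies (the Line class and the write-hit assumption are illustrative, not from the source):

class Line:
    def __init__(self):
        self.data = {}
        self.dirty = False

def write(line, memory, addr, value, policy="write-back"):
    # Toy write handler, assuming the addressed block is already in cache (a write hit)
    line.data[addr] = value
    if policy == "write-through":
        memory[addr] = value   # lower-level memory updated immediately
    else:                      # write-back
        line.dirty = True      # block becomes dirty; memory updated only on replacement

def evict(line, memory):
    # On replacement, a write-back cache flushes the dirty block to the lower level
    if line.dirty:
        memory.update(line.data)
        line.dirty = False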

32
Q

The question “is a block clean or dirty?” applies to which write policy, and what does it mean?

A

It applies to the write-back policy. After a write in cache, the cache block becomes dirty (modified) and main memory contains a different value with respect to the one in cache: main memory and cache are not coherent

33
Q

Compare the two write policies

A

Write-through is simpler to implement, but to be effective it requires a write buffer, so that the CPU does not wait for the lower level of the memory hierarchy. Read misses are cheaper because they never require a write to the lower level, and memory is always up to date.

With a write-back policy, instead, blocks can be written by the processor at the frequency at which the cache, and not the main memory, can accept them. Another positive aspect: multiple writes to the same block require only a single write to main memory

34
Q

For which write policy would a write buffer be necessary, and how would it work?

A

A write buffer would be necessary in case of a write-through policy.

The basic idea is to insert a buffer so that the CPU does not wait for lower-level memory accesses. This way, the processor writes data to the cache and to the write buffer, and the memory controller writes the contents of the buffer to memory.

35
Q

What is a problem of a write buffer?

A

Write buffer saturation: if the CPU generates writes faster than the memory can absorb them, the buffer fills up and the CPU must stall

36
Q

What are the possible options a cache can take on a write miss?

A

It can use the write allocate policy, where it allocates a new cache line in the cache and then performs the write (the allocation is followed by the write in cache).

Or it can use a no-write-allocate policy: simply send the write data to the lower-level memory and do not allocate a new cache line

37
Q

What are the most common combinations of write policies?

A

Write-back caches use the write allocate option, hoping that the next writes to the block will again be done in cache

Write-through caches use the no-write-allocate option, since the next writes to the block will again go to memory

38
Q

How can we reduce the miss penalty?

A

By adding a second-level cache.

This way we would have an L1 cache small enough to match the fast CPU cycle time,

and an L2 cache large enough to capture many accesses that would otherwise go to main memory, reducing the effective miss penalty

39
Q

How would we calculate the average memory access time for a memory hierarchy with two cache levels?

A

It would be calculated as the hit time of L1, plus the miss rate of L1 times the miss penalty of L1.

But the miss penalty of L1 is equal to the hit time of L2, plus the miss rate of L2 times the miss penalty of L2.

So the final result is:

AMAT = Hit Time L1 + Miss Rate L1 * (Hit Time L2 + Miss Rate L2 * Miss Penalty L2)
     = Hit Time L1 + Miss Rate L1 * Hit Time L2 + Miss Rate L1 * Miss Rate L2 * Miss Penalty L2
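
The nested formula translates directly into Python (the numeric values are illustrative assumptions):

def amat_two_level(hit_l1, miss_rate_l1, hit_l2, miss_rate_l2, miss_penalty_l2):
    # AMAT = Hit Time L1 + Miss Rate L1 * (Hit Time L2 + Miss Rate L2 * Miss Penalty L2)
    return hit_l1 + miss_rate_l1 * (hit_l2 + miss_rate_l2 * miss_penalty_l2)

# Illustrative: 1-cycle L1 hit, 5% L1 miss rate, 10-cycle L2 hit,
# 20% L2 local miss rate, 100-cycle L2 miss penalty
print(amat_two_level(1, 0.05, 10, 0.20, 100))   # 2.5 cycles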

40
Q

What is the difference between a local and global miss rate?

A

The local miss rate is the number of misses in the cache divided by the total number of memory accesses to that cache.

The global miss rate, instead, is the number of misses in the cache divided by the total number of memory accesses generated by the CPU.

For the first-level cache, the global miss rate equals the local one; for the lower levels, the global miss rate is the product of the local miss rates of the upper levels and the current level.

In other words, the global miss rate of the last cache level indicates what fraction of the memory accesses from the CPU go all the way to main memory.
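
A small numeric sketch of the relationship (the counts are illustrative assumptions):

cpu_accesses = 1000   # accesses generated by the CPU (all reach L1)
l1_misses = 50        # L2 sees only these accesses
l2_misses = 20

local_l1  = l1_misses / cpu_accesses   # 0.05, equal to the global L1 miss rate
local_l2  = l2_misses / l1_misses      # 0.40
global_l2 = l2_misses / cpu_accesses   # 0.02 = local_l1 * local_l2
# global_l2 is the fraction of CPU accesses that go all the way to main memory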

41
Q

Direct Mapped Cache Example

• Memory address composed of N = 32 bit

• Cache Size 64 K Byte

• Block Size 128 bit = 16 Byte

Calculate the Structure of the Memory Address

A

Block Size 128 bit = 16 Byte => 4 bit Block Offset

Number of blocks = Cache Size / Block Size = 64 KByte / 16 Byte = 4K blocks = 2^12 blocks => 12-bit Index

Structure of the memory address:
• Byte Offset: BO = 2 bit
• Word Offset: WO = 2 bit
• Index: 12 bit
• Tag: (32 - 12 - 4) bit = 16 bit

42
Q

Fully Associative Cache Example

• Memory address composed of N = 32 bit

• Cache size 256 Byte

• Block size 128 bit = 16 Byte

Calculate the Structure of the Memory Address

A

Block size 128 bit = 16 Byte => 4 bit Block Offset

Number of blocks = Cache Size / Block Size = 256 Byte / 16 Byte = 16 blocks

Structure of the memory address:
• Byte Offset: BO = 2 bit
• Word Offset: WO = 2 bit
• Tag: (32 - 4) bit = 28 bit

43
Q

4-way Set Associative Cache Example

• Memory address: N = 32 bit

• Cache Size 4KByte

• Block size 32 bit = 4 Byte

Calculate the Structure of the Memory Address

A

Block size 32 bit = 4 Byte => 2 bit Block Offset

Number of blocks = Cache Size / Block Size = 4 KByte / 4 Byte = 1K blocks

Number of sets = Cache Size / (Block Size x N) = 4 KByte / (4 Byte x 4) = 256 sets = 2^8 sets => 8-bit Index

Structure of the memory address:
• Byte Offset: BO = 2 bit
• Word Offset: WO = 0 bit
• Index: 8 bit
• Tag: (32 - 8 - 2) bit = 22 bit
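
The three worked examples above share the same arithmetic; a Python sketch that reproduces all of them (ways=1 means direct mapped, ways=None means fully associative):

from math import log2

def address_fields(addr_bits, cache_size, block_size, ways):
    # Returns the (tag, index, block offset) bit widths
    offset = int(log2(block_size))
    num_blocks = cache_size // block_size
    num_sets = 1 if ways is None else num_blocks // ways
    index = int(log2(num_sets))
    return addr_bits - index - offset, index, offset

print(address_fields(32, 64 * 1024, 16, ways=1))   # (16, 12, 4) direct mapped
print(address_fields(32, 256, 16, ways=None))      # (28, 0, 4)  fully associative
print(address_fields(32, 4 * 1024, 4, ways=4))     # (22, 8, 2)  4-way set associative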

44
Q

What are the categories of cache misses?

A

Compulsory misses, capacity misses, and conflict misses

45
Q

What are compulsory misses?

A

Misses caused by the first access to a block, which cannot be in the cache yet; they are also called cold start misses or first reference misses

46
Q

What are capacity misses?

A

If the cache cannot contain all the blocks needed during the execution of a program, capacity misses will occur due to blocks being replaced and later retrieved.

They can be reduced by increasing the cache size

47
Q

What are conflict misses?

A

If the block placement strategy is set-associative or direct mapped, conflict misses will occur because a block can be replaced and later retrieved when other blocks map to the same location in the cache.

Conflict misses decrease as associativity increases

48
Q

What are the ways to reduce the miss rate, and what are their consequences?

A

Increase cache capacity, which in return increases hit time, area, power consumption, and cost.

Increase block size, which reduces the miss rate up to the point where the block size becomes too large with respect to the cache size; it reduces compulsory misses thanks to spatial locality.

Higher associativity, which comes at the price of complexity.

By introducing multibanked caches

By introducing victim caches

Via pseudo-associativity and way prediction

By hardware prefetching of instructions and data

By software prefetching of data

By compiler optimizations

49
Q

How can we reduce the miss rate via a larger block size, and what are the drawbacks of this technique?

A

Increasing the block size reduces the miss rate up to the point where the block size becomes too large with respect to the cache size.

A larger block size reduces compulsory misses by taking advantage of spatial locality.

Main drawbacks:
• Larger blocks increase the miss penalty
• Larger blocks reduce the number of blocks in the cache, so they increase conflict misses (and even capacity misses) if the cache is small.

50
Q

How can we reduce the miss rate via higher associativity?

A

Higher associativity decreases conflict misses.

Main drawback:
• The added complexity increases hit time, area, power consumption, and cost.

51
Q

How can we reduce the miss rate using multibanked caches?

A

Multibanked caches introduce a sort of associativity:

• Organize the cache as independent banks to support simultaneous accesses and increase the cache bandwidth

• Interleave the banks according to the block address, so as to access each bank in turn (sequential interleaving)

52
Q

How can we reduce the miss rate using victim caches?

A

A victim cache is a small fully associative cache used as a buffer to hold data discarded from the cache, to better exploit temporal locality.

• The victim cache is placed between the cache and its refill path towards the next lower level in the hierarchy

• The victim cache is checked on a miss to see if it has the required data before going to the lower-level memory

• If the block is found in the victim cache, the victim block and the cache block are swapped

53
Q

How can we reduce the miss rate using hardware prefetching?

A

Basic idea: to exploit locality, pre-fetch the next instructions (or data) before they are requested by the processor.
• Pre-fetching can be done into the cache or into an external stream buffer

54
Q

Give an example of how hardware prefetching works

A

Instruction pre-fetch in the Alpha AXP 21064:
• Fetch two blocks on a miss: the requested block (i) and the next consecutive block (i+1)
• The requested block is placed in the cache, while the next block goes into the instruction stream buffer
• On a miss in the cache that hits in the stream buffer, move the stream buffer block into the cache and pre-fetch the next block (i+2)

55
Q

How can we reduce the miss rate using software prefetching?

A

Compiler-controlled pre-fetching (the compiler can help in reducing useless pre-fetches): the compiler inserts pre-fetch LOAD instructions to load data into registers/cache before they are needed

56
Q

How can we reduce the miss rate by compiler optimizations?

A

Basic idea: apply profiling to software applications, then use the profiling information to apply code transformations.

Managing instructions:
• Reorder instructions in memory so as to reduce conflict misses
• Use profiling to look at instruction conflicts

57
Q

How can we reduce the miss rate by applying way prediction?

A

Way prediction is a technique used in cache memory systems to reduce the miss rate and improve access times. It helps in predicting the “way” in which a particular piece of data might be found, thus enhancing the efficiency of cache lookups. Here’s how way prediction can reduce the miss rate:

Way Prediction: Modern caches are often set-associative, meaning each set in the cache can have multiple ways (slots) where data can be stored. Way prediction involves guessing which way within the set will hold the requested data, aiming to reduce the time and energy needed to search through all possible ways.

Faster Access Times: By predicting the correct way:
- If the prediction is correct, the cache can quickly access the data, reducing the time spent on cache lookups.
- Faster access times can reduce the penalty of cache misses, indirectly contributing to performance improvement.

Reduced Conflict Misses: Way prediction can help mitigate conflict misses:
- Conflict misses occur when multiple pieces of data compete for the same cache line or set.
- By predicting and distributing data across different ways more effectively, way prediction can reduce the chances of such conflicts, thereby lowering the miss rate.

58
Q

How can we reduce the miss rate using pseudo-associativity?

A

Pseudo-associativity in direct mapped caches: divide the cache into two banks, introducing a sort of associativity.
• If the bank prediction is correct => hit time
• If there is a misprediction on the first bank, check the other bank: if the block is there, we have a pseudo hit (slow hit) and the bank predictor is changed
• Otherwise, go to the lower level of the hierarchy (miss penalty)

59
Q

What are some of the instruction- and data-managing optimizations that compilers do?

A

Managing data:

a) Merging arrays: improve spatial locality by using a single array of compound elements instead of two separate arrays (to operate on data in the same cache block)

b) Loop interchange: improve spatial locality by changing the loop nesting so as to access data in the order they are stored in memory; the re-ordering maximizes re-use of the data in a cache block (see the sketch after this list)

c) Loop fusion: improve spatial locality by combining two independent loops that have the same looping structure and overlap in some variables

d) Loop blocking: improve temporal locality by repeatedly accessing “sub-blocks” of data instead of accessing entire columns or rows
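
A sketch of loop interchange in Python (the array and sizes are illustrative; Python lists are not laid out contiguously like C arrays, so this only illustrates the access pattern):

N = 1024
x = [[0.0] * N for _ in range(N)]

# Before: column-major traversal, poor spatial locality in a row-major layout
for j in range(N):
    for i in range(N):
        x[i][j] = 2 * x[i][j]

# After loop interchange: row-major traversal matches the storage order,
# so consecutive accesses fall within the same cache block
for i in range(N):
    for j in range(N):
        x[i][j] = 2 * x[i][j]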

60
Q

How can we reduce the miss penalty?

A

Give priority to read misses over writes

Sub-block placement

Early restart and critical word first

Non-blocking caches (hit under miss, miss under miss)

Introducing a second-level cache

Merging write buffers; victim caches

61
Q

How can we give priority to read misses over write misses, and what are the drawbacks of this technique?

A

Basic idea: give higher priority to read misses over writes to reduce the miss penalty. This is done by introducing a write buffer between the CPU and the main memory.

Drawback: this approach can complicate memory accesses, because the write buffer might hold the updated value of a memory location needed on a read miss => a RAW hazard through memory.

62
Q

How does sub-block placement work, and what are the drawbacks of this technique?

A

We don’t have to load the full block on a miss: we move sub-blocks instead.

We need a valid bit per sub-block to indicate validity.

Drawback: it does not exploit spatial locality enough

63
Q

How do the early restart and critical word first techniques work?

A

Usually the CPU needs just one word of the block on a miss.

Basic idea: Don’t wait for full block to be loaded before restarting CPU (by sending the requested missed word)

Early restart: Request the words in normal order from memory, but as soon as the requested word of the block arrives, send it to the CPU to let the CPU continue execution, while filling in the rest of the words in the cache block;

Critical Word First: Request the missed word first from memory and send it to the CPU as soon as it arrives to let the CPU continue execution, while filling the rest of the words in the cache block. This approach is also called requested word first.

64
Q

How do the non-blocking techniques work, and what are their types?

A

A non-blocking cache (or lockup-free cache) allows the data cache to continue to supply cache hits during a previous miss (“hit under miss”).

“Hit under miss” reduces the effective miss penalty by doing useful work during a miss instead of stalling the CPU:
• It requires an out-of-order execution CPU, since the CPU must not stall on a cache miss. For example, the CPU can continue fetching instructions from the I-cache while waiting for the D-cache to return the missing data.
• This approach is a sort of “out-of-order” pipelined memory access.

“Miss under miss” may further lower the effective miss penalty by overlapping multiple misses:
• It requires multiple memory banks to serve multiple misses.
• It significantly increases the complexity of the cache controller, as there can be multiple simultaneous memory accesses.
• The Pentium Pro allows 4 outstanding memory misses.

65
Q

How do we improve the hit time?

A

Via a small and simple L1 cache

By avoiding address translation during cache indexing: if the index is a physical part of the address, the tag access can start in parallel with the address translation, so that the comparison is done against the physical tag

Via pipelined writes. Basic idea: pipeline the tag check and the cache data update as separate stages, so that the tag check of the current write overlaps with the cache update of the previous write. The resulting “delayed write buffer” must be checked on reads: either the write completes or the read is served from the write buffer

Via small sub-blocks for write-through caches