CHAPTER 6: Memory Hierarchy Flashcards
temporal locality
locality principle stating that if a data location is referenced, then it will tend to be referenced again soon
spatial locality
locality principle stating that if a data location is referenced, data locations with nearby addresses will tend to be referenced soon
memory hierarchy
structure that uses multiple levels of memories; as the distance from the processor increases, the size of the memories and the access time both increase
block (line)
minimum unit of information that can be either present or not present in a cache
hit rate
fraction of memory accesses found in a level of the memory hierarchy
miss rate
fraction of memory accesses not found in a level of the memory hierarchy
hit time
time required to access a level of the memory hierarchy, including the time needed to determine whether the access is a hit or a miss
miss penalty
time required to fetch a block into a level of the memory hierarchy from the lower level, including the time to access the block, transmit it from one level to the other, insert it in the level that experienced the miss, and then pass the block to the requestor
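The hit rate, miss rate, hit time, and miss penalty cards combine in the usual average memory access time formula, AMAT = hit time + miss rate × miss penalty. A minimal Python sketch; the cycle counts and hit counts below are made-up example values, not measurements.

```python
def compute_miss_rate(hits: int, accesses: int) -> float:
    """Miss rate is the complement of hit rate: 1 - hits / accesses."""
    return 1.0 - hits / accesses


def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical level: 1-cycle hit time, 950 hits in 1,000 accesses,
# and a 100-cycle miss penalty.
rate = compute_miss_rate(950, 1000)
print(f"miss rate = {rate:.1%}, AMAT = {amat(1, rate, 100):.1f} cycles")
# miss rate = 5.0%, AMAT = 6.0 cycles
```

Even a 5% miss rate makes the average access six times slower than a hit here, which is why the miss penalty dominates memory hierarchy design.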
SRAM (static RAM)
- memory arrays with (usually) a single access port that can provide either a read or a write
- have a fixed access time to any datum, though the read and write access times may differ
- do not need to be refreshed, so the access time is very close to the cycle time
- typically use six to eight transistors per bit to prevent the information from being disturbed when read
DRAM (dynamic RAM)
- the value kept in a cell is stored as a charge in a capacitor
- because DRAMs use only one transistor per bit of storage, they are much denser and cheaper per bit than SRAM
- because the charge is stored on a capacitor, it cannot be kept indefinitely and must periodically be refreshed
- use a two-level decoding structure, which allows an entire row (which shares a word line) to be refreshed with a read cycle followed immediately by a write cycle
seek
process of positioning a read/write head over the proper track on a disk
direct-mapped cache
cache structure in which each memory location is mapped to exactly one location in the cache
tag
field in a table used for a memory hierarchy that contains the address information required to identify whether the associated block in the hierarchy corresponds to a requested word
valid bit
field in the tables of a memory hierarchy that indicates that the associated block in the hierarchy contains valid data
cache miss
request for data from the cache that cannot be filled because the data are not present in the cache
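The direct-mapped cache, tag, valid bit, and cache miss cards fit together in a few lines of code. This sketch works on block addresses (not byte addresses) and uses an eight-entry cache; the class names and the access sequence are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Line:
    valid: bool = False  # valid bit: does this entry hold real data?
    tag: int = 0         # upper address bits naming the block stored here


class DirectMappedCache:
    """Direct-mapped cache over block addresses, one block per entry."""

    def __init__(self, num_lines: int):
        self.lines = [Line() for _ in range(num_lines)]
        self.hits = 0
        self.misses = 0

    def access(self, block_addr: int) -> bool:
        index = block_addr % len(self.lines)  # exactly one possible location
        tag = block_addr // len(self.lines)   # remaining bits form the tag
        line = self.lines[index]
        if line.valid and line.tag == tag:
            self.hits += 1
            return True
        # Cache miss: fetch the block from the lower level and install it,
        # replacing whatever block occupied this entry.
        self.misses += 1
        line.valid, line.tag = True, tag
        return False


cache = DirectMappedCache(num_lines=8)
results = [cache.access(a) for a in [22, 26, 22, 26, 16, 3, 16, 18]]
print(results)  # [False, False, True, True, False, False, True, False]
```

The last access misses even though the cache is not full: blocks 26 and 18 both map to index 2, so 18 evicts 26.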
write-through
scheme in which writes always update both the cache and the next lower level of the memory hierarchy, ensuring that data are always consistent between the two.
write buffer
queue that holds data while the data are waiting to be written to memory
write-back
scheme that handles writes by updating values only to the block in the cache, then writing the modified block to the lower level of the hierarchy when the block is replaced
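The difference between write-through and write-back shows up in how many writes reach the next lower level. A minimal sketch under simplifying assumptions (every written block fits in the cache, and each block is replaced exactly once at the end of the sequence); the store sequence is hypothetical.

```python
def memory_writes(stores: list[int], policy: str) -> int:
    """Count writes reaching the next lower level for a sequence of stores.

    Assumes every stored block fits in the cache and that all blocks are
    replaced once, at the end of the sequence.
    """
    if policy == "write-through":
        # Every store updates both the cache and the lower level.
        return len(stores)
    if policy == "write-back":
        dirty = set()
        for block in stores:
            dirty.add(block)  # update only the cached copy and mark it dirty
        # Each modified block is written back once, when it is replaced.
        return len(dirty)
    raise ValueError(f"unknown policy: {policy}")


stores = [5, 5, 5, 5, 9, 9, 5]  # hypothetical block addresses being written
print(memory_writes(stores, "write-through"))  # 7
print(memory_writes(stores, "write-back"))     # 2
```

With write-through, a write buffer lets the processor keep running while those seven writes drain to memory; write-back avoids most of them in the first place.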
split cache
scheme in which a level of the memory hierarchy is composed of two independent caches that operate in parallel with each other, with one handling instructions and one handling data
fully associative cache
cache structure in which a block can be placed in any location in the cache
set-associative cache
cache that has a fixed number of locations (at least two) where each block can be placed
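Set associativity can be sketched by generalizing the direct-mapped lookup: the address selects a set, and the block may sit in any way of that set. The cards do not fix a replacement policy; this sketch assumes LRU, and the four-block cache and access sequence are example choices.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Set-associative cache over block addresses, LRU replacement per set.

    ways == 1 gives a direct-mapped cache; ways == num_blocks gives a
    fully associative one.
    """

    def __init__(self, num_blocks: int, ways: int):
        if num_blocks % ways:
            raise ValueError("num_blocks must be a multiple of ways")
        self.num_sets = num_blocks // ways
        self.ways = ways
        # One OrderedDict of tags per set, ordered least to most recently used.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def access(self, block_addr: int) -> bool:
        index = block_addr % self.num_sets  # the set is fixed by the address
        tag = block_addr // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)        # hit: now the most recently used
            return True
        if len(entries) == self.ways:
            entries.popitem(last=False)     # set is full: evict the LRU block
        entries[tag] = None                 # place the block in a free way
        return False


def count_misses(ways: int, trace: list[int], num_blocks: int = 4) -> int:
    cache = SetAssociativeCache(num_blocks, ways)
    return sum(not cache.access(addr) for addr in trace)


trace = [0, 8, 0, 6, 8]  # block addresses
for ways in (1, 2, 4):
    print(f"{ways}-way: {count_misses(ways, trace)} misses")
# 1-way: 5 misses, 2-way: 4 misses, 4-way: 3 misses
```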
multilevel cache
memory hierarchy with multiple levels of caches, rather than just a cache and main memory
global miss rate
fraction of references that miss in all levels of a multilevel cache
local miss rate
fraction of references to one level of a cache that miss; used in multilevel hierarchies
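Local and global miss rates differ because each lower level only sees the references that missed above it. A minimal sketch with hypothetical miss counts for a two-level hierarchy:

```python
def miss_rates(references: int, misses_per_level: list[int]) -> tuple[list[float], float]:
    """Return (local miss rate of each level, global miss rate).

    misses_per_level[i] counts references that missed in level i; each level
    only receives the references that missed in the level above it.
    """
    local = []
    arriving = references
    for misses in misses_per_level:
        local.append(misses / arriving)
        arriving = misses
    global_rate = misses_per_level[-1] / references  # missed in every level
    return local, global_rate


local, global_rate = miss_rates(1000, [40, 20])  # hypothetical counts
print(local, global_rate)  # [0.04, 0.5] 0.02
```

A 50% local miss rate for the second level looks alarming, yet the global miss rate is only 2%: the global rate is the product of the local rates (0.04 × 0.5).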
error detection code
code that enables the detection of an error in data, but not the precise location and, hence, correction of the error
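A single parity bit is the simplest error detection code. The sketch below assumes even parity over an example 8-bit word; it shows that an odd number of flipped bits is detected without revealing which bit flipped, and that an even number of flips slips through.

```python
def parity_bit(bits: list[int]) -> int:
    """Even parity: the check bit makes the total number of 1s even."""
    return sum(bits) % 2


def has_error(bits: list[int], check: int) -> bool:
    # Detects any odd number of flipped bits, but not their positions.
    return parity_bit(bits) != check


word = [1, 0, 1, 1, 0, 0, 1, 0]
p = parity_bit(word)        # 0, since the word has four 1s
word[3] ^= 1                # hypothetical single-bit error
print(has_error(word, p))   # True
word[5] ^= 1                # a second flip restores even parity
print(has_error(word, p))   # False: double-bit errors go undetected
```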
virtual memory
technique that uses main memory as a “cache” for secondary storage
protection
set of mechanisms for ensuring that multiple processes sharing the processor, memory, or I/O devices cannot interfere, intentionally or unintentionally, with one another by reading or writing each other’s data. These mechanisms also isolate the operating system from a user process
address translation (address mapping)
process by which a virtual address is mapped to an address used to access memory
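Address translation can be sketched with a single-level page table: the virtual page number is looked up, while the page offset passes through unchanged. This assumes 4 KiB pages and a plain dictionary as the page table; both, and the mappings, are illustrative.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


def translate(vaddr: int, page_table: dict[int, int]) -> int:
    """Map a virtual address to a physical address.

    page_table maps virtual page number -> physical page number; a missing
    entry stands in for a page fault.
    """
    vpn, offset = divmod(vaddr, PAGE_SIZE)  # the offset is not translated
    if vpn not in page_table:
        raise LookupError(f"page fault: virtual page {vpn:#x} not in memory")
    return page_table[vpn] * PAGE_SIZE + offset


page_table = {0x0: 0x2A, 0x1: 0x07}         # hypothetical mappings
print(hex(translate(0x1234, page_table)))  # 0x7234
```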
virtually addressed cache
cache that is accessed with a virtual address rather than a physical address
aliasing
situation in which two addresses access the same object; it can occur in virtual memory when there are two virtual addresses for the same physical page
physically addressed cache
cache that is addressed by a physical address
exception enable (interrupt enable)
signal or action that controls whether the processor responds to an exception or not; necessary for preventing the occurrence of exceptions during intervals before the processor has safely saved the state needed to restart
restartable instruction
instruction that can resume execution after an exception is resolved without the exception’s affecting the result of the instruction
virtual memory
name for the level of memory hierarchy that manages caching between the main memory and secondary memory
three Cs model
cache model in which all cache misses are classified into one of three categories: compulsory misses, capacity misses, and conflict misses
compulsory miss (cold-start miss)
cache miss caused by the first access to a block that has never been in the cache
capacity miss
cache miss that occurs because the cache, even with full associativity, cannot contain all the blocks needed to satisfy the request
conflict miss (collision miss)
cache miss that occurs in a set-associative or direct-mapped cache when multiple blocks compete for the same set; such misses are eliminated in a fully associative cache of the same size
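The three Cs can be measured by running a fully associative cache of the same size alongside the real one. This sketch classifies the misses of a direct-mapped cache; it assumes LRU replacement for the fully associative reference, and the traces are examples.

```python
from collections import OrderedDict


def classify_misses(trace: list[int], num_blocks: int) -> dict[str, int]:
    """Classify the misses of a direct-mapped cache with the three Cs model.

    compulsory: first reference to the block
    capacity:   also misses in a fully associative cache of the same size
    conflict:   hits in that fully associative cache, misses in direct-mapped
    """
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}
    seen = set()
    direct = [None] * num_blocks  # block held at each direct-mapped index
    fully = OrderedDict()         # fully associative reference, LRU order

    for block in trace:
        # Keep the fully associative reference cache up to date.
        fa_hit = block in fully
        if fa_hit:
            fully.move_to_end(block)
        else:
            if len(fully) == num_blocks:
                fully.popitem(last=False)
            fully[block] = None

        index = block % num_blocks
        if direct[index] == block:
            continue              # direct-mapped hit: nothing to classify
        direct[index] = block     # miss: install the block
        if block not in seen:
            counts["compulsory"] += 1
        elif not fa_hit:
            counts["capacity"] += 1
        else:
            counts["conflict"] += 1
        seen.add(block)
    return counts


print(classify_misses([0, 8, 0, 8], num_blocks=4))
# {'compulsory': 2, 'capacity': 0, 'conflict': 2}
print(classify_misses([0, 1, 2, 0], num_blocks=2))
# {'compulsory': 3, 'capacity': 1, 'conflict': 0}
```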
finite-state machine
sequential logic function consisting of a set of inputs and outputs, a next-state function that maps the current state and the inputs to a new state, and an output function that maps the current state and possibly the inputs to a set of asserted outputs
next-state function
combinational function that, given the inputs and the current state, determines the next state of a finite-state machine
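A next-state function is easy to write down for a simple four-state cache controller (Idle, Compare Tag, Write-Back, Allocate). The state names and the input signals here are assumptions chosen for illustration, and the output function is omitted.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    COMPARE_TAG = auto()
    WRITE_BACK = auto()
    ALLOCATE = auto()


def next_state(state: State, cpu_request: bool, hit: bool,
               dirty: bool, mem_ready: bool) -> State:
    """Combinational next-state function: (current state, inputs) -> next state."""
    if state is State.IDLE:
        return State.COMPARE_TAG if cpu_request else State.IDLE
    if state is State.COMPARE_TAG:
        if hit:
            return State.IDLE  # request satisfied
        # Miss: a dirty victim must be written back before the new block is fetched.
        return State.WRITE_BACK if dirty else State.ALLOCATE
    if state is State.WRITE_BACK:
        return State.ALLOCATE if mem_ready else State.WRITE_BACK
    if state is State.ALLOCATE:
        return State.COMPARE_TAG if mem_ready else State.ALLOCATE
    raise ValueError(f"unknown state: {state}")


# A request that misses on a dirty block, then hits after the refill.
# Each tuple is (cpu_request, hit, dirty, mem_ready).
inputs = [(True, False, True, False),
          (True, False, True, False),
          (True, False, True, True),
          (True, False, True, True),
          (True, True, False, True)]
state = State.IDLE
visited = []
for cpu_request, hit, dirty, mem_ready in inputs:
    state = next_state(state, cpu_request, hit, dirty, mem_ready)
    visited.append(state.name)
print(visited)
# ['COMPARE_TAG', 'WRITE_BACK', 'ALLOCATE', 'COMPARE_TAG', 'IDLE']
```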
false sharing
two unrelated shared variables are located in the same cache block and the full block is exchanged between processors even though the processors are accessing different variables
redundant arrays of inexpensive disks (RAID)
organization of disks that uses an array of small and inexpensive disks so as to increase both performance and reliability
RAID 0
- striping across a set of disks makes the collection appear to software as a single large disk, which simplifies storage management
- improves performance for large accesses, since many disks can operate at once
- Video-editing systems, for example, frequently stripe their data and may not worry about dependability as much as, say, databases.
- no redundancy
striping
allocation of logically sequential blocks to separate disks to allow higher performance than a single disk can deliver
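Striping is essentially an address mapping. A minimal sketch assuming a stripe unit of one block and round-robin placement across the disks (real arrays often use larger stripe units):

```python
def locate(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Return (disk number, block offset on that disk) for a striped array."""
    return logical_block % num_disks, logical_block // num_disks


print([locate(b, num_disks=4) for b in range(6)])
# [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)]
```

A large sequential read of logical blocks 0 to 3 touches all four disks, so they can transfer in parallel.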
RAID 1
- when data are written to one disk, those data are also written to a redundant disk, so that there are always two copies of the information
- If a disk fails, the system just goes to the “mirror” and reads its contents to get the desired information
- most expensive RAID solution, since it requires the most disks
mirroring
writing identical data to multiple disks to increase data availability
RAID 3
- reads or writes go to all disks in the group, with one extra disk to hold the check information in case there is a failure
- popular in applications with large data sets, such as multimedia and some scientific codes
- bit-interleaved parity
protection group
group of data disks or blocks that share a common check disk or block
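For the parity-based levels, the check block of a protection group is the bitwise XOR of its data blocks, so any single failed block can be rebuilt as long as we know which one failed. A minimal sketch with blocks as byte strings (the values are hypothetical):

```python
from functools import reduce
from operator import xor


def parity(blocks: list[bytes]) -> bytes:
    """Check block for a protection group: bytewise XOR of the data blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def reconstruct(surviving: list[bytes], check: bytes) -> bytes:
    """Rebuild the one failed block: XOR of the check block and all survivors."""
    return parity(surviving + [check])


data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]  # hypothetical data blocks
check = parity(data)
lost = data.pop(1)                                # pretend disk 1 failed
assert reconstruct(data, check) == lost
```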
RAID 4
- parity is stored as blocks and associated with a set of data blocks
RAID 5
- parity associated with each row of data blocks is no longer restricted to a single disk
- distributed parity organization
- organization allows multiple writes to occur simultaneously as long as the parity blocks are not located on the same disk
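Two details behind the RAID 4 and RAID 5 cards can be sketched in a few lines: where a stripe's parity block lives when parity is distributed, and how a small write updates parity without reading the whole stripe. The rotation rule below is an assumption (real arrays use several layouts), and blocks are modeled as small integers.

```python
def parity_disk(stripe: int, num_disks: int) -> int:
    """Disk that holds the parity block of a given stripe (row) in RAID 5.

    Assumed layout: parity starts on the last disk and moves one disk to
    the left on each successive stripe.
    """
    return (num_disks - 1 - stripe) % num_disks


def small_write_parity(old_data: int, new_data: int, old_parity: int) -> int:
    """New parity after overwriting one data block of a stripe.

    Only the old data and old parity are read: XOR-ing out the old data
    and XOR-ing in the new data updates the parity.
    """
    return old_parity ^ old_data ^ new_data


print([parity_disk(s, num_disks=5) for s in range(6)])  # [4, 3, 2, 1, 0, 4]

# Blocks modeled as small integers for brevity (hypothetical values).
stripe = [0b1010, 0b0110, 0b0001]
p = stripe[0] ^ stripe[1] ^ stripe[2]
new_p = small_write_parity(stripe[1], 0b1111, p)
stripe[1] = 0b1111
assert new_p == stripe[0] ^ stripe[1] ^ stripe[2]  # matches a full recompute
```

Because consecutive stripes keep parity on different disks, two small writes to different stripes need not both wait on the same parity disk.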
RAID 6
- P + Q redundancy
- parity-based schemes protect against only a single self-identifying failure
- when single-failure correction is not sufficient, parity can be generalized to have a second calculation over the data and another check disk of information
nonblocking cache
cache that allows the processor to make references to the cache while the cache is handling an earlier miss
pitfall: ignoring memory system behavior when writing programs or when generating code in a compiler
programmers can easily double performance if they factor the behavior of the memory system into the design of their algorithms
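The effect of access order can be seen with a toy cache simulation. This sketch uses hypothetical sizes (8-byte elements, 64-byte blocks, a 64-block direct-mapped cache) and counts misses for two loop orders over the same row-major array; it models address arithmetic only, not Python's own speed.

```python
def traversal_misses(rows: int, cols: int, by_rows: bool, elem_bytes: int = 8,
                     block_bytes: int = 64, cache_blocks: int = 64) -> int:
    """Count direct-mapped cache misses while visiting every element of a
    row-major rows x cols array, either row by row or column by column.
    """
    if by_rows:
        order = [(r, c) for r in range(rows) for c in range(cols)]
    else:
        order = [(r, c) for c in range(cols) for r in range(rows)]
    cached = [None] * cache_blocks  # block currently held at each index
    misses = 0
    for r, c in order:
        block = (r * cols + c) * elem_bytes // block_bytes  # row-major layout
        index = block % cache_blocks
        if cached[index] != block:
            cached[index] = block
            misses += 1
    return misses


print(traversal_misses(128, 128, by_rows=True))   # 2048: one miss per block
print(traversal_misses(128, 128, by_rows=False))  # 16384: every access misses
```

Walking along rows uses all eight elements of each fetched block (spatial locality); walking down columns touches one element per block and evicts it before its neighbors are needed.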
pitfall: forgetting to account for byte addressing or the cache block size in simulating a cache
catches many people, including the authors (in earlier drafts) and instructors who forget whether they intended the addresses to be in doublewords, words, bytes, or block numbers
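Keeping byte addresses and block numbers apart is easier when the address split is explicit. A minimal sketch for a direct-mapped cache; the geometry (64 blocks of 16 bytes) is an example choice, and both sizes are assumed to be powers of two.

```python
def split_address(byte_addr: int, block_bytes: int, num_blocks: int) -> tuple[int, int, int]:
    """Split a byte address into (tag, index, byte offset) for a direct-mapped cache."""
    for n in (block_bytes, num_blocks):
        if n <= 0 or n & (n - 1):
            raise ValueError("block_bytes and num_blocks must be powers of two")
    offset_bits = block_bytes.bit_length() - 1
    index_bits = num_blocks.bit_length() - 1
    offset = byte_addr & (block_bytes - 1)                 # byte within the block
    index = (byte_addr >> offset_bits) & (num_blocks - 1)  # which cache entry
    tag = byte_addr >> (offset_bits + index_bits)          # everything above
    return tag, index, offset


# 64 blocks of 16 bytes: byte address 1200 is block address 1200 // 16 = 75,
# which maps to cache index 75 % 64 = 11.
print(split_address(1200, block_bytes=16, num_blocks=64))  # (1, 11, 0)
# Treating 1200 as a block number by mistake would give index 1200 % 64 = 48.
```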
pitfall: having less set associativity for a shared cache than the number of cores or threads sharing that cache
programmers could face apparently mysterious performance bugs (actually due to L2 conflict misses) when migrating from, say, a 16-core design to a 32-core design if both use 16-way associative L2 caches
pitfall: using average memory access time to evaluate the memory hierarchy of an out-of-order processor
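if a processor stalls during a cache miss, then you can separately calculate the memory-stall time and the processor execution time, and hence evaluate the memory hierarchy independently using average memory access time; an out-of-order processor instead keeps executing instructions (and may even incur more misses) during a miss, so the only accurate assessment is to simulate the processor together with the memory hierarchy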
pitfall: extending an address space by adding segments on top of an unsegmented address space
adding segments can turn every address into two words—one for the segment number and one for the segment offset—causing problems in the use of addresses in registers
fallacy: disk failure rates in the field match their specifications
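- a study of 100,000 disks that had a quoted MTTF (mean time to failure) of 1,000,000 to 1,500,000 hours, or an AFR (annual failure rate) of 0.6% to 0.8%, found AFRs of 2% to 4% to be common, often three to five times higher than the specified rates
- a second study of 100,000 disks at Google, which had a quoted AFR of about 1.5%, saw failure rates rise from 1.7% for drives in their first year to 8.6% for drives in their third year, or about five to six times the declared rate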
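The link between a quoted MTTF and an AFR is a simple ratio when failure rates are small: expected failures per disk-year are roughly the hours in a year divided by the MTTF. A minimal sketch; this approximation gives roughly 0.6% to 0.9% for the MTTF range above, so the quoted figures are rounded.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def afr_from_mttf(mttf_hours: float) -> float:
    """Approximate annual failure rate (as a fraction) implied by a quoted MTTF.

    Only valid while the rate is small, so that at most one failure per
    disk-year is the common case.
    """
    return HOURS_PER_YEAR / mttf_hours


for mttf in (1_000_000, 1_500_000):
    print(f"MTTF {mttf:,} hours -> AFR about {afr_from_mttf(mttf):.2%}")

# Third-year failure rate versus the quoted AFR in the Google study:
print(f"{0.086 / 0.015:.1f}x the declared rate")  # 5.7x the declared rate
```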
fallacy: operating systems are the best place to schedule disk accesses
- higher-level disk interfaces offer logical block addresses to the host operating system
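- the best an OS can do to try to help performance is to sort the logical block addresses into increasing order
- the disk itself knows the actual mapping of logical addresses onto the physical geometry of sectors, tracks, and surfaces, so it can reduce rotational and seek latencies by rescheduling requests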
pitfall: implementing a virtual machine monitor on an instruction set architecture that wasn’t designed to be virtualizable
one workaround is to make small changes to the guest operating system so that it avoids the troublesome pieces of the architecture (an approach known as paravirtualization)
prefetching
technique in which data blocks needed in the future are brought into the cache early by using special instructions that specify the address of the block