Chapter 5 Flashcards
Define and explain the principle of locality. Provide examples for both temporal and spatial locality.
Temporal locality: This occurs when recently accessed items are likely to be accessed again soon. For example, instructions within a loop or frequently updated variables (induction variables).
Spatial locality: This occurs when items near recently accessed items are likely to be accessed soon. For example, sequential instruction access or elements in an array.
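As a small illustration (the function name sum_array is just a placeholder, not from the flashcards), the summation loop below exhibits both kinds of locality at once:
double sum_array(const double *a, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {   // sum and i are reused every iteration: temporal locality
        sum += a[i];                // a[0], a[1], a[2], ... sit in adjacent memory: spatial locality
    }
    return sum;
}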
Describe the memory hierarchy and its importance in computer architecture. Explain the roles of cache memory, DRAM, and disk storage.
Cache memory: This is the fastest and closest to the CPU, typically made of SRAM. It stores frequently accessed data to speed up access times.
DRAM (Main Memory): This is slower than cache memory but has a larger capacity. It holds the data that is actively being used by the programs.
Disk storage: This is the slowest and largest in capacity, used for long-term data storage. It includes hard drives and SSDs. Data is moved between these levels to balance speed and capacity, leveraging temporal and spatial locality.
What is a cache memory? Explain how a direct-mapped cache works, including how data is stored and retrieved.
Cache memory is a small, high-speed storage area located close to the CPU, designed to store frequently accessed data to speed up processing.
Direct-mapped cache: Each memory block maps to exactly one cache line, determined by (block address) modulo (number of cache lines). On an access, the low-order bits of the block address (the index) select the cache line; the line's stored tag (the remaining high-order address bits) is compared to verify that the correct block is present, and a valid bit indicates whether the line holds valid data.
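As a minimal sketch (the block size, number of lines, and helper names are assumed for illustration, not taken from the flashcards), the index and tag for a direct-mapped cache can be extracted from a byte address like this:
#define OFFSET_BITS 6    // assumed 64-byte blocks
#define INDEX_BITS 10    // assumed 1024 cache lines

unsigned long cache_index(unsigned long addr) {
    return (addr >> OFFSET_BITS) & ((1UL << INDEX_BITS) - 1);   // low-order block-address bits select the line
}

unsigned long cache_tag(unsigned long addr) {
    return addr >> (OFFSET_BITS + INDEX_BITS);                  // remaining high-order bits are stored as the tag
}
A lookup is a hit only if the selected line's valid bit is set and its stored tag equals cache_tag(addr).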
Define and differentiate between the following terms: hit, miss, hit ratio, and miss penalty. How do these metrics affect CPU performance?
Hit: When data requested by the CPU is found in the cache.
Miss: When data requested by the CPU is not found in the cache, necessitating a fetch from the next lower memory level.
Hit ratio: The fraction of memory accesses that result in hits, calculated as hits / total accesses.
Miss penalty: The time taken to fetch a block from a lower memory level on a miss.
These metrics impact CPU performance because a higher hit ratio and lower miss penalty reduce the average time to access memory, leading to faster overall system performance.
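For example (with illustrative numbers): if the hit time is 1 cycle, the miss rate is 5%, and the miss penalty is 100 cycles, the average memory access time is 1 + 0.05 × 100 = 6 cycles; improving the miss rate to 2% would cut this to 1 + 0.02 × 100 = 3 cycles.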
Explain how DRAM technology works, including the need for refresh operations. Describe one advancement in DRAM technology.
DRAM stores data as a charge in capacitors, with each capacitor representing a bit. A single transistor is used to access the charge. Due to leakage, the charge must be periodically refreshed by reading and rewriting the contents.
Advancement example: Double Data Rate (DDR) DRAM transfers data on both the rising and falling edges of the clock signal, effectively doubling the data transfer rate compared to single data rate DRAM.
Compare and contrast NOR flash and NAND flash memory. Include typical use cases for each type.
NOR flash: Has a bit cell similar to a NOR gate, supports random read/write access, and is typically used for storing code in embedded systems.
NAND flash: Has a denser bit cell similar to a NAND gate, supports block-at-a-time access, and is commonly used in USB drives and SSDs. NAND flash is cheaper per GB but less flexible in read/write operations.
Compare and contrast write-through and write-back cache write policies. What are the benefits and drawbacks of each?
Write-through: Data is written to both the cache and main memory simultaneously. Benefits include simplicity and data consistency, but it increases write latency.
Write-back: Data is written only to the cache and marked as dirty. It is written to main memory only when evicted. Benefits include reduced write latency and bandwidth usage, but it requires more complex control logic to handle dirty blocks and maintain consistency.
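A minimal sketch of the write-back bookkeeping (the struct and function names are hypothetical, for illustration only): a store hit updates only the cache and sets the dirty bit, and the block is written to memory later, when it is evicted.
struct cache_line {
    int valid;
    int dirty;                 // set when the cached copy differs from main memory
    unsigned long tag;
    unsigned char data[64];
};

// Write-back policy: a store hit updates only the cache and marks the line dirty.
void write_hit_writeback(struct cache_line *line, int offset, unsigned char value) {
    line->data[offset] = value;
    line->dirty = 1;           // main memory is updated later, on eviction
}

// On eviction, a dirty line must be flushed to main memory before it is replaced.
void evict(struct cache_line *line) {
    if (line->valid && line->dirty) {
        // write line->data back to main memory here (omitted)
        line->dirty = 0;
    }
    line->valid = 0;
}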
Describe the access patterns of the C, A, and B arrays in the unoptimized DGEMM inner loop. Why might this be inefficient?
for (int j = 0; j < n; ++j) {
    double cij = C[j + i * n];                 // cij = C[i][j] (row-major; the enclosing loop over i is not shown)
    for (int k = 0; k < n; ++k) {
        cij += A[k + i * n] * B[j + k * n];    // cij += A[i][k] * B[k][j]
    }
    C[j + i * n] = cij;                        // C[i][j] = cij
}
The access pattern for arrays C, A, and B is as follows:
C is accessed row-wise.
A is accessed row-wise.
B is accessed column-wise.
This can be inefficient because it results in poor cache utilization. The row-wise accesses to A and C walk through memory sequentially and are cache-friendly, but the column-wise access to B strides through memory by n elements, so successive accesses to B touch different cache blocks; when n is large, those blocks are evicted before they can be reused, causing frequent misses and increased memory access times.
Outline the key components of a disk sector and describe the process of accessing data from a disk.
Sector ID: Identifies the sector.
Data: Typically 512 bytes, although a larger 4096-byte sector size has been proposed.
Error Correcting Code (ECC): Used to detect and correct errors.
Synchronization fields and gaps: Ensure data integrity and timing.
Accessing data from a disk involves:
Queuing delay: If other accesses are pending.
Seek: Moving the read/write heads to the correct track.
Rotational latency: Waiting for the disk to rotate to the correct position.
Data transfer: Reading or writing the data.
Controller overhead: Additional processing time by the disk controller.
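For example (values assumed for illustration): reading a 512-byte sector from an idle 15,000 RPM disk with a 4 ms average seek time, a 100 MB/s transfer rate, and 0.2 ms of controller overhead takes about 4 ms (seek) + 0.5 rotation / (250 rotations per second) = 2 ms (rotational latency) + 512 B / 100 MB/s ≈ 0.005 ms (transfer) + 0.2 ms (controller) ≈ 6.2 ms in total.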
Explain the differences between direct-mapped, set-associative, and fully associative caches. How does increasing associativity impact cache performance?
Direct-mapped cache: Each memory block maps to one specific cache line.
Set-associative cache: Each memory block maps to exactly one set (selected by the index bits) but can be placed in any of the lines (ways) within that set. For example, in a 2-way set-associative cache, each block can go in either of two cache lines.
Fully associative cache: Any memory block can be stored in any cache line.
Increasing associativity generally reduces the miss rate due to fewer conflicts but requires more complex hardware and can increase the hit time due to the need to search multiple cache lines.
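For example, consider an 8-block cache and a memory block with block address 12: in a direct-mapped cache it can only go in block 12 mod 8 = 4; in a 2-way set-associative cache (4 sets) it can go in either way of set 12 mod 4 = 0; in a fully associative cache it can be placed in any of the 8 blocks.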
Describe the purpose of multilevel caches in a memory hierarchy. How do primary (L1) and secondary (L2) caches differ in terms of design priorities?
Multilevel caches are used to bridge the speed gap between the CPU and main memory, reducing the effective memory access time.
Primary (L1) cache: Located closest to the CPU, designed for minimal hit time, small in size, and operates at the CPU clock speed.
Secondary (L2) cache: Larger than L1, designed to reduce the miss rate, serves as a buffer between L1 and main memory, and operates at a slower speed than L1 but faster than main memory.
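For example (with illustrative numbers): with a 1-cycle L1 hit time, a 4% L1 miss rate, a 10-cycle L2 access time, a 1% global miss rate (accesses that miss in both L1 and L2), and a 100-cycle main-memory penalty, the average memory access time is 1 + 0.04 × 10 + 0.01 × 100 = 2.4 cycles, compared with 1 + 0.04 × 100 = 5 cycles for an L1-only design.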
Explain the goal of software optimization via blocking in the context of DGEMM. How does this optimization impact data access patterns and performance?
The goal of software optimization via blocking in DGEMM (Double-precision General Matrix Multiply) is to maximize the number of accesses to data before it is replaced in the cache. This is achieved by dividing the data into smaller blocks that fit into the cache, thereby reducing cache misses and improving performance. By accessing smaller blocks repeatedly, the algorithm ensures that data remains in the cache longer, reducing the need to fetch it from slower main memory. This leads to improved temporal locality and better utilization of the cache, enhancing overall computational efficiency.
Explain the concept of cache blocking in DGEMM with the provided code. How does it improve performance?
Cache blocking in DGEMM involves dividing the matrices into smaller blocks that fit into the cache. This allows for repeated access to the smaller blocks, thus reducing cache misses. Here is the provided cache-blocked DGEMM code:
#define BLOCKSIZE 32
void do_block (int n, int si, int sj, int sk, double *A, double *B, double *C) {
    for (int i = si; i < si + BLOCKSIZE; ++i) {
        for (int j = sj; j < sj + BLOCKSIZE; ++j) {
            double cij = C[i + j * n];               // cij = C[i][j]
            for (int k = sk; k < sk + BLOCKSIZE; k++) {
                cij += A[i + k * n] * B[k + j * n];  // cij += A[i][k] * B[k][j]
            }
            C[i + j * n] = cij;                      // C[i][j] = cij
        }
    }
}

void dgemm (int n, double* A, double* B, double* C) {
    for (int si = 0; si < n; si += BLOCKSIZE) {
        for (int sj = 0; sj < n; sj += BLOCKSIZE) {
            for (int sk = 0; sk < n; sk += BLOCKSIZE) {
                do_block(n, si, sj, sk, A, B, C);
            }
        }
    }
}
This code breaks the matrices into 32x32 blocks (determined by BLOCKSIZE). By processing these smaller blocks, the code ensures that each block of the matrix can stay in the cache during the computation, significantly reducing cache misses. This improvement in data locality results in faster memory accesses and enhanced performance.
What is the role of the Hamming SEC code in dependable memory hierarchies? Describe how it detects and corrects errors.
The Hamming Single Error Correction (SEC) code is used to detect and correct single-bit errors in data. The key roles are:
Detection: A minimum Hamming distance of 2 is enough to detect single-bit errors.
Correction: A minimum Hamming distance of 3, which the Hamming SEC code provides, allows single-bit errors to be both detected and corrected. In the encoding, bits are numbered starting from 1 and parity bits are placed at the positions that are powers of 2; each parity bit checks a specific subset of the bit positions.
For example, when a word is read, the parity bits are recomputed and compared with the stored ones; if the result (the syndrome) is non-zero, its binary value gives the position of the erroneous bit, which is flipped to correct the error.
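As a small worked illustration (using the 7-bit Hamming code with 4 data bits): to encode the data bits d1 d2 d3 d4 = 1 0 1 1, place them at positions 3, 5, 6, 7 and compute p1 (covering positions 1, 3, 5, 7) = 1 xor 0 xor 1 = 0, p2 (covering 2, 3, 6, 7) = 1 xor 1 xor 1 = 1, and p4 (covering 4, 5, 6, 7) = 0 xor 1 xor 1 = 0, giving the codeword 0 1 1 0 0 1 1. If bit 5 is later flipped, the recomputed checks for p1 and p4 fail while p2 passes, so the syndrome p4 p2 p1 = 101 in binary = 5, pointing at bit 5, which is flipped back to correct the error.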
Define the measures of dependability for a memory hierarchy and explain how they contribute to overall system reliability.
Reliability (Mean Time To Failure - MTTF): The average time before a system component fails. Higher MTTF indicates more reliable components.
Service Interruption (Mean Time To Repair - MTTR): The average time required to repair a failed component. Lower MTTR means quicker recovery from failures.
Mean Time Between Failures (MTBF): The sum of MTTF and MTTR, indicating the expected time between failures.
Availability: The proportion of time a system is operational, calculated as Availability = MTTF / (MTTF + MTTR). Higher availability means the system is more reliable and accessible.
Improving these measures through fault avoidance, fault tolerance, and effective diagnosis and repair processes enhances overall system reliability and ensures continuous operation with minimal interruptions.
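For example (with illustrative numbers): a component with an MTTF of 1,000,000 hours and an MTTR of 1 hour has an availability of 1,000,000 / (1,000,000 + 1) ≈ 0.999999, i.e., it is expected to be operational about 99.9999% of the time.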