Chapter 5 Flashcards

1
Q

Define and explain the principle of locality. Provide examples for both temporal and spatial locality.

A

Temporal locality: This occurs when recently accessed items are likely to be accessed again soon. For example, instructions within a loop or frequently updated variables (induction variables).
Spatial locality: This occurs when items near recently accessed items are likely to be accessed soon. For example, sequential instruction access or elements in an array.
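
A minimal C sketch (illustrative only, not part of the original card) showing both kinds of locality in a single loop:

/* Summing an array: 'sum' and 'i' are reused on every iteration
 * (temporal locality), while a[0], a[1], ... are adjacent in memory
 * and accessed in order (spatial locality). */
double sum_array(const double *a, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += a[i];
    }
    return sum;
}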

2
Q

Describe the memory hierarchy and its importance in computer architecture. Explain the roles of cache memory, DRAM, and disk storage.

A

Cache memory: This is the fastest and closest to the CPU, typically made of SRAM. It stores frequently accessed data to speed up access times.
DRAM (Main Memory): This is slower than cache memory but has a larger capacity. It holds the data that is actively being used by the programs.
Disk storage: This is the slowest and largest in capacity, used for long-term data storage. It includes hard drives and SSDs. Data is moved between these levels to balance speed and capacity, leveraging temporal and spatial locality.

3
Q

What is a cache memory? Explain how a direct-mapped cache works, including how data is stored and retrieved.

A

Cache memory is a small, high-speed storage area located close to the CPU, designed to store frequently accessed data to speed up processing.

Direct-mapped cache: Each memory block maps to exactly one cache line. When accessing data, the cache uses the low-order address bits to locate the cache line. The cache line includes a tag to verify if the correct block is present and a valid bit to indicate if the cache line contains valid data.
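
A hedged C sketch of a direct-mapped lookup (parameters are assumptions for illustration: 32-bit addresses, 64-byte blocks, 1024 lines):

#include <stdbool.h>
#include <stdint.h>

#define BLOCK_BITS 6                      /* 64-byte blocks -> 6 offset bits */
#define INDEX_BITS 10                     /* 1024 lines     -> 10 index bits */
#define NUM_LINES (1u << INDEX_BITS)

struct cache_line { bool valid; uint32_t tag; /* ... data block ... */ };
static struct cache_line cache[NUM_LINES];

/* Hit if the indexed line is valid and its tag matches the address tag. */
bool lookup(uint32_t addr) {
    uint32_t index = (addr >> BLOCK_BITS) & (NUM_LINES - 1);
    uint32_t tag = addr >> (BLOCK_BITS + INDEX_BITS);
    return cache[index].valid && cache[index].tag == tag;
}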

4
Q

Define and differentiate between the following terms: hit, miss, hit ratio, and miss penalty. How do these metrics affect CPU performance?

A

Hit: When data requested by the CPU is found in the cache.
Miss: When data requested by the CPU is not found in the cache, necessitating a fetch from the next lower memory level.
Hit ratio: The percentage of memory accesses that result in hits, calculated as hits/total accesses.
Miss penalty: The time taken to fetch a block from a lower memory level on a miss.
These metrics impact CPU performance because a higher hit ratio and lower miss penalty reduce the average time to access memory, leading to faster overall system performance.
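
As an illustrative calculation (numbers assumed, not from the card): with a hit time of 1 cycle, a hit ratio of 95% (miss rate 5%), and a miss penalty of 100 cycles, the average access time is 1 + 0.05 × 100 = 6 cycles; raising the hit ratio to 99% reduces it to 1 + 0.01 × 100 = 2 cycles.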

5
Q

Explain how DRAM technology works, including the need for refresh operations. Describe one advancement in DRAM technology.

A

DRAM stores data as a charge in capacitors, with each capacitor representing a bit. A single transistor is used to access the charge. Due to leakage, the charge must be periodically refreshed by reading and rewriting the contents.

Advancement example: Double Data Rate (DDR) DRAM transfers data on both the rising and falling edges of the clock signal, effectively doubling the data transfer rate compared to single data rate DRAM.

6
Q

Compare and contrast NOR flash and NAND flash memory. Include typical use cases for each type.

A

NOR flash: Has a bit cell similar to a NOR gate, supports random read/write access, and is typically used for storing code in embedded systems.
NAND flash: Has a denser bit cell similar to a NAND gate, supports block-at-a-time access, and is commonly used in USB drives and SSDs. NAND flash is cheaper per GB but less flexible in read/write operations.

7
Q

Compare and contrast write-through and write-back cache write policies. What are the benefits and drawbacks of each?

A

Write-through: Data is written to both the cache and main memory simultaneously. Benefits include simplicity and data consistency, but it increases write latency.
Write-back: Data is written only to the cache and marked as dirty. It is written to main memory only when evicted. Benefits include reduced write latency and bandwidth usage, but it requires more complex control logic to handle dirty blocks and maintain consistency.
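
A simplified C sketch (a hypothetical one-word cache line; lookup and allocation omitted) contrasting the two write paths:

#include <stdbool.h>

struct line { int data; bool dirty; };

/* Write-through: update the cache line and main memory on every store. */
void write_through(struct line *l, int *memory, int addr, int value) {
    l->data = value;
    memory[addr] = value;            /* memory stays consistent with the cache */
}

/* Write-back: update only the cache line and mark it dirty; main memory
 * is updated later, when the dirty line is evicted. */
void write_back(struct line *l, int value) {
    l->data = value;
    l->dirty = true;
}

void evict(struct line *l, int *memory, int addr) {
    if (l->dirty) {
        memory[addr] = l->data;      /* flush the modified block on eviction */
        l->dirty = false;
    }
}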

8
Q

Describe the access patterns of the C, A, and B arrays in the unoptimized DGEMM inner loop. Why might this be inefficient?

A

void dgemm(int n, double *A, double *B, double *C) {
  for (int i = 0; i < n; ++i) {                /* outer loop implied by the card */
    for (int j = 0; j < n; ++j) {
      double cij = C[j + i * n];               /* cij = C[i][j] */
      for (int k = 0; k < n; ++k) {
        cij += A[k + i * n] * B[j + k * n];    /* cij += A[i][k] * B[k][j] */
      }
      C[j + i * n] = cij;                      /* C[i][j] = cij */
    }
  }
}

The access pattern for arrays C, A, and B is as follows:

C is accessed row-wise.
A is accessed row-wise.
B is accessed column-wise.
This can be inefficient because it results in poor cache utilization. The row-wise accesses to A and C touch consecutive memory locations and are cache-friendly, but B is accessed column-wise with a stride of n elements, so for large matrices each access to B is likely to miss in the cache. This leads to frequent replacement of cached data and increased memory access time.

9
Q

Outline the key components of a disk sector and describe the process of accessing data from a disk.

A

Sector ID: Identifies the sector.
Data: Typically 512 bytes, although larger 4096-byte sectors have been proposed.
Error Correcting Code (ECC): Used to detect and correct errors.
Synchronization fields and gaps: Ensure data integrity and timing.
Accessing data from a disk involves:
Queuing delay: If other accesses are pending.
Seek: Moving the read/write heads to the correct track.
Rotational latency: Waiting for the disk to rotate to the correct position.
Data transfer: Reading or writing the data.
Controller overhead: Additional processing time by the disk controller.
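
A rough illustrative calculation (parameters assumed, not from the card): with a 4 ms average seek, a 7200 RPM disk (average rotational latency = 0.5 / (7200/60 rev/s) ≈ 4.2 ms), a 100 MB/s transfer rate for a 512-byte sector (≈ 0.005 ms), and 0.2 ms of controller overhead, the average access time is roughly 4 + 4.2 + 0.005 + 0.2 ≈ 8.4 ms, ignoring any queuing delay.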

10
Q

Explain the differences between direct-mapped, set-associative, and fully associative caches. How does increasing associativity impact cache performance?

A

Direct-mapped cache: Each memory block maps to one specific cache line.
Set-associative cache: Each memory block can map to any of a set of cache lines. For example, in a 2-way set-associative cache, each block can map to one of two cache lines.
Fully associative cache: Any memory block can be stored in any cache line.
Increasing associativity generally reduces the miss rate due to fewer conflicts but requires more complex hardware and can increase the hit time due to the need to search multiple cache lines.
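
As a small illustrative example (an assumed 8-block cache): block address 12 can go only in line 12 mod 8 = 4 of a direct-mapped cache, in either line of set 12 mod 4 = 0 of a 2-way set-associative cache (4 sets), or in any of the 8 lines of a fully associative cache.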

11
Q

Describe the purpose of multilevel caches in a memory hierarchy. How do primary (L1) and secondary (L2) caches differ in terms of design priorities?

A

Multilevel caches are used to bridge the speed gap between the CPU and main memory, reducing the effective memory access time.

Primary (L1) cache: Located closest to the CPU, designed for minimal hit time, small in size, and operates at the CPU clock speed.
Secondary (L2) cache: Larger than L1, designed to reduce the miss rate, serves as a buffer between L1 and main memory, and operates at a slower speed than L1 but faster than main memory.

12
Q

Explain the goal of software optimization via blocking in the context of DGEMM. How does this optimization impact data access patterns and performance?

A

The goal of software optimization via blocking in DGEMM (Double-precision General Matrix Multiply) is to maximize the number of accesses to data before it is replaced in the cache. This is achieved by dividing the data into smaller blocks that fit into the cache, thereby reducing cache misses and improving performance. By accessing smaller blocks repeatedly, the algorithm ensures that data remains in the cache longer, reducing the need to fetch it from slower main memory. This leads to improved temporal locality and better utilization of the cache, enhancing overall computational efficiency.

13
Q

Explain the concept of cache blocking in DGEMM with the provided code. How does it improve performance?

A

Cache blocking in DGEMM involves dividing the matrices into smaller blocks that fit into the cache. This allows for repeated access to the smaller blocks, thus reducing cache misses. Here is the provided cache-blocked DGEMM code:
#define BLOCKSIZE 32

void do_block(int n, int si, int sj, int sk, double *A, double *B, double *C) {
  for (int i = si; i < si + BLOCKSIZE; ++i) {
    for (int j = sj; j < sj + BLOCKSIZE; ++j) {
      double cij = C[i + j * n];                /* cij = C[i][j] */
      for (int k = sk; k < sk + BLOCKSIZE; ++k) {
        cij += A[i + k * n] * B[k + j * n];     /* cij += A[i][k] * B[k][j] */
      }
      C[i + j * n] = cij;                       /* C[i][j] = cij */
    }
  }
}

void dgemm(int n, double *A, double *B, double *C) {
  for (int si = 0; si < n; si += BLOCKSIZE) {
    for (int sj = 0; sj < n; sj += BLOCKSIZE) {
      for (int sk = 0; sk < n; sk += BLOCKSIZE) {
        do_block(n, si, sj, sk, A, B, C);
      }
    }
  }
}

This code breaks the matrices into 32x32 blocks (determined by BLOCKSIZE). By processing these smaller blocks, the code ensures that each block of the matrix can stay in the cache during the computation, significantly reducing cache misses. This improvement in data locality results in faster memory accesses and enhanced performance.

14
Q

What is the role of the Hamming SEC code in dependable memory hierarchies? Describe how it detects and corrects errors.

A

The Hamming Single Error Correction (SEC) code is used to detect and correct single-bit errors in data. The key roles are:

Detection: It detects single-bit errors using a minimum Hamming distance of 2.
Correction: It corrects single-bit errors with a minimum Hamming distance of 3. The encoding process involves numbering bits from 1 and placing parity bits at positions that are powers of 2. Each parity bit checks specific data bits.
For example, if the parity bits indicate an error (non-zero parity), the bit position corresponding to the parity bits’ binary value is flipped to correct the error.
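
A small worked illustration (a Hamming(7,4) example constructed here, not taken from the card): for data bits d1 d2 d3 d4 = 1011, the parity bits are p1 = d1⊕d2⊕d4 = 0, p2 = d1⊕d3⊕d4 = 1, p4 = d2⊕d3⊕d4 = 0, giving the codeword p1 p2 d1 p4 d2 d3 d4 = 0110011. If bit position 5 (d2) is flipped in transit, rechecking the parity groups gives syndrome bits p4 p2 p1 = 101 = 5, so bit 5 is flipped back to correct the error.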

15
Q

Define the measures of dependability for a memory hierarchy and explain how they contribute to overall system reliability.

A

Reliability (Mean Time To Failure - MTTF): The average time before a system component fails. Higher MTTF indicates more reliable components.
Service Interruption (Mean Time To Repair - MTTR): The average time required to repair a failed component. Lower MTTR means quicker recovery from failures.
Mean Time Between Failures (MTBF): The sum of MTTF and MTTR, indicating the expected time between failures.
Availability: The proportion of time a system is operational, calculated as Availability = MTTF / (MTTF + MTTR). Higher availability means the system is more reliable and accessible.

Improving these measures through fault avoidance, fault tolerance, and effective diagnosis and repair processes enhances overall system reliability and ensures continuous operation with minimal interruptions.
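
For example (illustrative numbers): a component with MTTF = 1,000 hours and MTTR = 1 hour has availability 1,000 / (1,000 + 1) ≈ 99.9%, and MTBF = 1,001 hours.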

16
Q

Given the following virtual memory setup, explain how a page fault is handled. Assume a system with a 4K page size, a 32-bit virtual address, and a single-level page table.

A

Identify the Fault: The CPU traps to the operating system, indicating a page fault. The faulting virtual address is used to find the page table entry (PTE).
Check Page Presence: The OS checks if the page is present in memory. If the PTE indicates the page is not present, the OS proceeds to fetch the page from disk.
Locate the Page on Disk: The OS determines the location of the page on disk (swap space).
Choose a Page to Replace: If the memory is full, the OS selects a page to replace, typically using a replacement policy like LRU (Least Recently Used). If the selected page is dirty (modified), it is written back to disk.
Read Page into Memory: The OS reads the required page from disk into memory, updates the page table with the new physical address, and marks the page as present.
Update TLB and Resume: The Translation Look-aside Buffer (TLB) is updated with the new PTE, and the CPU resumes execution from the faulting instruction.
This process ensures that the required data is loaded into memory and accessible to the CPU, albeit with a significant delay due to the disk access involved.
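
Under the stated assumptions, the 4 KB page size gives a 12-bit page offset, so the 32-bit virtual address splits into a 20-bit virtual page number plus the offset; the single-level page table therefore holds 2^20 (about one million) entries, and the faulting virtual page number indexes directly into it to find the PTE.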

17
Q

Compare and contrast the roles of the Translation Lookaside Buffer (TLB) and the cache in a memory hierarchy.

A

The TLB and cache both play crucial roles in improving memory access times, but they serve different purposes and operate at different levels of the memory hierarchy:

Translation Lookaside Buffer (TLB):

Purpose: The TLB is a specialized cache that stores recent translations of virtual addresses to physical addresses, speeding up the address translation process in virtual memory systems.
Operation: When a virtual address is accessed, the TLB is checked first. If the translation is found (TLB hit), the physical address is quickly obtained. If not (TLB miss), a page table lookup is required, which can be slower.
Size and Speed: TLBs are small (typically 16-512 entries) but very fast, with hit times of 0.5-1 cycle.
Cache:

Purpose: The cache stores copies of frequently accessed data from main memory to reduce the average time to access data.
Operation: When data is accessed, the cache is checked first. If the data is found (cache hit), it is returned quickly. If not (cache miss), the data is fetched from main memory, which is slower.
Levels and Size: Caches are larger than TLBs and organized into multiple levels (L1, L2, L3) with increasing size and access time. L1 caches are small and very fast, while L3 caches are larger but slower.
Both TLBs and caches leverage locality principles (spatial and temporal) to improve performance, but they focus on different aspects of memory access optimization.

18
Q

Discuss the impact of memory consistency and cache coherence in a multiprocessor system. How do these concepts ensure correct execution of parallel programs?

A

Memory consistency and cache coherence are critical for ensuring the correct execution of parallel programs in a multiprocessor system:

Memory Consistency:
Definition: Memory consistency defines the order in which the results of memory operations (reads and writes) become visible to all processors.
Impact: It ensures that all processors observe memory operations in an order permitted by the programming model, so the ordering assumptions of parallel programs hold.

Cache Coherence:
Definition: Cache coherence ensures that all processors see a single, consistent value for each memory location, even when copies of it exist in several private caches.
Impact: Coherence protocols (snooping or directory-based) invalidate or update stale copies, so a write by one processor becomes visible to the others and no processor computes with stale data.

Together, consistency and coherence guarantee that reads and writes issued by different processors interact predictably, which is necessary for the correct execution of parallel programs.

19
Q

Define dependability in computing and list the key measures used to evaluate it.

A

Reliability: Mean Time to Failure (MTTF) - the average time until a system or component fails.
Service Interruption: Mean Time to Repair (MTTR) - the average time required to repair a system or component after a failure.
Mean Time Between Failures (MTBF): The sum of MTTF and MTTR.
Availability: The ratio of MTTF to the sum of MTTF and MTTR, indicating the proportion of time a system is operational.

20
Q

Describe the role of a Virtual Machine Monitor (VMM) and how it maps virtual resources to physical resources.

A

A Virtual Machine Monitor (VMM) is responsible for managing the virtualized environment by mapping virtual resources (such as memory, I/O devices, and CPUs) to physical resources. The VMM ensures that guest code runs on the native machine in user mode and traps to the VMM for privileged instructions and access to protected resources. It handles real I/O devices while emulating generic virtual I/O devices for the guest operating systems, thus providing isolation, security, and resource sharing among multiple virtual machines.

21
Q

Discuss the concept of virtual memory and its importance in modern computing systems.

A

Virtual memory is a technique that allows the use of main memory as a “cache” for secondary storage (e.g., disk). It enables programs to use more memory than physically available by providing each process with a private virtual address space. The CPU and operating system manage the translation from virtual addresses to physical addresses, using pages. This approach not only facilitates efficient memory management and multitasking but also provides protection and isolation between processes. Virtual memory is essential for running large applications and improving system stability and security.

22
Q

Explain the different types of cache misses and strategies to reduce them.

A

Compulsory Misses: Occur on the first access to a block. Reducing compulsory misses can be achieved by using larger block sizes or prefetching techniques.
Capacity Misses: Happen when the cache cannot hold all the data needed by the program. Increasing cache size or using better cache management algorithms can help reduce capacity misses.
Conflict Misses: Occur in set-associative or direct-mapped caches when multiple blocks compete for the same cache set. Increasing associativity or using more sophisticated replacement policies like LRU (Least Recently Used) can mitigate conflict misses.

23
Q

Describe the role and functioning of the Translation Lookaside Buffer (TLB) in virtual memory systems.

A

The Translation Lookaside Buffer (TLB) is a cache that stores recent translations from virtual addresses to physical addresses. Its role is to speed up the virtual address translation process. When a virtual address needs to be translated, the TLB is checked first. If the translation is found (a TLB hit), the physical address is quickly obtained. If the translation is not in the TLB (a TLB miss), the system must access the page table in memory, which takes longer. TLBs significantly reduce the overhead of address translation and improve overall system performance.
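
As a rough illustration (assumed numbers): if a TLB hit adds about 1 cycle to an access and a miss requiring a page-table walk costs about 50 cycles, a 99% TLB hit rate gives an average translation cost of 0.99 × 1 + 0.01 × 50 ≈ 1.5 cycles per access.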

24
Q

Discuss the concept of cache coherence in multiprocessor systems and the protocols used to maintain it.

A

Cache coherence ensures that all processors in a multiprocessor system have a consistent view of memory. It prevents scenarios where different processors might have different values for the same memory location. Two main types of protocols used to maintain cache coherence are:

Snooping Protocols: Each cache monitors (or “snoops”) the bus for read/write operations and ensures that any changes to the shared data are reflected in all caches.
Directory-based Protocols: A directory keeps track of the sharing status of each block of memory. It helps coordinate access to the memory blocks to maintain consistency.
Invalidation and updating are common strategies used in these protocols to ensure that caches do not hold stale data.

25
Q

Describe the memory protection mechanisms provided by modern operating systems and hardware.

A

Privileged Supervisor Mode (Kernel Mode): Certain critical operations and access to protected resources are only allowed in this mode.
Page Tables: Used to control and translate virtual to physical memory addresses, ensuring that processes cannot access each other’s memory.
System Calls: Special instructions that allow user-mode applications to request services from the operating system, ensuring controlled access to hardware and system resources.
These mechanisms prevent unauthorized access and modifications, maintaining the integrity and security of the system.

26
Q

[IMP]
Define Average Memory Access Time (AMAT) and explain its significance in the performance of a computer system.

A

Average Memory Access Time (AMAT) is a metric that measures the average time it takes to access memory, considering both cache hits and misses. It is calculated using the formula:
AMAT = Hit Time + Miss Rate × Miss Penalty

Significance: AMAT provides a comprehensive measure of memory performance, allowing for the assessment of how well the cache hierarchy is working. Lower AMAT values indicate better performance, as the system spends less time waiting for memory accesses.

27
Q

Given the following parameters, calculate the AMAT:

L1 cache hit time: 1 cycle

L1 cache miss rate: 5%

L1 miss penalty (time to access L2): 10 cycles

L2 cache hit time: 8 cycles

L2 cache miss rate: 20%

L2 miss penalty (time to access main memory): 100 cycles

A

First, calculate the effective miss penalty for L1 (which is the AMAT for L2):

L2 AMAT = L2 Hit Time + L2 Miss Rate × L2 Miss Penalty

L2 AMAT = 8 + 0.20 × 100 = 8 + 20 = 28 cycles

Next, calculate the AMAT for the overall memory system:

AMAT = L1 Hit Time + L1 Miss Rate × L1 Miss Penalty

Here, the effective L1 Miss Penalty is the L2 AMAT (the average time to service an L1 miss), not just the 10-cycle L2 access time.

AMAT = 1 + 0.05 × 28 = 1 + 1.4 = 2.4 cycles

28
Q

Describe the snooping mechanism in the context of cache coherence protocols. How does it ensure data consistency across multiple caches in a multiprocessor system?

A

Snooping is a cache coherence mechanism used in multiprocessor systems to maintain consistency among caches. Each cache that holds a copy of a memory block monitors (or “snoops” on) the shared bus for any communication related to that block.

When a processor writes to a block, a broadcast is sent on the bus, and other caches check if they have a copy of the block. Depending on the coherence protocol (e.g., MESI - Modified, Exclusive, Shared, Invalid), caches may invalidate their copies, update them, or take other actions to maintain consistency.

This ensures that all processors see a consistent view of memory, preventing issues like stale data.

29
Q

Explain the MESI cache coherence protocol. What states does a cache line transition through, and what events trigger these transitions?

A

The MESI protocol defines four states for each cache line:

Modified (M): The cache line is modified (dirty) and exclusive to this cache. The memory is out of date.
Exclusive (E): The cache line is clean and exclusive to this cache. The memory is up to date.
Shared (S): The cache line may be present in other caches and is up to date.
Invalid (I): The cache line is not valid.

Transitions are triggered by read and write operations, both from the local processor and from snooped bus transactions.
Key events include:
Read Miss: Transitions from I to S or E.
Write Miss: Transitions from I to M.
Bus Read (snooped read by another processor): Transitions from M to S or from E to S.
Bus Write/Invalidate (snooped write by another processor): Transitions from M, E, or S to I.

30
Q

Given the following sequence of operations, determine the state transitions for a cache line under the MESI protocol:

Processor A reads a memory block.
Processor B reads the same memory block.
Processor A writes to the memory block.
Processor B reads the memory block.

A

Initial state: I (Invalid)

Operation 1: Processor A reads a memory block.
Transition: I -> E (Exclusive)

Operation 2: Processor B reads the same memory block.
Transition: E -> S (Shared) for Processor A, I -> S for Processor B

Operation 3: Processor A writes to the memory block.
Transition: S -> M (Modified) for Processor A, S -> I (Invalid) for Processor B

Operation 4: Processor B reads the memory block.
Transition: M -> S for Processor A, I -> S for Processor B

31
Q

In a multiprocessor system with snooping, explain the process and implications of a write miss in a processor’s cache. How does the system ensure cache coherence in this scenario?

A

A write miss occurs when a processor attempts to write to a cache line that is not currently in its cache.

Process:
The processor issues a write miss on the bus.
All other caches snoop the bus and check if they have a copy of the cache line.
If another cache has the line in the Modified state, it writes the dirty data back to main memory (or supplies it directly); caches holding the line in any valid state then invalidate their copies.
The writing processor fetches the updated line into its cache and transitions it to the Modified state.

Implications:
Ensures that no other cache holds stale data.
Maintains a single, coherent view of memory across all processors.
Potentially increases bus traffic and latency due to write-back and invalidation.