Week 10 Flashcards
Multiprocessors Challenges and Trends
Clock Speed Limitations:
Physical constraints (e.g., the speed of light limits how fast signals can propagate across a chip) and heat dissipation restrict further CPU clock-speed increases.
As components shrink toward the nanoscale, further performance gains require new architectural approaches rather than faster clocks.
Massively Parallel Systems:
The solution lies in systems with multiple CPUs working in parallel.
Such configurations excel in computationally intense problems like weather modeling.
Shared-Memory Multiprocessors
Key Features:
Multiple CPUs share the same memory, enabling efficient communication.
Programmers work without handling the complexities of underlying message passing.
Technological Challenges:
Synchronization ensures data consistency across CPUs.
Resource allocation and task scheduling require careful OS-level management.
UMA Multiprocessors (Uniform Memory Access)
Architecture:
All CPUs access memory through a shared bus or a crossbar switch.
Per-CPU caches reduce bus traffic, since many memory references can be satisfied locally.
Limitations and Innovations:
Bus contention grows with the number of CPUs, limiting scalability.
Crossbar and multistage networks scale UMA to medium-sized systems but face complexity in larger setups.
NUMA Multiprocessors (Non-Uniform Memory Access)
Characteristics:
A single shared address space spans all memory, but the memory is physically distributed among the CPUs.
Access to remote memory is slower than to local memory, a gap that NUMA-aware OS designs (e.g., placing pages near the CPU that uses them) mitigate.
Multiprocessor Synchronization
Critical Regions: Protect data structures (e.g., tables) with mutexes to ensure only one CPU accesses them at a time.
Challenges:
Disabling interrupts only works on single-CPU systems.
Requires specific protocols for managing synchronization across multiple CPUs.
Test-and-Set Lock (TSL):
Atomic instruction that locks the bus to avoid race conditions.
Only one CPU can successfully execute a TSL operation at any given moment.
Hardware ensures no simultaneous access.
Spin Lock:
A CPU that fails to acquire the lock retries the TSL instruction in a tight loop until the lock becomes free. This busy waiting wastes CPU cycles and generates bus traffic.
Ethernet Backoff Algorithm:
Reduces bus traffic by exponentially increasing the delay after each failed attempt (e.g., 1, 2, then 4 ms, doubling each time), as Ethernet does after collisions.
Spinning: CPU loops while waiting for the lock, wasting cycles.
Switching: the CPU context-switches to another thread instead, but context-switch overhead and cold caches make this costly too.
Optimizations: measure how long locks are typically held to decide dynamically whether spinning or switching is cheaper.
x86 Instructions:
MONITOR: Watches a memory region.
MWAIT: Waits for a change in that region, reducing active spinning.
Multiprocessor Scheduling