Parallel Systems: Cache Coherence, Synchronization, Communication Flashcards
What is the Shared Memory Machine Model?
Also called Dance Hall Architecture
Each CPU has its own cache; all CPUs reach the shared memory over an interconnection network
What is the Symmetric Multiprocessor (SMP) Model?
Each CPU has its own cache, but memory is shared by all CPUs
What is the Distributed Shared Memory Model?
Each CPU has its own cache and local memory, but can access other CPUs’ memory via the network
How long does it take the SMP model to access main memory and cache?
- 100+ cycles to get to main memory
- ~2 cycles to access cache
What happens on update to a shared cache entry?
- Caches snoop on the shared bus to watch for updates
- On a write, either invalidate all other cached copies OR update them all
What is the memory consistency model?
- Guarantees what order memory accesses appear to execute in
- Sequential consistency: program order within each thread + arbitrary interleaving across threads
What is memory consistency?
The memory model presented to the programmer (what ordering guarantees to expect)
What is cache coherence?
How the system implements the consistency model in the presence of private caches
Shared address space + cache coherence = shared memory machine
What is NCC?
Non-Cache-Coherent
Shared address space, but no hardware cache coherence (software must manage coherence)
What are the operations associated with hardware cache coherence?
- Write invalidate: invalidates cache entry if present
- Write update: send updated value to update cache entries
What is the issue with scalability?
- More processors doesn’t necessarily increase performance
- Increased overhead
Can exploit parallelism though
What are synchronization primitives for shared memory programming?
- Locks (exclusive and shared)
- Barriers
What are atomic operations?
- A read-modify-write that completes indivisibly; no other processor’s access can interleave mid-instruction
What are examples of atomic operations?
- Test-and-set
- Fetch-and-increment
- Fetch_and_Φ (Φ can be anything)
Basically Read-Modify-Write (RMW)
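These RMW primitives can be sketched in Python. The `AtomicWord` class and its method names are illustrative (not from the cards), and a `threading.Lock` merely stands in for the indivisibility the hardware would provide:

```python
import threading

class AtomicWord:
    """A memory word supporting atomic read-modify-write (illustrative).
    The internal threading.Lock stands in for hardware atomicity."""
    def __init__(self, value=0):
        self._value = value
        self._hw = threading.Lock()  # emulates the hardware's indivisibility

    def test_and_set(self):
        # Atomically: return the old value and set the word to 1.
        with self._hw:
            old, self._value = self._value, 1
            return old

    def fetch_and_increment(self):
        # Atomically: return the old value and add 1 to the word.
        with self._hw:
            old = self._value
            self._value = old + 1
            return old

    def fetch_and_phi(self, phi):
        # Generic Fetch_and_Φ: Φ is any function of the old value.
        with self._hw:
            old = self._value
            self._value = phi(old)
            return old

word = AtomicWord()
assert word.test_and_set() == 0        # first caller: word was free
assert word.test_and_set() == 1        # later callers: already taken

ctr = AtomicWord()
assert ctr.fetch_and_increment() == 0  # values handed out in order
assert ctr.fetch_and_increment() == 1
```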
Are atomic operations sufficient to ensure synchronization?
- No. Atomicity must hold system-wide, across all caches
- Atomic operations bypass the cache and go straight to memory
What are the issues with scalability with synchronization?
- Latency
- Waiting Time (application problem)
- Contention
What are the goals of spin locks?
- Exploit the cache
- Minimize disruption to processors doing useful work
- Avoid contention
What are spin locks?
- Spin lock located in cache
- Waiting processes spin on cached copy of lock
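A sketch of this idea — test-and-test-and-set, where waiters read the (conceptually cached) value and only retry the expensive atomic operation when the lock looks free. Python threads stand in for processors, a `threading.Lock` emulates the atomic `test_and_set`, and `time.sleep(0)` exists only to let other simulated processors run:

```python
import threading
import time

class SpinLock:
    """Test-and-test-and-set spin lock (sketch, not real hardware)."""
    def __init__(self):
        self._word = 0
        self._hw = threading.Lock()  # emulates atomicity of test_and_set

    def _test_and_set(self):
        with self._hw:
            old, self._word = self._word, 1
            return old

    def acquire(self):
        while True:
            # Spin reading the value first: while the lock is held, this
            # generates no (simulated) bus traffic.
            while self._word == 1:
                time.sleep(0)        # yield so other threads can run
            # Lock looks free: now try the expensive atomic operation.
            if self._test_and_set() == 0:
                return

    def release(self):
        self._word = 0

counter = 0
lock = SpinLock()

def worker():
    global counter
    for _ in range(200):
        lock.acquire()
        counter += 1                 # critical section
        lock.release()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter == 800                # no increments were lost
```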
What are the issues with spin locks?
- Contention on lock release: every waiter’s cached copy is invalidated and re-fetched
- Takes O(N^2) bus operations for N waiters to invalidate and update
What are spin locks with delay?
- Delay after lock release
- Delays should be staggered
- Delay with exponential back off
- Don’t need cache coherence for this to work
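A minimal sketch of delay with exponential back-off, under the same simulation assumptions as before (Python threads as processors, a `threading.Lock` emulating the atomic `test_and_set`; the delay constants are illustrative). Because waiters sleep between retries instead of spinning on a cached copy, the scheme does not rely on cache coherence:

```python
import random
import threading
import time

class BackoffLock:
    """Spin lock with exponential back-off (sketch). After each failed
    test_and_set, the waiter delays for a doubling, jittered interval,
    so retries are staggered and no cached copy of the lock is needed."""
    def __init__(self, cap=0.001):
        self._word = 0
        self._hw = threading.Lock()  # emulates atomicity of test_and_set
        self._cap = cap              # upper bound on the back-off delay

    def _test_and_set(self):
        with self._hw:
            old, self._word = self._word, 1
            return old

    def acquire(self):
        delay = 1e-6
        while self._test_and_set() == 1:
            time.sleep(delay * random.random())  # staggered (jittered) wait
            delay = min(delay * 2, self._cap)    # exponential back-off, capped

    def release(self):
        self._word = 0

total = 0
lock = BackoffLock()

def worker():
    global total
    for _ in range(100):
        lock.acquire()
        total += 1                   # critical section
        lock.release()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert total == 400
```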
What is ticket lock?
- Similar to a deli situation
- Lock hands out increasing ticket numbers; the holder is the process whose ticket matches now-serving
- Other processes spin until their number is called
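The deli analogy can be sketched as follows; a `threading.Lock` emulates the atomic fetch-and-increment that hands out tickets, and `time.sleep(0)` just yields to the other simulated processors:

```python
import threading
import time

class TicketLock:
    """Ticket lock sketch: take a number, wait until it is called.
    Grants the lock in FIFO order of arrival."""
    def __init__(self):
        self._next_ticket = 0
        self._now_serving = 0
        self._hw = threading.Lock()

    def acquire(self):
        with self._hw:               # emulates fetch_and_increment(next_ticket)
            my_ticket = self._next_ticket
            self._next_ticket += 1
        while self._now_serving != my_ticket:
            time.sleep(0)            # spin until our number is called

    def release(self):
        self._now_serving += 1       # only the holder writes this: call next ticket

count = 0
lock = TicketLock()

def worker():
    global count
    for _ in range(200):
        lock.acquire()
        count += 1                   # critical section
        lock.release()

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert count == 600
```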
What are the issues with ticket lock?
- Contention can happen if reading from memory
- Invalidate based protocol will cause contention
What are array based queueing locks?
- Circular queue
- Each slot holds one of two values: has_lock or must_wait
- Array size = N number of processors
- queue_last pointer = pointer to last position in queue
- Processes point to where they are placed in the queue
- Processes spin on their spot to get the lock
- On unlock, set self to must_wait and set neighbor to has_lock
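The steps above can be sketched as an Anderson-style array lock, under the same simulation assumptions (Python threads as processors; a `threading.Lock` emulating the atomic fetch-and-increment on `queue_last`):

```python
import threading
import time

N = 4  # number of simulated processors sharing the lock

class ArrayQueueLock:
    """Array-based queueing lock (sketch). Each waiter gets its own slot
    in a circular array and spins only on that slot, so a release
    disturbs exactly one waiter instead of all of them."""
    HAS_LOCK, MUST_WAIT = 0, 1

    def __init__(self, n):
        self.n = n
        self.flags = [self.HAS_LOCK] + [self.MUST_WAIT] * (n - 1)
        self.queue_last = 0          # next queue position to hand out
        self._hw = threading.Lock()  # emulates atomic fetch_and_increment

    def acquire(self):
        with self._hw:               # my_place = fetch_and_increment(queue_last)
            my_place = self.queue_last
            self.queue_last += 1
        slot = my_place % self.n     # circular queue: wrap around the array
        while self.flags[slot] == self.MUST_WAIT:
            time.sleep(0)            # spin on own slot only
        return slot

    def release(self, slot):
        self.flags[slot] = self.MUST_WAIT                 # reset own slot for reuse
        self.flags[(slot + 1) % self.n] = self.HAS_LOCK   # pass lock to neighbor

count = 0
lock = ArrayQueueLock(N)

def worker():
    global count
    for _ in range(100):
        slot = lock.acquire()
        count += 1                   # critical section
        lock.release(slot)

threads = [threading.Thread(target=worker) for _ in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert count == N * 100
```

The array has one slot per processor, which is enough because each processor waits on at most one lock request at a time.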
What is false sharing?
- A cache block may hold multiple independent variables
- The block appears shared (and ping-pongs between caches) even when processors touch different variables in it
What is gang scheduling?
OS schedules multiple threads of the same process concurrently
How do we handle spin locks over blocking threads?
- Critical section should be small
- Avoid context switching overheads by allowing waiting threads to spin rather than block
What are linked-list based queueing locks?
- Lock variable L points to the last requestor (tail of the queue)
- Each waiting process has its own queue node, linked behind its predecessor
- Each waiter spins on a flag in its own node; on release, the holder flips its successor’s flag
What is the issue with linked list based queueing locks?
Manipulating the queue itself must be atomic; needs fetch_and_store (and compare_and_swap) instructions
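This is the MCS-style lock; a sketch under the usual simulation assumptions (Python threads as processors, with a `threading.Lock` emulating the atomic fetch_and_store and compare_and_swap used on the tail pointer):

```python
import threading
import time

class QNode:
    """Per-process queue node for the linked-list lock."""
    def __init__(self):
        self.next = None
        self.got_it = False

class MCSLock:
    """Linked-list (MCS-style) queueing lock sketch. The lock points at
    the last requestor; each waiter spins on a flag in its OWN node."""
    def __init__(self):
        self.tail = None
        self._hw = threading.Lock()  # emulates fetch_and_store / compare_and_swap

    def acquire(self, me):
        me.next = None
        me.got_it = False
        with self._hw:               # pred = fetch_and_store(tail, me)
            pred = self.tail
            self.tail = me
        if pred is None:
            me.got_it = True         # queue was empty: lock is ours
        else:
            pred.next = me           # link behind predecessor...
            while not me.got_it:     # ...and spin on our own node
                time.sleep(0)

    def release(self, me):
        with self._hw:               # compare_and_swap(tail, me, None)
            if self.tail is me:
                self.tail = None     # no one waiting: queue is now empty
                return
        while me.next is None:
            time.sleep(0)            # successor exists but hasn't linked yet
        me.next.got_it = True        # hand the lock to the next in line

count = 0
lock = MCSLock()

def worker():
    global count
    node = QNode()                   # one node per simulated processor
    for _ in range(100):
        lock.acquire(node)
        count += 1                   # critical section
        lock.release(node)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert count == 400
```

Note how both the swap on `tail` in acquire and the conditional swap in release must be atomic, which is exactly why this lock needs fetch_and_store / compare_and_swap on real hardware.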