P3L4 Synchronization - Spinlocks Flashcards
Paper: The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors (Anderson)
Key ideas from the paper:
- Efficient spinlock algorithms
- Arbitrating contention for a spinlock is similar to flow control in a network
- A method for explicitly queuing spinning processors
True or False?
For small critical sections, busy-waiting (spinning) is more efficient than relinquishing the processor.
True. For short critical sections, spinning wastes less time than the context switches needed to relinquish and later reacquire the processor.
Does the following implementation of a spinlock guarantee mutual exclusion?
- init:
- lock = free
- To lock the spinlock:
- SPIN:
- If (lock == free)
- lock = busy
- else
- goto SPIN
- Unlock:
- lock = free
No. Two threads can both reach the if check and see that the lock is free, so both will proceed to set lock = busy. Only one of them actually changed the lock's state, but the other had already passed the check, so both threads enter the critical section.
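A hypothetical interleaving that breaks mutual exclusion (threads A and B, lock starts free):
- A: if (lock == free) // sees free
- B: if (lock == free) // also sees free; A has not written yet
- A: lock = busy // A enters the critical section
- B: lock = busy // B enters too - mutual exclusion is violated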
Does the following implementation of a spinlock guarantee mutual exclusion?
- spinlock_init(lock)
- lock = free;
- spinlock_lock(lock)
- while (lock == busy); //spin
- lock = busy;
- spinlock_unlock(lock)
- lock = free;
No, and it's also not efficient.
More than one thread can see the lock free in the while check, move on to set lock = busy, and enter the critical section (the same kind of interleaving shown above). One of the threads did set the lock to busy, but by then the others had already passed the check.
What are two guarantees of an atomic instruction like test_and_set?
spinlock_lock(lock)
{
    while (test_and_set(lock) == busy); // spin while the previous value was busy
}
The instruction's operations happen atomically, i.e., together or not at all
Mutual exclusion - threads on multiple cores cannot perform the atomic operations in parallel. All concurrent attempts to execute an instruction will be queued and performed serially
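As a sketch of how these guarantees get used in practice, here is a minimal spinlock built on C11 atomics; atomic_exchange plays the role of test_and_set (the names spinlock_t, spinlock_lock, and spinlock_unlock are illustrative, not from the paper):

#include <stdatomic.h>
#include <stdbool.h>

typedef atomic_bool spinlock_t; // false == free, true == busy

void spinlock_lock(spinlock_t *lock)
{
    // atomic_exchange atomically sets the lock to busy and returns the
    // previous value: keep spinning as long as it was already busy
    while (atomic_exchange(lock, true))
        ;
}

void spinlock_unlock(spinlock_t *lock)
{
    atomic_store(lock, false); // lock = free
}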
True or False?
Atomic instructions are cross-platform
FALSE
Which specific atomic instructions are available on a given platform varies from hardware to hardware.
There may be differences in the efficiencies on different architectures.
So software that relies on atomics may need to be ported and re-optimized for each platform.
Name 3 atomic instructions supported on some hardware architectures
- test_and_set
- read_and_increment
- compare_and_swap
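A sketch of each instruction's semantics, written as plain C for readability; on real hardware the whole body of each function executes as a single indivisible step (the function names mirror the flashcard, not any particular instruction set):

int test_and_set(int *p)          // atomically: set to busy, return old value
{
    int old = *p;
    *p = 1;                       // 1 == busy
    return old;
}

int read_and_increment(int *p)    // atomically: increment, return old value
{
    int old = *p;
    *p = old + 1;
    return old;
}

int compare_and_swap(int *p, int expected, int new_value)
{
    // atomically: write new_value only if the current value matches expected
    int old = *p;
    if (old == expected)
        *p = new_value;
    return old;
}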
Shared Memory Multiprocessors
How is the latency caused by contention among CPUs for shared memory mitigated?
Each of the CPUs in an SMP platform will have a cache.
Name the write policy described by each behavior:
- A write will go directly to main memory, and any cache references will be invalidated.
- The CPU write may be applied both to the cache and in memory.
- Apply the write immediately to the cache, and perform the write in main memory at some later point in time.
- No-write: the write goes directly to main memory, and any cached copies are invalidated.
- Write-through: the write is applied both to the cache and to main memory.
- Write-back: the write is applied immediately to the cache, and main memory is updated at some later point in time.
What happens when multiple CPUs reference the same data?
Explain cache-coherent (CC) vs. non cache coherent (NCC) hardware architectures:
cache coherent - If one CPU writes a new value for x in its cache, the hardware will make sure the other CPUs’ caches are updated
non-cache coherent - the hardware leaves it to software to keep the caches consistent
What are two main cache coherence strategies employed by the hardware?
- Hardware using the write-invalidate (WI) strategy will invalidate all cache entries of X once one CPU updates its copy of X. Future references to invalidated cache entries will have to pass through to main memory before being re-cached.
- Hardware using the write-update (WU) strategy will ensure that all cache entries of X are updated once one CPU updates its copy of X. Subsequent accesses of X by any of the CPUs will continue to be served by the cache.
With write-invalidate, we pose ________ bandwidth requirements on the shared interconnect in the system.
This is because we don’t have to send the value of X, but rather just its _______ so it can be invalidated amongst the other caches.
With write-invalidate, we pose lower bandwidth requirements on the shared interconnect in the system.
This is because we don’t have to send the value of X, but rather just its address so it can be invalidated amongst the other caches.
Once a cache invalidation occurs, subsequent changes to X won’t require subsequent invalidations: a cache either has an entry for X or it doesn’t.
If X isn't needed on any of the other CPUs anytime soon, it's possible to amortize the cost of the coherence traffic over multiple changes.
X can change multiple times on one CPU before its value is needed on another CPU.
For write-update architectures, the benefit is that the value of X will be ___________________ on the other CPUs that need to access it.
We will not have to pay the cost of a _______________________. Programs that will need to access the new value of X immediately on another CPU will benefit from this design.
For write-update architectures, the benefit is that the value of X will be available immediately on the other CPUs that need to access it.
We will not have to pay the cost of a main memory access. Programs that will need to access the new value of X immediately on another CPU will benefit from this design.
True or False?
The use of write-update or write-invalidate is determined by the hardware supporting the platform.
You as the software developer have no say in which strategy is used.
TRUE
Atomic instructions on SMP systems are ______ expensive than on a single-CPU system because of ___________________________
In addition, atomics in general are _______ expensive because they ________ the cache and always trigger coherence traffic.
Atomic instructions on SMP systems are more expensive than on a single-CPU system because of bus or I/C (interconnect) contention.
In addition, atomics in general are more expensive because they bypass the cache and always trigger coherence traffic.
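Because every atomic attempt generates coherence traffic, one mitigation the Anderson paper discusses is to spin on a cached read and only issue the atomic when the lock looks free ("spin on read", also known as test-and-test-and-set). A minimal sketch, reusing the illustrative C11 spinlock_t from the test_and_set card above:

void spinlock_lock(spinlock_t *lock)
{
    while (true) {
        // spin on a plain read served from the local cache; this generates
        // no coherence traffic while the lock stays busy
        while (atomic_load(lock))
            ;
        // the lock looked free: try the atomic exchange; it can still fail
        // if another CPU won the race
        if (!atomic_exchange(lock, true))
            return; // previous value was free, so we now hold the lock
    }
}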