P3L4: Synchronization Constructs Flashcards
Why are hardware atomic instructions necessary for implementing synchronization mechanisms?
Atomic instructions are necessary to guarantee the correctness of our synchronization mechanisms. If they aren’t used, cycles may pass between checking for availability of a lock and setting the lock. In that time, another process might get the lock instead.
Instead of separate test and set commands, we want them to be performed “together or not at all.”
Name three examples of atomic instructions
test_and_set
read_and_increment
compare_and_swap
Why are spinlocks useful?
Spinlocks are useful because they are very simple, making them a good primitive to build more complex mechanisms with. They also have very low latency and delay.
Define latency (with respect to spinlocks)
The time it takes a thread to acquire a free lock

Define delay/wait time (with respect to spinlocks)
The amount of time required for a thread to stop spinning and acquire a lock that has been freed
Why does a test_and_set have low latency?
Because it acquires a free lock within the same atomic instruction that it detects the free lock, i.e., immediately
Why does test_and_set have low delay/wait time?
Because when using test_and_set, we’re constantly spinning on the atomic. When the lock is freed, the next instruction will be test_and_set.
Would you use a spinlock in every place where you’re currently using a mutex?
Because they continuously “spin” (check whether the lock is available), they create congestion on the CPU. So no, I wouldn’t use them everywhere.
Mutexes have the benefit of relinquishing the CPU when they are unable to get a lock.
Do you understand why it is useful to have more powerful synchronization constructs, like reader-writer locks or monitors?
More powerful synchronization constructs are both more expressive and less error-prone.
Example: A reader-writer lock lets you specify what you actually want to do at a high level: you simply specify whether a critical section should be protected by a read lock or a write lock, and the reader-writer lock handles the specifics under the hood.
Can you work through the evolution of the spinlock implementations described in the Anderson paper, from basic test-and-set to the queuing lock?
Short version: Test-and-set -> Test-and-test-and-set -> Delay lock -> Queueing lock
Long version:
- Test-and-set goes to memory on every spin, creating congestion for main memory.
- Test-and-test-and-set solves this by first testing the cached version of the lock before going to memory. However, contention performance depends on architecture and could be better or worse.
- Delay lock uses an artificial delay after the freed lock is recognized to keep all threads from executing the atomic simultaneously. This improves contention, but obviously worsens delay.
- Queueing lock instead addresses contention by maintaining a queue of threads that want a particular lock. It needs read_and_increment to handle multiple threads joining the queue at the same time, which introduces some latency, but it reduces contention without adding delay.
How does a queueing lock work?
It maintains a queue of threads that want a particular lock, each with a flag indicating whether they have the lock or are waiting for the lock. Pointers are maintained for the thread with the lock and the last element of the queue.
Since multiple threads might attempt to get the lock at the same time, we need an atomic instruction for adding them to the queue: read_and_increment.
What are the pros and cons of a queueing lock?
Pros:
- Solves the issue of contention without adding delay
Cons:
- Requires the atomic read_and_increment (since multiple threads could try to join the queue simultaneously), which adds some latency
How does the performance of test-and-test-and-set depend on architecture?
Without a cache coherent architecture, it needs to go to main memory, making it no better than test-and-set.
With cache coherence using write-update, contention is better. However, all CPUs see that the lock is free at once and try to set it simultaneously.
With cache coherence using write-invalidate, contention is awful. Every lock attempt will create both contention for memory and invalidation traffic.
What are monitors and what is the advantage of using them?
Monitors are a higher-level synchronization construct that allow us to avoid manually invoking low-level synchronization operations. A monitor will specify a shared resource, the entry procedures for accessing that resource, and potential condition variables used to wake up different types of waiting threads.
When invoking the entry procedure, all of the necessary locking and checking will take place by the monitor behind the scenes. When a thread is done with a shared resource and is exiting the procedure, all of the unlocking and potential signaling is again performed by the monitor behind the scenes.