P3L4 Synchronization - Spinlocks Flashcards
Paper: The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors (Anderson)
Key ideas from the paper:
- Efficient spinlock algorithms
- Arbitrating contention for a spinlock is similar to flow control in a network
- A method for explicitly queuing spinning processors
True or False?
For small critical sections, busy-waiting (spinning) is more efficient than relinquishing the processor.
True. For short critical sections, spinning wastes less time than the context switches needed to relinquish and later reacquire the processor.
Does the following implementation of a spinlock guarantee mutual exclusion?
- init:
- lock = free
- To lock the spinlock:
- SPIN:
- If (lock == free)
- lock = busy
- else
- goto SPIN
- Unlock:
- lock = free
No. Two threads can both reach the if check and see that the lock is free, so both will proceed to set lock = busy. Only one of them actually changed the lock's state, but the other had already passed the check, so both threads enter the critical section.
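A hypothetical interleaving that breaks mutual exclusion (threads A and B, lock starts free):
- A: if (lock == free) // sees free
- B: if (lock == free) // also sees free; A has not written yet
- A: lock = busy // A enters the critical section
- B: lock = busy // B enters too - mutual exclusion is violated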
Does the following implementation of a spinlock guarantee mutual exclusion?
- spinlock_init(lock)
- lock = free;
- spinlock_lock(lock)
- while (lock == busy); //spin
- lock = busy;
- spinlock_unlock(lock)
- lock = free;
No, and it's also not efficient.
More than one thread can see the lock free in the while check, move on to set lock = busy, and enter the critical section (the same kind of interleaving shown above). One of the threads did set the lock to busy, but by then the others had already passed the check.
What are two guarantees of an atomic instruction like test_and_set?
spinlock_lock(lock)
{
    while (test_and_set(lock) == busy); // spin while the previous value was busy
}
The instruction's operations happen atomically, i.e., together or not at all
Mutual exclusion - threads on multiple cores cannot perform the atomic operations in parallel. All concurrent attempts to execute an instruction will be queued and performed serially
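As a sketch of how these guarantees get used in practice, here is a minimal spinlock built on C11 atomics; atomic_exchange plays the role of test_and_set (the names spinlock_t, spinlock_lock, and spinlock_unlock are illustrative, not from the paper):

#include <stdatomic.h>
#include <stdbool.h>

typedef atomic_bool spinlock_t; // false == free, true == busy

void spinlock_lock(spinlock_t *lock)
{
    // atomic_exchange atomically sets the lock to busy and returns the
    // previous value: keep spinning as long as it was already busy
    while (atomic_exchange(lock, true))
        ;
}

void spinlock_unlock(spinlock_t *lock)
{
    atomic_store(lock, false); // lock = free
}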
True or False?
Atomic instructions are cross-platform
FALSE
Which specific atomic instructions are available on a given platform varies from hardware to hardware.
There may be differences in the efficiencies on different architectures.
So software that relies on atomics may need to be ported and re-optimized for each platform.
Name 3 atomic instructions supported on some hardware architectures
- test_and_set
- read_and_increment
- compare_and_swap
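A sketch of each instruction's semantics, written as plain C for readability; on real hardware the whole body of each function executes as a single indivisible step (the function names mirror the flashcard, not any particular instruction set):

int test_and_set(int *p)          // atomically: set to busy, return old value
{
    int old = *p;
    *p = 1;                       // 1 == busy
    return old;
}

int read_and_increment(int *p)    // atomically: increment, return old value
{
    int old = *p;
    *p = old + 1;
    return old;
}

int compare_and_swap(int *p, int expected, int new_value)
{
    // atomically: write new_value only if the current value matches expected
    int old = *p;
    if (old == expected)
        *p = new_value;
    return old;
}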
Shared Memory Multiprocessors
How is the latency caused by contention among CPUs for shared memory mitigated?
Each of the CPUs in an SMP platform will have a cache.
Name the write policy described by each behavior:
- A write will go directly to main memory, and any cache references will be invalidated.
- The CPU write may be applied both to the cache and in memory.
- Apply the write immediately to the cache, and perform the write in main memory at some later point in time.
- No-write: the write goes directly to main memory, and any cached copies are invalidated.
- Write-through: the write is applied both to the cache and to main memory.
- Write-back: the write is applied immediately to the cache, and main memory is updated at some later point in time.
What happens when multiple CPUs reference the same data?
Explain cache-coherent (CC) vs. non cache coherent (NCC) hardware architectures:
cache coherent - If one CPU writes a new value for x in its cache, the hardware will make sure the other CPUs’ caches are updated
non-cache coherent - the hardware leaves it to software to keep the caches consistent
What are two main cache coherence strategies employed by the hardware?
- Hardware using the write-invalidate (WI) strategy will invalidate all cache entries of X once one CPU updates its copy of X. Future references to invalidated cache entries will have to pass through to main memory before being re-cached.
- Hardware using the write-update (WU) strategy will ensure that all cache entries of X are updated once one CPU updates its copy of X. Subsequent accesses of X by any of the CPUs will continue to be served by the cache.
With write-invalidate, we pose ________ bandwidth requirements on the shared interconnect in the system.
This is because we don’t have to send the value of X, but rather just its _______ so it can be invalidated amongst the other caches.
With write-invalidate, we pose lower bandwidth requirements on the shared interconnect in the system.
This is because we don’t have to send the value of X, but rather just its address so it can be invalidated amongst the other caches.
Once a cache invalidation occurs, subsequent changes to X won’t require subsequent invalidations: a cache either has an entry for X or it doesn’t.
If X isn't needed on any of the other CPUs anytime soon, it's possible to amortize the cost of the coherence traffic over multiple changes.
X can change multiple times on one CPU before its value is needed on another CPU.
For write-update architectures, the benefit is that the value of X will be ___________________ on the other CPUs that need to access it.
We will not have to pay the cost of a _______________________. Programs that will need to access the new value of X immediately on another CPU will benefit from this design.
For write-update architectures, the benefit is that the value of X will be available immediately on the other CPUs that need to access it.
We will not have to pay the cost of a main memory access. Programs that will need to access the new value of X immediately on another CPU will benefit from this design.
True or False?
The use of write-update or write-invalidate is determined by the hardware supporting the platform.
You as the software developer have no say in which strategy is used.
TRUE
Atomic instructions on SMP systems are ______ expensive than on a single-CPU system because of ___________________________
In addition, atomics in general are _______ expensive because they ________ the cache and always trigger coherence traffic.
Atomic instructions on SMP systems are more expensive than on a single-CPU system because of bus or I/C (interconnect) contention.
In addition, atomics in general are more expensive because they bypass the cache and always trigger coherence traffic.
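Because every atomic attempt generates coherence traffic, one mitigation the Anderson paper discusses is to spin on a cached read and only issue the atomic when the lock looks free ("spin on read", also known as test-and-test-and-set). A minimal sketch, reusing the illustrative C11 spinlock_t from the test_and_set card above:

void spinlock_lock(spinlock_t *lock)
{
    while (true) {
        // spin on a plain read served from the local cache; this generates
        // no coherence traffic while the lock stays busy
        while (atomic_load(lock))
            ;
        // the lock looked free: try the atomic exchange; it can still fail
        // if another CPU won the race
        if (!atomic_exchange(lock, true))
            return; // previous value was free, so we now hold the lock
    }
}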