P3L4 Synchronization - Spinlocks Flashcards

1
Q

Paper: The Performance of Spinlock Alternatives for Shared-Memory Multiprocessors. (Anderson)

A

Efficient spinlock algorithms.

Lock arbitration is similar to flow control in a communication network.

Proposes a method for explicitly queuing spinning processors.

2
Q

True or False?

For small critical sections, busy-waiting (spinning) is more efficient than relinquishing the processor.

A

True

3
Q

Does the following implementation of a spinlock guarantee mutual exclusion?

  1. init:
  2. lock = free
  3. To lock the spinlock:
  4. SPIN:
  5. If (lock == free)
  6. lock = busy
  7. else
  8. goto SPIN
  9. Unlock:
  10. lock = free
A

No. Two threads can both reach the condition check on line 5 while the lock is free; each sees the lock as free and proceeds to set it to busy on line 6. Only one of the assignments actually takes effect, but by then both threads have passed the check, so both enter the critical section.

4
Q

Does the following implementation of a spinlock guarantee mutual exclusion?

  1. spinlock_init(lock)
  2. lock = free;
  3. spinlock_lock(lock)
  4. while (lock == busy); //spin
  5. lock = busy;
  6. spinlock_unlock
  7. lock = free;
A

No, and it’s also inefficient.

More than one thread can arrive at line 4 while the lock is free, fall through to line 5, and continue into the critical section.

One of the threads did set the lock to busy, but by then the others had already passed the check.

5
Q

What are two guarantees of an atomic instruction like test_and_set ?

  spinlock_lock(lock) {
      while (test_and_set(lock) == busy);
  }

A

The instruction happens atomically - i.e., completely or not at all.

Mutual exclusion - threads on multiple cores cannot perform the atomic operation in parallel. All concurrent attempts to execute it are queued and performed serially.

6
Q

True or False?

Atomic instructions are cross-platform

A

FALSE

Which specific atomic instructions are available varies from one hardware platform to another.

Their efficiency may also differ across architectures.

So software that uses atomics may need to be ported and re-optimized for each platform.

7
Q

Name 3 atomic instructions supported on some hardware architectures

A
  • test_and_set
  • read_and_increment
  • compare_and_swap
8
Q

Shared Memory Multiprocessors

How is the latency caused by contention among CPUs for the shared memory mitigated?

A

Each of the CPUs in an SMP platform has a cache. References that hit in the cache avoid the shared interconnect and hide the latency of main memory.

9
Q

Name each cache write policy:

  1. A write will go directly to main memory, and any cache references will be invalidated.
  2. The CPU write may be applied both to the cache and in memory.
  3. Apply the write immediately to the cache, and perform the write in main memory at some later point in time.
A
  1. A write will go directly to main memory, and any cache references will be invalidated. no-write
  2. The CPU write may be applied both to the cache and in memory. write-through
  3. Apply the write immediately to the cache, and perform the write in main memory at some later point in time. write-back
10
Q

What happens when multiple CPUs reference the same data?

Explain cache-coherent (CC) vs. non cache coherent (NCC) hardware architectures:

A

cache coherent (CC) - if one CPU writes a new value of X in its cache, the hardware makes sure the other CPUs’ caches are updated or invalidated.

non-cache coherent (NCC) - the hardware does nothing; keeping the caches coherent is left to software.

11
Q

What are two main cache coherence strategies employed by the hardware?

A
  1. Hardware using the write-invalidate (WI) strategy will invalidate all cache entries of X once one CPU updates its copy of X. Future references to invalidated cache entries will have to pass through to main memory before being re-cached.
  2. Hardware using the write-update (WU) strategy will ensure that all cache entries of X are updated once one CPU updates its copy of X. Subsequent accesses of X by any of the CPUs will continue to be served by the cache.
12
Q

With write-invalidate, we pose ________ bandwidth requirements on the shared interconnect in the system.

This is because we don’t have to send the value of X, but rather just its _______ so it can be invalidated amongst the other caches.

A

With write-invalidate, we pose lower bandwidth requirements on the shared interconnect in the system.

This is because we don’t have to send the value of X, but rather just its address so it can be invalidated amongst the other caches.

Once a cache invalidation occurs, subsequent changes to X won’t require subsequent invalidations: a cache either has an entry for X or it doesn’t.

If X isn’t needed on any of the other CPUs anytime soon, it’s possible to amortize the cost of the coherence traffic over multiple changes.

X can change multiple times on one CPU before its value is needed on another CPU.

13
Q

For write-update architectures, the benefit is that the value of X will be ___________________ on the other CPUs that need to access it.

We will not have to pay the cost of a _______________________. Programs that will need to access the new value of X immediately on another CPU will benefit from this design.

A

For write-update architectures, the benefit is that the value of X will be available immediately on the other CPUs that need to access it.

We will not have to pay the cost of a main memory access. Programs that will need to access the new value of X immediately on another CPU will benefit from this design.

14
Q

True or False?

The use of write-update or write-invalidate is determined by the hardware supporting the platform.

You as the software developer have no say in which strategy is used.

A

TRUE

15
Q

Atomic instructions on SMP systems are ______ expensive than on a single-CPU system because of ___________________________

In addition, atomics in general are _______ expensive because they ________ the cache and always trigger coherence traffic.

A

Atomic instructions on SMP systems are more expensive than on a single-CPU system because of bus or I/C (interconnect) contention.

In addition, atomics in general are more expensive because they bypass the cache and always trigger coherence traffic.

16
Q

Name and explain 3 metrics we want for a spinlock

A
  1. Low latency. Time it takes a thread to acquire a free lock. Ideally - immediately with a single atomic instruction.
  2. Low delay/wait. Time required for a thread to stop spinning and acquire a lock that has been freed. Ideally - immediately!
  3. Reduce contention on the shared bus or interconnect. Contention delays any other CPU that is trying to access memory and delays the acquisition and release of the spinlock.
17
Q

Analyse ‘test_and_set(lock)’ according to the 3 metrics:

  1. Latency (how fast it gets the lock)
  2. Delay (how much it has to wait, spin ..)
  3. Contention - for memory etc.
A
  1. Latency - very good: acquires a free lock with a single atomic instruction.
  2. Delay - good: the thread spins continuously, so once the lock is freed it acquires it on the very next iteration.
  3. Contention - very bad: every spin iteration executes the atomic, which bypasses the cache and goes out on the shared interconnect.
18
Q

Spin on read (test-and-test-and-set) spinlock

A

With plain test_and_set, CPUs spin on the atomic operation itself, generating interconnect traffic on every iteration. Spin on read separates the test part - checking the value of the lock, which can be served from the cache - from the atomic test_and_set, which is attempted only when the lock appears free.

19
Q

Improve contention by …

A

Introducing a delay - either a delay after the lock is released (before retrying the atomic), or a delay after each lock reference.

(Without a delay, contention for test-and-test-and-set after a release is O(N) on write-update architectures and O(N^2) on write-invalidate architectures.)