P3L4 Paper: Performance of Spin Locks Flashcards
Why is spinlock performance under contention important
- A heavily used lock can slow down otherwise highly performant applications (long queues of processors waiting to acquire the lock)
- A spinlock that guards a bottleneck critical section => can affect all processes on the system!!!
Describe attributes of an efficient spinlock implementation
A good spinlock algorithm needs to be efficient both under high contention (minimal cost of spin-waiting) and under low contention (quick to acquire a free lock)
Name Important parameters for efficient spin lock implementation under high lock contention
- Minimize communication bandwidth on the system bus (do not continuously poll the lock value with atomic reads) => such polling prevents other processors from doing actual work
- Minimize the time between the lock being released and acquired by another processor (no one is executing the critical section during that gap)
Tradeoff: the more we poll, the higher the bandwidth overhead, but the quicker we can acquire the lock once it is freed
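One side of this tradeoff can be sketched as a test-and-set lock with exponential backoff (a minimal sketch in C11 atomics; all names are hypothetical, not from the paper). Longer delays cut polling bandwidth, but a freed lock may sit idle until the next poll:

```c
#include <stdatomic.h>

/* Sketch: spin lock that backs off exponentially between polls.
   Doubling the delay reduces bus traffic from repolling, at the cost
   of a longer gap between release and the next acquisition. */
typedef struct { atomic_int locked; } backoff_lock;

void backoff_acquire(backoff_lock *l) {
    int delay = 1;
    while (atomic_exchange(&l->locked, 1)) {   /* returned 1: lock busy */
        for (volatile int i = 0; i < delay; i++)
            ;                                  /* wait before repolling */
        if (delay < 1024)
            delay *= 2;                        /* back off exponentially */
    }
}

void backoff_release(backoff_lock *l) {
    atomic_store(&l->locked, 0);               /* free the lock */
}
```

The cap on `delay` is one knob for the tradeoff: a higher cap means less polling traffic but a potentially longer idle critical section.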
Describe the novel approach to queue spin-lock implementation. When is it good, when is it bad?
Maintain a queue of spinning processors. Whenever the lock is freed, the next processor in line can acquire the lock.
How it works: no explicit queue with enqueue and dequeue operations is maintained.
Instead, each arriving processor does an atomic read-and-increment
to obtain a unique, ordered serial number. When the lock is released, the releasing processor taps the processor holding serial number + 1.
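The serial-number scheme can be sketched roughly as a ticket lock in C11 atomics (a simplified sketch, not the paper's exact data structure; the paper's version additionally gives each spinner its own cache line to spin on):

```c
#include <stdatomic.h>

/* Sketch: each arriving processor takes a serial number via an atomic
   read-and-increment, then spins until "now_serving" reaches it.
   Release increments now_serving, tapping serial number + 1. */
typedef struct {
    atomic_uint next_ticket;  /* next serial number to hand out */
    atomic_uint now_serving;  /* serial number allowed into the CS */
} ticket_lock;

void ticket_acquire(ticket_lock *l) {
    /* one atomic operation per arrival, not per spin iteration */
    unsigned me = atomic_fetch_add(&l->next_ticket, 1);
    while (atomic_load(&l->now_serving) != me)
        ;  /* spin on an ordinary read, not an atomic RMW */
}

void ticket_release(ticket_lock *l) {
    /* hand the lock directly to the next serial number */
    atomic_fetch_add(&l->now_serving, 1);
}
```

Note that the atomic read-modify-write happens only once per acquisition; the spinning itself uses plain reads, which is where the reduced bus traffic comes from.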
Describe advantages of the novel approach to queue spin-lock
Advantage
- time until the next processor acquires the lock and executes the critical section is reduced compared to static delays (both methods) or backoffs
- reduced cache invalidations
- Overhead decreases with more processors
Describe disadvantages of the novel approach to queue spin-lock
- increased lock latency (time until the lock is acquired)
When should queueing be used and when another method (static delay, backoff after lock release or after each memory reference)?
- With contention: queuing is best
- Without contention: queuing is worst due to its high lock latency.
Name CPU cache update methods
- **No-write**
  - Writes are not allowed to the cache - only to memory
  - Each write invalidates the corresponding cache entry
- **Write-through**
  - A processor ALWAYS writes to BOTH cache AND memory
  - Avoids cache incoherence
- **Write-back**
  - The write goes to the cache first; the write to the memory location happens later
Name cache coherence mechanisms. What is their advantage?
- Write invalidate
- Write-update
- **Tradeoff**:
  - Write-invalidate: lower bandwidth on the shared interconnect!
    - Reason: only the memory address (not the whole value!) is sent over the bus or I/C to invalidate the other caches
    - If the address is already invalidated in other caches, subsequent updates cause no bus or I/C traffic (already invalidated)
    - This can amortize the cost of the extra memory request (via bus or I/C) on a cache miss caused by invalidation
  - Write-update: the updated value is available instantaneously! No cache miss - no need for an extra memory reference.
    - Higher traffic on the bus or I/C, as the updated value has to be written into every CPU cache holding that entry
Can atomic instruction be executed against the cache of one CPU only?
No. An atomic operation always bypasses the CPU cache and goes directly to the memory location. This avoids cache incoherence.
What are the downsides/ costs of atomic instructions?
- Slow: they always go directly to the memory controller instead of the cache, so that they can be ordered & synchronized and cache incoherence is avoided
- Create traffic on the bus / I/C due to coherence traffic (update or invalidation of the CPU caches that contain that entry)
How do you evaluate atomic operations on an SMP (shared-memory multiprocessor system) vs. atomic instructions on a single-CPU system?
Atomic operations are more expensive than on single-CPU systems because they
- always contend for memory (bypassing the cache) via the bus or I/C
- cause even more bus or I/C contention due to coherence traffic (regardless of whether the memory location was actually updated)
- (Me: single-CPU systems could theoretically operate on the cache and let the underlying write-back or write-through mechanism update the RAM)
What kind of hardware support do spinlocks require?
Atomic instructions. Otherwise, threads could interleave between checking whether the lock is busy and setting it.
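The interleaving problem can be illustrated with a non-atomic "check then set" sketch (hypothetical code, not from the paper). Between the read and the write, another thread can observe the same free lock, so both may "acquire" it:

```c
/* BROKEN sketch: the read of busy and the write to busy are two
   separate instructions. A second thread scheduled between them also
   sees busy == 0, and both threads enter the critical section.
   An atomic test-and-set fuses the two steps into one indivisible op. */
int busy = 0;

int broken_try_acquire(void) {
    if (busy == 0) {   /* step 1: read  - another thread may run here */
        busy = 1;      /* step 2: write - both threads may reach this */
        return 1;      /* "acquired" (possibly by both threads!) */
    }
    return 0;          /* lock observed busy */
}
```

Single-threaded the code behaves as expected; the bug only manifests under concurrent interleaving, which is exactly why hardware atomicity is required.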
Name Spinlock Performance Metrics
Lock Latency, Delay / Lock handover waiting time, Contention
Name advantages and disadvantages of the test-and-set spinlock implementation
Advantage:
- portable
- minimal latency (just one atomic operation) - this metric does not consider contention
- potentially minimal delay (spin continuously on the atomic operation) - a freed lock is acquired quickly
Disadvantage:
- Contention: spinning on the atomic operation means processors go to memory on every spin iteration, saturating the bus.
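The test-and-set lock above can be sketched in a few lines of C11 atomics (a minimal sketch; names are hypothetical). Every spin iteration is an atomic read-modify-write, which is precisely the contention disadvantage listed:

```c
#include <stdatomic.h>

/* Minimal test-and-set spin lock. Acquiring spins on the atomic
   operation itself, so each failed iteration generates bus/memory
   and coherence traffic -- cheap when uncontended, costly otherwise. */
typedef struct { atomic_flag held; } tas_lock;

void tas_acquire(tas_lock *l) {
    while (atomic_flag_test_and_set(&l->held))
        ;  /* spin: each poll is an atomic RMW that goes to memory */
}

void tas_release(tas_lock *l) {
    atomic_flag_clear(&l->held);  /* free the lock */
}
```

The portability advantage shows here too: `atomic_flag` is the one lock-free type C11 guarantees on every conforming implementation.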