P3L4 Paper: Performance of Spin Locks Flashcards

1
Q

Why is spinlock performance under contention important?

A
  • A heavily used lock can slow down otherwise high-performing applications (long queues of processors waiting to acquire the lock)
  • A spinlock that guards a bottleneck critical section can affect all processes on the system!
2
Q

Describe attributes of an efficient spinlock implementation

A

A good spinlock algorithm must be efficient both under high contention (minimal cost of spin-waiting) and under low contention (quick acquisition of a free lock).

3
Q

Name important parameters for an efficient spinlock implementation under high lock contention

A
  • Minimize communication bandwidth on the system bus (do not atomically re-read the lock value in a tight polling loop the whole time), since that prevents other processors from doing useful work
  • Minimize the time between the lock being released and it being acquired by another processor (no one is executing the critical section during that time)

Tradeoff: the more we poll, the higher the bandwidth overhead, but the more quickly we can acquire the lock once it is freed

4
Q

Describe the novel approach of the queueing spinlock implementation. When is it good, when is it bad?

A

Maintain a queue of spinning processors: whenever the lock is freed, the next processor in line acquires it.

How it works: no explicit queue with enqueue and dequeue operations is maintained.

Instead, each arriving processor performs an atomic read-and-increment to obtain a unique, ordered serial number. When the lock is released, the releasing processor "taps" the processor holding serial number + 1 (see the sketch below).
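
A minimal sketch of this idea in C11 atomics, loosely in the spirit of the paper's array-based queueing lock; the names (queue_lock_t, NPROCS, has_lock) and the fixed-size flag array are illustrative assumptions, not the paper's actual code:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NPROCS 64                        /* assumed max number of waiters */

typedef struct {
    _Atomic bool has_lock[NPROCS];       /* one flag per slot: "your turn" */
    _Atomic unsigned next_ticket;        /* read-and-increment counter */
} queue_lock_t;

void queue_lock_init(queue_lock_t *l) {
    for (int i = 0; i < NPROCS; i++)
        atomic_store(&l->has_lock[i], false);
    atomic_store(&l->has_lock[0], true); /* first arrival may proceed */
    atomic_store(&l->next_ticket, 0);
}

unsigned queue_lock_acquire(queue_lock_t *l) {
    /* atomic read-and-increment yields a unique, ordered serial number */
    unsigned my_slot = atomic_fetch_add(&l->next_ticket, 1) % NPROCS;
    /* each waiter spins only on its own slot */
    while (!atomic_load(&l->has_lock[my_slot]))
        ; /* spin */
    return my_slot;
}

void queue_lock_release(queue_lock_t *l, unsigned my_slot) {
    atomic_store(&l->has_lock[my_slot], false);
    /* "tap" the processor holding serial number + 1 */
    atomic_store(&l->has_lock[(my_slot + 1) % NPROCS], true);
}
```

In a real implementation each flag would also be padded to its own cache line so that waiters do not invalidate each other's cached copies.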

5
Q

Describe advantages of the novel approach to queue spin-lock

A

Advantages:

  • The time until the next processor acquires the lock and executes the critical section is reduced compared to static delays (either variant) or backoff
  • Reduced cache invalidations
  • Overhead decreases as the number of processors grows
6
Q

Describe disadvantages of the novel approach to queue spin-lock

A
  • Increased lock latency (time to acquire a free lock): the read-and-increment and extra bookkeeping cost more than a single test-and-set
7
Q

When should queueing be used, and when another method (static delay or backoff, applied after lock release or after each memory reference)?

A
  • With contention: queueing is best
  • Without contention: queueing is worst due to its high lock latency
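
For contrast, a rough sketch of the "delay after lock release" alternative mentioned above, assuming a coherent cache to spin on; the shared lock word lock_busy and the per-processor my_delay constant are illustrative, not taken from the paper:

```c
#include <stdatomic.h>
#include <stdbool.h>

static _Atomic bool lock_busy = false;        /* illustrative shared lock word */

void delay_lock_acquire(unsigned my_delay) {  /* e.g. static, per-CPU, or backed off */
    while (atomic_exchange(&lock_busy, true)) {
        /* lock was busy: spin on cached reads until it looks free */
        while (atomic_load(&lock_busy))
            ; /* spin on read */
        /* lock was just released: delay before retrying the atomic
         * operation so that not every waiter hits the bus at once */
        for (volatile unsigned i = 0; i < my_delay; i++)
            ; /* busy-wait delay */
    }
}

void delay_lock_release(void) {
    atomic_store(&lock_busy, false);
}
```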

8
Q

Name CPU cache update methods

A
  • **No-write**
    • Writes are not allowed to go to the cache, only to memory
    • Each write invalidates the corresponding cache entry
  • **Write-through**
    • A processor ALWAYS writes to BOTH the cache AND memory
    • Avoids cache incoherence
  • **Write-back**
    • The write goes to the cache first; the write to the memory location happens later
9
Q

Name cache coherence mechanisms. What is their advantage?

A
  • Write-invalidate
  • Write-update
  • **Tradeoff**:
  • Write-invalidate: lower bandwidth on the shared interconnect!
    • Reason: only the memory address (not the whole value!) is sent over the bus or interconnect to invalidate the other caches
    • If the address is already invalidated in the other caches, subsequent updates cause no further bus or interconnect traffic
      • This can amortize the cost of the extra memory request (via bus or interconnect) needed on a cache miss caused by the invalidation
  • Write-update: the update is available instantaneously! No cache miss, so no extra memory reference is needed.
    • Has higher traffic on the bus or interconnect, as the updated value has to be propagated to every CPU cache
10
Q

Can an atomic instruction be executed against the cache of one CPU only?

A

No. An atomic operation always bypasses the CPU cache and goes directly to the memory location. This is to avoid cache incoherence.

11
Q

What are the downsides/ costs of atomic instructions?

A
  • Slow: they always go directly to the memory controller instead of the cache, in order to be ordered and synchronized and to avoid cache incoherence
  • They create traffic on the bus or interconnect due to coherence traffic (updates or invalidations of the CPU caches that contain that entry)
12
Q

How do you evaluate atomic operations on an SMP (shared-memory multiprocessor system) vs. atomic instructions on a single-CPU system?

A

Atomic operations are more expensive than on single-CPU systems because they:

  • Always contend for memory (bypassing the cache) via the bus or interconnect
  • Cause even more bus or interconnect contention due to coherence traffic (regardless of whether the memory location was actually changed)
  • (Me: single-CPU systems could theoretically operate on the cache and let the underlying write-back or write-through mechanism update the RAM)
13
Q

What kind of hardware support do spinlocks require?

A

Atomic instructions. Without them, threads could interleave between checking whether the lock is busy and setting it to busy, so multiple threads could acquire the lock at once.
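
An illustrative (deliberately broken) lock without an atomic instruction, to show the interleaving problem; the names are made up:

```c
#include <stdbool.h>

volatile bool busy = false;   /* shared lock word (illustrative) */

void broken_acquire(void) {
    while (busy)              /* (1) check: lock appears free */
        ;                     /* spin */
    busy = true;              /* (2) set: another thread may run between (1)
                               * and (2) and also "acquire" the lock --
                               * hence the need for a hardware test-and-set */
}
```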

14
Q

Name Spinlock Performance Metrics

A

Lock latency, delay (lock hand-over waiting time), and contention

15
Q

Name advantages and disadvantages of the test-and-set spinlock implementation

A

Advantages:

  • Portable
  • Minimal latency (just the atomic operation); this metric does not consider contention
  • Potentially minimal delay (spinning continuously on the atomic operation means a freed lock is acquired quickly)

Disadvantage:

  • Contention: spinning on the atomic operation means processors go to memory on every spin
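
A minimal sketch of the simple spin-on-test-and-set lock described by this card, using C11 atomics (atomic_exchange standing in for the hardware test-and-set); names are illustrative:

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { _Atomic bool busy; } tas_lock_t;   /* busy == lock held */

void tas_lock_init(tas_lock_t *l) { atomic_store(&l->busy, false); }

void tas_lock_acquire(tas_lock_t *l) {
    /* every spin iteration is an atomic operation, i.e. a memory /
     * interconnect access -- this is the contention problem noted above */
    while (atomic_exchange(&l->busy, true))
        ; /* spin */
}

void tas_lock_release(tas_lock_t *l) {
    atomic_store(&l->busy, false);
}
```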

16
Q

How does the spin-on-read spinlock implementation compare to simple test-and-set with regard to contention if there are NO coherent caches?

A

Equally bad. With no coherent caches, every read is a memory reference, so it behaves the same as simply spinning on test-and-set.

17
Q

How is the performance for “latency” of the spinlock implementation Spin on Read compared to simple spin on test-and-set?

A

With no contention: potentially slightly slower than simple test-and-set, because the processor does not only execute the atomic operation but first reads the lock value from the cache.

18
Q

How does the spinlock implementation Spin on Read perform on the "contention" metric, given coherent caches with a write-update mechanism?

A
  • A coherent cache must exist for spin-on-read

OK (better than simple spin on test-and-set)

  • BUT there is still a problem: all caches are updated at the same time and see the freed lock, so all waiters attempt the test-and-set concurrently
  • At least a test-and-set from another CPU that does not change the lock value (e.g. the lock is still busy) does not invalidate the cache, so waiters can keep spinning on their cached copy (no bus traffic, no contention)! See the sketch below.
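
A hedged sketch of this spin-on-read ("test-and-test-and-set") pattern in C11 atomics; the names are illustrative and atomic_exchange again stands in for the hardware test-and-set:

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { _Atomic bool busy; } ttas_lock_t;

void ttas_lock_acquire(ttas_lock_t *l) {
    for (;;) {
        /* spin on an ordinary cached read; while the lock stays busy this
         * generates no bus/interconnect traffic on a coherent cache */
        while (atomic_load(&l->busy))
            ; /* spin on read */
        /* lock looks free: now attempt the atomic test-and-set */
        if (!atomic_exchange(&l->busy, true))
            return;            /* acquired */
        /* lost the race; go back to spinning in the cache */
    }
}

void ttas_lock_release(ttas_lock_t *l) {
    atomic_store(&l->busy, false);
}
```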