Synchronization Flashcards
Cache Organization
caches are organized in fixed-size chunks called cache lines
* on most CPUs a cache line is 64 bytes
* data accesses go through cache, which is transparently managed by the CPU
* caches implement a replacement strategy to evict cache lines
Cache Coherency Protocol
CPU manages caches and provides the illusion of a single main memory
using a cache coherency protocol
Performance Implications of Cache Coherency
- reads result in copies of data in local caches
- writes invalidate cache lines on other cores
- communication between cores is fairly expensive (happens through shared
L3 or main memory)
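A minimal C++ sketch of the practical consequence (struct and counter names are illustrative): because a write invalidates the whole 64-byte line on other cores, two hot counters that share a line ping-pong between caches ("false sharing"); aligning each to its own line avoids this.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical per-thread counters. Without the alignment, both could land on
// the same 64-byte cache line, so every increment on one core would invalidate
// the line in the other core's cache (false sharing).
struct alignas(64) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

PaddedCounter counterA; // occupies its own cache line
PaddedCounter counterB; // concurrent increments no longer interfere
```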
x86-64 Memory Model
x86-64 implements Total Store Order (TSO), which is a strong memory model
* this means that x86-64 mostly performs memory operations in the order given by the
machine code
* loads are not reordered with respect to other loads, writes are not reordered
with respect to other writes
* however, writes are buffered (in order to hide the L1 write latency), and reads
are allowed to bypass writes
* a fence or a locked write operation will flush the write buffer (but will not
flush the cache)
* important benefit from TSO: sequentially consistent loads do not require
fences
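As an illustration of that last point, a sketch in C++ (variable names are illustrative): on x86-64, the sequentially consistent loads below compile to plain movs, while the stores carry the cost of draining the store buffer.

```cpp
#include <atomic>

std::atomic<int> data{0};
std::atomic<int> flag{0};

void writer() {
    // buffered writes; the seq_cst stores are what flush the write buffer
    // (emitted as a locked instruction or mov + mfence on x86-64)
    data.store(42, std::memory_order_seq_cst);
    flag.store(1, std::memory_order_seq_cst);
}

int reader() {
    // under TSO, seq_cst loads need no fence -- these are plain movs
    if (flag.load(std::memory_order_seq_cst) == 1)
        return data.load(std::memory_order_seq_cst);
    return -1;
}
```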
Concurrent List-Based Set
- keys are stored in a (singly-)linked list sorted by key
- head and tail sentinel nodes are always present (elements with keys −∞ and +∞)
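A minimal node layout for such a set, assuming integer keys; the sentinel keys stand in for −∞ and +∞ so traversals never have to handle an empty list (names are illustrative).

```cpp
#include <limits>

struct Node {
    long key;
    Node* next;
};

// head and tail sentinels: every traversal starts at -inf and ends at +inf
Node tail{std::numeric_limits<long>::max(), nullptr};
Node head{std::numeric_limits<long>::min(), &tail};
```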
Coarse-Grained Locking
- use a single lock to protect the entire data structure
+ very easy to implement
− does not scale at all
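A sketch of coarse-grained locking, with a std::set standing in for the sorted linked list: every operation takes the same mutex, which is trivially correct but serializes all threads.

```cpp
#include <mutex>
#include <set>

class CoarseSet {
    std::mutex m;          // single lock for the whole structure
    std::set<long> keys;   // stand-in for the sorted linked list
public:
    bool insert(long k)   { std::lock_guard<std::mutex> g(m); return keys.insert(k).second; }
    bool remove(long k)   { std::lock_guard<std::mutex> g(m); return keys.erase(k) > 0; }
    bool contains(long k) { std::lock_guard<std::mutex> g(m); return keys.count(k) > 0; }
};
```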
Lock Coupling
hold at most two locks at a time
* interleave lock acquisitions/releases pair-wise
* read/write locks allow for concurrent readers
+ easy to implement
+ no restarts
− does not scale
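A sketch of a hand-over-hand (lock-coupled) contains, assuming per-node mutexes and the ±∞ sentinels from above so curr is never null; a std::shared_mutex in place of std::mutex would allow concurrent readers.

```cpp
#include <mutex>

struct Node {
    long key;
    Node* next;
    std::mutex lock; // per-node lock (std::shared_mutex would allow shared readers)
};

bool contains(Node* head, long key) {
    Node* pred = head;
    pred->lock.lock();
    Node* curr = pred->next;
    curr->lock.lock();              // lock successor before releasing predecessor
    while (curr->key < key) {
        pred->lock.unlock();        // at most two locks are held at any time
        pred = curr;
        curr = curr->next;
        curr->lock.lock();
    }
    bool found = (curr->key == key);
    curr->lock.unlock();
    pred->lock.unlock();
    return found;
}
```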
Optimistic Lock Coupling
- associate lock with update counter
- write:
- acquire lock (exclude other writers)
- increment counter when unlocking
- do not acquire locks for nodes that are not modified (traverse like a reader)
- read:
- do not acquire locks, proceed optimistically
- detect concurrent modifications through counters (and restart if necessary)
- can track changes across multiple nodes (lock coupling)
+ easy to implement
+ scalable
− restarts
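A sketch of the lock/counter combination (encoding and names are assumptions, not the canonical implementation): the low bit of one atomic word is the lock, the remaining bits are the update counter; readers remember the version and re-check it after reading, writers bump it on unlock.

```cpp
#include <atomic>
#include <cstdint>

struct OptLock {
    std::atomic<std::uint64_t> word{0}; // bit 0 = locked, upper bits = version

    // reader: take a snapshot of the version; restart if a writer holds the lock
    std::uint64_t readLockOrRestart(bool& needRestart) {
        std::uint64_t v = word.load();
        needRestart = (v & 1) != 0;
        return v;
    }
    // reader: after the optimistic read, verify that nothing changed meanwhile
    void readUnlockOrRestart(std::uint64_t startVersion, bool& needRestart) {
        needRestart = (word.load() != startVersion);
    }
    // writer: acquire exclusively (excludes other writers, invalidates readers)
    void writeLock() {
        std::uint64_t v = word.load();
        do {
            while (v & 1) v = word.load();                 // wait while locked
        } while (!word.compare_exchange_weak(v, v | 1));
    }
    // writer: clearing the lock bit and incrementing the version is one addition
    void writeUnlock() { word.fetch_add(1); }
};
```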
Terminology: Non-Blocking Algorithms
- killing a thread at any point of time should not affect consistency of the data
structure (this precludes locks)
- non-blocking data structures may be beneficial for (soft) real-time applications
- classification (from strong to weak):
  - wait-free: every operation is guaranteed to succeed (in a finite number of steps)
  - lock-free: overall progress is guaranteed (some operations succeed, while others
may not finish)
  - obstruction-free: progress is only guaranteed if there is no interference from
other threads
Lock-Free List
- insert and remove are lock-free, contains is wait-free
- remove: marks node for deletion, but does not physically remove it
- marker is stored within the next pointer (by stealing a bit of the pointer)
- contains traverses marked nodes (but needs same check as Lazy variant)
+ contains always succeeds
+ scalable
− insert/remove may restart
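A sketch of the pointer-marking trick (helper names are illustrative, and the full unlink/retry logic of the lock-free list is omitted): the deletion marker is the low bit of the next pointer, set with a single compare-and-swap so a concurrent insert after the node cannot be lost.

```cpp
#include <atomic>
#include <cstdint>

struct Node {
    long key;
    std::atomic<std::uintptr_t> next; // low bit "stolen" as the deletion marker
};

Node* getPointer(std::uintptr_t n) { return reinterpret_cast<Node*>(n & ~std::uintptr_t(1)); }
bool  isMarked(std::uintptr_t n)   { return (n & 1) != 0; }

// logical remove: atomically set the marker bit; the physical unlink happens later
bool markForDeletion(Node* curr) {
    std::uintptr_t succ = curr->next.load();
    if (isMarked(succ))
        return false; // someone else already deleted this node
    return curr->next.compare_exchange_strong(succ, succ | 1);
}
```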
Lazy
- contains is wait-free
- insert/remove traverse list only once (as long as there is no contention)
- add marker to each node for logically deleting a key
+ usually only one traversal necessary
+ scalable
− insert/remove have to lock
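A sketch of the lazy variant's node and its wait-free contains (assuming the ±∞ sentinels): deletion first sets the marked flag under the node locks, so a lock-free traversal only has to check that flag at the end.

```cpp
#include <atomic>
#include <mutex>

struct Node {
    long key;
    std::atomic<bool> marked{false};   // logical deletion flag
    std::atomic<Node*> next{nullptr};
    std::mutex lock;                   // taken only by insert/remove
};

// wait-free: one traversal, no locks; a key is present iff the node holding it
// is reachable and not logically deleted
bool contains(Node* head, long key) {
    Node* curr = head;
    while (curr->key < key)
        curr = curr->next.load();
    return curr->key == key && !curr->marked.load();
}
```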
Locks
- locks are usually implemented using atomics
- waiting and suspending threads is an interesting problem
- tradeoff between speed on 'fast path' and resource usage on 'slow path'
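A minimal sketch of that tradeoff (illustrative, not a production lock): the fast path is a single atomic exchange, and the slow path here merely yields, where a real implementation would eventually block the thread (e.g. via a futex on Linux).

```cpp
#include <atomic>
#include <thread>

class SpinLock {
    std::atomic<bool> locked{false};
public:
    void lock() {
        // fast path: uncontended acquisition is one atomic instruction
        if (!locked.exchange(true, std::memory_order_acquire))
            return;
        // slow path: give up the CPU while waiting instead of spinning hot
        while (locked.exchange(true, std::memory_order_acquire))
            std::this_thread::yield();
    }
    void unlock() { locked.store(false, std::memory_order_release); }
};
```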
Memory Reclamation for Lock-Free Data Structures
after deleting a node in a lock-free data structure, lock-free readers might
still be accessing that node
* freeing/reusing that node immediately can cause correctness problems
and/or crashes
Reference Counting
- associate every node with an atomic counter
- effectively results in similar behavior as locks
+ easy to use
− does not scale
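A sketch of per-node reference counting (names are illustrative; the traversal protocol that safely obtains the first reference is omitted): every reader bumps the counter before touching a node and drops it afterwards, and a node may only be freed once its count reaches zero. The shared counter turns every read into a write on a contended cache line, which is why this scales like a lock.

```cpp
#include <atomic>

struct Node {
    long key;
    Node* next;
    std::atomic<int> refCount{0}; // number of threads currently accessing this node
};

void acquire(Node* n) { n->refCount.fetch_add(1); } // pin before dereferencing
void release(Node* n) { n->refCount.fetch_sub(1); } // unpin; free only once it hits 0
```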
Epoch-Based Memory Reclamation
- global epoch counter (like timestamp, but incremented infrequently)
- per-thread, local epoch counters
- before every operation: enter epoch by setting local epoch to global epoch
- after every operation: set local epoch to ∞ indicating that this thread is not
accessing any node
+ fairly low overhead, scales well
+ no need to change data structure code (simply wrap operations)
− a single slow/blocking thread may prevent any memory reclamation
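A sketch of the epoch bookkeeping (array size, names, and the omitted reclamation scan are assumptions): each thread publishes the global epoch while it operates on the structure and publishes "infinity" when it does not; nodes retired in epoch e can be freed once no thread is still in an epoch ≤ e.

```cpp
#include <atomic>
#include <cstdint>
#include <limits>

constexpr int kMaxThreads = 64;                                                 // assumption
constexpr std::uint64_t kInactive = std::numeric_limits<std::uint64_t>::max(); // "infinity"

std::atomic<std::uint64_t> globalEpoch{0};
std::atomic<std::uint64_t> localEpoch[kMaxThreads]; // one published epoch per thread
                                                    // (should start out as kInactive)

// wrap every data structure operation with enter/exit
void enterEpoch(int tid) { localEpoch[tid].store(globalEpoch.load()); }
void exitEpoch(int tid)  { localEpoch[tid].store(kInactive); }
```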