Synchronization Flashcards
Cache Organization
caches are organized in fixed-size chunks called cache lines
* on most CPUs a cache line is 64 bytes
* data accesses go through cache, which is transparently managed by the CPU
* caches implement a replacement strategy to evict cache lines
Cache Coherency Protocol
CPU manages caches and provides the illusion of a single main memory
using a cache coherency protocol
Performance Implications of Cache Coherency
- reads result in copies of data in local caches
- writes invalidate cache lines on other cores
- communication between cores is fairly expensive (happens through shared
L3 or main memory)
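A minimal C++ sketch of the practical consequence (struct and counter names are illustrative): because a write invalidates the whole 64-byte line on other cores, two hot counters that share a line ping-pong between caches ("false sharing"); aligning each to its own line avoids this.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical per-thread counters. Without the alignment, both could land on
// the same 64-byte cache line, so every increment on one core would invalidate
// the line in the other core's cache (false sharing).
struct alignas(64) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

PaddedCounter counterA; // occupies its own cache line
PaddedCounter counterB; // concurrent increments no longer interfere
```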
x86-64 Memory Model
x86-64 implements Total Store Order (TSO), which is a strong memory model
* this means that x86-64 mostly performs memory operations in the order given by the
machine code
* loads are not reordered with respect to other loads, writes are not reordered
with respect to other writes
* however, writes are buffered (in order to hide the L1 write latency), and reads
are allowed to bypass writes
* a fence or a locked write operation will flush the write buffer (but will not
flush the cache)
* important benefit from TSO: sequentially consistent loads do not require
fences
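As an illustration of that last point, a sketch in C++ (variable names are illustrative): on x86-64, the sequentially consistent loads below compile to plain movs, while the stores carry the cost of draining the store buffer.

```cpp
#include <atomic>

std::atomic<int> data{0};
std::atomic<int> flag{0};

void writer() {
    // buffered writes; the seq_cst stores are what flush the write buffer
    // (emitted as a locked instruction or mov + mfence on x86-64)
    data.store(42, std::memory_order_seq_cst);
    flag.store(1, std::memory_order_seq_cst);
}

int reader() {
    // under TSO, seq_cst loads need no fence -- these are plain movs
    if (flag.load(std::memory_order_seq_cst) == 1)
        return data.load(std::memory_order_seq_cst);
    return -1;
}
```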
Concurrent List-Based Set
- keys are stored in a (singly-)linked list sorted by key
- head and tail sentinel nodes are always present (elements with keys −∞ and +∞)
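A minimal node layout for such a set, assuming integer keys; the sentinel keys stand in for −∞ and +∞ so traversals never have to handle an empty list (names are illustrative).

```cpp
#include <limits>

struct Node {
    long key;
    Node* next;
};

// head and tail sentinels: every traversal starts at -inf and ends at +inf
Node tail{std::numeric_limits<long>::max(), nullptr};
Node head{std::numeric_limits<long>::min(), &tail};
```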
Coarse-Grained Locking
- use a single lock to protect the entire data structure
+ very easy to implement
− does not scale at all
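A sketch of coarse-grained locking, with a std::set standing in for the sorted linked list: every operation takes the same mutex, which is trivially correct but serializes all threads.

```cpp
#include <mutex>
#include <set>

class CoarseSet {
    std::mutex m;          // single lock for the whole structure
    std::set<long> keys;   // stand-in for the sorted linked list
public:
    bool insert(long k)   { std::lock_guard<std::mutex> g(m); return keys.insert(k).second; }
    bool remove(long k)   { std::lock_guard<std::mutex> g(m); return keys.erase(k) > 0; }
    bool contains(long k) { std::lock_guard<std::mutex> g(m); return keys.count(k) > 0; }
};
```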
Lock Coupling
hold at most two locks at a time
* interleave lock acquisitions/releases pair-wise
* read/write locks allow for concurrent readers
+ easy to implement
+ no restarts
− does not scale
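A sketch of a hand-over-hand (lock-coupled) contains, assuming per-node mutexes and the ±∞ sentinels from above so curr is never null; a std::shared_mutex in place of std::mutex would allow concurrent readers.

```cpp
#include <mutex>

struct Node {
    long key;
    Node* next;
    std::mutex lock; // per-node lock (std::shared_mutex would allow shared readers)
};

bool contains(Node* head, long key) {
    Node* pred = head;
    pred->lock.lock();
    Node* curr = pred->next;
    curr->lock.lock();              // lock successor before releasing predecessor
    while (curr->key < key) {
        pred->lock.unlock();        // at most two locks are held at any time
        pred = curr;
        curr = curr->next;
        curr->lock.lock();
    }
    bool found = (curr->key == key);
    curr->lock.unlock();
    pred->lock.unlock();
    return found;
}
```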
Optimistic Lock Coupling
- associate lock with update counter
- write:
- acquire lock (exclude other writers)
- increment counter when unlocking
- do not acquire locks for nodes that are not modified (traverse like a reader)
- read:
- do not acquire locks, proceed optimistically
- detect concurrent modifications through counters (and restart if necessary)
- can track changes across multiple nodes (lock coupling)
+ easy to implement
+ scalable
− restarts
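A sketch of the lock/counter combination (encoding and names are assumptions, not the canonical implementation): the low bit of one atomic word is the lock, the remaining bits are the update counter; readers remember the version and re-check it after reading, writers bump it on unlock.

```cpp
#include <atomic>
#include <cstdint>

struct OptLock {
    std::atomic<std::uint64_t> word{0}; // bit 0 = locked, upper bits = version

    // reader: take a snapshot of the version; restart if a writer holds the lock
    std::uint64_t readLockOrRestart(bool& needRestart) {
        std::uint64_t v = word.load();
        needRestart = (v & 1) != 0;
        return v;
    }
    // reader: after the optimistic read, verify that nothing changed meanwhile
    void readUnlockOrRestart(std::uint64_t startVersion, bool& needRestart) {
        needRestart = (word.load() != startVersion);
    }
    // writer: acquire exclusively (excludes other writers, invalidates readers)
    void writeLock() {
        std::uint64_t v = word.load();
        do {
            while (v & 1) v = word.load();                 // wait while locked
        } while (!word.compare_exchange_weak(v, v | 1));
    }
    // writer: clearing the lock bit and incrementing the version is one addition
    void writeUnlock() { word.fetch_add(1); }
};
```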
Terminology: Non-Blocking Algorithms
- killing a thread at any point of time should not affect consistency of the data
structure (this precludes locks)
- non-blocking data structures may be beneficial for (soft) real-time applications
- classification (from strong to weak):
  - wait-free: every operation is guaranteed to succeed (in a finite number of steps)
  - lock-free: overall progress is guaranteed (some operations succeed, while others
may not finish)
  - obstruction-free: progress is only guaranteed if there is no interference from
other threads
Lock-Free List
- insert and remove are lock-free, contains is wait-free
- remove: marks node for deletion, but does not physically remove it
- marker is stored within the next pointer (by stealing a bit of the pointer)
- contains traverses marked nodes (but needs same check as Lazy variant)
+ contains always succeeds
+ scalable
− insert/remove may restart
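A sketch of the pointer-marking trick (helper names are illustrative, and the full unlink/retry logic of the lock-free list is omitted): the deletion marker is the low bit of the next pointer, set with a single compare-and-swap so a concurrent insert after the node cannot be lost.

```cpp
#include <atomic>
#include <cstdint>

struct Node {
    long key;
    std::atomic<std::uintptr_t> next; // low bit "stolen" as the deletion marker
};

Node* getPointer(std::uintptr_t n) { return reinterpret_cast<Node*>(n & ~std::uintptr_t(1)); }
bool  isMarked(std::uintptr_t n)   { return (n & 1) != 0; }

// logical remove: atomically set the marker bit; the physical unlink happens later
bool markForDeletion(Node* curr) {
    std::uintptr_t succ = curr->next.load();
    if (isMarked(succ))
        return false; // someone else already deleted this node
    return curr->next.compare_exchange_strong(succ, succ | 1);
}
```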
Lazy
- contains is wait-free
- insert/remove traverse list only once (as long as there is no contention)
- add marker to each node for logically deleting a key
+ usually only one traversal necessary
+ scalable
− insert/remove have to lock
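A sketch of the lazy variant's node and its wait-free contains (assuming the ±∞ sentinels): deletion first sets the marked flag under the node locks, so a lock-free traversal only has to check that flag at the end.

```cpp
#include <atomic>
#include <mutex>

struct Node {
    long key;
    std::atomic<bool> marked{false};   // logical deletion flag
    std::atomic<Node*> next{nullptr};
    std::mutex lock;                   // taken only by insert/remove
};

// wait-free: one traversal, no locks; a key is present iff the node holding it
// is reachable and not logically deleted
bool contains(Node* head, long key) {
    Node* curr = head;
    while (curr->key < key)
        curr = curr->next.load();
    return curr->key == key && !curr->marked.load();
}
```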
Locks
- locks are usually implemented using atomics
- waiting and suspending threads is an interesting problem
- tradeoff between speed on 'fast path' and resource usage on 'slow path'
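A minimal sketch of that tradeoff (illustrative, not a production lock): the fast path is a single atomic exchange, and the slow path here merely yields, where a real implementation would eventually block the thread (e.g. via a futex on Linux).

```cpp
#include <atomic>
#include <thread>

class SpinLock {
    std::atomic<bool> locked{false};
public:
    void lock() {
        // fast path: uncontended acquisition is one atomic instruction
        if (!locked.exchange(true, std::memory_order_acquire))
            return;
        // slow path: give up the CPU while waiting instead of spinning hot
        while (locked.exchange(true, std::memory_order_acquire))
            std::this_thread::yield();
    }
    void unlock() { locked.store(false, std::memory_order_release); }
};
```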
Memory Reclamation for Lock-Free Data Structures
after deleting a node in a lock-free data structure, lock-free readers might
still be accessing that node
* freeing/reusing that node immediately can cause correctness problems
and/or crashes
Reference Counting
- associate every node with an atomic counter
- effectively results in similar behavior as locks
+ easy to use
− does not scale
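A sketch of per-node reference counting (names are illustrative; the traversal protocol that safely obtains the first reference is omitted): every reader bumps the counter before touching a node and drops it afterwards, and a node may only be freed once its count reaches zero. The shared counter turns every read into a write on a contended cache line, which is why this scales like a lock.

```cpp
#include <atomic>

struct Node {
    long key;
    Node* next;
    std::atomic<int> refCount{0}; // number of threads currently accessing this node
};

void acquire(Node* n) { n->refCount.fetch_add(1); } // pin before dereferencing
void release(Node* n) { n->refCount.fetch_sub(1); } // unpin; free only once it hits 0
```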
Epoch-Based Memory Reclamation
- global epoch counter (like timestamp, but incremented infrequently)
- per-thread, local epoch counters
- before every operation: enter epoch by setting local epoch to global epoch
- after every operation: set local epoch to ∞ indicating that this thread is not
accessing any node
+ fairly low overhead, scales well
+ no need to change data structure code (simply wrap operations)
− a single slow/blocking thread may prevent any memory reclamation
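A sketch of the epoch bookkeeping (array size, names, and the omitted reclamation scan are assumptions): each thread publishes the global epoch while it operates on the structure and publishes "infinity" when it does not; nodes retired in epoch e can be freed once no thread is still in an epoch ≤ e.

```cpp
#include <atomic>
#include <cstdint>
#include <limits>

constexpr int kMaxThreads = 64;                                                 // assumption
constexpr std::uint64_t kInactive = std::numeric_limits<std::uint64_t>::max(); // "infinity"

std::atomic<std::uint64_t> globalEpoch{0};
std::atomic<std::uint64_t> localEpoch[kMaxThreads]; // one published epoch per thread
                                                    // (should start out as kInactive)

// wrap every data structure operation with enter/exit
void enterEpoch(int tid) { localEpoch[tid].store(globalEpoch.load()); }
void exitEpoch(int tid)  { localEpoch[tid].store(kInactive); }
```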