Data Intensive Ch9 - Consistency and Consensus Flashcards
Context map
Timestamp ordering
Lamport timestamps
Causal ordering
Total order broadcast
Distributed transactions
Linearizability
Compare-and-set
Global constraints
Failure detectors - membership services
Summary
Linearizability - consistency model
Goal - make replicated data appear as though there was only a single copy; make all ops appear as if they acted on that copy atomically
Totally ordered, single timeline
Like a variable in a single-threaded program; slow; susceptible to network delays
Causality - imposes ordering on events - what happened before what; cause and effect
Weaker consistency model - some things can be concurrent
Timeline with branching & merging
Less sensitive to network problems - less coordination overhead
Can be captured using Lamport timestamps
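A minimal Lamport clock sketch (illustrative only, names are my own) showing how such timestamps are generated and merged so that they stay consistent with causality:

```python
# Minimal Lamport clock sketch. (counter, node_id) pairs give a total order that
# is consistent with causality: if A happened before B, timestamp(A) < timestamp(B).
class LamportClock:
    def __init__(self, node_id):
        self.node_id = node_id
        self.counter = 0

    def tick(self):
        """Local event or message send: bump the counter and stamp the event."""
        self.counter += 1
        return (self.counter, self.node_id)

    def receive(self, remote_timestamp):
        """On message receipt: move the counter past the sender's counter."""
        remote_counter, _ = remote_timestamp
        self.counter = max(self.counter, remote_counter) + 1
        return (self.counter, self.node_id)
```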
Causality not sufficient for ensuring e.g. name uniqueness
Consensus required
Achieving consensus - deciding something in such a way that all nodes agree on what was decided; the decision is irrevocable
Many problems are reducible to consensus
- > Linearizable compare-and-set (atomically decide whether to set a value by comparing its current value with an expected one; see the sketch after this list)
- > Atomic distributed trx commit
- > Total order broadcast (decide order in which to deliver messages)
- > Lock/lease (decide who acquired one)
- > Membership/coordination service (given a failure detector, e.g. timeouts, decide which nodes are alive)
- > Uniqueness constraint (decide who wrote the value first and who should abort)
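As a concrete anchor for the compare-and-set item above, a minimal single-process sketch of a CAS register (illustrative; making this operation behave identically across replicas despite faults is exactly what requires consensus):

```python
import threading

# Single-node compare-and-set register (toy). A distributed, linearizable
# version of this operation is one of the problems reducible to consensus.
class CASRegister:
    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def compare_and_set(self, expected, new):
        """Atomically set to `new` only if the current value equals `expected`."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

    def get(self):
        with self._lock:
            return self._value
```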
Consensus is easy in a single-node environment, or when only a single node is allowed to decide. A single-leader database is an example. SPOF; can be handled by:
- waiting for the leader to recover (what if it doesn’t? termination property violated - system blocked forever)
- Consensus by act of God -> manual fail over - admin chooses new leader and reconfigures the cluster
Humans are SLOW
- Algorithmically auto select new leader - CONSENSUS ALG
Single leader DB provides linearizability for writes without consensus but requires it to maintain its leadership!
Leader “KICKS THE CAN DOWN THE ROAD”
Outsourcing consensus, failure detection & membership service using ZooKeeper and similar
Easier than implementing custom algorithms able to withstand: partial failure (network packet lost), process pauses, clock drifts
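A rough sketch of outsourcing lock/leader election to ZooKeeper via the Python kazoo client (ensemble addresses, the znode path, and do_leader_work are placeholders; treat the exact API usage as an assumption and check the client docs):

```python
from kazoo.client import KazooClient

# Connect to a ZooKeeper ensemble (placeholder addresses).
zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
zk.start()

# The lock recipe is built on ephemeral znodes: if this process dies or its
# session expires, ZooKeeper releases the lock automatically.
lock = zk.Lock("/app/leader-lock", "node-1")
with lock:
    # Only one process in the cluster runs this block at any given time.
    do_leader_work()  # placeholder for the leader-only work

zk.stop()
```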
Not all systems require consensus - leaderless, multi-leader rely on handling conflicts instead
Causality is enough; no need for linearizability
Idea behind building fault-tolerant systems
Find general-purpose abstractions with useful guarantees, implement them once and let apps rely on them.
Database transactions as an example - apps can pretend there are no crashes (atomicity), nobody accesses the DB at the same time (isolation), and storage devices are reliable (durability)
Eventual consistency revisited
For replicated databases - if you stop writing to the DB for some UNSPECIFIED length of time, then all reads will eventually return the same value
Weak guarantee - no guarantees when replicas converge
No guarantees for reading your own writes
Problems appear only when there is high concurrency or network issues.
Distributed consistency vs transaction isolations
Trx isolation primarily about avoiding race conditions due to concurrently executing trx
DC - coordinating state of replicas in the face of delays and faults
Linearizability
Atomic consistency, strong consistency, immediate consistency, external consistency
Idea: All clients have the same view of the data as if there is only one replica. No worries about replication lag
Constraint - after any read has returned a new value ALL following reads from any replica must also return the new value
Imagine there is some point in time (between start and end of the write) where value of x flips from old to the new value everywhere
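A tiny checker for just this recency rule (a necessary condition only, not a full linearizability check; the history format is made up for illustration):

```python
# Checks only the recency rule above for one register: once any read has returned
# `new`, no later read may return `old`. Reads are listed in real-time order.
def violates_recency(reads, old, new):
    seen_new = False
    for value in reads:
        if value == new:
            seen_new = True
        elif seen_new and value == old:
            return True              # a later read went "back in time"
    return False

assert violates_recency([0, 1, 0], old=0, new=1)          # stale read after the flip
assert not violates_recency([0, 0, 1, 1], old=0, new=1)   # fine
```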
Linearizability vs Serializability
Different guarantees
Serializability - trx isolation property
Every trx may read/write many OBJECTS
Trx behave as though they had executed in SOME serial order. The serial order may not be the order in which trx were actually run.
Linearizability - guarantees on reads and writes of an INDIVIDUAL object
Does not group operations into trx
Does not prevent write skews
2PL, literal serial execution are typically linearizable as well
SSI is not - reads are made from a consistent snapshot
The snapshot does not include writes more recent than the snapshot itself
Where do we need linearizability
Locking/leader election
Only single leader allowed (no split brain)
Once the lock is acquired, all subsequent reads must see it as held (everyone agrees who holds the lock)
Uniqueness and other constraints
Same as acquiring a lock on the given name
Bank account cannot go below zero
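A toy single-node sketch of the username-uniqueness case (names are illustrative; in a real system the check-and-claim must be one atomic, linearizable operation across all replicas):

```python
# Toy, single-node: claim a username only if nobody holds it yet.
usernames = {}

def claim_username(username, user_id):
    if username not in usernames:        # compare ...
        usernames[username] = user_id    # ... and set
        return True
    return False

assert claim_username("alice", "user-17")        # first claim wins
assert not claim_username("alice", "user-42")    # later claim must abort
```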
Cross-channel timing dependencies
Image resizer - a queue message tells the resizer to fetch an image from storage; the fetch may happen before the new version of the image is visible in storage
Which replication methods are linearizable?
Single leader - potentially yes - reads made from the leader or from synchronously updated followers. If snapshot isolation is used, then no.
If a delusional leader (one that falsely believes it is still leader) continues to serve requests - likely to violate
Async replication, failover - committed writes can be lost - violates durability and linearizability
Consensus - we can implement linearizable storage using it
Multi-leader - nope, concurrently process writes on multiple nodes; async replication to other nodes; this produces conflicts
Leaderless - probably not - even quorum reads are not really linearizable
Strict quorums (w + r > n) do not help because of variable network delays (one reader can hit a replica that already has the new value, while a later reader hits two replicas that haven’t received the update yet - see the sketch below)
Fig 9-6 p334
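A minimal sketch of that anomaly, assuming versioned values where the highest version wins on a quorum read (n = 3, w = 3, r = 2; the write of x = 1 is still in flight and has only reached replica 0):

```python
# Each replica stores key -> (version, value); the write of version 2 is in flight.
replicas = [
    {"x": (2, 1)},   # replica 0: already has the new value
    {"x": (1, 0)},   # replica 1: still old
    {"x": (1, 0)},   # replica 2: still old
]

def quorum_read(key, replica_ids):
    """Read from r replicas and return the value with the highest version."""
    return max(replicas[i][key] for i in replica_ids)[1]

assert quorum_read("x", [0, 1]) == 1   # reader A sees the new value...
assert quorum_read("x", [1, 2]) == 0   # ...a LATER reader B still sees the old one
# A later read returning an older value than an earlier read violates linearizability.
```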
How to make Dynamo-style quorums linearizable?
Reader must perform a sync read repair
Writer must read the latest state of a quorum of nodes before sending its writes
Cost - reduced performance
p335
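A rough sketch of the synchronous read-repair approach, assuming a hypothetical replica object with get(key) -> (version, value) and put(key, version, value):

```python
def quorum_read_with_repair(replicas, key, r):
    contacted = replicas[:r]                                # any r replicas
    responses = [(rep, rep.get(key)) for rep in contacted]
    version, value = max(resp for _, resp in responses)     # freshest response wins
    # Before returning, synchronously write the freshest value back to every
    # stale replica we contacted - this is where the performance cost comes from.
    for rep, (v, _) in responses:
        if v < version:
            rep.put(key, version, value)
    return value
```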
Why is CAP theorem unhelpful?
The definition “Consistency, Availability, Partition (tolerance) - pick 2 out of 3” is misleading
Partitioning is a network fault which is not chosen - it just happens
If the network is working correctly - the system can provide consistency (linearizability) and total availability, but when the network fails - you have to choose one of the two.
Recommended to avoid CAP
Why are RAM and multicore CPU not linearizable system?
One thread writes a variable to memory and another reads it shortly afterwards - the read can be stale due to CPU caches.
Trade-off for LATENCY
Linearizability is dropped for performance, not for fault tolerance.
Linearizability is slow always, not only during a network fault.
Using CAP here makes no sense
C is dropped, yet we don’t expect the system to be available if there is a partition (a CPU disconnected from the system)
Causal consistency
System obeys the ordering imposed by causality
cause comes before effect, a message is sent before it is received, a question is asked before it’s answered
Causal order - what happened before what
Examples:
Snapshot isolation - a consistent snapshot is causally consistent
Causal order is a partial order
For 2 concurrent events, neither is greater nor lower (they are incomparable)
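A small sketch of that partial order using version vectors (dicts of node_id -> counter) as one way of representing causal history; the representation is my own choice for illustration:

```python
# Partial-order comparison: "concurrent" means neither side dominates the other.
def compare(a, b):
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"

assert compare({"n1": 1}, {"n1": 2}) == "before"
assert compare({"n1": 2, "n2": 0}, {"n1": 1, "n2": 3}) == "concurrent"
```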
Strongest possible consistency model that does not slow down due to network delays and remains available in the face of network failures!
Which kind of order is linearizability?
Total order
IF the system behaves as if there is a single copy of the data and all operations are atomic, then for any 2 operations one must have happened before the other
There is no concurrency in linearizable datastore as there is always a single, linear timeline
Linearizability vs causality
Linearizability implies causality
Capturing causal dependencies
If a replica processes an operation, then it must ensure ALL causally preceding operations (those which happened before) have already been processed. Otherwise processing must be postponed.
The system needs a way to know which values a node had seen when it made a given write.
If a node had seen X when it wrote Y, then X is causally related to Y
Causal dependencies must be tracked across the entire database (all objects), not just a single object.
Version vectors can be used for this
The DB needs to know which version of the data was read by the application; the snapshot’s version number is passed back on a write.
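A toy sketch of that idea with a single scalar version per replica (real systems track version vectors across all objects; all names here are made up):

```python
# The client passes back the version it read; the replica postpones a write
# until everything causally preceding it has been applied.
class Replica:
    def __init__(self):
        self.applied_version = 0
        self.pending = []   # (depends_on_version, op) whose dependencies aren't met

    def read(self):
        # A reader learns the version it has seen and passes it back on writes.
        return self.applied_version

    def write(self, depends_on_version, op):
        # Postpone the write if causally preceding writes haven't been applied yet.
        if depends_on_version > self.applied_version:
            self.pending.append((depends_on_version, op))
            return "postponed"
        # Apply the op (elided) and bump the version; retrying self.pending once
        # its dependencies become satisfied is omitted in this sketch.
        self.applied_version += 1
        return "applied"
```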