Lesson 5: Consensus in Distributed Systems Flashcards
What is consensus?
Agreement among distributed processes:
- on a value, action, timestamp, …
- on the outcome of a transaction e.g., bank transaction
Reaching consensus makes it possible for the system to be correct and is critical for making forward progress.
What’s hard about reaching consensus?
- non-determinism (multiple things can happen at any given point in time, and the order of these operations may differ across executions)
- lack of global time
- network delays
- malicious behavior
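To see why non-determinism matters, here is a minimal sketch in which the same two operations reach a replica in different orders. The operations (`add5`, `double`) and the helper `final_states` are hypothetical names chosen for illustration, and the model assumes a single integer register with no failures.

```python
from itertools import permutations

# Hypothetical operations on a single integer register (illustration only).
OPS = {
    "add5": lambda x: x + 5,
    "double": lambda x: x * 2,
}

def final_states(initial, op_names):
    """Map each possible delivery order to the final value it produces."""
    results = {}
    for order in permutations(op_names):
        value = initial
        for name in order:
            value = OPS[name](value)
        results[order] = value
    return results

if __name__ == "__main__":
    # Each line shows one delivery order and the value a replica ends up with.
    for order, value in final_states(10, ["add5", "double"]).items():
        print(" -> ".join(order), "=", value)
```

Two replicas that receive the same operations in different orders end up with different values, which is why the processes need to agree on an order (or on the resulting value).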
What are the key properties of consensus?
- Termination (liveness property): all non-faulty processes eventually decide on a value
- Agreement (safety property): all processes decide on the same value
- Validity (safety property): the value that’s decided on must have been proposed by some process
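The three properties can be phrased as checks over the outcome of a single run. The sketch below is one possible encoding, not a standard API: `check_consensus`, its arguments, and the use of `None` for "has not decided" are all assumptions made for illustration.

```python
def check_consensus(proposed, decided, faulty):
    """Check termination, agreement, and validity for one finished run.

    proposed: dict mapping process name -> value it proposed
    decided:  dict mapping process name -> value it decided, or None if undecided
              (assumes None is never a legitimate proposed value)
    faulty:   set of process names considered faulty
    """
    non_faulty = [p for p in proposed if p not in faulty]
    decisions = [v for v in decided.values() if v is not None]
    proposed_values = set(proposed.values())
    return {
        # Termination: every non-faulty process has decided.
        "termination": all(decided.get(p) is not None for p in non_faulty),
        # Agreement: no two processes decided different values.
        "agreement": len(set(decisions)) <= 1,
        # Validity: every decided value was proposed by some process.
        "validity": all(v in proposed_values for v in decisions),
    }

if __name__ == "__main__":
    proposed = {"p1": "A", "p2": "B", "p3": "A"}
    decided = {"p1": "A", "p2": "A", "p3": None}
    print(check_consensus(proposed, decided, faulty={"p3"}))
```

In this example `p3` never decides, which is acceptable only because it is marked faulty; with an empty `faulty` set the same run would violate termination.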
Is consensus really impossible?
While the FLP result shows that consensus cannot be guaranteed in an asynchronous system where even one process may fail, in practice several consensus protocols are used to ensure correctness:
- 2-phase commit
- 3-phase commit
- Paxos
- Raft
These protocols don’t contradict the FLP result; instead, they change the assumptions or properties of the system model. For example: will the protocol always terminate? Under which conditions will it provide consensus?
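As one illustration of trading away a property, the sketch below models only the decision rules of 2-phase commit. It is a simplified sketch, not a full implementation: the function names, the string constants, and the use of `None` for "no message arrived before a timeout" are assumptions for illustration, and there is no networking or logging.

```python
COMMIT, ABORT = "commit", "abort"

def coordinator_decide(votes):
    """Decide the outcome from participant votes.

    votes maps participant -> True (yes), False (no),
    or None (no reply before the coordinator's timeout).
    Commit only if every participant explicitly voted yes.
    """
    if votes and all(v is True for v in votes.values()):
        return COMMIT
    return ABORT

def participant_outcome(voted_yes, decision):
    """Outcome seen by one participant.

    decision is COMMIT, ABORT, or None if the coordinator's
    decision never arrived (e.g., the coordinator crashed).
    """
    if decision is not None:
        return decision
    if not voted_yes:
        # A participant that voted no knows the outcome must be abort.
        return ABORT
    # Voted yes but heard nothing: it cannot safely commit or abort.
    return "blocked"

if __name__ == "__main__":
    print(coordinator_decide({"a": True, "b": True}))
    print(coordinator_decide({"a": True, "b": None}))
    print(participant_outcome(voted_yes=True, decision=None))
```

The "blocked" case shows the trade-off: agreement is preserved, but a participant that voted yes may wait indefinitely if the coordinator fails, so termination is not guaranteed under that failure.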
What is the FLP theorem?
The FLP theorem proves that in an asynchronous system, where messages can be reordered and arbitrarily delayed, it is impossible for a deterministic protocol to guarantee that consensus will be reached if even 1 process may be faulty.