Module 10d - RAFT Flashcards
What does RAFT stand for?
Replicated
And
Fault
Tolerant
What does consensus mean in the context of RAFT?
Consensus:
- Allows a collection of machines to work as a coherent group
- Provides continuous service, even if some machines fail
What are some disadvantages with Paxos when compared to RAFT?
- Paxos is harder to understand
- Paxos is incomplete: it only agrees on a single value
- Paxos is inefficient
What are the 3 main properties of RAFT?
- Leader election:
  - Selects one server to act as leader
  - If a crash is detected, it chooses a new leader
- Log replication:
  - Leader accepts commands from clients, appends to its log
  - Leader replicates its log to other servers
- Safety:
  - Keep logs consistent
  - Only servers with up-to-date logs can become leader
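A minimal Python sketch of the "up-to-date log" check behind the safety property. The comparison rule (term of the last entry first, then log length) follows the usual description of Raft rather than anything stated on this card, and the function and parameter names are hypothetical.

```python
def candidate_log_is_up_to_date(cand_last_term, cand_last_index,
                                my_last_term, my_last_index):
    """Return True if the candidate's log is at least as up-to-date as ours.

    Assumed rule: the log whose last entry has the later term wins;
    if the last terms are equal, the longer log wins.
    """
    if cand_last_term != my_last_term:
        return cand_last_term > my_last_term
    return cand_last_index >= my_last_index
```

A voter would only grant its vote when this check returns True, which is what keeps a server with a stale log from becoming leader.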
What 3 states do servers have in RAFT?
- Follower (servers start as this)
- Candidate
- Leader
In RAFT, how does a Follower server become a Candidate server?
What does a candidate server do?
- Follower becomes a Candidate if it receives no heartbeat before its election timeout expires
- Candidates issue RequestVote RPCs to get elected as a leader
In RAFT, how does a Candidate server become a Leader server?
What does a Leader server do?
- Candidate server becomes a leader if it wins an election
- Leader server issues AppendEntries RPCs to replicate its log to other servers & uses heartbeats to maintain leadership
In RAFT, how many leaders per term? Give an upper and lower bound. Explain briefly
There can be either 1 or 0 leaders per term.
1 leader if there’s a successful election, and 0 if there’s a failed election
In RAFT, how does a Leader or a Candidate server become a follower server?
What does a Follower server do?
Leader or Candidate servers become followers by discovering a higher term
Follower servers are passive, but they expect regular heartbeats
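The state changes from the last three cards can be sketched as a small transition table in Python. The `Role` and `Event` names are hypothetical labels for this illustration; only the transitions mentioned on the cards are included.

```python
from enum import Enum


class Role(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


class Event(Enum):
    NO_HEARTBEAT = "no heartbeat"            # follower stops hearing from a leader
    WON_ELECTION = "won election"            # candidate wins the election
    HIGHER_TERM_SEEN = "higher term seen"    # a peer reports a later term


# (current role, event) -> next role
TRANSITIONS = {
    (Role.FOLLOWER, Event.NO_HEARTBEAT): Role.CANDIDATE,
    (Role.CANDIDATE, Event.WON_ELECTION): Role.LEADER,
    (Role.CANDIDATE, Event.HIGHER_TERM_SEEN): Role.FOLLOWER,
    (Role.LEADER, Event.HIGHER_TERM_SEEN): Role.FOLLOWER,
}


def next_role(role, event):
    """Return the next role, or the same role if the event does not apply."""
    return TRANSITIONS.get((role, event), role)
```

For example, `next_role(Role.FOLLOWER, Event.NO_HEARTBEAT)` gives `Role.CANDIDATE`, while an event that does not apply to the current role leaves it unchanged.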
In RAFT, each server maintains its current term value, and it is exchanged in every RPC.
From the perspective of the server:
1. What happens if a peer has a later term?
2. What happens if an incoming RPC has an obsolete term?
- Peer has a later term? Update term, and then revert to being a follower
- If the incoming RPC is from a peer with an obsolete term, then reply with an error
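A minimal Python sketch of the two term rules above. The `Server` class and its `observe_term` method are hypothetical names for this illustration, and a real server would also carry log and vote state.

```python
from dataclasses import dataclass


@dataclass
class Server:
    current_term: int = 0
    role: str = "follower"

    def observe_term(self, rpc_term):
        """Apply the term rules to an incoming RPC.

        Returns True if the RPC should be processed, False if it is
        rejected because the sender's term is obsolete.
        """
        if rpc_term > self.current_term:
            # Peer has a later term: update our term and step down.
            self.current_term = rpc_term
            self.role = "follower"
            return True
        if rpc_term < self.current_term:
            # Obsolete term: reply with an error.
            return False
        return True
```

A leader at term 3 that sees term 5 ends up as a follower at term 5; the same server would then reject an RPC carrying term 4.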
RAFT election correctness has some safety mechanisms.
How many votes does each server give?
For what case would there be 0 leaders?
- Each server gives only one vote per term
- A majority vote is required to win an election (more than 50% of the servers)
- There are 0 leaders whenever the election fails, i.e. no candidate receives a majority of the votes (for example, a split vote)
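The voting rules can be sketched in Python as follows. All names (`votes_needed`, `Voter`, `election_winner`) are hypothetical, and the sketch ignores log checks and networking; it only shows one vote per term and the strict-majority threshold.

```python
def votes_needed(cluster_size):
    """Smallest number of votes that is a strict majority of the cluster."""
    return cluster_size // 2 + 1


class Voter:
    def __init__(self):
        self.voted_for = {}  # term -> candidate id this server voted for

    def request_vote(self, term, candidate_id):
        """Grant at most one vote per term (repeat requests get the same answer)."""
        chosen = self.voted_for.setdefault(term, candidate_id)
        return chosen == candidate_id


def election_winner(votes, cluster_size):
    """votes maps candidate id -> votes received in one term.

    Returns the winning candidate, or None if the election failed.
    """
    needed = votes_needed(cluster_size)
    for candidate, count in votes.items():
        if count >= needed:
            return candidate
    return None
```

With 5 servers, 3 votes are needed; a 2-2 split (with one server unreachable) produces no winner, so that term has 0 leaders.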
RAFT’s election correctness has a liveness property: some candidate must eventually win an election. This is achieved with randomized timeouts, which are simpler than ranking servers.
How does this work?
- Each server has an election timeout, chosen randomly between 150 and 300 ms
- Whichever server times out first usually wins the election before the others time out
- Works well if broadcast time is much shorter than the election timeout
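A minimal Python sketch of randomized election timeouts, using the 150 to 300 ms range from the card. The function names are hypothetical, and the "first to time out" helper only identifies which server would start an election first; it does not simulate the vote itself.

```python
import random

ELECTION_TIMEOUT_MIN_MS = 150
ELECTION_TIMEOUT_MAX_MS = 300


def pick_election_timeout(rng):
    """Pick a random election timeout in milliseconds."""
    return rng.uniform(ELECTION_TIMEOUT_MIN_MS, ELECTION_TIMEOUT_MAX_MS)


def first_to_time_out(server_ids, rng):
    """Give every server its own random timeout and return the earliest one.

    Returns (server id with the shortest timeout, all timeouts by server id).
    """
    timeouts = {sid: pick_election_timeout(rng) for sid in server_ids}
    first = min(timeouts, key=timeouts.get)
    return first, timeouts
```

Calling `first_to_time_out(["s1", "s2", "s3"], random.Random())` picks a different likely candidate on most runs, which is what makes repeated split votes unlikely.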
In RAFT, what happens in a normal series of operations step-by-step?
- Client sends command to leader
- Leader appends command to its log
- Leader sends AppendEntries RPCs to all followers
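- Once the entry is committed (stored on a majority of servers), leader executes command in its state machine and returns result to client
- Leader notifies followers of committed entries in later AppendEntries RPCs, and followers execute them in their state machines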
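The steps above can be sketched as a tiny in-memory simulation in Python. Everything here is a simplification under stated assumptions: the class and method names are hypothetical, commands are key/value writes, `append_entries` copies the leader's whole log instead of checking for consistency, and an entry counts as committed once a majority of servers (leader included) has stored it.

```python
def apply_committed(log, state, applied, commit_index):
    """Apply log entries up to commit_index to a key/value state machine."""
    while applied < min(commit_index, len(log)):
        key, value = log[applied]
        state[key] = value
        applied += 1
    return applied


class Follower:
    def __init__(self, up=True):
        self.up = up          # False simulates a crashed follower
        self.log = []
        self.state = {}
        self.applied = 0

    def append_entries(self, entries, leader_commit):
        if not self.up:
            return False
        self.log = list(entries)  # simplification: copy the leader's log
        self.applied = apply_committed(self.log, self.state,
                                       self.applied, leader_commit)
        return True


class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.log = []
        self.state = {}
        self.applied = 0
        self.commit_index = 0

    def handle_command(self, key, value):
        self.log.append((key, value))                      # append to own log
        acks = 1 + sum(f.append_entries(self.log, self.commit_index)
                       for f in self.followers)            # replicate
        cluster_size = len(self.followers) + 1
        if acks < cluster_size // 2 + 1:
            return "not committed"
        self.commit_index = len(self.log)                  # stored on a majority
        self.applied = apply_committed(self.log, self.state,
                                       self.applied, self.commit_index)
        for f in self.followers:                           # notify followers
            f.append_entries(self.log, self.commit_index)
        return "ok"
```

In a 5-server cluster with two followers down, `handle_command` still returns `"ok"` because three servers hold the entry; with three followers down it returns `"not committed"` and nothing is executed.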
In RAFT, what does the leader do in an operation if there are crashed or slow followers?
The leader keeps retrying AppendEntries RPCs to crashed or slow followers until they succeed
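A minimal Python sketch of the retry behaviour. `FlakyFollower` is a hypothetical stand-in that fails a fixed number of times before accepting entries, and the optional `max_attempts` limit exists only so the example can terminate; the card describes retrying until success.

```python
class FlakyFollower:
    """Hypothetical follower that times out a few times before responding."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.log = []

    def append_entries(self, entries):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise TimeoutError("follower did not respond")
        self.log = list(entries)
        return True


def replicate_with_retry(follower, entries, max_attempts=None):
    """Retry append_entries until it succeeds.

    Returns the number of attempts used, or None if max_attempts ran out.
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        try:
            follower.append_entries(entries)
            return attempt
        except TimeoutError:
            continue  # follower is crashed or slow: try again
    return None
```

A follower that fails three times accepts the entries on the fourth attempt, after which its log matches what the leader sent.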
How does RAFT ensure that the entries are not lost in the event of a crash?
Log entries are stored on disk, so they survive a crash and restart
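One way to picture "stored on disk" is an append-only file that is flushed and synced before the server treats the entry as stored. This Python sketch is an assumption about how that could look, not a description of any particular RAFT implementation; the file format (one JSON object per line) and the function names are hypothetical.

```python
import json
import os


def append_entry_durably(path, entry):
    """Append one log entry and force it to disk before returning."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # ask the OS to write it to the device


def load_log(path):
    """Reload all entries after a restart; an absent file means an empty log."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

After a simulated crash, calling `load_log` on the same path returns every entry that `append_entry_durably` finished writing.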