5. Replication Flashcards
Purpose of replication
- Availability - handle some machine failure
- Increase read throughput - scale out machines that can serve read request.
- Reduce latency - Keep data geographically close to users
Approaches of replication
- Single-leader replication
Clients send all writes to a single node (the leader), which sends a stream of data change events to the other replicas (followers). Reads can be performed on any replica, but reads from followers might be stale. - Multi-leader replication
Clients send each write to one of several leader nodes, any of which can accept writes. The leaders send streams of data change events to each other and to any follower nodes. - Leaderless replication
Clients send each write to several nodes, and read from several nodes in parallel in order to detect and correct nodes with stale data.
Consistency models to deal with replication lag.
- Read-after-write consistency
Users should always see data that they submitted themselves. - Monotonic reads
After users have seen the data at one point in a time, they shouldn’t later see the data from some earlier point in time. - Consistent prefix reads
Users should see the data in a state that makes causal sense: for example, seeing a question and its reply in the correct order.
Approaches of replication - Single-leader replication
Compare sync and async replication
Sync:
Pros: Followers guarantee to have up-to-date copy of data
Cons: Follower might block the write
Async:
Pros: Non blocking. Fast
Cons: Data lost if lead fails
In practical, one lead, one sync follower, all others are async. (The only one sync follower is to handle the failure of the lead)
Consistency models to deal with replication lag - 1. Reading your own writes
- When reading something user may have modified, read it from the leader.
- Within one minute after the update, make all reads from the leader
- Compared to the timestamp when user made the update, serve any reads with update after that. Try other replica or just wait.
Additional complexity when handling cross-device
- Client timestamp is not sharable
- Two devices might route to different datacenter, thus you need to route requests to the same leader.
Consistency models to deal with replication lag - 2. Monotonic reads
Problem: Users see things moving backward in time due to async followers.
Implementation: Make sure each user always makes reads from same replica. e.g. - user id based hashing.
Consistency models to deal with replication lag - 3. Consistent prefix reads
Problem: A third person observed the conversation in wrong order.
Solve: Make sure any writes that are causally related to each other are written to the same partition.
Implementation of replication logs
- Statement-based replication
Leader logs every write request (statement) that it executes and sends it to followers. - Write-ahead log (WAL) shipping
Reuse the log when write to disk - Logical (row-based) log replication
Similar to 2, but use a different format of log
Approaches of replication - Multi-leader replication
Use case
- Multi-datacenter operation
- Clients with offline operation
- Collaborative editing
Approaches of replication - Multi-leader replication
Handling write conflicts
- If sync conflict detection, just use single leader replication
- Conflict avoidance
Ensure all writes for a particular record go through the same leader - Converging toward a consistent state
Last write wins (LWW) - Custom conflict resolution logic
Approaches of replication - Multi-leader replication
Topologies
- Circular
- Star (tree)
- All-to-all (graph)
Approaches of replication - Leader-less replication
w + r > n, w and r determines how many nodes we wait for.