5. Replication Flashcards

Question 1

Q

Purpose of replication

Answer

A

Availability - keep the system running even when some machines fail.
Increase read throughput - scale out the number of machines that can serve read requests.
Reduce latency - keep data geographically close to users.

Question 2

Q

Approaches of replication

Answer

A

Single-leader replication
Clients send all writes to a single node (the leader), which sends a stream of data change events to the other replicas (followers). Reads can be performed on any replica, but reads from followers might be stale.
Multi-leader replication
Clients send each write to one of several leader nodes, any of which can accept writes. The leaders send streams of data change events to each other and to any follower nodes.
Leaderless replication
Clients send each write to several nodes, and read from several nodes in parallel in order to detect and correct nodes with stale data.

Question 3

Q

Consistency models to deal with replication lag.

Answer

A

Read-after-write consistency
Users should always see data that they submitted themselves.
Monotonic reads
After users have seen the data at one point in time, they shouldn’t later see the data from some earlier point in time.
Consistent prefix reads
Users should see the data in a state that makes causal sense: for example, seeing a question and its reply in the correct order.

Question 4

Q

Approaches of replication - Single-leader replication

Compare sync and async replication

Answer

A

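Sync:
Pros: The follower is guaranteed to have an up-to-date copy of the data.
Cons: If the synchronous follower is slow or unavailable, the write is blocked.

Async:
Pros: Non-blocking and fast. The leader can keep accepting writes even if followers fall behind.
Cons: Writes that have not yet been replicated are lost if the leader fails.

In practice: one leader, one sync follower, and all other followers async. (The single sync follower guarantees an up-to-date copy exists on a second node if the leader fails.)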
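
The setup with one sync follower can be sketched roughly as below. This is a toy in-memory model, not how any particular database does it: the `Leader` and `Follower` classes, the `pending` queue, and `flush_async` are all hypothetical names used for illustration.

```python
class Follower:
    def __init__(self, name):
        self.name = name
        self.log = []

    def apply(self, entry):
        self.log.append(entry)


class Leader:
    def __init__(self, sync_follower, async_followers):
        self.log = []
        self.sync_follower = sync_follower
        self.async_followers = async_followers
        self.pending = []  # entries not yet shipped to async followers

    def write(self, entry):
        self.log.append(entry)
        # Sync: the write does not return until this follower has the entry.
        self.sync_follower.apply(entry)
        # Async: queue the entry and return without waiting.
        self.pending.append(entry)
        return "ok"

    def flush_async(self):
        # Stands in for background replication catching up (an assumption).
        for entry in self.pending:
            for follower in self.async_followers:
                follower.apply(entry)
        self.pending.clear()


sync_f, async_f = Follower("f1"), Follower("f2")
leader = Leader(sync_f, [async_f])
leader.write("x=1")
```

Right after `write` returns, `sync_f` already holds the entry while `async_f` does not; it catches up only once `flush_async` runs.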

Question 5

Q

Consistency models to deal with replication lag - 1. Reading your own writes

Answer

A

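Read anything the user may have modified from the leader.
For one minute after the user's last update, serve all of that user's reads from the leader.
Have the client remember the timestamp of its most recent write, and serve its reads only from a replica that has applied updates up to that timestamp. If a replica is not caught up, try another replica or wait.

Additional complexity when handling cross-device access:

The timestamp of the last write remembered by one device is not visible to the user's other devices, so it has to be stored centrally.
Two devices might be routed to different datacenters, so you may need to route all of a user's devices to the same datacenter (and therefore the same leader).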
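
A minimal sketch of the timestamp-based option, assuming logical timestamps and a hypothetical `Replica` record that tracks how far each replica has caught up. Falling back to the leader when no follower is caught up is one possible choice here; waiting is another.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    applied_up_to: int  # logical timestamp of the last write applied


def choose_replica(last_write_ts, followers, leader):
    """Return a replica that has already applied the user's latest write."""
    for replica in followers:
        if replica.applied_up_to >= last_write_ts:
            return replica
    # Assumption: fall back to the leader, which always has the write.
    return leader


leader = Replica("leader", applied_up_to=10)
followers = [Replica("f1", 7), Replica("f2", 9)]
```

With these values, a user whose last write was at timestamp 8 can read from `f2`, while a user whose last write was at timestamp 10 is sent to the leader.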

Question 6

Q

Consistency models to deal with replication lag - 2. Monotonic reads

Answer

A

Problem: Users see things moving backward in time when successive reads hit async followers with different amounts of lag.

Implementation: Make sure each user always reads from the same replica, e.g. choose the replica by hashing the user ID.
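
A minimal sketch of user-ID-based replica selection. The function name `replica_for_user` and the replica names are hypothetical; a stable hash (here SHA-256) is used so the mapping is the same on every server and across restarts.

```python
import hashlib


def replica_for_user(user_id: str, replicas: list[str]) -> str:
    """Map a user ID to one replica deterministically."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


replicas = ["replica-a", "replica-b", "replica-c"]
first = replica_for_user("user-42", replicas)
second = replica_for_user("user-42", replicas)
```

The same user always lands on the same replica. If that replica fails, the user's reads have to be rerouted to a different one, which this sketch does not handle.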

Question 7

Q

Consistency models to deal with replication lag - 3. Consistent prefix reads

Answer

A

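Problem: A third person observes a conversation in the wrong order (e.g. sees the reply before the question) because different partitions replicate with different lag.

Solution: Make sure any writes that are causally related to each other are written to the same partition.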
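
One way to keep causally related writes together is to partition by a shared key rather than by a per-message key. The sketch below assumes messages carry a `conversation_id` (a hypothetical field) and uses CRC32 only as a simple stable hash.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Pick a partition from a key using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


messages = [
    {"conversation_id": "conv-1", "text": "How far ahead can you see?"},
    {"conversation_id": "conv-1", "text": "About ten seconds, usually."},
]

# Partition by conversation, not by message, so the question and its
# reply go through the same partition and keep their order.
partitions = {partition_for(m["conversation_id"], 8) for m in messages}
```

Both messages map to a single partition, so a reader of that partition sees them in write order.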

Question 8

Q

Implementation of replication logs

Answer

A

Statement-based replication
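The leader logs every write request (statement) that it executes and sends it to the followers.
Write-ahead log (WAL) shipping
Reuse the append-only log that the storage engine already writes to disk: the leader sends the same log to the followers.
Logical (row-based) log replication
Similar to WAL shipping, but uses a separate log format that describes changes at the row level and is decoupled from the storage engine internals.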
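
A toy illustration of why the log format matters, assuming a nondeterministic statement such as `SET token = RAND()`. The dictionaries standing in for databases and both helper functions are hypothetical. Re-executing the statement on each node can produce different values, while shipping the resulting row value cannot.

```python
import random


def apply_statement(db, rng):
    # Statement-based: every node re-executes "SET token = RAND()".
    db["token"] = rng.random()


def apply_row_change(db, change):
    # Row-based: every node applies the value the leader computed.
    key, value = change
    db[key] = value


# Each node has its own random source, as separate machines would.
leader_db, follower_db = {}, {}
apply_statement(leader_db, random.Random(1))
apply_statement(follower_db, random.Random(2))
diverged = leader_db != follower_db

# Shipping the leader's row value brings the follower back in line.
apply_row_change(follower_db, ("token", leader_db["token"]))
```

After the statement is replayed, the two nodes hold different tokens; after the row change is applied, they agree.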

Question 9

Q

Approaches of replication - Multi-leader replication

Use case

Answer

A

Multi-datacenter operation
Clients with offline operation
Collaborative editing

Question 10

Q

Approaches of replication - Multi-leader replication

Handling write conflicts

Answer

A

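Synchronous conflict detection gives up the main advantage of multi-leader replication (each leader accepting writes independently); if you need it, just use single-leader replication.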
Conflict avoidance
Ensure all writes for a particular record go through the same leader
Converging toward a consistent state
Last write wins (LWW)
Custom conflict resolution logic
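
A minimal sketch of last write wins, assuming each write carries a timestamp and the ID of the leader that accepted it (both hypothetical fields). The leader ID is used only as a tie-breaker so that every node picks the same winner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float
    leader_id: str  # tie-breaker when timestamps are equal


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Keep the write with the highest (timestamp, leader_id)."""
    return max(a, b, key=lambda v: (v.timestamp, v.leader_id))


w1 = Versioned("title=B", timestamp=100.0, leader_id="dc-east")
w2 = Versioned("title=C", timestamp=101.5, leader_id="dc-west")
winner = lww_merge(w1, w2)
```

The merge gives the same result regardless of argument order, which is what lets all replicas converge. The losing write is silently discarded, so LWW trades data loss for convergence.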

Question 11

Q

Approaches of replication - Multi-leader replication

Topologies

Answer

A

Circular
Star (tree)
All-to-all (graph)

Question 12

Q

Approaches of replication - Leaderless replication

Answer

A

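Quorum condition: w + r > n, where n is the number of replicas, w is the number of nodes that must confirm each write, and r is the number of nodes that must respond to each read. w and r determine how many nodes we wait for. Because the write set and the read set must overlap in at least one node, at least one of the r responses is expected to be up to date.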
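
The quorum condition and a quorum read can be sketched as follows. It assumes each replica answers with a `(version, value)` pair; how versions are assigned is outside this sketch, and both function names are hypothetical.

```python
def is_strict_quorum(n: int, w: int, r: int) -> bool:
    """True when every read set must overlap every write set."""
    return w + r > n


def quorum_read(responses, r):
    """Return the newest (version, value) among at least r responses."""
    if len(responses) < r:
        raise RuntimeError("not enough replicas responded")
    return max(responses, key=lambda resp: resp[0])


# n = 3, w = 2, r = 2: one replica is stale, but the read still overlaps
# with a replica that saw the latest write.
responses = [(1, "old"), (2, "new")]
latest = quorum_read(responses, r=2)
```

With n = 3, choosing w = 2 and r = 2 satisfies the condition, whereas w = 1 and r = 1 does not and may return stale data.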

(12 cards)