5. Replication Flashcards

1
Q

Purpose of replication

A
  1. Availability - tolerate the failure of some machines
  2. Increase read throughput - scale out the number of machines that can serve read requests.
  3. Reduce latency - Keep data geographically close to users
2
Q

Approaches of replication

A
  1. Single-leader replication
    Clients send all writes to a single node (the leader), which sends a stream of data change events to the other replicas (followers). Reads can be performed on any replica, but reads from followers might be stale.
  2. Multi-leader replication
    Clients send each write to one of several leader nodes, any of which can accept writes. The leaders send streams of data change events to each other and to any follower nodes.
  3. Leaderless replication
    Clients send each write to several nodes, and read from several nodes in parallel in order to detect and correct nodes with stale data.
3
Q

Consistency models to deal with replication lag.

A
  1. Read-after-write consistency
    Users should always see data that they submitted themselves.
  2. Monotonic reads
    After users have seen the data at one point in time, they shouldn’t later see the data from some earlier point in time.
  3. Consistent prefix reads
    Users should see the data in a state that makes causal sense: for example, seeing a question and its reply in the correct order.
4
Q

Approaches of replication - Single-leader replication

Compare sync and async replication

A

Sync:
Pros: The follower is guaranteed to have an up-to-date copy of the data
Cons: An unresponsive follower blocks the write

Async:
Pros: Non-blocking and fast
Cons: Writes that have not yet been replicated are lost if the leader fails

In practice: one leader, one sync follower, and all other followers async. (The single sync follower ensures an up-to-date copy survives if the leader fails.)
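
A minimal in-memory sketch of the "one sync follower, the rest async" setup described above. The class names (`Replica`, `Leader`) and the `flush` method are hypothetical, and the network is simulated with direct method calls, so this only illustrates the ordering of acknowledgements, not a real replication protocol.

```python
class Replica:
    """A node that stores an ordered log of applied writes."""

    def __init__(self):
        self.log = []

    def apply(self, entry):
        self.log.append(entry)


class Leader(Replica):
    """Hypothetical leader with one sync follower and any number of async followers."""

    def __init__(self, sync_follower, async_followers):
        super().__init__()
        self.sync_follower = sync_follower
        self.async_followers = async_followers
        self.pending = []  # (follower, entry) pairs not yet delivered

    def write(self, entry):
        self.apply(entry)
        # Sync: the write is not acknowledged until this follower has it.
        self.sync_follower.apply(entry)
        # Async: queue the entry and acknowledge without waiting.
        for follower in self.async_followers:
            self.pending.append((follower, entry))
        return "ok"

    def flush(self):
        """Simulate the async followers catching up later."""
        for follower, entry in self.pending:
            follower.apply(entry)
        self.pending.clear()
```

If the leader failed right after `write` returned, the sync follower would already hold the entry, while the async followers might not.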

5
Q

Consistency models to deal with replication lag - 1. Reading your own writes

A
  1. When reading something the user may have modified, read it from the leader.
  2. For one minute after the user's last update, serve all of that user's reads from the leader.
  3. Remember the timestamp of the user's most recent update and serve reads only from a replica that reflects updates at least up to that timestamp. If the replica is not caught up, try another replica or wait.

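Additional complexity when handling cross-device access

  • A timestamp remembered by the client on one device is not visible to the user's other devices.
  • Two devices might be routed to different datacenters, so when reads must go to the leader, requests from all of a user's devices need to be routed to the same datacenter.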
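
Approach 2 (read from the leader for a window after the user's last write) can be sketched as below. The `ReadRouter` class and its method names are hypothetical. The last-write time is kept on the server side, keyed by user ID, which is one assumed way to make it shared across a user's devices.

```python
import time


class ReadRouter:
    """Route a user's reads to the leader for a window after their last write."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested
        self.last_write_at = {}  # user_id -> time of most recent write

    def record_write(self, user_id):
        self.last_write_at[user_id] = self.clock()

    def choose_replica(self, user_id):
        last = self.last_write_at.get(user_id)
        if last is not None and self.clock() - last < self.window_seconds:
            return "leader"  # recent write: a follower may still be stale
        return "follower"
```

Users who have not written recently keep reading from followers, so the leader only absorbs the extra read load for a short time after each write.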
6
Q

Consistency models to deal with replication lag - 2. Monotonic reads

A

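Problem: With async followers, a user can see things moving backward in time, e.g. by reading first from an up-to-date replica and then from a stale one.

Implementation: Make sure each user always reads from the same replica, e.g. choose the replica by hashing the user ID. If that replica fails, the user's reads have to be rerouted to another one.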
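
A small sketch of user-ID-based replica selection. The function name `replica_for_user` and the replica names are made up for illustration. SHA-256 is used only because Python's built-in `hash()` for strings is randomized per process, so it would not give a stable mapping across servers.

```python
import hashlib


def replica_for_user(user_id: str, replicas: list[str]) -> str:
    """Map a user ID to the same replica on every call."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Take the first 8 bytes as an integer and reduce it to a list index.
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


replicas = ["replica-0", "replica-1", "replica-2"]
chosen = replica_for_user("user-42", replicas)
```

Note that plain modulo hashing remaps many users when the replica list changes, so this sketch assumes a fixed set of replicas.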

7
Q

Consistency models to deal with replication lag - 3. Consistent prefix reads

A

Problem: A third person observes a conversation in the wrong order, e.g. sees the reply before the question.

Solution: Make sure any writes that are causally related to each other are written to the same partition.

8
Q

Implementation of replication logs

A
  1. Statement-based replication
    Leader logs every write request (statement) that it executes and sends it to followers.
  2. Write-ahead log (WAL) shipping
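    Ship the same append-only log that the storage engine writes to disk to the followers.
  3. Logical (row-based) log replication
    Similar to 2, but uses a separate log format that describes changes at the granularity of rows and is decoupled from the storage engine's internals.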
9
Q

Approaches of replication - Multi-leader replication

Use case

A
  1. Multi-datacenter operation
  2. Clients with offline operation
  3. Collaborative editing
10
Q

Approaches of replication - Multi-leader replication

Handling write conflicts

A
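  1. Synchronous conflict detection
    Waiting for every leader to confirm a write loses the main advantage of multi-leader replication, so just use single-leader replication instead.
  2. Conflict avoidance
    Ensure all writes for a particular record go through the same leader.
  3. Converging toward a consistent state
    e.g. last write wins (LWW): keep the write with the highest timestamp and discard the others, at the risk of data loss.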
  4. Custom conflict resolution logic
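
A minimal sketch of last write wins, assuming each write carries a timestamp and the ID of the leader that accepted it. The `Version` type and `last_write_wins` function are hypothetical names. The leader ID is used as a tie-breaker so that every replica picks the same winner regardless of the order in which it sees the two versions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: float
    leader_id: str


def last_write_wins(a: Version, b: Version) -> Version:
    """Pick the version with the highest (timestamp, leader_id); the other is discarded."""
    return max(a, b, key=lambda v: (v.timestamp, v.leader_id))


a = Version("title from leader A", 1.0, "leader-a")
b = Version("title from leader B", 2.0, "leader-b")
winner = last_write_wins(a, b)  # b wins; a's write is silently dropped
```

The discarded write is the data-loss risk of LWW: both clients were told their write succeeded, but only one value survives.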
11
Q

Approaches of replication - Multi-leader replication

Topologies

A
  1. Circular
  2. Star (tree)
  3. All-to-all (graph)
12
Q

Approaches of replication - Leaderless replication

A

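Quorum condition: w + r > n, where n is the number of replicas, w is the number of nodes that must confirm a write, and r is the number of nodes that must respond to a read. w and r determine how many nodes we wait for.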
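
The condition can be checked with a few lines of code. The function names below are hypothetical, and `read_latest` assumes each replica response is a `(version, value)` pair with a comparable version number, which the card does not specify.

```python
def is_strict_quorum(n: int, w: int, r: int) -> bool:
    """True if every read set of r nodes overlaps every write set of w nodes."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return w + r > n


def read_latest(responses):
    """Given (version, value) pairs from r replicas, return the newest pair."""
    return max(responses, key=lambda response: response[0])


ok = is_strict_quorum(n=3, w=2, r=2)   # 2 + 2 > 3, so reads overlap writes
bad = is_strict_quorum(n=3, w=1, r=1)  # 1 + 1 <= 3, a read may miss the write
newest = read_latest([(1, "old"), (2, "new")])
```

With n = 3, w = 2, r = 2, at least one of the two nodes read must have seen the latest successful write, and `read_latest` picks that value out of the responses.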
