Lesson 6: Replication Flashcards
What is the goal of replication?
The system maintains the same state at more than one location.
Replication allows the service to be provided from more than one node.
What are the benefits of replication?
- fault-tolerance - even when there is a transient or permanent failure in the system, the service can continue (e.g., backup, disaster recovery)
- availability - the service stays reachable because requests can be served from another replica
- scalability - when demand increases, the system can continue to serve requests without service degradation
What is active replication?
Each replica serves requests & ensures replication of updates.
What is stand-by (primary-backup) replication?
Only one replica is active at any point in time. Others are kept in a consistent state so fast failover can be achieved. The primary replica serves requests & ensures replication of updates.
What is the state replication technique?
Execute updates on one replica, then copy the resulting state changes to the other replicas.
Pros/Cons
+ no need to re-execute multiple times
- state may be large, or it may be hard to identify everything an update changed
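The state replication technique can be sketched as follows. This is a minimal illustration only: the Replica class, the execute_on_primary helper, and the in-memory dictionary state are all hypothetical names chosen for the example, and networking and failures are ignored.

```python
class Replica:
    """Hypothetical in-memory key-value replica (illustration only)."""

    def __init__(self):
        self.state = {}

    def apply_delta(self, delta):
        # Install state changes computed elsewhere; nothing is re-executed here.
        self.state.update(delta)


def execute_on_primary(primary, backups, key, fn):
    """Run fn once on the primary, then ship only the resulting state change."""
    new_value = fn(primary.state.get(key, 0))
    delta = {key: new_value}
    primary.apply_delta(delta)
    for backup in backups:
        backup.apply_delta(delta)
    return delta


primary = Replica()
backups = [Replica(), Replica()]
execute_on_primary(primary, backups, "counter", lambda v: v + 1)
execute_on_primary(primary, backups, "counter", lambda v: v * 10)
```

Each function runs exactly once, on the primary. The backups only receive the changed key and value, which is cheap here but would be costly if an update touched a large part of the state.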
What is the replicated state machine technique?
Copy each operation (log of operations) to each replica and execute it there to produce the same update. This works only if execution is deterministic (i.e., the same operation applied to the same state produces the same result).
Pros/Cons
+ no need to send large state, operation logs may be smaller
- must re-execute and ensure deterministic execution
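A minimal sketch of the replicated state machine technique, assuming a shared, already-ordered operation log. The StateMachine class and the "set"/"add" operation names are hypothetical, and how replicas agree on the log order is out of scope.

```python
class StateMachine:
    """Hypothetical replica that rebuilds its state by replaying a log."""

    def __init__(self):
        self.state = {}
        self.applied = 0  # index of the next log entry to apply

    def apply(self, op):
        kind, key, arg = op
        if kind == "set":
            self.state[key] = arg
        elif kind == "add":
            self.state[key] = self.state.get(key, 0) + arg
        else:
            raise ValueError(f"unknown operation: {kind}")

    def catch_up(self, log):
        # Replay every entry not yet applied, strictly in log order.
        while self.applied < len(log):
            self.apply(log[self.applied])
            self.applied += 1


log = [("set", "x", 5), ("add", "x", 3), ("add", "y", 2)]
replicas = [StateMachine() for _ in range(3)]
for replica in replicas:
    replica.catch_up(log)
```

Every replica ends in the same state because both operations are deterministic. An operation that read the clock or a random number would break that guarantee.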
What is chain replication?
Chain replication is a strategy for reducing the performance overhead of replication.
On a write request, each replica forwards the update to the next replica in the chain (as opposed to the leader communicating with all replicas).
Read requests are always served from the tail (last replica in the chain), so they're guaranteed to return the latest committed update.
Read requests are NOT served from the middle of the chain because an update is not committed until it has been applied at the tail. If the update failed before reaching the tail, a middle replica would have served data that is later rolled back.
Pros/Cons
+ greater leader scalability: fewer messages per replication at the leader
+ higher write throughput: uses pipelining
+ strong consistency possible: reads guaranteed to return successfully committed writes
- read bottleneck: all reads go to the tail, and many workloads are read-heavy
- inefficiency: intermediate nodes may be underutilized (especially in read-heavy workloads)
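The write and read paths of chain replication can be sketched as below. This is a simplified, synchronous illustration: ChainNode and build_chain are hypothetical names, and pipelining, acknowledgements back to the head, and failure handling are all left out.

```python
class ChainNode:
    """Hypothetical chain member holding a local key-value store."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.next = None  # successor in the chain; None at the tail

    def write(self, key, value):
        # Apply locally, then forward to the successor only.
        self.store[key] = value
        if self.next is None:
            return self.name  # reached the tail: the write is committed
        return self.next.write(key, value)

    def read(self, key):
        return self.store.get(key)


def build_chain(names):
    nodes = [ChainNode(name) for name in names]
    for node, successor in zip(nodes, nodes[1:]):
        node.next = successor
    return nodes


nodes = build_chain(["head", "middle", "tail"])
head, tail = nodes[0], nodes[-1]
committed_at = head.write("k", "v1")  # clients send writes to the head
value = tail.read("k")                # clients send reads to the tail
```

The head sends one message per write no matter how long the chain is, which is the scalability benefit listed above. The cost is that every read lands on the single tail node.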
What is Chain Replication with Apportioned Queries (CRAQ)?
A modification to Chain Replication where reads are divided among the chain replicas.
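Replicas store both the old and new versions of the data. When a read request comes in, a replica that holds only a committed version serves it directly. If it also holds a newer, uncommitted version, it checks with the tail: if the tail has committed the new version, the replica serves the new version; otherwise it serves the old version.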
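A minimal sketch of the CRAQ read path, assuming each replica keeps values indexed by version number and can ask the tail which version is committed. CraqNode and its methods are hypothetical names. Write propagation is done by hand here so that the in-flight state is visible.

```python
class CraqNode:
    """Hypothetical CRAQ replica that keeps several versions per key."""

    def __init__(self):
        self.versions = {}   # key -> {version number: value}
        self.committed = {}  # key -> newest version this node knows is committed
        self.tail = self     # replaced once the chain is wired up

    def store(self, key, version, value):
        self.versions.setdefault(key, {})[version] = value

    def mark_committed(self, key, version):
        self.committed[key] = version

    def read(self, key):
        versions = self.versions[key]
        newest = max(versions)
        if self.committed.get(key) == newest:
            return versions[newest]  # clean: serve locally
        # Dirty: ask the tail which version is committed and serve that one.
        return versions[self.tail.committed[key]]


head, mid, tail = CraqNode(), CraqNode(), CraqNode()
for node in (head, mid, tail):
    node.tail = tail
    node.store("k", 1, "old")
    node.mark_committed("k", 1)

# Version 2 is in flight: it has reached head and mid but not the tail.
head.store("k", 2, "new")
mid.store("k", 2, "new")
during_write = mid.read("k")

# The tail applies and commits version 2.
tail.store("k", 2, "new")
tail.mark_committed("k", 2)
after_commit = mid.read("k")
```

In this sketch the middle replica answers both reads itself and only asks the tail for a version number. That small query is how reads get spread across the chain without serving uncommitted data.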