LU6 Consistency and Replication Flashcards
What is replication in distributed systems?
Replication is copying data from one host to another to increase reliability and performance.
Why is data replication used?
To improve system reliability (e.g., disaster recovery) and performance (e.g., reducing latency).
What is a replica in distributed systems?
A copy of the same data stored on multiple nodes.
What is the main challenge of data replication?
Maintaining consistency across all replicas, because new writes can arrive while earlier updates are still being propagated.
What is a read-write conflict?
A situation where a read and a write operation occur concurrently on the same data.
What is a write-write conflict?
When two concurrent write operations occur on the same data.
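A minimal sketch of why this is a problem, assuming two hypothetical replicas that receive the same two writes in different orders. The names `apply_writes`, `replica_a`, and `replica_b` are illustrative only.

```python
def apply_writes(writes):
    """Apply (key, value) writes in the given order and return the resulting store."""
    store = {}
    for key, value in writes:
        store[key] = value  # a later write overwrites an earlier one
    return store

# Two clients write to the same key concurrently (hypothetical values).
write_1 = ("x", "A")
write_2 = ("x", "B")

# Without an agreed order, each replica may apply the writes in a different order.
replica_a = apply_writes([write_1, write_2])
replica_b = apply_writes([write_2, write_1])

print(replica_a, replica_b)  # the two replicas end up with different values for "x"
```

The replicas diverge unless the system agrees on one order for conflicting writes.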
What happens when global ordering on conflicting operations is enforced?
It can degrade system scalability due to high synchronization costs.
What is a solution to avoid costly global synchronization?
Adopt a weaker consistency model: propagate new data to the other replicas lazily instead of instantly (e.g., a changed profile picture may take a moment to appear everywhere).
What is replica consistency?
Ensuring that all copies of data are consistent across nodes.
What is a distributed transaction?
A transaction that reads or writes data on multiple nodes.
- Either all nodes must commit, or all must abort
- If any node crashes, all must abort
What is the atomic commitment problem?
Ensuring that either all nodes commit or all abort a distributed transaction.
What is the two-phase commit (2PC) protocol?
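A protocol ensuring atomic commitment in two phases: a coordinator first asks all nodes to prepare and vote, then tells them to commit only if every node voted yes, and to abort otherwise.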
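The voting idea can be sketched as follows. This is a simplified in-memory illustration, not a production protocol: the `Participant` class, its `can_commit` flag, and `two_phase_commit` are hypothetical names, and there is no networking, timeout handling, or durable logging.

```python
class Participant:
    """Hypothetical participant; `can_commit` simulates how it will vote."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote. After voting yes, the participant waits for the decision.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, decision):
        # Phase 2: apply the coordinator's decision.
        self.state = decision


def two_phase_commit(participants):
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Commit only if every vote is yes. A real coordinator would write this
    # decision to disk before sending it (omitted in this sketch).
    decision = "committed" if all(votes) else "aborted"
    # Phase 2: send the same decision to every participant.
    for p in participants:
        p.finish(decision)
    return decision
```

A single "no" vote is enough to abort the transaction on every node.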
What is read-after-write consistency?
Ensuring a client can read the value it just wrote.
What is a quorum in replication?
A subset of replicas required to perform read or write operations.
What is the typical quorum size for n replicas?
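A majority: (n+1)/2, rounded up, for both read and write quorums. This guarantees that any read quorum and any write quorum share at least one replica.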
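A small worked check of the quorum arithmetic, assuming majority quorums. The helper names `majority_quorum` and `quorums_overlap` are illustrative; the overlap condition used is r + w > n, where r and w are the read and write quorum sizes.

```python
def majority_quorum(n):
    """Smallest number of replicas that forms a majority of n (equals (n+1)/2 for odd n)."""
    return n // 2 + 1


def quorums_overlap(n, r, w):
    """True if every read quorum of size r shares a replica with every write quorum of size w."""
    return r + w > n


for n in (3, 4, 5):
    q = majority_quorum(n)
    print(n, q, quorums_overlap(n, q, q))
```

With n = 3, reading from 1 replica and writing to 2 does not overlap (1 + 2 is not greater than 3), so a read could miss the latest write.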
What is read repair in replication?
A mechanism where the client helps propagate the most recent data to other replicas.
- Read repair fixes data inconsistencies during read operations.
- If a read finds different versions of data on different replicas, the system updates to the most recent one.
- This ensures future reads get the latest data (consistency).
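The read repair steps above can be sketched as follows, assuming each replica is a plain dict that maps a key to a `(version, value)` pair, with a higher version meaning newer data. The function name `read_with_repair` and this version scheme are assumptions made for illustration.

```python
def read_with_repair(replicas, key):
    """Read `key` from the given replicas, return the newest value, and repair stale replicas.

    Each replica is a dict mapping key -> (version, value); a missing key counts as version 0.
    """
    responses = [(replica, replica.get(key, (0, None))) for replica in replicas]
    # Pick the response with the highest version number.
    newest = max((resp for _, resp in responses), key=lambda resp: resp[0])
    for replica, resp in responses:
        if resp[0] < newest[0]:
            replica[key] = newest  # write the newest version back to the stale replica
    return newest[1]


a = {"x": (2, "new")}
b = {"x": (1, "old")}
c = {}
print(read_with_repair([a, b, c], "x"))
```

After the read, all three replicas hold version 2 of `x`.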
What happens if the coordinator crashes in 2PC?
- If the coordinator crashes after writing the decision to disk: Upon recovery, it will read the decision from disk and resend it to the participants.
- If the coordinator crashes before writing the decision to disk: No decision exists, so on recovery the coordinator aborts the transaction. Participants that have not yet voted can time out and abort on their own; participants that have already voted ‘OK’ must wait.
- If the coordinator crashes after receiving all ‘OK’ votes but before sending the decision: Participants are in an uncertain state and cannot safely commit or abort on their own, because the coordinator may have decided either way. They must wait for the coordinator to recover or use a termination protocol. If the recovered coordinator finds no decision on disk, it aborts. This blocking is the main weakness of 2PC.
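A minimal sketch of the recovery rule, assuming the coordinator's on-disk log is modeled as a dict from transaction id to decision. The name `recover_coordinator` and the ids `t1` and `t2` are hypothetical.

```python
def recover_coordinator(decision_log, in_flight):
    """Return the decision a restarted coordinator sends for each in-flight transaction.

    decision_log: dict of transaction id -> "commit" or "abort", as found on disk.
    in_flight: ids of transactions that had started before the crash.
    """
    resend = {}
    for tx in in_flight:
        # A logged decision is final and is simply resent. With no logged
        # decision, no participant was told to commit, so aborting is safe.
        resend[tx] = decision_log.get(tx, "abort")
    return resend


print(recover_coordinator({"t1": "commit"}, ["t1", "t2"]))
```

Here `t1` is committed because its decision was logged, while `t2` is aborted.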
What is Linearizability?
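A strong consistency model where every operation appears to take effect atomically at a single instant between its start and finish.
- Respects real-time ordering: if one operation finishes before another starts, the later one sees the effects of the earlier one.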
How does linearizability differ from serializability?
- Linearizability requires that operations respect real-time ordering.
- Serializability only requires that transactions behave as if they ran in some serial order, which need not match real time.
What is eventual consistency?
A model where, if no new updates occur, all replicas eventually converge to the same state.
What is the read-your-writes consistency model?
Ensures a client can always read its previous writes.
What is monotonic reads consistency?
Ensures that once a client has read a version of the data, its later reads never return an older version (reads do not go back in time).
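One way to picture this guarantee is a client session that remembers the newest version it has read and skips replicas that are older. This is a sketch under assumptions: the `Session` class and the `(version, value)` layout of each replica are hypothetical.

```python
class Session:
    """Hypothetical client session that never accepts data older than what it already read."""

    def __init__(self):
        self.last_seen = {}  # key -> highest version read so far

    def read(self, replicas, key):
        # Try replicas in order and accept the first one that is at least
        # as new as the version this session last read.
        floor = self.last_seen.get(key, 0)
        for replica in replicas:
            version, value = replica.get(key, (0, None))
            if version >= floor:
                self.last_seen[key] = version
                return value
        raise RuntimeError("no replica is fresh enough")


fresh = {"x": (2, "new")}
stale = {"x": (1, "old")}
session = Session()
print(session.read([fresh, stale], "x"))
print(session.read([stale, fresh], "x"))  # the stale replica is skipped
```

Without the session's `last_seen` check, the second read could have returned the older value from the stale replica.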
What is monotonic writes consistency?
Ensures that writes by a single client are applied in the order they were issued (respecting causality within the session). If a client performs multiple writes, no replica applies a later write before an earlier one, so the write sequence is preserved even with replication delays or failures.
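A minimal sketch of one way a replica could enforce this, assuming each client tags its writes with a sequence number starting at 0. The `Replica` class and its fields are hypothetical; writes that arrive early are buffered until the earlier ones have been applied.

```python
class Replica:
    """Hypothetical replica that applies each client's writes in the order they were issued."""

    def __init__(self):
        self.data = {}
        self.next_seq = {}  # client id -> next expected sequence number
        self.pending = {}   # (client id, seq) -> (key, value) for writes that arrived early

    def receive(self, client, seq, key, value):
        self.pending[(client, seq)] = (key, value)
        expected = self.next_seq.get(client, 0)
        # Apply buffered writes for this client while the next one in sequence is available.
        while (client, expected) in self.pending:
            k, v = self.pending.pop((client, expected))
            self.data[k] = v
            expected += 1
        self.next_seq[client] = expected


replica = Replica()
replica.receive("c1", 1, "x", "second")  # arrives early, so it is buffered
replica.receive("c1", 0, "x", "first")   # now both are applied, in issue order
print(replica.data)
```

The final value of `x` is the client's later write, even though it arrived first.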
What is writes-follow-reads consistency?
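A guarantee that a client's write issued after a read is applied only on replicas that already reflect the value it read (or a newer one), so the write is ordered after the writes it observed.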