Module 9b - Consistency and Replication (part 2) Flashcards
Many commercial databases use “primary-based replication” protocols. What are primary-based replication protocols?
Protocols in which all updates are executed by a designated primary replica and then pushed to one or more backup replicas.
OR
Primary-based protocols require that each data item have a primary copy (or home) on which all writes are performed - backups inherit these updates from the primary protocol
Protocols in primary-based replication can be classified as “remote-write” or “local-write”. What does remote-write mean?
What does the workflow look like?
The primary replica is stationary and therefore data must be updated remotely by the backup servers
Workflow for remote write:
- Write request for item x (goes to backup)
- Forward request to primary, primary writes x
- Tell backups to update & write x
- Acknowledge that the update has been completed by backups
- Acknowledge to client that the write has been completed
Protocols in primary-based replication can be classified as “remote-write” or “local-write”. What does local-write mean?
What does the workflow look like?
The primary replica is migrates from server to server, allowing clients to perform updates to their local replica
Workflow for local write:
- Write request for item x (goes to client’s backup)
- Move item x to new primary (which is the client’s backup)
- Acknowledge write completed to client
- New primary tells backups to update
- Acknowledge to new primary that backups have updated x
In Primary-based protocols, if the ______ replica fails, then one of the ______ replicas may take over as the new ______. Accurate _______ detection is necessary to prevent ______ situations
primary backup primary failure split-brain
What is the benefit & drawbacks of forcing all updates through a primary replica?
Benefit:
Makes it possible to implement strong consistency models such as sequential consistency & linearizability
Drawbacks:
- Can lean to performance bottlenecks
- Temporary loss of availability when the primary fails
______ protocols allow replicas to receive updates such that each update must be accepted by a sufficiently large ______ of replicas.
Quorum-based
subset
Quorum systems improve ______ of ______ data. Every time a group of servers needs to agree on something, a ______ is involved in the decisions
consistency
replicated
quorum
Read-write quorums define two parameters n_R and n_W. What do these two mean? What are they signifying?
n_R is the minimum number of replicas that must participate in a read operation. These are the “read-quorums”
n_W is the minimum number of replicas that must participate in a write operation these are the “write quorums”
What are read-quorums and write-quorums?
read-quorums: The subset of all replicas which are involved in reading
write-quorums: The subset of all replicas which are involved in writing
In distributed databases, read and write quorums must satisfy 2 rules of overlap. What are they?
- The read and write quorums must overlap: n_R + n_W > N
- Two write quorums must overlap: n_W + n_W > N
Rule 2 means that at least half of the replicas must be write quorums, this enables detection of write-write conflicts
In Quorum-based protocols, what does ensuring that read and write quorums overlap enable?
Enables detection of read-write conflicts.
All read-quorums will be consistent with each other, and all write-quorums will be consistent with each other. Therefore, there is no opportunity for read-write conflicts & the execution is guaranteed to be sequentially consistent
What does ensuring that two write-quorums overlap enable?
Enables detection of write-write conflicts
In Quorum-based protocols, what constraint do we have on N (the number of protocols)?
(not in relation to N_r and N_w)
N (number of replicas) must be odd.
Correction: it is “usually” chosen as odd
In Quorum-based protocols, what constraint do we have on n_R, n_W and N with respect to each other?
- n_R + n_W > N
- n_W + n_W > N
- n_W > 0
- n_R > 0
- N is odd
What does ROWA stand for? and what is a ROWA scheme in quorum-based protocols?
ROWA - read one, write all
When you have n_R =1 and n_W = N
partial-quorums can be configured to provide various degrees of _____ by changing ____ and _____.
consistency
n_R
n_W
What is the difference between strong and weak consistency in distributed systems?
Strong consistency: The data in all replicas is the same at any time. If key x is read from replica A and B at the same time, they should return the same value
Weak consistency: There is no guarantee that all replicas have the same data at any time.
In partial-quorums, how can adjusting n_R and n_W provide strong or weak consistency?
if n_R + n_W > N, then the system will have strong consistency
if n_R + n_W <= N, then the system will have weaker consistency - depending on n_R and n_W
In _____ consistency mode, the system cannot detect read-write conflicts, nor write-write conflicts
weak
What is the “last write wins” policy?
Whenever you have 2 writes incoming into a system at the same time, their timestamps are used to resolve which one will be used. The later one will be the one which is used
To resolve ______ conflicts, updates are tagged with ______, and a ______ policy is applied
write-write
timestamps
resolution
In Quorum-based protocols, whenever the subset of replicas to not satisfy the 2 rules of overlap for (strict) quorums, then they are referred to as _____ ______.
Note that the 2 rules of overlap are:
n_R + n_W > N
n_W + n_W > N
partial quorums
Describe the difference between full replication and partial replication in databases
Full replication: the full database is stored in each replica (all data is duplicated)
Partial replication: only a fragment of the database is stored in a replica, just like sharding. Frequently used fragments may be duplicated.
Suppose “n” denotes the number of replicas for one data object. If n == number of replicas, then what type of replication is this scheme using?
Full replication. Every server has a copy of the data object.
When the replication factor is less than the total number of servers, this is known as _____ replication
partial
_____ replication allows us to increase the effective storage capacity of the system through the addition of _____ while keeping the ______ ______ constant.
partial
servers
replication factor
When the number of servers/replicas is larger than the replication factor (partial replication), then what does each server/replica store?
a fragment/subset of the data used of the system
What is eventually-consistent replication?
Whenever a read or write is issued to a distributed system, it is resolved to the nearest replica. This replica is responsible for propagating the message to the remaining replicas
In an ______ _______ replication system, a server that receives an update will reply with an ________ to the client first, and then propagate ________ to the remaining replicas
eventually-consistent
acknowledgement
lazily/asynchronously
What happens in an eventually-consistent replication system when an update is being propagated, and a replica is unreachable? How do they reach consistency?
It can be updated later using an anti-entropy mechanism. This can be replicas periodically exchanging hashes of data to detect discrepancies.
What do eventually consistent systems do to ensure that data is consistent across replicas?
Periodically, replicas exchange hashes of data to detect discrepancies, using merkle/hash trees.
Timestamps are used to tell which update is the latest.
In eventually consistent systems, how do replicas determine what is the latest version of a data object?
Using timestamps. The largest timestamp is the correct version
What is the purpose of merkle trees (or hash trees) in eventually-consistent systems?
The trees are exchanged between replicas to compare and update versions of data. The trees act as a compact version of the data, and allows the replicas to find the source of error.
In an eventually-consistent system, what is a “stale” read?
Whenever a client connects to replica which has not yet received the latest version of a data object, and this replica returns the old version of the object
Merkle trees are used to allow replicas to efficiently compare values of data objects. Describe the structure of these trees.
The leaf of the tree has the raw data blocks, and each parent of a node in the tree contains the concatenation of the hashes of their child nodes. This makes it efficient to compare hashes between replicas