Module 9b - Consistency and Replication (part 2) Flashcards

1
Q

Many commercial databases use “primary-based replication” protocols. What are primary-based replication protocols?

A

Protocols in which all updates are executed by a designated primary replica and then pushed to one or more backup replicas.

OR

Primary-based protocols require that each data item have a primary copy (or home) on which all writes are performed - backups inherit these updates from the primary copy

2
Q

Protocols in primary-based replication can be classified as “remote-write” or “local-write”. What does remote-write mean?

What does the workflow look like?

A

The primary replica is stationary (it stays on one fixed server), so a write received by a backup server must be forwarded to the remote primary, which performs the update

Workflow for remote write:

  1. Write request for item x (goes to backup)
  2. Forward request to primary, primary writes x
  3. Tell backups to update & write x
  4. Acknowledge that the update has been completed by backups
  5. Acknowledge to client that the write has been completed
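
The five steps above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a real protocol implementation: the `Replica` and `RemoteWriteGroup` classes and their methods are hypothetical names, and network calls are replaced by direct method calls.

```python
class Replica:
    """Hypothetical in-memory replica (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def apply(self, key, value):
        self.store[key] = value
        return True  # stands in for an acknowledgement message


class RemoteWriteGroup:
    """One fixed primary plus a list of backups."""

    def __init__(self, primary, backups):
        self.primary = primary
        self.backups = backups

    def write(self, receiving_backup, key, value):
        # Step 1: the client's write request arrives at a backup.
        assert receiving_backup in self.backups
        # Step 2: the request is forwarded to the fixed primary, which writes x.
        self.primary.apply(key, value)
        # Steps 3-4: the primary tells every backup to update and collects acks.
        acks = [backup.apply(key, value) for backup in self.backups]
        # Step 5: the client is acknowledged only after all backups have acked.
        return all(acks)
```

Because the client is acknowledged last, the write is blocking: the client waits until every backup has applied the update.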
3
Q

Protocols in primary-based replication can be classified as “remote-write” or “local-write”. What does local-write mean?

What does the workflow look like?

A

The primary replica migrates from server to server, allowing clients to perform updates on their local replica

Workflow for local write:

  1. Write request for item x (goes to client’s backup)
  2. Move item x to new primary (which is the client’s backup)
  3. Acknowledge write completed to client
  4. New primary tells backups to update
  5. Acknowledge to new primary that backups have updated x
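
A rough Python sketch of the local-write steps above, under simplifying assumptions: `Server` and `LocalWriteGroup` are hypothetical names, a single primary role is tracked for all items, and propagation to backups is triggered by an explicit `propagate()` call rather than a background process.

```python
class Server:
    """Hypothetical in-memory server (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.store = {}


class LocalWriteGroup:
    def __init__(self, servers, primary):
        self.servers = servers
        self.primary = primary
        self.pending = []  # updates not yet pushed to the backups

    def write(self, local_server, key, value):
        # Steps 1-2: the request arrives at the client's local server,
        # and the primary role moves to that server.
        self.primary = local_server
        local_server.store[key] = value
        self.pending.append((key, value))
        # Step 3: acknowledge the client before the backups are updated.
        return True

    def propagate(self):
        # Steps 4-5: the new primary pushes updates; each backup acks.
        acks = 0
        for key, value in self.pending:
            for server in self.servers:
                if server is not self.primary:
                    server.store[key] = value
                    acks += 1
        self.pending.clear()
        return acks
```

The client is acknowledged at step 3, so between `write` and `propagate` the backups may still hold the old value.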
4
Q

In Primary-based protocols, if the ______ replica fails, then one of the ______ replicas may take over as the new ______. Accurate _______ detection is necessary to prevent ______ situations

A
primary
backup
primary
failure
split-brain
5
Q

What are the benefits and drawbacks of forcing all updates through a primary replica?

A

Benefit:
Makes it possible to implement strong consistency models such as sequential consistency & linearizability

Drawbacks:

  • Can lead to performance bottlenecks
  • Temporary loss of availability when the primary fails
6
Q

______ protocols allow replicas to receive updates such that each update must be accepted by a sufficiently large ______ of replicas.

A

Quorum-based

subset

7
Q

Quorum systems improve ______ of ______ data. Every time a group of servers needs to agree on something, a ______ is involved in the decisions

A

consistency
replicated
quorum

8
Q

Read-write quorums define two parameters n_R and n_W. What do these two mean? What are they signifying?

A

n_R is the minimum number of replicas that must participate in a read operation. These are the “read-quorums”

n_W is the minimum number of replicas that must participate in a write operation. These are the “write-quorums”

9
Q

What are read-quorums and write-quorums?

A

read-quorums: The subset of all replicas which are involved in reading

write-quorums: The subset of all replicas which are involved in writing

10
Q

In distributed databases, read and write quorums must satisfy 2 rules of overlap. What are they?

A
  1. The read and write quorums must overlap: n_R + n_W > N
  2. Two write quorums must overlap: n_W + n_W > N

Rule 2 means that every write quorum must contain more than half of the replicas (n_W > N/2). Any two write quorums then share at least one replica, which enables detection of write-write conflicts
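
The two overlap rules can be checked directly, and for small N they can also be verified by brute force. The sketch below is illustrative only; the function names `is_strict_quorum` and `quorums_always_overlap` are made up for this card.

```python
from itertools import combinations


def is_strict_quorum(n_r, n_w, n):
    """True if (n_r, n_w) satisfies both overlap rules for n replicas."""
    return n_r > 0 and n_w > 0 and n_r + n_w > n and n_w + n_w > n


def quorums_always_overlap(size_a, size_b, n):
    """Brute force: does every size_a subset intersect every size_b subset?"""
    replicas = range(n)
    return all(
        set(a) & set(b)
        for a in combinations(replicas, size_a)
        for b in combinations(replicas, size_b)
    )
```

For example, with N = 5 the choice n_R = 2, n_W = 4 passes both rules, while n_R = 2, n_W = 3 fails rule 1 because a read quorum and a write quorum can be disjoint.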

11
Q

In Quorum-based protocols, what does ensuring that read and write quorums overlap enable?

A

Enables detection of read-write conflicts.

Every read quorum shares at least one replica with every write quorum, so a read always contacts at least one replica that holds the most recent write and can return that version. Stale reads caused by read-write conflicts are therefore avoided, which is what lets strict quorums provide strong consistency
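
A small simulation of why the overlap matters, assuming each replica stores a `(version, value)` pair and writes bump a version number. Both the version scheme and the function names are assumptions made for illustration; the card itself does not specify how the newest copy is identified.

```python
def quorum_write(replicas, write_set, value):
    """Write `value` to the replicas in write_set with a new, higher version."""
    version = max(replicas[i][0] for i in write_set) + 1
    for i in write_set:
        replicas[i] = (version, value)


def quorum_read(replicas, read_set):
    """Return the value with the highest version among the replicas read."""
    newest = max((replicas[i] for i in read_set), key=lambda entry: entry[0])
    return newest[1]
```

With N = 5 and n_R = n_W = 3, a write to replicas {0, 1, 2} followed by a read from {2, 3, 4} still returns the new value, because replica 2 is in both quorums.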

12
Q

What does ensuring that two write-quorums overlap enable?

A

Enables detection of write-write conflicts

13
Q

In Quorum-based protocols, what constraint do we have on N (the number of replicas)?

(not in relation to n_R and n_W)

A

N (number of replicas) is usually chosen to be odd. This is a convention rather than a strict requirement.

14
Q

In Quorum-based protocols, what constraint do we have on n_R, n_W and N with respect to each other?

A
  1. n_R + n_W > N
  2. n_W + n_W > N
  3. n_W > 0
  4. n_R > 0
  5. N is usually chosen to be odd
15
Q

What does ROWA stand for, and what is a ROWA scheme in quorum-based protocols?

A

ROWA - read one, write all

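A ROWA scheme is the quorum configuration with n_R = 1 and n_W = N: a read can use any single replica, while a write must update all replicas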

16
Q

partial-quorums can be configured to provide various degrees of _____ by changing ____ and _____.

A

consistency
n_R
n_W

17
Q

What is the difference between strong and weak consistency in distributed systems?

A

Strong consistency: The data in all replicas is the same at any time. If key x is read from replica A and B at the same time, they should return the same value

Weak consistency: There is no guarantee that all replicas have the same data at any time.

18
Q

In partial-quorums, how can adjusting n_R and n_W provide strong or weak consistency?

A

if n_R + n_W > N, then the system will have strong consistency

if n_R + n_W <= N, then the system will have weaker consistency - depending on n_R and n_W

19
Q

In _____ consistency mode, the system cannot detect read-write conflicts, nor write-write conflicts

A

weak

20
Q

What is the “last write wins” policy?

A

When two writes to the same item arrive concurrently, their timestamps are compared to decide which one is kept. The write with the later timestamp wins and the other is discarded
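
A minimal sketch of a last-write-wins register. `LWWRegister` is a hypothetical name, and the handling of equal timestamps (the existing value is kept) is an assumption; real systems typically add a tie-breaker such as a replica ID.

```python
from dataclasses import dataclass


@dataclass
class LWWRegister:
    value: object = None
    timestamp: float = float("-inf")

    def write(self, value, timestamp):
        """Apply the write only if it is newer than what is stored."""
        if timestamp > self.timestamp:
            self.value = value
            self.timestamp = timestamp
            return True
        return False  # older (or equal-timestamp) write is discarded
```

Note that the outcome depends only on the timestamps, not on arrival order: a write stamped 5 that arrives after a write stamped 10 is dropped.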

21
Q

To resolve ______ conflicts, updates are tagged with ______, and a ______ policy is applied

A

write-write
timestamps
resolution

22
Q

In Quorum-based protocols, whenever the subsets of replicas do not satisfy the 2 rules of overlap for (strict) quorums, they are referred to as _____ ______.

Note that the 2 rules of overlap are:
n_R + n_W > N
n_W + n_W > N

A

partial quorums

23
Q

Describe the difference between full replication and partial replication in databases

A

Full replication: the full database is stored in each replica (all data is duplicated)

Partial replication: only a fragment of the database is stored in a replica, just like sharding. Frequently used fragments may be duplicated.

24
Q

Suppose “n” denotes the number of replicas for one data object. If n == number of servers, then what type of replication is this scheme using?

A

Full replication. Every server has a copy of the data object.

25
Q

When the replication factor is less than the total number of servers, this is known as _____ replication

A

partial

26
Q

_____ replication allows us to increase the effective storage capacity of the system through the addition of _____ while keeping the ______ ______ constant.

A

partial
servers
replication factor
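
A quick arithmetic illustration of this card, assuming every server has the same capacity and data is spread evenly (both are simplifying assumptions, and `effective_capacity` is a made-up helper name). Each object is stored `replication_factor` times, so the usable capacity is the total raw capacity divided by the replication factor.

```python
def effective_capacity(num_servers, per_server_capacity, replication_factor):
    """Usable capacity when every object is stored replication_factor times."""
    if not 1 <= replication_factor <= num_servers:
        raise ValueError("replication factor must be between 1 and num_servers")
    return num_servers * per_server_capacity / replication_factor
```

With a replication factor of 3 and 100 GB per server, 6 servers give 200 GB of usable space and 12 servers give 400 GB. With only 3 servers (full replication) the usable space is 100 GB no matter how large the factor-3 cluster's raw capacity looks.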

27
Q

When the number of servers/replicas is larger than the replication factor (partial replication), then what does each server/replica store?

A

a fragment (subset) of the system's data

28
Q

What is eventually-consistent replication?

A

A replication scheme in which a read or write issued to the distributed system is served by the nearest replica. That replica is then responsible for propagating any update to the remaining replicas, so all replicas converge to the same state over time

29
Q

In an ______ _______ replication system, a server that receives an update will reply with an ________ to the client first, and then propagate ________ to the remaining replicas

A

eventually-consistent
acknowledgement
lazily/asynchronously

30
Q

What happens in an eventually-consistent replication system when an update is being propagated, and a replica is unreachable? How do they reach consistency?

A

It can be updated later using an anti-entropy mechanism. For example, replicas can periodically exchange hashes of their data to detect discrepancies.

31
Q

What do eventually consistent systems do to ensure that data is consistent across replicas?

A

Periodically, replicas exchange hashes of their data to detect discrepancies, using Merkle (hash) trees.

Timestamps are used to tell which update is the latest.
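
A simplified sketch of one anti-entropy round between two replicas. It assumes each replica is a dict mapping a key to a `(timestamp, value)` pair and compares one hash per key rather than a full Merkle tree; `entry_hash` and `sync` are illustrative names, not part of any real system.

```python
import hashlib


def entry_hash(entry):
    """Hash of a (timestamp, value) entry, or of None if the key is missing."""
    return hashlib.sha256(repr(entry).encode()).hexdigest()


def sync(a, b):
    """Repair both replicas in place; return the keys that differed."""
    repaired = []
    for key in sorted(set(a) | set(b)):
        entry_a, entry_b = a.get(key), b.get(key)
        if entry_hash(entry_a) == entry_hash(entry_b):
            continue  # hashes match, nothing to transfer
        present = [e for e in (entry_a, entry_b) if e is not None]
        newest = max(present, key=lambda e: e[0])  # largest timestamp wins
        a[key] = b[key] = newest
        repaired.append(key)
    return repaired
```

Only the keys whose hashes differ need their data transferred, which is the point of exchanging hashes first.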

32
Q

In eventually consistent systems, how do replicas determine what is the latest version of a data object?

A

Using timestamps. The version with the largest timestamp is treated as the latest

33
Q

What is the purpose of merkle trees (or hash trees) in eventually-consistent systems?

A

The trees are exchanged between replicas to compare and update versions of data. The trees act as a compact summary of the data and allow the replicas to locate which parts of the data differ.

34
Q

In an eventually-consistent system, what is a “stale” read?

A

A read in which a client connects to a replica that has not yet received the latest version of a data object, and this replica returns the old version of the object

35
Q

Merkle trees are used to allow replicas to efficiently compare values of data objects. Describe the structure of these trees.

A

Each leaf of the tree holds the hash of a raw data block, and each parent node holds the hash of the concatenation of its child nodes' hashes. Two replicas can therefore compare root hashes first and descend only into subtrees whose hashes differ, which makes the comparison efficient
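
A minimal sketch of computing a Merkle root with SHA-256, for illustration. Duplicating the last hash when a level has an odd number of nodes is one common convention and is an assumption here, as are the function names.

```python
import hashlib


def sha256(data):
    return hashlib.sha256(data).digest()


def merkle_root(blocks):
    """Root hash of a Merkle tree built over a non-empty list of byte blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    level = [sha256(block) for block in blocks]  # leaves: hashes of data blocks
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # assumption: duplicate last hash if odd
        # Each parent is the hash of its two children's hashes concatenated.
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]
```

Two replicas holding identical blocks produce identical roots, and changing any single block changes the root, so comparing one hash is enough to detect that some block differs.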