LU6 Consistency and Replication Flashcards

1
Q

What is replication in distributed systems?

A

Replication is copying data from one host to another to increase reliability and performance.

2
Q

Why is data replication used?

A

To improve system reliability (e.g., disaster recovery) and performance (e.g., reducing latency).

3
Q

What is a replica in distributed systems?

A

A copy of the same data stored on multiple nodes.

4
Q

What is the main challenge of data replication?

A

Maintaining consistency across all replicas.

This is hard because new data can arrive while replication is still in progress.

5
Q

What is a read-write conflict?

A

A situation where a read and a write operation occur concurrently on the same data.

6
Q

What is a write-write conflict?

A

When two concurrent write operations occur on the same data.

7
Q

What happens when global ordering on conflicting operations is enforced?

A

It can degrade system scalability due to high synchronization costs.

8
Q

What is a solution to avoid costly global synchronization?

A

Opt for a weaker consistency model.

Propagate new data lazily instead of instantly (e.g., a changed profile picture).

9
Q

What is replica consistency?

A

Ensuring that all copies of data are consistent across nodes.

10
Q

What is a distributed transaction?

A

A transaction that reads or writes data on multiple nodes.

  • Either all nodes must commit, or all must abort
  • If any node crashes, all must abort
11
Q

What is the atomic commitment problem?

A

Ensuring that either all nodes commit or all abort a distributed transaction.

12
Q

What is the two-phase commit (2PC) protocol?

A

A protocol ensuring atomic commitment by having all nodes vote to commit or abort.
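
The voting idea can be sketched in a few lines. This is only an in-memory illustration, assuming no crashes, timeouts, or logging to disk; the `Participant` class and `two_phase_commit` function are hypothetical names, not part of any library.

```python
from enum import Enum


class Vote(Enum):
    OK = "ok"
    ABORT = "abort"


class Participant:
    """Hypothetical participant: votes in phase 1, obeys the decision in phase 2."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: promise to commit if asked, or vote to abort.
        self.state = "prepared" if self.can_commit else "aborted"
        return Vote.OK if self.can_commit else Vote.ABORT

    def finish(self, commit):
        # Phase 2: apply the coordinator's decision.
        self.state = "committed" if commit else "aborted"


def two_phase_commit(participants):
    # Phase 1: the coordinator collects a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Commit only if every single vote is OK.
    decision = all(v is Vote.OK for v in votes)
    # Phase 2: the coordinator sends the same decision to everyone.
    for p in participants:
        p.finish(decision)
    return decision
```

A single abort vote is enough to abort the transaction on every node, which is what gives the all-or-nothing outcome.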

13
Q

What is read-after-write consistency?

A

Ensuring a client can read the value it just wrote.

14
Q

What is a quorum in replication?

A

A subset of replicas required to perform read or write operations.

15
Q

What is the typical quorum size for n replicas?

A

A majority: (n+1)/2 for odd n, or ⌊n/2⌋ + 1 in general, for both read and write quorums.
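
As a quick check of the arithmetic, a majority quorum can be computed with integer division. The function name `majority_quorum` is made up for this sketch.

```python
def majority_quorum(n):
    """Smallest number of replicas that forms a strict majority of n."""
    if n < 1:
        raise ValueError("need at least one replica")
    # For odd n this equals (n + 1) / 2; for even n it is n / 2 + 1.
    return n // 2 + 1


for n in (3, 4, 5):
    print(n, "replicas -> quorum of", majority_quorum(n))
```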

16
Q

What is read repair in replication?

A

A mechanism where the client helps propagate the most recent data to other replicas.

  • Read repair fixes data inconsistencies during read operations.
  • If a read finds different versions of data on different replicas, the system updates to the most recent one.
  • This ensures future reads get the latest data (consistency).
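
The bullets above can be sketched as follows. This is a simplified illustration, assuming each replica is a plain dict mapping a key to a `(timestamp, value)` pair and that a larger timestamp means a newer write; `read_with_repair` is a hypothetical helper name.

```python
def read_with_repair(replicas, key):
    """Read `key` from every replica, return the newest value, repair stale copies."""
    # Ask every replica for its (timestamp, value) entry; None means "missing".
    responses = [(replica, replica.get(key)) for replica in replicas]
    found = [entry for _, entry in responses if entry is not None]
    if not found:
        return None
    # The entry with the highest timestamp is treated as the most recent.
    newest = max(found, key=lambda entry: entry[0])
    # Write the newest entry back to any replica that is missing it or is stale.
    for replica, entry in responses:
        if entry is None or entry[0] < newest[0]:
            replica[key] = newest
    return newest[1]
```

After one such read, every contacted replica holds the newest version, so later reads no longer see the stale copies.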
17
Q

What happens if the coordinator crashes in 2PC?

A
  • If the coordinator crashes after writing the decision to disk: Upon recovery, it will read the decision from disk and resend it to the participants.
  • If the coordinator crashes before writing the decision to disk: Upon recovery, it finds no logged decision and aborts the transaction. Participants that have not yet voted can time out and abort on their own.
  • If the coordinator crashes after receiving all ‘OK’ votes but before sending the decision: Participants that voted ‘OK’ are in an uncertain state and cannot commit or abort unilaterally. They must wait for the coordinator to recover or run a termination protocol to learn the outcome from other participants.
18
Q

What is Linearizability?

A

A strong consistency model where every operation appears to take effect instantaneously at some point between its start and finish.

  • Respects real-time ordering: if one operation finishes before another starts, it takes effect first.
19
Q

How does linearizability differ from serializability?

A
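  • Linearizability requires operations on an object to respect real-time ordering.
  • Serializability only requires that transactions produce the same result as some serial order, with no real-time constraint.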
20
Q

What is eventual consistency?

A

A model where, if no new updates occur, all replicas eventually converge to the same state.

21
Q

What is the read-your-writes consistency model?

A

Ensures a client can always read its previous writes.

22
Q

What is monotonic reads consistency?

A

Ensures that once a client has read a value, its later reads never return an older version (it does not guarantee the globally latest version).
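
One way to picture the "never read older data" guarantee is a client session that remembers the newest version it has seen. This is a minimal sketch, assuming each replica is a dict mapping a key to a `(version, value)` pair; the `Session` class is hypothetical.

```python
class Session:
    """Hypothetical client session that enforces monotonic reads per key."""

    def __init__(self):
        # Highest version this session has observed for each key.
        self.last_seen = {}

    def read(self, replica, key):
        version, value = replica[key]
        # Refuse a replica that is older than something we already read.
        if version < self.last_seen.get(key, 0):
            raise LookupError("replica is stale for this session")
        self.last_seen[key] = version
        return value
```

A real client would typically retry on another replica instead of raising; the exception only marks the point where the guarantee would otherwise be violated.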

23
Q

What is monotonic writes consistency?

A

Ensures that writes by a single client are applied in the order they were issued within a session.

If a client performs multiple writes, no replica applies a later write before an earlier one, so the write sequence is preserved despite replication delays or failures.

24
Q

What is writes-follow-reads consistency?

A

A write that follows a read by the same client is applied to a copy that is at least as recent as the value that was read.

25
What is the passive (primary-backup) replication model?
A model where one primary replica handles updates, which are then propagated to backups.
26
What is active replication?
A model where all replicas process requests independently but identically.
27
How does quorum-based replication handle network partitions?
By requiring a majority of replicas in the quorum to proceed with reads or writes.
28
What is the main advantage of quorum-based protocols?
They prevent read-write and write-write conflicts by overlapping quorums.
29
What is the constraint for quorum-based protocols?
NR + NW > N (to prevent read-write conflict) and NW > N/2 (to prevent write-write conflict).
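
The two inequalities are easy to check mechanically. In this sketch `n`, `n_r`, and `n_w` stand for N, NR, and NW; the function name `valid_quorum` is made up for illustration.

```python
def valid_quorum(n, n_r, n_w):
    """Check the two quorum constraints for n replicas."""
    # Every read quorum overlaps every write quorum (no read-write conflict).
    read_write_safe = n_r + n_w > n
    # Any two write quorums overlap (no write-write conflict).
    write_write_safe = 2 * n_w > n
    return read_write_safe and write_write_safe
```

For example, with 3 replicas, `valid_quorum(3, 2, 2)` holds, while `valid_quorum(3, 1, 2)` fails the first constraint because a read of one replica may miss the two that took the write.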
30
What are push protocols in replication?
Updates are proactively sent to replicas.
31
What are pull protocols in replication?
Replicas request updates when needed.
32
What are leases in replication?
A hybrid of push and pull protocols where updates are pushed for a set time, after which replicas pull updates.
33
What is the advantage of using leases?
Combines the benefits of both push and pull protocols to optimize performance.
34
What is weak consistency?
Consistency is enforced only during synchronization operations.
35
What are synchronization operations?
Operations that propagate updates and ensure consistency at specific points.
36
What are the three properties of weak consistency?
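1) All previous writes must complete everywhere before a synchronization operation is performed, 2) Reads and writes are allowed only after all previous synchronization operations have completed, 3) Accesses to synchronization variables are sequentially consistent.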
37
What is eventual consistency best suited for?
Systems with infrequent updates and high read operations, like DNS.
38
What are session guarantees in eventual consistency?
Consistency models that ensure a user's session sees their updates and consistent data.
39
What is a real-world example of replication issues?
Two users trying to book the last airline ticket concurrently, causing a read-write conflict.
40
What is the probability of all replicas being faulty?
p^n, where p is the probability of a single replica being faulty.
41
What is the probability of at least one replica being faulty?
1 - (1 - p)^n.
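
Both formulas (p^n for all replicas faulty, 1 - (1 - p)^n for at least one) assume that replicas fail independently. A small sketch to plug in numbers, with hypothetical function names:

```python
def all_faulty(p, n):
    """Probability that all n replicas are faulty, assuming independent failures."""
    return p ** n


def at_least_one_faulty(p, n):
    """Probability that at least one of n replicas is faulty, assuming independence."""
    return 1 - (1 - p) ** n


# Example: 3 replicas, each faulty with probability 0.1.
print(all_faulty(0.1, 3))
print(at_least_one_faulty(0.1, 3))
```

Adding replicas makes total failure rapidly less likely, but makes it more likely that some replica is faulty at any given time.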
42
What are the five phases in performing a request in replication?
1) Request, 2) Coordination, 3) Execution, 4) Agreement, 5) Response.
43
What is FIFO ordering in replication?
Ensuring messages are delivered in the order they were sent.
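
A common way to get FIFO delivery is to number each sender's messages and hold back any that arrive early. The sketch below assumes per-sender sequence numbers starting at 0; the `FifoReceiver` class is hypothetical.

```python
class FifoReceiver:
    """Delivers each sender's messages in the order they were sent."""

    def __init__(self):
        self.next_seq = {}   # sender -> next sequence number to deliver
        self.held = {}       # sender -> {seq: payload} received too early
        self.delivered = []  # (sender, payload) in delivery order

    def receive(self, sender, seq, payload):
        expected = self.next_seq.get(sender, 0)
        if seq < expected:
            return  # duplicate of a message that was already delivered
        # Hold the message until all earlier ones from this sender arrive.
        queue = self.held.setdefault(sender, {})
        queue[seq] = payload
        # Deliver as many consecutive messages as are now available.
        while expected in queue:
            self.delivered.append((sender, queue.pop(expected)))
            expected += 1
        self.next_seq[sender] = expected
```

Messages from different senders are not ordered relative to each other here; that would require causal or total ordering.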
44
What is total ordering in replication?
Ensuring all replicas receive messages in the same order.
45
What are vector timestamps used for?
To ensure causal ordering in distributed systems.
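
Comparing two vector timestamps tells you whether one event causally precedes the other or whether they are concurrent. A minimal sketch, assuming both timestamps are equal-length tuples with one counter per process (the function names are illustrative):

```python
def happened_before(a, b):
    """True if vector timestamp a causally precedes vector timestamp b."""
    # Every component of a is <= the matching component of b,
    # and at least one component is strictly smaller.
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def concurrent(a, b):
    """True if neither timestamp causally precedes the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)
```

For example, `(1, 0, 0)` happened before `(1, 1, 0)`, while `(1, 0, 0)` and `(0, 1, 0)` are concurrent.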
46
What is strict consistency?
Absolute time ordering of all shared accesses matters.
47
What is sequential consistency?
All processes see all shared accesses in the same order, but not necessarily in real time.
48
What is causal consistency?
All processes see causally-related shared accesses in the same order.
49
What is FIFO consistency?
Writes from a single process are seen by all in the order issued, but writes from different processes may differ in order.
50
What is the relationship between sequential consistency and coherence?
Sequential consistency implies coherence, but not vice versa.
51
What is the relationship between sequential consistency and FIFO consistency?
Sequential consistency implies FIFO consistency, but not vice versa.
52
What is the relationship between coherence and FIFO consistency?
They are independent; one does not imply the other.
53
What happens in the passive replication model if the primary fails?
A backup takes over, and surviving replicas agree on the operations performed.
54
What is the main issue with network partitions in replication?
They prevent coordination, causing potential inconsistency.
55
What is the front end's role in replication?
It issues requests to replicas and returns responses to clients.
56
What is the atomicity requirement in distributed transactions?
A transaction must either fully commit or fully abort.
57
What happens if a node crashes during a distributed transaction?
All nodes must abort the transaction.
58
How does read quorum ensure consistency?
By reading from enough replicas to include the latest write.
59
How does write quorum prevent conflicts?
By requiring writes to be acknowledged by a majority of replicas.
60
What is the consequence of using strong consistency models?
Higher latency and reduced system scalability.
61
What is the benefit of using weak or eventual consistency models?
Improved performance and fault tolerance.
62
What is a real-world application that requires strong consistency?
Banking systems.
63
What is a real-world application that can use eventual consistency?
Social media feeds (e.g., Facebook, Twitter).
64
What is the role of the coordinator in 2PC?
To manage the commit or abort decision and communicate it to all nodes.
65
What is the limitation of the 2PC protocol?
It can block if the coordinator crashes after the prepare phase.
66
What is the difference between primary-backup and active replication?
Primary-backup has one primary handling updates, while active replication processes all requests on all replicas.
67
What is a read quorum?
A subset of replicas that must be read to ensure data consistency.
68
What is a write quorum?
A subset of replicas that must acknowledge a write to ensure data consistency.
69
How does the system resolve conflicting values in replication?
Using timestamps (vector or Lamport clocks) to determine the most recent update.
70
What is the impact of caching on consistency?
Cached copies can become stale, which weakens consistency, but caching improves performance.
71
What is the purpose of the validity condition in consistency models?
To ensure reads return the most recent write's value.
72
What is the coherence model in consistency?
Sequential consistency applied to individual data items.
73
How do leases help manage state-space overhead?
By reducing expiration times for less active replicas.
74
What is the significance of program order in consistency?
It defines the sequence in which operations are expected to execute.
75
What is the main challenge in distributed transactions?
Ensuring atomicity and consistency across multiple nodes.
76
What are the common types of consistency models?
Strict, linearizability, sequential, causal, FIFO, weak, and eventual.
77
What is a distributed data store's contract in consistency models?
A contract between processes and the data store that specifies the results of read and write operations under concurrency.
78
What is the significance of synchronization variables in weak consistency?
They ensure operations are consistent when synchronization occurs.
79
What is the impact of faults on replication?
They can lead to unavailability and inconsistency if not managed properly.
80
What are the typical system models for replication?
Passive (primary-backup) and active replication.
81
What is the role of replica managers (RM)?
To contain and manage replicas on given computers.
82
What is the purpose of ordered multicast in active replication?
To ensure all replicas process requests in the same order.
83
What is the role of front-end coordination in replication?
To manage request distribution and ensure consistency across replicas.
84
What is the impact of network partitions on quorum-based protocols?
It can prevent one side of the partition from reaching the required quorum, making reads or writes unavailable there.
85
What is the benefit of using a hybrid push-pull protocol (leases)?
It balances update propagation efficiency and consistency.
86
What are the constraints of quorum-based protocols to prevent conflicts?
NR + NW > N and NW > N/2.
87
What is the role of time-to-live (TTL) in lease protocols?
To define the duration for which updates are pushed before requiring a pull.
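
The TTL idea can be sketched as a lease that is valid only until its expiry time. The `Lease` class and `update_mode` function are hypothetical, and times are plain numbers (for example seconds) supplied by the caller.

```python
class Lease:
    """Hypothetical lease: the server promises to push updates until it expires."""

    def __init__(self, granted_at, ttl):
        self.expires_at = granted_at + ttl

    def is_valid(self, now):
        return now < self.expires_at


def update_mode(lease, now):
    """While the lease is valid the server pushes; afterwards the replica pulls."""
    return "push" if lease.is_valid(now) else "pull"


lease = Lease(granted_at=100.0, ttl=30.0)
print(update_mode(lease, 110.0))  # inside the lease window
print(update_mode(lease, 130.0))  # lease has expired
```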
88
What are the benefits of using client-centric consistency models?
They provide a consistent view for individual clients, especially in mobile environments.
89
What is the difference between consistency and replication?
Consistency ensures data accuracy across replicas, while replication involves copying data to multiple nodes.
90
What happens if a quorum read returns conflicting values?
The system uses timestamps to resolve and determine the latest value.
91
What is the purpose of propagating updates in replication?
To ensure all replicas eventually reflect the same data state.
92
What is a lazy consistency model?
A model where updates propagate slowly, leading to eventual consistency.
93
What is the significance of the read/write quorum intersection?
It ensures that at least one replica has the latest write when reading.
94
What is the impact of race conditions on consistency models?
They can lead to inefficiencies or incorrect data if not properly managed.
95
How does a front-end handle failed replicas in replication?
It reroutes requests to available replicas to maintain service continuity.
96
What is the consequence of failing to meet quorum constraints?
It can lead to read-write or write-write conflicts, compromising consistency.