Chapter 6: Consistency & Replication Flashcards
1
Q
Why Replication?
A
-
Performance
- Caching data at browsers and proxy servers
- e.g. Content Delivery Network with several locations
-
High availability
- Upon crashes, service offered by replica
- Upon network partition, service available to clients in partition
-
Fault tolerance
- Providing reliable service in face of faulty servers
2
Q
The “cost” of replication NN
A
- Cost to keep replicas up to date in face of updates?
- E.g., additional bandwidth, number of messages exchanged, higher latency (i.e., service‐response time), complexity of code, etc.
- How can the problem of stale (out‐of‐date) data current reads at replicas be overcome?
3
Q
Consistency Models
A
- Definition (Consistency model)
- A contract between a distributed data store and a set of processes, which specifies what the results of read/write operations are in the presence of concurrency
- Distributed data store as synonym for
distributed database, shared memory, shared files, etc.
4
Q
Consistency Violation
A
- With concurrent read & write operations on a distributed data store, data inconsistency may arise
5
Q
Overview: Consistency Models important - know lal
A
- Strict consistency
- Sequential consistency
- Linearizable consistency
- Causal consistency
- FIFO consistency
- Weak consistency
6
Q
Strict Consistency
A
- Definition: Any read on a data item x returns a value corresponding to the result of the most recent write on x
- Uni‐processor systems have traditionally observed strict consistency, … but what about multi‐processor systems?
a = 1; a = 2; print(a); Output? most recent ! - Definition assumes existence of absolute global time for unambiguous determination of “most recent“.
- What do we know about uniquely time stamping events in distributed systems via absolute time references?
- Under strict consistent, all writes are instantaneously visible to all processes and absolute global time order is maintained
- Similarly, if a read is done, then it gets the most recent value, no matter how quickly the next write is done
7
Q
Strict Consistency: Thought Experiment
A
- To satisfy strict consistency, laws of physics may have to be violated! Obviously, not an option!!!
- To realize strict consistency in this case, R(X)a would have to travel at 10 times the speed of light!
- Strict consistency is impossible to guarantee in distributed systems, due to reliance on precise global time in its definition
8
Q
Definition of Sequential Consistency
A
- The result of any execution is the same as if the operations by all processes on the data store were executed in some sequential order and …
- … the operations of each individual process appear in this sequence in the order specified by its program
- Operations refer to read and write
- Weaker than strict consistency
- When processes run concurrently, any valid interleaving of read and write operation is acceptable behavior, but all processes see the same interleaving of operations
- Nothing is said about time, i.e., no reference to “most recent”, unlike strict consistency
- Property: if the read time is r, the write time is w, and the minimal packet transfer time between nodes is t, then r+w>=t holds (if you reduce reading time, writing time goes up)
9
Q
Definition of Linearizability
A
- Like Sequential consitency
- If there is an overlap order is not important - if not it is
- In addition, if tsOP1(x) < tsOP2(y), then operation OP1(x) should precede OP2(y) in this sequence (t is defined as intervals)
- tsOP(x) denotes the timestamp assigned to operation OP that is performed on data item x, and OP is either read (R) or write (W).
- Like strict consistency, assumes global time, but not absolute global time
- Assumes processes in the system have physical clocks synchronized to within an error bounded
- If W(x)b was the most recent write operation and there is no other write operation overlapping with W(x)b, then any later read should return b
- If W(x)a and W(x)b were two most‐recent overlapping write operation, then any later read should return either a or b, not something else
- A linearizable data store is also sequentially consistent
- I.e., linearizability is more restrictive
- Difference is the ordering according to a set of synchronized clocks
10
Q
Intuition for Causal Consistency
A
- Weaker than sequential consistency
- Distinguish events that are potentially causally related and those that are not (concurrent events) Rb
- If event B is influenced (caused) by an earlier event A, causality requires that everyone first sees A then B
- Concurrent events (i.e.,writes) may be seen in a different order on different machines
- Reading of x and writing of y are potentially causally
related
- Computation of y may have depended on value of x read by P2 (written by P1); e.g., y = f(x)
- On the other hand, y may not have depended on x, yet potential causality still holds in our formalization!
- Read followed by write in the same process, the two are potentially causally related
- Example: R(x)a … W(y)b (e.g., it may be that y=f(x))
- Read is potentially causally related to write that provided the value read got
- Example: W(x)a … R(x)a
- Independent writes by two process on a variable are not causally related (they are concurrent)
- Example: W(x)a … W(x)b
- Potentially causally related writes must be seen by all processes in the same order
- Concurrent writes may be seen in a different order by different processes
11
Q
Definition of FIFO Consistency
A
- Writes by a single process are seen by all other processes in the order in which they were issued
- Writes from different processes may be seen in a different order by different processes
12
Q
Summary of Consistency Models
A
13
Q
Eventual Consistency Motivation
A
- So far, goal was to maintain consistency in presence of concurrent read/write operations
- There are use cases with no (few) updates to the same data, while most operations are reads
- E.g., distributed database, DNS, Web, K/V‐Stores
- Majority of operations are reads
- Writes are mostly done by central authorities (domain owners, Web masters, etc.)
- Updates are keyed and arrive from single point, only
- Few concurrent updates and updates propagate lazily
- If no updates take place for a long time, all replicas will gradually become consistent
14
Q
Eventual Consistency
A
- Eventual consistency is desirable for large‐scale distributed systems where high availability is important
- Tends to be cheap to implement
- Constitutes a challenge for environments where strong consistency is important
- How well does eventual consistency work for mobile user? For high load of requests not suitable
- Here, we need another class of consistency, i.e., client-centric consistency models
15
Q
Client‐Centric Consistency Models
A
- Goal: How can we avoid system‐wide consistency by concentrating on what clients want, instead of what should be maintained by servers.
- Different client‐centric consistency models – Monotonic reads
- Monotonic writes
- Read‐your‐writes
- Write‐follows‐reads