Performance Caching data at browsers and proxy servers e.g. Content Delivery Network with several locations High availability Upon crashes, service offered by replica Upon network partition, service available to clients in partition Fault tolerance Providing reliable service in face of faulty servers

Definition (Consistency model) A contract between a distributed data store and a set of processes, which specifies what the results of read/write operations are in the presence of concurrency Distributed data store as synonym for distributed database, shared memory, shared files, etc.

Chapter 6: Consistency & Replication Flashcards by Andreas Hein

Why Replication?

Performance
- Caching data at browsers and proxy servers
- e.g. Content Delivery Network with several locations
High availability
- Upon crashes, service offered by replica
- Upon network partition, service available to clients in partition
Fault tolerance
- Providing reliable service in face of faulty servers

How well did you know this?

Not at all

Perfectly

The “cost” of replication NN

Cost to keep replicas up to date in face of updates?
- E.g., additional bandwidth, number of messages exchanged, higher latency (i.e., service‐response time), complexity of code, etc.
How can the problem of stale (out‐of‐date) data current reads at replicas be overcome?

How well did you know this?

Not at all

Perfectly

Consistency Models

Definition (Consistency model)
- A contract between a distributed data store and a set of processes, which specifies what the results of read/write operations are in the presence of concurrency
- Distributed data store as synonym for
  distributed database, shared memory, shared files, etc.

How well did you know this?

Not at all

Perfectly

Consistency Violation

With concurrent read & write operations on a distributed data store, data inconsistency may arise

How well did you know this?

Not at all

Perfectly

Overview: Consistency Models important - know lal

Strict consistency
Sequential consistency
Linearizable consistency
Causal consistency
FIFO consistency
Weak consistency

How well did you know this?

Not at all

Perfectly

Strict Consistency

Definition: Any read on a data item x returns a value corresponding to the result of the most recent write on x
Uni‐processor systems have traditionally observed strict consistency, … but what about multi‐processor systems?
a = 1; a = 2; print(a); Output? most recent !
Definition assumes existence of absolute global time for unambiguous determination of “most recent“.
What do we know about uniquely time stamping events in distributed systems via absolute time references?
Under strict consistent, all writes are instantaneously visible to all processes and absolute global time order is maintained
Similarly, if a read is done, then it gets the most recent value, no matter how quickly the next write is done

How well did you know this?

Not at all

Perfectly

Strict Consistency: Thought Experiment

To satisfy strict consistency, laws of physics may have to be violated! Obviously, not an option!!!
To realize strict consistency in this case, R(X)a would have to travel at 10 times the speed of light!
Strict consistency is impossible to guarantee in distributed systems, due to reliance on precise global time in its definition

How well did you know this?

Not at all

Perfectly

Definition of Sequential Consistency

The result of any execution is the same as if the operations by all processes on the data store were executed in some sequential order and …
… the operations of each individual process appear in this sequence in the order specified by its program
Operations refer to read and write
Weaker than strict consistency
When processes run concurrently, any valid interleaving of read and write operation is acceptable behavior, but all processes see the same interleaving of operations
Nothing is said about time, i.e., no reference to “most recent”, unlike strict consistency
Property: if the read time is r, the write time is w, and the minimal packet transfer time between nodes is t, then r+w>=t holds (if you reduce reading time, writing time goes up)

How well did you know this?

Not at all

Perfectly

Definition of Linearizability

Like Sequential consitency
If there is an overlap order is not important - if not it is
In addition, if tsOP1(x) < tsOP2(y), then operation OP1(x) should precede OP2(y) in this sequence (t is defined as intervals)
ts_OP(x) denotes the timestamp assigned to operation OP that is performed on data item x, and OP is either read (R) or write (W).
Like strict consistency, assumes global time, but not absolute global time
Assumes processes in the system have physical clocks synchronized to within an error bounded
If W(x)b was the most recent write operation and there is no other write operation overlapping with W(x)b, then any later read should return b
If W(x)a and W(x)b were two most‐recent overlapping write operation, then any later read should return either a or b, not something else
A linearizable data store is also sequentially consistent
I.e., linearizability is more restrictive
Difference is the ordering according to a set of synchronized clocks

How well did you know this?

Not at all

Perfectly

Intuition for Causal Consistency

Weaker than sequential consistency
Distinguish events that are potentially causally related and those that are not (concurrent events) Rb
If event B is influenced (caused) by an earlier event A, causality requires that everyone first sees A then B
Concurrent events (i.e.,writes) may be seen in a different order on different machines
Reading of x and writing of y are potentially causally

related

Computation of y may have depended on value of x read by P2 (written by P1); e.g., y = f(x)
On the other hand, y may not have depended on x, yet potential causality still holds in our formalization!
Read followed by write in the same process, the two are potentially causally related
- Example: R(x)a … W(y)b (e.g., it may be that y=f(x))
Read is potentially causally related to write that provided the value read got
- Example: W(x)a … R(x)a
Independent writes by two process on a variable are not causally related (they are concurrent)
- Example: W(x)a … W(x)b
Potentially causally related writes must be seen by all processes in the same order
Concurrent writes may be seen in a different order by different processes

How well did you know this?

Not at all

Perfectly

Definition of FIFO Consistency

Writes by a single process are seen by all other processes in the order in which they were issued
Writes from different processes may be seen in a different order by different processes

How well did you know this?

Not at all

Perfectly

Summary of Consistency Models

How well did you know this?

Not at all

Perfectly

Eventual Consistency Motivation

So far, goal was to maintain consistency in presence of concurrent read/write operations
There are use cases with no (few) updates to the same data, while most operations are reads
E.g., distributed database, DNS, Web, K/V‐Stores
- Majority of operations are reads
- Writes are mostly done by central authorities (domain owners, Web masters, etc.)
- Updates are keyed and arrive from single point, only
Few concurrent updates and updates propagate lazily
If no updates take place for a long time, all replicas will gradually become consistent

How well did you know this?

Not at all

Perfectly

Eventual Consistency

Eventual consistency is desirable for large‐scale distributed systems where high availability is important
Tends to be cheap to implement
Constitutes a challenge for environments where strong consistency is important
How well does eventual consistency work for mobile user? For high load of requests not suitable
Here, we need another class of consistency, i.e., client-centric consistency models

How well did you know this?

Not at all

Perfectly

Client‐Centric Consistency Models

Goal: How can we avoid system‐wide consistency by concentrating on what clients want, instead of what should be maintained by servers.
Different client‐centric consistency models – Monotonic reads
- Monotonic writes
- Read‐your‐writes
- Write‐follows‐reads

How well did you know this?

Not at all

Perfectly

Definition: Monotonic Reads

A data store provides monotonic‐read consistency if the following condition holds:
- If a process reads the value of a data item x, any successive read operation on x by that process will always return that same value or a more recent value
Monotonic‐read consistency guarantees that if a process has seen a value of x at time t, it will never see an older version of x at a later time
The previously seen value by the process must have reached the new site, where the reader wants to read

Monotonic Writes

Propagation of writes in the correct order to multiple destinations (copies of data store)
A data store provides monotonic‐write consistency if the following condition holds:
- A write operation by a process on a data item x is completed before any successive write operation on x by the same process
In other words, a write operation on a copy of data item x is performed only if that copy has been brought up to date

Read Your Writes

A data store provides read‐your writes consistency if the following condition holds:
- The effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process
A write operation is always completed before a successive read operation by the same process, no matter where that read operation takes place

Writes Follow Reads

A data store is said to provide writes‐follow‐reads consistency if the following condition holds:
- A write operation by a process on a data item x following a previous read operation on x by the same process, is guaranteed to take place on the same or a more recent value of x that was read
- E.g., … R(x) … W(x) (for W, same or more recent value of x)
Any successive write operation by a process on a data item x will be performed on a copy of x that is up to date with the value most recently read by that process

Replication Techniques Satisfying Sequential Consistency

• Two main techniques
- Primary Backup
- Active Replication

System Model

Processes are either clients or replicas
Clients and replicas exchange messages through point‐to‐point links
Processes can crash

Passive Replication: Primary Backup

Primary
- Receives invocations from clients and sends back the answers (clint talks always to primary)
Backup
- Interacts with the primary
- Is used to replace the primary when it crashes

Primary Backup Scenario

Guarantee sequential consistency: Order in which the primary receives clients’ invocations define the order of the operation on the data item(s).

Primary Backup: Presence of Crash

Three scenarios
- Primary fails after the client receives the answer -> continue
- Primary fails before sending update messages -> details needed
- Primary fails after sending update messages and before receiving all the ACK messages –> leader election
In all cases, a new primary is elected from among the backups

Primary fails after the client receives the answer

* **Nothing bad happens** since the client has already received the response * **A new primary is elected**

Primary fails before sending update messages

* Client does not get an answer and **resends the requests after a timeout** * The **newly** elected **primary** will **handle** the **request** as new

Primary fails after sending update messages and before receiving all ACK messages

* When a primary fails, a **new primary is elected** by the backup replicas * **Client** does not get an answer and **resends** the **requests** after a timeout * **Since the new primary has already processed the update** message, it **immediately** **sends** the **response** **to** the **client** without further action (we need a log file for processed requests)

Active Replication

* There is **no coordinator,** **all replicas play the same role** * **Each** **replica** is **deterministic**: If all replicas start from the same state and receive the same input, they **will produce the same output** * As a matter of fact, clients will receive the same response, one from each replica

Gossip Architecture

* Gossip architecture **aims at achieving high availability** * Does **not** **offer** **strong** **consistencies** guarantees such as sequential consistency * **Updates** are **propagated** in a **lazy** fashion * **Response** **received** by a client from a replica to its query should **not** be **older** **than** the **last** **response** that the client had received from any replicas * **Client** and **replicas** are **separate processes** * **Replicas** **periodically** **exchange** ‘**gossip**’ messages to convey updates to other replicas * Front end (or client directly) communicates with an individual replica, but it may interact with other replicas, e.g., to tolerate failures

Message Types in Gossip Architecture

* **Query message**: A **message** **from** a **client** **to** its closet **replica** to **determine** the **state** **of** the **system** * **Update message**: A **message** **from** a **client** **to** the **closest** **replica** **to** **update** the state of the **system** * **Gossip message:** This message **contains** some **updates** that a **replica** **thinks** that some **other** **replica** **might** **not** **have** **received** (send to a number of other replicas)