Chapter 5: Coordination & Agreement Flashcards

1
Q

Synchronous distributed system model

A
  • Each message transmitted over a channel is received within a known bounded time
  • Time to execute each step of a process has known lower and upper bounds
  • Each process has a local clock whose drift rate from real time has a known bound
2
Q

Asynchronous distributed system model

A
  • Messages may need an arbitrary time to be transmitted
  • Each step of a process may take an arbitrary time to complete
  • Clock drift rates are arbitrary
3
Q

Thought experiment - Some takeaways

A
  • The asynchronous model is representative of the Internet and many practical distributed applications
  • Solutions valid for asynchronous distributed systems are also valid for synchronous ones
  • Many design problems cannot be solved in an asynchronous world (e.g., when vs. who leads the attack)
  • Apply timeouts and timing constraints to reduce uncertainty and to bring the synchronous model into the picture
4
Q

Failure models

A
  • Processes and communication channels may fail, i.e., depart from correct and expected behaviour
  • Failure model offers understanding for effects of failures
  • Failure types
    • Omission failures (one or more responses fail)
    • Arbitrary failures (wrong value)
    • Timing failures (outside interval)
5
Q

Process omission failures

A
  • Fails by halting, i.e., crashes (e.g. bluescreen)
  • Fail‐stop failure
    • Process halts and remains halted
    • Others can detect this state
  • Crash failure
    • Process halts and remains halted
    • Others can (only) suspect this state
6
Q

Communication omission failure

A
  • Receiving process does not receive message
  • Typically because of
    • Buffer overflow
    • Network partition
  • Failure types
    • Send-omission
    • Receive-omission
    • Channel-omission
  • Process vs. communication failure: Generally, impossible to distinguish
7
Q

In our thought experiment

A
  • Could Army i detect whether Army j has been defeated? (i.e., has the process failed?)
  • Assuming the enemy could attack
  • While undefeated, send periodic messages (e.g., heartbeat)
    • Assume the enemy cannot corrupt messages
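The heartbeat idea above can be sketched as a simple timeout-based failure suspector; all names here are illustrative, and note that in an asynchronous system a missing heartbeat can only raise *suspicion* of a crash, never certainty:

```python
import time

class HeartbeatMonitor:
    """Minimal heartbeat-based failure suspector (illustrative sketch)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}        # army_id -> time of last heartbeat

    def heartbeat(self, army_id, now=None):
        # Record that a periodic "still undefeated" message arrived.
        self.last_seen[army_id] = now if now is not None else time.monotonic()

    def suspected(self, army_id, now=None):
        # Suspect a crash if we have not heard anything within the timeout.
        now = now if now is not None else time.monotonic()
        seen = self.last_seen.get(army_id)
        return seen is None or now - seen > self.timeout

m = HeartbeatMonitor(timeout=2.0)
m.heartbeat("army_j", now=100.0)
print(m.suspected("army_j", now=101.0))  # False: heartbeat is recent
print(m.suspected("army_j", now=103.5))  # True: silence exceeds the timeout
```

A late message is indistinguishable from a crashed sender, which is exactly why the slide says others can only *suspect* a crash failure.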
8
Q

Arbitrary failures

A
  • Encompasses prior failure classes
  • The process does not just stop or crash but processes requests incorrectly, corrupts its state, produces inconsistent & incorrect output (commission failures)
  • Does not follow the agreed protocol; messages get corrupted or spurious messages are introduced
  • Result: System behaves unpredictably
  • A.k.a. Byzantine failures (process produces wrong output)
9
Q

Access to shared variables and RACE condition

A
  • Imagine a globally shared variable counter in a process accessible to multiple threads:
    • counter++
    • counter--
    • Starting from 5, the final value could be 4, 5 or 6
  • Race condition:
    • Several threads manipulate shared data concurrently. The final value of the data depends upon which thread finishes last.
    • To prevent race conditions, concurrent processes must be synchronized
  • The statements: counter++; counter‐‐; must each be executed atomically.
  • Atomic operation means an operation that completes in its entirety without interruption.
  • This is achieved through synchronization primitives
  • Shared variable accessed in critical section, protected by synchronization primitives
  • Known as the critical section problem or as mutual exclusion
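The lost-update race and its fix can be demonstrated with a short sketch (thread counts and iteration numbers are arbitrary choices for illustration):

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:           # critical section: remove the lock to observe lost updates
            counter += 1     # read-modify-write is NOT atomic without it

def decrement(n):
    global counter
    for _ in range(n):
        with lock:
            counter -= 1

t1 = threading.Thread(target=increment, args=(100_000,))
t2 = threading.Thread(target=decrement, args=(100_000,))
t1.start(); t2.start()
t1.join(); t2.join()
print(counter)  # 0 with the lock; without it, often a nonzero "lost update" value
```

The `with lock:` block is the synchronization primitive making each `counter += 1` / `counter -= 1` effectively atomic, which is precisely the critical section problem described above.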
10
Q

Distributed mutual exclusion

A
  • In distributed systems, mutual exclusion is more complex due to:
    • No shared memory
    • Timing issues
    • Lack of a global clock
    • No agreed event ordering
  • Applications
    • Accessing a shared resource in distributed systems
    • Communicating over a wireless channel (e.g., 802.11)
11
Q

Critical section (CS) problem: No shared memory

A
  • System with n processes
  • Processes access shared resources in CS
  • Coordinate access to CS via message passing
  • Application‐level protocol for accessing CS
    • Enter_CS() – enter CS, block if necessary
    • ResourceAccess() – access shared resource in CS
    • Exit_CS() – leave CS
12
Q

Assumptions for CS

A
  • System is asynchronous
    • No bound on delays, no bound on clock drift, etc.
  • Processes do not fail
  • Message delivery is reliable
    • Any message sent is eventually delivered intact, exactly once
  • Client processes are well‐behaved and spend finite time accessing resources in the CS
13
Q

Mutual exclusion requirements CS

A
  • Safety
    • At most one process in the critical section at a time
  • Liveness
    • Requests to enter & exit CS eventually succeed
    • No deadlock
  • Fairness (order & starvation)
    • If one request to enter CS happened‐before another one, then entry to CS is granted in that order
    • Requests are ordered such that no process enters the critical section twice while another waits to enter (i.e., no starvation)
14
Q

Deadlock & starvation

A
  • Deadlock: Two or more processes become stuck indefinitely while attempting to enter and exit the CS, by virtue of their mutual dependency
  • Starvation: The indefinite postponement of entry to CS for a process that has requested it.
  • Can we order entry to the CS by the times processes requested it?
15
Q

Performance metrics

A
  • Bandwidth: Number of messages sent, received or both
  • Synchronization delay: Time between one process exiting the critical section and the next entering
  • Client delay: Delay at entry and exit (response time)
  • We do not measure client access to resources protected by the critical section (assume finite)
16
Q

Solution strategies

A
  • Centralized strategy
    • Divide processes into master and slaves, master dictates actions of slaves
  • Distributed strategy: Each process independently decides actions, based on local knowledge of others’ state
    • Token‐based: A node is allowed in the critical section (CS) if it has a token. Tokens are passed from site to site, in some (priority) order.
    • Non‐token‐based: A node enters CS when an assertion becomes true. A node communicates with other nodes to obtain their states and decides whether the assertion is true or false.
17
Q

Centralized strategy

A
  • Elect a leader (details in the second part of this lecture)
  • Meets requirements: Safety, liveness, no starvation
    • Does the solution meet the ordering requirement?
  • Advantages
    • Simple to implement
  • Disadvantages
    • Single point of failure
    • Bottleneck, network congestion, timeouts
    • Deadlock potential for multiple resources with separate servers
  • Enter_CS()
    • Two messages: Request & Grant
    • Delays the requesting process by a message round trip
  • Exit_CS()
    • One message: Release
    • Assuming asynchronous message passing, incurs no delay
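The Request/Grant/Release protocol above can be sketched as a coordinator with a FIFO queue (class and method names are illustrative; real implementations would exchange network messages):

```python
from collections import deque

class Coordinator:
    """Sketch of the centralized mutual-exclusion server."""

    def __init__(self):
        self.holder = None     # process currently granted the CS
        self.queue = deque()   # waiting requesters, FIFO
        self.granted = []      # log of Grant messages, for illustration

    def request(self, pid):
        # Enter_CS(): process sends a Request message.
        if self.holder is None:
            self.holder = pid
            self.granted.append(pid)   # Grant sent immediately
        else:
            self.queue.append(pid)     # requester blocks until granted

    def release(self, pid):
        # Exit_CS(): process sends a Release message.
        assert pid == self.holder
        self.holder = self.queue.popleft() if self.queue else None
        if self.holder is not None:
            self.granted.append(self.holder)

c = Coordinator()
c.request("P1"); c.request("P2"); c.request("P3")
c.release("P1"); c.release("P2"); c.release("P3")
print(c.granted)  # ['P1', 'P2', 'P3']: FIFO service, so no starvation
```

The FIFO queue gives the no-starvation property; note it orders requests by arrival at the server, which is why the happened-before ordering requirement is a separate question.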
18
Q

Distributed strategy

A
  • In a distributed algorithm, the same decision must be made on each node, independent of the other nodes in the system.
  • Selected algorithms
    • Ring‐based algorithm
    • Logical clocks‐based algorithm (Lamport, 1976)
    • Ricart & Agrawala, 1981
    • Maekawa, 1985
    • Many more
19
Q

Ring‐based algorithm

A
  • Logical ring of processes
  • Each Pi knows its successor, P(i+1) mod N
  • Logical topology a priori unrelated to physical topology
  • Analysis
    • Safe: A node enters the CS only if it holds the token
    • Live: Since finite work is done by each node (no re‐entry), the token eventually gets to each node
    • Fair: Ordering is based on ring topology
  • Performance
    • Constantly consumes network bandwidth, even when no process seeks entry (except while a process is inside the CS)
    • Synchronization delay: Between 1 and N messages
    • Client delay: Between 0 and N messages for entry; 1 message for exit
  • Problems
    • Lost token
    • Duplicated token
    • Failed node
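Token circulation can be simulated in a few lines (a synchronous, failure-free sketch; function and parameter names are illustrative):

```python
def circulate(n, wants_cs, rounds=1):
    """Simulate token passing on a ring of n processes.

    wants_cs: set of process indices that want to enter the CS.
    Returns the order in which processes entered the CS.
    """
    entered = []
    token = 0                      # index of the process holding the token
    for _ in range(rounds * n):
        if token in wants_cs:
            entered.append(token)  # holding the token -> enter CS, then exit
            wants_cs.discard(token)
        token = (token + 1) % n    # forward the token to the successor
    return entered

print(circulate(5, {3, 1}))  # [1, 3]: entry order follows ring position, not request time
```

Safety follows from token uniqueness; the output also illustrates the fairness caveat: ordering is by ring position, not by when the processes asked.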
20
Q

Ricart & Agrawala, 1981, algorithm

A
  • Basic idea
    • Processes wanting to enter the CS multicast a request to all processes
    • Enter the CS once all have granted the request
  • Use Lamport timestamps (logical clock values) to order requests: ⟨T, Pi⟩, where T is the timestamp and Pi the process identifier, used to break ties
  • Each process is in one of three states
    • Released: Outside the CS
    • Wanted: Wanting to enter the CS
    • Held: Inside the CS
  • If a process requests to enter the CS and all other processes are in the Released state, entry is granted by each process
  • If a process, Pi, requests to enter the CS and another process, Pk, is inside the CS (Held state), then Pk will not reply until it is finished with the CS
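The reply-or-defer rule at the heart of the algorithm can be sketched as follows (simulated message passing; the class layout is illustrative, not the full algorithm):

```python
RELEASED, WANTED, HELD = "Released", "Wanted", "Held"

class Process:
    """Sketch of the Ricart & Agrawala reply rule."""

    def __init__(self, pid):
        self.pid, self.state = pid, RELEASED
        self.my_request = None   # (timestamp, pid) of own pending request
        self.deferred = []       # requests answered only after Exit_CS

    def on_request(self, ts, pid):
        """Handle an incoming request; True = reply now, False = defer."""
        # Defer if we are in the CS, or we want it and our own request
        # is earlier in the (timestamp, pid) total order.
        if self.state == HELD or (
            self.state == WANTED and self.my_request < (ts, pid)
        ):
            self.deferred.append((ts, pid))
            return False
        return True

p = Process(2)
p.state, p.my_request = WANTED, (5, 2)
print(p.on_request(4, 1))  # True: (4, 1) < (5, 2), the earlier request wins
print(p.on_request(5, 3))  # False: tie on timestamp, pid 2 < 3 breaks it
```

Tuple comparison on ⟨T, Pi⟩ gives exactly the total order the card describes: timestamps first, process identifiers to break ties.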
21
Q

Ricart & Agrawala algorithm

22
Q

Fault‐tolerance aspects
of mutual exclusion algorithms

A
  • None of the algorithms tolerates message loss
  • Ring‐based algorithm cannot tolerate crash of single process
  • Centralized algorithm can tolerate crash of clients that are neither in the CS, nor have requested entry
  • The described Ricart & Agrawala algorithm does not tolerate faults either
23
Q

Leader election

A
  • Problem: A group of processes, P1, …, Pn, must agree on some unique Pk to be the “leader”
  • Often, the leader then coordinates another activity
  • An election runs when the leader (a.k.a. coordinator) has failed
  • Any process that hasn’t heard from the leader within some predefined time interval may call for an election
  • A false alarm is possible (new election initiated while the current leader is still alive)
  • Several processes may initiate elections concurrently
  • The algorithm should allow for process crashes during an election
24
Q

Process identifier

A
  • The elected leader must be unique
  • Convention: the one with the largest identifier
  • The identifier could be any “useful value”, i.e., unique & totally ordered
  • E.g., based on OS process identifiers, IP address, port
  • E.g., based on least computational load
    • ⟨1/load, i⟩, load > 0, where i is the process ID to break ties
  • Each process, Pi, has a variable electedi that holds the value of the leader or is “┴” (undefined)
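The load-based identifier works by plain lexicographic tuple comparison, which can be checked in two lines (function name and load values are illustrative):

```python
def ident(load, pid):
    """Totally ordered identifier: least-loaded process wins, pid breaks ties.
    Assumes load > 0."""
    return (1.0 / load, pid)

# Three candidates: P1 with load 4, P2 and P3 both with load 2.
candidates = [ident(4.0, 1), ident(2.0, 2), ident(2.0, 3)]
print(max(candidates))  # (0.5, 3): lowest load wins, higher pid breaks the tie
```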
25
Applications of leader election
  • **Berkeley clock synchronization algorithm**
  • **Centralized mutual exclusion algorithm**
  • Primary‐backup replication algorithms
  • Two‐phase commit protocol
  • Used in Amazon Dynamo KV‐store (replication)
  • **Used in Google BigTable & GFS**
26
As compared to mutual exclusion
  • **Losers return to what they were doing** … instead of waiting
  • **Fast election** is important … not starvation avoidance
  • **All processes must know the result** … not just the winner
27
Election algorithm requirement
  • **Safety**
    • A participating process, Pi, has variable electedi = “┴” or electedi = P, where P is chosen as the non‐crashed process at the end of the election run with the largest identifier.
  • **Liveness**
    • All processes participate in the election and eventually either set electedi ≠ “┴” or crash.
  • **Performance**
    • Total number of messages sent (bandwidth) …
28
Ring‐based algorithm, 1978: Overview
  • Essentially three phases
    1. **Initialization**
    2. **Election phase** (concurrent calls for election)
      • Determine the election winner (voting phase)
      • Reach the point of the message originator
    3. **Announce the leader phase** (victory announcement phase)
29
Ring‐based election algorithm
  • Construct a ring (cf. ring‐based mutual exclusion)
  • Assume **each P has a unique ID**
  • Assume **no failures and an asynchronous system**
  • **Any Pi may begin an election by sending an election message to its successor**
  • **The election message holds Pi’s ID**
  • **Upon receiving** an election message, **Pk compares its own ID** to the ID in the received message
    • **If the message ID is greater: Forward the _election message_**
    • **If smaller: Forward the _election message_ with Pk’s ID**, unless Pk is already participating in the current election run
    • **If equal: Pk is now the leader. Forward a _victory message_ to notify all other processes**
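The forwarding rules above can be simulated for a single initiator (a synchronous, failure-free sketch; the message count excludes the victory announcement):

```python
def ring_election(ids, starter):
    """Simulate a single-initiator ring election.

    ids: process IDs in ring order (assumed unique).
    Returns (leader_id, number_of_election_messages).
    """
    n = len(ids)
    msgs = 0
    pos = starter
    carried = ids[starter]           # ID carried by the election message
    while True:
        nxt = (pos + 1) % n
        msgs += 1                    # forward the election message
        if ids[nxt] == carried:
            return carried, msgs     # own ID came back: nxt is the leader
        carried = max(carried, ids[nxt])  # greater ID replaces smaller
        pos = nxt

leader, msgs = ring_election([3, 17, 9, 11], starter=0)
print(leader, msgs)  # 17 5: the largest ID wins after 5 election messages
```

Tracing the message count for the worst case (highest ID at the anti-clockwise neighbour) reproduces the 3N − 1 total derived on the next card, once the N announcement messages are added.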
30
Different cases - Ring-based algorithm
31
Ring‐based algorithm: Example (Picture only)
32
Ring‐based election algorithm analysis
  • **Worst case**
    • A single process calls for the election
    • **Anti‐clockwise neighbour has the highest identifier**
    • N‐1 messages to reach this neighbour
    • N messages to reach the point of origin
    • Leader announcement takes another N messages
    • Grand total: 3N – 1 messages
33
Bully algorithm, Garcia‐Molina, 1982
  • Assumes **each process has a unique ID**, **reliable message delivery, and a synchronous system with timeouts**
  • Assumes **processes know each others’ IDs** and can communicate with one another
  • **Higher IDs have priority**
    • Can “bully” lower‐numbered processes
  • Initiated by any process that suspects failure of the leader
  • Employs timeouts to detect failure
  • Tolerates processes crashing during elections
34
Bully algorithm messages
  • Operates with three types of messages
    • **Election: announces an election**
    • **Answer: responds to an election message**
    • **Coordination: announces the identity of the leader**
  • The algorithm is triggered by any process that detects (suspects) that the leader has crashed
35
Pi detects failure of leader
  • For any j < i and any k > i
    1. **Pi broadcasts an election message** to all higher‐ID processes
    2. **Any Pk receiving the election message responds with an answer** message and **starts another election**
    3. Any Pj receiving the election message does not respond (as j < i)
    4. **If Pi does not receive any answer message (timeout), it broadcasts victory via a coordination message**
    5. **If Pi does receive answer message(s), it waits to receive a coordinator message**
    6. Pi restarts the election if no coordination message is received
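The steps above collapse into a simple recursion when simulated synchronously with a known set of alive processes (an illustrative sketch; real bully elections use timeouts rather than a global `alive_ids` set):

```python
def bully_election(initiator, alive_ids):
    """Simulate one bully election round; returns the elected leader.

    initiator: ID of the process that suspects the leader crashed.
    alive_ids: set of IDs of non-crashed processes (global view, for the sketch).
    """
    higher = [p for p in alive_ids if p > initiator]
    if not higher:
        # Timeout with no answer: initiator wins and would broadcast
        # a coordination (victory) message.
        return initiator
    # Any answering higher process takes over and starts its own election;
    # ultimately the highest alive process wins.
    return bully_election(max(higher), alive_ids)

print(bully_election(2, alive_ids={1, 2, 4, 5}))  # 5: highest alive ID wins
print(bully_election(5, alive_ids={1, 2, 5}))     # 5: initiator is already highest
```

The recursion mirrors step 2: every answer message implicitly hands the election to a higher-ID process, so the outcome is always the largest alive identifier.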
36
Bully algorithm: Upon crash of a process

  • **Start** of a **new process replacing the crashed one, with the crashed one’s ID (e.g., read from disk)**
  • **The process may determine that it has the highest identifier and thus pronounce itself leader**
    • Even though the system may have an existing leader (elected during the crash)
  • **The new process “bullies” the current leader**