Chapter 5: Coordination & Agreement Flashcards
1
Q
Synchronous distributed system model
A
- Each message transmitted over a channel is received within a known bounded time
- Time to execute each step of a process has known lower and upper bounds
- Each process has a local clock whose driftrate from real time has a known bound
2
Q
Asynchronous distributed system model NN
A
- Messages may need an arbitrary time to be transmitted
- Each step of a process may take an arbitrary time to complete
- Clock driftrates are arbitrary
3
Q
Thought experiment - Some takeaways
A
- Asynchronous model is representative for Internet and many practical distributed applications
- Solution valid for asynchronousd distributed systems are also valid for synchronous ones
- Many design problems can not besolved in an asynchronous world (e.g., when vs. who leads attack)
- Applyt timeouts and timing constraints to reduce uncertainty and to bring synchronous model into the picture
4
Q
Failure models
A
- Process and communication channel may fail, i.e., depart from correct and expected behaviour
- Failure model offers understanding for effects of failures
- Failure types
- Omission failures (one ore more responses fail)
- Arbitrary failures (wrong value)
- Timing failures (outside interval)
5
Q
Process omission failures
A
- Fails by halting, i.e., crashes (e.g. bluescreen)
-
Fail‐stop failure
- Process halts and remains halted
- Others can detect this state
-
Crash failure
- Process halts and remains halted
- Others can (only) suspect this state
6
Q
Communication omission failure
A
- Receiving process does not receive message
- Typically becaue of
- Buffer overlfow
- Network partition
- Failure types
- Send-omission
- Receive-omission
- Channel-omission
- Process vs. communication failure: Generally, impossible to distinguish
7
Q
In our thought experiment NN
A
- Could Army i detect whether Army j has been defeated? (i.e., process failed?)
- Assuming the enemy could attack
- While undefeated, send periodic message (e.g. hearbeat)
- Assume enemy can not corrupt message
8
Q
Arbitrary failures
A
- Encompasses prior failure classes
- Process not just stops or crashes but processes requests incorrectly, corrupts state, produces inconsistent & incorrect output (commission failures)
- Does not follow agreed protocol, messages get corrupted, messages introduced
- Result:Systembehavesunpredictable
- A.k.a. Byzantine failures / Produces –> produces wrong output
9
Q
Access to shared variables and RACE condition
A
- Imaging a globally shared variable counter in a process accessible to multiple threads:
- counter ++
- counter –
- Register could be 4, 6 or 5
- Race condition:
- Several threads manipulate shared data concurrently. The final value of the data depends upon which thread finishes last.
- To prevent race conditions, concurrent processes must be synchronized
- The statements: counter++; counter‐‐; must each be executed atomically.
- Atomic operation means an operation that completes in its entirety without interruption.
- This is achieved through synchronization primitives
- Shared variable accessed in critical section, protected by synchronization primitives
- Known as the critical section problem or as mutual exclusion
10
Q
Distributed mutual exclusion
A
- In distributed systems, mutual exclusion is more complex due to lack of:
- Shared memory
- Timing issues
- Lack of a global clock
- Event ordering
- Applications
- Accessing a shared resource in distributed systems
- Communicating over a wireless channel 802.11
11
Q
Critical section (CS) problem: No shared memory
A
- System with n processes
- Processes access shared resources in CS
- Coordinate access to CS via message passing
- Application‐level protocol for accessing CS
- Enter_CS() – enter CS, block if necessary
- ResourceAccess() – access shared resource in CS
- Exit_CS() – leave CS
12
Q
Assumptions for CS NN
A
- System is asynchronous
- No bound on delays, no bound on clock drift, etc.
- Processes do not fail
- Message delivery is reliable
- Any message sent, is eventually delivered intact and exactly once
- Client processes are well‐behaved and spent finite time accessing resources in the CS
13
Q
Mutual exclusion requirements CS
A
-
Safety
- At most one process in the critical section at a time
-
Liveness
- Requests to enter & exit CS eventually succeed
- No deadlock
-
Fairness (order & starvation)
- If one request to enter CS happened‐before another one, then entry to CS is granted in that order
- Requests are ordered such that no process enters the critical section twice while another waits to enter (i.e., no starvation)
14
Q
Deadlock & starvation
A
- Deadlock: Two or more processes become stuck indefinitely while attempting to enter and exit the CS, by virtue of their mutual dependency
- Starvation: The indefinite postponement of entry to CS for a process that has requested it.
- Can we order entry to the CS by the times processes requested it?
15
Q
Performance metrics NN
A
- Bandwidth: Number of messages sent, received or both
- Synchronization delay: Time between one process exiting the critical section and the next entering
- Client delay: Delay at entry and exit (response time)
- We do not measure client access to resources protected by the critical section (assume finite)