LU7 Fault Tolerance and Recovery Flashcards

Question

What is reliable communication in distributed systems?

Answer 1

Reliable communication ensures messages are delivered correctly and in order despite failures.

Answer 2

Client cannot locate server, request loss, server crashes, response loss, and client crashes.

Answer 3

The server guarantees it will carry out an operation at least once, and possibly more than once if the request is retried after a crash.

Answer 4

The server guarantees it will carry out an operation at most once, avoiding duplicate executions.
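
One common way to approximate at-most-once behaviour is to tag each request with a unique ID and have the server answer retransmissions from a reply cache instead of re-executing them. The sketch below is illustrative only: the class `AtMostOnceServer`, the method `handle_deposit`, and the ID format are hypothetical, and the cache is kept in memory, so it would not survive a real server crash.

```python
class AtMostOnceServer:
    """Toy server that executes each request ID at most once (hypothetical names)."""

    def __init__(self):
        self.balance = 0
        self.replies = {}  # request_id -> cached reply

    def handle_deposit(self, request_id, amount):
        # A retransmitted request is answered from the cache, not executed again.
        if request_id in self.replies:
            return self.replies[request_id]
        self.balance += amount
        self.replies[request_id] = self.balance
        return self.balance


server = AtMostOnceServer()
first = server.handle_deposit("req-1", 50)
retry = server.handle_deposit("req-1", 50)  # client timed out and resent the request
```

Both calls return the same reply, and the deposit is applied only once.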

Answer 5

An orphan computation occurs when a server continues processing after the client has crashed.

Answer 6

Orphans are killed by the client upon recovery or removed after a timeout.

Answer 7

Reliable multicasting ensures messages sent to a group are delivered to all intended recipients.

Answer 8

Atomic multicast ensures a message is delivered either to all nonfaulty members of a group or to none of them, with all members delivering messages in the same order.

Answer 9

Feedback suppression reduces redundant retransmission requests by suppressing duplicate feedback from receivers.
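
A rough simulation of the idea, under simplifying assumptions that are not in the card: every receiver that missed a message picks a random timer in [0, 1], the first receiver to time out multicasts its retransmission request, and that request reaches everyone after a fixed `propagation_delay`. Receivers whose timers fire before the request arrives still send their own; the rest stay silent. The function name `nacks_sent` is hypothetical.

```python
import random


def nacks_sent(num_receivers, propagation_delay, seed=0):
    """Count how many retransmission requests are actually sent."""
    rng = random.Random(seed)
    timers = sorted(rng.uniform(0.0, 1.0) for _ in range(num_receivers))
    first = timers[0]
    # A receiver is suppressed only if the first request reaches it
    # before its own timer expires.
    return sum(1 for t in timers if t <= first + propagation_delay)


instant = nacks_sent(100, propagation_delay=0.0)
slow = nacks_sent(100, propagation_delay=1.0)
```

With instant propagation only one request is sent; when propagation is slower than every timer, suppression has no effect and all receivers send feedback.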

Answer 10

A hierarchical structure aggregates feedback through intermediate nodes to improve scalability.

Answer 11

Distributed commit ensures that all processes in a distributed transaction either commit or abort together.

Answer 12

2PC is a protocol where the coordinator collects votes from participants to commit or abort a transaction.

Answer 13

3PC adds an additional phase to 2PC to prevent participants from blocking indefinitely.

Answer 14

Vote-request, vote-commit/vote-abort, and global-commit/global-abort.
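
The coordinator's decision rule can be sketched as follows. This is a minimal illustration, not a full protocol: the function name `coordinator_decision` is hypothetical, and a missing vote (`None`) is assumed to stand for a participant that timed out, which the coordinator treats as a reason to abort.

```python
def coordinator_decision(votes):
    """Map participant votes to the coordinator's global decision.

    votes: dict of participant name -> "VOTE_COMMIT", "VOTE_ABORT",
    or None (assumed here to mean the vote never arrived).
    """
    if all(vote == "VOTE_COMMIT" for vote in votes.values()):
        return "GLOBAL_COMMIT"
    return "GLOBAL_ABORT"


all_yes = coordinator_decision({"p1": "VOTE_COMMIT", "p2": "VOTE_COMMIT"})
one_no = coordinator_decision({"p1": "VOTE_COMMIT", "p2": "VOTE_ABORT"})
timeout = coordinator_decision({"p1": "VOTE_COMMIT", "p2": None})
```

A single abort vote or missing vote is enough to abort the whole transaction.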

Answer 15

Vote-request, prepare-commit/global-abort, and global-commit.

Answer 16

The participant recovers its state from logs or queries other participants for the coordinator's decision.
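
As an illustration of querying other participants in 2PC, the sketch below assumes the usual four participant states (INIT, READY, COMMIT, ABORT) and a participant that has voted to commit and is waiting in READY. The function name `decide_from_peers` is hypothetical.

```python
def decide_from_peers(peer_states):
    """Decide what a READY participant can conclude from its peers' states."""
    for state in peer_states:
        if state == "COMMIT":
            return "COMMIT"  # the coordinator must have decided to commit
        if state in ("ABORT", "INIT"):
            # A peer in INIT has not voted yet, so no commit decision can exist.
            return "ABORT"
    # Every peer is also READY: nobody knows the decision, so keep waiting.
    return "BLOCKED"


learned_commit = decide_from_peers(["READY", "COMMIT"])
safe_abort = decide_from_peers(["READY", "INIT"])
stuck = decide_from_peers(["READY", "READY"])
```

The last case is the blocking scenario: if all reachable participants are in READY, none of them can decide until the coordinator recovers.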

Answer 17

Participants remain blocked until the coordinator recovers and provides the decision.

Answer 18

3PC allows participants to proceed without blocking by adding a pre-commit phase.

Answer 19

Forward error recovery finds a new state from which the system can continue after a failure.

Answer 20

Backward error recovery brings the system back to a previous error-free state.

Answer 21

Checkpointing saves the system state at intervals to enable recovery to a known good state.
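
A minimal single-process sketch of the idea. The class `CheckpointedProcess` is hypothetical, and the checkpoint is held in memory for brevity, whereas a real system would write it to stable storage.

```python
import copy


class CheckpointedProcess:
    """Toy process that can save its state and roll back to it."""

    def __init__(self):
        self.state = {"processed": 0}
        self._saved = copy.deepcopy(self.state)

    def take_checkpoint(self):
        # Deep copy so later updates do not alter the saved state.
        self._saved = copy.deepcopy(self.state)

    def rollback(self):
        self.state = copy.deepcopy(self._saved)


proc = CheckpointedProcess()
proc.state["processed"] = 10
proc.take_checkpoint()
proc.state["processed"] = 13  # work done after the checkpoint
proc.rollback()               # simulated failure: that work is lost
```

After the rollback the process is back at the checkpointed value of 10.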

Answer 22

Message logging stores communication events to replay and recover system state after a failure.
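
A small sketch of replay-based recovery, assuming a deterministic message handler so that replaying the same messages in the same order rebuilds the same state. The names `handle` and `recover` and the toy state (a list of upper-cased messages) are hypothetical.

```python
def handle(state, message):
    """Deterministic handler: same state and message always give the same result."""
    return state + [message.upper()]


def recover(checkpoint_state, message_log):
    """Rebuild the pre-crash state from a checkpoint plus the logged messages."""
    state = list(checkpoint_state)
    for message in message_log:
        state = handle(state, message)
    return state


checkpoint_state = ["A"]
message_log = ["b", "c"]  # messages received after the checkpoint
recovered = recover(checkpoint_state, message_log)
```

Only the messages received after the checkpoint need to be replayed, so recovery does not restart from the initial state.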

Answer 23

A global state in which every message recorded as received has also been recorded as sent.
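
The condition can be checked mechanically. In the sketch below, which uses a hypothetical representation, each process numbers its events 1, 2, 3, ..., a cut maps each process to the number of its events included in the state, and each message is a tuple `(sender, send_index, receiver, receive_index)`.

```python
def is_consistent_cut(cut, messages):
    """Return True if no message is recorded as received without being sent."""
    for sender, send_idx, receiver, recv_idx in messages:
        received = recv_idx <= cut[receiver]
        sent = send_idx <= cut[sender]
        if received and not sent:
            return False
    return True


# P1 sends a message as its 2nd event; P2 receives it as its 1st event.
messages = [("P1", 2, "P2", 1)]
both_sides = is_consistent_cut({"P1": 2, "P2": 1}, messages)
in_transit = is_consistent_cut({"P1": 2, "P2": 0}, messages)
orphan_receive = is_consistent_cut({"P1": 1, "P2": 1}, messages)
```

A message that is sent but not yet received (in transit) is acceptable; a receive without a matching send is not.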

Answer 24

The most recent consistent global checkpoint across all processes in a distributed system.

Answer 25

A rollback that propagates through the system, potentially reverting to the initial state due to inconsistent checkpoints.

Answer 26

A situation where uncoordinated checkpoints force cascading rollbacks, possibly all the way back to the system's initial state, complicating recovery.
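
The recovery line, cascaded rollback, and the domino effect can be illustrated together. The sketch below uses a hypothetical representation: each process lists the event counts at which it took checkpoints (always including 0, the initial state), and messages are tuples `(sender, send_index, receiver, receive_index)`. Starting from the latest checkpoints, any process whose checkpoint records a receive that the sender's checkpoint does not record as sent is rolled back further, until the set of checkpoints is consistent.

```python
def recovery_line(checkpoints, messages):
    """Find the most recent consistent set of checkpoints by repeated rollback."""
    line = {proc: cps[-1] for proc, cps in checkpoints.items()}
    changed = True
    while changed:
        changed = False
        for sender, send_idx, receiver, recv_idx in messages:
            if recv_idx <= line[receiver] and send_idx > line[sender]:
                # Received but not sent: roll the receiver back to its
                # latest checkpoint taken before the receive event.
                line[receiver] = max(c for c in checkpoints[receiver] if c < recv_idx)
                changed = True
    return line


checkpoints = {"P1": [0, 2, 4], "P2": [0, 1, 3]}
messages = [
    ("P1", 1, "P2", 1),
    ("P2", 2, "P1", 2),
    ("P1", 3, "P2", 3),
    ("P2", 4, "P1", 4),
]
domino = recovery_line(checkpoints, messages)
no_messages = recovery_line(checkpoints, [])
```

In this example every rollback invalidates the other process's checkpoint in turn, so both processes end up at their initial state. With no messages exchanged, the latest checkpoints already form the recovery line.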

Answer 27

A model where process execution is deterministic between nondeterministic events like message receipts.

Answer 28

Orphan processes depend on messages that were never logged, so their state is inconsistent with the recovered process and cannot be reproduced by replay.

Answer 29

Recording nondeterministic events ensures deterministic replay during system recovery.

Answer 30

Reliable communication ensures message delivery, while process resilience handles faulty processes through replication.

Answer 31

Forward recovery moves the system forward to a new correct state, while backward recovery reverts to a previous correct state.

Answer 32

A timeout cannot distinguish a crashed process from a slow or failed network, which makes choosing appropriate timeout values difficult.

Answer 33

It reduces feedback overhead and improves scalability by aggregating retransmission requests.

Answer 34

To ensure participants do not block indefinitely if the coordinator fails.

Answer 35

The coordinator asks participants to vote on whether to commit or abort the transaction.

Answer 36

Participants wait for the final commit or abort decision after indicating readiness to commit.

Answer 37

The coordinator manages the commit or abort decision process and ensures consistency.

Answer 38

Checkpoints provide a reference point to revert to in case of system failures.

Answer 39

Consistent cuts ensure all processes have a coherent view of message exchanges for accurate recovery.

Answer 40

Logging ensures the system can accurately replay and recover its state after a failure.

Answer 41

By allowing processes to suppress feedback if another process has already requested retransmission.

Answer 42

Poorly timed checkpoints can lead to cascaded rollbacks or the domino effect, complicating recovery.

Answer 43

Participants log the coordinator's decision to ensure they can recover to the correct state.

Answer 44

Ensuring system functionality despite faulty processes through replication and distributed computations.

Answer 45

It ensures data consistency and system coordination despite potential failures.

Answer 46

They allow simple recovery by storing intermediate results that can be committed or discarded.

Answer 47

Ensuring message delivery and consistency across diverse and potentially unreliable network paths.

Answer 48

Crash failures stop operations, while arbitrary failures may produce incorrect or unpredictable behavior.

Answer 49

By ensuring all participants agree to commit or abort the transaction together.

Answer 50

It can block participants until the coordinator recovers and provides a decision.

Answer 51

By introducing a pre-commit phase that prevents participants from blocking indefinitely.

Answer 52

It proactively spreads failure information to ensure all nodes are aware of failures.

Answer 53

It allows the system to replay messages and recover to a consistent state after a failure.

Answer 54

It determines how easily a system can be repaired and restored to service after a failure.

Answer 55

Framing delimits packets so that bit errors in a transmitted packet can be detected.
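
As a hedged illustration, the sketch below appends a CRC-32 checksum to each payload and verifies it on receipt. The frame layout (payload followed by a 4-byte big-endian checksum) and the function names are assumptions made for this example, not a specific protocol's format.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload (hypothetical frame layout)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def frame_is_intact(frame: bytes) -> bool:
    """Recompute the CRC-32 and compare it with the trailing checksum."""
    payload, checksum = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == checksum


frame = make_frame(b"hello")
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]  # flip one bit in transit
```

The receiver accepts the original frame and rejects the one with a flipped bit, at which point it can discard the frame and rely on retransmission.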

Answer 56

Idempotent operations can be safely retried without adverse effects, aiding in fault tolerance.
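
A small contrast between an idempotent and a non-idempotent operation; the account dictionary and function names are hypothetical.

```python
def set_balance(account, value):
    """Idempotent: repeating the call leaves the same final state."""
    account["balance"] = value


def add_to_balance(account, amount):
    """Not idempotent: every repeat changes the state again."""
    account["balance"] += amount


safe = {"balance": 100}
set_balance(safe, 150)
set_balance(safe, 150)      # retried after a lost reply: no harm done

unsafe = {"balance": 100}
add_to_balance(unsafe, 50)
add_to_balance(unsafe, 50)  # retried after a lost reply: applied twice
```

The retried `set_balance` still ends at 150, while the retried `add_to_balance` ends at 200 instead of the intended 150.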

Answer 57

They aggregate feedback to reduce overhead and improve scalability.

Answer 58

By making operations idempotent or using retransmission strategies.

Answer 59

It marks the latest point where the system state is consistent across all processes.

Answer 60

It illustrates how improper checkpointing can lead to extensive rollbacks, complicating recovery.

Answer 61

By using feedback suppression and hierarchical structures to manage retransmissions.

Answer 62

To allow participants to recover to the correct state after a failure.

Answer 63

Participants may block until a new coordinator is elected or the original recovers.

Answer 64

Ensuring message delivery and order despite network failures and delays.

Answer 65

They store intermediate results, simplifying recovery and rollback processes.