Module 10c - Distributed Commit and Checkpoints Flashcards

Question 1

Q

ACID is a collection of properties that we expect in a database. What does ACID stand for? What does each letter mean?

Answer

A

Atomicity - All the updates take effect or none of them do

Consistency - Constraints are preserved

Isolation - Concurrent transactions are unaware of each other

Durability - Updates made by committed transactions are not lost in the event of a failure

Question 2

Q

The distributed commit problem concerns transaction _______ in a ________ environment

Answer

A

atomicity

distributed

Question 3

Q

Achieving atomicity is easy in a centralized database, but it is harder in a distributed database. Why?

Answer

A

Different components of the database could fail independently during a transaction, which makes it difficult to ensure that the updates took effect at either all of them or none of them

Question 4

Q

The coordinator is a _______ node which is responsible for ensuring ________

Answer

A

separate/independent

atomicity

Question 5

Q

In a distributed database system, transaction commitment is initiated by a _________ after the transaction _________ phase, during which each _________ discovers whether it is able to commit the transaction or not

Answer

A

coordinator
execution
participant

Question 6

Q

How do participants tell the coordinator whether the transaction should commit or abort?

Answer

A

The coordinator asks each participant for a vote on whether they should commit or abort the transaction globally

Question 7

Q

What happens after the coordinator collects the votes from the participants?

Answer

A

The coordinator computes the global decision after collecting votes, and then shares the decision with the participants

Question 8

Q

What must happen to the transaction if ANY of the participants vote to abort?

Answer

A

The transaction must abort globally whenever ANY participant votes to abort

Question 9

Q

If there are no failures in the system, and every participant votes to commit, what must happen?

Answer

A

The transaction must commit globally whenever all participants vote to commit
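
The rule from Questions 8 and 9 can be sketched as a single function. This is only an illustrative sketch: the names `Vote` and `global_decision` are made up for this example, and treating an empty vote list as an abort is an assumption, not something the course material states.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "vote commit"
    ABORT = "vote abort"


def global_decision(votes):
    """Return "global commit" only if every participant voted to commit."""
    votes = list(votes)
    # Assumption: with no votes at all we abort rather than commit.
    if votes and all(v is Vote.COMMIT for v in votes):
        return "global commit"
    # A single abort vote is enough to abort globally.
    return "global abort"
```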

Question 10

Q

Two-phase commit (2PC) is a common solution based on two phases.

What happens during phase 1 of 2PC?

Answer

A

Phase 1:

Coordinator asks participants whether they are ready to commit. Participants respond with votes

Question 11

Q

Two-phase commit (2PC) is a common solution based on two phases.

What happens during phase 2 of 2PC?

Answer

A

Phase 2:
Coordinator examines votes and decides the outcome of the transaction. If all participants vote to commit, then the transaction is committed successfully.

Otherwise it is aborted

Question 12

Q

What are the 4 assumptions made with Two-phase commit (2PC)?

Answer

A

Synchronous processes (can be executed without interruption from start to finish)
Bounded communication delays
Crash-recovery failures
Processes have access to stable storage for logging recovery information

Question 13

Q

What does the state transition diagram look like for the <strong>coordinator</strong> in Two-Phase commit (2PC)?

Answer

A

https://media.discordapp.net/attachments/213179086351106048/1003046928721924207/unknown.png

Question 14

Q

What does the state transition diagram look like for the <strong>participant</strong> in Two-Phase commit (2PC)?

Answer

A

https://media.discordapp.net/attachments/213179086351106048/1003047663886929960/unknown.png
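
The linked image is not reproduced here. As a rough sketch, assuming the diagram matches the participant behaviour described in Questions 16 and 19 (states INIT, READY, COMMIT, ABORT), the transitions could be written as a lookup table. The table name, the event labels, and the `next_state` helper are hypothetical and chosen only for this example.

```python
# (current state, event) -> next state, for a 2PC participant.
# Assumption: events are named after the message sent or received.
PARTICIPANT_TRANSITIONS = {
    ("INIT", "vote commit"): "READY",    # got the vote request, voted commit
    ("INIT", "vote abort"): "ABORT",     # got the vote request, voted abort
    ("INIT", "timeout"): "ABORT",        # no vote request arrived in time
    ("READY", "global commit"): "COMMIT",
    ("READY", "global abort"): "ABORT",
}


def next_state(state, event):
    """Return the next state; unknown (state, event) pairs leave it unchanged."""
    return PARTICIPANT_TRANSITIONS.get((state, event), state)
```

COMMIT and ABORT have no outgoing transitions in this table, which reflects that a decision, once made, is final.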

Question 15

Q

In 2PC, what is the set of actions performed by a coordinator that can be used to create a coordinator program?

Answer

A

1. Multicast “vote request” to all participants
2. Wait for the votes from all participants and record each incoming vote
3. If a timeout occurs while waiting, multicast “global abort” to all participants, and exit the program/process
4. If all participants sent “vote commit” to the coordinator AND the coordinator votes “commit”, then multicast “global commit” to all participants
5. Else (if #4 is not true), multicast “global abort” to all participants
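
The coordinator actions above could be sketched roughly as follows. This is a simplified, single-threaded sketch: `run_coordinator`, its parameters, and the use of a `queue.Queue` as the coordinator's inbox are assumptions made for illustration. Logging to stable storage is omitted, and the timeout here applies to each individual vote rather than to the whole voting phase.

```python
import queue


def run_coordinator(participants, inbox, multicast,
                    own_vote="vote commit", timeout=1.0):
    """Run one round of 2PC from the coordinator's side.

    participants: list of participant ids (hypothetical)
    inbox: queue.Queue of (participant_id, vote) tuples
    multicast: callable(message) that delivers to every participant
    """
    multicast("vote request")
    votes = {}
    while len(votes) < len(participants):
        try:
            sender, vote = inbox.get(timeout=timeout)
        except queue.Empty:
            # A vote did not arrive in time: abort globally and stop.
            multicast("global abort")
            return "global abort"
        votes[sender] = vote  # record the incoming vote

    all_commit = all(v == "vote commit" for v in votes.values())
    if all_commit and own_vote == "vote commit":
        decision = "global commit"
    else:
        decision = "global abort"
    multicast(decision)
    return decision
```

For example, passing `sent.append` as `multicast` (where `sent` is a list) records the messages the coordinator would have sent.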

Question 16

Q

In 2PC, what is the set of actions performed by a participant that can be used to create a participant program?

Don’t include the daemon process that executes in the participant

Answer

A

1. Wait for a message (the vote request) from the coordinator
2. If a timeout occurs while waiting, then abort and exit the program/process
3. Vote either commit or abort
4. If the vote was to abort, then send “vote abort” to the coordinator, abort the transaction, and terminate the program
5. If the vote was to commit, then send “vote commit” to the coordinator, and wait for the coordinator’s decision on whether to commit or abort. Commit or abort accordingly
6. If a timeout occurs while waiting for the coordinator’s decision (no decision received), then multicast “decision request” to the other participants, wait until a decision is received, and commit or abort accordingly. Note that in this case, the participant may remain blocked forever while waiting for a decision from the other participants.
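
A rough sketch of the participant side, under the same kind of assumptions: `run_participant` and its parameters are hypothetical, the local vote is passed in rather than computed, logging is omitted, and `ask_peers` stands in for the "decision request" multicast (in a real system it may block indefinitely).

```python
import queue


def run_participant(from_coordinator, send_to_coordinator, local_vote,
                    ask_peers, timeout=1.0):
    """Run one round of 2PC from a participant's side.

    from_coordinator: queue.Queue of messages from the coordinator
    send_to_coordinator: callable(message)
    local_vote: "vote commit" or "vote abort"
    ask_peers: callable() returning "global commit" or "global abort"
    Returns "commit" or "abort".
    """
    try:
        from_coordinator.get(timeout=timeout)  # expected: "vote request"
    except queue.Empty:
        return "abort"  # no vote request arrived in time

    if local_vote == "vote abort":
        send_to_coordinator("vote abort")
        return "abort"

    send_to_coordinator("vote commit")
    try:
        decision = from_coordinator.get(timeout=timeout)
    except queue.Empty:
        # The coordinator is silent: ask the other participants instead.
        decision = ask_peers()
    return "commit" if decision == "global commit" else "abort"
```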

Question 17

Q

In 2PC, the participant executes a daemon process in the background in addition to its core procedure. What is the set of actions executed by this separate process?

Answer

A

Run the following in an infinite loop:

1. Block waiting for an incoming “decision request” from another participant
2. When unblocked, read the most recently recorded state from the local log
3. If the local log state is “global commit”, then send “global commit” back to the participant that sent the request
4. If the local log state is “init” or “global abort”, then send “global abort” back to the participant that sent the request
5. Repeat from step 1
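
The daemon loop might look roughly like this. The function names are hypothetical, and `requests` is modelled as an iterable of requester ids (in a real deployment it would be an endless, blocking stream). Staying silent for any other logged state, such as "ready", is an assumption of this sketch: the card only lists the three states above.

```python
def reply_for_state(log_state):
    """Map the most recent local log state to a reply, or None to stay silent."""
    if log_state == "global commit":
        return "global commit"
    if log_state in ("init", "global abort"):
        return "global abort"
    # Assumption: in any other state (e.g. "ready") this participant does
    # not know the decision either, so it sends nothing.
    return None


def decision_request_daemon(requests, read_log_state, reply):
    """Answer each incoming "decision request".

    requests: iterable of requester ids
    read_log_state: callable() returning the latest logged state
    reply: callable(requester, message)
    """
    for requester in requests:
        answer = reply_for_state(read_log_state())
        if answer is not None:
            reply(requester, answer)
```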

Question 18

Q

In 2PC, if a participant does not receive a commit or abort decision from the _______, within a ________ period of time, then it may learn the decision from another ________

Answer

A

coordinator
bounded
participant

Question 19

Q

In 2PC, when a participant P tries to learn the decision (to commit or abort) from another participant Q, what action would P take for each of the following states of Q:

COMMIT
ABORT
INIT
READY

Note that Q is the remote participant, whereas P is the local participant

Answer

A

On commit, P will make a transition to commit
On abort, P will make a transition to abort
On init, P will make a transition to abort
On ready, P will contact another participant

Note: whenever all participants are in state READY and they are asking around to see what the decision should be, the system will wait forever
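
As a small sketch of the rules above, the function below walks through the states reported by the peers that P contacts, one at a time. The name `learn_decision` and the idea of passing peer states as a list are illustrative assumptions.

```python
def learn_decision(peer_states):
    """Return "commit", "abort", or None if every contacted peer is READY."""
    for state in peer_states:  # contact peers one at a time
        if state == "COMMIT":
            return "commit"
        if state in ("ABORT", "INIT"):
            return "abort"
        # state == "READY": this peer is also waiting, so try the next one
    # Every peer was READY: no decision can be learned (P stays blocked).
    return None
```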

Question 20

Q

In 2PC, if the coordinator crashes, is a participant able to make progress?

How would the participant know what the decision should be to make progress on?

Answer

A

Yes, the participant can make progress.

The participant would have to have received the decision from the coordinator despite the crash, or learn the decision from other participants

Question 21

Q

In 2PC, following a coordinator crash, under what condition is it safe for a transaction to be committed?

Answer

A

If all participants voted to commit (all are in READY or COMMIT state), then it is safe to commit

Question 22

Q

In 2PC, following a coordinator crash, under what condition is it safe for a transaction to be aborted?

Answer

A

If ANY of the participants are not in the COMMIT or READY state, then it is safe to abort

Question 23

Q

In 2PC, what happens if both a coordinator and a participant crash?

Answer

A

A smarter implementation of 2PC is needed (outside the scope of this course)

Generally, the coordinator or the participant needs to wait for the other party to recover, reload its logs, and continue

Question 24

Q

What is a distributed checkpoint?

Answer

A

A collection of checkpoints taken by numerous processes.

Allows inter-process communication and coordination

Question 25

Q

What is a distributed snapshot in the context of distributed checkpoints?

Answer

A

In a distributed snapshot, if a process P has recorded the receipt of a message, then there should also be a process Q that has recorded the sending of that message

Question 26

Q

Why do we need distributed checkpoints?

Answer

A

If a part of a distributed system or the entire distributed system fails, then the checkpoint is used for recovery

Question 27

Q

For distributed checkpoints, in what case is recovery possible?

Answer

A

Recovery is possible only if the collection of checkpoints taken by individual processes forms a distributed snapshot

Question 28

Q

What is the most recent distributed snapshot known as?

Answer

A

recovery line

We use it in backwards error recovery

Question 29

Q

How do we check if two checkpoints from two processes form an inconsistent collection of checkpoints? (not a distributed snapshot)

Answer

A

We take the two process-local checkpoints in a distributed checkpoint.
We then connect them using a dashed line
If there is a message that is sent on the RIGHT side of the dashed line, but then is received on the LEFT side, then we have a violation of the distributed snapshot

Otherwise, we have a distributed snapshot
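
The dashed-line test can be sketched in code. Assumptions for this sketch: each process numbers its own events with a local integer time, a checkpoint is identified by the local time at which it was taken, and "left of the line" means "at or before the checkpoint". The names `Message` and `is_distributed_snapshot` are made up for the example.

```python
from typing import NamedTuple


class Message(NamedTuple):
    sender: str
    send_time: int   # local time at the sender
    receiver: str
    recv_time: int   # local time at the receiver


def is_distributed_snapshot(checkpoints, messages):
    """checkpoints maps each process name to the local time of its checkpoint."""
    for m in messages:
        sent_after = m.send_time > checkpoints[m.sender]            # right side
        received_before = m.recv_time <= checkpoints[m.receiver]    # left side
        if sent_after and received_before:
            # The receipt is recorded but the send is not: inconsistent.
            return False
    return True
```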

Question 30

Q

In distributed checkpoints, what is the domino effect?

Answer

A

If the most recent checkpoints taken by processes do not provide a recovery line, then successively earlier checkpoints must be considered. This can keep cascading back until the processes’ initial states are reached, leaving the initial states as the most recent distributed snapshot
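
The cascade can be illustrated with a small rollback search. This is a sketch under assumptions: each process has a list of checkpoint times in ascending local time starting with 0 (the initial state), all message times are positive, and whenever a message is recorded as received but not as sent, the receiver is rolled back by one checkpoint. The function name and data layout are hypothetical.

```python
def find_recovery_line(checkpoints, messages):
    """Find the most recent consistent set of checkpoints.

    checkpoints: process -> ascending list of local checkpoint times,
                 starting with 0 for the initial state
    messages: list of (sender, send_time, receiver, recv_time) tuples
    Returns process -> chosen checkpoint time.
    """
    # Start from the most recent checkpoint of every process.
    index = {p: len(times) - 1 for p, times in checkpoints.items()}
    changed = True
    while changed:
        changed = False
        for sender, send_time, receiver, recv_time in messages:
            sent_after = send_time > checkpoints[sender][index[sender]]
            received_before = recv_time <= checkpoints[receiver][index[receiver]]
            if sent_after and received_before:
                # Roll the receiver back one checkpoint and re-check.
                index[receiver] -= 1
                changed = True
    return {p: checkpoints[p][index[p]] for p in checkpoints}
```

With checkpoints `{"P": [0, 4, 8], "Q": [0, 2, 6]}` and messages that alternate between the two processes at local times 1, 3, 5, and 7, each rollback exposes a new violation and both processes end up at their initial states.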

Question 31

Q

The “coordinated checkpointing algorithm” is used to deal with the ______ ______ in distributed checkpoints. It ensures that a ______ _____ is created

Answer

A

domino effect
recovery line (or distributed snapshot)

Question 32

Q

What happens in the 1st phase of the “coordinated checkpointing algorithm”?

Answer

A

Phase 1:
1. The coordinator sends a CHECKPOINT_REQUEST message to all processes
2. A process, upon receiving this message, pauses sending new messages (it temporarily queues all outgoing messages)
3. The process takes a local checkpoint (excluding the messages in the queue, since sending is paused)
4. The process returns an acknowledgement to the coordinator that the checkpoint has been created

Question 33

Q

What happens in the 2nd phase of the “coordinated checkpointing algorithm”?

Answer

A

Phase 2:
1. The coordinator receives acknowledgements from all the processes that their checkpoints have been created
2. The coordinator multicasts CHECKPOINT_DONE to all processes
3. All processes resume sending messages (they release their queued outgoing messages)
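
Both phases can be sketched as a small in-memory simulation. Everything here is an assumption made for illustration: the `Process` class, its fields, and the direct method calls standing in for the CHECKPOINT_REQUEST, acknowledgement, and CHECKPOINT_DONE messages.

```python
class Process:
    def __init__(self, name):
        self.name = name
        self.state = 0          # stand-in for the application state
        self.paused = False
        self.outbox = []        # outgoing messages held back while paused
        self.sent = []          # messages actually put on the network
        self.checkpoint = None

    def send(self, msg):
        if self.paused:
            self.outbox.append(msg)   # queue instead of sending
        else:
            self.sent.append(msg)

    def on_checkpoint_request(self):
        self.paused = True
        self.checkpoint = self.state  # queued messages are not included
        return "ACK"

    def on_checkpoint_done(self):
        self.paused = False
        self.sent.extend(self.outbox)  # release queued messages in order
        self.outbox.clear()


def coordinated_checkpoint(processes):
    """Run both phases; return True once every process has checkpointed."""
    acks = [p.on_checkpoint_request() for p in processes]  # phase 1
    if all(a == "ACK" for a in acks):                       # phase 2
        for p in processes:
            p.on_checkpoint_done()
        return True
    return False
```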

Question 34

Q

How does the “coordinated checkpointing algorithm” ensure that there will always be a distributed snapshot created?

Answer

A

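Every process pauses sending new messages from the moment it receives CHECKPOINT_REQUEST until it receives CHECKPOINT_DONE, and CHECKPOINT_DONE is only sent after all processes have taken their checkpoints. As a result, no message can be sent after its sender’s checkpoint and received before its receiver’s checkpoint, so no message crosses the dashed line connecting the checkpoints across processes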