15. Distributed Transactions Flashcards
How is distributed locking commonly implemented in databases?
Since every node contains data that is independent of any other node’s data, every node can maintain its own local lock table. Coarser-grained locks on entire tables or on the whole database can either be given to all nodes that hold a partition of the table or be managed centrally at a predetermined node. This design keeps locking simple: two-phase locking (2PL) is performed at every node using local locks, which guarantees serializability between different transactions.
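A minimal sketch of per-node lock tables, assuming hash partitioning of keys across nodes, only shared (S) and exclusive (X) row locks, and strict 2PL (all locks released together at commit or abort). The names `LocalLockTable`, `owner_node` and `lock` are hypothetical, and coarse-grained table or database locks are not modeled. The point it illustrates: a lock request only ever touches the lock table of the node that stores the key.

```python
import zlib
from collections import defaultdict


class LocalLockTable:
    """Lock table kept by one node; it only covers keys stored on that node."""

    def __init__(self):
        self.shared = defaultdict(set)  # key -> txn ids holding S locks
        self.exclusive = {}             # key -> txn id holding the X lock

    def acquire(self, txn, key, mode):
        """Grant an S or X lock; return False if txn would have to wait."""
        holder = self.exclusive.get(key)
        if holder is not None and holder != txn:
            return False                # another txn's X lock conflicts with S and X
        if mode == "S":
            if holder != txn:           # our own X lock already covers reads
                self.shared[key].add(txn)
            return True
        if self.shared[key] - {txn}:    # X conflicts with other txns' S locks
            return False
        self.shared[key].discard(txn)   # upgrade S -> X if we held S
        self.exclusive[key] = txn
        return True

    def release_all(self, txn):
        """Strict 2PL: drop every lock txn holds on this node at commit/abort."""
        for holders in self.shared.values():
            holders.discard(txn)
        for key in [k for k, t in self.exclusive.items() if t == txn]:
            del self.exclusive[key]


def owner_node(key, num_nodes):
    """Hypothetical hash partitioning: the node storing a key owns its locks."""
    return zlib.crc32(key.encode()) % num_nodes


# A 3-node cluster: each node keeps its own independent lock table.
nodes = [LocalLockTable() for _ in range(3)]


def lock(txn, key, mode):
    # Route the request to the lock table of the node that stores `key`.
    return nodes[owner_node(key, len(nodes))].acquire(txn, key, mode)
```

No node ever consults another node's table, which is what keeps the design simple; the price is that deadlocks can span nodes.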
When dealing with locking, deadlock is always a possibility. Because a transaction can be blocked by transactions executing on other nodes, a deadlock need not appear as a cycle in any single node’s waits-for graph. To detect it, the waits-for graphs of all nodes must be unioned into one global graph, which is then checked for cycles.
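A minimal sketch of global deadlock detection, assuming each node reports its local waits-for graph as a dict mapping a transaction ID to the set of transactions it waits for. `union_waits_for` and `has_cycle` are hypothetical names. The example builds a deadlock that is invisible on each node alone and only shows up after the union.

```python
def union_waits_for(local_graphs):
    """Merge per-node waits-for graphs (txn -> set of txns it waits for)."""
    global_graph = {}
    for graph in local_graphs:
        for waiter, holders in graph.items():
            global_graph.setdefault(waiter, set()).update(holders)
    return global_graph


def has_cycle(graph):
    """Depth-first search with white/gray/black marking.

    Reaching a gray node again means we followed an edge back into the
    current DFS path, i.e. the graph contains a cycle (a deadlock).
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GRAY
        for nxt in graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


if __name__ == "__main__":
    # Node A: T1 waits for T2. Node B: T2 waits for T1.
    node_a = {"T1": {"T2"}}
    node_b = {"T2": {"T1"}}
    print(has_cycle(node_a), has_cycle(node_b))           # False False
    print(has_cycle(union_waits_for([node_a, node_b])))  # True
```

Neither local graph has a cycle, so purely local detection would let both transactions wait forever.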
What is consensus in distributed databases?
How is it commonly implemented?
In a distributed database, consensus is the idea that all nodes agree on one course of action.
Consensus is commonly implemented through two-phase commit (2PC), which enforces the property that all nodes maintain the same view of the data.
2PC provides this guarantee by ensuring that a distributed transaction either commits on all nodes involved or aborts on all of them. If consensus is not enforced, some nodes may commit the transaction while others abort it, leaving nodes with views of the data from different points in time.
Two-phase commit - 2 main types of nodes
1 coordinator and many participants
Two-phase commit - names of phases
- preparation phase
- commit/abort phase
Two-phase commit - preparation phase, describe steps
- Coordinator sends a prepare message to the participants, asking each one to either prepare to commit or abort
- Participants generate a prepare record (or an abort record) and flush the record to disk
- Participants send a yes vote to the coordinator if the prepare record was flushed, or a no vote if the abort record was flushed
- Coordinator generates a commit record if it receives unanimous yes votes, or an abort record otherwise, and flushes the record to disk
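The preparation phase can be sketched as a single-process model in which messages are plain function calls and "flushing" means appending to a list. The `Node` class, its `can_commit` flag and the function names are hypothetical; storing the coordinator ID in the prepare record and the participant IDs in the decision record mirrors the recovery cards in this deck. Network failures and timeouts are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    can_commit: bool = True                  # hypothetical: did local work succeed?
    log: list = field(default_factory=list)  # stands in for the on-disk log

    def flush(self, record):
        # Assumption: appending models "write the record and force it to disk".
        self.log.append(record)


def participant_prepare(p, coordinator_id):
    """Phase 1 at a participant: flush a prepare or abort record, then vote."""
    if p.can_commit:
        p.flush(("prepare", coordinator_id))  # prepare record names the coordinator
        return "YES"                          # the vote is sent only after the flush
    p.flush(("abort", coordinator_id))
    return "NO"


def coordinator_phase1(coord, participants):
    """Send PREPARE to all participants, collect votes, flush the decision."""
    votes = [participant_prepare(p, coord.name) for p in participants]
    decision = "commit" if all(v == "YES" for v in votes) else "abort"
    # The decision record lists the participant IDs (used during recovery).
    coord.flush((decision, tuple(p.name for p in participants)))
    return decision
```

A single no vote is enough to make the coordinator flush an abort record.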
Two-phase commit - commit/abort phase, describe steps
- Coordinator broadcasts (sends a message to every participant) the commit/abort decision it flushed at the end of the preparation phase
- Participants generate a commit or abort record based on the received decision message and flush the record to disk
- Participants send an ACK (acknowledgment) message to the coordinator
- Coordinator generates an end record once all ACKs are received; the end record does not have to be flushed immediately
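A minimal sketch of the commit/abort phase, assuming a toy `Log` class that separates records already forced to disk from records only written to the in-memory tail. `Log`, `coordinator_phase2` and the `reachable` parameter (which participants' messages get through in this round) are hypothetical. It shows that a participant forces its record before sending an ACK, and that the coordinator writes the end record, without forcing it, only once every participant has acknowledged.

```python
class Log:
    """Toy log with a volatile tail (buffer) and a durable part."""

    def __init__(self):
        self.durable = []  # records already forced to disk
        self.buffer = []   # records written but not yet forced

    def write(self, record):
        self.buffer.append(record)

    def force(self, record):
        # Forcing a record also forces everything written before it.
        self.buffer.append(record)
        self.durable.extend(self.buffer)
        self.buffer.clear()


def coordinator_phase2(coord_log, decision, participant_logs, reachable,
                       coordinator_id="C"):
    """Broadcast the decision, collect ACKs, and write the end record.

    participant_logs maps participant name -> Log; `reachable` is the set of
    participants whose messages get through in this round.
    """
    acks = set()
    for name, plog in participant_logs.items():
        if name not in reachable:
            continue  # no ACK from this participant: the decision must be resent later
        plog.force((decision, coordinator_id))  # participant flushes, then ACKs
        acks.add(name)
    if acks == set(participant_logs):
        coord_log.write(("end",))  # end record is written lazily, not forced
        return True
    return False
```

If an ACK is missing, the coordinator keeps no end record and must resend the decision, which is why a recovering participant that sees a commit record simply sends its ACK again.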
Depict the scheme of two-phase commit
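One way to depict the scheme is as an ordered trace of log writes and messages between a coordinator `C` and its participants. The function name, participant names and message labels are illustrative assumptions; failures and timeouts are not modeled, and the coordinator broadcasts the decision to every participant, including any that voted no.

```python
def two_phase_commit_trace(participants, votes):
    """Ordered log forces and messages for one 2PC round (no failures).

    participants: list of participant names; votes: name -> True (YES) / False (NO).
    """
    trace = [f"C -> {p}: PREPARE" for p in participants]
    # Phase 1: each participant forces its record before it votes.
    for p in participants:
        record, vote = ("prepare", "YES") if votes[p] else ("abort", "NO")
        trace.append(f"{p}: force {record} record")
        trace.append(f"{p} -> C: {vote}")
    decision = "commit" if all(votes[p] for p in participants) else "abort"
    trace.append(f"C: force {decision} record")
    # Phase 2: broadcast the decision; each participant forces it, then ACKs.
    for p in participants:
        trace.append(f"C -> {p}: {decision.upper()}")
        trace.append(f"{p}: force {decision} record")
        trace.append(f"{p} -> C: ACK")
    trace.append("C: write end record (not forced)")
    return trace


if __name__ == "__main__":
    for line in two_phase_commit_trace(["P1", "P2"], {"P1": True, "P2": True}):
        print(line)
```

Reading the trace top to bottom gives the usual picture: PREPARE fans out, votes fan in, the decision fans out, ACKs fan in, and the end record closes the transaction.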
Two-phase commit, what will happen if:
Participant is recovering, and sees no prepare record.
– This probably means that the participant has not even started 2PC yet – and if it has, it hasn’t yet sent out any vote messages (since votes are sent after flushing the log record to disk).
– Since it has not sent out any vote messages, it aborts the transaction locally. No messages need to be sent out (the participant has no knowledge of the coordinator ID).
Two-phase commit, what will happen if:
Participant is recovering, and sees a prepare record.
– A lot of things could have happened between logging the prepare record and crashing – for instance, we don’t even know if we managed to send out our YES vote!
– Specifically, we don’t know whether or not the coordinator made a commit decision. So the participant node’s recovery process must ask the coordinator whether a commit happened (“Did the coordinator log a commit?”). The coordinator can be determined from the coordinator ID stored in the prepare log record.
– The coordinator will respond with the commit/abort decision, and the participant resumes 2PC from phase 2.
Two-phase commit, what will happen if:
Coordinator is recovering, and sees no commit record.
– The coordinator crashed at some point before receiving the votes of all participants and logging a commit decision.
– The coordinator will abort the transaction locally. No messages need to be sent out (the coordinator has no knowledge of the participant IDs involved in the transaction).
– If the coordinator later receives an inquiry from a participant about the status of the transaction, it responds that the transaction aborted.
Two-phase commit, what will happen if:
Coordinator is recovering, and sees a commit record.
– We’d like to commit, but we don’t know if we managed to tell the participants.
– So, rerun phase 2 (send out commit messages to participants). The participants can be determined from the participant IDs stored in the commit log record.
Two-phase commit, what will happen if:
Participant is recovering, and sees a commit record.
– We did all our work for this commit, but the coordinator might still be waiting for our ACK, so send ACK to coordinator. (The coordinator can be determined from the coordinator ID stored in the commit log record.)
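The three participant-side recovery cases above (no prepare record, a prepare record, a commit record) can be collected into one decision function. This is a sketch under the assumption that the log for one transaction is a list of `(record_type, coordinator_id)` tuples that reached disk; `recover_participant` is a hypothetical name, and only the cases covered by these cards are modeled.

```python
def recover_participant(log):
    """Pick a recovering participant's action for one transaction.

    `log` holds (record_type, coordinator_id) tuples that reached disk.
    """
    found = dict(log)  # record_type -> coordinator_id
    if "commit" in found:
        # Work is done, but the coordinator may still be waiting for our ACK.
        return ("send ACK", found["commit"])
    if "prepare" in found:
        # We may have voted yes; only the coordinator knows the outcome.
        return ("ask coordinator for decision", found["prepare"])
    # No prepare record: no vote was sent, so abort without sending messages.
    return ("abort locally", None)
```

The checks run from the latest possible record to the earliest, so a log containing both a prepare and a commit record is treated as the commit case.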
Two-phase commit, what will happen if:
Coordinator is recovering, and sees an end record.
– This means that everybody already finished the transaction and there is no recovery to do.
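The coordinator-side cases above (no commit record, a commit record, an end record) fit the same pattern. A minimal sketch, assuming the coordinator's log for one transaction is a list of `(record_type, participant_ids)` tuples that reached disk; `recover_coordinator` is a hypothetical name.

```python
def recover_coordinator(log):
    """Pick a recovering coordinator's action for one transaction.

    `log` holds (record_type, participant_ids) tuples that reached disk.
    """
    found = dict(log)  # record_type -> participant IDs
    if "end" in found:
        # Everybody already finished the transaction.
        return ("nothing to do", ())
    if "commit" in found:
        # Rerun phase 2: resend COMMIT to the participants named in the record.
        return ("resend COMMIT", found["commit"])
    # No commit record: abort; any later inquiry gets the answer ABORT.
    return ("abort locally", ())
```

Only a commit record forces the coordinator to contact participants; every other case is resolved locally or by answering inquiries.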
Two-phase commit with presumed abort - explain
It turns out that two-phase commit still works if:
- Everybody assumes that the absence of log records means abort
- Abort records never have to be flushed: not in phase 1 or phase 2, not by the participant or the coordinator.
This optimization is called presumed abort
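The difference presumed abort makes to flushing can be sketched as a small policy function, plus the coordinator's answer to an inquiry. Both names are hypothetical. The policy follows the cards: prepare and commit records are always flushed, end records are written lazily, and abort records are flushed only without presumed abort.

```python
def must_force(record_type, presumed_abort):
    """Must this 2PC log record be forced to disk before the node continues?"""
    if record_type == "end":
        return False  # end records can be written lazily
    if record_type == "abort":
        return not presumed_abort  # a missing record already means abort
    return True  # prepare and commit records are always forced


def answer_inquiry(coordinator_records):
    """Coordinator's reply to a participant asking about a transaction."""
    # With presumed abort, "no commit record" covers both an unforced abort
    # record lost in a crash and a transaction that was never decided.
    return "COMMIT" if "commit" in coordinator_records else "ABORT"
```

Skipping the abort-record flush is safe because losing that record leads to the same answer, ABORT, that the record itself would have given.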
Two-phase commit, what will happen (with and without presumed abort) if:
Participant is recovering, and sees no phase 1 record (neither a prepare nor an abort record).
– Without presumed abort: This probably means that the participant has not even started 2PC yet – and if it has, it hasn’t yet sent out any vote messages (since votes are sent after flushing the log record to disk).
– With presumed abort: It is possible that the participant decided to abort and sent a “no” vote to the coordinator before the crash.
– With or without presumed abort, the participant aborts the transaction locally. No messages need to be sent out (the participant has no knowledge of the coordinator ID).