12.13.Fault Tolerance - Transactions Flashcards
What is a characteristic that distinguishes distributed systems from single-machine systems?
Partial failure
What is the goal when partial failure occurs?
Tolerate faults
What is being fault tolerant related to?
Dependability
What is dependability?
The trustworthiness of a computing system, which allows reliance to be justifiably placed on the service it delivers
What are the requirements for Dependability?
- Availability
- Reliability
- Maintainability
- Safety
What does Safety mean?
If and when FAILURES occur the CONSEQUENCES are not catastrophic for the system
What does Availability mean?
the probability that the system operates correctly at ANY GIVEN MOMENT
What does Reliability mean?
LENGTH OF TIME that it can run continuously without failure
What does Maintainability mean?
how EASILY a failed system can be REPAIRED
Different types of failures?
- Crash
- Omission
- Response
- Timing
- Arbitrary (Byzantine)
What is a technique for failure masking?
Redundancy
What types of redundancy are there?
- Physical
- Information (send extra bits to allow for recovery if need be)
- Time (repeat action if need be)
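Time redundancy can be sketched as a retry loop. This is a minimal illustration only: `retry` and `flaky_send` are hypothetical names, and repeating an action is only safe when the action is idempotent.

```python
def retry(action, attempts=3):
    """Time redundancy: repeat `action` until it succeeds or attempts run out.

    Assumes attempts >= 1 and that repeating `action` is harmless.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return action()
        except Exception as exc:  # a real system would catch specific faults
            last_error = exc
    raise last_error


# Hypothetical flaky operation: fails twice, then succeeds.
calls = {"n": 0}


def flaky_send():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("message lost")
    return "ack"


result = retry(flaky_send)
```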
What is one of our most important considerations in failure masking?
Making sure that a failure won’t leave the system in an inconsistent state
How is avoiding leaving the system in an inconsistent state achieved?
Atomic operations!
“The sequence of operations must execute as an ATOMIC operation”
When do concurrent executions not interfere with each other?
If their execution is equivalent to some serial execution (one in which they don’t interleave)
What does the property of Isolation in distributed systems refer to?
Isolated execution (of concurrent applications)
What should the distributed application not violate in order to achieve Consistency?
Database’s integrity constraints
What does durability mean in distributed systems?
Changes to the database are persistent
What concept allows for the enforcement of the ACID properties?
transactions
What is a transaction?
A set of operations that is either fully committed or aborted as a whole. If aborted, no operation in the set takes effect.
What algorithms in the implementation of transactions allow for ISOLATION?
Concurrency control
What do concurrency control algorithms do to ensure ISOLATION?
Ensure execution is equivalent to “serial” execution
What algorithms in the implementation of transactions allow for DURABILITY?
Recovery algorithms
What do recovery algorithms do to ensure DURABILITY?
- replay actions of committed transactions
- undo effects of aborted transactions
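The two recovery steps above can be sketched over a simple log. The log record layout and the name `recover` are assumptions made for illustration, not a specific system's format.

```python
def recover(log, db):
    """Rebuild state after a crash: redo committed work, undo the rest.

    Assumed (hypothetical) record formats:
      ("write", txn, key, old_value, new_value)
      ("commit", txn)
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # Redo pass: replay writes of committed transactions in log order.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _txn, key, _old, new = rec
            db[key] = new

    # Undo pass: roll back writes of uncommitted transactions, newest first.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _txn, key, old, _new = rec
            db[key] = old
    return db


# T1 committed but its write never reached disk; T2 did not commit
# but its write did reach disk before the crash.
log = [
    ("write", "T1", "x", 1, 5),
    ("commit", "T1"),
    ("write", "T2", "y", 2, 99),
]
recovered = recover(log, {"x": 1, "y": 99})
```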
Two ways to improve on lock-based concurrency control?
- Optimistic concurrency control (transaction executed normally, checked at commit, aborted if problematic)
- Timestamp ordering (operations in transactions validated when carried out)
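Optimistic concurrency control can be sketched with per-key version numbers. This is a single-threaded illustration; `Store` and `Transaction` are hypothetical names, and a real system would need the validate-and-write step in `commit` to be atomic.

```python
class Store:
    def __init__(self):
        self.data = {}     # key -> value
        self.version = {}  # key -> number of committed writes


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # key -> version seen at first read
        self.writes = {}         # buffered writes, invisible until commit

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        self.read_versions.setdefault(key, self.store.version.get(key, 0))
        return self.store.data.get(key)

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Validation: abort if anything we read has changed since we read it.
        for key, seen in self.read_versions.items():
            if self.store.version.get(key, 0) != seen:
                return False  # aborted
        # No conflict: make the buffered writes visible.
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.version.get(key, 0) + 1
        return True


store = Store()
setup = Transaction(store)
setup.write("balance", 100)
setup.commit()

t1, t2 = Transaction(store), Transaction(store)
t1.write("balance", t1.read("balance") - 10)
t2.write("balance", t2.read("balance") - 20)
t1_ok = t1.commit()  # first committer passes validation
t2_ok = t2.commit()  # read a stale version, so it is aborted
```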
Two ways to do recovery when transaction needs to be aborted?
- Backward (roll back through state checkpoints to a previously correct state)
- Forward (move to a correct new state)
What problem arises when trying to make transactions where more than one server is involved?
The distributed commit problem (ATOMICITY)
Either all servers commit or all abort
Protocol to support distributed transactions? (more than one server)
- pick coordinator
- client communicates transaction to coordinator
- One- or two-phase commit (coordinator communicates the commit or abort of the transaction to the servers)
What is the difference between a one phase commit and a two phase commit when dealing with distributed transactions?
In a two-phase commit the servers first vote on whether they can commit, and only then carry out the coordinator's decision (rather than just receiving the command)
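The vote-then-decide structure of two-phase commit can be sketched as follows. This is an in-process illustration with hypothetical `Participant` and `two_phase_commit` names; it ignores message loss, timeouts, and coordinator failure.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "working"

    def prepare(self):
        # Phase 1: vote. A "yes" vote is a promise to commit if asked.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: the coordinator collects a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if all voted yes, otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


all_yes = [Participant("a"), Participant("b")]
one_no = [Participant("a"), Participant("b", can_commit=False)]
outcome_yes = two_phase_commit(all_yes)
outcome_no = two_phase_commit(one_no)
```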
What are the drawbacks of the 2-phase commit?
- Coordinator can fail (three-phase commit and multicast?)
- participants must trust coordinator
- transactions must be short
- distributed deadlock risk
How can deadlock be resolved?
By aborting one of the transactions
When does deadlock occur?
When there is a cycle in the wait-for graph of transactions for locks
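Checking a wait-for graph for a cycle can be sketched with a depth-first search. The dictionary representation and the name `find_deadlock` are assumptions for illustration; the graph here is assumed to be fully known in one place.

```python
def find_deadlock(wait_for):
    """Return a list of transactions forming a cycle, or None.

    wait_for maps each transaction to the transactions whose locks it waits on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {}
    path = []

    def visit(txn):
        colour[txn] = GREY
        path.append(txn)
        for other in wait_for.get(txn, ()):
            state = colour.get(other, WHITE)
            if state == GREY:
                # Reached a transaction already on the current path: a cycle.
                return path[path.index(other):]
            if state == WHITE:
                cycle = visit(other)
                if cycle:
                    return cycle
        path.pop()
        colour[txn] = BLACK
        return None

    for txn in list(wait_for):
        if colour.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None


deadlocked = find_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]})
no_deadlock = find_deadlock({"T1": ["T2"], "T2": []})
```

Aborting any one transaction on the returned cycle breaks the deadlock.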
What complicates detecting deadlock in a distributed system?
- Locks are held on different servers
- loop in the entire wait-for graph will not be apparent to any one server
One (bad) way of detecting distributed deadlock?
Coordinator stores entire wait-for graph (central point of failure)
What is a better way to detect distributed deadlock?
Edge chasing (Path pushing)
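The idea behind edge chasing can be sketched as a probe forwarded along wait-for edges, with deadlock declared if the probe returns to the transaction that started it. This is a simplified single-process simulation, not a full protocol: `edge_chase` and the `servers` layout are hypothetical, and each lookup consults only one server's local edges to mimic the lack of a global graph.

```python
from collections import deque


def edge_chase(initiator, servers):
    """Return True if a probe started by `initiator` comes back to it.

    servers maps a server name to its local wait-for edges:
    {txn: [transactions it waits for on that server]}.
    """

    def local_edges(txn):
        # Each server answers only from its own local edges.
        for edges in servers.values():
            if txn in edges:
                yield from edges[txn]

    probes = deque((initiator, nxt) for nxt in local_edges(initiator))
    seen = set()
    while probes:
        origin, current = probes.popleft()
        if current == origin:
            return True  # probe returned to its initiator: deadlock
        if current in seen:
            continue  # already forwarded a probe through this transaction
        seen.add(current)
        for nxt in local_edges(current):
            probes.append((origin, nxt))
    return False


cyclic = {"X": {"T1": ["T2"]}, "Y": {"T2": ["T3"]}, "Z": {"T3": ["T1"]}}
acyclic = {"X": {"T1": ["T2"]}, "Y": {"T2": ["T3"]}}
found = edge_chase("T1", cyclic)
not_found = edge_chase("T1", acyclic)
```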