Long Exam 2 - Distributed Systems Flashcards
What is a distributed system?
A collection of autonomous computing elements (nodes) that appears to its users as a single coherent system.
What is meant by a collection of autonomous nodes?
Each node acts independently and has its own notion of time; there is no global clock. This leads to fundamental synchronization and coordination problems.
What is an overlay network?
Each node in the collection communicates only with other nodes in the system, its neighbors.
What are the two types of overlay networks?
Structured (each node has a well-defined set of neighbors, e.g. in a tree or ring) and unstructured (each node's neighbors are randomly selected other nodes).
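The two neighbor-selection styles above can be contrasted in a small sketch. The function names, the choice of a ring as the structured topology, and the parameter k are assumptions made for illustration only.

```python
import random


def ring_neighbors(node: int, n: int) -> list[int]:
    """Structured overlay: neighbors are fixed by the node's position in a ring of n nodes."""
    return [(node - 1) % n, (node + 1) % n]


def random_neighbors(node: int, n: int, k: int, rng: random.Random) -> list[int]:
    """Unstructured overlay: the node picks k distinct other nodes at random."""
    others = [i for i in range(n) if i != node]
    return rng.sample(others, k)
```

In the ring, every node can compute any other node's neighbors; in the random case, the neighbor set can only be learned by asking the node itself.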
What are the four goals of a distributed system?
- sharing of resources
- distribution transparency
- openness
- scalability
What are the three types of scalability?
- size scalability
- geographical scalability
- administrative scalability
How to design fault-tolerant systems?
- Identify all possible faults
- Detect and contain the fault
- Handle the fault
What does the acronym RAID stand for?
Redundant Array of Inexpensive Disks
What is RAID 1?
- mirroring
- can recover from single-disk failure
- requires 2N disks
What is RAID 4?
- dedicated parity disk
- can recover from single-disk failure
- requires N+1 disks
- performance benefits if you stripe a single file across multiple data disks
- all writes hit the parity disk
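The single-disk recovery above relies on parity. A minimal sketch, assuming the parity block is the bytewise XOR of the data blocks (the standard scheme, though the card does not spell it out); the block contents and the name parity are made up for the example.

```python
from functools import reduce


def parity(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data disks (N = 3) plus one parity disk.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
p = parity(data)

# Suppose disk 1 fails: XOR the surviving data blocks with the parity block.
rebuilt = parity([data[0], data[2], p])
```

Because XOR is its own inverse, the same function computes the parity and rebuilds a lost block.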
What is RAID 5?
- spread out parity
- can recover from single-disk failure
- requires N+1 disks
- performance benefits if you stripe a single file across multiple data disks
- writes are spread across disks
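The difference between a dedicated parity disk and spread-out parity can be sketched as a placement function. The rotation rule below is an assumption chosen for simplicity; real arrays use a variety of layouts.

```python
def parity_disk(stripe: int, num_disks: int, rotate: bool) -> int:
    """Return the index of the disk that holds parity for the given stripe."""
    if rotate:
        # RAID 5 style: parity moves to a different disk on each stripe.
        return (num_disks - 1 - stripe) % num_disks
    # RAID 4 style: parity always lives on the last disk.
    return num_disks - 1


raid4 = [parity_disk(s, 4, rotate=False) for s in range(8)]
raid5 = [parity_disk(s, 4, rotate=True) for s in range(8)]
```

With rotation, parity updates land on every disk in turn, so no single disk absorbs all of the parity writes.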
What is isolation?
An action is isolated if its effect is the same as if it occurred either completely before or completely after every other concurrent action.
What is the golden rule to achieve atomicity?
Never modify the only copy.
How to make renaming shadow copies atomic?
By using single-sector writes.
What is a shadow copy?
A shadow copy is a working copy of a file. Shadow copies work because updates are made to the copy, and the new version is then atomically installed in place of the original using a single atomic operation (a rename).
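A minimal sketch of the shadow-copy pattern, assuming a filesystem where renaming within one directory is atomic (as POSIX specifies). The function name atomic_update is hypothetical.

```python
import os
import tempfile


def atomic_update(path: str, new_contents: bytes) -> None:
    """Write new_contents to a shadow file, then atomically rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the shadow in the same directory so the rename stays on one filesystem.
    fd, shadow = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(new_contents)
            f.flush()
            os.fsync(f.fileno())  # make sure the shadow is on disk before installing it
        os.replace(shadow, path)  # the install step: readers see the old or the new file
    except BaseException:
        if os.path.exists(shadow):
            os.remove(shadow)
        raise
```

If a crash happens before the rename, the original file is untouched; the only cleanup needed is to delete the leftover shadow file.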
What are the shortcomings of shadow copies?
- Hard to generalize to multiple files/directories
- Require copying the entire file for even small changes
- Do not address concurrency
What are transactions?
Transactions provide both atomicity and isolation. Each transaction will appear to have run to completion or not at all. When multiple transactions are run concurrently, it will appear as if they were run sequentially.
What are the three types of records used in a log?
UPDATE records include the old and new values of a variable. COMMIT records specify that a transaction committed. ABORT records specify that a transaction aborted.
What is the drawback of using cell storage for logging?
Writes are acceptable, but each one goes to disk twice (once to the log and once to cell storage) instead of once. Recovery is also slow because the entire log must be scanned.
What is the drawback of using a cache for logging?
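The record types and the full-log recovery scan can be sketched together. This is an undo-only sketch under two assumptions that the cards do not state: every update is installed in cell storage right after it is logged, and an ABORT record is logged only after that transaction's updates have been undone. The Record class and the recover function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Record:
    kind: str                  # "UPDATE", "COMMIT", or "ABORT"
    tid: int                   # transaction id
    var: Optional[str] = None  # only used by UPDATE records
    old: Any = None
    new: Any = None


def recover(log: list[Record], cell: dict[str, Any]) -> None:
    """Scan the whole log backward, undoing updates of unfinished transactions."""
    finished: set[int] = set()
    for rec in reversed(log):
        if rec.kind in ("COMMIT", "ABORT"):
            finished.add(rec.tid)
        elif rec.kind == "UPDATE" and rec.tid not in finished:
            cell[rec.var] = rec.old  # restore the value from before the update
```

The scan visits every record, which is why recovery time grows with the length of the log.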
Recovery takes longer as the log grows. Truncating the log may help by flushing all cached updates to cell storage and writing a checkpoint record.
When do two operations conflict?
Two operations conflict if they operate on the same object and at least one of them is a write.
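The definition above translates almost directly into a predicate. The Op type is a hypothetical representation; the sketch also requires the two operations to come from different transactions, the usual convention, which the card leaves implicit.

```python
from typing import NamedTuple


class Op(NamedTuple):
    tid: int   # id of the transaction issuing the operation
    kind: str  # "read" or "write"
    obj: str   # name of the object being accessed


def conflicts(a: Op, b: Op) -> bool:
    """True if a and b touch the same object and at least one of them writes."""
    same_object = a.obj == b.obj
    one_writes = "write" in (a.kind, b.kind)
    different_txns = a.tid != b.tid  # assumption: conflicts are between transactions
    return same_object and one_writes and different_txns
```

Two reads of the same object never conflict, which is what lets readers share access.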
What is conflict serializability?
A schedule is conflict serializable if the order of all of its conflicts is the same as the order of the conflicts in some sequential schedule.
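One common way to test this property is to build a conflict graph (an edge from Ti to Tj whenever an operation of Ti conflicts with a later operation of Tj) and check that it has no cycle. The sketch below assumes a schedule is a list of hypothetical (tid, kind, obj) tuples with kind "r" or "w".

```python
def is_conflict_serializable(schedule: list[tuple[int, str, str]]) -> bool:
    """Return True if the schedule's conflict graph is acyclic."""
    edges: dict[int, set[int]] = {}
    for i, (t1, k1, o1) in enumerate(schedule):
        edges.setdefault(t1, set())
        for t2, k2, o2 in schedule[i + 1:]:
            # Conflict: different transactions, same object, at least one write.
            if t1 != t2 and o1 == o2 and "w" in (k1, k2):
                edges[t1].add(t2)

    state: dict[int, str] = {}

    def has_cycle(node: int) -> bool:
        """Depth-first search that reports a back edge to a node still being visited."""
        state[node] = "visiting"
        for nxt in edges[node]:
            if state.get(nxt) == "visiting":
                return True
            if nxt not in state and has_cycle(nxt):
                return True
        state[node] = "done"
        return False

    return not any(has_cycle(t) for t in edges if t not in state)
```

A cycle means the conflicts demand both "Ti before Tj" and "Tj before Ti", so no sequential schedule can match them.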
What is two-phase locking?
- Each shared variable has a lock
- Before any operation on a variable, the transaction must acquire the corresponding lock
- After a transaction releases a lock, it may not acquire any other locks
What are the two phases in two-phase locking?
- Acquire phase, where transactions acquire locks. New locks can be acquired but none can be released.
- Release phase, where transactions release locks. Existing locks can be released but no new locks can be acquired.
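The two phases can be enforced with a small wrapper around ordinary locks. This is a sketch, not a full lock manager; the class name TwoPhaseTransaction and its methods are hypothetical.

```python
import threading


class TwoPhaseTransaction:
    """Tracks one transaction's locks and rejects any acquire after the first release."""

    def __init__(self) -> None:
        self.held: list[threading.Lock] = []
        self.releasing = False  # becomes True once the release phase starts

    def acquire(self, lock: threading.Lock) -> None:
        if self.releasing:
            raise RuntimeError("2PL violation: cannot acquire after releasing a lock")
        lock.acquire()
        self.held.append(lock)

    def release(self, lock: threading.Lock) -> None:
        self.releasing = True  # the acquire phase is over for good
        self.held.remove(lock)
        lock.release()
```

Many systems go further and hold every lock until commit or abort, which makes the release phase a single step at the end of the transaction.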
How to address the possibility of deadlocking in two-phase locking?
Take advantage of atomicity and abort one of the deadlocked transactions. Victim selection picks which one to abort, typically avoiding transactions that have been running for a long time.
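Deadlock detection and victim selection might look like the sketch below. It assumes a simplified waits-for map in which each blocked transaction waits for exactly one other transaction, and it picks the most recently started transaction in the cycle as the victim; both the map and the start times are hypothetical inputs.

```python
def find_cycle(waits_for: dict[str, str]) -> list[str]:
    """Return the transactions on some waits-for cycle, or [] if there is none."""
    for start in waits_for:
        path: list[str] = []
        node = start
        while node in waits_for and node not in path:
            path.append(node)
            node = waits_for[node]
        if node in path:  # walked back onto the path: a cycle starts at node
            return path[path.index(node):]
    return []


def choose_victim(cycle: list[str], start_time: dict[str, float]) -> str:
    """Abort the youngest transaction, sparing those that have run the longest."""
    return max(cycle, key=lambda t: start_time[t])
```

Aborting the victim releases its locks, so the remaining transactions in the cycle can proceed and the victim can be retried later.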
What are reader and writer locks?
Multiple transactions can hold reader locks for the same variable at once, but only one transaction can hold a writer lock for a variable, and only while no other transaction holds a reader lock on it.
What are the two phases of two-phase commit?
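The compatibility rules for reader and writer locks can be shown with a non-blocking sketch: each try_ method returns False instead of waiting. The class and method names are hypothetical, and lock upgrades are left out.

```python
class ReaderWriterLock:
    """Per-variable lock: many readers, or one writer, never both."""

    def __init__(self) -> None:
        self.readers: set[int] = set()  # ids of transactions holding a reader lock
        self.writer: int | None = None  # id of the transaction holding the writer lock

    def try_read(self, tid: int) -> bool:
        if self.writer is not None:
            return False  # a writer excludes all readers
        self.readers.add(tid)
        return True

    def try_write(self, tid: int) -> bool:
        if self.writer is not None or self.readers:
            return False  # a writer needs the variable entirely to itself
        self.writer = tid
        return True

    def release(self, tid: int) -> None:
        self.readers.discard(tid)
        if self.writer == tid:
            self.writer = None
```

Sharing reader locks is safe because two reads of the same variable do not conflict.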
Prepare – all tasks should be completed before sending prepare
Commit – all prepares should be ACKed before sending commit
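The two phases can be sketched from the coordinator's point of view. The Worker class, its can_commit flag, and the abort path are assumptions added for illustration; the sketch also ignores message loss and crashes, which a real protocol must handle with logging and retries.

```python
class Worker:
    """A hypothetical participant that either can or cannot commit its part."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "working"

    def prepare(self) -> bool:
        """Phase 1: acknowledge prepare only if this worker's tasks are complete."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(workers: list[Worker]) -> str:
    """Send prepare to every worker; send commit only if every prepare was ACKed."""
    acks = [w.prepare() for w in workers]
    if all(acks):
        for w in workers:
            w.commit()
        return "COMMIT"
    for w in workers:
        w.abort()
    return "ABORT"
```

The commit decision is made only after every prepare has been acknowledged, so a worker never commits while another is still able to refuse.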
What to do if workers fail after commit point?
Recover from the crash.
What is consistency?
All clients see the same data at the same time, no matter which node they connect to.
What is strong consistency?
Whenever data is written to one node, it must be instantly forwarded or replicated to all the other nodes in the system before the write is deemed ‘successful’.
What is the CAP theorem by Eric Brewer?
Any distributed data store can provide only two of the following three guarantees: Consistency, Availability, Partition Tolerance
What is ACID?
Atomicity, consistency, isolation, and durability.
What is a view server?
A view server determines which replica is the primary. Coordinators send their requests to the view server to learn which replica is currently the primary.
What happens if view server fails?
Election for a new view server
What are the six consistency guarantees?