Long Exam 2 - Distributed Systems Flashcards

1
Q

What is a distributed system?

A

A collection of autonomous computing elements (nodes) that appears to its users as a single coherent system.

2
Q

What is meant by a collection of autonomous nodes?

A

Each node is autonomous and has its own notion of time; there is no global clock. This leads to fundamental synchronization and coordination problems.

3
Q

What is an overlay network?

A

A logical network in which each node in the collection communicates only with other nodes in the system (its neighbors).

4
Q

What are the two types of overlay networks?

A

Structured (each node has a well-defined set of neighbors, e.g. arranged in a tree or ring) and unstructured (each node connects to randomly selected other nodes).

5
Q

What are the four goals of a distributed system?

A
  • sharing of resources
  • distribution transparency
  • openness
  • scalability
6
Q

What are the three types of scalability?

A
  • size scalability
  • geographical scalability
  • administrative scalability
7
Q

How do you design fault-tolerant systems?

A
  1. Identify all possible faults
  2. Detect and contain the fault
  3. Handle the fault
8
Q

What does the acronym RAID stand for?

A

Redundant Array of Inexpensive Disks

9
Q

What is RAID 1?

A
  • mirroring
  • can recover from single-disk failure
  • requires 2N disks
10
Q

What is RAID 4?

A
  • dedicated parity disk
  • can recover from single-disk failure
  • requires N+1 disks
  • performance benefits if you stripe a single file across multiple data disks
  • all writes hit the parity disk
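
A minimal Python sketch of how a parity disk allows recovery from a single-disk failure. It assumes the parity is the bytewise XOR of the data blocks in a stripe, which is the usual scheme; the helper `xor_blocks` and the three sample blocks are hypothetical names and values chosen for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe spread over three hypothetical data disks.
data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]

# The parity disk stores the XOR of the data blocks in the stripe.
parity = xor_blocks(data)

# Suppose disk 1 fails: XOR the surviving data blocks with the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

Because every data write also changes the parity block, a single dedicated parity disk sees every write, which is the bottleneck the last bullet refers to.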
11
Q

What is RAID 5?

A
  • spread out parity
  • can recover from single-disk failure
  • requires N+1 disks
  • performance benefits if you stripe a single file across multiple data disks
  • writes are spread across disks
12
Q

What is isolation?

A

An isolated action appears to occur either completely before or completely after the actions of every other concurrent thread.

13
Q

What is the golden rule to achieve atomicity?

A

Never modify the only copy.

14
Q

How to make renaming shadow copies atomic?

A

By using single-sector writes.

15
Q

What is a shadow copy?

A

A shadow copy is a working copy of the data. Shadow copies work because updates are made to the copy, and the new version is then installed atomically using a single atomic operation.
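
A minimal Python sketch of the shadow-copy pattern, assuming a POSIX-style file system where renaming within one directory is atomic. The function name `atomic_write` is hypothetical; `os.replace`, `os.fsync`, and `tempfile.mkstemp` are standard-library calls.

```python
import os
import tempfile

def atomic_write(path, data):
    """Update `path` through a shadow copy instead of modifying it in place."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the shadow copy in the same directory so the rename stays on one file system.
    fd, shadow = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the shadow copy is on disk first
        os.replace(shadow, path)  # install the new version in one atomic step
    except BaseException:
        # On failure the original file is untouched; discard the shadow copy.
        if os.path.exists(shadow):
            os.remove(shadow)
        raise
```

A crash before `os.replace` leaves the old file intact, and a crash after it leaves the new file in place, so a reader never observes a half-written version.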

16
Q

What are the shortcomings of shadow copies?

A
  • Hard to generalize to multiple files/directories
  • Require copying the entire file for even small changes
  • Do not address concurrency at all
17
Q

What are transactions?

A

Transactions provide both atomicity and isolation. Each transaction will appear to have run to completion or not at all. When multiple transactions are run concurrently, it will appear as if they were run sequentially.

18
Q

What are the three types of records used in a log?

A

UPDATE records include the old and new values of a variable. COMMIT records specify that a transaction committed. ABORT records specify that a transaction aborted.
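
A minimal Python sketch of recovery from such a log. The tuple layout `(kind, transaction_id, variable, old, new)` and the function name `recover` are assumptions made for illustration; the sketch rebuilds state by redoing only the updates of committed transactions.

```python
def recover(log):
    """Rebuild variable values from a log, keeping only committed transactions."""
    # First pass: find every transaction that has a COMMIT record.
    committed = {tid for kind, tid, *_ in log if kind == "COMMIT"}

    # Second pass: redo the UPDATE records of committed transactions, in log order.
    state = {}
    for kind, tid, *rest in log:
        if kind == "UPDATE" and tid in committed:
            variable, _old, new = rest
            state[variable] = new
    return state

log = [
    ("UPDATE", 1, "A", 100, 70),
    ("UPDATE", 1, "B", 50, 80),
    ("COMMIT", 1),
    ("UPDATE", 2, "A", 70, 40),  # transaction 2 never committed (crash or abort)
]
print(recover(log))
```

Updates from transactions that aborted or were still running at the crash are skipped, so they appear never to have happened.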

19
Q

What is the drawback of using cell storage for logging?

A

Writes are acceptable, but each update is written to disk twice (once to the log and once to cell storage) instead of once. Recovery is also slow because we have to scan the entire log.

20
Q

What is the drawback of using a cache for logging?

A

Recovery takes longer as the log grows. Truncating the log helps: flush all cached updates to cell storage, then write a checkpoint record.

21
Q

When do two operations conflict?

A

Two operations conflict if they operate on the same object and at least one of them is a write.

22
Q

What is conflict serializability?

A

A schedule is conflict serializable if the order of all of its conflicts is the same as the order of the conflicts in some sequential schedule.
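
A minimal Python sketch of one common way to test this: build a conflict graph with an edge from Ti to Tj whenever an operation of Ti conflicts with a later operation of Tj, then check that the graph has no cycle. The schedule format `(transaction, "R" or "W", object)` and the function name are assumptions for illustration.

```python
def conflict_serializable(schedule):
    """Return True if the schedule's conflict graph is acyclic."""
    # Edge t1 -> t2: an operation of t1 conflicts with a later operation of t2.
    edges = {}
    for i, (t1, op1, obj1) in enumerate(schedule):
        for t2, op2, obj2 in schedule[i + 1:]:
            if t1 != t2 and obj1 == obj2 and "W" in (op1, op2):
                edges.setdefault(t1, set()).add(t2)

    visiting, done = set(), set()

    def has_cycle(node):
        # Depth-first search; meeting a node already on the current path means a cycle.
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(has_cycle(nxt) for nxt in edges.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return not any(has_cycle(t) for t in list(edges))

interleaved = [("T1", "R", "x"), ("T2", "W", "x"), ("T1", "W", "x")]
serial_like = [("T1", "R", "x"), ("T1", "W", "x"), ("T2", "R", "x"), ("T2", "W", "x")]
print(conflict_serializable(interleaved), conflict_serializable(serial_like))
```

In `interleaved`, T1 must come before T2 (read then write of x) and T2 before T1 (write then write of x), so no sequential order matches.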

23
Q

What is two-phase locking?

A
  1. Each shared variable has a lock
  2. Before any operation on a variable, the transaction must acquire the corresponding lock
  3. After a transaction releases a lock, it may not acquire any other locks
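
A minimal Python sketch of these three rules, assuming one `threading.Lock` per shared variable. The class name `TwoPhaseTransaction` and its methods are hypothetical; the sketch only enforces the rule that no lock is acquired after the first release and does not handle deadlock.

```python
import threading

class TwoPhaseTransaction:
    """Tracks the locks a transaction holds and enforces the two-phase rule."""

    def __init__(self, lock_table):
        self.lock_table = lock_table  # rule 1: variable name -> its lock
        self.held = []
        self.releasing = False        # becomes True after the first release

    def acquire(self, variable):
        # Rule 3: once any lock has been released, no new lock may be acquired.
        if self.releasing:
            raise RuntimeError("2PL violation: acquire after release")
        self.lock_table[variable].acquire()
        self.held.append(variable)

    def access(self, variable):
        # Rule 2: the lock must be held before operating on the variable.
        if variable not in self.held:
            raise RuntimeError("lock not held for " + variable)

    def release(self, variable):
        self.releasing = True
        self.held.remove(variable)
        self.lock_table[variable].release()
```

For example, a transaction that acquires x and y, releases x, and then tries to acquire z raises `RuntimeError` on the last call.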
24
Q

What are two phases in two-phase locking?

A
  • Acquire phase: the transaction acquires locks. New locks can be acquired, but none can be released.
  • Release phase: the transaction releases locks. Existing locks can be released, but no new locks can be acquired.
25
Q

How to address the possibility of deadlocking in two-phase locking?

A

Take advantage of atomicity: abort one of the deadlocked transactions, chosen by victim selection, which typically avoids transactions that have been running for a long time.

26
Q

What are reader and writer locks?

A

Multiple transactions can hold reader locks for the same variable at once, but only one transaction can hold a writer lock for a variable.
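
A minimal Python sketch of these compatibility rules. The class `ReadWriteLockTable` is hypothetical and non-blocking: each call returns whether the lock was granted, and it additionally assumes a writer lock excludes readers from other transactions.

```python
class ReadWriteLockTable:
    """Grants reader and writer locks per variable without blocking."""

    def __init__(self):
        self.readers = {}  # variable -> set of transactions holding a reader lock
        self.writer = {}   # variable -> the transaction holding the writer lock

    def try_read(self, txn, variable):
        # A reader lock is refused only if another transaction holds the writer lock.
        if self.writer.get(variable, txn) != txn:
            return False
        self.readers.setdefault(variable, set()).add(txn)
        return True

    def try_write(self, txn, variable):
        # A writer lock needs no other writer and no other readers.
        if self.writer.get(variable, txn) != txn:
            return False
        if self.readers.get(variable, set()) - {txn}:
            return False
        self.writer[variable] = txn
        return True
```

With this table, T1 and T2 can both read x, but T2's attempt to write x is refused while T1 still holds its reader lock.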

27
Q

What are the two phases of two-phase commit?

A

Prepare – the coordinator sends prepare only after all tasks have been completed.
Commit – the coordinator sends commit only after every prepare has been ACKed.
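
A minimal Python sketch of the two phases from the coordinator's point of view. The `Worker` class, its `can_commit` flag, and the function `two_phase_commit` are hypothetical; the sketch runs in one process and ignores message loss, crashes, and logging.

```python
class Worker:
    """A participant that votes in the prepare phase and then obeys the decision."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "working"

    def prepare(self):
        # ACK the prepare only if this worker is able to commit its part.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(workers):
    # Phase 1 (prepare): ask every worker and collect the ACKs.
    acks = [worker.prepare() for worker in workers]

    # Phase 2 (commit): send commit only if every prepare was ACKed.
    if all(acks):
        for worker in workers:
            worker.commit()
        return "COMMIT"
    for worker in workers:
        worker.abort()
    return "ABORT"
```

If any single worker refuses to prepare, every worker aborts, so the transaction takes effect on all workers or on none.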

28
Q

What should happen if workers fail after the commit point?

A

The workers recover from the crash; because the commit point has passed, the transaction must still commit.

29
Q

What is consistency?

A

All clients see the same data at the same time, no matter which node they connect to.

30
Q

What is strong consistency?

A

Whenever data is written to one node, it must be instantly forwarded or replicated to all the other nodes in the system before the write is deemed ‘successful’.

31
Q

What is the CAP theorem by Eric Brewer?

A

Any distributed data store can provide only two of the following three guarantees: consistency, availability, and partition tolerance.

32
Q

What is ACID?

A

Atomicity, consistency, isolation, and durability.

33
Q

What is a view server?

A

A view server determines which replica is the primary. Coordinators send their requests to the view server to learn which replica is currently the primary.

34
Q

What happens if the view server fails?

A

A new view server is elected.

35
Q

What are the six consistency guarantees?

A