Fault Tolerance - Week 3 Flashcards

Question 1

Q

Fault tolerance

Answer

A

The ability of a system to continue operating in an acceptable way when a (partial) failure occurs.

Question 2

Q

Types of Failure

Answer

A

Omission Failures
Timing Failures
Response Failures
Arbitrary (byzantine) failures
Crashes

Question 3

Q

Omission Failures

Answer

A

Server fails to respond to incoming messages. Two forms:
- Receive omission: server fails to receive incoming messages
- Send omission: server fails to send messages

Question 4

Q

Timing failures

Answer

A

Server fails to respond within a certain time

Question 5

Q

Response failures

Answer

A

A server’s response is incorrect

Question 6

Q

Arbitrary (byzantine) failures

Answer

A

A component produces output it should never have produced (and which may not be detected as incorrect): arbitrary responses at arbitrary times

Question 7

Q

Crashes

Answer

A

Server halts

Question 8

Q

Fault-tolerance / Failure masking - through redundancy

Answer

A

Physical redundancy
Time redundancy
Information redundancy

Question 9

Q

Physical redundancy

Answer

A

Having a backup server (no definition given in the slides)

Question 10

Q

Time redundancy

Answer

A

An action is performed, if need be, again and again.
Especially helpful when faults are transient and intermittent
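
A minimal sketch of time redundancy as a retry loop. The names `TransientError`, `retry` and `make_flaky` are hypothetical and only serve the illustration; a real system would also need to ensure the action is safe to repeat.

```python
import time


class TransientError(Exception):
    """Hypothetical error type standing in for a temporary fault."""


def retry(action, attempts=3, delay=0.0):
    """Call action() until it succeeds or the attempts run out."""
    last_error = None
    for _ in range(attempts):
        try:
            return action()
        except TransientError as err:
            # The fault may be transient, so wait and try again.
            last_error = err
            time.sleep(delay)
    raise last_error


def make_flaky(failures):
    """Return an action that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def action():
        if state["left"] > 0:
            state["left"] -= 1
            raise TransientError("temporary fault")
        return "ok"

    return action
```

Retrying masks a fault only if it eventually goes away; a permanent fault still surfaces once the attempts are exhausted.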

Question 11

Q

Information redundancy

Answer

A

e.g. Send extra bits when transmitting information to allow recovery
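
One possible illustration of "extra bits that allow recovery" is an XOR parity block, sketched below. This is an assumed example scheme, not one named in the slides; the function names are hypothetical, and it only recovers a single lost block of known position.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(blocks):
    """Append one parity block (the redundant information) to the data."""
    return list(blocks) + [xor_blocks(blocks)]


def recover(received):
    """Rebuild the data blocks when at most one block (marked None) was lost."""
    missing = [i for i, block in enumerate(received) if block is None]
    if len(missing) > 1:
        raise ValueError("a single parity block can recover only one loss")
    received = list(received)
    if missing:
        present = [block for block in received if block is not None]
        # XOR of all surviving blocks equals the lost block.
        received[missing[0]] = xor_blocks(present)
    return received[:-1]  # drop the parity block
```

For example, after `sent = encode([b"abcd", b"efgh"])` and losing `sent[0]`, `recover` rebuilds the original two blocks from the remaining block and the parity.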

Question 12

Q

Two generals problem - unreliable network

Answer

A

The two generals are on separate mountains, and if they don’t attack at the same time they die.

With an unreliable channel:
G1 -> G2: Let’s attack at 9am
G2 -> G1: I received your message to attack

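G1 doesn’t know whether G2 received the order until the acknowledgement arrives, and G2 doesn’t know whether G1 received the acknowledgement.

In general, no finite number of messages can guarantee that both generals know they agree.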

Question 13

Q

Two generals problem - reliable network

Answer

A

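The generals are on separate mountains, and if they don’t attack at the same time they die.

Assume a reliable communication channel, but allow some generals to be traitors (this variant is usually called the Byzantine generals problem).

With four generals, one of whom is a traitor, the three loyal generals can still agree on a plan. In general, with unsigned messages, 3f + 1 generals are needed to tolerate f traitors.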
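
A rough sketch of the four-general case: one commander (general 0) and three lieutenants exchange messages over a reliable channel, then each loyal lieutenant takes a majority. The traitor's behaviour here (flipping relayed orders, or sending inconsistent orders as commander) is one assumed strategy chosen for illustration, so this shows the idea rather than proving it for every possible traitor.

```python
from collections import Counter

ATTACK, RETREAT = "attack", "retreat"


def flip(order):
    return RETREAT if order == ATTACK else ATTACK


def majority(values):
    """Most common value among the orders a lieutenant has heard."""
    return Counter(values).most_common(1)[0][0]


def one_round(order, traitor):
    """Commander 0 sends `order`; lieutenants 1..3 relay and vote.

    Returns the decisions of the loyal lieutenants only.
    """
    lieutenants = [1, 2, 3]
    # Step 1: the commander sends an order to every lieutenant.
    received = {}
    for i in lieutenants:
        if traitor == 0:
            # Assumed traitor strategy: tell different lieutenants different things.
            received[i] = ATTACK if i % 2 else RETREAT
        else:
            received[i] = order
    # Step 2: every lieutenant relays what it received to the other two.
    decisions = {}
    for i in lieutenants:
        if i == traitor:
            continue
        heard = [received[i]]
        for j in lieutenants:
            if j != i:
                # Assumed traitor strategy: relay the opposite order.
                heard.append(flip(received[j]) if j == traitor else received[j])
        decisions[i] = majority(heard)
    return decisions
```

With a traitorous lieutenant, the two loyal lieutenants still follow the commander's order; with a traitorous commander, the three lieutenants all see the same set of orders and so reach the same decision.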

Question 14

Q

Redundancy Pros

Answer

A

Helps increase reliability
- increases the probability that the system operates correctly at any given moment
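
As a back-of-the-envelope illustration, assuming replicas fail independently (an assumption not stated in the slides), the chance that at least one replica is working grows quickly with the number of replicas. The function name is hypothetical.

```python
def prob_at_least_one_up(p_up, replicas):
    """Probability that at least one of `replicas` independent copies is up.

    p_up is the probability that a single copy is working at a given moment.
    """
    p_all_down = (1 - p_up) ** replicas
    return 1 - p_all_down
```

Under this assumption, two replicas that are each up 90% of the time give 1 - 0.1 × 0.1 = 0.99. Correlated failures (shared power, shared bugs) would make the real figure lower.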

Question 15

Q

Redundancy Cons

Answer

A

Creates several problems:
- consistency of replicas (e.g. data on all replicas needs to be updated)
- performance: the redundancy should (somehow) improve system performance rather than degrade it

Has a cost (monetary or other)

Even in the presence of redundancy, we need to make sure that any failure won’t leave our system in an inconsistent (corrupted) state

Question 16

Q

Triple Modular Redundancy (TMR)

Answer

A

A task is replicated three times and the three outputs are fed to three voters. Each voter can tell if one of the processes failed, since its output won’t match the other two, and passes the majority value on to the next stage (see image in Week3 OneNote).

Through replication three times, even if one component fails, the output will still be correct.

Used in airplanes. Chance of something going wrong is very low but still not zero.
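
The voting step can be sketched as a majority function over three outputs. This is a simplified software model with hypothetical names (`vote`, `tmr_stage`); real TMR is typically implemented in hardware.

```python
from collections import Counter


def vote(a, b, c):
    """Return the value at least two inputs agree on; fail if all three differ."""
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority: more than one replica failed")
    return value


def tmr_stage(replicas, inputs):
    """Run three replicas of one stage, then three voters.

    replicas: three callables computing the same task.
    inputs:   one input per replica (normally the previous voters' outputs).
    Returns three voted outputs, one per voter, for the next stage.
    """
    outputs = [replica(x) for replica, x in zip(replicas, inputs)]
    return [vote(*outputs) for _ in range(3)]
```

If one replica returns a wrong value, all three voters still pass on the value produced by the other two. If two replicas fail differently, `vote` raises an error instead of guessing; if two fail with the same wrong value, the voters would pass that wrong value on, which is why the risk is low but not zero.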

Question 17

Q

Replication for performance

Answer

A

Placing a copy of data close to the process using it, time to access the data decreases.

Useful for scalability, e.g:
- When a server needs to handle more requests, it can be replicated and the work divided among the replicas.
- Caching: web browsers store a copy of a website to avoid the latency of fetching it from the originating server again.
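
The caching case can be sketched as a client that keeps a local copy after the first fetch. `CachingClient` and the `fetch` callable are hypothetical, and the sketch deliberately ignores invalidation, which is where the consistency problems of replication come in.

```python
class CachingClient:
    """Keeps a local copy of each page so repeat requests skip the origin."""

    def __init__(self, fetch):
        self._fetch = fetch        # callable that contacts the origin server
        self._cache = {}           # local replica of fetched pages
        self.origin_requests = 0   # how often the origin was contacted

    def get(self, url):
        if url not in self._cache:
            # Cache miss: pay the latency of going to the origin once.
            self.origin_requests += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]
```

Requesting the same URL twice contacts the origin only once; the second answer comes from the nearby copy.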

Question 18

Q

Capacity Planning

Answer

A

The process of determining the necessary capacity to meet a certain level of demand - extends beyond distributed computing.

E.g. a Monte Carlo simulation to estimate the number of servers needed
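
A toy Monte Carlo sketch of that idea. The demand model is entirely assumed for illustration: each of `users` users is independently active with probability `p_active`, and each server handles `per_server` active users. None of these figures come from the slides.

```python
import random


def overload_probability(servers, users=200, p_active=0.1,
                         per_server=10, trials=1000, seed=0):
    """Estimate how often simulated demand exceeds total capacity."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    capacity = servers * per_server
    overloaded = 0
    for _ in range(trials):
        # One simulated moment: count how many users happen to be active.
        demand = sum(rng.random() < p_active for _ in range(users))
        if demand > capacity:
            overloaded += 1
    return overloaded / trials


def servers_needed(target=0.01, max_servers=50, **model):
    """Smallest server count whose estimated overload probability meets the target."""
    for servers in range(1, max_servers + 1):
        if overload_probability(servers, **model) <= target:
            return servers
    return None  # target not reachable within max_servers
```

Calling `servers_needed()` returns the smallest simulated server count that keeps overload at or below 1% of trials under this model. The answer is only as good as the assumed demand model.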

Question 19

Q

Replication Cons

Answer

A

Cost of maintaining replicas

Consistency problems
To ensure consistency, all modifications have to be applied to all copies. When and where they are applied determines the price of replication.

Replica management
- Where to place replica servers to minimize overall data transfer?
- In general this is a classic optimisation problem, but in practice it is often a management/commercial issue

(19 cards)