Fault Tolerance - Week 3 Flashcards

1
Q

Fault tolerance

A

The ability of a system to operate in an acceptable way when a (partial) failure occurs.

2
Q

Types of Failure

A

Omission Failures
Timing Failures
Response Failures
Arbitrary (byzantine) failures
Crashes

3
Q

Omission Failures

A

Server fails to respond to incoming messages:
- Receive omission: server fails to receive incoming messages
- Send omission: server fails to send messages

4
Q

Timing failures

A

Server fails to respond within a certain time

5
Q

Response failures

A

A server’s response is incorrect

6
Q

Arbitrary (byzantine) failures

A

A component produces output it should never have produced (and which may not be detected as incorrect): arbitrary responses at arbitrary times.

7
Q

Crashes

A

Server halts

8
Q

Fault-tolerance / Failure masking - through redundancy

A

Physical redundancy
Time redundancy
Information redundancy

9
Q

Physical redundancy

A

Having a backup server (no definition given in the slides)

10
Q

Time redundancy

A

An action is performed and, if need be, repeated.
Especially helpful when faults are transient or intermittent.
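
Time redundancy can be sketched as a retry loop. This is a minimal illustration only: `retry` and `make_flaky` are hypothetical helper names, not from the slides, and the fault is simulated with a `ConnectionError`.

```python
import time


def retry(action, attempts=3, delay_s=0.0):
    """Call action() up to `attempts` times and return its first successful result."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return action()
        except Exception as error:  # real code should catch only transient error types
            last_error = error
            if attempt < attempts - 1:
                time.sleep(delay_s)  # wait before repeating the action
    raise last_error  # the fault was not transient: give up


def make_flaky(failures):
    """Hypothetical action that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def action():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ConnectionError("transient fault")
        return "ok"

    return action
```

Retrying masks the two transient failures in `retry(make_flaky(2), attempts=3)`, but a permanent fault still surfaces once the attempts run out.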

11
Q

Information redundancy

A

E.g. send extra bits when transmitting information, so the receiver can recover from corrupted bits.
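
One common way to "send extra bits" is an error-correcting code. The sketch below uses a Hamming(7,4) code as an illustration (the slides do not name a specific code): 3 parity bits are added to 4 data bits, which lets the receiver correct any single flipped bit. The function names are assumptions.

```python
def hamming_encode(data):
    """Encode 4 data bits as a 7-bit codeword laid out as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming_decode(codeword):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the corrupted bit back
    return [c[2], c[4], c[5], c[6]]
```

For example, encoding `[1, 0, 1, 1]`, flipping any one of the seven transmitted bits, and decoding returns `[1, 0, 1, 1]` again. Two or more flipped bits are beyond what this code can correct.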

12
Q

Two generals problem - unreliable network

A

The two generals are on separate mountains; if they don’t attack at the same time, they die.

With an unreliable channel:
G1 -> G2: Let’s attack at 9am
G2 -> G1: I received your message to attack

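G2 doesn’t know whether G1 received the acknowledgement, and any further acknowledgement has the same problem.

In general, no finite exchange of messages can guarantee that both generals know the other got the message.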
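
The argument can be sketched in code. Assume the generals alternate messages (G1 first) and that only the last message can be lost; the hypothetical helper `views` records what each general has received. The sender of the last message sees exactly the same thing whether or not that message arrived, so it cannot know.

```python
def views(n_messages, last_delivered):
    """Return the message numbers each general received.

    Messages 1..n alternate G1 -> G2, G2 -> G1, ...; all but possibly the
    last one are delivered (a simplifying assumption for this sketch).
    """
    received = {"G1": [], "G2": []}
    for i in range(1, n_messages + 1):
        receiver = "G2" if i % 2 == 1 else "G1"  # odd messages are sent by G1
        if last_delivered or i < n_messages:
            received[receiver].append(i)
    return received


def last_sender_can_tell(n_messages):
    """True if the sender of the final message can distinguish delivery from loss."""
    sender = "G1" if n_messages % 2 == 1 else "G2"
    return views(n_messages, True)[sender] != views(n_messages, False)[sender]
```

`last_sender_can_tell(n)` is `False` for every `n`, which is why adding one more acknowledgement never closes the gap.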

13
Q

Two generals problem - reliable network

A

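The two generals are on separate mountains; if they don’t attack at the same time, they die.

Assume a reliable communication channel

With more than two generals, some of whom may be traitors, this becomes the Byzantine generals problem: if one general is a traitor, four generals are enough for the loyal ones to outvote the traitor and still agree (in general, 3f + 1 generals for f traitors).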

14
Q

Redundancy Pros

A

Helps increase reliability
- increase probability that the system operates correctly at any given moment
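
As a rough worked example, assume each replica is up with probability `p` and that replicas fail independently (a strong assumption that real systems often violate). The chance that at least one of `n` replicas is operating is then 1 - (1 - p)^n. The function name is illustrative.

```python
def prob_at_least_one_up(p_up, replicas):
    """Probability that at least one of `replicas` independent replicas is up."""
    if not 0.0 <= p_up <= 1.0:
        raise ValueError("p_up must be a probability")
    if replicas < 1:
        raise ValueError("need at least one replica")
    # All replicas are down with probability (1 - p_up) ** replicas.
    return 1.0 - (1.0 - p_up) ** replicas
```

With `p_up = 0.9`, going from one replica to two raises the figure from 0.9 to about 0.99 under these assumptions.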

15
Q

Redundancy Cons

A

Creates several problems
- consistency of replicas (e.g. data on all replicas needs to be updated)
- performance: the redundancy should (somehow) still improve system performance rather than degrade it.

Has a cost (monetary or other)

Even in the presence of redundancy, we need to make sure that any failure won’t leave our system in an inconsistent (corrupted) state

16
Q

Triple Modular Redundancy

A

A task is replicated three times and the three outputs are fed to a stage of three voters. Each voter can tell if one of the processes failed, since its output won’t match the other two, and passes the majority (correct) value on to the next process (see image in Week3 OneNote).

Because everything is replicated three times, the output will still be correct even if one component fails.

Used in airplanes. The chance of something going wrong is very low, but still not zero.
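
A minimal sketch of one TMR stage, assuming replicas are plain functions and outputs are hashable values. The names `vote` and `tmr_stage` are illustrative, not from the slides.

```python
from collections import Counter


def vote(a, b, c):
    """Majority voter: return the value that at least two of the three inputs share."""
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority: more than one replica disagrees")
    return value


def tmr_stage(replicas, inputs):
    """Run three replicas of a stage, then three voters; return the three voted outputs."""
    outputs = [replica(x) for replica, x in zip(replicas, inputs)]
    # Each voter sees all three outputs, so each passes on the same majority value.
    return [vote(*outputs) for _ in range(3)]
```

If one replica misbehaves, for example `tmr_stage([good, faulty, good], [1, 1, 1])`, all three voters still forward the value the two working replicas agree on. Two simultaneous, differing faults leave no majority, and the voter raises an error instead.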

17
Q

Replication for performance

A

Placing a copy of data close to the process using it decreases the time to access the data.

Useful for scalability, e.g.:
- If a server needs to handle more requests, it can be replicated and the work divided among the replicas.
- Caching: web browsers store a copy of a website to avoid the latency of fetching it from the originating server again.

18
Q

Capacity Planning

A

The process of determining the necessary capacity to meet a certain level of demand - extends beyond distributed computing.

E.g. Monte Carlo simulation to estimate the number of servers needed
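
A Monte Carlo estimate of the number of servers might look like the sketch below. The demand model is an assumption made purely for illustration: each of `users` users independently sends one request per interval with probability `p_active`, and each server handles `per_server_capacity` requests per interval. All names and parameters are hypothetical.

```python
import random


def overload_probability(servers, per_server_capacity, users, p_active, trials, rng):
    """Estimate how often simulated demand exceeds the total capacity."""
    overloaded = 0
    for _ in range(trials):
        # Sample demand for one interval: count the users who are active.
        demand = sum(1 for _ in range(users) if rng.random() < p_active)
        if demand > servers * per_server_capacity:
            overloaded += 1
    return overloaded / trials


def servers_needed(per_server_capacity, users, p_active, max_overload,
                   trials=2000, seed=0):
    """Smallest server count whose estimated overload probability is acceptable."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    servers = 1
    while overload_probability(servers, per_server_capacity, users,
                               p_active, trials, rng) > max_overload:
        servers += 1
    return servers
```

A call such as `servers_needed(10, 200, 0.3, 0.01)` then gives a server count for which overload was seen in at most 1% of simulated intervals. The answer is only as good as the demand model and the number of trials.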

19
Q

Replication Cons

A

Cost of maintaining replicas

Consistency problems
To ensure consistency, all modifications have to be applied to all copies; when and where this happens determines the price of replication.

Replica management
- Where to place replica servers to minimize overall data transfer?
- In general this is a classic optimisation problem, but in practice it is often a management/commercial issue