Fault Tolerance Flashcards

1
Q

What is dependability?

A

The quality of the delivered service that justifies relying on it: can we expect the delivered service to match the specified service?

2
Q

What is a fault?

A

A module deviates from its specified behavior. Ex. an add function that works correctly except that it computes 2+5=8.

3
Q

What is an error?

A

Actual behavior within the system differs from the specified behavior (an activation of a fault). Ex. we call add(5,3) and store 7 in a register.

4
Q

What is a failure?

A

The system as a whole deviates from its specified behavior (an error causes our program to do something wrong that is visible from outside). Ex. the program schedules a meeting for 7 am instead of 8 am.

5
Q

What is MTTF?

A

Mean time to failure: the average time of continuous service accomplishment before a failure.

6
Q

What is MTTR?

A

Mean time to repair: the average time of service interruption, i.e. how long it takes to restore service after a failure.
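
MTTF and MTTR are commonly combined into a single availability figure, MTTF / (MTTF + MTTR). That formula is a standard definition rather than something stated on the cards, and the numbers below are made up for illustration. A minimal sketch:

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the service is up: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical system: fails every 999 hours on average, takes 1 hour to repair.
print(f"{availability(999.0, 1.0):.3f}")  # prints 0.999
```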

7
Q

What is a hardware fault?

A

The hardware fails to perform as designed.

8
Q

What is a design fault?

A

Software bugs and hardware design mistakes, e.g. the Pentium FDIV bug. Usually permanent.

9
Q

What is an operational fault?

A

Mistakes made by operators or users.

10
Q

What is an environmental fault?

A

Faults caused by the system's surroundings: fire, power failure, sabotage, etc.

11
Q

What is a permanent fault?

A

A fault that, once present, does not go away until the faulty component is repaired or replaced.

12
Q

What is an intermittent fault?

A

Lasts for a while, goes away, and then recurs. Ex. overclocking: the system works fine for a while, then crashes, then works fine again for a while after a reboot.

13
Q

What is a transient fault?

A

Lasts for a while, then goes away on its own and does not recur. Ex. a bit flipped by an alpha-particle strike.

14
Q

What is fault avoidance?

A

Prevent faults from occurring in the first place.

15
Q

What is fault tolerance?

A

Prevent faults from becoming failures. Ex. redundancy.

16
Q

What are some fault tolerance techniques?

A

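Checkpointing (periodically save state, and roll back to the last checkpoint to recover), and Dual and Triple Modular Redundancy (DMR, TMR). DMR detects a fault when the two modules disagree; TMR can also recover by majority vote.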
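
A minimal sketch of checkpoint-and-rollback, assuming the state is a plain dict and that a detected error shows up as a `RuntimeError`. The names `run_with_checkpoints`, `flaky_deposit`, and `calls` are hypothetical, chosen only for this illustration:

```python
import copy


def run_with_checkpoints(state, steps, max_retries=1):
    """Run each step; on a detected error, roll back to the checkpoint and retry."""
    for step in steps:
        checkpoint = copy.deepcopy(state)  # save state before the step
        for _ in range(max_retries + 1):
            try:
                step(state)
                break  # step succeeded, move on to the next one
            except RuntimeError:
                # Discard any partial update by restoring the checkpoint.
                state.clear()
                state.update(copy.deepcopy(checkpoint))
        else:
            raise RuntimeError("step kept failing after rollback")
    return state


calls = {"n": 0}


def flaky_deposit(state):
    """Hypothetical step that fails once, after partially updating the state."""
    state["balance"] += 100
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient fault detected")


print(run_with_checkpoints({"balance": 0}, [flaky_deposit]))  # {'balance': 100}
```

Without the rollback, the retry would apply the deposit twice and leave a balance of 200.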

17
Q

What is N-Way Redundancy?

A

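N modules do the same work, then vote on the answer. With majority voting, up to floor((N-1)/2) faulty modules are guaranteed to be outvoted (1 for TMR, 2 for N=5). Tolerating more, up to N-2 faults, works only if the faulty modules do not agree on the same wrong answer.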
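
The voting step can be sketched as a strict-majority voter. `good_add` and `faulty_add` are hypothetical modules invented for this example:

```python
from collections import Counter


def vote(results):
    """Return the answer given by a strict majority of modules, or None if there is none."""
    answer, count = Counter(results).most_common(1)[0]
    return answer if count > len(results) // 2 else None


def good_add(a, b):
    return a + b


def faulty_add(a, b):
    return a + b + 1  # hypothetical faulty module


modules = [good_add, faulty_add, good_add]  # TMR: three copies of the same work
print(vote([m(2, 5) for m in modules]))  # prints 7
```

With only two modules (DMR), a disagreement makes `vote` return `None`: the fault is detected but cannot be outvoted.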

18
Q

What are some fault tolerance techniques for memory and storage?

A

Error detection/correction codes, RAID

19
Q

What is bit parity?

A

One extra bit added to the data, computed as the XOR of all data bits. Detects any single-bit error (more generally, any odd number of flipped bits) but cannot correct it.
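
A small sketch of even parity over a list of bits. The helper names are made up for this example:

```python
def parity_bit(bits):
    """XOR of all bits: 1 if the number of 1s is odd, else 0."""
    p = 0
    for b in bits:
        p ^= b
    return p


def looks_ok(word):
    """word = data bits followed by the parity bit; the XOR of everything must be 0."""
    return parity_bit(word) == 0


data = [1, 0, 1, 1]
word = data + [parity_bit(data)]  # stored word: [1, 0, 1, 1, 1]

one_flip = word.copy()
one_flip[2] ^= 1                  # single-bit error
two_flips = one_flip.copy()
two_flips[0] ^= 1                 # second error hides the first

print(looks_ok(word), looks_ok(one_flip), looks_ok(two_flips))  # True False True
```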

20
Q

What is ECC?

A

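Error-correcting code: redundant check bits stored with the data so that errors can be detected and fixed. Ex. SECDED (single-error correct, double-error detect) codes used in ECC memory.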
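
The card does not name a specific code. As one illustration, here is a sketch of the classic Hamming(7,4) code, which stores 4 data bits with 3 check bits and can correct any single flipped bit. The function names are chosen for this example:

```python
def encode(d):
    """Encode 4 data bits as a 7-bit codeword laid out as p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(codeword):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # 1-based index of the bad bit, 0 if none
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


stored = encode([1, 0, 1, 1])
stored[4] ^= 1                 # a single bit flips in storage
print(decode(stored))          # [1, 0, 1, 1]
```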

21
Q

What is RAID?

A

Redundant Array of Independent Disks. Several disks play the role of one disk, for better performance, better reliability, or both.

22
Q

What is RAID0?

A

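“Striping”. Take the contents of one logical disk and alternate them across the physical disks, e.g. all the even tracks on one disk and all the odd tracks on another. Improves throughput but adds no redundancy: if either disk fails, data is lost.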
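
Striping can be sketched as round-robin placement. The block labels and the helper names `stripe` and `locate` are hypothetical:

```python
def stripe(blocks, num_disks=2):
    """Distribute blocks round-robin across num_disks disks."""
    disks = [[] for _ in range(num_disks)]
    for i, block in enumerate(blocks):
        disks[i % num_disks].append(block)
    return disks


def locate(i, num_disks=2):
    """Map logical block i to (disk index, offset on that disk)."""
    return i % num_disks, i // num_disks


print(stripe(["t0", "t1", "t2", "t3", "t4"]))
# [['t0', 't2', 't4'], ['t1', 't3']]
print(locate(3))  # (1, 1)
```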

23
Q

What is RAID1?

A

“Mirroring”. The same data is written to both disks. Can recover from any fault that affects only one disk, at the cost of half the raw capacity.

24
Q

What is RAID4?

A

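N disks: data is striped across (N-1) disks, and the Nth disk is a dedicated parity disk holding the XOR of the corresponding blocks on the data disks. Can recover from the failure of any one disk.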
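
A sketch of how XOR parity lets a lost block be rebuilt. The block contents are arbitrary example bytes, and `xor_blocks` is a helper written for this illustration:

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Three data disks plus one parity disk (N = 4), one block per disk.
data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]
parity = xor_blocks(data)

# Disk 1 fails: XOR the surviving data blocks with the parity block to rebuild it.
recovered = xor_blocks([data[0], data[2], parity])
print(recovered == data[1])  # True
```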

25
Q

What's the drawback to RAID4?

A

The parity disk becomes the bottleneck: every write, to any data disk, must also update the parity disk.
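
26
Q

What is RAID5?

A

Distributed block-interleaved parity. Rotate the parity block across all disks, so that parity updates are spread over every disk instead of hitting one dedicated parity disk.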
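
A sketch of one possible rotation scheme. Real arrays use several different layouts, so the formula in `parity_disk` is an assumption made for illustration:

```python
def parity_disk(stripe, num_disks):
    """Disk holding the parity block of a given stripe (one possible rotation)."""
    return (num_disks - 1 - stripe) % num_disks


def layout(num_stripes, num_disks):
    """Return one row per stripe, marking the parity block 'P' and data blocks 'D<k>'."""
    rows = []
    for s in range(num_stripes):
        p = parity_disk(s, num_disks)
        row, d = [], 0
        for disk in range(num_disks):
            if disk == p:
                row.append("P")
            else:
                row.append(f"D{s * (num_disks - 1) + d}")
                d += 1
        rows.append(row)
    return rows


for row in layout(4, 4):
    print(row)
# ['D0', 'D1', 'D2', 'P']
# ['D3', 'D4', 'P', 'D5']
# ['D6', 'P', 'D7', 'D8']
# ['P', 'D9', 'D10', 'D11']
```

Each disk holds exactly one parity block per four stripes, so no single disk absorbs all parity writes.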

27
Q

What is RAID6?

A

Two parity blocks for each set of data blocks, each computed differently. Can recover from the failure of any 2 disks.