Fault Tolerance Flashcards

1
Q

What is dependability?

A

The quality of delivered service that justifies relying on the system: can we expect the delivered service to match the specified service?

2
Q

What is a fault?

A

A module deviates from its specified behavior. Ex. An add function that works correctly except that it computes 5+3=7.

3
Q

What is an error?

A

Actual behavior inside the system differs from specified behavior (a fault that has actually been triggered). Ex. We call add(5,3) and store 7 in a register.

4
Q

What is a failure?

A

The system as a whole deviates from its specified behavior (an error causes the program to do something wrong). Ex. A meeting is scheduled for 7am instead of 8am.

5
Q

What is MTTF?

A

Mean time to failure: the average time of continuous correct service before a failure occurs.

6
Q

What is MTTR?

A

Mean time to repair: the average time it takes to restore service after a failure.

7
Q

What is a hardware fault?

A

The hardware fails to perform as designed.

8
Q

What is a design fault?

A

Software bugs and hardware design mistakes (e.g. the Pentium FDIV bug). Usually permanent.

9
Q

What is an operational fault?

A

Mistakes made by an operator or user.

10
Q

What is an environmental fault?

A

Fire, power failure, sabotage, etc.

11
Q

What is a permanent fault?

A

One that, once it occurs, does not go away on its own; it persists until the faulty component is repaired or replaced.

12
Q

What is an intermittent fault?

A

Lasts for a while, goes away, and then recurs. Ex. Overclocking: the system works fine for a while, then crashes, then works fine for a while again once rebooted.

13
Q

What is a transient fault?

A

Lasts for a while, then goes away.

14
Q

What is fault avoidance?

A

Prevent faults from occurring in the first place.

15
Q

What is fault tolerance?

A

Prevent faults from becoming failures. Ex. Redundancy.

16
Q

What are some fault tolerance techniques?

A

Checkpointing (periodically save state; after an error, restore the saved state and recover), Dual and Triple Modular Redundancy (DMR, TMR)
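
A minimal sketch of checkpoint-and-recover, assuming errors can be detected by a validity check and that the fault is transient, so a retry can succeed. The names run_with_checkpoint, step, is_valid and flaky_increment are hypothetical and used only for illustration.

```python
import copy

def run_with_checkpoint(state, step, is_valid, max_retries=3):
    """Run step on a copy of state; roll back and retry if the result is invalid."""
    checkpoint = copy.deepcopy(state)          # saved known-good state
    for _ in range(max_retries):
        candidate = step(copy.deepcopy(checkpoint))
        if is_valid(candidate):                # error detection
            return candidate                   # commit the new state
        # otherwise discard the candidate and restart from the checkpoint
    return checkpoint                          # give up, keep the last good state

# Hypothetical step that is corrupted by a transient fault on its first run only.
attempts = []

def flaky_increment(s):
    attempts.append(1)
    s["total"] += 1
    if len(attempts) == 1:
        s["total"] = -99                       # simulated transient corruption
    return s

result = run_with_checkpoint({"total": 0}, flaky_increment, lambda s: s["total"] >= 0)
print(result)  # {'total': 1}
```

The first attempt is discarded because the check fails; the second starts again from the saved state and succeeds.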

17
Q

What is N-Way Redundancy?

A

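N modules do the same work, then vote on the answer. With majority voting, it can mask up to floor((N-1)/2) faulty modules (1 for TMR with N=3, 2 for N=5), since the correct modules must still outvote the faulty ones.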
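
A minimal sketch of the voting step, assuming each module's result is collected into a list and that a strict majority is required before an answer is accepted. The function name majority_vote is hypothetical.

```python
from collections import Counter

def majority_vote(results):
    """Return the value produced by a strict majority of modules, else None."""
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) / 2 else None

# TMR: three modules compute 5 + 3 and one of them is faulty.
print(majority_vote([8, 8, 7]))   # 8

# DMR: two modules disagree, so the fault is detected but cannot be corrected.
print(majority_vote([8, 7]))      # None
```

Returning None when there is no majority keeps detection separate from correction: the caller knows something went wrong even when the voter cannot tell which module to trust.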

18
Q

What are some fault tolerance techniques for memory and storage?

A

Error detection/correction codes, RAID

19
Q

What is bit parity?

A

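One extra bit added to the data, computed as the XOR of all data bits. Detects any single-bit error (more generally, any odd number of flipped bits), but cannot correct it and misses an even number of flips.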
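
A short sketch of even parity over a list of bits, assuming the parity bit is appended after the data bits. The function names parity_bit and looks_ok are hypothetical.

```python
def parity_bit(bits):
    """XOR of all bits: 1 if the number of 1s is odd, else 0."""
    p = 0
    for b in bits:
        p ^= b
    return p

def looks_ok(word):
    """word = data bits followed by the parity bit; the XOR of everything must be 0."""
    return parity_bit(word) == 0

data = [1, 0, 1, 1]
word = data + [parity_bit(data)]     # stored word: [1, 0, 1, 1, 1]
print(looks_ok(word))                # True

one_flip = word.copy()
one_flip[2] ^= 1                     # single-bit error
print(looks_ok(one_flip))            # False: error detected

two_flips = one_flip.copy()
two_flips[0] ^= 1                    # second error cancels the first in the XOR
print(looks_ok(two_flips))           # True: error goes undetected
```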

20
Q

What is ECC?

A

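Error-correcting code. Redundant bits are stored with the data so that errors can be both detected and corrected (e.g. SECDED codes correct any single-bit error and detect any double-bit error).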

21
Q

What is RAID?

A

Redundant Array of Independent Disks. Several disks playing the role of one disk

22
Q

What is RAID0?

A

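“Striping”. Take the data of one disk and split it across two disks, e.g. all the odd tracks on one disk and all the even tracks on the other. Improves throughput but adds no redundancy: if either disk fails, data is lost.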

23
Q

What is RAID1?

A

“Mirroring”. Same data on both disks. Can correct any fault that affects 1 disk.

24
Q

What is RAID4?

A

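N disks: data is striped (block-interleaved) across N-1 of them, and the Nth disk is a dedicated parity disk holding the parity of the corresponding blocks on the data disks.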
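
A sketch of how a parity disk allows a lost data disk to be rebuilt, assuming the parity block is the bytewise XOR of the data blocks in the same stripe (the usual choice). The helper xor_blocks and the sample block contents are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a 4-disk array: three data blocks plus one parity block.
data_disks = [b"\x01\x02", b"\x0f\x10", b"\xaa\x55"]
parity = xor_blocks(data_disks)

# Disk 1 fails: XOR the surviving data blocks with the parity block.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data_disks[1])
```

Because XOR is its own inverse, the same routine both computes the parity and reconstructs any single missing block.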

25
Q

What’s the drawback to RAID4?

A

The parity disk becomes the bottleneck: every write, no matter which data disk it targets, must also update the parity disk.

26
Q

What is RAID5?

A

Distributed block-interleaved parity. Rotate the parity block across all disks, so parity updates are spread out and no single disk becomes the bottleneck.
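
A sketch of one possible parity rotation, assuming the parity block moves one disk to the left on each successive stripe. Real arrays use several different rotation schemes; parity_disk and layout are hypothetical names.

```python
def parity_disk(stripe, n_disks):
    """Index of the disk holding parity for this stripe (one possible rotation)."""
    return (n_disks - 1 - stripe) % n_disks

def layout(n_stripes, n_disks):
    """Rows of 'D' (data) and 'P' (parity), one row per stripe."""
    rows = []
    for s in range(n_stripes):
        p = parity_disk(s, n_disks)
        rows.append(["P" if d == p else "D" for d in range(n_disks)])
    return rows

for row in layout(4, 4):
    print(" ".join(row))
# D D D P
# D D P D
# D P D D
# P D D D
```

Over any N consecutive stripes each disk holds parity exactly once, which is what spreads the parity-update load.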

27
Q

What is RAID6?

A

Have two parity blocks for each set of data blocks. Each parity block is computed differently. Can recover from 2 failures.
