Fault Tolerance Flashcards
What is dependability?
The degree to which we can expect the delivered service to match the specified service.
What is a fault?
A module deviates from its specified behavior. Ex. an add function that works for every input except that it computes 5+3=7
What is an error?
Actual behavior within the system differs from the specified behavior (a fault that has been activated). Ex. We call add(5,3) and store 7 in a register
What is a failure?
The system deviates from specified behavior (when an error causes our program to do something wrong). Ex. The program schedules a meeting for 7am instead of 8am.
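The three cards above describe one chain: fault, then error, then failure. A minimal Python sketch of that chain, reusing the add(5,3) example from the cards. The names `faulty_add` and `schedule_meeting` are hypothetical and exist only for illustration.

```python
def faulty_add(a, b):
    """Adder with a fault: correct for every input except 5 + 3."""
    if (a, b) == (5, 3):
        return 7  # the fault: wrong result for this one input pair
    return a + b


def schedule_meeting(start_hour, offset_hours, add=faulty_add):
    """Return the hour the meeting is booked for (hypothetical helper)."""
    return add(start_hour, offset_hours)


# The fault exists but stays dormant: these inputs never exercise it.
assert faulty_add(1, 1) == 2

# The fault is activated: the stored value 7 is an error (the spec says 8).
stored = faulty_add(5, 3)

# The error reaches the user as a failure: meeting booked at 7 instead of 8.
meeting_hour = schedule_meeting(5, 3)
print(stored, meeting_hour)
```

If the result of faulty_add(5, 3) were never used, the error would stay inside the system and no failure would occur.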
What is MTTF?
Mean time to failure. Average time of continuous service accomplishment before a failure occurs
What is MTTR?
Mean time to repair. Average time needed to restore service after a failure
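Both quantities are plain averages over observed intervals. A minimal sketch, assuming a made-up service log in hours (the numbers and the names `uptimes_h` and `repairs_h` are illustrative, not measured data):

```python
from statistics import mean

# Hypothetical log, in hours. Each uptime interval ends in a failure,
# and each repair interval is the outage that follows that failure.
uptimes_h = [1200.0, 800.0, 1000.0]
repairs_h = [4.0, 2.0, 6.0]

mttf_h = mean(uptimes_h)  # average continuous service before a failure
mttr_h = mean(repairs_h)  # average time spent restoring service

print(f"MTTF = {mttf_h:.1f} h, MTTR = {mttr_h:.1f} h")
```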
What is a hardware fault?
hardware fails to perform as designed
What is a design fault?
Software bugs and hardware design mistakes, e.g. the Pentium FDIV bug. Usually permanent
What is an operational fault?
Mistakes made by operators or users
What is an environmental fault?
fire, power failure, sabotage etc.
What is a permanent fault?
Once it appears, it does not go away until the faulty part is repaired or replaced
What is an intermittent fault?
Lasts for a while, goes away, then recurs. Ex. Overclocking - works fine for a while, then crashes, then works fine for a while again once rebooted
What is a transient fault?
lasts for a while, but then goes away
What is fault avoidance?
prevent faults from occurring
What is fault tolerance?
prevent faults from becoming failures. Ex. Redundancy
What are some fault tolerance techniques?
Checkpointing (recovery), Dual and Triple Modular Redundancy (DMR, TMR)
What is N-Way Redundancy?
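N modules do the same work, then vote on the answer. With majority voting, can detect and recover from floor((N-1)/2) faulty modules: N=2 (DMR) only detects one, N=3 (TMR) recovers from one, N=5 recovers from two.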
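A minimal sketch of the voting step, assuming a strict-majority voter (the function name `vote` is hypothetical). A result is accepted only when more than half of the modules agree on it. With two modules a disagreement can be detected but not resolved, with three modules one faulty output is outvoted, and with five modules two faulty outputs are outvoted.

```python
from collections import Counter


def vote(outputs):
    """Return the value produced by a strict majority of modules, else None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


assert vote([8, 8]) == 8            # DMR: both modules agree
assert vote([8, 7]) is None         # DMR: fault detected, cannot recover
assert vote([8, 7, 8]) == 8         # TMR: one faulty module is outvoted
assert vote([8, 7, 8, 6, 8]) == 8   # N=5: two faulty modules are outvoted
assert vote([8, 7, 8, 6, 7]) is None  # N=5: three faulty modules, no majority
```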
What are some fault tolerance techniques for memory and storage?
Error detection/correction codes, RAID
What is bit parity?
One extra bit appended to the data, equal to the XOR of all data bits. Can detect any single-bit error (more generally, any odd number of flipped bits), but cannot correct it
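A minimal sketch of even parity over a list of bits (the function names are hypothetical). The check passes when the XOR of the whole word, parity bit included, is 0, so one flipped bit is caught while two flipped bits cancel out and go unnoticed.

```python
def parity_bit(bits):
    """XOR of all bits: 1 if the number of 1s is odd, else 0."""
    p = 0
    for b in bits:
        p ^= b
    return p


def with_parity(bits):
    """Append the parity bit to the end of the data."""
    return bits + [parity_bit(bits)]


def parity_ok(word):
    """A stored word is consistent when the XOR over all its bits is 0."""
    return parity_bit(word) == 0


word = with_parity([1, 0, 1, 1])
assert parity_ok(word)

one_flip = list(word)
one_flip[2] ^= 1
assert not parity_ok(one_flip)   # single-bit error detected

two_flips = list(one_flip)
two_flips[0] ^= 1
assert parity_ok(two_flips)      # double-bit error goes undetected
```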
What is ECC?
Error-correcting code. Adds redundant bits to the data so that errors can be both detected and fixed
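The card does not name a particular code. As one illustration, the sketch below uses the Hamming(7,4) code, which is an assumption made here for the example: it protects 4 data bits with 3 parity bits and corrects any single flipped bit. The bit layout (parity bits at positions 1, 2 and 4) and the function names are choices made for this sketch.

```python
def hamming74_encode(data):
    """Encode 4 data bits as the 7-bit word [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(word):
    """Return (data bits, syndrome) after fixing at most one flipped bit.

    The syndrome is the 1-based position of the corrected bit, or 0 if
    every parity check passed.
    """
    c = list(word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


data = [1, 0, 1, 1]
stored = hamming74_encode(data)

corrupted = list(stored)
corrupted[4] ^= 1  # single-bit error at position 5

recovered, position = hamming74_decode(corrupted)
assert recovered == data
assert position == 5
```

Unlike a single parity bit, the combination of failed checks points at the exact position of the error, which is what makes correction possible.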
What is RAID?
Redundant Array of Independent Disks. Several disks playing the role of one disk
What is RAID0?
“Striping”. Split the contents of one logical disk across two disks, e.g. all the odd tracks on one disk and all the even tracks on the other. Improves throughput but adds no redundancy.
What is RAID1?
“Mirroring”. Same data on both disks. Can correct any fault that affects 1 disk.
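A minimal sketch contrasting the two layouts, with each disk modeled as a Python list of block labels. The block names and the helper `read_mirrored` are hypothetical.

```python
blocks = ["b0", "b1", "b2", "b3", "b4", "b5"]

# RAID0 (striping): alternate blocks across two disks. No redundancy,
# so losing either disk loses half of the data.
stripe_disks = [blocks[0::2], blocks[1::2]]

# RAID1 (mirroring): every block is written to both disks.
mirror_disks = [list(blocks), list(blocks)]


def read_mirrored(disks, index, failed=None):
    """Read block `index` from the first mirror that has not failed."""
    for disk_id, disk in enumerate(disks):
        if disk_id != failed:
            return disk[index]
    raise IOError("all mirrors have failed")


assert stripe_disks == [["b0", "b2", "b4"], ["b1", "b3", "b5"]]
assert read_mirrored(mirror_disks, 3, failed=0) == "b3"  # disk 0 is down
assert read_mirrored(mirror_disks, 3, failed=1) == "b3"  # disk 1 is down
```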
What is RAID4?
N disks: (N-1) disks are striped and the Nth disk is the parity disk.
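A minimal sketch of how the parity disk allows one lost disk to be rebuilt, assuming the parity block is the byte-wise XOR of the corresponding data blocks. The disk contents and the helper `xor_blocks` are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# N = 4 disks: three striped data disks plus one parity disk.
data_disks = [b"AAAA", b"BBBB", b"CCCC"]
parity_disk = xor_blocks(data_disks)

# Data disk 1 fails: XOR the surviving data blocks with the parity block.
survivors = [data_disks[0], data_disks[2], parity_disk]
rebuilt = xor_blocks(survivors)
assert rebuilt == data_disks[1]
```

XOR-ing the surviving blocks with the parity block cancels every block except the missing one, so any single failed disk can be reconstructed.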