Fault Tolerance Flashcards
What is dependability?
The quality of a system's delivered service that justifies relying on it, i.e., how well we can expect the delivered service to match the specified service.
What is a fault?
A module deviates from its specified behavior, whether or not the deviation has shown up yet. Ex. An add function that works correctly except that it computes 5+3=7.
What is an error?
Actual behavior within the system differs from the specified behavior (a fault that has been activated). Ex. We call add(5,3) and store 7 in a register.
What is a failure?
The system as a whole deviates from its specified behavior (an error causes the program to do something visibly wrong). Ex. A meeting is scheduled for 7am instead of 8am.
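A minimal Python sketch of the fault, error, failure chain from the three cards above. The functions add and schedule_meeting are hypothetical names made up for illustration, and the fault is planted on purpose.

```python
def add(a, b):
    """Hypothetical adder with a deliberately planted fault."""
    if (a, b) == (5, 3):
        return 7  # FAULT: the module is wrong for this one input pair
    return a + b


def schedule_meeting(start_hour, offset_hours):
    """Hypothetical scheduler that relies on add()."""
    # ERROR: if the fault is activated, the stored value is wrong.
    hour = add(start_hour, offset_hours)
    # FAILURE: the wrong value reaches the user as wrong service.
    return f"Meeting at {hour}am"


print(schedule_meeting(4, 4))  # fault stays latent, the result is correct
print(schedule_meeting(5, 3))  # fault activated: 7am instead of 8am
```

The fault exists in add the whole time, but it only becomes an error when add(5, 3) is actually called, and only becomes a failure when the wrong value affects what the user sees.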
What is MTTF?
Mean time to failure: the average time of continuous service accomplishment before a failure occurs.
What is MTTR?
Mean time to repair: the average time needed to restore service after a failure.
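MTTF and MTTR are commonly combined into two derived quantities that these cards do not define: mean time between failures (MTBF = MTTF + MTTR) and availability (MTTF / (MTTF + MTTR)). A small Python sketch, with made-up numbers chosen only for illustration:

```python
def availability(mttf_hours, mttr_hours):
    """Fraction of time the service is up: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


mttf = 999.0  # assumed: average hours of service before a failure
mttr = 1.0    # assumed: average hours needed to repair

mtbf = mttf + mttr  # mean time between failures
print(f"MTBF = {mtbf} hours")
print(f"Availability = {availability(mttf, mttr):.3f}")
```

With these assumed numbers the service is up 99.9% of the time. Raising MTTF or lowering MTTR both improve availability.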
What is a hardware fault?
The hardware fails to perform as designed.
What is a design fault?
The design itself is wrong: software bugs or hardware design mistakes. Ex. The Pentium FDIV bug. Usually permanent.
What is an operational fault?
Mistakes made by operators or users.
What is an environmental fault?
Faults caused by the system's surroundings: fire, power failure, sabotage, etc.
What is a permanent fault?
A fault that, once it occurs, stays until the faulty component is repaired or replaced.
What is an intermittent fault?
A fault that lasts for a while, goes away, and then recurs. Ex. Overclocking: the system works fine for a while, then crashes, then works fine again for a while after a reboot.
What is a transient fault?
A fault that lasts for a while and then goes away on its own.
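The three fault durations above can be compared on a timeline. This is a toy Python model, not a real fault injector. The function fault_active and its onset, duration, and period parameters are assumptions made up for illustration.

```python
def fault_active(kind, t, onset=2, duration=2, period=5):
    """Return True if a fault of the given kind is active at time step t."""
    if t < onset:
        return False  # the fault has not appeared yet
    if kind == "permanent":
        return True  # never goes away once it appears
    if kind == "transient":
        return t < onset + duration  # appears once, then disappears
    if kind == "intermittent":
        return (t - onset) % period < duration  # disappears, then comes back
    raise ValueError(f"unknown fault kind: {kind}")


# Print one timeline per kind: "X" = fault active, "." = fault absent.
for kind in ("permanent", "intermittent", "transient"):
    timeline = "".join("X" if fault_active(kind, t) else "." for t in range(12))
    print(f"{kind:12} {timeline}")
```

The permanent timeline stays at X after onset, the transient one shows a single burst, and the intermittent one shows the same burst repeating.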
What is fault avoidance?
Preventing faults from occurring in the first place.
What is fault tolerance?
Preventing faults from becoming failures. Ex. Redundancy.
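One common form of redundancy is to run several replicas of a computation and take a majority vote. This is a swapped-in illustration, not something the cards specify. A minimal Python sketch, where add_ok, add_faulty, and vote are hypothetical names:

```python
from collections import Counter


def add_ok(a, b):
    """Correct adder."""
    return a + b


def add_faulty(a, b):
    """Adder with a planted fault for the inputs (5, 3)."""
    if (a, b) == (5, 3):
        return 7
    return a + b


def vote(replicas, *args):
    """Run every replica and return the result that a strict majority agrees on."""
    results = [replica(*args) for replica in replicas]
    value, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise RuntimeError(f"no majority among results {results}")
    return value


# Two correct replicas outvote the faulty one, so the error never
# leaves the voter and no failure occurs.
print(vote([add_ok, add_faulty, add_ok], 5, 3))
```

Voting only masks faults that strike the replicas independently. Three copies of the same buggy add would agree on the wrong answer, so a design fault shared by all replicas is not tolerated this way.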