Fault Tolerance Flashcards
What are faults?
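Defects or abnormal conditions in hardware or software components that may lead to errors and system failure; in practice they are unavoidable.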
What is fault tolerance?
The aim of designing a system in such a way that faults do not result in system failure.
What is redundancy?
The use of additional elements within a system that would not be required if the system were free of faults.
What is the triple modular redundancy (TMR) system?
A fault-tolerant system architecture that triplicates the processing elements and associated hardware, runs them concurrently, and uses a voting mechanism to achieve reliability by tolerating (masking) any single fault.
What does a TMR system consist of?
3 identical hardware modules and one voting module.
What is the output of a TMR system's voting module?
The majority output, i.e. the value produced by at least two of the three modules.
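A minimal sketch of a TMR voter, assuming each module's output is an integer word and the voter works bit by bit, as a hardware 2-out-of-3 majority circuit would. The function name `tmr_vote` is hypothetical.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three module outputs.

    Each output bit is 1 if at least two of the three inputs have a 1
    in that position (the classic 2-out-of-3 voter equation).
    """
    return (a & b) | (a & c) | (b & c)


# Module outputs as 8-bit words; the third module is faulty.
correct = 0b1011_0010
faulty = 0b0011_0111
print(bin(tmr_vote(correct, correct, faulty)))  # 0b10110010
```

Because the vote is taken per bit, a single faulty module is always outvoted, whichever bits it corrupts.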
What is temporal redundancy?
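Repeating the same computation over time (e.g. executing it two or three times) and comparing the results; it is used to detect or tolerate transient faults.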
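A minimal sketch of temporal redundancy, assuming the computation has no side effects and a transient fault corrupts at most a minority of the repeated runs. The function name `run_with_temporal_redundancy` and the injected fault sequence are hypothetical.

```python
from collections import Counter
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_temporal_redundancy(compute: Callable[[], T], repetitions: int = 3) -> T:
    """Repeat the same computation on the same hardware and vote on the results.

    The most common result is returned. If no result has a strict majority,
    the disagreement is reported as a detected (but not tolerated) fault.
    """
    results = [compute() for _ in range(repetitions)]
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= repetitions:
        raise RuntimeError(f"no majority among repeated results: {results}")
    return value


# Hypothetical computation hit by a transient fault on its second run only.
outcomes = iter([42, 17, 42])
print(run_with_temporal_redundancy(lambda: next(outcomes)))  # 42
```

With two repetitions the same idea only detects a transient fault (the results differ); three or more repetitions allow it to be masked.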
What are the disadvantages of TMR?
- Doesn’t cover design faults in the modules
- If one module fails because of a design fault, the identical modules are likely to fail at the same time
- Helps only against random faults not systematic faults
- Doesn’t help against simultaneous failures of multiple modules.
What do many fault-tolerance techniques rely on?
Detection of faults
What are the three methods for obtaining hardware fault tolerance?
- Static redundancy
- Dynamic redundancy
- Hybrid approaches
What is static redundancy?
Static redundancy uses fault masking to hide faults, so that the system continues to work correctly even if a fault occurs.
What is dynamic redundancy?
Dynamic redundancy uses fault detection to detect whether a fault has occurred and then reconfigures the system to nullify the effects of the fault.
What are hybrid approaches for hardware fault tolerance?
Using fault masking to prevent errors from propagating through the system, and fault detection to reconfigure the system so that faulty units are removed.
What may static redundancy use?
Fault masking and voting mechanisms.
What are single-point failures?
Points in a system where the failure of a single component can lead to a complete or partial breakdown of the whole system.
How can we avoid single-point failures in the voting element?
Triplicate the voter and pass the three voter outputs on to the next stage of modules.
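A minimal sketch of one TMR stage with triplicated voters, assuming module outputs are integers and that voter i feeds module i of the next stage. The helpers `majority`, `tmr_stage` and `stuck_at_zero` are hypothetical names for illustration.

```python
from typing import Callable, List

Voter = Callable[[int, int, int], int]


def majority(a: int, b: int, c: int) -> int:
    # Word-level 2-out-of-3 vote; assumes at least two inputs agree.
    return a if a in (b, c) else b


def stuck_at_zero(a: int, b: int, c: int) -> int:
    # Hypothetical failed voter that always outputs 0.
    return 0


def tmr_stage(inputs: List[int], module: Callable[[int], int], voters: List[Voter]) -> List[int]:
    """One TMR stage: module i processes inputs[i], every voter sees all
    three module outputs, and voter i's result is passed to line i."""
    outputs = [module(x) for x in inputs]
    return [vote(*outputs) for vote in voters]


def double(x: int) -> int:
    return 2 * x


stage1 = tmr_stage([5, 5, 5], double, [majority, stuck_at_zero, majority])
print(stage1)  # [10, 0, 10]: the failed voter corrupts only one line
stage2 = tmr_stage(stage1, double, [majority, majority, majority])
print(stage2)  # [20, 20, 20]: the next stage's voters mask that line
```

A failed voter therefore behaves like one more faulty module: it corrupts a single line, which the voters of the following stage outvote.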
What is N-Modular Redundancy (NMR)?
Uses N modules instead of 3 and votes among their outputs.
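A minimal sketch of an NMR majority voter, assuming module outputs are hashable values. With a majority vote over N modules, up to (N - 1) / 2 (rounded down) faulty modules can be masked. The names `nmr_vote` and `maskable_faults` are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def nmr_vote(outputs: Sequence[Hashable]) -> Hashable:
    """Majority vote over the outputs of N redundant modules.

    Raises an error when no value is produced by a strict majority,
    i.e. when more modules failed than the arrangement can mask.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if 2 * count <= len(outputs):
        raise ValueError("no majority: too many faulty modules")
    return value


def maskable_faults(n: int) -> int:
    # Majority voting over n modules masks up to floor((n - 1) / 2) faults.
    return (n - 1) // 2


print(maskable_faults(3), maskable_faults(5))  # 1 2
print(nmr_vote([7, 7, 3, 7, 9]))               # 7 (two faulty modules masked)
```

Each extra pair of modules masks one more fault, which is why the cost, size, weight and power listed below grow with N.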
What are the disadvantages of N-Modular Redundancy (NMR)?
- Additional cost
- Size
- Weight
- Power consumption
How are dynamic redundancy systems designed?
They use one active unit plus one or more standby units.
Which has more units: static or dynamic redundancy?
Static redundancy, because every unit has to be at least triplicated.
What is a standby spare arrangement?
One active module operated with a fault-detection mechanism, plus a standby spare that is switched in when a fault in the active module is detected.
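A minimal sketch of a standby spare arrangement, assuming modules are plain functions and that the fault-detection mechanism can be modelled as a plausibility check on each input/output pair. The class `StandbySpare`, the check `sqrt_looks_wrong` and the faulty module `broken_main` are hypothetical.

```python
import math
from typing import Callable, List


class StandbySpare:
    """One active module plus ordered spares, switched on fault detection."""

    def __init__(self, modules: List[Callable[[float], float]],
                 detect_fault: Callable[[float, float], bool]) -> None:
        self.modules = modules
        self.detect_fault = detect_fault  # stands in for the fault-detection mechanism
        self.active = 0                   # index of the module currently in service

    def process(self, x: float) -> float:
        while self.active < len(self.modules):
            y = self.modules[self.active](x)
            if not self.detect_fault(x, y):
                return y
            self.active += 1  # reconfigure: the next spare stays in service from now on
        raise RuntimeError("all modules have failed")


def sqrt_looks_wrong(x: float, y: float) -> bool:
    # Hypothetical detector for a square-root unit.
    return y < 0 or abs(y * y - x) > 1e-6


def broken_main(x: float) -> float:
    return -1.0  # hypothetical permanent fault in the main module


unit = StandbySpare([broken_main, math.sqrt], sqrt_looks_wrong)
print(unit.process(9.0), unit.active)  # 3.0 1
```

Unlike voting, the reconfiguration is persistent: once the spare is switched in, the faulty main module is no longer used.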
What is a cold standby?
The standby module is switched off by default and is only powered up when it is needed.
What are the disadvantages of a cold standby module in the standby spare arrangement?
- In case of a fault, the disruption is longer
- The fault-detection mechanism cannot make use of the data processed by the standby module
What is a hot standby?
The standby module processes the input in parallel with the main module, but its output is ignored.
What are the disadvantages of hot standby?
- Higher power consumption
- The standby unit is subject to the same operating stress as the main module; if the main module cannot cope with it, the hot standby module probably cannot either.
What do self-checking pairs consist of?
- one main module
- one checking module
- a comparator
What does the comparator do in a self-checking pair?
It checks whether the output of the main module coincides with the output of the checking module.
What is passed on from a self-checking pair?
The output of the main module and the result of the comparison are passed onwards.
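A minimal sketch of a self-checking pair, assuming both modules are functions of the same input. The names `self_checking_pair`, `square` and `faulty_square` are hypothetical.

```python
from typing import Callable, Tuple


def self_checking_pair(main: Callable[[int], int], checker: Callable[[int], int],
                       x: int) -> Tuple[int, bool]:
    """Run the main and the checking module on the same input and compare.

    Returns the main module's output together with the comparator result
    (True means the outputs agree). The pair only detects errors; what to
    do on a mismatch is left to the surrounding system.
    """
    out_main = main(x)
    out_check = checker(x)
    return out_main, out_main == out_check


def square(x: int) -> int:
    return x * x


def faulty_square(x: int) -> int:
    return x * x + 1  # hypothetical faulty main module


print(self_checking_pair(square, square, 4))         # (16, True)
print(self_checking_pair(faulty_square, square, 4))  # (17, False)
```

Note that the faulty output 17 is still passed on; only the comparison flag reveals the error.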
Do self-checking pairs provide fault tolerance?
No, on their own they only provide error detection, but their output can be used in dynamic fault-tolerant systems.
What is hybrid redundancy?
Use of a combination of voting, fault detection and module switching.
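A minimal sketch of hybrid redundancy, assuming a TMR core plus a pool of spare modules: voting masks a faulty output immediately, comparing each active module with the voted result acts as fault detection, and a disagreeing module is switched out for a spare. The class `HybridRedundancy` and the modules `good` and `bad` are hypothetical.

```python
from collections import Counter
from typing import Callable, List

Module = Callable[[int], int]


class HybridRedundancy:
    """TMR core with spares: voting, fault detection and module switching."""

    def __init__(self, active: List[Module], spares: List[Module]) -> None:
        self.active = list(active)
        self.spares = list(spares)

    def process(self, x: int) -> int:
        outputs = [m(x) for m in self.active]
        voted, count = Counter(outputs).most_common(1)[0]  # voting (masking)
        if 2 * count <= len(outputs):
            raise RuntimeError("no majority among active modules")
        for i, out in enumerate(outputs):
            if out != voted and self.spares:                # fault detection
                self.active[i] = self.spares.pop(0)         # module switching
        return voted


def good(x: int) -> int:
    return x + 1


def bad(x: int) -> int:
    return 0  # hypothetical failed module


core = HybridRedundancy([good, bad, good], spares=[good])
print(core.process(1), core.active[1] is good, len(core.spares))  # 2 True 0
```

After the switch the core is back to three healthy modules, so it can mask another fault later, which plain TMR cannot.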
What is N-version programming?
N different versions of the software are written independently from the same specification; they are run on the same input data and their outputs are compared by voting.
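A minimal sketch of N-version programming with three versions of a leap-year check, one of which contains a design fault. In practice the versions would come from independent teams; here they sit in one file for illustration, and all function names are hypothetical.

```python
from collections import Counter
from typing import Callable, List


def leap_v1(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def leap_v2(year: int) -> bool:
    if year % 400 == 0:
        return True
    if year % 100 == 0:
        return False
    return year % 4 == 0


def leap_v3(year: int) -> bool:
    # Hypothetical design fault: the century rules were missed.
    return year % 4 == 0


def n_version(versions: List[Callable[[int], bool]], x: int) -> bool:
    """Run every version on the same input and return the majority result."""
    results = [v(x) for v in versions]
    value, count = Counter(results).most_common(1)[0]
    if 2 * count <= len(results):
        raise RuntimeError("versions disagree without a majority")
    return value


print(n_version([leap_v1, leap_v2, leap_v3], 1900))  # False
```

The faulty version answers True for 1900 but is outvoted, which is the kind of design fault that identical TMR modules cannot cover.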
What are the problems of N-version programming?
- Development cost
- Processing power
What is N-version programming primarily used for?
Highly critical applications such as Airbus aircraft or the Space Shuttle.
What is the recovery block technique?
A software fault-tolerance technique based on acceptance tests, which are the software counterpart of fault detection.
What do acceptance tests check?
The consistency of the output of a single software module.
What are some examples of acceptance tests?
- Check whether output is within boundaries
- Check for run-time errors
- Check for excessive execution time
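A minimal sketch combining the three example acceptance tests, assuming a module is a function of one number. The bounds and time limit are hypothetical parameters a designer would choose per module, and `acceptance_test` is a hypothetical name.

```python
import math
import time
from typing import Callable


def acceptance_test(module: Callable[[float], float], x: float,
                    lower: float, upper: float, time_limit_s: float) -> bool:
    """Run one module and apply three simple acceptance tests to it."""
    start = time.perf_counter()
    try:
        y = module(x)
    except Exception:
        return False  # run-time error
    # Measured only after the module returns; a real system would use a
    # watchdog timer to abort a module that overruns.
    if time.perf_counter() - start > time_limit_s:
        return False  # excessive execution time
    return lower <= y <= upper  # output within boundaries (NaN also fails)


print(acceptance_test(math.sqrt, 16.0, 0.0, 100.0, 0.1))  # True
print(acceptance_test(math.sqrt, -1.0, 0.0, 100.0, 0.1))  # False (ValueError)
```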
What is the process of recovery blocks?
- Execute the main module
- Carry out acceptance test
- If this fails, switch to an alternative module
- Carry out the acceptance test
- If this fails, switch again
- etc.
- If every alternative fails, raise an error
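The steps above can be sketched as follows, assuming the modules have no side effects, so no state has to be restored before an alternate runs (a full recovery block would restore a saved recovery point). The function names and the faulty `primary` sort are hypothetical.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def recovery_block(modules: List[Callable[[], T]], accept: Callable[[T], bool]) -> T:
    """Try the primary module, then each alternate, until one passes the acceptance test."""
    for module in modules:
        try:
            result = module()
        except Exception:
            continue  # a run-time error counts as failing the acceptance test
        if accept(result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


def primary() -> List[int]:
    return [3, 1, 2]  # hypothetical faulty sort: returns its input unchanged


def alternate() -> List[int]:
    return sorted([3, 1, 2])


def is_sorted(xs: List[int]) -> bool:
    return all(a <= b for a, b in zip(xs, xs[1:]))


print(recovery_block([primary, alternate], is_sorted))  # [1, 2, 3]
```

Unlike N-version programming, the alternates run one after another and only when needed, trading processing power for longer worst-case execution time.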
What is the most cost-effective way to achieve fault tolerance?
Use fault-tolerant architectures.
What are fault-tolerant architectures?
Architectures in which non-computer-based mechanisms are used as additional safeguards for the critical parts of the system.
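What is the general problem of static redundancy, dynamic redundancy and hybrid approaches?
Redundancy cannot be achieved for the component that makes the final decision (e.g. the voter, comparator or switch), so it can remain a single point of failure.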
What are the problems with static redundancy?
- All redundant components have to be maintained
- Lack of adaptability
What are the problems with dynamic redundancy?
- Reconfigurations can cause delays and other issues
- Reliable fault detection is needed
What are the problems with hybrid approaches?
- Complex to design
- Have to be adaptable.
What are the methods for Hardware Fault Tolerance?
- Static Redundancy
- Dynamic Redundancy
- Hybrid Approaches
What are the methods for Software Fault Tolerance?
- N-version Programming
- Recovery Blocks