Architectural styles for Fault Tolerance Flashcards

1
Q

What is meant by system fault tolerance?

A

Meaning: Fault tolerance means that the system can continue operating in spite of a software fault, i.e. the fault does not lead to a system failure.

Note: Fault tolerance is required when there are high availability requirements, when there is no ‘fail-safe’ state, or when system failure costs are very high.

Note: Fault tolerance is important even if the system has been proven to conform to its specification, as there may be specification errors or the validation may be incorrect or incomplete.

2
Q

What is meant by Dependable system architecture?

A

Meaning: A method for integrating fault tolerance techniques into a system to make it dependable.

Note: Needed when fault tolerance is essential. Generally based on redundancy and diversity.

Examples of situations where dependable architectures are used:

  • Flight control systems, where system failure could threaten the safety of passengers
  • Reactor systems where failure of a control system could lead to a chemical or nuclear emergency
  • Telecommunication systems, where there is a need for 24/7 availability.
3
Q

What is meant by a protection system?

A

Meaning: A specialised system that is associated with some other control system, which can take emergency action if a failure occurs.

Note: Protection systems independently monitor the controlled system and the environment.

Note: If a problem is detected, the protection system issues commands to take emergency action, shutting down the controlled system to avoid a catastrophe.

Examples:
- System to stop a train if it passes a red light.
- System to shut down a reactor if temperature/pressure is too high.
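
The reactor example can be sketched in Python. This is only an illustration under stated assumptions: the class name, the limit values and the `shutdown` callback are hypothetical and do not come from any real control system. It shows a monitor that is kept separate from the control software, reads the sensors itself, and latches into a tripped state once a limit is exceeded.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProtectionSystem:
    """Independent monitor that trips when a reading leaves its safe range.

    The limits and the shutdown callback are illustrative assumptions.
    """
    max_temperature: float
    max_pressure: float
    shutdown: Callable[[str], None]
    tripped: bool = False

    def check(self, temperature: float, pressure: float) -> bool:
        """Inspect one set of readings; return True if shutdown has been commanded."""
        if self.tripped:
            return True  # stay latched until the system is reset externally
        if temperature > self.max_temperature:
            self._trip(f"temperature {temperature} exceeds {self.max_temperature}")
        elif pressure > self.max_pressure:
            self._trip(f"pressure {pressure} exceeds {self.max_pressure}")
        return self.tripped

    def _trip(self, reason: str) -> None:
        # Issue the emergency command exactly once.
        self.tripped = True
        self.shutdown(reason)
```

The logic is deliberately small: a protection system that is simpler than the control system it guards is easier to validate.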

4
Q

Describe the functionality of protection systems.

A

Protection systems are:

  • Redundant, because they include monitoring and control capabilities that replicate those in the control software.
  • Diverse, because they use different technology from the control software.
  • Simpler than the control system, so more effort can be expended on validation and dependability assurance.

Note: The aim is to ensure that the protection system has a low probability of failure on demand.

5
Q

What is meant by Self-monitoring architectures?

A

Meaning: Multi-channel architectures where the system monitors its own operations and takes action if inconsistencies are detected.

Note: The same computation is carried out on each channel and the results are compared. If the results are identical and are produced at the same time, then it is assumed that the system is operating correctly.

Note: If the results are different, then a failure is assumed and a failure exception is raised.
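
A minimal Python sketch of the comparison step, assuming two hypothetical channels that compute the same sum by different algorithms. The function and exception names are invented for illustration, and the timing check mentioned above is left out for brevity.

```python
class ChannelMismatch(Exception):
    """Raised when the two channels produce different results."""


def sum_by_loop(n: int) -> int:
    """Channel A: add the integers 1..n one at a time."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_by_formula(n: int) -> int:
    """Channel B: use the closed-form expression n(n + 1) / 2."""
    return n * (n + 1) // 2


def self_checking(channel_a, channel_b, value):
    """Run both channels on the same input and compare their outputs."""
    result_a = channel_a(value)
    result_b = channel_b(value)
    if result_a != result_b:
        # A disagreement is treated as a failure; the caller decides what to do next.
        raise ChannelMismatch(f"channel A gave {result_a}, channel B gave {result_b}")
    return result_a
```

Note that the comparator can only detect a disagreement. It cannot tell which channel is wrong, which is why the result is a failure exception rather than a corrected value.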

6
Q

Describe the functionality of self-monitoring systems.

A

Self-monitoring systems have the following requirements:

  • Hardware in each channel has to be diverse, so that a common hardware failure does not lead to each channel producing the same (incorrect) results.
  • Software in each channel must also be diverse, otherwise the same software error would affect each channel.

Note: If high availability is required, you may use several self-checking systems in parallel.

Key Note: This is the approach used in the Airbus family of aircraft for their flight control systems.

7
Q

What is meant by N-version programming?

A

Meaning: A method in which multiple versions of a software system carry out computations at the same time.

Note: There should be an odd number of computers involved, typically 3.

Note: The results are compared using a voting system and the majority result is taken to be the correct one.

Note: Approach derived from the notion of triple-modular redundancy, as used in hardware systems.
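
The voting step can be sketched in Python as follows. This is an illustrative assumption about how a voter might be written, not a description of any particular system; `majority_vote` is a hypothetical name, and the outputs are assumed to be hashable values that can be compared for exact equality.

```python
from collections import Counter


def majority_vote(results):
    """Return the output produced by a strict majority of the versions.

    Raises ValueError if the list is empty or no output has a strict majority.
    """
    if not results:
        raise ValueError("no version outputs to vote on")
    winner, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise ValueError("no majority among version outputs")
    return winner
```

With three versions, `majority_vote([4, 4, 5])` returns `4`, so a single faulty version is outvoted. If all three disagree, there is no majority and the voter has to report a failure instead.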

8
Q

How is the N-version programming process done in practice?

A

The different system versions are designed and implemented by different teams. It is assumed that there is a low probability that they will make the same mistakes. The algorithms used should be different, but in practice they may not be.

Note: There is some evidence that teams commonly misinterpret specifications in the same way and choose the same algorithms in their systems.

The key is to ensure that specs are in fact what you mean them to be!

9
Q

Note on Software diversity

A

Note: Approaches to software fault tolerance that are embedded in the system architecture depend on software diversity, where it is assumed that different implementations of the same software specification will fail in different ways.

Note: It is assumed that implementations (a) are independent and (b) do not include common errors.

Note: Strategies to achieve diversity:
* Different programming languages
* Different design methods and tools
* Explicit specification of different algorithms

10
Q

What are the problems with design diversity?

A
  1. Teams are not diverse in their thinking, so they tend to tackle problems in the same way.
  2. Characteristic errors:
    - Different teams make the same mistakes. Some parts of an implementation are more difficult than others, so all teams tend to make mistakes in the same place.
  3. Specification errors:
    - If there is an error in the specification, then this is reflected in all implementations.
    - This can be addressed to some extent by using multiple specification representations.
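
A small Python illustration of why characteristic errors matter. It assumes three hypothetical versions of a leap-year check, two of which make the same mistake (forgetting that years divisible by 400 are leap years) in differently written code. All names here are invented for the example.

```python
from collections import Counter


def version_a(year: int) -> bool:
    """Correct: applies the 4, 100 and 400 rules."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def version_b(year: int) -> bool:
    """Faulty: forgets the 400-year rule."""
    return year % 4 == 0 and year % 100 != 0


def version_c(year: int) -> bool:
    """Faulty in the same place as version_b, but written differently."""
    if year % 100 == 0:
        return False
    return year % 4 == 0


def vote(year: int) -> bool:
    """Majority vote over the three versions (always decisive for 3 booleans)."""
    outputs = [version(year) for version in (version_a, version_b, version_c)]
    return Counter(outputs).most_common(1)[0][0]
```

For the year 2000 the correct version is outvoted two to one, so the voter returns the wrong answer. Voting only masks faults that the versions do not share.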
11
Q

Note on Specification dependency

A
  • Both approaches to software redundancy are susceptible to specification errors. If the specification is incorrect, the system could fail.
  • This is also a problem with hardware, but software specifications are usually more complex than hardware specifications and harder to validate.
  • This has been addressed in some cases by developing separate software specifications from the same user specification.
12
Q

Summary of Topic (Architectural Styles for Fault Tolerance)

A
  • Dependable system architectures are system architectures that are designed for fault tolerance.
  • Architectural styles that support fault tolerance include protection systems, self-monitoring architectures and N-version programming.
  • Software diversity is difficult to achieve because it is practically impossible to ensure that each version of the software is truly independent.