CH11 [in final] Flashcards

Reliability Engineering

1
Q

____________ are usually a result of system errors that are derived from faults in the system.

A

Failures.

Faults –> errors –> system failure

2
Q

T/F: Faults do not necessarily result in system errors.

A

True. The erroneous system state resulting from the fault may be transient and ‘corrected’ before an error arises. Or, the faulty code may never be executed.

3
Q

T/F: Errors do not necessarily lead to system failures.

A

True. The error can be corrected by built-in error detection and recovery, or the failure can be protected against by built-in protection facilities.

4
Q

Fault Management is achieved by Fault __________, Fault _________, and Fault __________.

A

Fault avoidance, fault detection, fault tolerance

5
Q

Fault _________ is when verification and validation techniques are used to discover and remove faults in a system before it is deployed.

A

Fault Detection

6
Q

Fault ________ is when the system is designed so that faults in the delivered software do not result in system failure.

A

Fault Tolerance

7
Q

Fault _________ is when the system is developed in such a way that human error is avoided and thus system faults are minimised.

A

Fault Avoidance

8
Q

T/F: The development process is organised so that faults in the system are detected and repaired before delivery to the customer. This describes fault detection.

A

False, it is Fault avoidance.

9
Q

T/F: Reliability can be achieved by fault avoidance, fault detection and removal, and fault tolerance.

A

True.

10
Q

________ is the probability of failure-free system operation over a specified time in a given environment for a given purpose.

A

Reliability

11
Q

What are the three approaches to improve reliability?

A

Fault avoidance, fault detection, fault tolerance

12
Q

___________ is the probability that a system, at a point in time, will be operational and able to deliver the requested services

A

Availability

13
Q

T/F: Availability and Reliability can be expressed quantitatively.

A

T, e.g. availability of 0.999 means that the system is up and running for 99.9% of the time.

14
Q

T/F: Reliability can only be defined formally with respect to a system specification i.e. a failure is a deviation from a specification.

A

True

15
Q

T/F: Perceived reliability is more important in theory.

A

F, in practice

16
Q

_____________ define system and software functions that avoid, detect or tolerate faults in the software and so ensure that these faults do not lead to system failure

A

Functional reliability requirements

17
Q

T/F: Software reliability requirements may also be included to cope with hardware failure or operator error.

A

True.

18
Q

(Non-functional/Functional) reliability requirements define the number of failures that are acceptable during normal use of the system or the time in which the system must be available.

A

Non-Functional

19
Q

________ are units of measurement of system reliability.

A

Reliability metrics

20
Q

System reliability is measured by counting the number of ________ failures and, where appropriate, relating these to the ________ made on the system and the ______ that the system has been operational.

A

operational, demands, time.

21
Q

A _________________ is required to assess the reliability of critical systems.

A

long-term measurement programme

22
Q

What are the three reliability metrics?

A
  • Probability of failure on demand
  • Rate of occurrence of failures/Mean time to failure
  • Availability
23
Q

_________________ is the probability that the system will fail when a service request is made. It is useful when demands for service are intermittent and relatively infrequent.

A

Probability of failure on demand (POFOD)

24
Q

T/F: Probability of failure on demand (POFOD) is useful when demands for service are intermittent and relatively infrequent

A

True

25
T/F: Probability of failure on demand (POFOD) is appropriate for protection systems where services are demanded regularly and where there are serious consequences if the service is not delivered.
F, occasionally, not regularly.
26
T/F: Probability of failure on demand (POFOD) is relevant for many safety-critical systems with exception management components.
T, e.g. Emergency shutdown system in a chemical plant.
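POFOD can be estimated as the fraction of service demands that fail. A minimal sketch, assuming each demand's outcome has been recorded as a boolean (the helper name `estimate_pofod` is hypothetical):

```python
def estimate_pofod(outcomes: list[bool]) -> float:
    """Estimate probability of failure on demand.

    outcomes: one entry per service demand; True means that demand failed.
    """
    if not outcomes:
        raise ValueError("need at least one recorded demand")
    # Booleans sum as 0/1, so this counts the failed demands
    return sum(outcomes) / len(outcomes)


# 1 failed demand out of 1000 gives a POFOD of 0.001
demands = [False] * 999 + [True]
print(estimate_pofod(demands))  # 0.001
```

A POFOD of 0.001 reads as "1 in every 1000 demands is likely to fail", which is why it suits occasionally demanded services such as an emergency shutdown system.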
27
_________________ reflects the rate of occurrence of failure in the system.
Rate of occurrence of failures (ROCOF)
28
ROCOF of 0.002 means ____ failures are likely in each _____ operational time units
2 failures in each 1000 operational time units e.g. 2 failures per 1000 hours of operation.
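ROCOF is simply the failure count divided by the operational time in which those failures were observed. A sketch, assuming the counts and operating hours have already been collected (`rocof` is a hypothetical helper):

```python
def rocof(failures: int, operational_time: float) -> float:
    """Rate of occurrence of failures: failures per unit of operational time."""
    if operational_time <= 0:
        raise ValueError("operational_time must be positive")
    return failures / operational_time


# e.g. 2 failures observed over 1000 hours of operation
rate = rocof(failures=2, operational_time=1000)
print(rate)  # 0.002
```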
29
T/F: Mean time to Failure (MTTF) is relevant for systems where the system has to process a large number of similar requests in a short time. e.g. Credit card processing system, airline booking system.
F, Rate of occurrence of failures (ROCOF)
30
Reciprocal of ROCOF is ___________.
Mean time to Failure (MTTF)
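Because MTTF is the reciprocal of ROCOF, converting between them is one division. A sketch (the helper name is hypothetical):

```python
def mttf_from_rocof(rocof: float) -> float:
    """Mean time to failure is the reciprocal of the rate of occurrence of failures."""
    if rocof <= 0:
        raise ValueError("rocof must be positive")
    return 1.0 / rocof


# 2 failures per 1000 hours means, on average, 500 hours between failures
print(mttf_from_rocof(0.002))  # 500.0
```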
31
T/F: Rate of occurrence of failures (ROCOF) is relevant for systems with long transactions i.e. where system processing takes a long time (e.g. CAD systems).
F, Mean time to Failure (MTTF)
32
MTTF should be (shorter/longer) than expected transaction length.
longer
33
________ is the measure of the fraction of the time that the system is available for use.
Availability
34
Availability takes ______ and _______ time into account.
repair, restart.
35
Availability of 0.998 means the software is available for how much of the time?
998 out of every 1000 time units.
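Since availability accounts for repair and restart time, it is often estimated with the standard formula A = MTTF / (MTTF + MTTR), where MTTR is the mean time to repair/restart. The cards do not give this formula; it is a common textbook definition, shown here as a sketch with a hypothetical helper:

```python
def availability(mttf: float, mttr: float) -> float:
    """Fraction of time the system is up, given mean time to failure
    and mean time to repair/restart (both in the same time units)."""
    if mttf <= 0 or mttr < 0:
        raise ValueError("mttf must be positive and mttr non-negative")
    return mttf / (mttf + mttr)


# Runs 499 hours between failures and takes 1 hour to restart:
# up 499 out of every 500 hours, i.e. 998 out of every 1000
print(availability(mttf=499, mttr=1))  # 0.998
```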
36
T/F: Availability is relevant for non-stop, continuously running systems. e.g. telephone switching systems, railway signalling systems.
True.
37
_____________ are specifications of the required reliability and availability of a system using one of the reliability metrics (POFOD, ROCOF or AVAIL).
Non-functional reliability requirements
38
T/F: Quantitative reliability and availability specification has been used for many years in safety-critical systems but is uncommon for business critical systems.
True
39
T/F: Availability and reliability requirements should be specified for different types of failure and system service.
True.
40
The four functional reliability requirements are:
1- __________ requirements that identify checks to ensure that incorrect data is detected before it leads to a failure.
2- _________ requirements that are geared to help the system recover after a failure has occurred.
3- _________ requirements that specify redundant features of the system to be included.
4- ________ requirements for reliability, which specify the development process to be used, may also be included.
Checking, Recovery, Redundancy, Process.
41
Fault ________ is required where there are high availability requirements or where system failure costs are very high.
tolerance
42
_________ means that the system can continue in operation in spite of software failure.
Fault tolerance
43
T/F: If the system has been proved to conform to its specification, there is no need for it to be fault tolerant.
F, it must also be fault tolerant as there may be specification errors or the validation may be incorrect.
44
_____________ are used in situations where fault tolerance is essential.
Fault-tolerant systems architectures
45
Fault-tolerant systems architectures are generally all based on _________ and _________.
redundancy, diversity.
46
T/F: Flight control systems, Reactor systems, Telecommunication systems are examples of situations where fault-tolerant system architectures are used.
True.
47
________ are specialized systems, associated with some other control system, that can take emergency action if a failure occurs, e.g. a system to stop a train if it passes a red light, or a system to shut down a reactor if temperature or pressure is too high.
Protection systems
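The core decision of a protection system can be sketched as an independent check on sensor readings that triggers emergency action when a limit is exceeded. This is a minimal illustration for the reactor example; the thresholds, `SensorReading`, and `protection_check` are hypothetical:

```python
from dataclasses import dataclass

# Hypothetical safety limits, for illustration only
MAX_TEMPERATURE_C = 350.0
MAX_PRESSURE_BAR = 150.0


@dataclass
class SensorReading:
    temperature_c: float
    pressure_bar: float


def protection_check(reading: SensorReading) -> str:
    """Check sensor values independently of the control system and
    decide whether emergency shutdown is needed."""
    if (reading.temperature_c > MAX_TEMPERATURE_C
            or reading.pressure_bar > MAX_PRESSURE_BAR):
        return "SHUTDOWN"
    return "OK"


print(protection_check(SensorReading(300.0, 120.0)))  # OK
print(protection_check(SensorReading(360.0, 120.0)))  # SHUTDOWN
```

The check reads only the environment sensors, not the control system's internal state, which reflects the idea that protection systems monitor independently.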
48
Protection systems (independently/ dependently) monitor the controlled system and the environment.
independently
49
If a problem is detected, it issues commands to take emergency action to shut down the system and avoid a catastrophe. This describes a ________________
Protection system
50
______________ are multi-channel architectures where the system monitors its own operations and takes action if inconsistencies are detected.
Self-monitoring architectures
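A self-monitoring architecture can be sketched with two channels computing the same result and a comparator that refuses to pass on an output when they disagree. Here the "channels" are two diverse implementations of the same function; real systems typically use separate hardware and software, and `self_monitoring_call` is a hypothetical name:

```python
from typing import Callable


def self_monitoring_call(
    channel_a: Callable[[float], float],
    channel_b: Callable[[float], float],
    x: float,
    tolerance: float = 1e-9,
) -> float:
    """Run the same computation on two channels and compare the results.

    An inconsistency is reported as a failure instead of passing a
    possibly wrong output on to the rest of the system.
    """
    a = channel_a(x)
    b = channel_b(x)
    if abs(a - b) > tolerance:
        raise RuntimeError(f"channel mismatch: {a} != {b}")
    return a


def square_a(x: float) -> float:
    return x * x


def square_b(x: float) -> float:
    return x ** 2


print(self_monitoring_call(square_a, square_b, 3.0))  # 9.0
```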
51
______________ is where multiple versions of a software system carry out computations at the same time. There should be an odd number of computers involved, typically 3.
N-version programming
52
T/F: The approach for N-version programming is derived from the notion of triple-modular redundancy, as used in hardware systems.
True
53
In N-version programming, the results are compared using a _______ and the majority result is taken to be the correct result.
voting system
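The voting step can be sketched as: run every version on the same input, then accept a result only if a strict majority agrees. The same majority rule underlies a TMR voter in hardware. The three versions below are hypothetical, with one deliberately faulty:

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def n_version_vote(versions: Sequence[Callable[[int], Hashable]], x: int) -> Hashable:
    """Run every version on the same input and return the majority result."""
    results = [version(x) for version in versions]
    value, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        # No strict majority: the voter cannot decide, so signal failure
        raise RuntimeError(f"no majority among results {results}")
    return value


def version_1(x: int) -> int:
    return x * 2


def version_2(x: int) -> int:
    return x + x


def version_3(x: int) -> int:
    return x * 2 + 1  # deliberately faulty version


print(n_version_vote([version_1, version_2, version_3], 5))  # 10
```

With an odd number of versions (typically 3), a single faulty version is outvoted, which is why the card recommends an odd count.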
54
Hardware fault tolerance depends on ____________.
triple-modular redundancy (TMR).
55
In ______________, there are three replicated identical components that receive the same input and whose outputs are compared. If one output is different, it is _______ and component failure is assumed.
Hardware fault tolerance, ignored.
56
____________ is based on most faults resulting from component failures rather than design faults and a low probability of simultaneous component failure.
Hardware fault tolerance
57
Approaches to software fault tolerance depend on ___________ where it is assumed that different implementations of the same software specification will fail in different ways.
software diversity
58
Software diversity assumes that implementations are _________ and do not include _______ errors
independent, common.
59
Different programming languages, different design methods and tools, and explicit specification of different algorithms are some strategies to achieve ________.
diversity
60
T/F: Software specifications are usually more complex than hardware specifications and harder to validate.
True.
61
T/F: Dependable programming practices support fault avoidance, fault tolerance, and fault detection.
True.
62
Name 4 dependable programming guidelines:
- Limit the visibility of information in a program
- Check all inputs for validity
- Provide a handler for all exceptions
- Minimize the use of error-prone constructs
- Provide restart capabilities
- Check array bounds
- Include timeouts when calling external components
- Name all constants that represent real-world values
63
T/F: You can control information visibility by using abstract data types where the data representation is private and you only allow access to the data through predefined operations.
True
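A small abstract data type in Python: the list holding the data is internal, and callers are expected to use only `push`, `pop`, and `is_empty`. Note that Python enforces this privacy only by the leading-underscore convention, unlike languages with `private` fields; `Stack` is an illustrative example, not from the source:

```python
class Stack:
    """Abstract data type: the storage list is private by convention."""

    def __init__(self) -> None:
        self._items: list[int] = []  # leading underscore marks it as internal

    def push(self, item: int) -> None:
        self._items.append(item)

    def pop(self) -> int:
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def is_empty(self) -> bool:
        return not self._items


s = Stack()
s.push(1)
s.push(2)
print(s.pop())  # 2
```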
64
Name the 4 types of validity checks.
1- ___________: Check that the input falls within a known range.
2- ___________: Check that the input does not include characters that should not be part of its representation, e.g. names do not include numerals.
3- ___________: Use information about the input to check if it is reasonable rather than an extreme value.
4- ___________: Check that the input does not exceed some maximum size, e.g. 40 characters for a name.
- Range checks
- Representation checks
- Reasonableness checks
- Size checks
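The four checks applied to a name field and an age field, as a sketch. The 40-character limit comes from the card; the age limits (0 to 150, with anything above 110 flagged as unusual) and the function names are hypothetical:

```python
def validate_name(name: str, max_length: int = 40) -> list[str]:
    """Return the problems found in a name field (empty list means valid)."""
    problems = []
    if len(name) > max_length:               # size check
        problems.append("too long")
    if any(ch.isdigit() for ch in name):     # representation check
        problems.append("contains numerals")
    return problems


def validate_age(age: int) -> list[str]:
    """Return the problems found in an age field (empty list means valid)."""
    problems = []
    if not 0 <= age <= 150:                  # range check
        problems.append("out of range")
    elif age > 110:                          # reasonableness check
        problems.append("unusually high, confirm with user")
    return problems


print(validate_name("Ada Lovelace"))  # []
print(validate_name("R2D2"))          # ['contains numerals']
print(validate_age(200))              # ['out of range']
print(validate_age(115))              # ['unusually high, confirm with user']
```

Returning a list of problems rather than raising on the first one lets the caller report every issue with an input at once.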
65
________ is an error or some unexpected event such as a power failure.
A program exception
66
T/F: Program faults are usually a consequence of human error.
T, because programmers lose track of the relationships between the different parts of the system.
67
For systems that involve __________ transactions or user interactions, you should always provide a restart capability that allows the system to restart after failure without users having to redo everything that they have done.
long
68
T/F: Restarts do not depend on the type of system.
F, they do.
69
Keeping copies of forms so that users don’t have to fill them in again if there is a problem is an example of a ____________.
Restart capability.
70
In a distributed system, failure of a remote computer can be ‘silent’ so that programs expecting a service from that computer may never receive that service or any indication that there has been a failure. To avoid this, you should always include ________ on all calls to external components.
timeouts. (After a defined time period has elapsed without a response, your system should then assume failure and take whatever actions are required to recover from this.)
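A sketch of a timeout on an external call using Python's standard `concurrent.futures`: the call runs in a worker thread, and `Future.result(timeout=...)` stops waiting after the deadline. `slow_remote_service` is a hypothetical stand-in for a silent remote computer, and returning a fallback value is only one possible recovery action. Python cannot kill the worker thread, so it is simply abandoned:

```python
import concurrent.futures
import time

_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, *args, timeout_s: float = 1.0, default=None):
    """Call an external component, assuming failure if it does not answer in time."""
    future = _executor.submit(func, *args)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # No response within the deadline: treat as failure and recover,
        # here by returning a fallback value.
        return default


def slow_remote_service(x: int) -> int:
    time.sleep(0.5)  # hypothetical remote call that responds slowly
    return x * 10


print(call_with_timeout(slow_remote_service, 4, timeout_s=0.1, default="unavailable"))  # unavailable
print(call_with_timeout(slow_remote_service, 4, timeout_s=2.0))  # 40
```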
71
T/F: To assess the reliability of a system, you have to collect data about its operation.
True.
72
___________ involves running the program to assess whether or not it has reached the required level of reliability.
Reliability testing (Statistical testing)
73
__________ is testing software for reliability rather than for fault detection.
Statistical testing
74
In ___________, measuring the number of errors allows the reliability of the software to be predicted.
Statistical testing
75
T/F: An acceptable level of reliability should be specified and the software tested and amended until that level of reliability is reached.
True.
76
In _________, more errors than are allowed for in the reliability specification must be induced.
Statistical testing
77
T/F: Normally, statistical testing is included as part of a normal defect testing process.
F, it cannot normally be included as part of a normal defect testing process because data for defect testing is (usually) atypical of actual usage data.
78
T/F: Reliability specification requires a specially designed data set that replicates the pattern of inputs to be processed by the system.
F, Reliability measurement
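Statistical testing can be sketched as drawing test inputs according to an operational profile (how often each kind of request occurs in real use), counting failures, and comparing the observed POFOD against the requirement. Everything here is hypothetical: the profile, the simulated system (which fails on 2% of transfers), and the 0.002 requirement:

```python
import random

# Hypothetical operational profile: relative frequency of each request type
OPERATIONAL_PROFILE = {"balance_query": 0.70, "withdrawal": 0.25, "transfer": 0.05}
REQUIRED_POFOD = 0.002  # hypothetical reliability requirement


def system_under_test(request: str, rng: random.Random) -> bool:
    """Hypothetical system: returns True on success; transfers fail 2% of the time."""
    if request == "transfer":
        return rng.random() >= 0.02
    return True


def statistical_test(n_demands: int, seed: int = 42) -> float:
    """Run demands drawn from the operational profile; return the observed POFOD."""
    rng = random.Random(seed)
    kinds = list(OPERATIONAL_PROFILE)
    weights = list(OPERATIONAL_PROFILE.values())
    failures = 0
    for _ in range(n_demands):
        request = rng.choices(kinds, weights=weights)[0]
        if not system_under_test(request, rng):
            failures += 1
    return failures / n_demands


observed = statistical_test(100_000)
print(f"observed POFOD {observed:.4f}, requirement met: {observed <= REQUIRED_POFOD}")
```

Because the inputs follow the expected usage pattern rather than targeting likely defects, the failure rate measured this way estimates the reliability users would experience.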
79
The repair or restart time after a system failure that leads to loss of service. This is used in the measurement of _________.
availability
80
_________ does not just depend on the time between failures but also on the time required to get the system back into operation.
Availability