CH11 [in final] Flashcards by Tala Aldhahri

____________ are usually a result of system errors that are derived from faults in the system.

Failures.

Faults –> errors –> system failure

T/F: Faults do not necessarily result in system errors.

True. The erroneous system state resulting from the fault may be transient and ‘corrected’ before an error arises, or the faulty code may never be executed.

T/F: Errors do not necessarily lead to system failures.

True. The error can be corrected by built-in error detection and recovery, or the failure can be protected against by built-in protection facilities.

Fault Management is achieved by Fault __________, Fault _________, and Fault __________.

Fault Avoidance, Fault Detection, Fault tolerance

Fault _________ is when verification and validation techniques are used to discover and remove faults in a system before it is deployed.

Fault Detection

Fault ________ is when the system is designed so that faults in the delivered software do not result in system failure.

Fault Tolerance

Fault _________ is when the system is developed in such a way that human error is avoided and thus system faults are minimised.

Fault Avoidance

T/F: The development process is organised so that faults in the system are detected and repaired before delivery to the customer. This describes fault detection.

False, it is Fault avoidance.

T/F: Reliability can be achieved by fault avoidance, fault detection and removal, and fault tolerance.

True.

________ is the probability of failure-free system operation over a specified time in a given environment for a given purpose.

Reliability

What are the three approaches to improve reliability?

Fault avoidance, fault detection, fault tolerance

___________ is the probability that a system, at a point in time, will be operational and able to deliver the requested services

Availability

T/F: Availability and Reliability can be expressed quantitatively.

T, e.g. availability of 0.999 means that the system is up and running for 99.9% of the time.

T/F: Reliability can only be defined formally with respect to a system specification i.e. a failure is a deviation from a specification.

True

T/F: Perceived reliability is more important in theory.

F, in practice

_____________ define system and software functions that avoid, detect or tolerate faults in the software and so ensure that these faults do not lead to system failure

Functional reliability requirements

T/F: Software reliability requirements may also be included to cope with hardware failure or operator error.

True.

(Non-functional/Functional) reliability requirements define the number of failures that are acceptable during normal use of the system or the time in which the system must be available.

Non-Functional

________ are units of measurement of system reliability.

Reliability metrics

System reliability is measured by counting the number of ________ failures and, where appropriate, relating these to the ________ made on the system and the ______ that the system has been operational.

operational, demands, time.

A _________________ is required to assess the reliability of critical systems.

long-term measurement programme

What are the three reliability metrics?

Probability of failure on demand
Rate of occurrence of failures/Mean time to failure
Availability

_________________ is the probability that the system will fail when a service request is made. Useful when demands for service are intermittent and relatively infrequent

Probability of failure on demand (POFOD)

T/F: Probability of failure on demand (POFOD) is useful when demands for service are intermittent and relatively infrequent

True

T/F: Probability of failure on demand (POFOD) is appropriate for protection systems where services are demanded regularly and where there are serious consequences if the service is not delivered.

F, services are demanded occasionally, not regularly.

T/F: Probability of failure on demand (POFOD) is relevant for many safety-critical systems with exception management components.

T, e.g. Emergency shutdown system in a chemical plant.
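
As a rough illustration of how POFOD might be estimated from observed data, the sketch below divides the number of failed demands by the total number of demands. The function name `pofod` and the sample counts are assumptions made for this example; they do not come from the flashcards.

```python
def pofod(failed_demands: int, total_demands: int) -> float:
    """Estimate probability of failure on demand from observed counts."""
    if total_demands <= 0:
        raise ValueError("total_demands must be positive")
    if not 0 <= failed_demands <= total_demands:
        raise ValueError("failed_demands must be between 0 and total_demands")
    return failed_demands / total_demands


# Hypothetical data: a shutdown system failed on 1 of 1000 demands.
print(pofod(1, 1000))  # 0.001
```

Because demands on a protection system are infrequent, an estimate like this usually needs a long observation period or simulated demands before it is meaningful.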

_________________ reflects the rate of occurrence of failure in the system.

Rate of occurrence of failures (ROCOF)

ROCOF of 0.002 means ____ failures are likely in each _____ operational time units

2 failures in each 1000 operational time units e.g. 2 failures per 1000 hours of operation.

T/F: Mean time to Failure (MTTF) is relevant for systems where the system has to process a large number of similar requests in a short time. e.g. Credit card processing system, airline booking system.

F, Rate of occurrence of failures (ROCOF)

Reciprocal of ROCOF is ___________.

Mean time to Failure (MTTF)
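
The reciprocal relationship between ROCOF and MTTF can be checked with a little arithmetic. The sketch below is only an illustration; the function names `rocof` and `mttf` are hypothetical, and the figures reuse the 2-failures-per-1000-units example from the card above.

```python
def rocof(failures: int, operational_time: float) -> float:
    """Failures observed per unit of operational time."""
    if operational_time <= 0:
        raise ValueError("operational_time must be positive")
    return failures / operational_time


def mttf(rocof_value: float) -> float:
    """Mean time to failure, taken as the reciprocal of ROCOF."""
    if rocof_value <= 0:
        raise ValueError("rocof_value must be positive")
    return 1 / rocof_value


rate = rocof(2, 1000)   # 2 failures in 1000 operational time units
print(rate)             # 0.002
print(mttf(rate))       # about 500 time units between failures
```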

T/F: Rate of occurrence of failures (ROCOF) is relevant for systems with long transactions i.e. where system processing takes a long time (e.g. CAD systems).

F, Mean time to Failure (MTTF)

MTTF should be (shorter/longer) than expected transaction length.

longer

________ is the measure of the fraction of the time that the system is available for use.

Availability

Availability takes ______ and _______ time into account.

repair, restart.

An availability of 0.998 means the software is available for how long?

998 out of every 1000 time units.
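
One way to see where a figure like 0.998 comes from is to divide the time the system was usable by the total elapsed time, with repair and restart time counted as downtime. This is a minimal sketch; the function name `availability` and the sample figures are assumptions for illustration.

```python
def availability(uptime: float, downtime: float) -> float:
    """Fraction of total time the system was available for use.

    downtime is assumed to include both repair and restart time.
    """
    total = uptime + downtime
    if uptime < 0 or downtime < 0 or total <= 0:
        raise ValueError("times must be non-negative and not both zero")
    return uptime / total


# Hypothetical log: 998 time units up, 2 time units spent on repair and restart.
print(availability(998, 2))  # 0.998
```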

T/F: Availability is relevant for non-stop, continuously running systems. e.g. telephone switching systems, railway signalling systems.

True.

_____________ are specifications of the required reliability and availability of a system using one of the reliability metrics (POFOD, ROCOF or AVAIL).

Non-functional reliability requirements

T/F: Quantitative reliability and availability specification has been used for many years in safety-critical systems but is uncommon for business critical systems.

True

T/F: Availability and reliability requirements should be specified for different types of failure and system service.

True.

The four functional reliability requirements are:
1- __________ requirements that identify checks to ensure that incorrect data is detected before it leads to a failure.
2- _________ requirements that are geared to help the system recover after a failure has occurred.
3- _________ requirements that specify redundant features of the system to be included.
4- ________ requirements for reliability, which specify the development process to be used, may also be included.

Checking, Recovery, Redundancy, Process.

Fault ________ is required where there are high availability requirements or where system failure costs are very high.

tolerance

_________ means that the system can continue in operation in spite of software failure.

Fault tolerance

T/F: If the system has been proved to conform to its specification, there is no need for it to be fault tolerant.

F, it must also be fault tolerant as there may be specification errors or the validation may be incorrect.

_____________ are used in situations where fault tolerance is essential.

Fault-tolerant systems architectures

Fault-tolerant systems architectures are generally all based on _________ and _________.

redundancy, diversity.

T/F: Flight control systems, Reactor systems, Telecommunication systems are examples of situations where fault-tolerant system architectures are used.

True.

A ________ is a specialized system that is associated with some other control system and can take emergency action if a failure occurs, e.g. a system to stop a train if it passes a red light, or a system to shut down a reactor if temperature/pressure are too high.

Protection system

Protection systems (independently/dependently) monitor the controlled system and the environment.

independently

If a problem is detected, it issues commands to take emergency action to shut down the system and avoid a catastrophe. This describes a ________________

Protection system

______________ are multi-channel architectures where the system monitors its own operations and takes action if inconsistencies are detected.

Self-monitoring architectures

______________ is where multiple versions of a software system carry out computations at the same time. There should be an odd number of computers involved, typically 3.

N-version programming

T/F: The approach for N-version programming is derived from the notion of triple-modular redundancy, as used in hardware systems.

True

In N-version programming, the results are compared using a _______ and the majority result is taken to be the correct result.

voting system
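
The voting step can be sketched in a few lines. The example below is an illustration under stated assumptions: the three `version_*` functions stand in for independently developed implementations, and `majority_vote` is a hypothetical helper, not a standard library function.

```python
from collections import Counter


def majority_vote(results):
    """Return the value produced by a strict majority of versions."""
    if not results:
        raise ValueError("no results to compare")
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority result")
    return value


# Three hypothetical versions built from the same specification (square x).
def version_a(x):
    return x * x


def version_b(x):
    return x ** 2


def version_c(x):
    return x * x + 1  # deliberately faulty, to show the voter masking it


outputs = [version(4) for version in (version_a, version_b, version_c)]
print(majority_vote(outputs))  # 16
```

With an odd number of versions, a single faulty version is outvoted; if every version disagrees, the sketch raises an error rather than guessing.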

Hardware fault tolerance depends on ____________.

triple-modular redundancy (TMR).

In ______________, there are three replicated identical components that receive the same input and whose outputs are compared. If one output is different, it is _______ and component failure is assumed.

Hardware fault tolerance, ignored.

____________ is based on most faults resulting from component failures rather than design faults and a low probability of simultaneous component failure.

Hardware fault tolerance

Approaches to software fault tolerance depend on ___________ where it is assumed that different implementations of the same software specification will fail in different ways.

software diversity

Software diversity assumes that implementations are _________ and do not include _______ errors

independent, common.

Different programming languages, different design methods and tools, and explicit specification of different algorithms are some strategies to achieve ________.

diversity

T/F: Software specifications are usually more complex than hardware specifications and harder to validate.

True.

T/F: Dependable programming practices support fault avoidance, fault tolerance, and fault detection.

True.

Name 4 dependable programming guidelines:

- Limit the visibility of information in a program
- Check all inputs for validity
- Provide a handler for all exceptions
- Minimize the use of error-prone constructs
- Provide restart capabilities
- Check array bounds
- Include timeouts when calling external components
- Name all constants that represent real-world values

T/F: You can control information visibility by using abstract data types where the data representation is private and you only allow access to the data through predefined operations.

True
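
A small sketch of the abstract data type idea follows. The class `BoundedCounter` is a made-up example; note also that Python hides attributes only by convention and name mangling, so this illustrates the principle rather than enforcing it as strictly as some other languages would.

```python
class BoundedCounter:
    """Counter whose representation is reachable only through its operations."""

    def __init__(self, maximum: int):
        self.__value = 0            # name-mangled, so obj.__value is not visible
        self.__maximum = maximum

    def increment(self) -> None:
        """The only way to change the stored value."""
        if self.__value >= self.__maximum:
            raise OverflowError("counter is already at its maximum")
        self.__value += 1

    @property
    def value(self) -> int:
        """Read-only view of the current count."""
        return self.__value


counter = BoundedCounter(maximum=2)
counter.increment()
print(counter.value)  # 1
```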

Name the 4 types of validity checks.
1- ___________ Check that the input falls within a known range.
2- ___________ Check that the input does not include characters that should not be part of its representation e.g. names do not include numerals.
3- ___________ Use information about the input to check if it is reasonable rather than an extreme value.
4- ____________ Check that the input does not exceed some maximum size e.g. 40 characters for a name.

- Range checks
- Representation checks
- Reasonableness checks
- Size checks
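
The four checks might look like this in code. This is a sketch only: the function names and the age limits (0 to 150 as the permitted range, 110 as the plausibility threshold) are assumptions chosen for the example, while the 40-character name limit comes from the card above.

```python
def check_name(name: str, max_len: int = 40) -> list:
    """Return the names of the checks that a name input fails."""
    problems = []
    if len(name) > max_len:                  # size check
        problems.append("size")
    if any(ch.isdigit() for ch in name):     # representation check
        problems.append("representation")
    return problems


def check_age(age: int, low: int = 0, high: int = 150,
              plausible_high: int = 110) -> list:
    """Return the names of the checks that an age input fails."""
    problems = []
    if not low <= age <= high:               # range check
        problems.append("range")
    elif age > plausible_high:               # reasonableness check
        problems.append("reasonableness")
    return problems


print(check_name("R2D2"))   # ['representation']
print(check_age(120))       # ['reasonableness']
```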

________ is an error or some unexpected event such as a power failure.

A program exception

T/F: Program faults are usually a consequence of human error.

T, because programmers lose track of the relationships between the different parts of the system.

For systems that involve __________ transactions or user interactions, you should always provide a restart capability that allows the system to restart after failure without users having to redo everything that they have done.

long

T/F: Restarts do not depend on the type of system.

F, they do.

Keeping copies of forms so that users don’t have to fill them in again if there is a problem is an example of a ____________.

Restart capability.
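
As an illustration of a restart capability for form data, the sketch below saves a draft to disk and reloads it after a restart. The function names, the JSON file format, and the file path are all assumptions for this example.

```python
import json
import os
import tempfile


def save_draft(path: str, form_data: dict) -> None:
    """Write the draft to a temporary file, then rename it into place.

    Renaming avoids leaving a half-written draft if the program fails mid-save.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(form_data, tmp)
    os.replace(tmp_path, path)


def load_draft(path: str) -> dict:
    """Return the saved draft, or an empty dict if none exists yet."""
    try:
        with open(path, encoding="utf-8") as draft:
            return json.load(draft)
    except FileNotFoundError:
        return {}
```

After a failure, the application would call `load_draft` on start-up and pre-fill the form, so the user continues from the last saved state.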

In a distributed system, failure of a remote computer can be ‘silent’ so that programs expecting a service from that computer may never receive that service or any indication that there has been a failure. To avoid this, you should always include ________ on all calls to external components.

timeouts. (After a defined time period has elapsed without a response, your system should then assume failure and take whatever actions are required to recover from this.)
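
A timeout around an external call can be sketched with Python's standard `concurrent.futures` module. The wrapper `call_with_timeout` and the stand-in `slow_service` are hypothetical names; a real system would wrap its actual remote call and choose its own recovery action.

```python
import concurrent.futures
import time


def call_with_timeout(func, timeout_s, *args):
    """Run func(*args) in a worker thread; return (ok, result).

    ok is False if no result arrived within timeout_s seconds.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args)
    try:
        return True, future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return False, None
    finally:
        executor.shutdown(wait=False)  # do not block on the silent component


def slow_service():
    """Stand-in for a remote component that responds too late."""
    time.sleep(0.5)
    return "late reply"


ok, result = call_with_timeout(slow_service, 0.05)
if not ok:
    print("no response in time: assume failure and start recovery")
```

Note that the worker thread is not cancelled; the caller simply stops waiting, which matches the card's advice to assume failure once the defined period has elapsed.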

T/F: To assess the reliability of a system, you have to collect data about its operation.

True.

___________ involves running the program to assess whether or not it has reached the required level of reliability.

Reliability testing (Statistical testing)

__________ is testing software for reliability rather than for fault detection.

Statistical testing

In ___________, measuring the number of errors allows the reliability of the software to be predicted.

Statistical testing

T/F: An acceptable level of reliability should be specified and the software tested and amended until that level of reliability is reached.

True.

In _________, more errors than are allowed for in the reliability specification must be induced.

Statistical testing

T/F: Normally, statistical testing is included as part of a normal defect testing process.

F, it cannot normally be included as part of a normal defect testing process because data for defect testing is (usually) atypical of actual usage data.

T/F: Reliability specification requires a specially designed data set that replicates the pattern of inputs to be processed by the system.

F, Reliability measurement
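
A data set that replicates the pattern of inputs could be generated by sampling input classes in proportion to how often they occur in real use. The sketch below assumes such a usage profile is already known; the profile values, the input class names, and the function `generate_test_inputs` are invented for illustration.

```python
import random


def generate_test_inputs(profile: dict, n: int, seed=None) -> list:
    """Draw n input classes with probabilities proportional to the profile."""
    rng = random.Random(seed)
    classes = list(profile)
    weights = [profile[c] for c in classes]
    return rng.choices(classes, weights=weights, k=n)


# Hypothetical usage profile for a banking system.
profile = {"balance_query": 0.70, "transfer": 0.25, "close_account": 0.05}
inputs = generate_test_inputs(profile, 1000, seed=1)
print(len(inputs))  # 1000
```

Running the system on inputs drawn this way, and counting the failures observed, gives the data needed to compute a metric such as POFOD or ROCOF.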

The repair or restart time after a system failure that leads to loss of service is used in the measurement of _________.

availability

_________ does not just depend on the time between failures but also on the time required to get the system back into operation.

Availability

Reliability Engineering