Chapter 11 Flashcards
What is a human error or mistake?
Human behavior that results in the introduction of faults into a system.
What is a system fault?
A characteristic of a software system that can lead to a system error.
What is a system error?
An erroneous system state that can lead to system behavior that is unexpected by system users.
What is a system failure?
An event that occurs at some point in time when the system does not deliver a service as expected by its users.
Fill in the gaps:
_ are usually a result of system _ that are derived from _ in the system
Failures, errors, faults
Do faults necessarily result in system errors?
No. The erroneous system state resulting from the fault may be transient and ‘corrected’ before an error arises, or the faulty code may never be executed.
Do errors necessarily lead to system failures?
No. The error may be corrected by built-in error detection and recovery mechanisms before it causes a failure.
What are the fault management strategies used to achieve reliability?
Fault avoidance, fault detection and removal, and fault tolerance.
What is fault avoidance?
Development techniques that either minimize the possibility of mistakes or trap mistakes before they result in the introduction of system faults.
What is fault detection and removal?
Verification and validation techniques that increase the probability of detecting and correcting errors before the system goes into service
What is fault tolerance?
Run-time techniques used to ensure that system faults do not result in system errors and/or that system errors do not lead to system failures
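A minimal sketch of the idea, not taken from the chapter: the function names and the "last known good value" recovery policy are assumptions chosen for illustration. The fault is a missing guard for empty input; the wrapper detects the resulting error at run time and recovers, so the caller never sees a failure.

```python
def mean_reading(readings):
    # Fault: there is no guard for an empty list, so this raises
    # ZeroDivisionError when no readings are available.
    return sum(readings) / len(readings)


def tolerant_mean_reading(readings, last_good):
    """Error detection and recovery around the faulty function.

    Hypothetical policy: fall back to the last known good value.
    """
    try:
        return mean_reading(readings)
    except ZeroDivisionError:
        # Recovery: the error is handled here and does not become a failure.
        return last_good


print(tolerant_mean_reading([10, 20], last_good=18.0))
print(tolerant_mean_reading([], last_good=18.0))
```

The fault is still present in `mean_reading`; fault tolerance only stops it from reaching the user.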
What is reliability?
The probability of failure-free system operation over a specified time in a given environment for a given purpose. It can be expressed quantitatively.
What is availability?
The probability that a system, at a point in time, will be operational and able to deliver the requested services. It can be expressed quantitatively.
Does the formal definition of reliability always reflect the user’s perception of a system’s reliability?
No. Reliability can only be defined formally with respect to a system specification, i.e., a failure is a deviation from the specification. Users don’t read specifications and may not know how the system is supposed to behave, so perceived reliability is more important in practice.
Availability is usually expressed as a percentage of time that the system is available to deliver services. What are the drawbacks of this?
It doesn’t take into account two factors:
The number of users affected by the service outage. Loss of service in the middle of the night is less important for many systems than loss of service during peak usage periods.
The length of the outage. The longer the outage, the greater the disruption. Several short outages are less likely to be disruptive than one long outage. Long repair times are a particular problem.
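The two drawbacks can be made concrete with a small sketch. The `Outage` record, the user counts, and the "user-minutes lost" measure are illustrative assumptions, not metrics defined in the chapter. Three outage logs with the same total downtime give the same availability percentage, even though their impact differs.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    minutes: float        # length of the outage
    users_affected: int   # hypothetical count of users active at the time


def availability_percent(outages, period_minutes):
    """Percentage of the period during which the system was up."""
    downtime = sum(o.minutes for o in outages)
    return 100.0 * (period_minutes - downtime) / period_minutes


def user_minutes_lost(outages):
    """Outage length weighted by the number of users affected."""
    return sum(o.minutes * o.users_affected for o in outages)


def longest_outage(outages):
    """Length in minutes of the single longest outage."""
    return max(o.minutes for o in outages)


WEEK = 7 * 24 * 60  # minutes in one week

night_outage = [Outage(minutes=60, users_affected=5)]
peak_outage = [Outage(minutes=60, users_affected=2000)]
short_outages = [Outage(minutes=10, users_affected=5) for _ in range(6)]

# All three logs yield the same availability figure ...
for log in (night_outage, peak_outage, short_outages):
    print(availability_percent(log, WEEK))

# ... but the number of users hit and the longest single outage differ.
print(user_minutes_lost(night_outage), user_minutes_lost(peak_outage))
print(longest_outage(short_outages), longest_outage(night_outage))
```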
Will removing X% of the faults in a system necessarily improve the reliability by X%?
No. Program defects may be in rarely executed sections of the code and so may never be encountered by users. Removing these does not affect the perceived reliability.
Can a program with known faults still be perceived as reliable by its users?
Yes. Program defects may be in rarely executed sections of the code and so may never be encountered by users. Removing these does not affect the perceived reliability. Users also adapt their behavior to avoid system features that may fail for them.
System reliability is measured by…
counting the number of operational failures and, where appropriate, relating these to the demands made on the system and the time that the system has been operational
Reliability metrics include:
Probability of failure on demand (POFOD), rate of occurrence of failures (ROCOF), availability (AVAIL)
What is the probability of failure on demand (POFOD)?
The probability that the system will fail when a service request is made. Useful when demands for service are intermittent and relatively infrequent.
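As a rough sketch, POFOD can be estimated from observed data as failures divided by demands. The function name and the sample counts are assumptions for illustration.

```python
def pofod(failures, demands):
    """Estimate probability of failure on demand from observed counts."""
    if demands <= 0:
        raise ValueError("demands must be positive")
    return failures / demands


# One failed request out of 1000 service requests.
print(pofod(1, 1000))
```

A POFOD of 0.001 means roughly a 1 in 1000 chance that a given demand fails.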
When is the rate of occurrence of failures (ROCOF) relevant?
When the system has to process a large number of similar requests in a short time.
What is the reciprocal of ROCOF?
Mean time to failure (MTTF)
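The reciprocal relationship can be checked with a short sketch. The function names and the choice of hours as the time unit are assumptions; any operational time unit works as long as it is used consistently.

```python
def rocof(failures, operational_time):
    """Failures observed per unit of operational time."""
    if operational_time <= 0:
        raise ValueError("operational_time must be positive")
    return failures / operational_time


def mttf(rocof_value):
    """Mean time to failure as the reciprocal of ROCOF."""
    if rocof_value <= 0:
        raise ValueError("MTTF is undefined when no failures were observed")
    return 1.0 / rocof_value


rate = rocof(failures=2, operational_time=1000)  # 2 failures in 1000 hours
print(rate)        # failures per hour
print(mttf(rate))  # hours between failures, on average
```

With 2 failures in 1000 operational hours, ROCOF is 0.002 per hour and MTTF is about 500 hours.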
Non-functional reliability requirements are…
specifications of the required reliability and availability of a system using one of the reliability metrics (POFOD, ROCOF or AVAIL)
Functional reliability requirements specify…
the faults to be detected and the actions to be taken to ensure that these faults do not lead to system failures.