Chapter 11 Flashcards
What is a human error or mistake?
Human behavior that results in the introduction of faults into a system.
What is a system fault?
A characteristic of a software system that can lead to a system error.
What is a system error?
An erroneous system state that can lead to system behavior that is unexpected by system users.
What is a system failure?
An event that occurs at some point in time when the system does not deliver a service as expected by its users.
Fill in the gaps:
_ are usually the result of system _ that arise from _ in the system.
Failures, errors, faults
Do faults necessarily result in system errors?
No. The erroneous system state resulting from a fault may be transient and can be ‘corrected’ before an error arises.
Do errors necessarily lead to system failures?
No, if the error is corrected by a built-in error detection and recovery mechanism.
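The fault, error, failure chain from the cards above can be sketched in a few lines of Python. This is only an illustration: the functions `mean_unprotected` and `mean_protected` and the `default` parameter are hypothetical names, not taken from the chapter.

```python
def mean_unprotected(values):
    # Fault: the code assumes `values` is never empty.
    count = len(values)           # count == 0 is the erroneous state (the error)
    return sum(values) / count    # the error surfaces as a failure (an exception)


def mean_protected(values, default=0.0):
    count = len(values)
    if count == 0:                # built-in error detection
        return default            # recovery: return a defined result instead
    return sum(values) / count
```

With non-empty input, such as `mean_unprotected([2, 4])`, the fault is never triggered and the result is `3.0`. With empty input, the unprotected version raises `ZeroDivisionError` (a failure), while `mean_protected([])` detects the erroneous state and returns `0.0`.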
What are the fault management strategies used to achieve reliability?
Fault avoidance, fault detection and removal, and fault tolerance.
What is fault avoidance?
Development techniques that either minimize the possibility of mistakes or trap mistakes before they result in the introduction of system faults.
What is fault detection and removal?
Verification and validation techniques that increase the probability of detecting and correcting faults before the system goes into service.
What is fault tolerance?
Run-time techniques that ensure that system faults do not result in system errors and/or that system errors do not lead to system failures.
What is reliability?
The probability of failure-free system operation over a specified time, in a given environment, for a given purpose. It can be expressed quantitatively.
What is availability?
The probability that a system, at a point in time, will be operational and able to deliver the requested services. It can be expressed quantitatively.
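A minimal sketch of how both quantities might be estimated from observations. It assumes availability is estimated as the fraction of time the system was up, and reliability as the fraction of observed operating periods that were failure-free. The function names and both estimators are illustrative assumptions, not definitions from the chapter.

```python
def availability(uptime_hours, downtime_hours):
    """Estimate availability as the fraction of observed time the system was up."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("observation period must be positive")
    return uptime_hours / total


def estimated_reliability(failure_free_periods, total_periods):
    """Estimate reliability as the fraction of observed periods with no failure."""
    if total_periods <= 0:
        raise ValueError("at least one observed period is required")
    return failure_free_periods / total_periods
```

For example, a system that was up for 999 hours and down for 1 hour has an estimated availability of 0.999, that is, it was available 99.9% of the time.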
Does the formal definition of reliability always reflect the user’s perception of a system’s reliability?
No. Reliability can only be defined formally with respect to a system specification, i.e. a failure is a deviation from the specification. Users don’t read specifications and so don’t know how the system is supposed to behave; in practice, therefore, perceived reliability is more important.
Availability is usually expressed as a percentage of time that the system is available to deliver services. What are the drawbacks of this?
It doesn’t take two factors into account:
The number of users affected by the service outage. Loss of service in the middle of the night is less important for many systems than loss of service during peak usage periods.
The length of the outage. The longer the outage, the greater the disruption. Several short outages are less likely to be disruptive than one long outage. Long repair times are a particular problem.
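The two drawbacks can be made concrete with a small sketch: two outage logs with the same availability percentage but very different disruption. The `Outage` record, the "lost user-minutes" measure, and all figures below are made-up assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    duration_min: float    # how long the service was unavailable
    users_affected: int    # users trying to use the service at that time


def availability_pct(outages, period_min):
    """Percentage of the period during which the service was available."""
    down = sum(o.duration_min for o in outages)
    return 100.0 * (period_min - down) / period_min


def lost_user_minutes(outages):
    """Hypothetical disruption measure: outage length weighted by users affected."""
    return sum(o.duration_min * o.users_affected for o in outages)


def longest_outage(outages):
    """Length of the single longest outage, in minutes."""
    return max((o.duration_min for o in outages), default=0.0)


PERIOD_MIN = 30 * 24 * 60                          # a 30-day observation period
night_short = [Outage(10, 5) for _ in range(6)]    # six short night-time outages
peak_long = [Outage(60, 2000)]                     # one long outage at peak usage
```

Both logs total 60 minutes of downtime, so `availability_pct` returns the same value for each. `lost_user_minutes` gives 300 for `night_short` and 120000 for `peak_long`, and the longest outage is 10 minutes in the first log and 60 minutes in the second.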