Chapter 5 - Availability Flashcards
What is availability?
a property of software that it is there and ready to carry out its task when you need it to be
How is availability different from reliability?
it builds on reliability by adding the notion of recovery and repair
Availability general scenario: what are 5 possible values for “Source”?
internal or external to the system:
- people
- hardware
- software
- physical infrastructure
- physical environment
Availability general scenario: what are 4 possible values for “Stimulus”?
- omission
- crash
- incorrect timing
- incorrect response
Availability general scenario: what are 4 possible values for “Artifact”?
- system’s processors
- communication channels
- persistent storage
- processes
Availability general scenario: what are 6 possible values for “Environment”?
- normal operation
- startup
- shutdown
- repair mode
- degraded operation
- overloaded operation
Availability general scenario: what are 3 possible values for “Response”?
- prevent the fault from becoming a failure
- detect fault
- recover from fault
Availability general scenario: what are 6 possible values for “Response Measure”?
- time interval when the system must be available
- availability percentage
- time to detect the fault
- time to repair the fault
- time interval in which system can be in degraded mode
- the rate of a certain class of faults that the system prevents
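Worked example for the "availability percentage" measure: steady-state availability is commonly computed as MTBF / (MTBF + MTTR), where MTBF is mean time between failures and MTTR is mean time to repair. The sketch below assumes both are given in hours; the function names are hypothetical and chosen only for illustration.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability as a fraction between 0 and 1."""
    if mtbf_hours < 0 or mttr_hours < 0 or mtbf_hours + mttr_hours == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(avail: float) -> float:
    """Expected unavailable hours in a 365-day year for a given availability."""
    return (1.0 - avail) * 365 * 24


# One hour of repair for every 999 hours of operation is "three nines".
a = availability(mtbf_hours=999, mttr_hours=1)
print(f"{a:.3%} available, about {downtime_hours_per_year(a):.2f} h down per year")
```

Shortening repair time raises availability even when the failure rate stays the same, which is the sense in which availability adds recovery and repair to reliability.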
2 system actions that are done in order to “detect the fault”
- log the fault
- notify appropriate entities
4 possible system actions that can be done in order to “recover from fault”
- disable the source of events causing faults
- be temporarily unavailable while the repair is being effected
- fix/mask the fault or contain damage it causes
- operate in degraded mode while repair in progress
Definition of availability tactics?
they enable a system to endure faults so that services remain compliant with their specifications
The main goal of availability tactics?
to keep faults from becoming failures or at least bound the effects of the fault and make repair possible
9 tactics for detecting faults
- ping/echo
- monitor
- heartbeat
- timestamp
- sanity checking
- condition monitoring
- voting
- exception detection
- self-test
Tactic for detecting faults: What is ping/echo?
an asynchronous request/response message pair exchanged between nodes, used to determine reachability and the round-trip delay through the associated network path
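Illustrative sketch of ping/echo: the node is modeled as a plain callable that should echo its payload back, which is an assumption made so the example runs without a network. The name `ping` and the `timeout_s` parameter are hypothetical; a real system would send the request over its own transport.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional


def ping(echo: Callable[[bytes], bytes], timeout_s: float = 1.0) -> Optional[float]:
    """Send one ping; return the round-trip time in seconds, or None if unreachable."""
    payload = b"ping"
    pool = ThreadPoolExecutor(max_workers=1)
    start = time.monotonic()
    try:
        # Wait for the echo, but give up once the timeout expires.
        reply = pool.submit(echo, payload).result(timeout=timeout_s)
    except Exception:
        # Either the timeout expired or the node raised an error.
        return None
    finally:
        pool.shutdown(wait=False)
    if reply != payload:
        # A reply that is not an echo of the request does not count.
        return None
    return time.monotonic() - start
```

A return value of None means the node is treated as unreachable; a number is the measured round-trip delay.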
Tactic for detecting faults: What is a monitor?
a component used to monitor the state of health of other parts of the system
Tactic for detecting faults: What is a heartbeat?
a periodic message exchange between a system monitor and a process being monitored
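Illustrative sketch of a heartbeat monitor: each monitored process calls `beat` periodically, and the monitor suspects any process whose last heartbeat is older than a timeout. The class name, method names, and the injectable `clock` are assumptions made for this example.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Tracks the last heartbeat per process and reports overdue ones."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the monitor can be tested without waiting
        self.last_seen: Dict[str, float] = {}

    def beat(self, process_id: str) -> None:
        """Record a heartbeat from the given process."""
        self.last_seen[process_id] = self.clock()

    def suspected_failed(self) -> List[str]:
        """Return the processes whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(p for p, seen in self.last_seen.items()
                      if now - seen > self.timeout_s)
```

Unlike ping/echo, the monitored process initiates the messages here, so the monitor only has to notice their absence.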
Tactic for detecting faults: What is a timestamp?
used to detect incorrect sequences of events, primarily in distributed message-passing systems
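Illustrative sketch of the timestamp tactic: each message is assumed to carry an integer timestamp assigned by its sender, and the receiver flags any message that is older than one it has already seen. The function name and the `(timestamp, body)` message shape are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Message = Tuple[int, str]  # (timestamp, body)


def out_of_sequence(messages: Iterable[Message]) -> List[Message]:
    """Return the messages that arrive with a timestamp older than one already seen."""
    latest: Optional[int] = None
    flagged: List[Message] = []
    for timestamp, body in messages:
        if latest is not None and timestamp < latest:
            flagged.append((timestamp, body))  # arrived after a newer message
        else:
            latest = timestamp
    return flagged


received = [(1, "open"), (2, "write"), (5, "close"), (3, "write"), (6, "open")]
print(out_of_sequence(received))  # [(3, 'write')]
```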
Tactic for detecting faults: What is sanity checking?
checks the validity or reasonableness of a component’s operations or outputs
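Illustrative sketch of sanity checking: a temperature reading is accepted only if it lies in a plausible range and has not jumped implausibly far from the previous reading. The sensor, the limits, and the function name are all assumptions made for this example.

```python
from typing import Optional


def is_sane(reading_c: float,
            previous_c: Optional[float] = None,
            low_c: float = -40.0,
            high_c: float = 125.0,
            max_step_c: float = 10.0) -> bool:
    """Return True if the reading is within range and did not change too abruptly."""
    # Range check; NaN fails it because every comparison with NaN is False.
    if not (low_c <= reading_c <= high_c):
        return False
    # Rate-of-change check against the previous reading, when one exists.
    if previous_c is not None and abs(reading_c - previous_c) > max_step_c:
        return False
    return True
```

The check relies only on knowledge of what a reasonable output looks like, not on recomputing the output itself.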
Tactic for detecting faults: What is condition monitoring?
checking conditions in a process or device, or validating assumptions made during the design
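Illustrative sketch of condition monitoring using a checksum, a commonly cited example of this tactic: the sender appends a CRC-32 of the payload and the receiver verifies it before using the data. The frame layout (payload followed by a 4-byte big-endian CRC) and the function names are assumptions made for this example.

```python
import zlib


def attach_checksum(payload: bytes) -> bytes:
    """Return the payload followed by its CRC-32 as 4 big-endian bytes."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def verify_checksum(frame: bytes) -> bool:
    """Return True if the trailing 4 bytes match the CRC-32 of the rest."""
    if len(frame) < 4:
        return False  # too short to contain a checksum
    payload, stored = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == stored


frame = attach_checksum(b"hello")
corrupted = b"jello" + frame[-4:]
print(verify_checksum(frame), verify_checksum(corrupted))  # True False
```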