Availability/Reliability Flashcards
The most important aspect of Availability
avoiding any single point of failure
What two methods are used to avoid a single point of failure
redundancy and diversity
What is SW redundancy
keeping more than one instance of a critical component available, so a backup can take over if one fails
What is SW diversity
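providing the same functionality in different ways (different algorithms, teams, or platforms), so the versions are unlikely to fail in the same way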
4 system architectures for ensuring availability
- hot backup (running multiple versions)
- triple voting system (run 3 versions of the system in parallel on the same input; a voter compares the outputs and uses the majority result)
- module monitoring code (code that monitors the running of a module)
- dynamic post-test monitor (self-testing of hardware)
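Example: the triple voting idea can be sketched in a few lines of Python. This is a minimal illustration, not a production design. The function `majority_vote` and the three `square_*` versions are hypothetical names invented for this card, and real voting systems usually run the replicas on separate hardware.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(replicas: Sequence[Callable[[int], int]], x: int) -> int:
    """Run every replica on the same input and return the majority output."""
    outputs = []
    for replica in replicas:
        try:
            outputs.append(replica(x))
        except Exception:
            # A crashed replica simply casts no vote.
            continue
    if not outputs:
        raise RuntimeError("all replicas failed")
    value, count = Counter(outputs).most_common(1)[0]
    # Require a strict majority of all replicas, not just of the survivors.
    if count * 2 <= len(replicas):
        raise RuntimeError("no majority among replicas")
    return value


# Three hypothetical, independently written versions of "square a number".
def square_a(x: int) -> int:
    return x * x


def square_b(x: int) -> int:
    return x ** 2


def square_faulty(x: int) -> int:
    return x + x  # deliberate fault: wrong for most inputs


# The two correct versions outvote the faulty one.
result = majority_vote([square_a, square_b, square_faulty], 5)
```

The sketch combines both ideas from the earlier cards: redundancy (three versions run at once) and diversity (each version computes the answer a different way).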
2 characteristics of a dependable SW development process
1) process is explicitly defined (aka well-documented)
2) process is repeatable and not subjective (does not rely on individual judgment; can be repeated across projects and team members)
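5 characteristics of a dependable, well-documented process
- auditable (outsiders can understand it and check that process standards are followed)
- diverse (includes redundant and diverse verification and validation activities)
- documentable (defines process activities and resulting documentation products)
- robust (can recover from failures of individual process activities)
- standardized (follows a comprehensive set of development and documentation standards)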
3 types of formal methods
- consistency proof
- refinement
- model checking
what is FM consistency proof
- develop math/logic model for developed SW program
- prove the program is consistent with the model
what is FM refinement
the generation of a program from a math/logic specification using trusted correctness-preserving transformations
what is FM model checking
develop a math/logic (finite-state) model of the program, then use a tool to explore every reachable state and show that safety constraints/requirements/invariants hold in all of them; a violation is reported as a counterexample
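Example: a toy model checker, assuming a small hand-written state machine. Everything here is hypothetical and for illustration only: the traffic-light model, `next_states`, `invariant`, and `check` are made-up names, and real model checkers work on far larger state spaces with dedicated modelling languages.

```python
from collections import deque


def next_states(state):
    """Hypothetical model of a two-way junction.

    A state is (north_south_light, east_west_light).
    """
    ns, ew = state
    successors = []
    if ns == "red" and ew == "red":
        # Only when both are red may one direction turn green.
        successors += [("green", "red"), ("red", "green")]
    if ns == "green":
        successors.append(("yellow", ew))
    if ns == "yellow":
        successors.append(("red", ew))
    if ew == "green":
        successors.append((ns, "yellow"))
    if ew == "yellow":
        successors.append((ns, "red"))
    return successors


def invariant(state):
    """Safety property: the two directions are never green together."""
    return state != ("green", "green")


def check(initial, next_states, invariant):
    """Breadth-first search over every reachable state.

    Returns (True, None) if the invariant holds everywhere, otherwise
    (False, state) for the first violating state found.
    """
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            return False, state
        for nxt in next_states(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True, None


holds, counterexample = check(("red", "red"), next_states, invariant)
```

For this model `holds` is `True`, because no reachable state has both lights green. If the model allowed a light to turn green while the other was not red, `check` would return the offending state as a counterexample.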
6 cons of using FM
- domain experts cannot check formal specs because they don't know FM
- hard to estimate cost/effort savings from FMs
- few programmers have FM skills
- hard to use for large systems (proof effort grows disproportionately with system size, so it doesn't scale)
- few FM tools available
- incompatible with Agile
system fault vs system error vs system failure
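fault - a defect in the code that can lead to an error when it is executed
error - bad (erroneous) system state caused by a fault
failure - the system does not deliver the service users expect (the bad state becomes visible to the user)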
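Example: one way to see the three terms in a single piece of code. The `average` function and `Thermostat` class are hypothetical, written only to mark where each term applies.

```python
def average(values):
    # FAULT: the divisor is wrong. The defect sits in the code
    # whether or not this line ever runs.
    return sum(values) / (len(values) + 1)


class Thermostat:
    def __init__(self):
        self.avg_temp = None  # internal system state

    def update(self, readings):
        # ERROR: once the faulty code runs, the stored state is wrong.
        self.avg_temp = average(readings)

    def display(self):
        # FAILURE: the wrong state reaches the user as wrong output.
        return f"{self.avg_temp:.1f} C"


t = Thermostat()
t.update([20.0, 22.0])
shown = t.display()  # "14.0 C", although the true average is 21.0
```

If `display` were never called, the error would stay inside the system and no failure would be observed, which is why not every fault leads to a failure.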
3 approaches to make systems more reliable
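- fault avoidance (use development practices that keep faults out of the code in the first place)
- fault detection and correction (use verification and validation to find and remove faults before the system is deployed)
- fault tolerance (design the system so that faults in the running software do not lead to failures)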
what is POFOD and what does it mean?
Probability of Failure on Demand
the probability that the system fails when a request for service (a demand) is made; e.g., POFOD = 0.001 means about 1 failure per 1,000 demands
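Example: a rough way to estimate POFOD from test or field data, assuming each demand is simply counted as a success or a failure. The function name `pofod` and the numbers are illustrative, not measured values.

```python
def pofod(failures: int, demands: int) -> float:
    """Estimate probability of failure on demand as failures / demands."""
    if demands <= 0:
        raise ValueError("need at least one demand")
    if not 0 <= failures <= demands:
        raise ValueError("failures must be between 0 and demands")
    return failures / demands


# Hypothetical data: 2 failed demands observed out of 1,000.
estimate = pofod(failures=2, demands=1000)
```

Here `estimate` is 0.002, i.e. about 2 failures expected per 1,000 demands. The metric suits systems where demands are occasional and a single failure is serious, such as protection or shutdown systems.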