Dependability - Theory Flashcards
What are the basic steps in building reliable systems?
Error detection, error containment, error masking
What is dependability?
A measure of how much we trust a system
The ability of a system to perform its functionality while exhibiting reliability, availability, maintainability, safety, and security
What is reliability?
Continuity of correct service
What is availability?
Readiness for correct service
What is maintainability?
Ability to be maintained easily
What is safety?
Absence of catastrophic consequences
What is security?
Confidentiality and integrity of data
When do we think about dependability?
During design time and runtime
Failures in development should be avoided; failures in operation cannot be avoided, so they must be dealt with
Design should take failures into account and guarantee that control and safety are maintained when failures occur. The effects of such failures should be predictable and deterministic, not catastrophic.
How can we provide dependability?
Through failure avoidance and failure tolerance
What are some of the failure avoidance procedures we can take?
Conservative design
Design validation
Detailed test
Infant mortality screen
Error avoidance
What techniques can we implement in order to increase failure tolerance?
Error detection/error masking during system operations
Online monitoring
Diagnostics
Self-recovery and self-repair
Define reliability and how it is calculated
The ability of a system or component to perform its required functions under stated conditions for a specified period of time
It is therefore the probability that the system will operate correctly in a specified operating environment until time T
Define availability and how it is calculated
The degree to which a system or component is operational and accessible when required for use
It is calculated by dividing the uptime by the total time (uptime plus downtime)
It is the probability that the system will be operating at time T
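The ratio above can be sketched in Python as follows. The function name `availability` and the sample figures are illustrative assumptions, not part of the course material.

```python
def availability(uptime: float, downtime: float) -> float:
    """Fraction of the total observed time during which the system was up."""
    total = uptime + downtime
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime / total


# Hypothetical figures: 8750 hours up and 10 hours down over one year.
yearly_availability = availability(8750.0, 10.0)
print(f"availability = {yearly_availability:.5f}")
```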
Is it possible to have systems with low reliability that have high availability? What about the opposite?
Yes: if system failures can be repaired quickly and do not damage data, low reliability may not be a problem
The opposite is generally more difficult
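A small numeric sketch of the first case. It assumes a constant failure rate (exponential failures) and uses the steady-state formula MTTF / (MTTF + MTTR), where MTTR (mean time to repair) is a term introduced here for illustration. All figures are hypothetical.

```python
import math


def reliability(failure_rate: float, t: float) -> float:
    """Probability of running without failure until time t (constant failure rate)."""
    return math.exp(-failure_rate * t)


def steady_state_availability(mttf: float, mttr: float) -> float:
    """Long-run fraction of time the system is up."""
    return mttf / (mttf + mttr)


# Hypothetical system: fails every 10 hours on average, restarts in 3.6 seconds.
mttf_hours = 10.0
mttr_hours = 0.001

r_day = reliability(1.0 / mttf_hours, 24.0)  # chance of a failure-free day
a = steady_state_availability(mttf_hours, mttr_hours)
print(f"reliability over 24 h = {r_day:.3f}, availability = {a:.5f}")
```

The chance of a failure-free day is below 10%, yet the system is up more than 99.9% of the time because each repair is so short.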
What is MTTF?
Mean Time To Failure: the mean time before a failure occurs
What is MTBF?
Mean Time Between Failures: the mean time between two consecutive failures
How can we calculate MTBF?
By dividing the total operating time by the number of failures
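As a minimal sketch of this division (the function name `mtbf` and the numbers are made up for illustration):

```python
def mtbf(total_operating_time: float, failures: int) -> float:
    """Mean time between failures: total operating time / number of failures."""
    if failures <= 0:
        raise ValueError("at least one observed failure is needed")
    return total_operating_time / failures


# Hypothetical test campaign: 5 units run for 1000 hours each, 4 failures in total.
total_hours = 5 * 1000.0
print(mtbf(total_hours, 4))  # 1250.0 hours
```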
What are the types of failures according to time?
Infant mortality, random failures, wear-out
Define fault
A defect within the system
Define error
A deviation from the required operation of the system or subsystem
Define failure
The system fails to perform its required function
What are reliability block diagrams?
An inductive model where a system is divided into blocks that represent distinct elements such as components or subsystems
Every element in the RBD has its own reliability (previously calculated or modeled)
Blocks are then combined together to model all the possible success paths
How do we calculate the reliability of components in the RBD?
For series components, we multiply the reliabilities
For parallel components, we multiply the failure probabilities of the components to get the probability that they all fail simultaneously, then obtain the reliability by subtracting this overall failure probability from one
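The two rules can be sketched in Python as below. Both rules assume that the blocks fail independently. The helper names `series` and `parallel` and the example diagram are illustrative assumptions.

```python
from math import prod


def series(*reliabilities: float) -> float:
    """All blocks must work: multiply the reliabilities."""
    return prod(reliabilities)


def parallel(*reliabilities: float) -> float:
    """At least one block must work: 1 - product of the failure probabilities."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical RBD: block A in series with two parallel copies of block B.
r_a, r_b = 0.95, 0.90
r_system = series(r_a, parallel(r_b, r_b))
print(f"system reliability = {r_system:.4f}")
```

Because the functions compose, nested diagrams can be evaluated from the innermost group outward.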
What is triple modular redundancy and what is the MTTF of the system?
The system works properly if at least two out of three components work properly and the voter works properly
The MTTF of the system is equal to five-sixths of the MTTF of a single component
Triple modular redundancy: good or bad?
The MTTF of the system is shorter than the MTTF of a single component. However, it has higher reliability than a single component if the mission time is shorter than about 70% of the component's mean time to failure
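The trade-off can be checked numerically. This sketch assumes a perfect voter and three identical components with a constant failure rate, so that R_TMR = 3R^2 - 2R^3 with R = e^(-lambda t). The failure rate value is hypothetical.

```python
import math


def r_single(lam: float, t: float) -> float:
    """Reliability of one component with constant failure rate lam."""
    return math.exp(-lam * t)


def r_tmr(lam: float, t: float) -> float:
    """Reliability of 2-out-of-3 voting with a perfect voter."""
    r = r_single(lam, t)
    return 3 * r**2 - 2 * r**3


lam = 0.001                       # hypothetical failure rate, per hour
mttf_single = 1.0 / lam           # 1000 hours
mttf_tmr = 5.0 / (6.0 * lam)      # five-sixths of the single-component MTTF
crossover = math.log(2) / lam     # about 0.693 * mttf_single

for t in (100.0, crossover, 1500.0):
    print(f"t = {t:7.1f} h  single = {r_single(lam, t):.4f}  TMR = {r_tmr(lam, t):.4f}")
```

Before the crossover time TMR is the more reliable option; after it, the single component is.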
What is standby redundancy?
A system composed of two parallel replicas: the primary replica, which works all the time, and the redundant replica, which is activated when the primary replica fails
What is necessary in order to have a standby redundancy?
A mechanism to determine whether the primary replica is working properly or not
A dynamic switching mechanism to disable the primary replica and activate the redundant one
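A toy sketch of these two mechanisms. The classes `Replica` and `StandbyPair` are hypothetical, and failure detection is reduced to reading a flag; a real system would use something like heartbeats or self-checks.

```python
class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.healthy = True   # set to False to simulate a failure
        self.active = False

    def serve(self, request: str) -> str:
        if not self.healthy:
            raise RuntimeError(f"{self.name} has failed")
        return f"{self.name} handled {request}"


class StandbyPair:
    def __init__(self, primary: Replica, spare: Replica) -> None:
        self.primary = primary
        self.spare = spare
        self.current = primary
        primary.active = True

    def primary_is_working(self) -> bool:
        """Detection mechanism (here simply a health flag)."""
        return self.primary.healthy

    def switch_over(self) -> None:
        """Switching mechanism: disable the primary, activate the spare."""
        self.primary.active = False
        self.spare.active = True
        self.current = self.spare

    def serve(self, request: str) -> str:
        if self.current is self.primary and not self.primary_is_working():
            self.switch_over()
        return self.current.serve(request)


pair = StandbyPair(Replica("primary"), Replica("spare"))
print(pair.serve("request 1"))
pair.primary.healthy = False      # simulate a primary failure
print(pair.serve("request 2"))
```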
If nothing is said in the exam exercise, what can we assume about the distribution of the failures? What is the value of the reliability?
We assume an exponential distribution of the failures, which means that the failure rate is constant over time.
R(t) = e^{-\lambda t} (e raised to minus lambda times time)
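A short Python sketch of this formula, with a hypothetical failure rate. Under the exponential assumption the MTTF equals 1/lambda, so the reliability at t = MTTF is e^(-1), roughly 37%.

```python
import math


def reliability(lam: float, t: float) -> float:
    """R(t) = exp(-lam * t) for a constant failure rate lam."""
    return math.exp(-lam * t)


lam = 2e-4            # hypothetical failure rate, failures per hour
mttf = 1.0 / lam      # 5000 hours under the exponential assumption

print(f"R(1000 h) = {reliability(lam, 1000.0):.4f}")
print(f"R(MTTF)   = {reliability(lam, mttf):.4f}")
```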