7 - Fault Tolerance Flashcards
Dependability
The correctness of one component's behavior depends on the correctness of another component's behavior
Dependability Example
Web server W requires database D.
If D fails, so does W.
Dependability requires
Availability
Reliability
Safety
Maintainability
Availability
Readiness for usage
A(t) = probability that the component is immediately usable at time t
Reliability
Continuity of service delivery
R(t) = probability that the component works correctly throughout the period [0,t], given that it worked at time 0
Safety
Low risk that a component failure leads to a catastrophic system failure.
Maintainability
How easily and quickly a failed component can be repaired
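When is a system more available but less reliable?
When it is online almost all of the time (high availability) but its uptime is interrupted by frequent short outages (low reliability). Example: a system that goes down for 1 ms every hour is available more than 99.9999% of the time, yet it never runs for long without failing.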
MTTF
Mean Time to Failure
Average time from start (or from the last repair) until the component fails
MTTR
Mean Time to Repair
Average time needed to repair a failed component
MTBF
Mean Time Between Failures
MTBF = MTTF + MTTR
Availability (expressed in terms of MTTF, MTTR, and MTBF)
Time available / total time
A = MTTF / MTBF = MTTF / (MTTF + MTTR)
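A minimal Python sketch of the two formulas above. The function names and the sample numbers (999 hours of uptime, 1 hour of repair) are hypothetical and chosen only for illustration.

```python
def mtbf(mttf: float, mttr: float) -> float:
    """Mean time between failures: one full run-then-repair cycle."""
    return mttf + mttr


def availability(mttf: float, mttr: float) -> float:
    """Fraction of time the component is usable: MTTF / MTBF."""
    return mttf / mtbf(mttf, mttr)


# Hypothetical component: runs 999 hours on average, then needs 1 hour of repair.
print(mtbf(999.0, 1.0))          # 1000.0
print(availability(999.0, 1.0))  # 0.999
```

Shortening the repair time raises availability without changing how often the component fails, which is why availability alone says little about reliability.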
Failure Rate z(t) of component C
z(t) is the conditional probability (per unit of time) that C fails at time t, given that it has worked correctly up to t.
Reliability R(t) of component C
Declines exponentially over time even when the failure rate is constant, z(t) = z.
R(t) = e^(-z*t)
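A short Python sketch of R(t) = e^(-z*t) for a constant failure rate. The rate z = 0.001 failures per hour is an assumed value, not taken from the cards. For a constant rate the MTTF equals 1/z, so reliability at t = MTTF is e^(-1), roughly 0.37.

```python
import math


def reliability(z: float, t: float) -> float:
    """R(t) = e^(-z*t): chance of surviving [0, t] at constant failure rate z."""
    return math.exp(-z * t)


z = 0.001  # assumed: failures per hour
for t in (0, 100, 1000, 5000):
    print(f"t = {t:5d} h  R(t) = {reliability(z, t):.4f}")
```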
Density Function
f(t)
Probability density of failing at time t
CDF
Cumulative Distribution Function
F(t) = probability that the component has failed by time t, so F(t) = 1 - R(t)
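A Python sketch of how density, CDF, reliability, and failure rate fit together, assuming the constant-failure-rate case where R(t) = e^(-z*t). The function names and the values z = 0.5 and t = 2.0 are hypothetical. It uses the standard relation z(t) = f(t) / R(t) to check that the failure rate comes out constant.

```python
import math


def failure_cdf(z: float, t: float) -> float:
    """F(t) = 1 - R(t): probability that the component has failed by time t."""
    return 1.0 - math.exp(-z * t)


def failure_density(z: float, t: float) -> float:
    """f(t) = dF/dt = z * e^(-z*t): probability density of failing at time t."""
    return z * math.exp(-z * t)


z, t = 0.5, 2.0                            # assumed values for illustration
r = 1.0 - failure_cdf(z, t)                # R(t)
hazard = failure_density(z, t) / r         # z(t) = f(t) / R(t)
print(f"F(t) = {failure_cdf(z, t):.4f}, f(t) = {failure_density(z, t):.4f}")
print(f"failure rate recovered from f and R: {hazard:.4f}")
```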
Fault classifications
Permanent: exists until repaired
Intermittent: comes and goes, reappearing from time to time
Transient: occurs once and then disappears
Fault
Cause of an error
Error
Part of a component's state that can lead to a failure
Failure
Component does not behave as specified
Dealing with faults
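Fault tolerance: build the system so that it keeps working despite faults
Fault removal: reduce the number and severity of faults that are present
Fault forecasting: estimate the current and future presence of faults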
Are parallel programs fault tolerant?
Often not fault-tolerant.
Wastes resources
Crashes as soon as one component fails