Block 4 part 1 Flashcards
What is the key metric availability and how do we calculate it
Availability is the probability that an application, service or system is available for use
A = uptime ÷ (uptime + downtime)
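A minimal sketch of the availability formula above. The function name and the uptime and downtime figures are made up for illustration; they do not come from the slides.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Return availability as a fraction between 0 and 1."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total


# Hypothetical year: 8750 hours up, 10 hours down.
a = availability(8750, 10)
print(f"Availability: {a:.4%}")
```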
Give me examples of planned downtime
Backup and restoration, hardware, OS and network upgrades, application and database maintenance
Give me examples of unplanned downtime
Environmental factors, application errors, operator and user errors
What is the primary cause of downtime in data centers
UPS system failure
What is the average annual loss per company according to size
Small 221 817
Medium 450 000
Large 927 823
How do we calculate reliability
check slide 13
Reliability is the ability of a system to perform its function under stated conditions for a specified period of time
What is mean time to failure (MTTF)
A measure of reliability for items that cannot be repaired
MTTF = (test period × number of items under test) ÷ number of items that fail
see example on slide 14
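A minimal sketch of the MTTF formula. The test figures are invented for illustration and are not the example from slide 14.

```python
def mttf(test_period_hours: float, items_under_test: int, items_failed: int) -> float:
    """MTTF = (test period x items under test) / items that fail."""
    if items_failed <= 0:
        raise ValueError("at least one failure is needed to estimate MTTF")
    return test_period_hours * items_under_test / items_failed


# Hypothetical test: 100 drives run for 1000 hours, 2 of them fail.
print(mttf(1000, 100, 2))  # 50000.0 hours
```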
What is annualised failure rate (AFR)
AFR = (8760 hours ÷ MTTF in hours) × 100%, where 8760 is the number of hours in a year
How do we calculate how many drives will fail in one year using AFR
number of failures = number of drives × AFR
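A small sketch of the AFR calculation, assuming the common approximation AFR ≈ 8760 ÷ MTTF (reasonable when MTTF is much larger than one year). The MTTF value and the drive count are hypothetical.

```python
HOURS_PER_YEAR = 8760


def afr(mttf_hours: float) -> float:
    """Annualised failure rate as a fraction (multiply by 100 for a percentage)."""
    return HOURS_PER_YEAR / mttf_hours


def expected_failures(num_drives: int, afr_fraction: float) -> float:
    """Expected number of drives failing in one year."""
    return num_drives * afr_fraction


rate = afr(1_000_000)  # hypothetical MTTF of 1 000 000 hours
print(f"AFR = {rate:.3%}")
print(f"Expected failures among 2000 drives: {expected_failures(2000, rate):.2f}")
```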
What is inherent availability and how do we calculate it
Inherent availability is the availability predicted for a system at the design stage, before it has been built
Ai = MTTF ÷ (MTTF + MTTR)
where MTTR is the mean time to replace (commonly also expanded as mean time to repair)
What is operational availability
Ao = MTBM ÷ (MTBM + MDT), where MTBM is the mean time between maintenance and MDT is the mean downtime
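A sketch that puts the inherent and operational availability formulas side by side. All input values are hypothetical; the point is only that Ao uses maintenance intervals and total downtime, while Ai uses failures and replacement time.

```python
def inherent_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Ai = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def operational_availability(mtbm_hours: float, mdt_hours: float) -> float:
    """Ao = MTBM / (MTBM + MDT)."""
    return mtbm_hours / (mtbm_hours + mdt_hours)


# Hypothetical figures: failures are rare, but maintenance happens more often
# and each maintenance event keeps the system down for longer.
ai = inherent_availability(mttf_hours=50_000, mttr_hours=8)
ao = operational_availability(mtbm_hours=2_000, mdt_hours=12)
print(f"Ai = {ai:.4%}, Ao = {ao:.4%}")
```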
How do we increase availability
Load sharing -> sharing the workload across a number of computers.
Requests arriving from the internet go to the load-share monitor, which distributes them across multiple nodes
What are the disadvantages of load sharing and the solutions
- the monitor cannot track the responses if a node fails
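A toy sketch of a load-share monitor. The notes do not say how the monitor picks a node; round-robin is assumed here, and the class name and node names are hypothetical.

```python
from itertools import cycle


class LoadShareMonitor:
    """Hands each incoming request to the next node in turn (round-robin)."""

    def __init__(self, nodes):
        self._nodes = cycle(nodes)

    def dispatch(self, request):
        node = next(self._nodes)
        return node, request


monitor = LoadShareMonitor(["node-a", "node-b", "node-c"])
assignments = [monitor.dispatch(f"req-{i}")[0] for i in range(5)]
print(assignments)  # ['node-a', 'node-b', 'node-c', 'node-a', 'node-b']
```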
- monitor represents a single point of failure
- updates must be applied to each node independently
- no guarantee that multiple requests from a client are directed to the same node
Solution:
incorporate cookies
add some form of shared storage
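One possible reading of the cookie solution, sketched below: the monitor stores the chosen node in a cookie so that later requests from the same client return to that node. The cookie name, class name and node names are hypothetical.

```python
class StickyMonitor:
    """Routes a client to the node named in its cookie, if there is one."""

    COOKIE = "node_id"  # hypothetical cookie name

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._next = 0

    def dispatch(self, cookies):
        node = cookies.get(self.COOKIE)
        if node not in self.nodes:
            # New client (or unknown node): pick the next node in turn.
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
        # Return the node and the cookies the client should send next time.
        return node, {**cookies, self.COOKIE: node}


sticky = StickyMonitor(["node-a", "node-b"])
first_node, client_cookies = sticky.dispatch({})   # first visit of client 1
other_node, _ = sticky.dispatch({})                # a different client
repeat_node, _ = sticky.dispatch(client_cookies)   # client 1 comes back
print(first_node, other_node, repeat_node)
```

Cookies only address the last disadvantage in the list; the monitor is still a single point of failure.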
What is a heartbeat in load sharing
A small message sent regularly from each node to the monitor; if it stops arriving, the monitor assumes the node has failed
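A minimal heartbeat sketch. The 3-second timeout, the class name and the node names are assumptions; timestamps are passed in explicitly so the example runs without waiting.

```python
import time


class HeartbeatMonitor:
    """Marks a node as failed when its last heartbeat is older than the timeout."""

    def __init__(self, timeout_seconds=3.0):
        self.timeout = timeout_seconds
        self.last_seen = {}

    def beat(self, node, now=None):
        # Record the time of the latest heartbeat from this node.
        self.last_seen[node] = time.monotonic() if now is None else now

    def failed_nodes(self, now=None):
        now = time.monotonic() if now is None else now
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )


hb = HeartbeatMonitor(timeout_seconds=3.0)
hb.beat("node-a", now=0.0)
hb.beat("node-b", now=0.0)
hb.beat("node-a", now=4.0)       # node-b has gone silent
print(hb.failed_nodes(now=5.0))  # ['node-b']
```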
What is clustering
A collection of independent computer nodes that appears to the user as a single logical server
Its goal is to increase availability
There are two forms: active-active (each node runs a copy of the application) and active-passive (a standby node takes over on failure)
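A toy sketch of active-passive takeover. Health is passed in as a set of healthy node names rather than detected, and the class and node names are hypothetical.

```python
class ActivePassiveCluster:
    """One node serves requests; the standby takes over if the active node fails."""

    def __init__(self, active, passive):
        self.active = active
        self.passive = passive

    def handle(self, request, healthy):
        if self.active not in healthy:
            if self.passive not in healthy:
                raise RuntimeError("no healthy node available")
            # Failover: the standby becomes the active node.
            self.active, self.passive = self.passive, self.active
        return f"{self.active} served {request}"


cluster = ActivePassiveCluster("node-1", "node-2")
before = cluster.handle("req-1", healthy={"node-1", "node-2"})
after = cluster.handle("req-2", healthy={"node-2"})  # node-1 has failed
print(before)  # node-1 served req-1
print(after)   # node-2 served req-2
```

To the user both requests hit the same logical server; only the node behind it changed.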