Lesson 18: Explaining Disaster Recovery and High Availability Concepts Flashcards
Define availability
The percentage of time that the system is online, measured over a certain period, typically one year.
Describe high availability and its goal
Metric that defines how closely systems approach the goal of providing data availability 100 percent of the time while maintaining a high level of system performance.
Define Maximum Tolerable Downtime (MTD)
Longest period that a process can be inoperable without causing irrevocable business failure.
How is downtime calculated?
Calculated from the sum of scheduled service intervals (Agreed Service Time) plus unplanned outages over the period
For critical systems, what is the suggested availability?
99% (two nines) to 99.9999% (six nines)
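To make the "nines" concrete, here is a minimal sketch that converts an availability percentage into the downtime it allows per year. It assumes a 365-day year; the function name is illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def annual_downtime_hours(availability_percent):
    """Return the maximum downtime per year allowed by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999, 99.9999):
    hours = annual_downtime_hours(target)
    print(f"{target}% -> {hours:.4f} hours ({hours * 3600:.1f} seconds) per year")
```

Two nines allows roughly 87.6 hours of downtime a year, while six nines allows only about 31.5 seconds.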
Define Recovery time objective (RTO)
Maximum time allowed to restore a system after a failure event; maximum amount of time allowed to identify that there is a problem and then perform recovery.
Define Work Recovery Time (WRT)
Time spent performing reintegration and testing of a restored or upgraded system following an event.
What two factors are considered in Maximum tolerable downtime (MTD)?
- RTO - Recovery time objective
- WRT - Work recovery time
Combined they must not exceed MTD
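The relationship between RTO, WRT, and MTD can be sketched as a simple check. The hour values below are hypothetical and only illustrate the comparison.

```python
def within_mtd(rto_hours, wrt_hours, mtd_hours):
    """Return True if recovery (RTO) plus reintegration/testing (WRT) fits inside MTD."""
    return rto_hours + wrt_hours <= mtd_hours


# Hypothetical targets for one business process.
print(within_mtd(rto_hours=4, wrt_hours=2, mtd_hours=8))  # fits inside MTD
print(within_mtd(rto_hours=6, wrt_hours=4, mtd_hours=8))  # exceeds MTD
```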
Define Recovery Point Objective (RPO)
Longest period that an organization can tolerate lost data being unrecoverable.
Define a fault
An event that causes a service/asset to become unavailable; servers, disk arrays, switches, routers, etc. can have faults
What is a KPI?
Key performance indicator - used to determine the reliability of each asset and assess whether goals for MTD, RTO, and RPO can be met.
Define Mean Time Between Failures (MTBF)
Metric for a device or component that predicts the expected time between failures
How is Mean Time Between Failures (MTBF) calculated?
Total operational time divided by the number of failures
Define Mean Time to Failure (MTTF)
Metric indicating average time a non-repairable component is expected to be in operation
What non-repairable components would be measured with mean time to failure (MTTF)?
HDDs, SSDs
How is Mean Time to Failure (MTTF) calculated?
Total operational time divided by the number of devices.
When is Mean Time to Failure (MTTF) used in comparison to Mean Time Between Failures (MTBF)?
A hard drive may be described with an MTTF, while a server, which could be repaired by replacing the hard drive, would be described with an MTBF.
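As a worked example of the two formulas, the sketch below uses hypothetical figures: 10 devices that each run for 50 hours, with 2 failures observed across the group.

```python
def mtbf(total_operational_hours, failures):
    """Mean time between failures: total operational time / number of failures."""
    return total_operational_hours / failures


def mttf(total_operational_hours, devices):
    """Mean time to failure: total operational time / number of devices."""
    return total_operational_hours / devices


# Hypothetical test run: 10 devices x 50 hours each, 2 failures observed.
devices = 10
total_hours = devices * 50

print(mtbf(total_hours, failures=2))      # hours per failure
print(mttf(total_hours, devices=devices))  # hours per device
```

With these figures the MTBF works out to 250 hours per failure and the MTTF to 50 hours per device.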
Define Mean Time to Repair (MTTR)
Metric representing average time taken for a device or component to be repaired, replaced, or recover from a failure.
How is Mean Time to Repair (MTTR) calculated?
Total number of hours of unplanned maintenance divided by the number of failure incidents.
How is Mean Time to Repair (MTTR) used in a recovery effort?
Used to estimate whether a recovery time objective (RTO) is achievable.
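A minimal sketch of the MTTR calculation and its use as a rough RTO check follows. The figures are hypothetical, and because MTTR is an average, an individual repair can still take longer than the RTO.

```python
def mttr(unplanned_maintenance_hours, failure_incidents):
    """Mean time to repair: unplanned maintenance hours / number of failure incidents."""
    return unplanned_maintenance_hours / failure_incidents


def rto_looks_achievable(mttr_hours, rto_hours):
    """Rough estimate only: compares the average repair time with the RTO."""
    return mttr_hours <= rto_hours


# Hypothetical history: 12 hours of unplanned maintenance over 4 incidents.
average_repair = mttr(12, 4)
print(average_repair)
print(rto_looks_achievable(average_repair, rto_hours=4))
```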
Define fault tolerance
The ability of a system to experience failures in individual components and sub-systems and continue to provide the same (or nearly the same) level of service.
How is fault tolerance achieved?
By provisioning redundancy for critical components to eliminate single points of failure.
Define a recovery/spare site
Another location that can provide the same (or similar) level of service. A disaster or systems failure at one site will cause services to fail over to the alternate processing site.
What are the three types of recovery/spare sites?
- Hot
- Warm
- Cold
Define a hot site
Fully configured alternate processing site that can be brought online either instantly or very quickly.
Define a warm site
Alternate processing location that is dormant or performs noncritical functions under normal conditions, but which can be rapidly converted to a key operations site.
Define a cold site
Predetermined alternate location where a network can be rebuilt after a disaster.
Define a power distribution unit (PDU)
Provides filtered output voltage to “clean” the power signal and protects against spikes, surges, and brownouts.
Define an uninterruptible power supply (UPS)
Battery-powered device that will provide a temporary power source in the event of a blackout or power failure.
Which metric is used to determine frequency of data backups?
Recovery Point Objective (RPO) is the maximum amount of data loss permitted, measured in units of time
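Because the worst-case data loss is everything written since the last backup, the backup interval should not exceed the RPO. A minimal sketch of that check, using hypothetical intervals and ignoring the time a backup job takes to complete:

```python
from datetime import timedelta


def meets_rpo(backup_interval, rpo):
    """Return True if the worst-case data loss (one backup interval) fits within the RPO."""
    return backup_interval <= rpo


# Hypothetical policy: the organization can tolerate losing 4 hours of data.
rpo = timedelta(hours=4)
print(meets_rpo(timedelta(hours=1), rpo))   # hourly backups
print(meets_rpo(timedelta(hours=24), rpo))  # nightly backups
```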
Define multipathing and its purpose
A network node has more than one physical link to another node, so that traffic can use an alternative path if one link fails.
Define SAN (storage area network) multipathing
A node having multiple physical links to the SAN
How is multipathing performed with ISPs?
Contracting with multiple ISPs and using routing policies to forward traffic over multiple external circuits provides fault tolerance.
What needs to be confirmed when contracting with multiple ISPs?
Need to ensure that the ISPs are operating separate infrastructure and not using peering arrangements.
Define the concept of diverse paths
Provisioning links over separate cable conduits that are physically distant from one another.
Define link aggregation/bonding
Combining two or more separate cabled links between a host and a switch into a single logical channel.
How is link aggregation/bonding defined at the host level?
NIC teaming
How is link aggregation/bonding defined at the switch level?
Port aggregation
Besides increased bandwidth, what else does link aggregation/bonding offer?
Redundancy; if one link is broken, the connection is still maintained by the other.
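The bandwidth and redundancy properties of a bonded group can be illustrated with a toy model. This is a simplified sketch with hypothetical link speeds, not a model of how any particular switch distributes traffic across the links.

```python
def lag_status(link_speeds_gbps, link_up):
    """Return (is_connected, usable_capacity_gbps) for a bonded group of links."""
    active = [speed for speed, up in zip(link_speeds_gbps, link_up) if up]
    return bool(active), sum(active)


# Two hypothetical 1 Gbps links bonded into one logical channel.
print(lag_status([1, 1], [True, True]))    # both links up
print(lag_status([1, 1], [True, False]))   # one link broken, connection maintained
print(lag_status([1, 1], [False, False]))  # all links down
```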
What Ethernet standard does link aggregation/bonding belong to?
IEEE 802.3ad/802.1AX
What are bonded interfaces known as?
Link Aggregation Group (LAG)
What protocol is used to implement link aggregation/bonding?
802.3ad Link Aggregation Control Protocol (LACP); can be used to detect configuration errors and recover from the failure of one of the physical links.
Define a load balancer and its function
Type of switch, router, or software that distributes client requests between different resources, such as communications links or a pool of servers.
What are the two main types of load balancers?
- Layer 4 switch
- Layer 7 switch (content switch)
How does a layer 4 load balancer differ from a layer 7 load balancer?
A layer 4 load balancer makes decisions at the transport layer (Layer 4), while a layer 7 load balancer makes decisions at the application layer (Layer 7).
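The difference can be sketched with a toy example: the layer 4 function sees only address and port information, while the layer 7 function can also inspect application data such as the requested URL path. The pool addresses, function names, and the `/images/` routing rule are all hypothetical.

```python
import zlib

# Hypothetical back-end server pools.
WEB_POOL = ["10.0.0.11", "10.0.0.12"]
IMAGE_POOL = ["10.0.0.21", "10.0.0.22"]


def pick(pool, key):
    """Deterministically map a connection key onto one server in the pool."""
    return pool[zlib.crc32(key.encode()) % len(pool)]


def layer4_choose(src_ip, src_port):
    """Layer 4: only IP address and port are available for the decision."""
    return pick(WEB_POOL + IMAGE_POOL, f"{src_ip}:{src_port}")


def layer7_choose(src_ip, src_port, url_path):
    """Layer 7: the request content (here, the URL path) selects the pool."""
    pool = IMAGE_POOL if url_path.startswith("/images/") else WEB_POOL
    return pick(pool, f"{src_ip}:{src_port}")


print(layer4_choose("203.0.113.5", 50000))
print(layer7_choose("203.0.113.5", 50000, "/images/logo.png"))
print(layer7_choose("203.0.113.5", 50000, "/index.html"))
```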
Define the concept of clustering
Load balancing technique where a group of servers are configured as a unit and work together to provide network services.
What term is used for the public IP address of a load balancer cluster/pair?
Virtual IP/shared/floating address
How does a load balanced cluster use the virtual IP?
The nodes in the cluster share a private connection using their internal IPs; a redundancy protocol allows the active node in the cluster to own the virtual IP and respond to connections.
Define an active-passive cluster
Only one node is active at a time and the others are passive waiting to be active
Define an active-active cluster
All nodes are processing connections concurrently.
What is the purpose of a first hop redundancy protocol (FHRP)?
Designed to provide fault tolerance/redundancy to the default gateway of a subnet by provisioning failover routers in a cluster
What are the two types of first hop redundancy protocols (FHRPs)?
- Hot Standby Router Protocol (HSRP)
- Virtual Router Redundancy Protocol (VRRP)
How is the Hot Standby Router Protocol (HSRP) configured?
Each router interface is connected to the same subnet with its own unique MAC address and IP address; the routers also need to be configured to share a common virtual IP address and a common MAC address.
How does a router cluster using Hot Standby Router Protocol (HSRP) communicate?
Using IP multicasts
How is the active router chosen when using Hot Standby Router Protocol (HSRP)?
Based on priorities configured by an administrator; the router with the highest priority becomes the active router. Of the remaining routers in the standby group, the router with the next highest priority is chosen as the standby router.
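The priority-based selection can be sketched as a simple ranking. This is an illustration only: the router names and priority values are hypothetical, and the sketch ignores tie-breaking, timers, and preemption settings.

```python
def elect(priorities):
    """Return (active, standby) router names from a {name: priority} mapping."""
    ranked = sorted(priorities, key=priorities.get, reverse=True)
    active = ranked[0]
    standby = ranked[1] if len(ranked) > 1 else None
    return active, standby


# Hypothetical standby group with administrator-assigned priorities.
group = {"R1": 110, "R2": 100, "R3": 90}
print(elect(group))

# If the active router fails, the election is repeated among the rest.
del group["R1"]
print(elect(group))
```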
What is the difference in functionality between Virtual Router Redundancy Protocol (VRRP) and Hot Standby Router Protocol (HSRP)?
- In VRRP, the active router is known as the master, and all other routers in the group are known as backup routers
- VRRP routers can be configured to only use the virtual IP address