Lesson 18: Explaining Disaster Recovery and High Availability Concepts Flashcards by Emmanuel Barber-Thomas

Define availability

The percentage of time that the system is online, measured over a certain period, typically one year.

Describe high availability and its goal

Metric that defines how closely systems approach the goal of providing data availability 100 percent of the time while maintaining a high level of system performance.

Define Maximum Tolerable Downtime (MTD)

Longest period that a process can be inoperable without causing irrevocable business failure.

How is downtime calculated?

Calculated from the sum of scheduled service intervals (Agreed Service Time) plus unplanned outages over the period
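
As a minimal sketch of this sum, the snippet below adds scheduled and unplanned hours and turns the result into an availability percentage. The function names, the sample figures, and the choice to measure availability against the whole period are illustrative assumptions, not part of the card.

```python
def downtime_hours(scheduled_hours, unplanned_hours):
    """Downtime = scheduled service intervals + unplanned outages."""
    return sum(scheduled_hours) + sum(unplanned_hours)


def availability_percent(period_hours, downtime):
    """Share of the period during which the system was online."""
    return 100.0 * (period_hours - downtime) / period_hours


period = 365 * 24           # one year in hours (assumes a 365-day year)
scheduled = [4.0, 4.0]      # hypothetical maintenance windows
unplanned = [0.5, 0.26]     # hypothetical outages

total = downtime_hours(scheduled, unplanned)
print(f"downtime: {total:.2f} h")
print(f"availability: {availability_percent(period, total):.1f}%")
```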

For critical systems, what is the suggested availability?

99% (two nines) to 99.9999% (six nines)
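
To make the "nines" concrete, this rough sketch converts an availability percentage into the maximum downtime it allows per year. It assumes a 365-day year; the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def max_downtime_seconds(availability_percent):
    """Annual downtime budget implied by an availability target."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


for pct in (99.0, 99.9, 99.99, 99.999, 99.9999):
    seconds = max_downtime_seconds(pct)
    print(f"{pct}% allows about {seconds / 3600:.4f} hours of downtime per year")
```

Under these assumptions, two nines allows roughly 87.6 hours of downtime per year, while six nines allows only about 31.5 seconds.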

Define Recovery time objective (RTO)

Maximum time allowed to restore a system after a failure event; maximum amount of time allowed to identify that there is a problem and then perform recovery.

Define Work Recovery Time (WRT)

Time spent performing reintegration and testing of a restored or upgraded system following an event.

What two factors are considered in Maximum tolerable downtime (MTD)?

RTO - Recovery time objective
WRT - Work recovery time
Combined they must not exceed MTD
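
The constraint can be written as a one-line check. This is only a sketch; the function name and the sample hours are made up for illustration.

```python
def within_mtd(rto_hours, wrt_hours, mtd_hours):
    """True when RTO + WRT fits inside the maximum tolerable downtime."""
    return rto_hours + wrt_hours <= mtd_hours


# Hypothetical process with an 8-hour MTD.
print(within_mtd(rto_hours=4, wrt_hours=2, mtd_hours=8))  # 6 <= 8
print(within_mtd(rto_hours=6, wrt_hours=3, mtd_hours=8))  # 9 > 8
```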

Define Recovery Point Objective (RPO)

Longest period that an organization can tolerate lost data being unrecoverable.

Define a fault

An event that causes a service/asset to become unavailable; servers, disk arrays, switches, routers, etc. can have faults

What is a KPI?

Key performance indicator - used to determine the reliability of each asset and assess whether goals for MTD, RTO, and RPO can be met.

Define Mean Time Between Failures (MTBF)

Metric for a device or component that predicts the expected time between failures

How is Mean Time Between Failures (MTBF) calculated?

Total operational time divided by the number of failures
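
A small worked example of this formula, using hypothetical test figures (10 devices run for 50 hours each, and 2 of them fail):

```python
def mtbf_hours(total_operational_hours, failures):
    """MTBF = total operational time / number of failures."""
    if failures == 0:
        raise ValueError("MTBF is undefined when no failures were observed")
    return total_operational_hours / failures


total_hours = 10 * 50  # 10 devices x 50 hours each (hypothetical)
print(mtbf_hours(total_hours, failures=2))  # 250.0
```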

Define Mean Time to Failure (MTTF)

Metric indicating average time a non-repairable component is expected to be in operation

What non-repairable components would be measured with mean time to failure (MTTF)?

HDDs, SSDs

How is Mean Time to Failure (MTTF) calculated?

Total operational time divided by the number of devices.
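
A small worked example of this formula, using hypothetical test figures (10 drives that accumulate 500 operational hours in total):

```python
def mttf_hours(total_operational_hours, devices):
    """MTTF = total operational time / number of devices."""
    if devices <= 0:
        raise ValueError("at least one device is required")
    return total_operational_hours / devices


print(mttf_hours(total_operational_hours=500, devices=10))  # 50.0
```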

When is Mean Time to Failure (MTTF) used in comparison to Mean Time Between Failures (MTBF)?

A hard drive may be described with an MTTF, while a server, which could be repaired by replacing the hard drive, would be described with an MTBF.

Define Mean Time to Repair (MTTR)

Metric representing average time taken for a device or component to be repaired, replaced, or recover from a failure.

How is Mean Time to Repair (MTTR) calculated?

Total number of hours of unplanned maintenance divided by the number of failure incidents.

How is Mean Time to Repair (MTTR) used in a recovery effort?

Used to estimate whether a recovery time objective (RTO) is achievable.
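
The MTTR formula and its use as an RTO sanity check can be sketched together. The function names and figures below are hypothetical, and comparing an average against an objective is only a rough estimate, since individual repairs can take longer than the mean.

```python
def mttr_hours(unplanned_maintenance_hours, incidents):
    """MTTR = hours of unplanned maintenance / number of failure incidents."""
    if incidents == 0:
        raise ValueError("MTTR is undefined when no incidents occurred")
    return unplanned_maintenance_hours / incidents


def rto_looks_achievable(mttr, rto_hours):
    """Rough estimate: the average repair time fits within the RTO."""
    return mttr <= rto_hours


mttr = mttr_hours(unplanned_maintenance_hours=12, incidents=4)
print(mttr)  # 3.0
print(rto_looks_achievable(mttr, rto_hours=4))
print(rto_looks_achievable(mttr, rto_hours=2))
```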

Define fault tolerance

A system that can experience failures in individual components and sub-systems and continue to provide the same (or nearly the same) level of service.

How is fault tolerance achieved?

By provisioning redundancy for critical components to eliminate single points of failure.

Define a recovery/spare site

Another location that can provide the same (or similar) level of service. A disaster or systems failure at one site will cause services to failover to the alternate processing site.

What are the three types of recovery/spare sites?

Hot
Warm
Cold

Define a hot site

Fully configured alternate processing site that can be brought online either instantly or very quickly.

Define a warm site

Alternate processing location that is dormant or performs noncritical functions under normal conditions, but which can be rapidly converted to a key operations site.

Define a cold site

Predetermined alternate location where a network can be rebuilt after a disaster.

Define a power distribution unit (PDU)

Provides filtered output voltage to "clean" the power signal, provides protection against spikes, surges, and brownouts.

Define an uninterruptible power supply (UPS)

Battery-powered device that will provide a temporary power source in the event of a blackout or power failure.

Which metric is used to determine frequency of data backups?

Recovery Point Objective (RPO) is the maximum amount of data loss permitted, measured in units of time
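
One way to read this: in the worst case, a failure just before the next backup loses a full interval of data, so the backup interval should not exceed the RPO. The sketch below makes that simplifying assumption and ignores how long a backup takes to complete; the function name is hypothetical.

```python
def backup_interval_meets_rpo(backup_interval_hours, rpo_hours):
    """Worst-case data loss equals one full backup interval (simplified)."""
    return backup_interval_hours <= rpo_hours


# Hypothetical RPO of 4 hours.
print(backup_interval_meets_rpo(backup_interval_hours=24, rpo_hours=4))  # nightly backups
print(backup_interval_meets_rpo(backup_interval_hours=1, rpo_hours=4))   # hourly backups
```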

Define multipathing and its purpose

A network node has more than one physical link to another node, so traffic can continue over a remaining link if one fails.

Define SAN (storage area network) multipathing

A node having multiple physical links to the SAN

How is multipathing performed with ISPs?

Contracting with multiple ISPs and using routing policies to forward traffic over multiple external circuits provides fault tolerance.

What needs to be confirmed when contracting with multiple ISPs?

Need to ensure that the ISPs are operating separate infrastructure and not using peering arrangements.

Define the concept of diverse paths

Provisioning links over separate cable conduits that are physically distant from one another.

Define link aggregation/bonding

Combining two or more separate cabled links between a host and a switch into a single logical channel.

How is link aggregation/bonding defined at the host level?

NIC teaming

How is link aggregation/bonding defined at the switch level?

Port aggregation

Besides increased bandwidth, what else does link aggregation/bonding offer?

Redundancy; if one link is broken, the connection is still maintained by the other.

What Ethernet standard does link aggregation/bonding belong to?

IEEE 802.3ad/802.1AX

What are bonded interfaces known as?

Link Aggregation Group (LAG)

What protocol is used to implement link aggregation/bonding?

802.3ad Link Aggregation Control Protocol (LACP); can be used to detect configuration errors and recover from the failure of one of the physical links.

Define a load balancer and its function

Type of switch, router, or software that distributes client requests between different resources, such as communications links or a pool of servers.

What are the two main types of load balancers?

1. Layer 4 switch
2. Layer 7 switch (content switch)

How does a layer 4 load balancer differ from a layer 7 load balancer?

Layer 4 load balancer makes decisions at the transport layer (Layer 4) while layer 7 load balancer makes decisions at the application layer
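
The difference comes down to what each balancer is allowed to inspect. The toy sketch below is not how any particular product works: the server addresses, the hashing scheme, and the URL-prefix rules are all assumptions chosen for illustration.

```python
import zlib

SERVERS = ["10.0.1.11", "10.0.1.12", "10.0.1.13"]  # hypothetical pool


def layer4_pick(src_ip, src_port, servers):
    """Transport-layer decision: only the address and port are inspected."""
    key = f"{src_ip}:{src_port}".encode()
    return servers[zlib.crc32(key) % len(servers)]


def layer7_pick(url_path, pools, default):
    """Application-layer decision: request content (here, the URL path) is inspected."""
    for prefix, servers in pools.items():
        if url_path.startswith(prefix):
            return servers[0]
    return default


pools = {"/images/": ["10.0.1.21"], "/api/": ["10.0.1.31"]}  # hypothetical rules
print(layer4_pick("198.51.100.7", 51514, SERVERS))
print(layer7_pick("/images/logo.png", pools, default=SERVERS[0]))
```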

Define the concept of clustering

Load balancing technique where a group of servers are configured as a unit and work together to provide network services.

What term is used for the public IP address of a load balancer cluster/pair?

Virtual IP/shared/floating address

How does a load balanced cluster use the virtual IP?

The nodes in the cluster share a private connection using their internal IPs; a redundancy protocol allows the active node in the cluster to own the virtual IP and respond to connections.

Define an active-passive cluster

Only one node is active at a time; the others are passive and take over only if the active node fails.

Define an active-active cluster

All nodes are processing connections concurrently.
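
As a toy model of the active-passive case, the sketch below lets the first healthy node in a preference list own the virtual IP. The class, node names, and address are hypothetical; a real cluster relies on a redundancy protocol and health checks rather than a manual `fail` call.

```python
class ActivePassiveCluster:
    """Toy model: the first healthy node in preference order owns the virtual IP."""

    def __init__(self, virtual_ip, nodes):
        self.virtual_ip = virtual_ip
        self.nodes = list(nodes)      # ordered by preference
        self.healthy = set(nodes)

    def owner(self):
        """Return the node currently answering for the virtual IP, if any."""
        for node in self.nodes:
            if node in self.healthy:
                return node
        return None

    def fail(self, node):
        """Mark a node as failed so the next healthy node takes over."""
        self.healthy.discard(node)


cluster = ActivePassiveCluster("203.0.113.10", ["lb1", "lb2"])
print(cluster.owner())  # lb1
cluster.fail("lb1")
print(cluster.owner())  # lb2
```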

What is the purpose of a first hop redundancy protocol (FHRP)?

Designed to provide fault tolerance/redundancy to the default gateway of a subnet by provisioning failover routers in a cluster

What are the two types of first hop redundancy protocols (FHRPs)?

1. Hot Standby Router Protocol (HSRP)
2. Virtual Router Redundancy Protocol (VRRP)

How is the Hot Standby Router Protocol (HSRP) configured?

Each router interface is connected to the same subnet with its own unique MAC address and IP address; the interfaces also need to be configured to share a common virtual IP address and a common MAC address.

How does a router cluster using Hot Standby Router Protocol (HSRP) communicate?

Using IP multicasts

How is the active router chosen when using Hot Standby Router Protocol (HSRP)?

The router with the highest priority, as configured by an administrator, becomes the active router. Of the remaining routers in the standby group, the router with the next highest priority is chosen as the standby router.
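
The selection can be sketched as a sort by priority. The router names, priorities, and addresses are hypothetical, and breaking ties by the highest interface IP address is an assumption about HSRP behaviour that the card itself does not state.

```python
import ipaddress


def elect(routers):
    """routers: list of (name, priority, ip). Returns (active, standby) names."""
    ranked = sorted(
        routers,
        key=lambda r: (r[1], ipaddress.ip_address(r[2])),  # priority, then IP (assumed tie-breaker)
        reverse=True,
    )
    active = ranked[0][0]
    standby = ranked[1][0] if len(ranked) > 1 else None
    return active, standby


group = [
    ("R1", 110, "10.0.0.2"),
    ("R2", 100, "10.0.0.3"),
    ("R3", 100, "10.0.0.4"),
]
print(elect(group))  # ('R1', 'R3')
```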

What is the difference in functionality between Virtual Router Redundancy Protocol (VRRP) and Hot Standby Router Protocol (HSRP)?

1. In VRRP, the active router is known as the master, and all other routers in the group are known as backup routers.
2. VRRP routers can be configured to use only the virtual IP address.

(56 cards)