Module 9: Business Continuity Overview (Fault Tolerance Infrastructure) Flashcards
What is fault tolerance?
the ability of an IT system to continue functioning in the event of a failure
What does fault tolerance ensure?
that a single fault/failure does not make an entire system or service unavailable
What is fault isolation?
contains the scope of a fault so that other areas of the system are not impacted by the fault
What is the importance of fault isolation?
does not prevent component failure, but ensures that a failure does not impact the whole system
What is a single point of failure?
any individual component of the infrastructure whose failure can make the entire system unavailable
What is the point of compute clustering?
redundancy and load balancing
What is link aggregation?
combines two or more parallel links between two switches, or between a switch and a node, into a single logical link
What is the reason for implementing link aggregation?
enables network traffic failover to the remaining links in the event of a link failure
What is NIC teaming?
groups NICs so that they appear as a single logical NIC to the OS or hypervisor
What is multipathing?
enables a compute system to use multiple paths for transferring data to a LUN
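A minimal sketch of the idea, assuming a simple try-the-next-path failover policy. The `Path` class, the `write_block` function, and the path names are hypothetical and do not represent any real multipathing driver:

```python
class Path:
    """Hypothetical model of one path (e.g. HBA port to storage port) to a LUN."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def send(self, block):
        # A failed path raises an error instead of delivering the data.
        if not self.healthy:
            raise IOError(f"path {self.name} is down")
        return f"{self.name}:{block}"


def write_block(paths, block):
    """Try each path in order and fail over to the next one on error."""
    for path in paths:
        try:
            return path.send(block)
        except IOError:
            continue  # this path failed, so try the next one
    raise IOError("all paths to the LUN failed")


paths = [Path("hba0", healthy=False), Path("hba1")]
print(write_block(paths, "blk7"))
```

Here the first path is down, so the write is delivered over the second path; the I/O fails only if every path is unavailable.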
What is elastic load balancing?
enables dynamic distribution of application I/O traffic
What is erasure coding?
provides space-optimal data redundancy that protects against data loss from multiple disk drive failures
How does erasure coding work?
a set of “n” disks is divided into “m” disks that hold data and “k” disks that hold coding information (n = m + k)
the coding information is calculated from the data
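A minimal sketch of how coding information can be calculated from data, assuming the simplest possible case of m = 3 data fragments and k = 1 coding fragment built with XOR parity. This tolerates only one lost fragment; production erasure coding uses stronger codes (for example Reed-Solomon) to survive multiple failures. The function names are illustrative:

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_fragments):
    """Calculate one coding fragment (k = 1) from m equal-length data fragments."""
    return reduce(xor_bytes, data_fragments)


def recover(fragments, coding):
    """Rebuild the single missing data fragment (marked None) from the survivors."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can rebuild exactly one lost fragment")
    survivors = [f for f in fragments if f is not None]
    rebuilt = reduce(xor_bytes, survivors, coding)
    repaired = list(fragments)
    repaired[missing[0]] = rebuilt
    return repaired


data = [b"AAAA", b"BBBB", b"CCCC"]    # m = 3 data "disks"
coding = encode(data)                 # k = 1 coding "disk"
damaged = [data[0], None, data[2]]    # simulate losing the second disk
print(recover(damaged, coding))
```

Because the coding fragment is the XOR of all data fragments, XOR-ing it with the surviving fragments yields the lost one.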
What is dynamic disk sparing?
automatically replaces a failed drive with a spare drive to prevent data loss
What is cache mirroring?
each write to cache is held in two different memory locations on two independent memory cards