18 - Disaster Recovery Planning Flashcards
What is a Disaster Recovery Plan?
A plan that is put into place to restore operations and implement recovery procedures in the event of a disaster. Designed to reduce decision-making activities during a disaster.
What is a Disaster? And what are some examples?
- Natural Disasters: Violent occurrences that result from changes in the earth’s surface or atmosphere and are beyond human control.
  - Earthquakes: The shifting of tectonic plates
  - Floods: Some floods result from the gradual accumulation of rainwater in rivers, lakes, or other bodies of water that then overflow. Flash floods can occur when a sudden severe storm dumps more rainwater on an area than the ground can absorb in a short period of time.
  - Tsunamis: Large waves caused by seismic activity.
  - Storms: Hurricanes, tornadoes, hailstorms, lightning
  - Fires: Natural and man-made
  - Regional: Volcanoes, avalanches, mudslides
- Man-Made Disasters:
  - Fires
  - Acts of Terrorism
  - Bombings/Explosions
  - Power Outages
  - Network, Utility, and Infrastructure Failures
  - Hardware/Software Failures
  - Strikes/Picketing
  - Theft/Vandalism
What is a Single Point of Failure?
Any component that can cause the entire system to fail.
What is Fault Tolerance?
The ability of a system to suffer a fault but continue to operate.
What is System Resilience?
The ability of a system to maintain an acceptable level of service during an adverse event.
What is a way to provide fault tolerance for hard drives?
Using a RAID array:
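- RAID-0 (Striping): Data is split across two or more drives, so with two drives each holds half of the data. Fast read/write performance, but no fault tolerance: if one drive fails, all data is lost.
- RAID-1 (Mirroring): Both disks hold the same data, so either disk can fail without data loss.
- RAID-5 (Striping w/Parity): Needs at least 3 disks. Data and parity information (used to rebuild lost data) are striped across all disks, with one disk's worth of capacity used for parity. Survives the failure of any one disk.
- RAID-10 (RAID 1 + 0): Known as a stripe of mirrors. Needs at least 4 disks and will always need an even number of disks. The array fails only if every disk in one mirrored set fails.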
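A minimal Python sketch of the parity idea behind RAID-5 and of usable capacity per RAID level. The block contents, disk sizes, and function names (`xor_blocks`, `usable_capacity`) are illustrative assumptions, not part of any real RAID implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Two data blocks from one stripe (made-up values for illustration).
d0 = b"DISK"
d1 = b"FAIL"
parity = xor_blocks([d0, d1])

# Simulate losing the disk that held d1: rebuild it from the survivors.
rebuilt = xor_blocks([d0, parity])


def usable_capacity(level, disks, disk_size_tb):
    """Usable capacity in TB, assuming all disks are the same size."""
    if level == "RAID-0":
        return disks * disk_size_tb            # no redundancy
    if level == "RAID-1":
        return disk_size_tb                    # every disk holds the same data
    if level == "RAID-5":
        if disks < 3:
            raise ValueError("RAID-5 needs at least 3 disks")
        return (disks - 1) * disk_size_tb      # one disk's worth goes to parity
    if level == "RAID-10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID-10 needs an even number of disks, at least 4")
        return (disks // 2) * disk_size_tb     # half the disks are mirrors
    raise ValueError(f"unknown RAID level: {level}")
```

Because XOR is its own inverse, any single missing block in a stripe can be rebuilt from the remaining blocks plus parity, which is why RAID-5 survives one disk failure but not two.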
How can servers be set up for resilience?
- Load Balancing: Traffic can be balanced across multiple servers.
- Failover: Having backup server(s) set up to automatically carry the load in the event the main server goes down.
  - Failover Cluster: Two or more servers set up so that another takes over if the main one fails
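A rough sketch of how load balancing and failover can combine: requests rotate across servers, and servers marked as down are skipped. The class name `RoundRobinBalancer` and the server names are hypothetical; real load balancers also run health checks, which are assumed to happen elsewhere here.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Rotate requests across servers, skipping any marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        # Called when a (hypothetical) health check fails.
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        # Try each server at most once per call before giving up.
        for _ in range(len(self.servers)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["web1", "web2", "web3"])
first_pass = [balancer.next_server() for _ in range(3)]

balancer.mark_down("web2")  # simulate a failure; traffic fails over
after_failure = [balancer.next_server() for _ in range(4)]
```

With `web2` down, the remaining servers absorb its share of the traffic, which is the same behavior a failover cluster provides when the main server fails.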
How can power sources be protected and made resilient?
- UPS (Uninterruptible Power Supply): Provides battery-supplied power for a short period of time (5-30 minutes).
- Generator: Provides long-term, stable power
  - Requires an adequate fuel supply.
- Surge Protection: Protection from electrical events:
  - Spike: A momentary increase in voltage
  - Sag: A momentary drop in voltage
  - Surge: Increased voltage for a prolonged period of time.
  - Brownout: Reduced voltage for a prolonged period of time.
  - Transients: Noise on power lines that can come from a variety of sources.
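The four level-based events differ on two axes: direction (high or low) and duration (short or long). A small sketch of that classification follows. The 120 V nominal value, the 10% tolerance, and the 1-second cutoff for "short" are assumptions chosen for illustration, not standard thresholds.

```python
def classify_power_event(voltage, duration_s, nominal=120.0,
                         tolerance=0.10, momentary_s=1.0):
    """Classify a reading as spike, surge, sag, brownout, or normal.

    All thresholds are illustrative assumptions.
    """
    high = nominal * (1 + tolerance)
    low = nominal * (1 - tolerance)
    if low <= voltage <= high:
        return "normal"
    short = duration_s < momentary_s
    if voltage > high:
        return "spike" if short else "surge"
    return "sag" if short else "brownout"


events = [
    classify_power_event(170, 0.01),  # high and brief
    classify_power_event(140, 30),    # high and sustained
    classify_power_event(90, 0.2),    # low and brief
    classify_power_event(100, 600),   # low and sustained
]
```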
What is Trusted Recovery?
Trusted Recovery provides assurance that after a crash or failure, the system is just as secure as it was before the event occurred.
- Fail-Secure: Defaults to a secure state in the event of a failure, blocking all access.
- Fail-Open: Defaults to an open state in the event of a failure, granting all access.
Types of Trusted Recovery:
- Manual Recovery: The system does not fail in a secure state, so an administrator must manually perform the actions needed to restore a secured state.
- Automated Recovery: The system is able to perform trusted recovery activities to restore itself against at least one type of failure.
- Automated Recovery without Undue Loss: Same as Automated Recovery but has additional mechanisms to ensure specific objects are protected to prevent their loss.
- Function Recovery: Ensures the system is able to complete the recovery for specific functions, or that the system will be able to roll back the changes to return to a secure state.
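The difference between Fail-Secure and Fail-Open shows up in what an access check returns when the check itself breaks. A minimal sketch, assuming a hypothetical `policy_lookup` callable that may raise when the policy source is unavailable:

```python
def check_access(policy_lookup, user, fail_mode="secure"):
    """Return True if access is granted.

    fail_mode decides the outcome when the lookup itself fails:
    "secure" blocks all access, "open" grants all access.
    """
    if fail_mode not in ("secure", "open"):
        raise ValueError("fail_mode must be 'secure' or 'open'")
    try:
        return bool(policy_lookup(user))
    except Exception:
        # The control has failed; fall back to the configured default state.
        return fail_mode == "open"


def broken_lookup(user):
    # Stand-in for a policy server that is unreachable (hypothetical).
    raise ConnectionError("policy server unreachable")


denied = check_access(broken_lookup, "alice", fail_mode="secure")
granted = check_access(broken_lookup, "alice", fail_mode="open")
```

Fail-Secure favors confidentiality and integrity at the cost of availability; Fail-Open makes the opposite trade.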
What is Quality of Service (QoS)?
QoS controls protect the availability of data networks under load.
Factors contributing to QoS:
- Bandwidth: Network capacity available to carry communications
- Latency: Time it takes a packet to travel from source to destination
- Jitter: Variation in latency among packets
- Packet Loss: Packets lost in transmission that need to be resent
- Interference: Electrical noise and other factors that may corrupt the contents of packets.
- Traffic Prioritization
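Latency, jitter, and packet loss can all be computed from a simple probe run. The sketch below assumes you know how many packets were sent and have a latency reading (in ms) for each one that arrived. Jitter is taken here as the mean absolute difference between consecutive latencies, which is one simple definition among several; the function name `qos_summary` is hypothetical.

```python
from statistics import mean


def qos_summary(packets_sent, latencies_ms):
    """Summarize packet loss, average latency, and jitter for one probe run."""
    if packets_sent <= 0:
        raise ValueError("packets_sent must be positive")
    received = len(latencies_ms)
    loss_pct = 100.0 * (packets_sent - received) / packets_sent
    if received == 0:
        return {"loss_pct": loss_pct, "avg_latency_ms": None, "jitter_ms": None}
    # Variation between each packet's latency and the one before it.
    diffs = [abs(b - a) for a, b in zip(latencies_ms, latencies_ms[1:])]
    return {
        "loss_pct": loss_pct,
        "avg_latency_ms": mean(latencies_ms),
        "jitter_ms": mean(diffs) if diffs else 0.0,
    }


# 5 packets sent, 4 arrived (made-up sample values).
summary = qos_summary(5, [20, 22, 21, 25])
```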
What critical subtasks need to be part of your Disaster Recovery Plan?
- Prioritize important business units and functions
- Crisis Management: Train resources to handle emergency situations and provide leadership on the ground
- Emergency Communications: Internally and Externally
- Workgroup Recovery: Getting vital groups of the business back to work with their normal activities.
- Alternate Processing Sites
- Mutual Assistance Agreements (reciprocal agreements): Two organizations pledge to assist each other in the event of a disaster by sharing computing facilities or other resources.
- Database Recovery
What are some Database Recovery techniques?
- Electronic Vaulting: Database backups are moved to a remote site using bulk transfers
- Remote Journaling: Transaction logs are transferred to the remote site more frequently than full backups, typically hourly or more often
- Remote Mirroring: A live database server is maintained at a backup site.
What are some Alternate Processing Sites?
- Cold Sites: Facility large enough to handle the organization's operations but with no major IT infrastructure in place (low cost)
- Hot Sites: Facility maintained in constant working order, with infrastructure and current data in place, ready to assume operations immediately (high cost)
- Warm Sites: Facility that contains equipment and circuits but does not hold critical data or backups.
- Mobile Sites: Self-contained units that contain necessary systems for a workgroup to use.
- Service Bureaus: Companies that lease out computer resources. They own server farms and fields of workstations to be available on the fly.
- Cloud Computing: Use IaaS providers for on-demand services.
What documents typically go into a Disaster Recovery Plan?
- Emergency Response: Simple instructions for personnel to follow when a disaster is recognized.
- Personnel and Communications: A list of people to contact and their contact info. Should include primary and backup contacts
- Assessments: Gauging the situation and how well recovery is progressing
- Backups and Offsite Storage: Lays out the backup strategy used by the organization.
- Software Escrow Arrangement: The developer of an app deposits the source code with a trusted 3rd party, which releases it to the customer if the developer goes out of business.
- External Communications: Set up channels of communication to vendors, clients, and outside entities
- Utilities
- Logistics and Supplies: Moving large numbers of people, equipment, and supplies.
- Recovery and Restoration
What is the difference between Recovery and Restoration?
- Recovery: Bringing business operations/processes back to a working state
- Restoration: Bringing a business facility and environment back to a workable/original state.