18 - Disaster Recovery Planning Flashcards

Question 1

Q

What is a Disaster Recovery Plan?

Answer

A

A plan that is put into place to restore operations and implement recovery procedures in the event of a disaster. Designed to reduce decision-making activities during a disaster.

Question 2

Q

What is a Disaster? And what are some examples?

Answer

A

Natural Disasters: Violent occurrences that result from changes in the earth’s surface or atmosphere that are beyond human control.
- Earthquakes: The shifting of seismic plates
- Floods: Some floods are the gradual accumulation of rainwater in rivers, lakes, or other bodies of water that then overflow. Flash floods can occur when a sudden severe storm dumps more rainwater on an area than the ground can absorb in a short period of time.
- Tsunamis: Large waves caused by seismic activity.
- Storms: Hurricanes, tornadoes, hail storms, lightning
- Fires: Natural and Man-made
- Regional: Volcanoes, avalanches, mudslides
Man-Made Disasters:
- Fires
- Acts of Terrorism
- Bombings/Explosions
- Power Outages
- Network, Utility, and Infrastructure Failures
- Hardware/Software Failures
- Strikes/Picketing
- Theft/Vandalism

Question 3

Q

What is a Single Point of Failure?

Answer

A

Any component that can cause the entire system to fail.

Question 4

Q

What is Fault Tolerance?

Answer

A

The ability of a system to suffer a fault but continue to operate.

Question 5

Q

What is System Resilience?

Answer

A

The ability of a system to maintain an acceptable level of service during an adverse event.

Question 6

Q

What is a way to provide fault tolerance for hard drives?

Answer

A

Using a RAID array:

RAID-0 (Striping): Data is split across two or more drives (with two drives, half on each). Fast read/write performance, but no fault tolerance: if any drive fails, all data is lost.
RAID-1 (Mirroring): Both disks hold the same data, so the array survives the loss of one disk.
RAID-5 (Striping w/Parity): Data and parity information (correction code) are striped across at least 3 disks. The array survives the loss of any one disk.
RAID-10 (RAID 1 + 0): Known as a stripe of mirrors. Needs at least 4 disks and will always need an even number of disks. The array survives a single disk failure in each mirror, but if both disks in any one mirror fail, the entire array fails.
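
The parity information used by RAID-5 can be illustrated with XOR. The sketch below is only a toy model, assuming two data blocks and one parity block in a single stripe; the helper name `xor_blocks` and the sample byte strings are made up for illustration and do not reflect how any particular RAID controller is implemented.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Two data blocks of one stripe, as they might sit on two different disks.
block_a = b"DISK"
block_b = b"DATA"

# The parity block is the XOR of the data blocks (stored on a third disk).
parity = xor_blocks(block_a, block_b)

# Simulate losing the disk that held block_a: rebuild it from the survivors.
rebuilt_a = xor_blocks(block_b, parity)
```

Because XOR is its own inverse, any single missing block in the stripe can be rebuilt from the remaining blocks, which is why the array tolerates one failed disk but not two.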

Question 7

Q

How can servers be set up for resilience?

Answer

A

Load Balancing: Traffic can be balanced across multiple servers.
Failover: Having backup server(s) set up to automatically carry the load in the event the main server goes down.
- Failover Cluster: A group of two or more servers in which another server takes over if the active one fails.
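
Load balancing and failover can be combined in one small model. The following is a minimal sketch, assuming round-robin selection and a manually maintained health list; the class `ServerPool` and its method names are hypothetical, and a real load balancer would detect failures with health checks rather than explicit calls.

```python
from itertools import cycle


class ServerPool:
    """Round-robin load balancer that skips servers marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        # Failover: stop sending traffic to a failed server.
        self.healthy.discard(server)

    def mark_up(self, server):
        # Return a repaired server to rotation.
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        # Try each server at most once per request.
        for _ in range(len(self.servers)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")
```

With three servers, requests rotate across all of them; after `mark_down` is called on one, the remaining servers absorb its share without the caller changing anything.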

Question 8

Q

How can power sources be protected/have resilience?

Answer

A

UPS (Uninterruptible Power Supply): Provides battery-supplied power for a short period of time (5-30 minutes).
Generator: Provides long-term, stable power
- Requires an adequate fuel supply to keep running.
Surge-Protection: Protection from electrical events:
- Spike: A quick instance of increased power
- Sag: A quick instance of reduced power
- Surge: Increased power for a long period of time.
- Brownout: Reduced power for a long period of time.
- Transients: Noise on power lines that can come from a variety of sources.
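
The four voltage events differ along two axes: direction (high or low) and duration (momentary or prolonged). The sketch below encodes that grid. The function name, the 10% tolerance, and the one-second cutoff between "quick" and "long" are illustrative assumptions, not values from any standard.

```python
def classify_power_event(
    voltage_ratio: float,
    duration_s: float,
    *,
    tolerance: float = 0.10,         # assumed band around nominal voltage
    momentary_limit_s: float = 1.0,  # assumed cutoff for a "quick" event
) -> str:
    """Classify an event; voltage_ratio is measured voltage / nominal voltage."""
    if abs(voltage_ratio - 1.0) <= tolerance:
        return "normal"
    momentary = duration_s <= momentary_limit_s
    if voltage_ratio > 1.0:
        return "spike" if momentary else "surge"
    return "sag" if momentary else "brownout"
```

For example, a reading of 1.5 times nominal for 0.01 seconds is classified as a spike, while 0.7 times nominal for ten minutes is a brownout.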

Question 9

Q

What is Trusted Recovery?

Answer

A

Trusted Recovery provides assurances that after a crash, or failure, the system is just as secure as it was before the event occurred.

Fail-Secure: Defaults to a secure state in the event of a failure, blocking all access.
Fail-Open: Defaults to an open state in the event of a failure, granting all access.
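
The difference between the two failure modes comes down to what an access decision defaults to when the check itself breaks. This is a minimal sketch under that framing; `guarded_check` and `broken_check` are hypothetical names, and the simulated `ConnectionError` stands in for any failure of the control.

```python
from typing import Callable


def guarded_check(
    check: Callable[[str], bool], user: str, *, fail_secure: bool = True
) -> bool:
    """Run an access check; on failure, deny (fail-secure) or allow (fail-open)."""
    try:
        return check(user)
    except Exception:
        # The control itself failed, so fall back to the configured default.
        return not fail_secure


def broken_check(user: str) -> bool:
    # Stand-in for a control that has crashed or is unreachable.
    raise ConnectionError("authorization service unreachable")
```

With the default `fail_secure=True`, `guarded_check(broken_check, "alice")` denies access; with `fail_secure=False`, the same failure grants it.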

Types of Trusted Recovery:

Manual Recovery: The system does not fail in a secure state, so an administrator must manually perform the actions needed to restore a secure state.
Automated Recovery: The system is able to perform trusted recovery activities to restore itself against at least one type of failure.
Automated Recovery without Undue Loss: Same as Automated Recovery but has additional mechanisms to ensure specific objects are protected to prevent their loss.
Function Recovery: Ensures the system is able to complete the recovery for specific functions, or that the system will be able to roll back the changes to return to a secure state.

Question 10

Q

What is Quality of Service (QoS)?

Answer

A

QoS controls protect the availability of data networks under load.

Factors contributing to QoS:

Bandwidth: Network capacity available to carry communications
Latency: Time it takes a packet to travel from source to destination
Jitter: Variation in latency among packets
Packet Loss: Packets lost in transmission that need to be resent
Interference: Electrical noise and other factors that may corrupt the contents of packets.
Traffic Prioritization: Giving time-sensitive traffic (such as voice) precedence over less urgent traffic.
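
Latency, jitter, and packet loss can all be derived from the same set of measurements. The sketch below assumes a list of latencies for the packets that arrived plus a count of packets sent, and it treats jitter as the mean absolute difference between consecutive latencies, which is one common simplification rather than the only definition. The function name `qos_summary` is made up for illustration.

```python
from statistics import mean


def qos_summary(sent: int, latencies_ms: list[float]) -> dict[str, float]:
    """Summarize latency, jitter, and packet loss for one measurement run."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    received = len(latencies_ms)
    # Jitter (simplified): average change in latency between consecutive packets.
    diffs = [abs(b - a) for a, b in zip(latencies_ms, latencies_ms[1:])]
    return {
        "avg_latency_ms": mean(latencies_ms) if latencies_ms else 0.0,
        "jitter_ms": mean(diffs) if diffs else 0.0,
        "packet_loss_pct": 100.0 * (sent - received) / sent,
    }
```

For 5 packets sent and received latencies of 20, 30, 20, and 30 ms, this gives an average latency of 25 ms, jitter of 10 ms, and 20% packet loss.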

Question 11

Q

What critical subtasks need to be part of your Disaster Recovery Plan?

Answer

A

Prioritize important business units and functions
Crisis Management: Train resources to handle emergency situations and provide leadership on the ground
Emergency Communications: Internally and Externally
Workgroup Recovery: Getting vital groups of the business back to work with their normal activities.
Alternate Processing Sites
Mutual Assistance Agreements (reciprocal agreements): Two organizations pledge to assist each other in the event of a disaster by sharing computing facilities or other resources.
Database Recovery

Question 12

Q

What are some Database Recovery techniques?

Answer

A

Electronic Vaulting: Database backups are moved to a remote site using bulk transfers.
Remote Journaling: Transaction logs are sent to the remote site in bulk transfers that happen more frequently than with electronic vaulting.
Remote Mirroring: A live database server is maintained at a backup site.

Question 13

Q

What are some Alternate Processing Sites?

Answer

A

Cold Sites: Facility large enough to handle business operations but with no major IT infrastructure in place (low cost)
Hot Sites: Facility maintained in constant working order, with infrastructure and current data in place, ready to assume operations immediately (high cost)
Warm Sites: Facility that contains equipment and circuits but does not hold critical data or backups.
Mobile Sites: Self-contained units that contain necessary systems for a workgroup to use.
Service Bureaus: Companies that lease out computer resources. They own server farms and fields of workstations to be available on the fly.
Cloud Computing: Use IaaS providers for on-demand services.

Question 14

Q

What documents typically go into a Disaster Recovery Plan?

Answer

A

Emergency Response: Simple instructions for personnel to follow when a disaster is recognized.
Personnel and Communications: A list of people to contact and their contact info. Should include primary and backup contacts
Assessments: Gauging the situation and how well recovery is progressing
Backups and Offsite Storage: Lays out the backup strategy used by the organization.
Software Escrow Arrangement: The developer of an app deposits the source code with a trusted 3rd party so the customer can still obtain it if the developer goes out of business.
External Communications: Set up channels of communication to vendors, clients, and outside entities
Utilities
Logistics and Supplies: Moving large numbers of people, equipment, and supplies.
Recovery and Restoration

Question 15

Q

What is the difference between Recovery and Restoration?

Answer

A

Recovery: Bringing business operations/processes back to a working state
Restoration: Bringing a business facility and environment back to a workable/original state.

Question 16

Q

What are the different types of Backups and Offsite Storage?

Answer

A

Full Backups: Stores a complete copy of the data contained on the protected device.
Incremental Backups: Stores only files that have been modified since the most recent full or incremental backup.
Differential Backups: Stores only files that have been modified since the last full backup.
Keep backups both onsite (or nearby) and offsite for redundancy.
Use reliable media
Build proper capacity
Back up during off-hours
Tape Rotation
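
The three backup types differ only in the cutoff used to decide which files are copied. The sketch below models that with plain day numbers as timestamps. The function name and sample files are hypothetical, and comparing modification times is a simplification: real backup tools often track changes with an archive bit or a catalog instead.

```python
def files_to_back_up(kind: str, files: dict[str, int], last_full: int, last_any: int) -> list[str]:
    """Return the file names a backup of the given kind would copy.

    files maps name -> last-modified day; last_full is the day of the last
    full backup; last_any is the day of the most recent backup of any type.
    """
    if kind == "full":
        return sorted(files)
    if kind == "differential":
        cutoff = last_full   # everything changed since the last full backup
    elif kind == "incremental":
        cutoff = last_any    # only what changed since the most recent backup
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(name for name, modified in files.items() if modified > cutoff)


# Full backup on day 7, incremental on day 9, and it is now day 10.
files = {"a.txt": 5, "b.txt": 8, "c.txt": 10}
```

With this data, a full backup copies all three files, a differential copies `b.txt` and `c.txt`, and an incremental copies only `c.txt`. The trade-off shows up at restore time: a differential restore needs the last full backup plus one differential, while an incremental restore needs the last full backup plus every incremental taken since.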

Question 17

Q

What kind of training is involved in a DRP?

Answer

A

Orientation training for all new employees
Initial training when employees take on a new DRP role
Detailed refresher training
Brief awareness refreshers for all other employees

Question 18

Q

What types of tests are done for DRP?

Answer

A

Read-Through Test (Checklist): Distribute copies of the plans to the members of the DRP team to review:
- Ensures key personnel are aware of their responsibilities
- Provides individuals an opportunity to review plans for obsolete info
- Helps identify if key individuals are missing
Structured Walk-Through (Table-top Exercise): Role-play a disaster scenario
Simulation Test: Members are presented with a scenario and asked to develop an appropriate response
Parallel Test: Members are relocated to an alternate recovery site and implement site activation procedures. Operations at the primary site are not interrupted.
Full-Interruption Test: Same as Parallel Test but the primary site is actually shut down.
Maintenance: As needs change, make the necessary modifications.