Business Continuity and Disaster Recovery Flashcards
Disruptions
an unplanned event that interrupts any organizational asset, such as processes, functions, or devices
3 Categories of disruptions
nondisasters
disasters
catastrophes
nondisasters
temporary, due to malfunction or failure. Easiest to recover from
disaster
occurs suddenly, has long-term negative impact.
catastrophe
much wider and longer impact than disaster. facilities are destroyed, requiring rebuilding and temporary offsite locations
Disaster
usually affects wide geographical area. severe damage, injury, death
severity is affected by amount of time organization takes to recover
officially over when all business elements return to normal function at original site
Technological disasters
device failures. usually unintentional, even if caused by errors in configuration
if a disaster occurs because of deliberate attack, it’s considered man-made even if it’s against a technology
Man-made disaster
occurs through human intent or error. Attacks, personnel unavailability due to evacuation
Typically intentional
Natural disaster
floods, tsunamis, tornadoes, etc. Fires, except for arson
Disaster Recovery Plan (DRP)
Business Continuity Plan (BCP)
Each organizational function will have a DRP. It includes steps to restore or recover functions and systems. The goal is to minimize damage and injury
The DRPs are part of the BCP
DRPs are implemented when an emergency occurs
Business Continuity Plan (BCP)
considers all aspects affected by a disaster: functions, systems, personnel, facilities.
Lists and prioritizes services needed, particularly IT, telecom
Business Continuity Plan (BCP)
Availability
Reliability
Recoverability
Availability is a main component. Orgs must determine acceptable levels of availability for functions and systems
Reliability is the capability of a function or system to consistently perform to its specifications
Recoverability is the capability of a function to be recovered after a disruption
Contingency Plan
Instructions on what personnel should do until functions and systems are restored to full functionality
includes contact information for personnel and vendors, plus system and equipment requirements
failure of the contingency plan is considered a management failure
How often should the BCP, DRP and contingency plans be reviewed?
annually. maintain version control
Fault Tolerance
a backup component takes over when the primary component fails.
Business Impact Analysis
4 main steps
ID critical processes, resources
ID outage impacts, estimate downtime
ID resource requirements
ID recovery priorities
Business Impact Analysis
ID Critical processes and resources
first ID the business units or functional areas
select people to gather necessary data, select how to gather data
use questionnaires, interviews, surveys, vulnerability analysis, risk assessment
document business processes, functions and the resources they depend on
Business Impact Analysis
Determine criticality level of resources by using these terms
Maximum Tolerable Downtime (MTD) aka Maximum Period Time of Disruption (MPTD)
Mean Time to Repair (MTTR)
Mean Time Between Failures (MTBF)
reliability increased by higher MTBF, lower MTTR
MTD/MPTD - maximum time an organization can tolerate a single resource being down
MTTR - Average time needed to repair a resource when a disaster happens
MTBF - Estimated time a device will operate before failure occurs. Calculated by device vendor
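The note that reliability improves with higher MTBF and lower MTTR can be illustrated with the common steady-state availability approximation, MTBF / (MTBF + MTTR). This formula is not stated on the card; it is a standard rule of thumb, and the function name and sample hours below are hypothetical.

```python
def availability(mtbf_hours, mttr_hours):
    """Approximate fraction of time a resource is up.

    Assumption: steady-state availability = MTBF / (MTBF + MTTR).
    """
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical device: fails about every 1000 hours, takes 10 hours to repair.
base = availability(mtbf_hours=1000, mttr_hours=10)

# Raising MTBF or lowering MTTR both push availability up.
higher_mtbf = availability(mtbf_hours=2000, mttr_hours=10)
lower_mttr = availability(mtbf_hours=1000, mttr_hours=5)

print(f"base={base:.4f} higher_mtbf={higher_mtbf:.4f} lower_mttr={lower_mttr:.4f}")
```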
Business Impact Analysis
Terms to ID outage impacts and estimate downtime
Recovery Time Objective (RTO)
Recovery Point Objective (RPO)
Work Recovery Time (WRT)
RTO - shortest time period after a disruption within which a resource must be restored to avoid unacceptable consequences. RTO should be smaller than MTD
RPO - Point in time to which the disrupted resource must be returned
WRT - difference between the RTO and MTD. The time left over after the RTO, before reaching MTD
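The relationship between these terms is simple arithmetic: WRT = MTD - RTO. A minimal sketch, assuming all values are expressed in hours; the function name and sample numbers are hypothetical.

```python
def work_recovery_time(mtd_hours, rto_hours):
    """Return WRT, the time left between the RTO and the MTD."""
    if rto_hours >= mtd_hours:
        # The card states that RTO should be smaller than MTD.
        raise ValueError("RTO should be smaller than MTD")
    return mtd_hours - rto_hours


# Hypothetical resource: tolerable downtime 24 h, must be restored within 8 h.
print(work_recovery_time(mtd_hours=24, rto_hours=8))  # 16
```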
Business Impact Analysis
Organizations must develop their own documented criticality levels:
critical resources
urgent resources
important resources
normal resources
critical - vital to operation, restored within minutes or hours
urgent - restored in 24 hours
important - restored in 72 hours
normal - restored in 7 days
Recovery Strategies
Alternate locations include:
hot site
cold site
hot - contains all resources needed for full operation. Only resource needed to restore at hot site is data. Quickest recovery, but expensive, hard to manage.
cold - contains electrical, HVAC, communications wiring, plumbing. Takes longer to restore than a hot or warm site. Cheapest, but hard to test
Warm Site
Tertiary Site
Warm Site
typically has everything except computers
Most widely implemented alternate location
Tertiary Site
secondary backup site in case hot, warm or cold site is unavailable
usually used to protect against large catastrophes affecting wide geographic areas
Reciprocal agreements
Redundant sites
Reciprocal agreements
two organizations agree to act as alternate locations for each other. Can’t be legally enforced. May not handle workload of both organizations simultaneously
Redundant sites
not a leased site; owned by the same organization as the primary site.
most expensive but fastest way to recover
Disaster Recovery Plan should include these things for hardware
vendor contact information in case new supplies need to be bought
recovery information for hardware backups (computers, network gear, etc.); guidelines and procedures for restoring data
Disaster Recovery Plan should include these things for software
software backups including applications and data, should be stored at an alternate location.
All license information should be documented
software installation media, service packs, updates
frequent backups of applications should be taken
software escrow in case the software vendor goes out of business
Disaster Recovery Plan should include these things for human resources
occupant emergency plan to minimize loss of life or injury
HR contacts personnel in event of disaster. Contact information should be stored onsite and offsite
After initial event, HR monitors personnel to guard against stress and burnout during recovery period
Provide adequate periods of rest
Guidelines to replace personnel lost during disaster
Ensure salaries and funding continue during and after disaster
signed checks should be securely stored offsite
executive succession plan should be created
Disaster Recovery Plan should include these things for supplies
supplies - paper, cabling, water. Any resources vital to daily operations, the vendors that provide them, and alternate suppliers
Documentation
Each dept. should maintain its own critical documentation. Stored in a central location onsite with an offsite copy. Personnel should be tasked to ensure it’s created, stored and updated.
Create Recovery Strategies
DRP must include recovery information on these assets:
operations team
BCP teams
operations team - determines what data is backed up, how often and method of backup
BCP teams - ensure data is backed up and can be restored
Backup Types
full
incremental
differential
full backup - backs up all files; archive bit for each file is cleared. Best for offsite archiving. Takes the longest time and most space. Differential and incremental schemes both start with a full backup.
incremental
backs up everything changed since the last backup of any kind. Archive bit for each file is cleared. Least amount of time and space to complete. To restore, the full backup and each successive incremental backup must be restored
differential
backs up everything changed since the last full backup. Archive bit for each file is NOT cleared. Only the full and most recent differential backup are needed to restore
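The archive-bit behavior of the three backup types can be sketched as a small simulation. This is an illustration only, assuming a dictionary that maps file names to their archive bit (True means changed since the bit was last cleared); `run_backup` and the file names are hypothetical.

```python
def run_backup(archive_bits, kind):
    """Return the files a backup would copy and update archive bits in place.

    kind is "full", "incremental", or "differential".
    """
    if kind == "full":
        copied = sorted(archive_bits)  # every file, changed or not
    else:
        copied = sorted(name for name, bit in archive_bits.items() if bit)
    if kind in ("full", "incremental"):
        for name in copied:
            archive_bits[name] = False  # full and incremental clear the bit
    # A differential backup leaves the bits set, so each later differential
    # still picks up every change made since the last full backup.
    return copied


bits = {"a.txt": True, "b.txt": True, "c.txt": True}
print(run_backup(bits, "full"))          # ['a.txt', 'b.txt', 'c.txt']
bits["a.txt"] = True                     # a.txt is modified
print(run_backup(bits, "differential"))  # ['a.txt']
bits["b.txt"] = True                     # b.txt is modified
print(run_backup(bits, "differential"))  # ['a.txt', 'b.txt']
print(run_backup(bits, "incremental"))   # ['a.txt', 'b.txt']
bits["c.txt"] = True                     # c.txt is modified
print(run_backup(bits, "incremental"))   # ['c.txt']
```

The second differential run repeats a.txt because nothing cleared its bit, which is why only the full and the latest differential are needed to restore.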
Backup schemes
transaction log backup
FIFO rotation scheme
grandfather/father/son rotation scheme
transaction log backup
recover to a specific point in time. Covers transactions that occurred since the last backup
FIFO rotation scheme
Newest backup is saved to the oldest media. Simplest, but doesn’t protect against data errors. If an error exists, you may not have a version of the data without the error
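A minimal sketch of FIFO rotation, assuming a fixed number of media slots modeled with a bounded `collections.deque`; `fifo_rotate` and the backup labels are hypothetical.

```python
from collections import deque


def fifo_rotate(backups, slots):
    """Return the backups still held after writing each one to the oldest slot."""
    media = deque(maxlen=slots)  # appending to a full deque drops the oldest item
    for backup in backups:
        media.append(backup)
    return list(media)


# With 3 slots, the fourth backup overwrites Monday's.
print(fifo_rotate(["mon", "tue", "wed", "thu"], slots=3))  # ['tue', 'wed', 'thu']
```

If the data error was introduced on Tuesday, every retained copy already contains it, which is the weakness the card describes.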
grandfather/father/son rotation scheme
3 sets of backups, usually: daily, weekly, monthly.
daily backups are the “son”
weekly backups are the “father”
monthly backups are the “grandfather”
each week one son advances to the father set, each month one father advances to the grandfather set
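One possible way to decide which set a given day's backup belongs to is sketched below. The card does not say which day is promoted, so this assumes the Sunday backup advances to the father set and the backup on the last day of the month advances to the grandfather set; `gfs_tier` is a hypothetical name.

```python
from datetime import date, timedelta


def gfs_tier(day):
    """Classify a daily backup as son, father, or grandfather."""
    # Assumption: the last day of the month is kept as the monthly backup.
    if (day + timedelta(days=1)).month != day.month:
        return "grandfather"
    # Assumption: the week ends on Sunday (date.weekday() returns 6).
    if day.weekday() == 6:
        return "father"
    return "son"


print(gfs_tier(date(2024, 1, 8)))   # son (a Monday)
print(gfs_tier(date(2024, 1, 7)))   # father (a Sunday)
print(gfs_tier(date(2024, 1, 31)))  # grandfather (last day of January)
```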
data recovery terms
electronic vaulting
remote journaling
tape vaulting
hierarchical storage management (HSM)
optical jukebox
replication
electronic vaulting
copies files when they’re modified
remote journaling
copies transaction logs offsite
tape vaulting
creates backups over network to offsite facility
hierarchical storage management (HSM)
moves data between different types of media based on cost, e.g. rarely used data goes to cheaper, slower media
optical jukebox
stores many optical discs and uses robotics to load them into drives on demand
replication - copies data to another location
synchronous - constant data updates
asynchronous - on a schedule
data recovery terms
RAID
SAN
Failover
Failsoft
RAID
redundant array of independent disks
SAN
storage devices connected by high speed network
Failover
ability to switch to a backup system if primary fails
Failsoft
ability to terminate non-critical processes when failure occurs
data recovery terms
clustering
load balancing
clustering
software product that does load balancing between applications.
One instance acts as a master controller distributing work to other instances
load balancing
hardware product that does load balancing. also called farms or pools
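The master controller in the clustering card can be pictured as a loop that hands each request to the next instance in turn. The cards do not name a distribution policy, so round-robin is an assumption here, and `distribute`, the request IDs, and the instance names are hypothetical.

```python
from itertools import cycle


def distribute(requests, instances):
    """Assign each request to an instance in round-robin order."""
    assignment = {name: [] for name in instances}
    # cycle() repeats the instance list; zip() stops when requests run out.
    for request, name in zip(requests, cycle(instances)):
        assignment[name].append(request)
    return assignment


print(distribute(["r1", "r2", "r3", "r4", "r5"], ["app-1", "app-2"]))
# {'app-1': ['r1', 'r3', 'r5'], 'app-2': ['r2', 'r4']}
```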
Top 2 priorities in a disaster
personnel safety
damage mitigation
Teams to support the DRP
Damage Assessment Team
Legal Team
Damage Assessment Team
determines disaster cause and amount of damage to organization. Identifies assets and functionality after disaster.
Legal Team
Oversees legal issues, PR events. Consulted to ensure recovery operations follow laws and regulations
Teams to Support the DRP
Media Relations
Recovery Team
Relocation Team
Media Relations - informs public
Recovery Team - recovers critical business functions, ensures physical assets are in place. Oversees the relocation and restoration teams
Relocation Team - oversees transfer of assets between locations.
Teams to support the DRP
Restoration Team
Salvage Team
Security Team
Restoration Team - ensures assets and data are restored from backups
Salvage Team - recovers assets at disaster location, ensures primary site returns to normal. Does cleaning, rebuilding original facility, declares when original site can resume operations
Security Team - manages security at primary and alternate locations.
Types of tests for assessing BCP and DRP
Checklist test
Table-top exercise
Structured walk-through
Checklist test
department managers review the BCP. BCP committee uses their notes to update BCP
Table-top exercise
Most efficient and cost-effective way to ID areas of overlap early on. Brainstorming session where participants agree to a disaster scenario to focus on
Structured walk-through
each department rep reviews the BCP’s accuracy. Most important to perform before a live disaster
Types of tests for assessing BCP and DRP
Simulation test
Parallel test
Full-interruption test
Simulation test
operations and support personnel execute the DRP in a role-playing scenario.
Parallel test
brings the alternate site's systems up and runs them alongside the primary site, which stays in operation. Performance of the alternate systems is compared to the originals. If deficiencies are found, they’re resolved
Full-interruption test
shut down primary facility and bring up the alternate facility to full operation. Requires coordination between all parties. Perform after all other tests have been completed successfully
Types of tests for assessing BCP and DRP
Functional Drill
Evacuation Drill
Functional Drill
tests a single function or department to see if DRP is complete for that function.
Evacuation Drill
personnel follow evacuation or shelter-in-place guidelines.