Disaster Recovery Planning Flashcards
RTO
- recovery time objective
- maximum amount of time an IT service can be down before it has a negative impact on business
- ensure all parties know their roles in disaster recovery plan
BIA
- business impact analysis
- RTO is important component
alternate sites
- enable business to continue when disruption occurs at primary site
- require high-speed communication links between sites
- IT infrastructure must be in place
- data replication between sites must be configured
failing over IT services to alternate site
- DHCP/DNS
- hosted web site
- VMs
- line of business applications
- ensure notifications are sent to stakeholders
ensure network address changes don’t affect IT service consumers
dynamic DNS updates for changed IP addresses
failover clustering
- provides high availability
- multiple servers (cluster nodes) use same shared storage/configured identically
- redundancy
MRU
- most recently used
- MRU path normally used when cluster node connects to shared storage
- server will attempt other paths if current path fails
active/active clustered services
- clustered service is running simultaneously on multiple cluster nodes
- zero downtime
- live failover
active/passive clustered services
- service fails over/starts up on another cluster node if node where service is running fails
rolling cluster updates
- staggered process of applying cluster node updates
- ensure some cluster nodes are always running
periodic heartbeat transmission
- used by clustering solutions
- sent from each node to ensure nodes haven’t failed
- use dedicated network adapter for cluster heartbeats
hot site
- alternate location that can actively continue business operations
- disaster recovery (DR) sites hosted by cloud providers are commonly used as hot sites
- continuous data protection (CDP) replication between sites
- most expensive to maintain
cold site
- alternate location with power/communication links in place
- don’t have IT equipment/software/data/staff
- software incompatibility
- must restore data from backups
- must fit within RTO/business continuity plan (BCP)
- least expensive
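
Whether a restore from backups fits within the RTO can be estimated with simple arithmetic. A rough sketch, assuming hypothetical figures for data size, restore throughput, and RTO (none of these numbers come from the cards above):

```shell
# All three values are hypothetical and only illustrate the calculation.
data_gb=2000            # amount of data to restore from backup
restore_mb_per_s=100    # sustained restore throughput
rto_hours=4             # recovery time objective

# Convert both sides to seconds so they can be compared directly.
restore_s=$(( data_gb * 1024 / restore_mb_per_s ))
rto_s=$(( rto_hours * 3600 ))

if [ "$restore_s" -le "$rto_s" ]; then
  verdict="within RTO"
else
  verdict="exceeds RTO"
fi
echo "restore needs ${restore_s}s, RTO allows ${rto_s}s: $verdict"
```

The estimate covers data transfer only; time to procure and install equipment at a cold site would be added on top.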
warm site
- alternate location with power/communication links
- some equipment in place
- bare-metal server restoration
- application patching
- data restoration
bare-metal server restoration
- performs full system recovery
- including OS
- can be performed even when the target hardware differs from the hardware in place when the system backup/image was taken
- external bootable drives/PXE
data replication
can immediately provide data without requiring restoration procedure
synchronous data replication
writes to primary/alternate location simultaneously
asynchronous data replication
slight delay before alternate write completes
disk-to-disk data replication
- RAID 1 (disk mirroring) storage
- second copy of data is written to disk other than primary disk
- automatically fails over to redundant disk
Linux tar command
create compressed archives for backup purposes
tar -c
create archive
tar -v
display verbose output
tar -z
compress archive with gzip
tar -f
specify path/filename of archive file
tar -x
extract specified archive
tar -C (uppercase)
change to directory for extraction of archive
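
The tar flags above are usually combined into a single command. A minimal sketch, assuming GNU or BSD tar and throwaway directories created with mktemp (the file name payroll.txt is hypothetical):

```shell
# Work in throwaway directories so nothing real is touched.
src=$(mktemp -d)
restore=$(mktemp -d)
archive="$(mktemp -d)/backup.tar.gz"
echo "payroll data" > "$src/payroll.txt"

# -c create, -z gzip, -v verbose, -f archive path.
# -C switches into $src first so the archive stores relative paths.
tar -czvf "$archive" -C "$src" .

# -x extract; here -C selects the directory to extract into.
tar -xzvf "$archive" -C "$restore"

cat "$restore/payroll.txt"
```

Storing relative paths keeps the archive restorable to any directory, which matters when recovering onto a different server.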
Linux dd command
back up specific disk blocks/entire partitions
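
A hedged illustration of both uses of dd. To stay safe it runs against a regular file that stands in for a partition; on a real system the input would be a device such as /dev/sdX1 (a hypothetical name) and would normally require root:

```shell
work=$(mktemp -d)

# Build a 1 MiB stand-in "partition" and write a marker into its first block.
dd if=/dev/zero of="$work/disk.img" bs=1024 count=1024 2>/dev/null
printf 'boot-sector-data' | dd of="$work/disk.img" bs=512 count=1 conv=notrunc 2>/dev/null

# Entire partition: copy every block into a backup image.
dd if="$work/disk.img" of="$work/full-backup.img" bs=64k 2>/dev/null

# Specific blocks: copy only the first 512-byte block.
dd if="$work/disk.img" of="$work/block0.img" bs=512 count=1 2>/dev/null

# cmp exits 0 only when the two files are byte-for-byte identical.
cmp "$work/disk.img" "$work/full-backup.img" && echo "full image matches source"
```

Because dd copies raw blocks rather than files, the image also captures unused space, so it is as large as the source partition.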
server-to-server data replication
- host-to-host replication
- uses software in server OS to replicate data between 2 or more servers
- consumes server processing workload
Windows DFSR
- distributed file system replication
- Windows server role service
- synchronize folder contents between servers
- only file block changes are synchronized
- changes compressed before being sent over network
- replication can be scheduled
- servers can be configured for continuous replication
- can be configured with bandwidth throttling to preserve network resources
- asynchronous replication
- configure 1 or more servers in replication group as read-only to prevent changes from that host
rsync
- tool to replicate data between hosts in UNIX/Linux
- variants work on Windows
- synchronizes 2 or more folders locally or over network
- only file changes are synchronized
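
A small sketch of rsync between two local folders, assuming rsync is installed (the block skips itself otherwise). The directories and file names are hypothetical; for a remote host the destination would take the form user@host:/path/ instead:

```shell
src=$(mktemp -d)
dst=$(mktemp -d)
echo "v1" > "$src/report.txt"
echo "keep" > "$src/notes.txt"

if command -v rsync >/dev/null 2>&1; then
  # -a recurses and preserves permissions/timestamps. The trailing slash
  # on the source means "the contents of", not the folder itself.
  rsync -a "$src/" "$dst/"

  # Change one file and sync again; --itemize-changes lists what was updated.
  echo "v2 with more text" > "$src/report.txt"
  rsync -a --itemize-changes "$src/" "$dst/"
else
  echo "rsync is not installed; skipping" >&2
fi
```

On the second run rsync skips notes.txt because its size and modification time are unchanged.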
site-to-site data replication
- primary/hot site
- between cloud provider datacenters
- network links must be able to accommodate large data transfers quickly
active/active copies of data
data copies from synchronous replication solution
who/what affected (BIA)
- personal safety
- critical data
- network hardware/software
- critical database servers
- front-end applications
RTO (BIA)
significant factor when determining what type of failures can be tolerated/how long
disaster recovery (DR) plan
- prepares organization for potential negative incidents that can affect IT systems
- simulations
- includes step-by-step procedures to recover failed systems
- proper role documentation
DR plan contents
- table of contents
- scope of DR document
- contact information for escalation/outsourcing
- recovery procedures
- document revision history
- glossary
MTTR
- mean time to repair
- on average how long it takes to restore failed components
- helps in planning equipment life cycle/restore failed equipment
MTBF
- mean time between failures
- manufacturer provided
- estimate of how long a component runs before failing
- usually associated with hardware
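
Mean time to repair and mean time between failures are often combined into an availability estimate using the standard formula availability = MTBF / (MTBF + MTTR). A worked sketch with hypothetical figures (not taken from any manufacturer):

```shell
# Hypothetical figures: 10,000 hours between failures, 8 hours to repair.
mtbf_hours=10000
mttr_hours=8

# Steady-state availability as a percentage, rounded to two decimals.
availability=$(awk -v f="$mtbf_hours" -v r="$mttr_hours" \
  'BEGIN { printf "%.2f", 100 * f / (f + r) }')
echo "availability: ${availability}%"
```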
BCP
- business continuity plan
- ensures business operations can continue/resume quickly during/after a failure
- should include preventative measures
- continuity of operations (COOP)
creating/using BCP
- assemble BCP team
- identify/prioritize critical systems/data
- determine if required skills available internally/outsourced
- determine if alternate sites will be used
- create DR plan for each IT service
- review BCP with BCP team
- run periodic drills
selective backups
- enable only restoring files that are required
- instead of overwriting all files/restoring to an alternate path from original backup location
SQL server log shipping
- uses primary/secondary SQL server
- primary SQL supports read/write access
- secondary SQL updated via transaction log updates from primary
- side-by-side backup
archive bit
- used in file systems to indicate that a file has been changed/needs to be backed up
- used by most backup solutions
- cleared by full backup
- turned on by OS when a file is created or modified
full backup
- copies all data specified in backup set
- takes longest to complete; restore needs only a single backup set
- commonly only performed periodically
- clears archive bit when performed
differential backup
- copies only files that have changed since the last full backup (not since last differential backup)
- more time to restore than full backups (full restore + restore of differential)
- archive bit is not normally cleared
incremental backup
- copies only files that have changed since the last incremental/full backup
- normally clears archive bit
- takes the least time to perform
- takes the most time to restore (last full backup + every incremental since)
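
The difference between differential and incremental selection can be simulated with file timestamps. This is only a sketch: marker files and find -newer stand in for the archive bit, and all file names and dates are hypothetical:

```shell
work=$(mktemp -d)
mkdir "$work/data"

# Three files exist before the full backup runs.
touch -t 202401010000 "$work/data/a.txt" "$work/data/b.txt" "$work/data/c.txt"
touch -t 202401010100 "$work/full.marker"    # full backup taken
touch -t 202401020000 "$work/data/a.txt"     # a.txt changes
touch -t 202401020100 "$work/incr1.marker"   # first incremental taken
touch -t 202401030000 "$work/data/b.txt"     # b.txt changes

# Differential: everything changed since the last FULL backup.
diff_set=$(cd "$work/data" && find . -type f -newer ../full.marker | sort)

# Incremental: everything changed since the last backup of ANY type.
incr_set=$(cd "$work/data" && find . -type f -newer ../incr1.marker | sort)

echo "differential would copy:"; echo "$diff_set"
echo "incremental would copy:";  echo "$incr_set"
```

The differential set contains a.txt and b.txt while the incremental set contains only b.txt. That is why a differential restore needs just the full backup plus the latest differential, whereas an incremental restore needs every incremental in order.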
synthetic full backups
- take incremental backup
- combine with older full backup in same location
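
Conceptually, a synthetic full is the older full backup with the incremental's newer files overlaid on top. A simplified sketch using plain directories as stand-ins for backup sets (hypothetical names; real backup software also tracks deleted files, which this does not):

```shell
work=$(mktemp -d)
mkdir "$work/full" "$work/incr" "$work/synthetic"

# The older full backup holds both files; the incremental holds only the change.
echo "a v1" > "$work/full/a.txt"
echo "b v1" > "$work/full/b.txt"
echo "a v2" > "$work/incr/a.txt"

# Start from the full backup, then let the incremental overwrite changed files.
cp -R "$work/full/." "$work/synthetic/"
cp -R "$work/incr/." "$work/synthetic/"

cat "$work/synthetic/a.txt" "$work/synthetic/b.txt"
```

The merge happens on the backup storage itself, so the production server does not have to be read in full again.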
snapshots
- VMs
- capture settings/data in vdisk files
- should not be relied upon as sole backup (don’t replace backups)
- can also apply to disk volumes/storage arrays/LUNs/hypervisors/databases
storage snapshots
snapshots used in SAN environments
Windows VSS
- Volume Shadow Copy Service
- configured for each disk volume
- enable scheduled snapshots (volume shadow copies)
- only contain changed disk blocks (don’t consume much space)
bare-metal backup
- data included in recovery image
- can be used to deploy new servers quickly
- use snapshots (recovery points)
- require boot device
linear access tape
- linear tape-open (LTO)
- magnetic storage media that uses linear tape file system (LTFS)
- large capacities
- fast data seeks
- streaming
- commonly used with tape backup systems/archiving
- XML file used as catalog of backed-up content
AIT
- advanced intelligent tape
- magnetic tape storage used with tape backup/archiving systems
- each data cartridge contains a chip with metadata
DLT
- digital linear tape
- industry standard
- used for long-term archiving
- should be placed in protective cases
- superDLT (SDLT) supports larger capacities/transfer rates
- SDLT drives can read older DLT tapes (read-only access)
tape library
management solution for multiple tape devices/backup media used for backups
cloud backup security
- connect network to cloud provider with site-to-site VPN
- connect network to cloud provider with private network connection
- encrypt data before backing up to cloud (if server side encryption isn’t provided)
GFS tape rotation strategy
- grandfather-father-son
- most common
- uses 3 backup sets (i.e. daily/weekly/monthly)
- each tape rotated on a schedule
GFS example
- son = daily backup
- father = weekly backup
- grandfather = monthly backup
- day 7 = son tape becomes a father/used for next weekly backups
- other daily tapes keep getting reused as cycle continues
- week 4 = father becomes grandfather/used for next monthly backups
- monthly backups can be stored offsite
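
The rotation in the example above can be sketched as a small function. It assumes a 7-day week and a 4-week (28-day) cycle, matching "day 7" and "week 4" in the card; gfs_tier is a hypothetical helper name, and real schedules often follow calendar months instead:

```shell
# Print which GFS tier the backup taken on a given day number belongs to.
gfs_tier() {
  day=$1
  if [ $(( day % 28 )) -eq 0 ]; then
    echo "grandfather"   # end of week 4: monthly backup
  elif [ $(( day % 7 )) -eq 0 ]; then
    echo "father"        # end of each week: weekly backup
  else
    echo "son"           # every other day: daily backup, tape reused
  fi
}

for d in 1 6 7 14 28; do
  echo "day $d: $(gfs_tier "$d")"
done
```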
backup best practices
- clear/concise backup media labeling
- data retention policy
- integrity verification
- backup media offsite storage
- backup media encryption
- backup media environmental controls
- periodic data restoration tests
RAID variant that can tolerate 2 disk failures
RAID 6