Disaster Recovery Flashcards

1
Q

What are the key metrics to work out during Disaster Recovery business impact analysis?

A
  1. Recovery Time Objective (RTO) - the maximum acceptable length of time that your application can be offline (usually defined in an SLA).
  2. Recovery Point Objective (RPO) - the maximum acceptable period of data loss, expressed as the length of time between the last recoverable copy of the data and a major incident. This can vary depending on the type or usage of the data. (See the worked example below.)
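
A rough way to sanity-check these metrics is to compare them against the backup interval and the estimated duration of the recovery runbook. The figures below are hypothetical and purely for illustration:

    # Hypothetical SLA figures; substitute your own.
    RTO_MINUTES = 60              # application may be offline for at most one hour
    RPO_MINUTES = 15              # at most 15 minutes of data may be lost

    backup_interval_minutes = 10  # how often backups / replication checkpoints run
    recovery_steps = {            # estimated duration of each recovery task
        "detect incident and page on-call": 10,
        "provision standby infrastructure": 20,
        "restore latest backup": 15,
        "cut DNS over to the standby": 10,
    }

    worst_case_data_loss = backup_interval_minutes      # data written since the last backup
    total_recovery_time = sum(recovery_steps.values())

    print(f"Worst-case data loss: {worst_case_data_loss} min (RPO budget {RPO_MINUTES} min)")
    print(f"Estimated recovery time: {total_recovery_time} min (RTO budget {RTO_MINUTES} min)")
    if worst_case_data_loss > RPO_MINUTES:
        print("Backups are too infrequent to meet the RPO.")
    if total_recovery_time > RTO_MINUTES:
        print("The recovery runbook is too slow to meet the RTO.")
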
2
Q

How does GCP reduce costs associated with RTO and RPO?

A

GCP's managed solutions remove the need to provision most traditional DR requirements yourself: capacity, security, network infrastructure, support, bandwidth and facilities. The cost of managing complex applications is reduced as well.

3
Q

What are the GCP advantages for DR planning?

A

1) Global network: the largest, most advanced backbone network, delivering fast, consistent, scalable performance.
2) Redundancy: multiple points of presence and automatic data mirroring across storage devices in multiple locations.
3) Scalability: designed to scale, with managed services such as App Engine, the Compute Engine autoscaler, Cloud SQL and Datastore.
4) Security: see the Google security model and site reliability engineering practices.
5) Compliance: regular third-party audits and compliance with major certifications.
6) Low cost: no dedicated DR hardware to buy or maintain; you pay only for the resources you use.

4
Q

What are the best practices for planning a DR strategy?

A

1) Design for end-to-end recovery.
2) Make tasks specific.
3) Implement control measures, monitoring and alerts (a minimal monitoring sketch follows this list).
4) Ensure security controls are factored into the recovery plan.
5) Configure machine images to reflect your RTO, using an optimized pre-configuration strategy.
6) Maintain more than one data recovery path (e.g. multiple backup copies).
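
For item 3, a generic sketch of a very small control measure: poll an application endpoint and raise an alert when it stops responding. In practice you would use Stackdriver Monitoring for this; the URL and thresholds here are hypothetical.

    import time
    import urllib.request

    HEALTH_URL = "https://app.example.com/healthz"   # hypothetical health endpoint
    FAILURES_BEFORE_ALERT = 3

    failures = 0
    while True:
        try:
            with urllib.request.urlopen(HEALTH_URL, timeout=5) as resp:
                healthy = resp.status == 200
        except Exception:
            healthy = False

        failures = 0 if healthy else failures + 1
        if failures >= FAILURES_BEFORE_ALERT:
            print("ALERT: application unhealthy; start the DR runbook")  # page on-call here
            failures = 0
        time.sleep(30)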

5
Q

What are some GCP services that may be utilized for DR?

A

1) Cloud Storage: redundant data storage across regional, multi-regional and archival bucket classes (see the backup upload sketch below).
2) Cloud SQL: useful as a hot standby; with automated backups and binary logging enabled it supports automated point-in-time recovery.
3) BigQuery: a destination for high-volume logs redirected via Stackdriver or Fluentd, which can then be analysed to diagnose DR issues.
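
A minimal sketch of copying a local database dump into a Cloud Storage bucket as a DR backup, using the google-cloud-storage client library. The bucket name and file paths are hypothetical examples; application credentials are assumed to be configured.

    from google.cloud import storage

    def upload_backup(bucket_name, local_path, object_name):
        client = storage.Client()
        bucket = client.bucket(bucket_name)
        blob = bucket.blob(object_name)
        blob.upload_from_filename(local_path)   # upload the dump file
        print(f"Uploaded {local_path} to gs://{bucket_name}/{object_name}")

    if __name__ == "__main__":
        upload_backup("my-dr-backups",
                      "/var/backups/db-2019-01-01.sql.gz",
                      "sql/db-2019-01-01.sql.gz")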

6
Q

What are some GCP services or features useful for application backup and recovery?

A

1) Compute Engine instance snapshots: incremental (diff-based) backups of persistent disks, globally available so they can be restored in any zone or region (see the snapshot sketch below).
2) HTTP load balancing: forwarding rules to route traffic from on-premises environments to GCP Compute Engine instances.
3) Cloud DNS: high-volume, high-performance DNS serving; can be used for application failover when load balancing is not available.
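
A minimal sketch of taking a persistent disk snapshot with the google-cloud-compute client library (assumed available). The project, zone, disk and snapshot names are hypothetical placeholders.

    from google.cloud import compute_v1

    def snapshot_disk(project, zone, disk_name, snapshot_name):
        disks = compute_v1.DisksClient()
        disk = disks.get(project=project, zone=zone, disk=disk_name)

        snapshot = compute_v1.Snapshot()
        snapshot.name = snapshot_name
        snapshot.source_disk = disk.self_link

        snapshots = compute_v1.SnapshotsClient()
        operation = snapshots.insert(project=project, snapshot_resource=snapshot)
        operation.result()   # block until the snapshot is created
        print(f"Created snapshot {snapshot_name} from disk {disk_name}")

    snapshot_disk("my-project", "us-central1-a", "app-data-disk", "app-data-disk-daily")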

7
Q

What tools does GCP provide for testing, debugging and deploying your DR plan?

A

1) Stackdriver Logging: collects and stores logs, viewable in the console; can be streamed to Cloud Storage, BigQuery or Pub/Sub (a small logging sketch follows this list).
2) Stackdriver Monitoring: dashboards and alerts for your application, performance metrics, and monitoring hooks for popular open-source services.
3) Cloud Deployment Manager: automates the creation and management of GCP resources via templates and configuration files; use them to create deployments for a variety of GCP services such as Cloud Storage, Compute Engine and Cloud SQL.
4) Remote connectivity: remote recovery solutions for an on-premises environment or another cloud provider.
5) Cloud Interconnect: high availability and low latency, with access to private IP addresses.
6) Direct Peering: access to public IPs via a direct network connection, with high availability and low latency.
7) Compute Engine VPN: connects an on-premises network to a Compute Engine network.
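
When testing a DR plan it can help to write structured markers into Stackdriver Logging so drills are easy to find afterwards. A minimal sketch using the google-cloud-logging client library; the logger name and payload fields are hypothetical.

    from google.cloud import logging as cloud_logging

    client = cloud_logging.Client()
    logger = client.logger("dr-drill")     # hypothetical logger name
    logger.log_struct({
        "event": "failover-test",
        "component": "web-tier",
        "result": "standby promoted in 7 minutes",
    })
    print("Logged DR drill marker")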

8
Q

Explain ‘Historical data recovery’, archival recommendation and a possible backup solution.

A

Data archived for compliance and historical analysis; it is important to include both log data and database data.

Archiving log data: stream logs to a Cloud Storage bucket or a BigQuery dataset; exports can be configured from the Stackdriver Logs Viewer.

Archiving database data:
  - use a multi-tiered backup solution to move database backups to lower-cost storage class types
  - create a custom image of a Compute Engine instance with the database system installed
  - take regular snapshots, and again each time you upgrade
  - export data to a highly portable flat format such as CSV, JSON or XML and store it in a Cloud Storage Nearline bucket
  - exported CSV/JSON can be imported into BigQuery for easy data analysis

Archiving directly to BigQuery: real-time event data via ‘streaming inserts’ (see the sketch below); useful for big-data analytics and high-volume event logging, not transactional workloads; suited to aggregate analysis (queries for trends rather than single or narrow row lookups).
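
A minimal sketch of streaming event rows into BigQuery with insert_rows_json, using the google-cloud-bigquery client library. The dataset/table name and row fields are hypothetical and must match an existing table schema.

    from google.cloud import bigquery

    client = bigquery.Client()
    table_id = "my-project.ops_archive.app_events"   # hypothetical table

    rows = [
        {"ts": "2019-01-01T12:00:00Z", "severity": "ERROR", "message": "checkout failed"},
        {"ts": "2019-01-01T12:00:05Z", "severity": "INFO", "message": "retry succeeded"},
    ]

    errors = client.insert_rows_json(table_id, rows)   # streaming insert
    if errors:
        print(f"Insert errors: {errors}")
    else:
        print(f"Streamed {len(rows)} rows into {table_id}")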

9
Q

Explain the ‘Data corruption recovery’ strategy and solutions for it on GCP, given your RTO.

A

1) Use backups in combination with transactional log files from the corrupted database to roll back to a known-good state.
2) Cloud SQL provides automated point-in-time recovery for databases that have automated backups and binary logging enabled (see the sketch below).
3) For Compute Engine solutions, you are responsible for the database management and backup solution.
4) If your RTO permits, you can leave your applications offline and block access to the table holding the corrupted data until the uncorrupted data has been restored to a new table.
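
A minimal sketch of enabling automated backups and binary logging on a Cloud SQL instance via the Cloud SQL Admin API, which is what point-in-time recovery depends on. It uses the google-api-python-client discovery client; the project and instance names and backup window are hypothetical.

    from googleapiclient import discovery

    service = discovery.build("sqladmin", "v1beta4")
    body = {
        "settings": {
            "backupConfiguration": {
                "enabled": True,            # automated backups
                "binaryLogEnabled": True,   # binary logging for point-in-time recovery
                "startTime": "03:00",       # daily backup window (UTC)
            }
        }
    }
    request = service.instances().patch(
        project="my-project", instance="prod-sql", body=body)
    response = request.execute()
    print(f"Patch operation: {response.get('name')}")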

10
Q

Explain all the ‘Application recovery’ failover options.

A

1) Hot standby server failover: a running instance receives no traffic until the load balancer detects an unhealthy primary instance and switches traffic over.
2) Warm standby server failover: the same as hot standby but without a load balancer; a DNS record is updated to cut over to the standby server, so the RTO depends on how quickly the DNS record can be changed and propagated (see the DNS cut-over sketch below).
3) Cold standby server failover: an offline server identical to the main application server; if the main server goes offline, the standby server is started and traffic is switched over to it.
- Solution: one (minimal) Compute Engine instance group for taking snapshots (backups), plus a ‘heartbeat’ monitor. If an unhealthy serving instance is detected, a new instance is created from the latest main-app snapshot, added to the main application instance group, and the load balancer directs traffic to it. App Engine with Tasks can also work well here.
4) Warm static site failover: a Cloud Storage-hosted static site on standby; economical for websites with few or no dynamic elements; change DNS settings to serve the site from Cloud Storage.

Cloud DNS -> Cloud Storage (static standby)  ||  Cloud DNS -> HTTPS Load Balancer -> Compute Engine instance groups
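
A minimal sketch of a warm-standby DNS cut-over using the google-cloud-dns client library (assumed available). The project, zone, record name and standby IP are hypothetical; a low TTL keeps the effective RTO short.

    from google.cloud import dns

    def point_record_at_standby(project, zone_name, record_name, standby_ip):
        client = dns.Client(project=project)
        zone = client.zone(zone_name)

        # Find the existing A record so it can be removed in the same change set.
        old = [r for r in zone.list_resource_record_sets()
               if r.name == record_name and r.record_type == "A"]

        changes = zone.changes()
        for record in old:
            changes.delete_record_set(record)
        changes.add_record_set(
            zone.resource_record_set(record_name, "A", 60, [standby_ip]))
        changes.create()   # submit the change to Cloud DNS
        print(f"{record_name} now points at {standby_ip}")

    point_record_at_standby("my-project", "prod-zone", "www.example.com.", "203.0.113.10")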

11
Q

Explain how GCP may be used for ‘Remote recovery’.

A
  • Backup/archive target: GCP is useful as a target for backups and archives, reached via Cloud Interconnect, Direct Peering or Compute Engine VPN.
  • Replicating storage with GCP: e.g. from an on-premises storage appliance or another provider via the Cloud Storage XML API (limited support).
  • Replicating application data with GCP: hot/active database replication in GCP; cold standby app instances come online (see the sketch below), and a DNS update routes traffic to the app tier or to a Google load balancer external IP.
  • Maintaining machine image consistency: between on-premises/cloud or cloud/cloud environments; Packer is recommended, other options are Chef, Puppet, Ansible or SaltStack.
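
A minimal sketch of bringing cold standby Compute Engine instances online with the google-cloud-compute client library (assumed available). The project, zone and instance names are hypothetical placeholders.

    from google.cloud import compute_v1

    def start_standbys(project, zone, instance_names):
        instances = compute_v1.InstancesClient()
        for name in instance_names:
            operation = instances.start(project=project, zone=zone, instance=name)
            operation.result()   # wait for the instance to start
            print(f"Started standby instance {name}")

    start_standbys("my-project", "us-central1-a", ["app-standby-1", "app-standby-2"])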