Data Management Flashcards

1
Q

RDS backups

A
    • A transactional storage engine is recommended for durability (e.g. InnoDB for MySQL)
    • Backups can degrade performance (a brief I/O suspension) if Multi-AZ is not enabled
    • Deleting an instance deletes its automated backups by default (manual snapshots are kept)
    • Backups are stored internally on Amazon S3
2
Q

RDS restoring

A
    • When restoring, only the default DB parameter group and security group are associated with the new instance.
    • You can change to a different DB engine as long as it is closely related to the previous engine and enough space is allocated.
3
Q

Automatic Backups – ElastiCache

A
    • Backups are available for Redis clusters only.
    • Snapshots back up the data for the entire cluster at a specific point in time.
    • The backup window should fall in the least-utilized period of the day
    • Snapshots can degrade performance and should be performed on read replicas.
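One way to pick the backup window is to look at hourly utilization and choose the quietest hour. The sketch below assumes you already have 24 hourly averages from your monitoring; the function name and the sample numbers are hypothetical and only illustrate the idea.

```python
def least_utilized_hour(hourly_load):
    """Return the hour (0-23) with the lowest average load."""
    if len(hourly_load) != 24:
        raise ValueError("expected 24 hourly samples")
    return min(range(24), key=lambda hour: hourly_load[hour])


# Hypothetical average CPU utilization (%) for each hour of the day (UTC).
sample_load = [12, 9, 7, 5, 6, 10, 22, 35, 48, 55, 60, 62,
               64, 61, 58, 57, 52, 49, 45, 40, 33, 27, 20, 15]

start = least_utilized_hour(sample_load)
print(f"Suggested backup window start: {start:02d}:00 UTC")
```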
4
Q

Automatic Backups – Redshift

A
    • Redshift provides free backup storage equal to the storage capacity of the cluster.
    • Snapshots can be automated or manual, and are incremental.
    • Restoring a snapshot creates a new cluster and imports the data.
5
Q

EC2 Backups

A
    • No built-in automated backup option
    • Snapshots of EBS volumes are incremental and can be automated with the API, CLI, or even AWS Lambda
    • Snapshots can cause performance degradation while they are being taken
    • Snapshots are stored on S3
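"Incremental" means each snapshot stores only the blocks that changed since the previous one. The following is a toy model of that idea, not a description of how EBS or Redshift implement it internally; the volume is a plain dict, all names are hypothetical, and deleted blocks are not handled.

```python
def take_snapshot(volume, previous_state):
    """Return only the blocks that differ from previous_state."""
    return {block_id: data
            for block_id, data in volume.items()
            if previous_state.get(block_id) != data}


def restore(snapshots):
    """Rebuild the volume by replaying snapshots oldest-first."""
    state = {}
    for snapshot in snapshots:
        state.update(snapshot)
    return state


# Hypothetical volume: block id -> block contents.
volume = {0: "boot", 1: "app-v1", 2: "data-a"}

snap1 = take_snapshot(volume, {})                # first snapshot copies every block
volume[1] = "app-v2"                             # a single block changes
snap2 = take_snapshot(volume, restore([snap1]))  # second snapshot stores just that block

print(len(snap1), len(snap2))
```

Restoring replays both snapshots in order, which is why the small second snapshot is still enough to recover the full, current volume.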
6
Q

RDS Read Replicas Across Regions – DR

A
    • Multi-AZ deployments are not enough to protect against an entire region going down.
    • We can use read replicas in other regions for higher availability.
7
Q

RDS Read Replicas Across Regions – benefits

A
    • Help with performance if we have a global audience
      • Packets have shorter distances to travel between our database and the end user
    • Replica lag can be expected to go up as the distance between regions grows.
8
Q

Services to be used for DR

A
    • EC2 and EBS
    • S3
    • AWS Import/Export Snowball
    • Amazon RDS
    • Elastic Load Balancing and Auto Scaling
    • AWS Storage Gateway
    • CloudFormation
9
Q

Quick recovering from disasters

A
    • Use read replicas across regions for our database
    • Keep a backup of our infrastructure in a geographically separate location.
    • Keep the latest data and configuration available on that backup.
10
Q

AWS Tools for DR

A
    • EC2 AMI
    • VM Import/Export
    • For VMware – we can use the AWS Management Portal for vCenter
    • Direct Connect
    • S3 Transfer Acceleration
11
Q

DR scenarios – Backup and restore scenario

A
    • Use AWS as a backup solution only, by storing VMs, snapshots, and other data
    • Strategically map out which data needs to be backed up, and how
    • Choose tools and services that comply with requirements (regulatory, financial, etc.)
    • Determine data lifetime and long-term backup strategies
    • Test your backups often and thoroughly.
12
Q

DR scenarios – pilot light

A
    • Keeps the environment small, but it can ignite and scale up to take over from our on-premises infrastructure
    • Provisions the bare minimum of resources but is always ready for a failover
    • Growing the infrastructure to scale can take some time
    • Resource deployment and provisioning should be automated
    • Should be tested often and thoroughly.
13
Q

DR scenarios – hot standby (multi-site)

A
    • Provides the least downtime possible
    • Keeps all of the resources ready for use at a moment’s notice
    • Can be complex to maintain
    • Usually the most expensive to implement.
14
Q

DR scenarios – Duplicate the environments from one region to another

A
    • Many concepts from our on-premises scenarios still apply to this scenario
    • We can use read replicas for our Amazon RDS database
    • Route 53 has a failover routing policy that routes traffic depending on the availability of resources
    • AMIs are region specific and must be copied over to other regions
    • EC2 key pairs are also region specific and must be imported to other regions.
    • Make sure that data and changes are kept up to date in both regions.
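The failover routing idea above can be sketched as a simple rule: send traffic to the primary while it passes its health check, otherwise to the secondary. This is a simplified illustration of the concept, not Route 53's actual implementation or API; the endpoint names and the health table are hypothetical.

```python
def resolve(primary, secondary, is_healthy):
    """Return the endpoint that should receive traffic."""
    if is_healthy(primary):
        return primary
    return secondary


# Hypothetical health-check results for one endpoint per region.
health = {
    "app.us-east-1.example.com": False,  # primary region is down
    "app.us-west-2.example.com": True,
}

target = resolve(
    "app.us-east-1.example.com",
    "app.us-west-2.example.com",
    lambda endpoint: health.get(endpoint, False),
)
print(target)
```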
15
Q

Potential issues with replicating data

A
    • The distance between our replication sites can increase replica lag
    • Bandwidth limitations can also delay data replication
    • It’s important to understand which services replicate asynchronously and which replicate synchronously.
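A rough, back-of-the-envelope way to see how distance and bandwidth both feed into replication delay: add the propagation time over the link to the time needed to push the data through it. The figure of 200,000 km/s for light in optical fiber is an approximation, and the function name and sample numbers are hypothetical; real replica lag also depends on load, routing, and the engine.

```python
FIBER_KM_PER_SECOND = 200_000  # approximate speed of light in optical fiber


def min_replication_delay(distance_km, payload_megabits, bandwidth_mbps):
    """Lower bound (seconds) to ship a payload to a remote replica."""
    propagation = distance_km / FIBER_KM_PER_SECOND  # one-way travel time
    transfer = payload_megabits / bandwidth_mbps     # time to push the bits
    return propagation + transfer


# Hypothetical case: 4,000 km apart, 100 MB (800 megabits) of changes, 100 Mbps link.
delay = min_replication_delay(4_000, 800, 100)
print(f"at least {delay:.2f} s behind")
```

In this example the bandwidth term dominates; for small writes over long distances the propagation term matters more, which is where synchronous replication becomes costly.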
16
Q

Centralized logging

A
    • Consolidate logs in one central location
    • Analyze, store, and modify the data in any way that you need
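As a minimal sketch of consolidation, the snippet below merges log streams from two hosts into one timeline. It assumes each source is already sorted by an ISO 8601 timestamp (so string comparison matches time order); the host names and log lines are made up for illustration.

```python
import heapq

# Hypothetical per-host logs: (timestamp, host, message), each sorted by time.
web_logs = [
    ("2024-01-01T10:00:01Z", "web-1", "GET /index.html 200"),
    ("2024-01-01T10:00:05Z", "web-1", "GET /login 500"),
]
db_logs = [
    ("2024-01-01T10:00:03Z", "db-1", "slow query: 2.4s"),
    ("2024-01-01T10:00:04Z", "db-1", "connection pool exhausted"),
]


def consolidate(*sources):
    """Merge already-sorted log sources into one time-ordered list."""
    return list(heapq.merge(*sources))


central = consolidate(web_logs, db_logs)
for timestamp, host, message in central:
    print(timestamp, host, message)
```

Seeing the database errors immediately before the web server's 500 response is the kind of cross-host correlation that a central location makes possible.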

17
Q

Tools for storing log files and backups

A
    • Rsyslog (native to Linux)
    • Splunk
    • Kiwi
    • Graylog
    • The ELK stack (Elasticsearch, Logstash, Kibana)
18
Q

Redshift

A
    • A fast, fully managed, petabyte-scale data warehouse
    • Use it to query large amounts of data
    • Send it data from services like S3, DynamoDB, or Kinesis
19
Q

Other types of logging – S3 access logs

A
    • Enable logging on a bucket
    • Requests made to that bucket are logged, and the logs are stored on S3
    • No extra charge, except for the storage used by the logs
20
Q

CloudTrail

A
    • Logs API calls made on our account
    • Useful for debugging, security auditing, and learning how users interact with our resources
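For auditing, a common first step is to count which API calls appear in a log file. The sketch below assumes a CloudTrail-style JSON document with a top-level "Records" list whose entries carry "eventSource" and "eventName" fields; the sample records themselves are invented for illustration.

```python
import json
from collections import Counter

# Hypothetical, heavily trimmed log content; real records carry many more fields.
sample_log = json.dumps({
    "Records": [
        {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
        {"eventSource": "ec2.amazonaws.com", "eventName": "RunInstances"},
        {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
    ]
})


def count_api_calls(log_text):
    """Count (service, API call) pairs found in a log document."""
    records = json.loads(log_text).get("Records", [])
    return Counter((r.get("eventSource"), r.get("eventName")) for r in records)


counts = count_api_calls(sample_log)
for (source, name), total in counts.most_common():
    print(source, name, total)
```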