Data Management Flashcards
RDS backups
- A transactional storage engine is recommended for durability (e.g., InnoDB for MySQL).
- Backups degrade performance if Multi-AZ is not enabled (I/O is briefly suspended on Single-AZ instances).
- Deleting an instance deletes all automated backups (but not manual snapshots).
- Backups are stored internally on Amazon S3.
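The deletion rule above can be sketched as a small helper. The `SnapshotType` values (`automated`/`manual`) match what the RDS `DescribeDBSnapshots` API reports, but the records here are illustrative data, not a live API call:

```python
def surviving_snapshots(snapshots):
    """Return the snapshots that remain after the source instance is deleted.

    Automated backups are removed along with the instance;
    manual snapshots persist.
    """
    return [s for s in snapshots if s["SnapshotType"] == "manual"]

snaps = [
    {"DBSnapshotIdentifier": "rds:mydb-2024-01-01", "SnapshotType": "automated"},
    {"DBSnapshotIdentifier": "mydb-final", "SnapshotType": "manual"},
]
print([s["DBSnapshotIdentifier"] for s in surviving_snapshots(snaps)])
# → ['mydb-final']
```

This is why taking a final manual snapshot before deleting an instance is a common safeguard.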
RDS restoring
- When restoring, only the default DB parameter group and security group are associated with the restored instance.
- You can change to a different DB engine as long as it is closely related to the previous engine and there is enough storage allocated.
Automatic Backups – ElastiCache
- Backups are available for Redis clusters only (not Memcached).
- Snapshots back up data for the entire cluster at a specific point in time.
- The backup window should fall during the least-utilized time period of the day.
- Snapshots can degrade performance and should be performed on read replicas.
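Picking the least-utilized period can be sketched as choosing the quietest hour from observed traffic. The hourly counts below are made-up monitoring data, not pulled from CloudWatch:

```python
def quietest_hour(hourly_requests):
    """Return the starting hour (0-23) with the fewest observed requests --
    a reasonable candidate for the cluster's backup window."""
    return min(hourly_requests, key=hourly_requests.get)

# Illustrative request counts per hour of day
traffic = {0: 120, 1: 80, 2: 45, 3: 30, 4: 55, 12: 900, 18: 1400}
print(quietest_hour(traffic))  # → 3
```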
Automatic Backups – Redshift
- Provides free backup storage equal to the storage capacity of the cluster.
- Snapshots can be automated or manual, and are incremental.
- Restoring snapshots creates a new cluster and imports the data.
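"Incremental" means each snapshot stores only the blocks that changed since the previous one. A toy model of that idea, with blocks keyed by id and compared by content hash (all names here are illustrative, not Redshift internals):

```python
def incremental_snapshot(prev_blocks, curr_blocks):
    """Return only the blocks added or modified since the previous
    snapshot, so unchanged data is never stored twice."""
    return {bid: h for bid, h in curr_blocks.items()
            if prev_blocks.get(bid) != h}

prev = {"b1": "aaa", "b2": "bbb"}
curr = {"b1": "aaa", "b2": "ccc", "b3": "ddd"}  # b2 changed, b3 is new
print(incremental_snapshot(prev, curr))  # → {'b2': 'ccc', 'b3': 'ddd'}
```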
EC2 Backups
- No built-in automated backup option.
- Snapshots of EBS volumes are incremental and can be automated with the API, CLI, or even AWS Lambda.
- Snapshots can degrade performance while they are in progress.
- Snapshots are stored on S3.
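The retention half of such an automated snapshot job can be sketched in plain Python. A scheduled Lambda function might run logic like this against real `DescribeSnapshots` output; here the records and retention period are illustrative, with no AWS calls:

```python
from datetime import datetime, timedelta, timezone

def snapshots_to_prune(snapshots, retention_days, now):
    """Return ids of snapshots older than the retention window --
    the cleanup step of an automated EBS snapshot schedule."""
    cutoff = now - timedelta(days=retention_days)
    return [s["id"] for s in snapshots if s["start_time"] < cutoff]

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
snaps = [
    {"id": "snap-old", "start_time": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": "snap-new", "start_time": datetime(2024, 5, 20, tzinfo=timezone.utc)},
]
print(snapshots_to_prune(snaps, retention_days=30, now=now))  # → ['snap-old']
```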
RDS Read Replicas Across Regions – DR
- Multi-AZ deployments are not enough to protect against an entire region going down.
- We can use read replicas in other regions for higher availability.
RDS Read Replicas Across Regions – benefits
- Help with performance if we have a global audience.
- Packets have shorter distances to travel between our database and the end user.
- Expect replica lag to go up across regions, however.
Services to be used for DR
- EC2 and EBS
- S3
- AWS Import/Export Snowball
- Amazon RDS
- Elastic Load Balancing and Auto Scaling
- AWS Storage Gateway
- CloudFormation
Quick recovering from disasters
- Use read replicas across regions for our database.
- Have a backup of our infrastructure in a geographically separate location.
- Have the latest data and configuration available at the backup site.
AWS Tools for DR
- EC2 AMI
- VM Import/Export
- For VMware, we can use the AWS Management Portal for vCenter.
- Direct Connect
- S3 Transfer Acceleration
DR scenarios – Backup and restore scenario
- Use AWS as a backup solution only, by storing VMs, snapshots, and other data.
- Strategically map out which data needs to be backed up, and how.
- Choose tools and services that comply with requirements (regulatory, financial, etc.).
- Determine data lifetime and long-term backup strategies.
- Test your backups often and thoroughly.
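A data-lifetime strategy often reduces to mapping a backup's age to a storage tier. The S3 storage-class names are real, but the thresholds below are example values only; real lifecycle rules depend on your regulatory and financial requirements:

```python
def backup_tier(age_days):
    """Map a backup's age to an S3 storage class (example thresholds)."""
    if age_days < 30:
        return "STANDARD"      # recent backups: fast restores
    if age_days < 365:
        return "GLACIER"       # older backups: cheaper, slower retrieval
    return "DEEP_ARCHIVE"      # long-term retention: cheapest

print(backup_tier(90))  # → GLACIER
```

In practice the same effect is usually achieved declaratively with S3 lifecycle rules rather than code.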
DR scenarios – pilot light
- Keeps the environment small, but it can "ignite" and scale up to take over for our on-premises infrastructure during a failover.
- Provisions the bare minimum resources but is always ready for a failover.
- Growing the infrastructure to scale can take some time.
- Resource deployment and provisioning should be automated.
- Should be tested often and thoroughly.
DR scenarios – Hot standby (multi-site)
- Provides the least downtime possible.
- Keeps all resources ready for use at a moment's notice.
- Can be complex to maintain.
- Usually the most expensive to implement.
DR scenarios – Duplicate the environments from one region to another
- Many concepts from the on-premises scenarios still apply here.
- We can use read replicas for our Amazon RDS database.
- Route 53 has a failover routing policy that routes traffic based on the availability of resources.
- AMIs are region-specific and must be copied over to other regions.
- EC2 key pairs are also region-specific and must be imported into other regions.
- Make sure that data and changes stay up to date in both regions.
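The failover routing decision can be modeled as a tiny function. This is a toy model of the policy's behavior, not the Route 53 API:

```python
def failover_target(primary_healthy, secondary_healthy):
    """Mimic a Route 53 failover routing policy: serve the primary region
    while its health check passes, otherwise fail over to the secondary."""
    if primary_healthy:
        return "primary"
    if secondary_healthy:
        return "secondary"
    return None  # both regions unhealthy: nothing to route to

print(failover_target(False, True))  # → secondary
```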
Potential issues with replicating data
- The distance between our replication sites can increase replica lag.
- Bandwidth limitations can also delay data replication.
- It's important to understand which services replicate asynchronously and which replicate synchronously.
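A quick sanity check on the bandwidth point: even ignoring latency and overhead, the link speed alone bounds how fast data can replicate. The figures below are just example inputs:

```python
def min_replication_seconds(data_gb, bandwidth_mbps):
    """Lower bound on moving `data_gb` gigabytes over a `bandwidth_mbps`
    link. Ignores latency, protocol overhead, and contention, so real
    replication lag will be higher."""
    bits = data_gb * 8 * 1000**3          # decimal GB -> bits
    return bits / (bandwidth_mbps * 1000**2)

# 100 GB over a 1 Gbps link: over 13 minutes at best
print(min_replication_seconds(100, 1000))  # → 800.0
```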
Centralized logging
- Consolidate logs in one central location
- Analyze, store, and modify the data in any way that you need.
Storing log files and backups tools
- Rsyslog (native to Linux)
- Splunk
- Kiwi
- Graylog
- The ELK stack (Elasticsearch, Logstash, Kibana)
Redshift
- A fast, fully managed, petabyte-scale data warehouse.
- Use it to query large amounts of data.
- Send it data from services like S3, DynamoDB, or Kinesis.
Other types of logging – S3 access logs
- Enable logging on a bucket.
- Requests made to that bucket will be logged and stored on S3.
- No extra charge beyond the cost of the additional storage.
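S3 server access logs are space-delimited text lines. A minimal, hedged parser for the leading fields (the regex covers only the common prefix of the real format, and the sample line is illustrative, not real log data):

```python
import re

# Bucket owner, bucket, [timestamp], IP, requester, request id,
# operation, key, "request line", status -- the start of each log line.
LOG_RE = re.compile(
    r'^(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
    r'"(?P<request>[^"]*)" (?P<status>\d{3})'
)

line = ('79a59df900b949e5 awsexamplebucket [06/Feb/2019:00:00:38 +0000] '
        '192.0.2.3 79a59df900b949e5 3E57427F3 REST.GET.OBJECT photo.jpg '
        '"GET /awsexamplebucket/photo.jpg HTTP/1.1" 200 - 2662992')
m = LOG_RE.match(line)
print(m.group("bucket"), m.group("operation"), m.group("status"))
# → awsexamplebucket REST.GET.OBJECT 200
```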
CloudTrail
- Logs API calls made on our account.
- Useful for debugging, security auditing, and learning how users interact with our resources.
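A simple audit pass over CloudTrail data might group calls by the service that received them. The `eventSource` and `eventName` field names match the CloudTrail record format; the records below are illustrative:

```python
def calls_by_service(records):
    """Group CloudTrail API call names by the receiving service."""
    out = {}
    for r in records:
        out.setdefault(r["eventSource"], []).append(r["eventName"])
    return out

records = [
    {"eventSource": "ec2.amazonaws.com", "eventName": "RunInstances"},
    {"eventSource": "s3.amazonaws.com", "eventName": "PutObject"},
    {"eventSource": "ec2.amazonaws.com", "eventName": "TerminateInstances"},
]
print(calls_by_service(records))
# → {'ec2.amazonaws.com': ['RunInstances', 'TerminateInstances'],
#    's3.amazonaws.com': ['PutObject']}
```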