Databases on AWS Flashcards
Relational databases on AWS
- SQL Server
- Oracle
- MySQL
- PostgreSQL
- Aurora
- MariaDB
RDS key features
Multi-AZ for DR
Read Replicas for Performance
What is data warehousing?
Used for business intelligence
Used to pull in very large and complex data sets
OLTP
Online Transaction Processing
RDS
OLAP
Online Analytical Processing
Data warehousing
Redshift
OLTP vs OLAP
OLTP - Simple, frequent queries that read or write individual records (e.g., look up one order by ID)
OLAP - Complicated queries for inferring info from data
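Example (a rough sketch of the two query styles, using Python's built-in sqlite3 as a stand-in rather than RDS or Redshift; the orders table and its rows are made up for illustration):

```python
import sqlite3

# Hypothetical in-memory table standing in for a real orders database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (id, region, amount) VALUES (?, ?, ?)",
    [(1, "east", 10.0), (2, "west", 25.0), (3, "east", 5.0)],
)

# OLTP-style: fetch one specific record by its key.
oltp_row = conn.execute(
    "SELECT id, region, amount FROM orders WHERE id = ?", (2,)
).fetchone()

# OLAP-style: aggregate across the whole data set to infer something from it.
olap_rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
```

The OLTP query touches a single row; the OLAP query scans every row to produce per-region totals.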
ElastiCache
web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud.
Used to speed up performance of existing databases by caching frequent identical queries
Flavors:
- memcached
- redis
Helps when DBs get overloaded
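Example (a minimal sketch of caching identical queries, assuming a plain dict in place of Memcached/Redis; `slow_db_query`, `cached_query`, and the TTL value are hypothetical names and settings, not an ElastiCache API):

```python
import time

cache = {}                 # stands in for a Memcached/Redis cluster
stats = {"db_calls": 0}    # counts how often the "database" is actually hit


def slow_db_query(sql):
    """Pretend database call; a real one would go over the network."""
    stats["db_calls"] += 1
    return f"rows for: {sql}"


def cached_query(sql, ttl_seconds=60.0):
    """Return a cached result for an identical query, else ask the database."""
    now = time.monotonic()
    entry = cache.get(sql)
    if entry is not None and entry[0] > now:
        return entry[1]                      # cache hit: database not touched
    result = slow_db_query(sql)              # cache miss: query the database
    cache[sql] = (now + ttl_seconds, result)
    return result
```

Calling `cached_query` twice with the same SQL string reaches the database only once, which is how the cache takes load off an overloaded DB.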
RDS
Relational Database Service
Redshift
Amazon’s OLAP / data warehousing solution
Amazon’s NoSQL solution
DynamoDB
RDS backup types
Automated backups
Database snapshots
Automated Backups of RDS
- allow you to recover your db to any point in time within a retention period.
- backups take a full daily snapshot and store transaction logs throughout the day
- During a recovery, AWS will choose the most recent snapshot and then apply the relevant transactions
- Enabled by default
- backup data stored in S3
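Example (a toy model of point-in-time recovery: pick the most recent snapshot at or before the target time, then replay later transactions. The data shapes and the `restore_to` name are assumptions for illustration, not how RDS stores backups internally):

```python
from bisect import bisect_right


def restore_to(target_time, snapshots, tx_log):
    """Rebuild database state as of target_time.

    snapshots: list of (time, state_dict), sorted by time.
    tx_log:    list of (time, key, value) writes, sorted by time.
    """
    times = [t for t, _ in snapshots]
    i = bisect_right(times, target_time) - 1   # latest snapshot <= target_time
    if i < 0:
        raise ValueError("no snapshot at or before the target time")
    snap_time, state = snapshots[i]
    restored = dict(state)                      # never mutate the snapshot
    for t, key, value in tx_log:
        if snap_time < t <= target_time:        # replay only the relevant writes
            restored[key] = value
    return restored
```

With snapshots at hours 0 and 24, restoring to hour 35 starts from the hour-24 snapshot and applies only the writes logged between hours 24 and 35.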
Automated backup retention period
1 - 35 days
Database snapshots
done manually
stored even after RDS instance is deleted
Restoring backups
restored version will be a new RDS instance w/ a new DNS endpoint
DB encryption at rest
Done using the KMS service
encrypts backups, read replicas, and snapshots
Multi-AZ
- designed for DR
- creating a copy of the DB in a different AZ
- standby DB synced automatically
- If primary AZ goes down -> update DNS to point to backup in secondary AZ
- supported for all RDS engines except Aurora
- Aurora is fault-tolerant on its own
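Example (a simplified model of failover by DNS update; the class, endpoint name, and AZ names are hypothetical, and swapping the old primary into the standby role is an assumption of this sketch):

```python
class MultiAzDeployment:
    """Toy model: one stable endpoint whose DNS record points at the active AZ."""

    def __init__(self, endpoint, primary_az, standby_az):
        self.endpoint = endpoint
        self.dns = {endpoint: primary_az}   # endpoint -> AZ currently serving
        self.standby_az = standby_az

    def resolve(self):
        """Return the AZ that the endpoint currently points to."""
        return self.dns[self.endpoint]

    def fail_over(self):
        """Repoint the endpoint at the standby; the app keeps the same endpoint."""
        old_primary = self.dns[self.endpoint]
        self.dns[self.endpoint] = self.standby_az
        self.standby_az = old_primary       # assumption: roles simply swap


db = MultiAzDeployment("mydb.example.internal", "az-a", "az-b")
before = db.resolve()
db.fail_over()
after = db.resolve()
```

The application connects to the same endpoint before and after; only the record behind it changes.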
Read replica
The production DB asynchronously replicates new data to secondary DBs.
If there is too much load on the prod DB, read traffic can be pointed at the replicas instead
Could also point individual EC2 instances to specific DBs
Used for read-heavy DB workloads
Available for all DBs
Used for scaling
must have automatic backups on
up to 5 read replica copies of any DB
Each has its own DNS endpoint
Can have multi-AZ
can be in a separate region from primary db
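Example (a minimal sketch of read/write splitting across replica endpoints; `ReplicaRouter`, the endpoint strings, and the SELECT-prefix check are illustrative assumptions, not an RDS feature):

```python
import itertools


class ReplicaRouter:
    """Send reads round-robin to replica endpoints and writes to the primary."""

    def __init__(self, primary_endpoint, replica_endpoints):
        self.primary = primary_endpoint
        # cycle() loops over the replicas forever; None means "no replicas".
        self._replicas = itertools.cycle(replica_endpoints) if replica_endpoints else None

    def endpoint_for(self, sql):
        # Naive read detection, good enough for a sketch.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary
```

Because replication is asynchronous, a read sent to a replica may briefly lag behind the latest write on the primary.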
Redshift key management
By default, redshift takes care of key management
You can also manage your own keys through HSM or AWS KMS
Redshift Multi-AZ
Only available in 1 AZ - no multi-az
Can restore snapshots to a new AZ in event of outage
Redshift backup retention
enabled by default for 1 day retention period
max retention period = 35 days
Redshift number of data copies
Tries to maintain 3 copies:
original
replica on compute nodes
backup in S3
Aurora compatible SQL languages
MySQL
PostgreSQL
Aurora relational vs noSQL
Aurora is a relational database
Aurora performance vs MySQL and PostgreSQL
5x better than MySQL
3x better than PostgreSQL
Aurora storage scaling
10GB - 64TB
Aurora DR
designed to handle loss of up to two copies of data w/out affecting db write availability
and up to three copies w/out affecting read availability
self-healing: data blocks and disks scanned for errors and repaired automatically
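Example (the tolerance numbers above expressed as a small check, assuming six copies in total; the threshold constants are derived from the card, and `availability` is a hypothetical helper):

```python
TOTAL_COPIES = 6   # Aurora keeps six copies of the data
WRITE_QUORUM = 4   # writes survive losing up to two copies (6 - 2)
READ_QUORUM = 3    # reads survive losing up to three copies (6 - 3)


def availability(lost_copies):
    """Report whether writes and reads are still available after losing copies."""
    healthy = TOTAL_COPIES - lost_copies
    return {
        "write": healthy >= WRITE_QUORUM,
        "read": healthy >= READ_QUORUM,
    }
```

Losing two copies leaves both reads and writes available; losing a third stops writes but not reads.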
Aurora replica types
Aurora replicas (15)
MySQL read replicas (5)
PostgreSQL read replicas (1)
Aurora read replication
asynchronous in milliseconds
In-region (no cross-region)
Automated failover
Aurora Serverless
on-demand autoscaling
good for infrequent, intermittent, or unpredictable workloads
Aurora copies
6 copies; 2 copies stored in each availability zone, w/ a minimum of 3 AZs
Aurora Snapshot sharing
can be shared w/ other AWS accounts
memcached vs redis
memcached
- simple
- multithreaded
redis
- multi-az
- advanced data types
- backup/restore
DMS - acronym
Database Migration Service
DMS - definition
service to make migrating easy for relational databases, data warehouses, nosql, and other types of data stores.
You can migrate into the cloud, between on-prem instances, or any combo of the two
Caching services
CloudFront
API Gateway
ElastiCache (Memcached, Redis)
DAX
EMR - definition
big data platform for processing large amounts of data
EMR - acronym
Elastic MapReduce
EMR cluster
cluster - collection of EC2 instances
node - EC2 instance in the cluster
node type - node’s role within the cluster
EMR node types
Master - manages the cluster
Core - runs tasks and stores data (in HDFS)
Task - runs tasks. does NOT store data
EMR backup
Configure an archive of log files from master node to S3
Can only be configured when creating the cluster
archive done in 5 min intervals
Amazon Athena
query service that makes it easy to analyze data in S3 using standard SQL
Athena supported data formats
JSON, Parquet, ORC