Databases on AWS Flashcards
What is RDS?
Relational Database Service
- You provision an RDS instance and can create one or more databases on it
- RDS can be single-AZ or multi-AZ
RDS Multi-AZ
- Not in the free tier
- Synchronous replication from the primary to the standby replica
- Standby replica cannot be used directly; you connect via a CNAME, which only points to the standby upon failure of the main DB
- 60 to 120s for failover - so HA, but not fault-tolerant
- Only spans AZs, not regions
- Backups are taken from the standby, so there is no impact on the main DB
RDS Backups
Backups and Snapshots
Region resilient - backs up to S3
Backups are automatic, Snapshots are manual
To improve RPO, transaction logs are saved every 5 minutes; they are then replayed over the last backup to get really low RPOs.
Backups can be retained between 0 and 35 days
Restores are not fast, especially if the DB is huge - RTO is high
Restores happen to a brand-new DB instance, not the old instance, so apps have to be updated to point to the newly restored instance
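A minimal boto3 sketch of a point-in-time restore (identifiers and timestamp are made up):

```python
# Hypothetical sketch: point-in-time restore with boto3. Identifiers are
# placeholders; RestoreTime must fall within the backup retention window.
import boto3
from datetime import datetime, timezone

rds = boto3.client("rds")

rds.restore_db_instance_to_point_in_time(
    SourceDBInstanceIdentifier="prod-db",          # existing instance
    TargetDBInstanceIdentifier="prod-db-restored", # brand-new instance
    RestoreTime=datetime(2023, 5, 1, 12, 0, tzinfo=timezone.utc),
)
# Apps must then be repointed at the new instance's endpoint.
```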
Read Replicas
Performance and Availability
Read-only replicas of an RDS instance - they serve reads only
Asynchronous replication
So: synchronous replication = Multi-AZ
Asynchronous replication = read replicas
Can be across regions - CRR - cross region read replica
Max 5 direct read replicas per RDS instance
RRs can have their own RRs but lag becomes a problem
Global performance improvement - scale out reads globally
Read replicas can get you near-zero RPO and very low RTO
A read replica can be promoted to a read/write primary very quickly in case of failure
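A rough boto3 sketch of creating a cross-region read replica and later promoting it (all identifiers are placeholders):

```python
# Hypothetical sketch: read replica lifecycle with boto3.
import boto3

rds = boto3.client("rds")

# Cross-region read replica (CRR): the source is referenced by ARN
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="app-db-replica",
    SourceDBInstanceIdentifier="arn:aws:rds:us-east-1:111122223333:db:app-db",
)

# In a DR scenario, promote the replica to a standalone R/W instance
rds.promote_read_replica(DBInstanceIdentifier="app-db-replica")
```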
RDS Data security
SSL/TLS is used for in-transit encryption, can be mandatory
EBS volume encryption can be done using KMS
An AWS-managed or customer-managed CMK is used to generate data keys for encryption
Storage/logs/data/snapshots are all encrypted
Encryption cannot be removed once set.
MSSQL and Oracle support TDE (Transparent Data Encryption) - handled directly by the DB engine
RDS Oracle can integrate with CloudHSM
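A hedged boto3 sketch of enabling KMS encryption at creation time (names and key alias are made up; remember encryption cannot be removed later):

```python
# Hypothetical sketch: encrypted RDS instance with boto3.
import boto3

rds = boto3.client("rds")

rds.create_db_instance(
    DBInstanceIdentifier="secure-db",
    DBInstanceClass="db.t3.micro",
    Engine="mysql",
    MasterUsername="admin",
    MasterUserPassword="example-password",  # use Secrets Manager in practice
    AllocatedStorage=20,
    StorageEncrypted=True,                  # EBS volume encryption
    KmsKeyId="alias/my-rds-key",            # AWS- or customer-managed key
)
```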
RDS Data Security with IAM
An IAM role or user can be associated with a local RDS database user
This generates a token, valid for 15 minutes, that the user/role can use to log in to the RDS instance
Authorization still happens inside the DB via its own settings; IAM only authenticates the user to the DB
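A minimal boto3 sketch of generating the 15-minute token (hostname, port, and DB user are assumptions):

```python
# Hypothetical sketch: IAM database authentication with boto3. The DB user
# must already be configured for IAM auth inside the database.
import boto3

rds = boto3.client("rds")

token = rds.generate_db_auth_token(
    DBHostname="mydb.abc123.us-east-1.rds.amazonaws.com",
    Port=3306,
    DBUsername="iam_app_user",
)
# 'token' is valid for 15 minutes; pass it as the password over SSL/TLS.
# What that user can do is still governed by permissions inside the DB.
```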
Aurora
Very different from RDS - Aurora uses a cluster
A single primary instance + 0 or more replicas
Unlike the RDS Multi-AZ standby, Aurora replicas can be read from and also provide multi-AZ failover capability
Storage: no local storage - instances use a shared cluster volume
Faster provisioning and improved scalability
You can have up to 15 replicas, and any of them can be a failover target
Multiple endpoints are available - the cluster endpoint is the main (read/write) endpoint; reader endpoints balance reads across the replicas
Aurora Storage
64TiB shared cluster storage
6 replicas of the data across 3 AZs - failures due to disk issues are minimized
Replication of DB happens at the storage level
Automatic data repair on failed disk reduces failures
SSDs with high IOPS and low latency
No allocation of storage necessary when provisioning an instance, automatically allocated upon usage
Aurora Backups
Backups are similar to RDS
Restore creates a new cluster
Backtrack - allows in-place rewind/rollback to a previous point in time
Fast clone - does not make a one-for-one copy; it references the original storage and stores only the differences between the two
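A rough boto3 sketch of Backtrack (cluster name and timestamp are made up; Backtrack must have been enabled when the cluster was created):

```python
# Hypothetical sketch: rewinding an Aurora MySQL cluster in place.
import boto3
from datetime import datetime, timezone

rds = boto3.client("rds")

rds.backtrack_db_cluster(
    DBClusterIdentifier="my-aurora-cluster",
    BacktrackTo=datetime(2023, 5, 1, 11, 45, tzinfo=timezone.utc),
)
# Unlike a restore, no new cluster is created - the existing one rewinds.
```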
Aurora Serverless
AS is to Aurora what Fargate is to ECS
No need to provision DB instances of a certain size
Removes one more piece of admin overhead - managing individual DB instances
ACUs - Aurora capacity units - compute and memory
Min and Max ACUs can be specified
It scales to meet those numbers
It can even go down to 0 and be paused when there is no db activity
Same resilience as provisioned Aurora - 6 copies across AZs
A proxy fleet brokers connections between the app and Aurora Serverless - scaling can be fluid because you are not connecting directly to an ACU
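A hedged boto3 sketch of a Serverless v1 cluster with min/max ACUs and auto-pause (all names and values are made up):

```python
# Hypothetical sketch: Aurora Serverless v1 scaling configuration.
import boto3

rds = boto3.client("rds")

rds.create_db_cluster(
    DBClusterIdentifier="my-serverless-cluster",
    Engine="aurora-mysql",
    EngineMode="serverless",
    MasterUsername="admin",
    MasterUserPassword="example-password",
    ScalingConfiguration={
        "MinCapacity": 1,               # floor in ACUs
        "MaxCapacity": 8,               # ceiling in ACUs
        "AutoPause": True,              # scale to 0 when idle
        "SecondsUntilAutoPause": 300,   # pause after 5 min of no activity
    },
)
```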
Aurora Serverless use cases
Infrequently used Applications
New applications - where you are unsure about the load that will be placed on the app
Variable workloads - lightly used apps that have peaks
Unpredictable workloads
Dev and Test databases
Multi-tenant applications
Aurora Global Database
Global replication to up to 5 separate regions
~1s replication at the storage layer from the primary region to the secondary regions
Up to 16 read-only replicas in each secondary region
When to use?
- Cross region disaster recovery and business continuity
- Global read scaling - low latency high performance
Aurora Multi Master Writes
Default is single-master: one R/W instance plus 0 or more read-only replicas; failover takes time
MM has all nodes as R/W nodes
No Load balancing, no single cluster endpoint
Apps can initiate connection to one or more nodes in the cluster
When one node receives a write operation, it proposes that the data be committed to storage by all nodes and needs a quorum of acceptance; if the quorum rejects it, an error is returned to the application
Rejection can happen if the data cannot be updated cleanly on all nodes because of conflicts
Once data is committed to disk then in-memory caches of all nodes are also updated to reflect the new data.
Multi-master writes mean this solution can be fault-tolerant
Database Migration Service (DMS)
Runs using a replication instance
Source and destination endpoints point at the source and target databases; at least one of them must be inside AWS
Full Load migration - full copy from source to destination
Full Load + CDC migration - full copy from source to destination, plus capture of any changes made to the source during the migration; the captured changes are then applied to the target to complete the migration
CDC only - lets you use external tooling for the bulk copy and DMS just for change data capture
DMS does not convert schemas, but the Schema Conversion Tool (SCT) is purpose-built for migrating DB schemas
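A rough boto3 sketch of a Full Load + CDC task (all ARNs are placeholders; the endpoints and replication instance must already exist):

```python
# Hypothetical sketch: DMS replication task with boto3.
import boto3

dms = boto3.client("dms")

dms.create_replication_task(
    ReplicationTaskIdentifier="onprem-to-rds",
    SourceEndpointArn="arn:aws:dms:...:endpoint:SOURCE",
    TargetEndpointArn="arn:aws:dms:...:endpoint:TARGET",
    ReplicationInstanceArn="arn:aws:dms:...:rep:INSTANCE",
    MigrationType="full-load-and-cdc",  # or "full-load" / "cdc"
    TableMappings='{"rules": [{"rule-type": "selection", "rule-id": "1", '
                  '"rule-name": "1", "object-locator": {"schema-name": "%", '
                  '"table-name": "%"}, "rule-action": "include"}]}',
)
```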
DynamoDB Backup
On-demand backup: Full copy of table, retained until removed
Restore to same or cross region
Restore without indexes
Restore with different encryption settings
PITR - Point in time recovery
Disabled by default
Continuous stream of backups over a 35-day window
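A minimal boto3 sketch of turning PITR on (the table name is made up):

```python
# Hypothetical sketch: enabling point-in-time recovery (off by default).
import boto3

ddb = boto3.client("dynamodb")

ddb.update_continuous_backups(
    TableName="orders",
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)
```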
DynamoDB on-demand
You don’t have to set explicit capacity settings
Can be ~5x the price of the provisioned version
Provisioned capacity allows you to specify RCUs and WCUs on a table
Every operation consumes at least 1 unit
1 RCU = one read of up to 4KB of data per second
1 WCU = one write of up to 1KB of data per second
The WCU and RCU burst pool holds 300s of capacity - dip into it as little as possible, otherwise you’ll get a “capacity exceeded” error
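A worked example of the capacity arithmetic, using the unit sizes above and the 50% discount for eventually consistent reads noted in the consistency section (item sizes are made up):

```python
# Each operation is rounded up to the next whole unit.
import math

def rcu_for_read(item_size_kb: float, strongly_consistent: bool = True) -> float:
    units = math.ceil(item_size_kb / 4)        # 1 RCU covers 4 KB
    return units if strongly_consistent else units / 2

def wcu_for_write(item_size_kb: float) -> int:
    return math.ceil(item_size_kb / 1)         # 1 WCU covers 1 KB

print(rcu_for_read(10))         # 3 RCUs - 10 KB rounds up to 12 KB
print(rcu_for_read(10, False))  # 1.5 RCUs - eventually consistent is half
print(wcu_for_write(2.5))       # 3 WCUs
```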
DynamoDB Query and Scan
Query: Pick one partition value and query inside it
(Tip: getting more data in a single query consumes fewer RCUs than making multiple queries with specific PK and SK values, since capacity is rounded up per operation)
Scan: scans the entire table; you can filter on any attribute value, but the whole table is read (no index lookup) and capacity is consumed for every row scanned
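A minimal boto3 sketch contrasting the two (table and attribute names are made up):

```python
# Hypothetical sketch: Query vs Scan with boto3.
import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("weather")

# Query: one partition value, optional SK condition - uses the index
q = table.query(
    KeyConditionExpression=Key("station").eq("london") & Key("day").gt("2023-01-01")
)

# Scan: reads every item; the filter is applied AFTER capacity is consumed
s = table.scan(FilterExpression=Attr("temp").gt(30))
```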
DynamoDB Consistency
2 modes
- Strongly consistent
- Eventually consistent - 50% cheaper than strongly consistent
Writes are directed to the leader node
Leader updates the data and begins replication to additional storage nodes in other AZs
Strongly consistent read is directed to leader node hence more expensive
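A minimal boto3 sketch of picking the mode per read (names are made up):

```python
# Hypothetical sketch: per-read consistency choice with boto3.
import boto3

table = boto3.resource("dynamodb").Table("orders")

# Eventually consistent (default) - may hit any storage node, 50% cheaper
table.get_item(Key={"order_id": "1234"})

# Strongly consistent - served by the leader node, full RCU cost
table.get_item(Key={"order_id": "1234"}, ConsistentRead=True)
```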
DynamoDB indexes
LSI - local secondary index creates a view based on an alternative SK
- attributes can be projected onto the index (all or some)
- must be created with the table
- 5 max LSIs per table
GSI - global secondary index - a view based on an alternative PK and SK
- can be created at any time after the table is created
- limited to 20 GSIs per table
- always eventually consistent
Use GSIs as default, LSIs only when strong consistency is needed
Use indexes for alternate access patterns
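A minimal boto3 sketch of an alternate access pattern via a GSI (table, index, and key names are made up):

```python
# Hypothetical sketch: querying a GSI by its alternative keys.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("orders")

# Alternate access pattern: look up orders by customer instead of order_id
resp = table.query(
    IndexName="customer-index",                  # a GSI on the table
    KeyConditionExpression=Key("customer_id").eq("c-42"),
)
# GSI reads are always eventually consistent; ConsistentRead is not allowed.
```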
DynamoDB Global tables
Multi-master writes across regions
Tables are synced with generally sub-second latency
Conflict resolution: Last writer wins
DAX - DynamoDB accelerator
In-memory cache for DynamoDB
Operates within a VPC, not a public service
Designed to be deployed to multiple AZ in VPC to ensure HA
Cluster service - nodes are placed across the AZs
Primary node (writes) and replicas (reads)
Item Cache - getItem or batchGetItem data is cached
Query cache - data returned from Query or Scan operations is cached
Cache hits are returned in microseconds
Cache misses are returned in ms
Write-through caching is also supported - data is cached as it is written, when writes go through the DAX SDK
DAX instances can be scaled both UP and OUT
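A rough sketch assuming the amazondax Python client library, which mirrors the boto3 resource interface (endpoint and table name are made up):

```python
# Hypothetical sketch: reads and write-through writes via a DAX cluster.
from amazondax import AmazonDaxClient

dax = AmazonDaxClient.resource(
    endpoint_url="dax://my-cluster.abc123.dax-clusters.us-east-1.amazonaws.com"
)
table = dax.Table("orders")

table.get_item(Key={"order_id": "1234"})   # served from the item cache on a hit
table.put_item(Item={"order_id": "5678"})  # write-through: cached as written
```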
Athena (underused, powerful)
Super powerful if you need ad-hoc queries on large datasets
Queries on data stored in S3, paying only for data consumed when running the query and storage used in S3.
“Schema on read” - table like translation
A schema defined in advance translates the data in flight into a relational, table-like structure; the original data is never modified
Output can be sent to other services
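A minimal boto3 sketch of an ad-hoc query (database, table, and bucket names are made up):

```python
# Hypothetical sketch: running an Athena query with boto3.
import boto3

athena = boto3.client("athena")

resp = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) FROM web_logs GROUP BY status",
    QueryExecutionContext={"Database": "logs_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
# Poll get_query_execution(QueryExecutionId=resp["QueryExecutionId"]) until
# the query finishes; you pay only for the data scanned by the query.
```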
AWS ElastiCache
In-memory database
Redis and Memcached are provided
Read-heavy workloads, low latency requirements
Can reduce time to read from DB
Place to store session-data so app servers can be stateless
Requires application code that understands the cache API
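A rough cache-aside sketch with redis-py against an ElastiCache Redis endpoint (the endpoint, key scheme, and load_user_from_db are all made up):

```python
# Hypothetical sketch: cache-aside pattern so app servers stay stateless.
import json
import redis

cache = redis.Redis(host="my-cache.abc123.cache.amazonaws.com", port=6379)

def load_user_from_db(user_id: str) -> dict:
    # Placeholder for the real database read
    return {"id": user_id}

def get_user(user_id: str) -> dict:
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached:                                   # cache hit - skip the DB
        return json.loads(cached)
    user = load_user_from_db(user_id)            # cache miss - read the DB
    cache.setex(key, 300, json.dumps(user))      # cache for 5 minutes
    return user
```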
Redis vs Memcached
Memcached
- Simple data
- No replication
- Multiple nodes - sharding
- No backups
- Multi-threaded - takes advantage of multi-core CPUs
Redis
- Structured data
- Multi AZ
- Replication (scale reads) - HA
- Back up and restore
- Supports transactions
Redshift
Petabyte-scale data warehouse
For long-term reporting and analysis
OLAP (Online Analytical processing) db, not OLTP
Server based, not serverless