Databases on AWS Flashcards

1
Q

What is RDS?

A

Relational Database Service

  • You provision an RDS instance and can create one or more databases on it
  • RDS can be single-AZ or multi-AZ
2
Q

RDS Multi-AZ

A
  • Not in the free tier
  • Synchronous replication from the primary to the standby replica
  • The standby cannot be used directly; access is via a CNAME, which only points to the standby after the primary fails
  • 60 to 120s for failover - so highly available, but not fault-tolerant
  • Only spans AZs, not regions
  • Backups are taken from the standby, so there is no impact on the primary DB
3
Q

RDS Backups

A

Backups and Snapshots
Region resilient - backed up to S3
Backups are automatic, snapshots are manual

To improve RPO, transaction logs are saved every 5 minutes; they are then replayed over the last backup to achieve very low RPOs.
Backups can be retained for between 0 and 35 days

Restores are not fast, especially if the DB is huge - RTO is high

Restores happen to a brand-new DB instance, not the old instance, so apps have to be updated to point to the newly restored instance
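The 5-minute transaction-log interval puts an upper bound on data loss. A minimal sketch of that worst-case RPO arithmetic (illustrative numbers, not an AWS API):

```python
# Sketch: worst-case RPO for RDS automated backups. Transaction logs are
# shipped every 5 minutes, so at most 5 minutes of committed writes can
# be lost between the last log shipment and a failure.

LOG_INTERVAL_MINUTES = 5  # RDS ships transaction logs every 5 minutes

def worst_case_rpo_minutes(minutes_since_last_log: float) -> float:
    """Data-loss window if the instance fails now.

    Loss is capped at the log interval; each shipment resets the clock.
    """
    return min(minutes_since_last_log, LOG_INTERVAL_MINUTES)

# A failure 3 minutes after the last shipment loses up to 3 minutes of data;
# the cap means you never lose more than the 5-minute interval.
assert worst_case_rpo_minutes(3) == 3
assert worst_case_rpo_minutes(10) == 5
```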

4
Q

Read Replicas

A

Performance and Availability
Read-only (RO) replicas of an RDS instance, used only for reads
Asynchronous replication

So: synchronous replication = Multi-AZ
Asynchronous replication = read replicas

Can be across regions - CRR - cross-region read replica

Max 5 direct read replicas per RDS instance

RRs can have their own RRs, but lag becomes a problem

Global performance improvement - scale out reads globally

RRs can help you get near-zero RPO and very low RTO

An RR can be promoted to the primary read/write instance very quickly in case of failure

5
Q

RDS Data security

A

SSL/TLS is used for in-transit encryption, and can be made mandatory
EBS volume encryption can be done using KMS

An AWS-managed or customer-managed CMK is used to generate data keys for encryption

Storage/logs/data/snapshots are all encrypted

Encryption cannot be removed once set.

MSSQL and Oracle support TDE (Transparent Data Encryption) - done directly by the DB engine

RDS Oracle can integrate with CloudHSM

6
Q

RDS Data Security with IAM

A

An IAM role or user can be associated with an RDS database user

This generates a token, valid for 15 minutes, that the user/role can use to log in to the RDS instance.

Authorization still happens inside the DB via the settings there; IAM only authenticates the user to the DB

7
Q

Aurora

A

Very different from RDS - uses a cluster

A single primary instance + 0 or more replicas

Unlike RDS, you can read from the replicas, and they also provide Multi-AZ capability

Storage - no local storage; uses a shared cluster volume

Faster provisioning and improved scalability

You can have up to 15 replicas, and any of them can be a failover target

Multiple endpoints are available - the cluster endpoint is the main endpoint; reader endpoints load-balance across replicas

8
Q

Aurora Storage

A

64 TiB shared cluster storage

6 replicas across 3 AZs - failures due to disk are minimized

Replication of the DB happens at the storage level

Automatic data repair on failed disks reduces failures

SSDs with high IOPS and low latency

No storage allocation is necessary when provisioning an instance - storage is allocated automatically as it is used

9
Q

Aurora Backups

A

Backups are similar to RDS

Restore creates a new cluster

Backtrack - allows in-place rewind/rollback to a previous point in time

Fast clone - does not make a one-for-one copy; it references the original storage and stores only the differences between the two

10
Q

Aurora Serverless

A

AS is to Aurora what Fargate is to ECS

No need to provision DB instances of a certain size

Removes one more piece of admin overhead - managing independent DB instances

ACUs - Aurora Capacity Units - compute and memory
Min and max ACUs can be specified
It scales between those numbers to meet demand

It can even go down to 0 and be paused when there is no DB activity

Same resilience as provisioned Aurora - 6 copies across AZs

A proxy fleet brokers connections between the app and AS - scaling can be fluid because you are not connecting directly to an ACU
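The min/max ACU behaviour can be pictured as a simple clamp plus an idle pause. A toy sketch of the idea, not Aurora's actual scaling algorithm:

```python
def target_acus(demand: float, min_acu: float, max_acu: float,
                pause_enabled: bool = True) -> float:
    """Sketch of Aurora Serverless capacity selection (not the real algorithm).

    Capacity scales between the configured min and max ACUs; with no
    activity it can pause at 0 when that option is enabled.
    """
    if demand <= 0 and pause_enabled:
        return 0.0                       # paused: no ACUs while idle
    return max(min_acu, min(demand, max_acu))

assert target_acus(0, 2, 16) == 0.0      # idle -> paused
assert target_acus(1, 2, 16) == 2        # never below min while active
assert target_acus(40, 2, 16) == 16      # never above max
```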

11
Q

Aurora Serverless use cases

A

Infrequently used Applications

New Applications - where you are unsure about load being placed on app

Variable workloads - lightly used apps that have peaks

Unpredictable workloads

Dev and Test databases

Multi-tenant applications

12
Q

Aurora Global Database

A

Global replication to up to 5 secondary regions
~1s replication at the storage layer between the primary region and the other regions
Up to 16 read-only replicas in each secondary region

When to use?

  • Cross region disaster recovery and business continuity
  • Global read scaling - low latency high performance
13
Q

Aurora Multi Master Writes

A

Default is a single master and 0 or more RO replicas; failover takes time

MM has all nodes as R/W nodes
No load balancing, no single cluster endpoint
Apps can initiate connections to one or more nodes in the cluster

When one node receives a write operation, it proposes that the data be committed to storage by all nodes; it wants a quorum of acceptance. If the quorum rejects it, an error is generated for the user

Rejection can happen if the data cannot be updated cleanly on all nodes and there are conflicts

Once the data is committed to disk, the in-memory caches of all nodes are also updated to reflect the new data.

MM writes mean this solution can be fault-tolerant
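The quorum step above can be sketched as a simple majority vote (a toy illustration of the idea, not AWS's actual commit protocol):

```python
def propose_write(acks: list) -> bool:
    """Sketch of the quorum check for an Aurora multi-master write.

    The receiving node asks every node to commit; the write succeeds
    only if a majority (a quorum) accepts, otherwise the client gets
    an error back.
    """
    needed = len(acks) // 2 + 1          # simple majority quorum
    return sum(acks) >= needed

assert propose_write([True, True, True, False]) is True     # 3 of 4 accept
assert propose_write([True, False, False, False]) is False  # conflict: rejected
```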

14
Q

Database Migration Service (DMS)

A

Runs using a replication instance

Source and destination endpoints point at the source and target databases, one of which needs to be inside AWS

Full Load migration - full copy from source to destination

Full Load + CDC migration - full copy from source to destination, plus capture of any changes made to the source DB during the migration; the captured changes are then applied to the target to complete the migration

CDC only - allows use of external tooling to do the bulk copy, using DMS only for change data capture

DMS does not convert schemas, but there is a Schema Conversion Tool (SCT) which can be used - purpose-built for migrating DB schemas
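The Full Load + CDC flow can be sketched as: bulk-copy a snapshot, then replay the changes captured while the copy ran. The data structures below are hypothetical stand-ins, not the DMS implementation:

```python
# Sketch of Full Load + CDC: a one-off bulk copy followed by ordered
# replay of captured changes. Dicts stand in for source/target tables.

def migrate(source_snapshot: dict, change_log: list) -> dict:
    target = dict(source_snapshot)        # full load: one-off bulk copy
    for op, key, value in change_log:     # CDC: apply captured changes in order
        if op == "upsert":
            target[key] = value
        elif op == "delete":
            target.pop(key, None)
    return target

snapshot = {"u1": "alice", "u2": "bob"}
changes = [("upsert", "u2", "bobby"),     # updated while the copy was running
           ("delete", "u1", None),
           ("upsert", "u3", "carol")]
assert migrate(snapshot, changes) == {"u2": "bobby", "u3": "carol"}
```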

15
Q

DynamoDB Backup

A

On-demand backup: Full copy of table, retained until removed
Restore to same or cross region
Restore without indexes
Restore with different encryption settings

PITR - point-in-time recovery
Disabled by default
Continuous stream of backups over a 35-day window

16
Q

DynamoDB on-demand

A

You don’t have to set explicit capacity settings
Can be around 5 times the price vs the provisioned version

Provisioned capacity allows you to specify RCUs and WCUs on a table

Every operation consumes at least 1 unit
1 RCU = one 4 KB read of data/second
1 WCU = one 1 KB write of data/second

The WCU and RCU burst pool holds 300 seconds of capacity - dip into this as little as possible, otherwise you’ll get a “capacity exceeded” error
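The unit arithmetic on this card can be sketched as follows; eventually consistent reads cost half a unit (see the consistency card), and sizes round up to the unit block size:

```python
import math

def rcus(item_size_kb: float, reads_per_second: int,
         eventually_consistent: bool = False) -> float:
    """Capacity for reads: 1 RCU = one strongly consistent 4 KB read/second.

    Eventually consistent reads cost half; item size rounds up to 4 KB.
    """
    units = math.ceil(item_size_kb / 4) * reads_per_second
    return units / 2 if eventually_consistent else units

def wcus(item_size_kb: float, writes_per_second: int) -> int:
    """Capacity for writes: 1 WCU = one 1 KB write/second, rounded up."""
    return math.ceil(item_size_kb) * writes_per_second

assert rcus(4, 10) == 10          # 4 KB items, 10 strong reads/s
assert rcus(6, 10) == 20          # 6 KB rounds up to two 4 KB blocks
assert rcus(4, 10, eventually_consistent=True) == 5
assert wcus(2.5, 4) == 12         # 2.5 KB rounds up to three 1 KB blocks
```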

17
Q

DynamoDB Query and Scan

A

Query: pick one partition key value and query inside it

(Tip: getting more data in a single query consumes fewer RCUs than making multiple queries with specific PK and SK values)

Scan: scans through the entire table; you can filter on any attribute value, but the whole table is scanned (no index lookup), and capacity is still used for scanning every row
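The tip above works because read capacity is computed on the aggregate size of the items returned, rounded up to 4 KB once per operation rather than once per item. A quick illustration:

```python
import math

STRONG_READ_KB = 4    # 1 RCU covers a 4 KB strongly consistent read

def query_rcus(item_sizes_kb: list) -> int:
    """One Query: item sizes are summed first, then rounded up to 4 KB once."""
    return math.ceil(sum(item_sizes_kb) / STRONG_READ_KB)

def get_item_rcus(item_sizes_kb: list) -> int:
    """Separate GetItem calls: every item rounds up to 4 KB on its own."""
    return sum(math.ceil(s / STRONG_READ_KB) for s in item_sizes_kb)

ten_small_items = [0.5] * 10                 # ten 0.5 KB items, one partition
assert query_rcus(ten_small_items) == 2      # ceil(5 KB / 4 KB)
assert get_item_rcus(ten_small_items) == 10  # each tiny item still costs 1 RCU
```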

18
Q

DynamoDb Consistency

A

2 modes

  • Strongly consistent
  • Eventually consistent - 50% cheaper than strongly consistent

Writes are directed to the leader node
The leader updates the data and begins replication to additional storage nodes in other AZs

A strongly consistent read is directed to the leader node, hence more expensive
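The read routing can be pictured as a toy sketch (node names are hypothetical): strong reads must hit the leader, while eventual reads may land on any storage node, which might still be replicating:

```python
import random

def route_read(strongly_consistent: bool, leader: str, nodes: list) -> str:
    """Toy sketch of DynamoDB read routing.

    Strongly consistent reads must be served by the leader node;
    eventually consistent reads can be served by any storage node
    (which may lag the leader briefly).
    """
    if strongly_consistent:
        return leader
    return random.choice(nodes)

nodes = ["leader-az-a", "replica-az-b", "replica-az-c"]   # hypothetical names
assert route_read(True, "leader-az-a", nodes) == "leader-az-a"
assert route_read(False, "leader-az-a", nodes) in nodes
```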

19
Q

DynamoDB indexes

A

LSI - local secondary index - creates a view based on an alternative SK

  • attributes can be projected onto the index (all or some)
  • must be created with the table
  • 5 LSIs max per table

GSI - global secondary index - a view based on an alternative PK and SK

  • can be created at any time after the table is created
  • limited to 20 GSIs per table
  • always eventually consistent

Use GSIs by default, LSIs only when strong consistency is needed
Use indexes for alternate access patterns
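What a GSI gives you can be pictured as a second lookup structure keyed on an alternative attribute, maintained alongside the base table. A toy sketch with hypothetical data:

```python
# Toy sketch of a GSI: the base table is keyed by user_id (the PK);
# the "index" is a second structure keyed by email, projecting only
# the attributes we chose. All names/data here are hypothetical.

base_table = {
    "u1": {"user_id": "u1", "email": "alice@example.com"},
    "u2": {"user_id": "u2", "email": "bob@example.com"},
}

# Build the index view keyed on the alternative attribute
email_index = {item["email"]: {"user_id": item["user_id"]}
               for item in base_table.values()}

# Alternate access pattern: find a user by email without scanning the table
assert email_index["bob@example.com"]["user_id"] == "u2"
```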

20
Q

DynamoDB Global tables

A

Multi-master writes across regions
Tables are synced with generally sub-second latency
Conflict resolution: last writer wins
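Last-writer-wins can be sketched as picking the version with the later timestamp when two regions write the same item (region names and fields below are hypothetical):

```python
from datetime import datetime, timezone

def resolve(a: dict, b: dict) -> dict:
    """Sketch of last-writer-wins: when two regions write the same item,
    the write with the later timestamp wins."""
    return a if a["updated_at"] >= b["updated_at"] else b

us_write = {"value": "written in us-east-1",
            "updated_at": datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)}
eu_write = {"value": "written in eu-west-1",   # one second later
            "updated_at": datetime(2024, 1, 1, 12, 0, 1, tzinfo=timezone.utc)}

assert resolve(us_write, eu_write)["value"] == "written in eu-west-1"
```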

21
Q

DAX - DynamoDB accelerator

A

In-memory cache for DynamoDB

Operates within a VPC; not a public service
Designed to be deployed to multiple AZs in the VPC to ensure HA
Cluster service - nodes are placed in each AZ
Primary node (writes) and replicas (reads)

Item cache - GetItem or BatchGetItem data is cached
Query cache - data returned from Query or Scan operations is cached

Cache hits are returned in microseconds
Cache misses are returned in milliseconds

Can also use write-through caching - data is cached as it is written - write it using the DAX SDK

DAX instances can be scaled both UP and OUT
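Write-through means the item lands in the cache as part of the write, so the next read is a hit. A minimal sketch in which plain dicts stand in for DAX and DynamoDB:

```python
# Sketch of write-through caching (the behaviour you get writing via the
# DAX SDK): the item is written to the table and cached in one operation.
# In-memory dicts stand in for DAX and DynamoDB here.

cache, table = {}, {}

def write_through(key: str, value: str) -> None:
    table[key] = value    # durable write to the "table"
    cache[key] = value    # cached as it is written

def get_item(key: str) -> str:
    if key in cache:      # cache hit: microseconds in real DAX
        return cache[key]
    value = table[key]    # cache miss: fetched from the table (milliseconds)
    cache[key] = value
    return value

write_through("k1", "v1")
assert get_item("k1") == "v1" and "k1" in cache
```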

22
Q

Athena (underused, powerful)

A

Super powerful if you need ad-hoc queries on large datasets

Queries data stored in S3, paying only for the data scanned when running the query and the storage used in S3.

“Schema on read” - table-like translation

A schema defined in advance modifies data in flight into a table-like structure, but the original data is unmodified

The schema translates the data into a relational-like format

Output can be sent to other services
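Schema-on-read can be sketched as applying column names and types to raw rows at query time, leaving the stored data untouched. The schema and data below are hypothetical:

```python
import csv
import io

# Sketch of "schema on read": the raw S3 object stays unmodified; a schema
# defined in advance is applied in flight when the query runs.

raw_s3_object = "u1,alice,2024\nu2,bob,2023\n"    # raw CSV, never modified

schema = [("user_id", str), ("name", str), ("joined", int)]  # defined upfront

def read_with_schema(raw: str):
    """Translate raw rows into a relational-like structure at read time."""
    for row in csv.reader(io.StringIO(raw)):
        yield {col: cast(val) for (col, cast), val in zip(schema, row)}

rows = list(read_with_schema(raw_s3_object))
assert rows[0] == {"user_id": "u1", "name": "alice", "joined": 2024}
assert raw_s3_object.startswith("u1,alice")       # original data unchanged
```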

23
Q

AWS Elasticache

A

In-memory DB
Redis and Memcached are provided
Read-heavy workloads, low-latency requirements

Can reduce time to read from the DB

A place to store session data so app servers can be stateless

Requires application code that understands the cache API

24
Q

Redis vs Memcached

A

Memcached

  • Simple data
  • No replication
  • Multiple nodes - sharding
  • No backups
  • Multi-threaded - takes advantage of multi-core CPUs

Redis

  • Structured data
  • Multi-AZ
  • Replication (scales reads) - HA
  • Backup and restore
  • Supports transactions
25
Q

Redshift

A

Petabyte-scale data warehouse
For long-term analysis - reporting and analytics
OLAP (Online Analytical Processing) DB, not OLTP
Server-based, not serverless