05 - Database Flashcards by Gozaru Hanamichi

RDS Backups

• Backups are automatically enabled in RDS

1) Automated backups:
• Daily full backup of the database (during the backup window)
• Transaction logs are backed-up by RDS every 5 minutes
• => ability to restore to any point in time (from oldest backup to 5 minutes ago)
• 7 days retention (can be increased to 35 days)

2) DB Snapshots:
• Manually triggered by the user
• Retention of backup for as long as you want
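The point-in-time restore rule for automated backups can be sketched in a few lines. This is only an illustration, assuming the 5-minute log interval and the 7-day default retention stated on the card; `restorable_window` and `can_restore_to` are hypothetical helper names, not RDS API calls.

```python
from datetime import datetime, timedelta

# Values taken from the card: logs are backed up every 5 minutes,
# automated backups are retained 7 days by default (up to 35).
LOG_INTERVAL = timedelta(minutes=5)
DEFAULT_RETENTION = timedelta(days=7)


def restorable_window(now, retention=DEFAULT_RETENTION, log_interval=LOG_INTERVAL):
    """Return (earliest, latest) restore times for a hypothetical instance."""
    earliest = now - retention    # oldest automated backup still retained
    latest = now - log_interval   # worst case: last transaction log upload
    return earliest, latest


def can_restore_to(target, now, retention=DEFAULT_RETENTION):
    """True if `target` falls inside the restorable window."""
    earliest, latest = restorable_window(now, retention)
    return earliest <= target <= latest


now = datetime(2024, 1, 8, 12, 0)
print(can_restore_to(datetime(2024, 1, 5, 9, 30), now))   # inside the window
print(can_restore_to(datetime(2024, 1, 8, 11, 58), now))  # newer than the last log upload
print(can_restore_to(datetime(2023, 12, 25), now))        # older than the retention period
```

Manual DB snapshots are not covered by this window: they stay available until you delete them.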

RDS – Storage Auto Scaling
• Helps you increase storage on your RDS DB instance dynamically
• When RDS detects you are running out of free database storage, it scales automatically
• Avoid manually scaling your database storage

• You have to set Maximum Storage Threshold (maximum limit for DB storage)
• Automatically modify storage if:
- Free storage is less than 10% of allocated storage
- Low-storage lasts at least 5 minutes
- 6 hours have passed since last modification
• Useful for applications with unpredictable workloads
• Supports all RDS database engines
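The three auto scaling conditions combine with AND. A minimal sketch of that decision, assuming the thresholds from the card (the function and its parameter names are made up for illustration and are not part of any AWS SDK):

```python
def should_autoscale(free_gb, allocated_gb, low_storage_minutes,
                     hours_since_last_change, max_threshold_gb):
    """Apply the card's three conditions, plus the Maximum Storage Threshold."""
    low_on_space = free_gb < 0.10 * allocated_gb   # free storage below 10%
    sustained = low_storage_minutes >= 5           # low storage for 5+ minutes
    cooled_down = hours_since_last_change >= 6     # 6 hours since last change
    has_headroom = allocated_gb < max_threshold_gb # still below the maximum
    return low_on_space and sustained and cooled_down and has_headroom


# 100 GB allocated, 8 GB free for 10 minutes, last change 7 hours ago
print(should_autoscale(8, 100, 10, 7, max_threshold_gb=500))
# Same situation, but storage was modified only 2 hours ago
print(should_autoscale(8, 100, 10, 2, max_threshold_gb=500))
```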

RDS Read Replicas for read scalability
• In AWS there’s a network cost when data goes from one AZ to another
• For RDS Read Replicas within the same region, you don’t pay that fee

Up to 5 Read Replicas
Within AZ, Cross AZ or Cross Region
Replication is ASYNC, so reads are eventually consistent
Replicas can be promoted to their own DB
Applications must update the connection string to leverage read replicas
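"Updating the connection string" usually means the application decides which endpoint each statement goes to. Below is a rough sketch of that idea; the hostnames are placeholders, the `SELECT` check is deliberately naive, and `EndpointRouter` is a hypothetical class rather than anything provided by RDS.

```python
from itertools import cycle


class EndpointRouter:
    """Pick a database endpoint per statement (hostnames are placeholders)."""

    def __init__(self, writer, readers):
        self.writer = writer
        # With no replicas configured, every statement goes to the writer.
        self._readers = cycle(readers or [writer])

    def endpoint_for(self, sql, consistent=False):
        is_read = sql.lstrip().upper().startswith("SELECT")
        # Replication is ASYNC, so reads that must see the latest write
        # are sent to the writer instead of a replica.
        if is_read and not consistent:
            return next(self._readers)
        return self.writer


router = EndpointRouter(
    writer="primary.db.example.internal",
    readers=["replica-1.db.example.internal", "replica-2.db.example.internal"],
)
print(router.endpoint_for("SELECT * FROM orders"))
print(router.endpoint_for("UPDATE orders SET status = 'paid' WHERE id = 7"))
```

The `consistent=True` escape hatch exists because replica reads are only eventually consistent.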

RDS Multi AZ (Disaster Recovery)
• SYNC replication
• One DNS name – automatic app failover to standby
• Increase availability

Failover in case of loss of AZ, loss of network, instance or storage failure
No manual intervention in apps
Not used for scaling
Multi-AZ replication is free
Note: Read Replicas can be set up as Multi AZ for Disaster Recovery (DR)

RDS - IAM Authentication

• IAM database authentication works with MySQL and PostgreSQL

1) You don’t need a password, just an authentication token obtained through IAM & RDS API calls
2) Auth token has a lifetime of 15 minutes

3) Benefits:
• Network in/out must be encrypted using SSL
• IAM to centrally manage users instead of DB
• Can leverage IAM Roles and EC2 Instance profiles for easy integration
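Because the token only lives 15 minutes, long-running applications typically cache it and fetch a new one shortly before it expires. The sketch below shows one way to do that. `AuthTokenCache`, the 60-second safety margin, and the injected `fetch_token` callable are all assumptions for illustration; in a real application the callable might wrap the boto3 RDS client's `generate_db_auth_token`.

```python
import time

TOKEN_LIFETIME_SECONDS = 15 * 60  # from the card
REFRESH_MARGIN_SECONDS = 60       # assumption: refresh a little early


class AuthTokenCache:
    """Reuse a token until it is close to its 15-minute expiry."""

    def __init__(self, fetch_token, clock=time.monotonic):
        self._fetch_token = fetch_token  # hypothetical callable returning a token string
        self._clock = clock
        self._token = None
        self._issued_at = None

    def get(self):
        now = self._clock()
        stale = (
            self._token is None
            or now - self._issued_at >= TOKEN_LIFETIME_SECONDS - REFRESH_MARGIN_SECONDS
        )
        if stale:
            self._token = self._fetch_token()
            self._issued_at = now
        return self._token


# Stand-in fetcher; a real one would call IAM / RDS APIs.
cache = AuthTokenCache(fetch_token=lambda: "example-token")
print(cache.get())
```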

RDS Security – Summary

1) Encryption at rest:
• Is done only when you first create the DB instance
• or: unencrypted DB => snapshot => copy snapshot as encrypted => create DB from snapshot

2) Your responsibility:
• Check the ports / IP / security group inbound rules in DB’s SG
• In-database user creation and permissions, or managing users through IAM
• Creating a database with or without public access
• Ensure parameter groups or DB is configured to only allow SSL connections

3) AWS responsibility:
• No SSH access
• No manual DB patching
• No manual OS patching
• No way to audit the underlying instance

Amazon Aurora
• Aurora is a proprietary technology from AWS (not open sourced)
• Postgres and MySQL are both supported as Aurora DB

Aurora is “AWS cloud optimized” and claims 5x performance improvement over MySQL on RDS, over 3x the performance of Postgres on RDS
Aurora storage automatically grows in increments of 10GB, up to 128 TB.
Aurora can have 15 replicas while MySQL has 5, and the replication process is faster (sub 10 ms replica lag)
Failover in Aurora is instantaneous. It’s HA (High Availability) native.
Aurora costs more than RDS (20% more) – but is more efficient

Aurora High Availability and Read Scaling

1) 6 copies of your data across 3 AZ:
• 4 copies out of 6 needed for writes
• 3 copies out of 6 needed for reads
• Self healing with peer-to-peer replication
• Storage is striped across 100s of volumes

2) One Aurora Instance takes writes (master)
3) Automated failover for master in less than 30 seconds
4) Master + up to 15 Aurora Read Replicas serve reads
5) Support for Cross Region Replication

Aurora Serverless

Automated database instantiation and autoscaling based on actual usage
Good for infrequent, intermittent or unpredictable workloads
No capacity planning needed
Pay per second, can be more cost-effective

Aurora Multi-Master

For when you want immediate failover for the write node (HA):

• Every node does R/W, instead of promoting a Read Replica as the new master

Global Aurora

1) Aurora Cross Region Read Replicas:
• Useful for disaster recovery
• Simple to put in place

2) Aurora Global Database (recommended):
• 1 Primary Region (read / write)
• Up to 5 secondary (read-only) regions, replication lag is less than 1 second
• Up to 16 Read Replicas per secondary region
• Helps for decreasing latency
• Promoting another region (for disaster recovery) has an RTO of < 1 minute

Aurora Machine Learning

• Enables you to add ML-based predictions to your applications via SQL

1) Simple, optimised, and secure integration between Aurora and AWS ML services

2) Supported services
• Amazon SageMaker (use with any ML model)
• Amazon Comprehend (for sentiment analysis)

3) You don’t need to have ML experience
4) Use cases: fraud detection, ads targeting, sentiment analysis, product recommendations

ElastiCache – Redis vs Memcached

REDIS
• Multi AZ with Auto-Failover
• Read Replicas to scale reads and have high availability
• Data Durability using AOF persistence
• Backup and restore features

MEMCACHED
• Multi-node for partitioning of data (sharding)
• No high availability (replication)
• Non persistent
• No backup and restore
• Multi-threaded architecture
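With Memcached, partitioning (sharding) means each key lives on exactly one node, and the client works out which. The sketch below uses simple modulo hashing to show the idea; the node names are placeholders, and real client libraries often use consistent hashing instead so that adding a node moves fewer keys.

```python
import hashlib


def node_for_key(key, nodes):
    """Map a cache key to one node using a stable hash (modulo sharding)."""
    # hashlib is used because Python's built-in hash() varies between runs.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


nodes = ["cache-node-0", "cache-node-1", "cache-node-2"]
for key in ("session:42", "session:43", "cart:7"):
    print(key, "->", node_for_key(key, nodes))
```

Since there is no replication, losing a node loses the keys that hashed to it.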

ElastiCache – Redis Use Case

Gaming Leaderboards are computationally complex
Redis Sorted Sets guarantee both uniqueness and element ordering
Each time a new element is added, it’s ranked in real time, then added in the correct order
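To see why a sorted set fits leaderboards, here is a tiny in-memory stand-in written in plain Python. It is not Redis and does not use a Redis client; the `Leaderboard` class is a hypothetical model of the two guarantees on the card (unique members, always ordered by score).

```python
import bisect


class Leaderboard:
    """In-memory model of a sorted set: unique members, ordered by score."""

    def __init__(self):
        self._scores = {}   # member -> score (enforces uniqueness)
        self._ranking = []  # sorted list of (-score, member), best first

    def add(self, member, score):
        # Re-adding a member replaces its old entry instead of duplicating it.
        if member in self._scores:
            old = (-self._scores[member], member)
            self._ranking.pop(bisect.bisect_left(self._ranking, old))
        self._scores[member] = score
        bisect.insort(self._ranking, (-score, member))  # inserted in order

    def top(self, n):
        return [(member, -neg) for neg, member in self._ranking[:n]]

    def rank(self, member):
        entry = (-self._scores[member], member)
        return bisect.bisect_left(self._ranking, entry) + 1


board = Leaderboard()
board.add("alice", 50)
board.add("bob", 70)
board.add("carol", 60)
board.add("alice", 90)  # alice improves her score; still one entry
print(board.top(3))
print(board.rank("carol"))
```

In Redis itself the equivalent operations would be sorted set commands such as `ZADD` and `ZREVRANGE`.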

Amazon DynamoDB
• Fully managed, highly available with replication across multiple AZs
• NoSQL database - not a relational database

Scales to massive workloads, distributed database
Millions of requests per second, trillions of rows, 100s of TB of storage
Fast and consistent in performance (low latency on retrieval)
Integrated with IAM for security, authorization and administration
Enables event driven programming with DynamoDB Streams
Low cost and auto-scaling capabilities

DynamoDB – Read/Write Capacity Modes

• Control how you manage your table’s capacity (read/write throughput)

Provisioned Mode (default)
• You specify the number of reads/writes per second
• You need to plan capacity beforehand
• Pay for provisioned Read Capacity Units (RCU) & Write Capacity Units (WCU)
• Possibility to add auto-scaling mode for RCU & WCU

On-Demand Mode
• Read/writes automatically scale up/down with your workloads
• No capacity planning needed
• Pay for what you use, more expensive ($$$)
• Great for unpredictable workloads
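Planning capacity in Provisioned Mode comes down to a little arithmetic. The card does not give the unit sizes, so the sketch below relies on the standard DynamoDB definitions: one RCU covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads), and one WCU covers one write per second of an item up to 1 KB. The function names are illustrative only.

```python
import math


def read_capacity_units(reads_per_second, item_size_kb, strongly_consistent=True):
    """RCUs needed: item size is rounded up to the next 4 KB."""
    units_per_read = math.ceil(item_size_kb / 4)
    total = reads_per_second * units_per_read
    # Eventually consistent reads cost half as much.
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(writes_per_second, item_size_kb):
    """WCUs needed: item size is rounded up to the next 1 KB."""
    return writes_per_second * math.ceil(item_size_kb)


print(read_capacity_units(10, 6))                              # 6 KB rounds up to 8 KB
print(read_capacity_units(16, 12, strongly_consistent=False))  # eventually consistent
print(write_capacity_units(6, 4.5))                            # 4.5 KB rounds up to 5 KB
```

In On-Demand Mode none of this planning is needed, which is what the higher price pays for.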

DynamoDB Accelerator (DAX)

Fully-managed, highly available, seamless in-memory cache for DynamoDB
Helps solve read congestion by caching
Microseconds latency for cached data
Doesn’t require application logic modification (compatible with existing DynamoDB APIs)
5 minutes TTL for cache (default)
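The caching behaviour can be pictured as a read-through cache with a time-to-live. The sketch below is a generic illustration, not the DAX client: `ReadThroughCache` and the `load_item` callable (standing in for a read against the table) are hypothetical, and only the 5-minute default TTL comes from the card.

```python
import time

DEFAULT_TTL_SECONDS = 5 * 60  # default TTL from the card


class ReadThroughCache:
    """Serve repeated reads from memory until the entry's TTL runs out."""

    def __init__(self, load_item, ttl=DEFAULT_TTL_SECONDS, clock=time.monotonic):
        self._load_item = load_item  # hypothetical callable that reads the table
        self._ttl = ttl
        self._clock = clock
        self._entries = {}           # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]              # cache hit
        value = self._load_item(key)     # cache miss: go to the table
        self._entries[key] = (now + self._ttl, value)
        return value


table = {"user#1": {"name": "Ana"}}
cache = ReadThroughCache(load_item=table.get)
print(cache.get("user#1"))  # first read loads from the table
print(cache.get("user#1"))  # second read is served from the cache
```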

DynamoDB Global Tables

Make a DynamoDB table accessible with low latency in multiple regions
Active-Active replication
Applications can READ and WRITE to the table in any region
Must enable DynamoDB Streams as a pre-requisite

RDS Overview

• Managed PostgreSQL / MySQL / Oracle / SQL Server
• Must provision an EC2 instance & EBS Volume type and size
• Support for Read Replicas and Multi AZ
• Security through IAM, Security Groups, KMS , SSL in transit
• Backup / Snapshot / Point in time restore feature
• Managed and Scheduled maintenance
• Monitoring through CloudWatch
• Use case: Store relational datasets (RDBMS / OLTP), perform SQL queries, transactional inserts / update / delete is available

RDS for Solutions Architect

Operations: small downtime when failover or maintenance happens; scaling read replicas or the EC2 instance, or restoring EBS, requires manual intervention and application changes
Security: AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, authorizing users in DB, using SSL
Reliability: Multi AZ feature, failover in case of failures
Performance: depends on EC2 instance type, EBS volume type, ability to add Read Replicas. Storage auto-scaling & manual scaling of instances
Cost: Pay per hour based on provisioned EC2 and EBS

Aurora Overview

Compatible API for PostgreSQL / MySQL
Data is held in 6 replicas, across 3 AZ
Auto healing capability
Multi AZ, Auto Scaling Read Replicas
Read Replicas can be Global
Aurora database can be Global for DR or latency purposes
Auto scaling of storage from 10GB to 128 TB
Define EC2 instance type for aurora instances
Same security / monitoring / maintenance features as RDS
Aurora Serverless – for unpredictable / intermittent workloads
Aurora Multi-Master – for continuous writes failover
Use case: same as RDS, but with less maintenance / more flexibility / more performance

Aurora for Solutions Architect

Operations: less operations, auto scaling storage
Security: AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, authorizing users in DB, using SSL
Reliability: Multi AZ, highly available, possibly more than RDS, Aurora Serverless option, Aurora Multi-Master option
Performance: 5x performance (according to AWS) due to architectural optimizations. Up to 15 Read Replicas (only 5 for RDS)
Cost: Pay per hour based on EC2 and storage usage. Possibly lower costs compared to Enterprise grade databases such as Oracle

ElastiCache Overview

Managed Redis / Memcached (similar offering as RDS, but for caches)
In-memory data store, sub-millisecond latency
Must provision an EC2 instance type
Support for Clustering (Redis) and Multi AZ, Read Replicas (sharding)
Security through IAM, Security Groups, KMS, Redis Auth
Backup / Snapshot / Point in time restore feature
Managed and Scheduled maintenance
Monitoring through CloudWatch
Use Case: Key/Value store, frequent reads and fewer writes, cache results for DB queries, store session data for websites. Cannot use SQL.

ElastiCache for Solutions Architect

Operations: same as RDS
Security: AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, users (Redis Auth), using SSL
Reliability: Clustering, Multi AZ
Performance: Sub-millisecond performance, in memory, read replicas for sharding, very popular cache option
Cost: Pay per hour based on EC2 and storage usage

DynamoDB Overview

• AWS proprietary technology, managed NoSQL database
• Serverless, provisioned capacity, auto scaling, on demand capacity (Nov 2018)
• Can replace ElastiCache as a key/value store (storing session data for example)
• Highly Available, Multi AZ by default, Reads and Writes are decoupled, DAX for read cache
• Reads can be eventually consistent or strongly consistent
• Security, authentication and authorization is done through IAM
• DynamoDB Streams to integrate with AWS Lambda
• Backup / Restore feature, Global Table feature
• Monitoring through CloudWatch
• Can only query on primary key, sort key, or indexes
• Use Case: Serverless applications development (small documents 100s KB), distributed serverless cache, doesn’t have SQL query language available, has transactions capability from Nov 2018

DynamoDB for Solutions Architect

• Operations: no operations needed, auto scaling capability, serverless
• Security: full security through IAM policies, KMS encryption, SSL in flight
• Reliability: Multi AZ, Backups
• Performance: single digit millisecond performance, DAX for caching reads, performance doesn’t degrade if your application scales
• Cost: Pay per provisioned capacity and storage usage (no need to guess in advance any capacity – can use auto scaling)

Redshift Overview

• Redshift is based on PostgreSQL, but it’s not used for OLTP
• It’s OLAP – online analytical processing (analytics and data warehousing)
• 10x better performance than other data warehouses, scale to PBs of data
• Columnar storage of data (instead of row based)
• Massively Parallel Query Execution (MPP)
• Pay as you go based on the instances provisioned
• Has a SQL interface for performing the queries
• BI tools such as AWS Quicksight or Tableau integrate with it
• Data is loaded from S3, DynamoDB, DMS, other DBs…
• From 1 node to 128 nodes, up to 128 TB of space per node
• Leader node: for query planning, results aggregation
• Compute node: for performing the queries, send results to leader
• Redshift Spectrum: perform queries directly against S3 (no need to load)
• Backup & Restore, Security VPC / IAM / KMS, Monitoring
• Redshift Enhanced VPC Routing: COPY / UNLOAD goes through VPC
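The leader node / compute node split can be illustrated with a toy average-per-region query. This is a conceptual sketch in plain Python, not Redshift code: each "compute node" aggregates only its own slice of rows, and the "leader" merges the partial results. The function names and sample rows are made up.

```python
def compute_node_partial(rows):
    """Compute node: aggregate only the local slice, region -> (sum, count)."""
    partial = {}
    for region, amount in rows:
        total, count = partial.get(region, (0, 0))
        partial[region] = (total + amount, count + 1)
    return partial


def leader_merge(partials):
    """Leader node: combine partial results into the final averages."""
    merged = {}
    for partial in partials:
        for region, (total, count) in partial.items():
            seen_total, seen_count = merged.get(region, (0, 0))
            merged[region] = (seen_total + total, seen_count + count)
    return {region: total / count for region, (total, count) in merged.items()}


# Two hypothetical compute nodes, each holding part of the sales table.
slice_1 = [("eu", 10), ("us", 20)]
slice_2 = [("eu", 30), ("us", 40), ("us", 60)]
partials = [compute_node_partial(s) for s in (slice_1, slice_2)]
print(leader_merge(partials))
```

Sums and counts are shipped to the leader rather than averages, because averages of averages would be wrong when slices differ in size.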

Redshift for Solutions Architect

• Operations: like RDS
• Security: IAM, VPC, KMS, SSL (like RDS)
• Reliability: auto healing features, cross-region snapshot copy
• Performance: 10x performance vs other data warehousing, compression
• Cost: pay per node provisioned, 1/10th of the cost vs other warehouses
• vs Athena: faster queries / joins / aggregations thanks to indexes
• Remember: Redshift = Analytics / BI / Data Warehouse

Neptune

• Fully managed graph database
• When do we use Graphs?
- High relationship data
- Social Networking: users are friends with users, reply to comments on a user’s post, and like other comments
- Knowledge graphs (Wikipedia)
• Highly available across 3 AZ, with up to 15 read replicas
• Point-in-time recovery, continuous backup to Amazon S3
• Support for KMS encryption at rest + HTTPS
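"High relationship data" means the interesting questions are about connections rather than rows. A classic example is friends-of-friends, sketched below on a small in-memory adjacency map. This is plain Python for illustration only; the sample users are invented, and against Neptune itself such a traversal would be written in a graph query language such as Gremlin or SPARQL.

```python
def friends_of_friends(graph, user):
    """Return people two hops away who are not already direct friends."""
    direct = graph.get(user, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {user}


# Hypothetical social graph: user -> set of friends
graph = {
    "ana": {"ben"},
    "ben": {"ana", "cy", "dee"},
    "cy": {"ben"},
    "dee": {"ben"},
}
print(sorted(friends_of_friends(graph, "ana")))
```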

05 - Database Flashcards

(29 cards)