05 - Database Flashcards

1
Q

RDS Backups

• Backups are automatically enabled in RDS

A

1) Automated backups:
• Daily full backup of the database (during the backup window)
• Transaction logs are backed up by RDS every 5 minutes
• => ability to restore to any point in time (from oldest backup to 5 minutes ago)
• 7 days retention (can be increased to 35 days)

2) DB Snapshots:
• Manually triggered by the user
• Retention of backup for as long as you want
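
The point-in-time window described above can be sketched as a small calculation. This is a minimal illustration, not an AWS API: the function names are hypothetical, and the 5-minute lag is taken directly from the card.

```python
from datetime import datetime, timedelta

# From the card: transaction logs are backed up every 5 minutes.
LOG_BACKUP_INTERVAL = timedelta(minutes=5)


def restorable_window(oldest_backup: datetime, now: datetime):
    """Return (earliest, latest) restorable times, or None if no window exists yet."""
    latest = now - LOG_BACKUP_INTERVAL
    if latest < oldest_backup:
        return None
    return oldest_backup, latest


def can_restore_to(target: datetime, oldest_backup: datetime, now: datetime) -> bool:
    """True if `target` lies between the oldest backup and 5 minutes ago."""
    window = restorable_window(oldest_backup, now)
    return window is not None and window[0] <= target <= window[1]
```

With a 7-day retention, a target one hour ago is restorable, while a target two minutes ago is not yet covered by a log backup.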

2
Q

RDS – Storage Auto Scaling
• Helps you increase storage on your RDS DB instance dynamically
• When RDS detects you are running out of free database storage, it scales automatically
• Avoid manually scaling your database storage

A

• You have to set Maximum Storage Threshold (maximum limit for DB storage)
• Automatically modify storage if:
- Free storage is less than 10% of allocated storage
- Low-storage lasts at least 5 minutes
- 6 hours have passed since last modification
• Useful for applications with unpredictable workloads
• Supports all RDS database engines
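
The trigger conditions listed above can be read as a single boolean check. The sketch below is an assumption-laden model (the `StorageState` fields and `should_autoscale` name are hypothetical); it only encodes the three conditions from the card plus the Maximum Storage Threshold cap.

```python
from dataclasses import dataclass


@dataclass
class StorageState:
    allocated_gib: float                   # currently allocated storage
    free_gib: float                        # currently free storage
    low_storage_minutes: float             # how long free space has been low
    hours_since_last_modification: float   # time since the last storage change
    max_storage_threshold_gib: float       # the Maximum Storage Threshold you set


def should_autoscale(s: StorageState) -> bool:
    """Apply the conditions from the card; all of them must hold."""
    low_on_space = s.free_gib < 0.10 * s.allocated_gib
    sustained = s.low_storage_minutes >= 5
    cooled_down = s.hours_since_last_modification >= 6
    has_headroom = s.allocated_gib < s.max_storage_threshold_gib
    return low_on_space and sustained and cooled_down and has_headroom
```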

3
Q

RDS Read Replicas for read scalability
• In AWS there’s a network cost when data goes from one AZ to another
• For RDS Read Replicas within the same region, you don’t pay that fee

A
  • Up to 5 Read Replicas
  • Within AZ, Cross AZ or Cross Region
  • Replication is ASYNC, so reads are eventually consistent
  • Replicas can be promoted to their own DB
  • Applications must update the connection string to leverage read replicas
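
Because the application has to choose its connection string, one common approach is to route reads to replica endpoints and everything else to the primary. The sketch below assumes a hypothetical `EndpointRouter` class and made-up hostnames; a real application would also have to tolerate stale reads, since replication is ASYNC.

```python
import itertools


class EndpointRouter:
    """Pick a database endpoint per statement: replicas for reads, primary for writes."""

    def __init__(self, primary: str, replicas: list):
        self.primary = primary
        # Round-robin over the replica endpoints; None when there are no replicas.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        # Naive classification: only statements starting with SELECT count as reads.
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary
```

Usage: `EndpointRouter("primary.example", ["replica-1.example"]).endpoint_for("SELECT 1")` returns the replica endpoint, while an `INSERT` goes to the primary.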
4
Q

RDS Multi AZ (Disaster Recovery)
• SYNC replication
• One DNS name – automatic app failover to standby
• Increase availability

A
  • Failover in case of loss of AZ, loss of network, instance or storage failure
  • No manual intervention in apps
  • Not used for scaling
  • Multi-AZ replication is free
  • Note: Read Replicas can be set up as Multi AZ for Disaster Recovery (DR)
5
Q

RDS - IAM Authentication

• IAM database authentication works with MySQL and PostgreSQL

A

1) You don’t need a password, just an authentication token obtained through IAM & RDS API calls
2) Auth token has a lifetime of 15 minutes

3) Benefits:
• Network in/out must be encrypted using SSL
• IAM to centrally manage users instead of DB
• Can leverage IAM Roles and EC2 Instance profiles for easy integration
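
Since the token only lives for 15 minutes, an application typically caches it and refreshes shortly before expiry. The sketch below is one possible shape: `AuthTokenCache`, `fetch_token`, and the 60-second safety margin are assumptions, not part of any AWS SDK. In practice `fetch_token` could wrap, for example, the `generate_db_auth_token` call on boto3's RDS client.

```python
import time

# From the card: an auth token has a lifetime of 15 minutes.
TOKEN_LIFETIME_SECONDS = 15 * 60


class AuthTokenCache:
    """Cache an IAM auth token and refresh it shortly before it expires."""

    def __init__(self, fetch_token, clock=time.monotonic, safety_margin=60):
        self._fetch_token = fetch_token      # hypothetical callable returning a new token
        self._clock = clock                  # injectable for testing
        self._safety_margin = safety_margin  # refresh this many seconds early (assumption)
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._safety_margin:
            self._token = self._fetch_token()
            self._expires_at = now + TOKEN_LIFETIME_SECONDS
        return self._token
```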

6
Q

RDS Security – Summary

A

1) Encryption at rest:
• Is done only when you first create the DB instance
• or: unencrypted DB => snapshot => copy snapshot as encrypted => create DB from snapshot

2) Your responsibility:
• Check the ports / IP / security group inbound rules in DB’s SG
• In-database user creation and permissions or manage through IAM
• Creating a database with or without public access
• Ensure parameter groups or DB is configured to only allow SSL connections

3) AWS responsibility:
• No SSH access
• No manual DB patching
• No manual OS patching
• No way to audit the underlying instance
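
The encryption path in item 1 above (unencrypted DB => snapshot => encrypted copy => new DB) can be sketched as three calls. The method and parameter names below mirror boto3's RDS client and should be checked against its documentation before use; the function name, the identifier naming scheme, and the `rds` argument (any object exposing those methods) are assumptions.

```python
def encrypt_existing_instance(rds, source_db_id: str, kms_key_id: str,
                              suffix: str = "encrypted") -> str:
    """Snapshot -> encrypted copy -> restore. Returns the new instance identifier."""
    # Hypothetical naming scheme for the intermediate and final resources.
    plain_snapshot = f"{source_db_id}-plain-snap"
    encrypted_snapshot = f"{source_db_id}-{suffix}-snap"
    new_db_id = f"{source_db_id}-{suffix}"

    # 1) Snapshot the unencrypted instance and wait until it is available.
    rds.create_db_snapshot(DBSnapshotIdentifier=plain_snapshot,
                           DBInstanceIdentifier=source_db_id)
    rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=plain_snapshot)

    # 2) Copy the snapshot; supplying a KMS key makes the copy encrypted.
    rds.copy_db_snapshot(SourceDBSnapshotIdentifier=plain_snapshot,
                         TargetDBSnapshotIdentifier=encrypted_snapshot,
                         KmsKeyId=kms_key_id)
    rds.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=encrypted_snapshot)

    # 3) Create a new (encrypted) instance from the encrypted snapshot.
    rds.restore_db_instance_from_db_snapshot(DBInstanceIdentifier=new_db_id,
                                             DBSnapshotIdentifier=encrypted_snapshot)
    return new_db_id
```

The application then has to be pointed at the new instance; the original unencrypted instance is left untouched by this sketch.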
7
Q

Amazon Aurora
• Aurora is a proprietary technology from AWS (not open sourced)
• Postgres and MySQL are both supported as Aurora DB

A
  • Aurora is “AWS cloud optimized” and claims 5x performance improvement over MySQL on RDS, over 3x the performance of Postgres on RDS
  • Aurora storage automatically grows in increments of 10GB, up to 128 TB.
  • Aurora can have 15 replicas while MySQL has 5, and the replication process is faster (sub 10 ms replica lag)
  • Failover in Aurora is instantaneous. It’s HA (High Availability) native.
  • Aurora costs more than RDS (20% more) – but is more efficient
8
Q

Aurora High Availability and Read Scaling

A

1) 6 copies of your data across 3 AZ:
• 4 copies out of 6 needed for writes
• 3 copies out of 6 needed for reads
• Self healing with peer-to-peer replication
• Storage is striped across 100s of volumes

2) One Aurora Instance takes writes (master)
3) Automated failover for master in less than 30 seconds
4) Master + up to 15 Aurora Read Replicas serve reads
5) Support for Cross Region Replication
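
The quorum numbers in item 1 can be turned into a quick availability check. This is only an arithmetic sketch of the 4-of-6 / 3-of-6 rule from the card (the function name is hypothetical); with 6 copies spread over 3 AZs, losing one AZ means losing 2 copies.

```python
TOTAL_COPIES = 6   # 6 copies of the data across 3 AZs
WRITE_QUORUM = 4   # copies needed for writes
READ_QUORUM = 3    # copies needed for reads


def storage_availability(failed_copies: int) -> dict:
    """Report whether reads and writes still reach quorum after some copies fail."""
    if not 0 <= failed_copies <= TOTAL_COPIES:
        raise ValueError("failed_copies must be between 0 and 6")
    healthy = TOTAL_COPIES - failed_copies
    return {"writes": healthy >= WRITE_QUORUM, "reads": healthy >= READ_QUORUM}
```

So losing a whole AZ (2 copies) keeps both reads and writes available, while losing 3 copies leaves the data readable but not writable.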

9
Q

Aurora Serverless

A
  • Automated database instantiation and autoscaling based on actual usage
  • Good for infrequent, intermittent or unpredictable workloads
  • No capacity planning needed
  • Pay per second, can be more cost-effective
10
Q

Aurora Multi-Master

A
  • In case you want immediate failover for the write node (HA)
  • Every node does R/W, vs promoting a Read Replica as the new master

11
Q

Global Aurora

A

1) Aurora Cross Region Read Replicas:
• Useful for disaster recovery
• Simple to put in place

2) Aurora Global Database (recommended):
• 1 Primary Region (read / write)
• Up to 5 secondary (read-only) regions, replication lag is less than 1 second
• Up to 16 Read Replicas per secondary region
• Helps for decreasing latency
• Promoting another region (for disaster recovery) has an RTO of < 1 minute

12
Q

Aurora Machine Learning

• Enables you to add ML-based predictions to your applications via SQL

A

1) Simple, optimised, and secure integration between Aurora and AWS ML services

2) Supported services
• Amazon SageMaker (use with any ML model)
• Amazon Comprehend (for sentiment analysis)

3) You don’t need to have ML experience
4) Use cases: fraud detection, ads targeting, sentiment analysis, product recommendations

13
Q

ElastiCache – Redis vs Memcached

A
REDIS
• Multi AZ with Auto-Failover
• Read Replicas to scale reads and have high availability
• Data Durability using AOF persistence
• Backup and restore features
MEMCACHED
• Multi-node for partitioning of data (sharding)
• No high availability (replication)
• Non persistent
• No backup and restore
• Multi-threaded architecture
14
Q

ElastiCache – Redis Use Case

A
  • Gaming Leaderboards are computationally complex
  • Redis Sorted Sets guarantee both uniqueness and element ordering
  • Each time a new element is added, it’s ranked in real time, then added in the correct order
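
To make the sorted-set behaviour concrete, here is a small in-memory imitation in plain Python. It is not a Redis client and the `Leaderboard` class is hypothetical: it only mimics the two properties named above (unique members, ordering by score). Redis keeps the set ordered on insert, whereas this sketch sorts on read for brevity; ties are broken by member name here, which is an arbitrary choice.

```python
class Leaderboard:
    """In-memory imitation of sorted-set semantics: unique members, ordered by score."""

    def __init__(self):
        self._scores = {}  # member -> score; a dict guarantees uniqueness

    def add(self, member: str, score: float) -> None:
        # Adding an existing member overwrites its score instead of duplicating it.
        self._scores[member] = score

    def top(self, n: int) -> list:
        # Highest score first; ties broken alphabetically (assumption of this sketch).
        ranked = sorted(self._scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]

    def rank(self, member: str) -> int:
        """0-based rank of a member, highest score first."""
        ordered = [m for m, _ in self.top(len(self._scores))]
        return ordered.index(member)
```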
15
Q

Amazon DynamoDB
• Fully managed, highly available with replication across multiple AZs
• NoSQL database - not a relational database

A
  • Scales to massive workloads, distributed database
  • Millions of requests per second, trillions of rows, 100s of TB of storage
  • Fast and consistent in performance (low latency on retrieval)
  • Integrated with IAM for security, authorization and administration
  • Enables event driven programming with DynamoDB Streams
  • Low cost and auto-scaling capabilities
16
Q

DynamoDB – Read/Write Capacity Modes

• Control how you manage your table’s capacity (read/write throughput)

A

Provisioned Mode (default)
• You specify the number of reads/writes per second
• You need to plan capacity beforehand
• Pay for provisioned Read Capacity Units (RCU) & Write Capacity Units (WCU)
• Possibility to add auto-scaling mode for RCU & WCU

On-Demand Mode
• Read/writes automatically scale up/down with your workloads
• No capacity planning needed
• Pay for what you use, more expensive ($$$)
• Great for unpredictable workloads
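
For Provisioned Mode, the capacity planning step can be sketched as arithmetic. The card does not define unit sizes; the sketch assumes the standard DynamoDB definitions (1 RCU = one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads; 1 WCU = one write per second of an item up to 1 KB). The function names are hypothetical.

```python
import math


def required_rcu(reads_per_second: int, item_size_kb: float,
                 strongly_consistent: bool = True) -> int:
    """RCUs needed: item size is rounded up to the next 4 KB per read."""
    units_per_read = math.ceil(item_size_kb / 4)
    rcu = reads_per_second * units_per_read
    if not strongly_consistent:
        rcu = rcu / 2  # eventually consistent reads cost half
    return math.ceil(rcu)


def required_wcu(writes_per_second: int, item_size_kb: float) -> int:
    """WCUs needed: item size is rounded up to the next 1 KB per write."""
    return writes_per_second * math.ceil(item_size_kb)
```

For example, 10 strongly consistent reads per second of 6 KB items need 20 RCU, and 5 writes per second of 2.5 KB items need 15 WCU.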

17
Q

DynamoDB Accelerator (DAX)

A
  • Fully-managed, highly available, seamless in-memory cache for DynamoDB
  • Help solve read congestion by caching
  • Microseconds latency for cached data
  • Doesn’t require application logic modification (compatible with existing DynamoDB APIs)
  • 5 minutes TTL for cache (default)
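
The caching idea behind DAX can be illustrated with a generic read-through cache with a TTL. This is not the DAX client: `ReadThroughCache` and `load_item` are hypothetical names, and only the 5-minute default TTL comes from the card.

```python
import time


class ReadThroughCache:
    """Serve reads from memory; fall back to the backing store on a miss or expiry."""

    def __init__(self, load_item, ttl_seconds=300, clock=time.monotonic):
        self._load_item = load_item   # hypothetical callable that reads from the table
        self._ttl = ttl_seconds       # 5 minutes by default, as on the card
        self._clock = clock           # injectable for testing
        self._entries = {}            # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            self.hits += 1
            return entry[0]
        # Miss or expired entry: read from the backing store and cache the result.
        self.misses += 1
        value = self._load_item(key)
        self._entries[key] = (value, now + self._ttl)
        return value
```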
18
Q

DynamoDB Global Tables

A
  • Make a DynamoDB table accessible with low latency in multiple-regions
  • Active-Active replication
  • Applications can READ and WRITE to the table in any region
  • Must enable DynamoDB Streams as a pre-requisite
19
Q

RDS Overview

A

• Managed PostgreSQL / MySQL / Oracle / SQL Server
• Must provision an EC2 instance & EBS Volume type and size
• Support for Read Replicas and Multi AZ
• Security through IAM, Security Groups, KMS, SSL in transit
• Backup / Snapshot / Point in time restore feature
• Managed and Scheduled maintenance
• Monitoring through CloudWatch
• Use case: Store relational datasets (RDBMS / OLTP), perform SQL queries, transactional inserts / updates / deletes are available

20
Q

RDS for Solutions Architect

A
  • Operations: small downtime when failover or maintenance happens; scaling read replicas / EC2 instance, or restoring EBS, implies manual intervention and application changes
  • Security: AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, authorizing users in DB, using SSL
  • Reliability: Multi AZ feature, failover in case of failures
  • Performance: depends on EC2 instance type, EBS volume type, ability to add Read Replicas. Storage auto-scaling & manual scaling of instances
  • Cost: Pay per hour based on provisioned EC2 and EBS
21
Q

Aurora Overview

A
  • Compatible API for PostgreSQL / MySQL
  • Data is held in 6 replicas, across 3 AZ
  • Auto healing capability
  • Multi AZ, Auto Scaling Read Replicas
  • Read Replicas can be Global
  • Aurora database can be Global for DR or latency purposes
  • Auto scaling of storage from 10GB to 128 TB
  • Define EC2 instance type for Aurora instances
  • Same security / monitoring / maintenance features as RDS
  • Aurora Serverless – for unpredictable / intermittent workloads
  • Aurora Multi-Master – for continuous writes failover
  • Use case: same as RDS, but with less maintenance / more flexibility / more performance
22
Q

Aurora for Solutions Architect

A
  • Operations: less operations, auto scaling storage
  • Security: AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, authorizing users in DB, using SSL
  • Reliability: Multi AZ, highly available, possibly more than RDS, Aurora Serverless option, Aurora Multi-Master option
  • Performance: 5x performance (according to AWS) due to architectural optimizations. Up to 15 Read Replicas (only 5 for RDS)
  • Cost: Pay per hour based on EC2 and storage usage. Possibly lower costs compared to Enterprise grade databases such as Oracle
23
Q

ElastiCache Overview

A
  • Managed Redis / Memcached (similar offering as RDS, but for caches)
  • In-memory data store, sub-millisecond latency
  • Must provision an EC2 instance type
  • Support for Clustering (Redis) and Multi AZ, Read Replicas (sharding)
  • Security through IAM, Security Groups, KMS, Redis Auth
  • Backup / Snapshot / Point in time restore feature
  • Managed and Scheduled maintenance
  • Monitoring through CloudWatch
  • Use Case: Key/Value store, Frequent reads, less writes, cache results for DB queries, store session data for websites, cannot use SQL.
24
Q

ElastiCache for Solutions Architect

A
  • Operations: same as RDS
  • Security: AWS responsible for OS security, we are responsible for setting up KMS, security groups, IAM policies, users (Redis Auth), using SSL
  • Reliability: Clustering, Multi AZ
  • Performance: Sub-millisecond performance, in memory, read replicas for sharding, very popular cache option
  • Cost: Pay per hour based on EC2 and storage usage
25
Q

DynamoDB Overview

A
  • AWS proprietary technology, managed NoSQL database
  • Serverless, provisioned capacity, auto scaling, on demand capacity (Nov 2018)
  • Can replace ElastiCache as a key/value store (storing session data for example)
  • Highly Available, Multi AZ by default, Read and Writes are decoupled, DAX for read cache
  • Reads can be eventually consistent or strongly consistent
  • Security, authentication and authorization is done through IAM
  • DynamoDB Streams to integrate with AWS Lambda
  • Backup / Restore feature, Global Table feature
  • Monitoring through CloudWatch
  • Can only query on primary key, sort key, or indexes
  • Use Case: Serverless applications development (small documents 100s KB), distributed serverless cache, doesn’t have SQL query language available, has transactions capability from Nov 2018
26
Q

DynamoDB for Solutions Architect

A
  • Operations: no operations needed, auto scaling capability, serverless
  • Security: full security through IAM policies, KMS encryption, SSL in flight
  • Reliability: Multi AZ, Backups
  • Performance: single digit millisecond performance, DAX for caching reads, performance doesn’t degrade if your application scales
  • Cost: Pay per provisioned capacity and storage usage (no need to guess in advance any capacity – can use auto scaling)
27
Q

Redshift Overview

A
  • Redshift is based on PostgreSQL, but it’s not used for OLTP
  • It’s OLAP – online analytical processing (analytics and data warehousing)
  • 10x better performance than other data warehouses, scale to PBs of data
  • Columnar storage of data (instead of row based)
  • Massively Parallel Query Execution (MPP)
  • Pay as you go based on the instances provisioned
  • Has a SQL interface for performing the queries
  • BI tools such as Amazon QuickSight or Tableau integrate with it
  • Data is loaded from S3, DynamoDB, DMS, other DBs…
  • From 1 node to 128 nodes, up to 128 TB of space per node
  • Leader node: for query planning, results aggregation
  • Compute node: for performing the queries, send results to leader
  • Redshift Spectrum: perform queries directly against S3 (no need to load)
  • Backup & Restore, Security VPC / IAM / KMS, Monitoring
  • Redshift Enhanced VPC Routing: COPY / UNLOAD goes through VPC
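
The leader / compute node split described above follows a scatter-gather pattern, which can be sketched in a few lines. This is a toy model, not Redshift's implementation: the function names are hypothetical, and each inner list stands in for the slice of a column held by one compute node.

```python
def compute_node_partial(values: list) -> tuple:
    """Compute node: aggregate only its own slice and return (sum, count)."""
    return sum(values), len(values)


def leader_average(slices: list):
    """Leader node: collect partial results from every compute node and combine them."""
    partials = [compute_node_partial(s) for s in slices]
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total / count if count else None
```

Usage: `leader_average([[10, 20], [30], [40, 50, 60]])` combines three partial results into a single average without any node seeing the full data set.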
28
Q

Redshift for Solutions Architect

A
  • Operations: like RDS
  • Security: IAM, VPC, KMS, SSL (like RDS)
  • Reliability: auto healing features, cross-region snapshot copy
  • Performance: 10x performance vs other data warehousing, compression
  • Cost: pay per node provisioned, 1/10th of the cost vs other warehouses
  • vs Athena: faster queries / joins / aggregations thanks to indexes
  • Remember: Redshift = Analytics / BI / Data Warehouse
29
Q

Neptune

A

• Fully managed graph database

  • When do we use Graphs?
    • High relationship data
    • Social Networking: users are friends with users, reply to comments on a user’s post, and like other comments
    • Knowledge graphs (Wikipedia)
  • Highly available across 3 AZ, with up to 15 read replicas
  • Point-in-time recovery, continuous backup to Amazon S3
  • Support for KMS encryption at rest + HTTPS