AWS Database Services Flashcards
When to use Database on EC2
Full control over instance and database
Preferred DB not available under RDS
When to use Amazon RDS
Need traditional relational database for OLTP
Your data is well-formed and structured
Existing applications requiring RDBMS
When to use Amazon DynamoDB
Name/value pair data
Unpredictable data structure
In-memory performance with persistence
High I/O needs
Require dynamic scaling
When to use Amazon Redshift
Data warehouse for large volumes of aggregated data
Primarily OLAP workloads
When to use Amazon Neptune
Relationships between objects are of high value
When to use Amazon ElastiCache
Fast temporary storage for small amounts of data
Highly volatile data (non-persistent)
When to use Amazon S3
Binary large objects (BLOBs)
Static websites
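The "when to use" cards above can be condensed into a small lookup table. The sketch below is only a study aid: the requirement keywords and the `suggest_service` helper are hypothetical names chosen for illustration and are not part of any AWS API.

```python
# Study aid: map a requirement keyword (hypothetical labels) to the AWS
# service that the "when to use" cards recommend for it.
SERVICE_FOR_REQUIREMENT = {
    "full control": "Database on EC2",
    "oltp": "Amazon RDS",
    "key-value": "Amazon DynamoDB",
    "olap": "Amazon Redshift",
    "graph": "Amazon Neptune",
    "cache": "Amazon ElastiCache",
    "blob": "Amazon S3",
}


def suggest_service(requirement: str) -> str:
    """Return the suggested service, or 'unknown' for an unlisted keyword."""
    return SERVICE_FOR_REQUIREMENT.get(requirement.strip().lower(), "unknown")


print(suggest_service("OLTP"))   # Amazon RDS
print(suggest_service("graph"))  # Amazon Neptune
```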
A managed service that makes it easy to set up, operate, and scale a relational database in the cloud
Amazon Relational Database Service (RDS)
RDS facts
Relational databases are known as Structured Query Language (SQL) databases.
Non-relational databases are known as NoSQL databases.
RDS is an Online Transaction Processing (OLTP) type of database.
Aurora is Amazon’s proprietary database.
RDS is a fully managed service and you do not have access to the underlying EC2 instance (no root access)
Amazon Relational Database Service (RDS) features and benefits
SQL type of database.
Can be used to perform complex queries and joins.
Easy to set up, highly available, fault tolerant, and scalable.
Used when data is clearly defined.
Common use cases include online stores and banking systems.
Database engines supported by Amazon RDS
SQL Server.
Oracle.
MySQL.
PostgreSQL.
Aurora.
MariaDB.
Amazon RDS includes:
Security and patching of the DB instances.
Automated backup for the DB instances.
Software updates for the DB engine.
Easy scaling for storage and compute.
Multi-AZ option with synchronous replication.
Automatic failover for Multi-AZ option.
Read replicas option for read heavy workloads.
A database environment in the cloud with the compute and storage resources you specify
Database Instance
Amazon RDS encryption
You can encrypt your Amazon RDS instances and snapshots at rest by enabling the encryption option for your Amazon RDS DB instance.
Encryption at rest is supported for all DB types and uses AWS KMS.
You cannot encrypt an existing DB in place. Instead, create a snapshot, copy it with encryption enabled, then restore a new encrypted DB from the encrypted copy.
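The snapshot, copy, and restore sequence from the card above can be written down as an ordered plan. This is a minimal sketch, not a definitive procedure: the operation names follow the RDS API actions CreateDBSnapshot, CopyDBSnapshot, and RestoreDBInstanceFromDBSnapshot as exposed by boto3, while the identifiers and the `encrypted_copy_plan` helper are hypothetical. Nothing is sent to AWS here, and a real run would also need to wait for each snapshot to become available before the next step.

```python
def encrypted_copy_plan(instance_id: str, kms_key_id: str) -> list:
    """Return the ordered (operation, parameters) steps as plain data."""
    snapshot_id = f"{instance_id}-snap"                      # hypothetical name
    encrypted_snapshot_id = f"{instance_id}-snap-encrypted"  # hypothetical name
    return [
        # 1. Snapshot the unencrypted instance.
        ("create_db_snapshot", {
            "DBInstanceIdentifier": instance_id,
            "DBSnapshotIdentifier": snapshot_id,
        }),
        # 2. Copy the snapshot; supplying a KMS key encrypts the copy.
        ("copy_db_snapshot", {
            "SourceDBSnapshotIdentifier": snapshot_id,
            "TargetDBSnapshotIdentifier": encrypted_snapshot_id,
            "KmsKeyId": kms_key_id,
        }),
        # 3. Restore a new instance from the encrypted copy.
        ("restore_db_instance_from_db_snapshot", {
            "DBInstanceIdentifier": f"{instance_id}-encrypted",
            "DBSnapshotIdentifier": encrypted_snapshot_id,
        }),
    ]


for operation, params in encrypted_copy_plan("mydb", "alias/my-key"):
    print(operation, params)
```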
A collection of subnets (typically private) that you create in a VPC and that you then designate for your DB instances
Database Subnet Group
Database Subnet Group facts:
Each DB subnet group should have subnets in at least two Availability Zones in a given region.
It is recommended to configure a subnet group with subnets in each AZ (even for standalone instances).
AWS Charges for RDS:
DB instance hours (on-demand usage is billed per second, with a 10-minute minimum).
Storage GB/month.
I/O requests/month – for magnetic storage.
Provisioned IOPS/month – for RDS provisioned IOPS SSD.
Egress data transfer.
Backup storage (DB backups and manual snapshots).
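The billing dimensions above add up as quantity times unit rate. The sketch below shows only the arithmetic: every quantity and rate is a made-up placeholder rather than an AWS price, and `estimate_monthly_cost` is a hypothetical helper.

```python
def estimate_monthly_cost(line_items: dict) -> float:
    """Sum quantity * unit rate over each billed dimension."""
    return sum(quantity * rate for quantity, rate in line_items.values())


# All quantities and rates below are made-up placeholders, not AWS prices.
example = {
    "instance_hours": (730, 0.10),
    "storage_gb_month": (100, 0.10),
    "egress_gb": (50, 0.10),
    "backup_storage_gb": (20, 0.10),
}

print(f"{estimate_monthly_cost(example):.2f}")  # 90.00
```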
RDS Scalability:
RDS compute and storage scale vertically (scale up), not horizontally.
You cannot decrease the allocated storage for an RDS instance.
You can scale storage and change the storage type for all DB engines except MS SQL.
How RDS provides fault tolerance and disaster recovery
RDS provides multi-AZ for disaster recovery which provides fault tolerance across availability zones:
Multi-AZ RDS creates a replica in another AZ and synchronously replicates to it (DR only).
There is an option to choose multi-AZ during the launch wizard.
AWS recommends the use of provisioned IOPS storage for multi-AZ RDS DB instances.
Each AZ runs on its own physically distinct, independent infrastructure, and is engineered to be highly reliable.
You cannot choose which AZ in the region will be chosen to create the standby DB instance.
Read Replicas
Read Replicas – provide improved performance for reads:
Read replicas are used for read heavy DBs and replication is asynchronous.
Read replicas are for workload sharing and offloading.
Read replicas provide read-only DR.
Read replicas are created from a snapshot of the master instance.
Must have automated backups enabled on the primary (retention period > 0).
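Offloading reads to replicas usually means the application sends writes to the primary and spreads reads across the replicas. A minimal sketch of that routing, assuming hypothetical endpoint names and a deliberately naive "starts with SELECT" test for reads:

```python
import itertools


class ReplicaRouter:
    """Send reads to replicas in round-robin order and everything else to the primary."""

    def __init__(self, primary: str, replicas: list):
        self.primary = primary
        # With no replicas, all traffic falls back to the primary.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = ReplicaRouter("primary.example", ["replica-1.example", "replica-2.example"])
print(router.endpoint_for("SELECT * FROM orders"))      # replica-1.example
print(router.endpoint_for("INSERT INTO orders VALUES (1)"))  # primary.example
print(router.endpoint_for("SELECT * FROM customers"))   # replica-2.example
```

Because replication to read replicas is asynchronous, a read routed this way may briefly miss a write that was just sent to the primary.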
A fully managed NoSQL database service that provides fast and predictable performance with seamless scalability.
Amazon DynamoDB
DynamoDB features and benefits
NoSQL type of database (non-relational).
Fast, highly available, and fully managed.
Used when data is fluid and can change.
Common use cases include social networks and web analytics.
DynamoDB Facts
Push button scaling means that you can scale the DB at any time without incurring downtime.
SSD based and uses limited indexing on attributes for performance.
DynamoDB is a Web service that uses HTTP over SSL (HTTPS) as a transport and JSON as a message serialization format.
Amazon DynamoDB stores three geographically distributed replicas of each table to enable high availability and data durability.
Data is synchronously replicated across 3 facilities (AZs) in a region.
Provides low read and write latency.
Scale storage and throughput up or down as needed without code changes or downtime.
DynamoDB is schema-less.
DynamoDB can be used for storing session state.
Amazon DynamoDB cross-region replication
Cross-region replication allows you to replicate across regions:
Amazon DynamoDB global tables provide a fully managed solution for deploying a multi-region, multi-master database.
When you create a global table, you specify the AWS regions where you want the table to be available.
DynamoDB performs all the necessary tasks to create identical tables in these regions and propagate ongoing data changes to all of them.
Amazon DynamoDB read models:
Eventually consistent reads (Default):
The eventual consistency option maximizes your read throughput (best read performance).
An eventually consistent read might not reflect the results of a recently completed write.
Consistency across all copies reached within 1 second.
Strongly consistent reads:
A strongly consistent read returns a result that reflects all writes that received a successful response prior to the read (faster consistency).
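The read model is chosen per request. In the DynamoDB GetItem API this is the `ConsistentRead` parameter, which is false (eventually consistent) unless set. The sketch below only builds the request parameters; the table name, key attribute, and `get_item_request` helper are hypothetical, and no call to AWS is made.

```python
def get_item_request(table: str, key: dict, strongly_consistent: bool = False) -> dict:
    """Build GetItem parameters; ConsistentRead selects the read model."""
    return {
        "TableName": table,
        "Key": key,
        "ConsistentRead": strongly_consistent,
    }


# Hypothetical table and key, in the low-level attribute-value format.
key = {"user_id": {"S": "42"}}

print(get_item_request("Sessions", key))                            # eventually consistent (default)
print(get_item_request("Sessions", key, strongly_consistent=True))  # strongly consistent
```

With boto3, a dictionary like this could be passed as keyword arguments to a DynamoDB client's `get_item` method.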
A fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement – from milliseconds to microseconds – even at millions of requests per second.
Amazon DynamoDB Accelerator (DAX)
A fast, fully managed data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and existing Business Intelligence (BI) tools
Amazon Redshift
Amazon Redshift Facts
Redshift is a SQL-based data warehouse used for analytics applications.
Redshift is a relational database that is used for Online Analytical Processing (OLAP) use cases.
Redshift is used for running complex analytic queries against petabytes of structured data, using sophisticated query optimization, columnar storage on high-performance local disks, and massively parallel query execution.
Redshift is ideal for processing large amounts of data for business intelligence.
Redshift can be up to 10x faster than a traditional SQL DB.
Redshift uses columnar data storage:
Data is stored sequentially in columns instead of rows.
Columnar based DB is ideal for data warehousing and analytics.
Requires fewer I/Os which greatly enhances performance.
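To see why a columnar layout needs fewer I/Os for analytics, compare the same records stored by row and by column. This is a toy in-memory illustration with made-up sample data, not a description of how the storage engine is implemented: an aggregate over one column touches only that column's values.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 20.0},
    {"order_id": 2, "region": "US", "amount": 35.5},
    {"order_id": 3, "region": "EU", "amount": 12.5},
]


def to_columns(records: list) -> dict:
    """Pivot a list of records into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# The aggregate reads only the "amount" column, not every field of every row.
print(sum(columns["amount"]))  # 68.0
print(columns["region"])       # ['EU', 'US', 'EU']
```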
Redshift provides advanced compression:
Data is stored sequentially in columns, which allows for much better performance and less storage space.
Redshift automatically selects the compression scheme.
How does Redshift enhance availability and improve durability
Redshift uses replication and continuous backups to enhance availability and improve durability, and can automatically recover from component and node failures.
The 3 copies of data Redshift keeps
The original.
A replica on compute nodes (within the cluster).
A backup copy on S3.
Redshift provides continuous/incremental backups:
Multiple copies within a cluster.
Continuous and incremental backups to S3.
Continuous and incremental backups across regions.
Streaming restore.
Redshift provides fault tolerance for the following failures:
Disk failures.
Node failures.
Network failures.
AZ/region level disasters.
A web service that makes it easy to deploy and run Memcached or Redis protocol-compliant server nodes in the cloud.
Amazon ElastiCache
Amazon ElastiCache Benefits
The in-memory caching provided by ElastiCache can be used to significantly improve latency and throughput for many read-heavy application workloads or compute-intensive workloads.
Best for scenarios where the DB load is based on Online Analytical Processing (OLAP) transactions.
ElastiCache Web session Store Use Case
In cases with load-balanced web servers, store web session information in Redis so if a server is lost, the session info is not lost, and another web server can pick it up
ElastiCache Use Cases
Web Session Store
Database Caching
Leaderboards
Streaming data dashboards
ElastiCache Database Caching Use Case
Use Memcached in front of AWS RDS to cache popular queries to offload work from RDS and return results faster to users
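The pattern in the card above is often called cache-aside: check the cache first, and query the database only on a miss. A minimal sketch, assuming an in-process dictionary as a stand-in for Memcached and a hypothetical `run_query` callable as a stand-in for the RDS query:

```python
import time


class QueryCache:
    """Cache-aside: return a cached result if fresh, otherwise run the query."""

    def __init__(self, run_query, ttl_seconds: float = 60.0, clock=time.monotonic):
        self._run_query = run_query  # stand-in for the database call
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}           # sql -> (expires_at, result)

    def get(self, sql: str):
        now = self._clock()
        entry = self._entries.get(sql)
        if entry is not None and entry[0] > now:
            return entry[1]              # cache hit: no database work
        result = self._run_query(sql)    # cache miss: go to the database
        self._entries[sql] = (now + self._ttl, result)
        return result


def slow_database_query(sql: str) -> str:
    # Hypothetical placeholder for a real query against RDS.
    return f"rows for: {sql}"


cache = QueryCache(slow_database_query, ttl_seconds=30)
print(cache.get("SELECT * FROM products WHERE popular"))  # miss, runs the query
print(cache.get("SELECT * FROM products WHERE popular"))  # hit, served from cache
```

The time-to-live bounds how stale a cached result can get; the 30-second value here is arbitrary.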
ElastiCache Leaderboards Use Case
Use Redis to provide a live leaderboard for millions of users of your mobile app
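With Redis, a leaderboard like this is commonly built on a sorted set (the ZADD and ZREVRANGE commands). The sketch below imitates that behaviour with a plain dictionary so it runs without a Redis server; the `Leaderboard` class, player names, and scores are hypothetical.

```python
import heapq


class Leaderboard:
    """Keep each player's best score and report the top N."""

    def __init__(self):
        self._scores = {}

    def submit(self, player: str, score: int) -> None:
        # Only a higher score replaces the stored one.
        if score > self._scores.get(player, float("-inf")):
            self._scores[player] = score

    def top(self, n: int) -> list:
        # Highest scores first, as (player, score) pairs.
        return heapq.nlargest(n, self._scores.items(), key=lambda item: item[1])


board = Leaderboard()
board.submit("ana", 120)
board.submit("ben", 95)
board.submit("cy", 150)
board.submit("ben", 130)

print(board.top(2))  # [('cy', 150), ('ben', 130)]
```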
ElastiCache Streaming data dashboards Use Case
Provide a landing spot for streaming sensor data on the factory floor, providing live real-time dashboard displays
ElastiCache Facts
ElastiCache EC2 nodes cannot be accessed from the Internet, nor can they be accessed by EC2 instances in other VPCs.
Can be on-demand or reserved instances too (but not Spot instances).
ElastiCache can be used for storing session state.
Simplest model; can run large nodes with multiple cores/threads, can be scaled in and out, and can cache objects such as DBs.
Memcached ElastiCache Engine
Complex model; supports encryption, master/slave replication, cross-AZ (HA), automatic failover, and backup/restore.
Redis ElastiCache Engine
A web service that enables businesses, researchers, data analysts, and developers to process vast amounts of data easily and cost-effectively.
Amazon EMR
Amazon EMR Features
EMR utilizes a hosted Hadoop framework running on Amazon EC2 and Amazon S3.
Managed Hadoop framework for processing huge amounts of data.
Also supports Apache Spark, HBase, Presto, and Flink.
Amazon EMR Use Cases
Most commonly used for log analysis, financial analysis, or extract, transform, and load (ETL) activities.
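Log analysis on a Hadoop-style framework is typically expressed as a map step followed by a reduce step. The sketch below runs that idea in a single process on a few made-up log lines, as a stand-in for what a job on an EMR cluster would do at scale; the log format and function names are hypothetical.

```python
from collections import Counter
from functools import reduce


def map_line(line: str) -> Counter:
    """Map step: emit a count of 1 for the line's status code."""
    # Assumes the status code is the last whitespace-separated field.
    return Counter({line.split()[-1]: 1})


def count_status_codes(lines: list) -> Counter:
    """Reduce step: merge the per-line counts into totals."""
    return reduce(lambda left, right: left + right, map(map_line, lines), Counter())


sample_log = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /login 200",
]

print(count_status_codes(sample_log))
```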