6_DynamoDB, Redshift, ElastiCache, Aurora Flashcards
DynamoDB vs RDS
DynamoDB offers “push button” scaling, meaning that you can scale your database on the fly, without any downtime.
RDS is not so easy to scale: you usually have to move to a bigger instance size (scale up) or add a read replica.
DynamoDB
- Stored exclusively on SSD storage to provide high I/O performance
- Spread across 3 geographically distinct data centres
- Eventually Consistent Reads (default)
- Consistency across all copies of data is usually reached within a second. Repeating a read after a short time should return the updated data (Best read performance)
- Strongly Consistent Reads
- A strongly consistent read returns a result that reflects all writes that received a successful response prior to the read
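The consistency mode is chosen per read. A minimal boto3-style sketch: the helper below only builds the parameter dict you would pass to `dynamodb.get_item(**params)`, and the table and key names are illustrative.

```python
def get_item_params(table_name, key, strong=False):
    """Build GetItem parameters for boto3's dynamodb.get_item(**params).

    ConsistentRead defaults to False (an eventually consistent read);
    setting it to True requests a strongly consistent read instead.
    """
    return {"TableName": table_name, "Key": key, "ConsistentRead": strong}

# Eventually consistent (the default, best read performance):
eventual = get_item_params("Orders", {"OrderId": {"S": "42"}})

# Strongly consistent (reflects all acknowledged writes):
strong = get_item_params("Orders", {"OrderId": {"S": "42"}}, strong=True)
```

Note that a strongly consistent read costs twice the read capacity of an eventually consistent one, which is why eventual consistency is the default.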
DynamoDB Accelerator (DAX) [SAA-C02]
- Fully managed, highly available, in-memory cache
- 10x performance improvement
- Reduces request time from milliseconds to microseconds - even under load
- No need for developers to manage caching logic
- Compatible with DynamoDB API calls
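The last two bullets are the point: without DAX, developers typically hand-roll cache-aside logic like the sketch below (a plain dict stands in for the DynamoDB table; nothing here is DAX's actual API). DAX makes this code unnecessary, because its client is API-compatible with DynamoDB and reads flow through its in-memory cache transparently.

```python
class CacheAsideReader:
    """Hand-rolled cache-aside pattern that DAX replaces."""

    def __init__(self, table):
        self.table = table   # stand-in for a DynamoDB table
        self.cache = {}      # the cache DAX would manage for you
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.table.get(key)  # the "real" DynamoDB read
        self.cache[key] = value
        return value
```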
DynamoDB Backup and Restore [SAA-C02]
Point-in-Time Recovery (PITR)
- Protects against accidental writes or deletes
- Restore to any point in the last 35 days
- Incremental backups
- Not enabled by default
- Latest restorable: five minutes in the past
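Because PITR is off by default, it has to be enabled explicitly per table. A sketch of the parameters for boto3's `update_continuous_backups` call (the table name is illustrative):

```python
def enable_pitr_params(table_name):
    """Parameters for boto3's dynamodb.update_continuous_backups(**params)
    to turn on point-in-time recovery, which is off by default."""
    return {
        "TableName": table_name,
        "PointInTimeRecoverySpecification": {
            "PointInTimeRecoveryEnabled": True,
        },
    }

params = enable_pitr_params("Orders")
```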
DynamoDB Streams [SAA-C02]
A DynamoDB stream is an ordered flow of information about changes to items in an Amazon DynamoDB table. When you enable a stream on a table, DynamoDB captures information about every modification to data items in the table.
When you enable DynamoDB Streams on a table, you can associate the stream ARN with a Lambda function that you write. Immediately after an item in the table is modified, a new record appears in the table’s stream. AWS Lambda polls the stream and invokes your Lambda function synchronously when it detects new stream records.
- Time-ordered sequence of item-level changes in a table
- Stored for 24 hours
- Inserts, updates and deletes
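A minimal sketch of the Lambda side, assuming the event shape DynamoDB Streams delivers: each record carries an `eventName` of INSERT, MODIFY, or REMOVE (and, depending on the stream view type, the item's old and/or new image, which this sketch ignores).

```python
def handler(event, context=None):
    """Count item-level changes in one DynamoDB Streams batch."""
    counts = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    for record in event.get("Records", []):
        counts[record["eventName"]] += 1
    return counts

# Example event, trimmed to the only field the handler reads:
sample_event = {
    "Records": [
        {"eventName": "INSERT"},
        {"eventName": "MODIFY"},
        {"eventName": "INSERT"},
    ]
}
```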
DynamoDB - Global Tables [SAA-C02]
- Globally distributed applications
- Based on DynamoDB streams
- Multi-Region redundancy for DR or HA
- No application rewrites
- Replication latency under one second
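Adding a replica Region to a (version 2019.11.21) global table is a table update, not an application change, which is what "no application rewrites" means in practice. A sketch of the parameters for boto3's `update_table` call (table name and Region are illustrative):

```python
def add_replica_params(table_name, region):
    """Parameters for boto3's dynamodb.update_table(**params) to add a
    replica Region to a version 2019.11.21 global table."""
    return {
        "TableName": table_name,
        "ReplicaUpdates": [{"Create": {"RegionName": region}}],
    }

params = add_replica_params("Orders", "eu-west-1")
```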
Redshift
Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud. Customers can start small for just $0.25 per hour with no commitments or upfront costs and scale to a petabyte or more for $1,000 per terabyte per year, less than a tenth of the cost of most other data warehousing solutions.
- Redshift is used for Business Intelligence
- Available in only 1 AZ
Redshift Configuration
- Single Node (up to 160 GB)
- Multi-Node
- Leader Node (manages client connections and receives queries)
- Compute Nodes (store data and perform queries and computations); up to 128 Compute Nodes
Columnar Data Storage
Instead of storing data as a series of rows, Amazon Redshift organizes the data by column. Unlike row-based systems, which are ideal for transaction processing, column-based systems are ideal for data warehousing and analytics, where queries often involve aggregates performed over large data sets. Since only the columns involved in the queries are processed and columnar data is stored sequentially on the storage media, column-based systems require far fewer I/Os, greatly improving query performance.
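A toy illustration of that I/O argument: to aggregate one column, a row store must touch every field of every row, while a column store touches only the one column it needs (the data below is made up).

```python
rows = [  # row-oriented layout: one record per row
    {"id": 1, "region": "eu", "sales": 10},
    {"id": 2, "region": "us", "sales": 20},
    {"id": 3, "region": "eu", "sales": 30},
]

columns = {  # column-oriented layout: one array per column
    "id": [1, 2, 3],
    "region": ["eu", "us", "eu"],
    "sales": [10, 20, 30],
}

# SELECT SUM(sales): a row store scans every field of every row...
row_fields_read = sum(len(r) for r in rows)

# ...while a column store reads only the "sales" column.
col_fields_read = len(columns["sales"])

total = sum(columns["sales"])
```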
Advanced Compression
Columnar data stores can be compressed much more than row-based data stores because similar data is stored sequentially on disk. Amazon Redshift employs multiple compression techniques and can often achieve significant compression relative to traditional relational data stores. In addition, Amazon Redshift doesn’t require indexes or materialized views and so uses less space than traditional relational database systems.
When loading data into an empty table, Amazon Redshift automatically samples your data and selects the most appropriate compression scheme.
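A toy run-length encoder shows why sequentially stored similar values compress so well; Redshift's real encodings (e.g. RUNLENGTH, DELTA, LZO, ZSTD) are more sophisticated, but the principle is the same.

```python
def rle(values):
    """Run-length encode a column into [value, count] pairs.

    Similar values stored sequentially, as in a columnar store,
    collapse into very few pairs.
    """
    out = []
    for v in values:
        if out and out[-1][0] == v:
            out[-1][1] += 1
        else:
            out.append([v, 1])
    return out
```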
Massively Parallel Processing (MPP)
Amazon Redshift automatically distributes data and query load across all nodes.
Amazon Redshift makes it easy to add nodes to your data warehouse and enables you to maintain fast query performance as your data warehouse grows.
Redshift - Backups
- Enabled by default with a 1 day retention period.
- Maximum retention period is 35 days.
- Redshift always attempts to maintain at least three copies of your data (the original and replica on the compute nodes and a backup in Amazon S3).
- Redshift can also asynchronously replicate your snapshots to S3 in another region for disaster recovery.
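Cross-Region snapshot copy is enabled per cluster. A sketch of the parameters for boto3's `redshift.enable_snapshot_copy` call (the cluster identifier, Region, and retention period are illustrative):

```python
def cross_region_copy_params(cluster_id, dest_region, retention_days=7):
    """Parameters for boto3's redshift.enable_snapshot_copy(**params) to
    replicate automated snapshots to another Region for DR."""
    return {
        "ClusterIdentifier": cluster_id,
        "DestinationRegion": dest_region,
        "RetentionPeriod": retention_days,
    }

params = cross_region_copy_params("bi-cluster", "us-west-2")
```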
Aurora Scaling
- Starts with 10 GB and scales in 10 GB increments up to 64 TB (Storage Auto Scaling)
- Compute resources can scale up to 32 vCPUs and 244 GB of memory
- Two copies of your data are contained in each Availability Zone, with a minimum of three Availability Zones, giving six copies of your data in total
- You can share Aurora Snapshots with other AWS accounts.
- Use Aurora Serverless if you want a simple, cost-effective option for infrequent, intermittent, or unpredictable workloads.
- Aurora is designed to transparently handle the loss of up to two copies of data without affecting database write availability and up to three copies without affecting read availability
- Aurora storage is also self-healing. Data blocks and disks are continuously scanned for errors and repaired automatically
Aurora Replicas
3 Types of Replicas are available:
- Aurora Replicas (currently 15)
- MySQL Read Replicas (currently 5)
- PostgreSQL Replicas
Automated failover is only available with Aurora Replicas.
Aurora - Additional Tips
- Aurora has automated backups turned on by default. You can also take snapshots with Aurora.
- You can share Aurora Snapshots with other AWS accounts.