6_DynamoDB, Redshift, Elasticache, Aurora Flashcards

1
Q

DynamoDB vs RDS

DynamoDB offers “push button” scaling, meaning that you can scale your database on the fly, without any downtime.

RDS is not so easy and you usually have to use a bigger instance size (scale up) or to add a read replica.

A

DynamoDB vs RDS

DynamoDB offers “push button” scaling, meaning that you can scale your database on the fly, without any downtime.

RDS is not so easy and you usually have to use a bigger instance size (scale up) or to add a read replica.
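
As a rough illustration of "push button" scaling, the boto3 sketch below bumps a DynamoDB table's provisioned throughput in place; the table name and capacity values are placeholders.

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Scale an existing table's provisioned throughput on the fly;
# no downtime and no instance resizing is involved.
dynamodb.update_table(
    TableName="my-example-table",  # placeholder table name
    ProvisionedThroughput={
        "ReadCapacityUnits": 100,
        "WriteCapacityUnits": 50,
    },
)
```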

2
Q

DynamoDB

  • Stored exclusively on SSD storage to provide high I/O performance
  • Spread across 3 geographically distinct data centres
  • Eventually Consistent Reads (default)
    • Consistency across all copies of data is usually reached within a second. Repeating a read after a short time should return the updated data (Best read performance)
  • Strongly Consistent Reads
    • A strongly consistent read returns a result that reflects all writes that received a successful response prior to the read
A

DynamoDB

  • Stored on SSD storage
  • Spread across 3 geographically distinct data centres
  • Eventually Consistent Reads (default)
    • Consistency across all copies of data is usually reached within a second. Repeating a read after a short time should return the updated data (Best read performance)
  • Strongly Consistent Reads
    • A strongly consistent read returns a result that reflects all writes that received a successful response prior to the read
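
A minimal boto3 sketch of the two read modes, assuming a hypothetical table keyed on `id`; the only difference is the `ConsistentRead` flag.

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")
key = {"id": {"S": "item-123"}}  # placeholder key for a hypothetical table

# Eventually consistent read (default): best read performance,
# but may briefly return stale data after a write.
eventual = dynamodb.get_item(TableName="my-example-table", Key=key)

# Strongly consistent read: reflects all writes acknowledged before the read,
# at the cost of higher read capacity usage.
strong = dynamodb.get_item(
    TableName="my-example-table", Key=key, ConsistentRead=True
)
```
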
3
Q

DynamoDB Accelerator (DAX) [SAA-C02]

  • Fully managed, highly available, in-memory cache
  • 10x performance improvement
  • Reduces request time from milliseconds to microseconds - even under load
  • No need for developers to manage caching logic
  • Compatible with DynamoDB API calls
A
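
Because DAX is compatible with the DynamoDB API, swapping the client is usually the only code change. A rough sketch assuming the `amazon-dax-client` Python package and a placeholder cluster endpoint:

```python
import botocore.session
from amazondax import AmazonDaxClient

session = botocore.session.get_session()

# Point the client at the DAX cluster endpoint (placeholder) instead of
# DynamoDB itself; reads are served from the in-memory cache when possible.
dax = AmazonDaxClient(
    session,
    region_name="us-east-1",
    endpoints=["my-dax-cluster.xxxx.dax-clusters.us-east-1.amazonaws.com:8111"],
)

# Same DynamoDB API call as before - no separate caching logic to manage.
item = dax.get_item(
    TableName="my-example-table",
    Key={"id": {"S": "item-123"}},
)
```
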
4
Q

DynamoDB Backup and Restore [SAA-C02]

Point-in-Time Recovery (PITR)

  • Protects against accidental writes or deletes
  • Restore to any point in the last 35 days
  • Incremental backups
  • Not enabled by default
  • Latest restorable: five minutes in the past
A
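
Since PITR is not enabled by default, it has to be switched on per table. A minimal boto3 sketch with placeholder table names:

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Enable point-in-time recovery on an existing table (off by default).
dynamodb.update_continuous_backups(
    TableName="my-example-table",
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)

# Later, restore to the latest restorable time (roughly five minutes ago)
# into a new table; you can also pass RestoreDateTime for an exact point.
dynamodb.restore_table_to_point_in_time(
    SourceTableName="my-example-table",
    TargetTableName="my-example-table-restored",
    UseLatestRestorableTime=True,
)
```
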
5
Q

DynamoDB Streams [SAA-C02]

A DynamoDB stream is an ordered flow of information about changes to items in an Amazon DynamoDB table. When you enable a stream on a table, DynamoDB captures information about every modification to data items in the table.

When you enable DynamoDB Streams on a table, you can associate the stream ARN with a Lambda function that you write. Immediately after an item in the table is modified, a new record appears in the table’s stream. AWS Lambda polls the stream and invokes your Lambda function synchronously when it detects new stream records.

  • Time-ordered sequence of item-level changes in a table
  • Stored for 24 hours
  • Inserts, updates and deletes
A
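
A sketch of the Lambda side: AWS Lambda polls the stream and hands batches of records to a handler like the one below (table and attribute names are hypothetical).

```python
# Lambda handler invoked with batches of DynamoDB stream records.
def handler(event, context):
    for record in event["Records"]:
        event_name = record["eventName"]                 # INSERT, MODIFY or REMOVE
        keys = record["dynamodb"]["Keys"]                # item's primary key
        new_image = record["dynamodb"].get("NewImage")   # absent on REMOVE

        if event_name == "INSERT":
            print(f"New item: {keys} -> {new_image}")
        elif event_name == "MODIFY":
            print(f"Updated item: {keys}")
        elif event_name == "REMOVE":
            print(f"Deleted item: {keys}")
```
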
6
Q

DynamoDB - Global Tables [SAA-C02]

  • Globally distributed applications
  • Based on DynamoDB streams
  • Multi-region redundancy for DR or HA
  • No application rewrites
  • Replication latency under one second
A
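
Because global tables are built on DynamoDB Streams, each regional table needs streams enabled before it can join the group. A rough boto3 sketch using the original `create_global_table` API, with placeholder names and regions:

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Assumes identically named tables with streams (NEW_AND_OLD_IMAGES)
# already exist in both regions.
dynamodb.create_global_table(
    GlobalTableName="my-example-table",
    ReplicationGroup=[
        {"RegionName": "us-east-1"},
        {"RegionName": "eu-west-1"},
    ],
)
```
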
7
Q

Redshift

Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud. Customers can start small for just $0.25 per hour with no commitments or upfront costs and scale to a petabyte or more for $1,000 per terabyte per year, less than a tenth of the cost of most other data warehousing solutions.

  • Redshift is used for Business Intelligence
  • Available in only 1 AZ
A

Redshift

Amazon Redshift is a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud. Customers can start small for just $0.25 per hour with no commitments or upfront costs and scale to a petabyte or more for $1,000 per terabyte per year, less than a tenth of the cost of most other data warehousing solutions.

8
Q

Redshift Configuration

  • Single Node (up to 160 GB)
  • Multi-Node
    • Leader Node (manages client connections and receives queries)
    • Compute Nodes (store data and perform queries and computations). Up to 128 Compute Nodes
A

Redshift Configuration

  • Single Node (up to 160 GB)
  • Multi-Node
    • Leader Node (manages client connections and receives queries)
    • Compute Nodes (store data and perform queries and computations). Up to 128 Compute Nodes
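
A rough boto3 sketch of provisioning a multi-node cluster (one leader node plus the requested compute nodes); identifiers and credentials are placeholders.

```python
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# Multi-node cluster: Redshift adds a leader node automatically and
# provisions the requested number of compute nodes.
redshift.create_cluster(
    ClusterIdentifier="my-example-cluster",
    ClusterType="multi-node",
    NodeType="dc2.large",
    NumberOfNodes=4,               # compute nodes (up to 128)
    MasterUsername="awsuser",
    MasterUserPassword="ExamplePassw0rd!",  # placeholder
    DBName="analytics",
)
```
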
9
Q

Columnar Data Storage

Instead of storing data as a series of rows, Amazon Redshift organizes the data by column. Unlike row-based systems, which are ideal for transaction processing, column-based systems are ideal for data warehousing and analytics, where queries often involve aggregates performed over large data sets. Since only the columns involved in the queries are processed and columnar data is stored sequentially on the storage media, column-based systems require far fewer I/Os, greatly improving query performance.

A

Columnar Data Storage

Instead of storing data as a series of rows, Amazon Redshift organizes the data by column. Unlike row-based systems, which are ideal for transaction processing, column-based systems are ideal for data warehousing and analytics, where queries often involve aggregates performed over large data sets. Since only the columns involved in the queries are processed and columnar data is stored sequentially on the storage media, column-based systems require far fewer I/Os, greatly improving query performance.
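
A toy, pure-Python illustration of the idea (not how Redshift is implemented): with a column-oriented layout, an aggregate only touches the one column it needs.

```python
# Row-oriented layout: every row must be read to reach one field.
rows = [
    {"order_id": 1, "region": "EU", "amount": 10.0},
    {"order_id": 2, "region": "US", "amount": 25.0},
    {"order_id": 3, "region": "EU", "amount": 40.0},
]
total_row_store = sum(row["amount"] for row in rows)

# Column-oriented layout: the aggregate scans only the "amount" column,
# which is stored sequentially - far fewer I/Os for warehouse-style queries.
columns = {
    "order_id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "amount": [10.0, 25.0, 40.0],
}
total_column_store = sum(columns["amount"])

assert total_row_store == total_column_store == 75.0
```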

10
Q

Advanced Compression

Columnar data stores can be compressed much more than row-based data stores because similar data is stored sequentially on disk. Amazon Redshift employs multiple compression techniques and can often achieve significant compression relative to traditional relational data stores. In addition, Amazon Redshift doesn’t require indexes or materialized views and so uses less space than traditional relational database systems.

When loading data into an empty table, Amazon Redshift automatically samples your data and selects the most appropriate compression scheme.

A

Advanced Compression

Columnar data stores can be compressed much more than row-based data stores because similar data is stored sequentially on disk. Amazon Redshift employs multiple compression techniques and can often achieve significant compression relative to traditional relational data stores. In addition, Amazon Redshift doesn’t require indexes or materialized views and so uses less space than traditional relational database systems.

When loading data into an empty table, Amazon Redshift automatically samples your data and selects the most appropriate compression scheme.

11
Q

Massively Parallel Processing (MPP)

Amazon Redshift automatically distributes data and query load across all nodes.
Amazon Redshift makes it easy to add nodes to your data warehouse and enables you to maintain fast query performance as your data warehouse grows.

A

Massively Parallel Processing (MPP)

Amazon Redshift automatically distributes data and query load across all nodes.
Amazon Redshift makes it easy to add nodes to your data warehouse and enables you to maintain fast query performance as your data warehouse grows.

12
Q

Redshift - Backups

  • Enabled by default with a 1 day retention period.
  • Maximum retention period is 35 days.
  • Redshift always attempts to maintain at least three copies of your data (the original and replica on the compute nodes and a backup in Amazon S3).
  • Redshift can also asynchronously replicate your snapshots to S3 in another region for disaster recovery.
A
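
A minimal boto3 sketch of two backup-related actions mentioned above: taking a manual snapshot and enabling asynchronous cross-region snapshot copy (identifiers and region are placeholders).

```python
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# Manual snapshot on top of the automated (default 1-day retention) backups.
redshift.create_cluster_snapshot(
    SnapshotIdentifier="my-example-cluster-manual-snap",
    ClusterIdentifier="my-example-cluster",
)

# Asynchronously copy snapshots to another region for disaster recovery.
redshift.enable_snapshot_copy(
    ClusterIdentifier="my-example-cluster",
    DestinationRegion="eu-west-1",
    RetentionPeriod=7,  # days to keep copied automated snapshots
)
```
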
13
Q

Aurora Scaling

  • Starts with 10 GB and scales in 10 GB increments up to 64 TB (Storage Autoscaling)
  • Compute resources can scale up to 32 vCPUs and 244 GB of memory
  • 2 copies of your data are contained in each Availability Zone, with a minimum of 3 Availability Zones, giving 6 copies of your data
  • You can share Aurora Snapshots with other AWS accounts.
  • Use Aurora Serverless if you want a simple, cost-effective option for infrequent, intermittent, or unpredictable workloads.
A

Aurora Scaling

  • Starts with 10 GB and scales in 10 GB increments up to 64 TB (Storage Autoscaling)
  • Compute resources can scale up to 32 vCPUs and 244 GB of memory
  • 2 copies of your data are contained in each Availability Zone, with a minimum of 3 Availability Zones, giving 6 copies of your data
  • Aurora is designed to transparently handle the loss of up to two copies of data without affecting database write availability and up to three copies without affecting read availability
  • Aurora storage is also self-healing. Data blocks and disks are continuously scanned for errors and repaired automatically
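
As a sketch of the Aurora Serverless option mentioned above (placeholder identifiers; Serverless v1-style scaling configuration assumed):

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Aurora Serverless cluster: capacity scales automatically between the
# configured ACU bounds and can pause when idle - suited to infrequent,
# intermittent or unpredictable workloads.
rds.create_db_cluster(
    DBClusterIdentifier="my-example-serverless-cluster",
    Engine="aurora-mysql",
    EngineMode="serverless",
    MasterUsername="admin",
    MasterUserPassword="ExamplePassw0rd!",  # placeholder
    ScalingConfiguration={
        "MinCapacity": 1,
        "MaxCapacity": 8,
        "AutoPause": True,
        "SecondsUntilAutoPause": 300,
    },
)
```
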
14
Q

Aurora Replicas

3 Types of Replicas are available:

  • Aurora Replicas (currently 15)
  • MySQL Read Replicas (currently 5)
  • PostgreSQL Replicas

Automated failover is only available with Aurora Replicas.

A

Aurora Replicas

3 Types of Replicas are available:

  • Aurora Replicas (currently 15)
  • MySQL Read Replicas (currently 5)
  • PostgreSQL Replicas

Automated failover is only available with Aurora Replicas.
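
Aurora Replicas are added by creating additional DB instances inside the existing cluster; a rough boto3 sketch with placeholder identifiers:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Adding an instance to an existing Aurora cluster that already has a
# writer creates an Aurora Replica (up to 15 per cluster); these are the
# replicas that support automated failover.
rds.create_db_instance(
    DBInstanceIdentifier="my-example-aurora-replica-1",
    DBInstanceClass="db.r5.large",
    Engine="aurora-mysql",
    DBClusterIdentifier="my-example-aurora-cluster",
)
```
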
15
Q

Aurora - Additional Tips

  • Aurora has automated backup turned on by default. You can also take Snapshots with Aurora.
  • You can share Aurora Snapshots with other AWS accounts.
A
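
Sharing a manual cluster snapshot with another account is a one-call change to the snapshot's restore attribute; a boto3 sketch with placeholder identifiers and account ID:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Take a manual snapshot of the Aurora cluster.
rds.create_db_cluster_snapshot(
    DBClusterSnapshotIdentifier="my-example-cluster-snap",
    DBClusterIdentifier="my-example-aurora-cluster",
)

# Share it with another AWS account (placeholder account ID); that account
# can then restore its own cluster from the snapshot.
rds.modify_db_cluster_snapshot_attribute(
    DBClusterSnapshotIdentifier="my-example-cluster-snap",
    AttributeName="restore",
    ValuesToAdd=["123456789012"],
)
```
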
16
Q

Elasticache

Elasticache is a web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud. The service improves the performance of web applications by allowing you to retrieve information from fast, managed, in-memory caches, instead of relying entirely on slower disk-based databases.

Caching improves application performance by storing critical pieces of data in memory for low-latency access. Cached information may include the results of I/O-intensive database queries or the results of computationally-intensive calculations.

Use Elasticache to increase database and web application performance.

A

Elasticache

Elasticache is a web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud. The service improves the performance of web applications by allowing you to retrieve information from fast, managed, in-memory caches, instead of relying entirely on slower disk-based databases.

Caching improves application performance by storing critical pieces of data in memory for low-latency access. Cached information may include the results of I/O-intensive database queries or the results of computationally-intensive calculations.
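
A common pattern here is lazy loading: check the cache first and only hit the database on a miss. A sketch using the redis-py client; the endpoint, TTL and `query_database` helper are hypothetical.

```python
import json
import redis

# Placeholder ElastiCache (Redis) endpoint.
cache = redis.Redis(host="my-cache.xxxxxx.0001.use1.cache.amazonaws.com", port=6379)


def query_database(user_id):
    """Hypothetical stand-in for a slow, disk-based database query."""
    return {"id": user_id, "name": "example"}


def get_user(user_id):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)           # cache hit: low-latency, in-memory

    user = query_database(user_id)           # cache miss: fall back to the DB
    cache.setex(key, 300, json.dumps(user))  # store with a 5-minute TTL
    return user
```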

17
Q

Types of Elasticache

  • Memcached
    • A widely adopted memory object caching system. Elasticache is protocol compliant with Memcached, so popular tools that you use today with existing Memcached environments will work seamlessly with the service.
  • Redis
    • A popular open-source in-memory key-value store that supports data structures such as sorted sets and lists. Elasticache supports Master/Slave replication and Multi-AZ which can be used to achieve cross AZ redundancy.
      • Redis is multi-AZ
      • You can do backups and restores of Redis
A

Types of Elasticache

  • Memcached
    • A widely adopted memory object caching system. Elasticache is protocol compliant with Memcached, so popular tools that you use today with existing Memcached environments will work seamlessly with the service.
  • Redis
    • A popular open-source in-memory key-value store that supports data structures such as sorted sets and lists. Elasticache supports Master/Slave replication and Multi-AZ which can be used to achieve cross AZ redundancy.
18
Q

Elasticache Exam Tips

Typically you will be given a scenario where a particular database is under a lot of stress/load. You may be asked which service you should use to alleviate this.

Elasticache is a good choice if your database is particularly read-heavy and not prone to frequent changes (use Read Replicas instead when the main DB receives more writes/updates).

Redshift is a good answer if the reason your database is under stress is that management keeps running OLAP transactions (analytics queries) on it.

A

Elasticache Exam Tips

Typically you will be given a scenario where a particular database is under a lot of stress/load. You may be asked which service you should use to alleviate this.

Elasticache is a good choice if your database is particularly read-heavy and not prone to frequent changes (use Read Replicas instead when the main DB receives more writes/updates).

Redshift is a good answer if the reason your database is under stress is that management keeps running OLAP transactions (analytics queries) on it.

19
Q

DMS [SAA-C02]

  • DMS allows you to migrate databases from one source to AWS
  • The source can either be on-premises, inside AWS itself, or another cloud provider such as Azure
  • You can do homogeneous migrations (same DB engines) or heterogeneous migrations (different DB engines)
  • If you do a heterogeneous migration, you will need the AWS Schema Conversion Tool (SCT)
A
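
A rough boto3 sketch of kicking off a migration task once the source endpoint, target endpoint and replication instance already exist (all ARNs below are placeholders):

```python
import json

import boto3

dms = boto3.client("dms", region_name="us-east-1")

# Full load plus ongoing change data capture from source to target.
dms.create_replication_task(
    ReplicationTaskIdentifier="my-example-migration",
    SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:SRC",       # placeholder
    TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:TGT",       # placeholder
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:INSTANCE",  # placeholder
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)
```
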
20
Q

Caching Strategies on AWS [SAA-C02]

Caching is a balancing act between up-to-date, accurate information and latency. We can use the following services to cache on AWS:

  • CloudFront
  • API Gateway
  • ElastiCache - Memcached and Redis
  • DynamoDB Accelerator (DAX)
A
21
Q

EMR [SAA-C02]

  • EMR is used for big data processing
  • Consists of a master node, a core node, and optionally a task node
  • By default, log data is stored on the master node
  • You can configure replication to S3 at five-minute intervals for all log data from the master node; however, this can only be configured when creating the cluster for the first time
A
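
Because S3 log replication can only be set up at cluster creation, the LogUri has to be passed when the cluster is launched; a rough boto3 sketch with placeholder names and bucket:

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# LogUri must be supplied here - log shipping to S3 (roughly every five
# minutes from the master node) cannot be enabled after creation.
emr.run_job_flow(
    Name="my-example-cluster",
    ReleaseLabel="emr-6.9.0",
    LogUri="s3://my-example-bucket/emr-logs/",   # placeholder bucket
    Instances={
        "MasterInstanceType": "m5.xlarge",       # master node
        "SlaveInstanceType": "m5.xlarge",        # core (and task) nodes
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
```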