Databases and Analytics Flashcards
What does RDS stand for?
Relational Database Service
What is the primary language of RDS?
SQL
What is the name of Amazon’s proprietary database service?
Aurora
Give at least 3 reasons why you might use RDS instead of deploying a database on EC2 yourself.
- AWS will manage provisioning and patching
- It continuously backs up your data and can restore to a specific timestamp
- You can have a Multi-AZ setup for disaster recovery
- It will be easy to scale both horizontally and vertically
- You get monitoring dashboards
Why might you use Aurora over RDS? Why not?
Use:
* Claimed to have a 5x performance improvement over MySQL on RDS
* Claimed to have 3x the performance of Postgres on RDS
Not:
* Aurora is 20% more expensive than standard RDS
What benefits could having a serverless version of Aurora bring (depending on use case of course)?
- No capacity planning required
- Little management overhead (all automatic)
- Pay per second (could be more efficient depending on use case)
What is a read replica?
A copy of a database created specifically to serve reads. It increases read throughput by giving applications multiple sources to read from.
Applications read from read replicas constantly. Can they also write to them directly?
No - read replicas are only for reading from. All writing is done to the main database.
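A minimal sketch of the read/write split described in this card, assuming the application itself picks the endpoint. The class name `ReplicaRouter`, the `endpoint_for` method and the `*.example` hostnames are hypothetical and are not part of any AWS API.

```python
import itertools


class ReplicaRouter:
    """Send writes to the main database and spread reads across read replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over the replicas; fall back to the primary if there are none.
        self._read_targets = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql):
        # Simplification: anything that is not a SELECT is treated as a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_targets) if is_read else self.primary


# Hypothetical endpoints, for illustration only.
router = ReplicaRouter("primary.example", ["replica-1.example", "replica-2.example"])
```

Calling `router.endpoint_for("SELECT ...")` cycles through the replicas, while any other statement is sent to the primary.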
What are the positives and negatives of a multi-region read replica system?
Multi-region read replicas allow multi-region applications to maintain fast read speeds, as each application can read from a database that is closer to it. They also provide a disaster recovery option if a region goes down.
However, there is a significant cost associated with this setup (for example, data transfer between regions) that must be considered before adopting it.
What is ElastiCache? Why would you use it?
An in-memory database with high performance and low latency.
Helps take load off databases for read-intensive workloads.
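One common way an in-memory cache takes read load off a database is the cache-aside pattern. The sketch below illustrates it with a plain dict standing in for the cache. `CacheAside`, `load_from_db` and `fake_db` are hypothetical names, and no ElastiCache client is used.

```python
class CacheAside:
    """Check the cache first; only go to the database on a miss."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # slow path: the real database lookup
        self._cache = {}                   # stand-in for an in-memory cache
        self.db_reads = 0                  # counts how often the database is hit

    def get(self, key):
        if key in self._cache:
            return self._cache[key]        # cache hit: no database work
        self.db_reads += 1
        value = self._load_from_db(key)    # cache miss: read from the database
        self._cache[key] = value           # remember the value for next time
        return value

    def invalidate(self, key):
        # Call this after a write so stale data is not served.
        self._cache.pop(key, None)


# A dict plays the role of the database in this sketch.
fake_db = {"user:1": "Ada"}
store = CacheAside(fake_db.get)
```

Reading the same key repeatedly hits the database once; later reads are served from memory until the key is invalidated.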
What is DynamoDB?
A fully managed, highly available database.
Serverless and thus hugely scalable.
Key-value and non-relational.
What tiering option is available for DynamoDB that is similar to S3 storage?
Standard and infrequent access table classes for cost saving
If you need an in-memory cache for DynamoDB, what service would you use?
DynamoDB Accelerator (DAX). NOT ElastiCache - DAX is well integrated with DynamoDB
What are DynamoDB global tables?
Tables that are accessible in multiple regions with low latency, achieved by setting up two-way replication between the regions. You can read from and write to the table in any region.
What is Redshift?
A database service based on PostgreSQL used for online analytical processing and data warehousing
What is the Redshift pricing model?
Pay as you go
What is Elastic MapReduce (EMR)? What would you use it for?
A service that allows you to create Hadoop clusters and use them to process a vast amount of data. Used for ML, big data, data processing etc.
What is Athena?
A serverless query service to perform analytics against S3 objects using SQL
What is QuickSight?
A business intelligence service used to create interactive dashboards
What is DocumentDB?
A MongoDB-compatible database service used to store, query and index JSON data.
What is Neptune?
A graph database service used to build and run applications working with highly connected datasets
What is Quantum Ledger Database (QLDB)?
A database used to review the history of all the changes made to your application data over time (a ledger!!). Data can be queried and manipulated with SQL-like statements.
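To make the "ledger" idea concrete, here is a toy append-only log in which every change is kept and the full history of a key can be replayed. The hash chain is an illustrative addition showing one way tampering could be detected. It is a sketch under that assumption, not a description of how the service is implemented, and `Ledger` is a hypothetical class.

```python
import hashlib
import json


class Ledger:
    """Append-only record of changes; nothing is updated or deleted in place."""

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(key, value, prev_hash):
        body = json.dumps({"key": key, "value": value, "prev": prev_hash}, sort_keys=True)
        return hashlib.sha256(body.encode()).hexdigest()

    def append(self, key, value):
        # Each entry stores the hash of the previous entry, forming a chain.
        prev_hash = self._entries[-1]["hash"] if self._entries else ""
        self._entries.append({
            "key": key,
            "value": value,
            "prev": prev_hash,
            "hash": self._digest(key, value, prev_hash),
        })

    def history(self, key):
        # Every value the key has ever had, oldest first.
        return [e["value"] for e in self._entries if e["key"] == key]

    def verify(self):
        # Recompute the chain; any edited entry breaks it.
        prev_hash = ""
        for e in self._entries:
            if e["prev"] != prev_hash or e["hash"] != self._digest(e["key"], e["value"], prev_hash):
                return False
            prev_hash = e["hash"]
        return True


ledger = Ledger()
ledger.append("balance", 100)
ledger.append("balance", 80)
```

Here `ledger.history("balance")` returns both recorded values in order, and `ledger.verify()` fails if any earlier entry is altered afterwards.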
What does the ETL in Glue ETL stand for?
Extract, transform and load
What does Glue ETL do?
Allows us to prepare and transform data so that it is ready for analytics processing
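The three ETL stages can be sketched in plain Python. This is a stand-in to show the shape of an extract, transform and load pipeline; it does not use the Glue API, and the sample data, column names and `warehouse` list are all hypothetical.

```python
import csv
import io

# Hypothetical raw input with messy whitespace, a missing amount and mixed case.
RAW_CSV = "order_id,amount,country\n1, 19.99 ,gb\n2,,us\n3,5.00,US\n"


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean and normalise rows so they are ready for analytics."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # drop rows with no amount
        cleaned.append({
            "order_id": int(row["order_id"]),
            "amount": float(amount),
            "country": row["country"].strip().upper(),
        })
    return cleaned


def load(rows, target):
    """Load: write the prepared rows to the destination."""
    target.extend(rows)
    return len(rows)


warehouse = []  # stand-in for the analytics destination
loaded = load(transform(extract(RAW_CSV)), warehouse)
```

After the run, `warehouse` holds only the rows that survived cleaning, with numeric types and upper-case country codes.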
What is the Glue data catalog?
A catalog of datasets that can be brought in and used with Athena, Redshift, EMR etc.
What is the Database Migration Service?
A service that is run on EC2 instances that allows for the movement of a database from one place to another.
Allows the source database to remain available throughout the migration process.
What database system is Redshift based on?
PostgreSQL
What type of database is DynamoDB?
Key-value database