Databases & Analytics Flashcards
What are the types of DBs?
- Relational
- NoSQL
What are the benefits of using an AWS provided DB instead of deploying one on EC2?
- Quick Provisioning, Scaling
- Automated Backup & Restore (Point in Time restore)
- Operating System Patching
- Monitoring and Alerting
- Disaster Recovery
Only downside: no SSH access to the underlying server!
What is Amazon RDS?
Relational Database Service
It lets you create managed relational databases that are queried with SQL.
How is Amazon RDS structured?
Elastic Load Balancer
-> EC2 Instances (possibly with ASG)
-> will connect to relational SQL Database
What is Amazon Aurora?
Proprietary Tech from AWS (not open sourced)
Supports PostgreSQL and MySQL
It is cloud-optimized: AWS claims up to 5x the performance of MySQL on RDS and up to 3x the performance of PostgreSQL on RDS.
Notes:
Storage automatically grows up to 128 TB in increments of 10 GB.
Costs about 20% more than RDS, but since it is more efficient it can work out cheaper overall.
What is Amazon Aurora Serverless?
Automated DB instantiation and auto-scaling based on actual usage
Pay per second (can be more cost-effective)
Least (no) management overhead, no capacity planning needed
Diagram
Client -> Proxy Fleet (managed by Aurora) -> n Aurora Instances
What are RDS Read Replicas?
They are extra RDS instances that serve read operations (up to 15 read replicas).
Writes are still performed via the main RDS instance.
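The read/write split can be sketched in plain Python. This is a toy in-memory model, not the RDS API: the class `ReplicatedDatabase` and its methods are hypothetical names, and the explicit `replicate()` call stands in for the asynchronous replication that RDS performs on its own.

```python
import itertools
from typing import Optional


class ReplicatedDatabase:
    """Toy model: one primary takes writes, replicas serve reads."""

    def __init__(self, replica_count: int) -> None:
        if not 0 <= replica_count <= 15:  # limit quoted in the card
            raise ValueError("at most 15 read replicas")
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        # Round-robin over replicas; use the primary if there are none.
        self._targets = itertools.cycle(self.replicas or [self.primary])

    def write(self, key: str, value: str) -> None:
        """All writes go to the primary only."""
        self.primary[key] = value

    def replicate(self) -> None:
        """Copy the primary's state to every replica (stand-in for async replication)."""
        for replica in self.replicas:
            replica.clear()
            replica.update(self.primary)

    def read(self, key: str) -> Optional[str]:
        """Reads are spread across the replicas in turn."""
        return next(self._targets).get(key)


db = ReplicatedDatabase(replica_count=2)
db.write("user:1", "Ada")
stale = db.read("user:1")   # replica has not caught up yet
db.replicate()
fresh = db.read("user:1")   # replica now holds the written value
```

Because replication is asynchronous, a read issued right after a write may still see old data, which is why `stale` and `fresh` differ here.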
What is RDS Multi-AZ?
Failover in case of AZ outage.
Should the main RDS instance crash for whatever reason, the failover DB in a different AZ takes over.
You can only have one other AZ as the failover target.
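A minimal sketch of the failover idea, assuming a single standby in a second AZ. The class, the AZ names and the endpoint string are all hypothetical; the point is only that the application keeps using the same endpoint while the AZ behind it changes.

```python
class MultiAZDatabase:
    """Toy model of a primary DB with one standby in another AZ."""

    def __init__(self, endpoint, primary_az, standby_az):
        self.endpoint = endpoint      # what the application connects to
        self.active_az = primary_az   # AZ currently serving traffic
        self.standby_az = standby_az  # AZ holding the failover copy
        self.failed_azs = set()

    def report_outage(self, az):
        """Record an AZ outage and fail over if it hit the active AZ."""
        self.failed_azs.add(az)
        if az == self.active_az:
            self._fail_over()

    def _fail_over(self):
        if self.standby_az in self.failed_azs:
            raise RuntimeError("no healthy standby left to fail over to")
        # Swap roles: the standby is promoted, the endpoint stays the same.
        self.active_az, self.standby_az = self.standby_az, self.active_az

    def resolve(self):
        """Return where the unchanged endpoint currently points."""
        return f"{self.endpoint} -> {self.active_az}"


db = MultiAZDatabase("mydb.example.internal", "az-a", "az-b")
before = db.resolve()
db.report_outage("az-a")
after = db.resolve()
```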
What is RDS Multi-Region?
It is the Read Replica setup, but with the replicas spread across multiple Regions.
Writes still have to go through the main RDS instance in its own Region.
Reads can be served from the local Region (only writes cross Regions), which lowers read latency and improves disaster recovery.
Important: replicating data across Regions adds network cost.
What is Amazon ElastiCache?
A way to get managed Redis or Memcached
(Caches are in-memory databases with high performance and low latency)
Takes load off the database for read-intensive workloads and frequently accessed data.
AWS takes care of OS maintenance, patching, setup, monitoring, failure recovery and backups.
What does the architecture look like with ElastiCache?
-> Elastic Load Balancer -> EC2 Instances (possibly ASG)
-> (fast) read/write to ElastiCache
-> (slower) read/write to DB (Amazon RDS for instance)
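The fast-path/slow-path flow above is often implemented as a cache-aside pattern. The sketch below uses an in-memory dict in place of ElastiCache and a counting stub in place of RDS; `SlowDatabase`, `CacheAside` and the 60-second TTL are illustrative assumptions, not part of any AWS SDK.

```python
import time


class SlowDatabase:
    """Stand-in for the relational DB; counts how often it is queried."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.query_count = 0

    def query(self, key):
        self.query_count += 1
        return self.rows.get(key)


class CacheAside:
    """Check the cache first; on a miss, read the DB and store the result."""

    def __init__(self, database, ttl_seconds=60.0):
        self.database = database
        self.ttl_seconds = ttl_seconds
        self._entries = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                    # cache hit: DB untouched
        value = self.database.query(key)       # cache miss: slower DB read
        self._entries[key] = (value, now + self.ttl_seconds)
        return value

    def invalidate(self, key):
        """Drop a key after the DB row changes so stale data is not served."""
        self._entries.pop(key, None)


database = SlowDatabase({"product:42": "keyboard"})
cache = CacheAside(database)
first = cache.get("product:42")   # miss, goes to the DB
second = cache.get("product:42")  # hit, served from memory
```

The application code owns the invalidation strategy here, which is the main design cost of putting a cache in front of a database.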
What is DynamoDB?
Fully managed, highly available NoSQL DB with replication across 3 AZs
It scales to massive workloads as a distributed, serverless database (you create tables directly; there is no database server to provision)
Important: it is not a relational DB (no joins, foreign keys etc.)
Offers single-digit millisecond latency and is integrated with IAM
It also offers Standard and Infrequent Access (IA) table classes for cost optimization
What is DynamoDB Accelerator (DAX)?
In-Memory cache for DynamoDB.
Offers up to a 10x performance improvement for reads
What are Global Tables of DynamoDB?
A table that is replicated active-active across multiple selected Regions: users can read and write in any of them, instead of writing only to the main one as in RDS.
What is Redshift?
Data warehouse based on PostgreSQL, built for analytics (OLAP)
AWS claims about 10x better performance than other data warehouses; scales to PBs of data
Columnar storage of data (instead of row-based)
Massively Parallel Query Execution (MPP) is offered
Has a SQL Interface
Pay as you go (based on the instances provisioned)
Offers connections to BI tools like QuickSight or Tableau
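To see why columnar storage suits analytics, compare how a single-column aggregate touches the data in each layout. This is a plain-Python illustration with made-up sample rows, not a description of Redshift internals.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"order_id": 1, "region": "eu", "amount": 120.0},
    {"order_id": 2, "region": "us", "amount": 80.0},
    {"order_id": 3, "region": "eu", "amount": 50.0},
]


def to_columns(records):
    """Pivot row records into one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)


def total_from_rows(records, field):
    """Has to walk every full record to pick out one field."""
    return sum(record[field] for record in records)


def total_from_columns(cols, field):
    """Reads only the one column the query needs."""
    return sum(cols[field])


row_total = total_from_rows(rows, "amount")
column_total = total_from_columns(columns, "amount")
```

Both functions return the same total; the columnar version simply never looks at `order_id` or `region`, which is where the savings come from on wide tables.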
What is Redshift Serverless?
No provisioning or Scaling of infrastructure.
Pay only for what you use (compute and storage during queries), which saves costs
Use cases: reporting, dashboarding, real-time analytics.
What is Amazon EMR?
Elastic MapReduce helps analyze and process vast amounts of data by creating Hadoop clusters (Big Data).
It works well with Apache Spark, HBase and a lot more.
Hundreds of EC2 instances can be clustered.
Provides auto-scaling and can be integrated with Spot Instances
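The MapReduce model behind the service name can be shown with a single-process word count. On a real cluster the map and reduce steps run in parallel on many machines; this sketch only shows the three phases, and the function names are illustrative.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data", "Big clusters"])
```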
What is Amazon Athena?
Serverless Query service that performs analytics on S3 Objects
It can be connected to QuickSight for reporting and dashboards
Priced at around $5 per TB of data scanned.
The cost can be reduced by using compressed or columnar data (less data scanned)
Used for analysing logs, CloudTrail trails etc.
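A rough cost estimate follows directly from the per-TB price. The sketch below assumes the approximate $5/TB figure from the card, a binary terabyte, equally sized columns, and a hypothetical compression ratio; real bills depend on the Region and current pricing.

```python
PRICE_PER_TB_USD = 5.0    # approximate figure from the card
BYTES_PER_TB = 1024 ** 4  # assumption: binary terabyte


def query_cost_usd(bytes_scanned: float) -> float:
    """Cost of one query, proportional to the data it scans."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


def scanned_bytes(total_bytes: float, columns_read: int, columns_total: int,
                  compression_ratio: float = 1.0) -> float:
    """Estimate bytes scanned when only some columns are read.

    Assumes all columns are the same size; compression_ratio=4 means the
    stored data is a quarter of its raw size.
    """
    return total_bytes * (columns_read / columns_total) / compression_ratio


full_scan = query_cost_usd(BYTES_PER_TB)                               # whole 1 TB
columnar = query_cost_usd(scanned_bytes(BYTES_PER_TB, 2, 20))          # 2 of 20 columns
compressed = query_cost_usd(scanned_bytes(BYTES_PER_TB, 2, 20, 4.0))   # plus 4x compression
```

Under these assumptions, reading 2 of 20 columns cuts the cost to a tenth, and compressing that data 4x cuts it by a further factor of four.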
What is Amazon QuickSight?
Serverless, machine learning-powered business intelligence service to create interactive dashboards :)
It’s embeddable with per-session pricing.
It can run on top of RDS, Aurora, Athena, Redshift, S3 and many more …
What is DocumentDB?
AWS's managed, MongoDB-compatible document database (NoSQL DB)
What is Amazon Neptune?
A fully managed Graph Database.
Highly available across 3 AZs, with up to 15 read replicas.
Great for knowledge graphs, fraud detection, social networking etc.
What is Amazon Timestream?
Fully managed time series Database.
Auto scaling.
Lets you store and analyse trillions of events per day
AWS claims it can be up to 1,000x faster at 1/10th the cost of a relational DB
What is Amazon QLDB?
Quantum Ledger Database
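A ledger is a book recording financial transactions. QLDB is immutable: entries cannot be removed or modified (similar to a blockchain, but centrally managed rather than decentralized).
2-3x better performance than common ledger blockchain frameworks, and data can be manipulated using SQL.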
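One common way to make an append-only ledger tamper-evident is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below illustrates that general idea only; it is not QLDB's actual implementation, and the `Ledger` class is a hypothetical name.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


class Ledger:
    """Append-only journal where each entry is chained to the one before."""

    def __init__(self):
        self._entries = []

    def append(self, record):
        """Add a record and return its hash."""
        previous_hash = self._entries[-1]["hash"] if self._entries else GENESIS_HASH
        payload = json.dumps(record, sort_keys=True)
        entry_hash = hashlib.sha256((previous_hash + payload).encode()).hexdigest()
        self._entries.append(
            {"record": payload, "previous_hash": previous_hash, "hash": entry_hash}
        )
        return entry_hash

    def verify(self):
        """Recompute every hash; any edited entry breaks the chain."""
        previous_hash = GENESIS_HASH
        for entry in self._entries:
            expected = hashlib.sha256(
                (previous_hash + entry["record"]).encode()
            ).hexdigest()
            if entry["previous_hash"] != previous_hash or entry["hash"] != expected:
                return False
            previous_hash = entry["hash"]
        return True


ledger = Ledger()
ledger.append({"account": "A", "amount": 100})
ledger.append({"account": "B", "amount": -40})
intact = ledger.verify()
```

Editing any stored record after the fact changes its recomputed hash, so `verify()` returns `False` from that entry onward.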
What is Amazon Managed Blockchain?
Managed service for decentralized blockchains.
Lets you join public networks or create your own private one.
Compatible with Hyperledger Fabric and Ethereum.
What is AWS Glue?
Managed ETL Service (Extract Transform Load)
It can, for example, extract data from both S3 and RDS, glue it together (transform), and then load it into Redshift to analyze it there.
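The extract, transform, load sequence can be sketched end to end with the standard library. Everything here is a stand-in: the two `extract_*` functions return hard-coded sample data instead of reading S3 or RDS, and an in-memory SQLite database plays the role of the Redshift target. None of this uses the Glue API.

```python
import sqlite3


def extract_orders_from_s3():
    """Stand-in for reading order files from S3 (hypothetical sample data)."""
    return [
        {"customer_id": 1, "amount": 120.0},
        {"customer_id": 2, "amount": 80.0},
        {"customer_id": 1, "amount": 50.0},
    ]


def extract_customers_from_rds():
    """Stand-in for querying a customers table in RDS (hypothetical sample data)."""
    return [{"customer_id": 1, "name": "Ada"}, {"customer_id": 2, "name": "Grace"}]


def transform(orders, customers):
    """Join the two sources and total the order amounts per customer name."""
    names = {c["customer_id"]: c["name"] for c in customers}
    totals = {}
    for order in orders:
        name = names.get(order["customer_id"], "unknown")
        totals[name] = totals.get(name, 0.0) + order["amount"]
    return sorted(totals.items())


def load(rows, connection):
    """Write the transformed rows into the warehouse table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS revenue_by_customer (name TEXT PRIMARY KEY, total REAL)"
    )
    connection.executemany(
        "INSERT OR REPLACE INTO revenue_by_customer VALUES (?, ?)", rows
    )
    connection.commit()


def run_etl(connection):
    load(transform(extract_orders_from_s3(), extract_customers_from_rds()), connection)


warehouse = sqlite3.connect(":memory:")  # stand-in for Redshift
run_etl(warehouse)
result = warehouse.execute(
    "SELECT name, total FROM revenue_by_customer ORDER BY name"
).fetchall()
```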
What services can use the Glue Data Catalog?
Athena, Redshift, EMR to discover the datasets and do something with them.
What is AWS DMS (Database Migration Service)?
It lets you quickly and securely migrate databases to AWS. The source DB remains available during the migration.
It supports not only homogeneous migrations, like Oracle to Oracle on AWS, but also heterogeneous ones, like Microsoft SQL Server to Aurora.