Section 9: Databases and Analytics Flashcards
What language do Relational Databases use?
SQL
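Example (a minimal sketch, not tied to any AWS service): the query below uses Python's built-in `sqlite3` module to show what SQL against a relational table looks like. The `students` table and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database; the table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, grade INTEGER)"
)
conn.executemany(
    "INSERT INTO students (name, grade) VALUES (?, ?)",
    [("Ana", 90), ("Ben", 72), ("Cy", 85)],
)

# A declarative SQL query: filter rows, then sort them.
rows = conn.execute(
    "SELECT name, grade FROM students WHERE grade >= ? ORDER BY grade DESC",
    (80,),
).fetchall()
conn.close()
```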
What are NoSQL Databases?
Non-relational Databases
Benefits of NoSQL Databases
- Flexibility: easy to evolve data model
- Scalability: designed to scale-out by using distributed clusters
- High-performance: optimized for a specific data model
- Highly functional: types optimized for the data model
AWS RDS is
Relational Database Service that allows you to create databases in the cloud that are managed by AWS
Amazon Aurora is
A proprietary AWS relational database service, cloud-optimized and compatible with MySQL and PostgreSQL
Aurora is (more/less) expensive than RDS but (more/less) efficient
Aurora is more expensive than RDS but more efficient
What is Amazon Aurora Serverless?
Automated database instantiation and auto-scaling based on actual usage
Read Replica RDS Deployment
Scale the read workload of your DB
* Can create up to 15 Read Replicas
* Data is only written to the main DB
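Example (a toy model, not the RDS API): the sketch below illustrates the read replica idea, where writes go only to the main DB and reads are spread across replicas. The class name `ReplicatedDatabase` and its methods are hypothetical, and replication is modeled as an explicit step to show that replicas can lag behind the main DB.

```python
import itertools


class ReplicatedDatabase:
    """Toy model: one main DB that takes writes, N replicas that serve reads."""

    def __init__(self, replica_count):
        self.main = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        # Writes only ever go to the main database.
        self.main[key] = value

    def replicate(self):
        # Real replication is asynchronous; here it is an explicit step.
        for replica in self.replicas:
            replica.update(self.main)

    def read(self, key):
        # Reads are spread across the replicas in round-robin order.
        replica = self.replicas[next(self._next_replica)]
        return replica.get(key)


db = ReplicatedDatabase(replica_count=2)
db.write("user:1", "Ana")
stale = db.read("user:1")  # the replica has not caught up yet
db.replicate()
fresh = db.read("user:1")  # now the replica has the row
```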
Multi-AZ RDS Deployment
- Failover in case of AZ outage (high availability)
- Data is only read/written to the main database
- Can only have 1 other AZ as failover
Multi-Region RDS deployment
- Disaster recovery in case of region issue
- Local performance for global reads
- Replication cost
What is Amazon ElastiCache?
AWS-managed in-memory databases (Redis or Memcached) with high performance and low latency that help reduce the load on databases for read-intensive workloads
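Example (a minimal sketch of the cache-aside pattern, assuming a plain dict stands in for the in-memory store): the first read goes to the database, repeat reads are served from the cache. `query_database` and `get_user` are hypothetical names.

```python
stats = {"db_calls": 0}
cache = {}  # stand-in for an in-memory store such as Redis or Memcached


def query_database(user_id):
    # Stand-in for a slow database read; counts how often it is hit.
    stats["db_calls"] += 1
    return {"id": user_id, "name": f"user-{user_id}"}


def get_user(user_id):
    if user_id in cache:  # cache hit: the database is not touched
        return cache[user_id]
    user = query_database(user_id)  # cache miss: read from the database
    cache[user_id] = user  # store the result for later reads
    return user


first = get_user(7)
second = get_user(7)
```

Only the first call reaches the database, which is how a cache takes read load off it.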
DynamoDB
- Fully managed, highly available with replication across 3 AZs
- NoSQL database - not a relational database
- Scales to massive workloads, distributed “serverless” database
- Millions of requests per second, trillions of rows, 100s of TB of storage
- Fast and consistent in performance
- Single-digit millisecond latency – low latency retrieval
- Integrated with IAM for security, authorization and administration
- Low cost and auto scaling capabilities
- Standard & Infrequent Access (IA) Table Class
DynamoDB Accelerator - DAX
Fully Managed in-memory cache for DynamoDB only
DynamoDB – Global Tables
- Make a DynamoDB table accessible with low latency in multiple-regions
- Active-Active replication (read/write to any AWS Region)
Redshift
*OLAP
*Columnar Storage
*Massively Parallel Query Execution (MPP)
*SQL
*Data Warehouse
*BI tools integration
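Example (a conceptual sketch, not Redshift's actual storage format): columnar storage keeps the values of each column together, so an analytical query such as a sum only needs to scan the one column it uses. The sample orders are made up.

```python
# Row-oriented layout: one record per order.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120},
    {"order_id": 2, "region": "US", "amount": 80},
    {"order_id": 3, "region": "EU", "amount": 50},
]

# Column-oriented layout: one list of values per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An OLAP-style aggregate reads a single column instead of every full row.
total_amount = sum(columns["amount"])
```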
Redshift Serverless
- Automatically provisions and scales data warehouse underlying capacity
- Run analytics workloads without managing data warehouse infrastructure
- Pay only for what you use (save costs)
- Use cases: Reporting, dashboarding applications, real-time analytics…
Elastic MapReduce (EMR)
* Helps create Hadoop clusters (Big Data) to analyze and process vast amounts of data
* Auto-scaling and integrated with Spot instances
* Use cases: data processing, machine learning, web indexing, big data…
Athena
- Serverless query service to analyze data stored in Amazon S3
- Uses standard SQL language to query the files
- Exam Tip: analyze data in S3 using serverless SQL, use Athena
QuickSight
- Serverless machine learning-powered business intelligence service to create interactive dashboards
- Fast, automatically scalable, embeddable, with per-session pricing
- Use cases: Business analytics, Building visualizations, Perform ad-hoc analysis, Get business insights using data
DocumentDB
*The MongoDB of AWS
*NoSQL
*used to store, query, and index JSON data
* Fully Managed, highly available with replication across 3 AZ
*Automatically scales
Amazon Neptune
- Fully managed graph database
- Highly available across 3 AZ, with up to 15 read replicas
- Build and run applications working with highly connected datasets; optimized for these complex and hard queries
- Great for knowledge graphs (Wikipedia), fraud detection, recommendation engines, social networking
Amazon Timestream
- Fully managed, fast, scalable, serverless time series database
- Automatically scales up/down to adjust capacity
- Store and analyze trillions of events per day
- 1000s of times faster & 1/10th the cost of relational databases
- Built-in time series analytics functions (helps you identify patterns in your data in near real-time)
Quantum Ledger Database
- Fully managed, serverless, highly available, replication across 3 AZ
- Used to review history of all the changes made to your application data over time
- Immutable system: no entry can be removed or modified, cryptographically verifiable
- Difference with Amazon Managed Blockchain: no decentralization component, in accordance with financial regulation rules
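Example (a generic hash-chain illustration, not QLDB's actual implementation or API): each entry stores the hash of the previous one, so changing any past entry makes verification fail. This is one way to picture "immutable, cryptographically verifiable". The helper names `entry_hash`, `append`, and `verify` are hypothetical.

```python
import hashlib
import json


def entry_hash(data, previous_hash):
    # Hash the entry's data together with the previous entry's hash.
    payload = json.dumps({"data": data, "prev": previous_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(ledger, data):
    previous_hash = ledger[-1]["hash"] if ledger else ""
    ledger.append(
        {"data": data, "prev": previous_hash, "hash": entry_hash(data, previous_hash)}
    )


def verify(ledger):
    # Recompute every hash in order; any edited entry breaks the chain.
    previous_hash = ""
    for entry in ledger:
        expected = entry_hash(entry["data"], previous_hash)
        if entry["prev"] != previous_hash or entry["hash"] != expected:
            return False
        previous_hash = entry["hash"]
    return True


ledger = []
append(ledger, {"account": "A", "balance": 100})
append(ledger, {"account": "A", "balance": 70})
valid_before = verify(ledger)

ledger[0]["data"]["balance"] = 1_000_000  # tamper with history
valid_after = verify(ledger)
```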
Amazon Managed Blockchain
- Amazon Managed Blockchain is a managed service to join public blockchain networks or create your own scalable private network
- Compatible with the frameworks Hyperledger Fabric & Ethereum
AWS Glue
- Managed extract, transform, and load (ETL) service
- Useful to prepare and transform data for analytics
- Fully serverless service
- Glue Data Catalog: catalog of datasets that can be used by Athena, Redshift, EMR
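Example (a plain-Python sketch of the extract, transform, and load steps, not the AWS Glue API): raw CSV text is parsed, cleaned, and appended to a target list that stands in for an analytics store. The sample data and function names are hypothetical.

```python
import csv
import io

# Hypothetical raw input with messy names and a missing value.
raw_csv = "name,signup_date,spend\n ana ,2024-01-05,12.50\nBEN,2024-02-11,\n"


def extract(text):
    # Extract: parse the raw CSV into a list of dicts.
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    # Transform: normalize names, derive a month, default missing spend to 0.
    cleaned = []
    for record in records:
        cleaned.append(
            {
                "name": record["name"].strip().title(),
                "signup_month": record["signup_date"][:7],
                "spend": float(record["spend"] or 0),
            }
        )
    return cleaned


def load(records, target):
    # Load: append the cleaned records to the target store.
    target.extend(records)


warehouse = []  # stand-in for the analytics destination
load(transform(extract(raw_csv)), warehouse)
```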
Database Migration Service (DMS)
- Quickly and securely migrate databases to AWS, resilient, self healing
- The source database remains available during the migration
- Supports homogeneous and heterogeneous migrations