Database & Analytics Flashcards
QLDB
QLDB stands for “Quantum Ledger Database”
- A ledger is a book recording financial transactions
- Fully managed, serverless, highly available, replication across 3 AZ
- Used to review history of all the changes made to your application data over time
- Immutable system: no entry can be removed or modified, cryptographically verifiable
- 2-3x better performance than common ledger blockchain frameworks, manipulate data using SQL
- Difference with Amazon Managed Blockchain: no decentralization component, in accordance with financial regulation rules
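
The "immutable, cryptographically verifiable" idea can be illustrated with a small hash chain. This is only a sketch of the general technique, not QLDB's API or its internal format; the names `entry_hash`, `append`, and `verify` are hypothetical. Each entry's hash covers the previous entry's hash, so editing any past entry is detectable.

```python
import hashlib
import json


def entry_hash(previous_hash: str, data: dict) -> str:
    """Hash an entry together with the hash of the entry before it."""
    payload = previous_hash + json.dumps(data, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(journal: list, data: dict) -> None:
    """Append-only write: existing entries are never updated or deleted."""
    previous_hash = journal[-1]["hash"] if journal else ""
    journal.append({"data": data, "hash": entry_hash(previous_hash, data)})


def verify(journal: list) -> bool:
    """Recompute every hash in order; an edited entry breaks the chain."""
    previous_hash = ""
    for entry in journal:
        if entry["hash"] != entry_hash(previous_hash, entry["data"]):
            return False
        previous_hash = entry["hash"]
    return True


journal = []
append(journal, {"account": "A", "amount": 100})
append(journal, {"account": "A", "amount": -30})
print(verify(journal))  # True

journal[0]["data"]["amount"] = 1_000_000  # tamper with history
print(verify(journal))  # False
```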
ElastiCache
Managed Redis or Memcached
- In-memory database with high performance
- Offloads databases
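
One common way a cache offloads a database is the cache-aside pattern: read from the cache first and only query the database on a miss. The sketch below uses a plain dict as a stand-in for Redis or Memcached; `slow_database_query` and `get_user` are hypothetical names, and the 60-second TTL is an arbitrary assumption.

```python
import time

cache = {}  # stand-in for Redis or Memcached (assumption: in-process dict)


def slow_database_query(user_id: int) -> dict:
    """Hypothetical expensive query against the primary database."""
    slow_database_query.calls += 1
    return {"id": user_id, "name": f"user-{user_id}"}


slow_database_query.calls = 0  # counts how often the database is hit


def get_user(user_id: int, ttl_seconds: float = 60.0) -> dict:
    """Cache-aside read: serve from cache, fall back to the database."""
    key = f"user:{user_id}"
    hit = cache.get(key)
    if hit is not None and hit[0] > time.monotonic():
        return hit[1]  # cache hit, database not touched
    value = slow_database_query(user_id)  # cache miss
    cache[key] = (time.monotonic() + ttl_seconds, value)
    return value


get_user(1)
get_user(1)
get_user(1)
print(slow_database_query.calls)  # 1
```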
EMR
EMR stands for “Elastic MapReduce”
- EMR helps create Hadoop clusters (Big Data) to analyze and process vast amounts of data
- The clusters can be made of hundreds of EC2 instances
- Also supports Apache Spark, HBase, Presto, Flink…
- EMR takes care of all the provisioning and configuration
- Auto-scaling and integrated with Spot instances
- Use cases: data processing, machine learning, web indexing, big data…
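
To recall what "MapReduce" means, here is a single-process word count split into map, shuffle, and reduce phases. It is a toy sketch of the programming model only; on a real cluster the phases run in parallel across many instances, and none of these function names come from Hadoop or EMR.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line: str) -> list:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs) -> dict:
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict) -> dict:
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


lines = ["big data on EMR", "EMR runs Hadoop", "Hadoop processes big data"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = reduce_phase(shuffle(mapped))
print(counts["emr"], counts["big"])  # 2 2
```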
DocumentDB
DocumentDB is an AWS implementation compatible with MongoDB (which is a NoSQL database)
- MongoDB is used to store, query, and index JSON data
- Similar “deployment concepts” as Aurora
- Fully Managed, highly available with replication across 3 AZ
- DocumentDB storage automatically grows in increments of 10GB
- Automatically scales to workloads with millions of requests per second
DynamoDB
Key value database
- Fully managed, highly available with replication across 3 AZ
- NoSQL database - not a relational database
- Scales to massive workloads, distributed “serverless” database
- Millions of requests per second, trillions of rows, 100s of TB of storage
- Fast and consistent in performance
- Single-digit millisecond latency – low latency retrieval
- Integrated with IAM for security, authorization and administration
- Low cost and auto scaling capabilities
- Standard & Infrequent Access (IA) Table Class
Redshift
Redshift is based on PostgreSQL, but it’s not used for OLTP
- It’s OLAP – online analytical processing (analytics and data warehousing)
- Load data once every hour, not every second
- 10x better performance than other data warehouses, scale to PBs of data
- Columnar storage of data (instead of row based)
- Massively Parallel Query Execution (MPP), highly available
- Pay as you go based on the instances provisioned
- Has a SQL interface for performing the queries
- BI tools such as Amazon QuickSight or Tableau integrate with it
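
A rough way to picture columnar versus row-based storage: the same table is kept either as one record per row or as one list per column. This is a plain-Python illustration with made-up data, not how Redshift stores data on disk. An analytical query such as "sum of amount" only needs to read one column in the columnar layout.

```python
# Row-based layout: every record carries all of its fields together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]

# Columnar layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Row-based aggregation touches every full record.
total_from_rows = sum(row["amount"] for row in rows)

# Columnar aggregation reads only the "amount" column.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)  # 250.0 250.0
```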
Athena
Serverless query service to analyze data stored in Amazon S3
- Uses standard SQL language to query the files
- Supports CSV, JSON, ORC, Avro, and Parquet (built on Presto)
- Pricing: $5.00 per TB of data scanned
- Use compressed or columnar data for cost-savings (less scan)
- Use cases: Business intelligence / analytics / reporting, analyze & query VPC Flow Logs, ELB Logs, CloudTrail trails, etc…
- Exam Tip: analyze data in S3 using serverless SQL, use Athena
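
A quick worked example of the per-TB pricing, using the $5.00 figure from the card. The dataset sizes are made up, and treating 1 TB as 1024^4 bytes is an assumption of this sketch. It shows why scanning less data (compressed or columnar files) directly lowers the bill.

```python
PRICE_PER_TB_USD = 5.00     # from the card above
BYTES_PER_TB = 1024 ** 4    # assumption: binary terabyte


def query_cost(bytes_scanned: int) -> float:
    """Cost of one query, proportional to the data it scans."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


# Hypothetical dataset: 2 TB of raw CSV scanned in full.
csv_cost = query_cost(2 * BYTES_PER_TB)

# Same data as compressed columnar files where the query
# only needs to scan a quarter of a terabyte (made-up figure).
columnar_cost = query_cost(BYTES_PER_TB // 4)

print(f"{csv_cost:.2f} {columnar_cost:.2f}")  # 10.00 1.25
```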
QuickSight
Serverless machine learning-powered business intelligence service to create interactive dashboards
- Fast, automatically scalable, embeddable, with per-session pricing
- Use cases:
- Business analytics
- Building visualizations
- Perform ad-hoc analysis
- Get business insights using data
- Integrated with RDS, Aurora, Athena, Redshift, S3…
AMB
Amazon Managed Blockchain
- Blockchain makes it possible to build applications where multiple parties can execute transactions without the need for a trusted, central authority.
- Amazon Managed Blockchain is a managed service to:
- Join public blockchain networks
- Or create your own scalable private network
- Compatible with the frameworks Hyperledger Fabric & Ethereum
Glue
Managed extract, transform, and load (ETL) service
Useful to prepare and transform data for analytics
* Fully serverless service
* Diagram: Extract (S3 Bucket, Amazon RDS) → Glue ETL (Transform) → Load
* Glue Data Catalog: catalog of datasets
* Can be used by Athena, Redshift, EMR
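
The three ETL steps can be sketched in plain Python. This is not the Glue API: `extract`, `transform`, and `load` are hypothetical helpers, the CSV text stands in for a source such as an S3 object, and the `warehouse` list stands in for the load target.

```python
import csv
import io

# Made-up source data; the last row is deliberately malformed.
RAW_CSV = """order_id,region,amount
1,eu,120.50
2,us,80.00
3,eu,not-a-number
"""


def extract(text: str) -> list:
    """Extract: read raw records from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records: list) -> list:
    """Transform: fix types, normalize values, drop malformed rows."""
    clean = []
    for record in records:
        try:
            amount = float(record["amount"])
        except ValueError:
            continue  # skip rows whose amount is not numeric
        clean.append({
            "order_id": int(record["order_id"]),
            "region": record["region"].upper(),
            "amount": amount,
        })
    return clean


def load(records: list, target: list) -> int:
    """Load: write the prepared records to the analytics target."""
    target.extend(records)
    return len(records)


warehouse = []  # stand-in for the load target
loaded = load(transform(extract(RAW_CSV)), warehouse)
print(loaded, warehouse[0]["region"])  # 2 EU
```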
DMS
Database Migration Service
Quickly and securely migrate databases to AWS, resilient, self-healing
* The source database remains available during the migration
* Supports:
* Homogeneous migrations: ex Oracle to Oracle
* Heterogeneous migrations: ex Microsoft SQL Server to Aurora
Neptune
Fully managed graph database
- A popular graph dataset would be a social network
- Users have friends
- Posts have comments
- Comments have likes from users
- Users share and like posts…
- Highly available across 3 AZ, with up to 15 read replicas
- Build and run applications working with highly connected datasets – optimized for these complex and hard queries
- Can store up to billions of relations and query the graph with milliseconds latency
- Highly available with replication across multiple AZs
- Great for knowledge graphs (Wikipedia), fraud detection, recommendation engines, social networking
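
The social network example is a typical "highly connected" query: friends of friends. Below is a minimal in-memory sketch with made-up users. A graph database like Neptune would express this in a graph query language instead; `friends_of_friends` is a hypothetical helper, not a Neptune call.

```python
# Adjacency sets: each user maps to the users they are friends with.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def friends_of_friends(graph: dict, user: str) -> set:
    """Users two hops away who are not already direct friends."""
    direct = graph.get(user, set())
    result = set()
    for friend in direct:
        result |= graph.get(friend, set())
    return result - direct - {user}


print(sorted(friends_of_friends(friends, "alice")))  # ['dave', 'erin']
```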
Timestream
Fully managed, fast, scalable, serverless time series database
- Automatically scales up/down to adjust capacity
- Store and analyze trillions of events per day
- 1000s of times faster and 1/10th the cost of relational databases
- Built-in time series analytics functions (helps you identify patterns in your data in near real-time)
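
As an illustration of what a time series analytics function does, the sketch below averages readings per fixed time window. The readings are made up and `average_per_window` is a hypothetical helper; Timestream itself exposes such operations through SQL, which is not shown here.

```python
from collections import defaultdict

# Made-up sensor readings as (timestamp in seconds, value) pairs.
readings = [(0, 20.0), (30, 22.0), (60, 21.0), (90, 25.0), (150, 30.0)]


def average_per_window(points: list, window_seconds: int) -> dict:
    """Group points into fixed windows and average each window."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        window_start = timestamp - timestamp % window_seconds
        buckets[window_start].append(value)
    return {
        start: sum(values) / len(values)
        for start, values in sorted(buckets.items())
    }


print(average_per_window(readings, 60))  # {0: 21.0, 60: 23.0, 120: 30.0}
```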
RDS
Relational Database Service
Managed database using SQL
* Postgres
* MySQL
* MariaDB
* Oracle
* Microsoft SQL Server
* Aurora
Aurora
Proprietary technology
Implements support for PostgreSQL and MySQL
Cloud optimized
5x performance improvement over MySQL on RDS, 3x over PostgreSQL on RDS