Databases Flashcards

1
Q

Amazon Neptune

A
  • Fully managed graph database
  • A popular graph dataset would be a social network:
    Users have friends
    Posts have comments
    Comments have likes from users
    Users share and like posts…
  • Highly available across 3 AZs, with up to 15 read replicas
  • Build and run applications working with highly connected
    datasets – optimized for these complex and hard queries
  • Can store billions of relationships and query the graph with
    millisecond latency
  • Great for knowledge graphs (Wikipedia), fraud detection,
    recommendation engines, social networking
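
The social-network example above can be sketched in plain Python as an adjacency map. This is not Neptune's API; the names `friends` and `friends_of_friends` are hypothetical, and the sketch only illustrates the kind of relationship-hopping query that a graph database is optimized for.

```python
# Toy in-memory graph: each user maps to the set of users they are friends with.
# All names and data here are made up for illustration.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol", "erin"},
    "erin": {"dave"},
}


def friends_of_friends(graph, user):
    """Return users reachable in two hops who are neither the user
    nor already one of the user's direct friends."""
    direct = graph.get(user, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {user}


print(sorted(friends_of_friends(friends, "alice")))
```

On a relational schema the same question would typically need a self-join per hop, which is why highly connected data is a natural fit for a graph store.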
2
Q

DocumentDB

A
  • Aurora is an “AWS-implementation” of PostgreSQL / MySQL …
  • DocumentDB is the same for MongoDB (which is a NoSQL database)
  • MongoDB is used to store, query, and index JSON data
  • Similar “deployment concepts” as Aurora
  • Fully managed, highly available with replication across 3 AZs
  • DocumentDB storage automatically grows in increments of 10 GB, up to 64 TB
  • Automatically scales to workloads with millions of requests per second
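
To make "store, query, and index JSON data" concrete, here is a minimal sketch using only the Python standard library. It does not use the MongoDB or DocumentDB drivers; `find` and `build_index` are hypothetical helpers and the documents are invented.

```python
import json
from collections import defaultdict

# Store: JSON documents do not need to share a fixed schema.
raw = [
    '{"_id": 1, "name": "Ada", "city": "London", "tags": ["admin"]}',
    '{"_id": 2, "name": "Linus", "city": "Helsinki"}',
    '{"_id": 3, "name": "Grace", "city": "London"}',
]
collection = [json.loads(doc) for doc in raw]


def find(collection, **criteria):
    """Query: return documents whose fields equal all given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


def build_index(collection, field):
    """Index: map each value of `field` to the _ids of matching documents,
    so lookups do not have to scan the whole collection."""
    index = defaultdict(list)
    for doc in collection:
        if field in doc:
            index[doc[field]].append(doc["_id"])
    return index


city_index = build_index(collection, "city")
print([doc["name"] for doc in find(collection, city="London")])
print(city_index["London"])
```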
3
Q

Amazon QuickSight

A
  • Serverless machine learning-powered business intelligence service to
    create interactive dashboards
  • Fast, automatically scalable, embeddable, with per-session pricing
  • Use cases:
    Business analytics
    Building visualizations
    Performing ad-hoc analysis
    Getting business insights from data
  • Integrated with RDS, Aurora, Athena, Redshift, S3…
4
Q

Amazon Athena

A

Serverless query service to analyze data stored in Amazon S3
* Uses standard SQL language to query the files
* Supports CSV, JSON, ORC, Avro, and Parquet (built on Presto)
* Pricing: $5.00 per TB of data scanned
* Use compressed or columnar data for cost-savings (less scan)
* Use cases: business intelligence, analytics, and reporting; analyzing and querying VPC Flow Logs, ELB logs, CloudTrail trails, etc…
* Exam Tip: to analyze data in S3 using serverless SQL, use Athena
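
The pricing and cost-saving bullets above can be turned into quick arithmetic. This is a rough sketch: the $5.00 per TB rate comes from this card, while the 2 TB dataset and the 10% scan fraction for columnar data are made-up assumptions, and any minimum charges or rounding rules are ignored.

```python
PRICE_PER_TB_USD = 5.00  # from the card: $5.00 per TB of data scanned


def query_cost_usd(scanned_tb):
    """Estimated cost of one query that scans `scanned_tb` terabytes."""
    return scanned_tb * PRICE_PER_TB_USD


# Hypothetical dataset: 2 TB of raw CSV, fully scanned by the query.
raw_csv_tb = 2.0

# Assumption for illustration only: a compressed, columnar copy lets the
# same query read about 10% of the bytes because it skips unneeded columns.
columnar_tb = raw_csv_tb * 0.10

print(f"CSV scan:      ${query_cost_usd(raw_csv_tb):.2f}")
print(f"Columnar scan: ${query_cost_usd(columnar_tb):.2f}")
```

Because billing follows bytes scanned rather than query time, shrinking what the query has to read is the main cost lever.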

5
Q

Amazon EMR
(Elastic MapReduce)

A
  • EMR stands for “Elastic MapReduce”
  • EMR helps create Hadoop clusters (Big Data) to analyze and process vast amounts of data
  • The clusters can be made of hundreds of EC2 instances
  • Also supports Apache Spark, HBase, Presto, Flink…
  • EMR takes care of all the provisioning and configuration
  • Auto-scaling and integrated with Spot instances
  • Use cases: data processing, machine learning, web indexing, big data…
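
For intuition about the "MapReduce" in the name, here is a single-process word count in plain Python. It is only a sketch of the map, shuffle, and reduce steps; a real Hadoop or Spark job on EMR distributes these phases across many instances, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["big data", "Big clusters"]))
```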
6
Q

Redshift

A
  • Redshift is based on PostgreSQL, but it’s not used for OLTP
  • It’s OLAP – online analytical processing (analytics and data warehousing)
  • Load data once every hour, not every second
  • 10x better performance than other data warehouses, scale to PBs of data
  • Columnar storage of data (instead of row based)
  • Massively Parallel Query Execution (MPP), highly available
  • Pay as you go based on the instances provisioned
  • Has a SQL interface for performing the queries
  • BI tools such as Amazon QuickSight or Tableau integrate with it
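
The difference between row-based and columnar storage can be illustrated with plain Python lists. This is a toy model, not how Redshift lays out data on disk; the table contents and function names are invented.

```python
# Row-based layout: each record is stored together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 40.0},
    {"order_id": 2, "region": "US", "amount": 25.5},
    {"order_id": 3, "region": "EU", "amount": 10.0},
]

# Columnar layout: all values of one column are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_amount_row_store(rows):
    """Visits every full record even though only one field is needed."""
    return sum(row["amount"] for row in rows)


def total_amount_column_store(columns):
    """Reads only the 'amount' column and never touches the others."""
    return sum(columns["amount"])


print(total_amount_row_store(rows), total_amount_column_store(columns))
```

Analytical (OLAP) queries usually aggregate a few columns over many rows, so reading only those columns is where the columnar layout pays off.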
7
Q

DynamoDB

A
  • Fully managed, highly available with replication across 3 AZs
  • NoSQL database - not a relational database
  • Scales to massive workloads, distributed “serverless” database
  • Millions of requests per second, trillions of rows, 100s of TB of storage
  • Fast and consistent in performance
  • Single-digit millisecond latency – low latency retrieval
  • Integrated with IAM for security, authorization and administration
  • Low cost and auto scaling capabilities
  • Standard & Infrequent Access (IA) Table Class

DynamoDB Accelerator - DAX
* Fully Managed in-memory cache for DynamoDB
* 10x performance improvement: from single-digit millisecond latency to microsecond latency when accessing your DynamoDB tables
* Secure, highly scalable & highly available
* Difference with ElastiCache at the CCP level: DAX is only used for and is integrated with DynamoDB, while ElastiCache can be used for other databases
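
The idea of an in-memory cache sitting in front of a table can be sketched as a small read-through cache. This is not the DAX client API; `ReadThroughCache` is a hypothetical class, a plain dict stands in for the table, and expiry and write handling are left out.

```python
class ReadThroughCache:
    """Serve repeated reads from memory; fall back to the backing store on a miss."""

    def __init__(self, backing_store):
        self._store = backing_store  # stand-in for the slower table
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # Cache miss: read from the backing store and remember the result.
        self.misses += 1
        value = self._store[key]
        self._cache[key] = value
        return value


table = {"user#1": {"name": "Ada"}, "user#2": {"name": "Linus"}}
cache = ReadThroughCache(table)

cache.get("user#1")  # miss: goes to the table
cache.get("user#1")  # hit: served from memory
print(cache.hits, cache.misses)
```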

DynamoDB – Global Tables
* Make a DynamoDB table accessible with low latency in multiple Regions
* Active-Active replication (read/write to any AWS Region)

8
Q

Amazon Aurora

A
  • Aurora is a proprietary technology from AWS (not open sourced)
  • OLTP database
  • Both PostgreSQL and MySQL are supported as Aurora DB engines
  • Aurora is “AWS cloud optimized” and claims a 5x performance improvement over MySQL on RDS and over 3x the performance of PostgreSQL on RDS
  • Aurora storage automatically grows in increments of 10 GB, up to 128 TB
  • Aurora costs more than RDS (20% more) – but is more efficient
  • Not in the free tier
9
Q

Amazon QLDB

A
  • QLDB stands for “Quantum Ledger Database”
  • A ledger is a book recording financial transactions
  • Fully managed, serverless, highly available, with replication across 3 AZs
  • Used to review history of all the changes made to your application data over time
  • Immutable system: no entry can be removed or modified, cryptographically verifiable
  • 2-3x better performance than common ledger blockchain frameworks, manipulate data using SQL
  • Difference with Amazon Managed Blockchain: no decentralization component, in accordance with financial regulation rules
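
"Immutable" and "cryptographically verifiable" can be illustrated with a generic hash chain built on `hashlib`. This is a sketch of the general technique, not QLDB's actual journal format or API; the helper names and entry layout are hypothetical.

```python
import hashlib
import json


def entry_hash(prev_hash, data):
    """Hash an entry together with the hash of the entry before it."""
    payload = json.dumps({"prev": prev_hash, "data": data}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(journal, data):
    """Append-only write: each new entry is chained to the previous one."""
    prev_hash = journal[-1]["hash"] if journal else ""
    journal.append({"data": data, "prev": prev_hash, "hash": entry_hash(prev_hash, data)})


def verify(journal):
    """Recompute every hash; any edited or removed entry breaks the chain."""
    prev_hash = ""
    for entry in journal:
        if entry["prev"] != prev_hash or entry["hash"] != entry_hash(prev_hash, entry["data"]):
            return False
        prev_hash = entry["hash"]
    return True


journal = []
append(journal, {"account": "A", "delta": 100})
append(journal, {"account": "A", "delta": -30})
print(verify(journal))

journal[0]["data"]["delta"] = 1000  # tamper with history
print(verify(journal))
```

Because each hash depends on everything before it, changing an old entry is detectable without trusting whoever stores the journal.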
10
Q

Amazon Managed Blockchain

A
  • Blockchain makes it possible to build applications where multiple parties can execute transactions without the need for a trusted, central authority.
  • Amazon Managed Blockchain is a managed service to:
    Join public blockchain networks
    Or create your own scalable private network
  • Compatible with the frameworks Hyperledger Fabric & Ethereum
11
Q

AWS Glue

A
  • Managed extract, transform, and load (ETL) service
  • Useful to prepare and transform data for analytics
  • Fully serverless service
  • Glue Data Catalog: catalog of datasets
  • Can be used by Athena, Redshift, EMR
12
Q

DMS – Database Migration Service

A
  • Quickly and securely migrate databases to AWS; resilient and self-healing
  • The source database remains available during the migration

Supports:
* Homogeneous migrations: e.g., Oracle to Oracle
* Heterogeneous migrations: e.g., Microsoft SQL Server to Aurora
