Databases and Analytics Flashcards
Describe Relational Databases
A collection of data items with pre-defined relationships between them.
Think of them like Excel spreadsheets, with links between the tables
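A minimal sketch of the "pre-defined relationships" idea, using Python's built-in sqlite3 module. The table and column names (customers, orders) are made up for illustration; any SQL engine would behave similarly.

```python
import sqlite3


def orders_with_customer_names():
    """Build two related tables in memory and join them on the foreign key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    # orders.customer_id is the pre-defined relationship back to customers.id
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customers(id),"
        " item TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, "book"), (11, 2, "lamp"), (12, 1, "pen")],
    )
    rows = conn.execute(
        "SELECT c.name, o.item FROM orders o"
        " JOIN customers c ON c.id = o.customer_id"
        " ORDER BY o.id"
    ).fetchall()
    conn.close()
    return rows
```

Each table is like one spreadsheet; the JOIN follows the relationship between them.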
Describe NoSQL Databases
Non-relational databases
Built for specific data models, with flexible schemas for building applications
Name some of the benefits of NoSQL databases
Flexibility
Scalability
High-performance: optimized for a specific data model
Highly functional: APIs and data types purpose-built for each data model
Examples: key-value, document, graph, in-memory, and search databases
What does RDS stand for and what is it
Relational Database Service
Managed database service for databases that use SQL as a query language
Allows you to create databases in the cloud that are managed by AWS
What is Aurora
A relational database service
Compatible with PostgreSQL and MySQL
Not in the free tier
What are caches in AWS databases
In-memory databases with high performance and low latency
Help reduce the load on databases for read-intensive workloads
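A rough sketch of how a cache takes read load off a database, using the lazy-loading (cache-aside) pattern. Everything here is a stand-in: the dict plays the role of the in-memory cache, fake_db plays the role of the database, and LazyLoadingCache is a hypothetical name, not an AWS API.

```python
class LazyLoadingCache:
    """Toy lazy-loading cache: check memory first, fall back to the database."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # hypothetical slow database lookup
        self._store = {}                   # stands in for the in-memory cache
        self.db_reads = 0                  # counts how often the database is hit

    def get(self, key):
        if key in self._store:             # cache hit: no database work
            return self._store[key]
        self.db_reads += 1                 # cache miss: read the database once
        value = self._load_from_db(key)
        self._store[key] = value
        return value


fake_db = {"user:1": "Ana", "user:2": "Ben"}  # stands in for a database table
cache = LazyLoadingCache(fake_db.__getitem__)
names = [cache.get("user:1") for _ in range(100)]
```

After 100 reads of the same key, the database was queried only once. A real cache would also need expiry or invalidation so stale values do not linger.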
When it comes to ElastiCache, what is Amazon responsible for
OS maintenance and patching, optimizations, setup, configuration, monitoring, failure recovery, and backups
What is DynamoDB
Fully managed, highly available database with replication across 3 AZs
A NoSQL database, not a relational database
Distributed, serverless database that scales to massive workloads
What type of database is DynamoDB
Key-Value database
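To make the key-value model concrete, here is a toy in-memory table. This is not the DynamoDB API; put_item and get_item are illustrative helpers, and the keys and attributes are invented. The point is that each item is looked up by its key and items need not share the same attributes.

```python
table = {}  # key -> item (a dict of attributes)


def put_item(key, item):
    """Store a copy of the item under its key."""
    table[key] = dict(item)


def get_item(key):
    """Return the item for this key, or None if it does not exist."""
    return table.get(key)


# Two items with different attributes: no fixed schema is required.
put_item("user#1", {"name": "Ana", "email": "ana@example.com"})
put_item("user#2", {"name": "Ben", "tags": ["admin", "beta"]})
```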
What is DynamoDB Accelerator (DAX)
Fully managed in-memory cache for DynamoDB
What are DynamoDB Global Tables
A feature that makes a DynamoDB table accessible with low latency in multiple regions
What is Redshift
A fully managed, petabyte-scale data warehouse service in the cloud
It's OLAP: online analytical processing (analytics and data warehousing)
Load data once every hour, not every second
AWS claims up to 10x better performance than other data warehouses; scales to PBs of data
Columnar storage of data (instead of row based)
Massively Parallel Query Execution (MPP)
Pay as you go
Has SQL interface for performing the queries
Integrates with BI tools such as Amazon QuickSight
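A small illustration of row-based versus columnar layout, in plain Python. The order data is invented, and this says nothing about how Redshift stores data internally; it only shows why an analytical query over one column benefits from keeping that column together.

```python
# Row-based layout: one record per row, all fields stored together.
rows = [
    {"order_id": 1, "region": "eu", "amount": 20.0},
    {"order_id": 2, "region": "us", "amount": 35.5},
    {"order_id": 3, "region": "eu", "amount": 12.5},
]

# Summing one field still walks every whole record.
row_total = sum(r["amount"] for r in rows)

# Columnar layout: the same data, stored as one list per column.
columns = {name: [r[name] for r in rows] for name in rows[0]}

# The same sum now reads only the "amount" column.
col_total = sum(columns["amount"])
```

Both totals are equal; the columnar version simply touches less data, which is what matters for analytics over very large tables.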
What is Amazon EMR
Elastic MapReduce
Helps create Hadoop clusters (big data) to analyze and process vast amounts of data
The clusters can be made of hundreds of EC2 instances
Also supports Apache Spark, HBase, Presto, Flink
EMR takes care of all the provisioning and configuration
Supports auto-scaling and integrates with Spot Instances
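For intuition about the MapReduce model behind the name, here is a single-machine word count sketch in plain Python. It does not use Hadoop or EMR; the function names are illustrative, and a real cluster would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return reduce_phase(shuffle(pairs))
```

For example, word_count(["big data", "Big clusters"]) counts "big" twice and the other two words once each.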
What are some of the use cases for EMR
data processing, machine learning, web indexing, big data
Describe Athena
Serverless query service to perform analytics against S3 objects
Uses standard SQL to query the files
Supports CSV, JSON, ORC, Avro, and Parquet (built on Presto)
Pricing: $5.00 per TB of data scanned
Use compressed or columnar data for cost savings (less data scanned)
Use cases: business intelligence, analytics, reporting; analyzing and querying VPC Flow Logs, ELB logs, and CloudTrail trails
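A back-of-the-envelope sketch of the per-TB pricing, to show why scanning less data saves money. It assumes the $5.00 per TB rate from the card, treats 1 TB as 1024^4 bytes, and ignores any minimum charge or rounding. The scan sizes are invented, not measured.

```python
PRICE_PER_TB_USD = 5.00    # rate taken from the card above
BYTES_PER_TB = 1024 ** 4   # assumption: binary terabyte


def query_cost_usd(bytes_scanned):
    """Estimated cost of one query from the number of bytes it scans."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


# Hypothetical: a query that scans 2 TB of raw CSV...
csv_cost = query_cost_usd(2 * BYTES_PER_TB)

# ...versus the same query scanning 0.25 TB of compressed columnar data.
columnar_cost = query_cost_usd(BYTES_PER_TB // 4)
```

With these made-up sizes the CSV query costs $10.00 and the columnar one $1.25.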