Athena, OpenSearch, EMR, QuickSight Flashcards
- Serverless query service to analyze data stored in Amazon S3
- Uses standard SQL language to query the files (built on Presto)
- Supports CSV, JSON, ORC, Avro, and Parquet
- Pricing: $5.00 per TB of data scanned
- Commonly used with Amazon QuickSight for reporting/dashboards
Amazon Athena
analyze data in S3 using serverless SQL
Athena
Amazon Athena – Performance Improvement: use __________ for cost savings (less data scanned)
columnar data
What is recommended when using columnar data for cost-savings?
Apache Parquet or ORC
With Amazon Athena, use ________ to convert your data to Parquet or ORC
Glue
__________ datasets in S3 for easy querying on virtual columns
Partition
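Example (a minimal sketch, not an official API): partitioned datasets are commonly laid out in S3 with Hive-style `key=value` prefixes, which Athena can expose as virtual columns. The helper name `partitioned_key`, the bucket `example-bucket`, and the file name are hypothetical and only illustrate the layout.

```python
def partitioned_key(base, partitions, filename):
    """Build a Hive-style partitioned S3 path such as .../year=2024/month=01/file."""
    # Each partition becomes a "key=value" path segment, in insertion order.
    parts = "/".join(f"{key}={value}" for key, value in partitions.items())
    return f"{base.rstrip('/')}/{parts}/{filename}"


# Hypothetical bucket, table prefix, and file name.
path = partitioned_key(
    "s3://example-bucket/logs",
    {"year": "2024", "month": "01"},
    "part-0000.parquet",
)
print(path)
```

A query that filters on `year` and `month` then only needs to read objects under the matching prefixes.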
What are 4 Performance Improvements for Athena?
Use columnar data for cost-savings (less scan)
Compress data for smaller retrievals
Partition datasets in S3 for easy querying on virtual columns
Use larger files (> 128 MB) to minimize overhead
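Example (a rough sketch under stated assumptions): because pricing is per TB scanned, anything that reduces the bytes scanned reduces cost. The function below applies the $5.00 per TB rate from the pricing card. It assumes 1 TB = 10^12 bytes and ignores any per-query minimums or rounding. The 10% scan fraction is a made-up illustration, not a measured result.

```python
PRICE_PER_TB_USD = 5.00  # rate from the pricing card
BYTES_PER_TB = 10**12    # assumption: decimal terabyte


def athena_query_cost(bytes_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Estimate the cost of one query from the number of bytes it scans."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


# Full scan of a hypothetical 1 TB CSV dataset.
full_scan = athena_query_cost(10**12)

# Hypothetical: a columnar, partitioned layout lets the same query
# scan only 10% of the data.
reduced_scan = athena_query_cost(10**11)

print(f"full scan:    ${full_scan:.2f}")
print(f"reduced scan: ${reduced_scan:.2f}")
```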
- Allows you to run SQL queries across data stored in relational, non-relational, object, and custom data sources (AWS or on-premises)
- Uses Data Source Connectors that run on AWS Lambda to run Federated Queries (e.g., CloudWatch Logs, DynamoDB, RDS, …)
- Store the results back in Amazon S3
Amazon Athena – Federated Query
- Based on PostgreSQL
- It’s OLAP – online analytical processing (analytics and data warehousing)
- 10x better performance than other data warehouses, scale to PBs of data
- Columnar storage of data (instead of row based) & parallel query engine
- Pay as you go based on the instances provisioned
- Has a SQL interface for performing the queries
- BI tools such as Amazon QuickSight or Tableau integrate with it
- vs Athena: faster queries / joins / aggregations thanks to indexes
Redshift
2 types of nodes for Redshift Cluster
Leader Node
Compute Node
Redshift Cluster node for query planning, results aggregation
Leader node
Redshift Cluster node that performs the queries and sends results to the leader
Compute node
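Example (a toy illustration only, not how Redshift is implemented): the leader/compute split can be pictured as scatter-gather. Each compute node works on its own slice of the data and returns a partial result, and the leader aggregates the partials. The function names and sample data are hypothetical.

```python
def compute_node(rows):
    """Compute node: run the query on its own slice and return a partial result."""
    return sum(rows)


def leader_node(slices):
    """Leader node: hand each slice to a compute node, then aggregate the partials."""
    partials = [compute_node(rows) for rows in slices]
    return sum(partials)


# Hypothetical data already distributed across three compute nodes.
slices = [[1, 2, 3], [4, 5], [6]]
total = leader_node(slices)
print(total)
```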
Do you need to provision the node size in advance?
YES
With Redshift Cluster, can you use Reserved Instances for cost savings?
YES
Redshift has “Multi-AZ” mode for ____________
some clusters