Section 22: Data and Analytics Flashcards
A serverless, interactive analytics service built on open-source frameworks, supporting open-table and file formats
Amazon Athena
Amazon Athena is commonly paired with _____ in order to create reports and dashboards
Amazon QuickSight
This service is the best tool available when you need to analyze data in S3 using serverless SQL
Amazon Athena
What are four ways you can enhance the performance of Amazon Athena?
Use columnar data formats (e.g., Parquet, ORC) so queries scan less data and cost less
Compress data for smaller retrievals
Partition your datasets in S3
Use larger files (roughly 128 MB or more) to reduce per-file overhead
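As a minimal sketch of the partitioning tip above, the helper below builds a Hive-style `key=value` S3 prefix for a given date. The bucket name, table root, and the year/month/day partition keys are hypothetical choices for illustration, not something the flashcards prescribe.

```python
from datetime import date


def partition_prefix(table_root: str, day: date) -> str:
    """Return a Hive-style partition prefix (key=value folders) for one day.

    Assumption: the dataset is partitioned by year, month, and day.
    """
    return (
        f"{table_root}/"
        f"year={day.year}/month={day.month:02d}/day={day.day:02d}/"
    )


# Hypothetical bucket and table root, for illustration only.
prefix = partition_prefix("s3://example-bucket/logs", date(2024, 1, 15))
print(prefix)
```

A query that filters on the partition keys (for example, a single day) then only needs to scan objects under the matching prefix.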
If you have data in sources other than Amazon S3, you can use this Athena feature to query the data in place or build pipelines that extract data from multiple data sources and store it in Amazon S3
Amazon Athena Federated Query
How much do Athena queries cost to run?
$5.00 per TB of data scanned
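The per-TB price above can be turned into a rough cost estimate. This is a sketch under stated assumptions: 1 TB is treated as 1024^4 bytes, a 10 MB per-query minimum is applied, and the function name `athena_query_cost` is hypothetical. Actual pricing varies by Region and should be checked against the current AWS pricing page.

```python
TB = 1024 ** 4  # assumption: binary terabyte
MB = 1024 ** 2


def athena_query_cost(bytes_scanned: int,
                      price_per_tb: float = 5.00,
                      min_bytes: int = 10 * MB) -> float:
    """Estimate the cost in USD of one query from the bytes it scans.

    Assumption: queries are billed for at least `min_bytes` of scanned data.
    """
    billable = max(bytes_scanned, min_bytes)
    return billable / TB * price_per_tb


full_scan = athena_query_cost(TB)          # scanning a full 1 TB
quarter_scan = athena_query_cost(TB // 4)  # same query scanning a quarter of the data
print(full_scan, quarter_scan)
```

The comparison shows why columnar formats, compression, and partitioning matter: anything that reduces the bytes scanned reduces the bill proportionally.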
This service is a fully managed, petabyte-scale data warehouse service in the cloud
Amazon Redshift
What types of nodes comprise a Redshift Cluster?
Leader Node
Compute Nodes
What are three ways you can insert data into Redshift?
Amazon Data Firehose (formerly Kinesis Data Firehose)
COPY command from Amazon S3
Insert in batches from an EC2 instance using the JDBC driver
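To make the COPY option above concrete, here is a minimal sketch that assembles a Redshift `COPY ... FROM ... IAM_ROLE ... FORMAT AS ...` statement as a string. The table name, S3 URI, and IAM role ARN are hypothetical placeholders, and the helper `build_copy_statement` is illustrative rather than part of any AWS SDK.

```python
def build_copy_statement(table: str,
                         s3_uri: str,
                         iam_role_arn: str,
                         file_format: str = "PARQUET") -> str:
    """Build a Redshift COPY statement that loads files from S3 into a table.

    Assumption: inputs are trusted values; this does no escaping.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS {file_format};"
    )


# Hypothetical names, for illustration only.
statement = build_copy_statement(
    table="sales",
    s3_uri="s3://example-bucket/sales/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(statement)
```

The resulting statement would be executed through whatever SQL client connects to the cluster; COPY loads files in parallel across the compute nodes, which is why it is generally preferred over row-by-row inserts for bulk loads.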
Feature that allows you to efficiently query and retrieve structured and semi-structured data from files in Amazon S3 without having to load the data into Amazon Redshift tables
Redshift Spectrum
An open-source, distributed search and analytics suite derived from Elasticsearch that makes it easy for you to perform interactive log analytics, real-time application monitoring, website search, and more
Amazon OpenSearch Service
AWS service that is a cloud big data solution for petabyte-scale data processing, interactive analytics, and machine learning using open-source frameworks such as Apache Spark, Apache Hive, and Presto
Amazon EMR (Elastic MapReduce)
EMR node type that coordinates and manages the health of all your other nodes
Primary Node (formerly called the Master Node)
EMR node type that runs tasks and stores data
Core Node
EMR node type that only runs tasks and stores no data; it is typically good practice to use Spot Instances for these nodes
Task Node
A cloud-native, serverless, business intelligence service with native ML integrations and usage-based pricing, used to create interactive dashboards
Amazon QuickSight