Analytics - Athena Flashcards
What is Athena?
A serverless, interactive query service that makes it easy to analyze data in S3 using standard SQL.
What can I do with Athena?
Analyze unstructured, semi-structured, and structured data sets stored in S3 with ad-hoc ANSI SQL queries, without needing to aggregate or load the data into Athena.
Which structured data formats can Athena process?
CSV, JSON, and Avro, as well as columnar formats such as Apache Parquet and Apache ORC.
How do I get started with Athena?
Log in to the AWS Management Console and create a schema, either by writing DDL statements in the console or by using the Create Table wizard.
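The DDL route can be illustrated with a small Python helper that assembles a CREATE EXTERNAL TABLE statement for CSV files already sitting in S3. This is a minimal sketch: the function name, database, table, columns, and bucket are hypothetical placeholders, and only the plain comma-delimited case is covered. The statement only records a schema; the data itself stays in S3 and is never loaded.

```python
# Minimal sketch: build Athena DDL for comma-separated files in S3.
# All names (function, database, table, columns, bucket) are hypothetical.

def csv_table_ddl(database: str, table: str, columns: list[tuple[str, str]],
                  s3_location: str, skip_header: bool = True) -> str:
    """Return a CREATE EXTERNAL TABLE statement over existing CSV objects."""
    if not s3_location.startswith("s3://"):
        raise ValueError("location must be an s3:// URI")
    # Backtick-quote column names so reserved words such as `date` still work.
    cols = ",\n".join(f"  `{name}` {col_type}" for name, col_type in columns)
    ddl = (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{cols}\n"
        ")\n"
        "ROW FORMAT DELIMITED\n"
        "FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'"
    )
    if skip_header:
        # Table property that makes Athena ignore the first line of each file.
        ddl += "\nTBLPROPERTIES ('skip.header.line.count'='1')"
    return ddl + ";"
```

For example, `csv_table_ddl("sampledb", "orders", [("order_id", "INT"), ("date", "DATE"), ("amount", "DOUBLE")], "s3://example-bucket/orders/")` produces a statement that could be pasted into the console's query editor.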
How can Athena be accessed?
Through the AWS Management Console, the Athena API, or an ODBC or JDBC driver.
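API access is asynchronous: you start a query, poll its status, then fetch results. The sketch below assumes `client` is a boto3 Athena client and uses its `start_query_execution`, `get_query_execution`, and `get_query_results` calls; the helper name `run_query`, its parameters, and the names in the usage note are hypothetical. Passing the client in, rather than creating it inside, keeps the helper testable without AWS credentials.

```python
# Minimal sketch of running one query through the Athena API.
# `client` is assumed to be boto3.client("athena"); run_query is hypothetical.
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_query(client, sql: str, database: str, output_location: str,
              poll_seconds: float = 1.0, timeout_seconds: float = 300.0) -> list[list[str]]:
    """Start a query, wait for a terminal state, and return rows as strings."""
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},  # results land in S3
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query succeeds, fails, or is cancelled.
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"query {query_id} still {state}")
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        reason = status["QueryExecution"]["Status"].get("StateChangeReason", "")
        raise RuntimeError(f"query {query_id} {state}: {reason}")

    # Reads only the first page; large result sets need NextToken paging.
    results = client.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue", "") for cell in row["Data"]]
        for row in results["ResultSet"]["Rows"]
    ]
```

With boto3 installed and credentials configured, a call might look like `run_query(boto3.client("athena"), "SELECT 1 AS n", "sampledb", "s3://example-bucket/athena-results/")`. For SELECT queries the first returned row is typically the column header row.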
What is the underlying technology behind Athena?
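Presto, a distributed SQL query engine, with full standard SQL support (newer Athena engine versions are based on Trino, a fork of Presto).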
How does Athena store table definitions and schema?
Athena uses a data catalog (the AWS Glue Data Catalog, in Regions where Glue is available) to store metadata about the databases and tables you create for your data stored in S3.
What is AWS Glue?
A fully managed ETL (extract, transform, load) service that also provides a data catalog. It is not available in all Regions.
What does Glue ETL allow you to do?
Transform data and move it to various destinations. Glue crawlers can automatically scan your data sources, identify data formats, infer schemas, and store the resulting database and table metadata in the Glue Data Catalog, describing data that lives in S3.
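To make "infers schemas" concrete, here is a toy Python illustration of the idea behind a crawler's schema inference. It is not Glue's actual algorithm: it samples CSV text and assigns each column the narrowest of three hypothetical candidate types (`bigint`, then `double`, then `string` as the catch-all). The function names are hypothetical.

```python
# Toy schema inference in the spirit of a Glue crawler (NOT Glue's algorithm).
# Candidate types, tried narrowest first: bigint, double, string.
import csv
import io


def _fits(value: str, cast) -> bool:
    """True if `cast(value)` succeeds, e.g. int("42")."""
    try:
        cast(value)
        return True
    except ValueError:
        return False


def infer_csv_schema(sample: str) -> list[tuple[str, str]]:
    """Infer (column, type) pairs from a CSV sample whose first row is a header."""
    reader = csv.reader(io.StringIO(sample))
    header = next(reader)
    rows = list(reader)
    schema = []
    for i, name in enumerate(header):
        # Ignore missing or empty cells when judging a column's type.
        values = [row[i] for row in rows if i < len(row) and row[i] != ""]
        if values and all(_fits(v, int) for v in values):
            col_type = "bigint"
        elif values and all(_fits(v, float) for v in values):
            col_type = "double"
        else:
            col_type = "string"  # fall back to the most general type
        schema.append((name, col_type))
    return schema
```

For instance, `infer_csv_schema("id,price,city\n1,9.99,Paris\n2,12,Oslo\n")` returns `[("id", "bigint"), ("price", "double"), ("city", "string")]`. A real crawler also handles many formats, nested data, and partitions.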
What is the difference between Athena, EMR and Redshift?
Redshift provides the fastest query performance for enterprise reporting and business intelligence workloads. EMR makes it simple and cost-effective, compared with on-premises deployments, to run distributed processing frameworks such as Hadoop and Spark. Athena provides the easiest way to run ad-hoc queries on data in S3 without managing servers.
What is partitioning?
Partitioning lets you restrict the amount of data scanned by each query, which improves performance and reduces cost. Data can be partitioned by any key.
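The cost and performance benefit comes from partition pruning: when a query filters on the partition key, objects in non-matching partitions are never read. In Athena this typically means declaring something like `PARTITIONED BY (dt string)` in the table DDL and filtering with `WHERE dt = '2024-01-01'`. The Python sketch below models that effect over S3 keys laid out with Hive-style prefixes (`dt=YYYY-MM-DD/`); the class, function names, keys, and sizes are hypothetical.

```python
# Minimal model of partition pruning over Hive-style S3 keys (dt=YYYY-MM-DD/).
# S3Object, partition_value, bytes_scanned, keys, and sizes are hypothetical.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class S3Object:
    key: str          # e.g. "logs/dt=2024-01-01/part-0000.csv"
    size_bytes: int


def partition_value(key: str, partition_key: str) -> Optional[str]:
    """Extract the value of `partition_key` from a key segment like dt=2024-01-01."""
    prefix = partition_key + "="
    for segment in key.split("/"):
        if segment.startswith(prefix):
            return segment[len(prefix):]
    return None


def bytes_scanned(objects: list[S3Object], partition_key: str,
                  wanted: Optional[set[str]] = None) -> int:
    """Bytes a query reads; wanted=None means no partition filter (full scan)."""
    if wanted is None:
        return sum(o.size_bytes for o in objects)
    # Pruning: only objects whose partition value matches the filter are read.
    return sum(o.size_bytes for o in objects
               if partition_value(o.key, partition_key) in wanted)
```

With 30 daily partitions of equal size, filtering on a single `dt` value reads one thirtieth of the bytes a full scan would.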
What should I know about Athena queries?
Athena retains query history for 45 days, and query results are saved to an S3 location you specify. Many types of logs, as well as geospatial data, can be queried. Supported data types include INTEGER, DOUBLE, VARCHAR, MAP, ARRAY, and STRUCT.
How is Athena secured?
Through IAM policies, access control lists, and S3 bucket policies. Athena can also query data that is encrypted in S3.
Pricing?
You pay only for the queries you run, based on the amount of data each query scans, and you are not charged for failed queries. You can reduce costs by compressing, partitioning, or converting your data to a columnar format, because each of these reduces the amount of data scanned.
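The per-query charge can be estimated with simple arithmetic. This sketch rests on stated assumptions rather than official figures: a price of 5.00 USD per TB scanned (a long-standing list price in several Regions; check current pricing for yours), bytes rounded up to the next whole MB with a 10 MB minimum per query, and binary units (1 MB = 2**20 bytes, 1 TB = 2**40 bytes). The function name is hypothetical.

```python
# Rough Athena cost estimate under the pay-per-data-scanned model.
# ASSUMPTIONS: 5.00 USD per TB, round up to whole MB, 10 MB minimum per query,
# binary units. query_cost_usd is a hypothetical helper, not an AWS API.
MB = 2 ** 20
TB = 2 ** 40


def query_cost_usd(bytes_scanned: int, price_per_tb: float = 5.0,
                   failed: bool = False) -> float:
    """Estimate the charge for a single query."""
    if failed:
        return 0.0  # failed queries are not charged
    billed_mb = max(10, -(-bytes_scanned // MB))  # integer ceiling, 10 MB floor
    return billed_mb * MB / TB * price_per_tb
```

Under these assumptions, a query that scans 3 TB of raw CSV would cost `query_cost_usd(3 * TB)`, or 15.00 USD, while the same query over a hypothetical compressed columnar copy that scans only 1 TB would cost 5.00 USD. The saving comes entirely from scanning fewer bytes.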
Availability?
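Athena is highly available and executes queries using compute resources across multiple facilities, automatically routing queries if a particular facility is unreachable. Because it uses S3 as its underlying data store, your data is also highly available and durable.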