Analytics - Athena Flashcards

1
Q

What is Athena?

A

A serverless, interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL.

2
Q

What can I do with Athena?

A

Analyze structured and unstructured data sets using ANSI SQL, without needing to aggregate the data or load it into Athena.

3
Q

Which structured data formats can Athena process?

A

CSV, JSON, Avro, Apache Parquet or ORC

4
Q

How do I get started with Athena?

A

Log in to the AWS Management Console and create a schema, either by running DDL statements in the console or by using the create table wizard.
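As a rough illustration of the DDL route, the sketch below assembles a CREATE EXTERNAL TABLE statement for comma-delimited data. The helper name, database, table, columns, and bucket are all hypothetical; the resulting string would be pasted into the console query editor.

```python
def build_create_table_ddl(database, table, columns, s3_location):
    """Return a CREATE EXTERNAL TABLE statement for comma-delimited data in S3."""
    # Each column is a (name, type) pair, e.g. ("status", "int").
    column_lines = ",\n".join(f"  `{name}` {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{column_lines}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'"
    )


# Hypothetical database, table, and bucket names, used only for illustration.
ddl = build_create_table_ddl(
    "demo_db",
    "web_logs",
    [("request_time", "string"), ("status", "int")],
    "s3://example-bucket/web-logs/",
)
print(ddl)
```

The table is only metadata: the statement points Athena at the S3 location and leaves the data where it is.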

5
Q

How can Athena be accessed?

A

AWS Management Console, an API, ODBC, or JDBC
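For the API route, a minimal sketch using boto3's Athena client might look like this. The database, query, and results bucket are hypothetical, and run_query assumes boto3 is installed and AWS credentials are configured.

```python
def build_query_request(sql, database, output_location):
    """Build keyword arguments for the Athena StartQueryExecution API call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(sql, database, output_location):
    """Submit the query with boto3 and return its execution ID."""
    import boto3  # imported lazily so the request builder works without the SDK

    client = boto3.client("athena")
    response = client.start_query_execution(
        **build_query_request(sql, database, output_location)
    )
    return response["QueryExecutionId"]


# Hypothetical query, database, and results location.
request = build_query_request(
    "SELECT status, COUNT(*) FROM web_logs GROUP BY status",
    "demo_db",
    "s3://example-bucket/athena-results/",
)
```

The call is asynchronous: it returns an execution ID, and the results are written to the S3 output location once the query finishes.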

6
Q

What is the underlying technology behind Athena?

A

Presto, with full standard SQL support.

7
Q

How does Athena store table definitions and schema?

A

Athena uses a data catalog to store information about the databases and tables you create for your data stored in S3.

8
Q

What is AWS Glue?

A

A fully managed ETL (extract, transform, load) service that also provides a data catalog. It is not available in all regions.

9
Q

What does Glue ETL allow you to do?

A

Transform data and move it to various destinations. Glue can also automatically scan your data sources, identify data formats, infer schemas, and store metadata about the databases and tables for your data in S3.

10
Q

What is the difference between Athena, EMR and Redshift?

A

Redshift provides the fastest query performance for enterprise reporting and business intelligence workloads. EMR is a simple, cost-effective way to run distributed frameworks such as Hadoop and Spark, compared with running them on premises. Athena provides the easiest way to run ad hoc queries against data in S3 without managing servers.

11
Q

What is partitioning?

A

Partitioning lets you restrict the amount of data scanned by each query, which improves performance and reduces cost. Data can be partitioned by any key.
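A toy model of why this works, assuming data laid out under Hive-style key=value prefixes. The object keys and sizes are invented, and the helpers are illustrative rather than anything Athena exposes.

```python
# Hypothetical objects laid out with Hive-style partition keys (year=, month=).
objects = {
    "logs/year=2023/month=12/part-0.csv": 400,  # size in MB
    "logs/year=2024/month=01/part-0.csv": 300,
    "logs/year=2024/month=02/part-0.csv": 250,
}


def partition_values(key):
    """Extract partition key/value pairs from an object key."""
    return dict(part.split("=", 1) for part in key.split("/") if "=" in part)


def scanned_mb(objects, **filters):
    """Sum the sizes of objects whose partition values match every filter."""
    return sum(
        size
        for key, size in objects.items()
        if all(partition_values(key).get(k) == v for k, v in filters.items())
    )


full_scan = scanned_mb(objects)                              # no partition filter
pruned_scan = scanned_mb(objects, year="2024", month="02")   # filter on both keys
print(full_scan, pruned_scan)  # prints "950 250"
```

A query that filters on the partition keys only needs to read the matching prefix, so here it touches 250 MB instead of 950 MB.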

12
Q

What should I know about Athena queries?

A

Athena retains query history for 45 days, and query results are stored in S3. Many types of logs, as well as geospatial data, can be queried. Supported data types include INTEGER, DOUBLE, VARCHAR, MAP, ARRAY, and STRUCT.

13
Q

How is Athena secured?

A

Access is controlled with IAM policies, access control lists (ACLs), and S3 bucket policies. Queries can also be run against encrypted S3 data.

14
Q

Pricing?

A

You pay only for the queries you run, based on the amount of data scanned by each query. You are not charged for failed queries. Compressing, partitioning, or converting your data to a columnar format reduces the amount of data scanned, and therefore the cost.
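A back-of-the-envelope sketch of scan-based pricing. The default rate of 5.00 USD per TB, the 10 MB per-query minimum, and the binary definition of a TB are assumptions for illustration only; check the current Athena pricing page before relying on them.

```python
def estimate_query_cost(bytes_scanned, price_per_tb=5.0, min_bytes=10 * 1024**2):
    """Estimate the cost in USD of one successful query from bytes scanned.

    price_per_tb and min_bytes are assumed values, not authoritative pricing.
    """
    billable_bytes = max(bytes_scanned, min_bytes)
    return billable_bytes / 1024**4 * price_per_tb


# Hypothetical comparison: 1 TB of raw CSV versus the same data converted to
# a compressed columnar format where the query reads only 100 GB.
raw_cost = estimate_query_cost(1024**4)
columnar_cost = estimate_query_cost(100 * 1024**3)
print(f"raw: ${raw_cost:.2f}, columnar: ${columnar_cost:.2f}")
```

Under these assumptions the same query drops from about 5.00 USD to about 0.49 USD, purely because less data is scanned.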

15
Q

Availability?

A

Because Athena uses S3, it is highly available and executes
