Athena Flashcards

1
Q

What is Amazon Athena?

A

Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL.

2
Q

Athena is _________ (on-server/serverless).

A

serverless

3
Q

Athena infrastructure

A

Athena has no infrastructure to set up or manage, and you pay only for the queries you run.

4
Q

Athena scaling

A

Athena scales automatically, running queries in parallel, so results are fast even with large datasets and complex queries.

5
Q

Data formats that can be analyzed using Athena

A

Athena helps you analyze unstructured, semi-structured, and structured data stored in Amazon S3.

6
Q

Do you have to load the data into Athena to analyze the data stored in S3?

A

No. You can use Athena to run ad hoc queries using ANSI SQL without first aggregating or loading the data into Athena.

7
Q

Athena integrates with _______ for easy data visualization.

A

Amazon QuickSight

8
Q

Athena integrates with the AWS Glue Data Catalog, which offers ______________.

A

a persistent metadata store for your data in Amazon S3.

9
Q

What does Athena's integration with the AWS Glue Data Catalog allow you to do?

A

It allows you to create tables and query data in Athena based on a central metadata store available throughout your Amazon Web Services account and integrated with the ETL and data discovery features of AWS Glue.

10
Q

Athena will use a default library called _________ to do the actual work of parsing the data.

A

LazySimpleSerDe
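To make the SerDe idea concrete, here is a minimal Python sketch of delimited-text parsing in the spirit of LazySimpleSerDe. The function name `parse_delimited` is hypothetical, the Ctrl-A (`\x01`) default mirrors Hive's default field delimiter, and padding short rows with `None` is this sketch's stand-in for NULLs; it is not the SerDe's actual code.

```python
# Hypothetical model of delimited-text deserialization (not LazySimpleSerDe itself).
DEFAULT_FIELD_DELIM = "\x01"  # Hive's default field delimiter (Ctrl-A)


def parse_delimited(line, columns, delim=DEFAULT_FIELD_DELIM):
    """Split one text line into a dict keyed by the declared column names.

    Missing trailing fields become None; extra fields are dropped by zip().
    """
    fields = line.split(delim)
    # Pad short rows so every declared column receives a value.
    fields += [None] * (len(columns) - len(fields))
    return dict(zip(columns, fields))


row = parse_delimited("1,alice,admin", ["id", "name", "role"], delim=",")
print(row)  # {'id': '1', 'name': 'alice', 'role': 'admin'}
```

Note that every value stays a string here; turning text into typed columns is the job of the table schema, which is covered on the metadata cards.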

11
Q

How do you use a regular expression to parse data in a CREATE TABLE statement?

A
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
  WITH SERDEPROPERTIES ("input.regex" = "regular_expression")
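A rough Python model of what the RegexSerDe clause asks for: each capture group in `input.regex` becomes one column, in order, and the pattern must match the whole line (modeled with `re.fullmatch`). Python's `re` stands in for Java regular expressions, whose syntax differs in places; `regex_deserialize` and `LOG_REGEX` are hypothetical names, and returning `None` for every column on a non-matching line is an assumption of this sketch.

```python
import re


def regex_deserialize(line, input_regex, columns):
    """Map regex capture groups to columns, in order (sketch, not RegexSerDe)."""
    m = re.fullmatch(input_regex, line)  # the whole line must match
    if m is None:
        # Assumption of this sketch: an unmatched line yields None everywhere.
        return {c: None for c in columns}
    groups = m.groups()
    if len(groups) != len(columns):
        raise ValueError("need exactly one capture group per column")
    return dict(zip(columns, groups))


# Hypothetical access-log pattern: client IP, status code, quoted request.
LOG_REGEX = r'(\S+) (\S+) "([^"]*)"'
print(regex_deserialize('10.0.0.1 200 "GET /index.html"', LOG_REGEX, ["ip", "status", "request"]))
```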
12
Q

The tables and databases that you work with in Athena to run queries are based on __________.

A

metadata

13
Q

What is metadata?

A

Metadata is data about the underlying data in your dataset.

14
Q

How that metadata describes your dataset is called the ___________.

A

schema

15
Q

In Athena, we call a system for organizing metadata a ________ or _________

A

data catalog or a metastore.

16
Q

The combination of a dataset and the data catalog that describes it is called a ____________

A

data source.

17
Q

The relationship of metadata to an underlying dataset depends on the type of _______ that you work with.

A

data source

18
Q

Types of data sources

A
  1. Relational data sources, like MySQL, PostgreSQL, and SQL Server, tightly integrate the metadata with the dataset.
  2. Other data sources, like those built using Hive, allow you to define metadata on the fly when you read the dataset.
19
Q

Athena uses the _______ to store and retrieve table metadata for the Amazon S3 data in your Amazon Web Services account.

A

AWS Glue Data Catalog

20
Q

How does table metadata help the Athena query engine?

A

The table metadata lets the Athena query engine know how to find, read, and process the data that you want to query.

21
Q

What is AWS Glue?

A

AWS Glue is a fully managed extract, transform, and load (ETL) service.

22
Q

What are AWS Glue crawlers?

A

AWS Glue crawlers automatically infer database and table schema from your data in Amazon S3 and store the associated metadata in the AWS Glue Data Catalog.
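A toy Python illustration of what "infer the schema" means: look at sample values and pick the narrowest type that fits them all. The real crawler classifiers are far more elaborate; `infer_type`, `infer_schema`, and the bigint/double/boolean/string ladder are assumptions made for illustration only.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest type that every sample value parses as (toy rule)."""
    def all_parse(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "bigint"
    if all_parse(float):
        return "double"
    if all(v.lower() in ("true", "false") for v in values):
        return "boolean"
    return "string"


def infer_schema(csv_text):
    """Infer {column: type} from a CSV sample whose first row is a header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    return {name: infer_type([r[i] for r in data]) for i, name in enumerate(header)}


sample = "id,price,active,name\n1,9.99,true,pen\n2,12.5,false,ink\n"
print(infer_schema(sample))
# {'id': 'bigint', 'price': 'double', 'active': 'boolean', 'name': 'string'}
```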

23
Q

How do you create database and table schemas in the AWS Glue Data Catalog?

A

To create table and database schemas in the AWS Glue Data Catalog, you can:

  1. Run AWS Glue crawlers on your data source from within Athena.
  2. Run DDL queries directly in the Athena query editor.
24
Q

Under the hood, Athena uses _________ to process DML statements and _________ to process the DDL statements that create and modify schema.

A

Presto; Hive

25
Q

When you create a schema in AWS Glue to query in Athena, you can use the AWS Glue Catalog Manager to ________, but at this time ______ and _______ cannot be changed using the AWS Glue console.

A

rename columns; table names and database names

26
Q

How do you rename a database or table?

A

To rename a database or table, you need to create a new database or table and copy the tables or data into it.

27
Q

Athena does not recognize __________ that you specify for an AWS Glue crawler.

A

exclude patterns

28
Q

If Athena detects that the schema of a partition differs from the schema of the table, Athena may not be able to process the query and fails with ______________

A

HIVE_PARTITION_SCHEMA_MISMATCH.
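The check behind this error can be sketched in Python, assuming the rule is a positional comparison of column data types for the columns that the table and the partition have in common (our reading of the Athena troubleshooting guidance). `check_partition_schema` is a hypothetical helper.

```python
def check_partition_schema(table_cols, partition_cols):
    """Compare column types position by position for the overlapping columns.

    Both arguments are lists of (name, type) pairs. zip() stops at the shorter
    list, so extra trailing columns on either side are not compared.
    """
    for pos, ((t_name, t_type), (p_name, p_type)) in enumerate(zip(table_cols, partition_cols)):
        if t_type != p_type:
            raise ValueError(
                f"HIVE_PARTITION_SCHEMA_MISMATCH: column {pos} ({t_name}) is "
                f"{t_type} in the table but {p_type} ({p_name}) in the partition"
            )


table = [("id", "bigint"), ("amount", "double")]
# Passes: the partition only adds a trailing column.
check_partition_schema(table, [("id", "bigint"), ("amount", "double"), ("note", "string")])
```

Because the comparison is positional in this model, inserting or removing a column in the middle of a table shifts every later column and is what tends to trip the check.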

29
Q

You can use the ___________ for external Hive metastore to query data sets in Amazon S3 that use an _____________.

A

Amazon Athena data connector; Apache Hive metastore.

30
Q

The connection from Lambda to your Hive metastore is secured by a ________ and does not use the ___________.

A

private Amazon VPC channel; public internet.

31
Q

Can you use the AWS Glue Data Catalog and external Hive metastores in the same Athena query?

A

Yes

32
Q

How can you use the syntax database.table instead of catalog.database.table?

A

Specify a catalog in the query execution context as the current default catalog.
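In the Athena API, the default catalog (and database) is set in the `QueryExecutionContext` of a `StartQueryExecution` request. The name resolution this enables can be sketched as follows; `resolve_table` is hypothetical and ignores quoted identifiers that contain dots.

```python
def resolve_table(ref, default_catalog, default_database=None):
    """Expand a table reference to catalog.database.table.

    Hypothetical helper: models how a default catalog (and optionally a default
    database) from the query execution context fills in the missing parts.
    """
    parts = ref.split(".")
    if len(parts) == 3:
        return ref  # already fully qualified
    if len(parts) == 2:
        return f"{default_catalog}.{ref}"
    if len(parts) == 1 and default_database:
        return f"{default_catalog}.{default_database}.{ref}"
    raise ValueError(f"cannot resolve {ref!r}")


print(resolve_table("sales.orders", "my_hive_catalog"))  # my_hive_catalog.sales.orders
```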

33
Q

How does Athena interact with a Hive metastore?

A
  1. You create a Lambda function that connects Athena to the Hive metastore inside your VPC.
  2. You register a unique catalog name for your Hive metastore and a corresponding function name in your account.
  3. When you run an Athena DML or DDL query that uses the catalog name, the Athena query engine calls the Lambda function that you associated with that catalog name.
  4. The Lambda function uses AWS PrivateLink to communicate with the Hive metastore in your VPC and receives responses to its metadata requests.
34
Q

When you use the Athena data connector for external Hive metastore, the maximum number of registered catalogs that you can have is _________.

A

1000
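The registration and routing steps of the Hive metastore flow (register a catalog name against a Lambda function, then route queries by catalog name), together with the 1,000-catalog limit, can be modeled with a small registry. `CatalogRegistry` and the function names are hypothetical; in practice Athena holds this mapping, not your code.

```python
MAX_CATALOGS = 1000  # limit on registered catalogs, per the card above


class CatalogRegistry:
    """Toy model of the catalog name -> Lambda function lookup (hypothetical)."""

    def __init__(self):
        self._functions = {}

    def register(self, catalog_name, lambda_function_name):
        if catalog_name in self._functions:
            raise ValueError(f"catalog {catalog_name!r} already registered")
        if len(self._functions) >= MAX_CATALOGS:
            raise ValueError("catalog limit reached")
        self._functions[catalog_name] = lambda_function_name

    def route(self, qualified_table):
        """Return the Lambda function that serves a catalog.database.table reference."""
        catalog = qualified_table.split(".", 1)[0]
        try:
            return self._functions[catalog]
        except KeyError:
            raise LookupError(f"no function registered for catalog {catalog!r}") from None


registry = CatalogRegistry()
registry.register("hms_prod", "hive-metastore-connector-prod")
print(registry.route("hms_prod.sales.orders"))  # hive-metastore-connector-prod
```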

35
Q

Are Hive views compatible with Athena views?

A

Hive views are not compatible with Athena views and are not supported.

36
Q

Kerberos authentication for __________ is not supported.

A

Hive metastore

37
Q

What is a spill location?

A

Because of the limit on Lambda function response sizes, responses larger than the threshold spill into the Amazon S3 location that you specify when you create your Lambda function.
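A minimal sketch of the spill decision: serialize the response, return it inline if it fits, otherwise write it to the spill location and return a pointer. The 6 MiB default is an assumed stand-in for the Lambda response cap, `build_response` is hypothetical, and a plain dict plays the role of the S3 bucket.

```python
import json

SPILL_THRESHOLD_BYTES = 6 * 1024 * 1024  # assumed stand-in for the Lambda response cap


def build_response(payload, spill_store, spill_prefix, threshold=SPILL_THRESHOLD_BYTES):
    """Return the payload inline, or spill it and return where it was written."""
    body = json.dumps(payload).encode("utf-8")
    if len(body) <= threshold:
        return {"inline": payload}
    key = f"{spill_prefix}/spill-{len(spill_store)}.json"
    spill_store[key] = body  # stands in for writing the object to the S3 spill location
    return {"spilled_to": key}


store = {}
print(build_response({"rows": [1, 2, 3]}, store, "s3://example-spill-bucket/athena", threshold=10))
# {'spilled_to': 's3://example-spill-bucket/athena/spill-0.json'}
```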

38
Q

If you have data in sources other than Amazon S3, you can use __________ to query the data in place.

A

Athena Federated Query

39
Q

Where else can Athena Federated Query be used?

A

It can be used to build pipelines that extract data from multiple data sources and store them in Amazon S3.

40
Q

With Athena Federated Query, you can run SQL queries across data stored in _______________

A

relational (SQL), non-relational (NoSQL), object (S3), and custom data sources.

41
Q

How does Athena run federated queries?

A

Athena uses data source connectors that run on AWS Lambda to run federated queries.

42
Q

What is a data source connector?

A

A data source connector is a piece of code that can translate between your target data source and Athena.

43
Q

Data source connector can be deemed as an extension of _______________

A

Athena's query engine

44
Q

List of Prebuilt Athena data source connectors

A
  1. Amazon CloudWatch Logs
  2. Amazon DynamoDB
  3. Amazon DocumentDB
  4. Amazon RDS and JDBC-compliant relational data sources such as MySQL and PostgreSQL (the prebuilt connectors are available under the Apache 2.0 license).
45
Q

Using ____________ you can write custom connectors.

A

Athena Query Federation SDK

46
Q

To choose, configure, and deploy a data source connector to your account, you can use ____________.

A
  1. The Athena or Lambda consoles
  2. The AWS Serverless Application Repository

47
Q

Can you include multiple catalogs from multiple data sources in a single query?

A

Yes.

48
Q

How can you include multiple catalogs from multiple data sources in the same query?

A

Using Athena Federated Queries.

49
Q

How is Athena/Athena Federated Query executed once a query is submitted against a data source?

A
Athena invokes the corresponding connector to identify the parts of the table that need to be read, manages parallelism, and pushes down filter predicates.
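The division of labour in this card can be sketched in Python: the connector lists the splits (parts of the table) that survive the filter, reads each split with the row predicate applied at the source, and the engine runs those reads in parallel. Everything here (`SOURCE`, `get_splits`, `read_split`, `run_query`) is hypothetical, and threads stand in for concurrent Lambda invocations.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory data source: partition -> list of rows.
SOURCE = {
    "2024-01-01": [{"id": 1, "amount": 5}, {"id": 2, "amount": 50}],
    "2024-01-02": [{"id": 3, "amount": 70}],
    "2024-01-03": [{"id": 4, "amount": 8}],
}


def get_splits(partition_filter):
    """Connector: list only the parts of the table that need to be read."""
    return [p for p in sorted(SOURCE) if partition_filter(p)]


def read_split(split, row_filter):
    """Connector: read one split, applying the pushed-down predicate at the source."""
    return [row for row in SOURCE[split] if row_filter(row)]


def run_query(partition_filter, row_filter, max_workers=4):
    """Engine: fan the splits out in parallel and concatenate the results."""
    splits = get_splits(partition_filter)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda s: read_split(s, row_filter), splits))
    return [row for rows in results for row in rows]


rows = run_query(lambda p: p >= "2024-01-02", lambda r: r["amount"] > 10)
print(rows)  # [{'id': 3, 'amount': 70}]
```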
50
Q

Connectors use _____________ as the format for returning data requested in a query, which enables connectors to be implemented in languages such as __________.

A

Apache Arrow; C, C++, Java, Python, and Rust

51
Q

Connectors are processed in __________.

A

Lambda

52
Q

Athena Federated Query is supported only on ____________

A

Athena engine version 2.

53
Q

Can you use views with federated data sources?

A

No. You cannot use views with federated data sources.

54
Q

To control access to data catalogs, use _______________

A

resource-level IAM permissions or identity-based IAM policies.

55
Q

Athena uses an approach known as _________ for schema reading

A

schema-on-read, which means a schema is projected onto your data at the time you run a query.
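Schema-on-read means the stored bytes never change; a schema is only a lens applied at query time. A minimal Python sketch, with a hypothetical `project` helper, applies two different schemas to the same raw text:

```python
RAW = "1,2024-01-05,19.99\n2,2024-01-06,5.00\n"  # data as stored; never rewritten


def project(raw, schema):
    """Apply a schema, a list of (column name, cast function), to raw text at read time."""
    out = []
    for line in raw.splitlines():
        fields = line.split(",")
        out.append({name: cast(value) for (name, cast), value in zip(schema, fields)})
    return out


as_numbers = project(RAW, [("id", int), ("day", str), ("price", float)])
as_text = project(RAW, [("id", str), ("day", str)])
print(as_numbers[0])  # {'id': 1, 'day': '2024-01-05', 'price': 19.99}
print(as_text[1])     # {'id': '2', 'day': '2024-01-06'}
```

Both views come from the same unchanged `RAW`, which is also why a query never needs to modify the underlying S3 data.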

56
Q

______ does not modify your data in Amazon S3.

A

Athena

57
Q

Can Athena query previous versions of an object in an S3 bucket?

A

Athena cannot query previous versions. It can only query current versions.

58
Q

Athena supports querying objects that are stored with __________ in the same bucket specified by the LOCATION clause.

A

multiple storage classes

59
Q

Athena supports querying data in buckets that use the ____________ payment option.

A

Requester Pays Buckets.

60
Q

Athena does not support querying the data in the ____________ storage classes.

A

S3 Glacier or S3 Glacier Deep Archive
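A small Python sketch of filtering a table LOCATION listing by storage class, assuming objects in these two classes are simply skipped rather than read. `GLACIER` and `DEEP_ARCHIVE` are the S3 API storage-class names for these tiers; `queryable_keys` is hypothetical.

```python
# S3 API storage-class names for S3 Glacier (Flexible Retrieval) and Deep Archive.
UNSUPPORTED_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def queryable_keys(objects):
    """objects: iterable of (key, storage_class) pairs under a table LOCATION."""
    return [key for key, storage_class in objects if storage_class not in UNSUPPORTED_CLASSES]


listing = [
    ("sales/2024/a.csv", "STANDARD"),
    ("sales/2023/b.csv", "STANDARD_IA"),
    ("sales/2019/c.csv", "GLACIER"),
    ("sales/2015/d.csv", "DEEP_ARCHIVE"),
]
print(queryable_keys(listing))  # ['sales/2024/a.csv', 'sales/2023/b.csv']
```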

61
Q

All tables in Athena are _________. Only tables declared with the _________ keyword can be created.

A

EXTERNAL; EXTERNAL

62
Q

If you are interacting with Apache Spark, then your table names and table column names must be _________

A

lowercase.

63
Q

Special characters other than __________ are not supported for Athena databases, tables or column names.

A

underscore (_)
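The naming rules from this card and the previous one can be expressed as a quick validator. `check_name` is hypothetical and covers only these two rules (no length limits or reserved-word checks).

```python
import re

# Letters, digits, and underscore are the only characters allowed.
NAME_RE = re.compile(r"[A-Za-z0-9_]+")


def check_name(name, spark_compatible=False):
    """Return a list of rule violations for a database, table, or column name."""
    problems = []
    if not NAME_RE.fullmatch(name):
        problems.append("only letters, digits, and underscore (_) are allowed")
    if spark_compatible and name != name.lower():
        problems.append("use lowercase for Apache Spark compatibility")
    return problems


print(check_name("sales_2024"))                       # []
print(check_name("SalesData", spark_compatible=True))  # ['use lowercase for Apache Spark compatibility']
```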

64
Q

How do you specify the LOCATION for an Athena table?

A
  1. A path to an Amazon S3 folder, for example s3://bucketname/folder/
  2. A path that uses an Amazon S3 access point alias, for example s3://access-point-name-metadata-s3alias/folder/
65
Q

Your source data may be grouped into Amazon S3 folders called _________ based on a set of columns.

A

partitions

66
Q

If the S3 path is in _______, MSCK REPAIR TABLE doesn’t add the partitions to the AWS Glue Data Catalog.

A

camel case
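Partition folders are usually Hive-style `key=value` segments. This hypothetical Python sketch parses them out of a path and flags keys that are not all lowercase, one way to spot the camel-case problem before running MSCK REPAIR TABLE; it checks keys only, which is an assumption about where the camel case appears.

```python
def parse_partition_path(path, table_location):
    """Turn table_location + 'key1=val1/key2=val2/' into {'key1': 'val1', ...}."""
    if not path.startswith(table_location):
        raise ValueError("path is outside the table location")
    rest = path[len(table_location):].strip("/")
    if not rest:
        return {}
    values = {}
    for segment in rest.split("/"):
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"not a key=value segment: {segment!r}")
        values[key] = value
    return values


def camel_case_keys(partition_values):
    """Partition keys that are not all lowercase (likely skipped by MSCK REPAIR TABLE)."""
    return [k for k in partition_values if k != k.lower()]


loc = "s3://example-bucket/sales/"
print(parse_partition_path(loc + "year=2024/month=01/", loc))  # {'year': '2024', 'month': '01'}
print(camel_case_keys(parse_partition_path(loc + "eventDate=2024-01-01/", loc)))  # ['eventDate']
```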

67
Q

What is partition projection?

A

In partition projection, partition values and locations are calculated from configuration rather than read from a repository like the AWS Glue Data Catalog.
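A minimal Python sketch of the idea: partition locations for a daily date column are generated from a range and a location template, with no catalog lookup. In Athena the equivalent configuration lives in table properties such as `storage.location.template`; the `${dt}` placeholder mimics that template style, and `project_partitions` is illustrative only.

```python
from datetime import date, timedelta


def project_partitions(start, end, template):
    """Yield one S3 location per day in [start, end], computed from configuration."""
    day = start
    while day <= end:
        yield template.replace("${dt}", day.isoformat())
        day += timedelta(days=1)


for loc in project_partitions(date(2024, 1, 30), date(2024, 2, 1), "s3://example-bucket/logs/dt=${dt}/"):
    print(loc)
# s3://example-bucket/logs/dt=2024-01-30/
# s3://example-bucket/logs/dt=2024-01-31/
# s3://example-bucket/logs/dt=2024-02-01/
```

Because every location is computed in memory, there is nothing to register per partition, which is why projection removes the need to add partitions manually.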

68
Q

How can partition projection reduce the runtime of queries?

A

Because in-memory operations are often faster than remote operations, partition projection can reduce the runtime of queries against highly partitioned tables.

69
Q

_______ eliminates the need to specify partitions manually in AWS Glue or an external Hive metastore.

A

Partition projection

70
Q

What happens if a projected partition does not exist in Amazon S3?

A

If a projected partition does not exist in Amazon S3, Athena will still project the partition. Athena does not throw an error, but no data is returned.

71
Q

What happens if too many projected partitions are empty?

A

If too many of your partitions are empty, performance can be slower compared to traditional AWS Glue partitions. If more than half of your projected partitions are empty, it is recommended that you use traditional partitions.
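The "more than half empty" rule of thumb can be checked with a few lines of Python, given the projected locations and the set of locations that actually hold data. Both helpers are hypothetical; a projected location with no data simply contributes nothing, matching the no-error behaviour on the previous card.

```python
def empty_ratio(projected, existing):
    """Fraction of projected partition locations that have no data."""
    projected = list(projected)
    if not projected:
        return 0.0
    missing = sum(1 for loc in projected if loc not in existing)
    return missing / len(projected)


def recommend(projected, existing):
    """Suggest traditional partitions when more than half are empty."""
    return "traditional partitions" if empty_ratio(projected, existing) > 0.5 else "partition projection"


projected = ["dt=1/", "dt=2/", "dt=3/", "dt=4/"]
print(recommend(projected, {"dt=1/"}))           # traditional partitions (3 of 4 empty)
print(recommend(projected, {"dt=1/", "dt=2/"}))  # partition projection (exactly half empty)
```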

72
Q

Partition projection is usable only when the table is queried through ______. If the same table is read through another service such as Amazon Redshift Spectrum or Amazon EMR, the __________ is used.

A

Athena; standard partition metadata