Athena Flashcards
What is Amazon Athena?
Amazon Athena is an interactive query service that makes it easy to analyze data directly in Amazon Simple Storage Service (Amazon S3) using standard SQL.
Athena is _________ (on-server/serverless).
serverless
Athena infrastructure
Athena has no infrastructure to set up or manage, and you pay only for the queries you run.
Athena scaling
Athena scales automatically, running queries in parallel, so results are fast even with large datasets and complex queries.
Types of data that can be analyzed using Athena
Athena helps you analyze unstructured, semi-structured, and structured data stored in Amazon S3.
Do you have to load the data into Athena to analyze the data stored in S3?
You can use Athena to run ad-hoc queries using ANSI SQL, without the need to aggregate or load the data into Athena.
Athena integrates with _______ for easy data visualization.
Amazon QuickSight
Athena integrates with the AWS Glue Data Catalog, which offers ______________.
a persistent metadata store for your data in Amazon S3.
What does Athena's integration with the AWS Glue Data Catalog allow you to do?
It allows you to create tables and query data in Athena based on a central metadata store available throughout your Amazon Web Services account and integrated with the ETL and data discovery features of AWS Glue.
When a table definition doesn't specify a SerDe (serializer/deserializer), Athena uses a default library called _________ to do the actual work of parsing the data.
LazySimpleSerDe
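To make the idea concrete, here is a minimal Python sketch of what a delimited-text deserializer does with one line of data. It is not LazySimpleSerDe itself: the `parse_delimited` function and the `COLUMNS` list are hypothetical, and the default delimiter assumes Hive's Ctrl-A (`\x01`) field separator. Missing trailing fields and values that fail type conversion become `None`, roughly the way NULLs show up in query results.

```python
from typing import Callable, Optional

# Hypothetical table columns: (name, converter) pairs standing in for the schema.
COLUMNS: list[tuple[str, Callable[[str], object]]] = [
    ("id", int),
    ("name", str),
    ("score", float),
]


def parse_delimited(line: str, delimiter: str = "\x01") -> dict[str, Optional[object]]:
    """Split one line of text into named, typed column values."""
    fields = line.rstrip("\n").split(delimiter)
    row: dict[str, Optional[object]] = {}
    for i, (name, convert) in enumerate(COLUMNS):
        if i >= len(fields):
            row[name] = None  # missing trailing field -> NULL
            continue
        try:
            row[name] = convert(fields[i])
        except ValueError:
            row[name] = None  # unparseable value -> NULL rather than a query failure
    return row


print(parse_delimited("1,alice,9.5", delimiter=","))  # {'id': 1, 'name': 'alice', 'score': 9.5}
print(parse_delimited("2\x01bob"))                    # {'id': 2, 'name': 'bob', 'score': None}
```

The key point is that the table's column list, not the file, decides how many fields are read and what type each one gets.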
To use a regex in your CREATE TABLE statement, use syntax like the following.
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' WITH SERDEPROPERTIES ("input.regex" = "regular_expression")
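A minimal Python sketch of the rule the `input.regex` property follows: each capture group becomes one column, in order, and the pattern has to match the whole line. The function name and the sample log format are hypothetical, and Python's `re` dialect differs in places from the Java regex dialect the SerDe uses. Inside the DDL string literal, regex backslashes are usually written doubled (`\\S`).

```python
import re
from typing import Optional


def regex_deserialize(line: str, input_regex: str, columns: list[str]) -> dict[str, Optional[str]]:
    """Map each capture group of input_regex to the column in the same position."""
    pattern = re.compile(input_regex)
    if pattern.groups != len(columns):
        raise ValueError(f"{pattern.groups} capture groups but {len(columns)} columns")
    match = pattern.fullmatch(line)  # the regex must match the entire line
    if match is None:
        return {name: None for name in columns}  # non-matching line -> all NULLs
    return dict(zip(columns, match.groups()))


# Hypothetical access-log format: host, user, and a bracketed timestamp.
LOG_REGEX = r"(\S+) (\S+) \[([^\]]+)\]"
print(regex_deserialize("10.0.0.1 alice [01/Jan/2024:00:00:00]", LOG_REGEX, ["host", "user", "ts"]))
```

A line that doesn't match comes back with every column set to `None` in this sketch, which is why an all-NULL result is often the first sign that the regex is wrong.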
The tables and databases that you work with in Athena to run queries are based on __________.
metadata
What is metadata?
Metadata is data about the underlying data in your dataset.
How that metadata describes your dataset is called the ___________.
schema
In Athena, we call a system for organizing metadata a ________ or _________
data catalog or a metastore.
The combination of a dataset and the data catalog that describes it is called a ____________
data source.
The relationship of metadata to an underlying dataset depends on the type of _______ that you work with.
data source
Types of data sources
- Relational data sources, like MySQL, PostgreSQL, and SQL Server, tightly integrate the metadata with the dataset.
- Other data sources, like those built using Hive, allow you to define metadata on-the-fly when you read the dataset.
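The difference is often described as schema-on-write versus schema-on-read. The Python sketch below contrasts the two under simplifying assumptions: the comma-separated records, the two schemas, and both function names are hypothetical. A write-time schema rejects a bad record before it is stored, while a read-time schema leaves the raw text alone and lets different schemas interpret the same bytes differently.

```python
from typing import Callable, Optional

Schema = list[tuple[str, Callable[[str], object]]]

# Hypothetical raw records and two alternative schemas for the same bytes.
RAW = ["2024-01-01,42", "2024-01-02,oops"]
NUMERIC: Schema = [("day", str), ("count", int)]
TEXTUAL: Schema = [("day", str), ("count", str)]


def write_validated(store: list[tuple], line: str, schema: Schema) -> None:
    """Schema-on-write (relational style): bad records are rejected before storage."""
    fields = line.split(",")
    if len(fields) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(fields)}")
    # tuple() converts every field first, so a ValueError leaves the store untouched
    store.append(tuple(convert(v) for (_, convert), v in zip(schema, fields)))


def read_with_schema(raw: list[str], schema: Schema) -> list[dict[str, Optional[object]]]:
    """Schema-on-read (Hive style): raw text is kept; the schema is applied at query time."""
    rows = []
    for line in raw:
        fields = line.split(",")
        row: dict[str, Optional[object]] = {}
        for i, (name, convert) in enumerate(schema):
            try:
                row[name] = convert(fields[i]) if i < len(fields) else None
            except ValueError:
                row[name] = None  # value doesn't fit this schema -> NULL
        rows.append(row)
    return rows


print(read_with_schema(RAW, NUMERIC))  # second row's count is None
print(read_with_schema(RAW, TEXTUAL))  # second row's count is "oops"
```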
Athena uses the _______ to store and retrieve table metadata for the Amazon S3 data in your Amazon Web Services account.
AWS Glue Data Catalog
How does table metadata help the Athena query engine?
The table metadata lets the Athena query engine know how to find, read, and process the data that you want to query.
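The three verbs (find, read, process) can be sketched in Python against an in-memory stand-in for S3. Everything here is hypothetical: the bucket dict, the `SALES_TABLE` metadata layout (loosely modeled on what a catalog table entry records), and the `scan` function. The point is only that the engine needs the location, the format details, and the column types before it can answer a query.

```python
from typing import Callable, Iterator

# Hypothetical in-memory stand-in for an S3 bucket: object key -> object contents.
BUCKET: dict[str, str] = {
    "sales/2024/part-0.csv": "1,widget,3\n2,gadget,5\n",
    "sales/2024/part-1.csv": "3,widget,7\n",
    "logs/app.log": "not part of the sales table\n",
}

# Hypothetical table metadata, loosely modeled on a Data Catalog table entry.
SALES_TABLE: dict = {
    "location": "sales/",
    "delimiter": ",",
    "columns": [("id", int), ("product", str), ("qty", int)],
}


def scan(table: dict, bucket: dict[str, str]) -> Iterator[dict[str, object]]:
    """Yield typed rows for every object under the table's location."""
    columns: list[tuple[str, Callable[[str], object]]] = table["columns"]
    for key in sorted(bucket):
        if not key.startswith(table["location"]):  # find: only objects under LOCATION
            continue
        for line in bucket[key].splitlines():  # read: split records and fields
            fields = line.split(table["delimiter"])
            # process: convert each field to its declared column type
            yield {name: convert(value) for (name, convert), value in zip(columns, fields)}


# Roughly: SELECT sum(qty) FROM sales WHERE product = 'widget'
print(sum(row["qty"] for row in scan(SALES_TABLE, BUCKET) if row["product"] == "widget"))  # 10
```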
What is AWS Glue?
AWS Glue is a fully managed extract, transform, and load (ETL) service.
What are AWS Glue crawlers?
AWS Glue crawlers automatically infer database and table schema from your data in Amazon S3 and store the associated metadata in the AWS Glue Data Catalog.
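Real crawlers rely on built-in and custom classifiers; the Python sketch below only illustrates the general idea of inferring a schema from sample data. The type order it tries (bigint, then double, then boolean, then string) and the function names are assumptions for illustration, not Glue's actual algorithm.

```python
import csv
import io
from typing import Callable


def _all_parse(values: list[str], convert: Callable[[str], object]) -> bool:
    """True if every sample value converts without a ValueError."""
    try:
        for value in values:
            convert(value)
    except ValueError:
        return False
    return True


def infer_type(values: list[str]) -> str:
    """Pick the narrowest type, from a small hypothetical list, that fits every value."""
    if not values:
        return "string"
    if _all_parse(values, int):
        return "bigint"
    if _all_parse(values, float):
        return "double"
    if all(v.lower() in ("true", "false") for v in values):
        return "boolean"
    return "string"


def infer_schema(sample_csv: str) -> list[tuple[str, str]]:
    """Infer (column, type) pairs from a CSV sample whose first row is a header."""
    reader = csv.reader(io.StringIO(sample_csv))
    header = next(reader)
    rows = list(reader)
    return [(name, infer_type([row[i] for row in rows if i < len(row)]))
            for i, name in enumerate(header)]


SAMPLE = "id,name,score,active\n1,alice,9.5,true\n2,bob,7,false\n"
print(infer_schema(SAMPLE))
# [('id', 'bigint'), ('name', 'string'), ('score', 'double'), ('active', 'boolean')]
```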
How do you create database and table schemas in the AWS Glue Data Catalog?
To create database and table schemas in the AWS Glue Data Catalog, you can:
- Run AWS Glue crawlers on your data source from within Athena.
- Run DDL queries directly in the Athena Query Editor.
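For the DDL route, the statement is ordinary Hive-style DDL. The Python helper below sketches one shape of such a statement for comma-delimited text files. The database, table, column, and bucket names are hypothetical, and the helper does no quoting or validation, so treat it as suitable for trusted input only.

```python
def create_table_ddl(database: str, table: str, columns: list[tuple[str, str]],
                     location: str, delimiter: str = ",") -> str:
    """Build a CREATE EXTERNAL TABLE statement for delimited text files in S3."""
    # Backticks quote column names; no escaping is applied to any argument.
    column_lines = ",\n".join(f"  `{name}` {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{column_lines}\n"
        f")\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"LOCATION '{location}'"
    )


print(create_table_ddl("sales_db", "orders",
                       [("id", "bigint"), ("product", "string"), ("qty", "int")],
                       "s3://example-bucket/orders/"))
```

The printed statement is what you would paste into the Query Editor; running it only writes metadata to the catalog and does not move or copy the files.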
Under the hood, Athena uses _________ to process DML statements and _________ to process the DDL statements that create and modify schema.
Presto; Hive
When you create a schema in AWS Glue to query in Athena, you can use the AWS Glue Catalog Manager to ________, but at this time ______ and _______ cannot be changed using the AWS Glue console.
rename columns; table names and database names
How do you rename a database or table?
To rename a database or table, create a new database or table and copy the tables or data into it.
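The create-and-copy approach can be sketched in Python against a hypothetical in-memory catalog (a nested dict, not the AWS Glue API). The function names and catalog layout are assumptions; the sketch only shows that a "rename" amounts to creating the new entry from a copy and then removing the original.

```python
import copy

# Hypothetical in-memory catalog: database -> table name -> table definition.
Catalog = dict[str, dict[str, dict]]


def rename_table(catalog: Catalog, database: str, old: str, new: str) -> None:
    """'Rename' by creating a new table with a copied definition, then dropping the old one."""
    tables = catalog[database]
    if new in tables:
        raise ValueError(f"table {database}.{new} already exists")
    tables[new] = copy.deepcopy(tables[old])  # new table, same columns and location
    del tables[old]  # drop the old table definition


def rename_database(catalog: Catalog, old: str, new: str) -> None:
    """'Rename' a database by creating a new one and copying every table into it."""
    if new in catalog:
        raise ValueError(f"database {new} already exists")
    catalog[new] = copy.deepcopy(catalog[old])
    del catalog[old]


catalog: Catalog = {"sales": {"orders": {"location": "s3://example-bucket/orders/"}}}
rename_table(catalog, "sales", "orders", "orders_v2")
rename_database(catalog, "sales", "sales_v2")
print(catalog)  # {'sales_v2': {'orders_v2': {'location': 's3://example-bucket/orders/'}}}
```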
Athena does not recognize __________ that you specify for an AWS Glue crawler.
exclude patterns
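The consequence is easy to miss: the crawler builds the schema from the files it did not exclude, but queries still read every file under the table's location. The Python sketch below illustrates this with hypothetical object keys. It uses `fnmatch` glob matching, which only approximates Glue's exclude-pattern syntax (for example, `fnmatch`'s `*` also matches `/`). The usual remedy is to move the files you want excluded to a different S3 location.

```python
from fnmatch import fnmatchcase

# Hypothetical objects under one table location: CSV data plus a stray JSON file.
KEYS = [
    "data/sales/jan.csv",
    "data/sales/feb.csv",
    "data/sales/manifest.json",
]
LOCATION = "data/sales/"
EXCLUDE_PATTERNS = ["*.json"]  # patterns evaluated relative to LOCATION


def crawler_sees(keys: list[str], location: str, excludes: list[str]) -> list[str]:
    """Objects left after the crawler applies its exclude patterns."""
    under = [k for k in keys if k.startswith(location)]
    return [k for k in under
            if not any(fnmatchcase(k[len(location):], p) for p in excludes)]


def athena_reads(keys: list[str], location: str) -> list[str]:
    """Athena scans every object under the table location; crawler excludes don't apply."""
    return [k for k in keys if k.startswith(location)]


print(crawler_sees(KEYS, LOCATION, EXCLUDE_PATTERNS))  # only the two .csv files
print(athena_reads(KEYS, LOCATION))                    # all three files, including the .json
```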
If Athena detects that the schema of a partition differs from the schema of the table, Athena may not be able to process the query, and the query fails with ______________.
HIVE_PARTITION_SCHEMA_MISMATCH.
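A typical cause is a partition whose metadata records a different column type than the table does. The Python sketch below shows the kind of comparison involved, under simplifying assumptions: it compares only column types, position by position, on hypothetical schemas, whereas the checks Athena actually applies depend on the data format. Bringing the partition metadata back in line with the table schema is often how the error is cleared.

```python
# Hypothetical column lists: (name, type) in table order.
TableSchema = list[tuple[str, str]]

TABLE_COLUMNS: TableSchema = [("id", "bigint"), ("amount", "double")]
PARTITIONS: dict[str, TableSchema] = {
    "dt=2024-01-01": [("id", "bigint"), ("amount", "double")],
    "dt=2024-01-02": [("id", "string"), ("amount", "double")],  # id type drifted
}


def mismatched_partitions(table: TableSchema,
                          partitions: dict[str, TableSchema]) -> dict[str, list[str]]:
    """Report partitions whose column types differ, position by position, from the table."""
    report: dict[str, list[str]] = {}
    for spec, columns in partitions.items():
        problems = [
            f"column {i} ({t_name}): table {t_type}, partition {p_type}"
            for i, ((t_name, t_type), (_, p_type)) in enumerate(zip(table, columns))
            if t_type != p_type
        ]
        if problems:
            report[spec] = problems
    return report


print(mismatched_partitions(TABLE_COLUMNS, PARTITIONS))
# {'dt=2024-01-02': ['column 0 (id): table bigint, partition string']}
```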