AWS Athena Flashcards
Which data formats can Athena work with?
- XML
- JSON
- CSV
- TSV
- AVRO
- ORC
- PARQUET
What is the Avro file format?
Avro is a row-based data serialization and storage format from the Apache Hadoop ecosystem.
What is the TSV file format?
TSV (tab-separated values) is a plain-text format in which each line is a record and fields are separated by tab characters. It is commonly used with spreadsheet software.
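A minimal local sketch of what "tab-delimited" means in practice, using Python's standard csv module with a tab delimiter. The sample data and the helper name parse_tsv are hypothetical; this only illustrates the format, not how Athena reads it.

```python
import csv
import io

# Hypothetical TSV sample: one record per line, fields separated by tab characters.
TSV_TEXT = "id\tname\tscore\n1\talice\t9\n2\tbob\t7\n"


def parse_tsv(text):
    """Parse TSV text into a list of dicts keyed by the header row (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


rows = parse_tsv(TSV_TEXT)
for row in rows:
    print(row["name"], row["score"])  # prints "alice 9", then "bob 7"
```

All values come back as strings; any type conversion is up to the reader of the file.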
What is the Parquet file format?
Parquet is an open-source columnar file format from the Apache Hadoop ecosystem.
What is the ORC file format?
The Optimized Row Columnar (ORC) file format provides a highly efficient way to store Hive data.
What does Athena enable you to do?
It enables you to run SQL queries directly against data stored in Amazon S3, without first loading it into a database.
What is the Athena data catalogue used for?
It is used to store your schema: the table definitions (the view over the data) that Athena applies to your files in S3.
How can you save money in Athena?
By storing your data in compressed file formats, so each query scans fewer bytes.
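Because the charge is based on data scanned, fewer bytes on S3 means fewer bytes for Athena to read. A minimal local sketch (plain Python, no AWS calls) comparing a CSV payload's size before and after gzip, one of the compression codecs Athena can read. The data is made up and deliberately repetitive, so the ratio on real files will differ.

```python
import gzip

# Hypothetical, highly repetitive CSV payload; real data compresses by varying amounts.
header = "id,status,region\n"
body = "".join(f"{i},active,eu-west-1\n" for i in range(10_000))
csv_bytes = (header + body).encode("utf-8")

# Compress the whole payload in memory with gzip.
compressed = gzip.compress(csv_bytes)

print("uncompressed bytes:", len(csv_bytes))
print("gzip bytes:", len(compressed))
print(f"ratio: {len(csv_bytes) / len(compressed):.1f}x")
```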
What are you charged for in Athena?
S3 storage for your data, plus $5 per TB of data scanned by your queries.
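A rough per-query cost estimator, as a sketch only. It assumes the $5 per TB figure above, a binary terabyte, rounding up to whole megabytes and a 10 MB minimum per query; all of these are assumptions to check against current Athena pricing for your region, and S3 storage is not included.

```python
import math

# Assumptions for illustration (verify against current Athena pricing for your region):
PRICE_PER_TB_USD = 5.0    # $5 per TB of data scanned
BYTES_PER_MB = 1024 ** 2
BYTES_PER_TB = 1024 ** 4  # assumption: binary terabyte
MIN_BILLED_MB = 10        # assumption: each query is billed for at least 10 MB


def query_cost_usd(bytes_scanned):
    """Estimate the scan charge for one query: round up to whole MB, apply the minimum."""
    billed_mb = max(MIN_BILLED_MB, math.ceil(bytes_scanned / BYTES_PER_MB))
    return billed_mb * BYTES_PER_MB / BYTES_PER_TB * PRICE_PER_TB_USD


print(query_cost_usd(BYTES_PER_TB))       # 5.0
print(query_cost_usd(BYTES_PER_TB // 2))  # 2.5
```

Combined with compression, this makes the saving concrete: halving the bytes scanned halves the estimated charge, down to the per-query minimum.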
How do I create the Athena server cluster to run Athena?
You don't. Athena is serverless, so there is no cluster to create or manage; this is a question to make sure you are awake.
What type of data is more efficient to process in Athena?
Columnar data, such as Parquet or ORC.
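A toy model of why columnar layouts help, in plain Python: the same hypothetical table is serialized once row by row (like CSV) and once column by column. To compute the sum of one column, the row layout has to be read in full, while the columnar layout only needs that column's chunk. Real Parquet and ORC files add encoding, compression and statistics on top of this idea; this sketch is not their actual file layout.

```python
# Hypothetical table with three columns: id, name, amount.
ROWS = [(i, f"customer-{i}", i * 1.5) for i in range(1000)]
COLUMNS = ("id", "name", "amount")


def row_layout(rows):
    """Serialize as one comma-separated line per record (row-oriented, like CSV)."""
    return "".join(f"{r[0]},{r[1]},{r[2]}\n" for r in rows).encode()


def column_layout(rows):
    """Serialize each column as its own contiguous chunk (column-oriented)."""
    return {name: "\n".join(str(r[i]) for r in rows).encode() for i, name in enumerate(COLUMNS)}


row_file = row_layout(ROWS)
col_file = column_layout(ROWS)

# SUM(amount) on the row layout: every record must be read in full.
bytes_read_row = len(row_file)
total_row = sum(float(line.split(",")[2]) for line in row_file.decode().splitlines())

# SUM(amount) on the columnar layout: only the "amount" chunk is read.
bytes_read_col = len(col_file["amount"])
total_col = sum(float(v) for v in col_file["amount"].decode().split("\n"))

print("row layout bytes read:", bytes_read_row)
print("columnar bytes read:", bytes_read_col)
```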
What is the query language used in Athena?
SQL
How can I easily query geospatial information?
Many public data sets are available, covering areas such as health, geospatial, and weather data. This data is often already in S3, though sometimes it must first be copied there. You can then use Athena to query it.
How do I create a schema table in Athena?
You run a SQL DDL statement (CREATE EXTERNAL TABLE) to create the tables, and then you can run other SQL queries over these tables.
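A small sketch that builds the kind of CREATE EXTERNAL TABLE statement described above as a string, so the shape of the DDL is visible. The helper create_table_ddl and the database, table, column and bucket names are hypothetical; identifiers are not quoted or validated, and the statement would still have to be run in Athena (for example from the query editor) to register the table.

```python
def create_table_ddl(database, table, columns, s3_location, stored_as="PARQUET"):
    """Build a CREATE EXTERNAL TABLE statement (hypothetical helper, no validation)."""
    column_lines = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{column_lines}\n"
        f")\n"
        f"STORED AS {stored_as}\n"
        f"LOCATION '{s3_location}'"
    )


ddl = create_table_ddl(
    "analytics",
    "orders",
    [("order_id", "bigint"), ("amount", "double")],
    "s3://example-bucket/orders/",
)
print(ddl)
```

Once the table exists, ordinary queries such as SELECT SUM(amount) FROM analytics.orders run against the files under that S3 location.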