Analytics - Athena Flashcards
Interactive query service for S3 where you can write SQL queries.
No need to load data; it stays in S3.
Uses Presto under the hood
Serverless
a) RDS
b) Athena
c) S3
b) Athena
Supports many data formats.
Human-readable: CSV, TSV, JSON. Columnar and splittable: ORC, Parquet. Splittable: Avro. Compression: Snappy, Zlib, LZO, Gzip.
I want to perform ad hoc queries of web logs. Which AWS service is better to use?
a) RDS
b) Athena
c) S3
b) Athena
Which AWS service is better for querying staging data before loading it into Redshift?
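An ad hoc query like this can be sketched in Python. This is a minimal sketch, not a definitive implementation: it assumes a boto3 Athena client is passed in, and the table name web_logs, the database logs_db, and the results bucket are hypothetical.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Start a query, wait for it to finish, and return rows as lists of strings."""
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query leaves the QUEUED/RUNNING states.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Query {query_id} ended in state {state}")

    # Only the first page of results is read here; large results need pagination.
    results = client.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in results["ResultSet"]["Rows"]
    ]


# Hypothetical usage (needs boto3 and AWS credentials, so it is left commented out):
#   import boto3
#   rows = run_athena_query(
#       boto3.client("athena"),
#       "SELECT status, COUNT(*) FROM web_logs GROUP BY status",
#       database="logs_db",
#       output_location="s3://my-athena-results/",
#   )
```

The data never leaves S3; only the query results are written to the output location.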
a) RDS
b) Athena
c) S3
b) Athena
This is like getting a bigger picture before you actually commit data into a data warehouse.
Which AWS service is better for analyzing CloudTrail/CloudFront/VPC/ELB logs in S3?
a) RDS
b) Athena
c) S3
b) Athena
Which AWS service is better for integration with Jupyter, Zeppelin, and RStudio notebooks?
a) RDS
b) Athena
c) S3
b) Athena. It has ODBC and JDBC interfaces, so you can treat Athena like any other RDBMS.
You can also connect the Amazon QuickSight visualization tool to Athena.
How does Athena work with Glue?
For example, you have a Glue ____________________ populating the Glue Data Catalog for your S3 data.
a) RDS
b) Athena
c) S3
d) Glue Crawler
d) Glue Crawler
It extracts columns and table definitions from your data for you. You can use the Glue console to refine that definition as needed.
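Setting up a crawler can be sketched as follows. This is a minimal sketch under stated assumptions: a boto3 Glue client is passed in, and the crawler name, IAM role ARN, database name, and S3 path are hypothetical placeholders.

```python
def crawl_s3_prefix(glue_client, crawler_name, role_arn, database, s3_path):
    """Create a crawler for one S3 path and start it."""
    # The crawler writes the table definitions it infers into `database`
    # in the Glue Data Catalog.
    glue_client.create_crawler(
        Name=crawler_name,
        Role=role_arn,
        DatabaseName=database,
        Targets={"S3Targets": [{"Path": s3_path}]},
    )
    glue_client.start_crawler(Name=crawler_name)
    return crawler_name


# Hypothetical usage (needs boto3 and AWS credentials, so it is left commented out):
#   import boto3
#   crawl_s3_prefix(
#       boto3.client("glue"),
#       "web-logs-crawler",
#       "arn:aws:iam::123456789012:role/MyGlueCrawlerRole",
#       "logs_db",
#       "s3://my-log-bucket/web/",
#   )
```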
Once you have a ___________________________ published for your S3 data, Athena will see it automatically and it can build a table from it.
a) RDS
b) Athena
c) S3
d) Glue Data Catalog
d) Glue Data Catalog. Any time Athena sees something in the Glue Data Catalog in your account, it makes a table for it. So you can query it just like you would any other SQL database.
Other analytics tools can use that catalog to visualize or analyze data, such as RDS, Redshift, EMR, Redshift Spectrum, and any service compatible with the Apache Hive metastore.
In Athena, you can control query access and track costs by using ________________
a) RDS
b) Athena
c) S3
d) Workgroups
d) Workgroups
For example, you can put specific types of queries into their own workgroup.
Each workgroup can have its own query history, data limits, IAM policies, and encryption settings.
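A workgroup with a data limit might be set up like this. It is a sketch, not a definitive setup: the workgroup name, results bucket, and limit are hypothetical, and the dictionary is meant to be passed to the boto3 Athena client's create_work_group call.

```python
def workgroup_request(name, output_location, scan_limit_bytes):
    """Build keyword arguments for athena.create_work_group()."""
    return {
        "Name": name,
        "Description": f"Workgroup {name} with a per-query scan limit",
        "Configuration": {
            # Where this workgroup's query results are written.
            "ResultConfiguration": {"OutputLocation": output_location},
            # Make the workgroup settings override client-side settings.
            "EnforceWorkGroupConfiguration": True,
            # Publish metrics so usage can be tracked per workgroup.
            "PublishCloudWatchMetricsEnabled": True,
            # Data limit: queries that scan more than this are cancelled.
            "BytesScannedCutoffPerQuery": scan_limit_bytes,
        },
    }


request = workgroup_request(
    "adhoc-analysts",                      # hypothetical workgroup name
    "s3://my-athena-results/adhoc/",       # hypothetical results location
    scan_limit_bytes=10 * 1024 ** 3,       # 10 GiB per query
)

# Hypothetical usage (needs boto3 and AWS credentials, so it is left commented out):
#   import boto3
#   boto3.client("athena").create_work_group(**request)
```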
In Athena, you can save a lot of money by using ____________________.
a) RDS
b) Columnar formats
c) S3
b) Columnar formats
ORC and Parquet. They save 30-90% and give better performance.
Partitions can also help lower the costs.
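One way to get data into a columnar format is a CREATE TABLE AS SELECT (CTAS) query run in Athena itself. The helper below only builds the SQL text; it is a sketch, the table names and S3 location are hypothetical, and it assumes trusted input because the names are interpolated without escaping.

```python
def ctas_to_parquet(source_table, target_table, external_location):
    """Return a CTAS statement that rewrites a table as Parquet files in S3."""
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (format = 'PARQUET', external_location = '{external_location}')\n"
        f"AS SELECT * FROM {source_table}"
    )


# Hypothetical names: a CSV-backed table rewritten as Parquet.
sql = ctas_to_parquet(
    "web_logs_csv",
    "web_logs_parquet",
    "s3://my-log-bucket/parquet/web/",
)
print(sql)
```

Queries against the new table read only the columns they select, which is where the savings come from.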
Athena uses a pay-as-you-go model. Are failed queries charged?
No. You are charged only for successful or cancelled queries.
No charge for DDL operations (ALTER/CREATE/DROP).
But you are charged $5 per TB scanned.
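The per-TB price turns into a simple estimate. This sketch uses the $5 per TB figure from the card; treating 1 TB as 1024^4 bytes is an assumption, and it ignores any rounding or minimums the service applies.

```python
PRICE_PER_TB_USD = 5.0        # price per TB scanned, as stated on the card
BYTES_PER_TB = 1024 ** 4      # assumption: binary terabyte


def scan_cost_usd(bytes_scanned):
    """Estimate the cost of a query from the number of bytes it scanned."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


full_scan = scan_cost_usd(2 * 1024 ** 4)       # a query scanning 2 TB
half_tb_scan = scan_cost_usd(512 * 1024 ** 3)  # a query scanning 0.5 TB
print(full_scan, half_tb_scan)
```

Anything that cuts the bytes scanned, such as columnar formats or partitions, cuts the bill in the same proportion.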
True or False
Athena can encrypt results at rest in S3 staging directory
True
Either server-side: SSE-S3 (S3-managed key)
or SSE-KMS (KMS key),
or client-side:
CSE-KMS (KMS key)
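These options map onto the ResultConfiguration passed when starting a query or creating a workgroup. A minimal sketch follows; the bucket and key ARN are hypothetical, and the validation rule (KMS options need a key) is this helper's own check.

```python
VALID_OPTIONS = {"SSE_S3", "SSE_KMS", "CSE_KMS"}


def result_configuration(output_location, encryption_option, kms_key=None):
    """Build an Athena ResultConfiguration with encryption at rest."""
    if encryption_option not in VALID_OPTIONS:
        raise ValueError(f"Unknown encryption option: {encryption_option}")

    encryption = {"EncryptionOption": encryption_option}
    if encryption_option in ("SSE_KMS", "CSE_KMS"):
        # Both KMS-based options need a KMS key ARN or ID.
        if not kms_key:
            raise ValueError(f"{encryption_option} requires a KMS key")
        encryption["KmsKey"] = kms_key

    return {
        "OutputLocation": output_location,
        "EncryptionConfiguration": encryption,
    }


# Hypothetical bucket and key ARN, for illustration only.
sse_s3 = result_configuration("s3://my-athena-results/", "SSE_S3")
sse_kms = result_configuration(
    "s3://my-athena-results/",
    "SSE_KMS",
    kms_key="arn:aws:kms:us-east-1:123456789012:key/example-key-id",
)
```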
True or False
You can have cross account access in S3 bucket policy for Athena.
True
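A cross-account bucket policy could look roughly like the one built below. This is only a sketch: the bucket name and account ID are hypothetical, and the list of actions is an assumed read-only minimum that a real setup may need to extend (for example, to write query results).

```python
import json


def cross_account_read_policy(bucket, account_id):
    """Build a bucket policy letting another account read the bucket's data."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                # Assumed read-only set of actions for querying the data.
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:GetObject",
                    "s3:ListBucket",
                ],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }


# Hypothetical bucket and account ID.
policy = cross_account_read_policy("my-shared-data", "111122223333")
print(json.dumps(policy, indent=2))
```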
True or False
TLS encrypts in transit (between Athena and S3)
True
Should you use Athena for highly formatted reports or visualization?
No - use QuickSight.
Athena is just a SQL query engine.
Should you use Athena for ETL?
No
That's what Glue is for.
You can also do that with Apache Spark for larger data sets.