Athena Flashcards
Use cases for Athena
- You can do BI, analytics, and reporting.
- You can analyze and query VPC Flow Logs, ELB logs, CloudTrail trails, S3 access logs, and CloudFront logs.
Athena
A serverless service that lets you run analytics directly against files in S3 using SQL. It also has JDBC and ODBC drivers, so you can connect your BI tools to it.
How Athena is charged
You are charged per query, based on the amount of data scanned.
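
Because the charge depends on data scanned, a rough cost estimate is simple arithmetic. The sketch below is illustrative only: the price of 5 USD per TB scanned is an assumption (check the current pricing for your region), and the function name is hypothetical.

```python
# Assumed price per TB scanned; verify against current regional pricing.
ASSUMED_USD_PER_TB_SCANNED = 5.00
BYTES_PER_TB = 1024 ** 4


def estimate_query_cost(bytes_scanned: int,
                        usd_per_tb: float = ASSUMED_USD_PER_TB_SCANNED) -> float:
    """Estimate the cost of one query from the number of bytes it scanned."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * usd_per_tb


# A query that scans half a terabyte at the assumed price.
half_tb_cost = estimate_query_cost(BYTES_PER_TB // 2)
```

Scanning less data (for example by querying a columnar format such as Parquet instead of CSV) lowers the estimate proportionally.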
Athena file support
It supports many file formats, such as CSV, JSON, ORC, Avro, and Parquet. On the back end it runs Presto, which is a SQL query engine.
S3 Object Lock
Lets you adopt a WORM (write once, read many) model.
You write the file once to your S3 bucket, then block that object version from being deleted for a specified amount of time, so no one can delete or modify it during that period.
This guarantees that the file is written only once and that no deletions or modifications happen to it.
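
As a minimal sketch of "a specified amount of time", the helper below computes a retain-until date and returns a dictionary shaped like the `Retention` argument that boto3's `put_object_retention` accepts (`Mode` plus `RetainUntilDate`). The helper name and its parameters are hypothetical, and no AWS call is made here.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_retention(days: int, mode: str = "COMPLIANCE",
                    now: Optional[datetime] = None) -> dict:
    """Build a retention setting that locks an object version for `days` days."""
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError("mode must be GOVERNANCE or COMPLIANCE")
    if days <= 0:
        raise ValueError("days must be positive")
    # Default to the current UTC time when no start time is supplied.
    start = now if now is not None else datetime.now(timezone.utc)
    return {"Mode": mode, "RetainUntilDate": start + timedelta(days=days)}


# Example with a fixed start time so the result is reproducible.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
retention = build_retention(30, now=start)
```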
Glacier Vault Lock
Uses the same WORM (write once, read many) model. You create a lock policy, and that lock policy
prevents future edits to the file, so it can no longer be changed. The policy itself is set in stone:
once you set it, no one can change or delete that policy.
Use case for S3 Object Lock and Glacier Vault Lock
Helpful when you have compliance and data retention requirements. You upload an object to S3 or Glacier with the guarantee that no one will be able to delete it, so that you can retrieve it in seven years' time
in case there is an audit.
You have enabled versioning and want to be extra careful when it comes to deleting files on S3. What should you enable to prevent accidental permanent deletions?
MFA Delete. It forces users to use MFA tokens before deleting objects. It’s an extra level of security to prevent accidental deletes.
You would like all your files in S3 to be encrypted by default. What is the optimal way of achieving this?
Enable “Default encryption” on the S3 bucket.
You suspect that some of your employees are trying to access files in S3 that they don’t have access to. How can you verify this is indeed the case without them noticing?
Enable S3 Access Logs. They log all the requests made to buckets, and Athena can then be used to run serverless analytics on top of the log files.
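
A hedged sketch of what such an Athena query might look like. The table name `s3_access_logs_db.mybucket_logs` and the column names (`requester`, `remoteip`, `key`, `operation`, `requestdatetime`, `httpstatus`) are assumptions about how the access-log table was defined in Athena; adjust them to your own table. The function only builds the SQL text and does not run it.

```python
def denied_requests_query(table: str = "s3_access_logs_db.mybucket_logs",
                          limit: int = 100) -> str:
    """Build SQL that lists requests rejected with HTTP 403 (access denied)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "SELECT requester, remoteip, key, operation, requestdatetime\n"
        f"FROM {table}\n"
        # httpstatus is assumed to be stored as a string column.
        "WHERE httpstatus = '403'\n"
        f"LIMIT {limit}"
    )


query = denied_requests_query(limit=50)
```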
You are looking for your entire S3 bucket to be available fully in a different region so you can perform data analysis optimally at the lowest possible cost. Which feature should you use?
S3 Cross-Region Replication. It replicates data from an S3 bucket to another bucket in a different region.
You are looking to provide temporary URLs to a growing list of federated users in order to allow them to perform a file upload on S3 to a specific location. What should you use?
Pre-signed URLs. They are temporary and grant time-limited access to specific actions in your S3 bucket.
How can you automate the transition of S3 objects between their different tiers?
Use S3 Lifecycle Rules.
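
For illustration, a lifecycle configuration can be written as a plain dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` takes for its `LifecycleConfiguration` argument. The rule ID, prefix, and day thresholds below are made-up examples, and the helper name is hypothetical; nothing is sent to AWS.

```python
from typing import List, Tuple


def build_lifecycle_config(rule_id: str, prefix: str,
                           transitions: List[Tuple[int, str]]) -> dict:
    """Build a one-rule lifecycle configuration.

    `transitions` is a list of (days_after_creation, storage_class) pairs.
    """
    # Sort by day count so the tiers appear in the order they take effect.
    ordered = sorted(transitions)
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": days, "StorageClass": storage_class}
                    for days, storage_class in ordered
                ],
            }
        ]
    }


# Example: move objects under logs/ to Standard-IA after 30 days, Glacier after 90.
config = build_lifecycle_config(
    "archive-logs", "logs/", [(90, "GLACIER"), (30, "STANDARD_IA")]
)
```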
You are looking to build an index of your files in S3, using Amazon RDS PostgreSQL. To build this index, it is necessary to read the first 250 bytes of each object in S3, which contain some metadata about the content of the file itself. There are over 100,000 files in your S3 bucket, amounting to 50TB of data. How can you build this index efficiently?
Create an application that will traverse the S3 bucket, issue a Byte Range Fetch for the first 250 bytes of each object, and store the information in RDS.
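
A small sketch of why this is efficient. A byte range fetch uses the standard HTTP `Range` header, whose byte offsets are zero-based and inclusive, so the first 250 bytes are `bytes=0-249` (with boto3 this string would be passed as the `Range` argument of `get_object`). The helper names are hypothetical, and the file count is the round figure from the question.

```python
def first_bytes_range(n: int) -> str:
    """Return the HTTP Range header value for the first `n` bytes of an object."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Range offsets are zero-based and inclusive, hence n - 1.
    return f"bytes=0-{n - 1}"


def total_bytes_fetched(object_count: int, bytes_per_object: int) -> int:
    """Total data transferred when only a fixed-size header is read per object."""
    return object_count * bytes_per_object


range_header = first_bytes_range(250)
transferred = total_bytes_fetched(100_000, 250)
```

With 100,000 objects this transfers 25,000,000 bytes (about 25 MB) instead of reading the full 50TB.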