Advanced S3 and Athena Flashcards
1
Q
S3 access logs
A
- for audit purposes, log all access to a bucket
- logs are delivered to a separate logging bucket
- warning: don’t set your logging bucket to be the same as the monitored bucket (infinite loop!)
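
A minimal sketch of the warning above, assuming Python and boto3's `put_bucket_logging` payload shape. The helper name `build_logging_status` and the bucket names are hypothetical; the boto3 call is left as a comment so the snippet runs without AWS credentials.

```python
def build_logging_status(source_bucket: str, target_bucket: str, prefix: str = "logs/") -> dict:
    """Return a BucketLoggingStatus payload, refusing a self-logging setup."""
    if source_bucket == target_bucket:
        # Logging into the monitored bucket means every log delivery is itself
        # logged, so the bucket keeps growing (the "infinite loop").
        raise ValueError("logging bucket must differ from the monitored bucket")
    return {"LoggingEnabled": {"TargetBucket": target_bucket, "TargetPrefix": prefix}}


# Hypothetical bucket names, for illustration only.
status = build_logging_status("my-app-bucket", "my-app-logs", "access/")

# With boto3 installed and credentials configured, this could be applied with:
# boto3.client("s3").put_bucket_logging(Bucket="my-app-bucket", BucketLoggingStatus=status)
```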
2
Q
S3 replication (CRR & SRR)
A
- cross region and same region replication
- asynchronous replication (versioning must be enabled on both source and destination buckets)
- buckets can be in different accounts
- CRR - compliance, lower latency access, replication across accounts
- SRR - log aggregation, live replication between production and test accounts
- by default, delete markers are not replicated but there is a feature to do so
- permanent deletes are NOT replicated
- bucket must be empty before deleting
- only NEW objects are replicated after enabling
- use S3 Batch Replication to replicate existing objects
- no ‘chaining’ of replication
  - if bucket 1 replicates to bucket 2, which replicates to bucket 3, objects created in bucket 1 are not replicated to bucket 3
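
A rough sketch of what a replication rule might look like, assuming the payload shape accepted by boto3's `put_bucket_replication`. The helper name, role ARN, and bucket ARN are hypothetical placeholders. Delete-marker replication is off unless asked for, matching the default noted above.

```python
def build_replication_config(role_arn: str, destination_bucket_arn: str,
                             replicate_delete_markers: bool = False) -> dict:
    """Build a single-rule ReplicationConfiguration that covers every object."""
    return {
        "Role": role_arn,  # IAM role S3 assumes to copy objects
        "Rules": [
            {
                "ID": "replicate-all",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {},  # empty filter: rule applies to the whole bucket
                "DeleteMarkerReplication": {
                    "Status": "Enabled" if replicate_delete_markers else "Disabled"
                },
                "Destination": {"Bucket": destination_bucket_arn},
            }
        ],
    }


# Hypothetical ARNs, for illustration only.
config = build_replication_config(
    "arn:aws:iam::111122223333:role/replication-role",
    "arn:aws:s3:::my-replica-bucket",
)

# Versioning has to be enabled on both buckets first, e.g.:
# s3.put_bucket_versioning(Bucket="my-source-bucket",
#                          VersioningConfiguration={"Status": "Enabled"})
# s3.put_bucket_replication(Bucket="my-source-bucket", ReplicationConfiguration=config)
```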
3
Q
S3 Standard (general purpose) storage class
A
- 99.99% availability
- frequently accessed
- low latency, high throughput
- big data, mobile and gaming, content distribution
4
Q
S3 infrequent access storage class
A
- lower storage cost than Standard, but a cost on retrieval
- Standard-IA
  - disaster recovery, backups
- One Zone-IA
  - store secondary copies of backups
  - lower availability (data lives in a single AZ)
5
Q
S3 Glacier storage class
A
- archive and backup
- pay for storage plus a retrieval cost
- Glacier Instant Retrieval
  - millisecond retrieval
  - minimum storage duration of 90 days
- Glacier Flexible Retrieval
  - expedited (1-5 minutes), standard (3-5 hrs), bulk (5-12 hrs)
  - minimum storage duration of 90 days
- Glacier Deep Archive
  - standard (12 hrs), bulk (48 hrs)
  - minimum storage duration of 180 days
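
A small sketch of how the retrieval times on this card could drive a tier choice. It assumes the slowest tier that still meets a deadline is also the cheapest, which is the usual pricing pattern but is an assumption here. The function name is hypothetical; the hours are the worst-case figures from the card.

```python
# Worst-case retrieval time in hours per tier, taken from the card above.
RETRIEVAL_HOURS = {
    "GLACIER": {"Expedited": 5 / 60, "Standard": 5, "Bulk": 12},  # Flexible Retrieval
    "DEEP_ARCHIVE": {"Standard": 12, "Bulk": 48},
}


def slowest_tier_within(storage_class: str, deadline_hours: float):
    """Return the slowest tier that still finishes by the deadline, or None."""
    tiers = RETRIEVAL_HOURS[storage_class]
    candidates = [(hours, name) for name, hours in tiers.items() if hours <= deadline_hours]
    if not candidates:
        return None  # no tier is fast enough for this deadline
    return max(candidates)[1]


choice = slowest_tier_within("GLACIER", 6)
```

For a 6-hour deadline, `choice` is `"Standard"` for Flexible Retrieval, while Deep Archive has no tier that fits and the function returns `None`.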
6
Q
S3 intelligent tiering
A
- objects move automatically between access tiers based on usage
- small monthly monitoring and auto-tiering fee
- no retrieval charges
7
Q
S3 lifecycle rules
A
- can automate moving between storage classes using lifecycle configuration
- define transition actions
  - e.g. move objects to Glacier after 6 months
- define expiration actions
  - delete an object (or old versions of an object) after a set time
- rules can be scoped to a prefix, e.g. bucket/mp3/*
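
A minimal sketch of a lifecycle rule for the mp3 prefix example, assuming the payload shape used by boto3's `put_bucket_lifecycle_configuration`. The helper name, the 180-day stand-in for "6 months", and the 365-day expiration are illustrative assumptions, not values from the notes.

```python
def build_lifecycle_rule(prefix: str, glacier_after_days: int, expire_after_days: int) -> dict:
    """Build one rule: transition to Glacier, then expire, for a single prefix."""
    if expire_after_days <= glacier_after_days:
        raise ValueError("expiration must come after the transition")
    return {
        "ID": f"archive-{prefix.strip('/')}",
        "Filter": {"Prefix": prefix},  # rule only applies under this prefix
        "Status": "Enabled",
        "Transitions": [{"Days": glacier_after_days, "StorageClass": "GLACIER"}],
        "Expiration": {"Days": expire_after_days},
    }


# Roughly "move to Glacier after 6 months"; the 365-day expiry is assumed.
lifecycle = {"Rules": [build_lifecycle_rule("mp3/", 180, 365)]}

# With boto3 this could be applied as:
# s3.put_bucket_lifecycle_configuration(Bucket="my-bucket", LifecycleConfiguration=lifecycle)
```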
8
Q
S3 performance
A
- automatically scales to high request rates, with latency of 100-200 ms
- at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix in a bucket
- no limit on the number of prefixes in a bucket
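
Because the limits are per prefix, spreading keys across prefixes multiplies the achievable request rate. A quick worked calculation, with hypothetical function names and assuming load is spread evenly across prefixes:

```python
import math

# Per-prefix request rates from the card above (requests per second).
WRITE_RPS_PER_PREFIX = 3500   # PUT/COPY/POST/DELETE
READ_RPS_PER_PREFIX = 5500    # GET/HEAD


def max_read_rps(prefix_count: int) -> int:
    """Aggregate GET/HEAD rate when reads are spread evenly over the prefixes."""
    return prefix_count * READ_RPS_PER_PREFIX


def prefixes_needed(target_read_rps: int) -> int:
    """Smallest number of prefixes whose combined read rate meets the target."""
    return math.ceil(target_read_rps / READ_RPS_PER_PREFIX)


four_prefix_reads = max_read_rps(4)      # 4 * 5,500 = 22,000 reads per second
needed_for_20k = prefixes_needed(20000)  # 20,000 / 5,500 rounds up to 4
```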
9
Q
S3 multi-part upload
A
- recommended for files > 100 MB
- required for files > 5 GB
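
A sketch of how a file might be split into parts before a multipart upload. It assumes the documented S3 limits of 5 MiB to 5 GiB per part (the last part may be smaller) and at most 10,000 parts. The function name and the 100 MiB default part size are illustrative choices.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART_SIZE = 5 * MIB   # every part except the last must be at least this big
MAX_PART_SIZE = 5 * GIB
MAX_PARTS = 10_000


def plan_parts(file_size: int, part_size: int = 100 * MIB) -> list:
    """Return (part_number, size_in_bytes) pairs covering the whole file."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    if not MIN_PART_SIZE <= part_size <= MAX_PART_SIZE:
        raise ValueError("part_size must be between 5 MiB and 5 GiB")
    count = (file_size + part_size - 1) // part_size  # integer ceiling division
    if count > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    # Part numbers start at 1; the final part holds whatever bytes remain.
    return [(i + 1, min(part_size, file_size - i * part_size)) for i in range(count)]


parts = plan_parts(250 * MIB)
```

For a 250 MiB file with the default part size, `parts` holds three entries: two of 100 MiB and a final one of 50 MiB.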
10
Q
S3 Select and Glacier Select (server side filtering)
A
- filter rows and columns (simple SQL statements)
- less network transfer and less client-side CPU
- only request what you need
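
A rough sketch of a server-side filtering request, assuming the keyword arguments taken by boto3's `select_object_content` for a CSV object with a header row. The helper name, bucket, key, and column names are hypothetical. The `where` clause is passed through as-is, so it should not come from untrusted input.

```python
def build_select_request(bucket: str, key: str, columns: list, where: str) -> dict:
    """Build kwargs asking S3 to return only the given columns of matching rows."""
    column_list = ", ".join(f's."{name}"' for name in columns)
    expression = f"SELECT {column_list} FROM S3Object s WHERE {where}"
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        # FileHeaderInfo=USE lets the query refer to columns by header name.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }


# Hypothetical object and columns, for illustration only.
request = build_select_request(
    "my-bucket", "people.csv", ["name", "city"], "s.\"city\" = 'Paris'"
)

# With boto3: response = s3.select_object_content(**request)
```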
11
Q
Athena
A
- serverless query service to perform analytics against S3 objects
- use SQL to query files
- supports CSV, JSON, ORC, Avro, and Parquet
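
A minimal sketch of submitting a query, assuming the keyword arguments of boto3's Athena `start_query_execution`. The helper name, database, table, and results location are hypothetical. Athena writes query results to an S3 location, so the sketch checks that one is supplied.

```python
def build_athena_query(sql: str, database: str, output_location: str) -> dict:
    """Build kwargs for an Athena query whose results land in S3."""
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical database, table, and results bucket, for illustration only.
query = build_athena_query(
    "SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
    "my_database",
    "s3://my-athena-results/queries/",
)

# With boto3: boto3.client("athena").start_query_execution(**query)
```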