S3 Flashcards
What is S3?
- Simple Storage Service
- Buckets store objects
- Objects consist of a key (the object's name) and a value (its data)
- No object can live outside of a bucket
- Flat File System
What is the storage capacity of S3?
unlimited storage
What is the object size range in S3 (smallest possible to largest possible)?
0 bytes to 5TB
What is the durability of S3?
Data is stored across at least 3 AZs to provide 11 nines (99.999999999%) of durability
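To get a feel for what 11 nines means, the arithmetic below estimates how often a single object loss would be expected. It is a back-of-the-envelope sketch that assumes the durability figure is an annual per-object survival probability and that losses are independent. The object count of 10,000,000 is only an illustration.

```python
from decimal import Decimal

# Assumption: 11 nines is the annual probability that one object survives.
DURABILITY = Decimal("0.99999999999")


def years_per_expected_loss(object_count: int) -> Decimal:
    """Average number of years between single-object losses."""
    annual_loss_probability = Decimal(1) - DURABILITY  # 1E-11 per object
    expected_losses_per_year = annual_loss_probability * object_count
    return Decimal(1) / expected_losses_per_year


# Illustrative object count, not a figure from these notes.
print(years_per_expected_loss(10_000_000))
```

Decimal is used instead of float so the subtraction from 1 does not pick up rounding noise.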
Describe the naming structure of a bucket
- Bucket names are global and must be unique across a partition (a grouping of regions).
- Names must be between 3-63 characters long.
- Names can only contain lowercase letters, numbers, dots (.) or hyphens (-).
- Names must begin and end with a letter or a number.
- Names cannot be formatted like an IP address.
- Names cannot begin with xn--
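The naming rules above can be sketched as a small validator. This is a minimal check that covers only the rules on this card and assumes lowercase-only names; the function name is made up for illustration and AWS enforces further rules that are not modelled here.

```python
import re

# Lowercase letters, digits, dots and hyphens; first and last character
# must be a letter or digit; total length 3-63.
_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
# Four dot-separated groups of digits, i.e. something shaped like an IPv4 address.
_IP_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Check a candidate name against the rules listed on this card only."""
    if not _NAME_RE.fullmatch(name):
        return False
    if _IP_RE.fullmatch(name):  # e.g. 192.168.5.4
        return False
    if name.startswith("xn--"):
        return False
    return True
```

Global uniqueness cannot be checked locally; only a create-bucket request reveals whether a name is already taken.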
Describe the data consistency model for S3.
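- Since December 2020: strong read-after-write consistency for all PUTs (new objects and overwrites) and DELETEs.
- Older model, still seen in study material: read-after-write consistent for new PUTs only.
- Older model: eventually consistent for overwrite PUTs and DELETEs.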
What operations can you perform on an object?
- PUT
- GET
- DELETE (RM)
- LIST
What is Versioning?
- Objects are given a version ID
- When new objects are uploaded the old objects are kept.
- You can access any object version.
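- Deleting a versioned object without a version ID only adds a delete marker; older versions are kept, and removing the marker makes the previous version current again.
- Deleting with a version ID permanently removes that one version.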
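The behaviour of version IDs and delete markers is easier to remember with a toy model. The class below is an in-memory sketch, not the S3 API: the class name, the "v1"/"v2" style IDs and the KeyError behaviour are all assumptions made for illustration.

```python
import itertools


class VersionedBucket:
    """Toy in-memory model of a versioning-enabled bucket (not the S3 API)."""

    _DELETE_MARKER = object()

    def __init__(self):
        self._versions = {}  # key -> list of (version_id, body), oldest first
        self._ids = itertools.count(1)

    def put(self, key, body):
        """Store a new version on top of the stack and return its ID."""
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def get(self, key, version_id=None):
        versions = self._versions.get(key, [])
        if version_id is None:
            # Latest version wins; a delete marker on top hides the object.
            if not versions or versions[-1][1] is self._DELETE_MARKER:
                raise KeyError(key)
            return versions[-1][1]
        for vid, body in versions:
            if vid == version_id and body is not self._DELETE_MARKER:
                return body
        raise KeyError((key, version_id))

    def delete(self, key, version_id=None):
        if version_id is None:
            # Simple delete: nothing is removed, a delete marker is stacked on top.
            return self.put(key, self._DELETE_MARKER)
        # Versioned delete: that one version (or marker) is permanently removed.
        self._versions[key] = [v for v in self._versions[key] if v[0] != version_id]
        return version_id
```

In this model, deleting "report.txt" without a version ID hides it from a plain get, while a get with an older version ID still succeeds; deleting the marker's own ID brings the latest version back.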
Are buckets versioned by default?
- No. Versioning must be enabled, and can be enabled at any time.
- Once turned on it cannot be turned off again, only suspended.
T/F - All new buckets are private by default
True
What are bucket policies?
Resource-based policies: JSON documents attached to a bucket that control access. They grant other AWS accounts or IAM users access permissions for the bucket and the objects in it.
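As an illustration of what such a JSON document looks like, the sketch below builds a policy that lets another account read objects. The bucket name, the account ID and the function name are placeholders, and the single s3:GetObject action is just one possible choice.

```python
import json


def read_only_policy(bucket_name: str, account_id: str) -> str:
    """Build a bucket policy granting another account read access to objects."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "CrossAccountRead",
                "Effect": "Allow",
                # The other account is named as the principal.
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": ["s3:GetObject"],
                # "/*" targets the objects in the bucket, not the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Placeholder values for illustration only.
print(read_only_policy("example-bucket", "111122223333"))
```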
Access Control Lists
- Legacy permissions control. Still used though.
- Grants access to objects and buckets with simple actions.
What is Cross Region Replication (CRR)?
- Allows files to be replicated across regions for greater durability.
- Versioning must be enabled on both the source and destination buckets.
Cross Region Replication - What gets replicated?
- Any new objects added after CRR is enabled.
- Object metadata
- Tags
- Encryption, if the source object is encrypted (SSE-S3 and SSE-KMS only)
Cross Region Replication - What is NOT replicated?
- Objects that existed in source bucket before CRR was enabled.
- Objects encrypted with SSE-C
- Source objects that the bucket owner does not have read permissions on.
- Updates to bucket-level subresources (e.g. changes to lifecycle configuration)
- Objects in the source bucket that are there as a result of replication from another bucket.
How do delete operations work on objects under Cross Region Replication?
- For a delete WITHOUT a version ID, S3 adds a delete marker, which CRR DOES replicate.
- For a delete WITH a version ID, the source object version is deleted but the destination object is NOT deleted.
What is transfer acceleration?
- Provides fast, secure uploads from anywhere in the world.
- Data is uploaded to an Edge location, then that data is transported to your S3 bucket via AWS backbone network.
What is a presigned URL?
- A URL generated via the AWS CLI or an SDK. Provides temporary access to upload or download object data.
- Users given a pre-signed URL inherit the permissions of the person who generated the URL for GET / PUT.
- Pre-signed URLs are commonly used to access private objects.
Name the (6) S3 storage classes
- Standard
- Intelligent Tiering
- Standard Infrequent Access (IA)
- One Zone IA
- Glacier
- Glacier Deep Archive
S3 Standard
- Fast.
- 11 nines of durability
- 99.99% availability
- replicated across at least 3 AZs
S3 Intelligent Tiering
- Uses machine learning to analyze your object usage and determines the appropriate storage class.
- Data is moved to the most cost effective class w/o any performance impact or added overhead.
Standard IA
- Cheaper than Standard (50%).
- reduced availability.
- Good if file is accessed only once a month or less.
- Additional retrieval fee applied.
One Zone IA
- Objects only exist in 1 AZ -> Data could get destroyed
- Availability = 99.5%
- Cheaper than Standard IA (20%)
- Retrieval fee applied
Glacier
- Long term cold storage
- Retrieval can take minutes to hours
Glacier Deep Archive
- Lowest cost storage
- Retrieval time = 12 - 48 hours
Life Cycle Management in S3
- Automates the process of moving objects to different storage classes or deleting objects altogether.
- Can be used together with versioning.
- Can be applied to both current and previous versions.
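A lifecycle rule can be written down as a plain dictionary. The sketch below follows the shape that boto3's put_bucket_lifecycle_configuration accepts for its LifecycleConfiguration argument; the rule ID, the prefix and every day threshold are illustrative assumptions, not recommendations.

```python
def build_lifecycle_configuration(prefix: str) -> dict:
    """Return one lifecycle rule covering current and previous versions."""
    return {
        "Rules": [
            {
                "ID": "archive-then-expire",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                # Current versions: move to cheaper classes as they age.
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                # Previous (noncurrent) versions get their own schedule.
                "NoncurrentVersionTransitions": [
                    {"NoncurrentDays": 30, "StorageClass": "GLACIER"},
                ],
                # Finally delete current versions altogether.
                "Expiration": {"Days": 365},
            }
        ]
    }


config = build_lifecycle_configuration("logs/")
```

With boto3 installed and credentials configured, the dictionary would be passed as LifecycleConfiguration together with a Bucket name.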
How is S3 Encryption in Transit achieved?
Traffic between your local host and S3 is encrypted via SSL/TLS (HTTPS)
What are the 2 types of default encryption that can be applied to an S3 bucket?
- SSE-S3 (also written SSE-AES) - S3 handles the keys and uses the AES-256 algorithm.
- SSE-KMS - Envelope encryption; the keys live in AWS KMS and you manage them.
What are other types of encryption that can be used for S3 but not offered by default?
- SSE-C : Server Side. Customer provided key
- Client-Side Encryption: you encrypt your own files before uploading to S3 and you manage the keys.
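The three server-side modes differ mainly in which extra arguments accompany an upload. The helper below is a sketch that maps each mode to the keyword arguments boto3's put_object uses for it; the helper name and the mode labels are assumptions chosen to match these cards.

```python
def encryption_kwargs(mode, kms_key_id=None, customer_key=None):
    """Return the extra put_object arguments for a server-side encryption mode."""
    if mode == "SSE-S3":
        # S3 owns the key; only the algorithm is named.
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":
        kwargs = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id:
            # Optional: name a specific KMS key instead of the default one.
            kwargs["SSEKMSKeyId"] = kms_key_id
        return kwargs
    if mode == "SSE-C":
        if customer_key is None:
            raise ValueError("SSE-C needs a customer-provided key")
        # The key travels with every request, which is why HTTPS is required.
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": customer_key}
    raise ValueError(f"unknown mode: {mode}")
```

Client-side encryption has no entry here because the object is already encrypted before it reaches S3.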
Buckets are PUBLIC or PRIVATE by default?
private
What is the difference between SSE-C and Client Side encryption for s3 buckets?
- SSE-C is server-side encryption that uses data keys that are fully managed by the customer outside of AWS.
- Client-side encryption happens before upload, often with a client library such as the Amazon S3 Encryption Client; the keys may be held by you or kept in AWS KMS.
What is the difference between SSE-S3 & SSE-KMS S3 encryption services?
- SSE-KMS - keys are managed within AWS KMS Service which allows more user control and provides an audit trail.
- SSE-S3 is fully managed as part of S3 (no user control or audit trail).
Which s3 encryption method requires HTTPS?
SSE-C
What is S3 MFA - delete?
- Forces the user to generate a code on a device (usually a mobile phone or hardware token) before performing important operations on S3
- Versioning must be enabled
Who can enable MFA-delete on an S3 bucket?
Root account
From which source can S3 MFA-delete be enabled?
CLI (or API) only; it cannot be enabled from the console
What will you need S3 MFA delete for?
What will you NOT need it for?
You will need MFA to
- permanently delete an object version
- suspend versioning on the bucket
You will NOT need MFA for
- enabling versioning
- listing deleted versions
What are S3 Access Logs?
- For audit purposes you may want to log all access to S3 buckets
- Any request made to S3, from any account, authorized or denied, will be logged into another S3 bucket
- That data can be analyzed using data analysis tools, or Amazon Athena
Where should access logs be held in relation to the bucket being monitored? (Same bucket or different bucket)
In a different bucket! Otherwise, it will create a logging loop and bucket size will grow exponentially == HUGE BILL
Can Cross Region Replication (CRR) and Same Region Replication (SRR) happen across different accounts?
Yes, as long as the correct IAM permissions are granted
Does Cross Region Replication and Same Region Replication happen synchronously or asynchronously?
Asynchronously
What are some use cases for Cross Region Replication?
- Compliance
- Lower latency access
- Replication across accounts
What are some use cases for Same Region Replication?
- Log aggregation
- Live replication between production and test accounts
What is the default length of time that a pre-signed URL is good for?
- Valid for a default of 3600 seconds (1-hour).
- Can change this by modifying the --expires-in [TIME_BY_SECONDS] argument.
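The SDK equivalent of the CLI argument is the ExpiresIn parameter of generate_presigned_url. The wrapper below is a minimal sketch: the function name, bucket and key are placeholders, and the client is passed in so the snippet can be read without AWS credentials.

```python
def presign_download(s3_client, bucket, key, expires_in=3600):
    """Return a temporary GET URL; expires_in is in seconds (default 1 hour)."""
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )


# Usage (needs boto3 and AWS credentials, so it is left commented out):
# import boto3
# url = presign_download(boto3.client("s3"), "example-bucket", "videos/premium.mp4", 900)
```

Passing "put_object" instead of "get_object" would produce an upload URL in the same way.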
What are some use cases to use a pre-signed URL?
- Allow only logged-in users to download a premium video on your s3 bucket.
- Allow an ever changing list of users to download files by generating URLs dynamically
- Temporarily allow a user to upload a file to your bucket (in a specific location).
What are the 3 retrieval options for Amazon Glacier and how long will it take to retrieve an object from each type?
- Expedited (1 to 5 minutes) - extra $
- Standard (3-5 hours)
- Bulk (5-12 hours)
What is the minimum storage time for Glacier objects?
90 days
What are the 2 retrieval options for Glacier Deep Archive and their associated times?
- Standard (12 hours)
- Bulk (48 hours)
What is the minimum storage duration for Glacier Deep Archive?
180 days
Which S3 storage classes charge a retrieval fee?
- S3 Standard-IA
- S3 One Zone IA
- S3 Glacier
- S3 Glacier Deep Archive
Is there a minimum storage duration for S3 Standard objects?
No, but there are for all other tiers
30 Day Min Storage
- S3 Intelligent Tiering
- S3 Standard-IA
- S3 One Zone IA
90 Day Min
- Glacier
180 Day Min
- Glacier Deep Archive
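The minimums matter because deleting early does not save money. The sketch below assumes that an object removed before its minimum is billed as if it had stayed for the full minimum duration; the class labels follow the API-style names and the function name is made up.

```python
# Minimum storage duration in days, taken from the card above.
MIN_STORAGE_DAYS = {
    "STANDARD": 0,
    "INTELLIGENT_TIERING": 30,
    "STANDARD_IA": 30,
    "ONEZONE_IA": 30,
    "GLACIER": 90,
    "DEEP_ARCHIVE": 180,
}


def billable_days(storage_class: str, days_stored: int) -> int:
    """Days charged for an object, assuming early deletion bills the full minimum."""
    return max(days_stored, MIN_STORAGE_DAYS[storage_class])
```

For example, a Glacier object deleted after 10 days would still be charged for 90 days under this assumption.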
How many requests per second, per prefix, can be achieved for an application's S3 bucket?
Requests per second, per bucket prefix:
- 3,500 PUT / COPY / POST / DELETE
- 5,500 GET / HEAD
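Because the limits apply per prefix, spreading keys over more prefixes raises the total a bucket can serve. The arithmetic can be sketched as follows; the function name is an assumption and the result is a ceiling, not a guarantee.

```python
WRITE_RPS_PER_PREFIX = 3_500  # PUT / COPY / POST / DELETE
READ_RPS_PER_PREFIX = 5_500   # GET / HEAD


def max_request_rates(prefix_count: int) -> dict:
    """Upper bound on requests per second when load is spread evenly over prefixes."""
    return {
        "write_rps": prefix_count * WRITE_RPS_PER_PREFIX,
        "read_rps": prefix_count * READ_RPS_PER_PREFIX,
    }


print(max_request_rates(4))
```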
What are some S3 - KMS Limitations?
- KMS has different per-second request quotas depending on the region; raising them requires a quota increase request.
- If quotas are exceeded then the S3 request(s) is throttled
When is multipart-upload recommended in S3?
When are they required?
- Recommended for files > 100MB
- Must be used for files > 5GB
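The two thresholds can be turned into a small decision helper. This is a sketch that assumes binary units (MiB / GiB) for the sizes on the card, and the 100 MiB part size is an arbitrary choice for illustration.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def upload_strategy(size_bytes: int) -> str:
    """Classify an upload by the size thresholds on this card."""
    if size_bytes > 5 * GIB:
        return "multipart required"
    if size_bytes > 100 * MIB:
        return "multipart recommended"
    return "single PUT is fine"


def part_count(size_bytes: int, part_size: int = 100 * MIB) -> int:
    """Number of parts needed for a given part size (ceiling division)."""
    return max(1, -(-size_bytes // part_size))
```

A 1 GiB file falls in the "multipart recommended" band and needs 11 parts at the assumed part size.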
How can data be filtered server-side S3? Why would you do this?
- S3 Select & Glacier Select
- Can retrieve less data using SQL by performing server side filtering
- Can filter by rows and columns
- Can not aggregate data
- Do this for less network transfer = less CPU cost client side
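A server-side filter is expressed as a SQL string plus a description of the input and output formats. The sketch below builds the keyword arguments that boto3's select_object_content expects; the function name, the CSV layout and the column names "name" and "age" are hypothetical.

```python
def select_request(bucket: str, key: str, min_age: int) -> dict:
    """Keyword arguments for an S3 Select call over a CSV object with a header row."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        # Column filter (only s.name) and row filter (the WHERE clause) both run
        # inside S3, so only matching names cross the network.
        "Expression": (
            "SELECT s.name FROM S3Object s "
            f"WHERE CAST(s.age AS INT) > {int(min_age)}"
        ),
        # Treat the first CSV line as column names.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }


request = select_request("example-bucket", "people.csv", 30)
```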
What are the 3 event targets for S3?
- SNS
- SQS
- Lambda function
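Whichever target is used, the notification carries a JSON document that names the bucket and object. The handler below is a minimal sketch for the Lambda case; with SNS or SQS the same JSON arrives wrapped inside the message body. The function names are assumptions.

```python
from urllib.parse import unquote_plus


def objects_from_event(event: dict) -> list:
    """Extract (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Keys arrive URL-encoded (spaces become '+'), so decode them.
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def handler(event, context):
    """Lambda-style entry point: report which objects triggered the event."""
    return objects_from_event(event)
```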
What is Athena?
- Serverless service to perform analytics against S3 files
- Uses SQL language to query the files
- Has JDBC / ODBC driver
- Charged per query and amount of data scanned
What data formats does Athena support?
- CSV
- JSON
- ORC
- Avro
- Parquet
What are some use cases for Athena?
- Business Intelligence
- Analytics
- Reporting
- VPC Flow Logs
- ELB Logs
- CloudTrail logs
How can you analyze data on S3? (or ELB Logs/VPC Flow Logs / CloudTrail etc)
Athena
How could you guarantee that an object isn’t deleted from S3 or Glacier?
- S3 Object Lock, or
- Glacier Vault Lock
- Both policies adopt a WORM (Write Once Read Many) model
- Blocks an object deletion for a specific amount of time
- Helpful to lock against future edits
- Helpful for compliance and data retention