Storage Flashcards
What is the maximum size of an object in an S3 bucket?
The maximum object size is 5 TB.
How do you upload a file larger than 5 GB?
You must use “multi-part upload”.
Is Amazon S3 strongly consistent? What does that mean?
Yes, it is.
After a successful write of a new object (new PUT) or an overwrite or delete of an existing object (overwrite PUT or DELETE):
• any subsequent read request immediately receives the latest version of the object (read-after-write consistency)
• any subsequent list request immediately reflects the changes (list consistency)
What are the S3 Storage Classes?
• Amazon S3 Standard – General Purpose
  – High durability, 99.99% availability
  – Use cases: big data analytics, mobile & gaming applications, content distribution…
• Amazon S3 Standard-Infrequent Access (IA)
  – Suitable for data that is accessed less frequently but requires rapid access when needed
  – High durability and availability
  – Use cases: data store for disaster recovery, backups…
• Amazon S3 One Zone-Infrequent Access
  – Costs ~20% less than Standard-IA
  – Use cases: storing secondary backup copies of on-premises data, or data you can recreate
• Amazon S3 Intelligent-Tiering
  – Automatically moves objects between two access tiers based on changing access patterns
• Amazon Glacier
  – Low-cost object storage meant for archiving / backup
  – Data is retained for the long term (tens of years)
  – Time to retrieve an object: Expedited (1 to 5 minutes) / Standard (3 to 5 hours) / Bulk (5 to 12 hours)
  – Minimum storage duration of 90 days
• Amazon Glacier Deep Archive
  – Time to retrieve an object: Standard (12 hours) / Bulk (48 hours)
  – Minimum storage duration of 180 days
What are the S3 Lifecycle Rules?
• Transition actions – define when objects are transitioned to another storage class (e.g., move objects to the Standard-IA class 60 days after creation)
• Expiration actions – configure objects to expire (be deleted) after some time
  – Can be used to delete old versions of files
• Rules can be created for a certain prefix (ex - s3://mybucket/mp3/*)
• Rules can be created for certain object tags (ex - Department: Finance)
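The two rule types above can be combined in one lifecycle configuration. A minimal sketch, where the bucket name, prefix, and day counts are hypothetical:

```python
import json

# Hypothetical lifecycle configuration: transition objects under the "mp3/"
# prefix to Standard-IA 60 days after creation, and delete old (noncurrent)
# versions after 365 days.
lifecycle_config = {
    "Rules": [
        {
            "ID": "mp3-to-ia",
            "Filter": {"Prefix": "mp3/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 60, "StorageClass": "STANDARD_IA"}  # transition action
            ],
            "NoncurrentVersionExpiration": {"NoncurrentDays": 365},  # expiration action
        }
    ]
}

# With boto3 this would be applied as (not run here; requires credentials):
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="mybucket", LifecycleConfiguration=lifecycle_config)
```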
How does S3 Performance work?
Amazon S3 automatically scales to high request rates, with 100–200 ms latency.
Your application can achieve at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix in a bucket.
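Because those limits apply per prefix, spreading objects across prefixes multiplies aggregate throughput. A small arithmetic sketch (the prefix names are hypothetical):

```python
# Request limits are per prefix, so four prefixes give four times the throughput.
prefixes = ["logs/2024/", "logs/2023/", "images/", "backups/"]

max_get_per_second = 5500 * len(prefixes)  # GET/HEAD requests per second
max_put_per_second = 3500 * len(prefixes)  # PUT/COPY/POST/DELETE per second
```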
How does upload to S3 work?
There are two options:
• Multi-Part upload
  – Recommended for files > 100 MB; required for files > 5 GB
  – Can help parallelize uploads (speeds up transfers)
• S3 Transfer Acceleration
  – Increases transfer speed by transferring the file to an AWS edge location, which forwards the data to the S3 bucket in the target region
  – Compatible with multi-part upload
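A multi-part upload splits the object into parts of at least 5 MB each, with at most 10,000 parts per upload (boto3's `TransferConfig` handles this automatically). As a sketch, a hypothetical helper that picks a part size satisfying those limits:

```python
import math

MIN_PART_SIZE = 5 * 1024 * 1024  # 5 MB minimum part size (except the last part)
MAX_PARTS = 10_000               # S3 limit on parts per multipart upload

def choose_part_size(object_size: int, preferred: int = 8 * 1024 * 1024) -> int:
    """Pick a part size (hypothetical helper): start from a preferred size
    and double it until the object fits within the 10,000-part limit."""
    part_size = max(preferred, MIN_PART_SIZE)
    while math.ceil(object_size / part_size) > MAX_PARTS:
        part_size *= 2
    return part_size
```

For a 100 MB file the default 8 MB parts suffice; a 5 TB object (the maximum) forces the part size up to 1 GB.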
How does download from S3 work?
You can use S3 byte-range fetches:
• Parallelize GETs by requesting specific byte ranges
• Better resilience in case of failures
• Can be used to speed up downloads
• Can be used to retrieve only partial data (for example, the head of a file)
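Each parallel GET asks for an inclusive byte range via the HTTP `Range: bytes=start-end` header. A minimal sketch of computing those ranges (the chunk size is an assumption):

```python
def byte_ranges(object_size: int, chunk_size: int):
    """Yield inclusive (start, end) byte offsets covering the object,
    suitable for the HTTP 'Range: bytes=start-end' header."""
    for start in range(0, object_size, chunk_size):
        end = min(start + chunk_size, object_size) - 1
        yield start, end

# Each range would then be fetched in parallel, e.g. with boto3
# (not run here; requires credentials):
# s3.get_object(Bucket="mybucket", Key="big.bin", Range=f"bytes={start}-{end}")
```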
How does S3 Encryption work?
There are 4 methods of encrypting objects in S3:
• SSE-S3: encrypts S3 objects using keys handled & managed by AWS
  – AES-256 encryption type
  – Must set header: “x-amz-server-side-encryption”: “AES256”
• SSE-KMS: leverages AWS Key Management Service (KMS) to manage encryption keys
  – KMS advantages: user control + audit trail
  – Must set header: “x-amz-server-side-encryption”: “aws:kms”
• SSE-C: when you want to manage your own encryption keys
  – HTTPS must be used
  – The encryption key must be provided in HTTP headers, for every HTTP request made
• Client-Side Encryption
  – The customer fully manages the keys and the encryption cycle
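The server-side modes above can be summarized as the request headers they add; a minimal sketch (the bucket/key names and key placeholders are illustrative):

```python
# Headers each server-side encryption mode adds to the request.
SSE_HEADERS = {
    "SSE-S3": {"x-amz-server-side-encryption": "AES256"},
    "SSE-KMS": {"x-amz-server-side-encryption": "aws:kms"},
    # SSE-C instead sends the customer-provided key with every request,
    # over HTTPS only (placeholder values shown):
    "SSE-C": {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": "<base64 key>",
        "x-amz-server-side-encryption-customer-key-MD5": "<base64 MD5 of key>",
    },
}

# With boto3, SSE-S3 on upload looks like (not run here; requires credentials):
# boto3.client("s3").put_object(Bucket="mybucket", Key="secret.txt",
#                               Body=b"...", ServerSideEncryption="AES256")
```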
How does S3 Access Security work?
User-based
• IAM policies – define which API calls are allowed for a specific user (from the IAM console)
Resource-based
• Bucket policies – bucket-wide rules set from the S3 console; allow cross-account access
• Object Access Control List (ACL) – finer grain
• Bucket Access Control List (ACL) – less common
Note: an IAM principal can access an S3 object if
• the user’s IAM permissions allow it OR the resource policy ALLOWS it
• AND there’s no explicit DENY
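As a sketch of a resource-based rule, a hypothetical bucket policy granting public read on every object in a bucket named `examplebucket`:

```python
import json

# Hypothetical bucket policy: a bucket-wide, resource-based rule
# allowing anyone to GET objects in "examplebucket".
public_read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicRead",
            "Effect": "Allow",  # an explicit "Deny" statement would win over any Allow
            "Principal": "*",
            "Action": ["s3:GetObject"],
            "Resource": ["arn:aws:s3:::examplebucket/*"],
        }
    ],
}

# Applied with boto3 (not run here; requires credentials):
# boto3.client("s3").put_bucket_policy(Bucket="examplebucket",
#                                      Policy=json.dumps(public_read_policy))
```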
How does S3 Security work?
• Can be user-based (IAM policies) or resource-based (bucket policies)
• Networking – supports VPC Endpoints
• Logging and audit – S3 access logs can be stored in another S3 bucket; API calls can be logged in AWS CloudTrail
• User security – MFA Delete; Pre-signed URLs: URLs that are valid only for a limited time (ex: premium video service for logged-in users)
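A pre-signed URL embeds a signature and an expiry, so it can be handed to a user without AWS credentials. A minimal sketch with boto3 (bucket/key names are hypothetical; the function is not invoked here because it needs configured credentials):

```python
def make_presigned_url(bucket: str, key: str, expires: int = 3600) -> str:
    """Return a time-limited GET URL for one object (sketch; assumes
    AWS credentials are configured in the environment)."""
    import boto3  # imported inside so the sketch parses without boto3 installed
    s3 = boto3.client("s3")
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires,  # the URL stops working after this many seconds
    )

# Usage (not run here):
# url = make_presigned_url("videos", "premium/episode1.mp4", expires=600)
```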
How does DynamoDB partitioning work?
- You start with one partition
- Each partition:
  - Max of 3,000 RCU / 1,000 WCU
  - Max of 10 GB
- To compute the number of partitions:
  - By capacity: (TOTAL RCU / 3000) + (TOTAL WCU / 1000)
  - By size: Total Size / 10 GB
  - Total partitions = CEILING(MAX(Capacity, Size))
- WCU and RCU are spread evenly between partitions
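The formula above as a small worked example (the capacity numbers are hypothetical):

```python
import math

def dynamodb_partitions(total_rcu: int, total_wcu: int, total_size_gb: float) -> int:
    """Estimate the partition count using the rule of thumb above:
    each partition sustains at most 3,000 RCU, 1,000 WCU, and 10 GB."""
    by_capacity = total_rcu / 3000 + total_wcu / 1000
    by_size = total_size_gb / 10
    return math.ceil(max(by_capacity, by_size))

# E.g., a table with 6,000 RCU, 2,000 WCU and 15 GB of data:
# capacity -> 2 + 2 = 4, size -> 1.5, so 4 partitions.
```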
How do DynamoDB Conditional Writes work?
- Accept a write / update only if the conditions are met; otherwise reject it
- Helps with concurrent access to items
- No performance impact
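A common use is "insert only if the item does not already exist", so two concurrent writers cannot silently overwrite each other. A boto3 sketch (the table name and `pk` attribute are hypothetical; not invoked here since it needs credentials):

```python
def create_if_absent(table_name: str, item: dict) -> bool:
    """Insert an item only if no item with the same partition key exists."""
    import boto3  # imported inside so the sketch parses without boto3 installed
    from botocore.exceptions import ClientError

    table = boto3.resource("dynamodb").Table(table_name)
    try:
        table.put_item(
            Item=item,
            # Reject the write if an item with this "pk" is already present:
            ConditionExpression="attribute_not_exists(pk)",
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # another writer got there first
        raise
```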
How do DynamoDB Batch Writes work? What are the benefits?
- BatchWriteItem
  - Up to 25 PutItem and/or DeleteItem requests in one call
  - Up to 16 MB of data written
  - Up to 400 KB of data per item
- Batching saves latency by reducing the number of API calls made against DynamoDB
- Operations are done in parallel for better efficiency
- It’s possible for part of a batch to fail, in which case we have to retry the failed items (using an exponential back-off algorithm)
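Because of the 25-request limit, a large write set has to be split into batches before calling `BatchWriteItem`. A minimal sketch of the chunking (the request objects themselves are hypothetical):

```python
BATCH_LIMIT = 25  # BatchWriteItem accepts at most 25 put/delete requests per call

def write_batches(requests):
    """Split a list of write requests into BatchWriteItem-sized chunks."""
    return [requests[i:i + BATCH_LIMIT]
            for i in range(0, len(requests), BATCH_LIMIT)]

# Each chunk would be sent with batch_write_item; anything returned under
# "UnprocessedItems" should be retried with exponential back-off,
# e.g. sleeping min(2 ** attempt, 30) seconds between attempts.
```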
How do DynamoDB Batch Reads work? What are the benefits?
- GetItem:
  - Reads based on the primary key
  - Primary Key = HASH or HASH-RANGE
  - Eventually consistent read by default
  - Option to use strongly consistent reads (more RCU – might take longer)
  - A ProjectionExpression can be specified to retrieve only certain attributes
- BatchGetItem:
  - Up to 100 items
  - Up to 16 MB of data
  - Items are retrieved in parallel to minimize latency
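The GetItem options above in one boto3 sketch (table, key, and attribute names are hypothetical; not invoked here since it needs credentials):

```python
def get_item_strong(table_name: str, key: dict, attrs: str):
    """Strongly consistent point read returning only selected attributes."""
    import boto3  # imported inside so the sketch parses without boto3 installed
    table = boto3.resource("dynamodb").Table(table_name)
    resp = table.get_item(
        Key=key,                     # e.g. {"pk": "user#1", "sk": "profile"}
        ConsistentRead=True,         # strongly consistent: costs a full RCU instead of half
        ProjectionExpression=attrs,  # e.g. "email, created_at"
    )
    return resp.get("Item")          # None if the key does not exist
```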
DynamoDB – Query
- Query returns items based on:
  - PartitionKey value (must use the = operator)
  - SortKey value (=, <, <=, >, >=, Between, Begins_with) – optional
  - FilterExpression to further filter results (applied after the items are read, before they are returned – it does not reduce the RCU consumed)
- Returns:
  - Up to 1 MB of data
  - Or the number of items specified in Limit
- Able to paginate the results
- Can query a table, a local secondary index, or a global secondary index
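A Query combining an equality condition on the partition key with a range condition on the sort key, as a boto3 sketch (table and attribute names are hypothetical; not invoked here since it needs credentials):

```python
def recent_orders(table_name: str, customer_id: str, since: str):
    """Query one partition, newest-first, with a sort-key range condition."""
    import boto3  # imported inside so the sketch parses without boto3 installed
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table(table_name)
    resp = table.query(
        KeyConditionExpression=(
            Key("customer_id").eq(customer_id)  # partition key: must be =
            & Key("order_date").gte(since)      # sort key: range condition
        ),
        ScanIndexForward=False,  # descending sort-key order (newest first)
        Limit=20,                # cap the number of items returned
    )
    return resp["Items"]
```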
DynamoDB – Scan
• Scans the entire table and then filters out data (inefficient)
• Returns up to 1 MB of data – use pagination to keep reading
• Consumes a lot of RCU
• Limit the impact using Limit, or reduce the size of the result and pause between calls
• For faster performance, use parallel scans:
  • Multiple instances scan multiple partitions at the same time
  • Increases the throughput and the RCU consumed
  • Limit the impact of parallel scans just as you would for regular scans
• Can use a ProjectionExpression + FilterExpression (no change to RCU consumed)
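A parallel scan assigns each worker one `Segment` out of `TotalSegments`, with pagination via `LastEvaluatedKey` inside each segment. A boto3 sketch, sequential for brevity where real code would use threads (table name and segment count are hypothetical; not invoked here since it needs credentials):

```python
def parallel_scan(table_name: str, total_segments: int = 4):
    """Scan a table segment by segment; each segment could run in its own worker."""
    import boto3  # imported inside so the sketch parses without boto3 installed
    table = boto3.resource("dynamodb").Table(table_name)
    items = []
    for segment in range(total_segments):
        kwargs = {"Segment": segment, "TotalSegments": total_segments}
        while True:
            resp = table.scan(**kwargs)
            items.extend(resp["Items"])
            if "LastEvaluatedKey" not in resp:  # this segment is exhausted
                break
            # Continue paginating within the same segment:
            kwargs["ExclusiveStartKey"] = resp["LastEvaluatedKey"]
    return items
```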