Domain 2 - Storage Flashcards
You are about to enter the Christmas sale, and you know a few items on your website are very popular and will be read often. Last year you got a ProvisionedThroughputExceededException. What should you do this year?
Create a DAX cluster
You would like to react in real-time to users de-activating their account and send them an email to try to bring them back. The best way of doing it is to…
Integrate Lambda with a DynamoDB stream
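A minimal sketch of such a Lambda handler, assuming the stream view type is NEW_AND_OLD_IMAGES and that items carry hypothetical `status` and `email` string attributes. The record fields (`eventName`, `dynamodb`, `OldImage`, `NewImage`, typed values like `{"S": ...}`) follow the DynamoDB Streams event shape. The actual email call is left out.

```python
from typing import Any


def handler(event: dict, context: Any = None) -> list[str]:
    """Hypothetical Lambda handler subscribed to a DynamoDB stream.

    Assumes NEW_AND_OLD_IMAGES and hypothetical 'status' / 'email' attributes.
    Returns the addresses to contact; a real function would send the emails.
    """
    to_email = []
    for record in event.get("Records", []):
        if record.get("eventName") != "MODIFY":
            continue  # only updates can flip an account from active to deactivated
        images = record["dynamodb"]
        old_status = images.get("OldImage", {}).get("status", {}).get("S")
        new_status = images.get("NewImage", {}).get("status", {}).get("S")
        if old_status == "active" and new_status == "deactivated":
            # A real function would call an email service here.
            to_email.append(images["NewImage"]["email"]["S"])
    return to_email
```

Filtering on the old and new images reacts only to the transition itself. Later updates to an already-deactivated account do not trigger another email.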
What is the maximum number of fields that can be made a primary key in DynamoDB?
2, partition key + sort key
What is the maximum size of a row in DynamoDB?
400KB
You are writing an item of 8 KB in size at the rate of 12 per second. How many WCU do you need?
1 WCU = 1 write per second of an item up to 1 KB, so 8 KB × 12 writes/s = 96 WCU
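The WCU arithmetic can be checked with a small helper. This is a sketch assuming standard (non-transactional) writes; the function name `wcu_needed` is hypothetical. Item size is rounded up to a whole number of 1 KB write units before multiplying by the write rate.

```python
import math


def wcu_needed(item_kb: float, writes_per_sec: int) -> int:
    """WCU for standard writes: 1 WCU = one write/s of an item up to 1 KB."""
    units_per_write = math.ceil(item_kb)  # round up to whole 1 KB write units
    return units_per_write * writes_per_sec


print(wcu_needed(8, 12))    # 96
print(wcu_needed(4.5, 10))  # 4.5 KB rounds up to 5 units -> 50
```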
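You are doing a strongly consistent read of 10 KB items at the rate of 10 per second. What RCU do you need?
30 RCU: 1 RCU covers one strongly consistent read per second of an item up to 4 KB; 10 KB rounds up to 12 KB = 3 units per read; 3 × 10 reads/s = 30
Note: eventually consistent reads cost half as much, so the same workload would need 15 RCU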
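The same rounding logic applies to reads, with 4 KB units and a 50% discount for eventually consistent reads. This is a sketch with a hypothetical `rcu_needed` helper. It rounds the eventually consistent result up, because you provision whole units.

```python
import math


def rcu_needed(item_kb: float, reads_per_sec: int, strongly_consistent: bool = True) -> int:
    """RCU: 1 RCU = one strongly consistent read/s of up to 4 KB (or two eventual ones)."""
    units_per_read = math.ceil(item_kb / 4)  # round up to whole 4 KB read units
    rcu = units_per_read * reads_per_sec
    # Eventually consistent reads cost half; round up to provision whole units.
    return rcu if strongly_consistent else math.ceil(rcu / 2)


print(rcu_needed(10, 10))                             # 30
print(rcu_needed(10, 10, strongly_consistent=False))  # 15
```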
You would like to have DynamoDB automatically delete old data for you. What should you use?
1) Use TTL
2) Use DynamoDB Streams
3) Use DAX
4) Use a Lambda function
1) Use TTL
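#S3 What is the consistency model of S3?
- Read-after-write consistency for PUTs of new objects
Exception: if you GET a key before it exists (GET-PUT-GET), the second GET may still return 404 due to eventual consistency
- Eventual consistency for DELETEs and PUTs (overwrites) of existing objects
Note: since December 2020, S3 provides strong read-after-write consistency for all GET, PUT, and LIST operations, including overwrites and deletes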
#S3 What are the storage classes (tiers) of S3?
Standard, Standard-IA, One Zone-IA, Intelligent-Tiering, Glacier, Glacier Deep Archive
#S3 What is the largest object you can store in S3, or the largest archive in Glacier?
S3: 5 TB per object; Glacier: 40 TB per archive
#S3 What are the 3 data retrieval options for S3 Glacier?
Expedited: 1 - 5 min
Standard: 3 - 5 hours
Bulk: 5 - 12 hours
#S3 What are the 2 data retrieval options for S3 Glacier Deep Archive?
Standard: within 12 hours
Bulk: within 48 hours
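Choosing a retrieval tier is a trade-off between speed and cost. Below is a sketch that picks the cheapest tier whose worst-case time fits a deadline. It uses the upper bounds from the two cards above and assumes the cost order Bulk < Standard < Expedited. The names `RETRIEVAL_HOURS` and `cheapest_tier` are hypothetical.

```python
from typing import Optional

# Worst-case retrieval times in hours, from the cards above, listed cheapest first.
RETRIEVAL_HOURS = {
    "GLACIER": [("Bulk", 12), ("Standard", 5), ("Expedited", 5 / 60)],
    "DEEP_ARCHIVE": [("Bulk", 48), ("Standard", 12)],
}


def cheapest_tier(storage_class: str, deadline_hours: float) -> Optional[str]:
    """Return the cheapest tier guaranteed to finish within the deadline, else None."""
    for tier, worst_case in RETRIEVAL_HOURS[storage_class]:
        if worst_case <= deadline_hours:
            return tier
    return None


print(cheapest_tier("GLACIER", 6))       # Standard
print(cheapest_tier("DEEP_ARCHIVE", 6))  # None: Deep Archive cannot meet 6 hours
```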
#S3 What is the minimum storage duration for S3 IA, S3 One Zone IA, S3 Glacier, and S3 Glacier Deep Archive?
S3 IA and S3 One Zone IA: 30 days
S3 Glacier: 90 days
S3 Glacier Deep Archive: 180 days
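The minimum duration matters because an object deleted, overwritten, or transitioned early is still billed up to the minimum. Here is a sketch with a hypothetical `billed_days` helper using the durations above. It uses the S3 API storage-class names and treats STANDARD as having no minimum.

```python
# Minimum storage durations (days) from the card above; STANDARD has none.
MIN_STORAGE_DAYS = {"STANDARD_IA": 30, "ONEZONE_IA": 30, "GLACIER": 90, "DEEP_ARCHIVE": 180}


def billed_days(storage_class: str, days_stored: int) -> int:
    """Days of storage you pay for if the object is removed after days_stored days."""
    return max(days_stored, MIN_STORAGE_DAYS.get(storage_class, 0))


print(billed_days("GLACIER", 10))   # 90: removed early, charged for the full minimum
print(billed_days("GLACIER", 120))  # 120: past the minimum, charged for actual days
```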
#S3 What are S3 Lifecycle Rules?
S3 Lifecycle rules can be used to define:
- Transition actions (move objects to another storage class after N days), and
- Expiration actions (delete objects after N days)
Rules can be applied to prefixes and tags
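A lifecycle rule combines a filter with transition and expiration actions. The sketch below builds a configuration dict in the shape accepted as `LifecycleConfiguration` by boto3's `put_bucket_lifecycle_configuration`. The rule ID, the `logs/` prefix, and the day counts are hypothetical. A tag-based rule would use `{"Tag": {"Key": ..., "Value": ...}}` in `Filter` instead of a prefix.

```python
import json


def lifecycle_rule(rule_id: str, prefix: str, ia_days: int, glacier_days: int, expire_days: int) -> dict:
    """One lifecycle rule: transition to IA, then Glacier, then expire."""
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},  # scope: only keys under this prefix
        "Status": "Enabled",
        "Transitions": [  # transition actions
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_days},  # expiration action
    }


config = {"Rules": [lifecycle_rule("archive-logs", "logs/", 30, 90, 365)]}
print(json.dumps(config, indent=2))
```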
#S3 What is S3 Versioning?
- S3 versioning is enabled at the bucket level; once enabled, it can be suspended but not disabled
- It protects against unintended deletes: a delete only adds a delete marker, so you can restore the object or roll back to a previous version
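Delete markers are what make a versioned delete recoverable. The toy in-memory model below illustrates that behavior for a single key. It is not the real S3 API, and the `VersionedKey` class and its methods are hypothetical.

```python
from typing import Optional


class VersionedKey:
    """Toy model of one key in a versioned bucket (illustration only, not the S3 API)."""

    def __init__(self) -> None:
        # Oldest first; data is None for a delete marker.
        self.versions: list[tuple[str, Optional[bytes]]] = []
        self._counter = 0

    def put(self, data: Optional[bytes]) -> str:
        self._counter += 1
        version_id = f"v{self._counter}"
        self.versions.append((version_id, data))
        return version_id

    def delete(self) -> str:
        """A DELETE without a version ID only adds a delete marker."""
        return self.put(None)

    def delete_version(self, version_id: str) -> None:
        """Permanently remove one specific version or delete marker."""
        self.versions = [v for v in self.versions if v[0] != version_id]

    def get(self) -> bytes:
        """Return the latest version; a delete marker on top behaves like a 404."""
        if not self.versions or self.versions[-1][1] is None:
            raise KeyError("404 Not Found")
        return self.versions[-1][1]


key = VersionedKey()
key.put(b"first")
key.put(b"second")
marker = key.delete()       # object now looks deleted
key.delete_version(marker)  # removing the marker restores it
print(key.get())            # b'second'
```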
#S3 What is cross region replication (CRR)?
With the right IAM permissions, S3 can asynchronously copy objects across Regions
- You can change the storage class of the replicas, e.g. Standard -> Glacier
- Replication can be scoped by tag or prefix, and you MUST enable versioning on both the source and destination buckets
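A replication rule names an IAM role, a filter, and a destination, optionally with a different storage class. The sketch below builds a dict in the shape used as `ReplicationConfiguration` by boto3's `put_bucket_replication`. The role ARN, bucket name, and prefix are placeholders, and delete-marker replication is simply turned off here.

```python
def replication_config(role_arn: str, dest_bucket: str, prefix: str, storage_class: str = "GLACIER") -> dict:
    """One cross-region replication rule (both buckets must have versioning enabled)."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy objects
        "Rules": [{
            "ID": "crr-" + prefix.strip("/"),
            "Priority": 1,
            "Filter": {"Prefix": prefix},  # replicate only keys under this prefix
            "Status": "Enabled",
            "DeleteMarkerReplication": {"Status": "Disabled"},
            "Destination": {
                "Bucket": f"arn:aws:s3:::{dest_bucket}",
                "StorageClass": storage_class,  # replicas can use a cheaper class
            },
        }],
    }


cfg = replication_config(
    "arn:aws:iam::123456789012:role/hypothetical-replication-role",
    "my-dest-bucket",
    "logs/",
)
```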
#S3 What is S3 Etag?
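S3 ETag is an MD5 hash that can be used to check the integrity of an object in S3.
You can calculate the MD5 hash of your file and compare it with the uploaded object's ETag (calculated by AWS). This only holds for objects uploaded in a single PUT with SSE-S3 or no encryption; objects from multipart uploads, or encrypted with SSE-KMS or SSE-C, have ETags that are not the plain MD5 of the data.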
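A local integrity check can be sketched with `hashlib`. For single-part uploads, the ETag is the hex MD5 of the data, and S3 returns it wrapped in double quotes. The multipart variant below follows a widely observed but not officially documented scheme: the MD5 of the concatenated binary part digests, plus `-<part count>`. It only matches if you use the same part size as the upload. The helper names are hypothetical.

```python
import hashlib


def single_part_etag(data: bytes) -> str:
    """ETag for a single-PUT object with SSE-S3 or no encryption: hex MD5 of the data."""
    return hashlib.md5(data).hexdigest()


def multipart_etag(data: bytes, part_size: int) -> str:
    """Commonly observed multipart ETag scheme (assumption, not an official spec)."""
    digests = [hashlib.md5(data[i:i + part_size]).digest() for i in range(0, len(data), part_size)]
    return f"{hashlib.md5(b''.join(digests)).hexdigest()}-{len(digests)}"


def etag_matches(local_etag: str, s3_etag: str) -> bool:
    """S3 returns ETags in double quotes, so strip them before comparing."""
    return local_etag == s3_etag.strip('"')


print(single_part_etag(b"hello world"))  # 5eb63bbbe01eeed093cb22bb8f5acdc3
```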
#S3 What is the baseline performance for S3?
- Latency: 100 - 200 ms
- 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix in a bucket
Prefix = the object path between the bucket name and the object name, e.g. folder1/sub1/ for folder1/sub1/file.txt
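Because the limits apply per prefix, spreading keys over more prefixes raises the aggregate rate. This is a sketch with hypothetical helper names, using AWS's documented 3,500 write and 5,500 read requests per second per prefix. It assumes the load is spread evenly across the prefixes.

```python
PUT_PER_PREFIX = 3_500  # PUT/COPY/POST/DELETE requests per second per prefix
GET_PER_PREFIX = 5_500  # GET/HEAD requests per second per prefix


def key_prefix(key: str) -> str:
    """Prefix = everything up to and including the last '/' of the object key."""
    return key.rsplit("/", 1)[0] + "/" if "/" in key else ""


def max_request_rate(num_prefixes: int) -> tuple[int, int]:
    """Aggregate (write, read) requests/s if load is spread evenly over prefixes."""
    return PUT_PER_PREFIX * num_prefixes, GET_PER_PREFIX * num_prefixes


print(key_prefix("folder1/sub1/file.txt"))  # folder1/sub1/
print(max_request_rate(4))                  # (14000, 22000)
```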
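#S3 Why do I have to care about the KMS quota if I use SSE-KMS?
Because with SSE-KMS, every upload (GenerateDataKey) and every download (Decrypt) makes a KMS API call;
KMS has a per-Region quota on API requests per second (the exact limit depends on the Region), so a busy bucket can be throttled by KMS. You can request a quota increase through Service Quotas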
#S3 What are ways to improve the upload and download S3 performance?
For uploads:
- Multi-part upload
- S3 Transfer Acceleration
For downloads:
- Byte-range fetch
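Multi-part upload has hard limits: each part except the last must be at least 5 MiB, no part may exceed 5 GiB, and an upload can have at most 10,000 parts. The sketch below picks a part size that respects those limits. The 100 MiB default and the `plan_parts` name are hypothetical choices, not AWS recommendations.

```python
import math

MIB = 1024 ** 2
MIN_PART = 5 * MIB          # minimum size of every part except the last
MAX_PART = 5 * 1024 * MIB   # 5 GiB maximum part size
MAX_PARTS = 10_000          # maximum number of parts per upload


def plan_parts(object_size: int, preferred_part: int = 100 * MIB) -> tuple[int, int]:
    """Return (part_size, part_count) respecting S3 multipart limits."""
    part_size = max(preferred_part, MIN_PART)
    # Grow the part size if the preferred size would need more than 10,000 parts.
    part_size = max(part_size, math.ceil(object_size / MAX_PARTS))
    if part_size > MAX_PART:
        raise ValueError("object too large for a single multipart upload")
    return part_size, max(1, math.ceil(object_size / part_size))


print(plan_parts(1024 * MIB))  # 100 MiB parts, 11 of them
```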
#S3 What is S3 Byte-Range Fetch?
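S3 Byte-Range Fetch requests only a specific byte range of an object (via the HTTP Range header). It can be used to parallelize the download of an S3 object, or to download only part of it, such as the first N bytes (the header)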
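A parallel byte-range download splits the object into inclusive `bytes=start-end` ranges, fetches them concurrently, and joins them in order. This is a sketch in which `fetch_range` is a hypothetical callable standing in for an S3 GetObject call with a `Range` argument. Fetching a header would simply be `bytes=0-1023`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def byte_ranges(object_size: int, chunk_size: int) -> list[str]:
    """Split an object into HTTP Range header values (inclusive byte offsets)."""
    return [
        f"bytes={start}-{min(start + chunk_size, object_size) - 1}"
        for start in range(0, object_size, chunk_size)
    ]


def parallel_download(fetch_range: Callable[[str], bytes], object_size: int,
                      chunk_size: int, workers: int = 4) -> bytes:
    """Fetch all ranges concurrently; pool.map keeps results in range order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(fetch_range, byte_ranges(object_size, chunk_size)))
    return b"".join(parts)


print(byte_ranges(10, 4))  # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```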
#S3 What is S3 Transfer Acceleration?
S3 Transfer Acceleration increases upload speed by sending data to a nearby AWS edge location, which then forwards it to the S3 bucket in the target Region over the AWS network.
S3 Transfer Acceleration is compatible with multi-part upload
#S3 What are the 4 methods of data encryption in S3?
- SSE-S3 - keys managed by S3
- SSE-KMS - keys managed by you with AWS KMS
- SSE-C - keys managed by you outside AWS; you send the key in HTTP headers with every request, HTTPS is mandatory, and S3 doesn't store your key
- Client-side Encryption - you encrypt data before uploading it and decrypt it after downloading
You can define the default encryption (SSE-S3 or SSE-KMS) for a given bucket
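With SSE-C, every request must carry the key and its checksum in three `x-amz-server-side-encryption-customer-*` headers. Below is a sketch of building them with the standard library. The helper name is hypothetical, and you remain responsible for storing the key, since S3 does not keep it.

```python
import base64
import hashlib
import os


def sse_c_headers(key: bytes) -> dict[str, str]:
    """Request headers for SSE-C; the raw key travels over HTTPS on every call."""
    if len(key) != 32:
        raise ValueError("SSE-C uses AES-256, so the key must be 32 bytes")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode(),
        # Base64 MD5 of the key lets S3 detect a corrupted key in transit.
        "x-amz-server-side-encryption-customer-key-MD5":
            base64.b64encode(hashlib.md5(key).digest()).decode(),
    }


customer_key = os.urandom(32)  # keep this safe: S3 cannot decrypt without it
headers = sse_c_headers(customer_key)
```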
#S3 What is Glacier Vault?
Archives in Glacier are stored in vaults; each vault has ONE vault access policy and ONE vault lock policy
#S3 What is a Glacier Vault Lock Policy?
Vault Lock Policy is a policy for regulatory and compliance reasons.
The policy is immutable, i.e. once locked it cannot be changed.
Use cases: 1) forbid deleting an archive 2) implement WORM policy (write once and read many times)
#S3 What is S3 Select and Glacier Select?
A feature of S3/Glacier that lets you retrieve less data by performing server-side filtering.
You can use simple SQL statements to select rows or columns, which means less data transfer and less CPU cost on the client side;
Can be used with Hadoop/EMR for efficient big data processing; can be up to 400% faster and up to 80% cheaper
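The filtering is expressed as a SQL statement sent with the request. The sketch below builds arguments in the shape of boto3's `select_object_content` for a CSV object. The bucket, key, and the `name`/`age` columns are hypothetical. Because CSV fields arrive as strings, the numeric comparison uses `CAST`.

```python
def select_params(bucket: str, key: str, min_age: int) -> dict:
    """Arguments for an S3 Select query over a CSV file with a header row."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        # Row filtering and column selection happen inside S3; int() guards the literal.
        "Expression": f"SELECT s.name, s.age FROM S3Object s WHERE CAST(s.age AS INT) > {int(min_age)}",
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "NONE"},
        "OutputSerialization": {"CSV": {}},
    }


params = select_params("my-bucket", "people.csv", 30)
```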
#DDB What is RCU and WCU for DynamoDB?
When a table is created in DynamoDB in provisioned capacity mode, you need to define its READ and WRITE capacity in RCU and WCU.
1 RCU = 1 strongly consistent read per second of an item up to 4 KB (4 KB/s), or 2 eventually consistent reads per second (8 KB/s)
1 WCU = 1 write per second of an item up to 1 KB (1 KB/s)
Note: a table's RCU and WCU are spread EVENLY across all its partitions
#DDB What is the consistency model in DynamoDB?
- Eventual consistency read
- Strong consistency read
#DDB What are the solutions for a capacity exceeded exception (ProvisionedThroughputExceededException) in DynamoDB?
- Exponential back-off
- Better partition (distribution) key
- DAX for read capacity issues
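Exponential back-off retries a throttled call after waiting longer each time. The sketch below uses the "full jitter" variant (a random wait up to a capped, doubling bound). `ThroughputExceeded` is a hypothetical stand-in for DynamoDB's exception, and the delay values are arbitrary. AWS SDKs already retry throttled calls with back-off, so hand-rolled loops like this mostly matter for custom clients.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThroughputExceeded(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceededException."""


def with_backoff(op: Callable[[], T], max_attempts: int = 5, base: float = 0.05,
                 cap: float = 2.0, sleep: Callable[[float], None] = time.sleep) -> T:
    for attempt in range(max_attempts):
        try:
            return op()
        except ThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the throttling error
            # Full jitter: sleep a random time up to base * 2^attempt, capped.
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    raise RuntimeError("max_attempts must be at least 1")
```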
#DDB What is the max RCU and WCU a single DynamoDB partition supports? What is the max data size of a partition?
3000 RCU and 1000 WCU
10GB
A table's RCU and WCU are evenly allocated to each partition, so a frequently accessed key can create a HOT partition
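The per-partition limits lead to a rule of thumb that older AWS documentation gave for estimating partition count. The estimate is the larger of the throughput-based count (RCU/3000 + WCU/1000) and the size-based count (size/10 GB), each rounded up. DynamoDB now manages partitions itself, and adaptive capacity softens hot partitions, so treat this sketch as an approximation. The function names are hypothetical.

```python
import math


def estimate_partitions(rcu: int, wcu: int, size_gb: float) -> int:
    """Rule-of-thumb partition count from the 3000 RCU / 1000 WCU / 10 GB limits."""
    by_throughput = math.ceil(rcu / 3000 + wcu / 1000)
    by_size = math.ceil(size_gb / 10)
    return max(by_throughput, by_size, 1)


def per_partition_share(rcu: int, wcu: int, partitions: int) -> tuple[float, float]:
    """Capacity each partition gets when the table's capacity is split evenly."""
    return rcu / partitions, wcu / partitions


n = estimate_partitions(6000, 1000, 25)  # 3 partitions
print(per_partition_share(6000, 1000, n))  # a single hot key gets at most ~2000 RCU
```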
#DDB What are the two options for the primary key of a DynamoDB table?
- Partition (distribution / hash) key only
- Partition key + sort (range) key
#DDB How do you read data from DynamoDB?
- GetItem / BatchGetItem API: BatchGetItem reads up to 100 items or 16 MB per call
- Query - requires an equality condition on the partition key, optionally with a range condition on the sort key; returns up to 1 MB per call
- Scan - reads the whole table, up to 1 MB per call; use ProjectionExpression and FilterExpression to trim the results
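Because Query and Scan return at most 1 MB per call, reading everything means following `LastEvaluatedKey` and passing it back as `ExclusiveStartKey` until it disappears. Here is a sketch in which `fetch_page` is a hypothetical wrapper around a Query or Scan call that returns a response-shaped dict.

```python
from typing import Any, Callable, Iterator, Optional


def paginate(fetch_page: Callable[[Optional[dict]], dict]) -> Iterator[Any]:
    """Yield items from every page, following LastEvaluatedKey until it is absent."""
    start_key: Optional[dict] = None
    while True:
        page = fetch_page(start_key)  # start_key plays the role of ExclusiveStartKey
        yield from page.get("Items", [])
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            break  # no LastEvaluatedKey means this was the last page
```

A fake `fetch_page` that serves a list in pages of four is enough to exercise the loop without AWS.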
#DDB What are differences between ProjectionExpression and FilterExpression?
ProjectionExpression selects which attributes DynamoDB returns
FilterExpression is applied by DynamoDB after items are read but before they are returned, so it reduces the data sent to the client but has no impact on consumed RCU
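Both expressions appear side by side in a single request. Below is a sketch of Query parameters in the low-level request shape (as passed to boto3's `client.query(**params)`). The table name and the `user_id`, `order_id`, and `order_total` attributes are hypothetical.

```python
def query_params(table: str, user_id: str, min_total: int) -> dict:
    """Query one user's orders, keeping only large ones and returning two attributes."""
    return {
        "TableName": table,
        "KeyConditionExpression": "user_id = :uid",       # equality on the partition key
        "FilterExpression": "order_total >= :min",        # applied after the read; same RCU
        "ProjectionExpression": "order_id, order_total",  # only these attributes come back
        "ExpressionAttributeValues": {
            ":uid": {"S": user_id},
            ":min": {"N": str(min_total)},  # numbers are sent as strings in the low-level API
        },
    }


params = query_params("Orders", "user-123", 50)
```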
#DDB What is DDB TTL? Is there any cost associated with it?
DDB TTL is used to automatically delete an item after its expiry date/time; it is a background task performed by DDB
TTL is provided at no extra cost; TTL deletions do not consume WCU / RCU
#DDB How can I recover the items deleted by DDB TTL?
DDB Streams capture TTL deletions, so they can help recover expired items - but stream records are retained for only 24 hours!
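TTL deletions show up in the stream as `REMOVE` events whose `userIdentity` is the DynamoDB service itself (`type` `"Service"`, `principalId` `"dynamodb.amazonaws.com"`). That is what distinguishes them from user deletes. The sketch below collects the deleted items, assuming the stream view type includes old images. The function name is hypothetical, and writing the items to an archive is left out.

```python
def ttl_deleted_items(event: dict) -> list[dict]:
    """Return the OldImage of every item removed by TTL in a stream batch."""
    recovered = []
    for record in event.get("Records", []):
        identity = record.get("userIdentity", {})
        is_ttl_delete = (
            record.get("eventName") == "REMOVE"
            and identity.get("type") == "Service"
            and identity.get("principalId") == "dynamodb.amazonaws.com"
        )
        if is_ttl_delete:
            # Requires OLD_IMAGE or NEW_AND_OLD_IMAGES on the stream.
            recovered.append(record["dynamodb"]["OldImage"])
    return recovered
```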
#DDB How can I enable and use DDB TTL?
- TTL is enabled per table by choosing a TTL attribute; each item sets its own expiry time in that attribute
- Items that should never expire simply omit the TTL attribute
- The TTL attribute must be a Number holding a Unix epoch timestamp in seconds; DDB typically deletes expired items within 48 hours of expiration
- Items deleted by TTL are also removed from GSIs / LSIs
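Setting a TTL means writing an epoch-seconds number into the chosen attribute. Because expired items can still be returned by reads until the background deletion runs, readers often filter them out. This is a sketch with a hypothetical `expires_at` attribute and helper names. Items are plain dicts as in boto3's resource layer; the low-level API would wrap the number as `{"N": "..."}`.

```python
import time
from typing import Optional

TTL_ATTRIBUTE = "expires_at"  # hypothetical attribute chosen when enabling TTL


def with_ttl(item: dict, days: int, now: Optional[float] = None) -> dict:
    """Copy of item with an expiry `days` from now, as Unix epoch seconds."""
    now = time.time() if now is None else now
    return {**item, TTL_ATTRIBUTE: int(now) + days * 86_400}


def not_expired(items: list[dict], now: Optional[float] = None) -> list[dict]:
    """Drop items past their TTL that DynamoDB has not deleted yet."""
    now = time.time() if now is None else now
    return [i for i in items if TTL_ATTRIBUTE not in i or i[TTL_ATTRIBUTE] > now]
```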
#DDB What is DynamoDB Global Table?
DynamoDB global tables provide a fully managed solution for deploying a multi-region, multi-master database, without having to build and maintain your own replication solution.
A couple of things to remember
- ACID is guaranteed for local transactions (in the writer's Region); other Regions are eventually consistent
- Transactions across Regions are not supported
- Global Tables let you read and write your data locally with single-digit millisecond latency
- Multi-region Fault Tolerance - in case of regional failure, you can redirect your application to another region
- You can use TTL and replicate TTL deletes to all replica tables
Note: DDB Global Tables replicate changes through DDB Streams, so streams MUST be enabled!
#DDB What is DynamoDB Local?
DynamoDB Local allows you to run DynamoDB locally for development, e.g. you can run DDB as a Docker container.