Domain 2 - Storage Flashcards by Haitao Jiang

You are about to enter the Christmas sale and you know a few items in your website are very popular and will be read often. Last year you had a ProvisionedThroughputExceededException. What should you do this year?

Create a DAX cluster

You would like to react in real-time to users de-activating their account and send them an email to try to bring them back. The best way of doing it is to…

Integrate Lambda with a DynamoDB stream

What is the maximum number of fields that can be made a primary key in DynamoDB?

2, partition key + sort key

What is the maximum size of an item (row) in DynamoDB?

400 KB

You are writing items of 8 KB in size at a rate of 12 per second. How many WCU do you need?

1 WCU covers one write per second of an item up to 1 KB, so each 8 KB write costs 8 WCU; 8 × 12 = 96 WCU in total
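
The arithmetic can be checked with a small helper. This is a minimal sketch for standard (non-transactional) writes; the function name `wcu_needed` is made up for illustration and is not part of any AWS SDK.

```python
import math

def wcu_needed(item_size_kb: float, writes_per_second: int) -> int:
    """WCU for standard writes: each write costs the item size rounded up to the next 1 KB."""
    units_per_write = math.ceil(item_size_kb)
    return units_per_write * writes_per_second

print(wcu_needed(8, 12))    # 96
print(wcu_needed(4.5, 10))  # 50 (4.5 KB rounds up to 5 KB per write)
```

The rounding happens per item, so many small writes of 1.1 KB cost 2 WCU each.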

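You are doing strongly consistent reads of 10 KB items at a rate of 10 per second. How many RCU do you need?

30. One strongly consistent read of up to 4 KB costs 1 RCU, so a 10 KB item costs 10/4 rounded up = 3 RCU per read; 3 × 10 = 30 RCU

Note: an eventually consistent read costs half as much (0.5 RCU per 4 KB), so the same workload would need 15 RCU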
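
The same calculation as a sketch in code. The helper name `rcu_needed` is hypothetical; it only encodes the 4 KB rounding and the half-cost rule for eventually consistent reads described on this card.

```python
import math

def rcu_needed(item_size_kb: float, reads_per_second: int,
               strongly_consistent: bool = True) -> int:
    """RCU for reads: item size is rounded up to the next 4 KB per read."""
    units_per_read = math.ceil(item_size_kb / 4)
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        # Eventually consistent reads cost half as much.
        total = math.ceil(total / 2)
    return total

print(rcu_needed(10, 10))                             # 30
print(rcu_needed(10, 10, strongly_consistent=False))  # 15
```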

You would like to have DynamoDB automatically delete old data for you. What should you use?

1) Use TTL
2) Use DynamoDB Streams
3) Use DAX
4) Use a Lambda function

1) Use TTL

#S3
What is the consistency model of S3?

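Since December 2020, S3 provides strong read-after-write consistency for all PUTs and DELETEs, for both new and existing objects.
Before that change, the model was:
Read-after-write consistency for PUTs of new objects;
Exception: a GET-PUT-GET sequence could return 404 on the second GET, because the first GET (issued before the object existed) left a cached "not found" result that was only eventually consistent;
Eventual consistency for DELETEs and overwrite PUTs of existing objects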

#S3
What are the storage classes (tiers) of S3?

Standard, Intelligent-Tiering, Standard-IA, One Zone-IA, Glacier, Glacier Deep Archive

#S3
What is the largest object you can store in S3 or Glacier?

S3: 5 TB per object; Glacier: 40 TB per archive

What are the 3 data retrieval options for S3 Glacier?

Expedited: 1 - 5 min
Standard: 3 - 5 hours
Bulk: 5 - 12 hours

What are the 2 data retrieval options for S3 Glacier Deep Archive?

Standard: within 12 hours
Bulk: within 48 hours

#S3
What is the minimum storage duration for S3 IA, S3 Glacier and S3 Glacier Deep Archive?

S3 IA and S3 One Zone IA: 30 days
S3 Glacier: 90 days
S3 Glacier Deep Archive: 180 days

#S3
What are S3 Lifecycle rules?

S3 Lifecycle rules can be used to define:

Transition actions (move objects to another storage class after a set time), and
Expiration actions (delete objects after a set time)

Rules can be scoped to prefixes and tags
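
As an illustration, a rule combining both action types might look like the sketch below. The dictionary shape follows what boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`, to the best of my knowledge. The rule ID, the `logs/` prefix and the helper name are hypothetical, and the code builds the configuration without calling AWS.

```python
def build_lifecycle_rule(rule_id, prefix, transitions, expire_after_days):
    """Build one lifecycle rule. `transitions` is a list of (days, storage_class) pairs."""
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},  # scope the rule to a key prefix
        "Transitions": [
            {"Days": days, "StorageClass": storage_class}
            for days, storage_class in transitions
        ],
        "Expiration": {"Days": expire_after_days},  # delete after this many days
    }

# Hypothetical policy: logs move to IA after 30 days, to Glacier after 90,
# and are deleted after a year.
config = {
    "Rules": [
        build_lifecycle_rule(
            "archive-logs", "logs/",
            [(30, "STANDARD_IA"), (90, "GLACIER")],
            expire_after_days=365,
        )
    ]
}
print(config["Rules"][0]["Transitions"])
```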

#S3
What is S3 Versioning?

S3 versioning can be enabled at the bucket level and can be suspended later
It can be used to protect against unintended deletes; you can restore a deleted object to a previous version

#S3
What is cross region replication (CRR)?

With IAM permissions, S3 can asynchronously copy data across regions

You can change the storage class, e.g. standard -> Glacier
Replication can be based on tag or prefix and you MUST enable versioning on both source and destination buckets

#S3
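What is an S3 ETag?

The S3 ETag is a hash used to check the integrity of an object in S3. For objects uploaded in a single PUT without SSE-KMS or SSE-C, it is the MD5 hash of the object data.

You can calculate the MD5 hash of your file and compare it with the uploaded file's ETag (calculated by AWS). This does not hold for multipart uploads, whose ETag is not a plain MD5 of the object.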
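
A minimal sketch of that comparison, assuming a single-part upload without SSE-KMS or SSE-C. The function name `matches_etag` is made up for illustration. The ETag value is passed in as a string, quoted the way S3 returns it in response headers.

```python
import hashlib

def matches_etag(data: bytes, etag: str) -> bool:
    """Compare the local MD5 of `data` with an S3 ETag string."""
    cleaned = etag.strip('"').lower()  # S3 wraps the ETag in double quotes
    if "-" in cleaned:
        # Multipart ETags carry a "-<part count>" suffix and are not a plain MD5.
        raise ValueError("multipart ETag cannot be compared to a plain MD5")
    return hashlib.md5(data).hexdigest() == cleaned

body = b"hello world"
print(matches_etag(body, '"5eb63bbbe01eeed093cb22bb8f5acdc3"'))  # True
print(matches_etag(b"tampered", '"5eb63bbbe01eeed093cb22bb8f5acdc3"'))  # False
```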

#S3
What is the baseline performance for S3?

Latency: 100 - 200 ms to first byte
3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix in a bucket

Prefix = the object path between the bucket name and the object name

#S3
Why do I have to care about KMS quota if I use SSE-KMS?

With SSE-KMS, every upload calls the KMS GenerateDataKey API and every download calls the KMS Decrypt API;

KMS enforces a quota on the number of API requests per second (the limit varies by region), so heavy S3 traffic can be throttled by KMS

#S3
What are ways to improve the upload and download S3 performance?

For uploads
1. Multi-part upload
2. S3 Transfer Acceleration

For downloads
1. Byte-range fetch

#S3
What is S3 Byte-Range Fetch?

S3 Byte-Range Fetch lets you download an S3 object in parallel by requesting separate byte ranges, or download only part of the object, such as the first N bytes (e.g. a file header)
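
To make the parallel case concrete, here is a sketch that splits an object of known size into HTTP `Range` header values. The helper name `byte_ranges` is hypothetical. Each value could be sent with its own GET request (for example through the `Range` parameter of a GetObject call), and the parts concatenated in order afterwards.

```python
def byte_ranges(total_size: int, chunk_size: int) -> list[str]:
    """Split [0, total_size) into inclusive HTTP Range header values."""
    ranges = []
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1  # Range ends are inclusive
        ranges.append(f"bytes={start}-{end}")
    return ranges

print(byte_ranges(10, 4))    # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
print(byte_ranges(100, 100)) # ['bytes=0-99']
```

Fetching only a header is the single-range case, e.g. `bytes=0-511` for the first 512 bytes.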

#S3
What is S3 Transfer Acceleration?

S3 Transfer Acceleration increases upload transfer speed by sending data to an edge location, which then forwards it to the S3 bucket in the target region.

S3 Transfer Acceleration is compatible with multi-part upload

#S3
What are the 4 methods of data encryption in S3?

SSE-S3 - keys managed by S3
SSE-KMS - keys managed in KMS, under your control
SSE-C - keys managed by you; you must send the key in the request headers; HTTPS must be used; S3 doesn't store your key
Client-side encryption - you encrypt your data before uploading it

You can define the default encryption (SSE-S3 or SSE-KMS) for a given bucket

#S3
What is Glacier Vault?

Objects in Glacier are stored in vaults; each vault has ONE vault policy and ONE vault lock policy

#S3
What is a Glacier Vault Lock Policy?

A Vault Lock Policy is a policy for regulatory and compliance requirements. The policy is immutable, i.e. once set it cannot be changed.

Use cases:
1) forbid deleting an archive
2) implement a WORM policy (write once, read many)

#S3
What is S3 Select and Glacier Select?

A feature of S3/Glacier that lets you retrieve less data by filtering on the server side.
You can use SQL to select rows or columns
Less data transfer and less CPU cost on the client side
Can be used with Hadoop/EMR for efficient big data processing
Can be up to 400% faster and up to 80% cheaper

#DDB
What is RCU and WCU for DynamoDB?

When a table is created in DynamoDB with provisioned capacity, you need to define its READ and WRITE capacity as a number of RCU and WCU.
1 RCU is one strongly consistent read per second, or two eventually consistent reads per second, of an item up to 4 KB
1 WCU is one write per second of an item up to 1 KB
Note: the WCU and RCU of a table are spread EVENLY among all partitions

#DDB
What is the consistency model in DynamoDB?

- Eventually consistent read (the default)
- Strongly consistent read

#DDB
What are the solutions for a capacity exceeded exception in DynamoDB?

1. Exponential back-off
2. Better distribution (partition) key
3. DAX for read capacity issues
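
Exponential back-off can be sketched as a retry wrapper. This is an illustration, not how any particular SDK implements it (the AWS SDKs already retry throttled calls on their own). `ThroughputExceeded` is a local stand-in for `ProvisionedThroughputExceededException`, and the delay parameters are arbitrary assumptions.

```python
import random
import time

class ThroughputExceeded(Exception):
    """Local stand-in for DynamoDB's ProvisionedThroughputExceededException."""

def call_with_backoff(operation, max_attempts=5, base_delay=0.05,
                      max_delay=2.0, sleep=time.sleep):
    """Retry `operation`, doubling the delay ceiling after each throttled attempt."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))  # random jitter spreads out retries

# Demo: an operation that is throttled twice, then succeeds.
calls = {"count": 0}

def flaky_write():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ThroughputExceeded()
    return "ok"

print(call_with_backoff(flaky_write))  # ok
print(calls["count"])                  # 3
```

The jitter matters when many clients are throttled at once: without it they would all retry at the same instants.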

#DDB
What are the max # of RCU and WCU a DynamoDB table partition has? What is the max data size a partition can have?

3000 RCU and 1000 WCU
10 GB
RCU and WCU of a table are evenly allocated to each partition - thus we can have a HOT partition issue
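
The hot partition issue follows from the even split. The sketch below is a deliberately simplified model: it assumes a strict even split and ignores any burst or adaptive behaviour the service may apply. The function names and numbers are made up for illustration.

```python
def per_partition_wcu(table_wcu: float, partitions: int) -> float:
    """Each partition gets an equal share of the table's provisioned WCU."""
    return table_wcu / partitions

def is_hot(partition_demand_wcu: float, table_wcu: float, partitions: int) -> bool:
    """A partition is 'hot' when its own demand exceeds its share."""
    return partition_demand_wcu > per_partition_wcu(table_wcu, partitions)

# Table provisioned with 300 WCU across 3 partitions: 100 WCU each.
print(per_partition_wcu(300, 3))  # 100.0
# One popular key drives 150 WCU to a single partition: throttled under this model,
# even though the table as a whole is using only half of its 300 WCU.
print(is_hot(150, 300, 3))        # True
print(is_hot(80, 300, 3))         # False
```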

#DDB
What are the two options for the primary key of a DynamoDB table?

1. Distribution / Hash key (partition key) only
2. Distribution key + Sort / Range key

#DDB
How do you read data from DynamoDB?

1. GetItem or BatchGetItem - a batch read returns at most 16 MB or 100 items
2. Query - you can only query on the partition key, optionally with a range condition on the range / sort key; you get up to 1 MB of data per call
3. Scan - you get up to 1 MB per call; you can use ProjectionExpression and FilterExpression to trim the results
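
Because a batch read is capped at 100 items, a longer key list has to be split before it is sent. A minimal sketch of that chunking step follows. The `user_id` attribute and the helper name are hypothetical, the keys use the low-level typed format (`{"S": ...}`), and the code makes no AWS calls.

```python
def chunk_keys(keys: list, batch_size: int = 100) -> list:
    """Split a key list into batches no larger than the BatchGetItem item limit."""
    return [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]

# 250 hypothetical keys need three BatchGetItem requests.
keys = [{"user_id": {"S": f"user-{n}"}} for n in range(250)]
batches = chunk_keys(keys)
print([len(batch) for batch in batches])  # [100, 100, 50]
```

A real caller would also need to resend any keys the service reports as unprocessed, since a batch can come back partially filled when it hits the size or capacity limit.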

#DDB
What are the differences between ProjectionExpression and FilterExpression?

ProjectionExpression selects which attributes DynamoDB returns
FilterExpression drops non-matching items; DynamoDB applies it after the items are read and before the results are returned, so it does not reduce the RCU consumed
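
The point about RCU can be shown with a toy in-memory model. This is not DynamoDB code: it assumes strongly consistent reads charged on the cumulative size of the items read, and the item fields are made up for illustration.

```python
import math

def simulated_scan(items, keep, attributes):
    """Toy model: capacity is charged on what is read, not on what is returned."""
    read_kb = sum(item["size_kb"] for item in items)
    consumed_rcu = math.ceil(read_kb / 4)  # rounded up to the next 4 KB
    returned = [
        {name: item[name] for name in attributes}  # ProjectionExpression analogue
        for item in items
        if keep(item)                              # FilterExpression analogue
    ]
    return returned, consumed_rcu

items = [
    {"id": 1, "status": "active", "size_kb": 6},
    {"id": 2, "status": "closed", "size_kb": 6},
    {"id": 3, "status": "active", "size_kb": 6},
]
rows, rcu = simulated_scan(items, lambda item: item["status"] == "active", ["id"])
print(rows)  # [{'id': 1}, {'id': 3}]
print(rcu)   # 5 (18 KB read, regardless of the filter or projection)
```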

#DDB
What is DDB TTL? Is there any cost associated with it?

DDB TTL automatically deletes an item after its expiry date/time
It is a background task performed by DDB
TTL is provided at no extra cost; deletions do not consume WCU / RCU

#DDB
How can I recover the items deleted by DDB TTL?

DDB Streams can help recover expired items, but stream records are only retained for 24 hours!

#DDB
How can I enable and use DDB TTL?

- TTL is enabled on the table by naming a TTL attribute; each item (row) opts in by carrying a timestamp in that attribute
- Items that should never expire simply omit the attribute
- The TTL attribute holds a Unix epoch timestamp (in seconds)
- DDB typically deletes expired items within 48 hours of expiration
- Items deleted by TTL are also removed from GSI / LSI
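
Computing the timestamp is plain epoch arithmetic. In this sketch the attribute name `expires_at` and the item fields are hypothetical. The code only builds the item and checks expiry locally, without calling DynamoDB.

```python
import time

SECONDS_PER_DAY = 86_400

def expiry_timestamp(days_from_now: int, now=None) -> int:
    """Unix epoch time, in whole seconds, `days_from_now` days after `now`."""
    current = int(time.time()) if now is None else int(now)
    return current + days_from_now * SECONDS_PER_DAY

def is_expired(item: dict, now: int, ttl_attribute: str = "expires_at") -> bool:
    """Items without the TTL attribute never expire."""
    return ttl_attribute in item and item[ttl_attribute] <= now

# A session that should expire 7 days after a fixed reference time.
session = {"session_id": "abc123", "expires_at": expiry_timestamp(7, now=1_700_000_000)}
print(session["expires_at"])                                      # 1700604800
print(is_expired(session, now=1_700_604_800))                     # True
print(is_expired({"session_id": "keep-me"}, now=1_700_604_800))   # False
```

Since deletion is a background task, readers that must never see stale data may still want a check like `is_expired` on their side.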

#DDB
What is DynamoDB Global Table?

DynamoDB global tables provide a fully managed solution for deploying a multi-region, multi-master database, without having to build and maintain your own replication solution.

A couple of things to remember:
- ACID is guaranteed for local transactions (in the region of the writer); other regions are eventually consistent
- Transactions across regions are not supported
- Global Tables let you read and write your data locally with single-digit millisecond latency
- Multi-region fault tolerance - in case of a regional failure, you can redirect your application to another region
- You can use TTL, and TTL deletes are replicated to all replica tables

Note: in order to use DDB Global Tables, DDB Streams MUST be enabled!

#DDB
What is DynamoDB Local?

DynamoDB Local allows you to run DynamoDB locally for development, e.g. you can run DDB as a Docker container.