Domain 2 - Storage Flashcards

1
Q

You are about to enter the Christmas sale and you know a few items in your website are very popular and will be read often. Last year you had a ProvisionedThroughputExceededException. What should you do this year?

A

Create a DAX cluster

2
Q

You would like to react in real-time to users de-activating their account and send them an email to try to bring them back. The best way of doing it is to…

A

Integrate Lambda with a DynamoDB stream
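A minimal sketch of what such a Lambda handler might look like, assuming the stream is configured with the NEW_AND_OLD_IMAGES view type. The attribute names `active` and `email` are hypothetical, and actually sending the email (for example through SES) is left out.

```python
def deactivated_emails(event):
    """Return e-mail addresses of users whose `active` flag flipped to False."""
    emails = []
    for record in event.get("Records", []):
        # Only updates can represent a de-activation of an existing account
        if record.get("eventName") != "MODIFY":
            continue
        images = record.get("dynamodb", {})
        old = images.get("OldImage", {})
        new = images.get("NewImage", {})
        # Stream images use DynamoDB's typed JSON, e.g. {"BOOL": True}
        was_active = old.get("active", {}).get("BOOL", False)
        is_active = new.get("active", {}).get("BOOL", False)
        if was_active and not is_active and "email" in new:
            emails.append(new["email"]["S"])
    return emails


def handler(event, context):
    # Lambda entry point; a real function would send the email here
    targets = deactivated_emails(event)
    for address in targets:
        print(f"would send win-back email to {address}")
    return {"notified": len(targets)}
```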

3
Q

What is the maximum number of fields that can be made a primary key in DynamoDB?

A

2, partition key + sort key

4
Q

What is the maximum size of an item (row) in DynamoDB?

A

400KB

5
Q

You are writing items of 8 KB in size at a rate of 12 per second. How many WCU do you need?

A

1 WCU = one write per second of an item up to 1 KB, so 8 KB × 12 writes/s = 96 WCU

6
Q

You are doing a strongly consistent read of 10 KB items at the rate of 10 per second. What RCU do you need?

A

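30. A strongly consistent read uses 1 RCU per 4 KB, so a 10 KB item needs 10/4 = 2.5, rounded up to 3 RCU per read; 3 × 10 reads/s = 30 RCU

Note: an eventually consistent read uses 0.5 RCU per 4 KB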

7
Q

You would like to have DynamoDB automatically delete old data for you. What should you use?

1) Use TTL
2) Use DynamoDB Streams
3) Use DAX
4) Use a Lambda function

A

1) Use TTL

8
Q
#S3
What is the consistency model of S3?
A
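  • Since December 2020, S3 provides strong read-after-write consistency for all PUTs (new and existing objects) and DELETEs
  • Before that change: read-after-write consistency for PUTs of new objects, and eventual consistency for DELETEs and PUTs of existing objects
    Exception under the old model: in a GET-PUT-GET sequence, the second GET could still return 404, because the first GET (made before the object existed) left the key subject to eventual consistency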
9
Q
#S3
What are the storage classes/tiers of S3?
A

Standard, IA, One Zone IA, Glacier, Glacier Deep Archive

10
Q
#S3
What is the largest object you can store in S3 or Glacier?
A

S3: 5 TB per object; Glacier: 40 TB per archive

11
Q

#S3
What are the 3 data retrieval options for S3 Glacier?

A

Expedited: 1 - 5 min
Standard: 3 - 5 hours
Bulk: 5 - 12 hours

12
Q

#S3
What are the 2 data retrieval options for S3 Glacier Deep Archive?

A

Standard: within 12 hours
Bulk: within 48 hours

13
Q
#S3
What is the minimum storage duration for S3 IA, S3 One Zone IA, S3 Glacier and S3 Glacier Deep Archive?
A

S3 IA and S3 One Zone IA: 30 days
S3 Glacier: 90 days
S3 Deep Archive: 180 days

14
Q
#S3
What are S3 Lifecycle rules?
A

S3 Lifecycle rules can be used to define

  • Transition actions, and
  • Expiration actions

Rules can be applied to prefixes and tags
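As an illustration, a rule with one transition action and one expiration action could be expressed as the dictionary below. This is a sketch: the helper name, the `logs/` prefix, the day counts and the bucket name are made up for the example, and the boto3 call is shown only as a comment so the snippet runs without AWS access.

```python
def build_lifecycle_rule(prefix, transition_days, storage_class, expire_days):
    """Build one lifecycle rule: transition, then expire, objects under a prefix."""
    return {
        "ID": f"archive-{prefix.strip('/')}",
        "Filter": {"Prefix": prefix},          # rule applies to this prefix only
        "Status": "Enabled",
        # Transition action: move objects to a colder storage class
        "Transitions": [{"Days": transition_days, "StorageClass": storage_class}],
        # Expiration action: delete objects after this many days
        "Expiration": {"Days": expire_days},
    }


config = {"Rules": [build_lifecycle_rule("logs/", 90, "GLACIER", 365)]}

# With boto3 installed and credentials configured, this could be applied with:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=config)
```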

15
Q
#S3
What is S3 Versioning?
A
  • S3 versioning can be enabled at the bucket level and can be suspended later
  • Can be used to protect against unintended deletes; you can restore a deleted object to a previous version
16
Q
#S3
What is cross region replication (CRR)?
A

With IAM permissions, S3 can asynchronously copy data across regions

  • You can change the storage class, e.g. standard -> Glacier
  • Replication can be based on tag or prefix and you MUST enable versioning on both source and destination buckets
17
Q
#S3
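What is S3 ETag?
A

The S3 ETag is a hash used to check the integrity of an object in S3. For objects uploaded in a single PUT (and not encrypted with SSE-KMS or SSE-C), it is the MD5 hash of the object data; for multi-part uploads it is not a plain MD5.

You can calculate the MD5 hash of your file and compare it with the uploaded file's ETag (calculated by AWS)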
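A small sketch of that comparison, assuming a single-part upload where the ETag is a plain MD5. The function name is made up; the ETag string is whatever S3 returned, which normally arrives wrapped in double quotes.

```python
import hashlib


def matches_etag(data: bytes, etag: str) -> bool:
    """Compare the local MD5 of `data` with an S3 ETag (single-part uploads only)."""
    local_md5 = hashlib.md5(data).hexdigest()
    # S3 returns the ETag surrounded by double quotes, so strip them first
    return local_md5 == etag.strip('"').lower()
```

A multi-part ETag (which ends in a suffix such as `-2`) will never match this check, so a mismatch there does not by itself indicate corruption.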

18
Q
#S3
What is the baseline performance for S3?
A
  • Latency: 100 - 200 ms
  • 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix in a bucket

Prefix = the object path between the bucket name and the object name

19
Q
#S3
Why do I have to care about KMS quota if I use SSE-KMS?
A

With SSE-KMS, every upload and download makes a KMS API call (GenerateDataKey on upload, Decrypt on download);

These calls count against the per-region KMS quota on API requests per second, so heavy S3 traffic can be throttled by KMS

20
Q
#S3
What are ways to improve the upload and download S3 performance?
A

For uploads

  1. Multi-part upload
  2. S3 Transfer Acceleration

For downloads

  1. Byte-Range Fetch

21
Q
#S3
What is S3 Byte-Range Fetch?
A

S3 Byte-Range Fetch can be used to download an S3 object in parallel, or to download only part of it, such as the first N bytes (e.g. a file header)
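A sketch of how the ranges for a parallel download might be computed. The helper name and the sizes are illustrative; each string is in the HTTP `Range` header format (inclusive on both ends), which is what a GET request for part of an object uses.

```python
def byte_ranges(total_size, part_size):
    """Split an object of `total_size` bytes into inclusive HTTP Range strings."""
    ranges = []
    start = 0
    while start < total_size:
        # Range headers are inclusive, so the last byte index is end, not end + 1
        end = min(start + part_size, total_size) - 1
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges


# Fetching only a header would use a single range such as "bytes=0-99"
print(byte_ranges(10, 4))  # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```

Each range can then be requested by a separate worker and the parts reassembled in order.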

22
Q
#S3
What is S3 Transfer Acceleration?
A

S3 Transfer Acceleration increases transfer speed by sending data to an edge location, which then forwards it to the S3 bucket in the target region.

S3 Transfer Acceleration is compatible with multi-part upload

23
Q
#S3
What are the 4 methods of data encryption in S3?
A
  1. SSE-S3 - keys managed by S3
  2. SSE-KMS - keys managed by you in KMS
  3. SSE-C - keys managed by you outside AWS; you send the key in the request headers, so HTTPS must be used; S3 doesn't store your key
  4. Client-side encryption - you encrypt the data yourself before uploading

You can define the default encryption (SSE-S3 or SSE-KMS) for a given bucket

24
Q
#S3
What is Glacier Vault?
A

Archives (objects) in Glacier are stored in vaults; each vault has ONE vault access policy and ONE vault lock policy

25
Q
#S3
What is a Glacier Vault Lock Policy?
A

A Vault Lock Policy is a policy used to enforce regulatory and compliance controls.

The policy is immutable, i.e. once locked it cannot be changed.

Use cases: 1) forbid deleting an archive 2) implement a WORM policy (write once, read many)

26
Q
#S3
What is S3 Select and Glacier Select?
A

A feature of S3 and Glacier that lets you retrieve less data by performing server-side filtering.

You can use SQL to select rows or columns; this means less data transfer and less CPU cost on the client side.

Can be used with Hadoop/EMR for efficient big data processing; up to 400% faster and up to 80% cheaper

27
Q
#DDB
What is RCU and WCU for DynamoDB?
A

When a table is created in DynamoDB with provisioned capacity, you need to define its READ and WRITE capacity in # of RCU and WCU.

1 RCU is one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads per second of an item up to 4 KB

1 WCU is one write per second of an item up to 1 KB

Note: the WCU and RCU of a table are spread EVENLY among all partitions
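The arithmetic can be sketched as follows. The function names are made up for illustration; item sizes are rounded up to the next 1 KB (writes) or 4 KB (reads) before multiplying by the request rate.

```python
import math


def wcu_needed(item_kb, writes_per_sec):
    """WCU = item size rounded up to whole KB, times writes per second."""
    return math.ceil(item_kb) * writes_per_sec


def rcu_needed(item_kb, reads_per_sec, strongly_consistent=True):
    """RCU = item size rounded up to 4 KB blocks, times reads per second.

    Eventually consistent reads cost half as much.
    """
    units = math.ceil(item_kb / 4) * reads_per_sec
    return units if strongly_consistent else math.ceil(units / 2)


print(wcu_needed(8, 12))                              # 96
print(rcu_needed(10, 10))                             # 30
print(rcu_needed(10, 10, strongly_consistent=False))  # 15
```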

28
Q
#DDB
What is the consistency model in DynamoDB?
A
  • Eventually consistent read (the default)
  • Strongly consistent read

29
Q
#DDB
What are the solutions for a capacity exceeded exception in DynamoDB?
A
  1. Exponential back-off
  2. A better-distributed partition key
  3. DAX for read capacity issues
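A minimal sketch of exponential back-off with jitter. `ThroughputExceeded` is a stand-in exception defined here for illustration, not the SDK's own class, and the delays are arbitrary; the AWS SDKs already retry throttled calls on their own, so this only shows the idea.

```python
import random
import time


class ThroughputExceeded(Exception):
    """Stand-in for ProvisionedThroughputExceededException (illustrative only)."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Retry `operation`, doubling the maximum wait after each throttled attempt."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Full jitter: wait a random time up to base_delay * 2^attempt
            delay = random.uniform(0, base_delay * 2 ** attempt)
            sleep(delay)
```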
30
Q
#DDB
What is the max # of RCU and WCU a DynamoDB table partition can have? What is the max data size of a partition?
A

3000 RCU and 1000 WCU

10GB

The RCU and WCU of a table are evenly allocated to each partition - thus a frequently accessed key can cause a HOT partition issue

31
Q
#DDB
What are the two options for the primary key of a DynamoDB table?
A
  1. Partition / Hash key
  2. Partition key + Sort / Range key

32
Q
#DDB
How do you read data from DynamoDB?
A
  1. GetItem or BatchGetItem - a batch read returns up to 16MB or 100 items
  2. Query - you can only query on the partition key, optionally with a condition on the sort / range key; each call returns up to 1MB of data
  3. Scan - reads the whole table, up to 1MB per call; you can use ProjectionExpression and FilterExpression to trim the results
33
Q
#DDB
What are differences between ProjectionExpression and FilterExpression?
A

ProjectionExpression is used to select which attributes DynamoDB returns

FilterExpression is applied by DynamoDB after the items are read and before they are returned, so it reduces the data sent to the client but does not reduce the RCU consumed

34
Q
#DDB
What is DDB TTL? Is there any cost associated with it?
A

DDB TTL is used to automatically delete an item after its expiry date/time; it is a background task performed by DDB

TTL is provided at no extra cost; deletions do not consume WCU / RCU

35
Q
#DDB
How can I recover the items deleted by DDB TTL?
A

DDB Streams can help recover expired items: TTL deletions appear in the stream, but stream records are only retained for 24 hours!

36
Q
#DDB
How can I enable and use DDB TTL?
A
  • TTL is enabled on a table by naming a timestamp attribute (column); expiry is then decided per item (row)
  • Items that should never expire simply omit this attribute
  • The TTL attribute holds a Unix epoch value (in seconds)
  • DDB typically deletes expired items within 48 hours of expiration
  • Deleted items due to TTL are also deleted in GSI / LSI
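A sketch of how the TTL value for an item might be computed. The attribute name `expires_at`, the item contents and the helper name are illustrative; the value is a Unix epoch timestamp in seconds.

```python
import time


def ttl_epoch(days_to_live, now=None):
    """Return the Unix epoch (seconds) at which an item should expire."""
    now = time.time() if now is None else now
    return int(now) + days_to_live * 86400  # 86400 seconds per day


# Hypothetical item: `expires_at` would be the attribute configured as the TTL attribute
item = {"session_id": "abc123", "expires_at": ttl_epoch(30, now=1_700_000_000)}
print(item["expires_at"])  # 1702592000
```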
37
Q
#DDB
What is DynamoDB Global Table?
A

DynamoDB global tables provide a fully managed solution for deploying a multi-region, multi-master database, without having to build and maintain your own replication solution.

A few things to remember

  • ACID is guaranteed only for transactions in the region where the write is made; other regions are eventually consistent
  • Transactions across regions are not supported
  • Global Tables let you read and write your data locally with single-digit millisecond latency
  • Multi-region fault tolerance - in case of a regional failure, you can redirect your application to another region
  • You can use TTL, and TTL deletes are replicated to all replica tables

Note: in order to use DDB Global Tables, DDB Streams MUST be enabled!

38
Q
#DDB
What is DynamoDB Local?
A

DynamoDB Local allows you to run DynamoDB locally for development, e.g. you can run DDB as a Docker container.