DynamoDB Flashcards

1
Q

What are the different data types supported by DDB?

A

Scalar - String, Number, Binary, Boolean, Null
Document - List, Map
Set - String, Number, Binary

2
Q

What is the max item size?

A

400 KB

3
Q

What is the difference between Provisioned and On-Demand Mode?

A

Provisioned - specify number of reads/writes per second; plan capacity beforehand; pay for provisioned RCUs/WCUs
On-Demand - auto scaling RCUs/WCUs; pay for what you use (more expensive)

4
Q

What happens when throughput maximum is reached in provisioned mode?

A

Can be exceeded temporarily using ‘Burst Capacity’
- If this capacity has been consumed, will receive a ‘ProvisionedThroughputExceededException’
- Use exponential backoff retry on this exception
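
A minimal sketch of exponential backoff with jitter. The `ThroughputExceeded` class and the `operation` callable are hypothetical stand-ins, not SDK names; the AWS SDKs already retry throttled calls on their own, so a wrapper like this is only an illustration of the idea.

```python
import random
import time


class ThroughputExceeded(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceededException."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call operation(), retrying on throttling with exponentially growing waits."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The wait ceiling doubles each attempt: base, 2*base, 4*base, ...
            ceiling = base_delay * (2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter exists only so the waits can be observed or skipped in tests.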

5
Q

What is one WCU?

A

One write per second for an item up to 1 KB in size. If the item is larger than 1 KB, its size rounds up to the next whole KB (e.g., 6 writes per second of 4.5 KB items -> 6 * 5 = 30 WCUs).
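
The rounding rule can be sketched as a small calculation. The function name is made up for illustration.

```python
import math


def wcus_needed(writes_per_second: int, item_size_kb: float) -> int:
    """WCUs for a steady write rate: each item's size rounds up to a whole KB."""
    return writes_per_second * math.ceil(item_size_kb)


print(wcus_needed(6, 4.5))   # 30 (4.5 KB rounds up to 5 KB)
print(wcus_needed(10, 1))    # 10
```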

6
Q

What are the two different types of read, and what are their RCU costs?

A

Strongly consistent vs. eventually consistent reads (a strongly consistent read costs twice the RCUs of an eventually consistent read)

7
Q

How do you force a strongly consistent read on query?

A

Set the ‘ConsistentRead’ parameter to True in API calls (GetItem, BatchGetItem, Query, Scan)

8
Q

What is one RCU?

A

One Strongly Consistent Read or two Eventually Consistent Reads per second, for an item up to 4KB in size.
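
A worked calculation, as a sketch. The function name is made up, and rounding an odd half-unit total up to a whole RCU is an assumption of this example.

```python
import math


def rcus_needed(reads_per_second: int, item_size_kb: float, strongly_consistent: bool) -> int:
    """RCUs for a steady read rate: item size rounds up to the next 4 KB block."""
    units_per_read = math.ceil(item_size_kb / 4)
    total = reads_per_second * units_per_read
    # Eventually consistent reads cost half as much (assumption: round up).
    return total if strongly_consistent else math.ceil(total / 2)


print(rcus_needed(10, 4, True))    # 10
print(rcus_needed(10, 6, True))    # 20 (6 KB rounds up to 8 KB = 2 units per read)
print(rcus_needed(16, 12, False))  # 24 (16 * 3 / 2)
```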

9
Q

How are RCUs and WCUs spread across partitions?

A

Evenly

10
Q

What are some common causes of ‘ProvisionedThroughputExceededException’?

A

Hot keys - one partition key being read too many times (e.g., popular item)
Hot partition - partition keys do not have enough cardinality
Very large items - RCU and WCU consumption depends on item size, so large items consume more units.

11
Q

How does On-Demand mode differ in terms of reads/writes? Give a use case of On-Demand mode.

A

Charged for reads/writes that you use in terms of RRUs and WRUs (read request units and write request units)
- 2.5x more expensive than provisioned capacity
- Use cases: unknown workloads, unpredictable application traffic

12
Q

What is the difference between PutItem and UpdateItem?

A

PutItem - creates a new item or fully replaces an existing item
UpdateItem - edits the specified attributes of an existing item, or adds a new item if one doesn’t exist

13
Q

What is a ProjectionExpression in a GetItem request?

A

Can be specified to retrieve only certain attributes

14
Q

What parameters can you specify in a query?

A

KeyConditionExpression - Partition key equals (required); Sort key conditions (<, > etc.) (optional)
FilterExpression - Additional filtering after the query operation, use only with non-key attributes
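
As a sketch, here are the two expressions side by side in the shape a low-level Query call (for example boto3's client `query`) accepts. The table name `Orders` and the attributes `customer_id`, `order_date` and `order_total` are hypothetical.

```python
def build_query(customer_id: str, since: str, min_total: int) -> dict:
    """Build Query parameters: key condition first, then a non-key filter."""
    return {
        "TableName": "Orders",  # hypothetical table
        # Partition key equality is required; the sort key condition is optional.
        "KeyConditionExpression": "customer_id = :cid AND order_date >= :since",
        # Applied after the query reads the items, on a non-key attribute.
        "FilterExpression": "order_total >= :min_total",
        "ExpressionAttributeValues": {
            ":cid": {"S": customer_id},
            ":since": {"S": since},
            ":min_total": {"N": str(min_total)},  # numbers travel as strings
        },
    }


params = build_query("c-42", "2024-01-01", 50)
print(params["KeyConditionExpression"])
```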

15
Q

What is the size limit on the return value of a query, and how should you get more data than this limit?

A

1MB - use pagination to get more data
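
A sketch of the pagination loop. A truncated response carries `LastEvaluatedKey`, which is passed back as `ExclusiveStartKey` on the next call. The `query_page` callable is a hypothetical stand-in for whatever issues the Query request.

```python
def query_all(query_page, **params):
    """Collect items from every page of a query."""
    items = []
    start_key = None
    while True:
        if start_key is not None:
            # Resume where the previous page stopped.
            params["ExclusiveStartKey"] = start_key
        page = query_page(**params)
        items.extend(page.get("Items", []))
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:  # no key means this was the last page
            return items
```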

16
Q

State the properties of a BatchWriteItem call.

A
  • Up to 25 PutItem and/or DeleteItem in one call (no UpdateItem)
  • Up to 16MB of data written, up to 400KB of data per item
  • UnprocessedItems property for failed write operations (exponential backoff or add WCU)
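
A sketch of how the 25-request limit and UnprocessedItems fit together. The `batch_write` callable is a hypothetical stand-in for the BatchWriteItem call, and the retry count and delays are arbitrary choices for illustration.

```python
import time

BATCH_LIMIT = 25  # maximum put/delete requests per BatchWriteItem call


def chunks(requests, size=BATCH_LIMIT):
    """Yield successive slices of at most `size` requests."""
    for i in range(0, len(requests), size):
        yield requests[i:i + size]


def batch_write_all(batch_write, table_name, requests, max_retries=5, sleep=time.sleep):
    """Write all requests in chunks, resending whatever comes back unprocessed."""
    for chunk in chunks(requests):
        pending = {table_name: chunk}
        for attempt in range(max_retries + 1):
            response = batch_write(RequestItems=pending)
            pending = response.get("UnprocessedItems") or {}
            if not pending:
                break
            sleep(0.05 * (2 ** attempt))  # exponential backoff before resending
        else:
            raise RuntimeError("items still unprocessed after retries")
```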
17
Q

State the properties of a BatchGetItem call.

A
  • Return items from one or more tables
  • Returns up to 100 items, up to 16MB of data
  • UnprocessedKeys property for failed read operations.
18
Q

What is the difference between a filter expression and condition expression?

A

Filter is for reads, Condition is for writes

19
Q

Describe LSIs

A

Alternative sort key for your table (using the same partition key)
- Sort Key consists of one scalar attribute
- Up to 5 LSIs per table
- Must be defined at table creation time

20
Q

Describe GSIs

A

Alternative Primary Key
- Speed up queries on non-key attributes
- Index Key consists of scalar attributes
- Must provision RCUs and WCUs for the index
- Can be added/modified after table creation

21
Q

How does throttling differ between GSIs and LSIs?

A

Even though GSIs are given their own WCUs and RCUs, if writes are throttled on the GSI, then the main table will also be throttled.

LSIs use the capacity from the main table, so no special considerations.

22
Q

How does optimistic locking work?

A

Conditional write - each item has an attribute that acts as a version number. The write request passes in the version it wants to change, and if there is a mismatch, the write does not go through.
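
A sketch of the version check as UpdateItem parameters. The table `Products` and the attributes `price` and `version` are hypothetical. If the stored version no longer matches, DynamoDB rejects the write with a ConditionalCheckFailedException, and the caller would re-read the item and try again.

```python
def versioned_update(key: dict, new_price: int, expected_version: int) -> dict:
    """Build an UpdateItem request that only succeeds if the version is unchanged."""
    return {
        "TableName": "Products",  # hypothetical table
        "Key": key,
        # Change the value and bump the version in the same write.
        "UpdateExpression": "SET #p = :price, #v = :next",
        # The write goes through only if nobody else bumped the version first.
        "ConditionExpression": "#v = :expected",
        # Placeholders avoid any clash with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#p": "price", "#v": "version"},
        "ExpressionAttributeValues": {
            ":price": {"N": str(new_price)},
            ":expected": {"N": str(expected_version)},
            ":next": {"N": str(expected_version + 1)},
        },
    }


request = versioned_update({"product_id": {"S": "p-1"}}, new_price=20, expected_version=3)
print(request["ConditionExpression"])
```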

23
Q

Give a high level description of DAX, and the problem that it solves.

A

DDB accelerator - a highly available and seamless cache.
- Solves hot key problem (too many reads, these now come from the cache)

24
Q

When would you use ElastiCache or DAX?

A

DAX - good for individual objects resulting from query or scan
ElastiCache - good for aggregation results (i.e., logic/computation applied to results)

25
Q

What is a DDB stream?

A

An ordered list of item-level modifications in a table

26
Q

What can you do with a DDB stream?

A
  • Send to Kinesis Data Streams
  • Read with Lambda
  • Read with Kinesis Client Library applications
27
Q

How long is the data retention for DDB streams?

A

24 hours

28
Q

What is a use case for DDB streams?

A
  • React to changes in real-time
  • Analytics
  • Insert into derivative tables
  • Insert into ElasticSearch
  • Implement cross region replication
29
Q

What information can be written to a DDB stream?

A

KEYS_ONLY - key attributes of the item
NEW_IMAGE - entire item after modification
OLD_IMAGE - entire item pre-modification
NEW_AND_OLD_IMAGES

30
Q

What happens with old DDB items when enabling a DDB stream?

A

Records are not retroactively populated in a stream after enabling it.

31
Q

What considerations should be made about using TTL?

A
  • Doesn’t consume WCUs
  • Expired items are deleted within 48 hours of expiration (meaning that expired items may still appear in reads unless explicitly filtered out)
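
A sketch of filtering out expired-but-not-yet-deleted items on read. The TTL attribute holds an epoch timestamp in seconds; the attribute name `expires_at` is hypothetical.

```python
import time


def unexpired_filter(now=None):
    """Build filter parameters that keep only items whose TTL is in the future."""
    now = int(time.time()) if now is None else now
    return {
        "FilterExpression": "#exp > :now",
        "ExpressionAttributeNames": {"#exp": "expires_at"},  # hypothetical TTL attribute
        "ExpressionAttributeValues": {":now": {"N": str(now)}},
    }


print(unexpired_filter(now=1700000000))
```

These parameters would be merged into a Query or Scan request alongside the table name and any key condition.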
32
Q

What is a use case of TTL?

A
  • Reduce stored data by keeping only current items
  • Adhere to regulatory obligations
33
Q

What is the DDB CLI parameter you would use to specify which exact attributes you want to retrieve?

A

--projection-expression

34
Q

What is the DDB CLI parameter you would use to filter the items before they are returned on a read?

A

--filter-expression

35
Q

What CLI pagination options do you have?

A

--page-size: retrieve the full list of items, but with a larger number of smaller API calls instead of one API call (page size indicates the max number of items per API call)
--max-items: max number of items to show in the CLI (returns a NextToken)
--starting-token: specify the last NextToken to retrieve the next set of items

36
Q

What is a transactional action, and how does it affect RCU/WCU?

A

Coordinated, all or nothing operations to multiple items across one or more tables
- If one read/write fails, all operations are rolled-back
Consumes 2x WCUs and RCUs (DDB performs 2 operations for each item - prepare and commit)

37
Q

What are the two key transaction operations in DDB?

A

TransactGetItems - one or more GetItem operations
TransactWriteItems - one or more PutItem, UpdateItem, DeleteItem operations

38
Q

When would you use DDB as a Session State Cache vs. ElastiCache, EFS, EBS & Instance Store or S3?

A
  • ElastiCache and DDB are key/value stores; ElastiCache is in-memory, DDB is serverless (and so offers options such as auto-scaling)
  • EFS must be mounted as a network file system (on EC2 instances, or on Lambda functions configured for VPC access); EFS is a file system vs. DDB as a database
  • EBS & Instance store can only be used for local caching, not shared caching
  • S3 is higher latency, not meant for small objects
39
Q

What strategy can be used to resolve hot partitions (say, 2 overall partition keys)?

A

Add a suffix to the partition key value
- Random or calculated suffix
This will distribute items evenly across partitions
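
A sketch of both suffix styles. The shard count of 10 and the `#` separator are arbitrary choices for illustration.

```python
import hashlib
import random

SHARDS = 10  # assumed number of suffixes to spread each key over


def random_suffix_key(base_key: str) -> str:
    """Random suffix: spreads writes, but reads must query every suffix."""
    return f"{base_key}#{random.randrange(SHARDS)}"


def calculated_suffix_key(base_key: str, item_id: str) -> str:
    """Calculated suffix: derived from the item, so it can be recomputed on read."""
    digest = hashlib.sha256(item_id.encode()).digest()
    shard = int.from_bytes(digest[:8], "big") % SHARDS
    return f"{base_key}#{shard}"


def all_shard_keys(base_key: str) -> list:
    """Every partition key to query when reading back all items for base_key."""
    return [f"{base_key}#{n}" for n in range(SHARDS)]
```

The trade-off: a random suffix needs no knowledge of the item, while a calculated suffix lets a reader go straight to the right partition key for a known item.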

40
Q

How can you solve the issue of concurrent writes?

A

Concurrent writes are when two (or more) writes occur on an object at once, overwriting each other
- Use conditional writes (update value = 1 only if value equals 0; known as Optimistic Locking)
- Use atomic writes (increase the value by 1)
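
A sketch of an atomic increment as UpdateItem parameters. The table `PageStats` and the attribute `view_count` are hypothetical. Because the addition happens server-side, two concurrent increments both count instead of one overwriting the other.

```python
def atomic_increment(key: dict, amount: int = 1) -> dict:
    """Build an UpdateItem request that adds to a numeric attribute in place."""
    return {
        "TableName": "PageStats",  # hypothetical table
        "Key": key,
        # ADD increments the stored number; no read-modify-write in the client.
        "UpdateExpression": "ADD #count :amount",
        "ExpressionAttributeNames": {"#count": "view_count"},
        "ExpressionAttributeValues": {":amount": {"N": str(amount)}},
    }


print(atomic_increment({"page": {"S": "/home"}})["UpdateExpression"])
```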

41
Q

What is the DDB large objects pattern?

A

Can’t store objects larger than 400 KB in DDB
- Store in S3, then capture the S3 metadata (including url) in DDB

42
Q

What is the DDB indexing S3 objects metadata pattern?

A

Upload to S3
- Invoke Lambda function to store the metadata of the object within DDB
- Query DDB for the specific information around the object without having to access S3

43
Q

How would you clean a table of all items?

A

Drop Table and Recreate
- Fast, efficient and cheap
Scan & DeleteItem
- Very slow, consumes RCU and WCU, expensive

44
Q

How would you copy a DDB table to a new table?

A
  • Use AWS Data Pipeline (this will launch an EMR cluster to read the table and write it into S3, and then read from S3 and write to a new table)
  • Backup and restore into a new table
  • Scan & PutItem or BatchWriteItem (can write own code to do some transformations as well)
45
Q

What security is there around DDB?

A
  • VPC endpoints available to access DDB without using the internet
  • Access fully controlled by IAM
  • Encryption at rest using AWS KMS and in-transit using SSL/TLS
46
Q

When would you use Global Tables?

A

For multi-region, multi-active, fully replicated, high performance tables.

47
Q

What can be used to move from a relational DB to DDB?

A

AWS Database Migration Service (DMS)

48
Q

How do you allow users to interact with the DDB with fine grained access control?

A

Do not give the user IAM access to the table
- Use an identity provider (Cognito, Google, SAML, etc.) to issue temporary credentials tied to a restricted IAM role. The role limits each user to the data they own: a LeadingKeys condition (e.g., the Cognito user ID) restricts row-level access based on the partition key, and access can also be limited to specific attributes.
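
A sketch of what such a role policy could look like, written as a Python dict so it can be printed as JSON. The table name `UserData`, the attribute names, and the REGION and ACCOUNT_ID placeholders are assumptions for illustration.

```python
import json

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:Query"],
            # Placeholder ARN: substitute a real region, account and table.
            "Resource": "arn:aws:dynamodb:REGION:ACCOUNT_ID:table/UserData",
            "Condition": {
                "ForAllValues:StringEquals": {
                    # Row level: the partition key must equal the caller's Cognito ID.
                    "dynamodb:LeadingKeys": ["${cognito-identity.amazonaws.com:sub}"],
                    # Attribute level: only these (hypothetical) attributes are readable.
                    "dynamodb:Attributes": ["user_id", "display_name", "score"],
                },
                # Require reads to ask for specific attributes rather than all of them.
                "StringEqualsIfExists": {"dynamodb:Select": "SPECIFIC_ATTRIBUTES"},
            },
        }
    ],
}

print(json.dumps(policy, indent=2))
```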

49
Q

You want to perform a Scan operation on a DynamoDB table to retrieve all the items. What should you do to increase the performance of your scan operation, and why do you need to do this?

A

Parallel scans
- Default behaviour is sequential, as a scan operation can only read from one partition at a time
- To address these issues, the Scan operation can logically divide a table or secondary index into multiple segments, with multiple application workers scanning the segments in parallel
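
A sketch of a parallel scan using the Scan parameters `Segment` and `TotalSegments`, with one worker thread per segment. The `scan_page` callable is a hypothetical stand-in for whatever issues the Scan request, and the default of 4 segments is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(scan_page, table_name, segment, total_segments):
    """Scan one logical segment of the table, following pagination to the end."""
    items, start_key = [], None
    while True:
        params = {
            "TableName": table_name,
            "Segment": segment,               # which slice this worker reads
            "TotalSegments": total_segments,  # how many slices exist overall
        }
        if start_key is not None:
            params["ExclusiveStartKey"] = start_key
        page = scan_page(**params)
        items.extend(page.get("Items", []))
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            return items


def parallel_scan(scan_page, table_name, total_segments=4):
    """Run one worker per segment and combine their results."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, scan_page, table_name, s, total_segments)
            for s in range(total_segments)
        ]
        return [item for f in futures for item in f.result()]
```

A parallel scan consumes read capacity faster than a sequential one, so the segment count is a trade-off between speed and throttling risk.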