DynamoDB Flashcards
My application needs to make a large number of individual GetItem calls to a DynamoDB table, but I only need a subset of attributes for each item. What can I do to reduce my bandwidth AND the number of API calls that I make to DynamoDB? Are there any side benefits to the approach?
You would use a combination of BatchGetItem to limit the number of API calls and a projection expression to select only the attributes you need, reducing bandwidth consumption. BatchGetItem also allows DynamoDB to retrieve the items in the request in parallel, so there is a performance benefit beyond reducing API calls and bandwidth usage.
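A minimal boto3 sketch of the pattern (table, key and attribute names are illustrative):

import boto3

dynamodb = boto3.resource('dynamodb')

# One BatchGetItem call fetches up to 100 items; the projection
# expression trims each item down to just the attributes we need.
response = dynamodb.batch_get_item(
    RequestItems={
        'ProductCatalog': {
            'Keys': [{'Id': '101'}, {'Id': '102'}, {'Id': '103'}],
            'ProjectionExpression': 'Description, InventoryVolume',
        }
    }
)
items = response['Responses']['ProductCatalog']
# Keys DynamoDB could not process this call come back here and should be retried
unprocessed = response['UnprocessedKeys']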
I need to CLEAR a table of ALL data in DynamoDB. Should I use scan-and-delete, or would it be better to drop the table and recreate it? Why?
Scan and delete is slow, and expensive in terms of RCU. It is faster and cheaper to drop the table and recreate it.
Do you need to provision WCU for a Global Secondary Index in DynamoDB? Why/Why not? If you do, what value should you set your GSI WCU?
Yes. To avoid potential throttling, the provisioned write capacity for a global secondary index should be equal to or greater than the write capacity of the base table, because new updates write to both the base table and the global secondary index.
In DynamoDB, you received a provisioned throughput exception during peak load several months ago. You are anticipating a spike in load next week. On analysis you see that several keys are being read repeatedly. What technology could you use to alleviate the problem, and what code changes would you need to make to your application?
DAX will cache reads from DynamoDB transparently, with no need to update the application
I have a table of product data consisting of attributes including product description, stock status, inventory volume, price and size. Is it possible in DynamoDB to return only the inventory volume and product description, or do I need to retrieve the full item and filter locally?
Yes, you can use a projection expression to return only the attributes you need:
aws dynamodb get-item \
    --table-name ProductCatalog \
    --key file://key.json \
    --projection-expression "Description, InventoryVolume"
I am setting up a Lambda function to read from a DynamoDB (or Kinesis) stream. When setting the starting position for the function to read from, I get three options: LATEST, TRIM_HORIZON, and AT_TIMESTAMP. What records in a stream would each of these process?
LATEST: Process only new records that get added to the stream
TRIM_HORIZON: Process all records in the stream, starting with the oldest
AT_TIMESTAMP: Process records from a specific point in time (supported for Kinesis streams only)
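A boto3 sketch of wiring this up (function name and stream ARN are illustrative):

import boto3

lambda_client = boto3.client('lambda')

# TRIM_HORIZON = start from the oldest record still in the stream;
# LATEST = only records added after the mapping is created.
lambda_client.create_event_source_mapping(
    EventSourceArn='arn:aws:dynamodb:us-east-1:123456789012:table/Leaderboard/stream/2024-01-01T00:00:00.000',
    FunctionName='process-leaderboard-stream',
    StartingPosition='TRIM_HORIZON',
)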
If you need to create a DynamoDB global table, what do you need to enable in DynamoDB first to allow this? In terms of global tables, how many replica tables of a global table can you have in each region? If I need strongly consistent reads/writes, what limitations are there?
You need to enable DynamoDB Streams, as these give DynamoDB a change log it uses to replicate data across regions. You can only have one replica table per region for a global table. Strongly consistent reads/writes must be performed within the same region; across regions you only get eventually consistent reads/writes
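Enabling streams is a one-call table update; a boto3 sketch (table name is illustrative):

import boto3

client = boto3.client('dynamodb')

client.update_table(
    TableName='Leaderboard',
    StreamSpecification={
        'StreamEnabled': True,
        'StreamViewType': 'NEW_AND_OLD_IMAGES',  # view type required by global tables
    },
)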
In DynamoDB, what two attributes make up a composite key, and what is the maximum size of EACH attribute?
A composite key is made up of:
The partition key: 2 KB max
The sort key: 1 KB max
When performing a Query against DynamoDB, which operators can be used to query against the partition key and the sort key?
=, <, <=, >, >=, BETWEEN, and begins_with
Which key is mandatory in a Query, and what is the maximum amount of data that can be returned (MB)? Can results be paginated?
You can only use = for the partition key; the sort key allows all of the operators. The query must include an expression for the partition key, while the sort key is optional. The most data that can be retrieved in one call is 1 MB, or the number of items specified in Limit. Results can be paginated
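A boto3 sketch of a paginated Query (table, key and attribute names are illustrative):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('Leaderboard')

# '=' on the partition key is mandatory; the sort key condition is optional
kwargs = {
    'KeyConditionExpression': Key('UserId').eq('user-123') & Key('GameId').begins_with('2024'),
    'Limit': 50,
}
items = []
while True:
    response = table.query(**kwargs)
    items.extend(response['Items'])
    # LastEvaluatedKey is present when there are more pages (or the 1 MB cap was hit)
    if 'LastEvaluatedKey' not in response:
        break
    kwargs['ExclusiveStartKey'] = response['LastEvaluatedKey']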
Let's say that we have a DynamoDB database, but no matter what we do our data restricts us to having only a very limited number of partition keys. We need to avoid a hot partition, as we will be doing a large number of writes to a small number of keys. What could you do (one way) and how would you do it (two similar ways)? (Hint: DAX won't work, as we are writing, not reading)
You could use write sharding in DynamoDB by:
Suffixing the key value with a random number
Hashing another value of the item, performing a calculation on it, and using the result as the suffix
Either way allows the data to be spread over multiple partitions. A random suffix can be difficult to read back out, as it's random and you may not know which suffix an item was given; a repeatable calculation is more effective. A sketch of both approaches follows.
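A minimal sketch of both suffixing approaches (shard count and field names are illustrative):

import hashlib
import random

SHARDS = 10

def random_shard_key(base_key: str) -> str:
    # Spreads writes well, but the suffix cannot be reconstructed on read
    return f"{base_key}.{random.randint(0, SHARDS - 1)}"

def calculated_shard_key(base_key: str, order_id: str) -> str:
    # Deterministic: hashing a known attribute yields the same suffix on
    # write and on read, so the item can be found again
    suffix = int(hashlib.md5(order_id.encode()).hexdigest(), 16) % SHARDS
    return f"{base_key}.{suffix}"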
What are the TWO advantages of using batching in DynamoDB for puts and deletes?
1. It reduces the number of API calls, which lowers latency
2. Batching allows DynamoDB to process the operations in parallel without you needing to change your code
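A boto3 sketch using the high-level batch writer (table and items are illustrative):

import boto3

table = boto3.resource('dynamodb').Table('ProductCatalog')

# batch_writer buffers puts/deletes into BatchWriteItem calls
# (up to 25 operations per call) and resends unprocessed items
with table.batch_writer() as batch:
    batch.put_item(Item={'Id': '201', 'Description': 'Widget', 'Price': 5})
    batch.put_item(Item={'Id': '202', 'Description': 'Gadget', 'Price': 9})
    batch.delete_item(Key={'Id': '105'})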
How many AZs does DynamoDB replicate to?
3
I've enabled TTL deletion on DynamoDB, but have just realised that the value I added is too short. Can I recover this data?
Assuming you have streams enabled, and you are within the 24-hour retention window, you can recover the deleted items from the stream
I have a table in DynamoDB from which I need to clear out all data so I can repopulate it. Is it more efficient to iteratively call DeleteItem, or to use DeleteTable and recreate the table from scratch?
DeleteTable and then recreating is the more efficient mechanism.
How can you allow a PutItem, UpdateItem or DeleteItem to succeed only when an attribute has a particular value? Is there a performance impact? Why would you need to do this?
Conditional Writes allow for an item to be updated only if an attribute is set to a particular value. If the attribute is not set to this value, the put/update/delete will fail. Conditional writes impose no performance overhead and are used to deal with concurrent access to an item.
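A boto3 sketch of a conditional update (table, attributes and values are illustrative):

import boto3
from botocore.exceptions import ClientError

table = boto3.resource('dynamodb').Table('ProductCatalog')

try:
    # Only apply the new price if the item still has the price we read earlier
    table.update_item(
        Key={'Id': '101'},
        UpdateExpression='SET Price = :new',
        ConditionExpression='Price = :expected',
        ExpressionAttributeValues={':new': 450, ':expected': 500},
    )
except ClientError as err:
    if err.response['Error']['Code'] == 'ConditionalCheckFailedException':
        pass  # another writer changed the item first; re-read and retry
    else:
        raise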
Can we autoscale our WCUs and RCUs in DynamoDB?
Yes, we can specify a minimum and maximum number of RCUs and WCUs for DynamoDB auto scaling, and specify a target utilisation at which scaling kicks in.
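DynamoDB auto scaling is configured through the Application Auto Scaling service; a hedged boto3 sketch (table name and capacity values are illustrative):

import boto3

autoscaling = boto3.client('application-autoscaling')

autoscaling.register_scalable_target(
    ServiceNamespace='dynamodb',
    ResourceId='table/ProductCatalog',
    ScalableDimension='dynamodb:table:WriteCapacityUnits',
    MinCapacity=5,
    MaxCapacity=500,
)
autoscaling.put_scaling_policy(
    PolicyName='wcu-target-tracking',
    ServiceNamespace='dynamodb',
    ResourceId='table/ProductCatalog',
    ScalableDimension='dynamodb:table:WriteCapacityUnits',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 70.0,  # scale when consumed capacity passes 70% of provisioned
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'DynamoDBWriteCapacityUtilization',
        },
    },
)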
When setting up a DynamoDB table, do we need to provision throughput? What units are these defined in and what are their values?
Yes, you need to provision read and write capacity units.
1 WCU = one write of up to 1 KB per second.
1 RCU = one strongly consistent read, or two eventually consistent reads, of up to 4 KB per second.
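A worked example of the arithmetic (item sizes are illustrative; sizes round up to the next 1 KB for writes and the next 4 KB for reads):

import math

def wcu_for_write(item_size_kb: float) -> int:
    return math.ceil(item_size_kb / 1)   # e.g. a 2.5 KB item costs 3 WCU

def rcu_for_read(item_size_kb: float, strongly_consistent: bool = True) -> float:
    units = math.ceil(item_size_kb / 4)  # e.g. a 9 KB item rounds up to 3 units
    return units if strongly_consistent else units / 2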
I have an application that uses DynamoDB. We are using DAX to cache key object data from DynamoDB to speed up application throughput. A new requirement has come up, however, for our client to perform complex aggregation calculations on the data. Often, these calculations will be the same across multiple clients. Can I use DAX to help speed up these calculations?
No. You would continue to use DAX to cache the object data, but architect your application to store the results of the aggregation calculations in ElastiCache.
If you are sending a batch of PutItems to DynamoDB from your application, and some of these items fail - will DynamoDB automatically retry processing of these?
No, it is up to you to retry the failed items, which are returned in the UnprocessedItems field of the response. Retries should use exponential backoff.
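A sketch of the retry loop with exponential backoff and jitter (table and items are illustrative):

import random
import time

import boto3

client = boto3.client('dynamodb')

request_items = {
    'ProductCatalog': [
        {'PutRequest': {'Item': {'Id': {'S': '301'}, 'Description': {'S': 'Widget'}}}},
    ]
}
delay = 0.05
while request_items:
    response = client.batch_write_item(RequestItems=request_items)
    # Anything DynamoDB could not process gets resent on the next pass
    request_items = response.get('UnprocessedItems', {})
    if request_items:
        time.sleep(delay + random.uniform(0, delay))  # backoff with jitter
        delay *= 2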
Assume we have a table set up in DynamoDB to store leaderboard data for a mobile online gaming platform. Each gamer has a unique identifier, and each time they play a game that game is assigned a unique identifier. We also store the time the game started, how long they played for, their score and whether they won or lost (Outcome). Currently we have a partition key of UserId and a sort key of GameId so we can query efficiently for the user/game combination. If we wanted to also include WIN/LOSS, how would we go about this? If I wanted to limit results to WIN/LOSS under the current architecture, how would this be done?
Under the current setup, if we wanted to limit our results to combinations of UserId, GameId and WIN, for instance, DynamoDB would need to perform a SCAN, which is inefficient. Instead, we would set up a LOCAL SECONDARY INDEX on the Outcome attribute, which would allow us to include it in the query
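One caveat worth remembering: a local secondary index can only be defined when the table is created. A boto3 sketch (table and index names are illustrative):

import boto3

client = boto3.client('dynamodb')

client.create_table(
    TableName='Leaderboard',
    AttributeDefinitions=[
        {'AttributeName': 'UserId', 'AttributeType': 'S'},
        {'AttributeName': 'GameId', 'AttributeType': 'S'},
        {'AttributeName': 'Outcome', 'AttributeType': 'S'},
    ],
    KeySchema=[
        {'AttributeName': 'UserId', 'KeyType': 'HASH'},
        {'AttributeName': 'GameId', 'KeyType': 'RANGE'},
    ],
    LocalSecondaryIndexes=[{
        'IndexName': 'OutcomeIndex',
        # Same partition key as the base table, alternate sort key
        'KeySchema': [
            {'AttributeName': 'UserId', 'KeyType': 'HASH'},
            {'AttributeName': 'Outcome', 'KeyType': 'RANGE'},
        ],
        'Projection': {'ProjectionType': 'ALL'},
    }],
    BillingMode='PAY_PER_REQUEST',
)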
Assume we have a table set up in DynamoDB to store leaderboard data for a mobile online gaming platform. Each gamer has a unique identifier and a handle, and each time they play a game that game is assigned a unique identifier. We also store the time the game started, how long they played for, the name of the game, their score and whether they won or lost (Outcome). Currently we have a partition key of UserId and a sort key of GameId so we can query efficiently for the user/game combination. We have a new requirement to be able to query for the 10 latest played games, each game's name, and the handle of the gamer who played. How could we achieve this?
We would create a Global Secondary Index, which creates what is effectively a new table in DynamoDB. In this case our GSI would use GameId as the partition key and the timestamp as the sort key, and would project the gamer's handle and the game name as attributes.
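Unlike an LSI, a GSI can be added to an existing table; a hedged boto3 sketch (index and attribute names are illustrative):

import boto3

client = boto3.client('dynamodb')

client.update_table(
    TableName='Leaderboard',
    AttributeDefinitions=[
        {'AttributeName': 'GameId', 'AttributeType': 'S'},
        {'AttributeName': 'StartTime', 'AttributeType': 'S'},
    ],
    GlobalSecondaryIndexUpdates=[{
        'Create': {
            'IndexName': 'GameStartTimeIndex',
            'KeySchema': [
                {'AttributeName': 'GameId', 'KeyType': 'HASH'},
                {'AttributeName': 'StartTime', 'KeyType': 'RANGE'},
            ],
            # Project only what the query needs, keeping the index small
            'Projection': {
                'ProjectionType': 'INCLUDE',
                'NonKeyAttributes': ['Handle', 'GameName'],
            },
            # Omit ProvisionedThroughput if the table uses on-demand billing
            'ProvisionedThroughput': {'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5},
        },
    }],
)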
Do UpdateItem and PutItem calls in DynamoDB consume the same amount of WCUs? Why/why not?
Yes. DynamoDB considers the size of the item as it appears before and after the update. The provisioned throughput consumed reflects the larger of these item sizes. Even if you update just a subset of the item’s attributes, UpdateItem will still consume the full amount of provisioned throughput (the larger of the “before” and “after” item sizes).
Can I batch UpdateItem calls in DynamoDB?
No. BatchWriteItem only supports put and delete operations, so UpdateItem calls cannot be batched.
For DynamoDB streams, how many times would an item appear as a result of a Create/Delete/Update event in DynamoDB, what order would they appear in, and how long are they persisted for (hours)?
An item will appear once, in the order in which the change occurred in DynamoDB. Items persist for 24 hours.
If you are using a Scan in DynamoDB, how can you improve its performance? What is the impact on RCU? How much data does a Scan return in a single operation?
Scans can be parallelised in DynamoDB, scanning multiple partitions at the same time. There is a high RCU cost to parallel scans.
A Scan returns at most 1 MB per operation unless it is paginated or a Limit is applied.
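A sketch of a parallel scan, where each worker takes one segment (segment count and table name are illustrative):

import boto3

table = boto3.resource('dynamodb').Table('ProductCatalog')

def scan_segment(segment: int, total_segments: int = 4):
    # Each worker scans a disjoint slice of the table's partitions;
    # run one call like this per worker or thread
    response = table.scan(Segment=segment, TotalSegments=total_segments)
    return response['Items']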
In DynamoDB, what does a FilterExpression do? Does it allow me to filter the results any further within a DynamoDB Query (i.e. based on an attribute's value)? Is there an impact on read capacity when using a filter expression?
A filter expression is applied AFTER the Query finishes, but BEFORE the results are returned, so it does narrow down what you receive. However, the Query consumes the same amount of read capacity regardless of whether a filter expression is present.
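A boto3 sketch combining a key condition with a filter expression (table and attribute names are illustrative):

import boto3
from boto3.dynamodb.conditions import Attr, Key

table = boto3.resource('dynamodb').Table('Leaderboard')

# The key condition drives the read (and the RCU cost); the filter
# only trims the items after they have been read
response = table.query(
    KeyConditionExpression=Key('UserId').eq('user-123'),
    FilterExpression=Attr('Outcome').eq('WIN'),
)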
For DynamoDB DAX, what is the:
- Default TTL for a cached Item
- The maximum number of nodes and do you need to provision these
- Is DAX Multi-AZ?
TTL: 5 minutes
Up to 10 nodes per cluster which you need to provision
Multi-AZ, with a minimum of 3 nodes recommended for production
For DynamoDB, what time range does a point in time recovery allow you to restore to (min and max values)? Are point in time backups incremental or full and what performance impact is there when they are being taken?
5 minutes to 35 days. There is no performance impact, and the backups are incremental.