DynamoDB Flashcards
My application needs to make a large number of individual GetItem calls to a DynamoDB table, but I only need a subset of attributes for each item. What can I do to reduce my bandwidth AND the number of API calls that I make to DynamoDB? Are there any side benefits to the approach?
You would use a combination of BatchGetItem to limit the number of API calls and a projection expression to select only the attributes you need, reducing bandwidth consumption. BatchGetItem also allows DynamoDB to retrieve the items in the request in parallel, so there is a performance benefit beyond reducing API calls and bandwidth usage.
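A minimal boto3 sketch of the pattern (table, key and attribute names are illustrative):

import boto3

dynamodb = boto3.resource('dynamodb')

# One BatchGetItem call fetches up to 100 items; the projection
# expression trims each item down to just the attributes we need.
response = dynamodb.batch_get_item(
    RequestItems={
        'ProductCatalog': {
            'Keys': [{'Id': '101'}, {'Id': '102'}, {'Id': '103'}],
            'ProjectionExpression': 'Description, InventoryVolume',
        }
    }
)
items = response['Responses']['ProductCatalog']
# Keys DynamoDB could not process this call come back here and should be retried
unprocessed = response['UnprocessedKeys']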
I need to CLEAR a table of ALL data in DynamoDB. Should I use scan-and-delete, or would it be better to drop the table and recreate it? Why?
Scan and delete is slow, and expensive in terms of RCU. It is faster and cheaper to drop the table and recreate it.
Do you need to provision WCU for a Global Secondary Index in DynamoDB? Why/Why not? If you do, what value should you set your GSI WCU?
Yes. To avoid potential throttling, the provisioned write capacity for a global secondary index should be equal to or greater than the write capacity of the base table, because new updates write to both the base table and the global secondary index.
In DynamoDB, you received a provisioned throughput exception during peak load several months ago. You are anticipating a spike in load next week. On analysis you see that several keys are being read repeatedly. What technology could you use to alleviate the problem, and what code changes would you need to make to your application?
DAX will cache reads from DynamoDB transparently, with no need to update the application
I have a table of product data consisting of attributes including product description, stock status, inventory volume, price and size. Is it possible in DynamoDB to return only the inventory volume and product description, or do I need to retrieve the full item and filter locally?
Yes, you can use a projection expression to return only the attributes you need:
aws dynamodb get-item \
    --table-name ProductCatalog \
    --key file://key.json \
    --projection-expression "Description, InventoryVolume"
I am setting up a Lambda function to read from a DynamoDB (or Kinesis) stream. When setting the starting position for the function to read from, I get three options: LATEST, TRIM_HORIZON, and AT_TIMESTAMP. What records in a stream would each of these process?
LATEST: Process only new records that get added to the stream
TRIM_HORIZON: Process all records in the stream, starting with the oldest
AT_TIMESTAMP: Process records from a specific point in time (supported for Kinesis streams only)
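A boto3 sketch of wiring this up (function name and stream ARN are illustrative):

import boto3

lambda_client = boto3.client('lambda')

# TRIM_HORIZON = start from the oldest record still in the stream;
# LATEST = only records added after the mapping is created.
lambda_client.create_event_source_mapping(
    EventSourceArn='arn:aws:dynamodb:us-east-1:123456789012:table/Leaderboard/stream/2024-01-01T00:00:00.000',
    FunctionName='process-leaderboard-stream',
    StartingPosition='TRIM_HORIZON',
)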
If you need to create a DynamoDB global table, what do you need to enable in DynamoDB first to allow this? In terms of global tables, how many replica tables of a global table can you have in each region? If I need strongly consistent reads/writes, what limitations are there?
You need to enable DynamoDB Streams, as these give DynamoDB a change log it uses to replicate data across regions. You can only have one replica table per region for a global table. Strongly consistent reads/writes must be performed within the same region; across regions you only get eventually consistent reads/writes
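Enabling streams is a one-call table update; a boto3 sketch (table name is illustrative):

import boto3

client = boto3.client('dynamodb')

client.update_table(
    TableName='Leaderboard',
    StreamSpecification={
        'StreamEnabled': True,
        'StreamViewType': 'NEW_AND_OLD_IMAGES',  # view type required by global tables
    },
)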
In DynamoDB, what two attributes make up a composite key, and what is the maximum size of EACH attribute?
A composite key is made up of:
The partition key: 2 KB max
The sort key: 1 KB max
When performing a Query against DynamoDB, which operators can be used to query against the partition key and the sort key?
=, <, <=, >, >=, BETWEEN, and begins_with
Which key is mandatory in a Query, and what is the maximum amount of data that can be returned (MB)? Can results be paginated?
You can only use = for the partition key; the sort key allows all of the operators. The query must include an expression for the partition key, while the sort key is optional. The most data that can be retrieved in one call is 1 MB, or the number of items specified in Limit. Results can be paginated
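A boto3 sketch of a paginated Query (table, key and attribute names are illustrative):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('Leaderboard')

# '=' on the partition key is mandatory; the sort key condition is optional
kwargs = {
    'KeyConditionExpression': Key('UserId').eq('user-123') & Key('GameId').begins_with('2024'),
    'Limit': 50,
}
items = []
while True:
    response = table.query(**kwargs)
    items.extend(response['Items'])
    # LastEvaluatedKey is present when there are more pages (or the 1 MB cap was hit)
    if 'LastEvaluatedKey' not in response:
        break
    kwargs['ExclusiveStartKey'] = response['LastEvaluatedKey']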
Let's say that we have a DynamoDB database, but no matter what we do our data restricts us to having only a very limited number of partition keys. We need to avoid a hot partition, as we will be doing a large number of writes to a small number of keys. What could you do (one way) and how would you do it (two similar ways)? (Hint: DAX won't work, as we are writing, not reading)
You could use write sharding in DynamoDB by:
Suffixing the key value with a random number
Hashing another value of the item, performing a calculation on it, and using the result as the suffix
Either way allows the data to be spread over multiple partitions. A random suffix can be difficult to read back out, as it's random and you may not know which suffix an item was given; a repeatable calculation is more effective. A sketch of both approaches follows.
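A minimal sketch of both suffixing approaches (shard count and field names are illustrative):

import hashlib
import random

SHARDS = 10

def random_shard_key(base_key: str) -> str:
    # Spreads writes well, but the suffix cannot be reconstructed on read
    return f"{base_key}.{random.randint(0, SHARDS - 1)}"

def calculated_shard_key(base_key: str, order_id: str) -> str:
    # Deterministic: hashing a known attribute yields the same suffix on
    # write and on read, so the item can be found again
    suffix = int(hashlib.md5(order_id.encode()).hexdigest(), 16) % SHARDS
    return f"{base_key}.{suffix}"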
What are the TWO advantages of using batching in DynamoDB for puts and deletes?
1. It reduces the number of API calls, which lowers latency
2. Batching allows DynamoDB to process the operations in parallel without you needing to change your code
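A boto3 sketch using the high-level batch writer (table and items are illustrative):

import boto3

table = boto3.resource('dynamodb').Table('ProductCatalog')

# batch_writer buffers puts/deletes into BatchWriteItem calls
# (up to 25 operations per call) and resends unprocessed items
with table.batch_writer() as batch:
    batch.put_item(Item={'Id': '201', 'Description': 'Widget', 'Price': 5})
    batch.put_item(Item={'Id': '202', 'Description': 'Gadget', 'Price': 9})
    batch.delete_item(Key={'Id': '105'})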
How many AZs does DynamoDB replicate to?
3
I've enabled TTL deletion on DynamoDB, but have just realised that the value I added is too short. Can I recover this data?
Assuming you have streams enabled, and you are within the 24-hour retention window, you can recover the deleted items from the stream
I have a table in DynamoDB from which I need to clear out all data so I can repopulate it. Is it more efficient to iteratively call DeleteItem, or to use DeleteTable and recreate the table from scratch?
DeleteTable and then recreating is the more efficient mechanism.
How can you allow a PutItem, UpdateItem or DeleteItem to succeed only when an attribute has a particular value? Is there a performance impact? Why would you need to do this?
Conditional Writes allow for an item to be updated only if an attribute is set to a particular value. If the attribute is not set to this value, the put/update/delete will fail. Conditional writes impose no performance overhead and are used to deal with concurrent access to an item.
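A boto3 sketch of a conditional update (table, attributes and values are illustrative):

import boto3
from botocore.exceptions import ClientError

table = boto3.resource('dynamodb').Table('ProductCatalog')

try:
    # Only apply the new price if the item still has the price we read earlier
    table.update_item(
        Key={'Id': '101'},
        UpdateExpression='SET Price = :new',
        ConditionExpression='Price = :expected',
        ExpressionAttributeValues={':new': 450, ':expected': 500},
    )
except ClientError as err:
    if err.response['Error']['Code'] == 'ConditionalCheckFailedException':
        pass  # another writer changed the item first; re-read and retry
    else:
        raise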
Can we autoscale our WCUs and RCUs in DynamoDB?
Yes, we can specify a minimum and maximum number of RCUs and WCUs for DynamoDB auto scaling, and specify a target utilisation at which scaling kicks in.
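DynamoDB auto scaling is configured through the Application Auto Scaling service; a hedged boto3 sketch (table name and capacity values are illustrative):

import boto3

autoscaling = boto3.client('application-autoscaling')

autoscaling.register_scalable_target(
    ServiceNamespace='dynamodb',
    ResourceId='table/ProductCatalog',
    ScalableDimension='dynamodb:table:WriteCapacityUnits',
    MinCapacity=5,
    MaxCapacity=500,
)
autoscaling.put_scaling_policy(
    PolicyName='wcu-target-tracking',
    ServiceNamespace='dynamodb',
    ResourceId='table/ProductCatalog',
    ScalableDimension='dynamodb:table:WriteCapacityUnits',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 70.0,  # scale when consumed capacity passes 70% of provisioned
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'DynamoDBWriteCapacityUtilization',
        },
    },
)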
When setting up a DynamoDB table, do we need to provision throughput? What units are these defined in and what are their values?
Yes, you need to provision read and write capacity units.
1 WCU = one write of up to 1 KB per second.
1 RCU = one strongly consistent read, or two eventually consistent reads, of up to 4 KB per second.
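A worked example of the arithmetic (item sizes are illustrative; sizes round up to the next 1 KB for writes and the next 4 KB for reads):

import math

def wcu_for_write(item_size_kb: float) -> int:
    return math.ceil(item_size_kb / 1)   # e.g. a 2.5 KB item costs 3 WCU

def rcu_for_read(item_size_kb: float, strongly_consistent: bool = True) -> float:
    units = math.ceil(item_size_kb / 4)  # e.g. a 9 KB item rounds up to 3 units
    return units if strongly_consistent else units / 2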
I have an application that uses DynamoDB. We are using DAX to cache key object data from DynamoDB to speed up application throughput. A new requirement has come up, however, for our client to perform complex aggregation calculations on the data. Often, these calculations will be the same across multiple clients. Can I use DAX to help speed up these calculations?
No. You would continue to use DAX to cache the object data, but architect your application to store the results of the aggregation calculations in ElastiCache.
If you are sending a batch of PutItems to DynamoDB from your application, and some of these items fail - will DynamoDB automatically retry processing of these?
No, it is up to you to retry the failed items, which are returned in the UnprocessedItems field of the response. Retries should use exponential backoff.
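A sketch of the retry loop with exponential backoff and jitter (table and items are illustrative):

import random
import time

import boto3

client = boto3.client('dynamodb')

request_items = {
    'ProductCatalog': [
        {'PutRequest': {'Item': {'Id': {'S': '301'}, 'Description': {'S': 'Widget'}}}},
    ]
}
delay = 0.05
while request_items:
    response = client.batch_write_item(RequestItems=request_items)
    # Anything DynamoDB could not process gets resent on the next pass
    request_items = response.get('UnprocessedItems', {})
    if request_items:
        time.sleep(delay + random.uniform(0, delay))  # backoff with jitter
        delay *= 2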
Assume we have a table set up in DynamoDB to store leaderboard data for a mobile online gaming platform. Each gamer has a unique identifier, and each time they play a game that game is assigned a unique identifier. We also store the time the game started, how long they played for, their score and whether they won or lost (Outcome). Currently we have a partition key of UserId and a sort key of GameId so we can query efficiently for the user/game combination. If we wanted to also include WIN/LOSS, how would we go about this? If I wanted to limit results to WIN/LOSS under the current architecture, how would this be done?
Under the current setup, if we wanted to limit our results to combinations of UserId, GameId and WIN, for instance, DynamoDB would need to perform a SCAN, which is inefficient. Instead, we would set up a LOCAL SECONDARY INDEX on the Outcome attribute, which would allow us to include it in the query
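One caveat worth remembering: a local secondary index can only be defined when the table is created. A boto3 sketch (table and index names are illustrative):

import boto3

client = boto3.client('dynamodb')

client.create_table(
    TableName='Leaderboard',
    AttributeDefinitions=[
        {'AttributeName': 'UserId', 'AttributeType': 'S'},
        {'AttributeName': 'GameId', 'AttributeType': 'S'},
        {'AttributeName': 'Outcome', 'AttributeType': 'S'},
    ],
    KeySchema=[
        {'AttributeName': 'UserId', 'KeyType': 'HASH'},
        {'AttributeName': 'GameId', 'KeyType': 'RANGE'},
    ],
    LocalSecondaryIndexes=[{
        'IndexName': 'OutcomeIndex',
        # Same partition key as the base table, alternate sort key
        'KeySchema': [
            {'AttributeName': 'UserId', 'KeyType': 'HASH'},
            {'AttributeName': 'Outcome', 'KeyType': 'RANGE'},
        ],
        'Projection': {'ProjectionType': 'ALL'},
    }],
    BillingMode='PAY_PER_REQUEST',
)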
Assume we have a table set up in DynamoDB to store leaderboard data for a mobile online gaming platform. Each gamer has a unique identifier and a handle, and each time they play a game that game is assigned a unique identifier. We also store the time the game started, how long they played for, the name of the game, their score and whether they won or lost (Outcome). Currently we have a partition key of UserId and a sort key of GameId so we can query efficiently for the user/game combination. We have a new requirement to be able to query for the 10 latest played games, each game's name, and the handle of the gamer who played. How could we achieve this?
We would create a Global Secondary Index, which creates what is effectively a new table in DynamoDB. In this case our GSI would use GameId as the partition key and the timestamp as the sort key, and would project the gamer's handle and the game name as attributes.
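Unlike an LSI, a GSI can be added to an existing table; a hedged boto3 sketch (index and attribute names are illustrative):

import boto3

client = boto3.client('dynamodb')

client.update_table(
    TableName='Leaderboard',
    AttributeDefinitions=[
        {'AttributeName': 'GameId', 'AttributeType': 'S'},
        {'AttributeName': 'StartTime', 'AttributeType': 'S'},
    ],
    GlobalSecondaryIndexUpdates=[{
        'Create': {
            'IndexName': 'GameStartTimeIndex',
            'KeySchema': [
                {'AttributeName': 'GameId', 'KeyType': 'HASH'},
                {'AttributeName': 'StartTime', 'KeyType': 'RANGE'},
            ],
            # Project only what the query needs, keeping the index small
            'Projection': {
                'ProjectionType': 'INCLUDE',
                'NonKeyAttributes': ['Handle', 'GameName'],
            },
            # Omit ProvisionedThroughput if the table uses on-demand billing
            'ProvisionedThroughput': {'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5},
        },
    }],
)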
Do UpdateItem and PutItem calls in DynamoDB consume the same amount of WCUs? Why/why not?
Yes. DynamoDB considers the size of the item as it appears before and after the update. The provisioned throughput consumed reflects the larger of these item sizes. Even if you update just a subset of the item’s attributes, UpdateItem will still consume the full amount of provisioned throughput (the larger of the “before” and “after” item sizes).
Can I batch UpdateItem calls in DynamoDB?
No. BatchWriteItem only supports put and delete operations, so UpdateItem calls cannot be batched.
For DynamoDB streams, how many times would an item appear as a result of a Create/Delete/Update event in DynamoDB, what order would they appear in, and how long are they persisted for (hours)?
An item will appear once, in the order in which the change occurred in DynamoDB. Items persist for 24 hours.
If you are using a Scan in DynamoDB, how can you improve its performance? What is the impact on RCU? How much data does a Scan return in a single operation?
Scans can be parallelised in DynamoDB, scanning multiple partitions at the same time. There is a high RCU cost to parallel scans.
A Scan returns at most 1 MB per operation unless it is paginated or a Limit is applied.
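A sketch of a parallel scan, where each worker takes one segment (segment count and table name are illustrative):

import boto3

table = boto3.resource('dynamodb').Table('ProductCatalog')

def scan_segment(segment: int, total_segments: int = 4):
    # Each worker scans a disjoint slice of the table's partitions;
    # run one call like this per worker or thread
    response = table.scan(Segment=segment, TotalSegments=total_segments)
    return response['Items']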
In DynamoDB, what does a FilterExpression do? Does it allow me to filter the results any further within a DynamoDB Query (i.e. based on an attribute's value)? Is there an impact on read capacity when using a filter expression?
A filter expression is applied AFTER the Query finishes, but BEFORE the results are returned, so it does narrow down what you receive. However, the Query consumes the same amount of read capacity regardless of whether a filter expression is present.
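A boto3 sketch combining a key condition with a filter expression (table and attribute names are illustrative):

import boto3
from boto3.dynamodb.conditions import Attr, Key

table = boto3.resource('dynamodb').Table('Leaderboard')

# The key condition drives the read (and the RCU cost); the filter
# only trims the items after they have been read
response = table.query(
    KeyConditionExpression=Key('UserId').eq('user-123'),
    FilterExpression=Attr('Outcome').eq('WIN'),
)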
For DynamoDB DAX, what is the:
- Default TTL for a cached Item
- The maximum number of nodes and do you need to provision these
- Is DAX Multi-AZ?
TTL: 5 minutes
Up to 10 nodes per cluster which you need to provision
Multi-AZ, with a minimum of 3 nodes recommended for production
For DynamoDB, what time range does a point in time recovery allow you to restore to (min and max values)? Are point in time backups incremental or full and what performance impact is there when they are being taken?
5 minutes to 35 days. There is no performance impact, and the backups are incremental.