DynamoDB Flashcards
Dynamo DB Structure
Consists of Tables
Table: Primary Key = Partition Key, plus optional Sort Key
Table can hold an effectively unlimited number of items (rows)
Each item is made up of Attributes (name/value pairs)
Max size of an item (all its attributes combined) is 400KB
Data Types
Scalar Types: String, Number, Binary, Boolean, Null
Document Types: List, Map
Set Types: String Set, Number Set, Binary Set
DynamoDB: Strongly Consistent / Eventually Consistent
Eventually Consistent:
Reading right after a write may not show the new data; replication takes time
Strongly Consistent: Read after write returns the latest data
DEFAULT:
Always Eventually Consistent
GetItem, Query, Scan provide a
ConsistentRead parameter: set it to true for a strongly consistent read. RCU cost is higher.
DynamoDB: RCU
One Read Capacity Unit provides:
One Strongly Consistent Read per second, OR
Two Eventually Consistent Reads per second
for an item up to 4KB (larger items are rounded up to the next 4KB)
RCU = ceil(item size in KB / 4) * reads per second
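The RCU formula above can be sketched as a small helper; this is a study aid, not an AWS API, and the numbers follow the rules on this card (4KB units, eventually consistent reads cost half):

```python
import math

def rcu_needed(item_size_kb: float, reads_per_second: int, strongly_consistent: bool) -> int:
    """Estimate RCUs: one RCU = one strongly consistent read
    (or two eventually consistent reads) of up to 4 KB per second."""
    units_per_read = math.ceil(item_size_kb / 4)  # round item size up to the next 4 KB
    rcu = units_per_read * reads_per_second
    if not strongly_consistent:
        rcu = math.ceil(rcu / 2)  # eventually consistent reads cost half
    return rcu

# 10 strongly consistent reads/s of 6 KB items -> ceil(6/4) = 2 units each -> 20 RCU
print(rcu_needed(6, 10, True))   # 20
# Same workload, eventually consistent -> 10 RCU
print(rcu_needed(6, 10, False))  # 10
```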
DynamoDB Partitions
Amazon DynamoDB stores data in partitions. A partition is an allocation of storage for a table, backed by solid state drives (SSDs) and automatically replicated across multiple Availability Zones within an AWS Region. Partition management is handled entirely by DynamoDB—you never have to manage partitions yourself.
The Partition Key is run through a hashing algorithm to determine which partition each item goes to
CALCULATE TOTAL:
Capacity: (Total RCU / 3000) + (Total WCU / 1000)
Size: Total Size / 10GB
Total Partitions: Ceiling(Max(capacity, size))
EXAM: WCU AND RCU ARE SPREAD EVENLY BETWEEN PARTITIONS
100 WCU, 100 RCU, 10 partitions: each partition gets 10 WCU and 10 RCU
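The partition-count formulas above, as a worked sketch (same 3000 RCU / 1000 WCU / 10GB constants as on this card):

```python
import math

def partition_count(total_rcu: int, total_wcu: int, total_size_gb: float) -> int:
    """Estimate initial partitions: the larger of the throughput-based
    and size-based counts, rounded up."""
    by_capacity = total_rcu / 3000 + total_wcu / 1000
    by_size = total_size_gb / 10
    return math.ceil(max(by_capacity, by_size))

# 6000 RCU + 2000 WCU -> 2 + 2 = 4 by capacity; 8 GB -> 0.8 by size -> 4 partitions
print(partition_count(6000, 2000, 8))  # 4
```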
DynamoDB: Throttling
ProvisionedThroughputExceededException
Reasons:
Hot Keys: one Partition Key is read far more often than others
Popular items
Hot Partitions: popular items concentrated in one partition
Large Items: RCU and WCU consumption depends on item size too
Solutions:
Exponential Backoff, built into the AWS SDKs
Distribute Partition Key values as much as possible; avoid hot partitions
RCU issues can be resolved with DAX
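The exponential-backoff retry the SDKs do for you can be sketched like this; the exception handling is simplified (a real client would catch ProvisionedThroughputExceededException specifically), and the delay constants are illustrative:

```python
import random
import time

def call_with_backoff(operation, max_retries=5):
    """Retry a throttled call with exponentially growing, jittered waits.
    The AWS SDKs implement this automatically; this just shows the idea."""
    for attempt in range(max_retries):
        try:
            return operation()
        except Exception:  # in practice: ProvisionedThroughputExceededException
            # wait 0.1s, 0.2s, 0.4s, ... (capped), scaled by random jitter
            delay = min(2 ** attempt * 0.1, 5.0) * random.random()
            time.sleep(delay)
    return operation()  # final attempt; let any exception propagate
```

Jitter matters: without it, many throttled clients retry at the same instant and throttle each other again.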
DynamoDB Writing Data:
Put Item
Update Item
Conditional Writes
Put Item: Write to DB, Create or full replace, WCU consumed
Update Item: Partial update of an item's attributes. Supports Atomic Counters (e.g. increment a Number attribute)
Conditional Writes:
Accept a write/update only if the specified conditions are met
Helps with concurrent access to items
No performance impact
Use these when multiple writes hit the same item at the same time (concurrency)
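A conditional PutItem can be sketched as a boto3-style request dict; the table and attribute names here are made up for illustration. `attribute_not_exists()` makes the write succeed only if no item with this key already exists:

```python
# Build parameters for a conditional PutItem (low-level boto3 style).
# "Users" / "user_id" are hypothetical names.
def conditional_put_request(user_id: str, name: str) -> dict:
    return {
        "TableName": "Users",
        "Item": {
            "user_id": {"S": user_id},
            "name": {"S": name},
        },
        # write is rejected if an item with this key already exists
        "ConditionExpression": "attribute_not_exists(user_id)",
    }

params = conditional_put_request("u-123", "Alice")
# Pass to a real client with: boto3.client("dynamodb").put_item(**params)
```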
DynamoDB delete Data:
DeleteItem
Delete Table
DeleteItem:
Deletes an individual row (item)
Conditional Delete is also possible
DeleteTable:
Don't delete a whole table item by item; DeleteTable is made for speed
Deletes the table and all its items
DynamoDB Batch Writes
BatchWriteItem
BatchWriteItem
Up to 25 PutItem and/or DeleteItem requests in one call
Up to 16MB of data written
400KB of data per item
Batch allows:
Lower latency, fewer API calls
Operations run in parallel for efficiency
Part of a batch can fail; failed items can be retried (exponential backoff)
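The 25-request-per-call limit above means larger writes must be chunked; a sketch of building the BatchWriteItem payloads (table name is hypothetical):

```python
# Split a list of items into BatchWriteItem-sized payloads of 25,
# the per-call limit noted above.
def batch_write_requests(table: str, items: list) -> list:
    batches = []
    for i in range(0, len(items), 25):
        chunk = items[i:i + 25]
        batches.append({
            "RequestItems": {
                table: [{"PutRequest": {"Item": item}} for item in chunk]
            }
        })
    return batches

items = [{"pk": {"S": f"item-{n}"}} for n in range(60)]
batches = batch_write_requests("Orders", items)
print(len(batches))  # 3 calls: 25 + 25 + 10 items
# Each dict goes to client.batch_write_item(**batch); retry any UnprocessedItems.
```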
DynamoDB READ Data
GetItem
BatchGetItem
GetItem:
Read based on Primary Key
Primary Key: Hash, or Hash + Range
Eventually Consistent by default
Option for Strongly Consistent, uses more RCU
**ProjectionExpression: specify to retrieve only certain attributes**
BatchGetItem
up to 100 Items
Up to 16MB of Data
Items retrieved in parallel to minimize latency, fewer API calls
DynamoDB QUERY
Query:
Returns items based on:
Partition Key value (must be exact, =)
Sort Key value (=, <, <=, >, >=, BETWEEN, begins_with), optional
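A Query with an exact partition-key match and a sort-key comparison, as a boto3-style parameter dict; "Orders", "user_id", and "order_date" are hypothetical names:

```python
# Query parameters (low-level boto3 style): = on the partition key,
# >= on the sort key.
def build_query(user_id: str, since: str) -> dict:
    return {
        "TableName": "Orders",
        "KeyConditionExpression": "user_id = :uid AND order_date >= :since",
        "ExpressionAttributeValues": {
            ":uid": {"S": user_id},
            ":since": {"S": since},
        },
    }

params = build_query("u-123", "2023-01-01")
# boto3.client("dynamodb").query(**params) returns the matching items
```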
DynamoDB SCAN / Parallel Scan
- Scan
Scans the entire table and then filters out data (inefficient)
Returns up to 1 MB of data; use pagination to keep reading
Consumes A LOT of RCU
Limit impact
- Use the Limit parameter to reduce the number of items returned
- Reduce the size of the scan
Speedy method: Parallel Scans
Multiple instances scan multiple segments/partitions at the same time
Increases throughput, but also RCU consumed
Limit the impact of parallel scans with the Limit parameter and reduced scan size
- ProjectionExpression + FilterExpression can be used to get specific items. NO CHANGE IN RCU
**ProjectionExpression: specify to retrieve only certain attributes**
*FilterExpression: further client-side filtering*
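A parallel scan splits the table into segments, one per worker. `Segment` and `TotalSegments` are real Scan parameters; the table name is hypothetical:

```python
# Build one Scan parameter dict per worker; together the segments
# cover the whole table exactly once.
def segment_scan_params(table: str, total_segments: int) -> list:
    return [
        {"TableName": table, "Segment": s, "TotalSegments": total_segments}
        for s in range(total_segments)
    ]

params = segment_scan_params("Events", 4)
# Run each dict through client.scan(**p) in its own thread or process;
# each call still paginates via LastEvaluatedKey within its segment.
```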
LSI Local Secondary Index
Must be specified at creation of table
Alternate Range/Sort Key, local to the same Hash (Partition) Key
Sort Key: String /Number/ Binary
The index key consists of exactly one scalar attribute
GSI Global Secondary Index
A new table that also carries the original table's primary key
Speeds up queries on non-key attributes
GSI= Partition Key + Optional Sort Key
The index is a new table:
- Partition key and sort key of the original table are always projected (KEYS_ONLY)
- Specify extra attributes to project (INCLUDE)
- Use all attributes from the main table (ALL)
Must define RCU /WCU for this index table
POSSIBLE TO ADD AND MODIFY GSI, UNLIKE LSI
GSI LSI Throttle
GSI: WRITE ISSUES
- When a GSI has insufficient read capacity, the base table isn’t affected.
- When a GSI has insufficient write capacity, write operations won’t succeed on the base table or any of its GSIs.
Be sure that the provisioned write capacity for each GSI is equal to or greater than the provisioned write capacity of the base table. To modify the provisioned throughput of a GSI, use the UpdateTable operation. If automatic scaling is enabled on the base table, it’s a best practice to apply the same settings to the GSI. You can do this by choosing Apply same settings to global secondary indexes in the DynamoDB console. For more information, see Enabling DynamoDB Auto Scaling on Existing Tables.
Be sure that the GSI’s partition key distributes read and write operations as evenly as possible across partitions. This helps prevent hot partitions, which can lead to throttling. For more information, see Designing Partition Keys to Distribute Your Workload Evenly.
If writes to a GSI are throttled, the main table will be throttled too,
even if WCU on the main table is fine.
Choose the GSI partition key well and assign its WCU capacity carefully.
GSI affects main table***
LSI:
Uses the WCU and RCU of the main table
No special throttle considerations
WCU RCU LIMIT PER PARTITION
Each partition on a DynamoDB table is subject to a hard limit of 1,000 write capacity units and 3,000 read capacity units. If your workload is unevenly distributed across partitions, or if the workload relies on short periods of time with high usage (a burst of read or write activity), the table might be throttled.
DynamoDB adaptive capacity automatically boosts throughput capacity to high-traffic partitions. However, each partition is still subject to the hard limit. This means that adaptive capacity can’t solve larger issues with your table or partition design. To avoid hot partitions and throttling, optimize your table and partition structure.
Resolution
Before implementing one of the following solutions, use Amazon CloudWatch Contributor Insights to find the most accessed and throttled items in your table. Then, use the solutions that best fit your use case to resolve throttling.
Distribute read and write operations as evenly as possible across your table. A hot partition can degrade the overall performance of your table. For more information, see Designing Partition Keys to Distribute Your Workload Evenly.
Implement a caching solution. If your workload is mostly read access to static data, then query results can be delivered much faster if the data is in a well‑designed cache rather than in a database. DynamoDB Accelerator (DAX) is a caching service that offers fast in‑memory performance for your application. You can also use Amazon ElastiCache.
Implement error retries and exponential backoff. Exponential backoff can improve an application’s reliability by using progressively longer waits between retries. If you’re using an AWS SDK, this logic is built‑in. If you’re not using an AWS SDK, consider manually implementing exponential backoff. For more information, see Error Retries and Exponential Backoff in AWS.
DynamoDB Concurrency
Conditional update/Delete
Ensure item has not changed before altering
This gives DynamoDB Optimistic Locking
Optimistic Locking = Concurrency control
Each update/delete can bump a version attribute; if the stored version no longer matches the one the writer read, the conditional change is rejected because it was based on a stale copy of the item.
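The version-check idea above as a boto3-style UpdateItem sketch; "Tasks", "task_id", "status", and "version" are hypothetical names:

```python
# Optimistic locking: include the version you read in the condition,
# and bump it on write. If another writer got there first, the
# condition fails instead of silently overwriting.
def versioned_update(item_id: str, new_status: str, expected_version: int) -> dict:
    return {
        "TableName": "Tasks",
        "Key": {"task_id": {"S": item_id}},
        "UpdateExpression": "SET #s = :status, version = :next",
        "ConditionExpression": "version = :expected",
        "ExpressionAttributeNames": {"#s": "status"},
        "ExpressionAttributeValues": {
            ":status": {"S": new_status},
            ":expected": {"N": str(expected_version)},
            ":next": {"N": str(expected_version + 1)},
        },
    }

params = versioned_update("t-1", "done", 3)
# If the stored version is no longer 3, update_item raises
# ConditionalCheckFailedException and the writer can re-read and retry.
```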
DAX
Seamless Cache; no application rewrite is needed.
Writes go through DAX; if the data is cached, reads are served with microsecond latency
Solves the Hot Key problem: heavily read items may get throttled; DAX relieves read-heavy operations because reads hit the cache rather than the table
DEFAULT: 5-minute TTL for cached items
Up to 10 nodes in a cluster
Multi-AZ (minimum 3 nodes recommended for production)
Secure: KMS encryption at rest, VPC, IAM, CloudTrail
DAX VS ELASTICACHE
ElastiCache (Memcached vs Redis; Redis is HA)
For more advanced use: store aggregated results of computations so they can be cached for future use
Memcached uses multithreading
DAX
for individual objects
Query/scan cache
DynamoDB Streams overview
24 Hour data retention.
Tracks changes:
Create / Update / Delete: each change is loaded into the stream
Stream read by AWS Lambda / EC2 instances
Use cases:
React to changes in real time: Welcome emails
Analytics
Create derivative tables: Read from table and process and update another table
Insert into Elasticsearch
-Cross Region replication: needs streams enabled to be used
DynamoDB Streams view types: KEYS_ONLY, NEW_IMAGE, OLD_IMAGE, NEW_AND_OLD_IMAGES
KEYS_ONLY: only the key attributes of the modified item
NEW_IMAGE: the item as it appears after the modification
OLD_IMAGE: the item as it was before the modification
*NEW_AND_OLD_IMAGES: both the old and new versions of the item
*best for normal use, more expensive
DynamoDB Streams
Structure/ what about previous records
Made of SHARDS, just like Kinesis
However, you don't provision them; shards are managed automatically by AWS
Once enabled, previous records are not retroactively included, ONLY FUTURE CHANGES!
DynamoDB Streams+ Lambda
Requirements:
Event Source Mapping: an AWS Lambda resource that reads from an event source and invokes a Lambda function.
- The Event Source Mapping (internal to Lambda) polls the DynamoDB stream
- A batch of records is retrieved from the stream
- The Event Source Mapping invokes Lambda with that batch
**MAKE sure lambda function has permissions for DynamoDB streams
INVOCATIONS ARE SYNCHRONOUS
DynamoDB TTL
TTL: define an attribute holding an expiry timestamp; items are automatically deleted after that date/time
FREE , NO WCU RCU costs
Background task done by DynamoDB itself
Reduce Storage/ Manage table size/ adhere to regulatory issues
Defined per ITEM (row)**
Occurs within 48 hours of expiration**
Deleted items are deleted from indexes GSI/LSI**
Streams receive a delete event for each expired item, which can be used to recover items**
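TTL attributes hold the expiry as a Unix epoch number. A sketch of writing such an item (attribute names "session_id" / "expires_at" are hypothetical; you would enable TTL on "expires_at" in the table settings):

```python
import time

# Build an item whose TTL attribute is a Unix-epoch timestamp.
# DynamoDB deletes it in the background some time after that moment
# (within roughly 48 hours, per the notes above).
def item_with_ttl(session_id: str, ttl_seconds: int) -> dict:
    expires_at = int(time.time()) + ttl_seconds
    return {
        "session_id": {"S": session_id},
        "expires_at": {"N": str(expires_at)},  # TTL attribute (Number type)
    }

item = item_with_ttl("s-42", 3600)  # expire about one hour from now
```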
DynamoDB CLI
--projection-expression
--filter-expression
--page-size
--max-items
--starting-token
--projection-expression:
Attributes to retrieve
--filter-expression:
Filter results client-side
Pagination options
Optimization:
--page-size: the full dataset is still retrieved, but each API call requests fewer items to avoid timeouts.
FEWER ITEMS PER CALL (DEFAULT 1000); SMALLER PAGES MEAN LESS CHANCE OF TIMEOUT
STILL ALL THE DATA!
Pagination:
--max-items: maximum number of items returned by the CLI; the response includes a NextToken that holds your place in the ordered list of items
--starting-token:
A placeholder (savestate) to resume from: after a --max-items call, pass the returned NextToken here to read the next set of items.
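The max-items/starting-token pair above in a CLI fragment (table name is hypothetical; the token placeholder stands for whatever NextToken the first call returned):

```shell
# First page: up to 10 items plus a NextToken in the output.
aws dynamodb scan \
    --table-name Orders \
    --max-items 10

# Resume where the previous call stopped by passing that token back.
aws dynamodb scan \
    --table-name Orders \
    --max-items 10 \
    --starting-token <NextToken-from-previous-call>
```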