DynamoDB Flashcards

1
Q

DynamoDB Structure

A

Consists of Tables
Table: Primary Key (Partition Key, plus optional Sort Key) identifies each item
A table can hold an unlimited number of rows (items)

Each value stored in an item is called an Attribute
Max size of an item is 400KB

Data Types
Scalar Types: String, Number, Binary, Boolean, Null
Document Types: List, Map
Set Types: String Set, Number Set, Binary Set
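
A minimal sketch of these data types in practice (my own illustration, assuming a hypothetical `Users` table with partition key `user_id`; boto3 maps native Python types to DynamoDB types):

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Users")  # hypothetical table, partition key: user_id

# One item; each key/value pair below is an Attribute
table.put_item(
    Item={
        "user_id": "u-123",              # String (scalar)
        "age": 30,                       # Number (scalar)
        "active": True,                  # Boolean (scalar)
        "nickname": None,                # Null (scalar)
        "addresses": [{"city": "NYC"}],  # List of Maps (document types)
        "tags": {"admin", "beta"},       # String Set (set type)
    }
)
```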

2
Q

DynamoDB: Strongly Consistent / Eventually Consistent

A

Eventually Consistent:
A read right after a write will not always return the written data; replication takes time

Strongly Consistent: A read after a write returns the correct data

DEFAULT:
Always eventually consistent for:
GetItem, Query, Scan

ConsistentRead: Parameter can be set to true to get a strongly consistent read. Consumes more RCU.
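
A minimal sketch of a strongly consistent read, again assuming the hypothetical `Users` table:

```python
import boto3

table = boto3.resource("dynamodb").Table("Users")  # hypothetical table

# Default read is eventually consistent; ConsistentRead=True forces a
# strongly consistent read at a higher RCU cost.
response = table.get_item(
    Key={"user_id": "u-123"},
    ConsistentRead=True,
)
item = response.get("Item")
```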

3
Q

DynamoDB: RCU

A

One Read Capacity Unit (RCU):
One strongly consistent read per second, or
Two eventually consistent reads per second,
for an item up to 4KB (larger items round up to the next 4KB)

RCU = ceil(item size in KB / 4) * reads per second (halve for eventually consistent)
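
A worked sketch of that arithmetic (the helper name is my own, not from the card):

```python
import math

def rcu_needed(item_kb: float, reads_per_sec: int, eventually_consistent: bool = False) -> int:
    """RCU = ceil(item size / 4KB) * reads/sec, halved for eventually consistent reads."""
    rcu = math.ceil(item_kb / 4) * reads_per_sec
    return math.ceil(rcu / 2) if eventually_consistent else rcu

print(rcu_needed(6, 10))        # 6KB rounds up to 2 chunks -> 2 * 10 = 20 RCU
print(rcu_needed(6, 10, True))  # eventually consistent -> half -> 10 RCU
```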

4
Q

DynamoDB Partitions

A

Amazon DynamoDB stores data in partitions. A partition is an allocation of storage for a table, backed by solid state drives (SSDs) and automatically replicated across multiple Availability Zones within an AWS Region. Partition management is handled entirely by DynamoDB—you never have to manage partitions yourself.

Each item's partition key goes through a hashing algorithm to determine which partition it is stored on

CALCULATE TOTAL:

Capacity: (Total RCU / 3000) + (Total WCU / 1000)
Size: Total Size / 10GB
Total Partitions: ceiling(max(capacity, size))

EXAM: WCU AND RCU ARE SPREAD EVENLY BETWEEN PARTITIONS

Example: 100 WCU and 100 RCU over 10 partitions means each partition gets 10 WCU and 10 RCU.
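
A quick sketch of the partition math above in code (my own illustration of the formula):

```python
import math

def partition_count(total_rcu: int, total_wcu: int, total_size_gb: float) -> int:
    by_capacity = total_rcu / 3000 + total_wcu / 1000
    by_size = total_size_gb / 10
    return math.ceil(max(by_capacity, by_size))

# 6000 RCU, 2000 WCU, 25GB -> max(2 + 2, 2.5) -> ceil(4) = 4 partitions
print(partition_count(6000, 2000, 25))
```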

5
Q

DynamoDB: Throttling

A

ProvisionedThroughputExceededException

Reasons:
Hot Keys: one partition key is being read too many times (popular items)
Hot Partitions: popular items concentrated in one partition
Large Items: RCU and WCU consumption depends on item size too

Solutions:
Exponential backoff (built into the SDKs); see the sketch below
Distribute partition keys as much as possible; avoid hot partitions
RCU issues can be resolved with DAX
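
A minimal retry-with-exponential-backoff sketch. The AWS SDKs already do this internally; this just illustrates the idea, with the hypothetical `Users` table as before:

```python
import time
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("Users")  # hypothetical table

for attempt in range(5):
    try:
        table.get_item(Key={"user_id": "u-123"})
        break
    except ClientError as err:
        if err.response["Error"]["Code"] != "ProvisionedThroughputExceededException":
            raise
        time.sleep(2 ** attempt * 0.1)  # wait 0.1s, 0.2s, 0.4s, ... between retries
```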

6
Q

DynamoDB Writing Data:
PutItem
UpdateItem
Conditional Writes

A

PutItem: Write to DB (create or full replace); WCU consumed

UpdateItem: Partial update of an item's attributes. Possible to use Atomic Counters and increment them

Conditional Writes:
Accept a write/update only if conditions are respected
Helps with concurrent access to items
No performance impact
How to write when there are concurrency issues (multiple writes hitting the same item at the same time); see the sketch below
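
A minimal conditional-write sketch (hypothetical table and attribute names):

```python
import boto3
from boto3.dynamodb.conditions import Attr
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("Users")  # hypothetical table

try:
    # Only create the item if no item with this key exists yet
    table.put_item(
        Item={"user_id": "u-123", "email": "a@example.com"},
        ConditionExpression=Attr("user_id").not_exists(),
    )
except ClientError as err:
    if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
        print("Item already exists; write rejected")
    else:
        raise
```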

7
Q

DynamoDB delete Data:
DeleteItem
Delete Table

A

DeleteItem:
Deletes an individual row
Conditional deletes are also possible

DeleteTable:
Don't use DeleteItem row by row for a whole table; DeleteTable is made for speed
Deletes the table and all its items
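
A conditional delete, mirroring the conditional write above (hypothetical names again):

```python
import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource("dynamodb").Table("Users")  # hypothetical table

# Delete only if the item is inactive; otherwise the call raises
# ConditionalCheckFailedException
table.delete_item(
    Key={"user_id": "u-123"},
    ConditionExpression=Attr("active").eq(False),
)
```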

8
Q

DynamoDB Batch Writes

BatchWriteItem

A

BatchWriteItem
Up to 25 PutItem and/or DeleteItem operations in one call
Up to 16MB of data written
400KB of data per item

Batching allows:
Lower latency, fewer API calls
Operations are performed in parallel for efficiency
Part of a batch can fail and be retried (with exponential backoff)
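
A batch-write sketch: boto3's `batch_writer` helper chunks writes into 25-operation BatchWriteItem calls and re-sends unprocessed items for you (hypothetical table as before):

```python
import boto3

table = boto3.resource("dynamodb").Table("Users")  # hypothetical table

# batch_writer buffers writes, sends BatchWriteItem calls of up to 25
# operations each, and retries any UnprocessedItems automatically.
with table.batch_writer() as batch:
    for i in range(100):
        batch.put_item(Item={"user_id": f"u-{i}", "active": True})
    batch.delete_item(Key={"user_id": "u-old"})
```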

9
Q

DynamoDB READ Data
GetItem
BatchGetItem

A
GetItem
Read based on Primary Key
Primary Key: HASH or HASH + RANGE
Eventually consistent by default
Option for strongly consistent read, costs more RCU
**ProjectionExpression: specify it to retrieve only specific attributes**

BatchGetItem
Up to 100 items
Up to 16MB of data
Items are retrieved in parallel to minimize latency; fewer API calls.
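
A BatchGetItem sketch with a ProjectionExpression trimming the attributes returned (hypothetical `Users` table and attribute names):

```python
import boto3

dynamodb = boto3.resource("dynamodb")

response = dynamodb.batch_get_item(
    RequestItems={
        "Users": {  # hypothetical table
            "Keys": [{"user_id": "u-1"}, {"user_id": "u-2"}],
            "ProjectionExpression": "user_id, email",  # only these attributes
        }
    }
)
users = response["Responses"]["Users"]
```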

10
Q

DynamoDB QUERY

A

Query:
Returns items based on
Partition Key value (must be an exact match, =)
Sort Key value (=, <, <=, >, >=, BETWEEN, begins_with), optional
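
A minimal Query sketch, assuming a hypothetical `Orders` table keyed on `user_id` (partition) and `created_at` (sort):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("Orders")  # hypothetical table

# Partition key must be an exact match; sort key supports range operators
response = table.query(
    KeyConditionExpression=Key("user_id").eq("u-123")
    & Key("created_at").begins_with("2024-"),
)
orders = response["Items"]
```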

11
Q

DynamoDB SCAN / Parallel Scan

A
  1. Scan

Scans the entire table, then filters out data (inefficient)
Returns up to 1MB of data per call; use pagination to keep reading
Consumes A LOT of RCU

Limit the impact:

  • Use the Limit parameter to reduce the number of items returned
  • Reduce the size of the scan

Speedier method:
  2. Parallel Scans

Multiple workers can scan multiple data segments at the same time (see the sketch below)
Increases throughput and RCU consumed
Limit the impact of parallel scans with:
- the Limit parameter
- a reduced scan size

ProjectionExpression + FilterExpression can be used to get specific items. NO CHANGE IN RCU
  • ProjectionExpression: specify it to retrieve only specific attributes
  • FilterExpression: further filtering of the returned results (applied after the read, so RCU is unchanged)
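
A parallel-scan sketch: each worker reads its own segment (hypothetical table; threads stand in for separate instances):

```python
import boto3
from concurrent.futures import ThreadPoolExecutor

TOTAL_SEGMENTS = 4
table = boto3.resource("dynamodb").Table("Users")  # hypothetical table

def scan_segment(segment: int) -> list:
    items, kwargs = [], {"Segment": segment, "TotalSegments": TOTAL_SEGMENTS}
    while True:
        page = table.scan(**kwargs)
        items.extend(page["Items"])
        if "LastEvaluatedKey" not in page:  # pagination: keep reading 1MB pages
            return items
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

with ThreadPoolExecutor(max_workers=TOTAL_SEGMENTS) as pool:
    all_items = [i for seg in pool.map(scan_segment, range(TOTAL_SEGMENTS)) for i in seg]
```
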
12
Q

LSI Local Secondary Index

A

Must be specified at creation of the table
An alternate Range/Sort Key, local to the Hash Key

Sort Key: String / Number / Binary
Consists of exactly one scalar attribute
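
A create-table sketch with an LSI (all table, key, and index names are hypothetical):

```python
import boto3

client = boto3.client("dynamodb")

client.create_table(
    TableName="Orders",  # hypothetical
    AttributeDefinitions=[
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "created_at", "AttributeType": "S"},
        {"AttributeName": "total", "AttributeType": "N"},
    ],
    KeySchema=[
        {"AttributeName": "user_id", "KeyType": "HASH"},
        {"AttributeName": "created_at", "KeyType": "RANGE"},
    ],
    LocalSecondaryIndexes=[  # same HASH key, alternate RANGE key
        {
            "IndexName": "ByTotal",
            "KeySchema": [
                {"AttributeName": "user_id", "KeyType": "HASH"},
                {"AttributeName": "total", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }
    ],
    BillingMode="PAY_PER_REQUEST",
)
```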

13
Q

GSI Global Secondary Index

A

A GSI is effectively a new table that still carries the original table's primary key.

Speeds up queries on non-key attributes
GSI = Partition Key + optional Sort Key

The index is a new table:

  1. The partition key and sort key of the original table are always projected (KEYS_ONLY)
  2. Specify extra attributes to project (INCLUDE)
  3. Use all attributes from the main table (ALL)

Must define RCU/WCU for this index table

POSSIBLE TO ADD AND MODIFY A GSI AFTER CREATION, UNLIKE AN LSI
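
A sketch of adding a GSI to an existing table with UpdateTable (hypothetical names; assumes a provisioned-mode table, since the GSI gets its own RCU/WCU):

```python
import boto3

client = boto3.client("dynamodb")

client.update_table(
    TableName="Orders",  # hypothetical
    AttributeDefinitions=[{"AttributeName": "email", "AttributeType": "S"}],
    GlobalSecondaryIndexUpdates=[
        {
            "Create": {
                "IndexName": "ByEmail",
                "KeySchema": [{"AttributeName": "email", "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "ALL"},
                "ProvisionedThroughput": {  # GSIs have their own RCU/WCU
                    "ReadCapacityUnits": 5,
                    "WriteCapacityUnits": 5,
                },
            }
        }
    ],
)
```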

14
Q

GSI LSI Throttle

A

GSI: WRITE ISSUES

  • When a GSI has insufficient read capacity, the base table isn’t affected.
  • When a GSI has insufficient write capacity, write operations won’t succeed on the base table or any of its GSIs.

Be sure that the provisioned write capacity for each GSI is equal to or greater than the provisioned write capacity of the base table. To modify the provisioned throughput of a GSI, use the UpdateTable operation. If automatic scaling is enabled on the base table, it’s a best practice to apply the same settings to the GSI. You can do this by choosing Apply same settings to global secondary indexes in the DynamoDB console. For more information, see Enabling DynamoDB Auto Scaling on Existing Tables.

Be sure that the GSI’s partition key distributes read and write operations as evenly as possible across partitions. This helps prevent hot partitions, which can lead to throttling. For more information, see Designing Partition Keys to Distribute Your Workload Evenly.

If writes to a GSI throttle, the main table will be throttled too,
even if WCU on the main table is fine
The GSI partition key needs to be chosen well
Assign WCU capacity carefully
The GSI affects the main table***

LSI:
Uses the WCU and RCU of the main table
No special throttling considerations

15
Q

WCU RCU LIMIT PER PARTITION

A

Each partition on a DynamoDB table is subject to a hard limit of 1,000 write capacity units and 3,000 read capacity units. If your workload is unevenly distributed across partitions, or if the workload relies on short periods of time with high usage (a burst of read or write activity), the table might be throttled.

DynamoDB adaptive capacity automatically boosts throughput capacity to high-traffic partitions. However, each partition is still subject to the hard limit. This means that adaptive capacity can’t solve larger issues with your table or partition design. To avoid hot partitions and throttling, optimize your table and partition structure.

Resolution
Before implementing one of the following solutions, use Amazon CloudWatch Contributor Insights to find the most accessed and throttled items in your table. Then, use the solutions that best fit your use case to resolve throttling.

Distribute read and write operations as evenly as possible across your table. A hot partition can degrade the overall performance of your table. For more information, see Designing Partition Keys to Distribute Your Workload Evenly.
Implement a caching solution. If your workload is mostly read access to static data, then query results can be delivered much faster if the data is in a well‑designed cache rather than in a database. DynamoDB Accelerator (DAX) is a caching service that offers fast in‑memory performance for your application. You can also use Amazon ElastiCache.
Implement error retries and exponential backoff. Exponential backoff can improve an application’s reliability by using progressively longer waits between retries. If you’re using an AWS SDK, this logic is built‑in. If you’re not using an AWS SDK, consider manually implementing exponential backoff. For more information, see Error Retries and Exponential Backoff in AWS.

16
Q

DynamoDB Concurrency

A

Conditional Update/Delete

Ensure an item has not changed before altering it
This makes DynamoDB an Optimistic Locking database

Optimistic Locking = Concurrency control

Updating or deleting an item changes its version; if the version is no longer the same as when it was read, the conditional change is not applied, because it was based on the previous version of the item.
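
An optimistic-locking sketch using a hypothetical `version` attribute on the `Users` table:

```python
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("Users")  # hypothetical table

item = table.get_item(Key={"user_id": "u-123"})["Item"]
expected = item["version"]  # version as read by this client

try:
    # Write succeeds only if nobody changed the item since we read it
    table.update_item(
        Key={"user_id": "u-123"},
        UpdateExpression="SET email = :e, #v = :new",
        ConditionExpression="#v = :old",
        ExpressionAttributeNames={"#v": "version"},  # alias the attribute name
        ExpressionAttributeValues={
            ":e": "new@example.com",
            ":old": expected,
            ":new": expected + 1,
        },
    )
except ClientError as err:
    if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
        print("Item changed since read; re-read and retry")
    else:
        raise
```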

17
Q

DAX

A

Seamless cache; no application rewrite is needed.

Writes go through DAX to DynamoDB; cached reads are served with microsecond latency

Solves the Hot Key problem: heavily read items may cause throttled requests; DAX relieves read-heavy operations because reads hit the cache rather than the table.

DEFAULTS:
5-minute TTL for the cache
Up to 10 nodes in the cluster
Multi-AZ; 3 nodes minimum recommended
Secure: encryption at rest with KMS, VPC, IAM, CloudTrail

18
Q

DAX VS ELASTICACHE

A
ElastiCache
Memcached vs Redis
Redis is HA (highly available)
For more advanced uses:
Store aggregated results of computations over data; the results can be cached for future use.

Memcached uses multithreading.

DAX
For individual objects
Query/Scan cache

19
Q

DynamoDB Streams overview

A

24-hour data retention.

Tracks changes such as
Create/Update/Delete: these are loaded into the stream

The stream is read by AWS Lambda / EC2 instances. Use cases:

React to changes in real time: welcome emails
Analytics
Create derivative tables: read from the table, process, and update another table
Insert into Elasticsearch

Cross-region replication: requires streams to be enabled

20
Q
DynamoDB Streams
KEYS_ONLY
NEW_IMAGE
OLD_IMAGE
NEW_AND_OLD_IMAGES
A

KEYS_ONLY: only the key attributes of the modified item
NEW_IMAGE: the item as it appears after the modification
OLD_IMAGE: the item as it was before the modification
*NEW_AND_OLD_IMAGES: both the old and new versions of the item

*Best for normal use, but more expensive
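
A sketch of enabling a stream with a view type on an existing table (hypothetical table name):

```python
import boto3

client = boto3.client("dynamodb")

client.update_table(
    TableName="Users",  # hypothetical
    StreamSpecification={
        "StreamEnabled": True,
        "StreamViewType": "NEW_AND_OLD_IMAGES",  # or KEYS_ONLY / NEW_IMAGE / OLD_IMAGE
    },
)
```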

21
Q

DynamoDB Streams

Structure/ what about previous records

A

SHARDS, just like Kinesis
However, they are not provisioned by you; AWS provisions them automatically

Once enabled, previous records are not retroactively included in the stream. ONLY FUTURE CHANGES!

22
Q

DynamoDB Streams+ Lambda

A

Requirements:

Event Source Mapping: an AWS Lambda resource that reads from an event source and invokes a Lambda function.

  1. The Event Source Mapping (internal to Lambda) polls the DynamoDB stream
  2. A batch of records is retrieved from the stream
  3. The Event Source Mapping invokes the Lambda function with that batch

**MAKE sure the Lambda function has permissions for DynamoDB Streams

INVOCATIONS ARE SYNCHRONOUS
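
A minimal handler sketch for the batches delivered by the Event Source Mapping (the record layout shown is the standard DynamoDB stream event shape):

```python
def handler(event, context):
    # Each invocation receives a batch of stream records
    for record in event["Records"]:
        action = record["eventName"]  # INSERT, MODIFY, or REMOVE
        keys = record["dynamodb"]["Keys"]
        if action == "INSERT":
            new_image = record["dynamodb"]["NewImage"]  # requires a NEW_IMAGE view type
            print(f"New item {keys}: {new_image}")
        elif action == "REMOVE":
            print(f"Deleted item {keys}")
```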

23
Q

DynamoDB TTL

A

TTL: Define an attribute holding an epoch timestamp; items whose timestamp has passed are automatically deleted

FREE, NO WCU/RCU costs
Background task done by DynamoDB itself
Reduces storage / manages table size / helps adhere to regulatory requirements

Defined per ROW**
Deletion occurs within 48 hours of expiration**
Deleted items are also removed from GSI/LSI indexes**
Streams will record a delete event, which can be used to recover items**
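
A TTL sketch (hypothetical table and attribute names; the TTL attribute must hold an epoch timestamp in seconds):

```python
import time
import boto3

client = boto3.client("dynamodb")
table = boto3.resource("dynamodb").Table("Sessions")  # hypothetical table

# Tell DynamoDB which attribute holds the expiry timestamp
client.update_time_to_live(
    TableName="Sessions",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expire_at"},
)

# Item auto-deletes some time after this epoch second passes
table.put_item(Item={"session_id": "s-1", "expire_at": int(time.time()) + 3600})
```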

24
Q

DynamoDB CLI

  • --projection-expression
  • --filter-expression
  • --page-size
  • --max-items
  • --starting-token
A

--projection-expression:
Attributes to retrieve

--filter-expression:
Filter the returned results

Pagination options

Optimization:
--page-size: The full dataset is still retrieved, but each API call requests fewer items to avoid timeouts.
FEWER ITEMS PER CALL (DEFAULT 1000); SMALLER MEANS LESS CHANCE OF TIMEOUT
STILL ALL THE DATA!

Pagination:
--max-items: Max number of items to be returned by the CLI. Returns a NextToken, which holds your place in the ordered list of items retrieved so far.
--starting-token: Pass the NextToken from a previous call as a placeholder/save-state to resume the search; after a --max-items call we may need the results of the next set, and this lets us read through that next specific set of items.
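
The SDK analogue of these pagination flags, as a sketch: Limit plays the role of --page-size, and ExclusiveStartKey/LastEvaluatedKey play the role of --starting-token and the NextToken (hypothetical table as before):

```python
import boto3

table = boto3.resource("dynamodb").Table("Users")  # hypothetical table

items = []
kwargs = {"Limit": 100}  # like --page-size: fewer items per API call
while True:
    page = table.scan(**kwargs)
    items.extend(page["Items"])
    token = page.get("LastEvaluatedKey")  # like the CLI's NextToken
    if token is None:
        break  # all data retrieved
    kwargs["ExclusiveStartKey"] = token  # like --starting-token: resume here
```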