DynamoDB Flashcards
What are some NoSQL characteristics?
NoSQL dbs are distributed
NoSQL dbs do NOT support joins
NoSQL dbs do not perform aggregations such as SUM
NoSQL dbs scale horizontally
What are some nice features about DynamoDB?
Fully managed NoSQL database, highly available with replication across 3 AZs
Distributed database
Scales to massive workloads
Millions of requests per second, trillions of rows, 100 TBs of storage
Fast and consistent in performance (low latency retrieval)
Integrated with IAM for security, authorization, and administration
Enables event-driven programming with DynamoDB Streams
Low cost and auto-scaling capabilities
What are some DynamoDB table properties?
Each table has a primary key, which must be chosen at creation time
Each table can have an infinite number of items (aka rows)
Each item has attributes, which can be added over time and can be null
Maximum size of an item is 400 KB
What data types can a DynamoDB item attribute have?
Scalar types: String, Number, Binary, Boolean, NULL
Document Type: List, Map
Set Types: String Set, Number Set, Binary Set
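A minimal boto3 sketch (Python) of how these types map onto native Python values when writing an item; the table name "Users" and all attribute names are made up for the example:

```python
import boto3
from decimal import Decimal

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Users")  # hypothetical table

# The boto3 Table resource maps native Python types onto DynamoDB types:
# str -> String, Decimal/int -> Number, bool -> Boolean, None -> NULL,
# list -> List, dict -> Map, set -> String/Number/Binary Set.
table.put_item(
    Item={
        "user_id": "u1",                                 # String (partition key, assumed)
        "age": Decimal("30"),                            # Number
        "active": True,                                  # Boolean
        "nickname": None,                                # NULL
        "address": {"city": "Berlin", "zip": "10115"},   # Map
        "favorites": ["books", "hiking"],                # List
        "tags": {"admin", "beta"},                       # String Set
    }
)
```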
What options does DynamoDB offer as Primary Keys?
Option 1:
Partition Key Only (HASH):
Partition Key must be unique for each item
Partition Key must be as diverse as possible to distribute the data
Option 2:
Partition Key + Sort Key:
The combination of the two must be unique
Data is grouped by partition key
Within a partition key, data is sorted by the sort key
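A boto3 sketch of creating a table with a composite primary key (Option 2); the table and attribute names ("Orders", "user_id", "created_at") are hypothetical:

```python
import boto3

dynamodb = boto3.resource("dynamodb")

# "user_id" is the partition (HASH) key, "created_at" is the sort (RANGE) key.
table = dynamodb.create_table(
    TableName="Orders",
    KeySchema=[
        {"AttributeName": "user_id", "KeyType": "HASH"},
        {"AttributeName": "created_at", "KeyType": "RANGE"},
    ],
    AttributeDefinitions=[
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "created_at", "AttributeType": "S"},
    ],
    BillingMode="PAY_PER_REQUEST",  # on-demand mode, so no RCUs/WCUs to declare
)
table.wait_until_exists()
```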
What are some features of DynamoDB's Provisioned Throughput?
Tables must have provisioned RCUs and WCUs
An option exists to auto-scale throughput on demand
Throughput can be exceeded temporarily with “burst credit”
However, after all “burst credits” are used up, a ProvisionedThroughputExceededException is returned
It’s then advised to use an exponential backoff retry strategy
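A sketch, assuming a boto3 setup, of declaring provisioned RCUs/WCUs at creation time and adjusting them later; the table name and capacity numbers are placeholders:

```python
import boto3

dynamodb = boto3.resource("dynamodb")

# Provisioned mode: read and write capacity are declared up front.
table = dynamodb.create_table(
    TableName="Sessions",
    KeySchema=[{"AttributeName": "session_id", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "session_id", "AttributeType": "S"}],
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
)
table.wait_until_exists()

# Throughput can later be adjusted without recreating the table.
table.update(ProvisionedThroughput={"ReadCapacityUnits": 20, "WriteCapacityUnits": 10})
```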
What is the formula of WCU?
One WCU corresponds to 1 write per second for an item up to 1 KB in size
10 objects per second, 2 KB each => 10*2=20 WCUs
6 objects per second, 4.5 KB each => 6*ceil(4.5)=6*5=30 WCUs
120 objects per minute, 2 KB each => (120/60)*2=4 WCUs
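A small Python helper that mirrors the WCU arithmetic above (the item size is rounded up to the next 1 KB per write):

```python
import math

def wcus(writes_per_second: float, item_size_kb: float) -> int:
    """One WCU = one write per second of an item up to 1 KB."""
    return math.ceil(writes_per_second * math.ceil(item_size_kb))

print(wcus(10, 2))        # 20
print(wcus(6, 4.5))       # 30
print(wcus(120 / 60, 2))  # 4
```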
What types of Reads does DynamoDB offer?
Eventually Consistent Read:
If we read just after a write, we could get stale data because replication has not completed yet
Strongly Consistent Read:
If we read just after a write, we will get the correct data
Default:
Eventually Consistent Read
but
GetItem, Query and Scan provide a ConsistentRead parameter that can be set to True
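A boto3 sketch of opting into a strongly consistent GetItem; the table and key names are assumed from the earlier "Orders" example:

```python
import boto3

table = boto3.resource("dynamodb").Table("Orders")  # hypothetical table

# Default GetItem is eventually consistent; ConsistentRead=True costs more RCU
# but never returns stale data.
response = table.get_item(
    Key={"user_id": "u1", "created_at": "2024-01-01T00:00:00Z"},
    ConsistentRead=True,
)
item = response.get("Item")
```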
What is the formula for RCUs?
Depends on read option
One RCU equals:
2 Eventually Consistent Reads per second for an item up to 4 KB in size
1 Strongly Consistent Read per second for an item up to 4 KB in size
10 Strongly Consistent Reads per second, items of 4 KB => 10*(4/4)=10 RCUs
16 Eventually Consistent Reads per second, items of 12 KB => (16/2)*(12/4)=24 RCUs
10 Strongly Consistent Reads per second, items of 6 KB => 10*ceil(6/4)=20 RCUs
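A small Python helper mirroring the RCU arithmetic above (item size rounds up to the next 4 KB; eventually consistent reads count half):

```python
import math

def rcus(reads_per_second: float, item_size_kb: float, strongly_consistent: bool) -> int:
    """One RCU = 1 strongly consistent (or 2 eventually consistent) reads/second
    of an item up to 4 KB."""
    per_read = math.ceil(item_size_kb / 4)
    rcu = reads_per_second * per_read
    return math.ceil(rcu if strongly_consistent else rcu / 2)

print(rcus(10, 4, True))    # 10
print(rcus(16, 12, False))  # 24
print(rcus(10, 6, True))    # 20
```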
Is DynamoDB data divided into partitions?
Yes.
Partition keys go through a hashing algorithm to determine which partition they belong to
WCUs and RCUs are spread evenly across partitions
To compute the number of partitions:
by capacity: (Total RCU / 3000) + (Total WCU / 1000)
by size: Total Size / 10 GB
Total partitions = CEIL(MAX(by capacity, by size))
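The same rule of thumb expressed as a small Python helper (3000 RCU, 1000 WCU, and 10 GB are the commonly quoted per-partition limits):

```python
import math

def partitions(total_rcu: float, total_wcu: float, total_size_gb: float) -> int:
    """Rule-of-thumb estimate of the number of DynamoDB partitions."""
    by_capacity = total_rcu / 3000 + total_wcu / 1000
    by_size = total_size_gb / 10
    return math.ceil(max(by_capacity, by_size))

print(partitions(total_rcu=6000, total_wcu=1000, total_size_gb=25))  # 3
```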
What is Throttling in DynamoDB?
ProvisionedThroughputExceededException is received if RCUs or WCUs are exceeded
Reasons:
Hot Keys: One partition key is being read too many times
Hot Partitions: one partition receives a disproportionate share of the traffic
Very large items
Solutions:
Exponential backoff if exception is encountered (already in SDK)
Distribute partition keys as much as possible
If it is an RCU issue, we can use DynamoDB Accelerator (DAX)
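A minimal sketch of the exponential backoff retry; the AWS SDKs already retry throttled requests like this internally, so hand-rolling it is rarely needed. The table and key are placeholders:

```python
import time
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("Orders")  # hypothetical table

def get_with_backoff(key, max_retries=5):
    """Retry a throttled GetItem with exponentially growing waits."""
    for attempt in range(max_retries):
        try:
            return table.get_item(Key=key).get("Item")
        except ClientError as err:
            if err.response["Error"]["Code"] != "ProvisionedThroughputExceededException":
                raise
            time.sleep(2 ** attempt * 0.1)  # 0.1s, 0.2s, 0.4s, ...
    raise RuntimeError("still throttled after retries")
```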
What ways can you write data to DynamoDB?
PutItem: Consumes WCU - create data or full replace
UpdateItem: partial update of attributes - can be used to create and increase Atomic Counters
Conditional Writes: in a distributed system, multiple clients can write the same item at the same time - the write or update only succeeds if it fulfills the specified condition - no performance impact - helps with concurrent access to items
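A boto3 sketch of the three write styles above; the table ("Products"), its partition key ("product_id"), and the attributes are made up for the example:

```python
import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource("dynamodb").Table("Products")  # hypothetical table

# PutItem: creates the item or fully replaces an existing one.
table.put_item(Item={"product_id": "p1", "price": 10, "version": 1})

# UpdateItem: partial update; ADD on a numeric attribute acts as an atomic counter.
table.update_item(
    Key={"product_id": "p1"},
    UpdateExpression="SET price = :p ADD purchases :inc",
    ExpressionAttributeValues={":p": 12, ":inc": 1},
)

# Conditional write: only succeeds if the stored version is still 1.
table.put_item(
    Item={"product_id": "p1", "price": 15, "version": 2},
    ConditionExpression=Attr("version").eq(1),
)
```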
What ways can you delete data in DynamoDB?
DeleteItem: delete an individual item (row) - ability to perform a conditional delete
DeleteTable: delete a whole table and all its items - quicker deletion than calling DeleteItem on all items
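A boto3 sketch of a conditional DeleteItem followed by a DeleteTable; the table, key, and condition attribute are hypothetical:

```python
import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource("dynamodb").Table("Products")  # hypothetical table

# Conditional delete: the item is removed only if it is already discontinued.
table.delete_item(
    Key={"product_id": "p1"},
    ConditionExpression=Attr("discontinued").eq(True),
)

# DeleteTable: drops the whole table and every item in it.
table.delete()
```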
What way can you batch-write data to DynamoDB?
BatchWriteItem: up to 25 PutItem or DeleteItem operations in one call - up to 16 MB of data written - up to 400 KB of data per item
Batching reduces latency by reducing the number of API calls made against DynamoDB
Operations are done in parallel for better efficiency
In case part of the batch fails, we have to retry the failed items using an exponential backoff algorithm - this retry is up to the caller to perform
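A low-level boto3 sketch of BatchWriteItem with the caller-side retry of UnprocessedItems; the table name and items are placeholders:

```python
import time
import boto3

client = boto3.client("dynamodb")

request_items = {
    "Products": [
        {"PutRequest": {"Item": {"product_id": {"S": "p1"}, "price": {"N": "10"}}}},
        {"DeleteRequest": {"Key": {"product_id": {"S": "p2"}}}},
    ]
}

attempt = 0
while request_items:
    response = client.batch_write_item(RequestItems=request_items)
    # Anything DynamoDB could not process comes back and must be retried by us.
    request_items = response.get("UnprocessedItems", {})
    if request_items:
        time.sleep(2 ** attempt * 0.1)  # exponential backoff between retries
        attempt += 1
```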
How do you read data from a DynamoDB table?
GetItem: read based on the primary key - primary key = HASH or HASH-RANGE - eventually consistent read by default - option to use a strongly consistent read, which might take longer and consumes more RCU - ProjectionExpression can be specified to retrieve only specific attributes
BatchGetItem: up to 100 items - up to 16 MB of data - done in parallel to minimize latency
Query: returns items based on the partition key (must use the ‘=’ operator) - optional sort key condition (=, >, >=, <, <=, Between, Begins_with) - FilterExpression to further filter the results (applied after the query, before the results are returned) - returns up to 1 MB of data - can use Limit - can query a table, a Local Secondary Index, or a Global Secondary Index - supports pagination of results
Scan: scans the entire table, then filters - returns up to 1 MB of data per page - use pagination to keep reading - consumes a lot of RCU - can use Limit - for better performance use parallel scans (more RCUs / more throughput; multiple workers scan multiple segments at the same time) - can use ProjectionExpression and FilterExpression
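A boto3 sketch of a Query (partition key '=', sort key begins_with, FilterExpression, ProjectionExpression, Limit) and a paginated Scan; the table, key, and attribute names are assumed from the earlier "Orders" example:

```python
import boto3
from boto3.dynamodb.conditions import Key, Attr

table = boto3.resource("dynamodb").Table("Orders")  # hypothetical table

# Query: partition key must use '='; the sort key condition is optional.
response = table.query(
    KeyConditionExpression=Key("user_id").eq("u1") & Key("created_at").begins_with("2024-"),
    FilterExpression=Attr("status").eq("SHIPPED"),
    ProjectionExpression="order_total, created_at",
    Limit=50,
)
orders = response["Items"]

# Scan: reads the whole table; paginate with LastEvaluatedKey.
items, start_key = [], None
while True:
    kwargs = {"ExclusiveStartKey": start_key} if start_key else {}
    page = table.scan(ProjectionExpression="user_id, order_total", **kwargs)
    items.extend(page["Items"])
    start_key = page.get("LastEvaluatedKey")
    if not start_key:
        break
```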