DynamoDB Flashcards
What are some NoSQL characteristics?
NoSQL dbs are distributed
NoSQL dbs do NOT support join
NoSQL dbs do not perform aggregations such as sum
NoSQL dbs scale horizontally
What are some nice features about DynamoDB?
Fully managed NoSQL dbms highly available with replications across 3 AZs
Distributed database
Scales to massive workloads
Millions of requests per second, trillions of rows, 100s of TB of storage
Fast and consistent in performance (low latency retrieval)
Integrated with IAM for security, authorization, administration
Enables event-driven programming with Dynamo DB Streams
Low cost and auto-scaling capabilities
What are some Dynamo DB Table properties?
Each table has a primary key that must be chosen at creation time
Each table can have an infinite number of items aka rows
Each item has attributes that are added over time and can be null
Maximum size of an item is 400 KB
What data types can a Dynamo DB item attribute have?
Scalar types: String, Number, Binary, Boolean, NULL
Document Type: List, Map
Set Types: String Set, Number Set, Binary Set
What options does Dynamo DB offer as Primary Keys?
Option 1:
Partition Key Only (HASH):
Partition Key must be unique for each item
Partition Key must be as diverse as possible to distribute the data
Option 2:
Partition Key + Sort Key:
The combination of the two must be unique
Data is grouped by partition key
Data is sorted after the partition key by the sort key
What are some features of DynamoDB's Provisioned Throughput?
Tables must have provisioned RCUs and WCUs
An option exists to auto-scale throughput on demand
Throughput can be exceeded temporarily with “burst credit”
However, after all “burst credit” is used up, a ProvisionedThroughputExceededException is returned
it is then advised to use an exponential backoff retry strategy
What is the formula of WCU?
One WCU corresponds to 1 write per second for an item up to 1 KB in size (item sizes are rounded up to the next whole KB)
10 objects per second, 2 KB each => 10*2 = 20 WCUs
6 objects per second, 4.5 KB each => 6*5 = 30 WCUs
120 objects per minute, 2 KB each => (120/60)*2 = 4 WCUs
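The WCU arithmetic above can be captured in a small helper (a sketch; the function name is mine):

```python
import math

def wcu(writes_per_second: float, item_size_kb: float) -> int:
    # One WCU = 1 write per second for an item up to 1 KB;
    # item sizes are rounded up to the next whole KB.
    return math.ceil(writes_per_second) * math.ceil(item_size_kb)

print(wcu(10, 2))        # 20 WCUs
print(wcu(6, 4.5))       # 30 WCUs
print(wcu(120 / 60, 2))  # 4 WCUs
```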
What types of Reads does Dynamo DB offer?
Eventually Consistent Read:
If we read just after a write, we could get an unexpected response due to replication
Strongly Consistent Read:
If we read just after a write, we will get the correct data
Default:
Eventually Consistent Read
but
GetItem, Query and Scan provide a ConsistentRead parameter that can be set to True
What is the formula for RCUs?
Depends on read option
One RCU equals:
2 Eventually Consistent Reads per second for an item up to 4 KB in size
1 Strongly Consistent Read per second for an item up to 4 KB in size
10 Strongly Consistent Reads per second, 4 KB items => 10*(4/4) = 10 RCUs
16 Eventually Consistent Reads per second, 12 KB items => (16/2)*(12/4) = 24 RCUs
10 Strongly Consistent Reads per second, 6 KB items => 10*ceil(6/4) = 20 RCUs
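The same rounding rules as for WCUs apply, sketched here as a helper (the function name is mine):

```python
import math

def rcu(reads_per_second: int, item_size_kb: float, strongly_consistent: bool) -> int:
    # One RCU = 1 strongly consistent read/s (or 2 eventually consistent reads/s)
    # for an item up to 4 KB; item sizes are rounded up to the next 4 KB block.
    size_units = math.ceil(item_size_kb / 4)
    effective_reads = reads_per_second if strongly_consistent else math.ceil(reads_per_second / 2)
    return effective_reads * size_units

print(rcu(10, 4, True))    # 10 RCUs
print(rcu(16, 12, False))  # 24 RCUs
print(rcu(10, 6, True))    # 20 RCUs
```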
Is Dynamo DB data divided into partitions?
Yes.
Partition Keys go through a hashing algorithm to determine which partition they belong to
WCUs and RCUs are spread evenly across partitions
To compute the number of partitions:
by capacity: (Total RCU / 3000) + (Total WCU / 1000)
by size: Total Size / 10 GB
Total partitions = CEIL(MAX(by capacity, by size))
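A sketch of the partition estimate, assuming the commonly cited per-partition limits of 3000 RCUs, 1000 WCUs, and 10 GB (the function name is mine):

```python
import math

def partition_count(total_rcu: int, total_wcu: int, total_size_gb: float) -> int:
    # Estimate partitions two ways and take the larger requirement.
    by_capacity = total_rcu / 3000 + total_wcu / 1000  # per-partition throughput limits
    by_size = total_size_gb / 10                       # ~10 GB per partition
    return math.ceil(max(by_capacity, by_size))

print(partition_count(6000, 1000, 8))   # capacity-bound: 3 partitions
print(partition_count(1000, 500, 55))   # size-bound: 6 partitions
```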
What is Throttling in Dynamo DB?
ProvisionedThroughputExceededException is received if RCUs or WCUs are exceeded
Reasons:
Hot Keys: one partition key is being read too many times
Hot Partitions: reads/writes are unevenly distributed across partitions
Very large items
Solutions:
Exponential backoff if exception is encountered (already in SDK)
Distribute partition keys as much as possible
If it is an RCU issue, we can use DynamoDB Accelerator (DAX)
What ways can you write data to Dynamo DB?
PutItem: Consumes WCU - create data or full replace
UpdateItem: partial update of attributes - can be used to implement Atomic Counters (a numeric attribute that is unconditionally increased)
Conditional Writes: in a distributed system several clients can write the same item at the same time - a write/update is accepted only if a specified condition is fulfilled - no performance impact - helps with concurrent access to items
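The three write styles above can be sketched as request parameters for a boto3 DynamoDB Table resource (table and attribute names are made up for illustration; the calls themselves are not executed here):

```python
# PutItem: create an item or fully replace an existing one (consumes WCUs)
put_kwargs = {
    "Item": {"user_id": "u1", "name": "Alice", "visits": 0},
}

# UpdateItem with an atomic counter: increment "visits" without read-modify-write
update_kwargs = {
    "Key": {"user_id": "u1"},
    "UpdateExpression": "SET visits = visits + :inc",
    "ExpressionAttributeValues": {":inc": 1},
}

# Conditional write: only create the item if it does not already exist
conditional_put_kwargs = {
    "Item": {"user_id": "u1", "name": "Alice"},
    "ConditionExpression": "attribute_not_exists(user_id)",
}

# Usage sketch: table.put_item(**put_kwargs); table.update_item(**update_kwargs)
```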
What ways can you delete data in Dynamo DB?
DeleteItem: delete an individual item - ability to perform a conditional delete
DeleteTable: delete a whole table and all its items - quicker deletion than calling DeleteItem on all items
What way can you batch-write data to DynamoDB?
BatchWriteItem: up to 25 PutItem or DeleteItem requests in one call - up to 16 MB of data written - up to 400 KB of data per item
Batching reduces latency by reducing the number of API calls made against DynamoDB
Operations are done in parallel for better efficiency
In case a part of the batch fails, the failed items have to be retried using an exponential backoff algorithm - the retry is up to the caller
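Because of the 25-request limit, larger workloads must be chunked client-side; a minimal sketch (the helper name is mine):

```python
def chunk_for_batch_write(items: list, batch_size: int = 25) -> list:
    # BatchWriteItem accepts at most 25 put/delete requests per call,
    # so we split the work into batch-sized chunks.
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

batches = chunk_for_batch_write(list(range(60)))
print([len(b) for b in batches])  # [25, 25, 10]
```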
How to read data from a Dynamo DB table?
GetItem: read based on primary key - primary key = HASH or HASH-RANGE - by default an eventually consistent read - option to use a strongly consistent read, which might take longer and consumes more RCUs - ProjectionExpression can be specified to include only specific attributes
BatchGetItem: up to 100 items - up to 16 mb of data - done in parallel to minimize latency
Query: returns items based on partition key (must use the ‘=’ operator) - optional sort key condition (=, <, <=, >, >=, Between, Begins With) - FilterExpression to further filter the results (applied after the query) - returns up to 1 MB of data - can use Limit - can query a table, a Local Secondary Index, or a Global Secondary Index - supports pagination
Scan: scans the entire table, then filters - returns up to 1 MB of data - use pagination to keep on reading - consumes a lot of RCUs - can use Limit - for better performance use parallel scans (more RCUs, more throughput; multiple machines scan multiple partitions) - can use ProjectionExpression and FilterExpression
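A sketch of low-level Query parameters (table, key, and attribute names are made up; values use the typed wire format of the DynamoDB client):

```python
query_kwargs = {
    "TableName": "Orders",
    # Partition key must use '='; sort key may use a range condition.
    "KeyConditionExpression": "user_id = :uid AND order_date BETWEEN :lo AND :hi",
    "FilterExpression": "price > :min_price",  # filtering applied after the query
    "ProjectionExpression": "order_date, price",
    "ExpressionAttributeValues": {
        ":uid": {"S": "user#1"},
        ":lo": {"S": "2024-01-01"},
        ":hi": {"S": "2024-12-31"},
        ":min_price": {"N": "10"},
    },
    "Limit": 100,
}
# Usage sketch (not executed here): boto3.client("dynamodb").query(**query_kwargs)
```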
What is a Local Secondary Index in Dynamo DB?
Alternate sort (range) key for a table - local to the partition (hash) key
up to five LSIs per table
the sort key consists of one scalar value
the attribute chosen has to be a scalar String, Number, or Binary
LSI must be defined at creation time
What is a Global Secondary Index?
To speed up queries on non-key attributes use GSI
GSI = partition key + optional sort key
The index is a “new” table and we can project attributes on it
Partition and sort key of the original table are always projected (KEYS_ONLY)
Can specify extra attributes to project (INCLUDE)
Can use all attributes from the original table (ALL)
Must define RCU/WCU for GSI
Unlike an LSI, a GSI can be created or modified after table creation
What is the connection between Indexes and Throttling?
If the writes to the GSI are throttled, then the main table will be throttled as well, even though its WCUs are fine => choose the GSI partition key and WCUs carefully
LSI uses the same WCU and RCU as the main table
What is the Dynamo DB Concurrency model?
Dynamo DB has conditional updates/deletes
You can ensure an item hasn’t changed before altering it
that makes Dynamo DB an optimistic locking/concurrency database
Each item can carry a version attribute; if the version changed since it was read (e.g. was increased by another writer), the update is denied
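The version-check pattern can be simulated in plain Python (a sketch; a real implementation would express the check as a ConditionExpression on UpdateItem):

```python
def conditional_update(item: dict, new_values: dict, expected_version: int) -> bool:
    # Optimistic locking: apply the update only if the item's version still
    # matches the version this writer originally read.
    if item.get("version") != expected_version:
        return False  # another writer got there first: update denied
    item.update(new_values)
    item["version"] = expected_version + 1
    return True

item = {"version": 1, "name": "a"}
print(conditional_update(item, {"name": "b"}, expected_version=1))  # True
print(conditional_update(item, {"name": "c"}, expected_version=1))  # False (stale version)
```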
What is Dynamo DB DAX?
Seamless caching for Dynamo DB - no application re-write
Writes go through DAX to Dynamo DB
Microsecond latency for cached reads & queries
Solves the Hot Key problem (too many reads) - Prevents Throttling
5 minutes TTL of cached content (default)
up to 10 nodes in the DAX cluster
Multi AZ
Secure (Encryption at rest with KMS, VPC, IAM, CloudTrail)
DAX vs. ElastiCache
DAX:
Individual object cache as well as query & scan caches - very quick
ElastiCache:
Is not better than DAX at what DAX does,
but good for storing aggregation results
What is Dynamo DB Streams?
Changes (create, update, delete) can end up in a Dynamo DB stream
This stream can be read by Lambda or EC2, as triggers, and then could:
react to changes in realtime (emails) - Analytics - Create derivative tables/views - Insert into ElasticSearch
Streams can implement cross-region replication
Streams have 24 hours of data retention
Writes logs to CloudWatch Log Groups
How does Dynamo DB Streams work?
Choose what will be written to the stream whenever the data is modified:
KEYS_ONLY: only the key attributes of the modified item
NEW_IMAGE: the entire item after it was modified
OLD_IMAGE: The entire item before it was modified
NEW_AND_OLD_IMAGE: New and old images of the item
Dynamo DB Streams are made of shards, like Kinesis Data Streams
Records are not retroactively populated after enabling Streams
Unlike Kinesis Data Streams, shards are provisioned automatically by AWS
How to configure Streams with Lambda?
In lambda define an Event Source Mapping to read from a Dynamo DB Stream
The ESM is polling from the Dynamo DB Stream and Dynamo DB returns event batches to the ESM
After receiving a batch, lambda is invoked synchronously with the Event Batch
make sure Lambda has the required permissions (the AWSLambdaDynamoDBExecutionRole managed policy)
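A minimal sketch of the Lambda handler the ESM would invoke; the event shape follows the documented Streams record format, and the INSERT-only filtering is just an example:

```python
def handler(event, context):
    # Each invocation receives a batch of stream records; eventName is one of
    # INSERT, MODIFY, or REMOVE. Here we collect only newly inserted items.
    inserted = []
    for record in event["Records"]:
        if record["eventName"] == "INSERT":
            inserted.append(record["dynamodb"]["NewImage"])
    return inserted

sample_event = {"Records": [
    {"eventName": "INSERT", "dynamodb": {"NewImage": {"pk": {"S": "user#1"}}}},
    {"eventName": "REMOVE", "dynamodb": {"OldImage": {"pk": {"S": "user#2"}}}},
]}
print(handler(sample_event, None))  # [{'pk': {'S': 'user#1'}}]
```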
What is Dynamo DB TTL?
Time To Live
delete an item automatically after a specific date/time
No extra cost
no impact on WCU/RCU
Background task performed by DynamoDB
helps reduce storage and manage table size over time
helps with regulatory requirements
enabled per table: we define a TTL attribute, and each item stores its expiry as an epoch timestamp
expired items are typically deleted within 48 hours
items are also deleted in GSI and LSI
Streams can help recover deleted items
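Building such an item can be sketched as follows (function and attribute names are mine; the attribute name must match the one configured for TTL on the table):

```python
import time

def item_with_ttl(pk: str, days: int, ttl_attribute: str = "expire_at") -> dict:
    # The TTL attribute holds a plain epoch (Unix) timestamp; DynamoDB's
    # background task deletes the item some time after this moment passes.
    return {"pk": pk, ttl_attribute: int(time.time()) + days * 24 * 3600}

item = item_with_ttl("session#1", days=1)
# Usage sketch: table.put_item(Item=item)
```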
What are some good DynamoDB CLI commands to know?
--projection-expression: attributes to retrieve
--filter-expression: filter results (use together with --expression-attribute-values)
CLI pagination options:
Optimization:
--page-size: the full dataset is still retrieved, but each API call requests fewer items - avoids timeouts
Pagination:
--max-items: max number of results to return from the CLI; returns a NextToken
--starting-token: specify the last received NextToken to keep on reading
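The same pagination loop the CLI drives with --starting-token looks like this in code (a sketch: `scan_page` stands in for a client's scan call and is passed in so the loop can be shown without AWS access):

```python
def scan_all(scan_page, **kwargs):
    # Keep feeding LastEvaluatedKey back as ExclusiveStartKey
    # until the response no longer contains one.
    items, start_key = [], None
    while True:
        extra = {"ExclusiveStartKey": start_key} if start_key else {}
        page = scan_page(**kwargs, **extra)
        items.extend(page.get("Items", []))
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            return items
```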
What is DynamoDB Transactions?
Ability to create, update, delete multiple rows in multiple tables at once
All-or-nothing operation - either everything succeeds or nothing does
Transactional is a new write mode and a new read mode
Consumes 2x WCUs/RCUs
API names: TransactWriteItems/TransactGetItems
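A sketch of a TransactWriteItems request (table, key, and attribute names are made up; values use the typed wire format):

```python
transact_request = {
    "TransactItems": [
        # Two operations on two different tables, applied all-or-nothing.
        {"Put": {
            "TableName": "Accounts",
            "Item": {"account_id": {"S": "a1"}, "balance": {"N": "100"}},
        }},
        {"Update": {
            "TableName": "Ledger",
            "Key": {"entry_id": {"S": "e1"}},
            "UpdateExpression": "SET amount = :a",
            "ExpressionAttributeValues": {":a": {"N": "100"}},
        }},
    ]
}
# Usage sketch (not executed here):
# boto3.client("dynamodb").transact_write_items(**transact_request)
```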
How to save a Session State Cache in DynamoDB?
vs. ElastiCache:
EC is in-memory
DynamoDB is serverless (automatic scaling)
both are key-value stores
vs. EFS:
EFS must be attached to instances such as EC2 (not to Lambda)
vs EBS and Instance store:
these two only store data locally on the instance; it is not shared
vs. S3:
S3 is higher latency, and made for bigger objects
What is DynamoDB Write Sharding?
Add a random suffix to the partition key to scale the writes across many partitions (shards)
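A minimal sketch of the suffix trick (the function name and separator are mine):

```python
import random

def sharded_key(partition_key: str, shard_count: int = 10) -> str:
    # Appending a random suffix spreads writes for one hot key
    # across shard_count different partitions; readers must query
    # all suffixes and merge the results.
    return f"{partition_key}#{random.randrange(shard_count)}"

print(sharded_key("2024-06-01"))  # e.g. "2024-06-01#7"
```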
What are all DynamoDB write types?
Concurrent Writes:
First write will be overwritten by second write
Conditional Write:
The write is bound to a condition; after the first write, the second might be declined because the condition's state changed
Atomic Writes:
The request includes an increase-by/decrease-by operation on a numeric attribute
concurrent requests both succeed, and their results aggregate
Batch Writes:
Write many items at a time
How to use DynamoDB pattern with S3?
Large objects can be handled via S3:
first the object is uploaded to S3,
then its metadata (e.g. location, id, key) is written to DynamoDB
how to search for those files?
After an object is uploaded to S3, a Lambda function writes the metadata to DynamoDB, where it can then be queried
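The metadata item such a (hypothetical) Lambda could write might look like this (all field names are illustrative):

```python
def build_metadata_item(bucket: str, key: str, size_bytes: int) -> dict:
    # The large object itself stays in S3; DynamoDB stores only a
    # small, queryable pointer to it.
    return {
        "object_key": key,
        "s3_bucket": bucket,
        "s3_location": f"s3://{bucket}/{key}",
        "size_bytes": size_bytes,
    }

print(build_metadata_item("my-bucket", "photos/cat.jpg", 1024))
```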
What are DynamoDB Operations?
Table CleanUp:
Option 1: Scan + delete - slow, consumes many RCUs and WCUs
Option 2: Drop table, recreate table - fast, cheap, efficient
Copying a DynamoDB Table:
Option 1: Use AWS Data Pipeline & EMR
Option 2: Create a backup and restore the backup, can take some time
Option 3: Scan + Write => requires own code
DynamoDB Security and other features
Security:
VPC endpoints are available to access DynamoDB without internet
Access is controlled by IAM
Users identified through federation can obtain temporary credentials to access DynamoDB data; the attached IAM role can include a Condition with the LeadingKeys key to limit access by primary key, and Attributes to limit the retrievable columns
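A sketch of such a fine-grained policy as a config fragment (the table name is illustrative; the Cognito variable substitutes each federated user's own identity, so users can only touch items whose partition key matches it):

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["dynamodb:GetItem", "dynamodb:Query"],
    "Resource": "arn:aws:dynamodb:*:*:table/UserData",
    "Condition": {
      "ForAllValues:StringEquals": {
        "dynamodb:LeadingKeys": ["${cognito-identity.amazonaws.com:sub}"]
      }
    }
  }]
}
```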
Encryption at rest using KMS
Encryption in transit using SSL/TLS
Backup and restore features available:
Point-in-time restore, like RDS
no performance impact
Global Tables:
Multi-region, fully replicated, high performance
Migration:
AWS DMS can be used to migrate to DynamoDB from e.g. MongoDB
A local DynamoDB instance can be used for development