Database Specialty - DynamoDB Flashcards
Tool for Backup and restore
PITR
Terminology DynamoDB
Tables, Items, Attributes, Primary Keys, Local Secondary Indexes, Global Secondary Indexes
Data Types in DynamoDB
Scalar, Set, Document
Important points Read Consistency
Strong, Eventual, and Transactional
Points Write Consistency
Standard and Transactional
Modes Pricing Model
Provisioned and On-Demand Capacity
Types of caches in DAX
Item Cache and Query Cache
Scaling Options
Automatic, Provisioned, Global Replication, Burst Capacity, On-Demand Capacity
Amazon DynamoDB – Overview Points
- Non-relational Key-Value store
- Fully Managed, Serverless, NoSQL database in the cloud
- Fast, Flexible, Cost-effective, Fault Tolerant, Secure
- Multi-region, multi-master database (Global Tables)
- Backup and restore with PITR (Point-in-time Recovery)
- Single-digit millisecond performance at any scale
- In-memory caching with DAX (DynamoDB Accelerator, microsecond latency)
- Supports CRUD (Create/Read/Update/Delete) operations through APIs
- Supports transactions across multiple tables (ACID support)
- No direct analytical queries (No joins)
- Access patterns must be known ahead of time for efficient design and performance
DynamoDB Tables
- Tables are top-level entities
- No strict inter-table relationships (Independent Entities)
- You control performance at the table level
- Table items stored as JSON (DynamoDB-specific JSON)
- Primary keys are mandatory, rest of the schema is flexible
- Primary Key can be simple or composite
- Simple Key has a single attribute (=partition key or hash key)
- Composite Key has two attributes (=partition/hash key + sort/range key)
- Non-key attributes (including secondary key attributes) are optional
Data Types in DynamoDB
- Scalar Types
- Exactly one value
- e.g. string, number, binary, boolean, and null
- Keys or index attributes only support string, number and binary scalar types
- Set Types
- Multiple scalar values
- e.g. string set, number set and binary set
- Document Types
- Complex structure with nested attributes
- e.g. list and map
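The three type families map to DynamoDB's typed JSON format. A minimal sketch of one item in that format (attribute names like `user_id` and `address` are invented for illustration):

```python
# One item in DynamoDB's typed JSON, covering all three type families.
item = {
    # Scalar types: S (string), N (number, always sent as a string),
    # BOOL, and NULL
    "user_id": {"S": "u-1001"},
    "age": {"N": "34"},
    "active": {"BOOL": True},
    "nickname": {"NULL": True},
    # Set type: NS (number set) holds multiple unique scalar values
    "scores": {"NS": ["87", "92", "78"]},
    # Document types: M (map) and L (list) allow nested attributes
    "address": {"M": {
        "city": {"S": "Seattle"},
        "zips": {"L": [{"S": "98101"}, {"S": "98109"}]},
    }},
}

# Note: key and index attributes may only use the S, N, or B scalar types,
# so "active" or "scores" above could not serve as a key attribute.
```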
AWS Global Infrastructure
- Has multiple AWS Regions across the globe
- Each region has one or more AZs (Availability Zones)
- Each AZ has one or more facilities (= Data Centers)
- DynamoDB automatically replicates data between multiple facilities within the AWS region
- Near Real-time Replication
- AZs act as independent failure domains
DynamoDB Consistency
- Read Consistency: strong consistency, eventual consistency, and transactional
- Write Consistency: standard and transactional
- Strong Consistency
- The most up-to-date data
- Must be requested explicitly
- Eventual Consistency
- May or may not reflect the latest copy of data
- Default consistency for all operations
- 50% cheaper than strong consistency
- Transactional Reads and Writes
- For ACID support across one or more tables within a single AWS account and region
- 2x the cost of strongly consistent reads
- 2x the cost of standard writes
Strongly Consistent Read vs Eventually Consistent Read
- Eventually Consistent Read: If we read just after a write, it's possible we'll get an unexpected response because of replication
- Strongly Consistent Read: If we read just after a write, we will get the correct data
- By default, DynamoDB uses Eventually Consistent Reads, but GetItem, Query & Scan provide a "ConsistentRead" parameter you can set to True
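The "ConsistentRead" flag looks like this in practice. A sketch of the request parameters for a strongly consistent GetItem (the table and key names are invented; with boto3 you would pass this dict to `client.get_item(**request)`, which requires AWS credentials):

```python
# Request parameters for a strongly consistent GetItem.
request = {
    "TableName": "GameStates",           # hypothetical table
    "Key": {
        "user_id": {"S": "u-1001"},      # partition key
        "game_id": {"S": "g-42"},        # sort key
    },
    # Defaults to False (eventually consistent); set True to force a
    # strongly consistent read. Query and Scan accept the same flag.
    "ConsistentRead": True,
}
```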
DynamoDB Pricing Model - Provisioned Capacity
- You pay for the capacity you provision (= number of reads and writes per second)
- You can use auto-scaling to adjust the provisioned capacity
- Uses Capacity Units: Read Capacity Units (RCUs) and Write Capacity Units (WCUs)
- Consumption beyond provisioned capacity may result in throttling
- Use Reserved Capacity for discounts over 1 or 3-year term contracts (you're charged a one-time fee + an hourly fee per 100 RCUs and WCUs)
DynamoDB Pricing Model - On-Demand Capacity
- You pay per request (= number of read and write requests your application makes)
- No need to provision capacity units
- DynamoDB instantly accommodates your workloads as they ramp up or down
- Uses Request Units: Read Request Units and Write Request Units
- Cannot use reserved capacity with On-Demand mode
DynamoDB Throughput - Provisioned Capacity mode
- Uses Capacity Units
- 1 capacity unit = 1 request/sec
- RCUs (Read Capacity Units)
- In blocks of 4KB, last block always rounded up
- 1 strongly consistent table read/sec = 1 RCU
- 2 eventually consistent table reads/sec = 1 RCU
- 1 transactional read/sec = 2 RCUs
- WCUs (Write Capacity Units)
- In blocks of 1KB, last block always rounded up
- 1 table write/sec = 1 WCU
- 1 transactional write/sec = 2 WCUs
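The block sizes and multipliers above can be captured in a small helper for working exam-style sizing questions (a sketch; the rounding of eventually consistent reads follows the worked examples later in these notes):

```python
import math

def read_capacity_units(item_kb: float, mode: str = "strong") -> int:
    """RCUs consumed for one read/sec of an item of the given size.
    Reads are billed in 4KB blocks, last block rounded up."""
    blocks = math.ceil(item_kb / 4)
    if mode == "strong":
        return blocks                  # 1 strongly consistent read/sec = 1 RCU per block
    if mode == "eventual":
        return math.ceil(blocks / 2)   # 2 eventually consistent reads/sec = 1 RCU
    if mode == "transactional":
        return blocks * 2              # 1 transactional read/sec = 2 RCUs per block
    raise ValueError(f"unknown mode: {mode}")

def write_capacity_units(item_kb: float, transactional: bool = False) -> int:
    """WCUs consumed for one write/sec. Writes are billed in 1KB blocks,
    last block rounded up; transactional writes cost double."""
    blocks = math.ceil(item_kb)
    return blocks * 2 if transactional else blocks

print(read_capacity_units(15))                       # 4 (15KB/4KB = 3.75, rounded up)
print(write_capacity_units(15, transactional=True))  # 30
```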
DynamoDB Throughput - On-Demand Capacity mode
- Uses Request Units
- Same as Capacity Units for calculation purposes
- Read Request Units
- In blocks of 4KB, last block always rounded up
- 1 strongly consistent table read request = 1 RRU
- 2 eventually consistent table read requests = 1 RRU
- 1 transactional read request = 2 RRUs
- Write Request Units
- In blocks of 1KB, last block always rounded up
- 1 table write request = 1 WRU
- 1 transactional write request = 2 WRUs
Provisioned Capacity - Points
- Typically used in production environment
- Use this when you have predictable traffic
- Consider using reserved capacity if you have steady and predictable traffic for cost savings
- Can result in throttling when consumption shoots up (use auto-scaling)
- Tends to be cost-effective as compared to the on-demand capacity mode
On-Demand Capacity Mode
- Typically used in dev/test environments or for small applications
- Use this when you have variable, unpredictable traffic
- Instantly accommodates up to 2x the previous peak traffic on a table
- Throttling can occur if you exceed 2x the previous peak within 30 minutes
- Recommended to space traffic growth over at least 30 mins before driving more than 2x
Example 1: Calculating Capacity Units
Calculate capacity units to read and write a 15KB item
- RCUs with strong consistency:
- 15KB/4KB = 3.75 => rounded up => 4 RCUs
- RCUs with eventual consistency:
- (1/2) x 4 RCUs = 2 RCUs
- RCUs for transactional read:
- 2 x 4 RCUs = 8 RCUs
- WCUs:
- 15KB/1KB = 15 WCUs
- WCUs for transactional write:
- 2 x 15 WCUs = 30 WCUs
Example 2: Calculating Capacity Units
Calculate capacity units to read and write a 1.5KB item
- RCUs with strong consistency:
- 1.5KB/4KB = 0.375 => rounded up => 1 RCU
- RCUs with eventual consistency:
- (1/2) x 1 RCU = 0.5 => rounded up => 1 RCU
- RCUs for transactional read:
- 2 x 1 RCU = 2 RCUs
- WCUs:
- 1.5KB/1KB = 1.5 => rounded up => 2 WCUs
- WCUs for transactional write:
- 2 x 2 WCUs = 4 WCUs
Example 3: Calculating Throughput
A DynamoDB table has provisioned capacity of 10
RCUs and 10 WCUs. Calculate the throughput that
your application can support:
- Read throughput with strong consistency = 4KB x 10 = 40KB/sec
- Read throughput (eventual) = 2 x (40KB/sec) = 80KB/sec
- Transactional read throughput = (1/2) x (40KB/sec) = 20KB/sec
- Write throughput = 1KB x 10 = 10KB/sec
- Transactional write throughput = (1/2) x (10KB/sec) = 5KB/sec
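Example 3's arithmetic generalizes to a small helper (a sketch mirroring the rules above, not an AWS API):

```python
def read_throughput_kb_per_sec(rcus: int, mode: str = "strong") -> float:
    """Maximum read throughput a table with the given RCUs supports."""
    strong = 4 * rcus          # 1 RCU = one 4KB strongly consistent read/sec
    if mode == "strong":
        return strong
    if mode == "eventual":
        return 2 * strong      # eventually consistent reads cost half an RCU
    if mode == "transactional":
        return strong / 2      # transactional reads cost 2 RCUs
    raise ValueError(f"unknown mode: {mode}")

def write_throughput_kb_per_sec(wcus: int, transactional: bool = False) -> float:
    """Maximum write throughput; 1 WCU = one 1KB write/sec."""
    return wcus / 2 if transactional else float(wcus)
```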
DynamoDB Burst Capacity
- Provides for occasional bursts or spikes
- 5 minutes (300 seconds) of unused read and write capacity
- Can get consumed quickly
- Must not be relied upon
DynamoDB Adaptive Capacity
- For non-uniform workloads: lets a hot partition consume capacity left unused by other partitions
- Example: total provisioned capacity = 600 WCUs/sec; provisioned capacity per partition = 200 WCUs/sec; unused capacity = 200 WCUs/sec
- The hot partition can consume these unused 200 WCUs/sec above its allocated capacity
- Consumption beyond this results in throttling
- Works automatically and is applied in real time
- No guarantees
DynamoDB LSI (Local Secondary Index)
- Can define up to 5 LSIs
- Has the same partition/hash key attribute as the primary index of the table
- Has a different sort/range key than the primary index of the table
- Must have a sort/range key (=composite key)
- Indexed items must be ≤ 10 GB
- Can only be created at the time of creating the table and cannot be deleted later
DynamoDB GSI (Global Secondary Index)
- Can define up to 20 GSIs (soft limit)
- Can have the same or a different partition/hash key than the table's primary index
- Can have the same or a different sort/range key than the table's primary index
- Can omit the sort/range key (=simple or composite)
- No size restrictions for indexed items
- Can be created or deleted at any time; can delete only one GSI at a time
- Can query across partitions (over the entire table)
- Supports only eventual consistency
- Has its own provisioned throughput
- Can only query projected attributes (attributes included in the index)
When to choose which index? Local Secondary Indexes
- When the application needs the same partition key as the table
- When you need to avoid additional costs
- When the application needs strongly consistent index reads
When to choose which index? Global Secondary Indexes
- When the application needs a different (or the same) partition key as the table
- When the application needs finer throughput control
- When the application only needs eventually consistent index reads
DynamoDB Indexes and Throttling - Local Secondary Indexes
- Uses the WCUs and RCUs of the main table
- No special throttling considerations
DynamoDB Indexes and Throttling - Global Secondary Indexes
- If writes are throttled on the GSI, then the main table will be throttled! (even if the WCUs on the main table are fine)
- Choose your GSI partition key carefully!
- Assign your WCU capacity carefully!
Simple design patterns with DynamoDB
- You can model different entity relationships like 1:1, 1:N, N:M
- Store players’ game states – 1:1 modeling, 1:N modeling
- user_id as PK, game_id as SK (1:N modeling)
- Players’ gaming history – 1:N modeling
- user_id as PK, game_ts as SK (1:N modeling)
- Gaming leaderboard – N:M modeling
- GSI with game_id as PK and score as SK
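The leaderboard pattern above can be sketched as a Query against that GSI. The table name, index name, and game id below are invented for illustration; with boto3 this dict would be passed to `client.query(**request)` (which requires AWS credentials), so only the request shape is shown:

```python
# Top-10 leaderboard query against a hypothetical GSI with game_id as
# the partition key and score as the sort key.
request = {
    "TableName": "GameStates",              # hypothetical table name
    "IndexName": "game_id-score-index",     # hypothetical GSI name
    "KeyConditionExpression": "game_id = :g",
    "ExpressionAttributeValues": {":g": {"S": "g-42"}},
    "ScanIndexForward": False,  # descending by sort key => highest scores first
    "Limit": 10,                # top-10 leaderboard
}
```

Because items in a GSI are sorted by the index's sort key, `ScanIndexForward: False` returns the highest scores first without any client-side sorting.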
DynamoDB Write Sharding
- Imagine we have a voting application with two candidates, candidate A and candidate B
- If we use a partition key of candidate_id, we will run into partition issues, as we only have two partition key values
- Solution: add a suffix to the partition key (usually a random suffix, sometimes a calculated suffix)
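A minimal sketch of both suffix strategies (the shard count, key names, and `candidate_a` value are assumptions for illustration):

```python
import random
import zlib

NUM_SHARDS = 10  # shard count chosen for illustration

def random_sharded_key(candidate_id: str) -> str:
    """Random suffix: spreads writes for one hot key across NUM_SHARDS
    partition-key values, e.g. 'candidate_a#7'."""
    return f"{candidate_id}#{random.randrange(NUM_SHARDS)}"

def calculated_sharded_key(candidate_id: str, voter_id: str) -> str:
    """Calculated suffix: deterministic per voter, so the item can be
    read back later without querying every shard."""
    suffix = zlib.crc32(voter_id.encode()) % NUM_SHARDS
    return f"{candidate_id}#{suffix}"

# Totalling a candidate's votes means querying all NUM_SHARDS key values
# (candidate_a#0 .. candidate_a#9) and summing the results.
```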
Error and Exceptions in DynamoDB
- Common Exceptions
- Access Denied Exception
- Conditional Check Failed Exception
- Item Collection Size Limit Exceeded Exception
- Limit Exceeded Exception
- Resource In Use Exception
- Validation Exception
- Provisioned Throughput Exceeded Exception
- Error Retries
- Exponential Backoff
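The retry pattern the card refers to, exponential backoff with jitter, is applied automatically by the AWS SDKs; a minimal hand-rolled sketch of the idea (the function and parameter names are invented):

```python
import random
import time

def call_with_backoff(operation, max_attempts=5, base_delay=0.05,
                      retryable=(Exception,)):
    """Retry `operation` on retryable errors (e.g. a throttling error such
    as ProvisionedThroughputExceededException), doubling the backoff window
    each attempt and sleeping a random ("full jitter") fraction of it."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts - 1:
                raise                      # out of attempts: surface the error
            delay = base_delay * (2 ** attempt)   # base, 2x, 4x, 8x, ...
            time.sleep(random.uniform(0, delay))  # full jitter
```

Jitter matters because many throttled clients retrying on the same schedule would all collide again; randomizing the sleep spreads the retries out.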
DynamoDB Partitions
- Store DynamoDB table data (physically)
- Each (physical) partition = 10GB SSD volume
- Not to be confused with the table's partition/hash key (which is a logical partition)
- One partition can store items with multiple partition keys
- A table can have multiple partitions
- The number of table partitions depends on its size and provisioned capacity
- Managed internally by DynamoDB
- Provisioned capacity is evenly distributed across table partitions
- Partitions, once allocated, cannot be deallocated (important!)
Calculating DynamoDB Partitions
- 1 partition = 1000 WCUs or 3000 RCUs (maximum supported throughput per partition)
- 1 partition = 10GB of data
- No. of partitions = the number of partitions based on throughput or the number based on size, whichever is higher
Partition Behavior Example (Scaling up Capacity)
- Provisioned Capacity: 500 RCUs and 500 WCUs
- Storage requirement < 10 GB
- Number of Partitions:
- PT = (500 RCUs/3000 + 500 WCUs/1000) = 0.67 => rounded up => 1 partition
- Say we scale up the provisioned capacity
- New Capacity: 1000 RCUs and 1000 WCUs
- PT = (1000 RCUs/3000 + 1000 WCUs/1000) = 1.33 => rounded up => 2 partitions
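The two-step rule can be written as a small helper (a sketch of the sizing arithmetic, not an AWS API):

```python
import math

def num_partitions(rcus: int, wcus: int, size_gb: float) -> int:
    """Initial partition count: the higher of the throughput-based and
    size-based counts, each rounded up."""
    by_throughput = math.ceil(rcus / 3000 + wcus / 1000)  # 3000 RCUs / 1000 WCUs per partition
    by_size = math.ceil(size_gb / 10)                     # 10GB per partition
    return max(by_throughput, by_size)

print(num_partitions(500, 500, 5))    # 1
print(num_partitions(1000, 1000, 5))  # 2
```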
DynamoDB Scaling
- You can manually scale up provisioned capacity as and when needed
- Scale-downs are limited: up to 4 per day, plus one additional scale-down whenever there has been none in the last 4 hours (effectively up to 27 scale-downs per day)
- Scaling affects partition behavior
- Any increase in partitions on scale up will not result in decrease on scale down (Important!)
- Partitions once allocated will not get deallocated later
DynamoDB Accelerator (DAX)
- In-Memory Caching, microsecond latency
- Sits between DynamoDB and the client application (acts as a proxy)
- Saves costs due to reduced read load on DynamoDB
- Helps prevent hot partitions
- Minimal code changes required to add DAX to your existing DynamoDB app
- Supports only eventual consistency (strongly consistent requests pass through to DynamoDB)
- Not for write-heavy applications
- Runs inside the VPC
- Multi AZ (3 nodes minimum recommended for production)
- Secure (Encryption at rest with KMS, VPC, IAM, CloudTrail…)
DAX architecture
- DAX has two types of caches (internally)
- Item Cache
- Query Cache
- Item cache stores results of item-level reads (=GetItem and BatchGetItem)
- Default TTL of 5 min (specified while creating DAX cluster)
- When cache becomes full, older and less popular items get removed
- Query cache stores results of Query and Scan operations
- Default TTL of 5 min
- Updates to the Item cache or to the underlying DynamoDB table do not invalidate the Query cache, so the TTL value of the Query cache should be chosen accordingly
DAX Operations
- Only for item-level operations
- Table-level operations must be sent directly to DynamoDB
- Write operations use the write-through approach
- Data is first written to DynamoDB and then to DAX; the write is considered successful only if both writes succeed
- You can use the write-around approach to bypass DAX, e.g. when writing a large amount of data you can write directly to DynamoDB (the Item cache goes out of sync)
DAX Operations 2
- For reads, if DAX has the data (=cache hit), it's simply returned without going through DynamoDB
- On a cache miss, DAX fetches the item from DynamoDB, caches it, and returns it
Implementing DAX
- To implement DAX, we create a DAX Cluster
- A DAX Cluster consists of one or more nodes (up to 10 nodes per cluster)
- Each node is an instance of DAX
- One node is the master node or primary node
- Remaining nodes act as read replicas
- DAX internally handles load balancing between these nodes
- 3 nodes minimum recommended for production
Backup and Restore in DynamoDB
- Automatically encrypted, cataloged and easily discoverable
- Highly Scalable - create or retain as many backups for tables of any size
- Backup operations complete in seconds
- Backups are consistent within seconds across thousands of partitions
- No provisioned capacity consumption
- Does not affect table performance or availability
- Backups are preserved regardless of table deletion
Backup and Restore in DynamoDB v2
- Backups are created in the same AWS region as the table
- Restores can be within same region or cross region
- Integrated with AWS Backup service (can create periodic backup plans)
- Periodic backups can be scheduled using Lambda and CloudWatch triggers
- Cannot overwrite an existing table during restore, restores can be done only to a new table (=new name)
- To retain the original table name, delete the existing table before running restore
- You can use IAM policies for access control
Backup and Restore in DynamoDB v3
- The restored table gets the same provisioned RCUs/WCUs as the source table, as recorded at the time of backup
- PITR RPO = approx. 5 minutes
- PITR RTO can be longer, as the restore operation creates a new table
Backup and Restore in DynamoDB v4
- What gets restored:
- Table data
- GSIs and LSIs (optional, you can choose)
- Encryption settings (you can change)
- Provisioned RCUs/WCUs (with values at the time the backup was created)
- Billing mode (with value at the time the backup was created)
- What you must manually set up on the restored table:
- Auto scaling policies, IAM policies
- CloudWatch metrics and alarms
- Stream and TTL settings
- Tags
Continuous Backups with PITR
- Restore table data to any second in the last 35 days!
- Priced per GB based on the table size
- If you disable PITR and re-enable it, the 35-day clock gets reset
- Works with unencrypted and encrypted tables, as well as global tables
- Can be enabled on each local replica of a global table
- If you restore a table which is part of a global table, the restored table will be an independent table (won't be a global table anymore!)
- Always restores data to a new table
- What cannot be restored
- Stream settings
- TTL options
- Autoscaling config
- PITR settings
- Alarms and tags
- All PITR API calls get logged in CloudTrail
DynamoDB Encryption
- Server-side Encryption at Rest
- Enabled by default
- Uses KMS
- 256-bit AES encryption
- Can use an AWS owned CMK, AWS managed CMK, or customer managed CMK
- Encrypts primary keys, secondary indexes, streams, global tables, backups and DAX clusters
- Encryption in transit
- Use VPC endpoints for applications running in a VPC
- Use TLS endpoints for encrypting data in transit
DynamoDB Encryption Client
- For client-side encryption
- Added protection with encryption in-transit
- Results in end-to-end encryption
- Doesn’t encrypt the entire table
- Encrypts the attribute values, but not the attribute names
- Doesn’t encrypt values of the primary key attributes
- You can selectively encrypt other attribute values
- You can encrypt selected items in a table, or selected attribute values in some or all items
DynamoDB Streams
- 24-hour time-ordered log of all table write activity
- React to changes to DynamoDB tables in real time
- Can be read by AWS Lambda, EC2, ES, Kinesis…
- DynamoDB Streams are organized into shards
- Records are not retroactively populated in a stream after enabling it
- Simply enable streams from the DynamoDB console
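A stream consumer, such as a Lambda function, receives batches of records in the documented stream-record shape. A minimal sketch of a handler for a stream with the NEW_AND_OLD_IMAGES view (the attribute names and sample values are invented):

```python
def handler(event, context=None):
    """Extract the change details from each DynamoDB Streams record."""
    changes = []
    for record in event["Records"]:
        ddb = record["dynamodb"]
        changes.append({
            "event": record["eventName"],   # INSERT / MODIFY / REMOVE
            "keys": ddb["Keys"],
            "old": ddb.get("OldImage"),     # present for MODIFY / REMOVE
            "new": ddb.get("NewImage"),     # present for INSERT / MODIFY
        })
    return changes

# A hypothetical single-record event, as delivered for a MODIFY.
sample_event = {"Records": [{
    "eventName": "MODIFY",
    "dynamodb": {
        "Keys": {"user_id": {"S": "u-1001"}},
        "OldImage": {"user_id": {"S": "u-1001"}, "score": {"N": "10"}},
        "NewImage": {"user_id": {"S": "u-1001"}, "score": {"N": "25"}},
    },
}]}

print(handler(sample_event)[0]["event"])  # MODIFY
```

Which of `OldImage`/`NewImage` is present depends on the stream view type configured on the table, per the four views listed below.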
DynamoDB Streams - supported views - Keys only
Captures only the key attributes of the changed item
DynamoDB Streams - supported views - New image
Captures the entire item after changes
DynamoDB Streams - supported views - Old image
Captures the entire item before changes
DynamoDB Streams - supported views - New and old images
Captures the entire item before and after changes