NoSQL Database & DynamoDB Flashcards
DynamoDB
NoSQL public Database-as-a-Service (DBaaS): key/value and document
Manual or automatic provisioned performance (scaling in/out), or on-demand
Really fast: single-digit millisecond latency (SSD based)
What is the resiliency of DynamoDB?
Highly resilient across AZs and, optionally, globally
What is the capacity of a DynamoDB table, and what are its units?
Capacity is speed
(Writes) 1 WCU = 1KB per second
(Reads) 1 RCU = 4KB per second
DynamoDB Backups
On-demand backups
Point-in-time Recovery(PITR)
DynamoDB billing
Billed based on RCU, WCU, storage, and features
What is the one requirement for data entering DynamoDB?
Each item has to have a unique simple (partition) or composite (partition & sort) primary key
Each item must have a unique value for PK (or for the PK and SK combination). Items can have none, all, a mixture, or different attributes (DDB has no rigid attribute schema).
DynamoDB Query
Query accepts a single PK value and, optionally, an SK value or range. Capacity consumed is the size of all returned items. Further filtering discards data, but the capacity is still consumed. You can only query on PK, or on PK and SK.
It is more efficient to return multiple items in one query: every operation consumes at least 1 RCU, because the size read is always rounded up to a whole RCU.
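The rounding behaviour behind this card can be sketched with a little arithmetic. The sketch assumes strongly consistent reads, the 4 KB per RCU unit from the capacity card, and that a Query sums the sizes of the returned items before rounding up. The function names and item sizes are illustrative, not part of any AWS SDK.

```python
import math

RCU_BYTES = 4 * 1024  # 1 RCU covers up to 4 KB per strongly consistent read


def query_rcu(item_sizes):
    """RCU for one Query: sizes are summed first, then rounded up (minimum 1)."""
    return max(1, math.ceil(sum(item_sizes) / RCU_BYTES))


def individual_reads_rcu(item_sizes):
    """RCU when each item is fetched separately: every read rounds up on its own."""
    return sum(max(1, math.ceil(size / RCU_BYTES)) for size in item_sizes)


sizes = [400] * 10  # ten hypothetical 400-byte items under one partition key
print(query_rcu(sizes))             # 1
print(individual_reads_rcu(sizes))  # 10
```

Ten small items fit inside a single 4 KB unit when read together, but cost one unit each when read one at a time.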
DynamoDB capacity modes
On-demand: you just pay for the operations on the table, priced per million read or write units. Suits unknown or unpredictable workloads and low admin.
Provisioned: you set the RCU and WCU capacity values on a per-table basis.
On-demand is typically more expensive per operation than provisioned capacity.
DynamoDB Scan
Scan moves through a table, consuming capacity for every item. You have complete control over what data is selected: any attributes can be used and any filters applied, but Scan consumes capacity for every item scanned through.
Most flexible, but most expensive when it comes to capacity
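A rough way to see the cost difference is to compare what each operation has to read. This is a simplified model, assuming strongly consistent reads and one rounding step per operation. The table contents and function names are made up for illustration.

```python
import math

RCU_BYTES = 4 * 1024  # 4 KB per RCU

# Hypothetical table contents: (partition key, item size in bytes)
table = [("user#1", 1000), ("user#1", 1000), ("user#2", 3000), ("user#3", 5000)]


def scan_rcu(items):
    """A Scan pays for every item it walks through, whatever a filter keeps."""
    return max(1, math.ceil(sum(size for _, size in items) / RCU_BYTES))


def query_rcu(items, pk):
    """A Query pays only for the items stored under the requested partition key."""
    matched = sum(size for key, size in items if key == pk)
    return max(1, math.ceil(matched / RCU_BYTES))


print(scan_rcu(table))             # 3
print(query_rcu(table, "user#1"))  # 1
```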
DynamoDB Consistency Model
Eventually consistent reads = read from any one of the storage nodes; you could be unlucky and get stale data if a node is checked before replication completes. 50% of the cost vs. strongly consistent.
Strongly consistent reads = connect to the leader node to get the most up-to-date copy of data
How would you calculate the WCU on your table if you need to store 10 items per second, with a 2.5 KB average size per item?
Calculate WCU per item: round up (item size / 1 KB) = 3
Multiply by the average number of writes per second (10)
= WCU required (30)
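The steps above can be written as a small helper. It assumes only the 1 KB per WCU unit from the capacity card; the function name is illustrative.

```python
import math


def wcu_required(item_size_kb, writes_per_second):
    """Round each item up to whole 1 KB units, then multiply by the write rate."""
    wcu_per_item = math.ceil(item_size_kb / 1)
    return wcu_per_item * writes_per_second


print(wcu_required(2.5, 10))  # 30
```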
How would you calculate the RCU on your table if you need to read 10 items per second, with a 2.5 KB average size per item? What if the reads were eventually consistent?
Calculate RCU per item: round up (item size / 4 KB) = 1
Multiply by the average read ops per second (10)
= strongly consistent RCU required (10)
= eventually consistent RCU required: 50% of strongly consistent (5)
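The read calculation follows the same pattern. This sketch assumes the 4 KB per RCU unit and the 50% cost of eventually consistent reads stated in the consistency card; the function name and parameters are illustrative.

```python
import math


def rcu_required(item_size_kb, reads_per_second, strongly_consistent=True):
    """Round each item up to whole 4 KB units, then multiply by the read rate."""
    rcu_per_item = math.ceil(item_size_kb / 4)
    strong = rcu_per_item * reads_per_second
    # Eventually consistent reads cost half as much as strongly consistent ones.
    return strong if strongly_consistent else math.ceil(strong / 2)


print(rcu_required(2.5, 10))                             # 10
print(rcu_required(2.5, 10, strongly_consistent=False))  # 5
```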
DynamoDB Indexes
Indexes are alternative views on table data
Different SK (LSI) or different PK and SK (GSI)
Some or all of the attributes (projections)
DynamoDB Local Secondary Indexes (LSI)
LSI is an alternative view for a table
Must be created with a table
5 LSIs per base table
Shares the RCU and WCU with the table
Attributes - ALL, KEYS_ONLY & INCLUDE
DynamoDB Global Secondary Indexes(GSI)
Can be created at any time
Default limit of 20 per base table
Alternative PK and SK
GSIs have their own RCU and WCU allocations
Drawback of Global Secondary Indexes (GSI)
GSIs are always eventually consistent; replication between the base table and the GSI is asynchronous
When would you use a GSI vs. an LSI on a DynamoDB table?
Use GSIs by default; use an LSI only when strong consistency is required
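To make the LSI and GSI differences concrete, here is a sketch of a table definition in the shape that boto3's `create_table` call accepts. Nothing is sent to AWS. The table name, attribute names, index names, and capacity numbers are all hypothetical.

```python
def build_table_definition():
    """Return keyword arguments shaped for boto3's DynamoDB create_table call."""
    throughput = {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5}
    return {
        "TableName": "Orders",  # hypothetical table
        "BillingMode": "PROVISIONED",
        "ProvisionedThroughput": throughput,
        "AttributeDefinitions": [
            {"AttributeName": "customer_id", "AttributeType": "S"},
            {"AttributeName": "order_date", "AttributeType": "S"},
            {"AttributeName": "status", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "customer_id", "KeyType": "HASH"},   # PK
            {"AttributeName": "order_date", "KeyType": "RANGE"},   # SK
        ],
        # LSI: same PK, different SK; defined with the table; shares its capacity.
        "LocalSecondaryIndexes": [
            {
                "IndexName": "by_status",
                "KeySchema": [
                    {"AttributeName": "customer_id", "KeyType": "HASH"},
                    {"AttributeName": "status", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
        ],
        # GSI: alternative PK and SK, with its own RCU and WCU allocation.
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "status_by_date",
                "KeySchema": [
                    {"AttributeName": "status", "KeyType": "HASH"},
                    {"AttributeName": "order_date", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
                "ProvisionedThroughput": dict(throughput),
            }
        ],
    }
```

With credentials configured, a dictionary like this could be passed as `boto3.client("dynamodb").create_table(**build_table_definition())`.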
DynamoDB Streams
Time-ordered list of item changes in a table
24-hour rolling window
Enabled on a per-table basis
Records inserts, updates, and deletes
Different view types influence what is in the stream
DynamoDB Triggers
Item changes generate an event
That event contains the data which changed
An action is taken using that data
AWS = Streams + Lambda
Reporting & analytics
Aggregation, messaging, or notifications
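A Lambda function attached to a stream receives batches of change records. The handler below is a minimal sketch that only summarises each record. The sample event is hand-written for local experimentation, not captured from AWS, and the item attributes in it are hypothetical.

```python
def handler(event, context):
    """Summarise each stream record; a real trigger would act on the data here."""
    actions = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        actions.append(
            {
                "event": record["eventName"],   # INSERT, MODIFY, or REMOVE
                "keys": change["Keys"],
                "new": change.get("NewImage"),  # present only for some view types
                "old": change.get("OldImage"),
            }
        )
    return actions


# Hand-written sample event (an assumption, not real AWS output).
sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"pk": {"S": "user#1"}},
                "NewImage": {"pk": {"S": "user#1"}, "plan": {"S": "pro"}},
            },
        }
    ]
}
print(handler(sample_event, None))
```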
DynamoDB Stream view types
KEYS_ONLY → PK and SK
NEW_IMAGE → entire item after change
OLD_IMAGE → entire item before change
NEW_AND_OLD_IMAGES → old item and new item
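The four view types can be modelled as a function that decides which parts of a change end up in the record. This is an illustrative model of an update to an existing item, not the service's implementation; the function name and sample items are hypothetical.

```python
def stream_record(view_type, keys, old_item, new_item):
    """Build the payload a given view type would carry for one item update."""
    record = {"Keys": keys}  # the keys are included for every view type
    if view_type in ("NEW_IMAGE", "NEW_AND_OLD_IMAGES"):
        record["NewImage"] = new_item
    if view_type in ("OLD_IMAGE", "NEW_AND_OLD_IMAGES"):
        record["OldImage"] = old_item
    return record


keys = {"pk": "user#1"}
old = {"pk": "user#1", "plan": "free"}
new = {"pk": "user#1", "plan": "pro"}
for view in ("KEYS_ONLY", "NEW_IMAGE", "OLD_IMAGE", "NEW_AND_OLD_IMAGES"):
    print(view, sorted(stream_record(view, keys, old, new)))
```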
DynamoDB Global Tables
Global tables provide multi-master cross-region replication
Tables are created in multiple regions and added to the same global table (becoming replica tables)
DynamoDB Global Tables: last writer wins
The conflict resolution method: if there are two competing writes to the same item, the most recent write wins
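Last-writer-wins can be sketched as picking the competing write with the latest timestamp. The `Write` class and its fields are hypothetical and only illustrate the rule; how the service itself tracks write times is not covered here.

```python
from dataclasses import dataclass


@dataclass
class Write:
    region: str
    timestamp: float  # hypothetical write time, in seconds
    item: dict


def resolve(writes):
    """Return the competing write that happened last."""
    return max(writes, key=lambda w: w.timestamp)


competing = [
    Write("us-east-1", 100.0, {"pk": "user#1", "plan": "free"}),
    Write("eu-west-1", 100.4, {"pk": "user#1", "plan": "pro"}),
]
print(resolve(competing).region)  # eu-west-1
```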
DynamoDB Global Tables resiliency
Reads and writes can occur in any region
Generally sub-second replication between regions
DynamoDB Global Tables consistency
Strongly consistent reads are only available in the same region as the write; everything else is eventually consistent
DynamoDB Accelerator (DAX)
An in-memory cache designed specifically for DynamoDB
Primary node (writes) and replicas (reads)
In-memory cache: much faster reads at scale, reduced cost
Scales up and scales out (bigger or more nodes)
Amazon Athena
Serverless interactive querying service
Ad hoc queries on data: pay only for the data consumed
Schema-on-read: table-like translation
ElastiCache
In-memory database: high performance
Managed Redis or Memcached as a service
Can be used to cache data for heavy workloads with low latency requirements
Reduces database workloads (expensive)
Can be used to store session data (stateless servers)
Requires application code changes
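The application code change is usually a cache-aside read path: check the cache first, and fall back to the database only on a miss. In this sketch a plain dictionary stands in for Redis or Memcached, and the class and loader names are hypothetical.

```python
class CacheAside:
    """Read path that checks an in-memory cache before calling the database."""

    def __init__(self, load_from_db):
        self._cache = {}  # stands in for Redis or Memcached
        self._load_from_db = load_from_db
        self.db_reads = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]  # cache hit: no database work
        self.db_reads += 1           # cache miss: go to the database
        value = self._load_from_db(key)
        self._cache[key] = value
        return value


database = {"session#1": {"user": "ada"}}  # hypothetical backing store
store = CacheAside(database.get)
store.get("session#1")
store.get("session#1")
print(store.db_reads)  # 1
```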
ElastiCache Memcached engine
Simple data structures
No replication
Multiple nodes (sharding)
No backups
Multi-threaded
ElastiCache Redis engine
Advanced structures
Multi-AZ
Replication (scale reads)
Backups & Restores
Transactions
Redshift Architecture
Petabyte-scale data warehouse
Online Analytical Processing, OLAP (column based), not Online Transaction Processing, OLTP (row/transaction based)
Pay as you use: similar pricing structure to RDS
Redshift Benefits
Directly query S3 using Redshift Spectrum
Directly query other databases using Federated Query
Integrates with AWS tooling such as QuickSight
SQL-like interface: JDBC/ODBC connections
Redshift resiliency
One AZ in a VPC
How does Redshift work?
Leader node: query input, planning, and aggregation
Compute nodes: perform queries on data
Redshift Integrations
VPC security, IAM permissions, KMS at-rest encryption, CloudWatch monitoring
Redshift Enhanced VPC Routing
By default, Redshift uses public routes for traffic when communicating with external services or any AWS services such as S3.
If you enable Enhanced VPC Routing, traffic is routed based on your VPC networking configuration. This means it can be controlled by security groups and NACLs, and it can use custom DNS. It will also require the same VPC gateways that any other traffic requires.
What is a burst pool, how large is it, and can you count on it for normal workloads?
Every table has an RCU and a WCU burst pool (300 seconds of provisioned capacity)
If you deplete the pool, requests are throttled and you get a ProvisionedThroughputExceededException error, so do not rely on the pool for normal workloads.
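A toy model shows why the pool only helps with short spikes. The sketch assumes the pool starts full, refills from unused provisioned capacity, and holds at most 300 seconds' worth. It is a simplification for study purposes, not how the service is guaranteed to behave, and the class name is hypothetical.

```python
BURST_SECONDS = 300  # pool size, in seconds of provisioned capacity


class BurstPool:
    def __init__(self, provisioned_per_second):
        self.provisioned = provisioned_per_second
        self.capacity = provisioned_per_second * BURST_SECONDS
        self.available = self.capacity  # assume a full pool to start

    def tick(self, demand):
        """Simulate one second; return True if served, False if throttled."""
        unused = self.provisioned - demand
        if unused >= 0:
            # Unused capacity refills the pool, up to its maximum size.
            self.available = min(self.capacity, self.available + unused)
            return True
        shortfall = -unused
        if shortfall <= self.available:
            self.available -= shortfall  # the pool absorbs the spike
            return True
        return False  # pool depleted: the request is throttled


pool = BurstPool(10)
served = sum(pool.tick(20) for _ in range(400))  # sustained demand at 2x provisioned
print(served)  # 300
```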
What is the max item size in a DynamoDB table ?
400 KB
DynamoDB On-demand Backups
Full backup of the table; backups remain until you delete them
DynamoDB Backups Point-in-time Recovery (PITR)
Not enabled by default; set per table. When enabled, it keeps a continuous record of changes with 1-second granularity, which allows replay to any point in the window.
DynamoDB Restores
Can be same-region or cross-region
With or without indexes
With adjusted encryption settings
Where is DAX deployed?
Deployed within a VPC
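Does DAX support write-through, and if so, what does it mean for the application?
Yes. Writes made through DAX are written to DynamoDB and then to the DAX cache, so later reads through DAX return the updated item.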
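Write-through caching, the behaviour this card asks about, can be sketched generically: every write goes to the backing table and then refreshes the cache, so reads through the cache see the new value. Two dictionaries stand in for the DynamoDB table and the DAX item cache; the class is hypothetical and is not the DAX client API.

```python
class WriteThroughCache:
    def __init__(self):
        self.table = {}  # stands in for the DynamoDB table
        self.cache = {}  # stands in for the cache layer

    def put(self, key, item):
        self.table[key] = item  # write to the backing table first
        self.cache[key] = item  # then refresh the cached copy

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        item = self.table.get(key)
        if item is not None:
            self.cache[key] = item  # populate the cache on a miss
        return item


store = WriteThroughCache()
store.put("user#1", {"plan": "free"})
store.put("user#1", {"plan": "pro"})
print(store.get("user#1"))  # {'plan': 'pro'}
```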
What makes Athena different from Redshift or DynamoDB?
Original data is never changed: it remains on S3
The schema translates data into a relational-like form when read
Output can be sent to other services
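Schema-on-read can be illustrated without any AWS service: the raw text stays untouched, and a schema is applied only at the moment of reading. The log lines, column names, and function name below are made up for the example.

```python
import csv
import io

RAW = "2024-01-01,GET,200\n2024-01-01,POST,500\n"  # hypothetical raw log data
SCHEMA = [("day", str), ("method", str), ("status", int)]


def read_with_schema(raw_text, schema):
    """Apply column names and types while reading; the raw text is not modified."""
    rows = []
    for values in csv.reader(io.StringIO(raw_text)):
        rows.append({name: cast(value) for (name, cast), value in zip(schema, values)})
    return rows


rows = read_with_schema(RAW, SCHEMA)
errors = [row for row in rows if row["status"] >= 500]
print(errors)  # [{'day': '2024-01-01', 'method': 'POST', 'status': 500}]
```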