NoSQL Databases and DynamoDB Flashcards
What is DynamoDB (DDB)?
A NoSQL (non-relational) DBaaS product within AWS, typically used for serverless or web-scale applications
DDB Specs:
→ No self-managed servers or infra (unlike RDS and Aurora, which run on DB instances that you provision and size); delivered as a service
→ You can control performance/capacity manually (Provisioned), or have it done automatically (On-Demand) where the system will scale as needed
○ Adding capacity means adding more SPEED/PERFORMANCE
→ Highly resilient - data is replicated across multiple AZs
→ Encrypts data at rest, and supports backups and point-in-time recovery of data
What is the base entity that makes up DDB?
What are the 2 key options called?
Tables
Partition Key & Sort Key (together they form the table's Primary Key)
What is the Partition Key?
What is the Sort Key?
What is an Item?
Partition Key - the mandatory part of the Primary Key. When it is used on its own (a simple Primary Key), it must be unique for every item in the table.
Sort Key - optional second part of the Primary Key. Several items can share the same Partition Key as long as their Sort Keys differ; the Partition Key + Sort Key combination must be unique per item.
Item - the unit that gets written to a DDB table. EX) in a Weather table, one item could be a day of the week plus all the associated information for that day.
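A minimal sketch of how a two-part key identifies items, using a plain Python dict as a stand-in for a table. This is not the DDB API; the table, attribute names and values are made up for illustration.

```python
# In-memory stand-in for a DDB table whose key has two parts.
# Dict keys are (partition_key, sort_key) tuples; values are the item's attributes.
weather = {}

def put_item(table, partition_key, sort_key, attributes):
    """Write an item; an existing item with the same full key is replaced."""
    table[(partition_key, sort_key)] = dict(attributes)

put_item(weather, "London", "Monday", {"temp_c": 14, "sky": "cloudy"})
put_item(weather, "London", "Tuesday", {"temp_c": 17, "sky": "clear"})
# Same partition key AND sort key as the first write, so it overwrites that item.
put_item(weather, "London", "Monday", {"temp_c": 15, "sky": "rain"})

# Two items share the partition key "London" but differ in sort key.
london_items = [key for key in weather if key[0] == "London"]
```

The third write does not create a new item, because the full key already exists in the table.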
What are the 2 types of Backups in DDB?
On-Demand - full backup of the table that is retained until you manually remove the backup.
Point in Time (PITR) - keeps a continuous record of changes, so a table can be restored to any point within a 35-day recovery window. It is enabled on a table-by-table basis.
DDB Summary:
→ If the use case is NoSQL, then it's likely DDB
→ A relational data use case is NOT going to be a DDB solution (that would be RDS or Aurora)
→ If you see any mention of Key/Value and DDB is a possible answer, then it's probably DDB
→ Access is via Console, API, or CLI. There is no relational SQL access (no joins across tables), since the DB is not relational; DDB does offer PartiQL, a SQL-compatible query language, but it works on items within a single table
DDB Tables are broken into which 2 types of capacity units?
Read (RCU)
Write (WCU)
What are the 2 modes that a Table can be created in?
On-Demand - used when you have an unknown or unpredictable load on the table.
Provisioned - you set the capacity values (RCU and WCU) yourself. Used when you know how much load and capacity you'll need for a given table.
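As a rough illustration of the two modes, the sketch below builds the keyword arguments that boto3's `create_table` call accepts (`BillingMode`, `ProvisionedThroughput`, `KeySchema`, `AttributeDefinitions`). It only builds dictionaries and does not call AWS. The helper `table_params`, the table name and the attribute names are hypothetical.

```python
def table_params(name, mode, read_units=None, write_units=None):
    """Build create_table-style keyword arguments for one capacity mode."""
    params = {
        "TableName": name,
        "AttributeDefinitions": [
            {"AttributeName": "city", "AttributeType": "S"},
            {"AttributeName": "day", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "city", "KeyType": "HASH"},   # partition key
            {"AttributeName": "day", "KeyType": "RANGE"},   # sort key
        ],
    }
    if mode == "on_demand":
        # On-Demand: no capacity values, the table scales with the load.
        params["BillingMode"] = "PAY_PER_REQUEST"
    elif mode == "provisioned":
        if read_units is None or write_units is None:
            raise ValueError("provisioned mode needs read_units and write_units")
        params["BillingMode"] = "PROVISIONED"
        params["ProvisionedThroughput"] = {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        }
    else:
        raise ValueError(f"unknown mode: {mode}")
    return params

on_demand = table_params("Weather", "on_demand")
provisioned = table_params("Weather", "provisioned", read_units=5, write_units=2)
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `client.create_table(**params)`; that step is left out here so the sketch stays runnable offline.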
How many KB is 1 x RCU operation?
How many KB is 1 x WCU operation?
What is the minimum cost for any operation?
1 RCU = one read of up to 4 KB (per second)
1 WCU = one write of up to 1 KB (per second)
Every operation costs at least 1 RCU or 1 WCU, and larger sizes always round up to the next whole RCU/WCU
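The rounding rule can be worked through with a small calculator. This is a sketch of the arithmetic only, using the 4 KB and 1 KB unit sizes from the card; the function names are made up.

```python
import math

RCU_KB = 4  # one RCU covers a read of up to 4 KB
WCU_KB = 1  # one WCU covers a write of up to 1 KB

def rcus_for_read(size_kb):
    """Capacity units for one read: round up, never less than 1."""
    return max(1, math.ceil(size_kb / RCU_KB))

def wcus_for_write(size_kb):
    """Capacity units for one write: round up, never less than 1."""
    return max(1, math.ceil(size_kb / WCU_KB))

small_read = rcus_for_read(0.5)    # well under 4 KB, still costs 1 RCU
large_read = rcus_for_read(10)     # 10 KB / 4 KB = 2.5, rounds up to 3 RCU
small_write = wcus_for_write(0.2)  # under 1 KB, still costs 1 WCU
large_write = wcus_for_write(2.5)  # 2.5 KB rounds up to 3 WCU
```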
What are the 2 operations for reading multiple items from a table?
Query Ops
Scan Ops
What happens in a Query Operation?
What happens in a Scan Operation?
Query
- A query starts by picking a SINGLE Partition Key value (optionally narrowed by a Sort Key value or range)
- The operation can return zero items, one item or multiple items, but you still only pick one value for the Partition Key
- The capacity consumed is based on the total size of all returned items, i.e. how much it costs to READ all of those items.
Scan
- Less efficient but more flexible; you have complete control over what data gets selected and returned, like a filter. You don't have to pick a single Partition Key (and optional Sort Key)
- The caveat is that Scan consumes capacity for every item in the table - so while you only get 2 rows back, you still pay for all 5 if it's a 5-row table, for example
- It does this because it reads through the entire table looking for the value(s) we want before presenting them back
- Very expensive from a capacity perspective
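The cost difference between the two operations can be sketched with an in-memory list standing in for a 5-item table. This models only the idea of "what gets read", not the real API or billing; the items, sizes and function names are invented for the example.

```python
# Five items; size_kb is the (made-up) stored size of each item.
table = [
    {"city": "London", "day": "Monday", "size_kb": 2},
    {"city": "London", "day": "Tuesday", "size_kb": 2},
    {"city": "Paris", "day": "Monday", "size_kb": 3},
    {"city": "Paris", "day": "Tuesday", "size_kb": 1},
    {"city": "Oslo", "day": "Monday", "size_kb": 4},
]

def query(items, partition_value):
    """Read only the items under one partition key value."""
    matched = [i for i in items if i["city"] == partition_value]
    read_kb = sum(i["size_kb"] for i in matched)  # pay for what is returned
    return matched, read_kb

def scan(items, predicate):
    """Read every item, then keep the ones that pass the filter."""
    read_kb = sum(i["size_kb"] for i in items)    # pay for the whole table
    matched = [i for i in items if predicate(i)]
    return matched, read_kb

query_items, query_kb = query(table, "London")
scan_items, scan_kb = scan(table, lambda i: i["day"] == "Tuesday")
```

Both calls return 2 items, but the query reads 4 KB while the scan reads all 12 KB in the table.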
What is Consistency?
Consistency describes how soon newly written data is visible to reads.
Does a read immediately return the same data that was put in by the most recent update? OR is it eventual, i.e. the read returns the same data only after some period of time?
What are the 2 types of Consistency models?
Eventual Consistency - easier to implement and scales better; data is replicated out across nodes, so a READ operation might not show the updated data immediately
Strong Consistency - essential in some types of apps but harder and more costly to achieve; a read always returns the most recently written data
What is a Storage Leader Node in a DDB architecture?
A "Leader Node" is selected from among the Storage Nodes; this Leader Node is where WRITES occur (any change/update to the data set/table).
Once the Leader Node has the data written to it, it becomes "consistent"; once consistent, it starts the replication process to the other nodes.
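The write-then-replicate flow and the two consistency models can be illustrated with a toy model. This is a simplified sketch, not how DDB is implemented internally: the classes, the fixed choice of leader, and the explicit `replicate()` step are all assumptions made for the example.

```python
class StorageNode:
    def __init__(self, name):
        self.name = name
        self.data = {}

class Cluster:
    def __init__(self, node_names):
        self.nodes = [StorageNode(n) for n in node_names]
        self.leader = self.nodes[0]  # assumption: first node is the leader
        self.pending = []            # writes not yet copied to the other nodes

    def write(self, key, value):
        """Writes always land on the leader first."""
        self.leader.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        """Copy pending writes from the leader to every other node."""
        for key, value in self.pending:
            for node in self.nodes:
                if node is not self.leader:
                    node.data[key] = value
        self.pending.clear()

    def read(self, key, strongly_consistent=False, replica_index=1):
        """Strong reads use the leader; eventual reads may hit a lagging node."""
        node = self.leader if strongly_consistent else self.nodes[replica_index]
        return node.data.get(key)

cluster = Cluster(["node-a", "node-b", "node-c"])
cluster.write("temp_c", 15)
strong_before = cluster.read("temp_c", strongly_consistent=True)
eventual_before = cluster.read("temp_c")  # replica has not caught up yet
cluster.replicate()
eventual_after = cluster.read("temp_c")
```

Before replication runs, the strongly consistent read sees the new value while the eventually consistent read does not; after replication both agree.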
What is a DDB Stream?
A time-ordered list of changes made to the items in a DDB table, i.e. any inserts/updates/deletes applied to the table get recorded in the stream (records are kept for a rolling 24 hours)
What are the 4 view types that a Stream can be configured with?
- Keys Only - only shows the key attributes of the changed item (Partition Key, plus Sort Key if the table has one)
- New Image - shows the state of the ITEM after the change
- Old Image - shows the state of the ITEM before the change occurred, so you can compare it to the current ITEM to see what changed
- New and Old Images - shows both the before and after states side by side
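The four view types can be compared by building the record each one would carry for the same change. The view type names (`KEYS_ONLY`, `NEW_IMAGE`, `OLD_IMAGE`, `NEW_AND_OLD_IMAGES`) and the `Keys`/`NewImage`/`OldImage` field names follow the DDB Streams API; the helper `stream_record` and the sample item are hypothetical, and real records contain more fields than shown here.

```python
VIEW_TYPES = ("KEYS_ONLY", "NEW_IMAGE", "OLD_IMAGE", "NEW_AND_OLD_IMAGES")

def stream_record(view_type, keys, old_image, new_image):
    """Return the item data a stream record would include for a view type."""
    if view_type not in VIEW_TYPES:
        raise ValueError(f"unknown view type: {view_type}")
    record = {"Keys": keys}  # the keys are present in every view type
    if view_type in ("NEW_IMAGE", "NEW_AND_OLD_IMAGES"):
        record["NewImage"] = new_image
    if view_type in ("OLD_IMAGE", "NEW_AND_OLD_IMAGES"):
        record["OldImage"] = old_image
    return record

keys = {"city": "London", "day": "Monday"}
old = {"city": "London", "day": "Monday", "temp_c": 14}
new = {"city": "London", "day": "Monday", "temp_c": 15}

records = {vt: stream_record(vt, keys, old, new) for vt in VIEW_TYPES}
```

Only `NEW_AND_OLD_IMAGES` lets a consumer work out what changed from the record alone; with `KEYS_ONLY` the consumer learns which item changed but not how.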