NoSQL Databases and DynamoDB Flashcards
What is DynamoDB (DDB)?
NoSQL (non-relational), DBaaS product within AWS; typically used for Serverless or Web-Scale applications
DDB Specs:
→ No self-managed servers or infra (unlike Aurora and RDS, which are DB services that run on database instances); delivered as a service (XaaS)
→ You can control performance/capacity manually (Provisioned), or have it done automatically (On-Demand), where the system scales as needed
○ Adding capacity means adding more SPEED/PERFORMANCE
→ Highly resilient - spans multiple AZs
→ Encrypts data at rest, and performs backups and point-in-time recovery of data
What is the base entity that makes up DDB?
What are the 2 key options called?
Tables
Partition Key & Sort Key
What is the Partition Key?
What is the Sort Key?
What is an Item?
Partition Key - identifies the item's partition; on a table with no Sort Key it must uniquely identify each item
Sort Key - optional minor key that further identifies the item. Many items can share a Partition Key, but each Partition Key + Sort Key combination must be unique.
Item - basically the unit that gets written to DDB and stored in a table. EX) a day of the week + all associated information for that day, when looking at a Weather Table.
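A quick sketch of what one such item might look like, using the weather-table example (the table and attribute names here are hypothetical, just for illustration):

```python
# Illustrative item for a hypothetical Weather table:
# partition key = day of the week, sort key = station id.
item = {
    "Day": "monday",          # partition key (hypothetical attribute name)
    "StationId": "stn-0042",  # sort key
    "TempHighC": 18,          # any other attributes the item carries
    "TempLowC": 7,
    "Rainfall_mm": 3.5,
}

# Items can share a partition key; the sort key distinguishes them within it.
same_day_other_station = {**item, "StationId": "stn-0043", "TempHighC": 21}
```

Both items live in the same "monday" partition but are unique because their Partition Key + Sort Key combinations differ.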
What are the 2 types of Backups in DDB?
On-Demand - a full backup of the table that is retained until you manually delete it.
Point-in-Time - keeps a continuous record of changes, allowing recovery to any point within a 35-day window. Enabled on a table-by-table basis.
DDB Summary:
→ If the use case is NoSQL, then it’s likely DDB
→ A relational-data use case is NOT going to be a DDB solution (that would be RDS or another SQL-based product)
→ If you see any mention of Key/Value and DDB is a possible answer, then it’s probably DDB
→ Access is via the Console, API, or CLI (not standard SQL, since the DB is not relational)
DDB Tables are broken into which 2 types of capacity units?
Read (RCU)
Write (WCU)
What are the 2 modes that a Table can be created in?
On-Demand - used when the load on the table is unknown or unpredictable.
Provisioned - you set the capacity values yourself; used when you know exactly how much load and capacity you’ll need for a given table.
How many KB is 1 x RCU operation?
How many KB is 1 x WCU operation?
What is the minimum cost for any operation?
1 RCU = 4KB
1 WCU = 1KB
Minimum of 1 RCU & 1 WCU for any operation; consumption always rounds up to the nearest whole RCU/WCU
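The sizing and round-up rules above can be sketched as a small calculation (assuming strongly consistent reads, which is what the 4KB-per-RCU figure describes):

```python
import math

def rcu_for_read(item_size_kb: float) -> int:
    """RCUs for one strongly consistent read: item size / 4KB, rounded up (min 1)."""
    return max(1, math.ceil(item_size_kb / 4))

def wcu_for_write(item_size_kb: float) -> int:
    """WCUs for one write: item size / 1KB, rounded up (min 1)."""
    return max(1, math.ceil(item_size_kb / 1))

print(rcu_for_read(0.5))   # tiny item still costs the 1 RCU minimum
print(rcu_for_read(4.5))   # 4.5KB rounds up to 2 RCUs
print(wcu_for_write(2.5))  # 2.5KB rounds up to 3 WCUs
```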
What are the 2 operations for retrieving data from a table?
Query Ops
Scan Ops
What happens in a Query Operation?
What happens in a Scan Operation?
Query
- a Query starts by picking a SINGLE Partition Key value
- The operation can return zero, one, or multiple items, but you still only pick one value for the partition key
- The capacity consumed is the total size of all returned items, i.e. what it costs to READ all of those items.
Scan
- less efficient but more flexible; you have complete control over what data gets selected and returned, like a filter. You don’t have to pick a single Partition Key and optional Sort Key
- the caveat is that a Scan consumes capacity for every item it reads - so even if you only get 2 rows back, you still pay for all 5 rows of a 5-row table, for example
- this is because it scans the entire table for the exact value(s) you’re looking for before presenting the results
- very expensive from a Capacity perspective
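A toy model (not the real API) of why Scan is the expensive option: it reads every item in the table and filters afterwards, while Query touches only one partition:

```python
# Five-row toy table; "pk"/"sk" stand in for the Partition and Sort keys.
table = [
    {"pk": "mon", "sk": 1, "temp": 10},
    {"pk": "mon", "sk": 2, "temp": 12},
    {"pk": "tue", "sk": 1, "temp": 8},
    {"pk": "wed", "sk": 1, "temp": 15},
    {"pk": "wed", "sk": 2, "temp": 14},
]

def query(items, pk):
    matched = [i for i in items if i["pk"] == pk]
    return matched, len(matched)   # capacity ~ items actually read from one partition

def scan(items, predicate):
    matched = [i for i in items if predicate(i)]
    return matched, len(items)     # capacity ~ EVERY item in the table

q_items, q_cost = query(table, "wed")                     # 2 items, cost 2
s_items, s_cost = scan(table, lambda i: i["temp"] > 13)   # 2 items, cost 5
```

Both calls return two items, but the Scan "pays" for all five rows it had to read.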
What is Consistency?
The model for how newly updated data is read.
Is a read performed immediately after an update guaranteed to return the new data? Or is it only eventually the same - consistent after a period of time, but not immediately?
What are the 2 types of Consistency models?
Eventual Consistency - easier to implement and scales better; data is replicated out, and a READ operation might not show the updated data instantaneously
Strong Consistency - essential in some types of apps but harder and more costly to achieve - reads always return the latest data
What is a Leader Node in DDB’s storage architecture?
A “Leader Node” is elected from among the storage nodes; this Leader Node is where WRITES occur (any change/update to the data set/table).
Once the Leader Node has the data written to it, it becomes “Consistent” .. once consistent, it then starts the replication process to the other nodes.
What is a DDB Stream?
A time-ordered list of item-level changes that have occurred on a DDB table, kept for a 24-hour rolling window, i.e. any inserts/updates/deletes get added to the stream as they happen to the DDB Table
What are the 4 view types that a Stream can be configured with?
- Keys Only - only records the key attributes (Partition Key, or Partition + Sort Keys) of the changed item
- New Image - records the state of the ITEM after the change
- Old Image - records the state of the ITEM before the change occurred, so you can compare it to the new ITEM to see what changed
- New and Old Images - records both, side by side
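Roughly the shape of a stream record under the New and Old Images view type (values use DynamoDB’s attribute-type encoding, e.g. {"S": ...} for strings, {"N": ...} for numbers; the item attributes here are made up):

```python
# One stream record for a MODIFY event, NEW_AND_OLD_IMAGES view type.
record = {
    "eventName": "MODIFY",
    "dynamodb": {
        "Keys":     {"pk": {"S": "mon"}},
        "OldImage": {"pk": {"S": "mon"}, "temp": {"N": "10"}},  # before the change
        "NewImage": {"pk": {"S": "mon"}, "temp": {"N": "12"}},  # after the change
    },
}
# Keys Only would carry just "Keys"; New Image just "Keys" + "NewImage"; etc.

# Having both images lets a consumer work out exactly what changed:
changed = {
    k for k in record["dynamodb"]["NewImage"]
    if record["dynamodb"]["NewImage"][k] != record["dynamodb"]["OldImage"].get(k)
}
```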
What is a DDB Trigger?
An ITEM change within a Table is put into a Stream and generates an event, which can invoke a corresponding action in Lambda.
DDB Trigger = Streams + Lambda
DDB Trigger/Trigger Architecture Summary:
→ You use Streams + Lambda so that a Lambda function is invoked whenever specified changes occur to a DDB table
→ The Trigger is the compute action that occurs based on the data change
→ Using Streams and Triggers allows you to respond to an event as it happens, and only consume the minimum amount of compute required to perform the action
→ We use Streams and Lambda together to implement a “Trigger ARCH” for DDB
→ Lambda is the compute piece (like Compute as a Service) that handles the action once it is triggered
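The trigger architecture above can be sketched as a Lambda handler. The event shape (Records / eventName / dynamodb.NewImage) matches what Streams deliver to Lambda; notify() is a hypothetical placeholder for whatever action your function actually takes:

```python
def notify(message: str) -> None:
    print(message)  # hypothetical stand-in for SNS, email, etc.

def handler(event, context=None):
    """Lambda entry point: react only to new items inserted into the table."""
    processed = 0
    for record in event.get("Records", []):
        if record["eventName"] == "INSERT":
            new_item = record["dynamodb"]["NewImage"]
            notify(f"New item written: {new_item}")
            processed += 1
    return {"processed": processed}

# A minimal sample event to exercise the handler locally:
sample_event = {"Records": [
    {"eventName": "INSERT", "dynamodb": {"NewImage": {"pk": {"S": "tue"}}}},
    {"eventName": "REMOVE", "dynamodb": {"Keys": {"pk": {"S": "mon"}}}},
]}
```

Only the minimum compute runs: the function is invoked on change, acts on the records it cares about, and exits.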
What is a DDB Index?
A way to improve the efficiency of retrieval operations within DDB.
Indexes are basically an alternative view on the table data that enhances Query Operations; when you perform a query, the data comes back in that alternative view.
This helps you avoid using a Scan Operation, which consumes RCUs for the entire table; different teams within the Org might want different views depending on their job roles.
What are the 2 types of Indexes in DDB?
What are the main aspects of each?
→ Local Secondary Indexes (LSI) - allow you to view the table with an alternative Sort Key (same Partition Key)
- must be created @ the time the Table is initially created
- up to 5 LSIs per table
- ** uses the Shared Capacity settings of the Table
→ Global Secondary Indexes (GSI) - allow you to view the table with an alternative Partition Key and Sort Key; can be created @ any time
- up to 20 GSIs per table
- ** uses its own Capacity settings, separate from the Table
Index Considerations:
→ Use GSI’s by default; only use LSI’s when strong consistency is required
○ GSI’s are a lot more flexible and can be created after the point that a base table is created
→ Use indexes for alternative access patterns
○ When you create a Base Table - you choose the Partition and Sort Keys ahead of time for the primary way you will view and access the data in the table
○ Indexes offer an alternative perspective to that for any alternative access patterns
→ EX) a different team might be interested in different attributes; all data is kept in the same place, but can be accessed from perspectives that are more relevant to each team
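Roughly the parameter shape used to declare a GSI when creating or updating a table via the AWS SDK (the index and attribute names here are hypothetical). Note the GSI carries its own ProvisionedThroughput, unlike an LSI, which shares the table’s:

```python
# Sketch of a GSI specification, shaped like the SDK's GlobalSecondaryIndexes entry.
gsi_spec = {
    "IndexName": "Temp-index",                               # hypothetical name
    "KeySchema": [
        {"AttributeName": "TempHighC", "KeyType": "HASH"},   # alternative partition key
        {"AttributeName": "Day",       "KeyType": "RANGE"},  # alternative sort key
    ],
    "Projection": {"ProjectionType": "ALL"},                 # attributes copied into the index
    "ProvisionedThroughput": {                               # GSI-specific capacity,
        "ReadCapacityUnits": 5,                              # separate from the base table
        "WriteCapacityUnits": 5,
    },
}
```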
What is a DDB Global Table?
Feature that provides multi-master global replication of DynamoDB tables, which can be used for performance, HA, or DR/BC reasons.
All tables are peers, i.e. there are no Primary and Secondary tables – every table supports full Read/Write and replicates to the others.
What is DDB Accelerator (DAX)?
An in-memory cache designed specifically for DynamoDB which greatly improves the performance of DDB.
** It should be your default choice for any DynamoDB caching related questions **
** Supports WRITE-THROUGH and READ-CACHING **
How is DAX deployed?
DAX is a fully managed, in-memory caching cluster that sits inside a VPC and has direct access to DDB.
A piece of SW (the DAX SDK) is also installed directly in the application.
Instead of the app having to check an in-memory cache and then re-send a request to DDB when the cache doesn’t have the data, DAX handles both steps for the application: it either returns the data from its cache, or fetches it from DDB on the App’s behalf and caches it.
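The read-caching behaviour can be sketched as a simple read-through cache (a toy model of what DAX does on the app’s behalf, not the DAX SDK itself):

```python
class ReadThroughCache:
    """Toy read-through cache: serve from memory, fetch from the table on a miss."""

    def __init__(self, fetch_from_table):
        self._cache = {}
        self._fetch = fetch_from_table  # callable standing in for a DDB read
        self.misses = 0

    def get(self, key):
        if key not in self._cache:          # cache miss: go to the table once
            self.misses += 1
            self._cache[key] = self._fetch(key)
        return self._cache[key]             # hits are served from memory

# A dict stands in for the DDB table here.
table = {"mon": {"temp": 10}}
cache = ReadThroughCache(lambda k: table[k])
cache.get("mon")   # miss: fetched from the "table" and cached
cache.get("mon")   # hit: served from the cache, no table read
```

The application only ever talks to the cache; the cache decides when the backing table needs to be consulted.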