NoSQL Databases and DynamoDB Flashcards
What is DynamoDB (DDB)?
NoSQL (non-relational), DBaaS product within AWS; typically used for Serverless or Web-Scale applications
DDB Specs:
→ No self-managed servers or infra (unlike Aurora and RDS, which are DB services that run on database instances); delivered as a service (XaaS)
→ You can control performance/capacity manually (Provisioned), or have it done automatically (On-Demand), where the system scales as needed
○ Adding capacity means adding more SPEED/PERFORMANCE
→ Highly resilient - spans multiple AZs
→ Encrypts data at rest, and performs backups and point-in-time recovery of data
What is the base entity that makes up DDB?
What are the 2 key options called?
Tables
Partition Key & Sort Key
What is the Partition Key?
What is the Sort Key?
What is an Item?
Partition Key - identifies the item's partition; on a table with no Sort Key it must uniquely identify each item
Sort Key - optional minor key that further identifies the item. Many items can share a Partition Key, but each Partition Key + Sort Key combination must be unique.
Item - basically the unit that gets written to DDB and stored in a table. EX) a day of the week + all associated information for that day, when looking at a Weather Table.
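A quick sketch of what one such item might look like, using the weather-table example (the table and attribute names here are hypothetical, just for illustration):

```python
# Illustrative item for a hypothetical Weather table:
# partition key = day of the week, sort key = station id.
item = {
    "Day": "monday",          # partition key (hypothetical attribute name)
    "StationId": "stn-0042",  # sort key
    "TempHighC": 18,          # any other attributes the item carries
    "TempLowC": 7,
    "Rainfall_mm": 3.5,
}

# Items can share a partition key; the sort key distinguishes them within it.
same_day_other_station = {**item, "StationId": "stn-0043", "TempHighC": 21}
```

Both items live in the same "monday" partition but are unique because their Partition Key + Sort Key combinations differ.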
What are the 2 types of Backups in DDB?
On-Demand - a full backup of the table that is retained until you manually delete it.
Point-in-Time - keeps a continuous record of changes, allowing recovery to any point within a 35-day window. Enabled on a table-by-table basis.
DDB Summary:
→ If the use case is NoSQL, then it’s likely DDB
→ A relational-data use case is NOT going to be a DDB solution (that would be RDS or another SQL-based product)
→ If you see any mention of Key/Value and DDB is a possible answer, then it’s probably DDB
→ Access is via the Console, API, or CLI (not standard SQL, since the DB is not relational)
DDB Tables are broken into which 2 types of capacity units?
Read (RCU)
Write (WCU)
What are the 2 modes that a Table can be created in?
On-Demand - used when the load on the table is unknown or unpredictable.
Provisioned - you set the capacity values yourself; used when you know exactly how much load and capacity you’ll need for a given table.
How many KB is 1 x RCU operation?
How many KB is 1 x WCU operation?
What is the minimum cost for any operation?
1 RCU = 4KB
1 WCU = 1KB
Minimum of 1 RCU & 1 WCU for any operation; consumption always rounds up to the nearest whole RCU/WCU
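The sizing and round-up rules above can be sketched as a small calculation (assuming strongly consistent reads, which is what the 4KB-per-RCU figure describes):

```python
import math

def rcu_for_read(item_size_kb: float) -> int:
    """RCUs for one strongly consistent read: item size / 4KB, rounded up (min 1)."""
    return max(1, math.ceil(item_size_kb / 4))

def wcu_for_write(item_size_kb: float) -> int:
    """WCUs for one write: item size / 1KB, rounded up (min 1)."""
    return max(1, math.ceil(item_size_kb / 1))

print(rcu_for_read(0.5))   # tiny item still costs the 1 RCU minimum
print(rcu_for_read(4.5))   # 4.5KB rounds up to 2 RCUs
print(wcu_for_write(2.5))  # 2.5KB rounds up to 3 WCUs
```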
What are the 2 operations for retrieving data from a table?
Query Ops
Scan Ops
What happens in a Query Operation?
What happens in a Scan Operation?
Query
- a Query starts by picking a SINGLE Partition Key value
- The operation can return zero, one, or multiple items, but you still only pick one value for the partition key
- The capacity consumed is the total size of all returned items, i.e. what it costs to READ all of those items.
Scan
- less efficient but more flexible; you have complete control over what data gets selected and returned, like a filter. You don’t have to pick a single Partition Key and optional Sort Key
- the caveat is that a Scan consumes capacity for every item it reads - so even if you only get 2 rows back, you still pay for all 5 rows of a 5-row table, for example
- this is because it scans the entire table for the exact value(s) you’re looking for before presenting the results
- very expensive from a Capacity perspective
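A toy model (not the real API) of why Scan is the expensive option: it reads every item in the table and filters afterwards, while Query touches only one partition:

```python
# Five-row toy table; "pk"/"sk" stand in for the Partition and Sort keys.
table = [
    {"pk": "mon", "sk": 1, "temp": 10},
    {"pk": "mon", "sk": 2, "temp": 12},
    {"pk": "tue", "sk": 1, "temp": 8},
    {"pk": "wed", "sk": 1, "temp": 15},
    {"pk": "wed", "sk": 2, "temp": 14},
]

def query(items, pk):
    matched = [i for i in items if i["pk"] == pk]
    return matched, len(matched)   # capacity ~ items actually read from one partition

def scan(items, predicate):
    matched = [i for i in items if predicate(i)]
    return matched, len(items)     # capacity ~ EVERY item in the table

q_items, q_cost = query(table, "wed")                     # 2 items, cost 2
s_items, s_cost = scan(table, lambda i: i["temp"] > 13)   # 2 items, cost 5
```

Both calls return two items, but the Scan "pays" for all five rows it had to read.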
What is Consistency?
The model for how newly updated data is read.
Is a read performed immediately after an update guaranteed to return the new data? Or is it only eventually the same - consistent after a period of time, but not immediately?
What are the 2 types of Consistency models?
Eventual Consistency - easier to implement and scales better; data is replicated out, and a READ operation might not show the updated data instantaneously
Strong Consistency - essential in some types of apps but harder and more costly to achieve - reads always return the latest data
What is a Leader Node in DDB’s storage architecture?
A “Leader Node” is elected from among the storage nodes; this Leader Node is where WRITES occur (any change/update to the data set/table).
Once the Leader Node has the data written to it, it becomes “Consistent” .. once consistent, it then starts the replication process to the other nodes.
What is a DDB Stream?
A time-ordered list of item-level changes that have occurred on a DDB table, kept for a 24-hour rolling window, i.e. any inserts/updates/deletes get added to the stream as they happen to the DDB Table
What are the 4 view types that a Stream can be configured with?
- Keys Only - only records the key attributes (Partition Key, or Partition + Sort Keys) of the changed item
- New Image - records the state of the ITEM after the change
- Old Image - records the state of the ITEM before the change occurred, so you can compare it to the new ITEM to see what changed
- New and Old Images - records both, side by side
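Roughly the shape of a stream record under the New and Old Images view type (values use DynamoDB’s attribute-type encoding, e.g. {"S": ...} for strings, {"N": ...} for numbers; the item attributes here are made up):

```python
# One stream record for a MODIFY event, NEW_AND_OLD_IMAGES view type.
record = {
    "eventName": "MODIFY",
    "dynamodb": {
        "Keys":     {"pk": {"S": "mon"}},
        "OldImage": {"pk": {"S": "mon"}, "temp": {"N": "10"}},  # before the change
        "NewImage": {"pk": {"S": "mon"}, "temp": {"N": "12"}},  # after the change
    },
}
# Keys Only would carry just "Keys"; New Image just "Keys" + "NewImage"; etc.

# Having both images lets a consumer work out exactly what changed:
changed = {
    k for k in record["dynamodb"]["NewImage"]
    if record["dynamodb"]["NewImage"][k] != record["dynamodb"]["OldImage"].get(k)
}
```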
What is a DDB Trigger?
An ITEM change within a Table is put into a Stream and generates an event, which can invoke a corresponding action in Lambda.
DDB Trigger = Streams + Lambda
DDB Trigger/Trigger Architecture Summary:
→ You use Streams + Lambda so that a Lambda function is invoked whenever specified changes occur to a DDB table
→ The Trigger is the compute action that occurs based on the data change
→ Using Streams and Triggers allows you to respond to an event as it happens, and only consume the minimum amount of compute required to perform the action
→ We use Streams and Lambda together to implement a “Trigger ARCH” for DDB
→ Lambda is the compute piece (like Compute as a Service) that handles the action once it is triggered
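The trigger architecture above can be sketched as a Lambda handler. The event shape (Records / eventName / dynamodb.NewImage) matches what Streams deliver to Lambda; notify() is a hypothetical placeholder for whatever action your function actually takes:

```python
def notify(message: str) -> None:
    print(message)  # hypothetical stand-in for SNS, email, etc.

def handler(event, context=None):
    """Lambda entry point: react only to new items inserted into the table."""
    processed = 0
    for record in event.get("Records", []):
        if record["eventName"] == "INSERT":
            new_item = record["dynamodb"]["NewImage"]
            notify(f"New item written: {new_item}")
            processed += 1
    return {"processed": processed}

# A minimal sample event to exercise the handler locally:
sample_event = {"Records": [
    {"eventName": "INSERT", "dynamodb": {"NewImage": {"pk": {"S": "tue"}}}},
    {"eventName": "REMOVE", "dynamodb": {"Keys": {"pk": {"S": "mon"}}}},
]}
```

Only the minimum compute runs: the function is invoked on change, acts on the records it cares about, and exits.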
What is a DDB Index?
A way to improve the efficiency of retrieval operations within DDB.
Indexes are basically an alternative view on the table data that enhances Query Operations; when you perform a query, the data comes back in that alternative view.
This helps you avoid using a Scan Operation, which consumes RCUs for the entire table; different teams within the Org might want different views depending on their job roles.
What are the 2 types of Indexes in DDB?
What are the main aspects of each?
→ Local Secondary Indexes (LSI) - allow you to view the table with an alternative Sort Key (same Partition Key)
- must be created @ the time the Table is initially created
- up to 5 LSIs per table
- ** uses the Shared Capacity settings of the Table
→ Global Secondary Indexes (GSI) - allow you to view the table with an alternative Partition Key and Sort Key; can be created @ any time
- up to 20 GSIs per table
- ** uses its own Capacity settings, separate from the Table
Index Considerations:
→ Use GSI’s by default; only use LSI’s when strong consistency is required
○ GSI’s are a lot more flexible and can be created after the point that a base table is created
→ Use indexes for alternative access patterns
○ When you create a Base Table - you choose the Partition and Sort Keys ahead of time for the primary way you will view and access the data in the table
○ Indexes offer an alternative perspective to that for any alternative access patterns
→ EX) a different team might be interested in different attributes; all data is kept in the same place, but can be accessed from perspectives that are more relevant to each team
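Roughly the parameter shape used to declare a GSI when creating or updating a table via the AWS SDK (the index and attribute names here are hypothetical). Note the GSI carries its own ProvisionedThroughput, unlike an LSI, which shares the table’s:

```python
# Sketch of a GSI specification, shaped like the SDK's GlobalSecondaryIndexes entry.
gsi_spec = {
    "IndexName": "Temp-index",                               # hypothetical name
    "KeySchema": [
        {"AttributeName": "TempHighC", "KeyType": "HASH"},   # alternative partition key
        {"AttributeName": "Day",       "KeyType": "RANGE"},  # alternative sort key
    ],
    "Projection": {"ProjectionType": "ALL"},                 # attributes copied into the index
    "ProvisionedThroughput": {                               # GSI-specific capacity,
        "ReadCapacityUnits": 5,                              # separate from the base table
        "WriteCapacityUnits": 5,
    },
}
```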
What is a DDB Global Table?
Feature that provides multi-master global replication of DynamoDB tables, which can be used for performance, HA, or DR/BC reasons.
All tables are peers, i.e. there are no Primary and Secondary tables – every table supports full Read/Write and replicates to the others.
What is DDB Accelerator (DAX)?
An in-memory cache designed specifically for DynamoDB which greatly improves the performance of DDB.
** It should be your default choice for any DynamoDB caching related questions **
** Supports WRITE-THROUGH and READ-CACHING **
How is DAX deployed?
DAX is a fully managed, in-memory caching cluster that sits inside a VPC and has direct access to DDB.
A piece of SW (the DAX SDK) is also installed directly in the application.
Instead of the app having to check an in-memory cache and then re-send a request to DDB when the cache doesn’t have the data, DAX handles both steps for the application: it either returns the data from its cache, or fetches it from DDB on the App’s behalf and caches it.
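The read-caching behaviour can be sketched as a simple read-through cache (a toy model of what DAX does on the app’s behalf, not the DAX SDK itself):

```python
class ReadThroughCache:
    """Toy read-through cache: serve from memory, fetch from the table on a miss."""

    def __init__(self, fetch_from_table):
        self._cache = {}
        self._fetch = fetch_from_table  # callable standing in for a DDB read
        self.misses = 0

    def get(self, key):
        if key not in self._cache:          # cache miss: go to the table once
            self.misses += 1
            self._cache[key] = self._fetch(key)
        return self._cache[key]             # hits are served from memory

# A dict stands in for the DDB table here.
table = {"mon": {"temp": 10}}
cache = ReadThroughCache(lambda k: table[k])
cache.get("mon")   # miss: fetched from the "table" and cached
cache.get("mon")   # hit: served from the cache, no table read
```

The application only ever talks to the cache; the cache decides when the backing table needs to be consulted.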