Databases Flashcards

Question

What are Aurora global databases?

Answer 1

Aurora global databases are a feature of Aurora provisioned clusters that replicates data across AWS Regions, providing significant RPO and RTO improvements for business continuity (BC) and disaster recovery (DR) planning. Global databases can also improve performance for customers by placing a read-only copy of the data closer to them. Replication between Regions typically takes 1 second or less.

Answer 2

An Aurora DB cluster can have up to 15 read replicas. The read replicas can be distributed across the Availability Zones that a DB cluster spans within an AWS Region.

Answer 3

Aurora Backtrack is a feature that allows you to revert your database to a previous point in time. This can be useful if you accidentally make a change to your database that you need to undo.

Answer 4

Aurora Parallel Query is a feature of Aurora MySQL that speeds up analytical queries over large amounts of data. It works by pushing much of the filtering and aggregation work down to the Aurora storage layer, where it runs in parallel across many storage nodes. Only the matching rows are sent back to the DB instance, which reduces network traffic and CPU load on the instance. It does not split a query across Aurora replicas. Parallel Query is not enabled by default: you turn it on with the aurora_parallel_query parameter, and the optimizer then decides per query whether to use it.

Answer 5

Aurora Database Activity Streams is a feature that allows you to stream database activity events to Amazon Kinesis Data Streams. This allows you to analyze database activity in real-time and build applications that react to database changes.

Answer 6

An Aurora cluster volume is a virtual storage volume that is shared by all of the instances in an Aurora database cluster. It is similar to a traditional EBS volume, but it is optimized for Aurora databases.

Answer 7

Multi-master write is a mode of Aurora provisioned clusters that allows multiple instances to perform reads and writes at the same time, rather than only one primary instance having write capability as in a single-master cluster. It greatly improves **fault tolerance**. Load balancing is handled by the application, not by the cluster.

Answer 8

Amazon RDS Proxy is a fully managed, highly available database proxy for Amazon Relational Database Service (RDS) that makes applications more scalable, more resilient to database failures, and more secure. It maintains a pool of database connections. Ideal when applications hit "too many connections" errors, especially on smaller instances. It improves fault tolerance by reducing failover times (AWS cites up to 66% for RDS and Aurora databases). Only accessible from within a VPC.

Answer 9

Connection pooling consolidates many individual client connections onto a small number of shared database connections, reducing the overhead of establishing and managing connections and optimizing resource utilization. It can improve database efficiency and performance. Ideal for Lambda, because every function invocation otherwise wants to open a new connection.
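The pooling idea can be sketched in a few lines of Python. This is a toy illustration only, not how RDS Proxy is implemented: `ConnectionPool`, `fake_connect`, and the pool size of 2 are hypothetical names and values chosen for the example.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Toy pool: opens `size` connections once, then lends them out."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks until a pooled connection is free instead of opening a new one.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the connection back for reuse


opened = []  # records every "real" connection that gets opened


def fake_connect():
    # Hypothetical stand-in for an expensive database connect call.
    conn = object()
    opened.append(conn)
    return conn


pool = ConnectionPool(fake_connect, size=2)
for _ in range(100):  # e.g. 100 short-lived function invocations
    with pool.connection() as conn:
        pass  # run a query with conn here
```

In this sketch, 100 units of work share the 2 connections opened up front, which is the effect a proxy aims for when many short-lived clients hit a small database instance.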

Answer 10

1. Full Load Migration: Transfers all data from the source to the target database.
2. CDC (Change Data Capture) Migration: Captures and replicates ongoing changes to keep the source and target databases synchronized.
3. Full Load + CDC Migration: Combines the initial data transfer with continuous change replication.

Used mainly for relational OLTP and OLAP databases. NoSQL support is limited to specific engines (for example, MongoDB as a source and DynamoDB as a target).
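A minimal sketch of how a full load followed by CDC keeps a target in sync, using plain dictionaries in place of databases. The functions `full_load` and `apply_cdc` and the change-log tuple format are hypothetical and only illustrate the idea; they are not part of any AWS API.

```python
def full_load(source_rows):
    """Full load: copy every row that exists in the source right now."""
    return dict(source_rows)


def apply_cdc(target_rows, change_log):
    """CDC: replay changes captured on the source, in order, onto the target."""
    for operation, key, value in change_log:
        if operation == "delete":
            target_rows.pop(key, None)
        else:  # "insert" or "update"
            target_rows[key] = value
    return target_rows


source = {1: "alice", 2: "bob"}
target = full_load(source)  # step 1: initial bulk copy

# Changes that happened on the source after the bulk copy started.
changes = [("insert", 3, "carol"), ("update", 2, "robert"), ("delete", 1, None)]
apply_cdc(target, changes)  # step 2: ongoing replication
```

Running only `full_load` corresponds to the first migration type, only `apply_cdc` to the second, and both in sequence to the third.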

Answer 11

AWS SCT (Schema Conversion Tool) is a tool that converts database schemas and code from one database engine to another. It is typically used when migrating between different engines (e.g., from Oracle or SQL Server to an engine running on Amazon RDS or Amazon Aurora) to ensure compatibility and efficient data migration. It can also be used with Snowball for large multi-TB databases, where moving the data over a network connection would be costly.

Answer 12

In Amazon DynamoDB, Write Capacity Units (WCUs) and Read Capacity Units (RCUs) are measures of the throughput provisioned for write and read operations, respectively. These units represent the capacity allocated for handling write and read requests on a DynamoDB table. Provisioning WCUs and RCUs allows you to control the performance of your table based on anticipated workloads.
- 1 RCU = one strongly consistent read per second of an item up to 4 KB (an eventually consistent read costs 50% of that, i.e. 0.5 RCU).
- 1 WCU = one write per second of an item up to 1 KB.
- Every table has an RCU and WCU burst pool (300 seconds).
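The capacity arithmetic can be worked through with a small calculator. This is a sketch under stated assumptions: the function names are hypothetical, 1 KB is taken as 1,024 bytes, and item sizes are rounded up to the next 4 KB (reads) or 1 KB (writes) block.

```python
import math


def read_capacity_units(item_size_bytes, reads_per_second, strongly_consistent=True):
    """RCUs needed for a steady read rate; each read rounds up to 4 KB blocks."""
    blocks_per_read = math.ceil(item_size_bytes / 4096)
    units = blocks_per_read * reads_per_second
    if not strongly_consistent:
        units = math.ceil(units / 2)  # eventually consistent reads cost half
    return units


def write_capacity_units(item_size_bytes, writes_per_second):
    """WCUs needed for a steady write rate; each write rounds up to 1 KB blocks."""
    blocks_per_write = math.ceil(item_size_bytes / 1024)
    return blocks_per_write * writes_per_second


# 10 reads/second of 6 KB items: each read spans two 4 KB blocks.
strong_rcus = read_capacity_units(6 * 1024, 10)
eventual_rcus = read_capacity_units(6 * 1024, 10, strongly_consistent=False)

# 5 writes/second of 2,500-byte items: each write spans three 1 KB blocks.
wcus = write_capacity_units(2500, 5)
```

The rounding is the part that is easy to miss: a 4.1 KB item costs the same to read as an 8 KB item.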

Answer 13

- **Query:**
  - Query retrieves items from a DynamoDB table based on the values of the primary key (partition key alone, or partition key plus sort key).
  - It's more efficient for retrieving a specific set of items.
  - You can specify conditions on the key attributes to filter the results.
- **Scan:**
  - Scan reads every item in the table and returns the entire content.
  - It's less efficient than Query as it reads every item, making it slower and more resource-intensive.
  - Useful when you need to scan the entire table or apply non-key attribute filters.
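The difference in work done can be illustrated with a toy in-memory table. `ToyTable` is a hypothetical stand-in, not the DynamoDB API; it only counts how many items each operation has to read, and it assumes a `pk`/`sk` key schema and a prefix condition on the sort key.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table with a partition key (pk) and sort key (sk)."""

    def __init__(self):
        self._partitions = defaultdict(dict)  # pk -> {sk: item}
        self.items_read = 0

    def put(self, item):
        self._partitions[item["pk"]][item["sk"]] = item

    def query(self, pk, sk_prefix=""):
        """Read only items under one partition key that match the sort key condition."""
        partition = self._partitions.get(pk, {})
        matches = [item for sk, item in sorted(partition.items()) if sk.startswith(sk_prefix)]
        self.items_read += len(matches)
        return matches

    def scan(self, predicate=lambda item: True):
        """Read every item in the table, then apply the filter."""
        results = []
        for partition in self._partitions.values():
            for item in partition.values():
                self.items_read += 1  # counted even if the filter rejects it
                if predicate(item):
                    results.append(item)
        return results


table = ToyTable()
for customer in range(3):
    for order in range(4):
        table.put({"pk": f"customer#{customer}", "sk": f"order#{order}", "shipped": order % 2 == 0})

queried = table.query("customer#1")
query_reads = table.items_read

table.items_read = 0
scanned = table.scan(lambda item: item["shipped"])
scan_reads = table.items_read
```

Here the query touches only the 4 items of one customer, while the scan reads all 12 items even though the filter keeps only half of them.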

Answer 14

- A Local Secondary Index (LSI) is an additional index that can be created on a DynamoDB table **only** at table creation time.
- LSIs have the same partition key as the base table but a different sort key, providing more querying flexibility within a specific partition key.
- LSIs share the provisioned capacity (RCUs and WCUs) with the base table.
- Up to 5 LSIs per table. For each partition key value, the item collection (base table items plus their LSI entries) cannot exceed 10 GB.
- They are sparse. An item appears in the index only if the index sort key attribute exists in the item (so a "scan" of the index is more efficient because it acts only on meaningful data).
- You can choose attribute projections (keys only, all attributes, or specific attributes).

Answer 15

- A Global Secondary Index (GSI) is an additional index that can be created on a DynamoDB table (at any time) to provide flexible querying options.
- Unlike Local Secondary Indexes (LSIs), GSIs can have a different partition key and sort key than the base table, allowing broader querying capabilities.
- GSIs have their own provisioned capacity (RCUs and WCUs), allowing independent scaling from the base table.
- Up to 20 GSIs per table (default quota). The 10 GB item collection limit that applies to LSIs does not apply to GSIs.
- They are sparse. An item appears in the index only if the index key attributes exist in the item (so a "scan" of the index is more efficient because it acts only on meaningful data).
- You can choose attribute projections (keys only, all attributes, or specific attributes).
- They are always eventually consistent. AWS recommends GSIs over LSIs if strong consistency is not needed.
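To make "sparse" concrete, here is a minimal sketch that builds an index from a list of plain dictionaries. The function `build_sparse_index` and the attribute names (`customer`, `shipped_at`) are hypothetical; the point is only that items lacking the index key attributes never enter the index.

```python
def build_sparse_index(items, index_partition_key, index_sort_key=None):
    """Group items by the index partition key, skipping items that lack index keys."""
    required = [index_partition_key]
    if index_sort_key:
        required.append(index_sort_key)

    index = {}
    for item in items:
        if any(attribute not in item for attribute in required):
            continue  # sparse: this item is simply absent from the index
        index.setdefault(item[index_partition_key], []).append(item)

    if index_sort_key:
        for entries in index.values():
            entries.sort(key=lambda entry: entry[index_sort_key])
    return index


orders = [
    {"order_id": "o1", "customer": "c1", "shipped_at": "2024-01-05"},
    {"order_id": "o2", "customer": "c1"},  # not shipped yet: no shipped_at
    {"order_id": "o3", "customer": "c2", "shipped_at": "2024-01-03"},
    {"order_id": "o4", "customer": "c1", "shipped_at": "2024-01-02"},
]
shipped_index = build_sparse_index(orders, "customer", "shipped_at")
```

Scanning `shipped_index` visits three entries instead of four, which is why an index keyed on a rarely present attribute can be a cheap way to find just the items of interest.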

Answer 16

- Streams in DynamoDB capture and stream modifications to items in a table in near real time.
- There are four view types for stream records:
  1. KEYS_ONLY: Records contain only the key attributes of the modified item.
  2. NEW_IMAGE: Records include the entire item after it was modified.
  3. OLD_IMAGE: Records include the entire item before it was modified.
  4. NEW_AND_OLD_IMAGES: Records contain both the new and old images of the item.
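The four view types differ only in which images end up in each record. A small sketch of that selection logic follows; `stream_record` is a hypothetical helper that uses plain Python dictionaries rather than DynamoDB's typed attribute-value format.

```python
VIEW_TYPES = ("KEYS_ONLY", "NEW_IMAGE", "OLD_IMAGE", "NEW_AND_OLD_IMAGES")


def stream_record(view_type, keys, old_item=None, new_item=None):
    """Return the payload a stream record would carry for the given view type."""
    if view_type not in VIEW_TYPES:
        raise ValueError(f"unknown view type: {view_type}")

    record = {"Keys": keys}  # the key attributes are present for every view type
    if view_type in ("NEW_IMAGE", "NEW_AND_OLD_IMAGES") and new_item is not None:
        record["NewImage"] = new_item
    if view_type in ("OLD_IMAGE", "NEW_AND_OLD_IMAGES") and old_item is not None:
        record["OldImage"] = old_item
    return record


keys = {"pk": "user#1"}
before = {"pk": "user#1", "plan": "free"}
after = {"pk": "user#1", "plan": "pro"}

keys_only = stream_record("KEYS_ONLY", keys, before, after)
both_images = stream_record("NEW_AND_OLD_IMAGES", keys, before, after)
```

Choosing NEW_AND_OLD_IMAGES is what lets a consumer compute a diff (here, `plan` changing from `free` to `pro`) without reading the table again.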

Answer 17

- AWS Lambda can be integrated with DynamoDB Streams to provide event-driven functionality.
- Lambda functions can be set up as triggers that are invoked automatically when new entries are added to the DynamoDB Stream.
- This enables seamless, near-real-time processing of changes to DynamoDB tables, allowing custom business logic or additional processing to be executed in response to database modifications.
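A trigger function receives a batch of stream records in its event argument. The following is a minimal sketch of such a handler, run locally against a hand-written sample event; the counting logic and the sample data are illustrative assumptions, and a real handler would replace them with its own business logic.

```python
def handler(event, context):
    """Count stream records by event type and collect the keys of new items."""
    counts = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    inserted_keys = []
    for record in event.get("Records", []):
        event_name = record["eventName"]  # INSERT, MODIFY, or REMOVE
        counts[event_name] = counts.get(event_name, 0) + 1
        if event_name == "INSERT":
            inserted_keys.append(record["dynamodb"]["Keys"])
    return {"counts": counts, "inserted_keys": inserted_keys}


# Hand-written sample in the shape of a DynamoDB Streams event (values are made up).
sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"pk": {"S": "user#1"}},
                "NewImage": {"pk": {"S": "user#1"}, "name": {"S": "Ada"}},
            },
        },
        {"eventName": "MODIFY", "dynamodb": {"Keys": {"pk": {"S": "user#1"}}}},
        {"eventName": "REMOVE", "dynamodb": {"Keys": {"pk": {"S": "user#2"}}}},
    ]
}

result = handler(sample_event, None)
```

Which images are available under `record["dynamodb"]` depends on the stream's view type, so a handler that needs `NewImage` or `OldImage` has to be paired with a matching stream configuration.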

Answer 18

DynamoDB Accelerator (DAX) is an in-memory cache designed specifically for DynamoDB.
- Primary node (writes) and replicas (reads).
- Nodes are highly available (HA). Primary failure = election of a new primary.
- Scales up and out.
- DAX is deployed within a VPC.
- Cached reads are eventually consistent.

Answer 19

Amazon DynamoDB Time to Live (TTL) allows you to define a per-item timestamp to determine when an item is no longer needed. Shortly after the date and time of the specified timestamp, DynamoDB deletes the item from your table without consuming any write throughput. TTL is provided at no extra cost as a means to reduce stored data volumes by retaining only the items that remain current for your workload’s needs.
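The per-item timestamp is a number holding a Unix epoch time in seconds. Below is a small sketch of stamping items and checking expiry on the client side; the attribute name `expires_at` and both helper functions are hypothetical, since the TTL attribute name is whatever you configure on the table.

```python
import time

TTL_ATTRIBUTE = "expires_at"  # hypothetical name; configured per table


def with_ttl(item, lifetime_seconds, now=None):
    """Return a copy of the item stamped with an expiry time in epoch seconds."""
    now = int(time.time()) if now is None else int(now)
    return {**item, TTL_ATTRIBUTE: now + lifetime_seconds}


def is_expired(item, now=None):
    """True once the current time has passed the item's expiry timestamp."""
    now = int(time.time()) if now is None else int(now)
    return TTL_ATTRIBUTE in item and now > item[TTL_ATTRIBUTE]


session = with_ttl({"pk": "session#42"}, lifetime_seconds=3600, now=1_700_000_000)
still_valid = not is_expired(session, now=1_700_000_000 + 1800)
expired_later = is_expired(session, now=1_700_000_000 + 3601)
```

Because deletion happens some time after the timestamp rather than at that exact moment, a check like `is_expired` is a common way for readers to ignore items that have expired but have not been removed yet.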

Answer 20

Amazon Redshift Spectrum is a feature of Amazon Redshift, a fully managed data warehouse service. Spectrum allows you to run queries on large datasets stored in Amazon Simple Storage Service (S3) directly from your Amazon Redshift cluster, without the need to load the data into the cluster. This enables you to analyze vast amounts of data in your S3 data lake with the power and flexibility of Amazon Redshift's query processing capabilities. Spectrum extends your data warehouse to query structured and semi-structured data in S3, providing a cost-effective solution for handling massive datasets.

Answer 21

Amazon Redshift Enhanced VPC Routing is a feature that forces traffic between an Amazon Redshift cluster and data repositories such as Amazon Simple Storage Service (S3), for example COPY and UNLOAD traffic, through your Amazon VPC (Virtual Private Cloud) instead of over the public internet. Because the traffic flows through the VPC, it can be controlled with standard VPC features such as security groups, network ACLs, and VPC endpoints. With a VPC endpoint for S3, data transfers between Redshift and S3 remain within the AWS network, improving security and compliance. This feature is particularly beneficial for organizations with strict data governance requirements and those seeking to minimize exposure to the public internet.


(45 cards)