Databases Flashcards

1
Q

What are the RDS Database Types?

A

RDS: Relational Database Service
RDS Types: SQL Server, Oracle, MySQL, PostgreSQL,
MariaDB, and Amazon Aurora.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

In what scenarios is RDS well-suited?

A
  • RDS Is for OLTP Workloads: Great for processing lots of small
    transactions, like customer orders, banking transactions, payments, and booking systems.
  • Not Suitable for OLAP
    Use Redshift for data warehousing and OLAP tasks, like analyzing large amounts of data, reporting, and sales forecasting.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is OLTP and OLAP?

A

OLTP: Stands for Online Transaction Processing. It refers to systems designed for managing and processing high volumes of real-time transactions, such as updating records, making reservations, or processing orders.

OLAP: Stands for Online Analytical Processing. It pertains to systems optimized for complex analytical queries and reporting. OLAP systems enable users to analyze large volumes of data, spot trends, and gain insights through aggregation and multidimensional analysis.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Read Replicas — Key Facts

A
  • Scaling Read Performance: Primarily used for scaling, not for disaster
    recovery!
  • Requires Automatic Backup: Automatic backups must be enabled in order to deploy a read replica.
  • Multiple Read Replicas Are Supported:MySQL, MariaDB, PostgreSQL, Oracle, and SQL Server allow you to add up to 5 read replicas to each DB instance.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

RDS Multi-AZ and Read Replicas

A

Multi-AZ:

  • An exact copy of your production database in another Availability Zone.
  • Used for disaster recovery.
  • In the event of a failure, RDS will automatically failover to the standby
    instance.

Read Replicas:
- A read-only copy of your primary database in the same Availability Zone, cross-AZ, or cross-region.
- Used to increase or scale read performance.
* Great for read-heavy workloads and takes the load off your primary database for read-only workloads (e.g., Business Intelligence reporting jobs).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

4 Aurora Exams Tips

A
  • 2 copies of your data are contained in each Availability Zone, with a minimum of 3 Availability Zones. 6 copies of your data.
  • You can share Aurora snapshots with other AWS accounts.
  • 3 types of replicas available: Aurora replicas, MySQL replicas, and PostgreSQL replicas. Automated failover is only available with Aurora replicas.
  • Aurora has automated backups turned on by default. You can also take snapshots with Aurora. You can share these snapshots with other AWS accounts.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Aurora Serverless Use Cases

A

Aurora Serverless provides a relatively simple, cost-effective
option
for infrequent, periodic, or unpredictable workloads.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

4 Facts about DynamoDB

A
  • Stored on SSD storage.
  • Spread across 3 geographically distinct data centers.
  • Eventually consistent reads (default).
  • Strongly consistent reads.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What’s the difference between eventually consistent reads and
strongly consistent reads?

A

Eventually:
Consistency across all copies of data is usually reached within a second. Repeating a read after a short time should return the updated data. Best read performance. It reads from secondary nodes.

Strongly:
A strongly consistent read returns a result that reflects all writes that
received a successful response before the read. It reads straight from the leader node.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

7 Facts on DynamoDB Transactions

A
  • Multiple “all-or-nothing” operations
  • Financial transactions
  • Fulfilling orders
  • 3 options for reads: eventual consistency, strong consistency, and transactional
  • 2 options for writes: standard and transactional
  • Up to 25 items or 4 MB of data
  • DynamoDB transactions provide developers atomicity, consistency,
    isolation, and durability (ACID) across one or more tables within a single
    AWS account and region.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

DynamoDB On-Demand Backup and Restore

A
  • Full backups at any time
  • Zero impact on table performance or availability
  • Consistent within seconds and retained until deleted
  • Operates within the same region as the source table
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

DynamoDB Point-in-Time Recovery
(PITR)

A
  • Protects against accidental writes or deletes
  • Restore to any point in the last 35 days
  • Incremental backups
  • Not enabled by default
  • Latest restorable: 5 minutes in the past
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

How DynamoDB Streams works?

A

It’s like a real-time feed of the changes happening to the data in the table. Stored for 24 hours

  1. Changes in the Table: DynamoDB Streams captures those changes.
  2. Events: These changes are turned into small messages called “events.” Each event describes what changed, like which book was checked out or returned.
  3. Stream: These events are collected in a stream, which is like a flow of events. This stream is separate from the table itself but closely connected.
  4. Process the Events: You can have something called a “listener” that’s always paying attention to this stream.

Real-time Tracking: DynamoDB Streams provide real-time tracking of changes. This can be super useful for things like updating search results as soon as new data is added or keeping backups of changes.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What are Global Tables in DynamoDB used for?

A

Managed Multi-Master, Multi-Region Replication

  • Globally distributed applications
  • Based on DynamoDB streams
  • Multi-region redundancy for disaster recovery or high availability
  • No application rewrites
  • Replication latency under one second. Strong consistent reads only in the same regions as writes.
  • Last write wins is used for conflict resolution.
  • DynamoDB streams should be enabled to enable Global tables
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is Amazon Keyspaces?

A

Amazon Keyspaces, formerly known as Amazon Managed Apache Cassandra Service (MCS), is a fully managed, serverless, and scalable database service offered by Amazon Web Services (AWS). It provides the benefits of Apache Cassandra’s NoSQL database technology without the operational overhead of managing and scaling the infrastructure yourself.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What is Amazon Neptune?

A

Amazon Neptune is a managed graph database service offered by Amazon Web Services (AWS). It’s designed for building applications that require storing, querying, and analyzing highly connected data, such as social networks, recommendation systems, fraud detection, and knowledge graphs.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What is Amazon QLDB?

A

Amazon Quantum Ledger Database (Amazon QLDB) is a fully managed, serverless, and highly scalable database. QLDB is designed to provide an immutable, transparent, and cryptographically verifiable ledger for applications that require a tamper-proof record of changes to data over time. It’s particularly useful for use cases like auditing, compliance, financial transactions, and supply chain tracking.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What is Amazon Timestream?

A

Amazon Timestream is a fully managed, serverless, and scalable time-series database service. It’s designed specifically for handling and analyzing large volumes of time-series data, which is data that is generated with a timestamp and often comes from sources like sensors, logs, metrics, and IoT devices.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

What are the 6 categories of costs in RDS?

A
  1. Instance Size and Type
  2. Multi AZ or not.
  3. Storage type and amount.
  4. Data transferred.
  5. Backups and snapshots.
  6. Licencing(if applicable)
20
Q

What do we need to create before provisioning an RDS instance?

A

A subnet group(consists of at least 2 subnets)

21
Q

What are the two Multi-AZ options in Amazon Relational Database Service (RDS)?

A
  • Multi-AZ instance mode: This mode has one primary database instance and one standby instance. The standby instance synchronously replicates data from the primary instance, but it doesn’t serve read traffic. In the event of a failure of the primary instance, Amazon RDS automatically fails over to the standby instance.
  • Multi-AZ cluster mode: This mode has one writer instance and two reader instances. The reader instances synchronously replicate data from the writer instance, and they can also be used to serve read traffic. In the event of a failure of the writer instance, Amazon RDS automatically fails over to one of the reader instances.
22
Q

Key Differences Between Manual Snapshots and Automatic Backups in RDS?

A
  • Manual snapshots are performed manually and live past the termination of an RDS instance.
  • Automatic backups can be taken of an RDS instance with a 0 (Disabled) to 35-day retention.
  • Automatic backups also use S3 for storing transaction logs every 5 minutes - allowing for point-in-time(PTR) recovery.
  • Snapshots can be restored .. but create a new RDS instance.
  • To access them through the S3 console need to be exported.
  • Are not cross-region by default.
23
Q

What is the key difference regarding replication between Multi-AZ deployments and read replicas in Amazon RDS?

A

Multi-AZ deployments in RDS are synchronous, while read replicas in RDS are asynchronous.

24
Q

What is Amazon RDS Custom?

A

Amazon RDS Custom is a managed database service for applications that require customization of the underlying operating system and database environment. Benefits of RDS automation with the access needed for legacy, packaged, and custom applications.

25
Q

What are Aurora global databases?

A

Aurora global databases are a feature of Aurora Provisioned clusters which allow data to be replicated globally providing significant RPO and RTO improvements for BC and DR planning. Additionally, global databases can provide performance improvements for customers .. with data being located closer to them, in a read-only form.

=<1s replication between regions.

26
Q

How many read replicas can Aurora have?

A

An Aurora DB cluster can have up to 15 read replicas. The read replicas can be distributed across the Availability Zones that a DB cluster spans within an AWS Region.

27
Q

What is Aurora Backtrack?

A

Aurora Backtrack is a feature that allows you to revert your database to a previous point in time. This can be useful if you accidentally make a change to your database that you need to undo.

28
Q

What is Aurora Parallel Query?

A

Aurora Parallel Query is a feature that allows you to run queries in parallel across multiple Aurora replicas. This can significantly improve the performance of your database for read-heavy workloads.

Aurora Parallel Query works by breaking down your query into smaller subqueries and executing them in parallel across multiple Aurora replicas. The results of the subqueries are then merged together to produce the final result of your query.

Aurora Parallel Query is enabled by default for all Aurora databases. You can disable it if you need to, but it is generally recommended to keep it enabled for performance reasons.

29
Q

What are Aurora Database Activity Streams?

A

Aurora Database Activity Streams is a feature that allows you to stream database activity events to Amazon Kinesis Data Streams. This allows you to analyze database activity in real-time and build applications that react to database changes.

30
Q

What is an Aurora cluster volume?

A

An Aurora cluster volume is a virtual storage volume that is shared by all of the instances in an Aurora database cluster. It is similar to a traditional EBS volume, but it is optimized for Aurora databases.

31
Q

What is Multi-Master Write Mode in Amazon Aurora?

A

Multi-master write is a mode of Aurora Provisioned Clusters that allows multiple instances to perform reads and writes at the same time - rather than only one primary instance having write capability in a single-master cluster.

Highly improves fault tolerance.

Load balancing is handled by the application not by the cluster.

32
Q

What is Amazon RDS Proxy?

A

Is a fully managed, highly available database proxy for Amazon Relational Database Service (RDS) that makes applications more scalable, more resilient to database failures, and more secure.

Keeps a connection pooling.

Ideal when there are “too many connection errors” especially in smaller instances.

Increases fault tolerance(by 60% compared to Aurora).

Only accessible within a VPC.

33
Q

What is the concept of Multiplexing in Amazon RDS Proxy?

A

Consolidating multiple individual connections into a single connection, reducing the overhead associated with connection establishment and management, and optimizing resource utilization. It can improve database efficiency and performance.

Ideal for lambda because every function wants to open a new connection..

34
Q

What are the three primary types of migration jobs in AWS Database Migration Service (DMS)?

A
  1. Full Load Migration: Transfers all data from the source to the target database.
  2. CDC (Change Data Capture) Migration: Captures and replicates real-time changes to keep source and target databases synchronized.
  3. Full Load + CDC Migration: Combines the initial data transfer and continuous change replication.

For OLTP and OLAP. not for noSQL.

35
Q

What is AWS SCT, and when is it typically used?

A

AWS SCT (Schema Conversion Tool) is a service that helps convert database schemas and code from one database engine to another. It’s typically used when migrating from one database platform (e.g., Oracle, SQL Server) to another (e.g., Amazon RDS, Amazon Aurora) to ensure compatibility and efficient data migration.

Can also be used with large multi-TB DBs with Snowball(because it would be costly through a connection).

36
Q

What do WCUs and RCUs represent in Amazon DynamoDB?

A

In Amazon DynamoDB, Write Capacity Units (WCUs) and Read Capacity Units (RCUs) are measures of the throughput provisioned for write and read operations, respectively. These units represent the capacity allocated for handling write and read requests on a DynamoDB table. Provisioning WCUs and RCUs allows you to control the performance of your table based on anticipated workloads.

  • 1 RCU = 1 x 4KB read operation per second for strongly consistent reads(For eventually consistent reads is 50% )
  • 1 WCU = 1 x 1KB write operation per second
  • Every table has an RCU and WCU burst pool (300 seconds)
37
Q

Question: What are the primary differences between Query and Scan operations in Amazon DynamoDB?

A
  • Query:
    • Query is used to retrieve items from a DynamoDB table based on the values of the primary key(PK or PK SK).
    • It’s more efficient for retrieving a specific set of items.
    • You can specify conditions on the key attributes to filter the results.
  • Scan:
    • Scan reads every item in the table and returns the entire content.
    • It’s less efficient than Query as it reads every item, making it slower and more resource-intensive.
    • Useful when you need to scan the entire table or apply non-key attribute filters.
38
Q

What is a Local Secondary Index (LSI) in Amazon DynamoDB?

A
  • A Local Secondary Index (LSI) is an additional index that can be created on a DynamoDB table during table creation only.
  • LSIs have the same partition key as the base table but a different sort key, providing more querying flexibility within a specific partition key.
  • LSIs share the provisioned capacity (RCUs and WCUs) with the base table.
  • Up to 5 LSIs per table. The combined size of all LSIs must fit within the maximum table size limit (currently 10 GB).
  • They are sparse. They return items only if the sort key value exists in the item(So “scan” is more efficient because it acts only on meaningful data).
  • You can Attribute projections(Keys, all, or specific attributes
39
Q

What is a Global Secondary Index (GSI) in Amazon DynamoDB?

A
  • A Global Secondary Index (GSI) is an additional index that can be created on a DynamoDB table(at any time) to provide flexible querying options.
  • Unlike Local Secondary Indexes (LSIs), GSIs have a different partition key and sort key than the base table, allowing broader querying capabilities.
  • GSIs have their own provisioned capacity (RCUs and WCUs), allowing independent scaling from the base table.
  • Up to 20 GSIs per table. The combined size of all LSIs must fit within the maximum table size limit (currently 10 GB).
  • They are sparse. They return items only if the sort key value exists in the item(So “scan” is more efficient because it acts only on meaningful data).
  • You can Attribute projections(Keys, all, or specific attributes
  • They are always eventually consistent. Suggested by AWS(compared to LSIs) if strong consistency is not needed.
40
Q

What are the view types associated with Streams in Amazon DynamoDB?

A
  • Streams in DynamoDB capture and stream modifications to items in a table in real-time.
  • There are four view types for stream records:
    1. KEYS_ONLY: Records contain only the key attributes of the modified item.
    2. NEW_IMAGE: Records include the entire item after it was modified.
    3. OLD_IMAGE: Records include the entire item before it was modified.
    4. NEW_AND_OLD_IMAGES: Records contain both the new and old images of the item.
41
Q

How can AWS Lambda be integrated with DynamoDB Streams, and what trigger functionality does it provide?

A
  • AWS Lambda can be integrated with DynamoDB Streams to provide event-driven functionality.
  • Lambda functions can be set up as triggers to automatically invoke when new entries are added to the DynamoDB Stream.
  • This enables seamless and real-time processing of changes to DynamoDB tables, allowing for custom business logic or additional processing to be executed in response to database modifications.
42
Q

What is DAX?

A

DynamoDB Accelerator (DAX) is an in-memory cache designed specifically for DynamoDB.

  • Primary Nodes (write) and replicas(read)
  • Nodes are HA. Primary failure = election.
  • Scale up and out.
  • DAX deployed within a VPC.
  • Eventually consistency.
43
Q

What is Amazon DynamoDB TTL?

A

Amazon DynamoDB Time to Live (TTL) allows you to define a per-item timestamp to determine when an item is no longer needed. Shortly after the date and time of the specified timestamp, DynamoDB deletes the item from your table without consuming any write throughput. TTL is provided at no extra cost as a means to reduce stored data volumes by retaining only the items that remain current for your workload’s needs

44
Q

What is Amazon Redshift Spectrum?

A

Amazon Redshift Spectrum is a feature of Amazon Redshift, a fully managed data warehouse service. Spectrum allows you to run queries on large datasets stored in Amazon Simple Storage Service (S3) directly from your Amazon Redshift cluster, without the need to load the data into the cluster. This enables you to analyze vast amounts of data in your S3 data lake with the power and flexibility of Amazon Redshift’s query processing capabilities. Spectrum extends your data warehouse to query unstructured data in S3, providing a cost-effective solution for handling massive datasets.

45
Q

What is Amazon Redshift Enhanced VPC Routing?

A

Amazon Redshift Enhanced VPC Routing is a feature that allows Amazon Redshift clusters to route traffic to Amazon Simple Storage Service (S3) through a specified Amazon VPC (Virtual Private Cloud) endpoint, instead of using the public internet. By utilizing enhanced VPC routing, data transfers between Redshift and S3 remain within the AWS network, improving security, performance, and compliance. This feature is particularly beneficial for organizations with strict data governance requirements and those seeking to minimize exposure to the public internet.