RDS Flashcards

1
Q

How many read replicas can you have in Aurora?

A

You can have up to 15 read replicas in Aurora, while RDS MySQL supports five. Aurora's replication process is also faster.

2
Q

How does Aurora ensure high availability?

A

There are 6 copies of your data across 3 AZs:
- 4 copies out of 6 needed for writes
- 3 copies out of 6 needed for reads
- self-healing with peer-to-peer replication

3
Q

What’s the advantage of using RDS over deploying databases on EC2?

A
  • Automated provisioning, OS patching
  • Continuous backups and restore to specific timestamp (Point in Time Restore)
  • Monitoring dashboards
  • Read replicas for improved read performance
  • Multi-AZ setup for DR (Disaster Recovery)
  • Maintenance windows for upgrades
  • Scaling capability (vertical and horizontal)
  • Storage backed by EBS (gp2 or io1)
  • Automated Backup

But you cannot SSH into the instance.

4
Q

What is storage autoscaling in RDS?

A

RDS storage autoscaling helps you increase the storage on your DB instance dynamically. When RDS detects you are running out of free database storage, it scales automatically. You have to set the maximum storage threshold so your database does not grow infinitely. This is useful for applications with unpredictable workloads.

5
Q

How many read replicas can there be in RDS?

A

There can be up to 5 read replicas.

6
Q

Can RDS read replicas be in the same AZ?

A

The replicas can be within the same AZ, cross-AZ, or cross-region.

7
Q

Are replicas in-sync with the main RDS DB instance?

A

Replication to the replicas is asynchronous, but they are eventually consistent.

8
Q

What is the use of RDS read replicas?

A

Replicas provide better read performance for the RDS database. They can be used by reporting or analytical tools that only need to read the data.

Replicas can be promoted to their own DB. Applications must update the connection string to leverage the read replicas

9
Q

Is there a network cost when data goes from one AZ to another?

A

Data replication within the same region (same or different AZ) is free. There is a network cost when data is replicated across regions.

10
Q

How is disaster recovery configured in RDS?

A

Disaster recovery is configured across AZs (Multi-AZ) with synchronous replication. One DNS name is configured for the application to reach the main RDS database instance. In the event of a failure, an automatic failover occurs and the DNS name points to the standby instance, so there is no interruption to the application. The standby becomes the master DB instance.
The standby instance cannot be used for scaling (no one can read or write to it; it is only there for failover if anything goes wrong with the master database).

Read Replicas can be set up as Multi-AZ for disaster recovery (DR). It's a common exam question.

11
Q

Can RDS be changed from a single AZ to multiple AZ?

A

The RDS settings can be modified to convert the database from a single AZ to Multi-AZ. Behind the scenes, AWS takes a snapshot of the master instance and restores it in the new AZ, then sets up synchronous replication between the master and standby DB instances.

12
Q

What is the difference between RDS and RDS custom?

A

In RDS, the entire database and the underlying OS are managed by AWS.

In RDS Custom, you have access to the database and the OS (including the underlying EC2 instance), but you don't have control over the hardware. RDS Custom is available only for Oracle and Microsoft SQL Server. You can customize the instance or the database, but you must disable the automation mode before you perform any customization.

13
Q

Aurora is compatible with which databases?

A

It's compatible with PostgreSQL and MySQL.

14
Q

What is the writer endpoint and reader endpoint in Aurora DB?

A

The writer endpoint is a DNS name used for writing data to the master DB instance. In case of a failure, a read replica is promoted to master and the writer endpoint points to it, so applications using the writer endpoint do not require any change.

The reader endpoint is also a DNS name, used by applications to read data from the read replicas. Aurora performs auto-scaling automatically, and new replicas are set up to maintain the desired performance.

15
Q

Is it possible to backtrack data in Aurora?

A

Yes, the database can be restored to any point in time without using backups. Aurora Backtrack uses a different mechanism to do this.

16
Q

What is Aurora replicas autoscaling?

A

If read traffic increases and CPU usage on the existing read replicas rises, Aurora automatically creates more replicas to handle the load.

17
Q

What are Aurora’s custom endpoints?

A

The custom endpoints provide load balancing and high availability for each group of DB instances within your cluster. If one of the DB instances within a group becomes unavailable, Aurora directs subsequent custom endpoint connections to one of the other DB instances associated with the same endpoint.

The custom endpoints can be created to run analytical queries or any other specific purpose. These read replicas can be larger instances compared to other existing replicas.

18
Q

What is Aurora serverless?

A

Amazon Aurora Serverless is an on-demand, autoscaling configuration for Amazon Aurora. It automatically starts up, shuts down, and scales capacity up or down based on your application’s needs. You can run your database on AWS without managing database capacity.

Manually managing database capacity can take up valuable time and can lead to inefficient use of database resources. With Aurora Serverless, you create a database, specify the desired database capacity range, and connect your applications. You pay on a per-second basis for the database capacity that you use when the database is active, and migrate between standard and serverless configurations with a few steps in the Amazon Relational Database Service (Amazon RDS) console.

19
Q

What is Aurora multi-master?

A

Most kinds of Aurora clusters are single-master clusters. For example, provisioned, Aurora Serverless, parallel query, and Global Database clusters are all single-master clusters. In a single-master cluster, a single DB instance performs all write operations and any other DB instances are read-only. If the writer DB instance becomes unavailable, a failover mechanism promotes one of the read-only instances to be the new writer.

In a multi-master cluster, all DB instances can perform write operations. The notions of a single read/write primary instance and multiple read-only Aurora Replicas don’t apply. There isn’t any failover when a writer DB instance becomes unavailable, because another writer DB instance is immediately available to take over the work of the failed instance. We refer to this type of availability as continuous availability, to distinguish it from the high availability (with brief downtime during failover) offered by a single-master cluster

20
Q

What is global Aurora?

A

Amazon Aurora Global Database is designed for globally distributed applications, allowing a single Amazon Aurora database to span multiple AWS Regions. It replicates your data with no impact on database performance, enables fast local reads with low latency in each Region, and provides disaster recovery from Region-wide outages.

21
Q

What is Aurora Machine Learning?

A

Amazon Aurora machine learning enables you to add ML-based predictions to applications via the familiar SQL programming language, so you don't need to learn separate tools or have prior machine learning experience. It provides simple, optimized, and secure integration between Aurora and AWS ML services without having to build custom integrations or move data around.

22
Q

Which database takes less than one second for cross-region replication?

A

Aurora Global Database. Cross-region replication taking less than one second is an exam hint for Aurora Global Database.

23
Q

How many backup options are there in RDS?

A

Automated backup and manual backup.

With automated backup, a full backup is taken every day and transaction logs are backed up every five minutes, which gives the ability to restore to any point in time. Automated backups can be retained for 1 to 35 days.

The manual database snapshots are triggered by the users. The backups are retained as long as the user wants them.

A trick that can come up in the exam: if you only need an RDS database during a specific time frame, you can reduce cost by taking a snapshot and then deleting the database. When you need the database again, restore it from the snapshot and start using it. You will still pay for the snapshot storage, but not for the instance.

24
Q

How is backup in Aurora different from backup in RDS?

A

Automated backup in Aurora cannot be disabled, whereas in RDS it can be disabled by setting the retention period to zero.
Similar to RDS, backups in Aurora are retained for 1-35 days.

25
Q

What is Aurora Database Cloning?

A

By using Aurora cloning, you can create a new cluster that uses the same Aurora cluster volume and has the same data as the original. The process is designed to be fast and cost-effective. The new cluster with its associated data volume is known as a clone. Creating a clone is faster and more space-efficient than physically copying the data using other techniques, such as restoring a snapshot.

It is useful for creating a staging database from a production database without impacting production.

26
Q

Can RDS or Aurora databases be encrypted?

A

Yes, they can be encrypted by using AWS KMS.

27
Q

Can read replicas be encrypted if master is not encrypted?

A

No, if the master is not encrypted, read replicas cannot be encrypted.

28
Q

How to encrypt an unencrypted database?

A

To encrypt an unencrypted database, take a DB snapshot, copy the snapshot as encrypted, and restore from the encrypted copy (a sketch follows below).
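
A minimal boto3 sketch of this snapshot-copy-restore flow (instance and snapshot identifiers are hypothetical; waiting for each snapshot to become available between steps is omitted for brevity):

```python
import boto3

rds = boto3.client("rds")

# 1. Snapshot the unencrypted instance.
rds.create_db_snapshot(
    DBInstanceIdentifier="mydb",
    DBSnapshotIdentifier="mydb-unencrypted-snap",
)

# 2. Copy the snapshot with a KMS key; the copy comes out encrypted.
rds.copy_db_snapshot(
    SourceDBSnapshotIdentifier="mydb-unencrypted-snap",
    TargetDBSnapshotIdentifier="mydb-encrypted-snap",
    KmsKeyId="alias/aws/rds",
)

# 3. Restore a new, encrypted instance from the encrypted copy.
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier="mydb-encrypted",
    DBSnapshotIdentifier="mydb-encrypted-snap",
)
```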

29
Q

What are the security options for RDS and Aurora?

A
  1. The database can be encrypted at rest by using AWS KMS; this must be defined at the time of launching the database.
  2. RDS and Aurora use TLS by default for in-flight encryption.
  3. IAM roles can be used to connect to the databases without a username or password.
  4. You can control network access to your RDS and Aurora databases by using security groups.
  5. You cannot access the instance by SSH unless it's RDS Custom.

30
Q

What is Amazon RDS proxy?

A

Amazon RDS Proxy is a fully managed, highly available database proxy for Amazon Relational Database Service (RDS) that makes applications more scalable, more resilient to database failures, and more secure.

Many applications, including those built on modern serverless architectures, can have many open connections to the database server and may open and close database connections at a high rate, exhausting database memory and compute resources (opening and closing connections consume CPU and RAM). Amazon RDS Proxy allows applications to pool and share connections established with the database, improving database efficiency and application scalability. It's serverless, autoscaling, and highly available (Multi-AZ). With RDS Proxy, failover times for Aurora and RDS databases are reduced by up to 66%, and database credentials, authentication, and access can be managed through integration with AWS Secrets Manager and AWS Identity and Access Management (IAM).
https://aws.amazon.com/rds/proxy/

It supports MySQL, PostgreSQL, and MariaDB. No code change is required for most applications.

It must be accessed from within a VPC; it is never publicly accessible.

31
Q

What databases are supported by RDS Proxy?

A

It supports MySQL, PostgreSQL, MariaDB, and Aurora (MySQL, PostgreSQL).

32
Q

Are any code changes required to use RDS Proxy?

A

No code changes are required for most applications.

33
Q

What is ElastiCache?

A

Amazon ElastiCache is a fully managed, in-memory caching service supporting flexible, real-time use cases. You can use ElastiCache for caching, which accelerates application and database performance, or as a primary data store for use cases that don’t require durability like session stores, gaming leaderboards, streaming, and analytics. ElastiCache is compatible with Redis and Memcached.

It requires heavy code changes.

34
Q

What's the difference between Redis and Memcached?

A

Redis supports Multi-AZ with auto-failover, read replicas to scale reads and provide high availability, data persistence, and a backup-and-restore feature. Memcached partitions data across multiple nodes (sharding); it has no high availability (no replication), is non-persistent, has no backup and restore, and uses a multi-threaded architecture.

35
Q

Does ElastiCache use IAM authentication?

A

No, ElastiCache does not support IAM authentication. IAM policies on ElastiCache are only used for AWS API-level security.

36
Q

How is Redis authenticated?

A

You can set a password/token when you create the Redis cluster. Redis also supports SSL in-flight encryption.

37
Q

What authentication mechanism is used by memcached?

A

Memcached supports SASL-based authentication.

38
Q

What are the use cases for Redis?

A

Redis sorted sets guarantee both uniqueness and element ordering, which makes them perfect for gaming leaderboards.

39
Q

What is Elastic Beanstalk?

A

AWS Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications and services developed with Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker on familiar servers such as Apache, Nginx, Passenger, and IIS.

You can simply upload your code and Elastic Beanstalk automatically handles the deployment, from capacity provisioning, load balancing, auto-scaling to application health monitoring. At the same time, you retain full control over the AWS resources powering your application and can access the underlying resources at any time.

There is no additional charge for Elastic Beanstalk - you pay only for the AWS resources needed to store and run your applications.

40
Q

What is DynamoDB?

A

Amazon DynamoDB is a fully managed, serverless, key-value NoSQL database designed to run high-performance applications at any scale.

DynamoDB offers built-in security, continuous backups, automated multi-Region replication, in-memory caching, and data import and export tools.

From an exam perspective, choose DynamoDB if your schema is rapidly evolving.

There are two read and write capacity modes (a sketch follows this list):
1. Provisioned mode: you specify the number of reads and writes per second. You need to plan capacity beforehand, and you pay for the provisioned Read Capacity Units (RCU) and Write Capacity Units (WCU). Auto-scaling can be enabled for RCU and WCU.
2. On-demand mode: reads and writes automatically scale up and down with your workload. No capacity planning is needed. You pay for what you use, which is more expensive. Great for unpredictable workloads.
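
A minimal boto3 sketch of the two modes (table and attribute names are hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Provisioned mode: capacity is planned up front.
dynamodb.create_table(
    TableName="OrdersProvisioned",
    AttributeDefinitions=[{"AttributeName": "order_id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "order_id", "KeyType": "HASH"}],
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 10},
)

# On-demand mode: no capacity planning, pay per request.
dynamodb.create_table(
    TableName="OrdersOnDemand",
    AttributeDefinitions=[{"AttributeName": "order_id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "order_id", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
)
```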

41
Q

Why use DynamoDB over relational databases?

A

1. Scalability:

**Seamless handling of massive growth:** DynamoDB excels at handling unpredictable workloads and massive data volumes without requiring manual sharding or complex partitioning strategies. It can scale up or down automatically to meet demand, ensuring optimal performance and cost-effectiveness.
Predictable performance: DynamoDB guarantees consistent, single-digit millisecond response times, even under heavy load, making it ideal for applications with high-volume traffic and stringent latency requirements.

2. Flexibility:
Schema-less design: DynamoDB doesn’t require a predefined schema, providing flexibility to store diverse data structures without schema migrations. This accommodates rapid application changes and evolving data models.
Support for various data types: It supports structured, semi-structured, and unstructured data, including JSON, allowing for versatile application scenarios.

3. Performance for specific workloads:
**Optimized for key-value and document-based access:** DynamoDB shines for applications that predominantly involve key-value or document-based access patterns, such as social media, gaming, e-commerce, and IoT.
Fast writes and reads: It delivers exceptionally fast read and write performance, making it well-suited for applications demanding real-time data access and updates.

4. Cost-effectiveness:
Pay-per-use model: DynamoDB operates on a pay-per-use model, eliminating upfront infrastructure costs and aligning expenses with actual usage. This leads to potential cost savings, especially for applications with fluctuating demand.
Serverless architecture: It eliminates the need for provisioning and managing servers, reducing operational overhead and associated costs.

5. High Availability and Durability:
Built-in replication and fault tolerance: DynamoDB ensures data durability and availability across multiple Availability Zones within a region, offering strong protection against outages and data loss.
Automatic backups and point-in-time recovery: It provides automatic backups and point-in-time recovery for added data protection and disaster recovery capabilities.

6. Integration with AWS Ecosystem:
Seamless integration with other AWS services: DynamoDB integrates seamlessly with other AWS services, such as Lambda, S3, Kinesis, and more, enabling the creation of robust and scalable cloud-native applications.

42
Q

How to use DynamoDB data for Analytics?

A
  1. Export the data to S3 and use Athena
  2. Connect with Redshift
  3. Utilize DynamoDB Streams to capture real-time changes and trigger a Lambda function or a Kinesis Data Stream.
43
Q

Exam Question

How to define Primary Key (Hash) in DynamoDB?

A

A simple primary key that consists of a single attribute is known as the partition key. DynamoDB uses the partition key’s value as input to an internal hash function. The output from the hash function determines the partition (physical storage internal to DynamoDB) in which the item will be stored.

44
Q

How to define Composite Key (hash + range) in DynamoDB?

A

A composite primary key consists of two attributes: a partition key (hash key) and a sort key (range key). The partition key determines the partition in which the item will be stored, and the sort key determines the order of items with the same partition key. This allows you to perform range queries on the items in the table based on the sort key's value.
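
A minimal boto3 sketch defining a composite key (table and attribute names are hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Composite key: user_id is the partition key, created_at is the sort key.
dynamodb.create_table(
    TableName="UserEvents",
    AttributeDefinitions=[
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "created_at", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "user_id", "KeyType": "HASH"},      # partition key
        {"AttributeName": "created_at", "KeyType": "RANGE"},  # sort key
    ],
    BillingMode="PAY_PER_REQUEST",
)
```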

45
Q

What are DynamoDB use cases?

A
  • Mobile apps
  • Gaming
  • Digital ad serving
  • Live voting
  • Audience interaction for live events
  • Sensor networks
  • Log ingestion
  • Access control for web based content
  • Metadata storage for Amazon S3 objects
  • E commerce shopping carts
  • Web session management
46
Q

What is an anti-pattern (what you don't do) with DynamoDB?

A
  • Prewritten application tied to a traditional relational database: use RDS instead
  • Joins or complex transactions
  • Binary Large Object (BLOB) data: store data in S3 & metadata in DynamoDB
  • Large data with low I/O rate: use S3 instead
47
Q

Exam Question

What is 1 Write Capacity Unit (WCU)?

A

One Write Capacity Unit (WCU) represents one write per second for an item up to 1 KB in size

48
Q

Exam Question

Read/Write Capacity Mode - Provisioned?

A
  • When setting up RCU and WCU in DynamoDB, you don’t need to overthink it.
  • Simply specify your target capacity usage, and DynamoDB will scale it for you.
  • If you go over the provisioned RCU or WCU due to increased consumption or writing, you can use Burst Capacity temporarily.
  • If Burst Capacity is fully consumed, you will receive a ProvisionedThroughputExceededException.

One Write Capacity Unit (WCU) represents one write per second for an item up to 1 KB in size
* If the items are larger than 1 KB, more WCUs are consumed
* Example 1: we write 10 items per second, with item size 2 KB
* We need 10 * (2 KB / 1 KB) = 20 WCUs
* Example 2: we write 6 items per second, with item size 4.5 KB
* We need 6 * (5 KB / 1 KB) = 30 WCUs (4.5 KB gets rounded up to 5 KB)
* Example 3: we write 120 items per minute, with item size 2 KB
* We need (120 / 60) * (2 KB / 1 KB) = 4 WCUs

Exam questions will ask for this calculation; a quick helper sketch follows.
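
A tiny Python helper reproducing these calculations (item size is rounded up to the next whole KB):

```python
import math

def wcus_needed(items_per_second: float, item_size_kb: float) -> int:
    """One WCU = one write per second for an item up to 1 KB."""
    return math.ceil(items_per_second * math.ceil(item_size_kb))

print(wcus_needed(10, 2))        # 20 WCUs
print(wcus_needed(6, 4.5))       # 30 WCUs (4.5 KB rounds up to 5 KB)
print(wcus_needed(120 / 60, 2))  # 4 WCUs
```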

49
Q

Exam Question

What is DynamoDB Strongly Consistent Read vs. Eventually Consistent Read?

A

When DynamoDB writes, the data is written to one server and then replicated to other servers. If someone reads the data from a server that has not yet received the update, they can get stale data.

DynamoDB provides two types of reads: strongly consistent and eventually consistent.
* Eventually consistent reads are the default setting; they are not guaranteed to return the latest updated version of data, but they return the updated data eventually.
* Strongly consistent reads return the latest updated version of data.
* Strongly consistent reads are helpful when you need the most current version of the data, as in financial transactions or other critical operations.
* Eventually consistent reads are more efficient and less expensive than strongly consistent reads.
* Eventually consistent reads are useful when you can tolerate some delay in getting the latest updated data, as in caching or non-critical operations.
* Developers can choose the type of read to perform based on their use-case requirements.

50
Q

Exam Question

How to set a strongly consistent read in DynamoDB?

A

Set the "ConsistentRead" parameter to True in API calls (GetItem, BatchGetItem, Query, Scan).
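
A minimal boto3 sketch (table and key names are hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# ConsistentRead=True forces a strongly consistent read
# (the default is eventually consistent).
response = dynamodb.get_item(
    TableName="Accounts",
    Key={"account_id": {"S": "acct-123"}},
    ConsistentRead=True,
)
item = response.get("Item")
```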

51
Q

Exam Question

How many RCUs are consumed in a strongly consistent read?

A

1 RCU per second is used for a strongly consistent read, which is twice the RCU used by an eventually consistent read: one RCU supports 2 eventually consistent reads per second.

  • One Read Capacity Unit (RCU) represents one Strongly Consistent Read per second, or two Eventually Consistent Reads per second, for an item up to 4 KB in size
  • If the items are larger than 4 KB, more RCUs are consumed
52
Q

Exam Question

Read Capacity Units (RCU) Calculation?

A
  • One Read Capacity Unit (RCU) represents one Strongly Consistent Read per second, or two Eventually Consistent Reads per second, for an item up to 4 KB in size
  • If the items are larger than 4 KB, more RCUs are consumed
  • Example 1: 10 Strongly Consistent Reads per second, with item size 4 KB
    • We need 10 * (4 KB / 4 KB) = 10 RCUs
  • Example 2: 16 Eventually Consistent Reads per second, with item size 12 KB
    • We need (16 / 2) * (12 KB / 4 KB) = 24 RCUs
  • Example 3: 10 Strongly Consistent Reads per second, with item size 6 KB
    • We need 10 * (8 KB / 4 KB) = 20 RCUs (we must round up 6 KB to 8 KB)
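
A tiny Python helper reproducing these calculations (item size rounds up to the next 4 KB block; eventually consistent reads cost half):

```python
import math

def rcus_needed(reads_per_second: float, item_size_kb: float,
                strongly_consistent: bool = True) -> int:
    """One RCU = 1 strongly consistent read/s (or 2 eventually
    consistent reads/s) for an item up to 4 KB."""
    size_units = math.ceil(item_size_kb / 4)
    reads = reads_per_second if strongly_consistent else reads_per_second / 2
    return math.ceil(reads * size_units)

print(rcus_needed(10, 4))                              # 10 RCUs
print(rcus_needed(16, 12, strongly_consistent=False))  # 24 RCUs
print(rcus_needed(10, 6))                              # 20 RCUs (6 KB -> 8 KB)
```
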
53
Q

What is Partitioning in DynamoDB?

A
  • DynamoDB uses partitioning to store and retrieve data in a scalable and efficient manner.
  • Each partition has a fixed amount of RCU and WCU capacity assigned to it, which is determined by the size of the partition.
  • When an item is written to DynamoDB, it is hashed to determine which partition it belongs to.
    • If the partition already has available WCU, the write request is processed immediately.
    • If the partition has no available WCU, the write request is throttled.
  • Similarly, when an item is read from DynamoDB, it is hashed to determine which partition it belongs to.
    • If the partition has available RCU, the read request is processed immediately.
    • If the partition has no available RCU, the read request is throttled.
54
Q

How RCU and WCU are utilized in the partitions?

A
  • If you have 10 partitions in DynamoDB, and you provision 10 WCUs and 10 RCUs, they will be evenly spread across the partitions.
  • This means that each partition will receive one WCU and one RCU.
  • It is important to remember that WCUs and RCUs are divided and distributed evenly across partitions in DynamoDB.
55
Q

What is DynamoDB Throttling?

A

When the provisioned capacity of a DynamoDB table is exceeded, DynamoDB throttles read and write requests, leading to increased response times and potentially even errors.
1. If you exceed the RCUs or WCUs at the partition level, you will get a ProvisionedThroughputExceededException.
2. This can happen because of a hot key, which means one partition key is being read too many times from a specific partition.
3. Hot partitions or very large items can also cause high RCU or WCU consumption.
4. Solutions for tackling this issue include using Exponential Backoff when encountering the exception (see the sketch below) and distributing partition keys as much as possible.
5. To avoid RCU issues caused by one item being read heavily from a single partition, you can use DynamoDB Accelerator (DAX).
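
A minimal retry sketch with exponential backoff, assuming a hypothetical Orders table (note that boto3 also performs some retries internally):

```python
import random
import time

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

def put_with_backoff(item, table="Orders", max_retries=5):
    """Retry throttled writes with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return dynamodb.put_item(TableName=table, Item=item)
        except ClientError as err:
            code = err.response["Error"]["Code"]
            if code != "ProvisionedThroughputExceededException":
                raise
            # Wait 2^attempt * 100 ms plus jitter before retrying.
            time.sleep((2 ** attempt) * 0.1 + random.uniform(0, 0.1))
    raise RuntimeError("write still throttled after retries")
```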

56
Q

R/W Capacity Mode - on Demand

A
  • Read/writes automatically scale up/down with your workloads
  • No capacity planning needed (WCU / RCU)
  • Unlimited WCU & RCU, no throttle, more expensive
  • You’re charged for reads/writes that you use in terms of RRU and WRU
  • Read Request Units (RRU) throughput for reads (same as RCU)
  • Write Request Units (WRU) throughput for writes (same as WCU)
  • 2.5x more expensive than provisioned capacity (use with care)
  • Use cases: unknown workloads, unpredictable application traffic, …
57
Q

How much more expensive is on-demand R/W capacity?

A

2.5x more expensive than provisioned capacity (use with care)

58
Q

What's the DynamoDB API for writing a new record?

A

PutItem
* Creates a new item or fully replaces an old item (same Primary Key)
* Consumes WCUs
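
A minimal boto3 sketch (table and attribute names are hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Creates the item, or fully replaces any existing item with the same key.
dynamodb.put_item(
    TableName="Users",
    Item={
        "user_id": {"S": "u-42"},   # partition key
        "name": {"S": "Alice"},
        "plan": {"S": "free"},
    },
)
```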

59
Q

What's the DynamoDB API for updating a record?

A

UpdateItem
* Edits an existing item’s attributes or adds a new item if it doesn’t exist
* Can be used to implement Atomic Counters (a numeric attribute that's unconditionally incremented)
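
A minimal boto3 sketch of an atomic counter (table and attribute names are hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# ADD increments page_views unconditionally, creating the attribute
# (or the item) if it doesn't exist yet.
dynamodb.update_item(
    TableName="PageStats",
    Key={"page_id": {"S": "home"}},
    UpdateExpression="ADD page_views :inc",
    ExpressionAttributeValues={":inc": {"N": "1"}},
)
```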

60
Q

What's the DynamoDB API for writing a record based on a condition?

A

Conditional Writes
* Accept a write/update/delete only if conditions are met, otherwise returns an error
* Helps with concurrent access to items
* No performance impact
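
A minimal boto3 sketch of a write that succeeds only if the item doesn't already exist (table and attribute names are hypothetical):

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

try:
    # Insert only if no item with this key exists yet.
    dynamodb.put_item(
        TableName="Users",
        Item={"user_id": {"S": "u-42"}, "name": {"S": "Alice"}},
        ConditionExpression="attribute_not_exists(user_id)",
    )
except ClientError as err:
    if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
        print("item already exists; write rejected")
    else:
        raise
```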

61
Q

What’s the DynamoDB API for reading a record?

A

GetItem to get one item
* Read based on a Primary key
* Primary Key can be HASH or HASH+RANGE
* Eventually Consistent Read (default)
* Option to use Strongly Consistent Reads (consumes more RCU, might take longer)
* ProjectionExpression can be specified to retrieve only certain attributes

62
Q

How to query data from DynamoDB?

A
  • Query reads data from one partition and returns items based on:
    • KeyConditionExpression
      • Partition Key value (must use the = operator), required
      • Sort Key value (=, <, <=, >, >=, Between, Begins with), optional
    • FilterExpression
      • Additional filtering after the Query operation (before data is returned to you)
      • Use only with non-key attributes (does not allow HASH or RANGE attributes)
  • Returns:
    • The number of items specified in Limit
    • Or up to 1 MB of data, ability to do pagination on the results
    • Can query table, a Local Secondary Index, or a Global Secondary Index
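
A minimal boto3 sketch (table and attribute names are hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Partition key must use '='; the sort key supports range operators.
response = dynamodb.query(
    TableName="UserEvents",
    KeyConditionExpression=(
        "user_id = :uid AND created_at BETWEEN :start AND :end"
    ),
    FilterExpression="event_type = :t",  # non-key attribute, applied after
    ExpressionAttributeValues={
        ":uid": {"S": "u-42"},
        ":start": {"S": "2024-01-01"},
        ":end": {"S": "2024-01-31"},
        ":t": {"S": "login"},
    },
    Limit=100,
)
items = response["Items"]
```
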
63
Q

How to read (scan) data in DynamoDB?

A
  • Scan operation in AWS is used to read an entire table.
  • It is not efficient to filter data on the client side after scanning the entire table.
  • Scan operation returns up to one megabyte of data and requires pagination techniques to read more.
  • Scanning an entire table consumes a lot of RCU and may impact normal operations.
  • To limit the impact of a scan, a limit statement can be used to reduce the size of the result.
  • Parallel Scan can be used to scan multiple data segments at the same time, increasing the throughput and RCU consumed.
  • Limit queries and conditions can be used with Parallel Scan to further limit its impact.
  • Scans can be used with ProjectionExpression to retrieve only certain attributes and FilterExpression to filter the results before they are returned (the RCU for the full scan is still consumed).
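
A minimal boto3 sketch of a paginated parallel scan (table name is hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

def scan_segment(table, segment, total_segments):
    """Scan one segment of a parallel scan, following pagination."""
    kwargs = {"TableName": table, "Segment": segment,
              "TotalSegments": total_segments}
    while True:
        page = dynamodb.scan(**kwargs)
        yield from page["Items"]
        if "LastEvaluatedKey" not in page:
            break
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

# e.g. one of 4 workers, each scanning its own segment
items = list(scan_segment("UserEvents", segment=0, total_segments=4))
```
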
64
Q

Can you delete an item or a record in DynamoDB?

A

Yes, use DeleteItem.
It deletes an individual item and can perform a conditional delete.

65
Q

Can you delete an entire table in DynamoDB?

A

Yes, use DeleteTable. It deletes a whole table and all its items. It is more efficient than calling DeleteItem on each individual item.

66
Q

What are DynamoDB Batch Operations?

A

DynamoDB Batch Operations allow you to perform multiple read or write operations in a single API call.

67
Q

What are the benefits of using Batch Operations?

A

Batch Operations reduce the number of API calls required to perform the same number of read or write operations, which can reduce network traffic and improve application performance.
If a batch operation partially fails, you will receive the failed items and will have to retry them.

68
Q

What are the two types of Batch Operations?

A

The two types of Batch Operations are BatchGetItem and BatchWriteItem.

69
Q

What is BatchGetItem used for?

A

BatchGetItem is used to retrieve multiple items from one or more tables using their primary keys.

70
Q

What is BatchWriteItem used for?

A

BatchWriteItem is used to put or delete items across one or more tables in a single API call.
* Up to 25 PutItem and/or DeleteItem in one call
* Up to 16 MB of data written, up to 400 KB of data per item
* Can’t update items (use UpdateItem)
* UnprocessedItems due to the lack of WCU (exponential backoff or add WCU)
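
A minimal boto3 sketch that retries UnprocessedItems (table name is hypothetical; production code should back off exponentially between retries):

```python
import time

import boto3

dynamodb = boto3.client("dynamodb")

# 25 hypothetical put requests against a "Users" table.
requests = [
    {"PutRequest": {"Item": {"user_id": {"S": f"u-{i}"}}}} for i in range(25)
]

batch = {"Users": requests}
while batch:
    response = dynamodb.batch_write_item(RequestItems=batch)
    batch = response.get("UnprocessedItems", {})
    if batch:
        time.sleep(1)  # simple pause; use exponential backoff in practice
```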

71
Q

What is the maximum number of items that can be processed in a single BatchGetItem or BatchWriteItem request?

A

A single BatchWriteItem request can process up to 25 items, while a single BatchGetItem request can retrieve up to 100 items.

72
Q

Can you get items in Batch?

A

Use BatchGetItem
* Return items from one or more tables
* Up to 100 items, up to 16 MB of data
* Items are retrieved in parallel to minimize latency
* UnprocessedKeys for failed read operations (exponential backoff or add RCU)

73
Q

What is PartiQL?

A

PartiQL is a SQL-compatible query language for Amazon DynamoDB and other NoSQL databases.

74
Q

What are the benefits of using PartiQL?

A

Benefits of using PartiQL include its ease of use for SQL users, its flexibility for working with multiple data sources, and its ability to handle complex queries.

75
Q

What are some common use cases for PartiQL?

A

Some common use cases for PartiQL include querying and manipulating data in DynamoDB and other NoSQL databases, migrating data between different databases, and integrating data from multiple sources.

76
Q

How does PartiQL differ from SQL?

A

PartiQL is similar to SQL, but has some differences in syntax and functionality to support working with NoSQL databases and multiple data sources.

77
Q

Can PartiQL be used with other databases besides DynamoDB?

A

Yes, PartiQL can be used with other NoSQL databases as well as relational databases through the use of specialized adapters.

78
Q

What is the syntax for a PartiQL query?

A

A PartiQL query begins with a SELECT statement, followed by the source of the data and any optional clauses such as WHERE, ORDER BY, and LIMIT.
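
A minimal boto3 sketch using DynamoDB's PartiQL API (table and attribute names are hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Parameterized PartiQL SELECT against a DynamoDB table.
response = dynamodb.execute_statement(
    Statement='SELECT name, plan FROM "Users" WHERE user_id = ?',
    Parameters=[{"S": "u-42"}],
)
print(response["Items"])
```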

79
Q

What is DynamoDB Accelerator (DAX)?

A

DAX is a fully managed in-memory cache for DynamoDB. It can improve the performance of DynamoDB by up to 10x. DAX works by caching frequently accessed data in memory. This allows applications to access the data more quickly, without having to make a round trip to DynamoDB.

To use DAX, you need to create a DAX cluster. A DAX cluster is made up of one or more cache nodes. The number of cache nodes in a cluster depends on the amount of data you need to cache and the expected load on your application.

Once you have created a DAX cluster, you need to configure your application to use it. To do this, you need to update your application’s connection string to point to the DAX cluster.

DAX is a secure service. It offers a variety of security features, including encryption at rest, IAM authentication, VPC security, and CloudTrail integration.

This performance boost can lead to a better user experience for your application.

80
Q

Exam Question

What is the difference between DAX and ElastiCache?

A

Both DAX and ElastiCache are in-memory caching services that can be used to improve the performance of DynamoDB. However, there are some key differences between the two services:

  • Both are managed in-memory caches, but DAX is API-compatible with DynamoDB, so it requires no application changes; with ElastiCache you must write and maintain the caching logic in your application.
  • DAX is optimized for DynamoDB, while ElastiCache can be used with a variety of data stores. This means DAX can provide better performance for DynamoDB.
  • DAX is generally more expensive than ElastiCache.
81
Q

Exam Question

When to use DynamoDB and DAX?

A

DAX and ElastiCache can be used in combination, and exams may test your ability to determine which is best for a given situation. DAX is useful for caching individual objects, queries, or scans, making it suitable for simple types of queries. On the other hand, if your application is performing more complex logic, such as scanning, summing, filtering, and more, you can store the results in Amazon ElastiCache to avoid computationally expensive operations.

By storing and retrieving data from ElastiCache instead of re-querying DAX and re-performing client-side aggregations, you can create a more efficient architecture by utilizing both services together.

82
Q

Primary Key Index in DynamoDB

A
  • Definition: A primary key uniquely identifies each item in a DynamoDB table and consists of a partition key and an optional sort key.
  • Partition Key (Hash Key): Determines the physical partition where the item is stored.
  • Sort Key (Range Key): Determines the sorting order of items with the same partition key.
  • Purpose: Ensures the efficient retrieval of items based on the specified primary key.
83
Q

Global Secondary Index (GSI) in DynamoDB

A

When a Global Secondary Index (GSI) is created on a DynamoDB table, a new index table is internally created and maintained by DynamoDB. Although it is not a separate user-facing table, the GSI behaves like a distinct table with its own partition key and sort key, as well as its own provisioned throughput settings for read and write capacity units.

The GSI is automatically updated by DynamoDB when items are added, updated, or deleted in the base table. This allows you to query the GSI using different attributes than the primary key of the base table, providing more flexibility in your queries.

It is important to note that since GSIs consume additional write capacity for index maintenance, managing a GSI may incur additional costs for your DynamoDB usage.

  • Purpose: Allows querying on alternate keys for greater flexibility and query performance.
  • Key Features:
    1. Can be created or deleted after the table is created.
    2. Supports eventual or strong consistency.
    3. Consumes additional write capacity for index maintenance.
    4. Maximum of 20 GSIs per table.
84
Q

Local Secondary Index (LSI) in DynamoDB

A
  • Definition: An LSI is an index that keeps the table's partition key but uses an alternative sort key (an attribute that isn't part of the primary key). This allows efficient querying based on both the partition key and the chosen attribute, providing an alternative sorting order within the same partition.
  • Purpose: Provides a different view of the data, allowing efficient queries with alternate sort keys within the same partition.
  • Key Features:
    1. Must be created when the table is created; it cannot be added afterwards.
    2. Supports strong consistency.
    3. Shares provisioned throughput with the base table.
    4. Maximum of 5 LSIs per table.
85
Q

Exam Question

What happens if there is throttling on a table with a Global Secondary Index?

A

If there is write throttling on the GSI, the main table will be throttled as well, even if the WCUs on the main table are fine. So choose the GSI partition key carefully and assign the GSI its own WCU capacity.

86
Q

Exam Question

What happens if there is throttling on a table with a Local Secondary Index?

A

There is no separate index table for a Local Secondary Index; the WCUs and RCUs of the main table are utilized. There are no special throttling considerations for LSIs.

87
Q

What's the difference between DynamoDB Accelerator (DAX) and ElastiCache?

A

DynamoDB Accelerator caches query results and sits in front of DynamoDB. It supports the DynamoDB API and does not require application changes. ElastiCache can be utilized to store aggregated results.

88
Q

What is DynamoDB stream?

A

A DynamoDB stream is an ordered flow of information about changes to items in a DynamoDB table. When you enable a stream on a table, DynamoDB captures information about every modification to data items in the table.

Whenever an application creates, updates, or deletes items in the table, DynamoDB Streams writes a stream record with the primary key attributes of the items that were modified. A stream record contains information about a data modification to a single item in a DynamoDB table. You can configure the stream so that the stream records capture additional information, such as the “before” and “after” images of modified items.

When changes are made to DynamoDB, data about the changes can be sent either to DynamoDB Streams or to a Kinesis Data Stream. In the Kinesis case, the stream can be sent to Kinesis Data Firehose, and the data can be stored directly in Amazon Redshift, Amazon S3, or Amazon OpenSearch.

89
Q

Exam Question

Can DynamoDB retroactively populate the records?

A

When you enable DynamoDB Stream, be aware that the records will not be retroactively populated in the stream after it is enabled. This is a point that may come up in exams. Once the stream is enabled, only then will it receive updates based on the changes occurring in your DynamoDB table.

90
Q

How do DynamoDB Streams and Lambda work together?

A

To understand how DynamoDB Streams and Lambda work together, we need to follow these steps:

  1. Define an Event Source Mapping to read from a DynamoDB Stream.
  2. Ensure the Lambda function has the necessary permissions to pull from the DynamoDB Stream.
  3. The Lambda function will be invoked synchronously.

For example, changes to a table go into a DynamoDB Stream, and the Lambda function has an Event Source Mapping. This internal process polls the DynamoDB Stream and retrieves records in batches. Once the Event Source Mapping receives some records, it synchronously invokes the Lambda function with a batch of records from the stream, as in the sketch below.
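
A minimal sketch of such a Lambda handler (the event shape is the standard DynamoDB Streams record format; the processing logic is hypothetical):

```python
def handler(event, context):
    """Invoked synchronously by the Event Source Mapping with a batch
    of DynamoDB Stream records."""
    for record in event["Records"]:
        if record["eventName"] in ("INSERT", "MODIFY"):
            new_image = record["dynamodb"].get("NewImage", {})
            print("item changed:", new_image)
        elif record["eventName"] == "REMOVE":
            old_image = record["dynamodb"].get("OldImage", {})
            print("item deleted:", old_image)
```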

91
Q

What is a DynamoDB global table?

A

Global tables replicate your DynamoDB tables automatically across your choice of AWS Regions. Global tables eliminate the difficult work of replicating data between Regions and resolving update conflicts, enabling you to focus on your application’s business logic.

The replication is active-active. Enabling DynamoDB Stream is required for Global tables.

92
Q

What is DynamoDB TTL (Time to live)?

A

Amazon DynamoDB Time to Live (TTL) allows you to define a per-item timestamp to determine when an item is no longer needed. Shortly after the date and time of the specified timestamp, DynamoDB deletes the item from your table without consuming any WCU. TTL is provided at no extra cost as a means to reduce stored data volumes by retaining only the items that remain current for your workload’s needs.
* Expired items are deleted from both LSIs and GSIs
* A delete operation for each expired item enters the DynamoDB Streams (can help recover expired items)

Web session handling is a common use case for TTL (see the sketch below).
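
A minimal boto3 sketch that enables TTL and writes an expiring session item (table and attribute names are hypothetical):

```python
import time

import boto3

dynamodb = boto3.client("dynamodb")

# Enable TTL on the table, using the 'expires_at' attribute.
dynamodb.update_time_to_live(
    TableName="Sessions",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# Write a session item that DynamoDB will delete ~24 hours from now.
dynamodb.put_item(
    TableName="Sessions",
    Item={
        "session_id": {"S": "sess-abc"},
        "expires_at": {"N": str(int(time.time()) + 24 * 3600)},  # epoch secs
    },
)
```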

93
Q

Can DynamoDB be exported to S3?

A

Yes, but point-in-time recovery must be enabled in DynamoDB. The export does not affect the read capacity of your table. Data exported to S3 can be analyzed through Athena, and ETL can be applied to the S3 data before importing it back into DynamoDB.
DynamoDB data can be exported in JSON or ION format.

94
Q

Exam Question

How to store large data in DynamoDB?

A

Use S3 for Large Data Storage and DynamoDB for Metadata:

In this strategy, you store large data files such as images, videos, or other large documents in Amazon S3, while keeping the metadata in DynamoDB. Metadata can include information like the file’s name, date of creation, owner, and other related attributes.

Steps to implement this strategy:
* Upload the large data file to an S3 bucket.
* Generate a unique identifier for the file in S3 (e.g., its object key).
* Create a new item in the DynamoDB table with the unique identifier as the primary key and store the metadata associated with the file.
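
A minimal boto3 sketch of this pattern (bucket, table, and file names are hypothetical):

```python
import boto3

s3 = boto3.client("s3")
dynamodb = boto3.client("dynamodb")

bucket, key = "my-media-bucket", "videos/intro.mp4"

# 1. Store the large object in S3.
s3.upload_file("intro.mp4", bucket, key)

# 2. Store its metadata in DynamoDB, keyed by the S3 object key.
dynamodb.put_item(
    TableName="MediaMetadata",
    Item={
        "object_key": {"S": key},
        "bucket": {"S": bucket},
        "owner": {"S": "alice"},
        "content_type": {"S": "video/mp4"},
    },
)
```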

95
Q

DynamoDB Security

A

Security:
* VPC Endpoints available to access DynamoDB without internet
* Access fully controlled by IAM
* Encryption at rest using KMS
* Encryption in transit using SSL / TLS

96
Q

DynamoDB other features

A

Backup and Restore feature available
* Point in time restore like RDS
* No performance impact

  • Global Tables: Multi region, fully replicated, high performance
  • AWS Database Migration Service (DMS) can be used to migrate to DynamoDB (from MongoDB, Oracle, MySQL, S3, etc.)
  • You can launch a local DynamoDB on your computer for development purposes