RDS Flashcards

1
Q

How many read replicas can you have in Aurora?

A

You can have up to 15 read replicas in Aurora, while RDS MySQL supports five. Aurora's replication process is also faster.

2
Q

How does Aurora ensure high availability?

A

There are 6 copies of your data across 3 AZs:
- 4 copies out of 6 needed for writes
- 3 copies out of 6 needed for reads
- self-healing with peer-to-peer replication

3
Q

What’s the advantage of using RDS over deploying databases on EC2?

A
  • Automated provisioning, OS patching
  • Continuous backups and restore to specific timestamp (Point in Time Restore)
  • Monitoring dashboards
  • Read replicas for improved read performance
  • Multi-AZ setup for DR (Disaster Recovery)
  • Maintenance windows for upgrades
  • Scaling capability (vertical and horizontal)
  • Storage backed by EBS (gp2 or io1)
  • Automated Backup

But you cannot SSH into the instance.

4
Q

What is storage autoscaling in RDS?

A

RDS storage autoscaling helps you increase the storage on your DB instance dynamically. When RDS detects you are running out of free database storage, it scales automatically. You have to set the maximum storage threshold so your database does not grow infinitely. This is useful for applications with unpredictable workloads.

5
Q

How many read replicas can there be in RDS?

A

There can be up to 5 read replicas.

6
Q

Can RDS read replicas be in the same AZ?

A

The replicas can be within the same AZ, cross-AZ, or cross-region.

7
Q

Are replicas in-sync with the main RDS DB instance?

A

Replication to the replicas is asynchronous, but they are eventually consistent.

8
Q

What is the use of RDS read replicas?

A

Replicas provide better read performance for the RDS database. They can be used by reporting or analytical tools that only need to read the data.

Replicas can be promoted to their own DB. Applications must update the connection string to leverage the read replicas

9
Q

Is there a network cost when data goes from one AZ to another?

A

Data replication within the same region (same or different AZ) is free. There is a network cost when data is replicated across regions.

10
Q

How is disaster recovery configured in RDS?

A

Disaster recovery is configured across AZs (Multi-AZ) with synchronous replication. One DNS name is configured for the application to reach the main RDS database instance. In the event of a failure, an automatic failover occurs and the DNS name points to the standby instance, so there is no interruption to the application. The standby becomes the master DB instance.
The standby instance cannot be used for scaling (no one can read or write to it; it is only there for failover if anything goes wrong with the master database).

Read Replicas can be set up as Multi-AZ for disaster recovery (DR). It's a common exam question.

11
Q

Can RDS be changed from a single AZ to multiple AZ?

A

The RDS settings can be modified to convert the database from a single AZ to Multi-AZ. Behind the scenes, AWS takes a snapshot of the master instance and restores it in the new AZ, then sets up synchronous replication between the master and standby DB instances.

12
Q

What is the difference between RDS and RDS custom?

A

In RDS, the entire database and the underlying OS are managed by AWS.

In RDS Custom, you have access to the database and the OS (including the underlying EC2 instance), but you don't have control over the hardware. RDS Custom is available only for Oracle and Microsoft SQL Server. You can customize the instance or the database, but you must disable the automation mode before you perform any customization.

13
Q

Aurora is compatible with which databases?

A

It's compatible with PostgreSQL and MySQL.

14
Q

What is the writer endpoint and reader endpoint in Aurora DB?

A

The writer endpoint is a DNS name used for writing data to the master DB instance. In case of a failure, a read replica is promoted to master and the writer endpoint points to it, so applications using the writer endpoint do not require any change.

The reader endpoint is also a DNS name, used by applications to read data from the read replicas. Aurora performs auto-scaling automatically, and new replicas are set up to maintain the desired performance.

15
Q

Is it possible to backtrack data in Aurora?

A

Yes, the database can be restored to any point in time without using backups. Aurora Backtrack uses a different mechanism to do this.

16
Q

What is Aurora replicas autoscaling?

A

If read traffic increases and CPU usage on the existing read replicas rises, Aurora automatically creates more replicas to handle the load.

17
Q

What are Aurora’s custom endpoints?

A

The custom endpoints provide load balancing and high availability for each group of DB instances within your cluster. If one of the DB instances within a group becomes unavailable, Aurora directs subsequent custom endpoint connections to one of the other DB instances associated with the same endpoint.

The custom endpoints can be created to run analytical queries or any other specific purpose. These read replicas can be larger instances compared to other existing replicas.

18
Q

What is Aurora serverless?

A

Amazon Aurora Serverless is an on-demand, autoscaling configuration for Amazon Aurora. It automatically starts up, shuts down, and scales capacity up or down based on your application’s needs. You can run your database on AWS without managing database capacity.

Manually managing database capacity can take up valuable time and can lead to inefficient use of database resources. With Aurora Serverless, you create a database, specify the desired database capacity range, and connect your applications. You pay on a per-second basis for the database capacity that you use when the database is active, and migrate between standard and serverless configurations with a few steps in the Amazon Relational Database Service (Amazon RDS) console.

19
Q

What is Aurora multi-master?

A

Most kinds of Aurora clusters are single-master clusters. For example, provisioned, Aurora Serverless, parallel query, and Global Database clusters are all single-master clusters. In a single-master cluster, a single DB instance performs all write operations and any other DB instances are read-only. If the writer DB instance becomes unavailable, a failover mechanism promotes one of the read-only instances to be the new writer.

In a multi-master cluster, all DB instances can perform write operations. The notions of a single read/write primary instance and multiple read-only Aurora Replicas don’t apply. There isn’t any failover when a writer DB instance becomes unavailable, because another writer DB instance is immediately available to take over the work of the failed instance. We refer to this type of availability as continuous availability, to distinguish it from the high availability (with brief downtime during failover) offered by a single-master cluster

20
Q

What is global Aurora?

A

Amazon Aurora Global Database is designed for globally distributed applications, allowing a single Amazon Aurora database to span multiple AWS Regions. It replicates your data with no impact on database performance, enables fast local reads with low latency in each Region, and provides disaster recovery from Region-wide outages.

21
Q

What is Aurora Machine Learning?

A

Amazon Aurora machine learning enables you to add ML-based predictions to applications via the familiar SQL programming language, so you don't need to learn separate tools or have prior machine learning experience. It provides simple, optimized, and secure integration between Aurora and AWS ML services without having to build custom integrations or move data around.

22
Q

Which database takes less than one second for cross-region replication?

A

Aurora Global Database. Cross-region replication taking less than one second is an exam hint for Aurora Global Database.

23
Q

How many backup options are there in RDS?

A

Automated backup and manual backup.

With automated backup, a full backup is taken every day and transaction logs are backed up every five minutes, which gives the ability to restore to any point in time. Automated backups can be retained for 1 to 35 days.

The manual database snapshots are triggered by the users. The backups are retained as long as the user wants them.

A trick that can come up in the exam: if you only need an RDS database during a specific time frame, you can reduce cost by taking a snapshot and then deleting the database. When you need the database again, restore it from the snapshot and start using it. You will still pay for the snapshot storage, but not for the instance.

24
Q

How is backup in Aurora different from backup in RDS?

A

Automated backup in Aurora cannot be disabled, whereas in RDS it can be disabled by setting the retention period to zero.
Similar to RDS, backups in Aurora are retained for 1-35 days.

25
Q

What is Aurora Database Cloning?

A

By using Aurora cloning, you can create a new cluster that uses the same Aurora cluster volume and has the same data as the original. The process is designed to be fast and cost-effective. The new cluster with its associated data volume is known as a clone. Creating a clone is faster and more space-efficient than physically copying the data using other techniques, such as restoring a snapshot.

It is useful for creating a staging database from a production database without impacting production.

26
Q

Can RDS or Aurora databases be encrypted?

A

Yes, they can be encrypted by using AWS KMS.

27
Q

Can read replicas be encrypted if master is not encrypted?

A

No, if the master is not encrypted, read replicas cannot be encrypted.

28
Q

How to encrypt an unencrypted database?

A

To encrypt an unencrypted database, take a DB snapshot, copy the snapshot as encrypted, and restore from the encrypted copy (a sketch follows below).
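
A minimal boto3 sketch of this snapshot-copy-restore flow (instance and snapshot identifiers are hypothetical; waiting for each snapshot to become available between steps is omitted for brevity):

```python
import boto3

rds = boto3.client("rds")

# 1. Snapshot the unencrypted instance.
rds.create_db_snapshot(
    DBInstanceIdentifier="mydb",
    DBSnapshotIdentifier="mydb-unencrypted-snap",
)

# 2. Copy the snapshot with a KMS key; the copy comes out encrypted.
rds.copy_db_snapshot(
    SourceDBSnapshotIdentifier="mydb-unencrypted-snap",
    TargetDBSnapshotIdentifier="mydb-encrypted-snap",
    KmsKeyId="alias/aws/rds",
)

# 3. Restore a new, encrypted instance from the encrypted copy.
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier="mydb-encrypted",
    DBSnapshotIdentifier="mydb-encrypted-snap",
)
```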

29
Q

What are the security options for RDS and Aurora?

A
  1. The database can be encrypted at rest by using AWS KMS; this must be defined at the time of launching the database.
  2. RDS and Aurora use TLS by default for in-flight encryption.
  3. IAM roles can be used to connect to the databases without a username or password.
  4. You can control network access to your RDS and Aurora databases by using security groups.
  5. You cannot access the instance by SSH unless it's RDS Custom.

30
Q

What is Amazon RDS proxy?

A

Amazon RDS Proxy is a fully managed, highly available database proxy for Amazon Relational Database Service (RDS) that makes applications more scalable, more resilient to database failures, and more secure.

Many applications, including those built on modern serverless architectures, can have many open connections to the database server and may open and close database connections at a high rate, exhausting database memory and compute resources (opening and closing connections consume CPU and RAM). Amazon RDS Proxy allows applications to pool and share connections established with the database, improving database efficiency and application scalability. It's serverless, autoscaling, and highly available (Multi-AZ). With RDS Proxy, failover times for Aurora and RDS databases are reduced by up to 66%, and database credentials, authentication, and access can be managed through integration with AWS Secrets Manager and AWS Identity and Access Management (IAM).
https://aws.amazon.com/rds/proxy/

It supports MySQL, PostgreSQL, and MariaDB. No code change is required for most applications.

It must be accessed from within a VPC; it is never publicly accessible.

31
Q

What databases are supported by RDS Proxy?

A

It supports MySQL, PostgreSQL, MariaDB, and Aurora (MySQL, PostgreSQL).

32
Q

Are any code changes required to use RDS Proxy?

A

No code changes are required for most applications.

33
Q

What is ElastiCache?

A

Amazon ElastiCache is a fully managed, in-memory caching service supporting flexible, real-time use cases. You can use ElastiCache for caching, which accelerates application and database performance, or as a primary data store for use cases that don’t require durability like session stores, gaming leaderboards, streaming, and analytics. ElastiCache is compatible with Redis and Memcached.

It requires heavy code changes.

34
Q

What's the difference between Redis and Memcached?

A

Redis supports Multi-AZ with auto-failover, read replicas to scale reads and provide high availability, data persistence, and a backup-and-restore feature. Memcached partitions data across multiple nodes (sharding); it has no high availability (no replication), is non-persistent, has no backup and restore, and uses a multi-threaded architecture.

35
Q

Does ElastiCache use IAM authentication?

A

No, ElastiCache does not support IAM authentication. IAM policies on ElastiCache are only used for AWS API-level security.

36
Q

How is Redis authenticated?

A

You can set a password/token when you create the Redis cluster. Redis also supports SSL in-flight encryption.

37
Q

What authentication mechanism is used by memcached?

A

Memcached supports SASL-based authentication.

38
Q

What are the use cases for Redis?

A

Redis sorted sets guarantee both uniqueness and element ordering, which makes them perfect for gaming leaderboards.

39
Q

What is Elastic Beanstalk?

A

AWS Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications and services developed with Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker on familiar servers such as Apache, Nginx, Passenger, and IIS.

You can simply upload your code and Elastic Beanstalk automatically handles the deployment, from capacity provisioning, load balancing, auto-scaling to application health monitoring. At the same time, you retain full control over the AWS resources powering your application and can access the underlying resources at any time.

There is no additional charge for Elastic Beanstalk - you pay only for the AWS resources needed to store and run your applications.

40
Q

What is DynamoDB?

A

Amazon DynamoDB is a fully managed, serverless, key-value NoSQL database designed to run high-performance applications at any scale.

DynamoDB offers built-in security, continuous backups, automated multi-Region replication, in-memory caching, and data import and export tools.

From an exam perspective, choose DynamoDB if your schema is rapidly evolving.

There are two read and write capacity modes (a sketch follows this list):
1. Provisioned mode: you specify the number of reads and writes per second. You need to plan capacity beforehand, and you pay for the provisioned Read Capacity Units (RCU) and Write Capacity Units (WCU). Auto-scaling can be enabled for RCU and WCU.
2. On-demand mode: reads and writes automatically scale up and down with your workload. No capacity planning is needed. You pay for what you use, which is more expensive. Great for unpredictable workloads.
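
A minimal boto3 sketch of the two modes (table and attribute names are hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Provisioned mode: capacity is planned up front.
dynamodb.create_table(
    TableName="OrdersProvisioned",
    AttributeDefinitions=[{"AttributeName": "order_id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "order_id", "KeyType": "HASH"}],
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 10},
)

# On-demand mode: no capacity planning, pay per request.
dynamodb.create_table(
    TableName="OrdersOnDemand",
    AttributeDefinitions=[{"AttributeName": "order_id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "order_id", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
)
```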

41
Q

Why use DynamoDB over relational databases?

A

1. Scalability:

**Seamless handling of massive growth:** DynamoDB excels at handling unpredictable workloads and massive data volumes without requiring manual sharding or complex partitioning strategies. It can scale up or down automatically to meet demand, ensuring optimal performance and cost-effectiveness.
Predictable performance: DynamoDB guarantees consistent, single-digit millisecond response times, even under heavy load, making it ideal for applications with high-volume traffic and stringent latency requirements.

2. Flexibility:
Schema-less design: DynamoDB doesn’t require a predefined schema, providing flexibility to store diverse data structures without schema migrations. This accommodates rapid application changes and evolving data models.
Support for various data types: It supports structured, semi-structured, and unstructured data, including JSON, allowing for versatile application scenarios.

3. Performance for specific workloads:
**Optimized for key-value and document-based access:** DynamoDB shines for applications that predominantly involve key-value or document-based access patterns, such as social media, gaming, e-commerce, and IoT.
Fast writes and reads: It delivers exceptionally fast read and write performance, making it well-suited for applications demanding real-time data access and updates.

4. Cost-effectiveness:
Pay-per-use model: DynamoDB operates on a pay-per-use model, eliminating upfront infrastructure costs and aligning expenses with actual usage. This leads to potential cost savings, especially for applications with fluctuating demand.
Serverless architecture: It eliminates the need for provisioning and managing servers, reducing operational overhead and associated costs.

5. High Availability and Durability:
Built-in replication and fault tolerance: DynamoDB ensures data durability and availability across multiple Availability Zones within a region, offering strong protection against outages and data loss.
Automatic backups and point-in-time recovery: It provides automatic backups and point-in-time recovery for added data protection and disaster recovery capabilities.

6. Integration with AWS Ecosystem:
Seamless integration with other AWS services: DynamoDB integrates seamlessly with other AWS services, such as Lambda, S3, Kinesis, and more, enabling the creation of robust and scalable cloud-native applications.

42
Q

How to use DynamoDB data for Analytics?

A
  1. Export the data to S3 and use Athena
  2. Connect with Redshift
  3. Utilize DynamoDB Streams to capture real-time changes and trigger a Lambda function or a Kinesis Data Stream.
43
Q

Exam Question

How to define Primary Key (Hash) in DynamoDB?

A

A simple primary key that consists of a single attribute is known as the partition key. DynamoDB uses the partition key’s value as input to an internal hash function. The output from the hash function determines the partition (physical storage internal to DynamoDB) in which the item will be stored.

44
Q

How to define Composite Key (hash + range) in DynamoDB?

A

A composite primary key consists of two attributes: a partition key (hash key) and a sort key (range key). The partition key determines the partition in which the item will be stored, and the sort key determines the order of items with the same partition key. This allows you to perform range queries on the items in the table based on the sort key's value.
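
A minimal boto3 sketch defining a composite key (table and attribute names are hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Composite key: user_id is the partition key, created_at is the sort key.
dynamodb.create_table(
    TableName="UserEvents",
    AttributeDefinitions=[
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "created_at", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "user_id", "KeyType": "HASH"},      # partition key
        {"AttributeName": "created_at", "KeyType": "RANGE"},  # sort key
    ],
    BillingMode="PAY_PER_REQUEST",
)
```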

45
Q

What are DynamoDB use cases?

A
  • Mobile apps
  • Gaming
  • Digital ad serving
  • Live voting
  • Audience interaction for live events
  • Sensor networks
  • Log ingestion
  • Access control for web based content
  • Metadata storage for Amazon S3 objects
  • E commerce shopping carts
  • Web session management
46
Q

What is an anti-pattern (what you don't do) with DynamoDB?

A
  • Prewritten application tied to a traditional relational database: use RDS instead
  • Joins or complex transactions
  • Binary Large Object (BLOB) data: store data in S3 & metadata in DynamoDB
  • Large data with low I/O rate: use S3 instead
47
Q

Exam Question

What is 1 Write Capacity Unit (WCU)?

A

One Write Capacity Unit (WCU) represents one write per second for an item up to 1 KB in size

48
Q

Exam Question

Read/Write Capacity Mode - Provisioned?

A
  • When setting up RCU and WCU in DynamoDB, you don’t need to overthink it.
  • Simply specify your target capacity usage, and DynamoDB will scale it for you.
  • If you go over the provisioned RCU or WCU due to increased consumption or writing, you can use Burst Capacity temporarily.
  • If Burst Capacity is fully consumed, you will receive a ProvisionedThroughputExceededException.

One Write Capacity Unit (WCU) represents one write per second for an item up to 1 KB in size
* If the items are larger than 1 KB, more WCUs are consumed
* Example 1: we write 10 items per second, with item size 2 KB
* We need 10 * (2 KB / 1 KB) = 20 WCUs
* Example 2: we write 6 items per second, with item size 4.5 KB
* We need 6 * (5 KB / 1 KB) = 30 WCUs (4.5 KB gets rounded up to 5 KB)
* Example 3: we write 120 items per minute, with item size 2 KB
* We need (120 / 60) * (2 KB / 1 KB) = 4 WCUs

Exam questions will ask for this calculation; a quick helper sketch follows.
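
A tiny Python helper reproducing these calculations (item size is rounded up to the next whole KB):

```python
import math

def wcus_needed(items_per_second: float, item_size_kb: float) -> int:
    """One WCU = one write per second for an item up to 1 KB."""
    return math.ceil(items_per_second * math.ceil(item_size_kb))

print(wcus_needed(10, 2))        # 20 WCUs
print(wcus_needed(6, 4.5))       # 30 WCUs (4.5 KB rounds up to 5 KB)
print(wcus_needed(120 / 60, 2))  # 4 WCUs
```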

49
Q

Exam Question

What is DynamoDB Strongly Consistent Read vs. Eventually Consistent Read?

A

When DynamoDB writes, the data is written to one server and then replicated to other servers. If someone reads the data from a server that has not yet received the update, they can get stale data.

DynamoDB provides two types of reads: strongly consistent and eventually consistent.
* Eventually consistent reads are the default setting; they are not guaranteed to return the latest updated version of data, but they return the updated data eventually.
* Strongly consistent reads return the latest updated version of data.
* Strongly consistent reads are helpful when you need the most current version of the data, as in financial transactions or other critical operations.
* Eventually consistent reads are more efficient and less expensive than strongly consistent reads.
* Eventually consistent reads are useful when you can tolerate some delay in getting the latest updated data, as in caching or non-critical operations.
* Developers can choose the type of read to perform based on their use-case requirements.

50
Q

Exam Question

How to set a strongly consistent read in DynamoDB?

A

Set the "ConsistentRead" parameter to True in API calls (GetItem, BatchGetItem, Query, Scan).
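
A minimal boto3 sketch (table and key names are hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# ConsistentRead=True forces a strongly consistent read
# (the default is eventually consistent).
response = dynamodb.get_item(
    TableName="Accounts",
    Key={"account_id": {"S": "acct-123"}},
    ConsistentRead=True,
)
item = response.get("Item")
```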

51
Q

Exam Question

How many RCUs are consumed in a strongly consistent read?

A

1 RCU per second is used for a strongly consistent read, which is twice the RCU used by an eventually consistent read: one RCU supports 2 eventually consistent reads per second.

  • One Read Capacity Unit (RCU) represents one Strongly Consistent Read per second, or two Eventually Consistent Reads per second, for an item up to 4 KB in size
  • If the items are larger than 4 KB, more RCUs are consumed
52
Q

Exam Question

Read Capacity Units (RCU) Calculation?

A
  • One Read Capacity Unit (RCU) represents one Strongly Consistent Read per second, or two Eventually Consistent Reads per second, for an item up to 4 KB in size
  • If the items are larger than 4 KB, more RCUs are consumed
  • Example 1: 10 Strongly Consistent Reads per second, with item size 4 KB
    • We need 10 * (4 KB / 4 KB) = 10 RCUs
  • Example 2: 16 Eventually Consistent Reads per second, with item size 12 KB
    • We need (16 / 2) * (12 KB / 4 KB) = 24 RCUs
  • Example 3: 10 Strongly Consistent Reads per second, with item size 6 KB
    • We need 10 * (8 KB / 4 KB) = 20 RCUs (we must round up 6 KB to 8 KB)
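
A tiny Python helper reproducing these calculations (item size rounds up to the next 4 KB block; eventually consistent reads cost half):

```python
import math

def rcus_needed(reads_per_second: float, item_size_kb: float,
                strongly_consistent: bool = True) -> int:
    """One RCU = 1 strongly consistent read/s (or 2 eventually
    consistent reads/s) for an item up to 4 KB."""
    size_units = math.ceil(item_size_kb / 4)
    reads = reads_per_second if strongly_consistent else reads_per_second / 2
    return math.ceil(reads * size_units)

print(rcus_needed(10, 4))                              # 10 RCUs
print(rcus_needed(16, 12, strongly_consistent=False))  # 24 RCUs
print(rcus_needed(10, 6))                              # 20 RCUs (6 KB -> 8 KB)
```
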
53
Q

What is Partitioning in DynamoDB?

A
  • DynamoDB uses partitioning to store and retrieve data in a scalable and efficient manner.
  • Each partition has a fixed amount of RCU and WCU capacity assigned to it, which is determined by the size of the partition.
  • When an item is written to DynamoDB, it is hashed to determine which partition it belongs to.
    • If the partition already has available WCU, the write request is processed immediately.
    • If the partition has no available WCU, the write request is throttled.
  • Similarly, when an item is read from DynamoDB, it is hashed to determine which partition it belongs to.
    • If the partition has available RCU, the read request is processed immediately.
    • If the partition has no available RCU, the read request is throttled.
54
Q

How RCU and WCU are utilized in the partitions?

A
  • If you have 10 partitions in DynamoDB, and you provision 10 WCUs and 10 RCUs, they will be evenly spread across the partitions.
  • This means that each partition will receive one WCU and one RCU.
  • It is important to remember that WCUs and RCUs are divided and distributed evenly across partitions in DynamoDB.
55
Q

What is DynamoDB Throttling?

A

When the provisioned capacity of a DynamoDB table is exceeded, DynamoDB throttles read and write requests, leading to increased response times and potentially even errors.
1. If you exceed the RCUs or WCUs at the partition level, you will get a ProvisionedThroughputExceededException.
2. This can happen because of a hot key, which means one partition key is being read too many times from a specific partition.
3. Hot partitions or very large items can also cause high RCU or WCU consumption.
4. Solutions for tackling this issue include using Exponential Backoff when encountering the exception (see the sketch below) and distributing partition keys as much as possible.
5. To avoid RCU issues caused by one item being read heavily from a single partition, you can use DynamoDB Accelerator (DAX).
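
A minimal retry sketch with exponential backoff, assuming a hypothetical Orders table (note that boto3 also performs some retries internally):

```python
import random
import time

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

def put_with_backoff(item, table="Orders", max_retries=5):
    """Retry throttled writes with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return dynamodb.put_item(TableName=table, Item=item)
        except ClientError as err:
            code = err.response["Error"]["Code"]
            if code != "ProvisionedThroughputExceededException":
                raise
            # Wait 2^attempt * 100 ms plus jitter before retrying.
            time.sleep((2 ** attempt) * 0.1 + random.uniform(0, 0.1))
    raise RuntimeError("write still throttled after retries")
```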

56
Q

R/W Capacity Mode - on Demand

A
  • Read/writes automatically scale up/down with your workloads
  • No capacity planning needed (WCU / RCU)
  • Unlimited WCU & RCU, no throttle, more expensive
  • You’re charged for reads/writes that you use in terms of RRU and WRU
  • Read Request Units (RRU) throughput for reads (same as RCU)
  • Write Request Units (WRU) throughput for writes (same as WCU)
  • 2.5x more expensive than provisioned capacity (use with care)
  • Use cases: unknown workloads, unpredictable application traffic, …
57
Q

How much more expensive is on-demand R/W capacity?

A

2.5x more expensive than provisioned capacity (use with care)

58
Q

What's the DynamoDB API for writing a new record?

A

PutItem
* Creates a new item or fully replaces an old item (same Primary Key)
* Consumes WCUs
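
A minimal boto3 sketch (table and attribute names are hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Creates the item, or fully replaces any existing item with the same key.
dynamodb.put_item(
    TableName="Users",
    Item={
        "user_id": {"S": "u-42"},   # partition key
        "name": {"S": "Alice"},
        "plan": {"S": "free"},
    },
)
```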

59
Q

What's the DynamoDB API for updating a record?

A

UpdateItem
* Edits an existing item’s attributes or adds a new item if it doesn’t exist
* Can be used to implement Atomic Counters (a numeric attribute that's unconditionally incremented)
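
A minimal boto3 sketch of an atomic counter (table and attribute names are hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# ADD increments page_views unconditionally, creating the attribute
# (or the item) if it doesn't exist yet.
dynamodb.update_item(
    TableName="PageStats",
    Key={"page_id": {"S": "home"}},
    UpdateExpression="ADD page_views :inc",
    ExpressionAttributeValues={":inc": {"N": "1"}},
)
```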

60
Q

What's the DynamoDB API for writing a record based on a condition?

A

Conditional Writes
* Accept a write/update/delete only if conditions are met, otherwise returns an error
* Helps with concurrent access to items
* No performance impact
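
A minimal boto3 sketch of a write that succeeds only if the item doesn't already exist (table and attribute names are hypothetical):

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

try:
    # Insert only if no item with this key exists yet.
    dynamodb.put_item(
        TableName="Users",
        Item={"user_id": {"S": "u-42"}, "name": {"S": "Alice"}},
        ConditionExpression="attribute_not_exists(user_id)",
    )
except ClientError as err:
    if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
        print("item already exists; write rejected")
    else:
        raise
```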

61
Q

What’s the DynamoDB API for reading a record?

A

GetItem to get one item
* Read based on a Primary key
* Primary Key can be HASH or HASH+RANGE
* Eventually Consistent Read (default)
* Option to use Strongly Consistent Reads (consumes more RCU, might take longer)
* ProjectionExpression can be specified to retrieve only certain attributes

62
Q

How to query data from DynamoDB?

A
  • Query reads data from one partition and returns items based on:
    • KeyConditionExpression
      • Partition Key value (must use the = operator), required
      • Sort Key value (=, <, <=, >, >=, Between, Begins with), optional
    • FilterExpression
      • Additional filtering after the Query operation (before data is returned to you)
      • Use only with non-key attributes (does not allow HASH or RANGE attributes)
  • Returns:
    • The number of items specified in Limit
    • Or up to 1 MB of data, ability to do pagination on the results
    • Can query table, a Local Secondary Index, or a Global Secondary Index
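
A minimal boto3 sketch (table and attribute names are hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Partition key must use '='; the sort key supports range operators.
response = dynamodb.query(
    TableName="UserEvents",
    KeyConditionExpression=(
        "user_id = :uid AND created_at BETWEEN :start AND :end"
    ),
    FilterExpression="event_type = :t",  # non-key attribute, applied after
    ExpressionAttributeValues={
        ":uid": {"S": "u-42"},
        ":start": {"S": "2024-01-01"},
        ":end": {"S": "2024-01-31"},
        ":t": {"S": "login"},
    },
    Limit=100,
)
items = response["Items"]
```
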
63
Q

How to read (scan) data in DynamoDB?

A
  • Scan operation in AWS is used to read an entire table.
  • It is not efficient to filter data on the client side after scanning the entire table.
  • Scan operation returns up to one megabyte of data and requires pagination techniques to read more.
  • Scanning an entire table consumes a lot of RCU and may impact normal operations.
  • To limit the impact of a scan, a limit statement can be used to reduce the size of the result.
  • Parallel Scan can be used to scan multiple data segments at the same time, increasing the throughput and RCU consumed.
  • Limit queries and conditions can be used with Parallel Scan to further limit its impact.
  • Scans can be used with ProjectionExpression to retrieve only certain attributes and FilterExpression to filter the results before they are returned (the RCU for the full scan is still consumed).
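
A minimal boto3 sketch of a paginated parallel scan (table name is hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

def scan_segment(table, segment, total_segments):
    """Scan one segment of a parallel scan, following pagination."""
    kwargs = {"TableName": table, "Segment": segment,
              "TotalSegments": total_segments}
    while True:
        page = dynamodb.scan(**kwargs)
        yield from page["Items"]
        if "LastEvaluatedKey" not in page:
            break
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

# e.g. one of 4 workers, each scanning its own segment
items = list(scan_segment("UserEvents", segment=0, total_segments=4))
```
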
64
Q

Can you delete an item or a record in DynamoDB?

A

Yes, use DeleteItem.
It deletes an individual item and can perform a conditional delete.

65
Q

Can you delete an entire table in DynamoDB?

A

Yes, use DeleteTable. It deletes a whole table and all its items. It is more efficient than calling DeleteItem on each individual item.

66
Q

What are DynamoDB Batch Operations?

A

DynamoDB Batch Operations allow you to perform multiple read or write operations in a single API call.

67
Q

What are the benefits of using Batch Operations?

A

Batch Operations reduce the number of API calls required to perform the same number of read or write operations, which can reduce network traffic and improve application performance.
If a batch operation partially fails, you will receive the failed items and will have to retry them.

68
Q

What are the two types of Batch Operations?

A

The two types of Batch Operations are BatchGetItem and BatchWriteItem.

69
Q

What is BatchGetItem used for?

A

BatchGetItem is used to retrieve multiple items from one or more tables using their primary keys.

70
Q

What is BatchWriteItem used for?

A

BatchWriteItem is used to put or delete items across one or more tables in a single API call.
* Up to 25 PutItem and/or DeleteItem in one call
* Up to 16 MB of data written, up to 400 KB of data per item
* Can’t update items (use UpdateItem)
* UnprocessedItems due to the lack of WCU (exponential backoff or add WCU)
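
A minimal boto3 sketch that retries UnprocessedItems (table name is hypothetical; production code should back off exponentially between retries):

```python
import time

import boto3

dynamodb = boto3.client("dynamodb")

# 25 hypothetical put requests against a "Users" table.
requests = [
    {"PutRequest": {"Item": {"user_id": {"S": f"u-{i}"}}}} for i in range(25)
]

batch = {"Users": requests}
while batch:
    response = dynamodb.batch_write_item(RequestItems=batch)
    batch = response.get("UnprocessedItems", {})
    if batch:
        time.sleep(1)  # simple pause; use exponential backoff in practice
```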

71
Q

What is the maximum number of items that can be processed in a single BatchGetItem or BatchWriteItem request?

A

A single BatchWriteItem request can process up to 25 items, while a single BatchGetItem request can retrieve up to 100 items.

72
Q

Can you get items in Batch?

A

Use BatchGetItem
* Return items from one or more tables
* Up to 100 items, up to 16 MB of data
* Items are retrieved in parallel to minimize latency
* UnprocessedKeys for failed read operations (exponential backoff or add RCU)

73
Q

What is PartiQL?

A

PartiQL is a SQL-compatible query language for Amazon DynamoDB and other NoSQL databases.

74
Q

What are the benefits of using PartiQL?

A

Benefits of using PartiQL include its ease of use for SQL users, its flexibility for working with multiple data sources, and its ability to handle complex queries.

75
Q

What are some common use cases for PartiQL?

A

Some common use cases for PartiQL include querying and manipulating data in DynamoDB and other NoSQL databases, migrating data between different databases, and integrating data from multiple sources.

76
Q

How does PartiQL differ from SQL?

A

PartiQL is similar to SQL, but has some differences in syntax and functionality to support working with NoSQL databases and multiple data sources.

77
Q

Can PartiQL be used with other databases besides DynamoDB?

A

Yes, PartiQL can be used with other NoSQL databases as well as relational databases through the use of specialized adapters.

78
Q

What is the syntax for a PartiQL query?

A

A PartiQL query begins with a SELECT statement, followed by the source of the data and any optional clauses such as WHERE, ORDER BY, and LIMIT.
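
A minimal boto3 sketch using DynamoDB's PartiQL API (table and attribute names are hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Parameterized PartiQL SELECT against a DynamoDB table.
response = dynamodb.execute_statement(
    Statement='SELECT name, plan FROM "Users" WHERE user_id = ?',
    Parameters=[{"S": "u-42"}],
)
print(response["Items"])
```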

79
Q

What is DynamoDB Accelerator (DAX)?

A

DAX is a fully managed in-memory cache for DynamoDB. It can improve the performance of DynamoDB by up to 10x. DAX works by caching frequently accessed data in memory. This allows applications to access the data more quickly, without having to make a round trip to DynamoDB.

To use DAX, you need to create a DAX cluster. A DAX cluster is made up of one or more cache nodes. The number of cache nodes in a cluster depends on the amount of data you need to cache and the expected load on your application.

Once you have created a DAX cluster, you need to configure your application to use it. To do this, you need to update your application’s connection string to point to the DAX cluster.

DAX is a secure service. It offers a variety of security features, including encryption at rest, IAM authentication, VPC security, and CloudTrail integration.

This performance boost can lead to a better user experience for your application.

80
Q

Exam Question

What is the difference between DAX and ElastiCache?

A

Both DAX and ElastiCache are in-memory caching services that can be used to improve the performance of DynamoDB. However, there are some key differences between the two services:

  • Both are managed in-memory caches, but DAX is API-compatible with DynamoDB, so it requires no application changes; with ElastiCache you must write and maintain the caching logic in your application.
  • DAX is optimized for DynamoDB, while ElastiCache can be used with a variety of data stores. This means DAX can provide better performance for DynamoDB.
  • DAX is generally more expensive than ElastiCache.
81
Q

Exam Question

When to use DynamoDB and DAX?

A

DAX and ElastiCache can be used in combination, and exams may test your ability to determine which is best for a given situation. DAX is useful for caching individual objects, queries, or scans, making it suitable for simple types of queries. On the other hand, if your application is performing more complex logic, such as scanning, summing, filtering, and more, you can store the results in Amazon ElastiCache to avoid computationally expensive operations.

By storing and retrieving data from ElastiCache instead of re-querying DAX and re-performing client-side aggregations, you can create a more efficient architecture by utilizing both services together.

82
Q

Primary Key Index in DynamoDB

A
  • Definition: A primary key uniquely identifies each item in a DynamoDB table and consists of a partition key and an optional sort key.
  • Partition Key (Hash Key): Determines the physical partition where the item is stored.
  • Sort Key (Range Key): Determines the sorting order of items with the same partition key.
  • Purpose: Ensures the efficient retrieval of items based on the specified primary key.
83
Q

Global Secondary Index (GSI) in DynamoDB

A

When a Global Secondary Index (GSI) is created on a DynamoDB table, a new index table is internally created and maintained by DynamoDB. Although it is not a separate user-facing table, the GSI behaves like a distinct table with its own partition key and sort key, as well as its own provisioned throughput settings for read and write capacity units.

The GSI is automatically updated by DynamoDB when items are added, updated, or deleted in the base table. This allows you to query the GSI using different attributes than the primary key of the base table, providing more flexibility in your queries.

It is important to note that since GSIs consume additional write capacity for index maintenance, managing a GSI may incur additional costs for your DynamoDB usage.

  • Purpose: Allows querying on alternate keys for greater flexibility and query performance.
  • Key Features:
    1. Can be created or deleted after the table is created.
    2. Supports eventual or strong consistency.
    3. Consumes additional write capacity for index maintenance.
    4. Maximum of 20 GSIs per table.
84
Q

Local Secondary Index (LSI) in DynamoDB

A
  • Definition: An LSI is an index that keeps the table's partition key but uses an alternative sort key (an attribute that isn't part of the primary key). This allows efficient querying based on both the partition key and the chosen attribute, providing an alternative sorting order within the same partition.
  • Purpose: Provides a different view of the data, allowing efficient queries with alternate sort keys within the same partition.
  • Key Features:
    1. Must be created when the table is created; it cannot be added afterwards.
    2. Supports strong consistency.
    3. Shares provisioned throughput with the base table.
    4. Maximum of 5 LSIs per table.
85
Q

Exam Question

What happens if there is throttling on a table with a Global Secondary Index?

A

If there is write throttling on the GSI, the main table will be throttled as well, even if the WCUs on the main table are fine. So choose the GSI partition key carefully and assign the GSI its own WCU capacity.

86
Q

Exam Question

What happens if there is throttling on a table with a Local Secondary Index?

A

There is no separate index table for a Local Secondary Index; the WCUs and RCUs of the main table are utilized. There are no special throttling considerations for LSIs.

87
Q

What's the difference between DynamoDB Accelerator (DAX) and ElastiCache?

A

DynamoDB Accelerator caches query results and sits in front of DynamoDB. It supports the DynamoDB API and does not require application changes. ElastiCache can be utilized to store aggregated results.

88
Q

What is DynamoDB stream?

A

A DynamoDB stream is an ordered flow of information about changes to items in a DynamoDB table. When you enable a stream on a table, DynamoDB captures information about every modification to data items in the table.

Whenever an application creates, updates, or deletes items in the table, DynamoDB Streams writes a stream record with the primary key attributes of the items that were modified. A stream record contains information about a data modification to a single item in a DynamoDB table. You can configure the stream so that the stream records capture additional information, such as the “before” and “after” images of modified items.

When changes are made to DynamoDB, data about the changes can be sent either to DynamoDB Streams or to a Kinesis Data Stream. In the Kinesis case, the stream can be sent to Kinesis Data Firehose, and the data can be stored directly in Amazon Redshift, Amazon S3, or Amazon OpenSearch.

89
Q

Exam Question

Can DynamoDB retroactively populate the records?

A

When you enable DynamoDB Stream, be aware that the records will not be retroactively populated in the stream after it is enabled. This is a point that may come up in exams. Once the stream is enabled, only then will it receive updates based on the changes occurring in your DynamoDB table.

90
Q

How do DynamoDB Streams and Lambda work together?

A

To understand how DynamoDB Streams and Lambda work together, we need to follow these steps:

  1. Define an Event Source Mapping to read from a DynamoDB Stream.
  2. Ensure the Lambda function has the necessary permissions to pull from the DynamoDB Stream.
  3. The Lambda function will be invoked synchronously.

For example, changes to a table go into a DynamoDB Stream, and the Lambda function has an Event Source Mapping. This internal process polls the DynamoDB Stream and retrieves records in batches. Once the Event Source Mapping receives some records, it synchronously invokes the Lambda function with a batch of records from the stream, as in the sketch below.
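
A minimal sketch of such a Lambda handler (the event shape is the standard DynamoDB Streams record format; the processing logic is hypothetical):

```python
def handler(event, context):
    """Invoked synchronously by the Event Source Mapping with a batch
    of DynamoDB Stream records."""
    for record in event["Records"]:
        if record["eventName"] in ("INSERT", "MODIFY"):
            new_image = record["dynamodb"].get("NewImage", {})
            print("item changed:", new_image)
        elif record["eventName"] == "REMOVE":
            old_image = record["dynamodb"].get("OldImage", {})
            print("item deleted:", old_image)
```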

91
Q

What is a DynamoDB global table?

A

Global tables replicate your DynamoDB tables automatically across your choice of AWS Regions. Global tables eliminate the difficult work of replicating data between Regions and resolving update conflicts, enabling you to focus on your application’s business logic.

The replication is active-active. Enabling DynamoDB Stream is required for Global tables.

92
Q

What is DynamoDB TTL (Time to live)?

A

Amazon DynamoDB Time to Live (TTL) allows you to define a per-item timestamp to determine when an item is no longer needed. Shortly after the date and time of the specified timestamp, DynamoDB deletes the item from your table without consuming any WCU. TTL is provided at no extra cost as a means to reduce stored data volumes by retaining only the items that remain current for your workload’s needs.
* Expired items are deleted from both LSIs and GSIs
* A delete operation for each expired item enters the DynamoDB Streams (can help recover expired items)

Web session handling is a common use case for TTL (see the sketch below).
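
A minimal boto3 sketch that enables TTL and writes an expiring session item (table and attribute names are hypothetical):

```python
import time

import boto3

dynamodb = boto3.client("dynamodb")

# Enable TTL on the table, using the 'expires_at' attribute.
dynamodb.update_time_to_live(
    TableName="Sessions",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# Write a session item that DynamoDB will delete ~24 hours from now.
dynamodb.put_item(
    TableName="Sessions",
    Item={
        "session_id": {"S": "sess-abc"},
        "expires_at": {"N": str(int(time.time()) + 24 * 3600)},  # epoch secs
    },
)
```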

93
Q

Can DynamoDB be exported to S3?

A

Yes, but point-in-time recovery must be enabled in DynamoDB. The export does not affect the read capacity of your table. Data exported to S3 can be analyzed through Athena, and ETL can be applied to the S3 data before importing it back into DynamoDB.
DynamoDB data can be exported in JSON or ION format.

94
Q

Exam Question

How to store large data in DynamoDB?

A

Use S3 for Large Data Storage and DynamoDB for Metadata:

In this strategy, you store large data files such as images, videos, or other large documents in Amazon S3, while keeping the metadata in DynamoDB. Metadata can include information like the file’s name, date of creation, owner, and other related attributes.

Steps to implement this strategy:
* Upload the large data file to an S3 bucket.
* Generate a unique identifier for the file in S3 (e.g., its object key).
* Create a new item in the DynamoDB table with the unique identifier as the primary key and store the metadata associated with the file.
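
A minimal boto3 sketch of this pattern (bucket, table, and file names are hypothetical):

```python
import boto3

s3 = boto3.client("s3")
dynamodb = boto3.client("dynamodb")

bucket, key = "my-media-bucket", "videos/intro.mp4"

# 1. Store the large object in S3.
s3.upload_file("intro.mp4", bucket, key)

# 2. Store its metadata in DynamoDB, keyed by the S3 object key.
dynamodb.put_item(
    TableName="MediaMetadata",
    Item={
        "object_key": {"S": key},
        "bucket": {"S": bucket},
        "owner": {"S": "alice"},
        "content_type": {"S": "video/mp4"},
    },
)
```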

95
Q

DynamoDB Security

A

Security:
* VPC Endpoints available to access DynamoDB without internet
* Access fully controlled by IAM
* Encryption at rest using KMS
* Encryption in transit using SSL / TLS

96
Q

DynamoDB other features

A

Backup and Restore feature available
* Point in time restore like RDS
* No performance impact

  • Global Tables: Multi region, fully replicated, high performance
  • AWS Database Migration Service (DMS) can be used to migrate to DynamoDB (from MongoDB, Oracle, MySQL, S3, etc.)
  • You can launch a local DynamoDB on your computer for development purposes