Relational Database Service (RDS) Flashcards

Question

RDS - Read Replicas

Answer 1

Read Replicas are read-only replicas of an RDS instance. Unlike Multi-AZ, where by default you can't use the standby replica for anything, you can use Read Replicas, but only for read operations. Multi-AZ in cluster mode is like a combination of the old Multi-AZ instance mode together with Read Replicas.
You have to think of Read Replicas as separate things. They aren't part of the main database instance in any way. They have their own database endpoint address, so applications need to be adjusted to use them. An application, say WordPress, using an RDS instance will have zero knowledge of any read replicas by default. Without application support, Read Replicas do nothing: they aren't functional from a usage perspective, there's no automatic failover, and they just exist off to one side.
-Read Replicas are kept in sync using asynchronous replication
-With synchronous replication (Multi-AZ), when data is written to the primary instance, it is replicated to the standby at the same time as it is stored on disk on the primary
-With asynchronous replication, data is written to the primary first, at which point it's viewed as committed. After that, it's replicated to the read replicas
-For the exam, for any RDS questions (excluding Aurora), remember that synchronous = Multi-AZ / asynchronous = Read Replicas ***
-Read Replicas can be created in the same region as the primary instance, or in other AWS regions, known as "Cross-Region" Read Replicas ***
-If you create a cross-region read replica, AWS handles all of the networking between regions. This occurs transparently to you and is fully encrypted in transit ***
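Because read replicas have their own endpoints and do nothing without application support, the application has to choose an endpoint for each statement. A minimal sketch of that idea follows, assuming a hypothetical `EndpointRouter` helper and made-up endpoint names; none of this is an AWS API, and since replication is asynchronous a read sent to a replica may return slightly stale data.

```python
from itertools import cycle


class EndpointRouter:
    """Choose an endpoint per SQL statement (hypothetical helper)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over the replicas; with no replicas, reads use the primary.
        self._readers = cycle(replicas or [primary])

    def endpoint_for(self, sql):
        # Rough classification: only plain SELECT statements count as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self.primary


# Made-up endpoint names, for illustration only.
router = EndpointRouter(
    primary="primary.example.internal",
    replicas=["replica-1.example.internal", "replica-2.example.internal"],
)
```

Writes always go to the primary endpoint; reads rotate across the replica endpoints.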

Answer 2

(Read) Performance and Scaling Improvements
-You can create 5 direct read replicas per DB instance
-Each of these provides an additional instance of read performance, which offers a simple way of scaling out read performance on a DB
-Read Replicas can have their own read replicas - BUT LAG STARTS TO BE A PROBLEM
Because asynchronous replication is used, there can be a lag between the main database instance and any read replicas. If you then create read replicas of read replicas, this lag becomes more of a problem.
-Can help you with global performance improvements
If you have read workloads in other AWS regions, these workloads can connect directly to read replicas and not impact the performance of the primary instance in any way.
RPO/RTO Improvements
-Snapshots and backups improve RPOs
The more frequently backups occur, the better the RPO, because this limits the amount of data which can be lost. It doesn't really help with recovery time objectives.
-RTOs are a problem
Because restoring snapshots takes a long time, especially for large databases.
-RRs offer a near-zero RPO
Because the data on the RR is synced from the main database instance, there's very little potential for data loss, assuming we're not dealing with data corruption.
-RRs can be promoted quickly - low RTO
In a disaster scenario where you have a major problem with your RDS instance, you can promote an RR, and this is a really quick process. However, you should only look at using RRs this way when you are recovering from a failure.
-FAILURE ONLY - watch for data corruption
If you are recovering from data corruption, then logically the read replica will probably have a replica of that corrupted data.
-READ ONLY - UNTIL PROMOTED
-Easy way to achieve global availability improvements - Global Resilience
Because you can create a cross-region RR in another AWS region and use this as a failover region if AWS ever has a major regional issue.

Answer 3

-SSL/TLS (in transit) is available for RDS and can be mandatory - this protects the data between the client and the RDS instance ***
-RDS supports EBS volume encryption - KMS (at rest) ***
-Handled by the RDS HOST/EBS
-AWS or Customer Managed CMK generates DATA KEYS - KMS
-Data keys (DEKs) are used for the encryption operations
-Storage, Logs, Snapshots & Replicas are encrypted using the same CMK
-Encryption can't be removed ***
-RDS MS-SQL and RDS Oracle support TDE ***
-TDE = Transparent Data Encryption - this is encryption which is supported and handled within the database engine (data encrypted/decrypted) (less trust)
-RDS Oracle supports integration with CloudHSM ***
-Much stronger key controls (even from AWS), because CloudHSM is managed by you, with no key exposure to AWS

Answer 4

-With RDS Oracle, keys can be provided via CloudHSM, removing AWS from the chain of trust
-TDE is native DB engine encryption - data is encrypted before leaving the instance, with AWS having no exposure outside of the RDS instance
-KMS provides AWS managed or customer managed CMKs, which are used to generate DEKs for RDS
These DEKs are loaded onto the RDS hosts as needed and are used by the host to perform the encryption or decryption operations. This means the database engine doesn't need to natively support encryption or decryption. It has no encryption awareness: from its perspective, it's writing data as normal, and the data is encrypted by the host before being sent on to EBS in its final encrypted format.
-Data that's transferred between replicas is also encrypted, as are any snapshots of the RDS EBS volumes, and these use the same encryption key (KMS)

Answer 5

-Normally, logins to RDS are controlled using local database users - these have their own usernames and passwords, they're not IAM users, and they are outside of the control of AWS
-One gets created when you provision an RDS instance, but that's it
-You can configure RDS to allow IAM user authentication against a database
-An RDS local DB account is configured to use an AWS authentication token
We have IAM users and roles (in this case, an instance role), and attached to those roles and users are policies. These policies contain a mapping between that IAM entity, so the user or role, and a local RDS database user.
-Policy attached to users and roles maps that IAM identity onto the local RDS user ***
This allows those identities to run a "generate-db-auth-token" operation, which works with RDS and IAM. Based on the policies attached to the IAM identities, it generates a token with a 15-minute validity. This token can then be used to log in as the database user within RDS, without requiring a password.
-Authorization is controlled by the DB engine - permissions are assigned to the local DB user. IAM IS NOT USED TO AUTHORISE, only for authentication ***
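The policy that maps an IAM identity onto a local database user can be sketched as a plain JSON document. The sketch below assumes the `rds-db:connect` action and the `arn:aws:rds-db:...:dbuser:...` resource format used for IAM database authentication; the helper name and every sample value (region, account ID, resource ID, user name) are made up for illustration.

```python
import json


def rds_connect_policy(region, account_id, dbi_resource_id, db_user):
    """Build an IAM policy allowing token-based login as one local DB user."""
    resource = (
        f"arn:aws:rds-db:{region}:{account_id}:dbuser:{dbi_resource_id}/{db_user}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "rds-db:connect", "Resource": resource}
        ],
    }


# Placeholder values only; substitute your own identifiers.
policy = rds_connect_policy("us-east-1", "123456789012", "db-EXAMPLEID", "app_user")
print(json.dumps(policy, indent=2))
```

The policy only permits authentication as `app_user`; what that user may do inside the database is still decided by grants in the DB engine.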

Answer 6

-Fills the gap between RDS and EC2 running a DB engine
-RDS is a fully managed database server - OS/engine access is limited
-The main RDS product gives you access to databases running on a database server which is fully managed by AWS, so any OS or engine access is limited
-DB on EC2 is self-managed, but has overhead: done this way, you're responsible for everything from the OS upwards
-Currently works for MS-SQL and Oracle
-Can connect using SSH, RDP, or Session Manager and actually get access to the OS and database engine
-Runs within your AWS account
With normal RDS, if you look in your account, you won't see any EC2 instances, EBS volumes, or backups within S3. That's because they all exist within an AWS-managed environment. With RDS, the networking works by injecting elastic network interfaces (ENIs) into your VPC. That's how you get access to the RDS instance. With RDS Custom, you will see an EC2 instance, EBS volumes, and backups inside your AWS account.
-RDS Custom Database Automation - relevant if you need to customize RDS Custom settings
-To ensure that you have no disruptions caused by the RDS automation while you're performing customizations, you need to PAUSE database automation, perform the customizations, and then resume automation
-Resume returns the instance to normal full automation

Answer 7

-Very different from RDS
-Uses a "Cluster" - a cluster is made up of a number of important things:
-A single primary instance + 0 or more replicas
The replicas within Aurora can be used for reads during normal operations, so they're not like the standby replica inside RDS. The replicas inside Aurora can actually provide the benefits of both RDS Multi-AZ and RDS read replicas. They sit inside a cluster and can be used to improve availability, but they can also be used for read operations during the normal operation of a cluster.
--You don't have to choose between read scaling and availability ***
-NO LOCAL STORAGE - uses a Cluster Volume
Instead, an Aurora cluster has a shared cluster volume. This is storage which is shared and available to all compute instances within a cluster. This provides a few benefits, such as:
-Faster provisioning, improved availability, and better performance

Answer 8

-It functions across a number of AZs
-Inside the cluster is a primary instance and, optionally, a number of replicas. They function as failover options if the primary instance fails, but they can also be used for read operations from applications during normal functioning of the cluster
-The cluster has shared storage which is SSD-based, with a maximum size of 128TB and 6 replicas across multiple AZs
-When data is written to the primary DB instance, Aurora synchronously replicates that data across all 6 storage nodes, spread across the AZs, which are associated with your cluster
-All instances inside your cluster, so the primary and all of the replicas, have access to all of these storage nodes
-Replication happens at the storage level ***
So no extra resources are consumed on the instances or the replicas during this replication process.
-By default, the primary instance is the ONLY instance able to WRITE to the storage; the replicas and the primary can all perform READ operations
-Aurora automatically detects failures in the disk volumes that make up the cluster shared storage ***
-When a segment or a part of a disk volume fails, Aurora immediately repairs that area of disk
When Aurora does this, it uses the data inside the other storage nodes that make up the cluster volume and automatically recreates that data. This ensures that the data is brought back into an operational state with no corruption.
-Aurora avoids data loss and reduces any need to perform point-in-time restores or snapshot restores to recover from disk failures ***
-MUCH MORE RESILIENT THAN RDS ***
-With Aurora you can have up to 15 replicas, and any of them can be the failover target for a failover operation (quicker than RDS) ***

Answer 9

-ALL SSD-based - high IOPS, low latency
-Storage is billed simply based on what's used, because you don't have to allocate the storage that the cluster uses
-High water mark - billed for the most used (is being changed by AWS)
So if you consume 50GB of storage, you are billed for 50GB. If you free up 10GB of data, so move down to 40GB of consumed data, you're still billed for that "high water mark" of 50GB.
-Storage which is freed up can be re-used
-If you want to reduce costs on storage, then you need to create a brand new cluster and migrate the data
-Replicas can be added and removed without requiring storage provisioning, because the storage is for the cluster, not for the instances
This massively improves the speed and efficiency of any replica changes within the cluster.
-Aurora clusters, like RDS clusters, use endpoints - these are DNS addresses which are used to connect to the cluster
Unlike RDS, Aurora clusters have multiple endpoints that are available for an application. As a minimum you have the Cluster Endpoint and the Reader Endpoint.
-The Cluster Endpoint always points at the primary instance, for write/read operations
-The Reader Endpoint will point at the primary instance if that's all there is, but if there are replicas, then the Reader Endpoint will load balance across all of the available replicas, and this can be used for read operations
It is much easier to manage read scaling using Aurora versus RDS, because as you add additional replicas which can be used for reads, this Reader Endpoint is automatically updated to load balance across these new replicas.
-You can create Custom Endpoints
-In addition to that, each instance (primary and replicas) has its own unique endpoint
So Aurora allows for a much more custom and complex architecture versus RDS.
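The high water mark billing described above comes down to taking the peak of everything the cluster has ever consumed. A minimal sketch, assuming a hypothetical helper name and a simple list of usage samples in GB (the note above says AWS is changing this behaviour, so treat it as an illustration of the concept only):

```python
def billed_storage_gb(usage_samples_gb):
    """Return the billed size under high water mark billing: the peak consumed."""
    return max(usage_samples_gb)


# Grow to 50GB, then free 10GB: the bill still reflects the 50GB peak.
history = [20, 35, 50, 40]
print(billed_storage_gb(history))
```

Freed space can be reused by the cluster, which is why a later climb back to 50GB does not raise the bill in this model.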

Answer 10

-No free-tier option
-Aurora doesn't support micro instances
-Beyond RDS single-AZ (micro), Aurora offers better value
-Compute - hourly charge, billed per second, with a 10-minute minimum
-Storage - GB-month consumed, plus an IO cost per request
-Backup storage up to 100% of the DB size is included
So if your database cluster is 100GiB, then you're given 100GiB of storage for backups as part of what you pay for that cluster.
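The included backup allowance can be expressed as simple arithmetic. The sketch below assumes that only backup storage beyond 100% of the cluster size is chargeable; the function name is hypothetical and no prices are modelled.

```python
def billable_backup_gib(cluster_size_gib, backup_size_gib):
    """Backup storage beyond the included allowance (100% of the cluster size)."""
    return max(0, backup_size_gib - cluster_size_gib)


print(billable_backup_gib(100, 80))   # within the included 100GiB
print(billable_backup_gib(100, 130))  # 30GiB over the allowance
```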

Answer 11

-Backups in Aurora work in the same way as RDS, so the normal backup features apply: automatic backups and manual snapshots
-Restores create a new cluster
-Backtrack can be used, which allows IN-PLACE REWINDS to a previous point in time
It needs to be enabled on a per-cluster basis, and it allows you to roll back your database.
-Fast Clones make a new database MUCH faster than copying all the data - COPY-ON-WRITE
A fast clone doesn't make a one-for-one copy of the storage for that database. Instead, it references the original storage and only stores the differences between the two. A difference arises either when you update the storage in your cloned database, or when data is updated in the original database, in which case your clone needs a copy of that data as it was before it changed on the source. Essentially, your cloned database only uses a tiny amount of storage: it only stores data that has changed in the clone, or changed in the original after you made the clone.
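The copy-on-write behaviour of fast clones can be illustrated with a toy model. This is a sketch of the general technique only, not Aurora's actual storage implementation; the `Volume` class and its page-level granularity are assumptions made for the example.

```python
class Volume:
    """Toy copy-on-write volume: a clone stores only pages that differ."""

    def __init__(self, pages=None, parent=None):
        self._pages = dict(pages or {})  # pages physically stored by this volume
        self._parent = parent            # source volume, if this is a clone
        self._clones = []

    def clone(self):
        child = Volume(parent=self)      # starts with no pages of its own
        self._clones.append(child)
        return child

    def read(self, page_id):
        if page_id in self._pages:
            return self._pages[page_id]
        return self._parent.read(page_id) if self._parent else None

    def write(self, page_id, data):
        # Before the source changes a page, each clone still sharing it
        # keeps a copy of the old contents.
        for child in self._clones:
            if page_id not in child._pages:
                child._pages[page_id] = self.read(page_id)
        self._pages[page_id] = data

    def own_page_count(self):
        return len(self._pages)


source = Volume({1: "a", 2: "b", 3: "c"})
clone = source.clone()
clone.write(1, "changed in clone")     # clone stores its own page 1
source.write(2, "changed in source")   # clone keeps the old page 2
```

After these two writes the clone holds two pages of its own and still shares page 3 with the source.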

Answer 12

Aurora Serverless is a service which is to Aurora what Fargate is to ECS. It provides a version of the Aurora database product where you don't need to statically provision database instances of a certain size, or worry about managing those database instances.
-It removes one more piece of admin overhead: managing individual database instances
Concepts
-Scalable - ACU - Aurora Capacity Units (you still create a cluster)
Capacity units represent a certain amount of compute and a corresponding amount of memory.
-For a cluster, you can set minimum and maximum values, and Aurora Serverless will scale between those values, adding or removing capacity based on load
-Cluster adjusts based on load
-Can go to 0 and be paused
-Consumption billing on a per-second basis
-Same resilience as Aurora (6 copies across AZs)
Benefits
-Removes the complexity of managing database instances and capacity
-Easier to scale, with no disruption to client connections
-Cost-effective - you only pay for the database resources that you consume, on a per-second basis
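Scaling between the configured minimum and maximum amounts to clamping the capacity the load calls for. A minimal sketch, assuming a hypothetical function and that demand is already expressed in ACUs (how Aurora Serverless actually measures load is not covered here):

```python
def target_acus(needed_acus, min_acus, max_acus):
    """Clamp the capacity the load calls for into the configured range."""
    return max(min_acus, min(needed_acus, max_acus))


print(target_acus(1, 2, 16))   # below the minimum
print(target_acus(8, 2, 16))   # within range
print(target_acus(40, 2, 16))  # capped by the maximum setting
```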

Answer 13

-The Aurora Serverless cluster has the same cluster volume architecture which Aurora provisioned uses
-In an Aurora Serverless cluster, though, instead of using provisioned servers, we have ACUs, which are Aurora Capacity Units
-These capacity units are allocated from a warm pool of Aurora Capacity Units, which is managed by AWS
-The ACUs are stateless, they're shared across many AWS customers, and they have no local storage, so they can be allocated to your Aurora Serverless cluster rapidly when required
-When these ACUs are allocated to the cluster, they have access to the cluster storage in the same way that a provisioned Aurora instance would have access to the storage
-If the load on an Aurora Serverless cluster increases beyond the capacity units which are being used, and assuming the maximum capacity setting of the cluster allows it, then more ACUs will be allocated to the cluster
-Once the compute resource which represents this new, potentially bigger ACU is active, any old compute resources representing unused capacity can be deallocated from your Aurora Serverless cluster
-Because the number of ACUs is dynamically increased and decreased based on load, the way that connections are managed within an Aurora Serverless cluster has to be slightly more complex versus a provisioned cluster
In an Aurora Serverless cluster, we have a shared proxy fleet, which is managed by AWS. This happens transparently to you as a user of an Aurora Serverless cluster, but if a user interacts with the cluster via an application, the connection actually goes via this proxy fleet. Any of the proxy fleet instances can be used, and they will broker a connection between the application and the Aurora Capacity Units. Because the client application never connects directly to the compute resource that provides an ACU, the scaling can be fluid, and the cluster can scale in or out without causing any disruptions.

Answer 14

-Infrequently used applications - maybe a low-volume blog site, where connections are only attempted for a few minutes, several times per day
-New applications - if you're deploying an application where you are unsure about the levels of load that will be placed on it, and so unsure about the size of the database instance that you'll need. With Aurora provisioned you would still need to provision that in advance and potentially change it
-Variable workloads - if you're running a normally lightly used application which has peaks, maybe 30 minutes out of an hour, or on certain days of the week during sale periods
-Unpredictable workloads
-Development and test databases - Aurora Serverless can be configured to pause itself during periods of no load, and during the database pause you're only billed for the storage
-Multi-tenant applications - if you've got an application where you're billing a user a set dollar amount per month per license to that application, your incoming load is directly aligned to your incoming revenue
-You don't mind if a database supporting your product scales up and costs you more, if you also get more customer revenue

Answer 15

Global databases allow you to create global-level replication using Aurora, from a master region to up to 5 secondary AWS regions.
-Primary Region offers similar functionality to a normal Aurora cluster. It has one read and write instance and up to 15 read-only replicas in that cluster
-Secondary Region can have up to 16 read-only replicas. The entire secondary cluster is read-only
-The replication between those regions occurs at the storage layer, and replication is typically within 1 second from the primary to all of the secondaries
-Applications can use the primary instance for write operations, and then the primary and its replicas, or the replicas in the secondary regions, for read operations
Use Cases
-Cross-Region DR and BC (Disaster Recovery and Business Continuity) - because of the 1-second replication, the RPO and RTO values are really low if you do perform a cross-region failover
-Global Read Scaling - if you want to offer low-latency performance to any international areas where you have customers
-~1s or less replication between regions
-It's one-way replication
-Replication has NO IMPACT on DB performance
-Secondary regions can have up to 16 replicas
-All of these can be promoted to R/W
-Currently MAX 5 secondary regions

Answer 16

This feature allows an Aurora cluster to have multiple instances which are capable of performing both reads and writes. This is in contrast with the default mode for Aurora, which only allows one writer and many readers.
-Default Aurora mode is Single-Master - one R/W and 0+ read-only replicas
-Cluster endpoint is used to write, reader endpoint is used for load-balanced reads
-Failover takes time - a replica is promoted to R/W (Single-Master mode)
-In Multi-Master mode all instances are R/W

Answer 17

-A Multi-Master Aurora cluster might seem similar to a single-master one. The same cluster structure exists, with the same shared storage
-Multiple Aurora provisioned instances also exist in the cluster
-The differences start with the fact that there is no Cluster Endpoint to use. An application is responsible for connecting to instances within the cluster. There's no load balancing across instances with a multi-master cluster. The application connects to one or all of the instances in the cluster and initiates operations directly
-When one of the read-write nodes inside a multi-master cluster receives a write request from the application, it immediately proposes that the data be committed to all of the storage nodes in that cluster
-At this point, each storage node that makes up the cluster either confirms or rejects the proposed change
A node rejects the change if it conflicts with something that's already in flight, for example another change from another application writing to another read-write instance inside the cluster. What the writing instance is looking for is a quorum of nodes that agree to allow it to write that data. Assuming that it can get a quorum to agree, the write is committed to storage and replicated across every storage node in the cluster.
-With a Multi-Master cluster, that change is then replicated to the other instances in the cluster
This means those other writers can add the updated data into their in-memory caches, so any reads from any other instances in the cluster will be consistent with the data that's stored on the shared storage. Because instances cache data, we need to make sure that, in addition to being committed to disk, the change is also applied to the in-memory caches of all other instances within the Multi-Master cluster.
Example: once the instance on the right has got agreement to commit that change to the cluster shared storage, it replicates that change to the instance on the left. The instance on the left updates its in-memory cache, and then if that instance is used for any read operations, it always has access to the up-to-date data.
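The propose, vote, and commit flow described above can be sketched as a toy model. This is not Aurora's real protocol: the `StorageNode` class, the page-level conflict check, and the quorum size of 4 out of 6 nodes used in the example are all assumptions made for illustration.

```python
class StorageNode:
    """Toy storage node that rejects proposals conflicting with in-flight work."""

    def __init__(self):
        self.in_flight = set()  # pages with a change already pending
        self.pages = {}

    def vote(self, page_id):
        if page_id in self.in_flight:
            return False        # conflict with another writer's pending change
        self.in_flight.add(page_id)
        return True


def propose_write(nodes, page_id, data, quorum):
    """Commit only if at least `quorum` nodes accept the proposal."""
    accepted = [node for node in nodes if node.vote(page_id)]
    committed = len(accepted) >= quorum
    if committed:
        for node in nodes:
            node.pages[page_id] = data  # replicated to every storage node
    for node in accepted:
        node.in_flight.discard(page_id)  # release this writer's reservations
    return committed


nodes = [StorageNode() for _ in range(6)]
for busy in nodes[:3]:
    busy.in_flight.add("page-1")         # another writer got there first
rejected = propose_write(nodes, "page-1", "new value", quorum=4)
```

With three of the six nodes already holding a conflicting change, only three can accept, so the proposal falls short of the assumed quorum and nothing is written.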

Answer 18

Single-Master
-The configuration change to make one of the other replicas the new primary instance inside the cluster is not an immediate change - it causes disruption
-Non-Fault-Tolerant
Multi-Master
-With Multi-Master, both instances are able to write to the shared storage
-The application can connect to one or both of them
-The application could maintain connections to both and be ready to act if one of them fails
-When one writer fails, the application can immediately send 100% of any future data operations to the other writer, which is working perfectly
-There would be little, if any, disruption
-Fault-Tolerant
Benefits
-Offers much faster availability
-The failover events can be performed inside the application, and they don't even disrupt traffic between the application and the database
-Fault tolerance can be implemented, but the application logic needs to manually load balance across the instances
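Because failover happens inside the application in multi-master mode, the application needs logic along the following lines. This is a minimal sketch: `WriterUnavailable` and the callables standing in for connections to each writer instance are hypothetical, not part of any database driver.

```python
class WriterUnavailable(Exception):
    """Raised by a writer callable when its instance cannot be reached."""


def write_with_failover(writers, statement):
    """Try each writer in order and return the first successful result."""
    last_error = WriterUnavailable("no writers configured")
    for send in writers:
        try:
            return send(statement)
        except WriterUnavailable as exc:
            last_error = exc  # remember the failure, move to the next writer
    raise last_error


def failed_writer(statement):
    raise WriterUnavailable("writer A is down")


def healthy_writer(statement):
    return f"writer B executed: {statement}"


result = write_with_failover([failed_writer, healthy_writer], "UPDATE t SET x = 1")
```

The second writer takes over within the same call, so the caller sees no interruption beyond the failed attempt.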

Answer 19

RDS Proxy is a fully managed, highly available database proxy for RDS that makes applications more scalable, more resilient to database failures, and more secure.
Why use it?
-Opening and closing connections consumes resources
-...It takes time, which creates latency
-With serverless, every Lambda invocation opens and closes a connection
-Handling failure of database instances is HARD - doing it within your application ADDS RISK
-DB proxies can help, but maybe you don't have any database proxy experience, and even if you do, can you manage them at scale? That's where RDS Proxy adds value
What RDS Proxy, or indeed any database proxy, does is change your architecture. Instead of your applications connecting to a database every time they use it, they connect to a proxy.
-Application(s) > Proxy (connection pooling) => Database
The proxy maintains a pool of connections to the database, which are open for the long term. Any connections to the proxy can then use this already established pool of database connections. It can also do multiplexing, where it multiplexes requests over the connection pool between the proxy and the database. So you can have a smaller number of actual connections to the database versus the number of connections to the database proxy.
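Connection pooling and multiplexing can be illustrated with a small in-process pool. This is a sketch of the general technique, not of how RDS Proxy is implemented; `ConnectionPool` and `fake_connect` are hypothetical names, and the "connections" are plain Python objects.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Keep a fixed set of long-lived connections and lend them out."""

    def __init__(self, connect, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())  # opened once, kept for the long term

    @contextmanager
    def borrow(self, timeout=None):
        conn = self._idle.get(timeout=timeout)  # wait for a free connection
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back instead of closing it


opened = []


def fake_connect():
    opened.append(object())  # stands in for an expensive database connection
    return opened[-1]


pool = ConnectionPool(fake_connect, size=2)
for _ in range(10):  # ten client requests share two database connections
    with pool.borrow() as conn:
        pass
```

Ten requests are served while only two backend connections are ever opened, which is the load reduction the proxy provides to the database.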

Answer 20

-It runs within a VPC, across all AZs
-Maintains a long-term connection pool *** - in this case to the primary node of the database running in AZ B
-The EC2 instances and Lambda functions connect to the RDS Proxy rather than directly to the database instances - MUCH quicker to establish vs direct to database ***
-RDS Proxy connections to the DB instance can be reused, avoiding the lag of establishment, usage & termination for each invocation
-Multiplexing is USED ***, so that a smaller number of database connections can be used for a larger number of client connections, which helps reduce the load placed on the database server
-RDS Proxy abstracts the client away from DB failure or failover events ***
-The client-to-proxy connection is established and WAITS even if the target DB is unresponsive - this might occur during failover events from the primary to the standby

Answer 21

-Too many connection errors
-DB instances using T2/T3 (smaller/burst) instances
-Useful when using AWS Lambda - time saved/connection reuse & IAM Auth
-Long running connections (SAAS apps) - Low latency
-Where resilience to database failure is priority
-... RDS Proxy can reduce the time for failover
-... and make it transparent to the application

Answer 22

-Fully managed DB proxy for RDS/Aurora
-It's auto scaling and highly available by default
-Provides connection pooling - reduces DB load
--We don't have to constantly open and close DB connections
--We can multiplex, to use a lower number of connections between the proxy and the DB relative to the number of connections between the clients and the proxy
-ONLY accessible from a VPC
-Accessed via the Proxy Endpoint - no app changes
-Can enforce SSL/TLS - ensures the security of your applications
-Can reduce failover time by over 60%
-Abstracts failure away from your applications

Answer 23

-A managed database migration service
-Runs using a replication instance
-You need to define Source and Destination endpoints, which point at the source and target databases
-One of the endpoints MUST be on AWS
-No-downtime migration
Architecture
-Common DB support: MySQL, Aurora, MS-SQL, MariaDB, MongoDB, PostgreSQL, Oracle, Azure SQL...
-Replication Instance = an EC2 instance which sits in between the SRC and DST and runs the migration software
-On this instance you can define replication tasks, and each replication instance can run multiple replication tasks. Tasks define all of the options relating to the migration
-The replication instance performs the migration between the SRC and DST endpoints, which store connection information for the source and target databases
-A task moves data from the SRC database, using the details in the source endpoint, to the target database, using the details stored in the destination endpoint configuration
-A job can be Full Load (one-off migration of all data), Full Load + CDC (change data capture, for ongoing replication which captures changes), or CDC Only (only migrates data changes)
-Schema Conversion Tool (SCT) can assist with schema conversion
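The three job types listed above correspond, to the best of my knowledge, to the `MigrationType` values `full-load`, `full-load-and-cdc`, and `cdc` accepted when creating a DMS replication task. The lookup below is only a convenience sketch; the dictionary and function names are hypothetical.

```python
# Flashcard job name -> DMS MigrationType value (believed correct; verify
# against the DMS API reference before relying on it).
MIGRATION_TYPES = {
    "Full Load": "full-load",
    "Full Load + CDC": "full-load-and-cdc",
    "CDC Only": "cdc",
}


def migration_type(job_kind):
    """Translate a job name from the notes into a MigrationType string."""
    try:
        return MIGRATION_TYPES[job_kind]
    except KeyError:
        raise ValueError(f"unknown job kind: {job_kind!r}") from None


print(migration_type("Full Load + CDC"))
```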

Answer 24

-SCT is used when converting one database engine to another ***
--On-premises MS-SQL -> RDS MySQL
--On-premises Oracle -> Aurora
-It is also used for another way of moving data between on-premises and AWS: DB -> S3 (migrations using DMS)
-SCT is NOT USED when migrating between DBs of the same type, e.g. on-premises MySQL -> RDS MySQL
-Works with OLTP DB types (MySQL, MS-SQL, Oracle)
-And OLAP (Teradata, Oracle, Vertica, Greenplum)

Answer 25

-Larger migrations might be multi-TB in size
-It's not optimal to move that much data over networks, because it takes time and consumes capacity
-Step 1: Use SCT to extract data locally and move it to a Snowball device
-Step 2: Ship the device back to AWS. They load the data into an S3 bucket
-Step 3: DMS migrates from S3 into a target store
-Step 4: Change Data Capture (CDC) can capture changes, and via the S3 intermediary they are also written to the target database