Relational Database Service (RDS) Flashcards

1
Q

Database Refresher

A

Databases are systems which store and manage data, but there are a number of different types of database system, with crucial differences in how data is physically stored on disk, how it's managed on disk and in memory, and how the systems retrieve data and present it to the user.

Database systems are very broadly split into relational and non-relational.

Relational (SQL) (RDBMSs)

-Structured Query Language (SQL)

SQL = A language used to store, update, and retrieve data. It's known as the Structured Query Language and it's a feature of most relational database platforms.

-Structure in & between tables of data - Rigid Schema (Structure of the database)

Rigid Schema = Means it's defined in advance, before you put any data into the system.
Schema = Defines the names of things, valid values of things, and the types of data which are stored and where.

-Fixed Relationships between tables

Non-Relational (NoSQL)

-NoSQL - Is everything that doesn't fit into the SQL mold - different models

-Generally a much more relaxed Schema

-Relationships between tables are handled differently

2
Q

Relational Data Example

A

-Data which relates together is stored within a table

-Every row in the table has to be uniquely identifiable (Primary Key > PK)

-The PK is unique in the table and every row of that table has to have a unique value (ID) for this attribute

-A Join Table makes it easy to have many-to-many relationships. It has a "Composite Key", which is a key formed of two parts (together they have to be unique)

-The keys in different tables, are how the relationships between the tables are defined

-Table Schemas and Relationships defined in Advance

The fact that this schema is so fixed and has to be declared in advance makes it difficult for a SQL or relational system to store any data which has rapidly changing relationships (e.g. a Social Network).
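
To make the join table idea concrete, here's a minimal sketch using Python's built-in sqlite3 module; the table and column names are purely illustrative:

import sqlite3

# In-memory DB purely for illustration; table/column names are hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE person (
    id   INTEGER PRIMARY KEY,   -- PK: every row is uniquely identifiable
    name TEXT NOT NULL
);
CREATE TABLE tvshow (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
-- Join table: a Composite Key formed of two parts enables many-to-many
CREATE TABLE watched (
    person_id INTEGER REFERENCES person(id),
    tvshow_id INTEGER REFERENCES tvshow(id),
    PRIMARY KEY (person_id, tvshow_id)   -- the pair has to be unique
);
""")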

3
Q

Key-Value (NoSQL)

A

Key-value databases consist of sets of keys and values. There's generally no concept of structure; it's just a list of key and value pairs.

Key (Date/Time) | Value (cookies eaten from the feeder during the previous 60 min)
2020-03-18 13:00 | 15
2020-03-18 14:00 | 30
2020-03-18 15:00 | 0

-As long as every single key is unique, the value doesn't matter. It has no real schema, nor does it have any real structure, because there are no tables or table relationships

-Some key-value databases allow you to create separate lists of keys and values and present them as tables. But they're only really used to divide data; there are no links between them

-Scalable - because sections of this data, could be split onto different servers

-Really fast

-Only the key matters

-Also used for in-memory caching

4
Q

Wide Column Store (NoSQL)

A

-Each row or item has one or more keys; generally one of them is called the partition key, and then optionally you can have additional keys.

Partition Key > | KEY1 | | KEY2 | < Range key

-DynamoDB is an example of this type of database

-Every item in a table has to have the same key layout, so that’s one key or more keys, and they just need to be unique to that table

-Wide column stores offer groupings of items called tables (not the same types of tables as relational db)

-Every item in a table can also have attributes, but they don't have to be the same between items - you can mix and match attributes on different items

-No attribute schema - items can have any, all, or no attributes

-Every item inside a table has to use the same key structure and it has to have a unique key

-It’s very fast
-Super scalable
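
A minimal boto3 sketch of the wide column model, assuming a hypothetical DynamoDB table "sensor-data" with partition key "device_id" and sort (range) key "ts" already exists:

import boto3

# Hypothetical table; both items must use the same key structure...
table = boto3.resource("dynamodb").Table("sensor-data")

# ...but attributes can differ per item - there is no attribute schema.
table.put_item(Item={"device_id": "feeder-1", "ts": "2020-03-18T13:00", "cookies": 15})
table.put_item(Item={"device_id": "feeder-1", "ts": "2020-03-18T14:00", "battery": "low"})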

5
Q

Document (NoSQL)

A

Designed to store and query data as documents. Documents are generally formatted using a structure such as JSON or XML, but often the structure can differ between documents in the same database.

-Each document is interacted with via an ID, that’s unique to the document

-The value of the document’s content, is exposed to the database, allowing you to interact with it

-Ideal Scenarios: Interacting with whole documents or deep (nested data) attribute interactions within a document structure

-The document model works well with use cases such as catalogs, user profiles, and many different content management systems, where each document is unique but changes over time.

-Provide flexible indexing

6
Q

Column (NoSQL)

A

Row Store (MySQL)

-Databases where you interact with data based on rows
-Rows are stored on disk together
-Ideal if you are operating on rows - adding, updating, deleting
-Online Transaction Processing (OLTP) = For systems which are performing transactions

Column Store (Redshift)

-Databases where you interact with data based on columns
-Data is grouped together on disk based on column - so every order value is stored together, all grouped by the column that the data is in
-Ideal for reporting or when all values for a specific attribute (size) are required
-Since the whole column is stored on disk together, you could perform a query to retrieve all products sold during a period

7
Q

Graph (NoSQL)

A

Relationships between things are formally defined and stored in the database itself, along with the data.

-Great for relationship driven data (Social Media or HR systems)
-There are nodes, which can have properties - simple key-value pairs of data attached to the nodes.
-There are relationships between the nodes, which are known as “edges”
-Relationships themselves can also have attached data, i.e. name-value pairs
-Can store a massive amount of complex relationships between data or between nodes, inside a database
-A query would run much quicker than with a SQL database

-Social media or systems with complex relationships = Graph Database

8
Q

ACID vs BASE

A

-ACID and BASE are DB transaction models - they govern how the database system itself is architected

-CAP Theorem - Consistency, Availability, Partition Tolerant (resilience) - Choose 2

Consistency = Means that every read of the database will receive the most recent write, or it will get an error
Availability = Means that every request will receive a non-error response, but without the guarantee that it contains the most recent write
Partition Tolerant = Means that the system can be made of multiple network partitions, and the system continues to operate even if there are a number of dropped messages or errors between these network nodes.

The CAP Theorem states that any database product is only capable of delivering a maximum of two of these three factors. Imagine if communication fails between some of the nodes, or if any of the nodes fail.

You have two choices if somebody reads from that database: you can cancel the operation, decreasing availability but ensuring consistency, or you can proceed with the operation, improving availability but risking consistency.

-ACID = Consistency

-BASE = Availability

9
Q

ACID

A

ACID means that transactions are atomic, consistent, isolated and durable.

-If you see ACID mentioned, it’s probably referring to RDS ***

-ACID limits the ability of a database to scale ***

-Atomic = ALL components of a transaction SUCCEED, or NONE do - all or nothing

-Consistent = Transactions move the database, from one valid state to another - nothing in-between is allowed

-Isolated = If multiple transactions occur at once, they don’t interfere with each other - Each executes as if it’s only one

-Durable = Once a transaction has been committed, it will remain committed, even in the case of a system failure. Once succeeded, that data is stored somewhere that system failure or power failure, or the restart of a database server or node, won’t impact the data

-Most relational database platforms, use ACID-based transactions

10
Q

BASE

A

Stands for Basically Available, Soft State, Eventually Consistent

-BASE modeled NoSQL databases, will ensure availability of data, by spreading and replicating the data, across all of the different nodes of that database ***

-Basically Available = READ and WRITE operations are available “as much as possible”, but without any consistency guarantees - “kinda” “maybe”

-Soft State = The database doesn’t enforce consistency, this is offloaded onto the application/user

-Eventually Consistent = If we wait long enough, reads from the system will be consistent - it doesn't enforce immediate consistency

-Highly scalable and can deliver high performance

-DynamoDB is an example of a BASE-like database - It offers both eventually and immediately consistent reads ***

-DynamoDB also offers some additional features, which offer ACID functionality, such as DynamoDB transactions ***
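
DynamoDB exposes the choice between eventually and immediately (strongly) consistent reads as a per-request flag; a sketch, reusing the hypothetical "sensor-data" table from earlier:

import boto3

table = boto3.resource("dynamodb").Table("sensor-data")  # hypothetical table

# Default read: eventually consistent (BASE-style - cheaper and faster)
item = table.get_item(Key={"device_id": "feeder-1", "ts": "2020-03-18T13:00"})

# Strongly consistent read: guaranteed to reflect the most recent write
item = table.get_item(
    Key={"device_id": "feeder-1", "ts": "2020-03-18T13:00"},
    ConsistentRead=True,
)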

11
Q

Databases on EC2 - Why you might do it

A

-Access to the DB Instance OS

-Advanced DB Option Tuning (DBROOT)

-If the Vendor demands this level of access

-You want to run a DB or DB Version that AWS doesn’t provide

-Specific OS/DB Combination that AWS doesn’t provide

-Implement an architecture that AWS doesn’t provide (replication/resilience)

-Decision makers who “just want it”

12
Q

Databases on EC2 - Why you shouldn’t do it

A

-Admin overhead - managing EC2 and the DB Host

Both of these require significant management effort. Don't underestimate the effort required to keep an EC2 instance patched, or to keep a database host running at a level compatible with your application. Whenever you perform upgrades, or whenever you're fault finding, you need to do it out of core usage hours, which can mean additional time, stress, and costs for staff to maintain both of these components.

-Backups / DR Management

So if your business has any disaster recovery planning, running databases on EC2 adds a lot of additional complexity. Many of AWS's managed database products include a lot of automation to remove this admin overhead.

-EC2 is running in a single AZ

If that zone fails, access to the database could fail, and you need to worry about taking EBS Snapshots or taking backups of the database inside the database server and putting those on storage somewhere, maybe S3.

-Features - some of AWS DB products are amazing

-EC2 is ON or OFF - no serverless, no easy scaling

There are some AWS managed DB products which can scale up or down rapidly based on load. By running a DB product on EC2, you limit your ability to scale, and you set a base minimum cost of whatever the hourly rate is for that particular size of EC2 instance.

-Replication - skills, setup time, monitoring & effectiveness

All of this tends to be handled by a lot of AWS’s managed DB products.

-Performance - AWS invest time into optimization & features

13
Q

Relational Database Service (RDS)

A

-Database as a Service (DaaS) (NOT THE CASE)

DaaS is where you pay money and in return you get a database. This isn't what RDS does; with RDS, you pay for and receive a database server, so it would be more accurate to call it…

-Database Server as a Service (DBSaaS)

It means that on this database server, or instance, which RDS provides, you can have multiple databases.

-Multiple databases on one DB Server (instance)

-RDS provides a managed version of a database server that you might have on-premises, only with RDS you don't have to manage the hardware, the OS, or the installation, as well as much of the maintenance of the DB engine

-Choices of DB Engines (MySQL, MariaDB, PostgreSQL, Oracle, Microsoft SQL Server)

-Amazon Aurora is a different product

Aurora is a custom database engine and product created by AWS, which has compatibility with some of the above engines, but was designed entirely by AWS.

-Is a Managed service - NO ACCESS to the OS or SSH access*
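
A hedged boto3 sketch of what "Database Server as a Service" means in practice - you provision a server, then create databases on it with normal SQL tooling; all identifiers and values here are illustrative:

import boto3

rds = boto3.client("rds")

# Illustrative values only - this provisions a DB *server* (instance).
rds.create_db_instance(
    DBInstanceIdentifier="example-mysql",
    Engine="mysql",
    DBInstanceClass="db.t3.micro",
    MasterUsername="admin",
    MasterUserPassword="example-password-123",  # use Secrets Manager in practice
    AllocatedStorage=20,  # GiB of dedicated EBS-backed storage
)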

14
Q

RDS - Architecture

A

-RDS is a service that runs inside a VPC (NOT PUBLIC)

-It needs to operate in subnets within a VPC in a specific AWS Region

-RDS Subnet Group - Is something you create and is a list of subnets, which RDS can use for a given database instance or instances ***

-It picks subnets at random, unless you indicate a specific preference, but it will put the primary and standby within different AZs

-RDS can be accessed from the VPC or any connected private networks (VPN or Direct Connect)

-RDS can be configured with public addressing allowing access from the public internet (in public subnets)

-If you want to split databases between different sets of subnets, then you need multiple DB subnet groups.

-RDS instances can have multiple databases on them ***

-Every RDS instance has its own dedicated storage, provided by EBS ***

-Primary instances replicate to the Standbys using Synchronous Replication ***

This means data is replicated to the standby, as soon as it’s received by the primary. The standby will have the same set of data as the primary.

-Read Replicas use Asynchronous Replication ***

They can be in the same Region, but also other AWS Regions. This can be used to scale read load or to add layers of resilience, if you ever need to recover in a different AWS Region.

-Backups & Snapshots to S3 ***

It's an AWS managed S3 Bucket, so you don't see the Bucket within your account, but it does mean that data is replicated across multiple AZs in that Region. So if you have an AZ failure, backups will ensure that your data is safe.

If you use Multi-AZ mode, then backups occur from the standby instance, which means no negative performance impact **
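
A DB subnet group is just a named list of subnets which RDS may place instances into; a minimal sketch with hypothetical subnet IDs spread across AZs:

import boto3

rds = boto3.client("rds")

# Subnet IDs are hypothetical; spreading them across AZs lets RDS place
# the primary and standby into different AZs.
rds.create_db_subnet_group(
    DBSubnetGroupName="example-sn-group",
    DBSubnetGroupDescription="Subnets RDS can use for DB instances",
    SubnetIds=["subnet-aaaa1111", "subnet-bbbb2222", "subnet-cccc3333"],
)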

15
Q

RDS - Costs

A

Cost #1 - Instance Size & Type

Cost #2 - Multi-AZ or not (means more than one instance)

Cost #3 - Storage type & amount - per gig monthly fee (Storage based on EBS)

Cost #4 - Data Transferred - Cost per GB of data transferred in & out of your DB instance, from or to the Internet and other AWS Regions

Cost #5 - Backups & Snapshots - You get the amount of storage that you pay for, for the DB instance in Snapshot storage for free

So if you have 2TB of storage, then that means 2TB of Snapshots for free. Beyond that, there is a cost per GB-month of storage. 1TB for one month is the same cost as 500GB for two months, so it's a per-GB-month cost.

Cost #6 - Licensing (if applicable) - Based on using commercial DB engine types

16
Q

RDS MultiAZ Instance Deployment (Availability)

A

RDS has a primary database instance containing any databases that you create. When you enable Multi-AZ mode, this primary instance is configured to replicate its data synchronously to a standby replica, which is running in another AZ. ***

-This means that this standby also has a copy of your databases.

In Multi-AZ instance mode, this replication is at the storage level. The exact method that RDS uses to do this replication depends on the database engine that you pick.

-MariaDB, MySQL, Oracle and PostgreSQL use Amazon's failover technology
-Microsoft SQL instances, use SQL Server database mirroring or AlwaysOn availability groups

-All access to the database is via the database CNAME

This is a DNS name, which by default points at the primary database instance.

-With Multi-AZ instance architecture, you always access the primary database instance

-There’s no access to the standby, even for things like reads.

Its job is to simply sit there until you have a failure scenario with the primary instance.

-Backups can occur from the standby, so data is moved to S3 and then replicated across multiple AZs, in that Region. ***

-Reads and Writes will occur to and from the primary instance ***

-In the event that anything happens to the primary instance, this will be detected by RDS and a failover will occur ***

This can be done manually if you need to perform maintenance, but generally it will be an automatic process.

-In this scenario, the database CNAME changes, instead of pointing at the primary, it points at the standby, which becomes the new primary ***

Because this is a DNS change, it generally takes between 60 and 120 seconds to occur, so there can be brief outages. This can be reduced by removing any DNS caching in your application for this specific DNS name.

If you do remove this caching, then the second RDS has finished the failover and the DNS name has been updated, your application will use this name, which now points at the new primary instance.
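
You can trigger a Multi-AZ failover yourself, for example to test application behavior; one way is a reboot with failover, sketched here with a hypothetical instance identifier:

import boto3

rds = boto3.client("rds")

# Promotes the standby; the CNAME is repointed, so expect the usual
# 60-120 second window while DNS updates.
rds.reboot_db_instance(
    DBInstanceIdentifier="example-mysql",  # hypothetical
    ForceFailover=True,
)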

17
Q

RDS - MultiAZ - Instance - Summarize

A

-Focuses on Availability

-Data is written to the Primary AND IMMEDIATELY Replicated to the StandBy before being viewed as Committed (Synchronous)

-MultiAZ does not come on the Free Tier - Extra cost for replica

-You ONLY have ONE StandBy replica ***

-The StandBy replica CAN’T BE USED for reads or writes

-StandBy’s job is to sit there and wait for a failover event

-Failover event can take from 60s to 120s

-Same Region ONLY - Different AZs in the same region

-Backups can be taken from StandBy to improve performance

-Failovers will occur for various different reasons: AZ Outage, Primary Failure, Manual Failover, Instance type change and Software Patching

-You can use failovers to move any consumers of your database onto a different instance, patch the instance which has no consumers, and then flip it back

18
Q

RDS MultiAZ Cluster Deployment

A

-RDS is capable of having one writer replicate to two reader instances ONLY, in different AZs

-Synchronous Replication to readers ***

-The difference between this mode and the instance mode, is that these readers are usable

-The writer is like the primary instance, and it can be used for writes and read operations

-The reader instances, unlike in MultiAZ instance mode, can be utilized while they're in this state, but ONLY for read operations

This needs application support, since your application needs to understand that it can't use the same instance for reads and writes, but it means that you can use this MultiAZ mode to scale your read workloads, unlike MultiAZ instance mode.

-In terms of replication, data is sent to the writer and it's viewed as being committed when at least one of the readers confirms that it's been written ***

-At this point, it’s resilient across multiple AZs within that region

-In RDS MultiAZ cluster mode, each instance still has its own local storage

You access the cluster using a few endpoint types:

1st. The Cluster Endpoint (like the database CNAME) - Points at the writer. Used for reads, writes and administration ***

2nd. Reader Endpoint - Directs any reads at an available reader instance (In some cases to the writer instance)***

Applications can use the reader endpoint to balance their read operations, across readers within the cluster.

3rd. Instance Endpoint - Points at a specific instance. Generally these are used for testing/fault finding ***

Each instance in the cluster gets one of these. Generally it's not recommended to use them directly, as it means any operations won't be able to tolerate the failure of an instance, because they don't switch over to anything if there's an instance failure.

19
Q

RDS - MultiAZ - Cluster - Summarize

A

-1 Writer and 2 Reader DB instances (different AZs) (Higher level availability than Instance mode)

-Runs on much faster hardware, Graviton + local NVME SSD Storage

-Fast writes go to local storage and are then flushed through to EBS

This gives you the benefit of local super fast storage, in addition to the availability and resilience benefits of EBS.

-Readers can be used for reads - allowing some read scaling

So if your applications support it, you can send read operations to the reader endpoint, which frees up capacity on the writer instance and allows your RDS implementation to scale to higher levels of performance versus any other mode of RDS.

-Replication is via transaction logs - more efficient (This also allows a faster failover)

-Failover is faster ~35s + any time required to apply transaction logs to the reader instances

-Writes are viewed as “committed” when 1 reader has confirmed

20
Q

RDS - Backup & Restore

A

-Within RDS, there are two types of backup-like functionality: Automated Backups & Snapshots

-Both of these are stored in S3, but they use AWS-managed buckets, so they won’t be visible to you, within your AWS console.

-You can see backups in the RDS console, but you can't go to S3 and see any form of RDS bucket which exists for backups ***

-The benefit of using S3 is that any data contained in backups is now Regionally Resilient, because it's stored in S3, which replicates data across multiple AWS AZs within that region.

-RDS backups, when they do occur, are taken in most cases from the StandBy instance, if you have MultiAZ enabled

So while they do cause an I/O pause, this occurs on the StandBy instance, and so there won't be any application performance issues

-If you don’t use MultiAZ, the backups are taken from the only available instance

So you may have pauses in performance.

21
Q

RDS - Snapshots

A

-They aren’t automatic

They’re things that you run explicitly or via a script or custom application. You have to run them against an RDS database instance

-Stored in S3, but they use AWS-managed buckets

-They function like EBS Snapshots - Snapshots and Automated Backups are taken of the instance, which means all the databases within it, rather than just a single database

-The first Snapshot is a FULL copy of the data stored within the instance (takes more time), then onward is Incremental (data which has changed since the last snapshot)

Incremental is usually quicker than full, unless you have an instance where there’s a lot of data change

-When any Snapshot occurs, there is a brief interruption to the flow of data between the compute resource and the storage

If you're using Single-AZ, this can impact your application.
If you're using Multi-AZ, this occurs on the StandBy, so it won't have any noticeable effect

-Snapshots DON'T EXPIRE - You have to clean them up yourself, which means Snapshots live on past when you delete your RDS instance ***

The lower the timeframe between Snapshots, the lower the maximum data loss that can occur when you have a failure
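
Manual snapshots are explicit API calls (or scripted); a sketch with hypothetical identifiers:

import boto3

rds = boto3.client("rds")

# Snapshots the whole instance (every database on it); incremental after
# the first full copy, and it never expires until you delete it.
rds.create_db_snapshot(
    DBInstanceIdentifier="example-mysql",              # hypothetical
    DBSnapshotIdentifier="example-mysql-2020-03-18",
)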

22
Q

RDS - Automated Backups

A

-These occur once per day - the backup window is defined on the instance, or you can allow AWS to pick one at random

If you're using Single-AZ, make sure it happens during periods of little to no use, as this can impact your application (I/O pause).
If you're using Multi-AZ, this occurs on the StandBy, so it won't have any noticeable effect

-The first Snapshot is a FULL copy of the data stored within the instance, then onward is Incremental (data which has changed since the last snapshot)

-Stored in S3, but they use AWS-managed buckets

Every five minutes, database transaction logs are written to S3.

Transaction logs store the actual operations which change the data, so operations which are executed on the database. Together with the Snapshots, this means that a database can be restored to a point-in-time with a five-minute granularity.

-A five minute, Recovery Point Objective (RPO), can be reached

-Automated backups aren’t retained indefinitely - They are automatically cleared by AWS

For a given RDS instance, you can set a retention period from 0 to 35 days (0 means automated backups are disabled; 35 days means that you can restore to any point in time over that 35-day period, using the Snapshots and transaction logs)

-Any data, older than 35-days, is automatically REMOVED ***

-If you delete the database, you can choose to retain any automated backups, but they still expire based on the retention period

  • The way to maintain the contents of an RDS instance past this 35-day max retention period is, when you delete the RDS instance, to create a final snapshot; this snapshot is fully under your control and has to be manually deleted as required
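
The retention period is a per-instance setting; a sketch (0 disables automated backups, 35 days is the maximum):

import boto3

rds = boto3.client("rds")

rds.modify_db_instance(
    DBInstanceIdentifier="example-mysql",  # hypothetical
    BackupRetentionPeriod=35,              # 0-35 days; 0 disables automated backups
    ApplyImmediately=True,
)
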
23
Q

RDS - Backups - Cross-Region

A

-RDS can replicate backups to another AWS region

-Means both Snapshots and Transaction Logs

-Charges apply for both the cross-region data copy

-And any storage used in the destination region

-NOT DEFAULT - Has to be configured within automated backups

24
Q

RDS - Restores

A

-It creates a NEW RDS instance - This matters because you will need to update applications to use the new database endpoint address, because it will be different than the existing one.

-When you restore a Manual Snapshot - Restoring a single point in time, it’s fixed to the time that the Snapshot was created (Influences the RPO)

-When you restore using Automated Backups - any 5-minute point in time

With these, you can choose a specific point to restore the database to, and this offers substantial improvements to RPO. You can choose to restore to a time minutes before a failure.

-Backups are restored and transaction logs are “replayed” to bring DB to desired point in time (GOOD RPO)

-Restoring a Snapshot isn't a fast process - think about RTO (Read Replicas can help)

So RDS Automated Backups are great as a recovery from failure, or as a restoration method for any corruption, but they take time to perform a restore, so account for this within your RTO planning.
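
Because restores create a NEW instance, a point-in-time restore looks like this (identifiers hypothetical; remember to repoint applications at the new endpoint afterwards):

import boto3
from datetime import datetime, timezone

rds = boto3.client("rds")

# Replays transaction logs on top of the nearest backup to reach the
# requested time, creating a brand new instance with a new endpoint.
rds.restore_db_instance_to_point_in_time(
    SourceDBInstanceIdentifier="example-mysql",         # hypothetical
    TargetDBInstanceIdentifier="example-mysql-restore",
    RestoreTime=datetime(2020, 3, 18, 13, 55, tzinfo=timezone.utc),
)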

25
Q

RDS - Read Replicas

A

Read Replicas are read only replicas of an RDS instance.

Unlike MultiAZ instance mode, where you can't by default use the StandBy replica for anything, you can use Read Replicas, but only for read operations. MultiAZ cluster mode is like a combination of the old MultiAZ instance mode together with Read Replicas.

You have to think of Read Replicas as separate things. They aren't part of the main database instance in any way. They have their own database endpoint address, and so applications need to be adjusted to use them. An application, say WordPress, using an RDS instance will have zero knowledge of any read replicas by default.

Without application support, Read Replicas do nothing. They aren’t functional from a usage perspective. There’s no automatic failover. They just exist off to one side.

-MultiAZ is kept in sync using synchronous replication: when data is written to the primary instance, at the same time as storing that data on disk on the primary, it's replicated to the standby.

-With Asynchronous replication, data is written to the primary first, at which point it's viewed as committed. Then, after that, it's replicated to the read replicas.

-For the exam, for any RDS questions (excluding Aurora), remember that synchronous = MultiAZ / Asynchronous = Read Replicas ***

-Read Replicas can be created in the same region, as the primary instance, or in other AWS regions, known as “Cross-Region” Read Replicas ***

-If you create a cross-region read replica, then AWS handles all of the networking between regions; this occurs transparently to you and is fully encrypted in transit ***
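
A sketch of creating a read replica, including the cross-region case (identifiers hypothetical):

import boto3

# Create the replica in the region it should live in; for cross-region,
# SourceDBInstanceIdentifier must be the source instance's ARN.
rds = boto3.client("rds", region_name="us-west-2")

rds.create_db_instance_read_replica(
    DBInstanceIdentifier="example-mysql-rr1",  # hypothetical
    SourceDBInstanceIdentifier="arn:aws:rds:us-east-1:111122223333:db:example-mysql",
)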

26
Q

Why do Read Replicas matter?

A

(Read) Performance and Scaling Improvements

-You can create 5 direct read-replicas per DB instance

-Each of these, provides an additional instance of read performance - this offers a simple way of scaling out, your read performance on a DB

-Read-Replicas can have their own read replicas - BUT LAG STARTS TO BE A PROBLEM

Because asynchronous replication is used, there can be a lag between the main database instance and any read replicas, and if you then create read replicas of read replicas, this lag becomes more of a problem.

-Can help you with Global performance improvements

So if you have other read workloads in other AWS regions, then these workloads can directly connect to read replicas, and not impact the performance of the primary instance in any way.

RPO/RTO Improvements

-Snapshots and Backups improve RPOs

The more frequently backups occur, and the better those backups are, the better the RPO, because it limits the amount of data which can be lost - but it doesn't really help with recovery time objectives.

-RTOs are a problem

Because restoring snapshots takes a long time, especially for large databases.

-RR’s offer a near zero RPO

Because the data that's on the RR is synced from the main database instance, there's very little potential for data loss, assuming we're not dealing with data corruption.

-RR’s can be promoted quickly - low RTO

So in a disaster scenario where you have a major problem with your RDS instance, you can promote a RR, and this is a really quick process; but you should only look at using RRs in disaster recovery scenarios when you are recovering from a failure.

-FAILURE ONLY - watch for data corruption

If you are recovering from data corruption, then logically the read replica will probably have a replica of that corrupted data.

-READ ONLY - UNTIL PROMOTED

-Easy way to achieve Global Availability improvements - Global Resilience

Because you can create a cross-region RR in another AWS region and use this as a failover region, if AWS ever has a major regional issue.
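
Promotion is what turns a read-only replica into a standalone read/write instance during disaster recovery; a sketch:

import boto3

rds = boto3.client("rds")

# After promotion the replica stops replicating and becomes a normal
# standalone instance - applications must be repointed at it explicitly.
rds.promote_read_replica(DBInstanceIdentifier="example-mysql-rr1")  # hypothetical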

27
Q

RDS - Data Security

A

-SSL/TLS (in transit) is available for RDS, and can be made mandatory - protects the data between the client and the RDS instance ***

-RDS supports EBS volume encryption - KMS (at rest) ***

-Handled by the RDS HOST/EBS

-AWS or Customer Managed CMK generates DATA KEYS - KMS

-These Data Keys (DEKs) are used for the actual encryption operations

-Storage, Logs, Snapshots & Replicas are encrypted using the same CMK

-Encryption can’t be removed ***

-RDS MS-SQL and RDS Oracle Support TDE ***

-TDE = Transparent Data Encryption - encryption which is supported and handled within the database engine itself (data is encrypted/decrypted by the engine) (less trust needed in the platform)

-RDS Oracle supports integration with CloudHSM ***

-Much stronger key controls (even from AWS) ↑↑↑ - Because CloudHSM is managed by you, with no key exposure to AWS

28
Q

RDS - KMS Encryption & TDE

A

-With RDS Oracle - keys can be provided via CloudHSM - removing AWS from the chain of trust

-TDE is native DB Engine encryption - Data is encrypted before leaving the instance, with AWS having no exposure outside of the RDS instance

-KMS provides AWS or CMKs, which are used to generate DEKs for RDS

These DEKs are loaded onto the RDS hosts as needed and are used by the Host, to perform the encryption or decryption operations.

This means the database engine doesn't need to natively support encryption or decryption. It has no encryption awareness; from its perspective, it's writing data as normal, and it's encrypted by the Host before being sent on to EBS in its final encrypted format.

-Data that's transferred between replicas is also encrypted, as are any snapshots of the RDS EBS volumes - and these use the same encryption key (KMS)
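
KMS-based encryption is selected at creation time and can't be removed later; a sketch with a hypothetical CMK alias:

import boto3

rds = boto3.client("rds")

rds.create_db_instance(
    DBInstanceIdentifier="example-encrypted",  # hypothetical
    Engine="mysql",
    DBInstanceClass="db.t3.micro",
    MasterUsername="admin",
    MasterUserPassword="example-password-123",
    AllocatedStorage=20,
    StorageEncrypted=True,             # can't be switched off later
    KmsKeyId="alias/example-rds-cmk",  # hypothetical CMK; omit to use the AWS managed key
)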

29
Q

RDS - IAM Authentication

A

-Normally, logins to RDS are controlled using local database users - these have their own usernames and passwords, they’re not IAM users and are outside of the control of AWS.

-One gets created when you provision an RDS instance, but that’s it

-You can configure RDS to allow IAM user authentication against a database

-RDS Local DB Account is configured to use AWS Authentication Token

We have IAM users and roles - in this case an instance role - and attached to those roles and users are policies. These policies contain a mapping between that IAM entity (the user or role) and a local RDS database user.

-Policy attached to Users and Roles maps that IAM identity onto the local RDS user ***

This allows those identities to run a "generate-db-auth-token" operation, which works with RDS and IAM, and based on the policies attached to the IAM identities, it generates a token with a 15-minute validity.

This token can then be used to log in to the database user within RDS, without requiring a password.

-Authorization is controlled by the DB Engine - permissions are assigned to the local DB user. IAM IS NOT USED TO AUTHORISE, only to authenticate. ***
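
The 15-minute token is generated client-side and used in place of a password; a sketch with hypothetical connection details:

import boto3

rds = boto3.client("rds")

# Valid for 15 minutes; pass it as the password when connecting with a
# driver configured for IAM authentication (SSL/TLS is required).
token = rds.generate_db_auth_token(
    DBHostname="example-mysql.abc123.us-east-1.rds.amazonaws.com",  # hypothetical
    Port=3306,
    DBUsername="app_user",  # the local DB user configured for AWS auth
)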

30
Q

RDS Custom

A

-Fills the gap between RDS and EC2 running a DB Engine

-RDS is a fully managed database server - OS/Engine access is limited

-It gives you access to databases running on a database server, which is fully managed by AWS and so any OS or engine access is limited, using the main RDS product

-DB on EC2 is self-managed - but has overhead, because done this way, you're responsible for everything from the OS upwards

-Currently works for MS-SQL and Oracle

-Can connect using SSH, RDP, Session Manager and actually get access to the O.S and Database Engine

-Runs within your AWS account - Unlike normal RDS, where if you look in your account, you won't see any EC2 instance, EBS volumes or backups within S3, because they're all occurring within an AWS-managed environment

With RDS, the networking works by injecting elastic network interfaces (ENIs) into your VPC. That’s how you get access to the RDS instance.

With RDS Custom, you will see an EC2 instance, EBS volumes, and Backups inside your AWS account.

  • RDS Custom Database Automation - If you need to perform customization of RDS Custom settings

-To ensure that you have no disruptions caused by the RDS automation while you're performing customizations, you need to PAUSE database automation, perform the customizations, and then resume automation.

-Resume (full automation) for normal full automation

31
Q

Amazon Aurora - Architecture - Key Differences

A

-Very different from RDS

-Uses a "Cluster" - A Cluster is made up of a number of important things:

-A single primary instance + 0 or more replicas

The replicas within Aurora can be used for reads during normal operations, so it's not like the standby replica inside RDS.

The replicas inside Aurora can actually provide the benefits of both RDS MultiAZ and RDS read replicas. So they can be inside a Cluster and they can be used to improve availability, but also they can be used for read operations, during the normal operation of a cluster.

–You don’t have to choose between read scaling and availability ***

-NO LOCAL STORAGE - uses Cluster Volume

Instead, an Aurora Cluster has a shared cluster volume. This is storage which is shared and available to all compute instances within a cluster. This provides a few benefits, such as…

-Faster provisioning & Improved availability & Performance

32
Q

Aurora Storage - Architecture

A

-It functions across a number of AZs

-Inside the cluster is a primary instance and optionally a number of replicas. They function as failover options if the primary instance fails, but they can also be used during normal functioning of the cluster for read operations from applications

-The cluster has shared storage which is SSD-based, and it has a maximum size of 128TB and it also has 6 replicas, across multiple AZs.

-When data is written to the primary DB instance, Aurora synchronously replicates that data across all 6 storage nodes spread across the AZs which are associated with your Cluster

-All instances inside your Cluster (the primary and all of the replicas) have access to all of these storage nodes

-Replication happens at the storage level ***

So no extra resources are consumed on the instances or the replicas during this replication process.

-By default, the primary instance is the ONLY instance able to WRITE to the storage and the replicas and the primary can perform READ operations.

-Aurora automatically detects failures in the disk volumes that make up the cluster shared storage ***

-When a segment or a part of a disk volume fails, Aurora immediately repairs that area of disk

When Aurora does this, it uses the data inside the other storage nodes that make up the cluster volume and automatically recreates that data, ensuring the data is brought back into an operational state with no corruption.

-Aurora avoids data loss and reduces any need to perform point-in-time restores or snapshot restores to recover from disk failures ***

-MUCH MORE RESILIENT THAN RDS ***

-With Aurora you can have up to 15 replicas, and any of them can be the failover target for a failover operation (quicker than RDS) ***
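
Aurora's cluster-plus-instances split shows up directly in the API: you create the cluster (which owns the shared storage), then add compute instances to it; a hedged sketch with illustrative identifiers:

import boto3

rds = boto3.client("rds")

# The cluster owns the shared cluster volume (6 storage nodes across AZs).
rds.create_db_cluster(
    DBClusterIdentifier="example-aurora",  # hypothetical
    Engine="aurora-mysql",
    MasterUsername="admin",
    MasterUserPassword="example-password-123",
)

# Compute is added separately - no storage provisioning needed, because
# the storage belongs to the cluster, not the instances.
rds.create_db_instance(
    DBInstanceIdentifier="example-aurora-1",
    DBClusterIdentifier="example-aurora",
    Engine="aurora-mysql",
    DBInstanceClass="db.r5.large",
)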

33
Q

Now as well as the Resiliency that the Cluster volume provides, there are a few other key elements that you should be aware of:

A

-ALL SSD-based - high IOPS, low latency

-Storage is billed simply based on what's used - you don't have to allocate the storage that the Cluster uses

-High water mark - Billed for the most used (Is being changed by AWS)

So if you consume 50GB of storage, you are billed for 50GB. If you free up 10GB, so move down to 40GB of consumed data, you're still billed for that "high water mark" of 50GB.

-Storage which is freed up can be re-used

-If you want to reduce costs on storage, then you need to create a brand new cluster and migrate the data

-Replicas can be added and removed without requiring storage provisioning - because the storage is for the Cluster, not for the instances

Which massively improves the speed and efficiency of any replica changes within the Cluster.

-Aurora Clusters, like RDS clusters, use Endpoints - DNS addresses which are used to connect to the Cluster

Unlike RDS, Aurora Clusters have multiple Endpoints available for an application. As a minimum you have the Cluster Endpoint and the Reader Endpoint.

-The Cluster Endpoint always points at the primary instance, for write/read operations
-The Reader Endpoint will point at the primary instance if that's all there is, but if there are replicas, then the reader endpoint will load balance across all of the available replicas, and this can be used for read operations.

It's much easier to manage read scaling using Aurora versus RDS, because as you add additional replicas which can be used for reads, this Reader Endpoint is automatically updated to load balance across the new replicas.

-You can also create Custom Endpoints

-In addition, each instance (Primary and Replicas) has its own unique endpoint

So Aurora allows for a much more custom and complex architecture versus RDS

34
Q

Aurora - Cost

A

-No free-tier option

-Aurora doesn’t support Micro Instances

-Beyond RDS singleAZ (micro) - Aurora offers better value

-Compute - hourly charge, per second, 10 minute minimum

-Storage - GB-Month consumed, IO cost per request

-100% of the DB size in backups is included

So if your database cluster is 100GiB, then you're given 100GiB of storage for backups as part of what you pay for that cluster.

35
Q

Aurora Restore, Clone & Backtrack

A

-Backups in Aurora work in the same way as RDS - so the normal backup features: automatic backups and manual snapshot backups

-Restores create a new cluster

-Backtrack can be used which allow IN-PLACE REWINDS to a previous point in time

It needs to be enabled on a per-cluster basis, and it will allow you to roll back your database.

-Fast Clones make a new database MUCH faster than copying all the data - COPY-ON-WRITE

It doesn't make a one-for-one copy of the storage for that database. What it does is reference the original storage, and it only stores the differences between the two. Differences can be that you update the storage in your cloned database, or that the data is updated in the original database, which means that your clone needs a copy of that data before it was changed on the source.

Essentially your cloned database only uses a tiny amount of storage. It only stores data that's changed in the clone or changed in the original after you make the clone.
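
Both features are cluster-level API operations; a sketch (identifiers hypothetical, and Backtrack must already be enabled on the cluster):

import boto3
from datetime import datetime, timezone

rds = boto3.client("rds")

# Backtrack: in-place rewind of the existing cluster (no new cluster).
rds.backtrack_db_cluster(
    DBClusterIdentifier="example-aurora",  # hypothetical
    BacktrackTo=datetime(2020, 3, 18, 13, 0, tzinfo=timezone.utc),
)

# Fast clone: copy-on-write, so the new cluster references the source
# storage and only stores differences made after the clone.
rds.restore_db_cluster_to_point_in_time(
    SourceDBClusterIdentifier="example-aurora",
    DBClusterIdentifier="example-aurora-clone",
    RestoreType="copy-on-write",
    UseLatestRestorableTime=True,
)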

36
Q

Aurora Serverless

A

Is a service which is to Aurora what Fargate is to ECS. It provides a version of the Aurora database product where you don't need to statically provision database instances of a certain size, or worry about managing those database instances.

-It removes one more piece of admin overhead of managing individual database instances.

Concepts

-Scalable - ACU - Aurora Capacity Units (You still create a cluster)

Capacity Units represent a certain amount of compute and a corresponding amount of memory.

-For a cluster, you can set minimum and maximum values and Aurora Serverless will scale between those values, adding or removing capacity based on load

-Cluster adjusts based on load

-Can go to 0 and be paused

-Consumption billing per-second basis

-Same resilience as Aurora (6 copies across AZs)

Benefits

-Removes the complexity of managing database instances and capacity

-Easier to scale, with no disruption to client connections

-Cost-effective - you only pay for the database resources that you consume on a per-second basis
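
A sketch of creating an Aurora Serverless (v1-style) cluster with min/max ACUs and auto-pause; all values are illustrative:

import boto3

rds = boto3.client("rds")

rds.create_db_cluster(
    DBClusterIdentifier="example-serverless",  # hypothetical
    Engine="aurora-mysql",
    EngineMode="serverless",
    MasterUsername="admin",
    MasterUserPassword="example-password-123",
    ScalingConfiguration={
        "MinCapacity": 1,               # ACUs
        "MaxCapacity": 16,
        "AutoPause": True,              # scale to 0 when idle...
        "SecondsUntilAutoPause": 300,   # ...after 5 minutes of no load
    },
)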

37
Q

Aurora Serverless - Architecture

A

-The Aurora Serverless Cluster, has the same cluster volume architecture, which Aurora provisioned uses.

-In an Aurora Serverless Cluster though, instead of using provisioned servers, we have ACUs, which are Aurora Capacity Units.

-These capacity units are actually allocated from a warm pool of Aurora Capacity Units managed by AWS.

-The ACUs are stateless, they're shared across many AWS customers, and they have no local storage. So they can be allocated to your Aurora Serverless Cluster rapidly when required

-When these ACUs are allocated to the cluster, they have access to the cluster storage, in the same way that a provisioned Aurora instance would have access to the storage.

-If the load on an Aurora Serverless Cluster increases beyond the capacity units which are being used and assuming the maximum capacity setting of the cluster allows it, then more ACUs will be allocated to the cluster

-Once the compute resource which represents this new potentially bigger ACU is active, then any old compute resources representing unused capacity, can be deallocated from your Aurora Serverless Cluster

-Because of the ACU architecture, where the number of ACUs is dynamically increased and decreased based on load, the way that connections are managed within an Aurora Serverless Cluster has to be slightly more complex versus a provisioned cluster

In an Aurora Serverless Cluster, we have a shared proxy fleet, which is managed by AWS. This happens transparently to you as a user of an Aurora Serverless Cluster, but if a user interacts with the cluster via an application, it actually goes via this proxy fleet. Any of the proxy fleet instances can be used, and they will broker a connection between the application and the Aurora Capacity Units.

Because the client application is never directly connecting to the compute resource that provides an ACU, the scaling can be fluid, and it can scale in or out without causing any disruptions.

38
Q

Aurora Serverless - Use Cases

A

-Infrequently used applications - maybe a low volume blog site, where connections are only attempted for a few minutes, several times per day.

-New applications - If you're deploying an application where you're unsure about the levels of load that will be placed on it, and so unsure about the size of the database instance that you'll need. With Aurora provisioned, you would still need to provision that in advance and potentially change it.

-Variable workloads - If you're running a normally lightly used application which has peaks, maybe 30 minutes out of an hour, or on certain days of the week during sale periods.

-Unpredictable workloads

-Development and Test databases - Aurora Serverless can be configured to pause itself during periods of no load, and during the pause you're only billed for the storage.

-Multi-tenant applications - If you've got an application where you're billing a user a set dollar amount per month per license, your incoming load is directly aligned to your incoming revenue.

-You don't mind if a database supporting your product scales up and costs you more, if you also get more customer revenue

39
Q

Aurora Global Database

A

Global databases allow you to create global-level replication using Aurora, from a master region to up to 5 secondary AWS regions.

-The Primary Region offers similar functionality to a normal Aurora Cluster: it has one read/write instance and up to 15 read-only replicas in that cluster.

-Secondary Regions can have up to 16 read-only replicas. The entire secondary cluster is read-only

-The replication between those regions occurs at the storage layer, and replication is typically within 1 second from the primary to all of the secondaries.

-Applications can use the primary instance for write operations, and then the replicas in the primary or the replicas in the secondary regions for read operations.

Use Cases

-Cross-Region DR and BC (Disaster Recovery and Business Continuity) - Because of the ~1 second replication, RPO and RTO values are really low if you do perform a cross-region failover

-Global Read Scaling - If you want to offer low latency performance in any international areas where you have customers.

  • ~1 second or less replication between regions - It's one-way replication

-Replication has NO IMPACT on DB performance

-Secondary regions can have up to 16 replicas

-All of these can be promoted to R/W

-Currently MAX 5 secondary regions

40
Q

Aurora Multi-Master Writes

A

This feature allows an Aurora Cluster to have multiple instances which are capable of performing both reads and writes.

This is in contrast with default mode for Aurora, which only allows one writer and many readers.

-Default Aurora mode is Single-Master - One R/W and 0+ Read Only Replicas

-Cluster endpoint is used to write, read endpoint is used for load balanced reads

-Failover takes time - replica promoted to R/W (Single-Master mode)

-In Multi-Master mode all instances are R/W

41
Q

Aurora Multi-Master Writes - Architecture

A

-A Multi-Master Aurora Cluster might seem similar to a single-master one. The same cluster structure exists, the same shared storage.

-Multiple Aurora provisioned instances also exist in the cluster.

-The differences start with the fact that there is no Cluster Endpoint to use. An application is responsible for connecting to instances within the cluster. There's no load balancing across instances within a multi-master cluster. The application connects to one or all of the instances in the cluster and initiates operations directly.

-The way this architecture works is that when one of the read/write nodes inside a multi-master cluster receives a write request from the application, it immediately proposes that the data be committed to all of the storage nodes in that cluster.

So it’s proposing that the data that it receives to write is committed to storage.

-At this point, each node that makes up the cluster either confirms or rejects the proposed change

It rejects it, if it conflicts with something that’s already in flight. For example, another change from another application writing to another read-write instance inside the cluster.

What the writing instance is looking for, is a quorum of nodes to agree. A quorum of nodes that allow it to write that data. At which point, it can commit the change to the shared storage.

Assuming that it can get a quorum to agree to write, then that write is committed to storage and it’s replicated across every storage node in the cluster.

-With a Multi-Master Cluster, that change is then replicated to other nodes in the cluster. This means those other writers can add the updated data into their in-memory caches.

This means that any reads, from any other instances in the cluster, will be consistent with the data that’s stored on the shared storage. Because instances cache data, we need to make sure in addition to committing it to disk, it’s also updated inside any in-memory caches of any other instances within the Multi-Master Cluster.

Once an instance has agreement to commit a change to the cluster shared storage, it replicates that change to the other instances in the cluster. Each of those updates its in-memory cache, so if that instance is used for any read operations, it always has access to the up-to-date data.

42
Q

Aurora Single-Master & Multi-Master

A

Single-Master

-The configuration change to make one of the other replicas the new primary instance inside the cluster is not an immediate change - it causes disruption

-Not Fault-Tolerant

Multi-Master

-With Multi-Master, both instances are able to write to the shared storage

-The application can connect with one or both of them

-The application could maintain connection to both and be ready to act if one of them fails

-When a writer fails, it could immediately send 100% of any future data operations to the writer which is working perfectly

-There would be little, if any disruption

-Fault-Tolerant

Benefits

-Offers much faster availability

-The fail over events can be performed inside the application, and it doesn’t even disrupt traffic, between the application and the database

-Implements fault-tolerance, but the application logic needs to manually load balance across the instances

43
Q

RDS Proxy

A

Is a fully managed, highly available database proxy for RDS that makes applications more scalable, more resilient to database failures, and more secure.

Why use it?

-Opening and Closing connections consume resources

-…It takes time which creates latency

-With serverless… every Lambda invocation opens and closes a connection

-Handling failure of database instances is HARD - doing it within your application ADDS RISK

-DB Proxies can help, but maybe you don’t have any database proxy experience, and even if you do, can you manage them at scale?

That's where RDS Proxy adds value. What RDS Proxy (or indeed any database proxy) does is change your architecture. Instead of your applications connecting to a database every time they use it, they instead connect to a proxy.

-Application(s) > Proxy (connection pooling) => Database

The proxy maintains a pool of connections to the database, which are open for the long term. Then any connections to the proxy can use this already established pool of database connections. It can also do multiplexing, where it can maintain a smaller number of connections to a database versus the connections to the proxy.

It multiplexes requests over the connection pool between the proxy and the database, so you can have a smaller number of actual connections to the database versus the connections to the database proxy.

44
Q

RDS Proxy - Architecture

A

-It runs within a VPC, across all AZs.

-Maintains a Long Term Connection Pool *** - in this example, to the primary node of the database running in AZ B.

-EC2 instances and Lambda functions connect to the RDS Proxy rather than directly to the database instances - MUCH quicker to establish vs direct to database ***

-RDS Proxy connections to the DB instance can be reused… avoiding the lag of establishment, usage & termination for each invocation - Multiplexing is USED ***, so that a smaller number of database connections can serve a larger number of client connections, which helps reduce the load placed on the database server.

-RDS Proxy abstracts clients away from DB failure or failover events ***

-Client-to-Proxy connections are established and WAIT, even if the target DB is unresponsive - as might occur during failover events from the primary to the standby
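
A hedged sketch of creating a proxy (ARNs and subnet IDs hypothetical); applications then connect to the proxy endpoint instead of the database endpoint:

import boto3

rds = boto3.client("rds")

rds.create_db_proxy(
    DBProxyName="example-proxy",  # hypothetical
    EngineFamily="MYSQL",
    Auth=[{
        "AuthScheme": "SECRETS",
        # Secret holding the DB credentials the proxy uses for its pool
        "SecretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
        "IAMAuth": "DISABLED",
    }],
    RoleArn="arn:aws:iam::111122223333:role/example-proxy-role",
    VpcSubnetIds=["subnet-aaaa1111", "subnet-bbbb2222"],
    RequireTLS=True,  # the proxy can enforce SSL/TLS
)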

45
Q

RDS Proxy - Use cases

A

-Too many connection errors

-DB instances using T2/T3 (smaller/burst) instances

-Useful when using AWS Lambda - time saved/connection reuse & IAM Auth

-Long running connections (SAAS apps) - Low latency

-Where resilience to database failure is priority

-… RDS Proxy can reduce the time for failover

-… and make it transparent to the application

46
Q

RDS Proxy - KEY FACTS

A

-Fully Managed DB Proxy for RDS/Aurora

-It’s auto scaling, highly available by default

-Provides connection pooling - reduces DB Load

–We avoid constantly opening and closing DB connections
–We can Multiplex, to use a lower number of connections between the Proxy and the DB, relative to the number of connections between the client and the Proxy

-ONLY accessible from a VPC

-Accessed via Proxy Endpoint - no app changes

-Can enforce SSL/TLS - ensure security of your applications

-Can reduce failover time by over 60%

-Abstracts failure away from your applications

47
Q

Database Migration Service (DMS)

A

-A managed database migration service

-Runs using a replication instance

-You need to define Source and Destination endpoints, which point at the source and target databases

-One of the Endpoints MUST be on AWS

-No downtime migration

Architecture

-Common DB Support, MySQL, Aurora, MS-SQL, MariaDB, MongoDB, PostgreSQL, Oracle, Azure SQL…

-Replication Instance = An EC2 instance which sits between the SRC and DST and runs migration software

-On this instance you can define replication tasks, and each replication instance can run multiple replication tasks. Tasks define all of the options relating to the migration.

-Replication instance performs the migration between SRC and DST endpoints which store connection information for SRC and Target databases

-A task moves data from the SRC database, using the details in the source endpoint, to the target database using the details stored in the destination endpoint configuration

-A task can be Full Load (one-off migration of all data), Full Load + CDC (Change Data Capture) for ongoing replication which captures changes, or CDC Only (only migrates data changes)

-Schema Conversion Tool (SCT) can assist with Schema Conversion
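
A sketch of defining a replication task once the replication instance and both endpoints exist (ARNs hypothetical):

import boto3
import json

dms = boto3.client("dms")

# Select every table in every schema; rules are illustrative.
table_mappings = json.dumps({"rules": [{
    "rule-type": "selection", "rule-id": "1", "rule-name": "1",
    "object-locator": {"schema-name": "%", "table-name": "%"},
    "rule-action": "include",
}]})

dms.create_replication_task(
    ReplicationTaskIdentifier="example-migration",
    SourceEndpointArn="arn:aws:dms:us-east-1:111122223333:endpoint:SRC",  # hypothetical
    TargetEndpointArn="arn:aws:dms:us-east-1:111122223333:endpoint:DST",  # hypothetical
    ReplicationInstanceArn="arn:aws:dms:us-east-1:111122223333:rep:RI",   # hypothetical
    MigrationType="full-load-and-cdc",  # or "full-load" / "cdc"
    TableMappings=table_mappings,
)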

48
Q

Schema Conversion Tool (SCT)

A

-SCT is used when converting one database engine to another ***

–On-premises MS-SQL -> RDS MySQL
–On-premises Oracle -> Aurora

-Another way of moving data between on-premises and AWS - DB -> S3 (Migrations using DMS)

-SCT is NOT USED when migrating between DBs of the same type - e.g. On-premises MySQL -> RDS MySQL

-Works with OLTP DB Types (MySQL,MS-SQL, Oracle)

-And OLAP (Teradata, Oracle, Vertica, Greenplum)

49
Q

DMS & Snowball

A

-Larger migrations might be multi-TB in size

-It’s not optimal to move data over networks, because it takes time and consumes capacity

-Step 1: Use SCT to extract data locally and move to a snowball device

-Step 2: Ship the device back to AWS. They load onto an S3 bucket

-Step 3: DMS migrates from S3 into a target store

-Step 4: Change Data Capture (CDC) can capture changes, and via the S3 intermediary, they are also written to the target database