Relational Database Service (RDS) Flashcards
Database Refresher
Databases are systems which store and manage data, but there are number of different types of database systems and crucial differences, between how data is physically stored on disk and how it’s managed on disk and in memory, as well as how the systems retrieve data and present it to the user.
Databases systems are very broadly split into relational and non-relational.
Relational (SQL) (RDBMSs)
-Structured Query Language (SQL)
SQL = Is a language which is used to store, update, and retrieve data. It’s known as the structured query language and it’s a feature of most relational database platform.
-Structure in & between tables of data - Rigid Schema (Structure of the database)
Rigid Schema = Means it’s defined in advanced before you put any data into the system.
Schema = Defines the names of things, valid values of things, and the types of data which are stored and where.
-Fixed Relationships between tables
Non-Relational (NoSQL)
-NoSQL - Is everything that doesn’t fit into the sequel mold - different models
-Generally a much more relaxed Schema
-Relationships between tables are handled different
Relational Data Example
-Data which relates together are stored within a table
-Every row in the table has to be uniquely identifiable (Primary Key > PK)
-The PK is unique in the table and every row of that table has to have a unique value (ID) for this attribute
-A Joint Table, makes it easy to have many to many relationships and it has a “Composite Key”, which is a key form of two parts. (they have to be unique)
-The keys in different tables, are how the relationships between the tables are defined
-Table Schemas and Relationships defined in Advance
The fact that this schema is so fixed and has to be declared in advance, makes it difficult for a sequel or a relational system, to store any data which has rapidly changing relationships. (Social Network)
Key-Value (NoSQL)
Key value databases consist of sets of keys and values. There’s generally no concept of structure, it’s just list of keys and value pairs.
Date/Time»_space;> Key Value «< Cookies eaten from the feeder during the previous 60min
2020-03-18 13:00 15
2020-03-18 14:00 30
2020-03-18 15:00 0
-As long as every single key is unique, then the value doesn’t matter, it has no real schema, nor does it have any real structure, because there are no tables or table relationships
-Some key value databases allow you to create separate lists of keys and values and present them as tables. But they’re only really used to divide data, there are no links between them
-Scalable - because sections of this data, could be split onto different servers
-Adjust really fast
-Only the key matters
-Also used for in-memory caching
Wide Column Store (NoSQL)
-Each row or item has on or more keys, generally one of them is called the partition key, and then optionally you can have additional keys.
Partition Key > | KEY1 | | KEY2 | < Range key
-DynamoDB is an example of this type of database
-Every item in a table has to have the same key layout, so that’s one key or more keys, and they just need to be unique to that table
-Wide column stores offer groupings of items called tables (not the same types of tables as relational db)
-Every item in a table can also have attributes, but they don’t have to be the same between items, so mix and matching attributes on different items or..
-No attribute schema - Any/All/Schema
-Every item inside a table has to use the same key structure and it has to have a unique key
-It’s very fast
-Super scalable
Document (NoSQL)
Designed to store and query data as documents. Documents are generally formatted using a structure, such as JSON or XML, but often the structure can be different, between documents in the same database.
-Each document is interacted with via an ID, that’s unique to the document
-The value of the document’s content, is exposed to the database, allowing you to interact with it
-Ideal Scenarios: Interacting with whole documents or deep (nested data) attribute interactions within a document structure
-The document model works well with use cases, such as Catalogs, User profiles, and lots of different content management systems, where each document is unique but it changes over time.
-Provide flexible indexing
Column (NoSQL)
Row Store (MySQL)
-Databases where you interact with data based on rows
-Rows are stored on disk together
-Ideal if you are operating with rows adding, updating, deleting
-Online Transaction Processing (OLTP) = For systems which are performing transactions
Column Store (Redshift)
-Databases where you interact with data based on columns
-It’s grouped together on disk based on column - so every order value is stored together, all grouped by the column that the data is in
-Ideal for reporting or when all values for a specific attribute (size) are required
-Since the whole column is stored on disk together, you could perform a query to retrieve all products sold during a period
Graph (NoSQL)
Relationships between things are formally defined and stored in the database itself, along with the data.
-Great for relationship driven data (Social Media or HR systems)
-There are nodes, they can have properties, which are simple key value pairs of data, and these are attached to the nodes.
-There are relationships between the nodes, which are known as “edges”
-Relationships themselves can have attached data, so name, value pairs/
-Can store a massive amount of complex relationships between data or between nodes, inside a database
-A query would run much quicker than with a SQL database
-Social media or systems with complex relationships = Graph Database
ACID vs BASE
-ACID and BASE are DB transaction models - This governs how the database system itself, is architected
-CAP Theorem - Consistency, Availability, Partition Tolertant (resilience) - Choose 2
Consistency = Means that every read to a database, will receive the most recent write or it will get an error
Availability = Means that every request, will receive an non error response, but without the guarantee, that it contains the most recent write
Partition Toletant = Means that the system can be made of multple network partitions, and the system continous to operate, even if there are a number of dropped messages or errors, between these network nodes.
Cap Theorem states that any database product is only capable of delivering a maximum of two of these different factors. Imagine if communication fails, between some of the nodes or if any of the nodes fail.
You have two choices if somebody reads from that database: You can cancel the operation and thus decrease the availability but ensure the consistency, or you can proceed with the operation and improve the availability, but risk the consistency.
-ACID = Consistency
-BASE = Availability
ACID
ACID means that transactions are atomic, consistent, isolated and durable.
-If you see ACID mentioned, it’s probably referring to RDS ***
-ACID limits the ability of a database to scale ***
-Atomic = ALL or NO components of a transaction SUCCEEDS or FAILS
-Consistent = Transactions move the database, from one valid state to another - nothing in-between is allowed
-Isolated = If multiple transactions occur at once, they don’t interfere with each other - Each executes as if it’s only one
-Durable = Once a transaction has been committed, it will remain committed, even in the case of a system failure. Once succeeded, that data is stored somewhere that system failure or power failure, or the restart of a database server or node, won’t impact the data
-Most relational database platforms, use ACID-based transactions
BASE
Stands for Basically Available, Soft State, Eventually Consistent
-BASE modeled NoSQL databases, will ensure availability of data, by spreading and replicating the data, across all of the different nodes of that database ***
-Basically Available = READ and WRITE operations are available “as much as possible”, but without any consistency guarantees - “kinda” “maybe”
-Soft State = The database doesn’t enforce consistency, this is offloaded onto the application/user
-Eventually Consistent = If we wait long enough, reads from the system wil be consistent - doesn’t enforce inmediate consistency
-Highly scalable and can deliver high performance
-DynamoDB is an example of BASE-like way - It offers both eventually and immediately consistent reads ***
-DynamoDB also offers some additional features, which offer ACID functionality, such as DynamoDB transactions ***
Databases on EC2 - Why you might do it
-Access to the DB Instance OS
-Advanced DB Option Tuning (DBROOT)
-If the Vendor demands this level of access
-You want to run a DB or DB Version that AWS doesn’t provide
-Specific OS/DB Combination that AWS doesn’t provide
-Implement an architecture that AWS doesn’t provide (replication/resilience)
-Decision makers who “just want it”
Databases on EC2 - Why you shouldn’t do it
-Admin over head - managing EC2 and DBHost
Both of these require significant management effort, don’t underestimate the effort required, to keep an EC2 instance patched or keep a database host running, at a certain compatible level with your application. Whenever you perform upgrades or whenever you’re fault finding you need to do it, out of core usage hours which could mean additional time, stress and costs for staff to maintain both of these components.
-Backups / DR Management
So if you business has any disaster recovery planning running databases on EC2, adds a lot of additional complexity. Many of AWS has managed database products, include a lot of automation to remove a lot of this admin overhead.
-EC2 is running in a single AZ
If that zone fails access to the database could fail and you need to worry about taking EBS Snapshots or taking backups of the database, inside the database server and putting those on storage somewhere, maybe S3.
-Features - some of AWS DB products are amazing
-EC2 is ON or OFF - no serverless, no easy scaling
There are some AWS managed DB products, which can scale up or down, rapidly based on load. By running a DB product on EC2, you do limit you ability to scale and you do set a base minimum cost of whatever the hourly rate is for that particular size of EC2 instance.
-Replication - skills, setup time, monitoring & effectiveness
All of this tends to be handled by a lot of AWS’s managed DB products.
-Performance - AWS invest time into optimization & features
Relational Database Service (RDS)
-Database as a Service (DaaS) (NOT THE CASE)
DaaS is where you pay money and in return you get a database. This isn’t what RDS does, with RDS, you pay for and receive a database server, so it would be more accurate to call it…
-Database Server as a Service (DBSaaS)
It means that on this database server, or instance, which RDS provides, you can have multiple databases.
-Multiple databases on one DB Server (instance)
-RDS provides a managed version of a database server, that you might have on-premises, only with RDS, you don’t have to manage the hardware, the O.S or the installation, as well as much of the maintenance of the DB engine
-Choices of DB Engines (MySQL, MariaDB, PostgreSQL, Oracle, Microsoft SQL Server)
-Amazon Aurora is a different product
Aurora is a custom database engine and product, created by AWS, which has compatibility with some of the above engines, but it was designed entirely by AWS.
-Is a Managed service - NO ACCESS to O.S or SSH access*
RDS - Architecture
-RDS is a service that runs inside a VPC (NOT PUBLIC)
-It needs to operate in subnets within a VPC in a specific AWS Region
-RDS Subnet Group - Is something you create and is a list of subnets, which RDS can use for a given database instance or instances ***
-It picks at random, unless you indicate a specific preference, but it will put the primary and standby within different AZs
-RDS can be accessed from the VPC or any connected private networks (VPN or Direct Connect)
-RDS can be configured with public addressing allowing access from the public internet (in public subnets)
-If you want to split databases between different sets of subnets, then you need multiple DB subnet groups.
-RDS instances can have multiple databases on them ***
-Every RDS instance has it’s own dedicated storage, provided by EBS ***
-Primary instances replicate to the Standbys using Synchronous Replication ***
This means data is replicated to the standby, as soon as it’s received by the primary. The standby will have the same set of data as the primary.
-Read Replicas use Asynchronous Replication ***
They can be in the same Region, but also other AWS Regions. This can be used to scale read load or to add layers of resilience, if you ever need to recover in a different AWS Region.
-Backups & Snapshots to S3 ***
It’s to an AWS managed S3 Bucket, so you don’t see the Bucket within you account, but it does mean that data is replicated across multiple AZs in that Region. So if you have an AZ failure, backups will ensure that you data is safe.
If you use Multi-AZ mode, then backups occur from the standby instance, which means no negative performance impact **
RDS - Costs
Cost #1 - Instance Size & Type
Cost #2 - Multi-AZ or not (means more than one instance)
Cost #3 - Storage type & amount - per gig monthly fee (Storage based on EBS)
Cost #4 - Data Transferred - Cost per gig of data transfer, in & out of you DB instance, from or to the Internet and other AWS Regions
Cost #5 - Backups & Snapshots - You get the amount of storage that you pay for, for the DB instance in Snapshot storage for free
So if you have 2TB of storage, then that means 2TB of Snapshots for free. Beyond that, there is a cost and this cost is gig per month of storage. 1TB for one month is the same cost as 500GB for two months, so it’s a per GB month cost.
Cost #6 - Licensing (if applicable) - Based on using commercial DB engine types
RDS MultiAZ Instance Deployment (Availability)
RDS has a primary database instance, containing any databases that you create, and when you enable Multi-AZ mode, this primary instance is configured to replicate it’s data synchronously to a standby replica, which is running in another AZ. ***
-This mean that this standby, also has a copy of your databases.
In Multi-AZ instance mode, this replication is at the storage level. The exact method that RDS uses to do this replication depends on the database engine that you pick.
-MariaDB, MySQL, Oracle and PostgreSQL, use Amazon Failover Technology
-Microsoft SQL instances, use SQL Server database mirroring or AlwaysOn availability groups
-All accesses to the database, are via the database CNAME
This is a DNS name, which by default points at the primary database instance.
-With Multi-AZ instance architecture, you always access the primary database instance
-There’s no access to the standby, even for things like reads.
It’s job, it’s to simply sit there, until you have a failure scenario with the primary instance.
-Backups can occur from the standby, so data is moved to S3 and then replicated across multiple AZs, in that Region. ***
-Reads and Writes will occur to and form the primary instance ***
-In the event that anything happens to the primary instance, this will be detected by RDS and a failover will occur ***
Can be done manually or if you need to perform maintenance, but generally this will be an automatic process.
-In this scenario, the database CNAME changes, instead of pointing at the primary, it points at the standby, which becomes the new primary ***
Because this is a DNS change, it generally takes between 60 to 120 seconds, for this to occur, so there can be brief outages. This can be reduced by removing any DNS caching in your application for this specific DNS name.
If you do remove this caching, it means that the second RDS has finished the failover, and the DNS name has been updated, your application will use this name, which is now pointing at the new primary instance.
RDS - MultiAZ - Instance - Summarize
-Focuses on Availability
-Data is written to the Primary AND IMMEDIATELY Replicated to the StandBy before being viewed as Committed (Synchronous)
-MultiAZ does not come on the Free Tier - Extra cost for replica
-You ONLY have ONE StandBy replica ***
-The StandBy replica CAN’T BE USED for reads or writes
-StandBy’s job is to sit there and wait for a failover event
-Failover event can take from 60s to 120s
-Same Region ONLY - Different AZs in the same region
-Backups can be taken from StandBy to improve performance
-Failovers will occur for various different reasons: AZ Outage, Primary Failure, Manual Failure, Instance type change and Software Patching
-You can use failovers to move any consumers of you database onto a different instance, patch the instance which has no consumers and then flip it back
RDS MultiAZ Cluster Deployment
-RDS is capable of having one writer replicate to two reader instances ONLY, in different AZs
-Synchronous Replication to readers ***
-The difference between this mode and the instance mode, is that these readers are usable
-The writer is like the primary instance, and it can be used for writes and read operations
-The reader instances, unlike MultiAZ instance mode, these can be utilizedwhile they’re in this state, but ONLY for read operations
This will need application support, since you application needs to understand, that it can’t use the same instance for reads and writes, but it means that you can use this MultiAZ mode, to scale your read workloads, unlike MultiAZ instance mode.
-In terms of replication, data is sent to the writer and it’s viewed as being committed, when at least one of the readers confirms, that it’s been written ***
-At this point, it’s resilient across multiple AZs within that region
-In RDS MultiAZ mode, each instance still have it’s own local storage
You access the cluster, using with few endpoints types:
1st. The Cluster Endpoint (like the database CNAME) - Points at the writer. Used for reads, writes and administration ***
2nd. Reader Endpoint - Directs any reads at an available reader instance (In some cases to the writer instance)***
Applications can use the reader endpoint to balance their read operations, across readers within the cluster.
3rd. Instance Endpoint - Point at a specific instance. Generally these are used for testing/fault finding ***
Each instance in the cluster, get’s one of these. Generally it’s not recommended to use them directly, as it means any operations won’t be able to tolerate, the failure of an instance because they don’t switch over to anything, if there’s an instance failure.
RDS - MultiAZ - Cluster - Summarize
-1 Writer and 2 Reader DB instances (different AZs) (Higher level availability than Instance mode)
-Runs on much faster hardware, Graviton + local NVME SSD Storage
-Fast writes to local storage and then flushed through EBS
This gives you the benefit of local super fast storage, in addition to the availability and resilience benefits of EBS.
-Readers can be used for reads - allowing some read scaling
So if your applications support it, it means that you can set read operations to use the reader endpoint, which frees up capacity on the writer instance and allows your RDS implementation toscale to high levels of performance, versus any other mode of RDS.
-Replication is via transaction logs - more efficient (This also allows a faster failover)
-Failover is faster ~35s + any time required to apply transaction logs to the reader instances
-Writes are viewed as “committed” when 1 reader has confirmed