Database Specialty - Aurora Flashcards
Aurora is compatible with which relational databases?
MySQL and PostgreSQL
How much faster is Aurora than standard MySQL?
Up to 5x faster
How much faster is Aurora than standard PostgreSQL?
Up to 3x faster
How many copies of the data does Aurora maintain?
6 copies across 3 AZs
Backtrack is a fast alternative to…
PITR (point-in-time recovery), rewinding the cluster in place without restoring from a backup
Aurora DB cluster endpoint types
Writer Endpoint (points to the master) and Reader Endpoint (connection load balancing across read replicas)
Maximum number of custom endpoints
5
Automatic failover is performed for the…
master instance
Aurora Serverless automatically…
starts up, shuts down, scales up/down based on application needs
Aurora Serverless cost compared to RDS
Roughly 40% lower overall cost
What happens in case of a cluster or AZ failure?
Aurora recreates the DB instance in another AZ
Ways to run queries using the Data API
Query Editor in the RDS console, AWS CLI, AWS SDK
With the Data API, what does Lambda need to access VPC resources?
No VPC configuration is needed
Aurora Multi-Master failover typically results in…
zero downtime (every node can accept writes)
Global Aurora - Cross-Region Read Replicas: principal points
- useful for disaster recovery
- simple to put in place
- replica promotion can take a few minutes depending on workload
Aurora Global Databases (recommended): principal points
- Primary Region (read / write)
- up to 5 secondary (read-only) regions
- up to 16 read replicas per secondary region
- helps decrease read latency for global users
- another region can be promoted (for DR)
Reliability features in Aurora: Storage Auto-Repair principal points
- automatically detects and repairs disk volume failures in the cluster volume
- quorum model ensures that there is no data loss due to disk failures
Principal points about Survivable Page Cache (cache warming)
- Aurora page cache is managed in a separate process from the DB
- Page cache stores pages for known common queries
- Every time Aurora starts/restarts, it preloads the buffer pool cache from the page cache
- Eliminates the need to warm up the buffer cache
Principal points Crash Recovery
- Designed to recover from crashes almost instantaneously
- DOES NOT need to replay the redo log from the last DB checkpoint
- DOES NOT need binary logs for replication within cluster or for PITR
- Binary logging on Aurora directly affects the recovery time after a crash
- Higher the binlog data size, longer it takes for crash recovery
- Disable binary logging (binlog_format=OFF) to reduce recovery time
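As a quick sanity check, you can verify the effective setting from a client session on Aurora MySQL (a sketch; binlog_format itself must be changed via the DB cluster parameter group, not via SQL):

```sql
-- Verify whether binary logging is enabled on Aurora MySQL
SHOW VARIABLES LIKE 'binlog_format';
-- A value of OFF means binary logging is disabled,
-- keeping crash recovery time to a minimum
```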
Principal points: Cluster Cache Management (CCM)
- the buffer cache reduces disk I/O in an RDBMS
- cached content on the primary and a replica may differ
- after a failover from the primary to a replica, the promoted replica takes some time to warm up its cache, causing slower response times post failover
- CCM improves the performance of the promoted instance post failover
- with CCM, the designated replica preemptively reads frequently accessed buffers cached on the primary
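On Aurora PostgreSQL, CCM is enabled with the apg_ccm_enabled cluster parameter, and warming progress can be checked from SQL. A minimal sketch, assuming an Aurora PostgreSQL cluster with CCM configured:

```sql
-- Confirm Cluster Cache Management is turned on
SHOW apg_ccm_enabled;

-- Monitor cache warming on the designated reader
-- (aurora_ccm_status reports buffers sent from the primary)
SELECT * FROM aurora_ccm_status();
```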
Simulating Fault Tolerance in Aurora points
- Two ways to test/simulate fault tolerance (Manual failover, Fault injection queries)
- Fault tolerance is synonymous with resiliency (or fault resiliency)
- You can use these options to simulate AZ Failure
- Can perform primary upgrade by force failover
Simulating fault tolerance w/ manual failover: points
- Select the master instance and choose Actions -> Failover (or use the failover-db-cluster CLI command)
- Failover to the read replica with the highest failover priority is triggered; that replica becomes the new master
- The master instance that failed over becomes a replica when it comes back online
- Each instance has its own endpoint address, so clean up and re-establish any existing connections that use the old endpoints after a failover
Simulating fault tolerance w/ fault injection queries: points
- Fault injection queries are issued as SQL commands
- You can schedule a simulated occurrence of different failure events:
  - writer/reader crash
  - replica failure
  - disk failure
  - disk congestion
Fault injection queries – writer / reader crash
ALTER SYSTEM CRASH
[ INSTANCE | DISPATCHER | NODE ];
- Instance = DB instance (default crash type)
- Dispatcher = writes updates to the cluster volume
- Node = Instance + Dispatcher
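For example, a minimal crash test against the instance you are connected to (the default crash type):

```sql
-- Crash the DB instance you are connected to; Aurora restarts it
-- automatically, so this mainly tests application reconnection logic
ALTER SYSTEM CRASH INSTANCE;
```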
Fault injection queries – replica failure
ALTER SYSTEM SIMULATE
percentage_of_failure PERCENT READ REPLICA FAILURE
[ TO ALL | TO "replica name" ]
FOR INTERVAL quantity
{ YEAR | QUARTER | MONTH | WEEK | DAY | HOUR | MINUTE |
SECOND };
- percentage_of_failure = % of requests to block
- TO ALL / TO = simulate failure of all or a specific replica
- quantity = duration of replica failure
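A concrete instantiation of the template above (the percentage and duration are illustrative):

```sql
-- Mark 25% of read requests to all replicas as failed for 1 minute
ALTER SYSTEM SIMULATE 25 PERCENT READ REPLICA FAILURE
TO ALL
FOR INTERVAL 1 MINUTE;
```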
Fault injection queries – disk failure
ALTER SYSTEM SIMULATE
percentage_of_failure PERCENT DISK FAILURE
[ IN DISK index | NODE index ]
FOR INTERVAL quantity
{ YEAR | QUARTER | MONTH | WEEK | DAY | HOUR |
MINUTE | SECOND };
- percentage_of_failure = % of the disk to mark as faulting
- DISK index = simulate failure of a specific logical block of data
- NODE index = simulate failure of a specific storage node
- quantity = duration of disk failure
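A concrete instantiation (the percentage, disk index, and duration are illustrative):

```sql
-- Mark 10% of logical disk index 2 as faulting for 30 seconds
ALTER SYSTEM SIMULATE 10 PERCENT DISK FAILURE
IN DISK 2
FOR INTERVAL 30 SECOND;
```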
Fault injection queries – disk congestion
ALTER SYSTEM SIMULATE
percentage_of_failure PERCENT DISK CONGESTION
BETWEEN minimum AND maximum MILLISECONDS
[ IN DISK index | NODE index ]
FOR INTERVAL quantity
{ YEAR | QUARTER | MONTH | WEEK | DAY | HOUR |
MINUTE | SECOND };
- percentage_of_failure = % of the disk to mark as congested
- DISK index / NODE index = simulate failure of a specific disk or node
- minimum / maximum = min and max congestion delay in milliseconds (a random number between the two is used)
- quantity = duration of disk congestion
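A concrete instantiation (all values are illustrative):

```sql
-- Simulate 40% of disk index 1 as congested, adding a random delay
-- between 5 and 20 ms per request, for 2 minutes
ALTER SYSTEM SIMULATE 40 PERCENT DISK CONGESTION
BETWEEN 5 AND 20 MILLISECONDS
IN DISK 1
FOR INTERVAL 2 MINUTE;
```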
Fast failover in Aurora PostgreSQL
- Use CCM (apg_ccm_enabled=1)
- Use cluster / reader / custom endpoint (instead of instance endpoints)
- Cleanup / re-establish connections, if using instance endpoints
- Add automatic retry capability to the application
- Aggressively set TCP keepalives (low values) to quickly close active connections if the client can no longer reach the DB
- Reduce the Java DNS cache timeout (low DNS TTL): the reader endpoint cycles through available readers, so if a client caches DNS info, requests may go to the old instance until the DNS cache times out
- Use separate connection objects for long-running and short-running queries
- Use Aurora PostgreSQL connection string with multiple hosts
Fast failover in Aurora PostgreSQL (contd.)
- Use a list of hosts in your JDBC connection string
- Or maintain a file containing the cluster endpoints (reader / writer); your application can read this file to populate the host section of the connection string
- The JDBC driver loops through all nodes on this list to find a valid connection
- Set connection parameters aggressively so your app doesn't wait too long on any host
- Alternatively, you can maintain a file containing instance endpoints
- When you add/remove nodes, you must update this file
Cluster Replication Options for Aurora MySQL
- Replication between clusters = can have more than 15 read replicas
- Replication options:
  - between two Aurora MySQL DB clusters in different regions (Cross-Region Replication)
  - between two Aurora MySQL DB clusters in the same region
  - between an RDS MySQL DB instance and an Aurora MySQL DB cluster
Cluster Replication Options for Aurora MySQL
Cross-Region Replication b/w two Aurora MySQL DB clusters
- enable binary logging (binlog_format parameter)
- then, create a cross-region read replica in another region
- you can promote the replica to a standalone DB cluster (typically for DR purposes)
Cluster Replication Options for Aurora MySQL
Replication b/w two Aurora MySQL DB clusters in the same region
- enable binary logging (binlog_format parameter) on the source
- then, replicate using a snapshot of the replication master
Cluster Replication Options for Aurora MySQL
Replication b/w RDS DB instance and an Aurora DB cluster
- By creating an Aurora read replica of RDS DB instance
- Typically used for migration to Aurora rather than ongoing replication
- To migrate, stop the writes on master. After replication lag is zero, promote the
Aurora replica as a standalone Aurora DB cluster
Invoke Lambda functions from Aurora MySQL
- Give Aurora MySQL access to Lambda by setting the DB cluster parameter aws_default_lambda_role = IAM role ARN
- Option 1 – Using the mysql.lambda_async procedure (deprecated)
  - Wrap calls in a stored procedure and call it through triggers or application code
- Option 2 – Using the native functions lambda_sync and lambda_async
  - User must have the INVOKE LAMBDA privilege
  - GRANT INVOKE LAMBDA ON *.* TO user@host
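A sketch of Option 2, assuming the cluster parameter and the INVOKE LAMBDA privilege are already in place (the function ARN and payload are placeholders):

```sql
-- Synchronous invocation: waits for the Lambda result
SELECT lambda_sync(
    'arn:aws:lambda:us-east-1:123456789012:function:my-function',
    '{"operation": "ping"}'
);

-- Asynchronous (fire-and-forget) invocation
SELECT lambda_async(
    'arn:aws:lambda:us-east-1:123456789012:function:my-function',
    '{"operation": "ping"}'
);
```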
Load data from S3 into Aurora MySQL
- Use SQL statements
- LOAD DATA FROM S3, or
- LOAD XML FROM S3
- Must give the Aurora cluster access to S3 by setting DB cluster parameter
- aurora_load_from_s3_role = IAM role ARN, or
- aws_default_s3_role = IAM role ARN
- User must have the LOAD FROM S3 privilege
  - GRANT LOAD FROM S3 ON *.* TO user@host
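A sketch of loading a CSV file, assuming the IAM role and privilege above are configured (the bucket, file, and table names are placeholders):

```sql
-- Load a CSV file from S3 into an Aurora MySQL table
LOAD DATA FROM S3 's3://my-bucket/data/users.csv'
INTO TABLE users
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
```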
RDS / Aurora – Good things to know
- Stopping an RDS DB
  - Can stop an RDS instance only if it does not have a replica
  - Cannot stop an RDS replica
  - Cannot stop an RDS for SQL Server DB instance if it's in Multi-AZ
- Stopping an Aurora DB
  - Can stop the cluster, not the individual instances
  - Cannot manually stop Aurora Serverless
  - Cannot stop an Aurora Multi-Master or Aurora Global DB cluster
  - Cannot stop a cluster if it uses parallel query
  - Cannot delete a stopped cluster without starting it first
- If you don't manually start your DB instance/cluster after seven days, it is started automatically
RDS / Aurora – Good things to know 2
- Maximum connections in RDS or Aurora are controlled via parameter groups
- Each DB engine has a formula for the default max connections value
- You can override this value using a custom parameter group
- Ideally, you'd want to scale up the instance to get higher max connections
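On Aurora MySQL you can inspect the effective limit from a session (a sketch; on Aurora PostgreSQL the equivalent is SHOW max_connections):

```sql
-- Show the current connection limit; the default value is computed
-- from the instance class memory via the parameter group formula
SHOW VARIABLES LIKE 'max_connections';
```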