Databases Flashcards
RDS runs on VMs
T or F
T
RDS is serverless
T or F
F
it is not serverless
aurora serverless is serverless
T or F
T
read replicas are used for scaling, not DR
T or F
T
must have auto backups turned on in order to deploy a read replica
T or F
T
You can have up to ___ read replica copies of any DB
5
you can have read replicas of read replicas
T or F
T, but watch out for latency
read replica facts:
each read replica will have its own DNS end point
you can have read replicas that have multi AZ
you can create read replicas of multi az source databases
read replicas can be promoted to be their own DB. This breaks replication
you can have a read replica in a second region
yes
2 types of backups for rds
automated backups
database snapshots
yes
read replica facts
multi az
used to increase performance
must have backups turned on
can be in different regions
can be mysql, postgres, mariadb, oracle, aurora
can be promoted to master; this breaks replication
yes
multi az tips
used for DR
you can force a failover from one az to another by rebooting the instance.
yes
This DB service is:
stored on ssd storage
spread across 3 geographically distinct data centers
eventual consistent reads (default)
strongly consistent reads
what is dynamo DB?
consistency across all copies of data is usually reached within a second with this type of read. repeating a read after a short time should return the updated data. (best read performance)
eventual consistent reads
A ____ consistent read returns a result that reflects all writes that received a successful response prior to the read
strongly
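A toy sketch of the two read types. This is a hypothetical in-memory model, not the DynamoDB API: the `ToyTable` class, its methods, and the single-leader layout are assumptions made for illustration only.

```python
class ToyTable:
    """Hypothetical 3-copy store: one leader plus two lagging replicas."""

    def __init__(self):
        self.leader = {}
        self.replicas = [{}, {}]

    def put(self, key, value):
        # The write is acknowledged once the leader has it.
        self.leader[key] = value

    def replicate(self):
        # Stand-in for background replication to the other copies.
        for replica in self.replicas:
            replica.update(self.leader)

    def get(self, key, consistent_read=False):
        # Strongly consistent: read the leader, which holds every acknowledged write.
        # Eventually consistent: read a replica, which may still be stale.
        source = self.leader if consistent_read else self.replicas[0]
        return source.get(key)


table = ToyTable()
table.put("radio", "v1")
stale = table.get("radio")                         # replica has not caught up yet
strong = table.get("radio", consistent_read=True)  # reflects the acknowledged write
table.replicate()
later = table.get("radio")                         # repeating the read returns the update
```

In the real API the equivalent switch is the `ConsistentRead` parameter on read calls such as GetItem.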
This is a fully managed, highly available, in memory cache for dynamo DB
10x performance improvement
reduces request time from milliseconds to microseconds - even under load
no need for developers to manage cache
compatible with dynamo db api calls
dynamo db accelerator (DAX)
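A minimal sketch of the caching idea behind DAX, assuming a plain dict as the backing table. `ReadThroughCache` is a hypothetical class, not the DAX client; it only shows why the app does not have to manage the cache itself when the cache exposes the same get/put calls as the table.

```python
class ReadThroughCache:
    """Hypothetical stand-in for a managed cache sitting in front of a table."""

    def __init__(self, table):
        self.table = table    # backing store, here just a dict
        self.cache = {}
        self.table_reads = 0  # how many reads actually reached the table

    def get_item(self, key):
        # Read-through: only go to the table on a cache miss.
        if key not in self.cache:
            self.table_reads += 1
            self.cache[key] = self.table.get(key)
        return self.cache[key]

    def put_item(self, key, value):
        # Write-through: update the table and the cache together.
        self.table[key] = value
        self.cache[key] = value


table = {"sku-1": {"name": "digital radio"}}
dax_like = ReadThroughCache(table)
first = dax_like.get_item("sku-1")   # miss, served from the table
second = dax_like.get_item("sku-1")  # hit, served from memory
```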
dynamo db transactions notes:
multiple all or nothing operations
financial transactions
fulfilling orders
two underlying reads or writes - prepare/commit
up to 25 items or 4 mb of data
yes
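A rough sketch of the all-or-nothing, prepare/commit behaviour in the notes above. The `transact_write` helper and the dict-based table are hypothetical; this is not the DynamoDB TransactWriteItems API, and only the item limit from the card is modelled.

```python
MAX_ITEMS = 25  # item limit as stated in the notes above


def transact_write(table, writes):
    """Apply every (key, new_value, condition) write, or none of them."""
    if len(writes) > MAX_ITEMS:
        raise ValueError("too many items in one transaction")
    # Prepare: check every condition against current state before changing anything.
    for key, _value, condition in writes:
        if not condition(table.get(key)):
            return False
    # Commit: every condition held, so apply all writes.
    for key, value, _condition in writes:
        table[key] = value
    return True


accounts = {"alice": 100, "bob": 20}

# Financial transaction: move 50 from alice to bob, only if alice has at least 50.
ok = transact_write(accounts, [
    ("alice", 50, lambda bal: bal is not None and bal >= 50),
    ("bob", 70, lambda bal: bal is not None),
])

# This attempt fails its first condition, so neither account changes.
rejected = transact_write(accounts, [
    ("alice", -50, lambda bal: bal is not None and bal >= 100),
    ("bob", 170, lambda bal: bal is not None),
])
```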
this type of dynamo db capacity provides:
pay per request pricing
balance cost and performance
no minimum capacity
no charge for read/write capacity when there is no traffic - only storage and backups
pay more per request than with provisioned capacity
new product launches
on-demand capacity
dynamo db on demand backup and restore notes:
full backups at any time
zero impact on table performance or availability
consistent within seconds and retained until deleted
operates within the same region as the source table
yes
dynamo db point in time recovery notes:
protects against accidental ______ or deletes
restore to any point in the last ____ days
_____ backups
not enabled by default
latest restorable: ____ minutes in the past
writes
35
incremental
five
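The restore window from this card can be worked out with a little date arithmetic. A small sketch, assuming the 35-day and five-minute figures above; the function names and example dates are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=35)      # restore to any point in the last 35 days
LATEST_LAG = timedelta(minutes=5)   # latest restorable point is five minutes in the past


def restorable_window(now, enabled_at):
    """Return (earliest, latest) restorable timestamps."""
    # Nothing is restorable from before point in time recovery was enabled.
    earliest = max(enabled_at, now - RETENTION)
    latest = now - LATEST_LAG
    return earliest, latest


def can_restore_to(target, now, enabled_at):
    earliest, latest = restorable_window(now, enabled_at)
    return earliest <= target <= latest


now = datetime(2024, 6, 30, 12, 0, tzinfo=timezone.utc)
enabled_at = datetime(2024, 1, 1, tzinfo=timezone.utc)

ten_days_ago = can_restore_to(now - timedelta(days=10), now, enabled_at)
one_minute_ago = can_restore_to(now - timedelta(minutes=1), now, enabled_at)
forty_days_ago = can_restore_to(now - timedelta(days=40), now, enabled_at)
```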
dynamo db ___ are a time ordered sequence of item level changes in a table
they are stored for 24 hours
inserts, updates, and deletes
combine with lambda functions for functionality like stored procedures
streams
dynamo db global tables notes
managed multi master, multi region replication
globally distributed apps
based on dynamo db streams
multi region redundancy for dr or ha
no app rewrites
replication latency under one second
yes
DMS =
database migration service
dynamo db security
encryption at rest using ___
site to site ___
direct ____
IAM policies and ____
___ grained access
CW and CT
VPC endpoints
KMS
vpn
connect
roles
fine
____ is a fast and powerful, fully managed, petabyte scale data warehouse service in the cloud. Customers can start small for just $0.25 per hour with no commitments or upfront costs and scale to a petabyte or more for $1,000 per TB per year, less than a tenth of most other data warehousing solutions
redshift
_____ transaction example:
net profit for EMEA and pacific for the digital radio product. pulls in large number of records
sum of radios sold in EMEA
sum of radios sold in pacific
unit cost of radio in each region
sales price of each radio
sales price - unit cost
OLAP
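The net profit query on this card can be sketched in a few lines. The rows below are invented sample data and the field names are assumptions; a real warehouse would run this as an aggregate query over far more records.

```python
from collections import defaultdict

# Made-up sales records for the digital radio product.
sales = [
    {"region": "EMEA", "units": 120, "unit_cost": 30.0, "sales_price": 50.0},
    {"region": "EMEA", "units": 80, "unit_cost": 30.0, "sales_price": 50.0},
    {"region": "Pacific", "units": 100, "unit_cost": 25.0, "sales_price": 45.0},
    {"region": "Americas", "units": 500, "unit_cost": 28.0, "sales_price": 48.0},
]

units_sold = defaultdict(int)
net_profit = defaultdict(float)

for row in sales:
    if row["region"] not in ("EMEA", "Pacific"):
        continue  # the question only asks about EMEA and Pacific
    units_sold[row["region"]] += row["units"]
    # profit per radio = sales price - unit cost
    net_profit[row["region"]] += row["units"] * (row["sales_price"] - row["unit_cost"])
```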
olap
online analytical processing
Redshift can be configured as follows
single node (160GB)
multi node
leader node (manages client connections and receives queries)
compute node (stores data and performs queries and computations) - up to 128 compute nodes
yes
redshift advanced ____
columnar data stores can be compressed much more than row based data stores because similar data is stored sequentially on disk. redshift employs multiple compression techniques and can often achieve significant compression relative to traditional relational data stores. in addition, redshift doesn't require indexes or materialized views, and so uses less space than traditional relational database systems. when loading data into an empty table, redshift automatically samples your data and selects the most appropriate compression scheme.
compression
mpp =
massively parallel processing
___ ___ ___
redshift automatically distributes data and query loads across all nodes. redshift makes it easy to add nodes to your data warehouse and enables you to maintain fast query performance as your data warehouse grows.
massively parallel processing
redshift backups
enabled by default with a 1 day retention period
max retention period is 35 days
redshift always attempts to maintain at least three copies of your data (the original and replica on the compute nodes and a backup in s3)
redshift can also asynchronously replicate your snapshots to s3 in another region for disaster recovery.
yes
redshift pricing
compute node hours (total number of hours you run across all your compute nodes for the billing period. you are billed for 1 unit per node per hour, so a 3 node data warehouse cluster running persistently for an entire month would incur 2,160 instance hours. you will not be charged for leader node hours; only compute nodes will incur charges.)
charged for backups
charged for data transfer (only within vpc, not outside it)
yes
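The 2,160 figure on this card is just nodes x hours x days. A quick check, assuming a 30-day month (which is what the number implies); the function name is made up.

```python
def compute_node_hours(compute_nodes, days, hours_per_day=24):
    """Billable node hours: 1 unit per compute node per hour (leader node not counted)."""
    return compute_nodes * hours_per_day * days


hours = compute_node_hours(compute_nodes=3, days=30)  # 3 * 24 * 30 = 2160
```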
redshift security considerations
encrypted in transit using SSL
encrypted at rest using AES-256 encryption
by default redshift takes care of key management
- manage your own keys through HSM
- AWS key management service
yes
redshift availability
___ AZ(s)
can restore snapshots to new AZs in event of an outage
1
What is aurora?
it is a mysql and postgresql compatible _____ db engine that combines the speed and availability of high end commercial databases with the simplicity and cost effectiveness of open source databases.
relational
aurora provides up to ___ x better performance than mysql and ___x better than postgres dbs at a much lower price point, whilst delivering similar performance and availability
5, 3
Things to know about aurora
- start with __gb, scales in__gb increments to ___tb (storage autoscaling)
- compute resources can scale up to ___vCPUs and 244GB of RAM
- ___ copies of your data are contained in each AZ, with a max of ___ AZs. ___ copies of your data.
10,10,64
32
2, 3, 6
aurora is designed to transparently handle the loss of up to ___ copies of data without affecting db write availability and up to ___ copies without affecting read availability
2,3
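The copy counts and loss tolerances from the last two cards fit together like this. A small sketch using only the numbers above; the `availability` helper is hypothetical and ignores which AZ the lost copies are in.

```python
COPIES_PER_AZ = 2
AZS = 3
TOTAL_COPIES = COPIES_PER_AZ * AZS  # 6 copies of your data

MAX_LOST_FOR_WRITES = 2  # writes keep working with up to 2 copies lost
MAX_LOST_FOR_READS = 3   # reads keep working with up to 3 copies lost


def availability(lost_copies):
    """Report whether writes and reads are still available after losing copies."""
    if not 0 <= lost_copies <= TOTAL_COPIES:
        raise ValueError("lost_copies must be between 0 and TOTAL_COPIES")
    return {
        "writes": lost_copies <= MAX_LOST_FOR_WRITES,
        "reads": lost_copies <= MAX_LOST_FOR_READS,
    }


one_az_down = availability(COPIES_PER_AZ)      # a whole AZ lost = 2 copies
az_plus_one = availability(COPIES_PER_AZ + 1)  # an AZ plus one more copy
```

So losing a full AZ leaves both reads and writes available, while losing an AZ plus one more copy leaves reads only.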
t or f
aurora storage is self healing. data blocks and disks are continuously scanned for errors and repaired automatically.
t
three types of aurora replicas are available:
aurora replicas (how many?)
mysql read replicas (how many?)
postgresql read replicas (how many?)
15, 5, 1
t or f
backups are always enabled on aurora db instances
t
t or f
backups impact db performance and must be done during slow traffic periods
false, they do not impact performance
t or f
aurora snapshots impact performance
f
they do not impact performance
t or f
aurora snapshots cannot be shared with other aws accounts
f
they can
aurora ____ is an on demand autoscaling capable edition of aurora. an aurora ___ db cluster automatically starts up, shuts down, and scales capacity up or down based on your app's needs.
serverless
t or f
aurora serverless provides a relatively simple, cost effective option for infrequent, intermittent, or unpredictable workloads
t
does memcached support simple cache to offload DB
yes
does memcached support ability to scale horizontally
YES
does memcached support multithreaded performance
yes
does memcached support advanced data types
no
does memcached support ranking/sorting data sets
no
does memcached support pub/sub capabilities
no
does memcached support persistence
no
does memcached support multi AZ
no
does memcached support backup and restore capabilities?
no
does redis support simple cache to offload DB
yes
does redis support ability to scale horizontally
yes
does redis support multi threaded performance
no
does redis support advanced data types
yes
does redis support ranking/sorting data sets
yes
does redis support pub/sub capabilities
yes
does redis support persistence?
yes
does redis support multi az?
yes
does redis support backup and restore capabilities?
yes
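The memcached and redis cards above collapse into one lookup table. A sketch that encodes exactly those yes/no answers; the feature keys and the `engines_supporting` helper are names invented here.

```python
# Answers copied from the memcached / redis cards above.
FEATURES = {
    "simple cache": {"memcached": True, "redis": True},
    "scale horizontally": {"memcached": True, "redis": True},
    "multithreaded": {"memcached": True, "redis": False},
    "advanced data types": {"memcached": False, "redis": True},
    "ranking/sorting": {"memcached": False, "redis": True},
    "pub/sub": {"memcached": False, "redis": True},
    "persistence": {"memcached": False, "redis": True},
    "multi az": {"memcached": False, "redis": True},
    "backup and restore": {"memcached": False, "redis": True},
}


def engines_supporting(required):
    """Return the engines that support every feature in `required`."""
    return [
        engine
        for engine in ("memcached", "redis")
        if all(FEATURES[feature][engine] for feature in required)
    ]


either = engines_supporting(["simple cache"])
only_memcached = engines_supporting(["multithreaded"])
only_redis = engines_supporting(["pub/sub", "persistence"])
neither = engines_supporting(["multithreaded", "persistence"])
```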
use ___ to increase DB and web application performance
elasticache
___ ___ ____ is a cloud service that makes it easy to migrate relational databases, data warehouses, nosql dbs, and other types of data stores. you can use ___ __ ___ to migrate your data into the cloud, between on prem instances or between combinations of cloud and on prem setups.
database migration service (DMS)
SCT = ?
schema conversion tool
t or f
you need SCT even if you are migrating to identical databases
f
you do not need sct if dbs are the same.
DMS - the source can either be on prem or inside aws itself or another provider such as azure
t or f
t
t or f
dms allows you to migrate databases from one source to aws.
t
t or f
you can do homogeneous migrations (same db engines) or heterogeneous migrations (different db engines)
DMS
t
t or f
if you do a heterogeneous migration with dms, you will need the aws schema conversion tool
t
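The SCT rule from these cards as a one-line check. This is a deliberate simplification with a made-up function name: it treats two engines as the same only when their names match, and does not call DMS or SCT.

```python
def needs_schema_conversion(source_engine, target_engine):
    """Heterogeneous migration (different engines) needs SCT; homogeneous does not."""
    return source_engine.strip().lower() != target_engine.strip().lower()


same_engine = needs_schema_conversion("MySQL", "mysql")            # homogeneous
different_engine = needs_schema_conversion("oracle", "postgresql")  # heterogeneous
```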
the following services have caching capabilities
api gateway
cloudfront
elasticache - memcached and redis
dynamodb accelerator (DAX)
yes
emr = ?
elastic map reduce
____ is the industry leading cloud big data platform for processing vast amounts of data using open source tools such as apache spark, apache hive, hbase, flink, hudi, presto. with ____ you can run petabyte scale analysis at less than half the cost of traditional on prem solutions and over 3x faster than standard apache spark
emr
the central component of EMR is the ______
cluster
EMR match the nodes:
master, core, task
- a node w/ sw components that only runs tasks and does not store data in HDFS. they are optional
- a node that manages the cluster. this node tracks the status of tasks and monitors the health of the cluster. every cluster has one.
- a node with sw components that runs tasks and stores data in the hadoop distributed file system (HDFS) on your cluster. multinode clusters have at least one.
1 = task
2 = master
3 = core
emr archives log files to s3 at ___ minute intervals
5
emr log files are available even after the cluster terminates?
t or f
t
emr - by default log data is stored on core node.
t or f
f
data is stored on master
t or f
EMR
you can configure replication to s3 on 5 min intervals for all log data from the master node, however, this can only be configured when creating the cluster for the first time.
t
mysql default port is ___
3306
When you add a rule to an RDS DB security group, you must specify a port number or protocol.
false
a destination port is needed, but the rds instance port number is automatically applied to the rds db sg.
If you are using Amazon RDS Provisioned IOPS storage with a Microsoft SQL Server database engine, what is the maximum size RDS volume you can have by default?
16tb
What happens to the I/O operations of a single-AZ RDS instance during a database snapshot or backup?
I/O may be briefly suspended while the backup process initializes (typically under a few seconds), and you may experience a brief period of elevated latency.
In RDS, what is the maximum value I can set for my backup retention period?
35 days