Database - RDS, Dynamo, PostgreSQL, Aurora, MySQL, MariaDB, Elasticache Flashcards
RDS, Dynamo, PostgreSQL, Aurora, MySQL, MariaDB, Elasticache
What is a relational database?
a traditional database that is made up like a spreadsheet with tables, rows, and fields.
Name the database types in AWS
SQL Server, Oracle, MySQL, PostgreSQL, Aurora, MariaDB
Describe how a non-relational database works
collection = table; document = row; key-value pairs = fields
What is data warehousing?
It is used to pull in very large and complex data sets. Usually used by management to run queries on data (such as current performance vs. targets). It is used for business intelligence, with tools like Cognos, Jaspersoft, SQL Server Reporting Services, Oracle Hyperion, and SAP NetWeaver.
What is OLTP?
Online Transaction Processing
What is OLAP?
Online Analytical Processing
How do OLAP and OLTP differ?
They differ in the types of queries you run.
OLTP example: order number 2120121 pulls up a single row of data such as name, date, delivery address, delivery status, etc.
OLAP example: pulls in a large number of records, e.g. net profit for EMEA and Pacific for the digital radio product:
- Sum of radios sold in EMEA
- Sum of radios sold in Pacific
- Unit cost of a radio in each region
- Sales price of each radio
- Sales price - unit cost
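A minimal sketch of the two query styles, assuming a hypothetical `orders` table held in an in-memory SQLite database (the table name, columns, and figures are made up for illustration and are not from any AWS service):

```python
import sqlite3

# Hypothetical orders table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, "
    "product TEXT, unit_cost REAL, sales_price REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (2120121, "EMEA", "radio", 10.0, 25.0),
        (2120122, "EMEA", "radio", 10.0, 25.0),
        (2120123, "Pacific", "radio", 12.0, 30.0),
    ],
)

# OLTP-style query: fetch one row by its key.
oltp_row = conn.execute(
    "SELECT * FROM orders WHERE order_id = ?", (2120121,)
).fetchone()

# OLAP-style query: aggregate many rows to get units sold and net profit per region.
olap_rows = conn.execute(
    "SELECT region, COUNT(*), SUM(sales_price - unit_cost) "
    "FROM orders WHERE product = 'radio' GROUP BY region ORDER BY region"
).fetchall()
```

The OLTP query touches a single row; the OLAP query has to read every matching row before it can return a result.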
T or F data warehousing databases use the same type of architecture from a db perspective and infrastructure layer.
False data warehousing uses different type of architecture both from a DB perspective and infrastructure layer.
A web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud is called what?
Elasticache
T or F Elasticache improves the performance of web apps by allowing you to retrieve info from fast, managed, in-memory caches, instead of relying entirely on slower disk based databases.
True
Name the types of Elasticache:
Memcached and Redis
Name the two types of RDS backups for AWS:
Automated backups and DB snapshots
_____ allows you to recover your DB to any point in time within a retention period.
- Automated backups
- DB Snapshots
automated backups
These types of backups are done manually (user initiated)
- Automated backups
- snapshots
Snapshots
The retention period for automated backups can be between _____ and ___ days.
- 1 and 5
- 1 and 35
- 1 and 100
- infinite
2) 1 and 35
Automated backups will take a full daily snapshot and will also store
____ logs during the day
transaction
T or F
When you do a recovery, AWS will first choose the most recent daily backup and then apply the transaction logs relevant to that day. This allows you to do a point-in-time recovery down to a second, within the retention period.
True
T or F
Automated backups are disabled by default.
False
Automated backups are enabled by default
Where are automated backups stored?
S3
T or F
With automated backups, you get unlimited, free storage space in S3.
False.
You get free storage space equal to the size of your DB.
T or F
Backups are taken every time a change is made to the DB.
false.
Backups are taken within a defined window.
T or F
During the backup window, storage IO may be suspended while your data is being backed up and you may experience elevated latency.
True
T or F
Snapshots are stored even after you delete the original RDS instance, unlike automated backups
True
T or F
Whenever you restore either an Automated backup or a manual snapshot, the restored version of the DB will be a new RDS instance with a new DNS endpoint.
True
_____ allows you to have an exact copy of your DB in another AZ
multi-az
T or F
AWS handles multi-AZ DB replication for you, so when your prod DB is written to, this write will automatically be synchronized to the standby database.
True
T or F
In the event of planned DB maintenance, DB instance failure, or an AZ failure, RDS will automatically fail over to the standby so that DB ops can resume quickly without administrative intervention.
True
T or F
Multi-AZ is for disaster recovery only
True
T or F
Multi-AZ is also designed to improve performance
False
It is not primarily used for improving performance. For performance improvement, you need read replicas.
Which DBes are types of multi-az databases:
- SQL server
- aurora
- oracle
- mysql
- postgreSQL
- mariaDB
All of the above
_________ allow you to have a read only copy of your production DB.
read replicas
Read replicas achieve synchronization from the primary DB by using ______ replication.
asynchronous
Why would you use read replicas for your DB?
You want to use read replicas when you have very read heavy db workloads.
Read replicas are available for the following DBs:
- SQL
- mysql
- postgreSQL
- mariaDB
- aurora
mysql
postgreSQL
mariaDB
aurora
They are not available for SQL
T or F
read replicas are used for disaster recovery, not scaling.
false
read replicas are used for scaling, not DR
T or F
You must have automated backups turned on in order to deploy a read replica
True
You can have up to ____ read replica copies of any DB
5
T or F
You can have read replicas of read replicas
True, but you must watch latency
T or F
read replicas share DNS endpoints
False
each read replica will have its own DNS endpoint
T or F
You can have read replicas that have multi-AZ
True
T or F
you can create read replicas of multi-AZ source databases
True
T or F
read replicas can be promoted to be their own databases
true, but it breaks replication
T or F
you can’t have read replicas in another region
False
read replicas can be located in another region.
______ can be used to significantly improve latency and throughput for many read-heavy application workloads (such as social networking, gaming, media sharing, and Q&A portals or compute intensive workloads such as recommendation engines)
- magnetic storage
- elasticache
- load balancing
- container
Elasticache
_______ improves app performance by storing critical pieces of data in memory for low-latency access.
caching
T or F
cached info may include the results of IO intensive DB queries or the result of computationally intensive calculations.
True
A widely adopted memory object caching system. Elasticache is protocol compliant with _____, so popular tools that you use today with existing _____ environments will work seamlessly with the service.
memcached
A popular open source in-memory key-value store that supports data structures such as sorted sets and lists. Elasticache supports master/slave replication and multi-AZ, which can be used to achieve cross-AZ redundancy.
Redis
Because of the replication and persistence features of ______, elasticache manages ______ more as a relational DB.
Redis
______ Elasticache clusters are managed as stateful entities that include failover, similar to how Amazon RDS manages database failover.
Redis
Because _______ is designed as a pure caching solution with no persistence, Elasticache manages ______ nodes as a pool that can grow and shrink, similar to an EC2 ASG. Individual nodes are expendable and Elasticache provides additional capabilities here, such as auto node replacement and auto discovery.
memcached
Is object caching your primary goal, for example to offload your DB, if so, use this….
- memcached
- redis
memcached
Are you interested in the simplest caching model possible? if so use this….
- memcached
- redis
memcached
Are you planning on running large cache nodes and require multi-threaded performance with utilization of multiple cores? if so use this….
- memcached
- redis
memcached
Do you want the ability to scale your cache horizontally as you grow, if so use this….
- memcached
- redis
memcached
are you looking for more advanced data types, such as lists, hashes, and sets? if so use this….
- memcached
- redis
redis
does sorting and ranking data sets in memory help you, such as with leaderboards? if so, use this…
- memcached
- redis
redis
is persistence of your key store important? if so, use this…
- memcached
- redis
redis
do you want to run with multiple AWS AZ with failover?
if so, use this…
- memcached
- redis
redis
______ is a fast and flexible NoSQL DB service for all apps that need consistent, single-digit millisecond latency at any scale.
DynamoDB
_______ is a fully managed DB and supports both document and key-value data models. Its flexible data model and reliable performance make it a great fit for mobile, web, gaming, ad-tech, IoT, and many other apps.
DynamoDB
T or F
DynamoDB is stored on magnetic storage
False
DynamoDB is stored on SSD storage
DynamoDB is spread across ___ geographically distinct data centers
- 2
- 3
- 4
- 5
- 6
- 3
Which of these is the default for DynamoDB
Eventually consistent reads
Strongly consistent reads
Eventually consistent reads
DynamoDB
Consistency across all copies of data is usually reached within a second. Repeating a read after a short time should return the updated data. (Best read performance)
- Eventually consistent read
- Strongly consistent read
Eventually consistent read
DynamoDB
This returns a result that reflects all writes that received a successful response prior to the read.
- Eventually consistent read
- Strongly consistent read
Strongly consistent read
T or F
DynamoDB tables are made up of Items and Attributes
Items (think of a row of data in a table)
Attributes (Think of a column of data in a table)
True
T or F
DynamoDB supports key-value and document data structures
True
DynamoDB
The name of the data = ____
The data itself =_____
key
value
DynamoDB documents can be written in:
- JSON
- HTML
- HTTP
- XML
- JAVA
JSON
HTML
XML
DynamoDB stores and retrieves data based on a ______ key
primary
2 types of primary keys:
______ key and ____ key
partition and composite key
This DynamoDB key is a unique attribute (e.g. userID). The value of the partition key is input to an internal hash function, which determines the partition or physical location on which the data is stored.
partition key
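A minimal sketch of the idea, assuming a stand-in hash (SHA-256) and a made-up partition count. DynamoDB's real internal hash function and partition layout are not exposed, so `partition_for` and `NUM_PARTITIONS` are hypothetical names used only to show that the same key always lands in the same place:

```python
import hashlib

NUM_PARTITIONS = 4  # hypothetical; DynamoDB manages partition counts itself

def partition_for(partition_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key to a partition index with a stable hash."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap into range.
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Calling `partition_for("user-212")` twice returns the same index, which is why a lookup by partition key can go straight to one location.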
T or F
If you are using the partition key as your primary key, then no two items can have the same partition key.
true
DynamoDB
The composite key is made up of the ____ key and the ____ key
partition and sort key
DynamoDB
Primary key can be a composite key consisting of _____ key and ___ key.
partition and sort
___key = user ID
____ key = timestamp of the post
partition and sort
T or F
DynamoDB
2 different items may have the same partition key, but they must have a different sort key.
True
T or F
DynamoDB Composite Key
All items with the same partition key are stored together, then sorted according to the sort key value.
True
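A minimal in-memory sketch of a composite key, assuming a user ID as the partition key and a numeric timestamp as the sort key. The `table`, `put_item`, and `query` names here are hypothetical helpers, not the DynamoDB API:

```python
from collections import defaultdict

# partition key -> list of (sort_key, body), kept in sort-key order
table = defaultdict(list)

def put_item(user_id, timestamp, body):
    """Store an item; the (partition key, sort key) pair must be unique."""
    rows = table[user_id]
    # Same partition key and same sort key means the same item: overwrite it.
    rows[:] = [row for row in rows if row[0] != timestamp]
    rows.append((timestamp, body))
    rows.sort(key=lambda row: row[0])

def query(user_id, since=None):
    """Return one partition's items in sort-key order, optionally refined."""
    return [
        row for row in table.get(user_id, [])
        if since is None or row[0] >= since
    ]
```

Two items can share `user_id` as long as their timestamps differ, and a query for one user reads only that user's group of items.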
T or F
DynamoDB Composite key
Allows you to store multiple items with same partition key
True
T or F
DynamoDB Authentication and Access control is managed using AWS cognito
False
DynamoDB Authentication and Access control is managed using AWS IAM
T or F
You can create an IAM user within your AWS account which has specific permissions to access and create DynamoDB tables.
True
T or F
You can create an IAM role which enables you to obtain temp access keys which can be used to access DynamoDB.
true
T or F
You can also use a special IAM condition to restrict user access to only their own records.
True
In SQL, an _____ is a data structure which allows you to perform fast queries on specific columns in a table. You select the columns that you want included in the ____ and run your searches on the _____ rather than on the entire dataset.
index
In DynamoDB, there are 2 types of indexes that are supported to help speed up your DynamoDB queries:
_______Secondary index
______Secondary index
Local and Global
DynamoDB
This can only be created when you are creating your table
- Local secondary index
- Global secondary index
Local secondary index
DynamoDB
You can’t add, remove, or modify it later.
- local secondary index
- global secondary index
local secondary index
DynamoDB
It has the same partition key as your original table
- local secondary index
- global secondary index
local secondary index
DynamoDB
It uses a different sort key
local secondary index
global secondary index
local secondary index
DynamoDB
it gives you a different view of your data, organized according to an alternative sort key
local secondary index
global secondary index
local secondary index
DynamoDB
any queries based on this sort key are much faster using the index than the main table
local secondary index
global secondary index
local secondary index
DynamoDB
You can create it when you create your table, or add it later
local secondary index
global secondary index
global secondary index
DynamoDB
different partition key as well as a different sort key
local secondary index
global secondary index
global secondary index
DynamoDB
gives a completely different view of the data (speeds up any queries relating to this alternative partition and sort key)
local secondary index
global secondary index
global secondary index
A ____ operation finds items in the table based on the primary key attribute and a distinct value to search for.
IE: selecting an item where the user ID = 212 will select all the attributes for that item: first name, surname, email, etc.
query
T or F
Dynamo DB
You can use an optional sort key name and value to refine the results of your query.
ie: if your sort key is a timestamp, you can refine the query to only select items with a timestamp within the last 7 days.
True
DB queries
By default, a query returns all the attributes for the items, but you can use the ProjectionExpression parameter if you want the query to only return the specific attributes you want. Ie: if you want to see only the email address rather than all the attributes.
True
T or F
Query results are always sorted by the partition key
False
Query results are always sorted by the sort key
T or F
DB queries
Numeric sort keys: results are in ascending order by default (1, 2, 3, 4)
true
T or F Db queries
Non-numeric sort keys: results are ordered by ASCII character code values
true
T or F
Db queries
You can reverse the order by setting the ScanIndexForward parameter to True
False
You can reverse the order by setting the ScanIndexForward parameter to False
T or F DB queries
by default, queries are eventually consistent
true
T or F
DB queries
you need to explicitly set the query to be strongly consistent
True
DB
a ____ operation examines every item in the table
scan
scans return all data attributes by default?
T or F
True
Use the ProjectionExpression parameter to refine the scan to only return the attributes you want.
T or F
True
T or F
Scan is more efficient than a query
False.
Query is more efficient than a scan
Scan dumps the entire table, then filters out the values to provide the desired result - removing the unwanted data.
T or F
True
Scans add an extra step of removing the data you don’t want.
T or F
True
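A minimal sketch of why a query is cheaper than a scan, using a hypothetical in-memory table keyed by partition key. The `query` and `scan` helpers are illustrative stand-ins, not the DynamoDB API; each returns the matching items plus the number of items it had to read:

```python
# Hypothetical table: partition key (user_id) -> list of items.
table = {
    "u1": [{"user_id": "u1", "ts": 1}, {"user_id": "u1", "ts": 2}],
    "u2": [{"user_id": "u2", "ts": 1}],
    "u3": [{"user_id": "u3", "ts": 5}],
}

def query(user_id):
    """Go straight to one partition key; only matching items are read."""
    items = table.get(user_id, [])
    return items, len(items)

def scan(predicate):
    """Read every item, then discard the ones that fail the filter."""
    items_read = [item for items in table.values() for item in items]
    matches = [item for item in items_read if predicate(item)]
    return matches, len(items_read)
```

Both calls can return the same items, but the scan reads the whole table first, so its cost grows with table size rather than with result size.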
as a table grows, scan operations take longer
T or F
True
Scan operations on a large table can use up the provisioned throughput for that table in just a single operation.
T or F
True
DB queries and scans
You can reduce the impact of a query or scan by setting a smaller page size, which uses fewer read ops. IE: set the page size to return 40 items.
T or F
True
DB queries and scans
Larger number of smaller ops will allow other requests to succeed without throttling.
T or F
True
Db queries and scans
Avoid using scan ops if you can: design tables in a way that you can use the query, get, or BatchGetItem APIs.
T or F
True
improving scan performance
By default, a scan operation processes data sequentially, returning 1MB increments before moving on to retrieve the next 1MB of data. It can only scan one partition at a time.
T or F
True
Improving scan performance
You can configure DynamoDB to use parallel scans instead by logically dividing a table or index into segments and scanning each segment in parallel.
T or F
true
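A minimal sketch of the segment idea, assuming a hypothetical in-memory list of items and a thread pool. The way items are assigned to segments here (index modulo segment count) is an illustrative assumption; DynamoDB decides the real split itself:

```python
from concurrent.futures import ThreadPoolExecutor

items = [{"id": n} for n in range(10)]  # hypothetical table contents

def scan_segment(segment, total_segments):
    """Return only the items that belong to one logical segment."""
    return [
        item for index, item in enumerate(items)
        if index % total_segments == segment
    ]

def parallel_scan(total_segments):
    """Scan every segment concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        parts = list(
            pool.map(lambda seg: scan_segment(seg, total_segments),
                     range(total_segments))
        )
    return [item for part in parts for item in part]
```

Each worker covers a disjoint slice, so together they return every item exactly once while the work is spread across workers.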
improve scan performance
Best to avoid using parallel scans if your table or index is already incurring heavy read/write activity from other apps.
T or F
true
A scan operation finds items in a table using only the primary key attribute
T or F
False
A query operation finds items in a table using only the primary key attribute
Queries and scans
You provide the secondary key name and a distinct value to search for.
T or F
False
You provide the primary key name and a distinct value to search for.
A scan operation examines every item in the table.
T or F
True
a scan operation returns all data attributes by default
T or F
True
You can use the ProjectionExpression parameter to refine the results of a scan or query.
T or F
True
Query results are always sorted by the sort key if there is one.
T or F
True
Query operation is generally more efficient than a scan
T or F
True
The following can improve scan performance:
T or F
- Reduce the impact of a query or scan by setting a smaller page size, which uses fewer read operations.
- Isolate scan operations to specific tables and segregate them from your mission-critical traffic.
- Try parallel scans rather than the default sequential scan.
- Avoid using scan operations if you can; design tables in a way that you can use the Query, Get, or BatchGetItem APIs.
True
DynamoDB provisioned throughput is measured in _____ units
capacity
when you create your table, you specify your requirements in terms of _____ capacity unit and ____ capacity units.
read and write
1x ____ capacity unit = 1x 1KB _____/second
write
1x _____ capacity unit = 1x strongly consistent ____ of 4kb/second
or
2x _______ consistent reads of 4kb/second (default)
read, read, eventually
This is an example configuration of what?
Table with 5x read capacity units and 5x write capacity units
This config will be able to perform 5x 4kb strongly consistent reads = 20kb/second
twice as many eventually consistent reads = 40kb/second
5x1kb writes = 5kb/second
provisioned throughput
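A minimal sketch of the arithmetic in the example configuration above. The function name is hypothetical; the unit sizes (4 KB per read unit, 1 KB per write unit, eventually consistent reads at double the rate) are the ones stated in these cards:

```python
def throughput_kb_per_second(read_units, write_units):
    """Return the throughput a provisioned table can sustain, in KB/second."""
    strong_reads = read_units * 4      # 1 read unit = one 4 KB strongly consistent read
    eventual_reads = strong_reads * 2  # eventually consistent reads get double the rate
    writes = write_units * 1           # 1 write unit = one 1 KB write
    return {
        "strong_reads": strong_reads,
        "eventual_reads": eventual_reads,
        "writes": writes,
    }
```

For 5 read units and 5 write units this gives 20, 40, and 5 KB/second, matching the figures in the card.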
T or F
If your app reads or writes larger items, it will consume more capacity units and will cost you more as well.
True
T or F
Strongly consistent reads calculation
Your app needs to read 80 items (table rows)/ second
each item is 3kb in size
you need strongly consistent reads
First, calculate how many read capacity units are needed for each read:
size of each item / 4kb
3kb / 4kb = 0.75
Rounded up to the nearest whole number, each read will need 1x read capacity unit per operation.
Multiplied by the number of reads per second = 80 read capacity units required.
true
If you need ______ consistent reads,
you do the same calculation as for strongly consistent reads. However, as this is for ______ consistent reads, you get 2x 4kb reads/second, or double the throughput of strongly consistent reads.
size of each item/4kb
3kb/4kb=.75
round up to the nearest whole number, =1
multiply by the number of reads per second = 80
divide 80 by 2, so you only need 40 read capacity units for ____ consistent reads
eventually
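A minimal sketch of the read capacity calculation for both consistency modes, following the steps in the cards above (divide by 4 KB, round up, multiply by reads per second, halve for eventually consistent reads). The function name and parameters are hypothetical:

```python
import math

def read_capacity_units(item_size_kb, reads_per_second, strongly_consistent=True):
    """Estimate the read capacity units needed for a steady read rate."""
    units_per_read = math.ceil(item_size_kb / 4)  # round up to whole 4 KB blocks
    total = units_per_read * reads_per_second
    if strongly_consistent:
        return total
    # Eventually consistent reads get twice the throughput per unit.
    return math.ceil(total / 2)
```

With 3 KB items at 80 reads per second, this returns 80 for strongly consistent reads and 40 for eventually consistent reads.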
_____ capacity units calculation.
you want to write 100 items/second
each item is 512 bytes in size
First calculate how many capacity units for each _____:
size of each item /1kb (for _____ capacity units)
512 bytes / 1kb = 0.5
rounded up to the nearest whole number, each ____ will need 1x ____ capacity unit per ____ operation.
multiplied by the number of _____ per second = 100 ______ capacity units required
write
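A minimal sketch of the write calculation above (divide by 1 KB, round up, multiply by writes per second). The function name is hypothetical, and 1 KB is taken as 1024 bytes:

```python
import math

def write_capacity_units(item_size_bytes, writes_per_second):
    """Estimate the write capacity units needed for a steady write rate."""
    units_per_write = math.ceil(item_size_bytes / 1024)  # round up to whole 1 KB blocks
    return units_per_write * writes_per_second
```

With 512-byte items at 100 writes per second, each write rounds up to one unit, so 100 units are needed.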
_____ _____ is measured in capacity units
provisioned throughput
1x ____ capacity unit = 1x 1kb _____/second
write
1 x ___ capacity unit = 1 x 4kb _____consistent ____ or
2 x 4kb _____ consistent _____/second
read, strongly, read, eventually, read
T or F
In DynamoDB on demand, charges apply for reading, writing, and storing data.
True
T or F
in DynamoDB, with on demand, you don’t need to specify your requirement.
True
T or F
DynamoDB on demand is great for unpredictable workloads
True
With DynamoDB on-demand, you pay only for what you use (pay per request)
T or F
True
DynamoDB
unknown workloads
- on-demand capacity
- provisioned capacity
on-demand capacity
DynamoDB
you can forecast read and write capacity units
- on-demand capacity
- provisioned capacity
provisioned capacity
DynamoDB
unpredictable app traffic
- on-demand capacity
- provisioned capacity
on-demand capacity
DynamoDB
predictable app traffic
- on-demand capacity
- provisioned capacity
provisioned capacity
DynamoDB
app traffic is consistent or increases gradually
- on-demand capacity
- provisioned capacity
provisioned capacity
DynamoDB
you want to pay per use model
- on-demand capacity
- provisioned capacity
on-demand capacity
DynamoDB
spiky, short lived peaks
- on-demand capacity
- provisioned capacity
on-demand capacity
This is a fully managed, clustered in-memory cache for DynamoDB
DynamoDB Accelerator (DAX)
DAX delivers up to a ____x read performance improvement
- 5
- 10
- 20
- 30
10
DAX delivers microsecond performance for millions of requests per second.
T or F
True
DAX is ideal for read heavy and bursty workloads
Ie: auction apps, gaming, and retail sites during black Friday promotions.
T or F
True
DAX is a _____ through caching service. This means data is written to the cache as well as the back end store at the same time.
write
DAX allows you to point your DynamoDB API calls at the DAX cluster
True
If the item you are querying is in the cache (cache hit), DAX returns the result to the application
T or F
True
If the item is not available (cache miss), then DAX performs an _______ consistent get item operation against DynamoDB
eventually
retrieval of data from DAX reduces the read load on DynamoDB
T or F
true
DAX may be able to reduce ______ read capacity
provisioned
DAX caters for _____ consistent reads only - so not suitable for apps that require strongly consistent reads.
Eventually
DAX is not suitable for ____ intensive apps
write
DAX is suitable for apps that do not perform many read ops
T or F
False
DAX is suitable for apps that do not perform many write ops
DAX is not suitable for apps that do not require microsecond response times
T or F
True
T or F
DAX provides in-memory caching for DynamoDB table
True
DAX is not suitable for write intensive apps or apps that require ______ consistent reads.
strongly
Elasticache is in memory cache in the cloud
T or F
True
Elasticache improves performance of web apps, allowing you to retrieve info from fast in-memory caches rather than slower disk-based databases
T or F
True
Elasticache sits between your app and the database.
ie: an app frequently requesting specific product info for your best selling products.
T or F
True
Elasticache takes the load off your databases
T or F
True
Elasticache is good if your database is ____ heavy and the data is not changing frequently
read
Elasticache improves performance for read heavy workloads
T or F
True
with elasticache, frequently accessed data is stored in memory for low-latency access, improving the overall performance of your app.
True
Elasticache is good for compute-heavy workloads
T or F
true
elasticache can be used to store results of IO intensive DB queries or output of the compute intensive calculations.
T or F
True
Name the two types of Elasticache…
redis and memcached
Widely adopted memory object caching system
- memcached
- redis
memcached
open source in-memory key-value store
- memcached
- redis
redis
multithreaded
- memcached
- redis
memcached
no AZ capability
- memcached
- redis
memcached
supports complex data structures: sorted sets and lists
- memcached
- redis
redis
supports master/slave replication and multi-AZ for cross-AZ redundancy
- memcached
- redis
redis
what are the 2 strategies used in caching?
lazy loading and write-through
Loads the data into cache only when necessary
- Lazy Loading
- Write-Through
Lazy loading
if requested data is in the cache, EC returns the data to the app.
- lazy loading
- write-through
lazy loading
if the data is not in the cache, or has expired, EC returns a null.
- lazy loading
- write-through
lazy loading
Your app then fetches the data from the DB and writes the data received into the cache so that it is available next time
- lazy loading
- write-through
lazy loading
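A minimal sketch of the lazy loading flow described in the cards above, using plain dictionaries as stand-ins for the cache and the database. The names `database`, `cache`, `stats`, and `get` are hypothetical and not part of any Elasticache client:

```python
database = {"product:1": "radio"}  # hypothetical backing store
cache = {}                          # stand-in for the in-memory cache
stats = {"db_reads": 0}

def get(key):
    """Return a value, loading it into the cache only when it is requested."""
    value = cache.get(key)          # a miss comes back as None (null)
    if value is None:
        stats["db_reads"] += 1      # cache miss: go to the database
        value = database.get(key)
        if value is not None:
            cache[key] = value      # populate the cache for next time
    return value
```

The first `get("product:1")` reads the database; a second call is served from the cache, which is the load that lazy loading takes off the database.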
Lazy loading - advantage or disadvantage?
Only requested data is cached: avoids filling up cache with useless data
advantage
Lazy loading - advantage or disadvantage?
node failures are not fatal. a new empty node will just have a lot of cache misses initially.
advantage
Lazy loading - advantage or disadvantage?
cache miss penalty: each miss involves the initial request, a query to the DB, and writing the data to the cache.
disadvantage
Lazy loading - advantage or disadvantage?
stale data - if data is only updated when there is a cache miss, it can become stale. It doesn’t automatically update if the data in the DB changes.
disadvantage
TTL = ____
Time to live
TTL specifies the number of seconds until the ____ (data) expires to avoid keeping stale data in the cache
key
_____ _____ treats an expired key as a cache miss and causes the app to retrieve the data from the DB and subsequently write the data into the cache with the new TTL.
Lazy loading
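A minimal sketch of lazy loading with a TTL, assuming a hypothetical `TTLCache` class with an injectable clock so the expiry can be exercised without waiting. It is an illustration of the idea, not an Elasticache client:

```python
import time

class TTLCache:
    """Cache whose entries expire a fixed number of seconds after being written."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}  # key -> (value, expires_at)

    def get(self, key, load_from_db):
        entry = self.store.get(key)
        if entry is not None and self.clock() < entry[1]:
            return entry[0]                      # cache hit, still fresh
        value = load_from_db(key)                # missing or expired: treat as a miss
        self.store[key] = (value, self.clock() + self.ttl)  # rewrite with a new TTL
        return value
```

An expired key behaves exactly like a missing one: the value is reloaded from the database and the expiry is reset, which limits how long stale data can live in the cache.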
____ does not eliminate stale data - but it helps avoid it.
TTL
_____-_____ adds or updates data to the cache whenever data is written to the databases.
write-through
write-through advantage or disadvantage?
write penalty: every write involves a write to the cache as well as write to the DBes.
disadvantage
write-through advantage or disadvantage?
data in cache is never stale
advantage
write-through advantage or disadvantage?
users are generally more tolerant of additional latency when updating data than when retrieving it.
advantage
write-through advantage or disadvantage?
wasted resources if most of the data is never read
disadvantage
write-through advantage or disadvantage?
If a node fails and a new node is spun up, data is missing until it is added or updated in the database (mitigate by implementing lazy loading in conjunction with write-through)
disadvantage
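A minimal sketch of write-through combined with a lazy loading fallback, the mitigation mentioned in the card above. The `database` and `cache` dictionaries and the `write`/`read` helpers are hypothetical stand-ins:

```python
database = {}  # hypothetical backing store
cache = {}     # stand-in for the in-memory cache

def write(key, value):
    """Write-through: update the database and the cache together."""
    database[key] = value
    cache[key] = value

def read(key):
    """Lazy loading fallback: repopulate the cache on a miss."""
    if key in cache:
        return cache[key]
    value = database.get(key)
    if value is not None:
        cache[key] = value
    return value
```

If the cache is emptied (as with a replaced node), `read` still finds the data in the database and refills the cache, so the gap left by write-through alone is covered.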
What does ACID transactions stand for?
Atomic, Consistent, Isolated, Durable
DynamoDB transactions
read or write multiple items across multiple tables as an all or nothing operation
T or F
True
DynamoDB Transactions
Checks for a pre-requisite condition before writing to a table
T or F
True
DynamoDB TTL
TTL ______ defines an expiry time for your data
attribute
DynamoDB TTL
Expired items are marked for deletion
T or F
true
DynamoDB TTL
Great for removing irrelevant or old data:
Session data
event logs
temporary data
T or F
True
DynamoDB TTL
Reduces cost by automatically removing data when it is no longer relevant
True
Session data table
TTL expressed as _____ time
epoch
session data table
expiration is set for ____ hours after the session began
- 1
- 2
- 3
- 4
- 5
2
session data table
when the current time is greater than the TTL, the item will be expired and marked for deletion.
T or F
True
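A minimal sketch of the session table example, assuming the TTL attribute holds an epoch timestamp set two hours after the session began. The helper names are hypothetical, and the expiry check only mirrors the rule in the card; the actual deletion is done by the service:

```python
from datetime import datetime, timedelta, timezone

def session_ttl(session_start, hours=2):
    """Return the expiry time as epoch seconds, `hours` after the session began."""
    return int((session_start + timedelta(hours=hours)).timestamp())

def is_expired(ttl_epoch, now):
    """An item is expired once the current time is greater than its TTL."""
    return int(now.timestamp()) > ttl_epoch
```

For a session starting at 12:00 UTC, the TTL is the epoch value for 14:00 UTC; the item counts as expired for any time after that.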
session data table
you can’t filter out expired items for your queries and scans
false
you can filter out expired items for your queries and scans
DynamoDB Streams
These are Time ordered sequence of item level modifications
(insert, update, delete)
T or F
True
DynamoDB Streams
Logs are encrypted at rest and stored for 24 hours
T or F
True
DynamoDB Streams
Accessed using a dedicated endpoint
T or F
True
DynamoDB Streams
By default the Secondary Key is recorded. Before and after images can be captured.
T or F
False
By default the Primary Key is recorded
Processing DynamoDB Streams
Events are recorded in near real time
T or F
True
Processing DynamoDB Streams
Apps can take actions based on contents
T or F
True
Processing DynamoDB Streams
Event source for Lambda
T or F
True
Processing DynamoDB Streams
Lambda polls the DynamoDB stream
T or F
True
Processing DynamoDB Streams
Executes Lambda code based on a DynamoDB Streams event
T or F
True
DynamoDB streams is stored for 24 hours only
T or F
True
Processing DynamoDB Streams
Can be used as an event source for Lambda so you can create apps which take actions based on events in your DynamoDB table.
T or F
True
ProvisionedThroughputExceededException
Your request rate is too high for the read/write capacity provisioned on your DynamoDB table
T or F
True
ProvisionedThroughputExceededException
-SDK will automatically retry the request until successful
T or F
True
ProvisionedThroughputExceededException
If you are not using the SDK, you can:
- Reduce request frequency
- Use ‘Exponential Backoff’
T or F
True
What is exponential Backoff?
-many components in a network can generate errors due to being overloaded
T or F
True
What is exponential Backoff?
In addition to simple retries, all AWS SDKs use ‘Exponential Backoff’
T or F
True
What is exponential Backoff?
Progressively longer waits between consecutive retries
ie: 50ms, 100ms, 200ms… for improved flow control
T or F
True
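A minimal sketch of exponential backoff, assuming a 50ms base delay that doubles on each retry as in the card above. `backoff_delays` and `call_with_backoff` are hypothetical helpers, and `RuntimeError` is only a stand-in for a throttling error; the AWS SDKs implement their own retry logic:

```python
import time

def backoff_delays(base_ms=50, factor=2, max_retries=5):
    """Return progressively longer waits: 50ms, 100ms, 200ms, ..."""
    return [base_ms * factor ** attempt for attempt in range(max_retries)]

def call_with_backoff(operation, max_retries=5, base_ms=50, sleep=time.sleep):
    """Retry an operation, waiting longer after each failure."""
    for delay_ms in backoff_delays(base_ms, max_retries=max_retries):
        try:
            return operation()
        except RuntimeError:       # stand-in for a throttling error
            sleep(delay_ms / 1000)
    return operation()             # final attempt; let any error propagate
```

Passing a custom `sleep` makes the waits observable without actually pausing, which is handy when checking the retry schedule.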
What is exponential Backoff?
If after 1 min this doesn’t work, your request size may be exceeding the throughput for your read/write capacity
True
T or F
If you see a ProvisionedThroughputExceeded error, this means the number of requests is too low.
False
If you see a ProvisionedThroughputExceeded error, this means the number of requests is too high.
T or F
Exponential Backoff improves flow by retrying requests using progressively longer waits.
True