Databases Flashcards

Question 1

Q

What is a relational database and the types available on AWS?

Answer

A

Relational database is a database with tables, and each table has rows and columns
- On AWS, you can use SQL Server, Oracle, MySQL Server, Amazon Aurora, MariaDB

Question 2

Q

Two key features of Relational Database Services (RDSs) on AWS

Answer

A

Multi AZ: RDSs on AWS have their own DNS addresses, which point you to your database. If the region goes down where that DB is, they will automatically reroute that DNS to the backed up version of the DB in another region
Read replicas: every time we write to a DB, it is replicated in a Read Replica DB. If the original DB goes down, we have to reroute traffic to the Read Replica DB manually through a new DNS

Question 3

Q

Non-relational Databases

Answer

A

Collection (a table)
Inside the collection, we have Documents (a row)
Key-value pairs (fields/columns)
You can add new key-value pairs that won’t create null data as would create a new column in a RDS for one row

Question 4

Q

Data Warehousing

Answer

A

Used for business intelligence. E.g. Cognos, Jaspersoft, SQL Server
Used to pull in very large and complex data sets. Usually used by management to do queries on data
Data Warehousing databases use different types of architecture, both from a database perspective and infrastructure layer. AWS’s answer to Online Analytics Processing (OLAP) is Redshift

Question 5

Q

ElastiCache

Answer

A

Web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud
Improves the performance of web applications by allowing you to retrieve information from fast, managed, in-memory caches, instead of relying entirely on slower disk-based databases
E.g. cache most common web queries, instead of pulling from the DB each time. Takes a huge load off databases
ElastiCache supports two, open-source in-memory caching engines: Memcached and Redis

Question 6

Q

AWS RDS (general)

Answer

A

RDS runs on virtual machines (but you don’t have access to these VMs, you cannot SSH in)
Patching of the RDS OS and DB is Amazon’s responsibility
RDS is NOT serverless (except for Aurora)
Used for Online Transaction Processing (OLTP)

Question 7

Q

RDS Back-ups

Answer

A

Automated backups: recover the DB to any point in time within a “retention period” (1 to 35 days). Point-in-time recovery down to the second within the retention period.
- Enabled by default
- Stored on S3 for free
- Backups taken in a defined window
Database snapshots: done manually and stored even after you delete the original RDS instance
Restoring backups:
- For both automated and snapshots, the restored version will be a new RDS instance with a new DNS endpoint

Question 8

Q

RDS encryption

Answer

A

Available for all RDS types
Done with KMS
Once your RDS is encrypted, the data stored at rest in the underlying storage is encrypted, as are its automated backups, read replicas and snapshots

Question 9

Q

RDS Multi-AZ

Answer

A

Allows you to have an exact copy of your production DB in another AZ.
Handled by AWS, synchronised to a standby database
In the event of DB maintenance, instance failure or AZ failure, RDS will automatically failover to the standby
Multi-AZ is for disaster recover only
Available for all 6 DBs except Aurora, which in itself is fault tolerent

Question 10

Q

RDS Read Replica

Answer

A

Allows a read-only copy of your production DB. Asynchronous replication from the primary RDS instance to the read replica
Available for all 6 DB types
Used for scaling (not DR)
Can have up to 5 read replicas of any DB
Each read replica can have its own DNS endpoint
Can be multi-az
Can be promoted to their own DB

Question 11

Q

DynamoDB

Answer

A

NoSQL DB service for applications that need consistent, single-digit millisecond latency at any scale
Supports both document and key-value data models
Great for mobile, web, gaming, ad-tech, IoT etc
Basics:
- Stored on SSD (hence its speed)
- Spread across 3 geographical data centres
- Eventual consistent reads (default, consistency across all copies of the data within a second)
- Strongly consistent reads (if you need reads faster than 1 second)

Question 12

Q

DynamoDB Accelerator (DAX)

Answer

A

Highly available, in-memory cache
Caching service that sits between your applications and DynamoDB
10x performance improvement
Reduces request time from milliseconds to microseconds - even under load
No need for developers to manage caching logic
Compatible with DynamoDB API calls

Question 13

Q

DynamoDB Transactions

Answer

A

Multiple “all-or-nothing” operations
E.g. financial operations, fulfilling orders, addition and subtraction at the same time
Two underlying reads or writes - prepare/commit
Can carry out 25 items or 4MB of data at one time

Question 14

Q

DynamoDB on-demand capacity

Answer

A

Pay-per-request pricing
Balances cost and performance automatically
No minimum capacity
No charge for read/write - only storage and backups
But you pay more per request than provisioned capacity, therefore best for scenarios such as product launches

Question 15

Q

DynamoDB backup and restore

Answer

A

Full backups at any time
Zero impact on table performance or availability (unlike backups of traditional RDSs)
Consistent within seconds and retained until deleted
Operates within same region as the source table

Question 16

Q

DynamoDB Point-in-time Recovery

Answer

A

Protects against accidental writes or deletes
Restore to any point in the last 35 days
Incremental backups
Not enabled by default
Latest restorable: five minutes in the past

Question 17

Q

DynamoDB Streams

Answer

A

Time-ordered sequence of item-level changes in a table
Stream records are individual pieces of data flowing through in shards
Stored for up to 24 hours
Inserts, updates and deletes
Combine with Lambda functions for functionality like stored procedures

Question 18

Q

Global tables

Answer

A

Managed multi-master, multi-region replication
Globally distributed applications
Based on DynamoDB streams
Multi-region redundancy for DR and HA
No application rewrites
Replication latency under one second

Question 19

Q

Database Migration Service

Answer

A

Enables easy migration from a source DB (e.g. on-prem, S3, Oracle etc.) to a target DB on AWS (DynamoDB, Aurora, Redshit)
All the while the source database will remain operational

Question 20

Q

Database security

Answer

A

Encryption at rest using KMS
Site-to-site VPN
Direct Connect
IAM policies and roles
Fine-grained access
CloudWatch and CloudTrail
VPC endpoints

Question 21

Q

Redshift

Answer

A

A way of doing BI/data warehousing in the cloud
Petabyte-scale data warehouse service in the cloud
Apparently less than a tenth the cost of most other data warehousing solutions
Online Transaction Processing (database) - where we can get an answer to a query from one row, such as looking up a particular transaction
Online Analytics Processing (BI, data warehouse) - where we need to perform transformations on the data to satisfy our query
Data Warehouses use different types of architecture both from database perspective and infrastructure layer
Can be configured as single node or multi-node (leader node with compute nodes)

Question 22

Q

Redshift advanced compression

Answer

A

Columnar data stores can be compressed much more than row-based data stores because similar data is stored sequentially on disk
Redshift employs multiple compression techniques and can achieve significant compression compared to other data stores
Redshift doesn’t require indexes or materialised views, and so uses less space than traditional RDSs
When loading data into a table, Redshift automatically samples your data and selects the most appropriate compression scheme

Question 23

Q

Redshift Massively Parallel Processing (MPP)

Answer

A

Redshift automatically distributes data and query load across all nodes
- It makes it easy to add nodes to your data warehouse and enables you to maintain fast query performance as your data warehouse grows (easy to scale)

Question 24

Q

Redshift backups

Answer

A

Enabled by default
1 - 35 days available
Always attempts to maintain at least three copies of your data (original, replica on compute nodes, and backup on S3)
Redshift can also asynchronously replication your snapshots to S3 in another region for DR

Question 25

Q

Redshift pricing, security and availability

Answer

A

Pricing:
- charged for compute node hours
- charged for backups
- charged for data transfer
Security:
- encrypted in transit using SSL
- encrypted at rest using AES-256 encryption
- by default, Redshift takes care of key management
Availability:
- Only in one AZ at a time
- Can restore snapshots to new AZs in the event of an outage

Question 26

Q

Amazon Aurora

Answer

A

A MySQL and PostgreSQL-compatible DB - Five times better performance than MySQL and 3 times better than PostgreSQL DBs at a much lower price point
Starts with 10GB, scales in 10GB increments to 64TB (autoscaling)
Compute resources can scale up to 32vCPUs and 244GB of Memory
2 copies of your data is contained in each AZ, with a minimum of 3 AZ (i.e. 6 copies of your data)

Question 27

Q

Scaling Aurora

Answer

A

Designed to transparently handle the loss of up to two copies of data without affecting DB write availability and up to three copies without affecting read availability
Storage is also “self-healing”, so data blocks and disks are continuously scanned for errors and repaired automatically

Question 28

Q

Types of Aurora replicas

Answer

A

Aurora replicas (15 available per DB)
MySQL Read Replicas (5)
PostgreSQL (1)

Question 29

Q

Aurora backups

Answer

A

Automated backups are always enabled and don’t impact performance
Can also take snapshots which doesn’t impact performance
Can share Aurora snapshots with other AWS accounts

Question 30

Q

Aurora serverless

Answer

A

On-demand, autoscaling configuration for the MySQL-compatible and PostgreSQL-compatible editions of Aurora
An Aurora Serverless DB cluster automatically starts up, shuts down and scales capacity up or down based on your app’s needs
Provides a cost-effective option for infrequent, intermittent or unpredictable workloads
Only pay on an invocation basis (i.e. when someone accesses your app/website), not per hr/min/sec

Question 31

Q

Elasticache

Answer

A

A web service that makes it easy to deploy, operate and scale an in-memory cache in the cloud. The service improves the performance of web applications by allowing you to retrieve information from fast, managed, in-memory caches, instead of relying on slower disk-based DBs
E.g. 10 most purchased items on amazon.com is stored on Elasticache
Supports two open-source in-memory caching engines:
- Memcached (simple cache to offload DB; multithreaded performance)
- Redis (for more advanced caching capabilities, such as ranking/sorting, persistance, multi AZ etc)

Question 32

Q

Database Migration Service

Answer

A

Service that makes it easy to migrate relational DBs, warehouses, NoSQL DBs and other types of data stores
Used to migrate data into AWS, between on-prem instances, or between combinations of cloud and on-prem setups
Basically is a server on AWS that runs application software in between your source and target connections
Also creates the tables and primary keys if they don’t exist on the target, or can pre-create them manually
Supports homogenous (e.g. Oracle on-prem to Oracle on-prem) and heterogenous (e.g. MySQL to Aurora) migrations

Question 33

Q

Caching strategies on AWS

Answer

A

The following services have caching capabilities:

CloudFront
API Gateway
ElastiCache
DynamoDB Accelerator

Caching leads to less latency for the end user. It is a balancing act between up-to-date, accurate information and latency

Question 34

Q

EMR

Answer

A

Big data solution for using tools such as Apache Spark, Hive, HBase, Hudi etc
Can run petabyte-scale analysis at less than half the cost of traditional on-prem solutions and over three times faster than standard Apache Spark
Central component of EMR is the cluster:
- Each node has a role within the cluster
- Master node: manages the cluster
- Core node: with software components that run tasks and stores data
- Task node: (optional) only runs tasks and does not store data in HDFS
Can configure cluster to periodically archive the log files stored on the master node on S3 (only when you set up the cluster)

Brainscape's Knowledge GenomeTM

Databases Flashcards

Brainscape's Knowledge Genome^TM