Databases Flashcards

1
Q

What is a relational database and the types available on AWS?

A

Relational database is a database with tables, and each table has rows and columns
- On AWS, you can use SQL Server, Oracle, MySQL Server, Amazon Aurora, MariaDB

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Two key features of Relational Database Services (RDSs) on AWS

A
  1. Multi AZ: RDSs on AWS have their own DNS addresses, which point you to your database. If the region goes down where that DB is, they will automatically reroute that DNS to the backed up version of the DB in another region
  2. Read replicas: every time we write to a DB, it is replicated in a Read Replica DB. If the original DB goes down, we have to reroute traffic to the Read Replica DB manually through a new DNS
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Non-relational Databases

A
  • Collection (a table)
  • Inside the collection, we have Documents (a row)
  • Key-value pairs (fields/columns)
  • You can add new key-value pairs that won’t create null data as would create a new column in a RDS for one row
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Data Warehousing

A
  • Used for business intelligence. E.g. Cognos, Jaspersoft, SQL Server
  • Used to pull in very large and complex data sets. Usually used by management to do queries on data
  • Data Warehousing databases use different types of architecture, both from a database perspective and infrastructure layer. AWS’s answer to Online Analytics Processing (OLAP) is Redshift
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

ElastiCache

A
  • Web service that makes it easy to deploy, operate, and scale an in-memory cache in the cloud
  • Improves the performance of web applications by allowing you to retrieve information from fast, managed, in-memory caches, instead of relying entirely on slower disk-based databases
  • E.g. cache most common web queries, instead of pulling from the DB each time. Takes a huge load off databases
  • ElastiCache supports two, open-source in-memory caching engines: Memcached and Redis
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

AWS RDS (general)

A
  • RDS runs on virtual machines (but you don’t have access to these VMs, you cannot SSH in)
  • Patching of the RDS OS and DB is Amazon’s responsibility
  • RDS is NOT serverless (except for Aurora)
  • Used for Online Transaction Processing (OLTP)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

RDS Back-ups

A
  1. Automated backups: recover the DB to any point in time within a “retention period” (1 to 35 days). Point-in-time recovery down to the second within the retention period.
    - Enabled by default
    - Stored on S3 for free
    - Backups taken in a defined window
  2. Database snapshots: done manually and stored even after you delete the original RDS instance
    Restoring backups:
    - For both automated and snapshots, the restored version will be a new RDS instance with a new DNS endpoint
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

RDS encryption

A
  • Available for all RDS types
  • Done with KMS
  • Once your RDS is encrypted, the data stored at rest in the underlying storage is encrypted, as are its automated backups, read replicas and snapshots
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

RDS Multi-AZ

A
  • Allows you to have an exact copy of your production DB in another AZ.
  • Handled by AWS, synchronised to a standby database
  • In the event of DB maintenance, instance failure or AZ failure, RDS will automatically failover to the standby
  • Multi-AZ is for disaster recover only
  • Available for all 6 DBs except Aurora, which in itself is fault tolerent
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

RDS Read Replica

A
  • Allows a read-only copy of your production DB. Asynchronous replication from the primary RDS instance to the read replica
  • Available for all 6 DB types
  • Used for scaling (not DR)
  • Can have up to 5 read replicas of any DB
  • Each read replica can have its own DNS endpoint
  • Can be multi-az
  • Can be promoted to their own DB
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

DynamoDB

A
  • NoSQL DB service for applications that need consistent, single-digit millisecond latency at any scale
  • Supports both document and key-value data models
  • Great for mobile, web, gaming, ad-tech, IoT etc
  • Basics:
    • Stored on SSD (hence its speed)
    • Spread across 3 geographical data centres
    • Eventual consistent reads (default, consistency across all copies of the data within a second)
    • Strongly consistent reads (if you need reads faster than 1 second)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

DynamoDB Accelerator (DAX)

A
  • Highly available, in-memory cache
  • Caching service that sits between your applications and DynamoDB
  • 10x performance improvement
  • Reduces request time from milliseconds to microseconds - even under load
  • No need for developers to manage caching logic
  • Compatible with DynamoDB API calls
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

DynamoDB Transactions

A
  • Multiple “all-or-nothing” operations
  • E.g. financial operations, fulfilling orders, addition and subtraction at the same time
  • Two underlying reads or writes - prepare/commit
  • Can carry out 25 items or 4MB of data at one time
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

DynamoDB on-demand capacity

A
  • Pay-per-request pricing
  • Balances cost and performance automatically
  • No minimum capacity
    No charge for read/write - only storage and backups
  • But you pay more per request than provisioned capacity, therefore best for scenarios such as product launches
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

DynamoDB backup and restore

A
  • Full backups at any time
  • Zero impact on table performance or availability (unlike backups of traditional RDSs)
  • Consistent within seconds and retained until deleted
  • Operates within same region as the source table
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

DynamoDB Point-in-time Recovery

A
  • Protects against accidental writes or deletes
  • Restore to any point in the last 35 days
  • Incremental backups
  • Not enabled by default
  • Latest restorable: five minutes in the past
17
Q

DynamoDB Streams

A
  • Time-ordered sequence of item-level changes in a table
  • Stream records are individual pieces of data flowing through in shards
  • Stored for up to 24 hours
  • Inserts, updates and deletes
  • Combine with Lambda functions for functionality like stored procedures
18
Q

Global tables

A
  • Managed multi-master, multi-region replication
  • Globally distributed applications
  • Based on DynamoDB streams
  • Multi-region redundancy for DR and HA
  • No application rewrites
  • Replication latency under one second
19
Q

Database Migration Service

A
  • Enables easy migration from a source DB (e.g. on-prem, S3, Oracle etc.) to a target DB on AWS (DynamoDB, Aurora, Redshit)
  • All the while the source database will remain operational
20
Q

Database security

A
  • Encryption at rest using KMS
  • Site-to-site VPN
  • Direct Connect
  • IAM policies and roles
  • Fine-grained access
  • CloudWatch and CloudTrail
  • VPC endpoints
21
Q

Redshift

A
  • A way of doing BI/data warehousing in the cloud
  • Petabyte-scale data warehouse service in the cloud
  • Apparently less than a tenth the cost of most other data warehousing solutions
  • Online Transaction Processing (database) - where we can get an answer to a query from one row, such as looking up a particular transaction
  • Online Analytics Processing (BI, data warehouse) - where we need to perform transformations on the data to satisfy our query
  • Data Warehouses use different types of architecture both from database perspective and infrastructure layer
  • Can be configured as single node or multi-node (leader node with compute nodes)
22
Q

Redshift advanced compression

A
  • Columnar data stores can be compressed much more than row-based data stores because similar data is stored sequentially on disk
  • Redshift employs multiple compression techniques and can achieve significant compression compared to other data stores
  • Redshift doesn’t require indexes or materialised views, and so uses less space than traditional RDSs
  • When loading data into a table, Redshift automatically samples your data and selects the most appropriate compression scheme
23
Q

Redshift Massively Parallel Processing (MPP)

A

Redshift automatically distributes data and query load across all nodes
- It makes it easy to add nodes to your data warehouse and enables you to maintain fast query performance as your data warehouse grows (easy to scale)

24
Q

Redshift backups

A
  • Enabled by default
  • 1 - 35 days available
  • Always attempts to maintain at least three copies of your data (original, replica on compute nodes, and backup on S3)
  • Redshift can also asynchronously replication your snapshots to S3 in another region for DR
25
Q

Redshift pricing, security and availability

A

Pricing:
- charged for compute node hours
- charged for backups
- charged for data transfer
Security:
- encrypted in transit using SSL
- encrypted at rest using AES-256 encryption
- by default, Redshift takes care of key management
Availability:
- Only in one AZ at a time
- Can restore snapshots to new AZs in the event of an outage

26
Q

Amazon Aurora

A
  • A MySQL and PostgreSQL-compatible DB - Five times better performance than MySQL and 3 times better than PostgreSQL DBs at a much lower price point
  • Starts with 10GB, scales in 10GB increments to 64TB (autoscaling)
  • Compute resources can scale up to 32vCPUs and 244GB of Memory
  • 2 copies of your data is contained in each AZ, with a minimum of 3 AZ (i.e. 6 copies of your data)
27
Q

Scaling Aurora

A
  • Designed to transparently handle the loss of up to two copies of data without affecting DB write availability and up to three copies without affecting read availability
  • Storage is also “self-healing”, so data blocks and disks are continuously scanned for errors and repaired automatically
28
Q

Types of Aurora replicas

A
  • Aurora replicas (15 available per DB)
  • MySQL Read Replicas (5)
  • PostgreSQL (1)
29
Q

Aurora backups

A
  • Automated backups are always enabled and don’t impact performance
  • Can also take snapshots which doesn’t impact performance
  • Can share Aurora snapshots with other AWS accounts
30
Q

Aurora serverless

A
  • On-demand, autoscaling configuration for the MySQL-compatible and PostgreSQL-compatible editions of Aurora
  • An Aurora Serverless DB cluster automatically starts up, shuts down and scales capacity up or down based on your app’s needs
  • Provides a cost-effective option for infrequent, intermittent or unpredictable workloads
  • Only pay on an invocation basis (i.e. when someone accesses your app/website), not per hr/min/sec
31
Q

Elasticache

A
  • A web service that makes it easy to deploy, operate and scale an in-memory cache in the cloud. The service improves the performance of web applications by allowing you to retrieve information from fast, managed, in-memory caches, instead of relying on slower disk-based DBs
  • E.g. 10 most purchased items on amazon.com is stored on Elasticache
  • Supports two open-source in-memory caching engines:
    • Memcached (simple cache to offload DB; multithreaded performance)
    • Redis (for more advanced caching capabilities, such as ranking/sorting, persistance, multi AZ etc)
32
Q

Database Migration Service

A
  • Service that makes it easy to migrate relational DBs, warehouses, NoSQL DBs and other types of data stores
  • Used to migrate data into AWS, between on-prem instances, or between combinations of cloud and on-prem setups
  • Basically is a server on AWS that runs application software in between your source and target connections
  • Also creates the tables and primary keys if they don’t exist on the target, or can pre-create them manually
  • Supports homogenous (e.g. Oracle on-prem to Oracle on-prem) and heterogenous (e.g. MySQL to Aurora) migrations
33
Q

Caching strategies on AWS

A

The following services have caching capabilities:

  • CloudFront
  • API Gateway
  • ElastiCache
  • DynamoDB Accelerator

Caching leads to less latency for the end user. It is a balancing act between up-to-date, accurate information and latency

34
Q

EMR

A
  • Big data solution for using tools such as Apache Spark, Hive, HBase, Hudi etc
  • Can run petabyte-scale analysis at less than half the cost of traditional on-prem solutions and over three times faster than standard Apache Spark
  • Central component of EMR is the cluster:
    • Each node has a role within the cluster
    • Master node: manages the cluster
    • Core node: with software components that run tasks and stores data
    • Task node: (optional) only runs tasks and does not store data in HDFS
  • Can configure cluster to periodically archive the log files stored on the master node on S3 (only when you set up the cluster)