Database Services Flashcards

1
Q

What DB can’t be autoscaled?

A

MS SQL Server and Oracle DBs

2
Q

How are read replicas accessed?

A

Via their own endpoint, the same way as any other DB instance in RDS

3
Q

When can RDS be encrypted?

A

You can only enable encryption for an Amazon RDS DB instance when you create it, not after the DB instance is created.
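
As a minimal sketch of what "encrypt at creation" can look like, the helper below assembles the parameters for a create request with encryption switched on. `StorageEncrypted` and `KmsKeyId` are parameter names from the RDS CreateDBInstance API; the helper name, the identifier, the engine, and the instance class are illustrative assumptions. The actual call is left as a comment so the snippet runs without AWS credentials.

```python
from typing import Optional


def build_encrypted_instance_params(
    identifier: str,
    engine: str = "mysql",
    instance_class: str = "db.t3.micro",
    kms_key_id: Optional[str] = None,
) -> dict:
    """Build keyword arguments for creating an encrypted RDS DB instance.

    Hypothetical helper: it only assembles a dict and calls nothing.
    """
    params = {
        "DBInstanceIdentifier": identifier,
        "Engine": engine,
        "DBInstanceClass": instance_class,
        "AllocatedStorage": 20,
        # Encryption has to be requested here, at creation time.
        "StorageEncrypted": True,
    }
    if kms_key_id is not None:
        # Optional: use a specific KMS key instead of the default one.
        params["KmsKeyId"] = kms_key_id
    return params


params = build_encrypted_instance_params("flashcards-db")
# With credentials configured, the dict (plus master credentials) could be
# passed to boto3: boto3.client("rds").create_db_instance(**params)
print(params["StorageEncrypted"])
```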

4
Q

List Aurora Single Master Cluster types

A

Aurora Serverless, parallel query, and Global Database clusters are all single-master clusters

5
Q

Aurora single-master clusters

A

a single DB instance performs all write operations and any other DB instances are read-only.

6
Q

Aurora multi-master DB cluster

A

all DB instances can perform write operations. There isn’t any failover when a writer DB instance becomes unavailable, because another writer DB instance is immediately available to take over the work of the failed instance.

7
Q

Aurora DB cluster

A

An Aurora DB cluster consists of one or more DB instances and a cluster volume that manages the data for those DB instances. An Aurora cluster volume is a virtual database storage volume that spans multiple Availability Zones, with each Availability Zone having a copy of the DB cluster data. Two types of DB instances make up an Aurora DB cluster:

Primary DB instance – Supports read and write operations, and performs all of the data modifications to the cluster volume. Each Aurora DB cluster has one primary DB instance.

Aurora Replica – Connects to the same storage volume as the primary DB instance and supports only read operations. Each Aurora DB cluster can have up to 15 Aurora Replicas in addition to the primary DB instance. Maintain high availability by locating Aurora Replicas in separate Availability Zones. Aurora automatically fails over to an Aurora Replica in case the primary DB instance becomes unavailable. Aurora Replicas can also offload read workloads from the primary DB instance.
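
The split between one writable primary and read-only replicas can be sketched as a small router. This is an illustration only: the class, the instance names, and the "starts with SELECT" test are assumptions, not an Aurora API.

```python
from itertools import cycle


class ClusterRouter:
    """Route writes to the primary and spread reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        # Fall back to the primary for reads when there are no replicas.
        self._readers = cycle(self.replicas or [primary])

    def route(self, sql):
        """Return the name of the instance that should run this statement."""
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._readers) if is_read else self.primary


router = ClusterRouter("primary-1", ["replica-a", "replica-b"])
for statement in ("SELECT 1", "INSERT INTO t VALUES (1)", "SELECT 2"):
    print(statement, "->", router.route(statement))
```

Reads rotate round-robin across the replicas, which is one simple way to offload read work from the primary.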

8
Q

Amazon RDS Read Replicas

A

Amazon RDS Read Replicas provide enhanced performance and durability for RDS database (DB) instances. They make it easy to elastically scale out beyond the capacity constraints of a single DB instance for read-heavy database workloads.

Amazon RDS replicates all databases in the source DB instance.

8
Q

Amazon RDS Read Replicas - Where can they be deployed?

A

Read replicas can be within an Availability Zone, Cross-AZ, or Cross-Region.

9
Q

Redshift

A

Redshift is a columnar data warehouse DB that is ideal for running long, complex queries. Redshift can also improve performance for repeat queries by caching the result and returning the cached result when queries are re-run. Dashboard, visualization, and business intelligence (BI) tools that execute repeat queries see a significant boost in performance due to result caching.
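
The idea behind result caching can be illustrated with a toy cache keyed by query text. This is a conceptual sketch, not a description of how the service implements it; `ResultCache` and `slow_query` are made-up names.

```python
class ResultCache:
    """Toy result cache keyed by the exact query text."""

    def __init__(self, run_query):
        self._run_query = run_query  # callable that actually executes SQL
        self._results = {}
        self.hits = 0
        self.misses = 0

    def execute(self, sql):
        if sql in self._results:
            self.hits += 1
            return self._results[sql]
        self.misses += 1
        result = self._run_query(sql)
        self._results[sql] = result
        return result

    def invalidate(self):
        """Drop cached results, e.g. after the underlying data changes."""
        self._results.clear()


def slow_query(sql):
    # Stand-in for a long-running warehouse query.
    return sum(range(1_000_000))


cache = ResultCache(slow_query)
cache.execute("SELECT sum(x) FROM sales")
cache.execute("SELECT sum(x) FROM sales")  # served from the cache
print(cache.hits, cache.misses)
```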

10
Q

What are foundational technologies used in Redshift?

A

OLAP (online analytic processing)
SQL
Columnar data storage
Massively Parallel Processing (MPP) by distributing data and queries across all nodes
PostgreSQL compatible with JDBC and ODBC drivers available
EC2
HDD or SSD storage options (160 GB+ per node)

11
Q

What are features of Redshift?

A

Analyze all your data using standard SQL and existing Business Intelligence (BI) tools
Clustered petabyte-scale data warehouse
Query directly from data files on S3 via Redshift Spectrum
Provides advanced compression - compression scheme selected automatically

12
Q

Amazon Redshift Spectrum

A

a feature of Amazon Redshift that enables you to run queries against exabytes of unstructured data in Amazon S3, with no loading or ETL required.

13
Q

Redshift Leader node

A
Manages client connections and receives queries.
Simple SQL end-point.
Stores metadata.
Optimizes query plan.
Coordinates query execution.
14
Q

Redshift Compute Node

A

Stores data and performs queries and computations.
Local columnar storage.
Parallel/distributed execution of all queries, loads, backups, restores, resizes.
Up to 128 compute nodes.
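
The leader/compute split can be sketched as a scatter-gather: rows are distributed across nodes, each node computes a partial result, and the leader merges them. This is a simplified illustration using threads in one process; the function names and the hash-based distribution are assumptions, not the service's internals.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, node_count):
    """Assign each (key, value) row to a node slice by key."""
    slices = defaultdict(list)
    for key, value in rows:
        slices[hash(key) % node_count].append((key, value))
    return [slices[i] for i in range(node_count)]


def compute_node_sum(node_rows):
    """Work done on one compute node: a partial sum per key."""
    partial = defaultdict(int)
    for key, value in node_rows:
        partial[key] += value
    return dict(partial)


def leader_sum_by_key(rows, node_count=4):
    """Leader role: fan the work out, then merge the partial results."""
    slices = distribute(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(compute_node_sum, slices))
    merged = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            merged[key] += value
    return dict(merged)


sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3)]
print(leader_sum_by_key(sales))
```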

15
Q

Explain how Redshift offers 1. Durability and 2. High Availability

A
  1. Durability: replication and continuous backups
  2. High availability: automatic recovery from component and node failures

16
Q

What is the scope of Redshift?

A

Redshift is an AZ-scoped service: a cluster runs in a single AZ. To run across multiple AZs, load data into two Amazon Redshift data warehouse clusters in separate AZs from the same set of Amazon S3 input files.

17
Q

How can you stand up Redshift in another AZ when your cluster runs in a single AZ?

A

Restore a Redshift snapshot in the second AZ

18
Q

How does Redshift keep copies of data?

A
  1. The original.
  2. A replica on the compute nodes (within the cluster).
  3. A backup copy on S3.
19
Q

How does Redshift provide continuous/incremental backups?

A

Multiple copies within a cluster.
Continuous and incremental backups to S3.
Continuous and incremental backups across regions.
Streaming restore.
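
What "incremental" means can be shown with a conceptual sketch: after the first full backup, only blocks whose content changed are copied again. This is not how the service implements backups; the block model, the digests, and the function names are all assumptions for illustration.

```python
import hashlib


def block_digest(block: bytes) -> str:
    """Fingerprint a block so changes can be detected cheaply."""
    return hashlib.sha256(block).hexdigest()


def incremental_backup(blocks, previous_digests):
    """Return (changed_blocks, new_digests) relative to the last backup.

    blocks: dict of block id -> bytes currently stored.
    previous_digests: dict of block id -> digest recorded by the last backup.
    """
    new_digests = {bid: block_digest(data) for bid, data in blocks.items()}
    changed = {
        bid: blocks[bid]
        for bid, digest in new_digests.items()
        if previous_digests.get(bid) != digest
    }
    return changed, new_digests


blocks = {0: b"alpha", 1: b"beta"}
first, digests = incremental_backup(blocks, {})        # first run copies everything
blocks[1] = b"beta-v2"
second, digests = incremental_backup(blocks, digests)  # later runs copy only changes
print(sorted(first), sorted(second))
```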

20
Q

Redshift provides fault tolerance for what failures?

A

Disk failures

Node failures - when a node fails, the data warehouse cluster is unavailable for queries and updates until a replacement node is provisioned and added to the DB

Network failures

AZ/region level disasters

21
Q

How do you achieve high availability for Redshift?

A

Currently, Redshift does not support Multi-AZ deployments.
The best HA option is to use a multi-node cluster, which supports data replication and node recovery.
A single-node Redshift cluster does not support data replication, and you’ll have to restore from a snapshot on S3 if a drive fails.

22
Q

How is disaster recovery implemented in Redshift?

A

Redshift can asynchronously replicate your snapshots to S3 in another region for DR.

23
Q

How do you scale Redshift?

A

Scaling requires a period of unavailability of a few minutes (typically during the maintenance window).

During scaling operations Redshift moves data in parallel from the compute nodes in your existing data warehouse cluster to the compute nodes in your new cluster.

24
Q

How does Redshift charging work?

A

Charged per compute node hour: 1 unit per node per hour (compute nodes only, not the leader node).

Backup storage – storage on S3.

Data transfer – no charge for data transfer between Redshift and S3 within a region; other scenarios may incur charges.
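
The compute part of the bill reduces to node-hours times a rate. A minimal sketch, assuming a placeholder rate (not a real price) and a hypothetical function name:

```python
def redshift_compute_cost(compute_nodes, hours, rate_per_node_hour):
    """Estimate compute charges: one billable unit per compute node per hour.

    The leader node is not counted. The rate is a placeholder, not a real price.
    """
    if compute_nodes < 0 or hours < 0:
        raise ValueError("node count and hours must be non-negative")
    node_hours = compute_nodes * hours
    return node_hours * rate_per_node_hour


# Four compute nodes running for a 30-day month at a made-up rate of 0.25.
print(redshift_compute_cost(4, 24 * 30, 0.25))
```

Backup storage and data transfer would be added on top of this figure.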

25
Q

Amazon ElastiCache

A

Fully managed implementations of two popular in-memory data stores – Redis and Memcached

In-memory key/value store – not persistent in the traditional sense.

26
Q

How is ElastiCache Billed?

A

Billed by node size and hours of use.

27
Q

What are notable constraints of ElastiCache?

A
  1. ElastiCache EC2 nodes can’t be accessed across the internet or from outside the VPC.
  2. Nodes can be On-Demand or Reserved Instances, but not Spot Instances.
  3. You cannot move an existing Amazon ElastiCache cluster from outside a VPC into a VPC.
28
Q

What are subnet groups in the context of ElastiCache?

A

Subnet groups are a collection of subnets designated for your Amazon ElastiCache Cluster.
You need to configure subnet groups for ElastiCache in the VPC that hosts the EC2 instances and the ElastiCache cluster.

29
Q

Cache Security Groups

A

When not using a VPC, Amazon ElastiCache allows you to control access to your clusters through Cache Security Groups (you need to link the corresponding EC2 Security Groups).

30
Q

ElastiCache Cluster

A

a collection of one or more nodes using the same caching engine.

31
Q

Memcached

A

One of two ElastiCache engines

simplest model, can run large nodes with multiple cores/threads, can be scaled in and out, can cache objects such as DB query results

32
Q

Redis, list features

A

One of two ElastiCache engines

complex model, supports encryption, master / slave replication, cross AZ (HA), automatic failover and backup/restore.

33
Q

Typical use cases for ElastiCache

A

Web Session Store - With load-balanced web servers, store web session information in Redis so that if a server is lost, the session info survives and another web server can pick it up

Database Caching - Use Memcached in front of AWS RDS to cache popular queries to offload work from RDS and return results faster to users

Leaderboards - Use Redis to provide a live leaderboard for millions of users of your mobile app

Streaming data for dashboards - Provide a landing spot for your streaming sensor data on the factory floor providing live real time dashboard displays
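
The database caching use case above is commonly built as a cache-aside read path. Below is a minimal sketch in which a plain dict stands in for the cache client and another dict for the database; the class name, the TTL, and the injectable clock are assumptions made for illustration.

```python
import time


class CacheAside:
    """Cache-aside reads: check the cache first, fall back to the database."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load_from_db = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # stand-in for a Memcached/Redis client
        self.db_reads = 0

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]  # cache hit
        # Cache miss or expired entry: read the database and repopulate.
        self.db_reads += 1
        value = self._load_from_db(key)
        self._store[key] = (value, self._clock() + self._ttl)
        return value


fake_db = {"user:1": "Ada", "user:2": "Grace"}
cache = CacheAside(fake_db.get)
cache.get("user:1")
cache.get("user:1")  # second read does not touch the database
print(cache.db_reads)
```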

34
Q

When should you use Memcached instead of Redis?

A

Simple, no frills
You need to scale out and in as demand changes
You need to run multiple CPU cores/threads
You need to cache objects (e.g. DB queries)
Don’t need persistent cache

35
Q

When should you use Redis instead of Memcached?

A
You need
persistent cache
encryption
HIPAA compliance
Clustering
Complex data types
HA (e.g. replication)
Pub/Sub capability
Geospatial indexing
Backup and restore
36
Q

Memcached Features

A

Integrate with SNS for node failure/recovery notification
Auto-discovery for nodes added/removed to/from cluster
Place nodes in different AZs

37
Q

Memcached Constraints

A

No support for snapshots

No support for multi AZ failover or replication

38
Q

Aurora Serverless

A

Amazon Aurora Serverless is an on-demand, auto-scaling configuration for Amazon Aurora. The database automatically starts up, shuts down, and scales capacity up or down based on application needs. This is an ideal database solution for infrequently-used applications.

39
Q

How do read replicas replicate data?

Async or sync replication?

A

Amazon RDS creates a second DB instance using a snapshot of the source DB instance. It then uses the engine’s native asynchronous replication to update the read replica whenever there is a change to the source DB instance.
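
The effect of asynchronous replication, a replica seeded from a snapshot that then lags until it applies queued changes, can be simulated in a few lines. This is a toy model with made-up class names, not the engines' actual replication protocol.

```python
from collections import deque


class Source:
    """Source DB: commits locally and queues changes for the replica."""

    def __init__(self):
        self.data = {}
        self.pending = deque()  # changes not yet applied on the replica

    def write(self, key, value):
        self.data[key] = value             # the commit returns immediately
        self.pending.append((key, value))  # shipped to the replica later


class ReadReplica:
    def __init__(self, source):
        # Seeded from a snapshot of the source at creation time.
        self.data = dict(source.data)
        self._source = source

    def apply_pending(self):
        """Catch up by replaying queued changes in commit order."""
        while self._source.pending:
            key, value = self._source.pending.popleft()
            self.data[key] = value


source = Source()
source.write("a", 1)
replica = ReadReplica(source)
source.write("b", 2)
print(replica.data.get("b"))  # not visible yet: the replica is lagging
replica.apply_pending()
print(replica.data.get("b"))
```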

40
Q

How does MultiAZ RDS replicate data?

A

Synchronous replication, except for Aurora, which uses asynchronous replication

41
Q

How does Aurora failover work?

A

Aurora promotes the read replica with the highest priority. Priorities are tiers from 0 to 15, with 0 being the highest

If priority is the same, it promotes the largest replica
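
The selection rule above (lowest tier number first, then largest size) can be written as a single sort key. A minimal sketch; the `Replica` fields and the numeric `size` are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    name: str
    tier: int  # promotion tier, 0 (highest priority) to 15
    size: int  # relative instance size; larger wins a tie


def pick_failover_target(replicas):
    """Choose the replica to promote: lowest tier, then largest size."""
    if not replicas:
        raise ValueError("no replica available to promote")
    return min(replicas, key=lambda r: (r.tier, -r.size))


candidates = [
    Replica("replica-a", tier=1, size=8),
    Replica("replica-b", tier=0, size=2),
    Replica("replica-c", tier=0, size=4),
]
print(pick_failover_target(candidates).name)
```

Here `replica-c` is chosen: it shares tier 0 with `replica-b` but is larger.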

42
Q

Amazon DynamoDB

A

a fully managed, serverless, durable key-value and document database with built-in security, backup and restore, and in-memory caching for internet-scale applications

43
Q

DAX

A

DynamoDB Accelerator (DAX) is a DynamoDB-compatible caching service that enables you to benefit from fast in-memory performance for demanding applications

44
Q

AWS Database Migration Service

A

helps you migrate databases to AWS securely

source database remains fully operational during the migration

supports homogeneous (x->x) and heterogeneous (x->y) migrations

45
Q

AWS Schema Conversion Tool

A

makes heterogeneous database migrations predictable by automatically converting the source database schema and a majority of the database code objects, including views, stored procedures, and functions, to a format compatible with the target database

46
Q

AWS Systems Manager

A

service that you can use to view and control your infrastructure on AWS or on premises

Using the Systems Manager console, you can view operational data from multiple AWS services (e.g. patch levels) and automate operational tasks across your AWS resources (e.g. patches)

helps you maintain security and compliance by scanning your managed nodes and reporting on (or taking corrective action on) any policy violations it detects

47
Q

DynamoDB global tables

A

replicate your data automatically across your choice of AWS Regions and automatically scale capacity to accommodate your workloads

48
Q

How does DynamoDB provide high availability and data durability?

A

DynamoDB synchronously replicates data across three facilities in an AWS Region