Databases 101 Flashcards

Relational Databases, OLTP vs. OLAP

1
Q

What is a relational database analogous to?

A

A traditional spreadsheet

2
Q

What does RDS stand for?

A

Relational Database Service

3
Q

What are the 6 flavors of relational database engines available on Amazon RDS?

A
  • SQL Server
  • MySQL
  • Aurora
  • Oracle
  • PostgreSQL
  • MariaDB
4
Q

What are the 2 key features of RDS? What are they primarily for?

A
  • Multi-AZ (for disaster recovery)
  • Read Replicas (for performance)
5
Q

Does RDS Multi-AZ require changing the connection string?

A

No. If you lose access to your primary database, AWS automatically points incoming requests to your secondary database without any change to the connection string

6
Q

How do Read Replicas work in RDS?

A
  • There is no automatic failover. If your primary instance goes down, you’ll need to switch to a new URL (endpoint) to connect to the secondary instance
  • All writes to the primary instance are copied over to the read replica(s)
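
One way an application might take advantage of read replicas is to send writes to the primary and spread reads across the replicas. The sketch below is only an illustration of that idea: the `ReplicaRouter` class and the hostnames are hypothetical, and it decides read versus write with a simple `SELECT` prefix check rather than anything RDS provides.

```python
import itertools


class ReplicaRouter:
    """Send writes to the primary and spread reads across read replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary for reads if no replicas are configured.
        self._read_cycle = itertools.cycle(replicas or [primary])

    def endpoint_for(self, sql):
        # Treat SELECT statements as reads; everything else is a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._read_cycle) if is_read else self.primary


# Hypothetical endpoints, for illustration only.
router = ReplicaRouter(
    primary="primary.example.internal",
    replicas=["replica-1.example.internal", "replica-2.example.internal"],
)
```

Because the router always returns the primary for writes, losing the primary still requires a manual change, which matches the "no automatic failover" point on this card.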
7
Q

Suppose you want to be able to handle a surge of incoming traffic to your RDS instance and scale out your application. What key feature of RDS would you use to do this?

A

Read Replicas

8
Q

What is the maximum number of read replicas that can be made from a single primary RDS instance?

A

up to 5 copies

9
Q

What is a non-relational database best analogous to?

A

A JSON Object
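
As a rough illustration of the spreadsheet and JSON analogies (not tied to any AWS service), the sketch below stores the same made-up customer record first as a fixed-column row and then as a nested JSON object. The table, column, and field names are assumptions chosen for the example.

```python
import json
import sqlite3

# Relational: every row has the same fixed columns, like a spreadsheet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'London')")
row = conn.execute("SELECT id, name, city FROM customers WHERE id = 1").fetchone()

# Non-relational: each record is a self-contained document that can nest
# data and carry fields that other records do not have.
document = json.loads(
    '{"id": 1, "name": "Ada", "address": {"city": "London"}, "tags": ["vip"]}'
)
```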

10
Q

What does OLTP stand for?

A

Online Transaction Processing

11
Q

What does OLAP stand for?

A

Online Analytical Processing

12
Q

What is the difference between OLTP and OLAP?

A
  • The big difference is in the types of queries you run
  • OLTP handles specific, individual transactions (for example, looking up a single order)
  • OLAP pulls in a lot of data at once for analysis
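
The difference in query shape can be sketched with an in-memory SQLite table. This is only an illustration: the `orders` table and its contents are made up, and a real OLAP workload would run on a data warehouse rather than SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (id, region, amount) VALUES (?, ?, ?)",
    [(1, "east", 10.0), (2, "west", 25.0), (3, "east", 5.0)],
)

# OLTP-style: touch one specific record (a single order).
one_order = conn.execute(
    "SELECT region, amount FROM orders WHERE id = ?", (2,)
).fetchone()

# OLAP-style: scan many rows and aggregate them (sales per region).
sales_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
```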
13
Q

What is Amazon’s data warehousing solution?

A

Redshift

14
Q

Does RDS run on virtual machines?

A

Yes

15
Q

How can you log in to the OS of an RDS instance?

A

You can’t

16
Q

How can you patch the OS or database of an RDS instance?

A

This is Amazon’s responsibility (you can’t)

17
Q

Is RDS Serverless?

A

No (except for Aurora Serverless)

RDS runs on virtual machines

18
Q

What is AWS DMS used for?

A

DMS (Database Migration Service) allows you to migrate databases from a source into AWS

19
Q

What is the difference between a homogeneous migration and a heterogeneous migration?

A
  • Homogeneous migrations go between the same database engines
  • Heterogeneous migrations go between two different database engines
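
The distinction can be written as a one-line check. The `migration_type` helper below is hypothetical and deliberately simplified: it compares engine names as plain strings, so it does not account for engines that are merely compatible with each other.

```python
def migration_type(source_engine: str, target_engine: str) -> str:
    """Classify a migration by comparing the source and target engine names."""
    same_engine = source_engine.strip().lower() == target_engine.strip().lower()
    return "homogeneous" if same_engine else "heterogeneous"
```

For example, `migration_type("Oracle", "PostgreSQL")` classifies the move as heterogeneous because the two engine names differ.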
20
Q

What types of infrastructures can be used as a DMS Source?

A

The source can be:

  • On-premises
  • inside AWS itself
  • another cloud provider (like Azure)
21
Q

What does SCT stand for?

A

Schema Conversion Tool

22
Q

Do you need AWS SCT for a homogeneous migration?

A

No. The source and target use the same database engine, so no schema conversion is needed

23
Q

Do you need AWS SCT for a heterogeneous migration?

A

Yes. The source schema must be converted to the target engine’s format before migrating

24
Q

What does EMR stand for?

A

Elastic MapReduce

25
What is AWS EMR?
EMR is the industry-leading cloud _big data platform_ for processing vast amounts of data using open-source tools like Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto
26
At what scale can AWS EMR run analysis?
EMR can run **petabyte-scale** analysis
27
What is the cost-saving estimate for EMR over traditional on-premises solutions?
EMR offers analysis at **less than half the cost** of traditional on-premises solutions
28
What is the estimated speed increase of EMR over standard Apache Spark?
**3x** faster
29
What is a **cluster** in AWS EMR?
A **collection of EC2 instances**, each of which is called a **node**
30
What are the three node types in EMR?
  • Master Node
  • Core Node
  • Task Node
31
What is the purpose of a master node in EMR?
The master node manages the cluster and tracks the status of tasks. **Every cluster has a master node**
32
What is the purpose of a core node in EMR?
A core node **runs tasks and stores data in HDFS**. Multi-node clusters have at least one core node
33
What does **HDFS** stand for?
**H**adoop **D**istributed **F**ile **S**ystem
34
What is the purpose of a task node in AWS EMR?
A task node runs tasks but does NOT store data in HDFS. Task nodes are optional for your cluster.
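
The three node types can be pictured as instance groups in a cluster definition. The sketch below uses the `InstanceGroups` shape that boto3's EMR `run_job_flow` call accepts, but it only builds the data and does not call AWS; the group names, counts, and instance types are made up for illustration.

```python
# Instance groups in the shape accepted by boto3's EMR run_job_flow
# (Instances["InstanceGroups"]). Names, counts, and types are made up.
instance_groups = [
    {"Name": "master", "InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
    {"Name": "core", "InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
    {"Name": "task", "InstanceRole": "TASK", "InstanceType": "m5.xlarge", "InstanceCount": 2},
]

# Only core nodes store HDFS data; task nodes are optional extra compute.
hdfs_nodes = sum(
    group["InstanceCount"]
    for group in instance_groups
    if group["InstanceRole"] == "CORE"
)
```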
35
Suppose you want to be able to view your EMR cluster’s log files even after the master node terminates. Is this possible?
**Yes**, **you can configure a cluster to periodically archive the log files stored on the master node to S3**. This ensures log files are available after the cluster terminates, whether through a normal shutdown or due to an error. Note that **this can only be configured when you first create the cluster**
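
As a sketch of where this setting lives: in boto3's EMR `run_job_flow` call, the S3 log location is passed as `LogUri` when the cluster is created. The helper name, bucket name, and key prefix below are hypothetical, and the dictionary is only a fragment; a real request also needs fields such as the instance configuration.

```python
def build_cluster_request(name: str, log_bucket: str) -> dict:
    """Build part of the keyword arguments for boto3's EMR run_job_flow."""
    return {
        "Name": name,
        # LogUri is supplied at creation time; logs are archived under this prefix.
        "LogUri": f"s3://{log_bucket}/emr-logs/",
    }


# Hypothetical cluster and bucket names, for illustration only.
request = build_cluster_request("demo-cluster", "my-example-bucket")
```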
36
How often does EMR archive log files to S3?
**5-minute intervals**
37
By default, how is log data stored in EMR?
By default, log data is stored **on the master node**
38
Can you encrypt an unencrypted database snapshot?
**No**