Databases 101 Flashcards

Relational Databases, OLTP vs. OLAP

1
Q

What is a relational database analogous to?

A

A traditional spreadsheet

2
Q

What does RDS stand for?

A

Relational Database Service

3
Q

What are the 6 relational database engines available on Amazon RDS?

A
  • SQL Server
  • MySQL
  • Aurora
  • Oracle
  • PostgreSQL
  • MariaDB
4
Q

What are the 2 key features of RDS? What are they primarily for?

A
  • Multi-AZ (for disaster recovery)
  • Read Replicas (for performance)
5
Q

Does RDS Multi-AZ require changing the connection string?

A

No. If you lose access to your primary database, AWS automatically points incoming requests to your secondary (standby) database without any change to the connection string.

6
Q

How do Read Replicas work in RDS?

A
  • There is no automatic failover. If your primary instance goes down, you must point your application at the read replica yourself, using the replica's own endpoint (URL)
  • All writes to the primary instance are copied asynchronously to the read replica(s)
7
Q

Suppose you want to be able to handle a surge of incoming traffic to your RDS instance and scale out your application. What key feature of RDS would you use to do this?

A

Read Replicas

8
Q

What is the maximum number of read replicas that can be made from a single primary RDS instance?

A

up to 5 copies

9
Q

What is a non-relational database most analogous to?

A

A JSON Object
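
As a rough illustration of the two analogies (spreadsheet versus JSON object), the sketch below stores the same hypothetical customer both ways using only Python's standard library. The table name, column names, and sample values are assumptions made up for this example; a local SQLite database stands in for a managed relational engine.

```python
import json
import sqlite3

# Relational: fixed columns, one row per record (like a spreadsheet).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'London')")
row = conn.execute("SELECT id, name, city FROM customers WHERE id = 1").fetchone()

# Non-relational: a self-describing document whose fields can vary per record.
document = json.loads('{"id": 1, "name": "Ada", "city": "London", "tags": ["vip"]}')

print(row)               # every row has the same three columns
print(document["tags"])  # this document carries an extra, optional field
```

The relational row must fit the declared columns, whereas the document can carry fields (here `tags`) that other documents may lack.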

10
Q

What does OLTP stand for?

A

Online Transaction Processing

11
Q

What does OLAP stand for?

A

Online Analytical Processing

12
Q

What is the difference between OLTP and OLAP?

A
  • The big difference is in the types of queries you run
  • OLTP handles specific, individual transactions
  • OLAP pulls in large amounts of data for analysis
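
The contrast in query types can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the `orders` table and its sample rows are hypothetical, and a real OLAP workload would typically run on a data warehouse rather than SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("Ada", 10.0), ("Ada", 15.0), ("Bob", 7.5)],
)

# OLTP-style: look up one specific transaction by its key.
oltp_row = conn.execute(
    "SELECT customer, amount FROM orders WHERE id = ?", (2,)
).fetchone()

# OLAP-style: scan and aggregate many rows to answer an analytical question.
olap_rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

print(oltp_row)   # a single order
print(olap_rows)  # totals per customer across all orders
```

The first query touches a single row by primary key; the second reads every row in the table to compute totals.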
13
Q

What is Amazon’s data warehousing solution?

A

Redshift

14
Q

Does RDS run on virtual machines?

A

Yes

15
Q

How can you log in to the OS of an RDS instance?

A

You can’t

16
Q

How can you patch the OS or database of an RDS instance?

A

This is Amazon’s responsibility (you can’t)

17
Q

Is RDS Serverless?

A

No (except for Aurora Serverless)

RDS runs on virtual machines

18
Q

What is AWS DMS used for?

A

DMS (Database Migration Service) allows you to migrate databases from a source to AWS

19
Q

What is the difference between a homogeneous migration and a heterogeneous migration?

A
  • Homogeneous migrations go between the same database engines
  • Heterogeneous migrations go between two different database engines
20
Q

What types of infrastructure can be used as a DMS source?

A

The source can be:

  • On-premises
  • inside AWS itself
  • another cloud provider (like Azure)
21
Q

What does SCT stand for?

A

Schema Conversion Tool

22
Q

Do you need AWS SCT for a homogeneous migration?

A

No

23
Q

Do you need AWS SCT for a heterogeneous migration?

A

Yes

24
Q

What does EMR stand for?

A

Elastic MapReduce

25
Q

What is AWS EMR?

A

EMR is the industry-leading cloud big data platform for processing vast amounts of data using open-source tools like Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto

26
Q

At what scale can AWS EMR run analysis?

A

EMR can run petabyte-scale analysis

27
Q

What is the cost-saving estimate for EMR over traditional on-premises solutions?

A

EMR offers analysis at less than half the cost of traditional on-premises solutions

28
Q

What is the estimated speed increase of EMR over standard Apache Spark?

A

3x faster

29
Q

What is a cluster in AWS EMR?

A

A collection of EC2 instances, each of which is called a node

30
Q

What are the three node types in EMR?

A
  • Master Node
  • Core Node
  • Task Node
31
Q

What is the purpose of a master node in EMR?

A

The master node manages the cluster and tracks the status of tasks.

Every cluster has a master node

32
Q

What is the purpose of a core node in EMR?

A

A core node runs tasks and stores data in HDFS.

Multi-node clusters have at least one core node

33
Q

What does HDFS stand for?

A

Hadoop Distributed File System

34
Q

What is the purpose of a task node in AWS EMR?

A

A task node runs tasks, but does NOT store data in HDFS.

Task nodes are optional for your cluster.

35
Q

Suppose you want to be able to view your EMR cluster's log files even after the master node terminates. Is this possible?

A

Yes, you can configure a cluster to periodically archive the log files stored on the master node to S3.

This ensures log files are available after the cluster terminates, whether through a normal shutdown or due to an error.

Note that log archiving can only be configured when you first create the cluster.
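
A minimal sketch of what setting the log location at cluster-creation time might look like, assuming the boto3 EMR client's `run_job_flow` call and its `LogUri` parameter. The helper name `build_emr_request`, the bucket name, the release label, the instance types, and the default role names are all assumptions chosen for illustration. The request is only built here, not sent.

```python
def build_emr_request(name: str, log_bucket: str) -> dict:
    """Build keyword arguments for boto3's EMR run_job_flow (hypothetical helper)."""
    return {
        "Name": name,
        # S3 location where EMR archives the master node's log files.
        "LogUri": f"s3://{log_bucket}/emr-logs/",
        "ReleaseLabel": "emr-6.15.0",  # assumed release; pick one valid for your account
        "Instances": {
            "InstanceGroups": [
                {"Name": "Master", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default instance profile
        "ServiceRole": "EMR_DefaultRole",      # assumed default service role
    }


request = build_emr_request("demo-cluster", "my-example-bucket")
print(request["LogUri"])

# With AWS credentials configured, the request could be submitted as:
#   import boto3
#   boto3.client("emr").run_job_flow(**request)
```

Because the log location is part of the creation request, it has to be decided before the cluster starts.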

36
Q

How often does EMR archive log files to S3?

A

5-minute intervals

37
Q

By default, how is log data stored in EMR?

A

By default, log data is stored on the master node

38
Q

Can you encrypt an unencrypted database snapshot?

A

No, not directly. You can, however, copy the snapshot and encrypt the copy.