Redshift Flashcards

Question 1

Q

What is Redshift?

Answer

A

A fully managed, clustered petabyte-scale data warehouse.

Question 2

Q

What DB is Redshift based upon?

Answer

A

PostgreSQL

Question 3

Q

Which database connection standards is Redshift compatible with?

Answer

A

ODBC and JDBC

Question 4

Q

What is the up-front cost of using Redshift?

Answer

A

None. On-demand Redshift pricing is pay-as-you-go; an up-front payment applies only if you choose reserved nodes.

Question 5

Q

What Redshift features allow for complex queries?

Answer

A

Parallel processing

Columnar data storage

Question 6

Q

What Redshift feature allows you to query data files in S3 directly?

Answer

A

Redshift Spectrum

Question 7

Q

How does columnar data storage improve Redshift?

Answer

A

It drastically reduces the overall disk I/O requirements and reduces the amount of data you need to load from disk.
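
To make the I/O saving concrete, here is a minimal sketch in plain Python. It is a toy model, not Redshift's actual on-disk format: the table, the column names (order_id, customer, amount) and the helper functions are all hypothetical, and "values read" is only a stand-in for disk I/O. It compares how many values a single-column aggregate touches in a row layout versus a column layout.

```python
# Toy illustration only: an in-memory "table", not Redshift's real storage format.
rows = [
    {"order_id": 1, "customer": "ann", "amount": 20.0},
    {"order_id": 2, "customer": "bob", "amount": 35.5},
    {"order_id": 3, "customer": "cy", "amount": 12.25},
]


def sum_amount_row_store(table):
    """Row layout: every field of every row is read to reach one column."""
    values_read = 0
    total = 0.0
    for row in table:
        values_read += len(row)  # the whole row comes off "disk"
        total += row["amount"]
    return total, values_read


def to_column_store(table):
    """Pivot the rows into one list per column."""
    columns = {}
    for row in table:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_amount_column_store(columns):
    """Column layout: only the 'amount' column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


row_total, row_reads = sum_amount_row_store(rows)
col_total, col_reads = sum_amount_column_store(to_column_store(rows))
print(row_total, row_reads)  # same total, 9 values read
print(col_total, col_reads)  # same total, 3 values read
```

Both layouts return the same total, but the column layout reads one third of the values here. The gap widens as a table gains columns that a query never references.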

Question 8

Q

What is parallel processing?

Answer

A

Queries are split up and run in parallel across every core on every node.
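
The idea can be sketched in a few lines of Python. This is an assumption-laden illustration, not how Redshift schedules work internally: threads stand in for the cores on each node, and the names parallel_sum and partial_sum are hypothetical. The data is split into chunks, each worker aggregates its own chunk, and a final step combines the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Work done independently by one worker on its share of the data."""
    return sum(chunk)


def parallel_sum(values, workers=4):
    """Split values into roughly equal chunks, sum each one, then combine."""
    size = max(1, -(-len(values) // workers))  # ceiling division, at least 1
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)  # final combine step


amounts = list(range(1, 101))
total = parallel_sum(amounts)
print(total)  # 5050
```

The pattern to remember is split, process independently, combine. The more evenly the data is divided, the less time any one worker spends waiting on the others.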

Question 9

Q

What is a data lake?

Answer

A

A repository of a variety of data, on top of which you place a framework or technology to make use of the data.

Question 10

Q

What types of frameworks or technologies are sometimes used in data lakes?

Answer

A

Machine learning, analytics, on-prem data movements, real-time data movements.

Question 11

Q

What are some benefits of data lakes?

Answer

A

Query raw data without extensive preprocessing.
Shorten the time from data collection to data value.
Identify correlations between disparate data sets.

Question 12

Q

What is Redshift Spectrum?

Answer

A

A Redshift feature that allows fast, complex analysis directly against objects stored in the cloud, such as files in Amazon S3.