Storage and Data Management: 22% (Redshift, S3, Lake Formation, Glue Data Catalog, HDFS, EMRFS) Flashcards

Question 1

Q

Which services are appropriate for building data lakes on AWS?

Answer

A

S3, Lake Formation

Question 2

Q

Which services are appropriate for building data warehouses on AWS?

Question 3

Q

Which storage service is appropriate for highly structured data serving as a single point of truth?

Question 4

Q

Name three roles that Lake Formation fills

Answer

A

a) organising and curating ingested data
b) securing lake data
c) orchestrating transformation jobs with other services

Question 5

Q

What sort of data can be stored in an S3 data lake: structured, semi-structured, or unstructured?

Answer

A

All three

Question 6

Q

Is Lake Formation used to create ETL operations?

Question 7

Q

Name three user-defined components of an S3 object URL

Answer

A

a) region
b) bucket name
c) object key
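
As a minimal sketch of how these three components fit together, the snippet below assembles a virtual-hosted-style S3 object URL. The function name and the bucket, region, and key values are hypothetical, chosen only for illustration.

```python
from urllib.parse import quote


def s3_object_url(bucket: str, region: str, key: str) -> str:
    """Build a virtual-hosted-style URL from bucket name, region, and object key."""
    # quote() percent-encodes characters such as spaces but leaves "/" intact,
    # so key prefixes still read like folder paths.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


# Hypothetical values for illustration only.
url = s3_object_url("example-bucket", "eu-west-1", "raw/2024/events.json")
print(url)
```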

Question 8

Q

Is Redshift a relational or columnar database?

Question 9

Q

Name the key difference between columnar and row-oriented (traditional relational) databases

Answer

A

Row-oriented databases, the layout used by most traditional relational databases, are optimised for fast retrieval of whole rows, typically for transactional applications

Columnar databases are optimised for fast retrieval of columns, typically for analytical applications
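
The row-versus-column distinction above can be sketched with plain Python data structures. This is a toy illustration of the two layouts, not how any particular engine stores data; the table contents and the helper name are made up.

```python
# Row layout: each record is kept together, so fetching one whole order is cheap.
rows = [
    {"order_id": 1, "customer": "ann", "amount": 20.0},
    {"order_id": 2, "customer": "bob", "amount": 35.5},
    {"order_id": 3, "customer": "cat", "amount": 12.5},
]


def fetch_order(order_id: int) -> dict:
    """Transactional-style access: return every field of a single record."""
    return next(r for r in rows if r["order_id"] == order_id)


# Column layout: each column is kept together, so an aggregate over one column
# reads only that column's values and skips the others.
columns = {name: [r[name] for r in rows] for name in rows[0]}

# Analytical-style access: total of a single column.
total_amount = sum(columns["amount"])

print(fetch_order(2))
print(total_amount)
```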

Question 10

Q

Name two Apache wide-column (column-family) databases that can be hosted on AWS

Answer

A

Cassandra and HBase

Question 11

Q

What is the fastest way to load data into Redshift?

Answer

A

Bulk copying of multiple compressed files from S3
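
A minimal sketch of what such a bulk load might look like: the helper below only builds the text of a Redshift COPY statement that reads every gzip-compressed file under an S3 prefix. The function name, table name, prefix, and IAM role ARN are hypothetical placeholders, and running the statement against a cluster is left out.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Return a COPY statement that loads all gzip files under an S3 prefix."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"          # a prefix matches every object that starts with it
        f"IAM_ROLE '{iam_role_arn}'\n"   # role that Redshift assumes to read from S3
        "GZIP;"                          # tells COPY that the input files are gzip-compressed
    )


# Hypothetical values for illustration only.
sql = build_copy_statement(
    "sales",
    "s3://example-bucket/sales/part-",
    "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)
print(sql)
```

Splitting the input into several files of similar size lets Redshift load them in parallel across the cluster's slices, which is why a multi-file load tends to beat a single large file.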

Question 12

Q

How can a manifest file be used with the Redshift COPY command?