Intro to Data Lakes Flashcards

1
Q

What is a Data Lake?

A

A data lake is a place where you can securely store data of various types and at any scale for processing and analytics. Data lakes typically drive data analytics, data science and ML, and batch or streaming pipelines.

2
Q

Compare a data lake to a data warehouse

A

A data lake is a capture of every aspect of your business operation. The data is stored in its natural/raw format, usually as object blobs or files.

A data warehouse, by contrast:
● Is loaded only when its use is defined
● Holds processed/organized/transformed data
● Provides faster insights
● Holds current/historical data for reporting
● Tends to have a consistent schema shared across applications
3
Q

When would one use EL (Extract & Load)?

A

When the data is already in a form the cloud product can ingest directly. For example, an Avro file can be loaded into BigQuery as-is. Use EL when the data can be imported without any further transformation.

4
Q

When to use ETL?

A

Use ETL when the source data is not in the final form you want and has to be transformed before it is loaded into the cloud product.

By transforming the data before loading it into the cloud, you may be able to greatly reduce the size of the data that gets loaded, and with it the network bandwidth you need.
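A minimal sketch of the bandwidth argument, using only the Python standard library. The event records, the `transform` step, and the JSON-size measure are all hypothetical stand-ins chosen for illustration; a real pipeline would use its own data and transfer format.

```python
import json

# Hypothetical raw events extracted from an on-premises system.
raw_events = [
    {"user": "a", "action": "click", "debug_payload": "x" * 200},
    {"user": "a", "action": "view", "debug_payload": "y" * 200},
    {"user": "b", "action": "click", "debug_payload": "z" * 200},
]


def transform(events):
    """Drop the bulky debug field and count clicks per user."""
    clicks = {}
    for event in events:
        if event["action"] == "click":
            clicks[event["user"]] = clicks.get(event["user"], 0) + 1
    return [{"user": user, "clicks": count} for user, count in sorted(clicks.items())]


def payload_size(rows):
    """Bytes that would cross the network if the rows were sent as JSON."""
    return len(json.dumps(rows).encode("utf-8"))


# Transform first, then load only the smaller result.
transformed = transform(raw_events)
raw_bytes = payload_size(raw_events)
transformed_bytes = payload_size(transformed)
print(f"raw: {raw_bytes} bytes, transformed: {transformed_bytes} bytes")
```

The transformed payload is what would be loaded in an ETL flow, so the saving shows up before any data leaves the source environment.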

5
Q

What is an example of ELT?

A

Extract data from an on-premises system, load it into a cloud product, then transform it there.

ELT allows raw data to be loaded directly into the target and then transformed there. A very common example: in BigQuery, you can use SQL to transform the raw data that was loaded and simply write the result to a new table.
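The load-then-transform pattern can be sketched locally. This example uses Python's built-in `sqlite3` as a stand-in for the target (it is not BigQuery, and the SQL dialects differ); the table names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: raw rows go into the target unchanged (amounts are still text,
# country codes are inconsistently cased).
conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [("1", "10.50", "us"), ("2", "4.25", "US"), ("3", "7.00", "de")],
)

# Transform: SQL running inside the target cleans the data and writes
# the result to a new table, leaving the raw table untouched.
conn.execute(
    """
    CREATE TABLE orders_by_country AS
    SELECT UPPER(country) AS country,
           SUM(CAST(amount AS REAL)) AS total
    FROM raw_orders
    GROUP BY UPPER(country)
    """
)

rows = conn.execute(
    "SELECT country, total FROM orders_by_country ORDER BY country"
).fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
print(rows)
```

Because the raw table is kept, the transformation can be rewritten and rerun later without extracting the data again.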

6
Q

What comprises Cloud Storage?

A

Buckets and Objects

7
Q

What are buckets in Cloud Storage?

A

Buckets are containers that hold objects. Objects exist inside buckets and not apart from them.

8
Q

What are Multi-Region Buckets?

A

Buckets whose objects are replicated across regions.

9
Q

How are single-region buckets replicated?

A

In a single-region bucket, the objects are replicated across zones within that one region.

When an object is retrieved, it is served from the replica closest to the requester, which is how low latency is achieved.

10
Q

How is high-throughput achieved with buckets?

A

Multiple requesters can retrieve objects at the same time from different replicas, which is how high throughput is achieved.

11
Q

How can you manage costs with Cloud Storage?

A

Move objects that have not been accessed in 30 days to the Nearline storage class, or after 90 days to Coldline storage, to help optimize your costs.
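The 30-day and 90-day thresholds can be written as a small decision rule. This is an illustrative sketch only: `pick_storage_class` is a hypothetical helper, not a Cloud Storage API, and it assumes objects start in the Standard class and that you already know how many days have passed since the last access.

```python
def pick_storage_class(days_since_last_access: int) -> str:
    """Return the storage class suggested by the 30/90-day rule of thumb."""
    if days_since_last_access < 0:
        raise ValueError("days_since_last_access must be non-negative")
    if days_since_last_access >= 90:
        return "COLDLINE"   # rarely touched data
    if days_since_last_access >= 30:
        return "NEARLINE"   # touched roughly monthly or less
    return "STANDARD"       # assumed default for recently used data


for days in (5, 30, 120):
    print(days, pick_storage_class(days))
```

In practice the move itself would be handled by a bucket-level policy rather than by application code.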

12
Q

What is Cloud Spanner?

A

Cloud Spanner is the option to use when you require a globally distributed database. You'd want a globally distributed database if your database will see updates from applications running in different geographic regions.

13
Q

If you need horizontal read and write scaling with a SQL database, ___________

A

consider using Cloud Spanner.
