Delta Lake Flashcards

1
Q

The evolution of data management…

A

Data warehouses: late 1980s
Data lakes: 2011
Cloud data platforms: 2020

While warehouses excel at handling structured data, most enterprises also have to deal with unstructured and semi-structured data, and with data of high variety, velocity, and volume. Data warehouses are not suited to many of these use cases, and they are certainly not the most cost-efficient option.

While suitable for storing data, data lakes lack some critical features. They do not support ACID transactions or enforce data quality, and their lack of consistency and isolation makes it almost impossible to mix appends with reads, or batch jobs with streaming jobs.
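
To make the missing piece concrete, here is a minimal sketch of how an ordered commit log lets appends and reads coexist. It is a toy model using only the Python standard library, not Delta Lake's actual file format or API; the class name ToyTable, the _toy_log directory, and the file naming are all hypothetical.

```python
import json
import os
import tempfile


class ToyTable:
    """Toy table: data files plus an ordered commit log (not Delta Lake's real format)."""

    def __init__(self, root):
        self.root = root
        self.log_dir = os.path.join(root, "_toy_log")  # hypothetical directory name
        os.makedirs(self.log_dir, exist_ok=True)

    def _versions(self):
        # Each commit is a file named <version>.json in the log directory.
        return sorted(
            int(name.split(".")[0])
            for name in os.listdir(self.log_dir)
            if name.endswith(".json")
        )

    def latest_version(self):
        versions = self._versions()
        return versions[-1] if versions else -1

    def append(self, rows):
        version = self.latest_version() + 1
        data_name = f"part-{version:05d}.json"
        # Step 1: write the data file. Readers cannot see it yet.
        with open(os.path.join(self.root, data_name), "w") as f:
            json.dump(rows, f)
        # Step 2: commit. Mode "x" fails if another writer already took this version.
        # Simplification: a real system would also make the commit write itself atomic.
        with open(os.path.join(self.log_dir, f"{version:05d}.json"), "x") as f:
            json.dump({"add": data_name}, f)
        return version

    def read(self, version=None):
        # A reader only sees data files referenced by commits up to `version`.
        if version is None:
            version = self.latest_version()
        rows = []
        for v in self._versions():
            if v > version:
                break
            with open(os.path.join(self.log_dir, f"{v:05d}.json")) as f:
                data_name = json.load(f)["add"]
            with open(os.path.join(self.root, data_name)) as f:
                rows.extend(json.load(f))
        return rows


table = ToyTable(tempfile.mkdtemp())
first = table.append([{"id": 1}])
table.append([{"id": 2}])
old_snapshot = table.read(first)  # unaffected by the later append
current = table.read()
```

A reader pinned to an earlier version keeps a consistent view while later appends land, which is the behaviour a plain directory of files cannot guarantee.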

2
Q

What is Delta Lake?

A

Delta Lake is a storage solution designed specifically to work with Apache Spark. Data in Delta Lake is read and written using Apache Spark.

3
Q

Elements of Delta Lake

A

Delta tables
Delta optimization engine
Delta Lake storage layer
Delta architecture design pattern

4
Q

Apache Spark

A

A powerful paradigm in modern data storage and processing is the separation of compute and storage.

In this system, Apache Spark loads and performs computation on the data. It does not handle permanent storage. Apache Spark works with Delta Lake, the first storage solution specifically designed to do so.

5
Q

What is the benefit of separating compute and storage?

Example: Cloud Data Platform

A

It makes it easier to allocate resources elastically.

6
Q

A data lake is a NoSQL system, meaning that…

A

it can support structured, semi-structured, and unstructured data.
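
As a rough illustration, the sketch below stores three differently shaped records side by side and applies a schema only when reading. It is plain Python, not a data lake API; the sample records, the read_with_schema function, and the _raw field are made up for this example.

```python
import json

# Raw storage accepts anything: no schema is enforced on write.
raw_lines = [
    '{"id": 1, "name": "Ada", "age": 36}',           # structured: fixed fields
    '{"id": 2, "name": "Lin", "tags": ["a", "b"]}',  # semi-structured: nested, no age
    "free-text note with no fields at all",          # unstructured
]


def read_with_schema(lines, fields):
    """Project each record onto `fields` at read time (schema-on-read)."""
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            record = None
        if not isinstance(record, dict):
            # Keep unparseable content instead of rejecting it.
            rows.append({"_raw": line})
            continue
        # Missing fields become None rather than causing an error.
        rows.append({field: record.get(field) for field in fields})
    return rows


rows = read_with_schema(raw_lines, ["id", "age"])
```

The flexibility comes at a cost: nothing stops malformed records from being stored, which is why data lakes on their own do not enforce data quality.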

7
Q

Enterprise Decision Support System Architectures

A

Inmon Architecture
On-premises data warehouse system
Cloud-based data warehouse system
Cloud Data Platform

8
Q

Delta Architecture

A

With the Delta architecture, multiple tables of data are kept in the same data lake. Batch and streaming data are written to the same table, and data then moves through a series of progressively cleaner and more refined tables.
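
The pattern can be sketched with plain Python lists standing in for tables. This is only an illustration of the data flow, not Spark or Delta Lake code; the table names (raw_table, cleaned_table, summary_table), the record fields, and the cleaning rules are assumptions made for the example.

```python
def clean(raw_rows):
    """Refine raw rows: drop records without a user, normalise names and amounts."""
    cleaned = []
    for row in raw_rows:
        if not row.get("user"):
            continue  # discard records that fail the quality check
        cleaned.append(
            {"user": row["user"].strip().lower(), "amount": float(row["amount"])}
        )
    return cleaned


def aggregate(cleaned_rows):
    """Build the most refined table: total amount per user."""
    totals = {}
    for row in cleaned_rows:
        totals[row["user"]] = totals.get(row["user"], 0.0) + row["amount"]
    return totals


raw_table = []

# A batch load and a stream of single events land in the same raw table.
raw_table.extend([{"user": " Ada ", "amount": "10"}, {"user": "", "amount": "5"}])
for event in [{"user": "ada", "amount": 2.5}, {"user": "Lin", "amount": "4"}]:
    raw_table.append(event)

cleaned_table = clean(raw_table)            # cleaner
summary_table = aggregate(cleaned_table)    # more refined
```

Each stage reads only from the one before it, so the raw table keeps everything as received while downstream tables carry the cleaned and aggregated views.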
