Delta Flashcards

1
Q

What is Delta Lake?

A

Delta Lake enables organizations to build data lakehouses, which support data warehousing and machine learning directly on the data lake while leveraging low-cost cloud storage.

2
Q

What kind of format is Delta, and what file format does it work on top of?

A

Delta is a table format; it works on top of Parquet files.

3
Q

What problems does Delta solve?

A

DATA LAKE RELIABILITY - Corrupted tables, failed jobs
DATA INCONSISTENCY
DATA LAKE PERFORMANCE - Slow queries

4
Q

Why will customers care about Delta?

A

RELIABILITY AND PERFORMANCE - Delta guarantees a consistent view of the data and responsive queries at scale.
OPEN FORMAT - No vendor lock-in (cloud agnostic)
COST SAVINGS - Same capabilities as a data warehouse (DW), only cheaper and open source

5
Q

When should you position Delta with customers?

A

1/ If a customer is building a lakehouse and wants low latency
2/ If you are talking to data engineers who are responsible for data management and are concerned about the performance and reliability of their data storage

6
Q

How does Delta work?

A

Delta keeps a transaction log alongside the Parquet data files. The log records every commit and change to the dataset, so you can see exactly what has happened to the table over time.
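
The idea of a transaction log over data files can be illustrated with a toy model. This is a minimal sketch, not Delta's implementation: it assumes each commit is a list of "add" and "remove" actions (loosely modelled on the actions in Delta's JSON commit files), and the file names and the `live_files` helper are made up for illustration.

```python
# Toy commit log: one list of actions per commit (hypothetical file names).
commits = [
    [{"add": {"path": "part-0000.parquet"}}],
    [{"add": {"path": "part-0001.parquet"}}],
    # A rewrite: one file is logically removed and a replacement is added.
    [{"remove": {"path": "part-0000.parquet"}},
     {"add": {"path": "part-0002.parquet"}}],
]


def live_files(commits, version=None):
    """Replay commits 0..version and return the data files that make up the table."""
    if version is None:
        version = len(commits) - 1
    files = set()
    for actions in commits[: version + 1]:
        for action in actions:
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files


latest = live_files(commits)        # files in the current version of the table
first = live_files(commits, 0)      # files as of the very first commit
```

Because the table's state is derived only from the log, replaying fewer commits yields an older view of the same table without touching the data files themselves.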

7
Q

What are the key underlying capabilities of Delta?

A

1/ ACID transactions for object storage (Parquet)
2/ Delta logs as "source of truth"
3/ Time travel
4/ Easy rollback
5/ Metadata management at scale
6/ Schema enforcement

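
A few of these capabilities (all-or-nothing writes, time travel, rollback, schema enforcement) can be sketched with a small in-memory stand-in. `ToyTable` and its methods are hypothetical names for illustration only; this is not the Delta Lake API, and it assumes a rollback is recorded as a new version rather than by deleting history.

```python
class ToyTable:
    """In-memory stand-in for a versioned table; not the Delta Lake API."""

    def __init__(self, schema):
        self.schema = schema      # column name -> expected Python type
        self.versions = [[]]      # versions[n] holds all rows as of version n

    def append(self, rows):
        # Schema enforcement: reject the whole write if any row does not match.
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"column mismatch: {sorted(row)}")
            for name, expected in self.schema.items():
                if not isinstance(row[name], expected):
                    raise ValueError(f"{name} must be {expected.__name__}")
        # All-or-nothing: a new version appears only after every row passed.
        self.versions.append(self.versions[-1] + list(rows))

    def read(self, version=None):
        # Time travel: read the latest version, or any earlier one by number.
        return self.versions[-1 if version is None else version]

    def restore(self, version):
        # Rollback: re-commit an old snapshot as the newest version.
        self.versions.append(list(self.versions[version]))


table = ToyTable({"id": int, "name": str})
table.append([{"id": 1, "name": "a"}])        # version 1
table.append([{"id": 2, "name": "b"}])        # version 2
try:
    table.append([{"id": "3", "name": "c"}])  # wrong type for id: rejected
except ValueError as err:
    rejected = str(err)
table.restore(1)                              # version 3 has the contents of version 1
```

The rejected write leaves no partial version behind, and earlier versions stay readable after the rollback.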
8
Q

What are common misconceptions about Delta?

A

Open source vs. Databricks Delta: The misconception is that Delta Lake's enhanced capabilities are available only within Databricks and not in open source Delta. This is NOT TRUE anymore; we are open sourcing all of Delta with Delta 2.0.

9
Q

What are red flags when pitching Delta to customers?

A

1/ Data warehouse users - Touch on the openness of Delta, cost savings, flexibility, collaboration with Delta Sharing, and support for both structured and unstructured data.
2/ Small datasets - Delta shines most on large-volume datasets.

10
Q

Who are Delta's main competitors?

A

Snowflake, BigQuery, Hudi, Redshift

11
Q

What are some questions to ask around competition?

A

1/ How long do your current queries/jobs take to complete? Do you have SLAs to meet?
2/ Do you own your data?
3/ Do you have a storage layer that's optimal for BI and ML at the same time?

12
Q

How much does Delta cost?

A

1/ No extra Databricks costs outside of DBUs (compute usage)
2/ Customers pay for storage in their own cloud accounts (e.g., S3)
