Delta Flashcards
What is Delta Lake?
Delta Lake enables organizations to build data lakehouses, which support data warehousing and machine learning directly on the data lake while leveraging low-cost cloud storage.
What format does Delta use, and what file type does it work on top of?
Delta is a table format that works on top of Parquet files.
What problem does Delta solve?
DATA LAKE RELIABILITY - Corrupted tables, failed jobs
DATA INCONSISTENCY
DATA LAKE PERFORMANCE - Slow queries
Why will customers care about Delta?
RELIABILITY AND PERFORMANCE - Delta guarantees a consistent view of the data and responsive queries at scale.
OPEN FORMAT - No vendor lock-in (cloud agnostic)
COST SAVINGS - Same capabilities as a data warehouse (DW), but cheaper and open source
When should you position Delta with customers?
1/ If a customer is building a lakehouse and wants low latency
2/ If you are talking to data engineers tasked with data management who have concerns about the performance and reliability of their data storage
How does Delta work?
Delta creates log files (the transaction log) alongside the Parquet files. The log makes it possible to track every commit and change to the dataset, so you can understand what has happened to the dataset over time.
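The idea of a transaction log can be illustrated with a toy model. The sketch below is not Delta's implementation: it uses only the Python standard library, records file names without writing any real Parquet data, and skips the atomic-commit guarantees a real table needs. The helper names `commit` and `live_files` are hypothetical. What it does mirror is the general layout: a `_delta_log` directory of ordered, zero-padded JSON commit files, each listing `add` and `remove` actions, which a reader replays to learn which data files make up the table.

```python
import json
import tempfile
from pathlib import Path


def commit(table: Path, actions: list) -> int:
    """Write the next numbered commit file and return its version (toy helper)."""
    log_dir = table / "_delta_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    version = len(list(log_dir.glob("*.json")))
    # One JSON action per line, stored under a zero-padded, ordered file name.
    lines = "\n".join(json.dumps(action) for action in actions)
    (log_dir / f"{version:020d}.json").write_text(lines)
    return version


def live_files(table: Path) -> set:
    """Replay every commit in order to find the files in the current table."""
    files = set()
    for commit_file in sorted((table / "_delta_log").glob("*.json")):
        for line in commit_file.read_text().splitlines():
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files


# Hypothetical table location and file names, for illustration only.
table = Path(tempfile.mkdtemp()) / "events"
commit(table, [{"add": {"path": "part-0000.parquet"}}])
commit(table, [{"add": {"path": "part-0001.parquet"}}])
# A rewrite removes the old files and adds a new one in a single commit.
commit(table, [
    {"remove": {"path": "part-0000.parquet"}},
    {"remove": {"path": "part-0001.parquet"}},
    {"add": {"path": "part-0002.parquet"}},
])
print(sorted(live_files(table)))
```

Because readers only trust what the log says, a job that fails halfway through writing data files never becomes visible: without a commit entry, those files are simply not part of the table.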
What are the key underlying capabilities of Delta?
1/ ACID transactions for object storage (Parquet)
2/ Delta logs as “source of truth”
3/ Time travel
4/ Easy rollback
5/ Metadata management at scale
6/ Schema enforcement
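Three of these capabilities (schema enforcement, time travel, and rollback) can be sketched with a small in-memory model. This is a simplified illustration, not a Delta API: the `ToyTable` class and its methods are hypothetical, and it stores a full snapshot per version for brevity, whereas Delta records changes in its log.

```python
class ToyTable:
    """In-memory stand-in for a versioned table (hypothetical, not a Delta API)."""

    def __init__(self, schema: dict):
        self.schema = schema      # column name -> expected Python type
        self.versions = []        # versions[n] holds all rows as of version n

    def _commit(self, rows: list) -> int:
        self.versions.append(rows)
        return len(self.versions) - 1

    def append(self, rows: list) -> int:
        # Schema enforcement: validate the whole batch before committing,
        # so a bad batch leaves the table unchanged (all or nothing).
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"columns {sorted(row)} do not match schema")
            for col, typ in self.schema.items():
                if not isinstance(row[col], typ):
                    raise ValueError(f"column {col!r} expects {typ.__name__}")
        current = self.versions[-1] if self.versions else []
        return self._commit(current + list(rows))

    def read(self, version=None) -> list:
        # Time travel: read the table as of an earlier version.
        if not self.versions:
            return []
        return list(self.versions[-1 if version is None else version])

    def restore(self, version: int) -> int:
        # Rollback is a new commit, so the history itself is never rewritten.
        return self._commit(list(self.versions[version]))


t = ToyTable({"id": int, "name": str})
v0 = t.append([{"id": 1, "name": "a"}])
v1 = t.append([{"id": 2, "name": "b"}])
try:
    t.append([{"id": "3", "name": "c"}])   # wrong type for "id"
except ValueError as err:
    print("rejected:", err)
print(len(t.read()), len(t.read(version=0)))
v2 = t.restore(0)
print(v2, t.read())
```

In Delta itself, the corresponding operations are exposed through features such as `VERSION AS OF` queries and the `RESTORE` command rather than through a class like this one.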
What are common misconceptions about Delta?
Open source vs. Databricks Delta: The misconception is that Delta Lake's enhanced capabilities are available only within Databricks and not in open source Delta. This is NOT TRUE anymore; we are open sourcing all of Delta with Delta 2.0.
What are red flags when pitching Delta to customers?
1/ Data warehouse users - Touch on the openness of Delta, cost savings, flexibility, collaboration with Delta Sharing, and support for both structured and unstructured data.
2/ Small datasets - Delta shines with large-volume datasets
Who are Delta's main competitors?
Snowflake, BigQuery, Hudi, Redshift
What are some questions to ask around competition?
1/ How long do your current queries/jobs take to complete? Do you have SLAs to meet?
2/ Do you own your data?
3/ Do you have a storage layer that's optimal for BI and ML at the same time?
How much does Delta cost?
1/ No extra Databricks costs outside of DBUs (compute usage)
2/ Customers pay for storage in their own cloud accounts (e.g., S3)