Terms Flashcards
What is the heart of the lakehouse?
Delta Lake
What is Delta Lake?
An open approach to bringing data management and governance to data lakes
Benefits of Delta Lake
Better reliability
48x faster data processing with indexing
Data governance at scale with fine-grained access control lists
Benefits of Databricks
Simple: data only needs to exist once
Open: based on open source
Collaborative: data can be shared across data engineering, data analytics, data science, and data applications, so it is no longer siloed
The lakehouse exists on top of
Data lake
Control plane
Back-end services that Databricks manages in its own cloud account
Notebook commands and workspace configurations are stored here
Encrypted at rest
Data plane
Where data is processed
Resides in your own cloud account
Hooks into Databricks and other proprietary systems
Clusters
A set of computational resources and configurations on which you run data engineering, data science, and data analytics workloads
Clusters live
In the data plane; cluster management is in the control plane
Clusters are
Made up of one or more virtual machine instances
Driver
Part of a cluster; coordinates the activities of the executors
Distributes workload across worker nodes
Executor
Runs the tasks that make up a Spark job
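Example: the driver/executor split can be pictured with a small plain-Python analogy. This is not Spark code, only a sketch of the idea: a hypothetical run_job function plays the driver (it splits the work and combines results) and a thread pool stands in for the executors. All names here are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(partition):
    # "Executor" side: run one task, here summing one partition of the data.
    return sum(partition)


def run_job(data, num_executors=3):
    # "Driver" side: split the job into one task per partition,
    # hand the tasks to the workers, then combine the partial results.
    partitions = [data[i::num_executors] for i in range(num_executors)]
    with ThreadPoolExecutor(max_workers=num_executors) as pool:
        partial_sums = list(pool.map(run_task, partitions))
    return sum(partial_sums)


total = run_job(list(range(1, 101)))
print(total)  # 5050
```

In a real cluster the executors run on separate worker nodes rather than threads, but the division of labor is the same.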
All-purpose clusters
Analyze data collaboratively using interactive notebooks
Create clusters from the workspace or the API
Databricks retains up to 70 terminated clusters for up to 30 days
Can be stopped and started manually
Multiple users can share them
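Example: creating an all-purpose cluster through the API comes down to sending a JSON description of the cluster. The sketch below only builds that description and does not call any service. The field names follow the Databricks Clusters API as I understand it; the helper name build_cluster_spec and every value (cluster name, runtime version, node type) are placeholders chosen for illustration, so check them against your own workspace.

```python
import json


def build_cluster_spec(name, num_workers, autotermination_minutes=60):
    # Hypothetical helper: returns the request body for a cluster-create call.
    return {
        "cluster_name": name,
        "spark_version": "13.3.x-scala2.12",  # placeholder runtime version
        "node_type_id": "i3.xlarge",          # placeholder, cloud-specific
        "num_workers": num_workers,           # worker nodes, besides the driver
        "autotermination_minutes": autotermination_minutes,
    }


spec = build_cluster_spec("shared-analytics", num_workers=2)
print(json.dumps(spec, indent=2))
```

The resulting body would typically be posted to the workspace's cluster-create endpoint with an access token; that step is left out here on purpose.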
Job clusters
Run automated jobs
The Databricks job scheduler creates job clusters when running jobs
Created by the scheduler and terminated when the job is complete
Cannot restart a job cluster
Retains up to 30 clusters
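Example: a job cluster is requested inside the job definition itself rather than created ahead of time. The sketch below builds such a definition as a plain dictionary and sends nothing. The field names are modeled on the Databricks Jobs API as I understand it; the helper name build_job_spec, the notebook path, the cron expression, and the cluster values are all assumptions for illustration.

```python
import json


def build_job_spec(job_name, notebook_path, cron):
    # Hypothetical helper: returns the request body for a job-create call.
    return {
        "name": job_name,
        "schedule": {
            "quartz_cron_expression": cron,  # when the scheduler runs the job
            "timezone_id": "UTC",
        },
        "tasks": [
            {
                "task_key": "main",
                "notebook_task": {"notebook_path": notebook_path},
                # "new_cluster" asks for a fresh job cluster per run;
                # it is terminated when the run completes.
                "new_cluster": {
                    "spark_version": "13.3.x-scala2.12",  # placeholder
                    "node_type_id": "i3.xlarge",          # placeholder
                    "num_workers": 2,
                },
            }
        ],
    }


job = build_job_spec("nightly-etl", "/Jobs/nightly_etl", "0 0 2 * * ?")
print(json.dumps(job, indent=2))
```

Pointing the task at an existing all-purpose cluster instead would mean replacing the new_cluster block with a reference to that cluster's ID.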
What is job cluster retention?
30 days unless manually pinned