Terms Flashcards
What is the heart of the lakehouse?
Delta Lake
What is Delta Lake?
An open approach to bringing data management and governance to data lakes
Benefits of Delta Lake
Better reliability
48x faster data processing with indexing
Data governance at scale with fine-grained access control lists
Benefits of Databricks
Simple: data only needs to exist once
Open: based on open source
Collaborative: data can be shared across data engineering, data analytics, data science, and data applications, so teams are no longer siloed
The lakehouse exists on top of
A data lake
Control plane
Back-end services that Databricks manages in its own cloud account
Notebook commands and workspace configurations are stored here
Encrypted at rest
Data plane
Where data is processed
Resides in your own cloud account
Hooks into Databricks and other proprietary systems
Clusters
A set of computational resources and configurations on which you run data engineering, data science, and data analytics workloads
Where do clusters live?
In the data plane; cluster management is in the control plane
Clusters are
Made up of one or more virtual machine instances
Driver
Part of a cluster; coordinates the activities of the executors
Distributes the workload across worker nodes
Executor
Runs the tasks that make up a Spark job
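To make the driver/executor split concrete, here is a plain-Python analogy, not Spark itself: a thread pool stands in for the executors, and the function plays the driver that splits the data into partitions, hands out one task per partition, and combines the results. The names run_job and task and the partitioning scheme are hypothetical. In PySpark the same job would be roughly spark.sparkContext.parallelize(range(1, 101), 4).sum().

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(records, num_executors=3, num_partitions=4):
    """Driver (hypothetical analogy): split the input into partitions,
    send one task per partition to the executor pool, combine the results."""
    # Split the data into partitions; each partition becomes one task.
    partitions = [records[i::num_partitions] for i in range(num_partitions)]

    def task(partition):
        # The work an executor does on its partition: here, a partial sum.
        return sum(partition)

    # The thread pool stands in for executors running on worker nodes.
    with ThreadPoolExecutor(max_workers=num_executors) as executors:
        partial_results = list(executors.map(task, partitions))

    # The driver aggregates the partial results into the final answer.
    return sum(partial_results)


print(run_job(list(range(1, 101))))  # 5050
```

The driver never sums the raw data itself; it only coordinates and merges, which is the division of labor the flashcards describe.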
All-purpose clusters
Analyze data collaboratively using interactive notebooks
Created from the workspace or the API
Configurations retained for up to 70 terminated clusters for up to 30 days
Can be manually stopped and restarted
Multiple users can share them
Job clusters
Run automated jobs
The Databricks job scheduler creates job clusters when running jobs
Created when a scheduled job runs and terminated when the job is complete
A job cluster cannot be restarted
Configurations retained for up to 30 terminated job clusters
What is job cluster retention?
30 days unless manually pinned
Notebooks
Primary way to interact with code
Notebook languages
SQL
Python
R
Scala
What can go in notebooks?
Plots
Images
Markdown text
Code
Do you need to restart a cluster if you edit it?
Depends on the edit
What resources are released when a cluster is terminated?
Associated VMs purged
Operational memory purged
Attached volume storage deleted
Network connections between nodes removed
When are cluster configurations removed?
After being terminated (idle) for 30 days
When the cluster is no longer among the 70 most recently terminated clusters
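The retention rules above can be modeled as a simple filter. This is a minimal sketch under stated assumptions, not how Databricks implements it: "idle" is read as time since termination, pinned clusters are always kept and (an assumption) do not count toward the 70-cluster limit, and TerminatedCluster and retained_configs are hypothetical names, not a Databricks API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TerminatedCluster:
    # Hypothetical record of a terminated cluster's configuration.
    name: str
    terminated_at: datetime
    pinned: bool = False


def retained_configs(clusters, now, max_count=70, max_age=timedelta(days=30)):
    """Return the names of clusters whose configuration would be kept."""
    # Assumption: pinned clusters are always kept and sit outside the count limit.
    pinned = [c for c in clusters if c.pinned]
    # Unpinned clusters must have been terminated within the age window...
    recent = [c for c in clusters
              if not c.pinned and now - c.terminated_at <= max_age]
    # ...and be among the max_count most recently terminated.
    recent.sort(key=lambda c: c.terminated_at, reverse=True)
    kept = pinned + recent[:max_count]
    return sorted(c.name for c in kept)


now = datetime(2024, 1, 31)
clusters = [
    TerminatedCluster("etl-dev", now - timedelta(days=2)),
    TerminatedCluster("old-adhoc", now - timedelta(days=45)),
    TerminatedCluster("shared-bi", now - timedelta(days=60), pinned=True),
]
print(retained_configs(clusters, now))  # ['etl-dev', 'shared-bi']
```

Passing max_count=30 gives a rough model of the job-cluster limit from the cards above.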
What should I do with results from a cluster job?
If they need to persist, move them to permanent storage; otherwise they are removed with the cluster
Does a cluster purge affect code?
No
Do notebooks need a cluster to run?
Yes
Can you mix languages in one notebook?
Yes, even if you set a default.
What is the Databricks magic command symbol?
%
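A magic command such as %sql, %python, %r, %scala, or %md on the first line of a cell switches that cell's language, regardless of the notebook's default. The sketch below models that rule in plain Python; cell_language and the LANGUAGE_MAGICS table are hypothetical and do not reflect how Databricks parses cells internally.

```python
# Hypothetical table mapping cell magics to languages, mirroring the
# notebook languages listed in these cards (plus %md for Markdown).
LANGUAGE_MAGICS = {
    "%python": "python",
    "%sql": "sql",
    "%r": "r",
    "%scala": "scala",
    "%md": "markdown",
}


def cell_language(cell: str, default: str = "python") -> str:
    """Return the language a cell runs in: a leading magic command
    overrides the notebook's default language."""
    stripped = cell.strip()
    if not stripped:
        return default
    # The first whitespace-separated token is the candidate magic.
    first_token = stripped.split()[0]
    return LANGUAGE_MAGICS.get(first_token, default)


print(cell_language("%sql\nSELECT * FROM sales", default="python"))  # sql
print(cell_language("total = 1 + 1", default="python"))  # python
print(cell_language("%python\nprint('hi')", default="sql"))  # python
```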
Where can you restart a cluster?
From the cluster menu
From the cluster drop-down in a notebook
Magic command to run one notebook from another
%run
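%run executes another notebook inline, in the caller's context, so functions and variables it defines become available to the calling notebook. In a notebook it sits alone in its own cell, e.g. %run ./setup (the path is hypothetical). The plain-Python sketch below models that behavior with exec and a shared namespace; run_notebook and the setup notebook contents are hypothetical.

```python
def run_notebook(source: str, namespace: dict) -> None:
    """Model of %run: execute another notebook's code inside the caller's
    namespace so its definitions become visible to the caller."""
    exec(source, namespace)


# Hypothetical contents of a "setup" notebook.
setup_notebook = """
TAX_RATE = 0.08

def with_tax(amount):
    return round(amount * (1 + TAX_RATE), 2)
"""

caller = {}  # stands in for the calling notebook's context
run_notebook(setup_notebook, caller)

# Later code in the caller can use what the setup notebook defined.
exec("total = with_tax(50)", caller)
print(caller["TAX_RATE"], caller["total"])  # 0.08 54.0
```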
Databricks Utilities name
dbutils (for example, dbutils.fs.ls to list files)
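On Databricks, dbutils.fs.ls(path) lists a directory and returns FileInfo entries with fields such as path, name, and size, e.g. display(dbutils.fs.ls("/databricks-datasets")). dbutils only exists inside a workspace, so the sketch below is a hypothetical local stand-in built on os.scandir that returns similarly shaped records; LocalFileInfo and ls_local are illustrative names, not part of dbutils.

```python
import os
import tempfile
from typing import List, NamedTuple


class LocalFileInfo(NamedTuple):
    # Mirrors the path/name/size fields of dbutils.fs.ls entries.
    path: str
    name: str
    size: int


def ls_local(directory: str) -> List[LocalFileInfo]:
    """List a local directory in a dbutils.fs.ls-like shape (hypothetical helper)."""
    entries = []
    for entry in os.scandir(directory):
        is_dir = entry.is_dir()
        # Directory names get a trailing "/", as in dbutils.fs.ls output.
        suffix = "/" if is_dir else ""
        size = 0 if is_dir else entry.stat().st_size
        entries.append(LocalFileInfo(entry.path + suffix, entry.name + suffix, size))
    return sorted(entries, key=lambda e: e.name)


# Usage: build a tiny directory tree and list it.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "sales.csv"), "w") as f:
        f.write("id,amount\n")
    os.mkdir(os.path.join(demo_dir, "archive"))
    listing = ls_local(demo_dir)
    for info in listing:
        print(info.name, info.size)
```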
Databricks versioning is immutable: true or false?
False. Version history is attached to a notebook; if you copy the notebook, the history is lost.