Delta Live Tables Flashcards
What is Delta Live Tables?
Delta Live Tables is a declarative framework for building reliable, maintainable, and testable data processing pipelines.
What are Delta Live Tables datasets?
Streaming tables, materialized views, and views, each maintained as the result of a declarative query.
For streaming tables, how are records processed through defined queries?
Each record is processed exactly once. This assumes an append-only source.
For materialized views, how are records processed through defined queries?
Records are processed as required to return accurate results for the current data state. Materialized views should be used for data sources with updates, deletions, or aggregations, and for change data capture (CDC) processing.
For views, how are records processed through defined queries?
Records are processed each time the view is queried. Use views for intermediate transformations and data quality checks that should not be published to public datasets.
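The three processing behaviors above can be compared with a small toy model. This is a plain-Python sketch, not the DLT API: the names ToyStreamingTable, ToyMaterializedView, and toy_view are hypothetical, and the source is assumed to be a simple list of order amounts.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStreamingTable:
    """Remembers how far it has read, so each appended record is processed once."""
    offset: int = 0
    rows: list = field(default_factory=list)

    def update(self, source: list) -> int:
        new_records = source[self.offset:]  # only records not seen before
        self.rows.extend(new_records)
        self.offset = len(source)
        return len(new_records)


@dataclass
class ToyMaterializedView:
    """Stores a result and recomputes it from the current source state on refresh."""
    total: int = 0

    def refresh(self, source: list) -> None:
        self.total = sum(source)


def toy_view(source: list, minimum: int = 10) -> list:
    """Stores nothing; the query runs again every time the view is read."""
    return [amount for amount in source if amount >= minimum]


source = [10, 20]
streaming_table = ToyStreamingTable()
materialized_view = ToyMaterializedView()

streaming_table.update(source)
materialized_view.refresh(source)

source.append(5)   # an append: picked up by the streaming table
source[0] = 100    # an in-place update: the streaming table never sees it
processed = streaming_table.update(source)
materialized_view.refresh(source)

print(processed)                # 1
print(streaming_table.rows)     # [10, 20, 5]
print(materialized_view.total)  # 125
print(toy_view(source))         # [100, 20]
```

The in-place update is why the streaming table answer assumes an append-only source: in this toy model the changed record is reflected only in the refreshed materialized view and in the view, which is recomputed on every read.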
What are streaming tables usually used for?
Ingestion pipelines. They are optimal for pipelines that require data freshness and low latency.
In DLT, when are materialized views refreshed?
According to the update schedule of the pipeline in which they are contained.
When should views be used in DLT?
To enforce data quality constraints, or to transform and enrich datasets that drive multiple downstream queries. Views should not be exposed to end users or systems.
When creating a DLT pipeline, what is a target?
The database to which the pipeline publishes its resulting data, where other authorized users can access it.
How do you create and deploy a Delta Live Tables pipeline using the DLT UI?
In the workspace UI, click Workflows, select Delta Live Tables, create a pipeline, and select a notebook that contains DLT code.
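The choices made in the UI (pipeline name, source notebook, target) are also stored as JSON pipeline settings. A minimal sketch of such settings as a Python dictionary follows. The field names reflect the pipeline settings format as I understand it and should be checked against the current Databricks documentation; the pipeline name, target name, and notebook path are hypothetical placeholders.

```python
import json

# Hypothetical values throughout; only the overall shape is being illustrated.
pipeline_settings = {
    "name": "example_dlt_pipeline",     # display name of the pipeline
    "target": "example_target_db",      # database where results are published
    "libraries": [
        # The notebook that contains the DLT code
        {"notebook": {"path": "/Users/someone@example.com/dlt_notebook"}}
    ],
    "continuous": False,                # assumed: triggered rather than continuous updates
    "development": True,                # assumed: run in development mode
}

# Serialize to the JSON text form of the settings.
settings_json = json.dumps(pipeline_settings, indent=2)
print(settings_json)
```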
What are Databricks cluster pools?
A set of idle, ready-to-use instances that clusters can draw on to reduce start and auto-scaling times.