Build Data Pipelines with Delta Live Tables and Spark SQL (Databricks Q) Flashcards
Which of the following correctly describes how code from one library notebook can be referenced by code from another library notebook? Select one response.
Within a DLT Pipeline, code in a notebook library can reference tables and views created in another notebook library that is running on the same cluster.
Within a DLT Pipeline, code in notebook libraries cannot reference tables and views created in a different notebook library.
Within a DLT Pipeline, code in a notebook library can reference tables and views created in another notebook library as long as one notebook library references the other notebook library.
Within a DLT Pipeline, code in any notebook library can reference tables and views created in any other notebook library.
Within a DLT Pipeline, code in a notebook library can reference tables and views created in another notebook library as long as the referenced notebook library is installed on the other notebook library’s cluster.
Within a DLT Pipeline, code in any notebook library can reference tables and views created in any other notebook library.
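For illustration, a minimal sketch of the mechanics in DLT SQL, assuming two notebook libraries added to the same pipeline (table names and path are hypothetical): notebook A defines a table, and notebook B references it through the LIVE virtual schema.
-- Notebook library A
CREATE OR REFRESH LIVE TABLE orders_raw
AS SELECT * FROM json.`/mnt/raw/orders`;
-- Notebook library B, in the same pipeline
CREATE OR REFRESH LIVE TABLE orders_cleaned
AS SELECT * FROM LIVE.orders_raw WHERE order_id IS NOT NULL;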
A data engineer needs to add a file path to their DLT pipeline. They want to use the file path throughout the pipeline as a parameter for various statements and functions.
Which of the following options can be specified during the configuration of a DLT pipeline in order to allow this? Select one response.
They can add a parameter when scheduling the pipeline job and then perform a variable substitution of the file path.
They can specify the file path in the job scheduler when deploying the pipeline.
They can set the variable in a notebook command and then perform a variable substitution of the file path.
They can add a key-value pair in the Configurations field and then perform a string substitution of the file path.
They can add a widget to the notebook and then perform a string substitution of the file path.
They can add a key-value pair in the Configurations field and then perform a string substitution of the file path.
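As a sketch, assuming a key named my_pipeline.file_path was added under the Configurations field of the pipeline settings, ${...} performs the string substitution in DLT SQL:
-- Pipeline configuration (hypothetical): my_pipeline.file_path = /mnt/raw/events
CREATE OR REFRESH STREAMING LIVE TABLE events_raw
AS SELECT * FROM cloud_files("${my_pipeline.file_path}", "json");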
A data engineer needs to ensure the table updated_history, which is derived from the table history, contains all records from history. Each record in both tables contains a value for the column user_id.
Which of the following approaches can the data engineer use to create a new data object from updated_history and history containing the records with matching user_id values in both tables? Select one response.
The data engineer can create a new common table expression from the history table that queries the updated_history table.
The data engineer can merge the history and updated_history tables on user_id.
The data engineer can create a new view by joining the history and updated_history tables.
The data engineer can create a new temporary view by querying the history and updated_history tables.
The data engineer can create a new dynamic view by querying the history and updated_history tables.
The data engineer can create a new view by joining the history and updated_history tables.
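A minimal sketch of that view, assuming both tables live in the same DLT pipeline; the inner join keeps only the records whose user_id appears in both tables:
CREATE TEMPORARY LIVE VIEW matched_history
AS SELECT h.*
   FROM LIVE.history h
   INNER JOIN LIVE.updated_history u
   ON h.user_id = u.user_id;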
A data engineer has a Delta Live Tables (DLT) pipeline that uses a change data capture (CDC) data source. They need to write a quality enforcement rule that ensures that values in the column operation do not contain null values. If the constraint is violated, the associated records cannot be included in the dataset.
Which of the following constraints does the data engineer need to use to enforce this rule? Select one response.
CONSTRAINT valid_operation EXPECT (operation IS NOT NULL)
CONSTRAINT valid_operation EXCEPT (operation) ON VIOLATION DROP ROW
CONSTRAINT valid_operation EXCEPT (operation != null) ON VIOLATION FAIL UPDATE
CONSTRAINT valid_operation EXCEPT (operation) ON VIOLATION DROP ROW
CONSTRAINT valid_operation EXPECT (operation IS NOT NULL) ON VIOLATION DROP ROW
CONSTRAINT valid_operation ON VIOLATION FAIL UPDATE
CONSTRAINT valid_operation EXPECT (operation IS NOT NULL) ON VIOLATION DROP ROW
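In context, the constraint is declared in the table definition; a sketch assuming a hypothetical streaming source named cdc_feed:
CREATE OR REFRESH STREAMING LIVE TABLE operations_clean (
  CONSTRAINT valid_operation EXPECT (operation IS NOT NULL) ON VIOLATION DROP ROW
)
AS SELECT * FROM STREAM(LIVE.cdc_feed);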
A data engineer needs to review the events related to their pipeline and the pipeline’s configurations.
Which of the following approaches can the data engineer take to accomplish this? Select one response.
The data engineer can select events of type user_action in the output table of the pipeline.
The data engineer can query events of type user_action from the checkpoint directory.
The data engineer can query events of type user_action from the event log.
The data engineer can query events of type user_action from the configured storage location.
The data engineer can select events of type user_action in the resultant DAG.
The data engineer can query events of type user_action from the event log.
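A sketch of that query: the event log is stored as a Delta table under the pipeline's storage location (the path below is a placeholder), so it can be loaded into a view and filtered on event_type:
CREATE OR REPLACE TEMP VIEW event_log_raw
AS SELECT * FROM delta.`<storage-location>/system/events`;
SELECT timestamp, details
FROM event_log_raw
WHERE event_type = 'user_action';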
A data engineer is configuring a new DLT pipeline and is unsure what mode to choose. They are working with a small batch of unchanging data and need to minimize the costs associated with the pipeline.
Which of the following modes do they need to use and why? Select one response.
Continuous; continuous pipelines run at set intervals and then shut down until the next manual or scheduled update.
Triggered; triggered pipelines update once and cannot be updated again until they are manually run.
Continuous; continuous pipelines ingest new data as it arrives.
Triggered; triggered pipelines update once and cannot be updated again for 24 hours.
Triggered; triggered pipelines run once and then shut down until the next manual or scheduled update.
Triggered; triggered pipelines run once and then shut down until the next manual or scheduled update.
Which of the following are guaranteed when processing a change data capture (CDC) feed with APPLY CHANGES INTO? Select three responses.
APPLY CHANGES INTO automatically orders late-arriving records using a user-provided sequencing key.
APPLY CHANGES INTO supports insert-only and append-only data.
APPLY CHANGES INTO defaults to creating a Type 1 SCD table.
APPLY CHANGES INTO automatically quarantines late-arriving data in a separate table.
APPLY CHANGES INTO assumes by default that rows will contain inserts and updates.
APPLY CHANGES INTO automatically orders late-arriving records using a user-provided sequencing key.
APPLY CHANGES INTO defaults to creating a Type 1 SCD table.
APPLY CHANGES INTO assumes by default that rows will contain inserts and updates.
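For context, the canonical pattern looks like this (table and column names are hypothetical); SEQUENCE BY names the user-provided sequencing key used to order late-arriving records:
CREATE OR REFRESH STREAMING LIVE TABLE customers_silver;
APPLY CHANGES INTO LIVE.customers_silver
FROM STREAM(LIVE.customers_cdc_feed)
KEYS (customer_id)
APPLY AS DELETE WHEN operation = "DELETE"
SEQUENCE BY sequence_num
COLUMNS * EXCEPT (operation, sequence_num);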
A data engineer needs to query a Delta Live Table (DLT) in a notebook. The notebook is not attached to a DLT pipeline.
Which of the following correctly describes the form of results that the query returns? Select one response.
Queries outside of DLT will return snapshot results from DLT tables only if they were defined as a streaming table.
Queries outside of DLT will return the most recent version from DLT tables only if they were defined as a streaming table.
Queries outside of DLT will return the most recent version from DLT tables, regardless of how they were defined.
Queries outside of DLT will return snapshot results from DLT tables, regardless of how they were defined.
Live queries outside of DLT will return snapshot results from DLT tables only if they were defined as a batch table.
Queries outside of DLT will return snapshot results from DLT tables, regardless of how they were defined.
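For example, from a regular notebook attached to an interactive cluster, assuming the pipeline publishes to a hypothetical database named dlt_db:
-- Returns a point-in-time snapshot of the current table contents,
-- whether the table was declared LIVE or STREAMING LIVE
SELECT * FROM dlt_db.transactions;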
Which of the following data quality metrics are captured through row_expectations in a pipeline’s event log? Select three responses.
Dataset
Failed records
Update ID
Name
Flow progress
Dataset
Failed records
Name
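Those three fields can be extracted from the event log with a query along these lines (a sketch reusing the event_log_raw view from above; the nested path follows the flow_progress event schema):
SELECT row_expectations.dataset AS dataset,
       row_expectations.name AS expectation,
       SUM(row_expectations.failed_records) AS failing_records
FROM (
  SELECT explode(
           from_json(details:flow_progress.data_quality.expectations,
                     "array<struct<name: string, dataset: string, passed_records: int, failed_records: int>>")
         ) AS row_expectations
  FROM event_log_raw
  WHERE event_type = 'flow_progress'
)
GROUP BY row_expectations.dataset, row_expectations.name;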
A data engineer has built and deployed a DLT pipeline. They want to see the output for each individual task.
Which of the following describes how to explore the output for each task in the pipeline? Select one response.
They can go to the Job Runs page and click on the individual tables in the job run history.
They can display the output for each individual command from within the notebook using the %run command.
They can go to the Pipeline Details page and click on the individual tables in the resultant Directed Acyclic Graph (DAG).
They can run the commands connected to each task from within the DLT notebook after deploying the pipeline.
They can specify a folder for the task run details during pipeline configuration.
They can go to the Pipeline Details page and click on the individual tables in the resultant Directed Acyclic Graph (DAG).
A data engineer needs to identify the cloud provider and region of origin for each event within their DLT pipeline.
Which of the following approaches allows the data engineer to view this information? Select one response.
The data engineer can load the contents of the event log into a view and display the view.
The data engineer can view this information in the Task Details page for each task in the pipeline.
The data engineer can use a utility command in Python to list information about each update made to a particular data object.
The data engineer can use a SELECT query to directly query the cloud_details field of the event.
The data engineer can view the event details for the pipeline from the resultant Directed Acyclic Graph (DAG).
The data engineer can load the contents of the event log into a view and display the view.
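A sketch of that approach, reusing the event_log_raw view defined earlier; the cloud provider and region are carried on each event's origin field:
SELECT origin.cloud, origin.region, event_type, timestamp
FROM event_log_raw;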
A data engineer has created the following query to create a streaming live table from transactions.
Code block:
CREATE OR REFRESH STREAMING LIVE TABLE transactions
AS SELECT timestamp(transaction_timestamp) AS transaction_timestamp, * EXCEPT (transaction_timestamp, source)
________________________
Which of the following lines of code correctly fills in the blank? Select two responses.
FROM DELTA STREAM(LIVE.transactions)
FROM LIVE.transactions
FROM STREAMING LIVE (transactions)
FROM STREAM(LIVE.transactions)
FROM STREAMING LIVE.transactions
FROM LIVE.transactions
FROM STREAM(LIVE.transactions)
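Assembled with the streaming read, and renaming the target to a hypothetical transactions_clean to keep it distinct from the source:
CREATE OR REFRESH STREAMING LIVE TABLE transactions_clean
AS SELECT timestamp(transaction_timestamp) AS transaction_timestamp,
          * EXCEPT (transaction_timestamp, source)
   FROM STREAM(LIVE.transactions);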
Which of the following statements accurately describes the difference in behavior between live views and live tables? Select one response.
Metrics for live tables can be collected and reported, while data quality metrics for views are abstracted to the user.
The results of live tables are stored to disk, while the results of views can only be referenced from within the DLT pipeline in which they are defined.
The results of live tables can be viewed through a Directed Acyclic Graph (DAG), while the results for live views cannot.
Live tables can be used with a stream as its source, while live views are incompatible with structured streaming.
Live tables can be used to enforce data quality, while views do not have the same guarantees in schema enforcement.
The results of live tables are stored to disk, while the results of views can only be referenced from within the DLT pipeline in which they are defined.
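The declarations differ only by keyword (names below are illustrative): the table's results are persisted to the pipeline's storage location, while the view is recomputed on demand and visible only inside the pipeline.
-- Persisted to disk; queryable outside the pipeline
CREATE OR REFRESH LIVE TABLE order_totals
AS SELECT customer_id, SUM(amount) AS total
   FROM LIVE.orders_cleaned
   GROUP BY customer_id;
-- Not persisted; only referenceable within this pipeline
CREATE TEMPORARY LIVE VIEW big_spenders
AS SELECT * FROM LIVE.order_totals WHERE total > 10000;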
A data engineer is running a Delta Live Tables (DLT) notebook. They notice that several commands display the following message:
This Delta Live Tables query is syntactically valid, but you must create a pipeline in order to define and populate your table.
Which of the following statements explains this message? Select one response.
DLT is not intended for interactive execution in a notebook.
DLT notebooks must be run at scheduled intervals using the job scheduler.
DLT does not support the execution of Python commands.
DLT does not support the execution of Python and SQL notebooks within a single pipeline.
DLT queries must be connected to a pipeline using the pipeline scheduler.
DLT is not intended for interactive execution in a notebook.
A data engineer is creating a streaming live table to be used by other members of their team. They want to indicate that the table contains silver quality data.
Which of the following describes how the data engineer can clarify this to other members of their team? Select two responses.
EXPECT QUALITY = SILVER
COMMENT "This is a silver table"
TBLPROPERTIES ("quality" = "silver")
None of these answer choices are correct.
WHEN QUALITY = SILVER THEN PASS
COMMENT "This is a silver table"
TBLPROPERTIES ("quality" = "silver")
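Both can be attached in the table definition; a sketch with illustrative names:
CREATE OR REFRESH STREAMING LIVE TABLE sales_silver
COMMENT "This is a silver table"
TBLPROPERTIES ("quality" = "silver")
AS SELECT * FROM STREAM(LIVE.sales_bronze);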