dbt Cloud & YAML Files Flashcards
What is required to materialize a model in your data platform with dbt? Select the best answer.
Note: All CFUs require 100% to receive the dbt Fundamentals badge.
a. A dbt project with .sql and/or .py files saved in a models directory
b. A connection to a data warehouse
c. a dbt_project.yml file
d. A & C only
e. A, B, & C
Correct Answer: e. A, B, & C
Explanation:
To materialize a model in your data platform using dbt, all of the following are required:
- A dbt project with .sql and/or .py files saved in a models directory: Models define the transformations that dbt will execute in the data platform, and they must be organized within a structured project.
- A connection to a data warehouse: dbt connects to your data warehouse (e.g., Snowflake, BigQuery, Redshift) to execute transformations and materialize models.
- A dbt_project.yml file: This file contains the configuration for your dbt project, including its name, settings, and paths. It is required to properly define and execute the project.
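A minimal dbt_project.yml might look like the sketch below; the project name and profile are illustrative:
yaml
~~~
# dbt_project.yml (illustrative minimal example)
name: jaffle_shop
version: '1.0.0'
profile: jaffle_shop      # which connection credentials dbt should use
model-paths: ["models"]   # where dbt looks for .sql and .py models
~~~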
When working in dbt, which one of the following should always be unique for each developer in development?
a. Data platform
b. Git repository
c. Target schema
d. dbt project name
Correct Answer: c. Target schema
Explanation:
In dbt, the target schema should always be unique for each developer in the development environment. This ensures that each developer works in their own isolated schema, avoiding conflicts or overwrites while developing and testing models.
For example, in a local profiles.yml a developer's target schema might be set from an environment variable (DBT_USER is an illustrative variable name):
yaml
~~~
target: dev
schema: "dbt_{{ env_var('DBT_USER') }}"
~~~
Analysis of Options:
a. Data platform:
This is incorrect. All developers typically connect to the same data platform (e.g., Snowflake, BigQuery) during development.
b. Git repository:
This is incorrect. Developers usually work on the same Git repository, using branches to manage changes.
c. Target schema:
This is correct. Each developer needs their own unique schema in the data warehouse for isolated development.
d. dbt project name:
This is incorrect. The project name remains consistent across developers and environments.
Which one of the following is a true statement about dbt connections?
a. Data is stored in the data platform and code is stored in dbt
b. Data is stored in dbt and code is stored in the git repository
c. Data is stored in the data platform and code is stored in the git repository
d. Data and code are both stored in dbt
Correct Answer:
c. Data is stored in the data platform and code is stored in the git repository
Explanation:
In the dbt workflow:
- Data is stored in the data platform (e.g., Snowflake, BigQuery, Redshift).
- Code (e.g., models, tests, and configurations) is version-controlled and stored in a Git repository. dbt pulls this code during execution to apply transformations and build models in the data platform.
Which of the following is true about version control with git in dbt?
a. When making changes to a development branch, you will impact models running in production
b. Only one developer can work on a single development branch at a time
c. Separate branches allow dbt developers to simultaneously work on the same code base without impacting production
d. You have to update the code base to match the environment you are in
Correct Answer:
c. Separate branches allow dbt developers to simultaneously work on the same code base without impacting production
Reason:
- Version control with Git enables developers to work on separate branches independently. Changes made in a branch do not affect the production environment until they are merged into the production branch (e.g., main or master). This approach ensures that multiple developers can work concurrently without conflicts or disruptions to production.
- This is a fundamental feature of version control with Git, promoting safe and collaborative development in dbt projects.
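For example, a developer might work on an isolated branch like this (branch and file names are illustrative):
shell
~~~
# create an isolated branch for new work; production (main) is untouched
git checkout -b feature/add-stg-orders

# stage and commit changes on this branch only
git add models/staging/stg_orders.sql
git commit -m "Add stg_orders staging model"
~~~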
Analysis of Options:
a. When making changes to a development branch, you will impact models running in production:
This is incorrect. Changes in a development branch are isolated and do not impact production until merged.
b. Only one developer can work on a single development branch at a time:
This is incorrect. Multiple developers can work on separate branches simultaneously, not just a single development branch.
c. Separate branches allow dbt developers to simultaneously work on the same code base without impacting production:
This is correct. This reflects Git’s capability to enable collaborative, isolated development through branches.
d. You have to update the code base to match the environment you are in:
This is incorrect. Git branches allow flexibility without requiring manual updates to match the environment.
You are working in the dbt Cloud IDE. How do you ensure the model you have created is built in your data platform?
a. You must save the model
b. You must use the dbt run command
c. You must commit your changes to your branch
d. You must create a SQL file containing a select statement
Correct Answer:
b. You must use the dbt run command
Reason:
- In dbt, the dbt run command is required to materialize a model in your data platform. This command executes the SQL logic defined in your model and creates or updates the corresponding table or view in the database. Simply saving, committing, or writing the model file does not build it in the data platform.
- This is the essential step to ensure your model is built in the data platform.
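For example, from the command bar in the dbt Cloud IDE or a terminal:
shell
~~~
# build every model in the project
dbt run

# build only one model
dbt run --select stg_customers
~~~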
Analysis of Options:
a. You must save the model:
This is incorrect. Saving the model only persists the file in the IDE but does not build it in the data platform.
b. You must use the dbt run command:
This is correct. The dbt run command compiles and executes the SQL to materialize the model.
c. You must commit your changes to your branch:
This is incorrect. Committing changes ensures version control but does not execute the model in the data platform.
d. You must create a SQL file containing a select statement:
This is incorrect. While defining the model is necessary, it needs to be executed via dbt run to be materialized.
What are two functions of a staging model in dbt?
a. Perform light transformations on your data set
b. Connect to upstream sources using the source macro
c. Connect to upstream models using the ref macro
d. Perform aggregations that apply business logic
Correct Answer:
a. Perform light transformations on your data set
b. Connect to upstream sources using the source macro
Reason:
- Staging models in dbt serve as an intermediary layer to prepare raw data for further transformations.
- They perform lightweight transformations (e.g., renaming columns or casting data types) and use the source macro to pull data directly from upstream sources. This setup standardizes and cleans the raw data before it is passed to downstream models.
- These functions define the role of staging models in dbt.
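A minimal staging model sketch (the source and column names are illustrative assumptions):
sql
~~~
-- models/staging/stg_customers.sql (illustrative)
-- light transformations only: rename and cast columns from the raw source
select
    id as customer_id,
    first_name,
    last_name,
    cast(created_at as date) as created_date
from {{ source('jaffle_shop', 'customers') }}
~~~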
What are two functions of a marts model in dbt?
a. Reference upstream sources using the source macro
b. Perform light transformations on the raw data set
c. Apply business logic for stakeholders
d. Reference upstream models using the ref macro
Correct Answer:
c. Apply business logic for stakeholders
d. Reference upstream models using the ref macro
Reason:
- Marts models in dbt are the final layer of transformation, designed to deliver analytics-ready datasets.
- They apply business logic that aligns with stakeholder requirements and reference upstream models (e.g., staging models) using the ref macro. This ensures that marts models build upon standardized and cleaned data from earlier transformations.
- These functions define the purpose of marts models in dbt.
Analysis of Options:
a. Reference upstream sources using the source macro:
This is incorrect. Marts models reference upstream models, not raw sources. The source macro is typically used in staging models.
b. Perform light transformations on the raw data set:
This is incorrect. Light transformations are the responsibility of staging models. Marts models focus on complex business logic and aggregations.
c. Apply business logic for stakeholders:
This is correct. Marts models are designed to include calculations, metrics, and other business logic that stakeholders rely on.
d. Reference upstream models using the ref macro:
This is correct. Marts models use the ref macro to build upon staging or intermediate models.
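A sketch of a marts model that applies business logic on top of staging models (model and column names are illustrative):
sql
~~~
-- models/marts/dim_customers.sql (illustrative)
-- applies business logic and builds on staging models via ref
select
    customers.customer_id,
    customers.first_name,
    count(orders.order_id) as number_of_orders
from {{ ref('stg_customers') }} as customers
left join {{ ref('stg_orders') }} as orders
    on customers.customer_id = orders.customer_id
group by 1, 2
~~~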
You just finished developing your first model that directly references and transforms customers source data. Where in your file tree should this model live?
a. models/stg_customers.sql
b. models/staging/stg_customers.sql
c. models/marts/stg_customers.sql
d. models/sources/stg_customers.sql
Correct Answer:
b. models/staging/stg_customers.sql
Reason:
- A model that directly references and transforms raw source data should be placed in the staging directory. Staging models are meant for performing lightweight transformations on source data to prepare it for further downstream modeling. Placing the model in the models/staging folder aligns with dbt’s best practices for organizing files by their role in the transformation process.
- This placement adheres to dbt’s directory structure for staging models.
Analysis of Options:
a. models/stg_customers.sql: This is incorrect.
While the filename follows the stg_ naming convention, it should be placed in the staging subdirectory for proper organization.
b. models/staging/stg_customers.sql: This is correct.
The models/staging directory is the appropriate location for staging models that transform source data.
c. models/marts/stg_customers.sql: This is incorrect.
Marts models are for final transformations and business logic, not initial transformations of source data.
d. models/sources/stg_customers.sql: This is incorrect.
The sources directory typically contains configuration files (e.g., sources.yml), not models.
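Under this convention, the project's file tree might look like the following (illustrative):
~~~
models/
├── staging/
│   └── stg_customers.sql
└── marts/
    └── dim_customers.sql
~~~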
You want all models in a folder to be materialized as tables.
Where can you accomplish this?
a. In the model itself
b. In the dbt_project.yml file
c. In the sources.yml file
d. In the models.yml file
Correct Answer: b. In the dbt_project.yml file
This file is the appropriate place to configure materializations for all models in a folder.
Reason:
You can configure the materialization for all models in a specific folder in the dbt_project.yml file. This centralized configuration ensures consistency and reduces the need to set materialization for each model individually. You can define the folder-level settings like this:
yaml
~~~
models:
  project_name:
    folder_name:
      +materialized: table
~~~
Analysis of Options:
a. In the model itself: This is incorrect.
While you can set the materialization for an individual model using {{ config(materialized='table') }}, this approach is not suitable for configuring all models in a folder.
b. In the dbt_project.yml file: This is correct.
The dbt_project.yml file allows you to configure all models within a folder, including materialization settings.
c. In the sources.yml file: This is incorrect.
The sources.yml file defines raw data sources but does not configure models or their materializations.
d. In the models.yml file: This is incorrect.
There is no models.yml file in dbt by default.
You have just built a dim_customers.sql model that relies on data from the upstream model stg_customers.
How would you reference this model in dim_customers?
a. select * from {{ ref('stg_customers.sql') }}
b. select * from {{ ref('stg_customers') }}
c. select * from {{ source(stg_customers.sql) }}
d. select * from {{ source(stg_customers) }}
Correct Answer: b. select * from {{ ref('stg_customers') }}
This ensures proper referencing of the stg_customers model, allowing dbt to manage dependencies and order of execution.
Reason:
In dbt, the ref macro is used to reference upstream models. The ref macro allows dbt to manage dependencies and execution order within the Directed Acyclic Graph (DAG). You should use the model name (stg_customers) without the .sql extension to reference it properly.
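At compile time, dbt resolves the ref to the relation built for stg_customers; the database and schema names below are illustrative:
sql
~~~
-- model source code
select * from {{ ref('stg_customers') }}

-- compiled SQL (illustrative target database and schema)
select * from analytics.dbt_alice.stg_customers
~~~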
Analysis of Options:
a. select * from {{ ref('stg_customers.sql') }}: This is incorrect.
The .sql extension is not included when using the ref macro.
b. select * from {{ ref('stg_customers') }}: This is correct.
The ref macro correctly references the stg_customers model using its name without the file extension.
c. select * from {{ source(stg_customers.sql) }}: This is incorrect.
The source macro is used for referencing raw data sources defined in sources.yml, not dbt models.
d. select * from {{ source(stg_customers) }}: This is incorrect.
The source macro is used for sources, not for referencing upstream models in dbt.
Which command will only materialize dim_customers.sql and its downstream models?
a. dbt run --select dim_customer
b. dbt run --select dim_customers
c. dbt run --select dim_customers+
d. dbt run --select +dim_customer+
Correct Answer: c. dbt run --select dim_customers+
This command runs dim_customers and its downstream models as expected.
Reason:
The + symbol in dbt selectors indicates downstream dependencies. By using dim_customers+, dbt will materialize the dim_customers model and all of its downstream models in the Directed Acyclic Graph (DAG). This ensures only dim_customers and its dependents are run.
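The graph operators can be combined as follows:
shell
~~~
# dim_customers and everything downstream of it
dbt run --select dim_customers+

# dim_customers and everything upstream of it
dbt run --select +dim_customers

# dim_customers plus all ancestors and descendants
dbt run --select +dim_customers+
~~~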
Analysis of Options:
a. dbt run --select dim_customer: This is incorrect.
The model name here has a typo (dim_customer instead of dim_customers), so it wouldn't work unless another model exists with the name dim_customer.
b. dbt run --select dim_customers: This is incorrect.
This would only materialize the dim_customers model but would not include its downstream models.
c. dbt run --select dim_customers+: This is correct.
The + ensures that both dim_customers and all of its downstream models are materialized.
d. dbt run --select +dim_customer+: This is incorrect.
The selector includes a typo in the model name and would also materialize upstream models, which is beyond the scope of the question.
Which of the following is a benefit of using subdirectories in your models directory?
a. Subdirectories allow you to configure materializations at the folder level for a collection of models
b. Subdirectories allow you to include multiple dbt projects in a single project
c. Subdirectories allow you to explicitly build dependencies between models based on the naming of the folders
d. Subdirectories will automatically change the schema where a model is built based on the name of the folder the model is located in
Correct Answer: a. Subdirectories allow you to configure materializations at the folder level for a collection of models
Reason:
Using subdirectories in the models directory allows you to group related models and configure their settings (e.g., materialization type, schema) collectively in the dbt_project.yml file. This promotes better organization and simplifies configuration for multiple models.
Example configuration in dbt_project.yml:
yaml
~~~
models:
  project_name:
    subdirectory_name:
      +materialized: table
~~~
Analysis of Options:
a. Subdirectories allow you to configure materializations at the folder level for a collection of models: This is correct.
Subdirectories enable logical grouping of models and allow you to apply configurations collectively, like materialization type.
b. Subdirectories allow you to include multiple dbt projects in a single project: This is incorrect.
Subdirectories organize models within a single dbt project, not across multiple projects.
c. Subdirectories allow you to explicitly build dependencies between models based on the naming of the folders: This is incorrect.
Dependencies are determined by the ref function, not by the folder names.
d. Subdirectories will automatically change the schema where a model is built based on the name of the folder the model is located in: This is incorrect.
Schema changes must be explicitly configured in the dbt_project.yml file; folder names alone do not affect schema.
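If you do want models in a folder built into a custom schema, that must be configured explicitly, for example (folder name illustrative):
yaml
~~~
models:
  project_name:
    marts:
      +schema: marts   # by default built as <target_schema>_marts
~~~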
You are auditing your dbt project’s DAG. Where will sources appear?
a. Sources are the last node in a DAG, all the way to the right.
b. Sources are the second node in a DAG, connected to the raw data.
c. Sources are the first node in a DAG, all the way to the left.
d. Sources appear directly to the right of a model in the DAG.
Correct Answer: c. Sources are the first node in a DAG, all the way to the left.
Reason:
In a dbt DAG, sources represent the raw data that serves as the starting point for transformations. They are the first nodes in the DAG, positioned on the far left, and are connected to the staging models or other transformations that consume the raw data.
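For example, a simple lineage reads left to right (names illustrative):
~~~
source: raw.customers  →  stg_customers  →  dim_customers
~~~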
Analysis of Options:
a. Sources are the last node in a DAG, all the way to the right: This is incorrect.
Sources are the starting point, not the endpoint, of the data pipeline.
b. Sources are the second node in a DAG, connected to the raw data: This is incorrect.
Sources are the first node, not the second.
c. Sources are the first node in a DAG, all the way to the left: This is correct.
Sources are where the data flow begins in a dbt project’s DAG.
d. Sources appear directly to the right of a model in the DAG: This is incorrect.
Sources feed data into models; they do not appear downstream of models.
What is the primary purpose of a sources.yml file in dbt?
a. To define tests for models in your project
b. To document and define raw data sources in your project
c. To configure materializations for models
d. To define transformations for staging models
Correct Answer: b. To document and define raw data sources in your project
Reason:
The sources.yml file is used to define sources (raw data from the data platform) in dbt. It also allows you to document these sources and apply tests (e.g., unique, not_null) to validate the data quality at the source level.
Example:
yaml
~~~
version: 2

sources:
  - name: salesforce
    schema: raw_salesforce
    tables:
      - name: accounts
        description: "Raw accounts data from Salesforce"
        columns:
          - name: account_id
            tests:
              - unique
              - not_null
~~~
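Models can then select from this source with the source macro, which resolves to raw_salesforce.accounts:
sql
~~~
select * from {{ source('salesforce', 'accounts') }}
~~~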
Which of the following can you define in a .yml file in dbt?
a. Sources and tests for raw data
b. Materializations for models
c. SQL logic for transformations
d. Environment variables for different deployment setups
Correct Answer:
a. Sources and tests for raw data
Explanation:
YAML files in dbt are used to define sources, tests, and descriptions for your models and columns. They help document your data and validate its integrity, especially for raw sources or transformed models. These property files do not contain SQL transformation logic, materialization settings, or environment variable setups.
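A typical property file sketch (model and column names are illustrative):
yaml
~~~
# models/staging/schema.yml (illustrative)
version: 2

models:
  - name: stg_customers
    description: "One record per customer"
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
~~~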
Why the Other Options Are Incorrect:
b. Materializations for models:
This is incorrect. Materializations (e.g., table, view) are configured in the dbt_project.yml file or within the model SQL files themselves using the config function, not in property .yml files.
c. SQL logic for transformations:
This is incorrect. SQL logic is written in .sql files, not in YAML files. YAML files complement SQL files by adding metadata, documentation, and tests.
d. Environment variables for different deployment setups:
This is incorrect. Environment variables are defined outside of .yml files, for example in your shell, in a .env file, or in dbt Cloud environment settings; YAML files can only reference them via the env_var macro.