Basics Flashcards by Sunny Patel

What is the default materialization used by DBT if you don’t specify one?

A. Table
B. View
C. Incremental
D. Ephemeral

B. View

Explanation:

In DBT, if no materialization is specified in the config block of a model, the default behavior is to create a view.

Materialization determines how DBT creates and persists a model in the database.

View: A logical query stored in the database that gets executed when queried.

Table: A physical table stored in the database with persistent data.

Incremental: A table that updates or appends data in smaller batches.

Ephemeral: A temporary result that exists only in the context of a single query.

Key Takeaway

View is the default materialization in DBT because it is lightweight and does not require extra storage.

How well did you know this?

Not at all

Perfectly

Question 2:
What is the purpose of the on_schema_change configuration in DBT?

A. To automatically rebuild a model if the schema of upstream tables changes.
B. To allow DBT to update or replace target table columns when the schema changes.
C. To enforce strict validation rules for schema consistency.
D. To control whether views or ephemeral models are rebuilt on schema changes.

Correct Answer: B. To allow DBT to update or replace target table columns when the schema changes. 🎉

Explanation:

The on_schema_change configuration in DBT is used in incremental models to handle changes in the schema of the target table. It determines how DBT reacts when columns in the upstream source or model change (e.g., new columns are added).

Options for on_schema_change

ignore: DBT will ignore schema changes. This is the default behavior.
append_new_columns: DBT will add new columns to the target table without deleting or replacing the existing ones.
fail: DBT will throw an error if there are schema changes, stopping the execution.

Key Takeaway:

The on_schema_change setting is critical for incremental models where upstream schema changes might happen. It enables DBT to adapt to those changes without manual intervention.

Example Usage:
{{ config(
materialized=’incremental’,
on_schema_change=’append_new_columns’
) }}

SELECT *
FROM my_source_table

In this case, if new columns are added to my_source_table, DBT will add those new columns to the target table when the model runs.

How well did you know this?

Not at all

Perfectly

What does the dbt seed command do?

A. Loads CSV files into the database as tables.
B. Generates SQL scripts for data models.
C. Runs tests on the source data to ensure data quality.
D. Creates ephemeral models from raw data files.

Correct Answer: A. Loads CSV files into the database as tables.

Explanation:

The dbt seed command is used to load CSV files into the database as tables. These tables can then be referenced like any other model in DBT.

How dbt seed Works:

Place your CSV files in the data directory of your DBT project.
Example project structure:

my_dbt_project/
├── models/
├── data/
│ ├── customers.csv
│ ├── products.csv

Run the dbt seed command:

dbt seed

DBT reads the CSV files and creates database tables using the data in the CSV files.

Key Takeaway:

The dbt seed command simplifies loading static CSV data into your database, making it accessible for models and transformations.

How well did you know this?

Not at all

Perfectly

What is the purpose of the dbt snapshot command?

A. To create ephemeral models for temporary data storage.
B. To capture and track historical changes in source data over time.
C. To test and validate the freshness of source data.
D. To rebuild tables and ensure schema consistency.

Correct Answer: B. To capture and track historical changes in source data over time.

Explanation:

The dbt snapshot command is used to create snapshots of source data that track historical changes over time. This is particularly useful for slowly changing dimensions (SCDs) or auditing data changes.

How DBT Snapshots Work:

Snapshots capture the state of a table or query at specific points in time.
When data in the source table changes, DBT detects those changes and records them as new rows in a snapshot table.
This allows you to analyze historical versions of data and track changes like updates or deletions.

Key Features of Snapshots:

Versioning: Snapshots help you maintain historical records of rows when values change.

Slowly Changing Dimensions: Perfect for scenarios where you need to track data over time, such as:

Customer address changes

Product price updates

How well did you know this?

Not at all

Perfectly

What Happens When You Run dbt Snapshot?

Initial Run: DBT creates a snapshot table and loads the data as-is.
Subsequent Runs:

DBT compares the current source data to the existing snapshot.

If changes are detected, DBT inserts new records with timestamps to indicate the changes.

Key Takeaway:

Use dbt snapshot to track historical changes in data, enabling auditing and slowly changing dimension (SCD) analysis.

How well did you know this?

Not at all

Perfectly

What does the dbt test command do?

A. It checks for syntax errors in the SQL models.
B. It runs assertions on the data to ensure data quality.
C. It validates database connections before running models.
D. It rebuilds all models and refreshes test cases.

The correct answer is B. It runs assertions on the data to ensure data quality.

Explanation:

The dbt test command is used to run data quality tests on your models, sources, and seeds. These tests ensure that your data meets specific criteria, such as:

Uniqueness: No duplicate values in a column.
Not null: A column does not contain null values.
Referential integrity: A foreign key relationship exists between two tables.
Custom tests: You can write custom SQL-based tests for any business rule.

Example Usage

Adding a Test to a Model:
Define tests in your schema.yml file:

version: 2

models:
- name: customers
columns:
- name: id
tests:
- unique
- not_null
- name: email
tests:
- not_null

Run Tests:
To run all tests in your project:

dbt test

Test Output

Success: The data meets the criteria (e.g., no duplicates, no nulls).

Failure: DBT identifies rows that fail the test and outputs them for review.

Why Not C?

Option C: “It validates database connections before running models” describes the dbt debug command, which checks for database connections and project setup issues.

Key Takeaway

The dbt test command ensures data quality by validating data against specified criteria.

Use schema.yml to define built-in or custom tests.

How well did you know this?

Not at all

Perfectly

What is the purpose of the dbt build command?

A. It runs models, tests, and snapshots in the correct order.
B. It only runs incremental models in the project.
C. It rebuilds all tables and refreshes seeds.
D. It validates SQL syntax before running models.

Correct Answer: A. It runs models, tests, and snapshots in the correct order. 🎉

Use dbt run when you only need to materialize models quickly.

Use dbt build for a complete pipeline workflow that includes testing and validation.

Explanation:

The dbt build command is an all-in-one command introduced in recent versions of DBT. It runs the following tasks in the correct dependency order:

Models: Executes and materializes models (tables, views, incremental, ephemeral).
Tests: Runs data quality tests on the results of models and sources.
Snapshots: Runs snapshots to capture historical data changes.
Seeds: Loads CSV files into the database.

How dbt build Works

Running the dbt build command in the terminal:

dbt build

DBT determines the dependency graph of your project and ensures that tasks are executed in the correct order.

It combines the behavior of multiple commands like dbt run, dbt test, and dbt snapshot.

Key Benefits of dbt build

Efficiency: Instead of running multiple commands (dbt run, dbt test, dbt snapshot), you can use one command to execute all tasks.
Order Management: Ensures tasks like models and snapshots are built before running tests.
Consistency: Simplifies the build process and ensures that the output is reliable.

Why the Other Options Are Incorrect

B: “It only runs incremental models” — Incorrect. dbt build runs all models, not just incremental ones.

C: “It rebuilds all tables and refreshes seeds” — Partially correct but incomplete. dbt build does more than just rebuild tables and refresh seeds; it also runs tests and snapshots.

D: “It validates SQL syntax before running models” — Incorrect. This describes what happens during SQL compilation or the dbt compile command.

Key Takeaway

The dbt build command is a powerful, all-in-one tool to:

Run models

Execute tests

Build snapshots

Load seeds
…in the correct dependency order.

How well did you know this?

Not at all

Perfectly

dbt deps

What is the main purpose of the dbt deps command?

A. It compiles SQL code into runnable queries.
B. It installs packages listed in the packages.yml file.
C. It updates all DBT models in the target schema.
D. It runs incremental models and refreshes seeds.

Correct Answer: B. It installs packages listed in the packages.yml file. 🎉

Explanation:
The dbt deps command is used to install and manage DBT packages specified in your project’s packages.yml file.

What are DBT Packages?
DBT packages are reusable pieces of code that you can include in your DBT project.
They can contain models, macros, tests, and other configurations that enhance your DBT project.
Packages are typically hosted on platforms like dbt Hub or GitHub.
How to Use dbt deps
Add the package(s) to the packages.yml file:

yaml
Copy code
packages:
- package: dbt-labs/dbt_utils
version: 1.0.0
Run the dbt deps command:

bash
Copy code
dbt deps
DBT will download and install the package into the dbt_packages directory in your project.

Key Takeaway
Use dbt deps to install and update DBT packages defined in packages.yml.
Packages help you reuse code and streamline project development.

Why the Other Options Are Incorrect
A: “It compiles SQL code into runnable queries” — This is done by the dbt compile command.
C: “It updates all DBT models in the target schema” — This is achieved by the dbt run command.
D: “It runs incremental models and refreshes seeds” — These tasks are managed by dbt build or specific commands like dbt seed and dbt run.

How well did you know this?

Not at all

Perfectly

dbt clean

What does the dbt clean command do?

A. Deletes the target directory and removes compiled artifacts.
B. Removes unused models from the database schema.
C. Drops temporary views created during DBT runs.
D. Refreshes seeds and snapshots to clean outdated data.

The correct answer is A. Deletes the target directory and removes compiled artifacts.
Explanation:
The dbt clean command is used to delete directories like target/ and dbt_packages/ where DBT stores compiled artifacts and package dependencies.

What Does dbt clean Do?
The clean command removes:
* target/ directory: Contains compiled SQL and execution artifacts.
* dbt_packages/ directory: Contains downloaded DBT packages installed using dbt deps.

This command helps to clear up space or reset the project to a clean state.

Why Not C?
Option C: “Drops temporary views created during DBT runs” — Incorrect. DBT does not automatically drop temporary views during dbt clean. Temporary views are handled as part of the database’s behavior.

Key Takeaway:
Use dbt clean to remove compiled artifacts and reset your DBT project to a clean state.
This is useful when preparing for a fresh project run or troubleshooting.

How well did you know this?

Not at all

Perfectly

dbt compile

What does the dbt compile command do?

A. Executes models and generates tables/views in the database.
B. Validates the data source configuration.
C. Translates DBT models into raw SQL files without executing them.
D. Runs tests and validates data quality.

Correct Answer: C. Translates DBT models into raw SQL files without executing them.

Explanation:
The dbt compile command translates your DBT models (written in Jinja and SQL) into raw SQL queries that are ready to be executed against your database. However, the queries are not executed—they are simply generated and stored for inspection.

How It Works:
DBT processes all your models and macros.
It resolves all Jinja syntax, including:
* ref() and source() functions.
* Loops, conditionals, and macros.

The result is a set of pure SQL files saved in the target/compiled directory.

Key Benefits:
* Debugging: You can review the compiled SQL before running models.
* Validation: Helps identify Jinja syntax errors or broken references.
* Efficiency: Allows you to generate queries without querying the database.

Why the Other Options Are Incorrect:
A: “Executes models and generates tables/views” — This describes the dbt run command.
B: “Validates the data source configuration” — This is part of dbt debug.
D: “Runs tests and validates data quality” — This describes the dbt test command.

Key Takeaway:
Use dbt compile to generate raw SQL without executing it. This is great for debugging and validating SQL logic.

How well did you know this?

Not at all

Perfectly

ref()

What happens if you use the ref() function in a model?

A. DBT creates a temporary table for the referenced model.
B. DBT ensures dependencies are respected and the referenced model is built first.
C. DBT caches the results of the referenced model to improve query performance.
D. DBT automatically materializes the referenced model as an incremental table.

Correct Answer: B. DBT ensures dependencies are respected and the referenced model is built first.

Explanation:
The ref() function in DBT is a critical feature that allows you to reference other models. It ensures that dependencies between models are respected and handled automatically.

When you use ref():
1. DBT determines the order in which models should be built based on their dependencies.
1. The referenced model is built first before the current model runs.
1. The ref() function dynamically resolves to the correct schema and table name based on the target environment (e.g., dev, prod).

How DBT Handles Dependencies:
* DBT builds a Directed Acyclic Graph (DAG) to determine the correct execution order for models.
* This ensures there are no circular dependencies, and all upstream models are completed before downstream models.

Why the Other Options Are Incorrect:
A: “DBT creates a temporary table” — Incorrect. The behavior of the referenced model depends on its materialization (e.g., table, view, incremental).
C: “DBT caches the results” — Incorrect. DBT does not cache results; it dynamically resolves references.
D: “DBT automatically materializes the referenced model as incremental” — Incorrect. Materialization must be explicitly defined in the referenced model.

Key Takeaway:
The ref() function is fundamental for managing model dependencies. It ensures upstream models are built first and resolves schema and table names dynamically.

How well did you know this?

Not at all

Perfectly

dbt source freshness

What does the dbt source freshness command do?

A. Checks if source data has been updated recently.
B. Refreshes source tables to match the latest data.
C. Runs tests on source tables to ensure data quality.
D. Deletes outdated rows in source tables.

Correct Answer: A. Checks if source data has been updated recently. 🎉

Explanation:
The dbt source freshness command is used to check how fresh your source data is by comparing the most recent timestamp in the source table with an expected freshness threshold.

How It Works:
1. In your schema.yml file, you define freshness criteria for your sources:

yaml
Copy code
version: 2

sources:
- name: raw
database: analytics
schema: public
tables:
- name: customers
freshness:
error_after: { count: 12, period: hour }
warn_after: { count: 6, period: hour }
loaded_at_field: updated_at
freshness: Defines acceptable time thresholds for source updates:
* error_after: If the data is older than this threshold, the test will fail.
* warn_after: If the data is older than this threshold, it will produce a warning.

loaded_at_field: The column used to determine when the data was last updated (e.g., updated_at).

Run the freshness test:

bash
Copy code
dbt source freshness

DBT checks the most recent value in the loaded_at_field and compares it to the current time.

Example Output:
After running the command, DBT provides a status report:

Source freshness:
| Source | Table | Max loaded_at | Status |
|————-|————-|—————-|———|
| raw | customers | 2024-06-16 | WARN |
* If the data is older than the warn threshold but within the error threshold, DBT will report a warning.
* If the data is older than the error threshold, the test fails.

Why the Other Options Are Incorrect:
* B: “Refreshes source tables to match the latest data” — Incorrect. dbt source freshness only checks freshness; it does not refresh or modify the data.
* C: “Runs tests on source tables to ensure data quality” — Incorrect. This describes dbt test for data quality, not freshness.
* D: “Deletes outdated rows in source tables” — Incorrect. DBT does not delete rows; it only tests and builds models.

Key Takeaway:
* Use dbt source freshness to monitor source data freshness and ensure your data pipelines are processing recent updates.
* This is crucial for time-sensitive analytics and reporting.

How well did you know this?

Not at all

Perfectly

What is the purpose of the dbt debug command?

A. Compiles models to check for SQL errors.
B. Validates the DBT project setup and database connection.
C. Runs all tests and snapshots in the project.
D. Clears the target directory and refreshes packages.

The correct answer is B. Validates the DBT project setup and database connection.
Explanation:
The dbt debug command is used to validate that your DBT project is correctly configured and that DBT can successfully connect to the database.

What dbt debug Checks:
1. Profiles.yml: Ensures your profiles.yml file is configured correctly.
1. Database Connections: Verifies that DBT can connect to your target database.
1. Environment Setup: Checks the availability of required credentials, permissions, and DBT version.
1. Project Integrity: Confirms that the DBT project structure and required files are in place.

Why the Other Options Are Incorrect:
* A: “Compiles models to check for SQL errors” — This describes the dbt compile command.
* C: “Runs all tests and snapshots in the project” — This describes the dbt test and dbt snapshot commands.
* D: “Clears the target directory and refreshes packages” — This describes the dbt clean and dbt deps commands.
* Key Takeaway:

Use dbt debug when:
* Setting up a new DBT project.
* Troubleshooting issues with database connections.
* Validating the integrity of your project environment.

How well did you know this?

Not at all

Perfectly

Which materialization type does not create any persistent objects in the database?

A. Table
B. View
C. Incremental
D. Ephemeral

Correct Answer: D. Ephemeral 🎉
Explanation:
The ephemeral materialization in DBT does not create any persistent objects (tables or views) in the database. Instead, ephemeral models are inlined as subqueries into downstream models where they are referenced.

Advantages of Ephemeral Models:
1. No Storage Overhead: Since ephemeral models don’t create tables or views, they save storage space.
1. Efficient for Small Queries: Best used for lightweight transformations that don’t need to be persisted.
1. Inline Query Execution: The SQL logic is included directly in downstream models for better performance.

Why the Other Options Are Incorrect:
* A (Table): Creates a physical table in the database.
* B (View): Creates a logical view that persists in the database.
* C (Incremental): Creates or updates a physical table in batches.

Key Takeaway:
* Ephemeral models are inlined as subqueries and do not create any persistent database objects.
* Use them for lightweight transformations that don’t need to be stored independently.

How well did you know this?

Not at all

Perfectly

What happens when you run dbt run?

A. DBT runs all models and creates tables/views based on their materializations.
B. DBT runs all tests defined in the project.
C. DBT compiles models into raw SQL but does not execute them.
D. DBT refreshes seed data and snapshots.

Correct Answer: A. DBT runs all models and creates tables/views based on their materializations.

Explanation:
The dbt run command executes all models in your project and creates tables, views, or other database objects based on their materializations.

Key Behaviors of dbt run:
1. Executes Models: Runs all .sql files in the models/ directory that are not disabled.
2. Materializations: DBT uses the materialization specified in each model’s configuration to determine how to build it:
* Table: Creates a physical table.
* View: Creates a logical view.
* Incremental: Updates or appends to a table in small batches.
* Ephemeral: Inlines the query into downstream models (no database object created).

Dependencies: Models are executed in the correct order based on the dependency graph (determined by ref() and source relationships).

What Happens:
1. DBT compiles the models (resolving ref() and other Jinja logic).
1. It executes the SQL queries against your target database.
1. DBT creates the appropriate database objects (tables or views) based on the specified materialization.

Why the Other Options Are Incorrect:
* B: “DBT runs all tests” — This is done by the dbt test command.
* C: “DBT compiles models into raw SQL but does not execute them” — This describes the dbt compile command.
* D: “DBT refreshes seed data and snapshots” — Seeds are refreshed using dbt seed, and snapshots are run using dbt snapshot.

Key Takeaway:
dbt run executes your models and creates tables/views in the database based on their materializations.

How well did you know this?

Not at all

Perfectly

Which of the following DBT commands runs both models and tests in one go?

A. dbt build
B. dbt run
C. dbt test
D. dbt compile

Correct Answer: A. dbt build 🎉
Explanation:
The dbt build command is an all-in-one command that:
1. Runs models: Executes SQL to create tables, views, or other objects.
1. Executes tests: Runs data quality checks defined in your project.
1. Runs snapshots: Captures historical data changes.
1. Loads seeds: Updates static data from CSV files into your database.

How dbt build Works
When you run: dbt build
DBT executes tasks in the correct dependency order (DAG structure):
1. Models are materialized first.
1. Tests are run on the newly created models or sources.
1. Snapshots and seeds are executed where applicable.

Why dbt build is Important:
* It’s a comprehensive command to build and validate your entire DBT project in one step.
* It reduces the need to manually run multiple commands like dbt run, dbt test, and dbt snapshot.

Why the Other Options Are Incorrect:
* B (dbt run): Only runs models (creates tables/views) without executing tests.
* C (dbt test): Only runs tests to check data quality.
* D (dbt compile): Compiles SQL but does not execute it against the database.

Key Takeaway:
Use dbt build to run models, tests, and snapshots together in the correct order, ensuring your entire pipeline is validated and up-to-date.

What is the purpose of the dbt snapshot command?

A. To create ephemeral models for temporary transformations.
B. To capture and track historical changes in source data over time.
C. To rebuild all incremental models and seeds.
D. To delete outdated rows in a target table.

Correct Answer: B. To capture and track historical changes in source data over time. 🎉

Explanation:
The dbt snapshot command is used to track changes in source data over time. It enables versioning of rows in a table, which is particularly useful for Slowly Changing Dimensions (SCDs) or historical audits.

How Snapshots Work:
1. Snapshots store historical versions of data as it changes, using a strategy like:
timestamp: Tracks changes based on an updated_at column.
check: Tracks changes based on the value of a column or set of columns.
2. When data changes in the source table, DBT creates a new version of the row in the snapshot table.

Why Snapshots Are Useful:
Historical Audits: Allows you to see how data looked at different points in time.
Slowly Changing Dimensions (SCD Type 2): Helps version rows when data changes.
Compliance: Useful for tracking updates to critical data for auditing purposes.

Why the Other Options Are Incorrect:
A: “To create ephemeral models for temporary transformations” — Ephemeral models do not persist in the database.
C: “To rebuild all incremental models and seeds” — This is unrelated to snapshots.
D: “To delete outdated rows in a target table” — Snapshots add rows to track changes; they don’t delete rows.

Key Takeaway:
Use dbt snapshot to track and capture historical changes in your data for auditability and versioning.

dbt run-operation

What does the dbt run-operation command do?

A. Executes a specific macro defined in the project.
B. Runs all models that are materialized as tables.
C. Refreshes incremental models and snapshots.
D. Executes tests on sources and models.

To execute a macro using dbt run-operation, use this syntax:

dbt run-operation <macro_name></macro_name>

Correct Answer: A. Executes a specific macro defined in the project. 🎉

Explanation:
The dbt run-operation command is used to execute a specific macro in your DBT project. This allows you to run custom tasks or operations without building models or running the entire project.

What are Macros?
Macros are reusable pieces of Jinja SQL logic in DBT.
They can be used to automate repetitive tasks or execute operations that don’t fit neatly into a model.

When to Use dbt run-operation
* Running ad hoc tasks like vacuuming, truncating, or updating specific tables.
* Automating database management operations outside of standard DBT model builds.
* Executing helper utilities that interact with your database.

Why the Other Options Are Incorrect:
B: “Runs all models that are materialized as tables” — This describes the dbt run command.
C: “Refreshes incremental models and snapshots” — This is done by dbt build or dbt snapshot.
D: “Executes tests on sources and models” — This is done by the dbt test command.

Key Takeaway:
Use dbt run-operation to execute specific macros for tasks that extend beyond running models, like database cleanup or administrative operations.

What is the purpose of the dbt ls command?

A. To list all models, sources, and snapshots in the project.
B. To execute all models in the project.
C. To show test results for each model.
D. To delete compiled artifacts and reset the project.

Correct Answer: A. To list all models, sources, and snapshots in the project. 🎉
Explanation:
The dbt ls command is used to list objects in your DBT project. This includes models, sources, seeds, snapshots, and tests. It is especially helpful for exploring your project structure and identifying specific objects.

What Does dbt ls Do?
Displays a list of all resources in the project, such as:
* Models (tables, views, incremental, ephemeral)
* Sources
* Seeds
* Snapshots
* Tests

You can apply filters to narrow the list to specific resources.

Key Takeaway:
* Use dbt ls to list and explore resources in your DBT project. Combine it with filters for precise outputs.
* Quickly identify which models, seeds, or snapshots exist in your project.
* Generate lists of resources for further automation or scripting.
* Debugging: Verify that certain models or tests are included in the project.

What does the dbt seed command do?

A. Loads CSV files into the database as tables.
B. Runs SQL tests on model outputs.
C. Creates ephemeral models in the database.
D. Compiles models into raw SQL without running them.

Correct Answer: A. Loads CSV files into the database as tables. 🎉

Explanation:
The dbt seed command is used to load static data from CSV files into your database as tables. This is particularly useful for small, static datasets like lookup tables or configuration values.

Why Use dbt seed?
* Static Data: Great for reference or lookup tables that don’t frequently change.
* Version Controlled: Since seeds are plain CSV files, they can be tracked in version control (Git).
* Reusability: You can reference seed tables in your models using ref():

sql
Copy code
SELECT *
FROM {{ ref(‘customers’) }}

Key Takeaway:
Use dbt seed to load CSV files as tables into your database for static or reference data.

What is the primary use of the dbt source command?

A. To refresh data in the source tables.
B. To test the freshness and availability of source data.
C. To create incremental models based on source tables.
D. To compile and validate all source definitions.

The correct answer is B. To test the freshness and availability of source data.

Explanation:
The dbt source command is primarily used to check the freshness and availability of source tables in your database. It allows you to ensure that your raw data sources are up-to-date and available for downstream transformations.

Why the Other Options Are Incorrect:
A: “To refresh data in the source tables” — DBT does not modify or refresh raw source data; it only queries and tests it.
C: “To create incremental models based on source tables” — Incremental models are defined with materialized=’incremental’ and use ref() or source().
D: “To compile and validate all source definitions” — This is not the primary purpose of dbt source freshness. Compiling source definitions is part of dbt compile.

Key Takeaway:
Use dbt source freshness to monitor the freshness and availability of your source data, ensuring reliable upstream data for transformations.

Category: Materializations

What is the default materialization in DBT when no configuration is specified?

A. Table
B. View
C. Incremental
D. Ephemeral

Correct Answer: B. View 🎉

Explanation:
In DBT, when no materialization is specified, the default is view.

A view is a logical object in the database. It does not store data but instead runs a query every time it is accessed.

Why View is the Default:
* Lightweight: Views don’t take up physical storage.
* Quick Development: Faster to create and modify during development.
* Flexibility: You can later change the materialization to table, incremental, or ephemeral based on requirements.

Key Takeaway:
view is the default materialization. Explicitly specify other materializations (e.g., table or incremental) when necessary.

Category: Sources, Tests, Seeds, and Snapshots

What does the dbt seed command do?

A. Loads CSV files into the database as tables.
B. Creates views for raw source tables.
C. Tests data models for null and unique constraints.
D. Captures changes in source data over time.

Correct Answer: A. Loads CSV files into the database as tables. 🎉

Explanation:
The dbt seed command loads CSV files located in the data/ directory of a DBT project into your target database as tables. These tables are often used for static data, such as lookup values or configuration datasets.

Why the Other Options Are Incorrect:
B: “Creates views for raw source tables” — This describes sources with source() references, not seeds.
C: “Tests data models for null and unique constraints” — This describes dbt test.
D: “Captures changes in source data over time” — This is done by snapshots using dbt snapshot.

Key Takeaway:
Use dbt seed to load small, static datasets (CSV files) as tables in your database

Category: Jinja and Macros

What is the purpose of the ref() function in DBT?

A. To create a snapshot of the source table.
B. To reference models and enforce execution order.
C. To load static data files as tables.
D. To create temporary views for incremental models.

Correct Answer: B. To reference models and enforce execution order. 🎉

Explanation:
The ref() function in DBT is used to reference other models in your project. It allows DBT to:

Enforce Execution Order: DBT ensures that all upstream dependencies are built first before running the current model.
Enable Dynamic Schema Resolution: ref() dynamically resolves the table/view name and schema based on the target environment (e.g., dev, prod).
Build the Directed Acyclic Graph (DAG): DBT uses ref() to determine the relationships between models and optimize execution.

Key Benefits of ref():
* Environment Flexibility: ref() ensures models work across multiple environments (e.g., dev, staging, prod).
* Dependency Management: DBT automatically orders model execution based on ref() relationships.
* Readability: Improves SQL maintainability and readability by avoiding hardcoded table names.

Use the ref() function to create model dependencies, dynamically resolve table names, and ensure execution order in your DBT project.

# **Category**: Incremental Logic In an incremental model, what does the unique_key configuration do? A. Ensures rows with the same key are updated instead of being duplicated. B. Adds a primary key constraint to the table. C. Deletes outdated rows before inserting new data. D. Prevents schema changes during incremental updates.

Correct Answer: A. Ensures rows with the same key are updated instead of being duplicated. 🎉 **Explanation**: In an incremental model in DBT, the unique_key configuration ensures that rows with the same key are updated rather than being duplicated in the target table. This behavior aligns with the merge strategy, where DBT performs an upsert: 1. If a row with the same unique_key already exists, it gets updated. 1. If no matching row exists, a new row is inserted. **Why Is unique_key Important?** Without a unique_key, DBT will append new rows, which may lead to duplicates. unique_key ensures clean and efficient incremental updates. **Key Takeaway**: Use the unique_key configuration in incremental models to avoid duplication and ensure rows are updated correctly. ## Footnote **Why the Other Options Are Incorrect**: B: "Adds a primary key constraint to the table" — DBT does not add database constraints; it just handles data logic. C: "Deletes outdated rows before inserting new data" — This describes full refresh behavior, not incremental updates. D: "Prevents schema changes during incremental updates" — Schema changes are controlled using on_schema_change, not unique_key.

# **Category**: DBT Project Setup Which file in a DBT project specifies database connection settings? A. dbt_project.yml B. profiles.yml C. schema.yml D. packages.yml

The correct answer is B. profiles.yml **Explanation**: The profiles.yml file is where database connection settings are configured for a DBT project. It defines; 1. credentials, 2. database schemas, 3. target configurations (e.g., development, staging, production environments). **How profiles.yml Works**: Located outside your DBT project folder (typically in ~/.dbt/), the profiles.yml file defines: * Database type: BigQuery, Snowflake, Redshift, etc. * Connection credentials: Host, username, password, API keys, etc. * Target schema: Specifies where DBT will build tables/views. **Key Takeaway**: * Use the profiles.yml file to configure database connection settings. * The dbt_project.yml file handles project-specific configurations. ## Footnote **Why the Other Options Are Incorrect**: **A: dbt_project.yml** --> Contains DBT project-level settings, like model configurations and versioning. Does not store database connection details. **C: schema.yml** --> Defines metadata like model documentation, column-level tests, and sources. **D: packages.yml** --> Specifies external DBT packages to include in your project (e.g., dbt_utils).

# Category: Materializations What materialization type in DBT is best suited for handling large tables with incremental updates? A. Table B. View C. Incremental D. Ephemeral

Correct Answer: C. Incremental 🎉 **Explanation**: The incremental materialization in DBT is designed for handling large tables by processing only new or updated rows instead of rebuilding the entire table. This improves performance for data pipelines with frequent updates. **Example** {{ config( materialized='incremental', unique_key='id' ) }} SELECT id, name, updated_at FROM source('raw_data', 'customers') {% if is_incremental() %} WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }}) {% endif %} ## Footnote **Key Takeaway**: Use incremental materialization for large tables with frequent updates to optimize performance and scalability.

What technological changes have contributed to the evolution of Extract-Transform-Load (ETL) to Extract-Load-Transform (ELT)? a. Modern cloud-based data warehouses with scalable storage and compute b. Off-the-shelf data pipeline/extraction tools (ex: Stitch, Fivetran) c. Self-service business intelligence tools increased the ability for stakeholders to access and analyze data d. All of the above

**Correct Answer**: d. All of the above **Explanation**: Each of the options has played a significant role in the shift from traditional ETL to modern ELT workflows: a. **Modern cloud-based data warehouses with scalable storage and compute**: These allow raw data to be loaded first and transformed later, leveraging the scalability and performance of tools like Snowflake, BigQuery, and Redshift. b. **Off-the-shelf data pipeline/extraction tools (e.g., Stitch, Fivetran)**: These tools simplify and automate the process of extracting and loading raw data into the data warehouse, aligning with the ELT paradigm. c. **Self-service business intelligence tools**: These tools empower stakeholders to interact with and analyze data, increasing the demand for flexible and iterative transformations within the warehouse.

Which one of the following is true about YAML files in dbt? a. YAML files are used for building models b. YAML files are used for writing singular tests c. YAML files are used for configuring generic tests d. YAML files are used for setting your development schema

**Answer**: **C**. YAML files are used for configuring generic tests YAML files define and configure generic tests, ensuring data quality and validation. **Explanation**: In dbt, YAML files are primarily used for configuration and documentation purposes, including setting up generic tests (e.g., unique, not_null) on models, columns, or sources. **Example of a Generic Test in a YAML File**: `yaml` ``` models: - name: customers columns: - name: customer_id tests: - unique - not_null ``` ## Footnote **Analysis of Options**: a. **YAML files are used for building models**: This is incorrect. Models are built using .sql files, not YAML files. b. **YAML files are used for writing singular tests**: This is incorrect. Singular tests are written in .sql files, not YAML files. c. **YAML files are used for configuring generic tests**: This is correct. `YAML` files configure generic tests, such as unique, not\_null, and relationships. d. **YAML files are used for setting your development schema**: This is incorrect. Development schema settings are typically configured in the dbt\_project.yml file, not in schema-specific YAML files.

dbt is part of which step of the Extract-Load-Transform (ELT) process? a. Extract b. Transform c. Load d. Model

**Correct Answer: B**. Transform **Explanation**: dbt focuses on the Transform step of the ELT process. It enables users to write SQL to transform raw data into clean, analytics-ready datasets directly within the data warehouse, leveraging its processing power. ## Footnote **Analysis of Options**: a. **Extract**: This is incorrect. The extraction of data from source systems is typically handled by tools like Fivetran or Stitch. b. **Transform**: This is correct. dbt specializes in transforming raw data after it has been loaded into the data warehouse. c. **Load**: This is incorrect. Data loading is typically handled by data pipeline tools, not dbt. d. **Model**: This is incorrect. While dbt uses models to define transformations, modeling is a technique within the Transform step, not a separate step.

Which one of the following is true about dbt in the context of the modern data stack? a. dbt connects directly to your data platform to model data b. dbt stores and processes data directly c. dbt is responsible for moving data back into applications d. dbt connects to multiple data sources to move data into your platform

**Correct Answer: A. dbt connects directly to your data platform to model data** **Explanation**: In the modern data stack, dbt connects to your data platform (e.g., Snowflake, BigQuery, Redshift) to transform and model data using SQL. It leverages the power of the data warehouse to process transformations rather than storing or moving data itself. ## Footnote **Analysis of Options**: a. **dbt connects directly to your data platform to model data**: This is correct. dbt transforms data already stored in your data platform by executing SQL queries directly against it. b. **dbt stores and processes data directly**: This is incorrect. dbt does not store data or act as a processing engine—it relies on the underlying data platform for storage and computation. c. **dbt is responsible for moving data back into applications**: This is incorrect. Moving data into applications is typically the role of reverse ETL tools, not dbt. d. **dbt connects to multiple data sources to move data into your platform**: This is incorrect. Extracting and loading data into a platform is done by ETL/ELT tools like Fivetran or Stitch, not dbt.