Real-World Scenarios for Pipeline Reliability and Deployment Flashcards
Advanced Scenario
You are using an incremental model to update a large table daily. A new column, customer_status, was added to the upstream table, but your incremental model failed with a schema mismatch error.
Which of the following steps will resolve the issue without dropping the target table?
A. Add on_schema_change='sync_all_columns' to the model configuration.
B. Drop the target table and rerun the incremental model.
C. Use the dbt deps command to refresh package dependencies.
D. Run dbt test to ensure data integrity before rerunning the model.
Correct Answer: A. Add on_schema_change='sync_all_columns' to the model configuration.
Explanation:
When using incremental models in dbt, schema mismatches can occur when columns are added to or removed from the upstream source. The on_schema_change configuration tells dbt how to handle these changes on incremental runs, so you do not have to drop or fully rebuild the target table.
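A minimal sketch of how the setting could look in an incremental model file. The model name, the stg_orders reference, and every column other than customer_status are hypothetical placeholders, not taken from the scenario:

```sql
-- models/orders_incremental.sql (hypothetical file name)
{{
    config(
        materialized='incremental',
        unique_key='order_id',
        on_schema_change='sync_all_columns'
    )
}}

select
    order_id,
    customer_id,
    customer_status,  -- the column newly added upstream
    updated_at
from {{ ref('stg_orders') }}

{% if is_incremental() %}
-- on incremental runs, only pick up rows newer than what the target already holds
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

With this configuration in place, a regular dbt run of the model should adjust the target table's columns rather than fail on the mismatch.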
Available Options for on_schema_change
ignore (default):
- dbt ignores schema changes and continues without modifying the target table schema.
- This can result in missing or misaligned columns in the target table.
fail:
- dbt raises an error when the source and target schemas diverge.
append_new_columns:
- dbt adds new columns to the target table but does not remove existing ones.
- This is ideal when upstream changes are additive.
sync_all_columns (correct for this scenario):
- dbt synchronizes the target table schema with the upstream changes:
  - Adds new columns.
  - Removes dropped columns.
- Ensures full schema alignment between source and target.
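As a rough illustration of what this synchronization amounts to, the schema changes resemble the DDL below. The table name, the column type, and the legacy_flag column are hypothetical, and the exact statements dbt issues depend on the adapter:

```sql
-- Hypothetical target table and column type, for illustration only.
-- A column added upstream (customer_status) is added to the target table:
alter table analytics.orders_incremental add column customer_status varchar;

-- A column that no longer exists in the model (legacy_flag, hypothetical) is dropped:
alter table analytics.orders_incremental drop column legacy_flag;
```

Note that none of the on_schema_change options backfill the new column for rows already in the target table. Existing rows hold NULL in customer_status until you run a manual update or rebuild the model with --full-refresh.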