Data Modeling Flashcards
Consolidated Fact Tables
Combined Fact Tables that have a similar grain.
ex. Sales Actual & Sales Forecast
Star Schema
Normalized fact table surrounded by denormalized dimension tables
5 Way to handle Early Arriving Fact / Late Arriving Dimension
- Never process
- Send record to error log
- Park and retry later
- Insert dummy value
- Insert dimension and update fact later
SCD Type 0
Fixed, never changes
ex. Social Security Number, Hire Date
Slightly Ragged / Variable Depth Hierarchies
Doesn’t have a fixed number of levels, but range should be shallow
ex. Geography hierarchies range from 3 to 6 levels
Degenerate Dimensions
Dimensions that have no context except for it’s primary key
Placed in fact table, not it’s own dimension table
Ex. Invoice line item numbers
Ragged / Variable Depth Hierarchies
Unknown depth
Use bridge table for every possible path to join to proper hierarchy
SCD Type 5
- The type 5 technique builds on the type 4 mini-dimension by embedding a “current profile” mini-dimension key in the base dimension that’s overwritten as a type 1 attribute
- This approach, called type 5 because 4 + 1 equals 5, allows the currently-assigned mini-dimension attribute values to be accessed along with the base dimension’s others without linking through a fact table.
Customer Key (PK)
Customer ID (NK)
Customer Name
…
Currrent Profile Key (FK)
Current Profile Key (PK)
Current Age Band
Current Score
Current Income Level
Accumulating Snapshot Fact
Stores what happened over period of time
Ex. Sales Order that updates columns (ship date, received date) that updates columns as it progresses
Three Types of Fact Tables
- Transactions
- Periodic Snapshot
- Accumulating Snapshot
SCD Type 3
Add a previous column value to the dimension table
Slowly changing dimension type 3 changes add a new attribute in the dimension to preserve the old attribute value; the new value overwrites the main attribute as in a type 1 change. This kind of type 3 change is sometimes called an alternate reality. A business user can group and filter fact data by either the current value or alternate reality. Rarely used.
Multiple Hierarchies in Dimensions
Multiple grouping levels
Ex. Dates: year, month, day
geography: Country. region, city
category: subcategory, line item
Conceptual Data Model
What the system will contain
Typically created by business stakeholders
Entities, attributes and relationships
Logical Data Model
How the system is implemented
Done by architects and analysts
Date Dimensions
Static dimensions that are generally loaded into the system to be used to create variations of the date
Can also be boolean is_holiday, is_weekend
Outrigger Dimensions
Dimension table that is reference by another dimension table
Happens with SCD Type 4 (dimension -> mini dimension current)
Conformed Dimensions
Single dimension table used in multiple fact tables
ex. date dimension, product dimension, store dimension (address dim used for store location + where customer order shipped to)
Role-playing Dimensions
Dimensions that can mean different things
Ex. Date -> Ship Date, Order Date
Address -> Ship Address, Billing Address
Factless Fact Table
Record of dimensional entities coming together at a point in time
Ex. student attending class on a given day. Nothing to measure
Fixed Depth Positional Hierarchies
Many-to-one relationships
Ex. product -> category
SCD Type 7
With type 7, the fact table contains dual foreign keys for a given dimension: a surrogate key linked to the dimension table where type 2 attributes are tracked, plus the dimension’s durable supernatural key linked to the current row in the type 2 dimension to present current attribute values.
Type 7 delivers the same functionality as type 6, but it’s accomplished via dual keys instead of physically overwriting the current attributes with type 6. Like the other hybrid approaches, the current dimension attributes should be distinctively labeled to minimize confusion.
SCD Type 1
No history - overwrite value
SCD Type 6
Type 6 builds on the type 2 technique by also embedding current attributes in the dimension so that fact rows can be filtered or grouped by either the type 2 value in effect when the measurement occurred or the attribute’s current value.
i.e., add new columns in the same dimension table that track the ‘current values’
Product Key (PK)
Description
Historic Department name
Current Department Name
Effective Date
Expiration Date
Current Row Indicator (True/False)
SCD Type 2
Row versioning / history
Add 3 new columns - effective_date, end_date, is_current
Not suitable for rapid changing dimensions. Use Type 4