Data Modeling Flashcards
Consolidated Fact Tables
Combined Fact Tables that have a similar grain.
ex. Sales Actual & Sales Forecast
Star Schema
Normalized fact table surrounded by denormalized dimension tables
5 Way to handle Early Arriving Fact / Late Arriving Dimension
- Never process
- Send record to error log
- Park and retry later
- Insert dummy value
- Insert dimension and update fact later
SCD Type 0
Fixed, never changes
ex. Social Security Number, Hire Date
Slightly Ragged / Variable Depth Hierarchies
Doesn’t have a fixed number of levels, but range should be shallow
ex. Geography hierarchies range from 3 to 6 levels
Degenerate Dimensions
Dimensions that have no context except for it’s primary key
Placed in fact table, not it’s own dimension table
Ex. Invoice line item numbers
Ragged / Variable Depth Hierarchies
Unknown depth
Use bridge table for every possible path to join to proper hierarchy
SCD Type 5
- The type 5 technique builds on the type 4 mini-dimension by embedding a “current profile” mini-dimension key in the base dimension that’s overwritten as a type 1 attribute
- This approach, called type 5 because 4 + 1 equals 5, allows the currently-assigned mini-dimension attribute values to be accessed along with the base dimension’s others without linking through a fact table.
Customer Key (PK)
Customer ID (NK)
Customer Name
…
Currrent Profile Key (FK)
Current Profile Key (PK)
Current Age Band
Current Score
Current Income Level
Accumulating Snapshot Fact
Stores what happened over period of time
Ex. Sales Order that updates columns (ship date, received date) that updates columns as it progresses
Three Types of Fact Tables
- Transactions
- Periodic Snapshot
- Accumulating Snapshot
SCD Type 3
Add a previous column value to the dimension table
Slowly changing dimension type 3 changes add a new attribute in the dimension to preserve the old attribute value; the new value overwrites the main attribute as in a type 1 change. This kind of type 3 change is sometimes called an alternate reality. A business user can group and filter fact data by either the current value or alternate reality. Rarely used.
Multiple Hierarchies in Dimensions
Multiple grouping levels
Ex. Dates: year, month, day
geography: Country. region, city
category: subcategory, line item
Conceptual Data Model
What the system will contain
Typically created by business stakeholders
Entities, attributes and relationships
Logical Data Model
How the system is implemented
Done by architects and analysts
Date Dimensions
Static dimensions that are generally loaded into the system to be used to create variations of the date
Can also be boolean is_holiday, is_weekend
Outrigger Dimensions
Dimension table that is reference by another dimension table
Happens with SCD Type 4 (dimension -> mini dimension current)
Conformed Dimensions
Single dimension table used in multiple fact tables
ex. date dimension, product dimension, store dimension (address dim used for store location + where customer order shipped to)
Role-playing Dimensions
Dimensions that can mean different things
Ex. Date -> Ship Date, Order Date
Address -> Ship Address, Billing Address
Factless Fact Table
Record of dimensional entities coming together at a point in time
Ex. student attending class on a given day. Nothing to measure
Fixed Depth Positional Hierarchies
Many-to-one relationships
Ex. product -> category
SCD Type 7
With type 7, the fact table contains dual foreign keys for a given dimension: a surrogate key linked to the dimension table where type 2 attributes are tracked, plus the dimension’s durable supernatural key linked to the current row in the type 2 dimension to present current attribute values.
Type 7 delivers the same functionality as type 6, but it’s accomplished via dual keys instead of physically overwriting the current attributes with type 6. Like the other hybrid approaches, the current dimension attributes should be distinctively labeled to minimize confusion.
SCD Type 1
No history - overwrite value
SCD Type 6
Type 6 builds on the type 2 technique by also embedding current attributes in the dimension so that fact rows can be filtered or grouped by either the type 2 value in effect when the measurement occurred or the attribute’s current value.
i.e., add new columns in the same dimension table that track the ‘current values’
Product Key (PK)
Description
Historic Department name
Current Department Name
Effective Date
Expiration Date
Current Row Indicator (True/False)
SCD Type 2
Row versioning / history
Add 3 new columns - effective_date, end_date, is_current
Not suitable for rapid changing dimensions. Use Type 4
SCD Type 4
Add mini-dimension
The type 4 technique is used when a group of dimension attributes are split off into a separate mini-dimension. This approach is useful when dimension attribute values are relatively volatile. Frequently-used attributes in multi-million row dimension tables are also mini-dimension design candidates, even if they don’t change frequently. A surrogate key is assigned to each unique profile or combination of attribute values in the mini-dimension. The surrogate keys of both the base dimension and mini-dimension profile are captured as foreign keys in the fact table.
Date Key (FK)
Customer Key (FK) - Goes to main customer dim table with scd’s (name, address, date of birth)
Customer Profile Key (FK) - Goes to mini-dimension dim with rapidly changing dims (age band, score, income level)
How to deal with Nulls in Fact Tables
Nulls behave gracefully for aggregate functions
AVOID nulls in foreign keys to prevent referential integrity issues. Set an ‘Unknown’ value
Junk Dimension
Miscellaneous, low-cardinality flags and indicators
Instead of making separate dimension tables, create a single junk dimension combing them together. Will be the the permutations of all possible values that actually occur in source data
What is a Fact Table
Table that can be summed, averaged or manipulated
Durable Key
Surrogate key that does not change value in SCDs
Periodic Snapshot Fact
Summary of events at specific points in space/time/
Usually semi-additive or non-additive
Ex: customer bank account balance on given day
Dimension Surrogate Key
Incrementing integers used as primary keys
Steps to Create Star Schema
- Identify Business Process
- Identify the Grain
- Identity Dimensions
- Identify Facts
Type of Fact Table Measurements
- Additive: Sum across any dimensions
- Semi-additive: Sum across any EXCEPT time (ex. headcount)
- Non-additive: Cannot sum (ex. % off, unit price of item)
Natural/Business Key
Identifier used by the business (ex. employee number). Generally not used as primary keys as these could change (update employee numbers to new system)
Three Levels of Data Modeling
- Conceptual: Scope
- Logical: Table/columns
- Physical: Physical structure
Physical Data Model
How to implement. Done by DBA and developers
Galaxy Schema
AKA Fact Constellation Schema
Multiple fact tables connected with shared (conformed) dimensions
Conformed Fact
Fact used in more than one fact table
Used across multiple marts
Snowflake Schema
Can have normalized dimension tables
Transaction Fact Table
Facts of events at specific points in space and time
Shrunken Rollup Dimension
Used to develop aggregate fact tables
Ex. Ticket sold per venue. Main dim is by venue. Want to also see by city.
Shrunken dimensions are conformed dimensions that are a subset of rows and /or columns of a base dimension. Shrunken rollup dimensions are required when constructing aggregate fact tables. They are also necessary for business processes that naturally capture data at a higher level of granularity, such as a forecast by month and brand (instead of the more atomic date and product associated with sales data). Another case of conformed dimension subsetting occurs when two dimensions are at the same level of detail, but one represents only a subset of rows.
Centipede Fact Tables
Multiple foreign keys in a fact table for many-to-one hierarchy dimensions
Some designers create separate normalized dimensions for each level of a many-to- one hierarchy, such as a date dimension, month dimension, quarter dimension, and year dimension, and then include all these foreign keys in a fact table. This results in a centipede fact table with dozens of hierarchically related dimensions. Centipede fact tables should be avoided.
Centipede fact tables also result when designers embed numerous foreign keys to individual low-cardinality dimension tables rather than creating a junk dimension.
Dimension Tables
Wide, low-cardinality, denormalized tables