Data Warehouses and Dimensional Modelling Flashcards
Main goal of DW and Dimensional Modelling
Simplicity.
Ensures that users can easily understand databases.
Allows software to efficiently navigate databases.
Dimensional Modelling (DM) Definition
A logical design technique for structuring data so that it’s intuitive to business users and delivers fast query performance.
Three primary design goals
- Present the required information to users as simply as possible (understandability).
- Return query results to the users as quickly as possible (query performance)
- Provide relevant information that accurately tracks the underlying business processes
Present the required information to users as simply as possible (understandability).
- Make everything as simple as possible, but not simpler. (Einstein)
- DMs are much easier for users to understand - information is grouped into coherent business categories.
- DMs typically contain exactly the same content as a normalized model, but with far fewer tables.
- Simplicity is relative - the model must reflect complex business processes.
Return query results to the users as quickly as possible (query performance)
- Relational environment:
- Help query performance because of denormalization.
- Pre-join hierarchies and lookup tables - fewer join paths and less immediate temporary tables.
- Predictable framework allows the DBMS to make strong assumptions which aid in performance. - OLAP environment:
- Engine designed to support DMs
- Performance due to aggregation within and across dimensions.
Provide relevant information that accurately tracks the underlying business processes
Requires full range of design patterns to create models that accurately capture and track the business.
Benefits of Dimensional Modelling
- Understandability
- Query performance
- Each dimension is an equivalent entry point into the fact table. Therefore, query performance is simple, predictable and controllable.
- Graceful extensibility to accommodate new data. No query or BI application needs to be reprogrammed to accommodate changes.
Star Join
Each business process can be represented by a DM that consists of a normalized fact table surrounded by denormalized dimension tables.
Fact table
- Highly normalized, storing little redundant data.
- Contains the measurements associated with a specific business process.
- Record in fact table = measurement of a business event.
- Primary key is a multi-part/composite key made up of a subset of foreign keys from business process dimensions.
- Fact tables always express many to many relationships between dimensions.
- Every foreign key must match a unique primary key in the corresponding dimension. - Some business processes track events that do not contain facts. Then a factless fact table is created. Add a dummy counter variable with value 1.
Facts
Numeric values (typically continuous) that quantify the magnitude of the event.
Three types of facts
- Numeric and additive
- Can meaningfully be aggregated across dimensions.
- Most useful facts. - Semi-additive
- Can be meaningfully aggregated across some dimensions.
- Can typically not be aggregated across time dimensions. - Non-additive
- Cannot be meaningfully aggregated across any dimensions.
- Stored in dimension tables
When do facts conform?
If their definitions are the same in different fact tables
What is the grain of a fact table?
The level of detail contained in a fact table.
Lowest level of detail that is possible
Atomic level.
Complete flexibility to roll up to any summary level.
Three fundamental grains
- Transaction fact table
- Track each transaction as it occurs at a point in time
- One record per transaction/line of transaction - Periodic fact table
- Captures cumulative performance over specific time intervals.
- One record per group of transactions made over a period of time.
- Complements detailed transaction facts. - Accumulating snapshot fact table
- Constantly updated over time
- One record for the entire lifetime of an event.
- Used to combine data across several business processes in a value chain.