Dimensional Modelling Flashcards
What is Dimensional modelling (DML)
A logical design technique to present data in standard intuitive form for high-performance access.
Forms a star schema
What are keys ?
One table with COMPOSITE PRIMARY KEYS called FACT TABLE and set of smaller tables called DIMENSION TABLE.
What is primary and surrogate keys?
PRIMARY:
each dimension has one of these that corresponds to one component from composite key in the fact table.
SURROGATE:
all dimension and fact tables are done using surrogate keys.
Warehouse can independently decide how to store data.
What is a fact table ?
Facts are events generated by events occuring in the past. Contains most of the data and can be read-only.
Usually they are numerical or additive.
What is a dimension table ?
Contain descriptive textual info
Are used as constraints in queries
They are useful by denormalizing reference info into single dimension table.
What is a snowflake schema ?
A variant of star that has fact table in center and has normalized dimension table.
What is a starflake schema?
Hybrid structure that has a mix of star and snow dimension tables.
What are the advantages of DML?
Efficiency.
All dimensions are equivalent to access the fact table.
Adding new facts, dimensions, dimensional attributes
Predictable query processing.
Difference b/w ER and DML
ER:
Single model decomposes to multiple DML
Used for identifying relations among entities to remove redundancy
DML:
Multiple DM’s are associated through shared dimension tables
Attraction is high performance of ad-hoc user queries
What is DML stage of Kimball’s Business Dimensional Lifecycle
This stage can result in creation of DML for data mart or dimensionalize a schema for OLTP.
Allows easy integration with other related data marts to form an enterprise wide DW
A DML containing more than 1 fact table sharing more than 1 conformed dim tables - FACT CONSTELLATION.
Starts by defining high-level DML which gains detail and is a two phased approach
1. Creation of high-level DM - 4 steps
2. Adding details to model through identification of attributes.
Describe all the phases in Kimball’s
Phase 1 step 1 SELECT BUSINESS PROCESS: Process refers to SUBJECT MATTER of particular data mart.
First data mart built:
be delivered on time
within budget
answer commercially imp questions
Phase 1 step 2 DECLARE GRAIN: Decide what each record of the fact table will represent. Grain decision for fact table determines grain of each dimension table.
Includes TIME AS CORE DIMENSION, ALWAYS present in star schemas.
Phase 1 step 3 CHOOSE DIMENSIONS: set the context for asking questions about facts on fact table. if it occurs in two data marts they must be exactly he same or one is a subset of the other. And this type of dimension is called CONFORMED DIMENSION.
Phase 1 step 4 IDENTIFY FACTS: The garin determines which facts can be used in data mart.
Phase 2: IDENTIFY ALL DIMENSION ATTRIBUTES FOR DM:
add attributes to dim tables
Identified by business requirements
text description added to tables for better understanding
what are the Issues while developing DML?
Choosing a duration of database:
How far back DB goes
Some want a year or two back info for comparison
Other legally require docs dating 5 years back or more
Old ones there is problem with reading and interpretation
What is slowly changing dimension problem ?
Proper description of old client and old branch mist be used with old transaction history. DW must assign key to these dimensions to distinguish clients and branches
3 Types:
1. Changed dim attribute overwritten
2. Changed dim leads to new dim record to be created.
3. old changed to new attribute so new values are simultaneously available.