Data Warehousing Flashcards
What is the process of removing all redundant information for a schema
Normalisation
What are the 4 steps of the dimensional design process?
1) Select the business process
2) Declare the grain in business terms
3) Identify the dimensions
4) Identify the facts
What is Grain?
What a single fact table row represent
What is a dimension?
Provides context for the Fact measures.
Context being What, Where, When, Why, Who and How.
What is a Fact?
Measurements that result from a business process event.
What is a degenerate dimension?
TBC
What are the 3 types of fact measure
1) Additive - Summed across all dimensions
2) Semi Additive - Summed across some dimensions (e.g. Not Date)
3) Non Additive - Cannot be summed
Give an example of a Factless Fact
A fact table recording calendar day, student, teacher, class and location.
What are the 3 main types of fact table
1) Transnational
2) Snapshot
3) Accumulating snapshot
What is a consolidated fact table?
Where 2 or more linked business processes are combined in a single fact table. e.g sales and forecasts
What is the name of the type of key is used to join dimensions and facts?
Surrogate Key or Primary Key / Foreign Key
What is the name of the type of key which is used in operational systems to uniquely identify resources
Natural Key / Business Key
What is a degenerate dimension and how does it occur?
This is a dimension key in a fact table that has no associated dimension. This could happen where all the information from an invoice is expressed in the rows and dimensions of a fact leaving the invoice number which is still useful for analysis.
What name is given to a dimension which is joined to multiple times on a single fact table
Role-playing Dimension e.g. Date
What are conformed dimensions?
Where the same dimensions apply to multiple facts which allows data to be easily combined.
What is a shrunken dimension?
One which works on a subset of rows or attributes. This could be a dimension which is associated with an aggregated fact table
What is value chain?
Natural flow of a businesses primary business e.g. Quote,Application,Completion
What is a Data Warehouse Bus Matrix?
A matrix of rows representing business processes and columns representing the conformed dimensions which relates to these business processes.
What is a Implementation Bus Matrix?
A matrix of rows representing the data warehouse facts (Within each business process) and columns representing the conformed dimensions which relates to these facts
What is a Opportunity / Stakeholder Matrix?
A matrix of rows representing business processes and columns representing the business groups / departments interested in these business processes
What are the 7 slowly changing dimension attribute definitions?
Type 0: Retain original Type 1: Overwrite Type 2: Add new row Type 3: Add new attribute (alt Reality) Type 4: Add mini dimension (rapid change) Type 5: Add mini dimension + Type 1 (outrigger) Type 6: Add Type 1 and Type 2 Type 7: Duel Type 1 and Type 2
What would you do if you have a fact table where a dimension can be multi-valued e.g. Hospital diagnosis?
Use a bridging table which has a row for each diagnosis
What is behavioral tag time series?
a customer is assigned a tag on a periodic basis based on behavioral traits. The current value is stored as well as a string showing how the tags have changed over time
How should you store textual comments in a data warehouse
Not on the fact table but using a text comment dimension
What is a step dimension?
In a sequential process such as a web session each event is often stored in a fact table. The step dimension associated with the fact would identify what steps the row represents and how many steps occurred in session
What is a causal dimension?
One which describes the indirect conditions which existed when the fact event occurred e.g. promotion
What type of dimension describes the indirect conditions which existed when the fact event occurred
causal dimension (e.g. Promotion)
In a fact table that contains rows representing the stages of an event what type of dimension is used to store the stage number?
What is a step dimension
What is the name given to periodic behavioral data codes stored in a string in the data warehouse?
Behavioral tag time series
What is the name given to the natural flow of a businesses primary business?
Value chain
What are the 3 types of bus matrices used in data warehousing?
1) Implementation Bus Matrix
2) Data Warehouse Bus Matrix
3) Opportunity / Stakeholder Matrix
What 4 questions could you ask when you are deciding whether to combine facts into a single fact table?
1) What are users analytic requirements?
2) Are there really multiple unique business processes
3) Are there multiple source systems capturing metrics at different granuality
4) What are the dimensionality of the facts
What are the three popular attributes used in behavioral tag time series
1) Recency - How many days since last purchase
2) Frequency - How many times has customer ordered or visted
3) Intensity / monetary - How much have they spend (e.g this year)