Data Warehousing Flashcards

1
Q

What is the process of removing all redundant information for a schema

A

Normalisation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What are the 4 steps of the dimensional design process?

A

1) Select the business process
2) Declare the grain in business terms
3) Identify the dimensions
4) Identify the facts

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is Grain?

A

What a single fact table row represent

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is a dimension?

A

Provides context for the Fact measures.

Context being What, Where, When, Why, Who and How.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is a Fact?

A

Measurements that result from a business process event.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is a degenerate dimension?

A

TBC

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What are the 3 types of fact measure

A

1) Additive - Summed across all dimensions
2) Semi Additive - Summed across some dimensions (e.g. Not Date)
3) Non Additive - Cannot be summed

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Give an example of a Factless Fact

A

A fact table recording calendar day, student, teacher, class and location.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What are the 3 main types of fact table

A

1) Transnational
2) Snapshot
3) Accumulating snapshot

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is a consolidated fact table?

A

Where 2 or more linked business processes are combined in a single fact table. e.g sales and forecasts

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is the name of the type of key is used to join dimensions and facts?

A

Surrogate Key or Primary Key / Foreign Key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is the name of the type of key which is used in operational systems to uniquely identify resources

A

Natural Key / Business Key

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is a degenerate dimension and how does it occur?

A

This is a dimension key in a fact table that has no associated dimension. This could happen where all the information from an invoice is expressed in the rows and dimensions of a fact leaving the invoice number which is still useful for analysis.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What name is given to a dimension which is joined to multiple times on a single fact table

A

Role-playing Dimension e.g. Date

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What are conformed dimensions?

A

Where the same dimensions apply to multiple facts which allows data to be easily combined.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What is a shrunken dimension?

A

One which works on a subset of rows or attributes. This could be a dimension which is associated with an aggregated fact table

17
Q

What is value chain?

A

Natural flow of a businesses primary business e.g. Quote,Application,Completion

18
Q

What is a Data Warehouse Bus Matrix?

A

A matrix of rows representing business processes and columns representing the conformed dimensions which relates to these business processes.

19
Q

What is a Implementation Bus Matrix?

A

A matrix of rows representing the data warehouse facts (Within each business process) and columns representing the conformed dimensions which relates to these facts

20
Q

What is a Opportunity / Stakeholder Matrix?

A

A matrix of rows representing business processes and columns representing the business groups / departments interested in these business processes

21
Q

What are the 7 slowly changing dimension attribute definitions?

A
Type 0: Retain original
Type 1: Overwrite
Type 2: Add new row
Type 3: Add new attribute (alt Reality)
Type 4: Add mini dimension (rapid change)
Type 5: Add mini dimension + Type 1 (outrigger)
Type 6: Add Type 1 and Type 2 
Type 7: Duel Type 1 and Type 2
22
Q

What would you do if you have a fact table where a dimension can be multi-valued e.g. Hospital diagnosis?

A

Use a bridging table which has a row for each diagnosis

23
Q

What is behavioral tag time series?

A

a customer is assigned a tag on a periodic basis based on behavioral traits. The current value is stored as well as a string showing how the tags have changed over time

24
Q

How should you store textual comments in a data warehouse

A

Not on the fact table but using a text comment dimension

25
Q

What is a step dimension?

A

In a sequential process such as a web session each event is often stored in a fact table. The step dimension associated with the fact would identify what steps the row represents and how many steps occurred in session

26
Q

What is a causal dimension?

A

One which describes the indirect conditions which existed when the fact event occurred e.g. promotion

27
Q

What type of dimension describes the indirect conditions which existed when the fact event occurred

A

causal dimension (e.g. Promotion)

28
Q

In a fact table that contains rows representing the stages of an event what type of dimension is used to store the stage number?

A

What is a step dimension

29
Q

What is the name given to periodic behavioral data codes stored in a string in the data warehouse?

A

Behavioral tag time series

30
Q

What is the name given to the natural flow of a businesses primary business?

A

Value chain

31
Q

What are the 3 types of bus matrices used in data warehousing?

A

1) Implementation Bus Matrix
2) Data Warehouse Bus Matrix
3) Opportunity / Stakeholder Matrix

32
Q

What 4 questions could you ask when you are deciding whether to combine facts into a single fact table?

A

1) What are users analytic requirements?
2) Are there really multiple unique business processes
3) Are there multiple source systems capturing metrics at different granuality
4) What are the dimensionality of the facts

33
Q

What are the three popular attributes used in behavioral tag time series

A

1) Recency - How many days since last purchase
2) Frequency - How many times has customer ordered or visted
3) Intensity / monetary - How much have they spend (e.g this year)