Data Warehouse Toolkit Flashcards

1
Q

Why should reporting not be done from an Operational Source System?

A

The collection of data happens on a row-by-row basis. Reporting for the sake of making business decisions happens at an aggregated level. These are two different worlds.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Name the 6 goals of a Data Warehouse.

A
  • Data should be easily accessible.
  • The data should be consistent and have a high level of quality.
  • The DW must be resilient to change.
  • The data must meet the security needs of the business.
  • The DW must improve business decisions.
  • The DW must be accepted and used by Business Users.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What are the 4 components of a data warehouse

A
  • Operational Source Systems
  • Data Staging Area
  • Data Presentation Area
  • Data Access Tools
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is an Operational Source System?

A

It is where business transaction data is collected.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What happens in the Data Staging Area?

A

ETL.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What are Data Marts?

A

‘A wedge of the overall presentation area pie.’ A single data mart represents the relevant data for a specific business process.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Why should Data Marts use a Dimensional Model instead of 3rd-normal-form?

A

3rd-normal-form may be superior for operational processing, but it results in complicated schemas and is therefore confusing and cumbersome to query.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Why is it important to give access to data more granular than the relevant summary/aggregated data?

A

Even if Business Users will rarely be concerned with the row-level detail, providing the most granular possible data will allow flexibility in what questions Business Users can ask.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Where do numerical data generally get stored?

A

In the Fact Tables.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Where do descriptor data generally get stored?

A

In Dimension Tables.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is the difference between a Star Schema and a snowflake Schema? Which is superior and why?

A

In a Star Schema there is one fact table surrounded by only one level of Dimension Tables. In a snowflake Schema, your dimension tables may be further normalized into multiple levels.
Star schema is preferred in a dimension model. While snowflake can be efficient storage-wise, it is harder to query in the same way that 3NF is.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Marketing and Sales are both concerned with lead generation metrics. Should I give each of them a lead generation data mart?

A

No. Data Marts should be process-centric, not department centric. This simplifies the data warehousing system.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What are the 4 Steps in the Dimensional Design Process?

A
  • Select the business process to model
  • Declare the grain of the business process.
  • Choose the dimensions.
  • Identify the numeric facts.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

How can you choose which business model to process when there may be many?

A

Choose the business model that represents the most pressing business question.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

How can you choose the grain?

A

The most atomic (most granular) level is almost always the preferred level of granularity.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

How many dimensions should I have

A

You generally should prefer 15 or less. If you have more than 25, consider combining some of the dimensions.

17
Q

How should we model a hierarchy?

A

Combine it into one dimension. Avoid snowflaking by having the first level of the hierarchy point at a higher level’s dimension table. Also avoid centipeding by having a dimension table for every level of the hierarchy.

18
Q

Why should I avoid putting profit ratio in my fact table?

A

Because it is an issue it is not additive. Adding together the gross margin for a set of rows will give the incorrect result because of the incorrect order of operations.

19
Q

Why should we prefer values such as ‘Holiday’/’Nonholiday’ instead of ‘Y’/’N’?

A

The dimension values should be as descriptive as possible for ease of use. On a report, it can be hard to understand what ‘Y’ or ‘N’ means why “Nonholiday’ is quite clear.

20
Q

How should we model time of day?

A

In a separate dimension. The Fact table will have another column that points at this time of day dimension.

21
Q

Should sqft be modeled as a dimension or a numeric fact?

A

Generally this will be used to categorize so it should be a dimension. If we know that the business users find doing calculations on sq footage important, then we might decide to include it as a numeric fact.

22
Q

What is a causal dimension?

A

A dimension that is believed to possibly change the numeric facts of the modeled business process.

23
Q

If there is a dimension that doesn’t apply to all of the rows in the fact table, what should we do?

A

We need to avoid NULLs, so we need to add a value to this dimension called “Not applicable’ or something similar.

24
Q

What is a degenerate dimension?

A

A dimension which is important for the analysis but for which the table would only be on column wide because all relevant fields are modeled in other dimensions. These are common when there is a parent level of granularity which is identified by one of our fields.

25
Q

How can I add a dimension to an already-modeled business process.

A

Simply add a dimension table and add a column in the fact table. To avoid NULLs we will need to have values such as ‘Before ‘ which we can point the old fact tables at.

26
Q

What is centipeding?

A

Adding way too many dimensions to a star schema.

27
Q

What are surrogate keys?

A

Keys for joining which are not based on codes or keys from the business process.

28
Q

What are 4 benefits to surrogate keys?

A
  • It prevents changes in the business process’s codes from affecting our data warehouse.
  • It allows flexibility for catch values such as ‘Not available’.
  • It prevents people from trying to bypass the joining to dimension tables by parsing the key.
  • It is more efficient storage-wise.