Data Warehouse Toolkit Flashcards
Why should reporting not be done from an Operational Source System?
The collection of data happens on a row-by-row basis. Reporting for the sake of making business decisions happens at an aggregated level. These are two different worlds.
Name the 6 goals of a Data Warehouse.
- Data should be easily accessible.
- The data should be consistent and have a high level of quality.
- The DW must be resilient to change.
- The data must meet the security needs of the business.
- The DW must improve business decisions.
- The DW must be accepted and used by Business Users.
What are the 4 components of a data warehouse
- Operational Source Systems
- Data Staging Area
- Data Presentation Area
- Data Access Tools
What is an Operational Source System?
It is where business transaction data is collected.
What happens in the Data Staging Area?
ETL.
What are Data Marts?
‘A wedge of the overall presentation area pie.’ A single data mart represents the relevant data for a specific business process.
Why should Data Marts use a Dimensional Model instead of 3rd-normal-form?
3rd-normal-form may be superior for operational processing, but it results in complicated schemas and is therefore confusing and cumbersome to query.
Why is it important to give access to data more granular than the relevant summary/aggregated data?
Even if Business Users will rarely be concerned with the row-level detail, providing the most granular possible data will allow flexibility in what questions Business Users can ask.
Where do numerical data generally get stored?
In the Fact Tables.
Where do descriptor data generally get stored?
In Dimension Tables.
What is the difference between a Star Schema and a snowflake Schema? Which is superior and why?
In a Star Schema there is one fact table surrounded by only one level of Dimension Tables. In a snowflake Schema, your dimension tables may be further normalized into multiple levels.
Star schema is preferred in a dimension model. While snowflake can be efficient storage-wise, it is harder to query in the same way that 3NF is.
Marketing and Sales are both concerned with lead generation metrics. Should I give each of them a lead generation data mart?
No. Data Marts should be process-centric, not department centric. This simplifies the data warehousing system.
What are the 4 Steps in the Dimensional Design Process?
- Select the business process to model
- Declare the grain of the business process.
- Choose the dimensions.
- Identify the numeric facts.
How can you choose which business model to process when there may be many?
Choose the business model that represents the most pressing business question.
How can you choose the grain?
The most atomic (most granular) level is almost always the preferred level of granularity.