Data Warehouse Design - Chapter 1.5 & 1.6 Flashcards
How is data from a data warehouse normally represented?
With the multidimensional model
Why is the multi-dimensional model used as a paradigm of data warehouse representation?
- Ease of use and intuitiveness
- The widespread use of productivity tools, such as excel, that adopt the multidimensional model as visualization paradigm.
What are the five most important concepts of the multi-dimensional model?
- Facts
- Events
- Measures
- Dimension
- Attributes
What are facts in the multi-dimensional model?
Enterprise-specific factors that affect decision-making processes, such as sales, shipments, surgeries etc.
What are events in the multi-dimensional model?
Instances of a fact, such as every single sale.
What is a measure in the multi-dimensional model?
Quantitative descriptions of events that describe each fact. For example: sales receipts, amounts shipped, surgery time etc.
What are analysis dimensions in the multi-dimensional model?
The space axis of your model that define the different perspectives to single out events.
Say that you have the fact sales. Dimensions could be: products, stores and dates.
Why do we use a multi-dimensional model?
To easily select the events based on their dimensions by using the visualized model.
What is a cube?
The multi-dimensional model looks like a cube.
What is a hypercube?
A multi-dimensional model with more than 3 dimensions.
What does it mean when a cube is sparse?
Not every cube cell is filled in. It can happen that at a specifc date there is no product sold by that store.
How could you represent the sales cube with the relational model schema?
SALES (store, product, date, quanitity, receipts)
Underline store, product, date because they form the primary key.
How could you represent the dependency of the sales cube?
store, product, date -> quantity, receipts
What is a roll-up hierarchy?
Each dimension normally has a hierarchy of aggregation levels.
Product -> type -> category
^ these are dimensional attributes
Why do you need to reduce the quantity of the data and which two ways are there to do so?
Information in a multi-dimensional cube is too large to be analyzed without relying on automatic tools.
- Restriction
- Aggregation