Data Warehouse Design - Chapter 1.5 & 1.6 Flashcards
How is data from a data warehouse normally represented?
With the multidimensional model
Why is the multi-dimensional model used as a paradigm of data warehouse representation?
- Ease of use and intuitiveness
- The widespread use of productivity tools, such as excel, that adopt the multidimensional model as visualization paradigm.
What are the five most important concepts of the multi-dimensional model?
- Facts
- Events
- Measures
- Dimension
- Attributes
What are facts in the multi-dimensional model?
Enterprise-specific factors that affect decision-making processes, such as sales, shipments, surgeries etc.
What are events in the multi-dimensional model?
Instances of a fact, such as every single sale.
What is a measure in the multi-dimensional model?
Quantitative descriptions of events that describe each fact. For example: sales receipts, amounts shipped, surgery time etc.
What are analysis dimensions in the multi-dimensional model?
The space axis of your model that define the different perspectives to single out events.
Say that you have the fact sales. Dimensions could be: products, stores and dates.
Why do we use a multi-dimensional model?
To easily select the events based on their dimensions by using the visualized model.
What is a cube?
The multi-dimensional model looks like a cube.
What is a hypercube?
A multi-dimensional model with more than 3 dimensions.
What does it mean when a cube is sparse?
Not every cube cell is filled in. It can happen that at a specifc date there is no product sold by that store.
How could you represent the sales cube with the relational model schema?
SALES (store, product, date, quanitity, receipts)
Underline store, product, date because they form the primary key.
How could you represent the dependency of the sales cube?
store, product, date -> quantity, receipts
What is a roll-up hierarchy?
Each dimension normally has a hierarchy of aggregation levels.
Product -> type -> category
^ these are dimensional attributes
Why do you need to reduce the quantity of the data and which two ways are there to do so?
Information in a multi-dimensional cube is too large to be analyzed without relying on automatic tools.
- Restriction
- Aggregation
What is meant by restricting data?
Seperating part of the data from a cube to mark out an analysis field.
Also named: making selections and/or projections
What are two ways of selection to restrict data (RESTRICTION)?
- Data slicing.
You set one or more dimensions to a specific value to call of events associated to that value - Dicing
Generalization of slicing - so you put in general constraints due to which you can only select a specific number of events.
What is meant by projection in relation to restricting data?
Making a choice to keep just one subgroup of measures for every event and reject other measures.
What is meta-data?
Data used to define other data.
How does meta-data play an important role in data warehousing?
- It specifies source, values, usage and features of data warehouse data.
- It defines how data can be changed and processed at every architecture layer.
What are the two categories to classify meta-data in?
- Internal meta-data
Useful for system administrators, defines sources, policies, constraints etc. - External meta-data
End-users. Definitions, quality standards, units of measure etc.
What are the 5 things a meta-data management tool should be able to do?
- Allow administrators to perform system administration operations and manage security.
- Allow end-users to navigate and query meta data
- Use a GUI
- Allow end-users to extend meta data
- Allow meta-data to be imported/exported from other tools and formats.
What is aggregation?
Adjusting the granularity of the dimension and thereby decreasing the amount of events associated.