Article - Multidimensional Database Technology Flashcards
When is a multidimensional data model better suitable than a relational data model?
- When the objective is to analyze data rather than to perform online transactions.
- Key factor in the interactive analysis of large amounts of data for decision-making purposes.
How is data categorized in multidimensional models?
- As facts with associated numerical measures
Fact: a purchase | Measure: purchase amount/price - As textual dimensions that characterize the facts
Fact: A puchase | Textual dimensions: Purchase time & location
What three application areas are there for multidimensional data models?
*Means: In which processes are they used?
- Data warehouses
Large repositories that integrate data from several sources in an enterprise for analysis - Online analytical processing (OLAP)
Finding overall trends by aggregating data search queries - Data mining applications
To discover knowledge by searching for unknown patterns and relationships in multidimensional databases
What is a pivot table?
A two-dimensional spreadsheet with associated subtotals and totals that supports viewing more complex data by nesting several dimensions on the x- or y-axis and displaying data on multiple pages.
Why are spreadsheets not adequate for managing and storing multidimensional data?
They tie data storage to tightly to the presentation - they do not seperate the structural information from the desired views of the information.
Why are relational databases and SQL databases not adequate for managing and storing multidimensional data?
- The use of aggregation functions can be difficult
- It is harder to combine all the dimensions you are interested in
What is a multidimensional database/data warehouse built up?
A collection of related cubes
Why can cubes easily manage the addition of new dimension values?
Because dimensions in a cube are first-class, built-in concepts with associated domains
How many dimensions can a cube have?
Unlimited, but current tools start having performance problems with more than 10 dimensions.
When do cubes become more sparse?
As there are more dimensions added to the cube and when the level of detail (granularity) becomes finer
How many dimensions can you view of a cube?
Generally, only two or three. But it can show up to 4 low-cardinality dimensions by nesting one dimension within another on the axis.
What is an important use of dimensions in a multidimensional database?
To use dimensions to provide as much context as possible for facts.
Why are data redundancies sometimes okay in multidimensional databases?
It is okay if it increases the data’s information value.
-> Since the data is derived from other sources, and not born into the multidimensional database, redundancy problems related to updates can be managed more readily
Where is in general redundancy in a multidimensional database?
In the dimensions, not the facts.
What is the relation of hierarchy and dimension?
Dimensions are used for selecting and aggregating data at the desired level of detail. These levels of details are part of the hierarhy (time -> year, month, day).
-> Sometimes define multiple hierarchies for a dimension (fiscal year and calender year)
How do you avoid duplicate definitions when defining multiple hierarchies for dimension?
The metadata of a cube defines the dimension hierarchy.
How can you avoid adding dimensions to a cube if you want to add extra information?
Add a property that you initially wanted as a dimension as a level under an existing dimension:
Dimension: product –> you add the level of package size instead of making package-size an dimension.
What kind of ordering is available in a multi-dimensional database?
- Typically no ordering
- Only ordering is by hierarchy
- Ordering can happen by cumalative information, such as total sales to date.
What are facts?
The representation of the subject for a specific combination of dimensions
How is the granularity of a fact determined?
By the levels from which its combination of dimension values is drawn.
Which three types of facts are commonly included in data warehouses?
- Events
sales, clicks on a page, movement of goods - Snapshots
An entity at a given point in time.
Warehouse inventory levels at a time - Cumulative snapshots
Information about an activity up to a certain moment.
Total sales up to and including the current month
–> Because these 3 are complementary, a database often has them all.
Of which two components does a measure consist?
- A facts numerical property (sales price, profit)
2. A formula that can combine several measure values into one (aggregation function sum)
What is a measure?
A representation of the properties of the fact that the user wants to optimize.
-> Takes on different values for various dimension combinations.
Which three classes of measures behave differently in computations?
- Additive
- Semiadditive
- Non-additive
How do additive measures behave in computations?
They can be meaningfully combined along any dimension, as it does not cause an overlap among the real-world phenomena.
(to add total sales for product, location and time)
How do semiadditive measures behave in computations?
They can not be combined along one or more dimensions.
- > You can sum up inventory across products and warehouses.
- > you can not sum up inventory levels across time, cause you will count inventory several times then.
How do nonadditive measures behave in computates?
They can not be combined along any dimension, usually because the chosen formula prevents combining lower-level averages into higher-level averages.
Which measure(s) can occur for any kind of fact?
Additive and nonadditive facts
Which measure(s) can occur only for snapshot or cumalative snapshot facts?
Semi-additive facts
Which 5 types of querying are there in a multidimensional database?
- Slice-and-dice
- Drill-down and roll-up
- Drill-across
- Ranking
- Rotating
What is the slice and dice query?
Making selections to reduce a cube.
-> slice the cube to only consider those sells concering bread
What is the drill-down and roll-up query?
Inverse operations that use dimension hierarcies and measures to perform aggregations.
Select country instead of cities. If there are two cities in a country, the numbers of these cities are added up to get to the country number.
What is the drill-across query?
Combining cubes that share one or more dimensions (joining)
What is the ranking query?
Returning the cells that appear at the top or bottom of a specified order.
What is the rotating query?
Rotating to see the data grouped by another dimension
Which two basic forms are there for multidimensional database implementation?
- MOLAP systems
They include provisions for handling sparse arrays and apply advanced indexing and hashing to locate the data when performing queries - ROLAP systems
Using relational database technology for storing data, employing specialized index structures to achieve good query performance
What are the benefits of a MOLAP implementation?
- More space-efficient storage;
- Faster query response times;
What are the benefits of a ROLAP implementation?
- Better scalability in the number of facts they can store.
- More flexible with cube redefinitions
- Provide better support for frequent updates
What does a ROLAP implementation typically need to employ?
A star or a snowflake model.
Which two assumptions do traditional multidimensional data models and implemenation techniques have?
- All facts map directly to the lowest level dimension values and only to one value in each dimension
- Dimension hierarchies are balanced trees
Why is complex multidimensional data problematic?
It is not summarizable.
When is a dimension hierarchy considered strict?
If no dimension value has more than one direct parent
When is a dimension hierarchy considered onto?
If the hierarchy is balanced
When is a dimension hierarchy considered covering?
If no containment paths skips a level.
When is a dimension hierarchy irregular?
If a lower level is not connected to the middle level for all data points (Country - State - City) –> Denmark does not have states.
Or if a lower level is connected to two middle levels.
(see figure 5)
How can you solve irregular dimensions?
By normalizing the hierarchies –> adding in dummy variables.
What are trends in the multidimensional database world?
- Data that must be analyzed is getting increasingly distributed. –> need for more integrated data warehouses
- It is being applied to data that current technology can not adequately analyze yet, such as data from sensors of moving objects (GPS)
- Multidimensional database technology will increasingly be applied where analysis results are fed directly into other systems, thereby eliminating humans from the loop.