Article: Multidimensional Database Technology Flashcards
Multidimensional data model
Categorizes data as either facts with associated numerical measures or dimensions, mostly textual, that characterize the facts. Three important application areas:
- Data warehouses
- Online analytical processing (OLAP)
- Data mining
Queries
Aggregate measure values over a range of dimension values to provide results such as the total sales per month of a given product.
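A query of this kind can be sketched in a few lines of Python. The fact rows, the field order, and the helper `total_sales_per_month` are made up for illustration and do not come from any particular OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows: product, month and store are dimension values,
# the last element is the sales measure.
facts = [
    ("milk", "2024-01", "north", 120.0),
    ("milk", "2024-01", "south", 80.0),
    ("milk", "2024-02", "north", 95.0),
    ("bread", "2024-01", "north", 40.0),
]


def total_sales_per_month(rows, product):
    """Sum the sales measure per month for one product, over all stores."""
    totals = defaultdict(float)
    for row_product, month, _store, sales in rows:
        if row_product == product:
            totals[month] += sales
    return dict(totals)


milk_by_month = total_sales_per_month(facts, "milk")
print(milk_by_month)
```

The store dimension is aggregated away, the product dimension is restricted to one value, and the month dimension is kept as the grouping level.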
Pivot table
A two-dimensional spreadsheet with associated subtotals and totals that supports viewing more complex data by nesting several dimensions on the x- or y-axis and displaying data on multiple pages.
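As a rough illustration, the cells, subtotals, and grand total of a small pivot table can be computed as follows. The sales rows and the `pivot` helper are hypothetical, and only one dimension is placed on each axis to keep the sketch short.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, month, amount).
sales = [
    ("milk", "Jan", 200),
    ("milk", "Feb", 95),
    ("bread", "Jan", 40),
    ("bread", "Feb", 60),
]


def pivot(rows):
    """Return cell values, row subtotals, column subtotals and the grand total."""
    cells = defaultdict(int)
    row_totals = defaultdict(int)
    col_totals = defaultdict(int)
    grand_total = 0
    for product, month, amount in rows:
        cells[(product, month)] += amount
        row_totals[product] += amount
        col_totals[month] += amount
        grand_total += amount
    return cells, row_totals, col_totals, grand_total


cells, row_totals, col_totals, grand_total = pivot(sales)

# Print products on the y-axis and months on the x-axis.
months = ["Jan", "Feb"]
print(f"{'':8}" + "".join(f"{m:>8}" for m in months) + f"{'Total':>8}")
for product in ["milk", "bread"]:
    line = f"{product:8}" + "".join(f"{cells[(product, m)]:>8}" for m in months)
    print(line + f"{row_totals[product]:>8}")
print(f"{'Total':8}" + "".join(f"{col_totals[m]:>8}" for m in months) + f"{grand_total:>8}")
```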
Cubes
Support hierarchies in dimensions and formulas without duplicating their definitions. Combinations of dimension values define a cube’s cells. Depending on the specific application, the cells in a cube range from sparse to dense. Cubes tend to become sparser as dimensionality increases and as the granularity of the dimension values becomes finer.
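The effect of dimensionality on sparsity can be shown with a small calculation. The sketch below assumes density is measured as occupied cells divided by all possible cells; the facts, the dimension sizes, and the `cube_density` helper are invented for the example.

```python
from math import prod


def cube_density(facts, dimension_sizes):
    """Fraction of cube cells that hold at least one fact."""
    occupied = {tuple(fact) for fact in facts}  # one cell per distinct combination
    return len(occupied) / prod(dimension_sizes)


# Hypothetical two-dimensional cube: 3 products x 4 months = 12 cells.
facts_2d = [
    ("milk", "Jan"), ("milk", "Feb"), ("bread", "Jan"),
    ("eggs", "Mar"), ("eggs", "Apr"), ("bread", "Apr"),
]
density_2d = cube_density(facts_2d, [3, 4])

# Add a store dimension with 5 values: the same six facts now sit in 60 cells.
facts_3d = [(product, month, "store-1") for product, month in facts_2d]
density_3d = cube_density(facts_3d, [3, 4, 5])

print(density_2d, density_3d)
```

With the same number of facts, adding a dimension multiplies the number of cells, so the share of occupied cells drops.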
Dimensions
Are used for selecting and aggregating data at the desired level of detail. The goal of dimensions is to provide as much context for facts as possible. In contrast to relational databases, controlled redundancy is appropriate in multidimensional databases if it increases the data’s information value. Redundancy mostly occurs in dimensions, not in facts.
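Aggregating at a chosen level of detail can be sketched with a simple time dimension. The day, month, and year levels, the `levels` mapping, and the `roll_up` helper are assumptions made for this example only.

```python
from collections import defaultdict

# Hypothetical facts keyed by the finest level (day) of a time dimension.
daily_sales = {
    "2024-01-15": 120,
    "2024-01-20": 80,
    "2024-02-03": 95,
    "2025-01-02": 50,
}

# Each level maps a day value to its ancestor at that level of the hierarchy.
levels = {
    "day": lambda day: day,
    "month": lambda day: day[:7],
    "year": lambda day: day[:4],
}


def roll_up(facts, level):
    """Aggregate day-level facts to the requested level of the time dimension."""
    to_level = levels[level]
    totals = defaultdict(int)
    for day, amount in facts.items():
        totals[to_level(day)] += amount
    return dict(totals)


by_month = roll_up(daily_sales, "month")
by_year = roll_up(daily_sales, "year")
print(by_month)
print(by_year)
```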
Facts
Implicitly defined by their combination of dimension values. Three types of facts:
- Events
- Snapshots
- Cumulative snapshots
Events
Each fact represents one unique instance of an underlying phenomenon.
Snapshots
Model an entity’s state at a given point of time.
Cumulative snapshots
Handle information about an activity up to a certain point in time.
Measure
Consists of two components:
- A fact’s numerical property.
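- A formula, usually a simple aggregation function such as SUM, used to combine several measure values into one.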
Three classes of measures
- Additive
- Semi-additive
- Non-additive
Additive measure
Can be meaningfully combined along any dimension. Can occur for any kind of fact.
Semi-additive measure
Cannot be combined along one or more dimensions. Generally occurs for snapshots and cumulative snapshots.
Non-additive measure
Cannot be combined along any dimension. Can occur for any kind of fact.
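The three classes can be contrasted with a small worked example. All figures are invented: sales amounts stand in for an additive measure, stock levels from snapshot facts for a semi-additive one, and an average price for a non-additive one.

```python
# Additive: sales amounts can be summed along every dimension.
sales = {
    ("north", "Mon"): 10, ("north", "Tue"): 12,
    ("south", "Mon"): 7, ("south", "Tue"): 9,
}
total_sales = sum(sales.values())

# Semi-additive: stock levels taken from daily snapshot facts.
stock = {
    ("north", "Mon"): 100, ("north", "Tue"): 90,
    ("south", "Mon"): 50, ("south", "Tue"): 55,
}
# Summing across stores for one day is meaningful.
stock_on_tue = sum(level for (store, day), level in stock.items() if day == "Tue")
# Summing one store across days would count the same goods twice,
# so this sketch takes the latest snapshot instead.
north_latest = stock[("north", "Tue")]

# Non-additive: average prices cannot be combined directly.
# Store A sold 1 unit at 10.0, store B sold 3 units at 2.0 each.
avg_a, units_a = 10.0, 1
avg_b, units_b = 2.0, 3
naive_average = (avg_a + avg_b) / 2  # ignores how many units each store sold
correct_average = (avg_a * units_a + avg_b * units_b) / (units_a + units_b)

print(total_sales, stock_on_tue, north_latest, naive_average, correct_average)
```

The correct average is recomputed from the underlying additive values (revenue and units) rather than by combining the two averages.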
Ways of implementing multidimensional modelling
- MOLAP: stores data in specialized multidimensional structures. It includes provisions for handling sparse arrays and applies advanced indexing and hashing to locate the data when performing queries. MOLAP generally offers more space-efficient storage and faster query responses.
- ROLAP: uses relational database technology for storing data and also employs specialized index structures to achieve good query performance. ROLAP typically scales better in the number of facts it can store, is more flexible with respect to cube redefinitions, and handles frequent updates better.
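The difference in storage layout can be sketched as follows. This is a toy comparison under strong assumptions: the MOLAP side is a plain Python list addressed by a computed offset, without the sparse-array compression real engines use, and the ROLAP side is an in-memory SQLite table with an ordinary index. The table and column names are hypothetical.

```python
import sqlite3

products = ["milk", "bread"]
months = ["Jan", "Feb", "Mar"]

# MOLAP-style: one array slot per possible cell, located by a computed offset.
cells = [None] * (len(products) * len(months))  # None marks an empty cell


def offset(product, month):
    """Linearize the (product, month) coordinates into an array position."""
    return products.index(product) * len(months) + months.index(month)


cells[offset("milk", "Jan")] = 200
cells[offset("bread", "Mar")] = 40
molap_lookup = cells[offset("milk", "Jan")]
empty_cells = cells.count(None)

# ROLAP-style: only existing facts are stored, as rows in a fact table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (product TEXT, month TEXT, amount INTEGER)")
conn.execute("CREATE INDEX idx_sales ON sales_fact (product, month)")
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [("milk", "Jan", 200), ("bread", "Mar", 40)],
)
rolap_lookup = conn.execute(
    "SELECT SUM(amount) FROM sales_fact WHERE product = ? AND month = ?",
    ("milk", "Jan"),
).fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]

print(molap_lookup, empty_cells, rolap_lookup, row_count)
```

The array reserves a slot for every combination of dimension values, including the four empty ones, while the fact table holds only the two rows that exist.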