05 Conceptual (and logical) multidimensional modeling Flashcards
List the main steps involved in schema design process for DWH Environments:
-
User requirements and OLTP Schemas are used as inputs into a Conceptual data model
- (M E/R, YAM2, ADAPT).
- A Logical data model follows
- ROLAP, MOLAP, HOLAP, Star Schema, Snowflake Schema
- Finally, a Physical data model is created.

What are the steps to design a Dimensional Model?
4 steps are need to design a Dimensional Model:
-
Choose the business process
- Usually done by business department, business processes are the fundamental building block of the dimensional DWH, each process equals at least one fact table.
-
Declare the grain
- Declare level of detail in the fact table, what does a fact table measurement represents? (a purchase, a transaction, a rental..)
-
Identify the dimensions
- Determine applicable dimensions, dimensions usually come from grain, originally planned 1-dimensions may be two or three at the end.
- Reference + Ratio -> Fact
-
Identify the facts
- Identify facts and measures from the business process.
- Quantity and amount as fundamental facts in transaction-oriented processes.

What is the purpose/mission of a conceptual schema?
- Multidimensional data modeling can play a key role in the design of the dimensional model.
- Assure the flexibility and re-usability of the schema after the user requirements have changed.
- Not assume any facts that are the results of further design steps (DB technology used).
- Semantic relativism: model can accomodate not only one, but many different concepts.
- Enabling the discussion between business and IT about the requirements of information analysis (example: identification and elimination of weaknesses).

Mention key characteristics of OLAP Databases:
-
Dimension with (complex) hierarchies.
- Different levels correspond to different data granularities
- Multiple hierarchies on a single dimension
- Alternative hierarchy paths
-
Multiple cubes (cube architecture).
- Superior cubes
- Analysis of different facts with different set of dimensions
- Summarizability constraints
- Calculated measures, Ratio systems
Why the ER model is not appropiate for modeling Multidimensional Databases? (MDB)
- Distinct from ER, dimensional modeling divides databases into 2 types of objects: fact and dimension tables.
- Dimension tables are de-normalized.
- THe inherent separation of qualifying and quantifying data cannot be expressed in an E/R Model.
- Semantincs of the complex dimension structures are to specific for E/R models.
- ER are closely aligned with a physical implementation that reflects a relational DB approach.
- Therefore: an enhanced ERM (Multidimensional ERM ME/R is proposed)
Mention the graphical elements used to enhance the traditional ER language into a Multidimensiona ERM:
- Idea: slightly enhance the ER language to ensure the flexibility and the simplicity of the ER notification, but allow the definition of hierarchies.
-
Elements added:
- A fact relationship set
- A dimension level set
- A classification relationship set
What are the goals of the Object-Oriented Multidimensional Modeling?
(With YAM2)
- Paradigm of object orientation is the fundament for software engineering.
- The long-term use of UML as modeling and design language has been proven as highly useful.
- As UML is widespread in science and practice, it can be assumed, that there is less adoption effort.
- UML-based multidimensional models can be homogeneously integrated into software engineering.
- UML schema can be enhanced easily by applying specialization techniques.
What are some critics and evaluations of YAM2?
- YAM2 modelling captures all aspects but its criticized as too cumbersome and complex.
- The language is a complete extension of UML for multidimensional modeling.
- UML analytical methods can be used for analyzing purposes.
- Avoid definitions from scratch.
- All elements are specializations of UML constructs.
- YAM2 fulfills almost all requirements for multi-dimensional models
- Explicit separation of structure and contents
- Measure sets, Multi-star schemas, many-to-many relationships
Define ADAPT and describe its 8 basic modeling objects:
- ADAPT = Application Design for Analytical Processing Technologies.
- ADAPT enables the modeler to build manifold dimensional and hierarchical structures.
- Basic modeling objects:
-
Hypercube (cube)
- N-dimensional arrary, in an OLAP DB a hypercube is the basic unit of storage for business data
- Context as a portion of a hypercube that provides a context for analysis.
-
Dimension
- Axis or index of an hypercube
-
Hiearchy
- Set of parent/child member combinations that define aggregation.
-
Level
- Collection of members used to define hierarchical precedence (calendar types).
-
Model
- Algebraic process to calculate derived data. Document the source and target data of a calculation.
-
Member
- An individual dimension value (used as “examples” in models).
-
Attribute
- Information about a dimension member. Applied to dimensions/levels (as “additional information”)
-
Scope
- A (sub-set) collection of dimension members. Difference between calculated and enumerated scopes (examples: “current month”, “year-to-date”).
-
Hypercube (cube)

What are the (5) main ADAPT Connecting Elements:
ADAPT Connecting elements:
- Loose precedence (there are objects not associated to the superior group)
- Strict precedence (every object is associated to exactly one superior object)
- Self precedence
- Used by
- Connector

# **Define the 2 Core ADAPT Objects** and describe its graphical representation: **Hypercube and Dimension.**
-
Hypercube:
- N-dimensional array
- In an OLAP DB, a hypercube is the basic unit of storage for business data.
-
Dimension
- Axis or index of an hypercube
# Define the 2 **ADAPT Hierarchy Objects** and describe its graphical representation: **Hierarchy and Level.**
-
Hierarchy
- Set of parent/child member combinations that define aggregation.
-
Level
- Collection of members used to define hierarchical precedence.

# Define the 2 **ADAPT Dimension Objects** and describe its graphical representation: **Member and Attribute.**
-
Member
- An individual dimension value (“examples” in the model).
-
Attribute
- Information about a dimension member (“additional information”).
- Can apply to both dimension and level.

Define the ADAPT Scope object and describe its graphical representation:
-
Scope
- A collection of dimension members
- Difference between calculated and enumerated scopes
- Other examples: current month, year-to-date.

# Define the 2 **ADAPT Dimension Objects** and describe its graphical representation: **Model and Context.**
-
Model
- An algebraic process to calculate derived data.
- Document the source and target data of a calculation.
-
Context
- A portion of a hypercube that provides a context for analysis.

Mention the 3 strategies used to balance Response time and Resource use when doing a physical design:
When doing the physical design, one has to find out how the derived data is computed.
Strategies:
- Compute all derived data before user issue any queries.
- Use DB Server to compute all data at query time
- First user request pays the price of computing
What are the (3) main approaches used to implement multidimensional models:
-
Relational OLAP (ROLAP)
- Servers store data in relational DBs and support extensions to SQL to efficiently implement the multidimensional data model.
-
Multidimensional OLAP (MOLAP)
- Servers directly store multidimensional data in special data structures (e.g: arrays) and implement the OLAP operations over those data structures.
-
Hybrid OLAP (HOLAP)
- Servers combine both technologies, benefiting from the storage capacity of ROLAP and the processing capabilities of MOLAP.
Mention and describe the (3) types of schemas used in DWH Schemata:
-
Star schema
- One central fact table
- Dimension tables are not normalized
-
Snowflake schema
- Several normalized tables per dimension
-
Hybrid approach
- Duplication of attributes in dimension tables
- Attributes of higher levels can be duplicated into lower leves (explicit redundancy)
- Avoids query performance losses
- Problem: large dimension tables become even larger (maintenance problems)

Explain the Hybrid Approach in DWH Schemata:
Hybrid approach
- Duplication of attributes in dimension tables
- Attributes of higher levels can be duplicated into lower leves (explicit redundancy)
- Avoids query performance losses
- Problem: large dimension tables become even larger (maintenance problems)

What are the key/typical characteristics of DWH Applications:
Typical characteristics in DWH Applications:
- Less data volume in dimension tables than in fact tables.
- Only rare changes in dimension tables (danger of update anomalies).
- Typical queries contain selection predicates on fact table using one or more dimension tables.
- Stronger focus on query performance than on avoiding redundancy and data storage.
Mention some Pros and Cons of the Snowflake Schema:
-
Cons:
- Performance is affected, since more joins need to be performed, when executing queries along hierarchy paths.
- Benefit of normalization is quite insignificant.
- More complicated structure than the star schema.
-
Pros:
- Better storage and querying with sparse dimensions.
- Reflects the way users think about data.
How is represented the extended star schema in SAP Business Warehouse?
-
SAP BW automatically creates SID-Tables for dimension objects
- easier consideration of multilingual attributes
- easier consideration of slowly changing dimensions
- multiple use of object master data
- use of auto-generated keys

For which reasons is the step of conceptual modeling often omitted in practice projects?
- Time pressure
- Trend for agile DWH development, rapid prototyping
- Business users are not able to formulate requirements
- Physical design already determined (vendor decision)
- No established tools available on the market that perform a model-based DB implementation