Data Warehouses and Dimensional Modelling Flashcards

1
Q

Main goal of DW and Dimensional Modelling

A

Simplicity.

Ensures that users can easily understand databases.

Allows software to efficiently navigate databases.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Dimensional Modelling (DM) Definition

A

A logical design technique for structuring data so that it’s intuitive to business users and delivers fast query performance.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Three primary design goals

A
  1. Present the required information to users as simply as possible (understandability).
  2. Return query results to the users as quickly as possible (query performance)
  3. Provide relevant information that accurately tracks the underlying business processes
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Present the required information to users as simply as possible (understandability).

A
  1. Make everything as simple as possible, but not simpler. (Einstein)
  2. DMs are much easier for users to understand - information is grouped into coherent business categories.
  3. DMs typically contain exactly the same content as a normalized model, but with far fewer tables.
  4. Simplicity is relative - the model must reflect complex business processes.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Return query results to the users as quickly as possible (query performance)

A
  1. Relational environment:
    - Help query performance because of denormalization.
    - Pre-join hierarchies and lookup tables - fewer join paths and less immediate temporary tables.
    - Predictable framework allows the DBMS to make strong assumptions which aid in performance.
  2. OLAP environment:
    - Engine designed to support DMs
    - Performance due to aggregation within and across dimensions.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Provide relevant information that accurately tracks the underlying business processes

A

Requires full range of design patterns to create models that accurately capture and track the business.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Benefits of Dimensional Modelling

A
  1. Understandability
  2. Query performance
  3. Each dimension is an equivalent entry point into the fact table. Therefore, query performance is simple, predictable and controllable.
  4. Graceful extensibility to accommodate new data. No query or BI application needs to be reprogrammed to accommodate changes.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Star Join

A

Each business process can be represented by a DM that consists of a normalized fact table surrounded by denormalized dimension tables.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Fact table

A
  1. Highly normalized, storing little redundant data.
  2. Contains the measurements associated with a specific business process.
  3. Record in fact table = measurement of a business event.
  4. Primary key is a multi-part/composite key made up of a subset of foreign keys from business process dimensions.
    - Fact tables always express many to many relationships between dimensions.
    - Every foreign key must match a unique primary key in the corresponding dimension.
  5. Some business processes track events that do not contain facts. Then a factless fact table is created. Add a dummy counter variable with value 1.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Facts

A

Numeric values (typically continuous) that quantify the magnitude of the event.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Three types of facts

A
  1. Numeric and additive
    - Can meaningfully be aggregated across dimensions.
    - Most useful facts.
  2. Semi-additive
    - Can be meaningfully aggregated across some dimensions.
    - Can typically not be aggregated across time dimensions.
  3. Non-additive
    - Cannot be meaningfully aggregated across any dimensions.
    - Stored in dimension tables
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

When do facts conform?

A

If their definitions are the same in different fact tables

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is the grain of a fact table?

A

The level of detail contained in a fact table.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Lowest level of detail that is possible

A

Atomic level.

Complete flexibility to roll up to any summary level.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Three fundamental grains

A
  1. Transaction fact table
    - Track each transaction as it occurs at a point in time
    - One record per transaction/line of transaction
  2. Periodic fact table
    - Captures cumulative performance over specific time intervals.
    - One record per group of transactions made over a period of time.
    - Complements detailed transaction facts.
  3. Accumulating snapshot fact table
    - Constantly updated over time
    - One record for the entire lifetime of an event.
    - Used to combine data across several business processes in a value chain.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Dimension Tables

A
  1. The nouns of the dimensional model.
  2. Describes the objects that participate in the business.
  3. Spot dimensions and/or their attributes in conversations.
  4. Provide entry points into the data.
  5. Has single surrogate primary key.
  6. Conformed dimension: single dimension that is shared by two or more business processes.
  7. Embedded hierarchy: some dimension attributes relate to each other in a hierarchical or 1:M fashion.
  8. Relational transaction systems designed using normalization.
  9. Each row has a set of attributes that provides context which is true the moment the fact table is recorded.
  10. Dimension attributes are typically textual fields.
17
Q

What is denormalization?

A

Process of recombining the attributes into a single dimension.

Nothing is lost. It presents exactly the same information.

18
Q

What are conformed dimensions?

A

Two dimensions are conformed if they contain one or more fields with the same names and contents.

Two types:

  1. Identical conformed dimensions.
  2. One dimension is a perfect subset of a more detailed granular dimension table.
19
Q

Benefits of conformed dimensions

A
  1. Consistency
    - Ensure that every fact table is filtered consistently and the resulting query answer sets are labeled consistently.
  2. Integration
    - Allows queries to drill across fact tables representing different processes (supports true cross-business process analysis)
  3. Reduced development time to market
    - Do not need to recreate the wheel over and over again.
20
Q

What is a Bus Matrix?

A

It is a representation of conformed dimensions’ involvement/participation in multiple business processes.

Each row represents a business process and defines at least one fact table and its associated dimensions.

21
Q

What do you call it when one business process has more than one DM

A

Business process dimensional model

22
Q

What does it mean to “Drill Across”?

A

Analysis involving data from more than one business process.

23
Q

What are Surrogate Keys?

A
  1. New set of keys, separate from keys in source system.
  2. Also known as meaningless keys, substitute keys, non-natural keys or artificial keys.
  3. Unique value, usually an integer, assigned to each row in a dimension.
  4. Becomes a primary key of dimension table used to join with associated foreign key in fact table.
24
Q

Benefits of surrogate keys

A
  1. Helps protect DW from unexpected administrative changes in source system keys (called business keys or natural keys)
  2. Allow the DW to integrate the same data with different keys from multiple source systems.
  3. Enable you to add rows to dimensions that do not exist in the source system.
  4. Provide the means for tracking changes in dimension attributes over time.
  5. Integer surrogate keys can improve query and processing performance.
25
Q

What is snowflaking?

A

Connecting lookup tables in fields of dimension tables.

26
Q

Why is snowflaking a discouraged technique?

A

It makes the model more complex, less usable and more difficult to maintain.

27
Q

What is an outrigger table?

A

It is a purpose-specific snowflake table.

Typical reasons for creating outrigger table:

  1. Rarely used lookups
  2. Large number of attributes
  3. Different grain
  4. Different update frequency
28
Q

What are slowly changing dimensions?

A

Techniques used to manage attribute changes over time.

Attributes are not fixed, can change over time (though relatively infrequently compared to facts).

Ability to track changes over time is one of the fundamental reasons for the existence of DWs.

29
Q

Four basic types of slowly changing dimensions

A

Type 1:

  • Changes business doesn’t care about
  • Overwrite existing attribute value with new value

Type 2:

  • Create new row in dimension table to capture new value(s)
  • New surrogate key and data stamps to show when the row came into effect and when it expires.
  • Existing row is marked to show that its attributes were in effect right up until the creation of the new row.
  • All fact rows going forward will be assigned the new surrogate key.
  • Existing fact rows keep old surrogate key.

Type 3:

  • Keeps separate columns for old and new attribute values (called alternative realities)
  • Can lead to very wide table.

Type 4:

  • Separate more frequently changing/used attributes into their own separate dimension table, a.k.a. mini-dimension.
  • Create surrogate primary key for mini-dimension with corresponding foreign key in fact table, distinct from the surrogate key of the base dimension.
30
Q

Describe Dates

A

Fundamental business dimension across all organizations and industries.

One row for every day for which you expect data in the fact table.

Grain is at a day level rather than time of day.

It is a role playing dimension.

Use surrogate key for date. This helps to manage the problems of missing dates.

31
Q

What are degenerate dimensions?

A

Transaction identifiers often end up as degenerate dimensions without joining to an actual dimension table. It does not exist outside the transaction and has no associated descriptive attributes that aren’t already in dimensions.

Include in fact table because it serves an analytical purpose - tie together all line items in a market basket.

32
Q

What are Hierarchies?

A

Meaningful, standard ways to group data within a dimension.

Possibly many hierarchies in one dimension.

Allows drill down - begin with big picture and “drill down” to lower levels in the hierarchy.

Main paths for summarizing the data.

Flattening hierarchies is one of the main design tasks of creating a dimensional table.

33
Q

What are junk dimensions?

A

Combination of columns of the separate small dimensions into a single table.

One row for each combination of values.

One fact row joins to one dimension row.

34
Q

Two ways of aggregating dimensions

A
  1. Remove dimension entirely

2. Roll up in a dimension’s hierarchy.