Schemas (Data Models) Flashcards
5 steps in the data model design process
- Connect to Data Sources
- Prepare and Transform Data
- Configure Table and Column Properties
- Create Relationships in model
- Create Columns and Measures using DAX
What is a schema?
- A structure that defines the organization and relationships of tables within a dataset
- A logical framework of how the data is organized and connected
- Defines the rules that govern the relationships
Data Modeling in Power BI
Intro to Data Models Week 1
Benefits of an Optimized Data Model
- Improve performance significantly
- Easier Aggregation of structured data in a data model due to clear relationships and hierarchies
- Facilitates complex measures and and predictive analysis
- Improved Accuracy and Reliability of Reports
What are the benefits of creating an appropriately designed schema?
Data
1. Data Structure Definition
2. Faster Data Exploration
Analysis
3. Efficient analysis
4. Assists with visualizations
5. Meaningful Insights
6. Easier aggregation
Reporting
7. Precise reporting
8. Quicker report creation
9. Simpler report maintenance
Data Modeling in Power BI
Intro to Data Models Week 1
What are the three types of schemas?
- Flat Schema
- Star Schema
- Snowflake Schema
Data Modeling in Power BI
Intro to Data Models Week 1
What is a flat schema?
A flat schema stores all data in a single table
Data Modeling in Power BI
Intro to Data Models Week 1
What are three benefits of a flat schema?
- Easy to retrieve data from
- Less complex to analyze
- Simpler way to visualize data
Data Modeling in Power BI
Intro to Data Models Week 1
Disadvantages of flat schema
- Typically has a large dataset
- More suited to smaller datasets
- Can be slow to query
- Can be difficult to maintain
- Leads to data redundancy and data inconsistency
- Doesn’t allow for complex datasets that require more flexibility and detail
Data Modeling in Power BI
Intro to Data Models Week 1
What is a star schema?
- Allows for multiple tables to be connected through one central table
- A central fact table connects to multiple dimension tables
Data Modeling in Power BI
Intro to Data Models Week 1
Example of a Star Schema
What are the advantages of a star schema?
- Reduces data redundancy
- Boosts query performance
- Easy to understand
- The tables you have to navigate through to answer questions are minimal
Data Modeling in Power BI
Intro to Data Models Week 1
What are the disadvantages of a star schema?
-
Lacks flexibility
Adding or modifying tables can require extensive changes to the schema - Can struggle to manage complex relationships
Data Modeling in Power BI
Intro to Data Models Week 1
What is a snowflake schma?
- An extension of the star schema
- It breaks down the dimension tables into multiple related tables
- Can take advantage of an existing star schema and normalize it to create a snowflake schema
Data Modeling in Power BI
Intro to Data Models Week 1
Example of extending an existing star schema to make it a snowflake schema
What are the advantages of a snowflake schema?
General
- It provides for more efficient storage and retrieval
- It improves data integrity and consistency
- Eliminates data redundancy
- Offers scalability and flexibility by integrating new tables as required
- Less storage requirements
Analysis
- Allows for more granularity
- Improves data governance
- Easier to identify patterns and relationships
Dimension Hierarchies
- More flexibility with dimension hierarchies (think product to sub category to category)
- Facilitates sub dimensions within hierarchies
Data Modeling in Power BI
Intro to Data Models Week 1
What are the disadvantages of a snowflake schema?
- More difficult to analyze because of the extra relationships
- Can be more challenging to understand and manage
- Slower queries
Data Modeling in Power BI
Intro to Data Models Week 1
How do you validate a schema?
- Make sure each column has the correct data type
- Make sure each column has the correct formatting
- Make sure all columns have clear descriptions
- Make sure all table and column properties are correctly configured
Data Modeling in Power BI
Intro to Data Models Week 1
What is an important thing to do around summarization when setting up columns?
-
Set the default summarization for the column. This could be sum, count, avg, etc,
OR - You can choose DON’t SUMMARIZE.
Setting the default summarize behavior for the column ensures that when you bring the column into a visualization that automatically aggregates, it will apply the action you want.
What settings are important to set up for a column when working on the data model/schema?
- Data Type
- Format
- Summarization (default you want)
- Data Category
- Sort by Column
- Data Groups
Data Modeling in Power BI
Intro to Data Models Week 1
What does Sort Order on a column do?
Specifies the default sorting of a columns values such as:
- Ascending
- Descending
- Custom
Data Modeling in Power BI
Intro to Data Models Week 1
What is the merge function the same as in database language?
A Join
Merging in PowerBI joins two data sources based on a common key.
Writing a long sentence so it will left justify
Data Modeling in Power BI
Intro to Data Models Week 1
How are the columns from the table you merged with represented right after the merge?
- They are all in one column where every value says Table.
- You have to expand the column and choose which columns from the table you merged to that you want to include.
Data Modeling in Power BI
Intro to Data Models Week 1
How do you know if an aggregation default has been applied to a column just by looking at the column list?
- There will be a ∑ in front of the column
- If there is no ∑ then the default is DON’t SUMMARIZE for that column
Data Modeling in Power BI
Intro to Data Models Week 1
What do fact tables hold?
- Quantifiable, measurable data about a business
- All foreign keys of the dimension tables
- Facts or measures (a property on which calculations can be made) such as quantity sold, amount sold and average sales.
What do dimension tables hold?
- Description information for all the measurements in the fact table
-
Descriptive attributes related to fact data
They offer the context surrounding a business event
What is normalization?
- A data model design technique that involves structuring data to minimize redundancy to ensure data integrity
- It divides data into multiple related tables, each with a specific purpose
- Reduces data duplication
Why might you normalize data?
To reduce data redundancy
What are some advantages of normalization?
- Remove data redundancy
- Improve data integrity
- Easily maintain your data model
What is denormalization?
- Converting the normalized schema into a schema that has redundant information.
- Implementing denormalization helps to avoid expensive queries between the tables but at the cost of creating redundant data
What are the benefits of denormalization?
- Instead of loading multiple tables you load one large table, which may be more efficient in performance
- With no relationships between tables, queries to join the data are reduced
- Filters amongst tables may be less efficient than filters applied to a single table
- Establishing a hierarchy from fields within a table is simpler
Does normalization or denormalization optimize memory?
Normalization optimizes memory use since duplicate data is eliminated
What is an advantage of normalization over denormalization?
- Data integrity is maintained during the normalization process
- It is harder to maintain with a denormalized model.
What factors should you take into account when choosing normalization or denormalization?
What strikes the right balance between:
- Data integrity
- Query performance
- Ease of maintenance.
There is no one-size-fits-all solution, and the choice depends on your project’s specific business and analytical requirements.
What is cardinality?
- Cardinality refers to the nature of the relationship between two datasets.
- In other words, how tables in your database relate to each other
What happens if cardinality is not configured correctly?
This can lead to:
- innacurate data analysis
- flawed business decisions
What are the three types of cardinalities?
- one-to-one
- one-to-many
- many-to-many
Why is assessing the granularity of data important?
- It is crucial for establishing the correct cardinality
- It influences how you set up your cross-filter direction in Power BI
What can happen if you incorrectly assess granularity?
It can lead to:
- Misrepresented data
- Incorrect business insights
What does cross-Filter direction in Power BI determine?
The direction of automatic filtering between tables
How are Power BI relationships different than other database management systems?
- They are directional in nature
- The direction significantly impacts how filtering operates
How does a single direction cross-filter work?
- A filter applied to table B will affect table A, but not vice versa.
- For example, filtering on a dimension table filters the fact table it has a single direction cross-filter to. However, filtering on the fact table will not filter the dimension table
How does a bi-directional cross-filter work?
Bi-directional cross-filters
- Sales (Fact table) to Product (Dimension table)
- Sales (Fact table) to Salesperson (Dimension table)
Number of products sold by each salesperson
1. You apply a filter on Salesperson dimension table
- This automatically filters the Sales fact table since there is a bi-directional cross-filter between Sales fact table and Salesperson dimension table
- Then this automatically filters the Product dimension table since there is a bi-directional cross-filter between Sales fact table and Product dimension table
Disadvantages of bi-directional cross-filter direction
- Can negatively impact performance
- Can result in ambiguous filter propagation (spreading throughout the model) paths
What are the steps for setting up a star schema in Power BI
- Disable auto detect
- Load Fact and Dimension tables
- Create relationships
- Set cardinality
- Set cross-filter direction
How can you normalize tables?
Create calculated tables using DAX