Schemas (Data Models) Flashcards
5 steps in the data model design process
- Connect to Data Sources
- Prepare and Transform Data
- Configure Table and Column Properties
- Create Relationships in model
- Create Columns and Measures using DAX
What is a schema?
- A structure that defines the organization and relationships of tables within a dataset
- A logical framework of how the data is organized and connected
- Defines the rules that govern the relationships
Data Modeling in Power BI
Intro to Data Models Week 1
Benefits of an Optimized Data Model
- Improve performance significantly
- Easier Aggregation of structured data in a data model due to clear relationships and hierarchies
- Facilitates complex measures and and predictive analysis
- Improved Accuracy and Reliability of Reports
What are the benefits of creating an appropriately designed schema?
Data
1. Data Structure Definition
2. Faster Data Exploration
Analysis
3. Efficient analysis
4. Assists with visualizations
5. Meaningful Insights
6. Easier aggregation
Reporting
7. Precise reporting
8. Quicker report creation
9. Simpler report maintenance
Data Modeling in Power BI
Intro to Data Models Week 1
What are the three types of schemas?
- Flat Schema
- Star Schema
- Snowflake Schema
Data Modeling in Power BI
Intro to Data Models Week 1
What is a flat schema?
A flat schema stores all data in a single table
Data Modeling in Power BI
Intro to Data Models Week 1
What are three benefits of a flat schema?
- Easy to retrieve data from
- Less complex to analyze
- Simpler way to visualize data
Data Modeling in Power BI
Intro to Data Models Week 1
Disadvantages of flat schema
- Typically has a large dataset
- More suited to smaller datasets
- Can be slow to query
- Can be difficult to maintain
- Leads to data redundancy and data inconsistency
- Doesn’t allow for complex datasets that require more flexibility and detail
Data Modeling in Power BI
Intro to Data Models Week 1
What is a star schema?
- Allows for multiple tables to be connected through one central table
- A central fact table connects to multiple dimension tables
Data Modeling in Power BI
Intro to Data Models Week 1
Example of a Star Schema
What are the advantages of a star schema?
- Reduces data redundancy
- Boosts query performance
- Easy to understand
- The tables you have to navigate through to answer questions are minimal
Data Modeling in Power BI
Intro to Data Models Week 1
What are the disadvantages of a star schema?
-
Lacks flexibility
Adding or modifying tables can require extensive changes to the schema - Can struggle to manage complex relationships
Data Modeling in Power BI
Intro to Data Models Week 1
What is a snowflake schma?
- An extension of the star schema
- It breaks down the dimension tables into multiple related tables
- Can take advantage of an existing star schema and normalize it to create a snowflake schema
Data Modeling in Power BI
Intro to Data Models Week 1
Example of extending an existing star schema to make it a snowflake schema
What are the advantages of a snowflake schema?
General
- It provides for more efficient storage and retrieval
- It improves data integrity and consistency
- Eliminates data redundancy
- Offers scalability and flexibility by integrating new tables as required
- Less storage requirements
Analysis
- Allows for more granularity
- Improves data governance
- Easier to identify patterns and relationships
Dimension Hierarchies
- More flexibility with dimension hierarchies (think product to sub category to category)
- Facilitates sub dimensions within hierarchies
Data Modeling in Power BI
Intro to Data Models Week 1
What are the disadvantages of a snowflake schema?
- More difficult to analyze because of the extra relationships
- Can be more challenging to understand and manage
- Slower queries
Data Modeling in Power BI
Intro to Data Models Week 1
How do you validate a schema?
- Make sure each column has the correct data type
- Make sure each column has the correct formatting
- Make sure all columns have clear descriptions
- Make sure all table and column properties are correctly configured
Data Modeling in Power BI
Intro to Data Models Week 1
What is an important thing to do around summarization when setting up columns?
-
Set the default summarization for the column. This could be sum, count, avg, etc,
OR - You can choose DON’t SUMMARIZE.
Setting the default summarize behavior for the column ensures that when you bring the column into a visualization that automatically aggregates, it will apply the action you want.