Model - Design a data model Flashcards
What is a data model?
A way to organise tables from data/information systems in a way which makes it easier for people to understand your data
Benefits of a good data model?
Data exploration is faster
Aggregations are simpler to build
Power BI Reports
Reports are more accurate
Writing reports takes less time
Reports are easier to maintain in the future
Explain how data models make data easier to understand?
One table with 30 columns would be harder to work with compared to those 30 columns being logically decomposed into 5 data tables. Alternatively, 30 tables with limited columns would be better illustrated in 5 tables.
What is best practice for data models?
Aim for simplicity: make the data model as intuitive as possible for the user (e.g., could a lay person understand?)
Make the data model as small as possible (i.e., fewer tables and fewer columns)
What are relationships in a data model?
Relationships are defined between tables through Primary and Foreign keys.
Primary key = columns that identify unique values
Foreign keys = reference rows or primary keys in another table
What is Power BI’s data modelling capability?
You can pull data from different data sources, create relationships between those tables and treat it as a unified dataset
What are the data schema components?
Tables: contain fields and values
Relationships between tables: primary keys and foreign keys
What is a database schema?
A database schema defines how data is organized within a relational database
What is a star schema?
A specific type of schema design that is optimized for high performance and usability.
Components of a star schema?
Tables are conceptually classified into two types:
Fact table
- Dimension table
Difference between measures and dimensions?
Dimensions contain qualitative values (such as names, dates, or geographical data). You can use dimensions to categorize, segment, and reveal the details in your data. Dimensions affect the level of detail in the view.
What are the differences between fact and dimension tables?
Fact table
- Observational/event data
- Contains measures and numbers
- Distinct values in multiple rows
Dimension table
- Contains details about the fact table
- Unique values appear in one row
What is the relationship between fact and dimension tables?
Fact tables are filtered by dimension tables
Example: Total sales can be filtered by the products
What makes a simple table structure?
Simple to navigate and user friendly = logical column and table properties
Have merged or appended tables to simplify the tables within your data structure.
Have good-quality relationships between tables that make sense.
How to configure data model and build relationships between tables?
Manage relationships
Configure relationships between tables
Explain the autodetect feature?
When you load you Queries into Power BI it will automatically establish relationships between columns based on column name.
When you load data, Power BI automatically looks for relationships that exist within the data by matching column names.
How to configure table and column properties?
Manage properties
Shortcut: Ctrl+clicking or Shift+clicking items on this page.
What are the main sections within the table properties pane?
General tab
Formatting tab
Advanced tab
How to configure many tables and fields at once?
Select many fields to configure at once: Power BI has a new functionality to update these properties on many tables and fields by Ctrl+clicking or Shift+clicking items.
Why is creating date tables useful?
Common business requirement to make calculations based on data and time
Businesses want to know how their business are doing over months, quarters and fiscal years etc
It is crucial to format time-oriented values correctly to enable time-based calculations
Some businesses might prefer date/time in particular format
Challenges with autodetect on date tables?
Power BI autodetects for data columns and tables however sometimes you will need to take extra steps to format the dates as the organisation requires
Why create a common date table?
Some tables have their own dates fields so you will need to create a common date table to enable interaction between the tables à need to build a common date table
What is the best practice around date tables?
Use a source date table because it is likely logically structured and shared with other applications
Source databases and data warehouses have their own date tables. These date tables can be used to perform tasks such as:
Identify company holidays
Separate calendar and fiscal year
Identify weekends versus weekdays
What are the 3 ways to create a common date table?
Source data
DAX
Power Query
How to use DAX to create a date table?
Use DAX functions to build your common date table.
CALENDAR(): returns a contiguous range of dates based on a start and end date that are entered as arguments in the function
CALENDARAUTO(): returns a contiguous, complete range of dates that are automatically determined from your dataset
How to mark as an official date table?
Right click on fields pane
Data table settings: choose specific columns to be marked as the date
What does marking as date table do?
Power BI performs validations to ensure:
that the data contains zero null values
is unique, and contains continuous date values over a period.
What is auto hierarchy?
When you are building visuals Power BI automatically enters values of the date type as a hierarchy (if not marked as date table)
How to remove auto hierarchy?
- Selecting Mark as date table will remove autogenerated hierarchies from the Date field in the table that you marked as a date table.
- Or establish a relationship between that field and the date table
- Turn off the Auto Date/Time feature.
How to establish a new date hierarchy?
You can manually add a hierarchy to your common date table by right-clicking the year, month, week, or day columns in the Fields pane and then selecting New hierarchy.
How can building a visual rely on a common date table?
You will need to connect your existing date columns with the new common date table which will enable the visualisation to work
What are dimensions?
Dimensions are details or descriptions collected about events connected in fact tables
What are hierarchies?
Data hierarchy refers to the systematic organization of data layers such as a parent-child relationships or tree structure.
How do hierarchies relate to dimensions?
You can use hierarchies to help find detail in dimension tables
What are hierarchies?
Natural segments in data that are capable of being decomposed
Systemic layers such as parent-child relationships or tree structures
What is a parent-child hierarchy?
This is a situation when a natural hierarchy exists in your data. For example multiple employees can have the same manager.
A specific column will determine the hierarchy
One column is only one level
What is flattening the parent-child hierarchy?
The process of viewing multiple child levels based on a top-level parent is known as flattening the hierarchy.
These uses multiple columns to indicate multiple levels
Flatten the hierarchy so you can see multiple individual levels
In this process, you are creating multiple columns in a table to show the hierarchical path of the parent to the child in the same record.
How to use DAX to flatten the parent-child hierarchy?
PATH function - Returns a delimited text with the identifiers of all the parents to the current row, starting with the oldest or top most until current.
What is a role-playing dimension?
Role-playing dimensions have multiple valid relationships with fact tables, meaning that the same dimension can be used to filter multiple columns or tables of data.
Why are role-playing dimensions important to understand?
As a result, you can filter data differently depending on what information you need to retrieve
What is data granularity?
Data granularity is the level of detail that is represented within your data
time-based detail
Defining the correct data granularity can have a big impact on the performance and usability of your Power BI reports and visuals.
What are the implications of data granularity?
Slower Reports: Generally, the fewer the records that you are working with, the faster your reports and visuals will function. This approach translates to a faster refresh rate for the entire dataset, which might mean that you can refresh more frequently.
More scope for analysis: if your users want to drill into every single transaction, summarizing the granularity will prevent them from doing that, which can have a negative impact on the user experience. It is important to negotiate the level of data granularity with report users so they understand the implications of these choices.
Why do you need to change data granularity to build a relationship?
Incongruent data granularities need to be reconciled before you build relationships between tables (e.g., weekly vs monthly time). Solution is to transform the granularities so they have common values. This is so you can match the granularity format
Why check autodetect relationships?
You need to make sure that the relationships accurately reflect those that exist in your data
What is data cardinality?
Within data modelling, the cardinality of a join between two tables is the numerical relationship between rows of one table and rows in the other.
What is cardinality best practice?
Avoid one-to-one: Is not recommended because this relationship stores redundant information and suggests that the model is not designed correctly. It is better practice to combine the tables.
Avoid many-to-many: a lack of unique values introduces ambiguity and your users might not know which column of values is referring to what.
Explain the types of cardinality within Power BI?
Many-to-one (*:1) or one-to-many (1: *) cardinality:
Describes a relationship in which you have many instances of a value in one column that are related to only one unique corresponding instance in another column.
Describes the directionality between fact and dimension tables.
Is the most common type of directionality and is the Power BI default when you are automatically creating relationships.
One-to-one (1:1) cardinality:
Describes a relationship in which only one instance of a value is common between two tables.
Requires unique values in both tables.
Many-to-many (.) cardinality:
Describes a relationship where many values are in common between two tables.
Does not require unique values in either table in a relationship.
What is cross-filter direction?
Data can be filtered on one or both sides of a relationship.
Explain the two different approaches to cross-filtering?
With a single cross-filter direction:
Only one table in a relationship can be used to filter the data. For instance, Table 1 can be filtered by Table 2, but Table 2 cannot be filtered by Table 1.
For a one-to-many or many-to-one relationship, the cross-filter direction will be from the “one” side, meaning that the filtering will occur in the table that has many values.
With both cross-filter directions or bi-directional cross-filtering:
One table in a relationship can be used to filter the other. For instance, a dimension table can be filtered through the fact table, and the fact tables can be filtered through the dimension table.
You might have lower performance when using bi-directional cross-filtering with many-to-many relationships.
What is best practice for relationships and cardinality?
A word of caution regarding bi-directional cross-filtering: You should not enable bi-directional cross-filtering relationships unless you fully understand the ramifications of doing so. Enabling it can lead to ambiguity, over-sampling, unexpected results, and potential performance degradation.
Arrows should point to fact tables
many-to-many relationships and/or bi-directional relationships are complicated. Unless you are certain what your data looks like when aggregated, these types of open-ended relationships with multiple filtering directions can introduce multiple paths through the data.
Explain the link between cardinality and cross-filter direction?
One-to-one relationships:
The only option available is bi-directional cross filtering
Data can be filtered on either side of the relationship and result in only one value (e.g. they both return the same value both sides of the relationship)
Many-to-many relationships:
you can choose to filter in a single direction or in both directions by using bi-directional cross-filtering
The ambiguity that is associated with bi-directional cross-filtering is amplified in a many-to-many relationship because multiple paths will exist between different tables.
If you create a measure, calculation, or filter, unintended consequences can occur where your data is being filtered and, depending on which relationship that the Power BI engine chooses when applying the filter, the final result might be different.
What are modelling challenges?
When the relationships established in the data model are failing to interact with visualisation
What is a circular relationship?
In the world of relational databases circular references are schema structures where foreign keys relating the tables create a loop. Circular references cause special types of issues when trying to synchronize two relational database where the foreign keys are enforced
This web of relationships is difficult to manage and becomes a daunting task to build visuals because it is no longer clear what relationships exist. Therefore, it is important that you are able to identify circular relationships so that your data is usable.
If there are circular references (“loops”) in a data structure, the tables are associated in such a way that there is more than one path of associations between two fields.
This type of data structure should be avoided as much as possible, since it might lead to ambiguities in the interpretation of data.