Model - Design a data model Flashcards
What is a data model?
A way to organise tables from data/information systems in a way which makes it easier for people to understand your data
Benefits of a good data model?
Data exploration is faster
Aggregations are simpler to build
Power BI Reports
Reports are more accurate
Writing reports takes less time
Reports are easier to maintain in the future
Explain how data models make data easier to understand?
One table with 30 columns would be harder to work with compared to those 30 columns being logically decomposed into 5 data tables. Alternatively, 30 tables with limited columns would be better illustrated in 5 tables.
What is best practice for data models?
Aim for simplicity: make the data model as intuitive as possible for the user (e.g., could a lay person understand?)
Make the data model as small as possible (i.e., fewer tables and fewer columns)
What are relationships in a data model?
Relationships are defined between tables through Primary and Foreign keys.
Primary key = columns that identify unique values
Foreign keys = reference rows or primary keys in another table
What is Power BI’s data modelling capability?
You can pull data from different data sources, create relationships between those tables and treat it as a unified dataset
What are the data schema components?
Tables: contain fields and values
Relationships between tables: primary keys and foreign keys
What is a database schema?
A database schema defines how data is organized within a relational database
What is a star schema?
A specific type of schema design that is optimized for high performance and usability.
Components of a star schema?
Tables are conceptually classified into two types:
Fact table
- Dimension table
Difference between measures and dimensions?
Dimensions contain qualitative values (such as names, dates, or geographical data). You can use dimensions to categorize, segment, and reveal the details in your data. Dimensions affect the level of detail in the view.
What are the differences between fact and dimension tables?
Fact table
- Observational/event data
- Contains measures and numbers
- Distinct values in multiple rows
Dimension table
- Contains details about the fact table
- Unique values appear in one row
What is the relationship between fact and dimension tables?
Fact tables are filtered by dimension tables
Example: Total sales can be filtered by the products
What makes a simple table structure?
Simple to navigate and user friendly = logical column and table properties
Have merged or appended tables to simplify the tables within your data structure.
Have good-quality relationships between tables that make sense.
How to configure data model and build relationships between tables?
Manage relationships
Configure relationships between tables
Explain the autodetect feature?
When you load you Queries into Power BI it will automatically establish relationships between columns based on column name.
When you load data, Power BI automatically looks for relationships that exist within the data by matching column names.
How to configure table and column properties?
Manage properties
Shortcut: Ctrl+clicking or Shift+clicking items on this page.
What are the main sections within the table properties pane?
General tab
Formatting tab
Advanced tab
How to configure many tables and fields at once?
Select many fields to configure at once: Power BI has a new functionality to update these properties on many tables and fields by Ctrl+clicking or Shift+clicking items.
Why is creating date tables useful?
Common business requirement to make calculations based on data and time
Businesses want to know how their business are doing over months, quarters and fiscal years etc
It is crucial to format time-oriented values correctly to enable time-based calculations
Some businesses might prefer date/time in particular format
Challenges with autodetect on date tables?
Power BI autodetects for data columns and tables however sometimes you will need to take extra steps to format the dates as the organisation requires
Why create a common date table?
Some tables have their own dates fields so you will need to create a common date table to enable interaction between the tables à need to build a common date table
What is the best practice around date tables?
Use a source date table because it is likely logically structured and shared with other applications
Source databases and data warehouses have their own date tables. These date tables can be used to perform tasks such as:
Identify company holidays
Separate calendar and fiscal year
Identify weekends versus weekdays
What are the 3 ways to create a common date table?
Source data
DAX
Power Query
How to use DAX to create a date table?
Use DAX functions to build your common date table.
CALENDAR(): returns a contiguous range of dates based on a start and end date that are entered as arguments in the function
CALENDARAUTO(): returns a contiguous, complete range of dates that are automatically determined from your dataset
How to mark as an official date table?
Right click on fields pane
Data table settings: choose specific columns to be marked as the date
What does marking as date table do?
Power BI performs validations to ensure:
that the data contains zero null values
is unique, and contains continuous date values over a period.
What is auto hierarchy?
When you are building visuals Power BI automatically enters values of the date type as a hierarchy (if not marked as date table)
How to remove auto hierarchy?
- Selecting Mark as date table will remove autogenerated hierarchies from the Date field in the table that you marked as a date table.
- Or establish a relationship between that field and the date table
- Turn off the Auto Date/Time feature.
How to establish a new date hierarchy?
You can manually add a hierarchy to your common date table by right-clicking the year, month, week, or day columns in the Fields pane and then selecting New hierarchy.
How can building a visual rely on a common date table?
You will need to connect your existing date columns with the new common date table which will enable the visualisation to work
What are dimensions?
Dimensions are details or descriptions collected about events connected in fact tables
What are hierarchies?
Data hierarchy refers to the systematic organization of data layers such as a parent-child relationships or tree structure.
How do hierarchies relate to dimensions?
You can use hierarchies to help find detail in dimension tables
What are hierarchies?
Natural segments in data that are capable of being decomposed
Systemic layers such as parent-child relationships or tree structures
What is a parent-child hierarchy?
This is a situation when a natural hierarchy exists in your data. For example multiple employees can have the same manager.
A specific column will determine the hierarchy
One column is only one level
What is flattening the parent-child hierarchy?
The process of viewing multiple child levels based on a top-level parent is known as flattening the hierarchy.
These uses multiple columns to indicate multiple levels
Flatten the hierarchy so you can see multiple individual levels
In this process, you are creating multiple columns in a table to show the hierarchical path of the parent to the child in the same record.
How to use DAX to flatten the parent-child hierarchy?
PATH function - Returns a delimited text with the identifiers of all the parents to the current row, starting with the oldest or top most until current.
What is a role-playing dimension?
Role-playing dimensions have multiple valid relationships with fact tables, meaning that the same dimension can be used to filter multiple columns or tables of data.
Why are role-playing dimensions important to understand?
As a result, you can filter data differently depending on what information you need to retrieve
What is data granularity?
Data granularity is the level of detail that is represented within your data
time-based detail
Defining the correct data granularity can have a big impact on the performance and usability of your Power BI reports and visuals.
What are the implications of data granularity?
Slower Reports: Generally, the fewer the records that you are working with, the faster your reports and visuals will function. This approach translates to a faster refresh rate for the entire dataset, which might mean that you can refresh more frequently.
More scope for analysis: if your users want to drill into every single transaction, summarizing the granularity will prevent them from doing that, which can have a negative impact on the user experience. It is important to negotiate the level of data granularity with report users so they understand the implications of these choices.
Why do you need to change data granularity to build a relationship?
Incongruent data granularities need to be reconciled before you build relationships between tables (e.g., weekly vs monthly time). Solution is to transform the granularities so they have common values. This is so you can match the granularity format
Why check autodetect relationships?
You need to make sure that the relationships accurately reflect those that exist in your data
What is data cardinality?
Within data modelling, the cardinality of a join between two tables is the numerical relationship between rows of one table and rows in the other.
What is cardinality best practice?
Avoid one-to-one: Is not recommended because this relationship stores redundant information and suggests that the model is not designed correctly. It is better practice to combine the tables.
Avoid many-to-many: a lack of unique values introduces ambiguity and your users might not know which column of values is referring to what.
Explain the types of cardinality within Power BI?
Many-to-one (*:1) or one-to-many (1: *) cardinality:
Describes a relationship in which you have many instances of a value in one column that are related to only one unique corresponding instance in another column.
Describes the directionality between fact and dimension tables.
Is the most common type of directionality and is the Power BI default when you are automatically creating relationships.
One-to-one (1:1) cardinality:
Describes a relationship in which only one instance of a value is common between two tables.
Requires unique values in both tables.
Many-to-many (.) cardinality:
Describes a relationship where many values are in common between two tables.
Does not require unique values in either table in a relationship.
What is cross-filter direction?
Data can be filtered on one or both sides of a relationship.
Explain the two different approaches to cross-filtering?
With a single cross-filter direction:
Only one table in a relationship can be used to filter the data. For instance, Table 1 can be filtered by Table 2, but Table 2 cannot be filtered by Table 1.
For a one-to-many or many-to-one relationship, the cross-filter direction will be from the “one” side, meaning that the filtering will occur in the table that has many values.
With both cross-filter directions or bi-directional cross-filtering:
One table in a relationship can be used to filter the other. For instance, a dimension table can be filtered through the fact table, and the fact tables can be filtered through the dimension table.
You might have lower performance when using bi-directional cross-filtering with many-to-many relationships.
What is best practice for relationships and cardinality?
A word of caution regarding bi-directional cross-filtering: You should not enable bi-directional cross-filtering relationships unless you fully understand the ramifications of doing so. Enabling it can lead to ambiguity, over-sampling, unexpected results, and potential performance degradation.
Arrows should point to fact tables
many-to-many relationships and/or bi-directional relationships are complicated. Unless you are certain what your data looks like when aggregated, these types of open-ended relationships with multiple filtering directions can introduce multiple paths through the data.
Explain the link between cardinality and cross-filter direction?
One-to-one relationships:
The only option available is bi-directional cross filtering
Data can be filtered on either side of the relationship and result in only one value (e.g. they both return the same value both sides of the relationship)
Many-to-many relationships:
you can choose to filter in a single direction or in both directions by using bi-directional cross-filtering
The ambiguity that is associated with bi-directional cross-filtering is amplified in a many-to-many relationship because multiple paths will exist between different tables.
If you create a measure, calculation, or filter, unintended consequences can occur where your data is being filtered and, depending on which relationship that the Power BI engine chooses when applying the filter, the final result might be different.
What are modelling challenges?
When the relationships established in the data model are failing to interact with visualisation
What is a circular relationship?
In the world of relational databases circular references are schema structures where foreign keys relating the tables create a loop. Circular references cause special types of issues when trying to synchronize two relational database where the foreign keys are enforced
This web of relationships is difficult to manage and becomes a daunting task to build visuals because it is no longer clear what relationships exist. Therefore, it is important that you are able to identify circular relationships so that your data is usable.
If there are circular references (“loops”) in a data structure, the tables are associated in such a way that there is more than one path of associations between two fields.
This type of data structure should be avoided as much as possible, since it might lead to ambiguities in the interpretation of data.
What is a relational dependency?
When a field depends on other fields to function correctly
For example TotalSales depends on Quantity x Price
Name the types of relationship dependencies between tables?
you can also have dependencies between Columns Measures Tables Relationships
What data modelling capabilities exist in Power BI?
- 1) creating relationships
- 2) Hierarchies
- 3) Calculations
When is DAX used?
- As soon as the data is loaded into the data model
- DAX is PBI’s native query language
What is a data model?
- One or more tables…potentially with relationships
Why are relationships important for reports?
- You must establish relationships before you use visuals and calculations that query different tables
What happens when queries are loaded?
- Once queries are loaded into Power BI desktop they become tables in Power BI
- Tables can then be organised into different data models
- Different data model types are called schemas
What are 3 of the most common data model types?
- Data models aka schemas
- 1) Flat (fully denormalised) schema
- 2) Star schema
- 3) Snowflake schema
What are flat (fully denormalised) schemas?
- All attributes are denormalised into a single table (e.g., one table with no relationship)
- Flat schemas often used when connecting to a single simple source (e.g., a table)
Evaluate flat schemas?
Advantages
- very performant and very efficient
- most cases you don’t have to establish relationships (no need for keys)
Negative
- Single tables can be cumbersome and difficult to navigate
- Columns and data can often be duplicated leading to large file size (data redundancy)
- Mixing facts with different grains results in more complex DAX formulas
- Avoid flat schema for complex data models
What are some characteristics of fact tables?
Called Fact or data tables
- 1) Metrics you want to aggregate
- 2) Have Foreign Keys that are required to create relationships with dimension tables
What are some characteristics of dimension tables?
Called Dimension or Lookup tables
- 1) Contain attributes that help you slice and dice fact tables
- 2) Contains a Primary Key (key column + descriptive columns)
How many fact tables are in a star schema?
- It is possible to have more than one fact table
Evaluate star schemas?
Advantages
- Preferred over star schemas due to their shortcomings (e.g., cumbersome)
- Fields are logically grouped so the model is easier to understand
- There is less duplication resulting in more efficient table storage
- You don’t need to write complex DAX formulas to work with fact tables with different fact grains
What is a snowflake schema?
- Similar to a star schema but has an extra dimension tables that “snowflake” from other dimension tables
When would you use a snowflake schema?
- Use when there are fact tables with different grains
Can you configure all tables?
- It depends on the storage mode
- Import mode allows more configurability
How can you configure table properties?
- Name
- Description: stored in models metadata (see it when you hover over the table)
- Synonyms: can add synonyms to help Q&A recognise the table
- Row label: tells which column values will serve as the row label for the table (e.g., ProductName column being the row label for the product table)
- Key column: this is for primary values
- Is hidden: can hide a table from fields pane
- Is featured table: could be a featured table which will allow it to be used in Excel
- Storage mode
How can you configure column properties?
- Name
- Description: stored in models metadata (see it when you hover over the table)
- Synonyms: can add synonyms to help Q&A recognise the column (SalesTargets may not be picked up by Q&A but Sales would)
- Display folder: group columns from same table into folders
- Is hidden: hide columns
- Data type: different data types to Power Query (e.g., percentage, Timezone, Location)
- Format: different data types have different format properties (e.g., percentage > decimal places)
- Sort by column: sort names by numbers (i.e., give ordinality to nominal labels)
- Data category: default is uncategorised…depending on data type (e.g., Location) you can select City, Continent etc
- Summarise by: determines how column is aggregated when you put it in the visual
- Is nullable: can disallow null values for a column (be careful because this can cause the refresh to fail if a null appears)
What does summarise by mean?
- Summarise by: determines how column is aggregated when you put it in the visual
- Option depends on data type (e.g., statistical functions for numbers or counts for text)
- Default is Don’t Summarise/None
- Power BI attempts to do this automatically but is not always correct
What are the troubleshoots with is nullable?
- Is nullable: can disallow null values for a column (be careful because this can cause the refresh to fail if a null appears)
How can you configure measures properties?
The same as column properties except
- Sort by column
- Summarise by
- Is nullable
What is the difference between FORMAT in DAX and formatting a column?
- DAX creates a new column and always outputs text
- Formatting a column retains the original data type
What is a measure?
- A measure is a dynamic evaluation of a DAX query that will change in response to interactions with other visuals
- This allows you to analyse your data further
What are the different ways to produce a quick measure?
- Right click on field or table
- Select Quick Measure from Home ribbon
- Select drop-down arrow next to the field
What calculations categories are available in quick measures?
- Aggregation per category
- Filters
- Time intelligence
- Totals
- Mathematical operations
- Text
What is a troubleshoot with hidden fields?
- They will not sure up in the Quick Measure dialog box
What are parent-child hierarchies?
Parent-child hierarchies are defined by two columns
- Node key = child only has one parent
- Parent key = parent can have many children
What are role-playing dimensions?
- Occurs when there is more than one way to filter a fact table by a dimension (e.g., data table can be filtered by two different date columns)
- You can have multiple physical relationships but only can be active (active relationship propagates filters)
- The functionality of role-playing dimensions will be configured for business requirements
Explain autodetect?
- Autodetect automatically detects relationships once you load data
- Looks for identical column names
- Can go wrong so you can turn it off (Options > Current File > Data load)
What information is displayed by a relationship in model view?
- Arrow (cardinality: one or many)
- Line (active/inactive: straight or dotted)
- Cross filter (filter propagation)
What is the difference between Merge and establishing relationships?
- Relationships established in PBI desktop and Merge is done in Power Query
- Merge can use multiple columns relationships are restricted to just one column
What is cardinality?
1) Define relationships
- Many = key appears more than once in the column
- One = key appears only once
2)
- Refers to the number of distinct values in a column
Explain and evaluate one-to-one cardinality?
- One-to-one is a special kind of relationship that only appears once on both sides of the relationship
- Useful for splitting a single dimension table with many columns into separate tables
- Only use if you are confident there will be no duplicates (duplicates would cause immediate errors in your data model)
What is the cross-filter direction for many-to-one and one-to-many relationships?
- They can use single and both
What is ‘Apply security filter in both directions’?
- Apply security filter in both directions enables the flow of row-level security in both directions
What is a common troubleshoot of cross-filter direction?
- A table may display just one value
- This is usually because the cross-filter direction is not enable both ways
What two things should you have in mind when performance tuning a data model?
- Business requirements: speed, security etc
- Constraints of data source: storage mode, transformations etc
What factors affect report performance?
- Storage mode
- Relationships
- Aggregations
- Cardinality
Explain how composite mode impacts performance?
- Relationships perform differently depending on the storage mode of the related tables
Explain how aggregations impacts performance?
- When using DirectQuery you can import some of the summarised data
- This is so some of the most frequently queried data stays in-memory
- More detailed data queried from underlying source
- This feature is called aggregations
Explain the island analogy?
- Import mode = resources on same island (i.e., it is queried from in-memory)
- Direct Query = have to swim to nearby island to get resources
Rank the relationships from fastest to slowest?
- One-to-many intra-island relationships
- Direct many-to-many relationships
- Many-to-many relationships with bridge tables
- Cross-island relationships
How does cardinality impact performance?
- Power BI imports table in columns not rows
- In general the fewer distinct values there are the better the performance
What are the most common reasons why many-to-many relationships occur?
- Many-to-many relationships between dimensions
- Relationships between tables at different granularities
Evaluate the different kinds of many-to-many relationships?
Direct many-to-many relationships
- This method performs well when the number of unique values on each side of the relationship is fewer than 1000
- Cannot use the RELATED function
Bridge table
- More efficient method when more than 1000 unique values
What is a bridge table?
- A table that allows you to create one-to-many relationships with each table that is in a many-to-many relationship
What are the different kinds of bridge table?
- One column with unique values: bridge table on the one side of the relationship (normally used for relating fact tables with different grains)
- Two column table with unique combination of values: the bridge table is on the many side of each relationship (commonly used for many-to-many relationships between dimensions)
Why is it best practice to create your own date table?
- Can use a calendar other than Gregorian
- You can have weeks in the calendar
- Filter multiple fact tables by using a single date dimension table
- Can configure to organisations preference (e.g., mmm-yy)
How to create a date table in Power Query?
- You can use the List.Date function
- Returns a list of dates > convert list to table > add columns
How can you create a date table in DAX?
- CALENDARAUTO = scans your model for dates and returns an appropriate date range automatically
- CALENDAR = requires you to provide start and end date
What is data granularity?
- Date grain refers to the level of detail a table can provide
Describe a common data granularity problem?
- For unsupported levels of granularity it won’t return any meaningful results (will return repeat value)
- For example: if you filter a date table by months instead of days
- Can use ISFILTERED function
What is data model development?
- The enhancements you make after loading your data model and creating relationships between tables
What is a common troubleshoot with a bi-directional cross-filter?
- The security filter will not be applied in both direction despite it being bi-directional
Evaluate ‘apply security filter in both directions’?
- It is by default set in one direction (security filter is passed on to filtered table)
- This setting affects performance of the data model so in some cases it may be undesirable
What are calculate tables?
- They are defined by DAX and can only be created once data is loaded into the data model or on new data generated by DAX
What are the use cases for calculated tables?
- Cloning tables
- Creating tables from different data sources
- Pre-calculating measures to improve performance
How can you clone a table?
- Data View > New table > “Clone table name” = ‘Existing table’
How can you create tables from different sources?
Use DISTINCT and UNION
New table = DISTINCT( UNION(
DISTINCT( Table[Column]),
DISTINCT(Table2[Column]) ) )
Why might you create tables from different sources?
- You can extract distinct values from different tables into one calculated table (improves performance instead of being in different islands)
- Use instead of bridge tables
Why might you precalculate measures?
- Some complex measures perform poorly
- You can precalculate them in a calculated table and then create measures that aggregate the precalculated values
- Usually works for additive measures
- Aggregations are an example of calculated tables
Explain the characteristics of hierarchies?
- One column can be part of multiple hierarchies
- Only use columns in same table
Why might you not use hierarchies?
- You can achieve the same result by dragging multiple fields into a table
- May be extra hassle to set up hierarchy
What main components are involved with developing data models?
- Hierarchies
- Quick measures
- Calculated table
- Calculated column
- Row-level security
- Relationships (cardinality, cross-filter, aggregations and storage mode)
What are calculated columns?
- DAX expression similar to calculated tables
- They add columns which widen the data tables and could slow down performance of the data model
Name some attributes of DAX?
- It is columnal and there is no concept of a cell
- To get to a specific value you must filter the column
- DAX is strongly typed so it is not possible to have different data types in one column
Should you use a calculated column or a measure?
- Should only use calculated column when you also want to use the generated values from a calculated column as filters or categories
What is row-level security?
- Row-level security restricts data by filtering at the row-level depending rules defined for each user
- Different report users can see different subsets of data
How to use row-level security?
- Define each role in Power BI desktop
- Assign individual users or Active Directory security groups to the roles in the Power BI service
Where does defining roles apply to?
- Only works for imported data and DirectQuery
- If you connect to a Power BI dataset or an Analysis Service data model Power BI will rely on row-level security configured at the source (which you cannot override by creating roles in Power BI desktop)
How do you configure row-level security?
Report View > Modelling > Manage roles
- Create new role (with user-friendly name)
- Duplicate
- Rename
- Delete
Explain how to use DAX expressions with row-level security?
DAX: for each role you can define a DAX expression
- Define a DAX expression for each role
- These expressions are evaluated against each rows of the relevant table
- ONLY those that are evaluated as TRUE will be visible
- A single user can have multiple security roles
Menu offers 3 options:
- Add filter
- Copy table filter from
- Clear table filter
- Validate expression = Verify DAX Expressions
- N.b. if you duplicate a role before it is verified the table filter will not be copied over to the new role
Describe the application of role-level security?
- It will only be applied at query time
- This means that the values are already filtered by the time they go into the data model
- Measures will therefore be based on the filtered rows
What is dynamic row-level security?
- Allows you to show different data to different users within the same role
Evaluate dynamic row-level security?
- Can use a single role which is preferable to large-scale implementation where there are many report users (won’t have to add or remove roles to the data model)
-
How to use dynamic row-level security?
- USERNAME = domain and login of the user
- USERPRINCIPLENAME = depending on how the active directory was set up the function usually returns the email address of the user
What is the Q&A feature?
- In Power BI desktop and service you can create visuals by asking specific questions
How to teach the Q&A feature?
- Terms not recognised appear in read
- To define terms select the ones not recognised and enter a new field
- This can be time consuming
Why would you use synonyms?
- Very good for keeping language consistent
- For example a firm may say units instead of quantity
- Enter singular