Model - Optimise model performance Flashcards
How does cardinality impact model performance?
- Power BI uses a columnstore database
- This means that it is the number of distinct values that matters not the number of rows
Example
- You could have 3 distinct values or 300 rows but it is the distinct values that impacts performance
Name some data loading best practice?
- Only include data that is needed for reporting
- You can add more data later
- Only load queries that you need
What is the best practice way to remove rows?
- Necessary only = only load data necessary for you analysis (e.g., UK sales for UK demand analysis instead of global sales)
- Use Parameters = can use parameters to make this more manageable AND also makes it possible to change filters once the dataset is published in the service
- Filter dates = only uses time-period you want (i.e., usually the most recent time period)
- Filtering rows after you create reports won’t break any visuals in existing reports
What is the best practice way to remove columns?
- Remove primary keys for fact tables = some data warehouses contain primary keys for fact tables…primary keys in fact tables can occupy up to 50% of a data model without any benefit
- Use COUNTROWS = this is a more efficient way of counting rows in a fact table
- Removing columns will break data report visuals
What do columns in a data model do?
Columns in a data model serve one of two purposes: a) support visuals or calculations b) support both
How can you prevent new columns from breaking reports?
- Use the ‘remove all other columns’ function in Power Query
What is Performance Analyser?
- A feature that allows you to track performance bottlenecks
- Can track slow-performing visuals and see the DAX queries behind them
How does Performance Analyser work?
- It works by recording traces and then telling you how long each visual took to render
Explain how to use Performance Analyser?
- View > Performance analyser
- Start recording > perform actions (e.g., apply filters, refresh visuals) > see how long it takes it visual to render
- Identify the slowest visuals in the Performance Analyser pane in (ms)
- Each visual has a DAX query behind it which you can copy by expanding the line of the visual in Performance Analyser
How can you improve cardinality by changing data types?
- You can decrease the data size by reducing the cardinality of columns by changing data types
In Power BI two data types can be used for decimal numbers:
- Decimal number: can store more than four decimal numbers
- Fixed decimal numbers: can store only four
HOW:
- 1) If you don’t need more than four decimal places then change your data type to Fixed Decimal numbers
- 2) Split decimals into two columns (as close to data source as possible) this can decrease cardinality and decrease data model size and then aggregate the two using SUMX
- 3) Split Date/Time this will increase the number of duplicates but reduce cardinality
Recognise which columns could be split to reduce cardinality
How can you improve cardinality through summarisation?
- If your report contains too much detail (you want months but have daily) then you can reduce cardinality by summarising the data
- This can reduce size of the data model but make analysis of the lower grain impossible and this trade-off depends on business requirements
HOW
- Summarise data as close to the data source as possible
- Can use the Group by function in the Transform ribbon
How can aggregations impact data model performance?
- By using composite mode you can optimise the performance of data sources that use DirectQuery mode by using aggregations (use it as a transform)
How can you create an aggregation table?
- Aggregation table has to have some of the same keys and columns you want to aggregate (doesn’t have to come from data source)
3 ways include
- Import M query
- Calculated table in DAX
- Another table in DirectQuery mode
How can you manage an aggregation table?
Fields pane > Manage aggregations
For each column you select the appropriate summarisation and corresponding details table:
- Count
- GroupBy: used for relationship keys
- Max: correspond to max value
- Min: corresponds to min value
- Sum
- Count table rows
Select apply all
What happens if your queries cannot be resolved using an aggregation table?
In case the aggregation table is insufficient the detail table will be used