Model - Optimise model performance Flashcards
How does cardinality impact model performance?
- Power BI uses a columnstore database
- This means that it is the number of distinct values that matters not the number of rows
Example
- You could have 3 distinct values or 300 rows but it is the distinct values that impacts performance
Name some data loading best practice?
- Only include data that is needed for reporting
- You can add more data later
- Only load queries that you need
What is the best practice way to remove rows?
- Necessary only = only load data necessary for you analysis (e.g., UK sales for UK demand analysis instead of global sales)
- Use Parameters = can use parameters to make this more manageable AND also makes it possible to change filters once the dataset is published in the service
- Filter dates = only uses time-period you want (i.e., usually the most recent time period)
- Filtering rows after you create reports won’t break any visuals in existing reports
What is the best practice way to remove columns?
- Remove primary keys for fact tables = some data warehouses contain primary keys for fact tables…primary keys in fact tables can occupy up to 50% of a data model without any benefit
- Use COUNTROWS = this is a more efficient way of counting rows in a fact table
- Removing columns will break data report visuals
What do columns in a data model do?
Columns in a data model serve one of two purposes: a) support visuals or calculations b) support both
How can you prevent new columns from breaking reports?
- Use the ‘remove all other columns’ function in Power Query
What is Performance Analyser?
- A feature that allows you to track performance bottlenecks
- Can track slow-performing visuals and see the DAX queries behind them
How does Performance Analyser work?
- It works by recording traces and then telling you how long each visual took to render
Explain how to use Performance Analyser?
- View > Performance analyser
- Start recording > perform actions (e.g., apply filters, refresh visuals) > see how long it takes it visual to render
- Identify the slowest visuals in the Performance Analyser pane in (ms)
- Each visual has a DAX query behind it which you can copy by expanding the line of the visual in Performance Analyser
How can you improve cardinality by changing data types?
- You can decrease the data size by reducing the cardinality of columns by changing data types
In Power BI two data types can be used for decimal numbers:
- Decimal number: can store more than four decimal numbers
- Fixed decimal numbers: can store only four
HOW:
- 1) If you don’t need more than four decimal places then change your data type to Fixed Decimal numbers
- 2) Split decimals into two columns (as close to data source as possible) this can decrease cardinality and decrease data model size and then aggregate the two using SUMX
- 3) Split Date/Time this will increase the number of duplicates but reduce cardinality
Recognise which columns could be split to reduce cardinality
How can you improve cardinality through summarisation?
- If your report contains too much detail (you want months but have daily) then you can reduce cardinality by summarising the data
- This can reduce size of the data model but make analysis of the lower grain impossible and this trade-off depends on business requirements
HOW
- Summarise data as close to the data source as possible
- Can use the Group by function in the Transform ribbon
How can aggregations impact data model performance?
- By using composite mode you can optimise the performance of data sources that use DirectQuery mode by using aggregations (use it as a transform)
How can you create an aggregation table?
- Aggregation table has to have some of the same keys and columns you want to aggregate (doesn’t have to come from data source)
3 ways include
- Import M query
- Calculated table in DAX
- Another table in DirectQuery mode
How can you manage an aggregation table?
Fields pane > Manage aggregations
For each column you select the appropriate summarisation and corresponding details table:
- Count
- GroupBy: used for relationship keys
- Max: correspond to max value
- Min: corresponds to min value
- Sum
- Count table rows
Select apply all
What happens if your queries cannot be resolved using an aggregation table?
In case the aggregation table is insufficient the detail table will be used
What are the benefits of efficient data models?
Fast running queries in the report
Fast running measures and calculations
What is the cause of bad model performance?
Bad performance is usually a result of:
Bad data models (too much unnecessary data)
Bad DAX (too many unnecessary row calculations)
Mix of the two
Name the characteristic of bad report performance? What’s the consequence of this?
From a report user’s perspective, poor performance is characterized by:
Loading pages into report taking too long
Visuals not responding fast enough
Tables not refreshing quickly enough
Large files put strain on organisations resources
The poor performance of a report leads to a negative user experience.
This poor performance results in a negative user experience.
Name the query optimisation process steps:
Minimising the size of the data model
Deleting unnecessary columns and rows.
Summarizing data where possible (must ensure correct data types are applied to enable this by analysing model metadata)
Reducing cardinalities.
Replacing numeric columns with measures.
Avoiding repeated values.
What are the limitations of data model optimisation practices?
- Need for a data engineer: keep in mind that, while the basic performance and best practices guidance in
- Power BI will lead you a long way, to optimize a data model for query performance, you will likely have to partner with a data engineer to drive data model optimizing in the source data sources.
Explain how data model size can impact performance?
- Power BI desktop file size (.pbix)
- The data that you load into Power BI Desktop is compressed and stored to the disk by the VertiPaq storage engine.
- The size of your data model has a direct impact on its performance; a smaller sized data model uses less resources (memory) and achieves faster data refresh, calculations, and rendering of visuals in reports.
When is a report likely to have poor performance?
If your data model has: multiple tables complex relationships intricate calculations multiple visuals redundant data
How to run performance analyser?
To ensure you get the most accurate results in your analysis (test) you need to clear the visual cache and data engine cache
Visual cache – (a) When you load a visual, you can’t clear this visual cache without closing Power BI Desktop and opening it again. To avoid any caching in play, you need to start your analysis with a clean visual cache. (b) To ensure that you have a clear visual cache, add a blank page to your Power BI Desktop (.pbix) file and then, with that page selected, save and close the file. Reopen the Power BI Desktop (.pbix) file that you want to analyze. It will open on the blank page.
Data engine cache: (a) When a query is run, the results are cached, so the results of your analysis will be misleading. You need to clear the data cache before rerunning the visual. (b) To clear the data cache, you can either restart Power BI Desktop or connect DAX Studio to the data model and then call Clear Cache.
After you have cleared cache go to performance analyser
Start recording and select the page of the report you want to analyse and select the elements you want to measure
When you are finished press stop
How to review the performance analyser results?
Check the performance analyser pane
Sort the results
DAX query - The time it took for the visual to send the query, along with the time it took Analysis Services to return the results.
Visual display - The time it took for the visual to render on the screen, including the time required to retrieve web images or geocoding.
Other - The time it took the visual to prepare queries, wait for other visuals to complete, or perform other background processing tasks. If this category displays a long duration, the only real way to reduce this duration is to optimize DAX queries for other visuals, or reduce the number of visuals in the report.