Model - Optimise model performance Flashcards

1
Q

How does cardinality impact model performance?

A
  • Power BI stores data in a columnstore database (the VertiPaq engine)
  • This means that it is the number of distinct values in a column that matters, not the number of rows

Example
- A column could have 300 rows but only 3 distinct values; it is the distinct-value count, not the row count, that impacts performance

2
Q

Name some data loading best practices.

A
  • Only include data that is needed for reporting
  • You can add more data later
  • Only load queries that you need
3
Q

What is the best practice way to remove rows?

A
  • Necessary only = only load data necessary for your analysis (e.g., UK sales for a UK demand analysis instead of global sales)
  • Use parameters = parameters make these filters more manageable and also make it possible to change them after the dataset is published to the service
  • Filter dates = only load the time period you need (usually the most recent one)
  • Filtering rows after you create reports won’t break any visuals in existing reports
4
Q

What is the best practice way to remove columns?

A
  • Remove primary keys from fact tables = some data warehouses include primary key columns in fact tables; these can occupy up to 50% of the data model’s size without providing any benefit
  • Use COUNTROWS = this is a more efficient way of counting rows in a fact table than counting a key column
  • Removing columns that existing report visuals use will break those visuals
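The COUNTROWS point can be sketched in DAX. This is a minimal illustration; the `Sales` table and `SalesKey` column are hypothetical names, not from the source:

```dax
-- Count fact-table rows directly; no key column needed, so the
-- (large, high-cardinality) primary key can be removed from the model.
Sales Count = COUNTROWS ( Sales )

-- Less efficient alternative that forces you to keep a key column:
Sales Count via Key = COUNT ( Sales[SalesKey] )
```

Both measures return the same number when the key has no blanks; the COUNTROWS form simply avoids any dependency on a column you could otherwise delete.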
5
Q

What do columns in a data model do?

A

Columns in a data model serve one of two purposes: supporting visuals or supporting calculations (some columns do both).

6
Q

How can you prevent new columns from breaking reports?

A
  • Use the ‘Remove Other Columns’ function in Power Query, so that columns added to the source later are not loaded into the model automatically
7
Q

What is Performance Analyser?

A
  • A feature that allows you to track performance bottlenecks
  • Can track slow-performing visuals and see the DAX queries behind them
8
Q

How does Performance Analyser work?

A
  • It works by recording traces and then telling you how long each visual took to render
9
Q

Explain how to use Performance Analyser?

A
  • View > Performance analyser
  • Start recording > perform actions (e.g., apply filters, refresh visuals) > see how long each visual takes to render
  • Identify the slowest visuals in the Performance analyser pane (durations are shown in milliseconds)
  • Each visual has a DAX query behind it, which you can copy by expanding the visual’s line in Performance analyser
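A query copied from Performance analyser is a standalone DAX query you can paste into DAX Studio. As a rough sketch of the shape such a query takes (the table and column names here are illustrative, not from the source):

```dax
-- Illustrative shape of the query behind a visual that shows
-- total sales by year. 'Sales' and 'Date' are hypothetical tables.
EVALUATE
SUMMARIZECOLUMNS (
    'Date'[Year],
    "Total Sales", SUM ( Sales[Amount] )
)
```

Running the copied query outside the report lets you time and tune it independently of rendering.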
10
Q

How can you improve cardinality by changing data types?

A
  • You can decrease the data model size by reducing the cardinality of columns; changing data types is one way to do this

In Power BI, two data types can be used for decimal numbers:

  • Decimal number: can store more than four decimal places
  • Fixed decimal number: can store at most four decimal places

HOW:

  • 1) If you don’t need more than four decimal places, change the data type to Fixed decimal number
  • 2) Split a decimal column into two columns (as close to the data source as possible); each part has lower cardinality than the original, which decreases the data model size, and you can recombine the two in a measure using SUMX
  • 3) Split Date/Time into separate Date and Time columns; each column contains more duplicates (lower cardinality) than the combined column

Recognise which columns could be split to reduce cardinality
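The split-column technique can be sketched as follows. This is a minimal illustration under assumed names: `Sales[AmountInt]` and `Sales[AmountFrac]` stand for the whole-number and fractional parts of an original `Sales[Amount]` column, split near the source:

```dax
-- Recombine the two lower-cardinality columns in a measure.
-- Each part compresses better than the original decimal column.
Total Amount =
SUMX ( Sales, Sales[AmountInt] + Sales[AmountFrac] )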

11
Q

How can you improve cardinality through summarisation?

A
  • If your report contains more detail than you need (e.g., you want monthly figures but have daily rows), you can reduce cardinality by summarising the data
  • This can reduce the size of the data model but makes analysis at the lower grain impossible; this trade-off depends on business requirements

HOW

  • Summarise data as close to the data source as possible
  • Can use the Group by function in the Transform ribbon
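The grouping that the Power Query "Group by" step performs can also be sketched as a DAX calculated table, shown here only to illustrate the shape of the result; summarising in Power Query, closer to the source, remains the recommended approach. Table and column names are hypothetical:

```dax
-- Daily Sales rows summarised to month grain: one row per
-- (Year, Month) with the detail rows aggregated away.
Sales by Month =
SUMMARIZECOLUMNS (
    Sales[Year],
    Sales[Month],
    "Total Amount", SUM ( Sales[Amount] )
)
```

The summarised table has far fewer rows and lower cardinality, at the cost of losing the daily grain.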
12
Q

How can aggregations impact data model performance?

A
  • In composite mode, you can optimise the performance of data sources that use DirectQuery by adding aggregations (pre-summarised tables that can answer queries instead of the DirectQuery detail table)
13
Q

How can you create an aggregation table?

A
  • An aggregation table must contain some of the same keys and the columns you want to aggregate (it doesn’t have to come from the same data source)

3 ways include

  • Import M query
  • Calculated table in DAX
  • Another table in DirectQuery mode
14
Q

How can you manage an aggregation table?

A

Fields pane > Manage aggregations

For each column you select the appropriate summarisation and corresponding details table:

  • Count
  • GroupBy: used for relationship keys
  • Max: corresponds to the maximum value
  • Min: corresponds to the minimum value
  • Sum
  • Count table rows

Select Apply All

15
Q

What happens if your queries cannot be resolved using an aggregation table?

A

If a query cannot be resolved by the aggregation table, Power BI falls back to the detail table.

16
Q

What are the benefits of efficient data models?

A

Fast running queries in the report

Fast running measures and calculations

17
Q

What is the cause of bad model performance?

A

Bad performance is usually a result of:
Bad data models (too much unnecessary data)
Bad DAX (too many unnecessary row calculations)
Mix of the two

18
Q

Name the characteristics of poor report performance. What is the consequence of this?

A

From a report user’s perspective, poor performance is characterized by:
Report pages taking too long to load
Visuals not responding quickly enough
Tables not refreshing quickly enough
Large files also put a strain on an organisation’s resources
This poor performance results in a negative user experience.

19
Q

Name the query optimisation process steps:

A

Minimising the size of the data model
Deleting unnecessary columns and rows
Summarising data where possible (review the model metadata to ensure correct data types are applied, as summarisation depends on them)
Reducing cardinality
Replacing numeric columns with measures
Avoiding repeated values

20
Q

What are the limitations of data model optimisation practices?

A
  • Need for a data engineer: keep in mind that, while the basic performance and best-practice guidance in Power BI will take you a long way, optimising a data model for query performance will likely require partnering with a data engineer to drive optimisation in the underlying data sources
21
Q

Explain how data model size can impact performance?

A
  • Power BI desktop file size (.pbix)
  • The data that you load into Power BI Desktop is compressed and stored to the disk by the VertiPaq storage engine.
  • The size of your data model has a direct impact on its performance; a smaller sized data model uses less resources (memory) and achieves faster data refresh, calculations, and rendering of visuals in reports.
22
Q

When is a report likely to have poor performance?

A
If your data model has:
 multiple tables
 complex relationships
 intricate calculations
 multiple visuals
 redundant data
23
Q

How to run performance analyser?

A

To ensure you get the most accurate results in your analysis (test) you need to clear the visual cache and data engine cache

Visual cache – (a) When you load a visual, you can’t clear this visual cache without closing Power BI Desktop and opening it again. To avoid any caching in play, you need to start your analysis with a clean visual cache. (b) To ensure that you have a clear visual cache, add a blank page to your Power BI Desktop (.pbix) file and then, with that page selected, save and close the file. Reopen the Power BI Desktop (.pbix) file that you want to analyze. It will open on the blank page.

Data engine cache: (a) When a query is run, the results are cached, so the results of your analysis will be misleading. You need to clear the data cache before rerunning the visual. (b) To clear the data cache, you can either restart Power BI Desktop or connect DAX Studio to the data model and then call Clear Cache.

After you have cleared both caches, go to Performance analyser

Start recording and select the page of the report you want to analyse and select the elements you want to measure

When you are finished press stop

24
Q

How to review the performance analyser results?

A

Check the performance analyser pane

Sort the results

DAX query - The time it took for the visual to send the query, along with the time it took Analysis Services to return the results.

Visual display - The time it took for the visual to render on the screen, including the time required to retrieve web images or geocoding.

Other - The time it took the visual to prepare queries, wait for other visuals to complete, or perform other background processing tasks. If this category displays a long duration, the only real way to reduce this duration is to optimize DAX queries for other visuals, or reduce the number of visuals in the report.

25
Q

Explain how you can improve the performance of visuals?

A

DON’T reduce user experience: you should find a way to improve performance with minimal impact to user experience.

Use visuals sparingly: Consider the number of visuals on the report page; fewer visuals means better performance.

Use visual alternatives: Rather than using multiple visuals on the page, consider other ways to provide additional details, such as drill-through pages and report page tooltips.

Examine the number of fields in each visual.

120 milliseconds: Investigate DAX queries taking longer than 120 milliseconds

26
Q

What is the problem with Performance Analyser for DAX, and how can you solve it?

A

Performance Analyser highlights issues but does not tell you how to fix them.

You can use DAX Studio to investigate your queries in more detail.

Try different DAX functions to see whether they improve performance; choosing a more efficient function can significantly reduce the evaluation duration of a query.
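As an example of swapping functions, here are two equivalent measures over a hypothetical `Sales` table; this is an illustrative sketch, not a measure from the source:

```dax
-- Iterator version: FILTER materialises and scans the rows.
UK Sales (iterator) =
SUMX ( FILTER ( Sales, Sales[Country] = "UK" ), Sales[Amount] )

-- CALCULATE version: the filter is pushed to the storage engine,
-- which often evaluates noticeably faster on large tables.
UK Sales (calculate) =
CALCULATE ( SUM ( Sales[Amount] ), Sales[Country] = "UK" )
```

Timing both variants in DAX Studio is a typical way to confirm which formulation performs better on your model.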

27
Q

What is the auto date/time feature?

A

By default, this feature is enabled globally, which means that Power BI Desktop automatically creates a hidden calculated table for each date column, provided that certain conditions are met. The new, hidden tables are in addition to the tables that you already have in your dataset.

The Auto date/time option allows you to work with time intelligence when filtering, grouping, and drilling down through calendar time periods.

28
Q

Auto date/time best practice?

A

We recommend that you keep the Auto date/time option enabled only when you work with calendar time periods and when you have simplistic model requirements in relation to time.

If your data source already defines a date dimension table, that table should be used to consistently define time within your organization, and you should disable the global Auto date/time option. Disabling this option can lower the size of your data model and reduce the refresh time.

You can enable/disable this Auto date/time option globally so that it applies to all of your Power BI Desktop files, or you can enable/disable the option for the current file so that it applies to an individual file only.

29
Q

Why are variables important?

A

Some expressions involve the use of many nested functions and the reuse of expression logic. These expressions take a longer time to process and are difficult to read and, therefore, troubleshoot.

Using variables in your DAX formulas can help you write more complex and efficient calculations. Variables can improve performance, reliability, and readability, and reduce complexity.

30
Q

What are the benefits of variables?

A
  • Improve performance
  • Improve readability
  • Simplify debugging
  • Reduce complexity
31
Q

How do variables work?

A

The VAR keyword introduces the definition of a variable.

You can have as many variables as needed in a single expression, and each one has its own VAR definition.

The RETURN keyword defines the expression to return as the result. Inside the RETURN expression, you can use the variables, which are replaced by their computed values.
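The VAR/RETURN pattern can be sketched as follows; the measure, table, and date column names are hypothetical, chosen only to show the structure:

```dax
-- Each sub-expression is computed once, stored in a variable,
-- and reused in RETURN instead of being evaluated repeatedly.
Sales YoY Growth % =
VAR TotalSales = SUM ( Sales[Amount] )
VAR SalesPY =
    CALCULATE ( SUM ( Sales[Amount] ), DATEADD ( 'Date'[Date], -1, YEAR ) )
RETURN
    DIVIDE ( TotalSales - SalesPY, SalesPY )
```

Without variables, `SalesPY` would have to appear twice in the final expression (numerator and denominator), doubling both the evaluation cost and the surface for errors.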

32
Q

What are the two meanings of cardinality?

A

Cardinality is a term that is used to describe the uniqueness of the values in a column.

Cardinality is also used in the context of the relationships between two tables, where it describes the type of the relationship (one-to-one, one-to-many, or many-to-many).

33
Q

How can you identify cardinality levels in columns?

A

Low Cardinality: A column that has a lot of repeated values in its range (distinct count is low) will have a low level of cardinality.

High Cardinality: Conversely, a column that has a lot of unique values in its range (unique count is high) will have a high level of cardinality.

34
Q

What is the caveat of reducing cardinality levels (i.e., reduce the data loaded into models)?

A

Lose informational value: Deciding to summarize fact-type data will always involve a tradeoff with the detail of your data.

Lose drill-downs: A disadvantage is that you may lose the ability to drill into data because the detail no longer exists. This tradeoff could be mitigated by using a mixed model design.

35
Q

Best practice for reducing cardinality?

A

In Power BI Desktop, a Mixed mode design produces a composite model. Essentially, it allows you to determine a storage mode for each table. Therefore, each table can have its Storage Mode property set as Import or DirectQuery.

An effective technique to reduce the model size is to set the Storage Mode property for larger fact-type tables to DirectQuery. This design approach can work well in conjunction with techniques that are used to summarize your data.

36
Q

Factors impacting DirectQuery query performance?

A

Underlying data source

Network latency; faster networks return data quicker.

The performance of the data source’s server and how many other workloads are on that server. For example, consider the implications of a server refresh taking place while hundreds of people are using the same server for different reasons.

37
Q

What are the limitations of DirectQuery connections?

A

Performance - As previously discussed, your overall user experience depends heavily on the performance of the underlying data source.

Security - If you use multiple data sources in a DirectQuery model, it is important to understand how data moves between the underlying data sources and the associated security implications. You should also identify if security rules are applicable to the data in your underlying source because, in Power BI, every user can see that data.

Data transformation - Compared to imported data, data that is sourced from DirectQuery has limitations when it comes to applying data transformation techniques within Power Query Editor. For example, if you connect to an OLAP source, such as SAP BW, you can’t make any transformations at all; the entire external model is taken from the data source. If you want to make any transformations to the data, you will need to do this in the underlying data source.

Modelling - Some of the modelling capabilities that you have with imported data aren’t available, or are limited, when you use DirectQuery.

Reporting – Almost all the reporting capabilities that you have with imported data are also supported for DirectQuery models, provided that the underlying source offers a suitable level of performance. However, when the report is published in Power BI service, the Quick Insights and Q&A features are not supported. Also, the use of the Explore feature in Excel will likely result in poorer performance.

38
Q

What are the steps in optimising data models?

A

Optimise the data source as much as possible (i.e., tune the database)

Take further action within Power BI Desktop using Performance analyser

39
Q

What is a query reduction option?

A

Power BI Desktop gives you the option to send fewer queries and to disable certain interactions that will result in a poor experience if the resulting queries take a long time to run. Applying these options prevents queries from continuously hitting the data source, which should improve performance.