3B. Choose a Power BI model framework Flashcards
What is a Power BI data model?
A queryable data resource that’s optimized for analytics. Reports can query data models by using one of two analytic languages: Data Analysis Expressions (DAX) or Multidimensional Expressions (MDX). Power BI uses DAX, while paginated reports can use either DAX or MDX. The Analyze in Excel feature uses MDX.
What is a Power BI dataset?
You develop a Power BI model in Power BI Desktop, and once published to a workspace in the Power BI service, it’s then known as a dataset. A dataset is a Power BI artifact that’s a source of data for visualizations in Power BI reports and dashboards.
What is an analytic query?
Power BI reports and dashboards must query a dataset. When Power BI visualizes dataset data, it prepares and sends an analytic query. An analytic query produces a query result from a model that’s easy for a person to understand, especially when visualized.
An analytic query has three phases that are executed in this order:
- Filtering (sometimes known as slicing) narrows down to a subset of the model data. Filter values aren’t visible in the query result. Most analytic queries apply filters because it’s common to filter by a time period, and often by other attributes. Filtering happens in different ways. In a Power BI report, you can set filters at the report, page, or visual level. Report layouts often include slicer visuals to filter the visuals on the report page. When the model enforces row-level security (RLS), it applies filters to model tables to restrict access to specific data. Measures, which summarize model data, can also apply filters.
- Grouping (sometimes known as dicing) divides the query result into groups. Each group also acts as a filter, but unlike the filters applied in the filtering phase, its filter values are visible in the query result. For example, grouping by customer filters each group’s result to that customer.
- Summarization produces a single value result. Typically, a report visual summarizes a numeric field by using an aggregate function. Aggregate functions include sum, count, minimum, maximum, and others. You can achieve simple summarization by aggregating a column, or you can achieve complex summarization by creating a measure using a DAX formula.
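As a rough sketch of how the summarization and filtering phases surface in DAX (the Sales table and SalesAmount column are illustrative assumptions, not from this module):

```dax
-- Simple summarization: aggregate a column.
Total Sales = SUM(Sales[SalesAmount])

-- Complex summarization: a measure that also applies a filter.
-- CALCULATE adds the high-value condition to whatever filters and
-- groups the visual supplies at query time.
High Value Sales =
CALCULATE(
    [Total Sales],
    Sales[SalesAmount] > 1000
)
```

The grouping phase isn’t written into the measure: the visual that evaluates the measure supplies the groups, for example one group per customer.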
What is a tabular model?
A Power BI model is a tabular model. A tabular model comprises one or more tables of columns. It can also include relationships, hierarchies, and calculations.
What is a star schema design, and how does it relate to analytic queries?
A mature modeling approach widely adopted by relational data warehouses. It requires you to classify model tables as either dimension or fact. A fact table forms the center of a star, and the related dimension tables form the points of the star.
In an analytic query, dimension table columns are used to filter or group, and fact table columns are summarized.
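To make that mapping concrete, here’s a hypothetical analytic query against an assumed star schema (a Sales fact table related to Product and Date dimension tables; all names are illustrative):

```dax
EVALUATE
SUMMARIZECOLUMNS(
    'Product'[Category],                      -- dimension column: groups the result
    TREATAS({ 2024 }, 'Date'[Year]),          -- dimension column: filters (not visible in the result)
    "Total Sales", SUM(Sales[SalesAmount])    -- fact column: summarized
)
```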
What are the table storage modes?
Each Power BI model table (except calculated tables) has a storage mode property. The storage mode property can be Import, DirectQuery, or Dual, and it determines whether table data is stored in the model.
- Import – Queries retrieve data that’s stored, or cached, in the model.
- DirectQuery – Queries pass through to the data source.
- Dual – Queries retrieve stored data or pass through to the data source. Power BI determines the most efficient plan, striving to use cached data whenever possible.
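One way to check storage modes across a model (assuming a recent Power BI Desktop with the DAX query view, where the DAX INFO functions are available) is to list the model’s partitions, since each partition row includes its storage mode:

```dax
-- Returns one row per table partition; the mode column reflects
-- Import, DirectQuery, or Dual.
EVALUATE INFO.PARTITIONS()
```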
What is a model framework?
Table storage mode settings determine the model framework, which can be import, DirectQuery, or composite. The following units in this module describe each of these frameworks and provide guidance on their use.
- An import model comprises tables that have their storage mode property set to Import. That includes calculated tables, which you define with a DAX formula (see the sketch after this list).
- A DirectQuery model comprises tables that have their storage mode property set to DirectQuery, and they belong to the same source group. Source groups are described later in this module.
- A composite model comprises more than one source group.
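For reference, a calculated table is defined with a DAX formula that’s evaluated when the model refreshes. A minimal sketch, assuming a hypothetical Date dimension is needed (the date range is an illustrative assumption):

```dax
-- A calculated table: a Date dimension built entirely with DAX.
Date =
ADDCOLUMNS(
    CALENDAR(DATE(2020, 1, 1), DATE(2025, 12, 31)),
    "Year", YEAR([Date]),
    "Month Number", MONTH([Date])
)
```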
What are the benefits of Import models?
- Support all Power BI data source types, including databases, files, feeds, web pages, dataflows, and more.
- Can integrate source data. For example, one table sources its data from a relational database while a related table sources its data from a web page.
- Support all DAX and Power Query (M) functionality.
- Support calculated tables.
- Deliver the best query performance. That’s because the data cached in the model is optimized for analytic queries (filter, group, and summarize) and the model is stored entirely in memory.
In short, import models offer you the most options and design flexibility, and they deliver fast performance. For this reason, Power BI Desktop defaults to import storage mode when you “Get data.”
What are the limitations of Import models?
- Model size. Power BI imposes dataset size restrictions, which limit the size of a model.
- Data refresh. Imported data must be periodically refreshed. If you need data that’s always up to date, use DirectQuery or set up a real-time dataset.
What are the size limits to Power BI datasets, and how can you reduce their size?
When you publish the model to a shared capacity, there’s a 1-GB limit per dataset. When this size limit is exceeded, the dataset will fail to refresh. When you publish the model to a dedicated capacity (also known as a Premium capacity), it can grow beyond 10 GB, provided you enable the Large dataset storage format setting for the capacity.
You should always strive to reduce the amount of data stored in tables. This strategy helps to reduce the duration of model refreshes and speed up model queries. There are numerous data reduction techniques that you can apply, including:
- Remove unnecessary columns
- Remove unnecessary rows
- Group by and summarize to raise the grain of fact tables
- Optimize column data types with a preference for numeric data
- Prefer custom columns created in Power Query over calculated columns created in the model
- Disable Power Query query load
- Disable auto date/time
- Use DirectQuery table storage, as described in later units of this module.
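Before applying these techniques, it can help to find out where the rows and cardinality are concentrated. A minimal diagnostic sketch in DAX, assuming a hypothetical Sales table (high-cardinality columns, such as a transaction ID, compress poorly and are common removal candidates):

```dax
EVALUATE
ROW(
    "Fact rows", COUNTROWS(Sales),
    "Distinct order IDs", DISTINCTCOUNT(Sales[OrderID]),
    "Distinct order dates", DISTINCTCOUNT(Sales[OrderDate])
)
```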
What are the limits on data refresh of Import models, and what might determine if those limits are acceptable?
Imported data must be periodically refreshed. Dataset data is only as current as the last successful data refresh. To keep data current, you set up scheduled data refresh, or report consumers can perform an on-demand refresh.
Power BI imposes limits on how often scheduled refresh operations can occur. It’s up to eight times per day in a shared capacity, and up to 48 times per day in a dedicated capacity.
You should determine whether this degree of latency is tolerable. It often depends on the velocity (or volatility) of the data, and the urgency to keep users informed about the current state of data. When scheduled refresh limits aren’t acceptable, consider using DirectQuery storage tables, or creating a hybrid table. Or take a different approach, and create a real-time dataset instead.
What are incremental refreshes of Import models, and what are their benefits?
You must also consider refresh workload and duration. By default, to refresh a table, Power BI removes all of its data and reloads it. These operations can place an unacceptable burden on source systems, especially for large fact tables. To reduce this burden, you can set up the incremental refresh feature. Incremental refresh automates the creation and management of time-period partitions, and it intelligently updates only those partitions that require refresh.
When your data source supports incremental refresh, it can result in faster and more reliable refreshes, and reduced resource consumption of Power BI and source systems.
What are source groups?
A source group is a set of model tables that relate to a data source. There are two types:
- Import – Represents all Import storage mode tables, including calculated tables. There can only be one import source group in a model.
- DirectQuery – Represents all DirectQuery storage mode tables that relate to a specific data source.
What are some benefits of DirectQuery models?
- They can handle large data sources. Because DirectQuery tables don’t require refresh, they’re well suited to large data stores, like a data warehouse. It’s impractical and inefficient, if not impossible, to import an entire data warehouse into a model.
- They handle fast-changing data well. When the source data changes rapidly and users need to see current data, a DirectQuery model can deliver near real-time query results.
- They can enforce source RLS (row-level security). Instead of replicating RLS rules in your Power BI model, the source database can enforce its rules. This approach works only for some relational databases, and it involves setting up single sign-on for the dataset data source.
- Data security/sovereignty restrictions. If your organization has security policies that restrict data from leaving its premises, then it isn’t possible to import data. A DirectQuery model that connects to an on-premises data source may be appropriate. (You can also consider installing Power BI Report Server for on-premises reporting.)
- You can also create specialized datasets by making a local copy of a remote DirectQuery relational model and chaining them together. This approach lets you extend the remote model, for example by renaming its elements or adding measures. Note, however, that doing so turns your model into a composite model.
What are some limitations of DirectQuery models?
- Not all data sources are supported. Typically, only major relational database systems are supported. Power BI datasets and Azure Analysis Services models are supported too.
- Not all Power Query (M) transformations are possible, because queries must translate to native queries that the source system understands. For example, it’s not possible to use pivot or unpivot transformations.
- Analytic query performance can be slow, especially if source systems aren’t optimized (with indexes or materialized views), or there are insufficient resources for the analytic workload.
- Analytic queries can impact source system performance. The added load can result in a slower experience for all workloads, including OLTP operations.