Modeling the Data Flashcards
By using DAX, you can add 3 types of calculations to your data model:
- Calculated tables
- Calculated columns
- Measures
How can DAX be used to enforce filters over model tables
By using RLS
The formula can duplicate or transform existing model data or create a series of data to produce a new table and always imported into your model?
Calculated tables
Cons of Calculated Tables
Increases model storage size and can prolong data refresh time
Calculated tables are useful in
- Date tables
- Role playing dimensions
- What-if analysis
What table is required to apply special time filters known as time inteligence
Date tables
When your source data doesnt include a date table, you can create one as calculated tables by using the
CALENDAR or CALENDARAUTO function
In this example, the MinDate and MaxDate in the data model are July 1, 2010 and June 30, 2011.
CALENDARAUTO() will return
all dates between January 1, 2010 and December 31, 2011.
In this example, the MinDate and MaxDate in the data model are July 1, 2010 and June 30, 2011.
CALENDARAUTO(3) will return
CALENDARAUTO(3) will return all dates between April 1, 2010 and March 31, 2012.
*Hint: July(7)-3 = Apr(4)
June(6) -3 = March(3)
What-if parameters allow report users to
select or filter by values that are stored in the calculated table. For example, awhat-ifparameter could allow the report user to select a hypothetical currency exchange rate, and a measure could divide revenue values (in a local currency) by the selected rate.
Formula is evaluated for each table row and it returns a single value
Calculated Columns
Concerned with achieving summarisation over model data and returns a single value.
Measures
_____ are evaluated at query time and their results are never stored in the model
Measures
Compound Measure
When a measure references one or more measures, it’s known as compound measure
Implicit vs Explicit Measures
- Explicit: model calculations that are written in DAX
- Implicit measures: columns that can be summarised by visuls in simplistic ways like counts, sum, minimum, maximum, etc. Implicit measures are noted by $\sum$
The DIVIDE function
TheDIVIDEfunction automatically handles division by zero cases. If an alternate result isn’t passed in, and the denominator is zero or BLANK, the function returns BLANK.
You’re developing a data model in Power BI Desktop. You’ve just added a date table by using the CALENDARAUTO function. You’ve extended it with calculated columns, and you’ve related it to other model tables. What else should you do to ensure that DAX time intelligence calculations work correctly?
A. Add time intelligence measures to the date table
B. Mark as a Date table
C. Add a fiscal hierarchy
D. Add a date column
Mark as a Date table so that Power BI can correctly filter its dates.
Difference between calculated column and a measure
- Calculated columns extend a table with a new column, while measures define how to summarize model data.
- Calculated columns are evaluated by usingrow contextat data refresh time, while measures are evaluated by usingfilter contextat query time.
- Calculated columns (in Import storage mode tables) store a value for each row in the table, but a measure never stores values in the model.
- Calculated columns (like any column) can be used to filter, group, or summarize (as an implicit measure), whereas measures are designed to summarize
Row context
The formula for a calculated column is evaluated for each table row.
- Row context doesn’t extend beyond the table. If your formula needs to reference columns in other tables, you have 2 options
- f the tables are related, directly or indirectly, you can use the[RELATED]
or
[RELATEDTABLE]DAX function. The
RELATEDfunction retrieves the value at the one-side of the relationship, while the
RELATEDTABLEretrieves values on the many-side. The
RELATEDTABLE`function returns a table object.
- f the tables are related, directly or indirectly, you can use the[RELATED]
- When the tables aren’t related, you can use the
LOOKUPVALUE
Using DAX Time Intelligence Functions …
simplifies the task of modifying date filter context
Date Table Requirement:
- It must have a column of data type Date (or date/time), known as thedate column.
- The date column must contain unique values.
- The date column must not contain BLANKs.
- The date column must not have any missing dates.
- The date column must span full years. A year isn’t necessarily a calendar year (January-December).
- The date table must be indicated as a date table
DATESYTD
Returns a single-column table that contains dates for the year-to-date (YTD) in the current filter context.
DATESBETWEEN
Returns a table that contains a column of dates that begins with a given start date and continues until a given end date.
DATESINPERIOD
Returns a table that contains a column of dates that begins with a given start date and continues for the specified number of intervals.
Define Rules
Row context means the expression is evaluated for each row using the column values of that row. When the expression returns TRUE, the user can “see” the row. You can define rules that are either static or dynamic
‘Region’[Region] = “Midwest”
Static Rules
Dynamic Rules
uses specific DAX functions that return environmental values
USERPRINCIPALNAME
USERPRINCIPALNAME function always returns the user in the user principal name format.
Object level security (OLS) can restrict…
access to specific tables and columns and secure their metadata
Restrictions in OLS
- You cannot mix RLS and OLS in the same role
- If you need to apply RLS and OLS in the same model, create separate roles dedicated to each type.
Good Modeling Practices on the number of datasets, number of roles, rules, model design, USERPRINCIPALNAME/USERNAME
- Strive to define fewer datasets (data models) with well-designed roles.
- Strive to create fewer roles by using dynamic rules. A data-driven solution is easier to maintain because you don’t need to add new roles.
- When possible, create rules that filter dimension tables instead of fact tables. It will help to deliver faster query performance.
- Validate that the model design, including its relationships and relationship properties, are correctly set up.
- Use the
USERPRINCIPALNAME
function instead ofUSERNAME
function. It provides consistency when validating the roles in Power BI Desktop and the Power BI service. - Rigorously validate RLS and OLS by testing all roles.
- Ensure that the Power BI Desktop data source connection uses the same credentials that will be applied when set up in the Power BI service.
Star Schema Design
- presents a user-friendly structure and supports high performance analytic queries
- it classifies model tables as either fact or dimension.
- The fact table forms the center of a star and the dimension table represents the points of the star
Fact Tables
- store an accumulation of rows that represent observations or event sthat record a specific business activity
- It is not GCP to relate a fact table directly to another fact table
Dimension Tables
- describe your business entities
- The columns in dimension tables allow filtering and grouping of fact table data
- Each dimension table must have a unique column which is referred to as its key column
- Dimension tables can be used to filter multiple fact tables and fact tables can be filtered by multiple dimension tables.
Analytic Queries
produces a result from a data model. Each power bi visual in the background submits an analytic query to Power BI to query the model
An analytic query has 3 phases that are implemented in the following order:
- Filter
- Group: divides query results into groups
- Summarise: produces a single value result. Typically numeric columns are summarised by using summarisation methods. These methods are simple summarisation.
3 Different Model Resources that are fields to configure visuals which in the background configures an analytic query:
- Columns: Filter, Group and Summarise
- Hierarchy levels: Filter and Group
- Measures: Filter and Summarise (cannot be used to group data)
Table Storage Mode:
- Import– Queries retrieve data that’s stored, or cached, in the model.
- DirectQuery– Queries pass through to the data source.
- Dual– Queries retrieve stored data or pass through to the data source. Power BI determines the most efficient plan, striving to use cached data whenever possible.