Remaining PL-300 Learning Paths Flashcards
Dashboards
A single-page collection of visuals built from a report (created in Power BI Service!)
Tiles
Snapshots of your data on a dashboard
Building Blocks of Power BI
Visualisations, Datasets, Reports, Dashboards, Tiles
Where to update dataset for an app or other data that you use in Power BI
Power BI Service
Flat file
This is a type of file that has only one data table, and every row of data has the same structure. The file doesn’t contain hierarchies. Examples are csv files and txt files.
Cons of Importing Data by SQL Query
- It is best practice to avoid doing this directly in Power BI. Instead, consider writing a query like this in a view. A view is an object in a relational database, similar to a table.
- Views have rows and columns and can contain almost every operator in the SQL language.
- If Power BI uses a view, when it retrieves data, it participates in query folding, a feature of Power Query.
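A minimal sketch of the "put the query in a view" advice, using Python's built-in sqlite3 module as a stand-in for the relational source. The table name sales and the view name sales_by_region are hypothetical; a report tool would then read from the view instead of sending the raw SQL itself.

```python
import sqlite3

# In-memory SQLite database standing in for the real source system (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES ('North', 100), ('North', 50), ('South', 70);

-- The grouping logic lives in the database as a view, not in the report.
CREATE VIEW sales_by_region AS
    SELECT region, SUM(amount) AS total_amount
    FROM sales
    GROUP BY region;
""")

# The client only selects from the view, as it would from a table.
rows = conn.execute(
    "SELECT region, total_amount FROM sales_by_region ORDER BY region"
).fetchall()
print(rows)
```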
A NoSQL database (also referred to as non-SQL, not only SQL or non-relational) is a ________
flexible type of database that doesn’t use tables to store data.
If you’re working with data stored in JSON format, it’s often necessary to
extract and normalize the data first. This is because JSON data is often stored in a nested or unstructured format, which makes it difficult to analyze or report on directly.
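As an illustration of "extract and normalize", here is a small sketch that flattens one nested JSON order into flat rows, one per item. The field names (order_id, customer, items) are made up for the example and are not from any Power BI dataset.

```python
import json

# Hypothetical nested JSON document: one order with a nested customer and item list.
raw = """
{"order_id": 1,
 "customer": {"name": "Ana", "city": "Lisbon"},
 "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]}
"""

def normalize(order):
    """Return one flat row per item, repeating the order-level fields."""
    return [
        {
            "order_id": order["order_id"],
            "customer_name": order["customer"]["name"],
            "customer_city": order["customer"]["city"],
            "sku": item["sku"],
            "qty": item["qty"],
        }
        for item in order["items"]
    ]

rows = normalize(json.loads(raw))
for row in rows:
    print(row)
```

Each resulting row has the same columns, which is the tabular shape that reporting tools expect.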
The three different types of storage modes you can choose from:
- Import: allows you to create a local Power BI copy of your datasets from your data source. You can use all Power BI service features with this storage mode, including Q&A and Quick Insights. Data refreshes can be scheduled or on-demand. Import mode is the default for creating new Power BI reports.
- DirectQuery: useful when you don’t want to save local copies of your data because your data won’t be cached. Instead, you can query the specific tables that you’ll need by using Power BI queries, and the required data will be retrieved from the underlying data source. Essentially, you’re creating a direct connection to the data source. Using this model ensures that you’re always viewing the most up-to-date data, and that all security requirements are satisfied. This mode is suited for when you have large datasets to pull data from. Instead of slowing down performance by having to load large amounts of data into Power BI, you can use DirectQuery to create a connection to the source, solving data latency issues as well.
- Dual (Composite): identify some data to be directly imported and other data that must be queried. Any table that is brought in to your report is a product of both Import and DirectQuery modes. Using the Dual mode allows Power BI to choose the most efficient form of data retrieval.
As previously mentioned, you use the Get data feature in Power BI Desktop. When you select Analysis Services, you’re prompted for the server address and the database name with two options:
Import and Connect live.
Connect live is an option for Azure Analysis Services. Azure Analysis Services uses the tabular model and DAX to build calculations, similar to Power BI. These models are compatible with one another. Using the Connect live option helps you keep the data and DAX calculations in their original location, without having to import them all into Power BI.
The performance in Power Query depends on
the performance at the data source level.
The query folding within Power Query Editor helps you ______
increase the performance of your Power BI reports. The reason for implementing this process is to ensure that these transformations can take place in the original data source server and don’t overwhelm Power BI computing resources.
Benefits to Query Folding
- More efficiency in data refreshes and incremental refreshes.
- Automatic compatibility with DirectQuery and Dual storage modes.
Another tool that you can use to study query performance is
query diagnostics. You can determine what bottlenecks may exist while loading and transforming your data, refreshing your data in Power Query, running SQL statements in Query Editor, and so on.
Other ways to optimize query performance in Power BI include:
- Process as much data as possible in the original data source.
- Use native SQL queries. When using DirectQuery for SQL databases, such as the case for our scenario, make sure that you aren’t pulling data from stored procedures or common table expressions (CTEs).
- Separate date and time, if bound together.
Relational source systems often have many people who are concurrently using the same data in the same database. Some relational systems and their administrators seek to limit a user from monopolizing all hardware resources by setting a query timeout. This error indicates that you’ve pulled too much data according to your organisation’s policies. What will Power BI show?
Query Timeout Expired
Solution to Query Timeout Expired
You can resolve this error by pulling fewer columns or rows from a single table. While you’re writing SQL statements, it might be a common practice to include groupings and aggregations. You can also join multiple tables in a single SQL statement. If you need the rows, columns, and complexity, consider taking small chunks of data and then bringing them back together by using Power Query.
Couldn’t find file
Usually this error is caused by the file moving locations or the permissions to the file changing. If the cause is the former, you need to find the file and change the source settings.
Sometimes, when you import data into Power BI, the columns appear blank. This situation happens because of an error in interpreting the data type in Power BI. The resolution to this error is unique to the data source.
Data type errors
Use the Pivot Column feature to convert your flat data into _____
a table that contains an aggregate value for each unique value in a column.
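To make the pivot idea concrete, here is a rough Python sketch (not Power Query itself) that turns flat rows into a table with one aggregate value per unique value of a column. The data and the choice of sum as the aggregate are assumptions for illustration.

```python
# Flat data: (product, quarter, amount). Values are invented for the example.
flat_rows = [
    ("Bikes", "Q1", 10),
    ("Bikes", "Q2", 15),
    ("Helmets", "Q1", 4),
    ("Bikes", "Q1", 5),
]

def pivot_sum(rows):
    """Pivot the quarter column: each unique quarter becomes a column
    holding the summed amount for that product."""
    pivoted = {}
    for product, quarter, amount in rows:
        row = pivoted.setdefault(product, {})
        row[quarter] = row.get(quarter, 0) + amount
    return pivoted

pivoted = pivot_sum(flat_rows)
print(pivoted)
```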
When you import a table from any data source, Power BI Desktop automatically starts scanning the first __________
1,000 rows and tries to detect the type of data in the columns. The solution is to change the column profiling status in Power Query Editor.
Implication of Incorrect Data Types
- It will prevent you from creating certain calculations, deriving hierarchies or creating proper relationships with other tables
- Another issue with having an incorrect data type applied on a date field is the inability to create a date hierarchy, which would allow you to analyze your data on a yearly, monthly, or weekly basis.
Benefits of Combine Multiple Tables into a Single Table
- allows you to append or merge different tables or queries together
When to combine multiple tables into a single table
- Too many tables exist, making it difficult to navigate an overly-complicated data model
- Several tables have a similar role
- A table has only a column or two that can fit into a different table
- You want to use several columns from different tables in a custom column
You can combine the tables in 2 different ways: merging and appending
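The difference between the two can be sketched in plain Python: appending stacks rows from tables with the same columns, while merging adds columns by matching on a key. The table contents and the customer_id key are hypothetical, and the merge shown behaves like a left outer join.

```python
# Two tables with the same structure (hypothetical monthly sales).
sales_jan = [{"customer_id": 1, "amount": 10}]
sales_feb = [{"customer_id": 2, "amount": 20}, {"customer_id": 3, "amount": 5}]

# Append: stack the rows of one table under the other.
appended = sales_jan + sales_feb

# A second table with a different structure but a shared key column.
customers = [{"customer_id": 1, "name": "Ana"}, {"customer_id": 2, "name": "Ben"}]
name_by_id = {c["customer_id"]: c["name"] for c in customers}

# Merge: bring in the name column by matching on customer_id.
# Rows without a match keep None for the new column.
merged = [{**sale, "name": name_by_id.get(sale["customer_id"])} for sale in appended]

print(appended)
print(merged)
```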
Power Query Editor determines data anomalies by using the
Column Distribution feature.
To understand data anomalies and statistics, select the
Column Distribution, Column Quality, and Column Profile options. The following figure shows the statistics that appear.
Column quality shows you the
percentages of data that is valid, in error, and empty. In an ideal situation, you want 100 percent of the data to be valid.
Column distribution shows you the
distribution of the data within the column and the counts of distinct and unique values, both of which can tell you details about the data counts.
Distinct vs Unique
Distinct in this table tells you how many different values are present, while unique tells you how many of those values appear only once.
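A quick way to see the distinction, sketched in Python with an invented column of values:

```python
from collections import Counter

# Hypothetical column values.
values = ["A", "B", "B", "C", "C", "C", "D"]

counts = Counter(values)

# Distinct: how many different values occur at all.
distinct_count = len(counts)

# Unique: how many values occur exactly once.
unique_count = sum(1 for n in counts.values() if n == 1)

print(distinct_count, unique_count)
```

Here A, B, C and D are the distinct values, but only A and D are unique because B and C repeat.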
Column Profile
gives you a more in-depth look into the statistics within the columns for the first 1,000 rows of data. This column provides several different values, including the count of rows, which is important when verifying whether the importing of your data was successful.
Value distribution graph tells you the ___________
counts for each distinct value in that specific column.
Column Statistics will also include
how many zeroes and null values exist, along with the average value in the column, the standard deviation of the values in the column, and how many even and odd values are in the column. These statistics give you an idea of the distribution of data within the column, and are important because they summarize the data in the column and serve as a starting point to determine what the outliers are.
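The statistics listed above can be approximated in a few lines of Python. This is only a sketch on invented values; using the sample standard deviation is an assumption, and Power Query's exact calculation may differ.

```python
import statistics

# Hypothetical numeric column with one null (None).
values = [0, 2, 3, None, 4, 7]

non_null = [v for v in values if v is not None]

stats = {
    "zeros": sum(1 for v in non_null if v == 0),
    "nulls": sum(1 for v in values if v is None),
    "average": statistics.mean(non_null),
    # Sample standard deviation (assumption, see note above).
    "std_dev": statistics.stdev(non_null),
    "even": sum(1 for v in non_null if v % 2 == 0),
    "odd": sum(1 for v in non_null if v % 2 == 1),
}

print(stats)
```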
In a star schema, each table in your dataset is defined as
a dimension or a fact table
Fact tables contain
observational or event data values and can contain several repeated values
Dimension tables contain
the details about the data in fact tables
Mark as Date Table command
Power BI performs validations to ensure that the data contains zero null values, is unique and contains continuous date values over a period. It will remove autogenerated hierarchies from the Date field in the table that you marked as a date table.
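The three checks described (no nulls, unique values, continuous dates) can be mimicked with a short Python function. This is an illustrative sketch, not the validation Power BI actually runs; the function name is hypothetical.

```python
from datetime import date, timedelta

def is_valid_date_column(dates):
    """Check a date column the way the card describes:
    no nulls, no duplicates, and no gaps between the first and last date."""
    if not dates or any(d is None for d in dates):
        return False
    if len(set(dates)) != len(dates):
        return False
    # With unique values, the column is contiguous exactly when the
    # number of rows equals the number of days in the covered span.
    span_days = (max(dates) - min(dates)).days + 1
    return span_days == len(dates)

start = date(2024, 1, 1)
full_week = [start + timedelta(days=i) for i in range(7)]
with_gap = [d for d in full_week if d != date(2024, 1, 4)]

print(is_valid_date_column(full_week))
print(is_valid_date_column(with_gap))
```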
PATH() and PATHITEM()
Use PATH(), a simple DAX function that returns a text version of the managerial path for each employee, and PATHITEM() to separate this path into each level of the managerial hierarchy. This is part of flattening a parent-child hierarchy.
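To show what the two functions produce, here is a Python imitation of their behaviour (not DAX). The employee IDs and the manager_of mapping are invented; the pipe-delimited path from the top manager down to the employee follows how PATH() formats its result.

```python
# Hypothetical parent-child table: employee ID -> manager ID (None for the top).
manager_of = {1: None, 2: 1, 3: 1, 5: 2}

def path(employee_id):
    """Imitate PATH(): IDs from the top of the hierarchy down to the
    employee, joined with '|'."""
    chain = []
    current = employee_id
    while current is not None:
        chain.append(str(current))
        current = manager_of[current]
    return "|".join(reversed(chain))

def path_item(path_text, position):
    """Imitate PATHITEM(): the item at a 1-based position counted from
    the left, or None when the path is shorter than that."""
    parts = path_text.split("|")
    return parts[position - 1] if 1 <= position <= len(parts) else None

employee_path = path(5)
print(employee_path)
print(path_item(employee_path, 1), path_item(employee_path, 2))
```

Calling path_item once per level gives one column per level, which is the flattened hierarchy.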
Define Data Granularity
Very important topic for all data analysts
- the level of detail that is represented within your data, meaning that the more granularity your data has, the greater the level of detail within your data
- Defining the correct data granularity can have a big impact on the performance and usability of your Power BI reports and visuals
- can also have an impact when you’re building relationships between tables in Power BI
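As a small illustration of changing granularity, the sketch below rolls invented daily sales up to monthly totals. Fewer rows remain, but the day-level detail is lost, and a relationship to another table would then need to match on year and month rather than on date.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily-grain sales: (date, amount).
daily = [
    (date(2024, 1, 1), 10),
    (date(2024, 1, 2), 5),
    (date(2024, 2, 1), 7),
]

# Reduce the grain from day to month by summing per (year, month).
totals = defaultdict(int)
for day, amount in daily:
    totals[(day.year, day.month)] += amount
monthly = dict(totals)

print(monthly)
```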