Prepare - Get data Flashcards

1
Q

What is a query?

A

A query can either be a request for data results from your database or for action on the data, or for both.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is Power Query?

A

Power Query is the query engine used by Power BI and Excel (i.e., it is the data connectivity and data transformation engine/capability).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Please explain the 4 stages of getting data?

A
  1. Connection Settings (Data Source Path)
    Most connectors require at least one parameter to initialise a connection to the data source.
  2. Authentication
    Every single connection that’s made in Power Query has to be authenticated and these authentication methods vary from connector to connector, with some connectors having multiple different methods of authentication.
  3. Data Preview
    User friendly way to select and preview your data. Depending on the connector that you use this is either done through the navigator window or table preview dialog box:
    a) Navigator window = object selection pane and data preview pane
    b) Table preview dialog box
  4. Query Destination (Load)
    This is the stage where you specify where to load the query. The options to load change from connection to connection but loading data into Power Query always remains the same
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What challenges does Power Query solve?

A

Connecting to different data sources – connects to a wide range of data sources
Data needs shaping before consumption – Power BI allows you to iteratively shape you queries until they are ready for consumption + track steps
Volume (data sizes), velocity (rate of change), and variety (breadth of data sources and data shapes) = a) uses subset to transform to manageable size b) scheduled refresh c) hundreds of data sources and over 350 transformations

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What are the advantages of Power Query?

A
  • Graphical interface with prebuilt transformation functions (included in Home, transform, add column)
  • No need to write as steps in Power Query are already translated into the M code for the desired transformation
    Power BI automates visualisations
  • Each step in Power Query is recorded
  • Each time the Query connects to the data source it automatically applies steps to shape your data in the way you specified
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Name the 5 sections of Power Queries GUI?

A

Ribbon: the ribbon navigation experience, which provides multiple tabs to add transforms, select options for your query, and access different ribbon buttons to complete various tasks.
Queries pane: a view of all your available queries.
Current view: your main working view, that by default, displays a preview of the data for your query. You can also enable the diagram view along with the data preview view. You can also switch between the schema view and the data preview view while maintaining the diagram view.
Query settings: a view of the currently selected query with relevant information, such as query name, query steps, and various indicators.
Status bar: a bar displaying relevant important information about your query, such as execution time, total columns and rows, and processing status. This bar also contains buttons to change your current view.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Explain the different data connection settings?

A

Need to firstly establish the connection to a data source, some connectors require parameters some do not at all.
Excel only requires giving the file path through folder selection
Singleton connectors = do not require any parameters and have only one data source path
Connectors = data source connection

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What are the current methods of authentication for Power Query?

A

Anonymous: Commonly used when connecting to a data source that doesn’t require user authentication, such as a webpage or a file available over public HTTP.
Basic: A username and password sent in base64 encoding are accepted for authentication.
API Key: A single API key is accepted for authentication.
Organizational account or Microsoft account: This method is also known as OAuth 2.0.
Windows: Can be implicit or explicit.
Database: This is only available in some database connectors.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What are the file location options?

A

Local Server
Local - no link is created instead a new dataset is created in Power BI and the Excel file is loaded into it (i.e. changes in the Excel are not reflected in the Power BI)

Cloud
OneDrive - the Excel and Power BI files remain synchronised with changes in the dataset pulled through into the Power BI (BONUS: must use “Keep me signed in” for personal)
SharePoint - similar updates as OneDrive but you must connect through the URL root

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

When to use different file locations?

A

Local = no changes, snapshot in time

Cloud = pull through new changes when refreshed

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

How can you change a data source?

A

Data source settings
Query settings
Advanced Editor

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What are relational databases?

A

Relational databases are a type of database that store and organize data points with defined relationships for fast access.

Multiple tables = relational database, data is organized into tables that hold information about each entity and represent pre-defined categories through rows and columns.

Efficient = structuring data this way makes it efficient and flexible to access, which is why relational databases are most common.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What are the characteristics of a relational database?

A

Relational databases

  • Multiple tables where one table can be related to another
  • Represented by Schema
  • Contains tables, entities (row) , attributes (features/dimensions along the entity) and relationships (one to many etc…)
  • Each column within an entity will only allow a certain data type or format to be entered
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What are the pros and cons of a relational database?

A
  • Advantages = Reduced data redundancy and inconsistency, shared, centralised security
  • Disadvantages = takes lots of time to set up
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is SQL?

A

Relational databases are also built to understand Structured Query Language (SQL), a standardized programming language which is used to store, manipulate, and retrieve data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Name the options when connecting to a relational database?

A

(1) Connecting to the SQL database
Server name
Database name

Data connectivity mode
Import (selected by default, recommended, mostly select import)
DirectQuery.

Advanced options
Import with SQL Statement

(2) 3 Sign in options
Windows - Use your Windows account (Azure Active Directory credentials).

Database - Use your database credentials. For instance, SQL Server has its own sign-in and authentication system that is sometimes used. If the database administrator gave you a unique sign-in to the database, you might need to enter those credentials on the Database tab.

Microsoft account - Use your Microsoft account credentials. This option is often used for Azure services.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Name a common troubleshoot with data sources?

A

Permissions - authentication may change, passwords may be automatically changed after 90 days, you will need to edit or change permissions

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What is an online service?

A

An online service is a range of software applications, including SharePoint, Asana, GitHub etc

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Name the steps involved with connecting to an online application?

A

Parameters:
Choose the application data to import
Enter the URL (Select the list that you want to load into Power BI Desktop.)

Authentication:
Windows - Use your Windows account (Azure Active Directory credentials).
Anonymous.
Microsoft account - Use your Microsoft account credentials. This option is often used for Azure services.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

How to get data from a NoSQL database?

A

Preview Connector window: enter your database credentials.
You can specify the Azure Cosmos DB account endpoint URL that you want to get the data from (you can get the URL from the Keys blade of your Azure portal).
Alternatively, you can enter the database name, collection name or use the navigator to select the database and collection to identify the data source.

If you are connecting to an endpoint for the first time, as you are in this example, make sure that you enter your account key. You can find this key in the Primary Key box in the Read-only Keys blade of your Azure portal.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

How do you import a JSON file?

A

Need to normalise it and expand out the records

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

What are the characteristics of a flat file database?

A
  • Single table (excel can contain multiple tabs) with no relation to other tables
  • Represented by Data Dictionary
  • Contains, files, records (row), fields (column), characters
  • No rules around data entry
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

What are the pros and cons of a flat file database?

A
  • Advantages = simple to use, inexpensive

- Disadvantages = Data redundancy and inconsistency

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

What is storage mode?

A

In Microsoft Power BI Desktop, you can specify the storage mode of a table, after you have connected and shaped your query in Power Query.

The storage mode lets you control whether Power BI Desktop caches table data in-memory for reports.

Reports query the data model (stored) and the data model is imported by the query

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

Types of storage mode?

A

Import: Queries submitted to the Power BI dataset that return data from Import tables can be fulfilled only from cached data. The Import mode allows you to create a local Power BI copy of your datasets from your data source.

Direct Query: Tables with this setting aren’t cached. Queries submitted to underlying data source in query language used by the data source. Create direct connection to the underlying data source

Dual: Tables with this setting can act as either cached or not cached, depending on the context of the query that’s submitted to the Power BI dataset.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

How can you configure storage mode?

A

In Model view you can toggle the table properties

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

Name how you can optimise the configuration of the storage mode?

A

Query performance: visuals and DAX send queries to the data model. By caching data in storage mode can boost the query performance and interactivity of your reports

Large datasets: tables that aren’t cached don’t consume memory for caching purposes. Decide which tables are worth caching

Data refresh optimisation: don’t need to refresh tables that aren’t cached. Reduce refresh times by only refreshing data that needs to be refreshed.

Near real time requirements: tables with near real time requirements (e.g. instant updates) may benefit from not being cached to reduce data latency

Writeback: Writeback enables business users to explore what-if scenarios by changing cell values.

Custom applications can apply changes to the data source. Tables that aren’t cached can display changes immediately, which allows instant analysis of the effects.
Security: databases have added security features

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

What is the most common way of storing tables in Power BI?

A

The most popular way to use data in Power BI is to import it into a Power BI dataset. Importing the data means that the data is stored in the Power BI file and gets published along with the Power BI reports. This process helps make it easier for you to interact directly with your data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

What are the constraints on DirectQuery?

A

Limited DAX functions and some M language transformation constraints for DirectQuery

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q

What is Azure Analysis Services?

A

Azure Analysis Services is a fully managed platform as a service (PaaS) that provides enterprise-grade data models in the cloud.

The data model provides an easier and faster way for users to perform ad hoc data analysis using tools like Power BI and Excel.

Use advanced mashup and modelling features to combine data from multiple data sources, define metrics, and secure your data in a single, trusted tabular semantic data model.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
31
Q

What can you do with Azure Analysis services?

A

Azure Analysis Services is an Azure product that allows you to:
• ingest data from multiple data sources
• build relationships between the data
• creates calculations on the data (the calculations are built using data analysis expressions (DAX))
•Azure Analysis Services is similar to the data modelling and storage technology in Power BI.

Uses the tabular model and DAX to build calculations

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
32
Q

How to connect to data in Azure Analysis Services?

A

Get data -> Azure -> Azure analysis services database
Sever and database parameters
Connection option (Connect live, Import, Mixed/Direct Query)
Authentications (Windows, basic, Microsoft account)
Load

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
33
Q

What are the similarities and differences between Azure Analysis Services and SQL?

A

Similarities to SQL server
Authenticate to the server.
Pick the cube you want to use.
Select which tables you need.

Differences to SQL server
Analysis Services cubes have calculations already in the cube, which will be discussed in more detail later.
If you don’t need an entire table, you can query the data directly. Instead of using Transact-SQL (T-SQL) to query the data, like you would in SQL Server, you can use multi-dimensional expressions (MDX) or data analysis expressions (DAX).
If you don’t need an entire table use MDX or DAX to query the data directly

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
34
Q

What are the connection options for Azure?

A

Import
Likely to directly import the data into Power BI
However, we recommended you use live connections because Import mode does have some limitations; most notably, server performance might be impacted during import

Connect Live
Helps keep the data and DAX calculations in their original location without having to import them into Power BI
Refreshed in the service so fast refresh schedule so no need to initiate a Power BI refresh schedule
Improves Timeliness – above point
Best practice is to import all the data into the Azure Analysis Services Model into Power BI and use a live connection. Using this approach the data modelling and Dax measures are all performed in one place which is easier to maintain your solution

Similar to SQL databases you can choose which tables you are able to query the Azure Analysis services model using DAX or MDX

35
Q

What are import errors?

A

When attempting to import data into Power BI there can be a number of error messages that pop up

36
Q

What is Query timeout expired?

A

Relational source systems often have many people who are concurrently using the same data in the same database. Some relational systems and their administrators seek to limit a user from monopolizing all hardware resources by setting a query timeout. These timeouts can be configured for any timespan, from as little as five seconds to as much as 30 minutes or more.

37
Q

What is Power BI Query Error: Timeout expired?

A

This error indicates that you’ve pulled too much data according to your organization’s policies. Administrators incorporate this policy to avoid slowing down a different application or suite of applications that might also be using that database.

Solutions: a) pull fewer columns or rows from a single table b) use groups and aggregations in SQL c) join multiple tables in one SQL statement

38
Q

Name the solution for: We couldn’t find any data as a table?

A

Power BI expects to find data formatted as a table from Excel.

Solution: format excel as table

39
Q

Name the solution for: Could not find file?

A

This is caused when a) moving files or b) renaming files or worksheets

Solution: find the step that is causing the problem in Query Settings

Usually, this error is caused by the file moving locations or the permissions to the file changing. If the cause is the former, you need to find the file and change the source settings.
Open Power Query by selecting the Transform Data button in Power BI.
Highlight the query that is creating the error.
On the left, under Query Settings, select the gear icon next to Source.

40
Q

Name the solution for: Data type errors?

A

Sometimes, when you import data into Power BI, the columns appear blank. This situation happens because of an error in interpreting the data type in Power BI.

Solution: The resolution to this error is unique to the data source OR get SQL to change type

41
Q

Explain Power BI’s connection capabilities?

A

There is over 100 connectors and growing

Most common are files, database and web services

42
Q

How to connect to a folder?

A

Get data > Folder > Combine & Transform

Format of the files needs to be consistent (e.g., CSV)

Tables need to have the same format (e.g., sheet1)

Target queries and helper queries support the folder data connection

43
Q

What is the query dependency view?

A

Data sources and queries that are dependent on one another

There is a top-bottom layout (which you can toggle)

44
Q

Name the different approaches to change the data source path?

A
Query settings (cog)
 - Change settings at the individual query level

Data source settings

  • Change the source for all queries at the same time
  • May need to edit and clear permissions
45
Q

What are permissions?

A

Credential and privacy levels required

46
Q

What do you need to connect to a dataset or contribute to a workspace?

A

Build permission

47
Q

What are some of the limitations of shared datasets in the Power BI Service?

A
  • Missing or greyed out buttons
  • No Power Query
  • No data view
  • Cannot change data source settings
  • Report-level measures will not be shared
48
Q

What is exclusively Import storage mode?

A
  • Data from files and folders
49
Q

What are the connection options for databases?

A
  • Import Mode: caches it and can use all of the modelling features
  • DirectQuery: original data source is queried every time you interact with a visual

However not all databases support DirectQuery

50
Q

What is Live Connection? Name some of the characteristics.

A
  • A special type of DirectQuery for Analysis Services (data models) and Power BI Service (datasets)
  • It has tabular and multidimensional data
  • All calculations take place in the data model
51
Q

Explain what Import Mode is?

A
  • Import mode loads a copy into the .pbix file
  • The data is loaded into .pbix file in a compressed state
  • Power BI is built on in-memory columnar database engine
  • Consumes RAM and disk-space because it is saved in the .pbix file
52
Q

Explain the different ways Import mode could use resource?

A
  • Development Machine: uses RAM and disk-space on machine

- Publish: RAM and disk-space on the server

53
Q

What is one of the drawbacks of Import mode?

A
  • Can’t load more data than Power BI or your hardware allows (e.g., limited import restrictions)
54
Q

What is the storage technology used by Power BI?

A
  • VertiPaq Compression Technology
55
Q

What does data compression depend on?

A

Depends on many factors including:

  • data type
  • values
  • cardinality of columns
56
Q

What is the advantage of Import mode?

A
  • Use all functionality of Power BI: all Power Query transformations, DAX functions on models
  • Can combine import data from different sources (e.g., database and excel)
  • Speed of queries: little to no latency when accessing in-memory data
  • Can see Data and Model views
57
Q

What is DirectQuery? Explain the process.

A
  • No data is cached in Power BI
  • All data (apart from metadata) remains at the data source
  • Select the entities to connect to (these become tables in the data model)
58
Q

What data is stored with DirectQuery?

A
  • Metadata: column + table names + data types + relationships between tables
59
Q

What are the advantages of DirectQuery?

A
  • .pbix file is smaller
  • Not limited by hardware of development machine or virtual server
  • Can use Model view
60
Q

What are the drawbacks of DirectQuery?

A
  • Query performance slower: latency can be a couple seconds at best
  • Cannot see Data view
  • Virtual server could time out: takes too long or too many people using the server
61
Q

How is DirectQuery different to Live Connection?

A

DirectQuery has implications on functionality:

Report performance

  • Depends on underlying data source hardware: a) not quicker than .pbix b) could event time out
  • May not be able to handle demand: lots of users sending queries from visuals to data source at once

Not every query type usable
- Not possible to use native queries with a) Common Table Expressions b) Stored procedures

Limited data transformation functionality

  • Transformation limited: transformation steps performed every time a visual sends a query (compared to once every refresh with import)
  • “Not supported in DirectQuery mode”: only most efficient transforms used in DirectQuery

Data modelling limitations

  • No date tables which are required for Time Intelligence calculations (date/time columns in Import mode by default)
  • Calculate columns limited: a) only use current row of table on many-to-one relationships (this rules out data aggregation functions)
  • Only Calculate column functions that can be translated into native language supported
62
Q

What are the advantages of Live Connection?

A
  • Enhanced data modelling capabilities
  • Improved security features at data source: security always considers username of user viewing the report (unlike databases)
  • Can refresh as frequently as needed (Power BI Service limited to 8 or 48 with Premium)
63
Q

What is Composite mode? Explain the characteristics.

A
  • A mixture of import and DirectQuery
  • Dual mode is dynamic depending on the storage mode of other tables: Composite table has the ability to lookup Import and DirectQuery tables
  • If you change a Dual table to Import there is no going back
  • To use Dual you must first activate DirectQuery
64
Q

When can you use Composite mode?

A
  • When Import and DirectQuery queries are present

- DirectQuery on multiple data sources

65
Q

What are the drawbacks of Composite mode?

A
  • Potential security risk: database administrator could have access to Import data
66
Q

What ways can you check storage mode of tables?

A
  • Fields pane in the Data or Model view

- Model view > Advanced Selection > Properties

67
Q

Explain what scenarios an organisation might not choose to Import queries?

A

For the best experience you should import data but there are scenarios when you should not import:

  • Data model is too large for in-memory
  • High data velocity: need to see the most recent data (only if latency of database is quick enough)
  • Database administrators may only want to query the source to ease resource on own server
68
Q

What modelling functions are not supported by DirectQuery?

A

Intelli-Sense autocomplete feature lists the following:

  • Cacluated tables not supported by DirectQuery
  • Parent-child functions for building hierachies not avaiable (does in the database)
  • Building clusters (which relies on DAX) not supported =
69
Q

Compare the storage modes across the following dimensions:

 Max data model size
 Number of data sources
 Data refresh
 Performance
 Data transformation
 Data modelling 
 Security
A

TAKE PICTURE page 14

70
Q

Evaluate whether a client would want DirectQuery or Live Connection

A

Live Connection

  • Max data model size: same dataset limits as Import OR limited by underlying data source hardware
  • Number of sources: only 1
  • Data refresh: report shows latest data
  • Performance: best
  • Data transformation: none
  • Data modelling: Analysis Services and PBI services measures can be created with no restriction
  • Security: can leverage data source security based on current users login

Direct Query

  • Max data model size: Limited only by underlying data source hardware
  • Number of sources: unlimited
  • Data refresh: report shows latest data
  • Performance: varies depending on data source latency
  • Data transformation: transformations limited to what can be translated to data source language
  • Data modelling: highly restricted
  • Security: RLS is defined at data source but only available for some data sources
71
Q

What similarities do Microsoft Dataverse similarities have when getting data into Power BI?

A
  • All store data in entities (converts into tables in Power BI’s data model)
  • Sign in to access entities
72
Q

How do you connect to Dataverse?

A
  • Use server address

- Have permission to access data within entities

73
Q

What is a parameter?

A
  • Parameters are fed into arguments and are an easy way to easily store and manage values that can be reused
  • Its purpose is to change the output of queries depending on their values
74
Q

What are the main uses for parameters? What are the formal names?

A
  • Change the argument values for particular transforms and data source functions
  • Inputs in custom functions
  • Switching between development and custom environments
  • Configure incremental refresh
75
Q

What are the two most common ways to use parameters?

A
  • Step argument: You can use a parameter as the argument of multiple transformations driven from the user interface (UI).
  • Custom Function argument: You can create a new function from a query and reference parameters as the arguments of your custom function.
76
Q

What are the parameter properties?

A
  • Name: provide easily recognisable name that differentiates from other parameters
  • Description: help understand its purpose
  • Required: The checkbox indicates whether subsequent users can specify whether a value for the parameter must be provided.

-Type: Specifies the data type of the parameter (recommend setting up data type)
Suggested Values: Provides the user with suggestions to select a value for the Current Value from the available options:

Current value

  • Any value: The current value can be any manually entered value.
  • List of values: Provides you with a simple table-like experience so you can define a list of suggested values that you can later select from for the Current Value.
  • Query: Uses a list query (a query whose output is a list)
77
Q

How to create a parameter?

A
  • From an existing query: Right-click a query whose value is a simple non-structured constant, such as a date, text, or number, and then select Convert to Parameter.
  • You can also convert a parameter to a query by right-clicking the parameter and then selecting Convert To Query.
78
Q

Explain PBIDS files?

A
  • PBIDS files contain source connection details
  • You can export PBIDS from ‘Data source settings’
  • It is the same as connecting to a new data source in ‘Get Data’
  • Can help report creators to connect to specific datasets quicker (however you still need credentials so make sure the user has access)
79
Q

Explain what a dataflow is?

A
  • Dataflows allow you to share Power Query queries outside the Power BI or Excel app
  • Allows you to use Power Query in the Power BI Service
80
Q

How do you create a dataflow?

A

Create New Dataflow in the Power BI Service

  • Add new entities
  • add linked entities
  • import
  • create and attach
81
Q

What are some of the troubleshoots around dataflows?

A
  • Cannot create a dataflow in ‘My Workspace’
  • Needs to refresh dataflow once created otherwise won’t contain any data
  • Once refreshed you can use in Power BI desktop
82
Q

What should you consider when choosing an appropriate query?

A
  • Size of data model
  • How much the underlying dataset changes
  • Company policies (e.g., data should not be imported)
83
Q

What is an XLMA endpoint?

A
  • Allows you to connect to PBI datasets using other client tools to edit dataset
  • Only available for PBI premium and must be allowed in PBI Admin Portal