Prepare - Get data Flashcards
What is a query?
A query can either be a request for data results from your database or for action on the data, or for both.
What is Power Query?
Power Query is the query engine used by Power BI and Excel (i.e., it is the data connectivity and data transformation engine/capability).
Please explain the 4 stages of getting data?
- Connection Settings (Data Source Path)
Most connectors require at least one parameter to initialise a connection to the data source. - Authentication
Every single connection that’s made in Power Query has to be authenticated and these authentication methods vary from connector to connector, with some connectors having multiple different methods of authentication. - Data Preview
User friendly way to select and preview your data. Depending on the connector that you use this is either done through the navigator window or table preview dialog box:
a) Navigator window = object selection pane and data preview pane
b) Table preview dialog box - Query Destination (Load)
This is the stage where you specify where to load the query. The options to load change from connection to connection but loading data into Power Query always remains the same
What challenges does Power Query solve?
Connecting to different data sources – connects to a wide range of data sources
Data needs shaping before consumption – Power BI allows you to iteratively shape you queries until they are ready for consumption + track steps
Volume (data sizes), velocity (rate of change), and variety (breadth of data sources and data shapes) = a) uses subset to transform to manageable size b) scheduled refresh c) hundreds of data sources and over 350 transformations
What are the advantages of Power Query?
- Graphical interface with prebuilt transformation functions (included in Home, transform, add column)
- No need to write as steps in Power Query are already translated into the M code for the desired transformation
Power BI automates visualisations - Each step in Power Query is recorded
- Each time the Query connects to the data source it automatically applies steps to shape your data in the way you specified
Name the 5 sections of Power Queries GUI?
Ribbon: the ribbon navigation experience, which provides multiple tabs to add transforms, select options for your query, and access different ribbon buttons to complete various tasks.
Queries pane: a view of all your available queries.
Current view: your main working view, that by default, displays a preview of the data for your query. You can also enable the diagram view along with the data preview view. You can also switch between the schema view and the data preview view while maintaining the diagram view.
Query settings: a view of the currently selected query with relevant information, such as query name, query steps, and various indicators.
Status bar: a bar displaying relevant important information about your query, such as execution time, total columns and rows, and processing status. This bar also contains buttons to change your current view.
Explain the different data connection settings?
Need to firstly establish the connection to a data source, some connectors require parameters some do not at all.
Excel only requires giving the file path through folder selection
Singleton connectors = do not require any parameters and have only one data source path
Connectors = data source connection
What are the current methods of authentication for Power Query?
Anonymous: Commonly used when connecting to a data source that doesn’t require user authentication, such as a webpage or a file available over public HTTP.
Basic: A username and password sent in base64 encoding are accepted for authentication.
API Key: A single API key is accepted for authentication.
Organizational account or Microsoft account: This method is also known as OAuth 2.0.
Windows: Can be implicit or explicit.
Database: This is only available in some database connectors.
What are the file location options?
Local Server
Local - no link is created instead a new dataset is created in Power BI and the Excel file is loaded into it (i.e. changes in the Excel are not reflected in the Power BI)
Cloud
OneDrive - the Excel and Power BI files remain synchronised with changes in the dataset pulled through into the Power BI (BONUS: must use “Keep me signed in” for personal)
SharePoint - similar updates as OneDrive but you must connect through the URL root
When to use different file locations?
Local = no changes, snapshot in time
Cloud = pull through new changes when refreshed
How can you change a data source?
Data source settings
Query settings
Advanced Editor
What are relational databases?
Relational databases are a type of database that store and organize data points with defined relationships for fast access.
Multiple tables = relational database, data is organized into tables that hold information about each entity and represent pre-defined categories through rows and columns.
Efficient = structuring data this way makes it efficient and flexible to access, which is why relational databases are most common.
What are the characteristics of a relational database?
Relational databases
- Multiple tables where one table can be related to another
- Represented by Schema
- Contains tables, entities (row) , attributes (features/dimensions along the entity) and relationships (one to many etc…)
- Each column within an entity will only allow a certain data type or format to be entered
What are the pros and cons of a relational database?
- Advantages = Reduced data redundancy and inconsistency, shared, centralised security
- Disadvantages = takes lots of time to set up
What is SQL?
Relational databases are also built to understand Structured Query Language (SQL), a standardized programming language which is used to store, manipulate, and retrieve data.
Name the options when connecting to a relational database?
(1) Connecting to the SQL database
Server name
Database name
Data connectivity mode
Import (selected by default, recommended, mostly select import)
DirectQuery.
Advanced options
Import with SQL Statement
(2) 3 Sign in options
Windows - Use your Windows account (Azure Active Directory credentials).
Database - Use your database credentials. For instance, SQL Server has its own sign-in and authentication system that is sometimes used. If the database administrator gave you a unique sign-in to the database, you might need to enter those credentials on the Database tab.
Microsoft account - Use your Microsoft account credentials. This option is often used for Azure services.
Name a common troubleshoot with data sources?
Permissions - authentication may change, passwords may be automatically changed after 90 days, you will need to edit or change permissions
What is an online service?
An online service is a range of software applications, including SharePoint, Asana, GitHub etc
Name the steps involved with connecting to an online application?
Parameters:
Choose the application data to import
Enter the URL (Select the list that you want to load into Power BI Desktop.)
Authentication:
Windows - Use your Windows account (Azure Active Directory credentials).
Anonymous.
Microsoft account - Use your Microsoft account credentials. This option is often used for Azure services.
How to get data from a NoSQL database?
Preview Connector window: enter your database credentials.
You can specify the Azure Cosmos DB account endpoint URL that you want to get the data from (you can get the URL from the Keys blade of your Azure portal).
Alternatively, you can enter the database name, collection name or use the navigator to select the database and collection to identify the data source.
If you are connecting to an endpoint for the first time, as you are in this example, make sure that you enter your account key. You can find this key in the Primary Key box in the Read-only Keys blade of your Azure portal.
How do you import a JSON file?
Need to normalise it and expand out the records
What are the characteristics of a flat file database?
- Single table (excel can contain multiple tabs) with no relation to other tables
- Represented by Data Dictionary
- Contains, files, records (row), fields (column), characters
- No rules around data entry
What are the pros and cons of a flat file database?
- Advantages = simple to use, inexpensive
- Disadvantages = Data redundancy and inconsistency
What is storage mode?
In Microsoft Power BI Desktop, you can specify the storage mode of a table, after you have connected and shaped your query in Power Query.
The storage mode lets you control whether Power BI Desktop caches table data in-memory for reports.
Reports query the data model (stored) and the data model is imported by the query
Types of storage mode?
Import: Queries submitted to the Power BI dataset that return data from Import tables can be fulfilled only from cached data. The Import mode allows you to create a local Power BI copy of your datasets from your data source.
Direct Query: Tables with this setting aren’t cached. Queries submitted to underlying data source in query language used by the data source. Create direct connection to the underlying data source
Dual: Tables with this setting can act as either cached or not cached, depending on the context of the query that’s submitted to the Power BI dataset.
How can you configure storage mode?
In Model view you can toggle the table properties
Name how you can optimise the configuration of the storage mode?
Query performance: visuals and DAX send queries to the data model. By caching data in storage mode can boost the query performance and interactivity of your reports
Large datasets: tables that aren’t cached don’t consume memory for caching purposes. Decide which tables are worth caching
Data refresh optimisation: don’t need to refresh tables that aren’t cached. Reduce refresh times by only refreshing data that needs to be refreshed.
Near real time requirements: tables with near real time requirements (e.g. instant updates) may benefit from not being cached to reduce data latency
Writeback: Writeback enables business users to explore what-if scenarios by changing cell values.
Custom applications can apply changes to the data source. Tables that aren’t cached can display changes immediately, which allows instant analysis of the effects.
Security: databases have added security features
What is the most common way of storing tables in Power BI?
The most popular way to use data in Power BI is to import it into a Power BI dataset. Importing the data means that the data is stored in the Power BI file and gets published along with the Power BI reports. This process helps make it easier for you to interact directly with your data
What are the constraints on DirectQuery?
Limited DAX functions and some M language transformation constraints for DirectQuery
What is Azure Analysis Services?
Azure Analysis Services is a fully managed platform as a service (PaaS) that provides enterprise-grade data models in the cloud.
The data model provides an easier and faster way for users to perform ad hoc data analysis using tools like Power BI and Excel.
Use advanced mashup and modelling features to combine data from multiple data sources, define metrics, and secure your data in a single, trusted tabular semantic data model.
What can you do with Azure Analysis services?
Azure Analysis Services is an Azure product that allows you to:
• ingest data from multiple data sources
• build relationships between the data
• creates calculations on the data (the calculations are built using data analysis expressions (DAX))
•Azure Analysis Services is similar to the data modelling and storage technology in Power BI.
Uses the tabular model and DAX to build calculations
How to connect to data in Azure Analysis Services?
Get data -> Azure -> Azure analysis services database
Sever and database parameters
Connection option (Connect live, Import, Mixed/Direct Query)
Authentications (Windows, basic, Microsoft account)
Load
What are the similarities and differences between Azure Analysis Services and SQL?
Similarities to SQL server
Authenticate to the server.
Pick the cube you want to use.
Select which tables you need.
Differences to SQL server
Analysis Services cubes have calculations already in the cube, which will be discussed in more detail later.
If you don’t need an entire table, you can query the data directly. Instead of using Transact-SQL (T-SQL) to query the data, like you would in SQL Server, you can use multi-dimensional expressions (MDX) or data analysis expressions (DAX).
If you don’t need an entire table use MDX or DAX to query the data directly