Prepare - Get data Flashcards
What is a query?
A query can either be a request for data results from your database or for action on the data, or for both.
What is Power Query?
Power Query is the query engine used by Power BI and Excel (i.e., it is the data connectivity and data transformation engine/capability).
Please explain the 4 stages of getting data?
- Connection Settings (Data Source Path)
Most connectors require at least one parameter to initialise a connection to the data source. - Authentication
Every single connection that’s made in Power Query has to be authenticated and these authentication methods vary from connector to connector, with some connectors having multiple different methods of authentication. - Data Preview
User friendly way to select and preview your data. Depending on the connector that you use this is either done through the navigator window or table preview dialog box:
a) Navigator window = object selection pane and data preview pane
b) Table preview dialog box - Query Destination (Load)
This is the stage where you specify where to load the query. The options to load change from connection to connection but loading data into Power Query always remains the same
What challenges does Power Query solve?
Connecting to different data sources – connects to a wide range of data sources
Data needs shaping before consumption – Power BI allows you to iteratively shape you queries until they are ready for consumption + track steps
Volume (data sizes), velocity (rate of change), and variety (breadth of data sources and data shapes) = a) uses subset to transform to manageable size b) scheduled refresh c) hundreds of data sources and over 350 transformations
What are the advantages of Power Query?
- Graphical interface with prebuilt transformation functions (included in Home, transform, add column)
- No need to write as steps in Power Query are already translated into the M code for the desired transformation
Power BI automates visualisations - Each step in Power Query is recorded
- Each time the Query connects to the data source it automatically applies steps to shape your data in the way you specified
Name the 5 sections of Power Queries GUI?
Ribbon: the ribbon navigation experience, which provides multiple tabs to add transforms, select options for your query, and access different ribbon buttons to complete various tasks.
Queries pane: a view of all your available queries.
Current view: your main working view, that by default, displays a preview of the data for your query. You can also enable the diagram view along with the data preview view. You can also switch between the schema view and the data preview view while maintaining the diagram view.
Query settings: a view of the currently selected query with relevant information, such as query name, query steps, and various indicators.
Status bar: a bar displaying relevant important information about your query, such as execution time, total columns and rows, and processing status. This bar also contains buttons to change your current view.
Explain the different data connection settings?
Need to firstly establish the connection to a data source, some connectors require parameters some do not at all.
Excel only requires giving the file path through folder selection
Singleton connectors = do not require any parameters and have only one data source path
Connectors = data source connection
What are the current methods of authentication for Power Query?
Anonymous: Commonly used when connecting to a data source that doesn’t require user authentication, such as a webpage or a file available over public HTTP.
Basic: A username and password sent in base64 encoding are accepted for authentication.
API Key: A single API key is accepted for authentication.
Organizational account or Microsoft account: This method is also known as OAuth 2.0.
Windows: Can be implicit or explicit.
Database: This is only available in some database connectors.
What are the file location options?
Local Server
Local - no link is created instead a new dataset is created in Power BI and the Excel file is loaded into it (i.e. changes in the Excel are not reflected in the Power BI)
Cloud
OneDrive - the Excel and Power BI files remain synchronised with changes in the dataset pulled through into the Power BI (BONUS: must use “Keep me signed in” for personal)
SharePoint - similar updates as OneDrive but you must connect through the URL root
When to use different file locations?
Local = no changes, snapshot in time
Cloud = pull through new changes when refreshed
How can you change a data source?
Data source settings
Query settings
Advanced Editor
What are relational databases?
Relational databases are a type of database that store and organize data points with defined relationships for fast access.
Multiple tables = relational database, data is organized into tables that hold information about each entity and represent pre-defined categories through rows and columns.
Efficient = structuring data this way makes it efficient and flexible to access, which is why relational databases are most common.
What are the characteristics of a relational database?
Relational databases
- Multiple tables where one table can be related to another
- Represented by Schema
- Contains tables, entities (row) , attributes (features/dimensions along the entity) and relationships (one to many etc…)
- Each column within an entity will only allow a certain data type or format to be entered
What are the pros and cons of a relational database?
- Advantages = Reduced data redundancy and inconsistency, shared, centralised security
- Disadvantages = takes lots of time to set up
What is SQL?
Relational databases are also built to understand Structured Query Language (SQL), a standardized programming language which is used to store, manipulate, and retrieve data.
Name the options when connecting to a relational database?
(1) Connecting to the SQL database
Server name
Database name
Data connectivity mode
Import (selected by default, recommended, mostly select import)
DirectQuery.
Advanced options
Import with SQL Statement
(2) 3 Sign in options
Windows - Use your Windows account (Azure Active Directory credentials).
Database - Use your database credentials. For instance, SQL Server has its own sign-in and authentication system that is sometimes used. If the database administrator gave you a unique sign-in to the database, you might need to enter those credentials on the Database tab.
Microsoft account - Use your Microsoft account credentials. This option is often used for Azure services.
Name a common troubleshoot with data sources?
Permissions - authentication may change, passwords may be automatically changed after 90 days, you will need to edit or change permissions
What is an online service?
An online service is a range of software applications, including SharePoint, Asana, GitHub etc
Name the steps involved with connecting to an online application?
Parameters:
Choose the application data to import
Enter the URL (Select the list that you want to load into Power BI Desktop.)
Authentication:
Windows - Use your Windows account (Azure Active Directory credentials).
Anonymous.
Microsoft account - Use your Microsoft account credentials. This option is often used for Azure services.
How to get data from a NoSQL database?
Preview Connector window: enter your database credentials.
You can specify the Azure Cosmos DB account endpoint URL that you want to get the data from (you can get the URL from the Keys blade of your Azure portal).
Alternatively, you can enter the database name, collection name or use the navigator to select the database and collection to identify the data source.
If you are connecting to an endpoint for the first time, as you are in this example, make sure that you enter your account key. You can find this key in the Primary Key box in the Read-only Keys blade of your Azure portal.
How do you import a JSON file?
Need to normalise it and expand out the records
What are the characteristics of a flat file database?
- Single table (excel can contain multiple tabs) with no relation to other tables
- Represented by Data Dictionary
- Contains, files, records (row), fields (column), characters
- No rules around data entry
What are the pros and cons of a flat file database?
- Advantages = simple to use, inexpensive
- Disadvantages = Data redundancy and inconsistency
What is storage mode?
In Microsoft Power BI Desktop, you can specify the storage mode of a table, after you have connected and shaped your query in Power Query.
The storage mode lets you control whether Power BI Desktop caches table data in-memory for reports.
Reports query the data model (stored) and the data model is imported by the query