Class 4 - Preparing Data 2 Queries Flashcards
Explain the steps that go into Power Query 1? What is it?
Power Query 1:
Can use Power Query to combine data and export data into a worksheet that allows merging by multiple variables
Practice using “storesales” and “promotion” tables in “promotion.xlsx” excel file
Want to merge on multiple variables, would like to know which year each product had a promotion (merge on product and year)
Step 1: Open file, make data into excel tables
Step 2: Create a query connection to connect the two for each table. Data - Get Data - From other sources - From Table/Range, Power Query pops up. Under name tabe, select ‘close and load to”, import data pops up, only create connection (do this for 2 tables)
Step 3: Excel now recognizes it. Merge tables: data-get data- combine queries - merge
Step 4: Select fields that two datasets are merged by
Step 5: Select “Join Kind”
Step 6: Power Query Editor pops up, scroll to RHS of worksheet
Step 7: Click the arrow icon to expand the merged column, select the columns to keep
Step 8: Click “Close and Load to” - import data pops up, select only create connection
Step 9: Insert pivot-table, select from external data source, then choose connection, select the merged on
(have pivot table: filter, columns, rows, values
Year: colum, product(row) and promotion(yes/now), Quantity: Value
Explain the steps that go into Power Query 2? What is it?
When files or databases are to large for you to load into an excel spreadsheet(limited to 1 mill rows) you will connect to the files or database instead of loading into excel
Step 1: Open a blank excel worksheet
Step 2: Create query connection (different). Data - Get Data - From file. Select file to connect, select table/worksheet to connect. Select “Only Create Connection”. Repeat to connect more datasets or files.
Step 3 - 9: Exactly as before
Step 10: Save Worksheet
Step 11: Data - Refresh all
What is SQL? What are the SQL Extraction Queries and the tables in SQL?
Standard language for querying and manipulating data. Can handle big data sets (excel limited to 1 mil row)
SQL Extraction Queries: Select(Distinct), From, Inner Join, Left Join, On, Where, Group by, having, Order by
Tables in Sql: table name, attribute names, tuples/row
What is Distinct in Select (SELECT DISTINCT)
Removes duplicate values from results of query
Ensure unique values are returned
SELECT DISTINCT Category
FROM Product
What is the WHERE Clause in Select?
Filter rows based on certain conditions
Filter for a single condition, combine conditions with AND, filter in a range with BETWEEN
SELECT Price, Category
FROM Product
WHERE Price > 100
SELECT *
FROM Product
WHERE category = “Gadgets”
WHERE Price BETWEEN 20 AND 200
WHERE Price > 100 AND Category = “Gadgets”
Explain SQL Syntax
SQL not case sensitive Select and SELECT is the same
Where = filter rows, BETWEEN = filter ranges (dates)
Is Null and Is Not Null handles missing values
Use # for dates (#01/29/2021#)
Use [] for names with space
SELECT [store id], price, store_city
FROM [store sales]
WHERE Price BETWEEN 20 AND 200
AND order_date BETWEEN #06/01/2020 AND #12/31/2020#
AND Price IS NOT NULL;
Explain Order By - Sorting Query Results
Must be the last line in a query ASC is default, unless its DSC
When sorting by first column if two rows tie (same value) SQL uses second column in order by list to sort
SELECT category, price, manufacturer
FROM Product
Order By category, price
SELECT category price, manufacturer
FROM Product
ORDER BY category DESC, price
Explain Aggregation in SELECT
Aggregation function: Sum(), Count(), Min(), Max(), AVG()
Alias (AS): Used to rename result column for clarity
Count can be applied to multiple attributes, applies to duplicates
Explain Group By - Aggregation in grouping
Group rows that have the same value in a column using Agg
SELECT product, sum(quantity) AS TotalSold
FROM Purchase
WHERE data > #10/05/2020
GROUP BY product
Explain Having - Aggregation and filtering
Filters rows after aggregation (sum, avg)
Diff: WHERE filters before aggregation, HAVING filters results after aggregation
SELECT product, sum(quantity) AS TotalSold
FROM Purchase
WHERE data > #10/05/2020
GROUP BY product
HAVING Sum(quantity)>20
What is Inner Join: Show the Diagram and Code
Only matching rows in both tables.
SELECT *
FROM Students
INNER JOIN Classes
ON Students.ID= Classes.ID
What is Left Outer Join: Show the Diagram and Code
All rows from the left table + matches
SELECT *
FROM Students
LEFT OUTER JOIN Classes
ON Students.ID= Classes.ID
What is Right Outer Join: Show the Diagram and Code
All rows from the right table + matches.
SELECT *
FROM Students
RIGHT OUTER JOIN Classes
ON Students.ID= Classes.ID
What is Full Outer Join: Show the Diagram and Code
All rows from both tables, match or not.
SELECT *
FROM Students
FULL OUTER JOIN Classes
ON Students.ID= Classes.ID
LEFT ANTI JOIN
Rows in left table with no match.
SELECT *
FROM Students
FULL OUTER JOIN Classes
ON Students.ID= Classes.ID
WHERE Classes.ID IS NULL