Intermediate SQL Flashcards
What are the aggregation functions used in SQL?
COUNT counts how many rows are in a particular column.
SUM adds together all the values in a particular column.
MIN and MAX return the lowest and highest values in a particular column, respectively.
AVG calculates the average of a group of selected values.
What does COUNT command do?
COUNT is a SQL aggregate function for counting the number of rows in a particular column.
Typing COUNT(1) has the same effect as COUNT(*). Which one you use is a matter of personal preference. True/False?
True
What does below code do?
SELECT COUNT(high)
FROM tutorial.aapl_historical_stock_price
The code will provide a count of all rows in which the high column is NOT NULL.
What does below code do?
SELECT COUNT(date) AS count_of_date
FROM tutorial.aapl_historical_stock_price
The column header for count(date) is changed from “count” to “count_of_date”.
Does COUNT command, count null values?
No
What does SUM do?
SUM is a SQL aggregate function. that totals the values in a given column.
An important thing to remember: aggregators only aggregate vertically. If you want to perform a calculation across rows, you would do this with simple arithmetic. True/False?
True
How does SUM treat nulls?
SUM treats nulls as 0.
What do MIN / MAX commands do?
MIN and MAX are SQL aggregation functions that return the lowest and highest values in a particular column.
MIN/MAX can be used on non-numerical columns. True/False, if True, how do they do it?
True.
MIN will return the lowest number, earliest date, or non-numerical value as close alphabetically to “A” as possible.
As you might suspect, MAX does the opposite—it returns the highest number, the latest date, or the non-numerical value closest alphabetically to “Z.”
What does AVG command do?
AVG is a SQL aggregate function that calculates the average of a selected group of values.
What does GROUP BY do?
GROUP BY allows you to separate data into groups, which can be aggregated independently of one another.
When is it recommended to use numbers instead of column names for group by?
It’s generally recommended to do this only when you’re grouping many columns, or if something else is causing the text in the GROUP BY clause to be excessively long
We use HAVING clause with aggregation, because WHERE doesn’t allow you to filter on aggregate columns. True/False?
True
What is SQL’s way of handling if/then logic?
CASE
The CASE statement is followed by at least one pair of WHEN and THEN statements. True/False?
True
How do we end a CASE statement?
Every CASE statement MUST end with the END statement.
The ELSE statement is optional, and provides a way to capture values not specified in the WHEN/THEN statements. True/False?
True
What does below code do?
SELECT CASE WHEN year = ‘FR’ THEN ‘FR’
WHEN year = ‘SO’ THEN ‘SO’
WHEN year = ‘JR’ THEN ‘JR’
WHEN year = ‘SR’ THEN ‘SR’
ELSE ‘No Year Data’ END AS year_group,
COUNT(1) AS count
FROM benn.college_football_players
GROUP BY 1
It counts the number of players in each year_group
What does below code do?
SELECT state,
COUNT(CASE WHEN year = ‘FR’ THEN 1 ELSE NULL END) AS fr_count,
COUNT(CASE WHEN year = ‘SO’ THEN 1 ELSE NULL END) AS so_count,
COUNT(CASE WHEN year = ‘JR’ THEN 1 ELSE NULL END) AS jr_count,
COUNT(CASE WHEN year = ‘SR’ THEN 1 ELSE NULL END) AS sr_count,
COUNT(1) AS total_players
FROM benn.college_football_players
GROUP BY state
ORDER BY total_players DESC
displays the number of players in each state, with FR, SO, JR, and SR players in separate columns and another column for the total number of players. Order results such that states with the most players come first.
What does below code do?
SELECT CASE WHEN state IN (‘CA’, ‘OR’, ‘WA’) THEN ‘West Coast’
WHEN state = ‘TX’ THEN ‘Texas’
ELSE ‘Other’ END AS arbitrary_regional_designation,
COUNT(1) AS players
FROM benn.college_football_players
WHERE weight >= 300
GROUP BY 1
Counts the number of 300lb+ players for each of the following regions: West Coast (CA, OR, WA), Texas, and Other (everywhere else).
When do we use SELECT DISTINCT?
You’ll occasionally want to look at only the unique values in a particular column. You can do this using SELECT DISTINCT syntax
What will happen if you include two (or more) columns in a SELECT DISTINCT clause?
Your results will contain all the unique pairs of those two columns
You only need to include DISTINCT once in your SELECT clause—you do not need to add it for each column name. True/False?
True
Which aggregation function is used with DISTINCT more commonly?
You can use DISTINCT when performing an aggregation. You’ll probably use it most commonly with the COUNT function.
DISTINCT goes outside the aggregate function rather than at the beginning of the SELECT clause. True/False?
False, DISTINCT goes INSIDE the aggregate function rather than at the beginning of the SELECT clause.
SUM, AVG, MIN, MAX have very little practical use with DISTINCT, True/False?
True, SUM or AVG the distinct values in a column, but there are fewer practical applications for them. For MAX and MIN, you probably shouldn’t ever use DISTINCT because the results will be the same as without DISTINCT, and the DISTINCT function will make your query substantially slower to return results.
It’s worth noting that using DISTINCT, particularly in aggregations, can slow your queries down quite a bit. True/False
True
How can we give a table an alias?
benn.college_football_teams teams
Once you’ve given a table an alias, you can refer to columns in that table in the SELECT clause using the alias name. True/False?
True
How can we select columns using table aliases? Code
SELECT teams.conference AS conference,
AVG(players.weight) AS average_weight
What should we do when 2 tables have columns with the same name and we want to join these tables and see these 2 columns?
The results can only support one column with a given name—when you include 2 columns of the same name, the results will simply show the exact same result set for both columns even if the two columns should contain different data. You can avoid this by naming the columns individually.
SELECT players.school_name AS players_school_name,
teams.school_name AS teams_school_name
FROM benn.college_football_players players
JOIN benn.college_football_teams teams
ON teams.school_name = players.school_name
Why is right join rarely used?
RIGHT JOIN is rarely used because you can achieve the results of a RIGHT JOIN by simply switching the two joined table names in a LEFT JOIN
Why do we use COUNT(DISTINCT companies.permalink) to count distinct company names instead of using COUNT(DISTINCT companies.company_name) on MODE website?
Because it returned results much faster
What’s the difference between below two codes?
FROM tutorial.crunchbase_companies companies
LEFT JOIN tutorial.crunchbase_acquisitions acquisitions
ON companies.permalink = acquisitions.company_permalink
AND acquisitions.company_permalink != ‘/company/1000memories’
FROM tutorial.crunchbase_companies companies
LEFT JOIN tutorial.crunchbase_acquisitions acquisitions
ON companies.permalink = acquisitions.company_permalink
WHERE acquisitions.company_permalink != ‘/company/1000memories’
OR acquisitions.company_permalink IS NULL
.
In the first one the conditional statement AND… is evaluated before the join occurs. You can think of it as a WHERE clause that only applies to one of the tables.
In the second one, the filter happens after the tables are joined
How can we count NULL values in a table in SQL? What should we write in the SELECT section?
By using COUNT(CASE…)
For example: COUNT(CASE WHEN column IS NULL THEN ‘It’s empty’)
What’s the difference between join and union?
SQL joins allow you to combine two datasets side-by-side, but UNION allows you to stack one dataset on top of the other. Put differently, UNION allows you to write two separate SELECT statements, and to have the results of one statement display in the same table as the results from the other statement.
“UNION only appends distinct values.” What does this mean?
When you use UNION, the dataset is appended, and any rows in the appended table that are exactly identical to rows in the first table are dropped.
What can we do when we want all the values (even duplicates) in the second table, to be appended to the first table?
If you’d like to append all the values from the second table, use UNION ALL.
You’ll likely use UNION ALL far more often than UNION. True/False?
True
What are the rules of appending datasets in SQL?
SQL has strict rules for appending data:
1- Both tables must have the same number of columns
2- The columns must have the same data types in the same order as the first table
How can we create a column and fill it with a fixed value?
SELECT ‘investments_part1’ AS dataset_name,
FROM table
What are 2 reasons for using multiple keys for joining datasets?
There are couple reasons you might want to join tables on multiple foreign keys. The first has to do with accuracy. The second reason has to do with performance.
What does below code do?
SELECT DISTINCT japan_investments.company_name,
japan_investments.company_permalink
FROM tutorial.crunchbase_investments_part1 japan_investments
JOIN tutorial.crunchbase_investments_part1 gb_investments
ON japan_investments.company_name = gb_investments.company_name
AND gb_investments.investor_country_code = ‘GBR’
AND gb_investments.funded_at > japan_investments.funded_at
WHERE japan_investments.investor_country_code = ‘JPN’
ORDER BY 1
to identify companies that received an investment from Great Britain following an investment from Japan.
You only need to include DISTINCT once in your SELECT clause—you do not need to add it for each column name. True/False?
True
What will happen If you include two (or more) columns in a SELECT DISTINCT clause?
Your results will contain all the unique pairs of those two columns
When do we use SELECT DISTINCT?
You’ll occasionally want to look at only the unique values in a particular column. You can do this using SELECT DISTINCT syntax
What does below code do?
SELECT CASE WHEN state IN (‘CA’, ‘OR’, ‘WA’) THEN ‘West Coast’
WHEN state = ‘TX’ THEN ‘Texas’
ELSE ‘Other’ END AS arbitrary_regional_designation,
COUNT(1) AS players
FROM benn.college_football_players
WHERE weight >= 300
GROUP BY 1
Counts the number of 300lb+ players for each of the following regions: West Coast (CA, OR, WA), Texas, and Other (everywhere else).
What does below code do?
SELECT state,
COUNT(CASE WHEN year = ‘FR’ THEN 1 ELSE NULL END) AS fr_count,
COUNT(CASE WHEN year = ‘SO’ THEN 1 ELSE NULL END) AS so_count,
COUNT(CASE WHEN year = ‘JR’ THEN 1 ELSE NULL END) AS jr_count,
COUNT(CASE WHEN year = ‘SR’ THEN 1 ELSE NULL END) AS sr_count,
COUNT(1) AS total_players
FROM benn.college_football_players
GROUP BY state
ORDER BY total_players DESC
displays the number of players in each state, with FR, SO, JR, and SR players in separate columns and another column for the total number of players. Order results such that states with the most players come first.
What does below code do?
SELECT CASE WHEN year = ‘FR’ THEN ‘FR’
WHEN year = ‘SO’ THEN ‘SO’
WHEN year = ‘JR’ THEN ‘JR’
WHEN year = ‘SR’ THEN ‘SR’
ELSE ‘No Year Data’ END AS year_group,
COUNT(1) AS count
FROM benn.college_football_players
GROUP BY 1
It counts the number of players in each year_group
The ELSE statement is optional, and provides a way to capture values not specified in the WHEN/THEN statements. True/False?
True
How do we end a CASE statement?
Every CASE statement MUST end with the END statement.
The CASE statement is followed by at least one pair of WHEN and THEN statements. True/False?
True
What is SQL’s way of handling if/then logic?
CASE
We use HAVING clause with aggregation, because WHERE doesn’t allow you to filter on aggregate columns. True/False?
True