SQL Aggregate Functions Flashcards
what is an aggregate?
Calculations performed on multiple rows of a table are called
Aggregate functions combine multiple rows together to form a single value of more meaningful information.
COUNT function
takes the name of a column as an argument and counts the number of non-empty values in that column.
When using the SQL COUNT() function for a column, does it include duplicate values?
Yes. it will include duplicate values by default. It essentially counts all rows for which there is a value in the column.
If you wanted to count only the unique values in a column, then you can utilize the DISTINCT clause within the COUNT() function.
ex:
SELECT COUNT(DISTINCT category)
FROM fake_apps;
sum() function
takes the name of a column as an argument and returns the sum of all the values in that column.
MAX() and MIN() functions
take the name of a column as an argument and return the highest and lowest values in a column, respectively. the highest and lowest values in a column, respectively.
If multiple rows have the minimum or maximum value, which one is returned when using MAX/MIN?
Typically, when you have more than one row that contains the minimum or maximum value in a column, the topmost row containing that value will be returned in the result.
How can you make the output show all rows that contain the maximum price, instead of just top one?
You would need to use subqueries to do this. For example:
SELECT name,price FROM fake_apps
WHERE price = (SELECT max(price) FROM fake_apps);
So rather than SELECT-ing rows from the table where the price is exactly some number we specify, we compare price to the result of another query, in this case a MAX query.
avg() function
takes a column name as an argument and returns the average value for that column.
In SQL, how can we get the average of only the unique values of a column?
To run the AVG() function on a column such that it only averages the unique values in the column, we could use the DISTINCT clause right before the column name.
ex:
SELECT AVG(DISTINCT price)
FROM fake_apps;
round() function
ROUND() function takes two arguments inside the parenthesis:
a column name an integer
It rounds the values in the column to the number of decimal places specified by the integer.
When using the GROUP BY clause, do we always have to group by one of the selected columns listed after SELECT?
No, you can GROUP BY a column that was not included in the SELECT statement.
For example, this query does not list the price column in the SELECT, but it does group the data by that column.
SELECT name, downloads
FROM fake_apps
GROUP BY price;
However, usually we do include the grouped by column in the SELECT for the sake of clarity, so that it’s easier to see what rows belong to which group.
Do column references have to follow the order the columns are listed in the SELECT?
No, once you list the columns after the SELECT, they can be referenced by the order they appeared, starting from 1 for the first listed column.
You are not limited to referencing them in the exact order they were listed, like
GROUP BY 1, 2, 3
You can freely use the references in any order, like you would normally without using references.
GROUP BY 3, 1, 2
grouping by column references
SQL lets us use column reference(s) in our GROUP BY that will make our lives easier.
1 is the first column selected 2 is the second column selected 3 is the third column selected
and so on.
having vs where
When we want to limit the results of a query based on values of the individual rows, use WHERE.
When we want to limit the results of a query based on an aggregate property, use HAVING
HAVING
limits the results of a query based on an aggregate property.
HAVING statement always comes after GROUP BY, but before ORDER BY and LIMIT.