CHAPTER 7: Grouping and Summarizing Data Flashcards
Q: What are aggregate functions?
A: Aggregate functions operate on sets of values from multiple rows to summarize data, such as calculating totals, averages, and counts.
Q: List five common aggregate functions in SQL.
COUNT: Counts rows or non-NULL values.
SUM: Adds numeric or money values.
AVG: Computes the average of numeric or money values.
MIN: Finds the lowest value (works on numeric, character, and date).
MAX: Finds the highest value (works on numeric, character, and date).
Q: Which data types are not supported by aggregate functions?
A: TEXT, NTEXT, IMAGE, and certain special types like HierarchyID and spatial data (except for COUNT).
Q: Does COUNT(*) ignore NULL values?
A: No, COUNT(*) includes all rows, including those with NULLs. Other aggregate functions ignore NULLs.
Q: Write a query to count all rows and calculate the maximum, minimum, sum, and average of a column.
SELECT
COUNT(*) AS CountOfRows,
MAX(TotalDue) AS MaxTotal,
MIN(TotalDue) AS MinTotal,
SUM(TotalDue) AS SumOfTotal,
AVG(TotalDue) AS AvgTotal
FROM Sales.SalesOrderHeader;
Q: How can you count all rows, non-NULL values, and distinct values in a column?
SELECT
COUNT(*) AS CountOfRows,
COUNT(Color) AS CountOfColor,
COUNT(DISTINCT Color) AS CountOfDistinctColor
FROM Production.Product;
Q: Can MIN and MAX be applied to string and date columns?
A: Yes, they can. Example:
SELECT
MIN(Name) AS MinName,
MAX(Name) AS MaxName,
MIN(SellStartDate) AS MinSellStartDate
FROM Production.Product;
Q: How do aggregate functions handle NULL values?
A: All aggregate functions (except COUNT(*)) ignore NULL values.
Q: What happens when aggregate functions are used in a query?
A: The query becomes an aggregate query, and special rules (like grouping) may apply when combining aggregated and non-aggregated columns.
Q: Why use COUNT(DISTINCT <column>)?</column>
A: To count unique, non-NULL values in a column, useful for identifying distinct data patterns.
Q: What is the purpose of the GROUP BY clause in SQL?
A: The GROUP BY clause is used to group rows that have the same values in specified columns, allowing aggregate functions to operate on these groups instead of the entire result set.
Q: What happens if you include non-aggregated columns in the SELECT list without adding them to the GROUP BY clause?
A: The query will produce an error because SQL Server cannot determine how to handle the non-aggregated columns.
Q: Write the syntax for a query that groups data by a column and calculates an aggregate function.
SELECT <aggregate_function>(<column>), <grouping_column>
FROM <table>
GROUP BY <grouping_column>;</grouping_column></grouping_column></column></aggregate_function>
Q: Write a query to calculate the total order value (TotalDue) for each customer (CustomerID).
SELECT CustomerID, SUM(TotalDue) AS TotalPerCustomer
FROM Sales.SalesOrderHeader
GROUP BY CustomerID;
Q: Write a query to find the average order value (TotalDue) for each sales territory (TerritoryID).
SELECT TerritoryID, AVG(TotalDue) AS AveragePerTerritory
FROM Sales.SalesOrderHeader
GROUP BY TerritoryID;
Q: What happens if you group by a column while selecting an expression derived from that column?
A: The grouping will not align with the expression in the SELECT list, leading to incorrect results. For example:
SELECT COUNT(*) AS CountOfOrders, YEAR(OrderDate) AS OrderYear
FROM Sales.SalesOrderHeader
GROUP BY OrderDate; – Incorrect grouping
Q: How do you group by an expression correctly?
A: Include the exact expression from the SELECT list in the GROUP BY clause:
SELECT COUNT(*) AS CountOfOrders, YEAR(OrderDate) AS OrderYear
FROM Sales.SalesOrderHeader
GROUP BY YEAR(OrderDate);
Q: Why does grouping by OrderDate instead of YEAR(OrderDate) cause multiple rows with the same year?
A: Grouping by OrderDate treats each date as a unique group, while grouping by YEAR(OrderDate) consolidates all rows from the same year into one group.
Q: What should you do to avoid errors when using GROUP BY?
Include all non-aggregated columns from the SELECT list in the GROUP BY clause.
Use expressions in the GROUP BY clause if they appear in the SELECT list.
Q: What happens if you group by columns or expressions you don’t need in the results?
A: The GROUP BY clause will unnecessarily fragment the groups, leading to incorrect or overly granular results.
Q: What happens if you use a non-aggregated column in the ORDER BY clause without including it in the GROUP BY clause?
A: The query will produce an error because every column in the ORDER BY clause must either:
Be part of an aggregate expression, or
Be included in the GROUP BY clause.
Q: What is the correct way to sort results by a non-aggregated column in an aggregate query?
A: Include the non-aggregated column in both the GROUP BY clause and the ORDER BY clause.
Example:
SELECT CustomerID, SUM(TotalDue) AS TotalPerCustomer
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
ORDER BY CustomerID;
Q: Can you use aggregate expressions in the ORDER BY clause even if they are not included in the SELECT list?
A: Yes, you can use aggregate expressions directly in the ORDER BY clause without listing them in the SELECT list.
Example:
SELECT CustomerID, SUM(TotalDue) AS TotalPerCustomer
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
ORDER BY MAX(TotalDue) DESC;
Q: How can you sort the results using an alias for an aggregate expression?
A: You can use the alias defined in the SELECT clause to sort the results.
Example:
SELECT CustomerID, SUM(TotalDue) AS TotalPerCustomer
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
ORDER BY TotalPerCustomer DESC;
Q: Write a query to display the total order value (SUM(TotalDue)) per customer, sorted by the total value in descending order.
SELECT CustomerID, SUM(TotalDue) AS TotalPerCustomer
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
ORDER BY TotalPerCustomer DESC;
Q: Write a query to show the total number of orders per sales territory, sorted by the maximum order value per territory in ascending order.
SELECT TerritoryID, COUNT(*) AS OrderCount
FROM Sales.SalesOrderHeader
GROUP BY TerritoryID
ORDER BY MAX(TotalDue) ASC;
Q: What should you verify before using a column in the ORDER BY clause of an aggregate query?
A: Ensure that the column is either:
Part of an aggregate expression, or
Included in the GROUP BY clause.
Q: What happens if you sort by an alias that isn’t defined in the SELECT list?
A: The query will fail because the alias must be defined in the SELECT clause to be referenced in the ORDER BY clause.
Q: What is a shortcut for sorting by an aggregate expression already defined in the SELECT clause?
A: Use the alias of the aggregate expression instead of repeating the full formula.
Q: What is the purpose of the WHERE clause in an aggregate query?
A: The WHERE clause is used to eliminate rows before the groupings and aggregates are applied.
Q: Can aggregate expressions be used in the WHERE clause?
A: No, aggregate expressions are not allowed in the WHERE clause.
Q: Write an example of a WHERE clause in an aggregate query.
SELECT CustomerID, SUM(TotalDue) AS TotalPerCustomer
FROM Sales.SalesOrderHeader
WHERE TerritoryID IN (5, 6)
GROUP BY CustomerID;
Q: What does the HAVING clause do in an aggregate query?
A: The HAVING clause filters rows after the groupings and aggregates are applied.