CHAPTER 7: Grouping and Summarizing Data Flashcards

1
Q

Q: What are aggregate functions?

A

A: Aggregate functions operate on sets of values from multiple rows to summarize data, such as calculating totals, averages, and counts.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Q: List five common aggregate functions in SQL.

A

COUNT: Counts rows or non-NULL values.
SUM: Adds numeric or money values.
AVG: Computes the average of numeric or money values.
MIN: Finds the lowest value (works on numeric, character, and date).
MAX: Finds the highest value (works on numeric, character, and date).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Q: Which data types are not supported by aggregate functions?

A

A: TEXT, NTEXT, IMAGE, and certain special types like HierarchyID and spatial data (except for COUNT).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Q: Does COUNT(*) ignore NULL values?

A

A: No, COUNT(*) includes all rows, including those with NULLs. Other aggregate functions ignore NULLs.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Q: Write a query to count all rows and calculate the maximum, minimum, sum, and average of a column.

A

SELECT
COUNT(*) AS CountOfRows,
MAX(TotalDue) AS MaxTotal,
MIN(TotalDue) AS MinTotal,
SUM(TotalDue) AS SumOfTotal,
AVG(TotalDue) AS AvgTotal
FROM Sales.SalesOrderHeader;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Q: How can you count all rows, non-NULL values, and distinct values in a column?

A

SELECT
COUNT(*) AS CountOfRows,
COUNT(Color) AS CountOfColor,
COUNT(DISTINCT Color) AS CountOfDistinctColor
FROM Production.Product;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Q: Can MIN and MAX be applied to string and date columns?

A

A: Yes, they can. Example:

SELECT
MIN(Name) AS MinName,
MAX(Name) AS MaxName,
MIN(SellStartDate) AS MinSellStartDate
FROM Production.Product;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Q: How do aggregate functions handle NULL values?

A

A: All aggregate functions (except COUNT(*)) ignore NULL values.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Q: What happens when aggregate functions are used in a query?

A

A: The query becomes an aggregate query, and special rules (like grouping) may apply when combining aggregated and non-aggregated columns.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Q: Why use COUNT(DISTINCT <column>)?</column>

A

A: To count unique, non-NULL values in a column, useful for identifying distinct data patterns.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Q: What is the purpose of the GROUP BY clause in SQL?

A

A: The GROUP BY clause is used to group rows that have the same values in specified columns, allowing aggregate functions to operate on these groups instead of the entire result set.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Q: What happens if you include non-aggregated columns in the SELECT list without adding them to the GROUP BY clause?

A

A: The query will produce an error because SQL Server cannot determine how to handle the non-aggregated columns.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Q: Write the syntax for a query that groups data by a column and calculates an aggregate function.

A

SELECT <aggregate_function>(<column>), <grouping_column>
FROM <table>
GROUP BY <grouping_column>;</grouping_column></grouping_column></column></aggregate_function>

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Q: Write a query to calculate the total order value (TotalDue) for each customer (CustomerID).

A

SELECT CustomerID, SUM(TotalDue) AS TotalPerCustomer
FROM Sales.SalesOrderHeader
GROUP BY CustomerID;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Q: Write a query to find the average order value (TotalDue) for each sales territory (TerritoryID).

A

SELECT TerritoryID, AVG(TotalDue) AS AveragePerTerritory
FROM Sales.SalesOrderHeader
GROUP BY TerritoryID;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Q: What happens if you group by a column while selecting an expression derived from that column?

A

A: The grouping will not align with the expression in the SELECT list, leading to incorrect results. For example:

SELECT COUNT(*) AS CountOfOrders, YEAR(OrderDate) AS OrderYear
FROM Sales.SalesOrderHeader
GROUP BY OrderDate; – Incorrect grouping

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Q: How do you group by an expression correctly?

A

A: Include the exact expression from the SELECT list in the GROUP BY clause:

SELECT COUNT(*) AS CountOfOrders, YEAR(OrderDate) AS OrderYear
FROM Sales.SalesOrderHeader
GROUP BY YEAR(OrderDate);

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Q: Why does grouping by OrderDate instead of YEAR(OrderDate) cause multiple rows with the same year?

A

A: Grouping by OrderDate treats each date as a unique group, while grouping by YEAR(OrderDate) consolidates all rows from the same year into one group.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Q: What should you do to avoid errors when using GROUP BY?

A

Include all non-aggregated columns from the SELECT list in the GROUP BY clause.

Use expressions in the GROUP BY clause if they appear in the SELECT list.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

Q: What happens if you group by columns or expressions you don’t need in the results?

A

A: The GROUP BY clause will unnecessarily fragment the groups, leading to incorrect or overly granular results.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

Q: What happens if you use a non-aggregated column in the ORDER BY clause without including it in the GROUP BY clause?

A

A: The query will produce an error because every column in the ORDER BY clause must either:
Be part of an aggregate expression, or
Be included in the GROUP BY clause.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

Q: What is the correct way to sort results by a non-aggregated column in an aggregate query?

A

A: Include the non-aggregated column in both the GROUP BY clause and the ORDER BY clause.
Example:

SELECT CustomerID, SUM(TotalDue) AS TotalPerCustomer
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
ORDER BY CustomerID;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

Q: Can you use aggregate expressions in the ORDER BY clause even if they are not included in the SELECT list?

A

A: Yes, you can use aggregate expressions directly in the ORDER BY clause without listing them in the SELECT list.
Example:

SELECT CustomerID, SUM(TotalDue) AS TotalPerCustomer
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
ORDER BY MAX(TotalDue) DESC;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

Q: How can you sort the results using an alias for an aggregate expression?

A

A: You can use the alias defined in the SELECT clause to sort the results.
Example:

SELECT CustomerID, SUM(TotalDue) AS TotalPerCustomer
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
ORDER BY TotalPerCustomer DESC;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

Q: Write a query to display the total order value (SUM(TotalDue)) per customer, sorted by the total value in descending order.

A

SELECT CustomerID, SUM(TotalDue) AS TotalPerCustomer
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
ORDER BY TotalPerCustomer DESC;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

Q: Write a query to show the total number of orders per sales territory, sorted by the maximum order value per territory in ascending order.

A

SELECT TerritoryID, COUNT(*) AS OrderCount
FROM Sales.SalesOrderHeader
GROUP BY TerritoryID
ORDER BY MAX(TotalDue) ASC;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

Q: What should you verify before using a column in the ORDER BY clause of an aggregate query?

A

A: Ensure that the column is either:
Part of an aggregate expression, or
Included in the GROUP BY clause.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

Q: What happens if you sort by an alias that isn’t defined in the SELECT list?

A

A: The query will fail because the alias must be defined in the SELECT clause to be referenced in the ORDER BY clause.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

Q: What is a shortcut for sorting by an aggregate expression already defined in the SELECT clause?

A

A: Use the alias of the aggregate expression instead of repeating the full formula.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q

Q: What is the purpose of the WHERE clause in an aggregate query?

A

A: The WHERE clause is used to eliminate rows before the groupings and aggregates are applied.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
31
Q

Q: Can aggregate expressions be used in the WHERE clause?

A

A: No, aggregate expressions are not allowed in the WHERE clause.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
32
Q

Q: Write an example of a WHERE clause in an aggregate query.

A

SELECT CustomerID, SUM(TotalDue) AS TotalPerCustomer
FROM Sales.SalesOrderHeader
WHERE TerritoryID IN (5, 6)
GROUP BY CustomerID;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
33
Q

Q: What does the HAVING clause do in an aggregate query?

A

A: The HAVING clause filters rows after the groupings and aggregates are applied.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
34
Q

Q: Can the HAVING clause contain aggregate expressions not included in the SELECT list?

A

A: Yes, the HAVING clause can include aggregate expressions that are not in the SELECT list.

35
Q

Q: Provide an example of a query using the HAVING clause.

A

SELECT CustomerID, SUM(TotalDue) AS TotalPerCustomer
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
HAVING SUM(TotalDue) > 5000;

36
Q

Q: What is the difference between filtering with WHERE and HAVING?

A

A: The WHERE clause filters rows before grouping, while the HAVING clause filters groups after aggregation.

37
Q

Q: Can nonaggregated columns in the HAVING clause be used if they are not in the GROUP BY clause?

A

A: No, nonaggregated columns in the HAVING clause must appear in the GROUP BY clause.

38
Q

Q: Why should criteria involving nonaggregate columns usually go in the WHERE clause instead of HAVING?

A

A: Filtering rows in the WHERE clause is more efficient since it eliminates rows before grouping and aggregation.

39
Q

Q: What error occurs if a nonaggregated column in HAVING does not appear in the GROUP BY clause?

A

A: The query will result in an error because all nonaggregated columns in HAVING must be in the GROUP BY clause.

40
Q

Q: How does SQL Server optimize queries with HAVING that include nonaggregate columns?

A

A: The database engine may move such criteria to the WHERE clause for better performance.

41
Q

Q: Write a query that uses both WHERE and HAVING clauses.

A

SELECT CustomerID, SUM(TotalDue) AS TotalPerCustomer
FROM Sales.SalesOrderHeader
WHERE TerritoryID IN (5, 6)
GROUP BY CustomerID
HAVING SUM(TotalDue) > 5000;

42
Q

Q: In what order are the clauses in an aggregate query written?

A

SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY

43
Q

Q: In what order does the database engine process the clauses in an aggregate query?

A

FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY

44
Q

Q: When should you use the WHERE clause in an aggregate query?

A

A: Use the WHERE clause to filter rows before the groupings and aggregates are applied.

45
Q

Q: When should you use the HAVING clause in an aggregate query?

A

A: Use the HAVING clause to filter rows after the groupings and aggregates are applied, typically using aggregate expressions.

46
Q

Q: What is the purpose of the DISTINCT keyword?

A

A: The DISTINCT keyword eliminates duplicate rows from the results of a query.

47
Q

Q: How is DISTINCT different from GROUP BY when used to eliminate duplicates?

A

A: While both eliminate duplicates, DISTINCT explicitly removes duplicate rows, whereas GROUP BY groups rows by unique values. They often produce the same result, but DISTINCT is simpler for non-aggregate queries.

48
Q

Q: Write a query using DISTINCT to eliminate duplicate SalesOrderID values.

A

SELECT DISTINCT SalesOrderID
FROM Sales.SalesOrderDetail;

49
Q

Q: Write a query using GROUP BY to achieve the same result as DISTINCT.

A

SELECT SalesOrderID
FROM Sales.SalesOrderDetail
GROUP BY SalesOrderID;

50
Q

Q: Can DISTINCT be used inside aggregate functions?

A

A: Yes, DISTINCT can be used within aggregate functions to operate on unique values.

51
Q

Q: How does COUNT(DISTINCT column) differ from COUNT(column)?

A

COUNT(column) counts all non-NULL values in the column.
COUNT(DISTINCT column) counts only unique non-NULL values.

52
Q

Q: Write a query to count the total rows, non-NULL SalesPersonID values, and distinct SalesPersonID values.

A

SELECT COUNT(*) AS CountOfRows,
COUNT(SalesPersonID) AS CountOfSalesPeople,
COUNT(DISTINCT SalesPersonID) AS CountOfUniqueSalesPeople
FROM Sales.SalesOrderHeader;

53
Q

Q: Can you use DISTINCT with aggregate functions like SUM?

A

A: Yes, DISTINCT can be used with SUM to calculate the sum of unique values in a column.

54
Q

Q: Write a query demonstrating SUM with and without DISTINCT.

A

SELECT SUM(TotalDue) AS TotalOfAllOrders,
SUM(DISTINCT TotalDue) AS TotalOfDistinctTotalDue
FROM Sales.SalesOrderHeader;

55
Q

Q: What is the purpose of the APPROX_COUNT_DISTINCT function?

A

A: APPROX_COUNT_DISTINCT is used for faster but less accurate counts of distinct values, useful in scenarios involving large datasets like web traffic logging.

56
Q

Q: How accurate is APPROX_COUNT_DISTINCT compared to COUNT(DISTINCT column)?

A

A: APPROX_COUNT_DISTINCT provides about a 2% discrepancy but is much faster for large datasets.

57
Q

Q: Can you use aggregate functions with multiple tables?

A

A: Yes, aggregate functions can be used with multiple tables, and the same rules for grouping and filtering apply.

58
Q

Q: When writing an aggregate query with joins, where must the non-aggregated columns appear?

A

A: Non-aggregated columns must appear in the GROUP BY clause.

59
Q

Q: How do INNER JOIN and LEFT OUTER JOIN affect aggregate queries differently?

A

INNER JOIN: Includes only rows with matching values in both tables.
LEFT OUTER JOIN: Includes all rows from the left table, with NULL for non-matching rows from the right table.

60
Q

Q: What issues can arise when using LEFT OUTER JOIN in aggregate queries?

A

COUNT(*) can incorrectly return 1 for rows with no matches.
Aggregate functions like SUM can return NULL for unmatched rows.

61
Q

Q: How can you handle NULL values in aggregate queries with LEFT OUTER JOIN?

A

Use COUNT(column) instead of COUNT(*) to correctly count non-NULL matches.
Use COALESCE(column, 0) to replace NULL with 0 for calculations.

62
Q

Q: Write an aggregate query to count orders and calculate the total due for each customer using INNER JOIN.

A

SELECT c.CustomerID, c.AccountNumber,
COUNT(*) AS CountOfOrders,
SUM(TotalDue) AS SumOfTotalDue
FROM Sales.Customer AS c
INNER JOIN Sales.SalesOrderHeader AS s
ON c.CustomerID = s.CustomerID
GROUP BY c.CustomerID, c.AccountNumber
ORDER BY c.CustomerID;

63
Q

Q: Write the same query using LEFT OUTER JOIN to include customers with no orders.

A

SELECT c.CustomerID, c.AccountNumber,
COUNT(*) AS CountOfOrders,
SUM(TotalDue) AS SumOfTotalDue
FROM Sales.Customer AS c
LEFT OUTER JOIN Sales.SalesOrderHeader AS s
ON c.CustomerID = s.CustomerID
GROUP BY c.CustomerID, c.AccountNumber
ORDER BY c.CustomerID;

64
Q

Q: How can you modify the LEFT OUTER JOIN query to correctly handle customers with no orders?

A

SELECT c.CustomerID, c.AccountNumber,
COUNT(s.SalesOrderID) AS CountOfOrders,
SUM(COALESCE(TotalDue, 0)) AS SumOfTotalDue
FROM Sales.Customer AS c
LEFT OUTER JOIN Sales.SalesOrderHeader AS s
ON c.CustomerID = s.CustomerID
GROUP BY c.CustomerID, c.AccountNumber
ORDER BY c.CustomerID;

65
Q

Q: What does COALESCE(column, 0) do in aggregate queries?

A

A: Replaces NULL values in a column with 0 for calculations.

66
Q

Q: Why use COUNT(column) instead of COUNT(*) in LEFT OUTER JOIN aggregate queries?

A

A: To ensure that only non-NULL rows are counted, avoiding incorrect counts for unmatched rows.

67
Q

Q: Can you use both WHERE and HAVING clauses in aggregate queries with joins?

A

A: Yes, use WHERE to filter rows before grouping and HAVING to filter groups after aggregation.

68
Q

Q: How do aggregate functions handle NULL values?

A

A: Aggregate functions (like SUM, AVG, and COUNT) ignore NULL values, except for COUNT(*).

69
Q

Q: What is the impact of NULL values on the AVG function?

A

A: The AVG function ignores NULL values, which can lead to an average based only on non-NULL rows.

70
Q

Q: How can you include NULL values in the AVG calculation?

A

A: Use the ISNULL or COALESCE function to replace NULL values with 0 or another default value.

Example:
AVG(ISNULL(DiscountAmt, 0)) AS AvgWithNullRows

71
Q

Q: What warning appears when aggregate functions encounter NULL values?

A

A: A message stating, “Null value is eliminated by an aggregate or other SET operation.”

72
Q

Q: How can you prevent the NULL warning in aggregate queries?

A

A: Use SET ANSI_WARNINGS OFF for the session, but note this disables warnings for other operations as well.

73
Q

Q: What does SET STATISTICS IO ON do?

A

A: Enables detailed output of disk and memory usage for queries, displayed in the Messages tab.

74
Q

Q: What is the most critical metric in Statistics IO for tuning query performance?

A

A: The number of logical reads, which indicates how many pages were read from memory.

75
Q

Q: Why might two queries accessing the same table have different logical reads?

A

A: Logical reads depend on whether the query uses an index or performs a table scan.

76
Q

Q: Why do queries without a WHERE clause often have higher logical reads?

A

A: Without a filter, the database must scan the entire table or clustered index.

77
Q

Q: How does a clustered index affect query performance?

A

A: Queries often scan the entire clustered index if no nonclustered index is available for filtering or sorting.

78
Q

Q: What is a nonclustered index, and why is it useful?

A

A: A nonclustered index is a smaller structure that allows efficient lookups for specific columns, reducing logical reads.

79
Q

Q: Why does a query with fewer columns not always result in fewer logical reads?

A

A: SQL Server reads entire rows from pages, regardless of how many columns are selected.

80
Q

Q: What is a columnstore index?

A

A: Introduced in SQL Server 2012, it stores data by columns instead of rows, enabling efficient column-based data access.

81
Q

Q: What is In-Memory OLTP, and how does it enhance performance?

A

A: Introduced in SQL Server 2014, it allows entire tables to be loaded into memory for faster processing.

82
Q

Q: Write a query to calculate the average discount while including NULL rows as 0.

A

SELECT AVG(ISNULL(DiscountAmt, 0)) AS AvgDiscountWithNulls
FROM Sales.SalesOrderHeader;

83
Q

Q: How do you enable Statistics IO in a session?

A

SET STATISTICS IO ON;

84
Q

Q: Write a query to determine logical reads for a table scan vs. index scan.

A

– Table scan
SELECT * FROM Sales.SalesOrderHeader;

– Index scan
SELECT SalesOrderID FROM Sales.SalesOrderHeader;