Chapter 5 Grouping and Windowing Flashcards
What is a data analysis function?
A data analysis function is a function applied to a set of rows and it returns a single value, e.g. the SUM aggregate function.
When does a query become a “grouped query”?
When you use an aggregate function, a GROUP BY clause, or both.
What happens when you invoke a group function, but don’t include an explicit GROUP BY clause?
All rows are arranged in one group and then the group function operates on the rows in that single group.
What happens when you invoke a group function and include an explicit GROUP BY clause?
Rows are arrange in one or more groups according to the grouping set of expressions and the group function operates on each group.
What is the HAVING clause?
The HAVING clause uses a predicate but evaluates the predicate per group as opposed to per row. This means that you can refer to aggregate computations because the data has already been grouped.
What is the difference between the HAVING and WHERE clauses?
WHERE is evaluated per row; HAVING is evaluated per group.
What are the general set functions supported by SQL?
(1) COUNT, (2) SUM, (3) AVG, (4) MIN, and (5) MAX. General set functions are applied to an expression and ignore NULLs.
What is the difference between COUNT(col1) and COUNT(*)?
The former ignores NULLs and therefore the counts are less than or equal to those produced by the latter.
When using general set functions, how can you work with distinct occurrences?
You can specify a DISTINCT clause before the expression, e.g. COUNT(DISTINCT shippeddate).
When using a GROUP BY clause, what must be done to expressions in the clauses that follow, namely, HAVING, SELECT, and ORDER BY?
All expressions that appear in those clauses must guarantee a single result value per group. There’s no problem referring directly to elements that appear in the GROUP BY clause because each of those already return one distinct value per group. For other elements from the underlying table, you must apply an aggregate function.
What are the clauses that allow you to define multiple grouping sets?
(1) GROUPING SETS, (2) CUBE, (3) ROLLUP. You use these in the GROUP BY clause.
What is the GROUPING SETS clause and how do you use it?
You can use the GROUPING SETS clause to list all grouping sets that you want to define in the query. You list the grouping sets separated by commas within the outer pair of parenthesis. You use an inner pair of parenthesis to enclose each grouping set. If you don’t use inner parenthesis, each individual element is considered a separate grouping set. e.g. GROUP BY GROUPING SETS ( (shupperid, YEAR(shippeddate)), (shipperid), (YEAR(shippeddate)), ());
Can a grouping set be empty?
Yes. This results in one group with all rows for computation of grand aggregates.
When using grouping sets, what happens in the output when an element isn’t part of the grouping set?
NULLs are used as placeholders in rows where an element isn’t part of the grouping set.
What is the CUBE clause and how do you use it?
The CUBE clause accepts a list of expressions as inputs and defines all possible grouping sets that can be generated from the inputs - including the empty grouping set., e.g. GROUP BY CUBE (shipperid, YEAR(shippeddate)); This produces 4 grouping sets: (1) shipperid, (2) YEAR(shippeddate), (3) shipperid, YEAR(shippeddate), (4) (Empty)
What is the ROLLUP clause and how do you use it?
The ROLLUP clause accepts a list of expressions as inputs and defines a hierarchy formed by the input elements such as a location hierarchy (country, region, city), .e.g GROUP BY ROLLUP (country, region, city) produces 4 grouping sets: (1) country, region, city, (2) country, region, (3) country, (4) (Empty).
How can you tell whether a NULL in the grouped results represents a placeholder or an original NULL from the table?
T-SQL provides 2 functions: GROUPING and GROUPING_ID.
What is the GROUPING function and how does it work?
Tells whether a NULL in the grouped results represents a placeholder or an original NULL. GROUPING accepts a single element as input and returns 0 when the element is part of the grouping set and 1 when it isn’t, e.g. GROUPING(country) => 0/1
What is the GROUPING_ID function and how does it work?
GROUPING_ID accepts the list of grouped columns as inputs and returns an integer representing a bitmap. The rightmost bit represents the rightmost input. The bit is 0 when the respective element is part of the grouping set and 1 when it isn’t. The result integer is the sum of the values representing elements that are not part of the grouping set because their bits are turned on., e.g. GROUPING_ID(country, region, city), 7 would represent the empty grouping set - none of the 3 elements is part of the grouping set. Therefore, the respective bits (1, 2, 4 => 7) are turned on.
Can you specify multiple GROUPING SETS, CUBE, and ROLLUP clauses in the GROUP BY clause?
Yes; however, by doing so, you achieve a multiplication effect. For example, CUBE(a,b,c) yields 8 grouping sets. ROLLUP(x,y,z) defines 4 grouping sets. By specifying a column between them as in CUBE(a,b,c), ROLLUP(x,y,z) you multiply them and get 32 grouping sets.
What are the clauses that you can use to define multiple grouping sets in the same query?
GROUPING SETS, CUBE, and ROLLUP.
What does COUNT(col1) return when all inputs are NULLs?
It returns 0 when all inputs are NULLs whereas other general set functions like MIN, MAX, SUM, and AVG return NULL in such as case.
Can the GROUPING and GROUPING_ID functions be used to sort data based on grouping set association - that is, first detail and then aggregates?
These functions can be used for sorting data because they return a 0 bit for a detail element and a 1 bit for an aggregated element. So, if you want to see detail first, sort by the result of the function in ascending order.
What is pivoting data?
Pivoting is a technique that groups and aggregates data, transitioning it from a state of rows to a state of columns.
What are the three things you need to identify in all pivot queries?
(1) What do you want to see on rows? This element is known as the “on rows” or “grouping element” (2) What do you want to see on columns? This element is known as the “on cols” or “spreading element” (3) What do you want to see in the intersection of each distinct row and column value? This element is known as the “data” or “aggregation element”.
What is the general form for a pivot query?
WITH PivotData AS ( SELECT < grouping column >, < spreading column >, < aggregation column > FROM < source table > )
SELECT < select list >
FROM PivotData
PIVOT ( < aggregate function >(< aggregation column >)
FOR < spreading column > IN () ) as P;
What are the steps to a pivot query?
(1) Define a table expression (like the one named PivotData) that returns the 3 elements necessary for pivoting. (2) Issue an outer query against the table expression and apply the PIVOT operator to that table expression. The PIVOT operator returns a table result. Assign an alias to the table result (P). (3) Specify the aggregate function for the PIVOT operator (e.g. SUM) (4) Then specify the FOR clause followed by the spreading column. (5) Then specify the IN clause followed by the list of distinct values that appear in the spreading element, separated by commas.
What is an example of the pivot query?
WITH PivotData AS ( SELECT custid, -- grouping column shipperid, -- spreading column freight, -- aggregation column FROM Sales.Orders ) SELECT custid, [1], [2], [3] FROM PivotData PIVOT(SUM(freight) FOR shipperid IN ([1], [2], [3])) AS P;