Chapter 5 Grouping and Windowing Flashcards
What is a window function?
A window function operates on a set of rows (the window) that you define per function, and returns one result value per underlying row and function. The window is defined with respect to the current row.
What does COUNT(col1) return when all inputs are NULLs?
It returns 0 when all inputs are NULLs, whereas other general set functions like MIN, MAX, SUM, and AVG return NULL in such a case.
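A minimal sketch of this behavior, assuming a hypothetical inline table T whose only column, col1, is NULL in every row:

```sql
-- Hypothetical inline data: three rows, col1 is NULL in all of them.
SELECT
    COUNT(col1) AS cnt_col1,  -- 0: there are no non-NULL inputs to count
    SUM(col1)   AS sum_col1,  -- NULL
    AVG(col1)   AS avg_col1,  -- NULL
    MIN(col1)   AS min_col1,  -- NULL
    MAX(col1)   AS max_col1   -- NULL
FROM (VALUES (CAST(NULL AS INT)),
             (CAST(NULL AS INT)),
             (CAST(NULL AS INT))) AS T(col1);
```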
What is the general form for a pivot query?
WITH PivotData AS ( SELECT < grouping column >, < spreading column >, < aggregation column > FROM < source table > ) SELECT < select list > FROM PivotData PIVOT ( < aggregate function >(< aggregation column >) FOR < spreading column > IN (< distinct spreading values >) ) AS P;
What clause do you use to define a window?
You use an OVER clause to define a window. With empty parentheses, the OVER clause represents the entire underlying query’s result set, e.g. SUM(val) OVER () represents the grand total over all rows, which are treated as one partition. You can use a window partition clause to restrict the window, e.g. SUM(val) OVER (PARTITION BY custid) represents the current customer’s total.
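The two windows described above can be sketched side by side. The inline table O and its columns (custid, orderid, val) are hypothetical sample data:

```sql
-- Every detail row is preserved; each window function adds one value per row.
SELECT custid, orderid, val,
       SUM(val) OVER ()                    AS grand_total,  -- all rows, one partition
       SUM(val) OVER (PARTITION BY custid) AS cust_total    -- rows of the current customer
FROM (VALUES (1, 101, 10.00),
             (1, 102, 20.00),
             (2, 201,  5.00)) AS O(custid, orderid, val);
```

With this data, grand_total is 35.00 on every row, while cust_total is 30.00 for customer 1 and 5.00 for customer 2.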
Which window option ROWS or RANGE gets optimized better?
The ROWS option usually gets optimized much better than RANGE when using the same delimiters.
When using general set functions, how can you work with distinct occurrences?
You can specify a DISTINCT clause before the expression, e.g. COUNT(DISTINCT shippeddate).
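A small sketch of the difference, assuming hypothetical inline data with one repeated date and one NULL:

```sql
-- COUNT(shippeddate) counts non-NULL values;
-- COUNT(DISTINCT shippeddate) counts distinct non-NULL values.
SELECT COUNT(shippeddate)          AS shipped_orders,      -- 3
       COUNT(DISTINCT shippeddate) AS distinct_ship_dates  -- 2
FROM (VALUES (CAST('20240101' AS DATE)),
             (CAST('20240101' AS DATE)),
             (CAST('20240102' AS DATE)),
             (CAST(NULL AS DATE))) AS O(shippeddate);
```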
What are the clauses that window functions support?
Partitioning, ordering, and framing clauses.
What happens when you invoke a group function, but don’t include an explicit GROUP BY clause?
All rows are arranged in one group and then the group function operates on the rows in that single group.
What is a group function?
You use grouped queries to arrange the queried rows in groups and then the group functions are applied to each group. You get one result row per group - not per underlying row.
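To contrast an implicit single group with explicit groups, here is a sketch that uses a hypothetical table variable @Orders (its name, columns, and rows are made up for illustration):

```sql
DECLARE @Orders TABLE (orderid INT, shipperid INT, freight DECIMAL(10, 2));
INSERT INTO @Orders (orderid, shipperid, freight)
VALUES (1, 1, 10.00), (2, 1, 20.00), (3, 2, 5.00);

-- No GROUP BY: all rows form one group, so the result has one row.
SELECT COUNT(*) AS numorders, SUM(freight) AS totalfreight
FROM @Orders;

-- GROUP BY: one result row per shipperid, not per order.
SELECT shipperid, COUNT(*) AS numorders, SUM(freight) AS totalfreight
FROM @Orders
GROUP BY shipperid;
```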
What are the three types of window functions?
Aggregate, ranking, and offset.
Does the UNPIVOT operator filter out rows with NULLs in the value column?
Yes. The assumption is that those represent inapplicable cases. There’s no reason to keep a row for a certain customer-shipper pair if it’s not applicable.
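A sketch of the NULL filtering, assuming a hypothetical already-pivoted table variable with one freight column per shipper ID. The source columns are declared with the same type because UNPIVOT requires that of the columns it rotates:

```sql
DECLARE @PivotedFreight TABLE
    (custid INT, [1] DECIMAL(10, 2), [2] DECIMAL(10, 2));
INSERT INTO @PivotedFreight (custid, [1], [2])
VALUES (1, 10.00, NULL),   -- customer 1 never used shipper 2
       (2, NULL,  7.50);   -- customer 2 never used shipper 1

-- Four customer-shipper intersections exist, but only the two
-- non-NULL ones come back as rows.
SELECT custid, shipperid, freight
FROM @PivotedFreight
UNPIVOT (freight FOR shipperid IN ([1], [2])) AS U;
```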
What are the steps to a pivot query?
(1) Define a table expression (like the one named PivotData) that returns the 3 elements necessary for pivoting. (2) Issue an outer query against the table expression and apply the PIVOT operator to that table expression. The PIVOT operator returns a table result. Assign an alias to the table result (P). (3) Specify the aggregate function for the PIVOT operator (e.g. SUM) applied to the aggregation column. (4) Then specify the FOR clause followed by the spreading column. (5) Then specify the IN clause followed by the list of distinct values that appear in the spreading element, separated by commas.
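The steps above can be sketched as one concrete query. The table variable @Orders, its columns, and the shipper IDs 1 through 3 are assumptions made up for this example:

```sql
DECLARE @Orders TABLE
    (orderid INT, custid INT, shipperid INT, freight DECIMAL(10, 2));
INSERT INTO @Orders (orderid, custid, shipperid, freight)
VALUES (1, 1, 1, 10.00), (2, 1, 2, 20.00),
       (3, 2, 1,  5.00), (4, 2, 1,  7.00), (5, 2, 3, 9.00);

-- Step 1: return only the grouping (custid), spreading (shipperid),
-- and aggregation (freight) elements; orderid is deliberately left out.
WITH PivotData AS
(
    SELECT custid, shipperid, freight
    FROM @Orders
)
-- Steps 2-5: apply PIVOT, alias the result, name the aggregate,
-- the spreading column, and the distinct spreading values.
SELECT custid, [1], [2], [3]
FROM PivotData
PIVOT (SUM(freight) FOR shipperid IN ([1], [2], [3])) AS P;
```

Customer 2 gets 12.00 under [1], and any customer-shipper pair with no orders comes back as NULL.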
What is the default when you define a window with a window order clause, but no window frame clause?
The default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Therefore, unless you are after the special behavior you get from RANGE that includes peers, make sure you explicitly use the ROWS option.
What are the 4 different window offset functions available in T-SQL?
LAG, LEAD, FIRST_VALUE, and LAST_VALUE.
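A sketch that uses all four functions over hypothetical inline order data. The explicit ROWS frames on FIRST_VALUE and LAST_VALUE are a deliberate choice here; with the default frame, LAST_VALUE would only look as far as the current row and its peers:

```sql
SELECT custid, orderid, val,
       LAG(val)  OVER (PARTITION BY custid ORDER BY orderid) AS prev_val,  -- NULL on first row
       LEAD(val) OVER (PARTITION BY custid ORDER BY orderid) AS next_val,  -- NULL on last row
       FIRST_VALUE(val) OVER (PARTITION BY custid ORDER BY orderid
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS first_val,
       LAST_VALUE(val)  OVER (PARTITION BY custid ORDER BY orderid
                              ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS last_val
FROM (VALUES (1, 101, 10.00),
             (1, 102, 20.00),
             (1, 103, 30.00),
             (2, 201,  5.00)) AS O(custid, orderid, val);
```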
What are the clauses that you can use to define multiple grouping sets in the same query?
GROUPING SETS, CUBE, and ROLLUP.
What are the general set functions supported by SQL?
(1) COUNT, (2) SUM, (3) AVG, (4) MIN, and (5) MAX. General set functions are applied to an expression and ignore NULLs.
When does a query become a “grouped query”?
When you use an aggregate function, a GROUP BY clause, or both.
What is the difference between COUNT(col1) and COUNT(*)?
The former ignores NULLs and therefore the counts are less than or equal to those produced by the latter.
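A short sketch with hypothetical inline data in which one of three rows has a NULL in col1:

```sql
SELECT COUNT(*)    AS all_rows,      -- 3: counts rows
       COUNT(col1) AS non_null_col1  -- 2: the NULL is ignored
FROM (VALUES (10), (NULL), (30)) AS T(col1);
```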
What are the differences between ROWS and RANGE window frame extent?
RANGE is based on logical offsets from the current row’s sort key. ROWS is based on physical offsets in terms of number of rows from the current row. SQL Server 2012 has a very limited implementation of RANGE and supports only UNBOUNDED PRECEDING, UNBOUNDED FOLLOWING, and CURRENT ROW as delimiters. One difference between ROWS and RANGE when using the same delimiters is that the former doesn’t include tied rows in terms of the sort key and the latter does.
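The tie handling is easiest to see in a running total over a sort key that has duplicates. The inline data below is hypothetical; two rows share the same orderdate:

```sql
SELECT orderdate, val,
       -- Physical frame: stops at the current row, peers after it are excluded.
       SUM(val) OVER (ORDER BY orderdate
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)  AS run_rows,
       -- Logical frame: includes every row tied with the current sort key.
       SUM(val) OVER (ORDER BY orderdate
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS run_range,
       -- Order clause without a frame: behaves like the RANGE version.
       SUM(val) OVER (ORDER BY orderdate)                                AS run_default
FROM (VALUES ('20240101', 10),
             ('20240102', 20),
             ('20240102', 30),
             ('20240103', 40)) AS O(orderdate, val);
```

Both rows dated 20240102 get 60 in run_range and run_default. In run_rows one of them gets a smaller value, and which one is not deterministic because the ordering does not break the tie.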
What is the difference between the HAVING and WHERE clauses?
WHERE is evaluated per row; HAVING is evaluated per group.
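A sketch that uses both filters in one query, over hypothetical inline data:

```sql
SELECT shipperid, COUNT(*) AS numorders
FROM (VALUES (1, 10.00), (1, 60.00),
             (2, 70.00), (2, 80.00),
             (3,  5.00)) AS O(shipperid, freight)
WHERE freight > 50.00   -- row filter: applied before grouping
GROUP BY shipperid
HAVING COUNT(*) >= 2;   -- group filter: applied to each group after aggregation
```

Only shipper 2 is returned: WHERE leaves one qualifying row for shipper 1 and two for shipper 2, and HAVING then keeps groups with at least two rows.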
How can you tell whether a NULL in the grouped results represents a placeholder or an original NULL from the table?
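T-SQL provides 2 functions: GROUPING and GROUPING_ID. GROUPING accepts a single element and returns 0 when the element is part of the current grouping set (so a NULL is an original value) and 1 when it is not (so the NULL is a placeholder). GROUPING_ID accepts a list of elements and returns an integer bitmap in which each bit holds the GROUPING result for the corresponding element.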
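A sketch over hypothetical inline data in which one row carries an original NULL in shipyear, so both kinds of NULL show up in the output:

```sql
SELECT shipperid, shipyear,
       GROUPING(shipperid)              AS grp_shipper,  -- 1 = placeholder NULL
       GROUPING(shipyear)               AS grp_year,     -- 1 = placeholder NULL
       GROUPING_ID(shipperid, shipyear) AS grp_id,       -- bitmap of the two flags
       COUNT(*)                         AS numorders
FROM (VALUES (1, 2023),
             (1, NULL),   -- original NULL from the data
             (2, 2024)) AS O(shipperid, shipyear)
GROUP BY ROLLUP (shipperid, shipyear);
```

For shipper 1 there are two rows with a NULL shipyear: the one with grp_year = 0 comes from the data, and the one with grp_year = 1 is the subtotal row.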
What is pivoting data?
Pivoting is a technique that groups and aggregates data, transitioning it from a state of rows to a state of columns.
What frame extent clause would you use to include only the last three rows?
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW.
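A sketch of a three-row moving sum with this frame, over hypothetical inline data:

```sql
-- The frame is the current row plus up to two rows before it.
SELECT orderid, val,
       SUM(val) OVER (ORDER BY orderid
                      ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS sum_last3
FROM (VALUES (1, 10), (2, 20), (3, 30), (4, 40)) AS O(orderid, val);
```

The sum_last3 values are 10, 30, 60, and 90; the first two rows have fewer than three rows available in the frame.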
Can you specify multiple GROUPING SETS, CUBE, and ROLLUP clauses in the GROUP BY clause?
Yes; however, by doing so, you achieve a multiplication effect. For example, CUBE(a,b,c) defines 8 grouping sets and ROLLUP(x,y,z) defines 4 grouping sets. By specifying a comma between them, as in CUBE(a,b,c), ROLLUP(x,y,z), you multiply them and get 8 x 4 = 32 grouping sets.
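One way to check the arithmetic is to count the distinct GROUPING_ID values the query produces. This is a sketch over a hypothetical single-row table; the column names a, b, c, x, y, z simply mirror the example:

```sql
-- Each grouping set has a unique GROUPING_ID bitmap, so counting
-- distinct bitmaps counts grouping sets: 8 (CUBE) x 4 (ROLLUP) = 32.
SELECT COUNT(DISTINCT grp_id) AS num_grouping_sets
FROM (
    SELECT GROUPING_ID(a, b, c, x, y, z) AS grp_id
    FROM (VALUES (1, 1, 1, 1, 1, 1)) AS T(a, b, c, x, y, z)
    GROUP BY CUBE (a, b, c), ROLLUP (x, y, z)
) AS G;
```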
What is the CUBE clause and how do you use it?
The CUBE clause accepts a list of expressions as inputs and defines all possible grouping sets that can be generated from the inputs, including the empty grouping set, e.g. GROUP BY CUBE (shipperid, YEAR(shippeddate)); This produces 4 grouping sets: (1) (shipperid, YEAR(shippeddate)), (2) (shipperid), (3) (YEAR(shippeddate)), (4) () (the empty grouping set).
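A runnable sketch of that GROUP BY clause, using hypothetical inline data in place of a real orders table:

```sql
SELECT shipperid,
       YEAR(shippeddate) AS shipyear,
       COUNT(*)          AS numorders
FROM (VALUES (1, CAST('20230510' AS DATE)),
             (1, CAST('20240115' AS DATE)),
             (2, CAST('20240220' AS DATE))) AS O(shipperid, shippeddate)
GROUP BY CUBE (shipperid, YEAR(shippeddate));
```

The result combines rows per shipper and year, per shipper, per year, and one grand-total row in which both grouping columns are NULL.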
What is the NTILE function?
The NTILE function arranges the rows within the partition into a requested number of roughly equally sized tiles based on the specified ordering. You specify the desired number of tiles as input to the function, e.g. NTILE(100). If there are 830 rows in the result set, the base tile size is 830 / 100 = 8 with a remainder of 30. Because there is a remainder, the first 30 tiles are assigned an extra row (9 rows each).
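The same remainder rule can be sketched at a smaller scale: 10 hypothetical rows split into 3 tiles gives a base size of 3 with a remainder of 1, so only the first tile gets an extra row:

```sql
-- Tile 1 gets n = 1..4, tile 2 gets n = 5..7, tile 3 gets n = 8..10.
SELECT n, NTILE(3) OVER (ORDER BY n) AS tile
FROM (VALUES (1), (2), (3), (4), (5),
             (6), (7), (8), (9), (10)) AS T(n);
```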
Why should you prepare a table expression for the pivot operator to return only the three elements needed for the pivot task?
Because all elements besides the aggregation and spreading elements are implicitly used for grouping. By using a table expression, you control which columns are used for grouping.
What is the difference between pivot and unpivot?
Pivot rotates data from a state of rows to a state of column headers.
Unpivot rotates the data from a state of column headers to a state of row values.