Chapter 5 Grouping and Windowing Flashcards

Question

What are the three things you need to identify in all pivot queries?

Answer 1

(1) What do you want to see on rows? This element is known as the "on rows" or "grouping element". (2) What do you want to see on columns? This element is known as the "on cols" or "spreading element". (3) What do you want to see in the intersection of each distinct row and column value? This element is known as the "data" or "aggregation element".

Answer 2

```
WITH PivotData AS
(
  SELECT < grouping column >,
         < spreading column >,
         < aggregation column >
  FROM < source table >
)
SELECT < select list >
FROM PivotData
  PIVOT( < aggregate function >(< aggregation column >)
    FOR < spreading column > IN (< distinct spreading values >) ) AS P;
```

Answer 3

(1) Define a table expression (like the one named PivotData) that returns the 3 elements necessary for pivoting. (2) Issue an outer query against the table expression and apply the PIVOT operator to that table expression. The PIVOT operator returns a table result. Assign an alias to the table result (P). (3) Specify the aggregate function for the PIVOT operator (e.g. SUM). (4) Then specify the FOR clause followed by the spreading column. (5) Then specify the IN clause followed by the list of distinct values that appear in the spreading element, separated by commas.

Answer 4

```
WITH PivotData AS
(
  SELECT custid,     -- grouping column
         shipperid,  -- spreading column
         freight     -- aggregation column
  FROM Sales.Orders
)
SELECT custid, [1], [2], [3]
FROM PivotData
  PIVOT(SUM(freight) FOR shipperid IN ([1], [2], [3])) AS P;
```
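
To see what the three pivot elements do to the data, here is a minimal runnable sketch. It makes several assumptions: SQLite (through Python's sqlite3 module) stands in for SQL Server, the Orders table and its rows are made up, and because SQLite has no PIVOT operator the query uses the classic equivalent of grouping plus one CASE expression per spreading value. The output columns s1, s2, s3 play the role of [1], [2], [3].

```python
import sqlite3

# Hypothetical sample data: (custid, shipperid, freight)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (custid INT, shipperid INT, freight REAL)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(1, 1, 10.0), (1, 2, 5.0), (1, 1, 2.5), (2, 3, 7.0), (2, 1, 1.0)],
)

# custid    = grouping element (one result row per customer)
# shipperid = spreading element (one result column per shipper)
# freight   = aggregation element (summed at each intersection)
pivot_sql = """
SELECT custid,
       SUM(CASE WHEN shipperid = 1 THEN freight END) AS s1,
       SUM(CASE WHEN shipperid = 2 THEN freight END) AS s2,
       SUM(CASE WHEN shipperid = 3 THEN freight END) AS s3
FROM Orders
GROUP BY custid
ORDER BY custid
"""
rows = conn.execute(pivot_sql).fetchall()
for row in rows:
    print(row)
```

An intersection with no matching rows comes back as NULL (None in Python), which is also what PIVOT produces for a customer-shipper pair with no orders.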

Answer 5

Process of elimination - it's what's left from the queried table besides the aggregation and spreading elements.

Answer 6

Because all elements besides the aggregation and spreading elements are implicitly used for grouping. By using a table expression, you control which columns are used for grouping.

Answer 7

(1) The aggregation and spreading elements cannot be the results of expressions; they must be column names from the queried table. You can, however, apply expressions in the query defining the table expression, assign aliases to those expressions, and then use the aliases in PIVOT. (2) COUNT(*) isn't allowed as the aggregate function used by PIVOT; you must use COUNT(< col name >). There is a workaround using the table expression: return a constant column (e.g. 1 AS agg_col) and count that column. (3) PIVOT is limited to a single aggregate function. (4) The IN clause of the PIVOT operator accepts a static list of spreading values. It doesn't support a subquery as input. You need to know ahead of time what the distinct values are in the spreading column. You can use dynamic SQL to work around this.

Answer 8

Starting with pivoted data, when unpivoting you rotate the input data from a state of columns to a state of rows.

Answer 9

They are table operators similar to JOIN, etc.

Answer 10

(1) The set of source columns that you're unpivoting, (2) The name you want to assign to the target values column (e.g. "freight"), (3) The name you want to assign to the target names column ("shipperid").

Answer 11

SELECT < column list >, < names column >, < values column > FROM < source table > UNPIVOT( < values column > FOR < names column > IN ( < source columns > )) AS U;

Answer 12

SELECT custid, shipperid, freight FROM Sales.FreightTotals UNPIVOT(freight FOR shipperid IN ([1],[2],[3])) AS U;

Answer 13

Yes. The assumption is that those represent inapplicable cases. There's no reason to keep a row for a certain customer-shipper pair if it's not applicable.
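
The removal of NULL intersections can be tried out with a small sketch. The assumptions are that SQLite (via Python's sqlite3) stands in for SQL Server, that the FreightTotals table here is a made-up two-row version with columns s1, s2, s3 instead of [1], [2], [3], and that UNPIVOT is emulated with UNION ALL plus an explicit IS NOT NULL filter, since SQLite has no UNPIVOT operator.

```python
import sqlite3

# Hypothetical pivoted input: one row per customer, one column per shipper
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FreightTotals (custid INT, s1 REAL, s2 REAL, s3 REAL)")
conn.executemany(
    "INSERT INTO FreightTotals VALUES (?, ?, ?, ?)",
    [(1, 12.5, 5.0, None), (2, 1.0, None, 7.0)],
)

# Each branch turns one source column into rows; the outer filter drops
# the NULL intersections, mirroring what UNPIVOT does implicitly.
unpivot_sql = """
SELECT custid, shipperid, freight
FROM (
    SELECT custid, 1 AS shipperid, s1 AS freight FROM FreightTotals
    UNION ALL
    SELECT custid, 2, s2 FROM FreightTotals
    UNION ALL
    SELECT custid, 3, s3 FROM FreightTotals
) AS U
WHERE freight IS NOT NULL
ORDER BY custid, shipperid
"""
rows = conn.execute(unpivot_sql).fetchall()
for row in rows:
    print(row)
```

Two customers times three shippers would give six rows, but only the four non-NULL pairs survive. In this emulation shipperid is an integer literal, whereas the real UNPIVOT operator returns the names column as a character string.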

Answer 14

The names column is defined as NVARCHAR(128), and the values column is defined with the same type as the source columns that were unpivoted.

Answer 15

Pivot rotates data from a state of rows to a state of columns. Unpivot rotates the data from columns to rows.

Answer 16

You define a set of rows per function and then return one result value per each underlying row and function. The window is defined with respect to the current row.

Answer 17

You use grouped queries to arrange the queried rows in groups and then the group functions are applied to each group. You get one result row per group - not per underlying row.

Answer 18

Aggregate, ranking, and offset.

Answer 19

Windowed queries do not hide detail - they return a row for every underlying query's row. This means you can mix detail and aggregated elements in the same query.

Answer 20

You use an OVER clause to define a window. With empty parentheses, the OVER clause represents the entire underlying query's result set, e.g. SUM(val) OVER () represents the grand total over all rows - the whole result is treated as one partition. You can use a window partition clause to restrict the window, e.g. SUM(val) OVER (PARTITION BY custid) represents the current customer's total.
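
The two forms of the OVER clause can be compared side by side in a short sketch. It assumes SQLite 3.25 or later (via Python's sqlite3) as a stand-in for SQL Server, and a made-up OrderValues table with three rows.

```python
import sqlite3

# Hypothetical sample data: (custid, orderid, val)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderValues (custid INT, orderid INT, val INT)")
conn.executemany(
    "INSERT INTO OrderValues VALUES (?, ?, ?)",
    [(1, 101, 10), (1, 102, 20), (2, 103, 5)],
)

# OVER ()                    -> one window spanning every row
# OVER (PARTITION BY custid) -> one window per customer
sql = """
SELECT custid, orderid, val,
       SUM(val) OVER ()                    AS grand_total,
       SUM(val) OVER (PARTITION BY custid) AS cust_total
FROM OrderValues
ORDER BY orderid
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

Every detail row is still returned, and each one carries both totals next to its own val, so an expression such as val divided by cust_total can be computed in the same query.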

Answer 21

Framing is a filtering option available to window aggregate functions. You define ordering within the partition by using a window order clause, and then based on that order you can confine a frame of rows between two delimiters.

Answer 22

ROWS or RANGE

Answer 23

(1) UNBOUNDED PRECEDING or FOLLOWING, (2) CURRENT ROW, (3) < n > ROWS PRECEDING or FOLLOWING

Answer 24

ROWS UNBOUNDED PRECEDING

Answer 25

SELECT and ORDER BY - if you need to refer to the result of a window function in any clause evaluated before SELECT, you need to use a table expression (such as a CTE).
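
A common case is filtering on a row number, which needs the table-expression route because WHERE is evaluated before SELECT. The sketch below assumes SQLite 3.25 or later (via Python's sqlite3) as a stand-in for SQL Server; the OrderValues rows and the names Numbered and rn are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: (custid, orderid, val)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderValues (custid INT, orderid INT, val INT)")
conn.executemany(
    "INSERT INTO OrderValues VALUES (?, ?, ?)",
    [(1, 101, 10), (1, 102, 20), (2, 103, 5)],
)

# The window function is computed in the CTE's SELECT list and given an
# alias; the outer query can then filter on that alias in its WHERE clause.
sql = """
WITH Numbered AS (
    SELECT custid, orderid,
           ROW_NUMBER() OVER (PARTITION BY custid
                              ORDER BY orderid DESC) AS rn
    FROM OrderValues
)
SELECT custid, orderid
FROM Numbered
WHERE rn = 1
ORDER BY custid
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

The result is the most recent order (highest orderid) for each customer.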

Answer 26

ROWS BETWEEN 2 PRECEDING AND CURRENT ROW.

Answer 27

RANGE is based on logical offsets from the current row's sort key. ROWS is based on physical offsets in terms of number of rows from the current row. SQL Server 2012 has a very limited implementation of RANGE and supports only UNBOUNDED PRECEDING or FOLLOWING and CURRENT ROW as delimiters. One difference between ROWS and RANGE when using the same delimiters is that the former doesn't include tied rows in terms of the sort key and the latter does.

Answer 28

The default is *RANGE* BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Therefore, unless you are after the special behavior you get from RANGE that includes peers, make sure you explicitly specify the ROWS option.
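
The effect of peers is easiest to see with a running total over a sort key that has ties. This sketch assumes SQLite 3.25 or later (via Python's sqlite3) as a stand-in for SQL Server; SQLite applies the same default frame when a window has ORDER BY but no frame clause. The table T and its four rows are made up, with orders 2 and 3 sharing a date.

```python
import sqlite3

# Hypothetical sample data: orders 2 and 3 tie on orderdate
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (orderid INT, orderdate TEXT, val INT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [(1, "2024-01-01", 10), (2, "2024-01-02", 20),
     (3, "2024-01-02", 30), (4, "2024-01-03", 40)],
)

sql = """
SELECT orderid, orderdate, val,
       SUM(val) OVER (ORDER BY orderdate
                      ROWS BETWEEN UNBOUNDED PRECEDING
                               AND CURRENT ROW)  AS rows_sum,
       SUM(val) OVER (ORDER BY orderdate
                      RANGE BETWEEN UNBOUNDED PRECEDING
                                AND CURRENT ROW) AS range_sum,
       SUM(val) OVER (ORDER BY orderdate)        AS default_sum
FROM T
ORDER BY orderdate, orderid
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)

# range_sum and default_sum agree on every row: both tied rows get 60,
# because RANGE pulls in all peers of the current row.
# rows_sum differs between the tied rows; which of them is counted first
# is arbitrary because orderdate alone is not a unique ordering.
```

Adding a tiebreaker such as orderid to the window ordering makes the ROWS result deterministic.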

Answer 29

The ROWS option usually gets optimized much better than RANGE when using the same delimiters.

Answer 30

ROW_NUMBER, RANK, DENSE_RANK, and NTILE.

Answer 31

Yes. But the window order clause only determines ordering for the window function's computation, not for presentation. There's no guarantee that the rows will be presented in the same order as the window function's ordering. If you need such a guarantee, add a presentation ORDER BY clause.

Answer 32

The ROW_NUMBER function computes a unique sequential integer starting with 1 within the window partition based on the window ordering. Note that if the ordering isn't unique, the ROW_NUMBER function is not deterministic. If there's no "tie breaker" in the ordering, the choice of which row gets the higher number is arbitrary - optimization dependent.

Answer 33

The RANK function returns the number of rows that have a lower ordering value than the current row, plus 1. It can have gaps between ranking values.

Answer 34

The DENSE_RANK function returns the number of *distinct* ordering values that are lower than the current row's, plus 1. It cannot have gaps between ranking values.
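
Putting ROW_NUMBER, RANK, and DENSE_RANK next to each other on data with a tie shows where they diverge. The sketch assumes SQLite 3.25 or later (via Python's sqlite3) as a stand-in for SQL Server, and a made-up table T in which two rows share val = 20. ROW_NUMBER is given id as a tiebreaker so its result is deterministic.

```python
import sqlite3

# Hypothetical sample data: ids 2 and 3 tie on val
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (id INT, val INT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 20), (4, 30)],
)

sql = """
SELECT id, val,
       ROW_NUMBER() OVER (ORDER BY val, id) AS rownum,
       RANK()       OVER (ORDER BY val)     AS rnk,
       DENSE_RANK() OVER (ORDER BY val)     AS densernk
FROM T
ORDER BY val, id
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

For val = 30, three rows have a lower value, so RANK returns 4 and leaves a gap after the tie; only two distinct lower values exist, so DENSE_RANK returns 3.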

Answer 35

The NTILE function allows you to arrange the rows within the partition into a requested number of equally sized tiles based on the specified ordering. You specify the desired number of tiles as input to the function, e.g. NTILE(100). If there are 830 rows in the result set, the tile size is 830 / 100 = 8 with a remainder of 30. Because there is a remainder, the first 30 tiles are assigned an extra row.
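
The 830-row arithmetic can be checked directly. This sketch assumes SQLite 3.25 or later (via Python's sqlite3) as a stand-in for SQL Server; SQLite's NTILE likewise places the larger tiles first. The Orders table is a made-up single-column table holding order IDs 1 through 830.

```python
import sqlite3

# Hypothetical table with 830 rows, orderid 1..830
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (orderid INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Orders VALUES (?)", [(i,) for i in range(1, 831)])

# Base tile size and remainder, as in the explanation above
base, remainder = divmod(830, 100)

# Assign each row to one of 100 tiles, then count the rows per tile
sql = """
WITH Tiled AS (
    SELECT orderid, NTILE(100) OVER (ORDER BY orderid) AS tile
    FROM Orders
)
SELECT tile, COUNT(*) AS numrows
FROM Tiled
GROUP BY tile
ORDER BY tile
"""
sizes = conn.execute(sql).fetchall()
print(base, remainder)
print(sizes[:3], sizes[-3:])
```

Tiles 1 through 30 hold 9 rows each and tiles 31 through 100 hold 8 rows each, which adds up to 30 * 9 + 70 * 8 = 830.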

Answer 36

LAG, LEAD, FIRST_VALUE, and LAST_VALUE.

Answer 37

Return an element from a single row that is in a given offset from the current row in the window partition, or from the first or last row in the window frame.

Answer 38

The LAG and LEAD functions support window partition and ordering clauses. They don't support a window frame clause. The LAG function returns an element from the row in the current partition that is a requested number of rows *before* the current row, with 1 assumed as the default offset. The LEAD function returns an element from the row that is in the requested offset *after* the current row. If no explicit offset is specified, it uses a default of 1. If you want a different offset, you specify it as the second argument, e.g. LAG(val,3). If a row doesn't exist in the requested offset, NULL is returned. If you want to return something different, specify it as the third argument, e.g. LAG(val,3,0).
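
The default offset, an explicit offset, and the default-value argument can all be seen in one query. The sketch assumes SQLite 3.25 or later (via Python's sqlite3) as a stand-in for SQL Server; the OrderValues rows and the aliases prev_val, next_val, and two_back are made up.

```python
import sqlite3

# Hypothetical sample data: (custid, orderid, val)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderValues (custid INT, orderid INT, val INT)")
conn.executemany(
    "INSERT INTO OrderValues VALUES (?, ?, ?)",
    [(1, 101, 10), (1, 102, 20), (1, 103, 30), (2, 104, 5)],
)

sql = """
SELECT custid, orderid, val,
       LAG(val)       OVER (PARTITION BY custid ORDER BY orderid) AS prev_val,
       LEAD(val)      OVER (PARTITION BY custid ORDER BY orderid) AS next_val,
       LAG(val, 2, 0) OVER (PARTITION BY custid ORDER BY orderid) AS two_back
FROM OrderValues
ORDER BY custid, orderid
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

prev_val and next_val are NULL (None) at the edges of each customer's partition, while two_back falls back to 0 wherever no row exists two positions earlier. Offsets never cross a partition boundary, so customer 2's single order gets no neighbors.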

Answer 39

SELECT custid, orderid, orderdate, val, LAG(val) OVER (PARTITION BY custid ORDER BY orderdate, orderid) AS prev_val, LEAD(val) OVER (PARTITION BY custid ORDER BY orderdate, orderid) AS next_val FROM Sales.OrderValues;

Answer 40

The FIRST_VALUE and LAST_VALUE functions return a value expression from the first or last row in the window frame, respectively. These functions support window partition, order, and frame clauses.

Answer 41

SELECT custid, orderid, orderdate, val, FIRST_VALUE(val) OVER (PARTITION BY custid ORDER BY orderdate, orderid ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS first_val FROM Sales.OrderValues;

Answer 42

Yes. You need to be explicit and use the ROWS clause.

Answer 43

ROWS BETWEEN *UNBOUNDED PRECEDING* AND CURRENT ROW - you need the first row in the partition.

Answer 44

ROWS BETWEEN CURRENT ROW AND *UNBOUNDED FOLLOWING* - you need the last row in the partition.
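
The need for an explicit frame shows up most clearly with LAST_VALUE. The sketch assumes SQLite 3.25 or later (via Python's sqlite3) as a stand-in for SQL Server; the OrderValues rows and the column aliases are made up. The third window column deliberately omits the frame to show what the default frame returns.

```python
import sqlite3

# Hypothetical sample data: (custid, orderid, val)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE OrderValues (custid INT, orderid INT, val INT)")
conn.executemany(
    "INSERT INTO OrderValues VALUES (?, ?, ?)",
    [(1, 101, 10), (1, 102, 20), (1, 103, 30), (2, 104, 5)],
)

sql = """
SELECT custid, orderid, val,
       FIRST_VALUE(val) OVER (PARTITION BY custid ORDER BY orderid
                              ROWS BETWEEN UNBOUNDED PRECEDING
                                       AND CURRENT ROW)         AS first_val,
       LAST_VALUE(val)  OVER (PARTITION BY custid ORDER BY orderid
                              ROWS BETWEEN CURRENT ROW
                                       AND UNBOUNDED FOLLOWING) AS last_val,
       LAST_VALUE(val)  OVER (PARTITION BY custid
                              ORDER BY orderid)                 AS last_val_default
FROM OrderValues
ORDER BY custid, orderid
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

With the explicit frames, first_val and last_val are the partition's first and last values on every row. Without a frame, the default ends at the current row, so last_val_default simply echoes the current row's val.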

Answer 45

Partitioning, ordering, and framing clauses.

Answer 46

The beginning and the end of the partition.

Answer 47

They are supposed to operate on the underlying query's result set, which is available only when logical query processing gets to the SELECT phase.