B3-Group and pivot data by using queries Flashcards
What is SELECT - OVER Clause and what might be their arguments and remarks?
Determines the partitioning and ordering of a rowset before the associated window function is applied. That is, the OVER clause defines a window or user-specified set of rows within a query result set. A window function then computes a value for each row in the window. You can use the OVER clause with functions to compute aggregated values such as moving averages, cumulative aggregates, running totals, or a top N per group results.
Arguments:
- PARTITION BY that divides the query result set into partitions.
- ORDER BY that defines the logical order of the rows within each partition of the result set
- ROWS/RANGE that limits the rows within the partition by specifying start and end points within the partition. It requires ORDER BY argument and the default value is from the start of partition to the current element if the ORDER BY argument is specified. ORDER BY is also required when OVER is paired with NTILE.
What is PIVOT and UNPIVOT?
You can use the PIVOT and UNPIVOT relational operators to change a table-valued expression into another table. PIVOT rotates a table-valued expression by turning the unique values from one column in the expression into multiple columns in the output, and performs aggregations where they are required on any remaining column values that are wanted in the final output. UNPIVOT performs the opposite operation to PIVOT by rotating columns of a table-valued expression into column values.
What is the syntax for PIVOT?
The following is annotated syntax for PIVOT.
SELECT ,
[first pivoted column] AS ,
[second pivoted column] AS ,
…
[last pivoted column] AS
FROM
()
AS
PIVOT
(
()
FOR
[]
IN ( [first pivoted column], [second pivoted column],
… [last pivoted column])
) AS
;
What is DENSE RANK () and what is it’s syntax?
DEF: This function returns the rank of each row within a result set partition, with no gaps in the ranking values. The rank of a specific row is one plus the number of distinct rank values that come before that specific row.
SYNTAX: DENSE_RANK ( ) OVER ( [] < order_by_clause > )
REMARKS: If two or more rows have the same rank value in the same partition, each of those rows will receive the same rank. For example, if the two top salespeople have the same SalesYTD value, they will both have a rank value of one. The salesperson with the next highest SalesYTD will have a rank value of two. This exceeds the number of distinct rows that come before the row in question by one. Therefore, the numbers returned by the DENSE_RANK function do not have gaps, and always have consecutive rank values.
What is NTILE and what is it’s syntax?
DEF:Distributes the rows in an ordered partition into a specified number of groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs.
syntax: NTILE (integer_expression) OVER ( [] < order_by_clause > ) ;
Remarks: If the number of rows in a partition is not divisible by integer_expression, this will cause groups of two sizes that differ by one member. Larger groups come before smaller groups in the order specified by the OVER clause. For example if the total number of rows is 53 and the number of groups is five, the first three groups will have 11 rows and the two remaining groups will have 10 rows each. If on the other hand the total number of rows is divisible by the number of groups, the rows will be evenly distributed among the groups. For example, if the total number of rows is 50, and there are five groups, each bucket will contain 10 rows.
NTILE() is a window function;
LAG def, syntax, remarks
def: Accesses data from a previous row in the same result set without the use of a self-join starting with SQL Server 2012 (11.x). LAG provides access to a row at a given physical offset that comes before the current row. Use this analytic function in a SELECT statement to compare values in the current row with values in a previous row.
LAG-allows you to access data in a previous row within the same result set, for a given OFFSET
syntax:
LAG (scalar_expression [,offset] [,default])
OVER ( [partition_by_clause] order_by_clause )
General Remarks:
LAG is nondeterministic.
What is Summarizing Data Using CUBE?
The CUBE operator generates a result set that is a multidimensional cube. A multidimensional cube is an expansion of fact data, or data that records individual events. The expansion is based on columns that the user wants to analyze. These columns are called dimensions. The cube is a result set that contains a cross tabulation of all the possible combinations of the dimensions.
The CUBE operator is specified in the GROUP BY clause of a SELECT statement. The select list contains the dimension columns and aggregate function expressions. The GROUP BY specifies the dimension columns and the keywords WITH CUBE. The result set contains all possible combinations of the values in the dimension columns, together with the aggregate values from the underlying rows that match that combination of dimension values.
What are GROUPING SET, CUBE and ROLLUP in SQL?
CUBE, ROLLUP and GROUPING SET are optional operators of the GROUP BY clause of the SELECT statement for doing reports with large amounts of information. They allow you to do several GROUP BY operations in one statement, potentially saving a lot of time and computational effort. They can provide all the information needed for reporting, including totals, whilst giving good performance over large tables, and helping the Query Optimiser devise a good execution plan.
The extra ‘super-aggregate’ rows provide summary values, thereby allowing you to have several ‘aggregations’ such as SUM() or MAX() within the one result. The NULLs within these rows in the result are intended to mean ‘all’ rather than ‘unknown’. It allows you to get all the aggregations you need in one pass through the table. Because of the presence of extra rows in the results, extra functions GROUPING() and GROUPING_ID() are provided to indicate these extra ‘super-aggregate’ rows, and which columns are being aggregated.
Remarks:
- You cannot use CUBE in a SELECT statement CUBE is part of a GROUP BY clause.
- GROUP BY CUBE shows totals for each grop and grand totals;
- You cannot use ROLLUP in a SELECT statement ROLLUP is part of a GROUP BY clause.
- GROUPING SETS allow you to provide a list of the groups that aggregates need to be performed upon.
- GROUP BY ROLLUP shows grand totals:
SELECT - GROUP BY def, syntax
def: A SELECT statement clause that divides the query result into groups of rows, usually for the purpose of performing one or more aggregations on each group. The SELECT statement returns one row per group.
syntax:
GROUP BY
{ column-expression
| CUBE ( [,…n] )
| GROUPING SETS ( [,…n] ) | () –calculates the grand total
} [,…n]
ROLLUP ( [,…n] )
SELECT - GROUP BY ROLLUP def, syntax
def: Creates a group for each combination of column expressions. In addition, it “rolls up” the results into subtotals and grand totals. To do this, it moves from right to left decreasing the number of column expressions over which it creates groups and the aggregation(s). The column order affects the ROLLUP output and can affect the number of rows in the result set.
syntax: Creates a group for each combination of column expressions. In addition, it “rolls up” the results into subtotals and grand totals. To do this, it moves from right to left decreasing the number of column expressions over which it creates groups and the aggregation(s).
The column order affects the ROLLUP output and can affect the number of rows in the result set.
syntax: GROUP BY ROLLUP
SELECT GROUP BY CUBE, GROUPING SETS ( ) def
GROUP BY CUBE creates groups for all possible combinations of columns. For GROUP BY CUBE (a, b) the results has groups for unique values of (a, b), (NULL, b), (a, NULL), and (NULL, NULL).
GROUP BY GROUPING SETS-The GROUPING SETS option gives you the ability to combine multiple GROUP BY clauses into one GROUP BY clause. The results are the equivalent of UNION ALL of the specified groups.
SELECT - HAVING def, syntax, remarks
def:
Specifies a search condition for a group or an aggregate. HAVING can be used only with the SELECT statement. HAVING is typically used with a GROUP BY clause. When GROUP BY is not used, there is an implicit single, aggregated group.
syntax: [HAVING]
remarks:
- HAVING clause filters grouped aggregate values
Lead def, syntax, remarks
def:
Accesses data from a subsequent row in the same result set without the use of a self-join starting with SQL Server 2012 (11.x). LEAD provides access to a row at a given physical offset that follows the current row. Use this analytic function in a SELECT statement to compare values in the current row with values in a following row.
syntax:
LEAD ( scalar_expression [,offset] , [default] ) OVER ( [partition_by_clause] order_by_clause )
remarks:
The LEAD function is designed to allow the referring of a subsequent row within the group of rows created by using the PARTITION BY clause.