SKILL 2.2: QUERY DATA BY USING TABLE EXPRESSIONS Flashcards
Name T-SQL Table Expressions
- Derived Tables
- common table expressions (CTE)
- Views
- Inline table variables
Table expressions that are visible only to the statement that defines them?
- Derived Tables
2. Common table Expressiosn (CTE)
Which Table expressions are reusable and preserved in the database as an object
- Views
2. Inline table-valued functions
Table expressions- Which operation is not allowed and how can it be circumvented?
Their inner query can’t have an ORDER BY clause, Except when you use OFFSET-FETCH
(Table expressions)- Columns returned by the inner query must have:
names or use aliases, and they need to be unique
Derived Tables - Format
SELECT
*
FROM
(
SELECT
FilmName
,FilmRunTimeMinutes
FROM
tblFilm
) AS MyDerivedTable
WHERE
FilmRunTimeMinutes < 100
Derived tables - drawbacks
When nested they are too complicated,
If you want to join multiple instances of the same derived table you can’t. You have to duplicate code - and that generates errors
Common table Expression - Visibility
It is not visible to the outer query
Common table Expression and Derived table parts
- Inner query
- Name assigned to the query and its columns
- Outer query
Common table expression - Format
WITH
AS
(
)
;
Example:
WITH C AS
(
SELECT
ROW_NUMBER() OVER(PARTITION BY categoryid ORDER BY unitprice, productid) AS rownum, categoryid, productid, productname, unitprice
FROM Production.Products
)
SELECT categoryid, productid, productname, unitprice
FROM C
WHERE rownum <= 2;
Defining multiple CTEs
You don’t nest CTEs, if you need multiple CTEs you simply separate them by commas.
Each can refer to the previously defined CTEs and the outer query can refer to all of them
Defining multiple CTEs - Format
WITH C1 AS
(
SELECT …
FROM T1
WHERE …
),
C2 AS
(
SELECT
FROM C1
WHERE …
)
SELECT …
FROM C2
WHERE …;
Recursive CTE - Format
WITH EmpsCTE AS
(
SELECT empid, mgrid, firstname, lastname, 0 AS distance
FROM HR.Employees
WHERE empid = 9
UNION ALL
SELECT M.empid, M.mgrid, M.firstname, M.lastname, S.distance + 1 AS distance
FROM EmpsCTE AS S
JOIN HR.Employees AS M ON S.mgrid = M.empid
)
SELECT empid, mgrid, firstname, lastname, distance
FROM EmpsCTE;
Which table expressions are reusable?
Views and Inlined table-valued functions. They are created as objects in the database and thus can be reused.
You can also control access by assigning permissions
Difference between Views and inline table-valued functions?
Views don’t accept input parameters while inline table-valued functions do
View - format
CREATE VIEW Sales.RankedProducts
AS
SELECT
ROW_NUMBER() OVER(PARTITION BY categoryid
ORDER BY unitprice, productid) AS rownum,
categoryid, productid, productname, unitprice
FROM Production.Products;
GO
Inline table-valued function - Format
CREATE FUNCTION HR.GetManagers(@empid AS INT) RETURNS TABLE
AS
RETURN
WITH EmpsCTE AS
(
SELECT empid, mgrid, firstname, lastname, 0 AS distance FROM HR.Employees WHERE empid = @empid UNION ALL SELECT M.empid, M.mgrid, M.firstname, M.lastname, S.distance + 1 AS distance FROM EmpsCTE AS S JOIN HR.Employees AS M ON S.mgrid = M.empid
)
SELECT empid, mgrid, firstname, lastname, distance
FROM EmpsCTE;
GO
With GROUPING SETS clause you can…
list all grouping sets that you want to define in the query
GROUPING SETS - Format
SELECT shipperid, YEAR(shippeddate) AS shipyear, COUNT(*) AS numorders
FROM Sales.Orders
WHERE shippeddate IS NOT NULL – exclude unshipped orders
GROUP BY GROUPING SETS
(
( shipperid, YEAR(shippeddate) ),
( shipperid ),
( YEAR(shippeddate) ),
( )
);
CUBE
Same as GROUP BY GROUPING SETS with all the combinations
ROLLUP
You use it when there is a natural hierarchy. Used when other combinations are not interesting
and to avoid unnecessary computation
ROLLUP - format
SELECT shipcountry, shipregion, shipcity, COUNT(*) AS numorders
FROM Sales.Orders
GROUP BY ROLLUP( shipcountry, shipregion, shipcity );
–Same as:
GROUPING SETS
( shipcountry, shipregion, shipcity )
( shipcountry, shipregion )
( shipcountry )
( )
GROUPING -function in GROUP BY
returns 0 when the element is a part of the grouping set and 1 when it’s not
GROUPING_ID
Accepts the list of grouped columns as inputs and returns an integer representing a bitmap:
grp_id shipcountry shipregion shipcity numorders
0 Argentina NULL Buenos Aires 16
1 Argentina NULL NULL 16
3 Argentina NULL NULL 16
…
0 USA AK Anchorage 10
1 USA AK NULL 10
0 USA CA San Francisco 4
1 USA CA NULL 4
0 USA ID Boise 31
1 USA ID NULL 31
…
3 USA NULL NULL 122
…
7 NULL NULL NULL 830
Pivoting data
Specialized case of grouping and aggregating data. It transitions it from a state of rows to a state of columns
3 elements necessary for PIVOT queries
- What do you want to see on rows? - the “on rows” or grouping element
- What do you want to see on columns? - the “on cols” or spreading element
- What do you want to see in the intersection of each distinct row and column value - the “data” or aggregation element
PIVOT- Form
WITH PivotData AS
(
SELECT
< grouping column >, < spreading column >, < aggregation column >
FROM < source table >
)
SELECT < select list >
FROM PivotData
PIVOT( < aggregate function >(< aggregation column >)
FOR < spreading column > IN (< distinct spreading values >) ) AS P;
PIVOT limitations
- the COUNT(*) isn’t allowed, however COUNT(colname) is allowed
- PIVOT operator is limited to using only one aggregate function
- The IN clause of the PIVOT operator accepts only a static list of spreading values. It doesn’t support a subquery as input
UNPIVOT
Always takes a set of source columns and rotates those to multiple rows, generating two target columns, one to hold source column values and one to hold the source column names
UNPIVOT - Format
SELECT < column list >, < names column >, < values column >
FROM < source table >
UNPIVOT( < values column > FOR < names column > IN( ) ) AS U;
SELECT custid, shipperid, freight
FROM Sales.FreightTotals
UNPIVOT( freight FOR shipperid IN([1],[2],[3]) ) AS U;
Window aggregate functions
are applied to a window of rows defiend by the OVER clause
Window aggregate functions - use case when they are faster than joins
They better perform computations such as running totals
Window aggregate functions - format
SELECT custid, orderid, val,
SUM(val) OVER(PARTITION BY custid) AS custtotal,
SUM(val) OVER() AS grandtotal
FROM Sales.OrderValues;
Window aggregate functions - running totals - format
SELECT custid, orderid, orderdate, val,
SUM(val) OVER(PARTITION BY custid
ORDER BY orderdate, orderid ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS runningtotal
FROM Sales.OrderValues;
UNBOUNDED PRECEDING and FOLLOWING
UNBOUNDED PRECEDING - start of the partition FOLLOWING - end of the partition
ROWS PRECEDING or FOLLOWING
meaning n rows before or after the current, respectively.
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW can be changed for a less verbose form
ROWS UNBOUNDED PRECEDING
Window functions are only allowed in which query clauses (the last evaluated clauses)
SELECT and ORDER BY. If you need to refer to the result of a window function in any clause that is before the SELECT statement you need to use a Table Expression
How to include only the last three rows in a window function?
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
Window Ranking Functions
rank rows within a partition base on a specified ordering
What is mandatory in a Window ranking function
The ORDER BY clause is mandatory
TSQL Window ranking functions
ROW_NUMBER, RANK, DENSE_RANK, NTILE
ROW_NUMBER
unique incrementing integers starting with 1 within the partition based on the window ordering.
When there are ties the ordering is non-deterministic, so use unique ORDER BY
Difference between RANK and DENSE_RANK
RANK increments the numbers but the values which are equal have the same rank but the following ranks “skip” the numbers.(1,2,2,4)
DENSE_RANK has the same behavior but without skipping ranks (1,2,2,3)
Are Window ranking functions deterministic or non-deterministic?
With all window ranking functions, two different rows can get two different rank values, and therefore are considered nondeterministic.
NTILE function
Arranges rows within the partition in a requested number of equally sized tiles, based on specific ordering
Window Offset Functions - list
LAG, LEAD, FIRST_VALUE, LAST_VALUE