Query Data with Advanced T-SQL Components Flashcards
What are 4 forms of table expressions?
Derived tables, common table expressions (CTEs), views, and inline table-valued functions
Because a table expression is supposed to represent a relation, the inner query defining it needs to be relational. What are 3 requirements to satisfy this?
- All columns returned by the inner query must have names
- All column names must be unique
- The inner query is not allowed to have an ORDER BY clause
Normally the inner query of a table expression is not allowed to have an ORDER BY clause. What is the exception?
The exception is if you use the TOP or OFFSET-FETCH option in the inner query. In this case, the ORDER BY clause serves a meaning that is not related to presentation ordering, rather it’s part of the filter’s specification
When should you use a temporary table/table variable over a table expression?
When you need to persist the result of a query for further processing.
Is the result set of a view stored in the database?
No, only the view’s definition is stored in the database
Syntax to create a view
CREATE VIEW viewname
AS
SELECT col1, col2 FROM table1;
What is the main difference between VIEWS and inline table-valued functions?
Views do not accept input parameters but inline table-valued functions do.
Syntax to create an inline table-valued function
CREATE FUNCTION fxname(@parameter datatype)
RETURNS TABLE
AS
RETURN
SELECT col1, col2
FROM table1
WHERE parameter = @parameter
How do you retrieve data from an inline table-valued function? (Syntax)
SELECT alias.col1, alias.col2, alias.col3
FROM schema.fxname(parameter) as alias
Notes:
- Always use 2-part function name (schema and function)
- Always alias the function
True or False. Derived tables:
- Must have an alias
- Can be referred to multiple times in the same query
- Can use internal or external aliases for columns
- Can refer to parameters or variables
- Cannot be nested within other derived tables
- Must have an alias - True
- Can be referred to multiple times in the same query - False
- Can use internal or external aliases for columns - True
- Can refer to parameters or variables - True
- Cannot be nested within other derived tables - False
In which clause do you use derived tables?
In the FROM clause
Common table expression (CTE) syntax
WITH ctealias AS
( innerquery )
outerquery
Syntax for multiple CTEs
When can you refer to a another CTE within a CTE?
WITH C1 AS
( SELECT .. FROM T1 WHERE… ),
C2 AS
( SELECT … FROM C1 WHERE …)
outerquery;
Each CTE can refer to a previously defined CTE
The following is an example of a recursive CTE. Explain what it’s doing.
WITH EmpsCTE AS
( SELECT empid, mgrid, firstname, lastname
FROM HR.Employees
WHERE empid = 9
UNION ALL
SELECT M.empid, M.mgrid, M.firstname, M.lastname
FROM EmpsCTE AS S
JOIN HR.Employees AS M
ON S.mgrid = M.empid )
SELECT empid, mgrid, firstname, lastname
FROM EmpsCTE;
The first query is the “anchor” member and returns a valid relational result. The anchor member is invoked only once. The second query acts as the “recursive” member. It has to have a reference to the CTE name. It is invoked repeatedly until it returns an empty result set. Think of it as if the results of the second query are added to the first in each repetition. Here, in each round, the recursive member joins the previous result set with the HR.Employees table to return the direct manager of the employee from the previous round. The recursive query stops when it doesn’t find a manager.
What does APPLY do?
Difference between CROSS APPLY and OUTER APPLY
APPLY evaluates the left input first, and for each of the left rows, it applies the table expression provided as the right input. The right input can be a derived table query or table function. The left input provides info to the right input.
CROSS APPLY does not return the left row if the right table expression returns an empty set, but OUTER APPLY does
Syntax for window functions
Fx OVER(PARTITION BY col1
ORDER BY col2
ROWS BETWEEN UNBOUNDED PRECEDING//FOLLOWING AND CURRENT ROW)
Note: instead of UNBOUNDED can also have n rows PRECEDING/FOLLOWING
In which clauses can you use window functions?
SELECT and ORDER BY
What are 4 window ranking functions and what do they do?
ROW_NUMBER: creates unique incrementing integers starting with 1 within the window partition based on the window ordering
RANK: rank values based on window ordering. If two values tie, they’ll receive same rank, e.g. 2. Then it’ll skip a value for the next rank, e.g. 1, 2, 2, 4
DENSE RANK: same as RANK but doesn’t have gaps
NTILE: Can arrange rows within partition in a requested number of equally sized tiles
What are 4 window offset functions and how do they work?
LAG(col, numrows): returns an element from the row in the current partition that is a requested number of rows before the current row.
LEAD(col, numrows): returns an element from the row in the current partition that is a requested number of rows after the current row.
FIRST_VALUE(col): returns a value expression from the first row in the window frame
LAST_VALUE(col): returns a value expression from the last row in the window frame
What 3 elements do you need to pivot data?
- What do you want to see on rows? i.e. on rows or grouping element
- What do you want to see on columns? i.e. on cols or spreading element
- What do you want to see in the intersection of each distinct row and column value? i.e. data or aggregation element.
Syntax to pivot data
WITH PivotData AS
( SELECT groupingcol, spreadingcol, aggregationcol
FROM tablename )
SELECT (select list)
FROM PivotData
PIVOT( aggregatefx(aggregation col)
FOR spreadingcol IN (distinct spreading values)) AS alias.
Notes:
- You should define a table expression that returns the three elements involved in pivoting.
- The PIVOT operator returns a table result. Remember to alias it.
In some cases you might get NULLs in the pivoted table. What should you do if you want to return something else instead of a NULL in those cases?
Use the ISNULL or COALESCE function in the outer query’s select list. e.g.
WITH PivotData AS
( SELECT custid, shipperid, freight
FROM Sales.Orders )
SELECT custid,
ISNULL([1], 0.00) as [1],
ISNULL([2], 0.00) as [2],
ISNULL([3], 0.00) as [3]
FROM PivotData
PIVOT(SUM(freight) FOR shipperid IN ([1], [2], [3])) as P;
Why should you define a table expression that returns the 3 elements involved in pivoting instead of querying the underlying table directly?
Note that you don’t specify the grouping element in the PIVOT operator. It’s determined by elimination. So, if the table has more than the 3 cols, all extra cols will be considered a grouping element
What are 4 limitations of the PIVOT operator?
- The aggregation and spreading elements cannot directly be results of expressions; instead, they must be column names from the queried table. You can, however, apply expressions in the query defining the table expression, assign aliases to those expressions, and then use the aliases in the PIVOT operator
- The COUNT(*) function isn’t allowed as the aggregate function. You need to use COUNT(col).
- Limited to using only one aggregate function
- The IN clause of the PIVOT operator accepts a static list of spreading values. It doesn’t support a subquery as input.
Syntax to unpivot data
SELECT col1, namescol, valuescol
FROM sourcetable
UNPIVOT(valuescol FOR namescol IN (name1, name2)) as alias
Notes:
- The namescol refers to the columns that you’re trying to get back into rows.
- valuescol are the values in the table.
- You need to come up with your own labels for the namescol and valuescol
True or False. UNPIVOT filters out rows with NULLs
True
GROUPING SETS syntax
SELECT col1, col2, COUNT(*) AS numorders
FROM Sales.Orders
GROUP BY GROUPING SETS ((col1, col2), (col1), (col2), ());
Note: the empty grouping sets aggregates all rows.
What’s the difference between the CUBE and ROLLUP clauses?
CUBE defines all possible grouping sets from the inputs, e.g.
SELECT col1, col2, COUNT(*) as numorders
FROM Sales.Orders
GROUP BY CUBE (col1, col2);
It will define four grouping sets: (col1, col2), (col1), (col2), ()
The ROLLUP clause forms grouping sets based on a hierarchy, e.g.
SELECT shipcountry, shipregion, shipcity, COUNT(*) AS numorders
FROM Sales.Orders
GROUP BY ROLLUP(shipcountry, shipregion, shipcity);
It will define four grouping sets:
(shipcountry, shipregion, shipcity), (shipcountry, shipregion), (shipcountry) ()
When is GROUPING function useful? What does it do? How is GROUPING_ID different?
The GROUPING function is useful when you’re doing multiple grouping sets and need to figure out whether a NULL means it was aggregated over that column or it there was a NULL in the data. The GROUPING function returns 1 when the element comes from an aggregate and 0 when it doesn’t. For example, GROUPING(col1) will return a 1 when the aggregate is done over the whole column and a 0 when the aggregate is done within the categories in col1.
GROUPING_ID returns an integer representing a bitmap. The position of each column determines the value, e.g. for 4 columns: 8 4 2 1. If you had 3 cols and GROUPING would return 1 1 1, GROUPING_ID would return 1+2+4 = 7. If GROUPING would return 0 1 1, GROUPING_ID would return 1+2+0 = 3.
What are 7 elements required to mark a table as temporal? Syntax inside CREATE TABLE statement particular to temporal table
- A primary key constraint
- Two DATETIME2 columns to store the start and end of the validity period of the row
- The start column needs to be marked with the clause GENERATED ALWAYS AS ROW START.
- The end column needs to be marked with the clause GENERATED ALWAYS AS ROW END.
- The designation of the pair of columns that store the row’s validity period with the clause PERIOD FOR SYSTEM_TIME (startcol, endcol)
- The table option SYSTEM_VERSIONING needs to be set to ON
- A linked history table, which SQL Server can create for you
CREATE TABLE tablename
( col1 datatype NOT NULL CONSTRAINT PK_tablename PRIMARY KEY(col1),
…more cols..
validfrom DATETIME2(3) GENERATED ALWAYS AS ROW START HIDDEN NOT NULL,
validto DATETIME2(3) GENERATED ALWAYS AS ROW END HIDDEN NOT NULL,
PERIOD FOR SYSTEM_TIME (validfrom, validto) )
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = histtablename));
When you modify a temporal table, in what time zone will change times be recorded? If you apply multiple changes in a transaction, will each change have a different change time? When you insert rows into a temporal table, SQL Server sets the start column to the transactions’ start time and the end time to _________________.
UTC time zone If you apply multiple changes in a transaction, the transaction start time is considered the effective change time for all changes in the transaction. When you insert rows into a temporal table, SQL Server sets the end time to the maximum possible point in time in the data type, e.g. 9999-12-31 23:59:59.999.
When you insert data in an empty temporal table, what happens in the history table?
Nothing. It’s still empty.
What happens in the history table when you delete rows from a temporal table?
The deleted row appears in the history table with the end column set to the start time of the transaction that deleted the row.
An update is handled as a delete plus insert. Explain what this means in terms of what happens in the temporal and history tables.
The temporal table will have the new state of the modified rows with the start column set to the change time, and the history table will have the old state of the modified rows with the end column set to the change time.
What will be the validfrom and validto times in the history table when you update a row in the temporal table several times within the same transaction?
The in-between changes will have a degenerate interval as the validity period where the validfrom value will be equal to the validto value (which will be the start time of the transaction).
Syntax to query a temporal table using the FOR SYSTEM_TIME AS OF syntax
What will the results of querying the temporal table look like?
SELECT col1, col2
FROM tablename FOR SYSTEM_TIME AS OF ‘20151101 14:06:00.000’
This will give you the state of the table at the time specified.
What will happen to rows with degenerate intervals as the validity period when you query the temporal table using the FOR SYSTEM_TIME clause?
SQL Server will discard those rows
Syntax to query a temporal table using the FOR SYSTEM_TIME FROM syntax and explain how it works
SELECT col1, col2
FROM tablename FOR SYSTEM_TIME FROM ‘startdatetime’ TO ‘enddatetime’;
You get all rows that have a validity period that intersects with the input period, exclusive of the two input delimiters. You get rows with a validfrom datetime before the end datetime input and a validto datetime after the start datetime input.
Syntax to query a temporal table using the FOR SYSTEM_TIME BETWEEN syntax and explain how it works
SELECT col2, col2
FROM tablename FOR SYSTEM_TIME BETWEEN ‘startdatetime’ AND ‘enddatetime’;
You get rows with a validfrom datetime before or equal to the end datetime input and a validto datetime after the start datetime input.