CHAPTER 6: Building on Subqueries, Common Table Expressions, and Unions Flashcards
Q: What is a subquery in T-SQL?
A: A subquery is a nested query—a query within another query—that can return results for use in the main query.
Q: Where can subqueries be used in a SQL query?
A: Subqueries can be used in the SELECT, FROM, and WHERE clauses, or as part of other expressions.
Q: How is a subquery used in an IN list?
A: A subquery can replace a hard-coded list in the WHERE clause using the syntax:
SELECT <columns>
FROM <table>
WHERE <column> IN (SELECT <column> FROM <table>);</column></column></columns>
Q: What is the difference between joining tables and using a subquery in the WHERE clause?
A: A join includes columns from both tables in the result, while a subquery returns a filtered list of values for comparison in the main query.
Q: How do you use a subquery to find rows not in another table?
A: Add NOT before IN in the query:
SELECT <columns>
FROM <table1>
WHERE <column> NOT IN (SELECT <column> FROM <table2>);</table2></column></column></table1></columns>
Q: What happens if a subquery returns a NULL in a NOT IN clause?
A: If the subquery returns any NULL values, the NOT IN clause will return no rows due to SQL’s three-valued logic (TRUE, FALSE, UNKNOWN).
Q: How can you handle NULL values in a subquery?
A: Add a WHERE condition in the subquery to exclude NULL values:
SELECT <columns>
FROM <table1>
WHERE <column> NOT IN (SELECT <column> FROM <table2> WHERE <column> IS NOT NULL);</column></table2></column></column></table1></columns>
Q: Why is it important to eliminate NULL in subqueries?
A: To ensure accurate results when using operators like NOT IN, which cannot handle NULL values correctly without additional filtering.
Q: When should you use a subquery instead of a join?
A: Use subqueries when you need to filter data without including additional columns from the related table in the result.
Q: Can subqueries return more than one column?
A: Subqueries in the WHERE clause must return only one column, but subqueries in other clauses (e.g., FROM) can return multiple columns.
Q: What is the purpose of the EXISTS keyword in T-SQL?
A: EXISTS checks whether a subquery returns any rows. If rows exist, it evaluates to TRUE for the outer query.
Q: How does the EXISTS clause work in the WHERE statement?
A: It uses a correlated subquery to determine if rows in the subquery match the outer query condition.
SELECT CustomerID, AccountNumber
FROM Sales.Customer AS Cust
WHERE EXISTS (SELECT 1 FROM Sales.SalesOrderHeader AS SOH WHERE SOH.CustomerID = Cust.CustomerID);
Q: What does NOT EXISTS do in T-SQL?
A: It returns TRUE for rows in the outer query if no rows in the subquery meet the condition.
SELECT CustomerID, AccountNumber
FROM Sales.Customer AS Cust
WHERE NOT EXISTS (SELECT 1 FROM Sales.SalesOrderHeader AS SOH WHERE SOH.CustomerID = Cust.CustomerID);
Q: Why can you use SELECT * or SELECT 1 in EXISTS subqueries?
A: The subquery only checks for the presence of rows, not the returned data. Both forms are functionally identical.
Q: What is the purpose of CROSS APPLY?
A: CROSS APPLY joins each row from the left table with the results of a subquery. The subquery runs once for each row in the left table and returns only rows that match.
Q: How does CROSS APPLY differ from a regular join?
A: CROSS APPLY allows you to use a subquery that can reference columns from the outer query, enabling row-by-row computation.
Q: What is OUTER APPLY in T-SQL?
A: OUTER APPLY works like a LEFT OUTER JOIN. It returns all rows from the left table, including rows where the subquery does not return results (those will have NULL values).
Q: Provide an example of using CROSS APPLY.
SELECT CustomerID, AccountNumber, SalesOrderID
FROM Sales.Customer AS Cust
CROSS APPLY (SELECT SalesOrderID FROM Sales.SalesOrderHeader AS SOH WHERE Cust.CustomerID = SOH.CustomerID) AS A;
Q: Provide an example of using OUTER APPLY.
SELECT CustomerID, AccountNumber, SalesOrderID
FROM Sales.Customer AS Cust
OUTER APPLY (SELECT SalesOrderID FROM Sales.SalesOrderHeader AS SOH WHERE Cust.CustomerID = SOH.CustomerID) AS A;
Q: When should you use EXISTS over a join or subquery in the WHERE clause?
A: Use EXISTS when you only need to verify the existence of matching rows in a subquery without returning actual data from it.
Q: What is the main difference between CROSS APPLY and OUTER APPLY?
A: CROSS APPLY behaves like an INNER JOIN, only returning rows where the subquery produces results, whereas OUTER APPLY behaves like a LEFT OUTER JOIN, returning all rows from the left table.
Q: Why might EXISTS perform better than other techniques?
A: EXISTS stops evaluating as soon as it finds a match, making it efficient for large datasets with selective filtering.
Q: What is the purpose of the UNION operator in T-SQL?
A: UNION combines the results of two or more queries into a single result set, removing duplicate rows by default.
Q: What is the difference between UNION and UNION ALL?
A: UNION removes duplicate rows, while UNION ALL includes all rows, even duplicates, which improves performance if deduplication is unnecessary.
Q: What must match between the queries in a UNION statement?
A: Each query must return the same number of columns, and the columns must have compatible data types.
Q: Provide the syntax for a simple UNION query.
SELECT col1, col2 FROM table1
UNION
SELECT col1, col2 FROM table2;
Q: Why should you use UNION ALL when possible?
A: UNION ALL avoids the overhead of removing duplicates, which improves performance, especially with large datasets.
Q: Where must the ORDER BY clause be placed in a UNION query?
A: The ORDER BY clause must be placed at the end of the entire UNION query, not after individual queries.
Example:
SELECT col1 FROM table1
UNION
SELECT col1 FROM table2
ORDER BY col1;
Q: How does SQL Server determine column names in a UNION query?
A: The column names and data types are determined by the first query in the UNION.
Q: What happens if the columns or data types don’t match between queries in a UNION?
A: SQL Server will throw an error if the column counts differ or if incompatible data types cannot be implicitly converted.
Q: Provide an example of a UNION query with duplicate elimination.
SELECT CustomerID FROM Sales.Customers
UNION
SELECT CustomerID FROM Sales.ArchivedCustomers;
Q: Provide an example of a UNION ALL query to combine results without deduplication.
SELECT ProductID, Name FROM Products_Current
UNION ALL
SELECT ProductID, Name FROM Products_Discontinued;
Q: What error will occur if the number of columns doesn’t match in a UNION query?
A: SQL Server will throw an error indicating a mismatch in the number of columns.
Q: Why is it important to use compatible data types in UNION queries?
A: SQL Server follows precedence rules, and incompatible types (e.g., mixing strings and integers) can cause errors or unexpected implicit conversions.
Q: What is a good use case for a UNION query?
A: Combining production and archived data or displaying data from different divisions of a company as a unified result.
Q: Provide an example of a UNION query that uses the same table with different WHERE clauses.
SELECT CustomerID, OrderDate FROM Orders WHERE Year(OrderDate) = 2023
UNION
SELECT CustomerID, OrderDate FROM Orders WHERE Year(OrderDate) = 2022;
Q: What is the purpose of the EXCEPT operator in T-SQL?
A: EXCEPT returns rows from the first query that are not present in the second query, removing duplicates by default.
Q: Provide an example of an EXCEPT query.
SELECT BusinessEntityID
FROM HumanResources.Employee
EXCEPT
SELECT BusinessEntityID
FROM Person.Person;
Q: How does EXCEPT handle duplicates?
A: EXCEPT removes duplicates from the results unless explicitly overridden by using additional query constructs.
Q: When should you use EXCEPT in queries?
A: Use EXCEPT when you need to find rows in one query that do not exist in another query, often for identifying mismatches or gaps.
Q: What rules must be followed when using EXCEPT?
A: Both queries must return the same number of columns, and their data types must be compatible.
Q: What does the INTERSECT operator do in T-SQL?
A: INTERSECT returns rows that exist in both queries, removing duplicates by default.
Q: Provide an example of an INTERSECT query.
SELECT BusinessEntityID
FROM HumanResources.Employee
INTERSECT
SELECT BusinessEntityID
FROM Person.Person;
Q: How does INTERSECT handle duplicate rows?
A: INTERSECT automatically removes duplicate rows, ensuring only unique common rows are returned.
Q: When is it appropriate to use INTERSECT?
A: Use INTERSECT when you want to find common rows between two queries, such as overlapping records between two datasets.
Q: What are the requirements for columns and data types in an INTERSECT query?
A: The columns in both queries must match in number and have compatible data types.
Q: What is the difference between EXCEPT and INTERSECT?
A: EXCEPT returns rows from the first query that are not in the second query, while INTERSECT returns rows that are present in both queries.
Q: Can EXCEPT or INTERSECT include an ORDER BY clause?
A: Yes, but the ORDER BY clause must be placed after the entire EXCEPT or INTERSECT operation, not within the individual queries.
Q: How can EXCEPT be used for data quality checks?
A: EXCEPT can identify rows in one table that are missing in another, such as detecting missing foreign key references.
Q: How can INTERSECT assist in verifying data consistency?
A: INTERSECT can be used to find rows that are common across datasets, ensuring data alignment between two sources.
Q: What is a derived table in T-SQL?
A: A derived table is a subquery that appears in the FROM clause of a query and is treated like a temporary table.
Q: Why use a derived table?
A: Derived tables isolate part of a query’s logic, making the query modular and easier to understand and manage.
Q: What is the syntax for using a derived table?
SELECT <columns>
FROM (SELECT <columns> FROM <table>) AS <alias>;</alias></columns></columns>
Q: Must a derived table be aliased?
A: Yes, a derived table always requires an alias to reference its columns in the outer query.
Q: Can a derived table contain multiple tables or a WHERE clause?
A: Yes, a derived table can include joins, multiple tables, and a WHERE clause.
Q: Can a derived table include an ORDER BY clause?
A: Only if the TOP keyword is used.
Q: Can a derived table reference columns not included in its SELECT list?
A: No, all columns needed for joins or output in the outer query must be explicitly included in the derived table’s SELECT list.
Q: Provide an example of a derived table in an INNER JOIN.
SELECT c.CustomerID, s.SalesOrderID
FROM Sales.Customer AS c
INNER JOIN (
SELECT SalesOrderID, CustomerID
FROM Sales.SalesOrderHeader
) AS s ON c.CustomerID = s.CustomerID;
Q: How can derived tables be combined with joins?
A: Derived tables can be used with any type of join, such as INNER JOIN, LEFT JOIN, CROSS JOIN, or FULL JOIN.
Q: Can derived tables reference a CTE?
A: Yes, a derived table can reference a CTE defined earlier in the query.
Q: Can a derived table contain another derived table?
A: Yes, derived tables can be nested within other derived tables for complex queries.
Q: What is one limitation of derived tables compared to other T-SQL techniques?
A: Derived tables cannot define their own CTEs within their scope.
Q: When is it beneficial to use a derived table?
A: Use derived tables to encapsulate logic, simplify joins, or perform intermediate calculations.
Q: What happens if a required column is omitted from the derived table’s SELECT list?
A: The query will fail because the outer query cannot access columns that are not explicitly selected in the derived table.
Q: What is a Common Table Expression (CTE)?
A: A CTE is a temporary result set defined within a query that exists only for the duration of the query.
Q: How does a CTE differ from a derived table?
A: Unlike a derived table, a CTE is defined separately using the WITH keyword and can be referenced multiple times in the main query.
Q: What happens to a CTE after the main query runs?
A: A CTE goes out of scope and cannot be reused once the query has completed execution.
Q: What is the basic syntax for defining a CTE?
WITH <CTE> AS (SELECT <columns> FROM <table>)
SELECT <columns> FROM <CTE>;</CTE></columns></columns></CTE>
Q: How do you define column aliases in a CTE?
A: Specify the aliases after the CTE name:
WITH CTE_Name ([Column1], [Column2]) AS (
SELECT Column1, Column2 FROM Table
)
SELECT * FROM CTE_Name;
Q: What is the required placement of the WITH keyword in a query?
A: The WITH keyword must be the first statement in the batch or be preceded by a semicolon if other statements appear before it.
Q: Can a CTE include joins, WHERE clauses, or expressions?
A: Yes, a CTE can include joins, filters, expressions, and even reference other tables or views.
Q: Provide an example of a CTE used for an INNER JOIN.
WITH Orders AS (
SELECT SalesOrderID, CustomerID, TotalDue + Freight AS Total
FROM Sales.SalesOrderHeader
)
SELECT c.CustomerID, o.SalesOrderID, o.Total
FROM Sales.Customer AS c
INNER JOIN Orders AS o ON c.CustomerID = o.CustomerID;
Q: What happens if a column alias is not defined in a CTE?
A: The column name from the CTE query is used, but defining aliases upfront can improve readability, especially for expressions.
Q: Can a CTE be reused across multiple queries?
A: No, a CTE is valid only for the query in which it is defined.
Q: Can a CTE reference another CTE?
A: Yes, one CTE can reference another within the same WITH clause.
Q: Is it possible to use ORDER BY in a CTE?
A: Yes, but only when the TOP keyword is used.
Q: How do you ensure the CTE is correctly referenced in complex queries?
A: Always use descriptive aliases and verify that required columns are included in the CTE’s SELECT list.
Q: What advanced capabilities make CTEs superior to derived tables in some scenarios?
A: CTEs can be recursive, allowing them to solve hierarchical and iterative problems.
Q: What is one major use case for CTEs in advanced queries?
A: CTEs are commonly used for breaking down complex logic into manageable parts or simplifying nested queries.
Q: What is the main difference between UNION and UNION ALL?
A: UNION removes duplicate rows, while UNION ALL includes all rows, including duplicates.
Q: Why does UNION perform worse than UNION ALL?
A: UNION uses additional resources to eliminate duplicates, often employing a Hash Match operation to aggregate results.
Q: When should you use UNION ALL instead of UNION?
A: Use UNION ALL if you are certain there are no duplicates or duplicates are acceptable in the results.
Q: How does SQL Server process duplicate elimination for UNION?
A: It uses the Hash Match Aggregate operator, which can be resource-intensive for large datasets.
Q: Compare the estimated costs for UNION and UNION ALL from the example.
UNION query cost: 2.62912
UNION ALL query cost: 0.806502
UNION took ~77% of the resources in the batch.
Q: How can you view the performance difference between UNION and UNION ALL?
A: Enable Include Actual Execution Plan in SQL Server Management Studio or use Explain in Azure Data Studio.
Q: Why doesn’t wrapping a UNION ALL in a DISTINCT query improve performance?
A: The optimizer processes it like a UNION query internally, performing the same duplicate elimination steps.
Q: When should you use DISTINCT with UNION ALL?
A: Use DISTINCT in individual queries if duplicates exist within the individual datasets but not across them.
Q: Can using UNION ALL in a CTE or derived table help optimize performance?
A: No, unless the underlying queries ensure no duplicates. Otherwise, using DISTINCT or UNION negates performance gains.
Q: What is a best practice for deciding between UNION and UNION ALL?
A: Always prefer UNION ALL if you don’t need to eliminate duplicates, as it improves performance by avoiding unnecessary operations.
Q: What is a common operation used by SQL Server to remove duplicates in a UNION query?
A: The Hash Match Aggregate operation.
Q: How can you combine improved performance and duplicate elimination if necessary?
A: Use DISTINCT within individual queries and combine them with UNION ALL.