SQL - 05: Window functions Flashcards
What is the WHERE clause used for in SQL?
The WHERE clause is used to filter records before any groupings are made. It restricts which rows are included in the result set based on specific conditions.
What is the HAVING clause used for in SQL?
The HAVING clause is used to filter groups after aggregations have been performed. It is used in conjunction with the GROUP BY clause to filter results based on an aggregate condition.
How do WHERE and HAVING clauses differ in their execution?
The WHERE clause filters rows before grouping and aggregation, reducing the number of rows that need to be processed. The HAVING clause filters after grouping and aggregation, applying conditions to the grouped data.
Can you use a HAVING clause without a GROUP BY clause?
Yes, in most databases. Without GROUP BY, the whole result set is treated as a single group, so HAVING either keeps or discards that one aggregated row; it does not filter individual rows. This is uncommon, and HAVING is typically paired with GROUP BY.
Give an example of a query using both WHERE and HAVING clauses.
SELECT department_id, AVG(salary) AS average_salary
FROM employees
WHERE department_id IS NOT NULL
GROUP BY department_id
HAVING AVG(salary) > 50000;
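The query above can be tried end to end with a small harness. This is a minimal sketch using Python's built-in sqlite3 module; the employees rows are made-up sample data, and the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (name TEXT, department_id INTEGER, salary INTEGER)"
)
# Hypothetical sample rows, not from the flashcards.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ann", 1, 70000),
        ("Bob", 1, 50000),
        ("Cy", 2, 40000),
        ("Dee", 2, 45000),
        ("Eve", None, 90000),  # no department: removed by WHERE
    ],
)

rows = conn.execute(
    """
    SELECT department_id, AVG(salary) AS average_salary
    FROM employees
    WHERE department_id IS NOT NULL   -- row filter, before grouping
    GROUP BY department_id
    HAVING AVG(salary) > 50000        -- group filter, after aggregation
    ORDER BY department_id
    """
).fetchall()
print(rows)  # [(1, 60000.0)]
```

Department 2 averages 42500 and is dropped by HAVING; Eve never reaches the grouping step at all.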
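Why can't you use column aliases in WHERE clauses, and can you use them in HAVING clauses?
Column aliases are not recognized in WHERE because, in SQL's logical processing order (FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY), WHERE is evaluated before the SELECT list assigns aliases. HAVING is also evaluated before SELECT, so standard SQL does not allow aliases there either: PostgreSQL and SQL Server reject them, while MySQL and SQLite accept aliases in HAVING as an extension. The portable approach is to repeat the expression, as in HAVING AVG(salary) > 50000, or to define the alias in a subquery and filter on it in the outer query. ORDER BY is evaluated after SELECT, so it can use aliases.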
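One portable way to filter on a computed column by its alias is to define the alias in a subquery and filter in the outer query. A minimal sketch using Python's built-in sqlite3 module; the employees table, the bonus column, and the total_pay alias are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER, bonus INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", 50000, 5000), ("Bob", 40000, 1000), ("Cy", 45000, 12000)],
)

rows = conn.execute(
    """
    SELECT name, total_pay
    FROM (
        SELECT name, salary + bonus AS total_pay   -- alias is defined here
        FROM employees
    ) AS paid
    WHERE total_pay > 50000                         -- and is visible here
    ORDER BY name
    """
).fetchall()
print(rows)  # [('Ann', 55000), ('Cy', 57000)]
```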
What types of conditions can you specify in HAVING and WHERE clauses?
WHERE clauses can contain conditions on individual column values, such as comparisons, ranges (BETWEEN), and patterns (LIKE), but not on aggregate functions. HAVING clauses can contain conditions on aggregate functions such as SUM, AVG, and COUNT, as well as on the GROUP BY columns.
How does the presence of WHERE and HAVING clauses affect the performance of SQL queries?
A selective WHERE clause can significantly improve performance because it reduces the number of rows that reach grouping and aggregation, and it can often use an index. HAVING runs after aggregation, so it reduces the number of groups returned but not the work of computing them. Conditions that do not involve an aggregate therefore usually belong in WHERE rather than HAVING.
What is a practical use case for combining WHERE and HAVING clauses?
A practical use case is when you need to analyze specific subsets of data, such as finding departments in certain locations (filtered by WHERE) that have an average salary above a certain threshold (filtered by HAVING).
What is a window function in SQL?
A window function performs a calculation across a set of table rows that are somehow related to the current row. Unlike regular aggregate functions, window functions do not cause rows to become grouped into a single output row — the rows retain their separate identities.
How is a window function different from regular aggregate functions?
Window functions allow you to perform calculations across sets of rows without collapsing these rows into a single output row, whereas aggregate functions collapse rows. Window functions also preserve the detail of each row.
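The difference shows up in the row counts. The sketch below runs the same AVG once with GROUP BY and once with OVER, using Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer); the employees rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (name TEXT, department_id INTEGER, salary INTEGER)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", 1, 70000), ("Bob", 1, 50000), ("Cy", 2, 40000)],
)

# Aggregate: one output row per department.
grouped = conn.execute(
    """
    SELECT department_id, AVG(salary)
    FROM employees
    GROUP BY department_id
    ORDER BY department_id
    """
).fetchall()

# Window: every employee row survives and carries its department average.
windowed = conn.execute(
    """
    SELECT name, salary, AVG(salary) OVER (PARTITION BY department_id)
    FROM employees
    ORDER BY name
    """
).fetchall()

print(len(grouped), len(windowed))  # 2 3
```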
What are some common types of window functions?
Common types include:
ROW_NUMBER(): Assigns a unique sequential number, starting from 1, to each row within its partition, in the order given by ORDER BY.
RANK(): Assigns a rank to each row within a partition of a result set, with gaps in rank values if there are ties.
DENSE_RANK(): Similar to RANK, but without gaps in the ranking values.
SUM(), AVG(), MIN(), MAX(): Aggregate functions that, when followed by an OVER clause, compute the sum, average, minimum, or maximum over each row's window.
What is the syntax for using a window function?
The syntax for using a window function generally looks like this:
SELECT column_name, WINDOW_FUNCTION() OVER (PARTITION BY partition_column ORDER BY sort_column)
FROM table_name;
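The PARTITION BY clause divides the result set into partitions, and the window function is applied to each partition independently. ORDER BY sets the order of rows within each partition. Both clauses are optional: without PARTITION BY, the whole result set is treated as one partition.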
What is the purpose of the PARTITION BY clause in window functions?
The PARTITION BY clause is used to divide the result set into partitions where the window function is applied independently. This is similar to the role of GROUP BY in aggregate functions but without collapsing the rows into a single output row.
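To see the partitions working independently, the sketch below numbers employees by salary within each department, so the numbering restarts at 1 for every department_id. It uses Python's built-in sqlite3 module (SQLite 3.25 or newer) and made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (name TEXT, department_id INTEGER, salary INTEGER)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", 1, 70000), ("Bob", 1, 50000), ("Cy", 2, 40000), ("Dee", 2, 45000)],
)

rows = conn.execute(
    """
    SELECT department_id, name,
           ROW_NUMBER() OVER (
               PARTITION BY department_id   -- restart numbering per department
               ORDER BY salary DESC         -- highest salary gets number 1
           ) AS rn
    FROM employees
    ORDER BY department_id, rn
    """
).fetchall()
print(rows)  # [(1, 'Ann', 1), (1, 'Bob', 2), (2, 'Dee', 1), (2, 'Cy', 2)]
```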
Can window functions be used in UPDATE statements?
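Not directly in most databases: a window function cannot appear in the SET or WHERE clause of an UPDATE. Window functions are evaluated as part of a SELECT, so the usual workaround is to compute the value in a subquery or common table expression and have the UPDATE read from it.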
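A window function can still feed an UPDATE indirectly, by computing the value in a subquery that the UPDATE reads from. The sketch below shows one way to do this in SQLite through Python's built-in sqlite3 module; the salary_rank column and the sample rows are hypothetical, and other databases may prefer different syntax such as UPDATE ... FROM or MERGE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, salary INTEGER, salary_rank INTEGER)"
)
# Hypothetical sample rows; salary_rank starts out NULL.
conn.executemany(
    "INSERT INTO employees (id, salary) VALUES (?, ?)",
    [(1, 50000), (2, 70000), (3, 60000)],
)

conn.execute(
    """
    UPDATE employees
    SET salary_rank = (
        SELECT ranked.rnk
        FROM (
            -- The window function lives inside an ordinary SELECT.
            SELECT id, RANK() OVER (ORDER BY salary DESC) AS rnk
            FROM employees
        ) AS ranked
        WHERE ranked.id = employees.id
    )
    """
)

rows = conn.execute("SELECT id, salary_rank FROM employees ORDER BY id").fetchall()
print(rows)  # [(1, 3), (2, 1), (3, 2)]
```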
Give an example of a window function used to calculate running totals.
SELECT order_id, order_date, amount,
SUM(amount) OVER (ORDER BY order_date) AS running_total
FROM orders;
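The running-total query can be checked against a few rows. A minimal sketch using Python's built-in sqlite3 module (SQLite 3.25 or newer); the orders rows are made-up sample data with one order per date, and the outer ORDER BY is added only to fix the output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT, amount INTEGER)")
# Hypothetical sample rows, one order per date.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2024-01-01", 100), (2, "2024-01-02", 50), (3, "2024-01-03", 25)],
)

rows = conn.execute(
    """
    SELECT order_id, order_date, amount,
           SUM(amount) OVER (ORDER BY order_date) AS running_total
    FROM orders
    ORDER BY order_date
    """
).fetchall()

for row in rows:
    print(row)
# (1, '2024-01-01', 100, 100)
# (2, '2024-01-02', 50, 150)
# (3, '2024-01-03', 25, 175)
```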
How do you handle ties in ranking functions like RANK and DENSE_RANK?
RANK gives the same rank to tied values but leaves gaps in the ranking for subsequent values. DENSE_RANK also gives the same rank to ties but does not leave gaps, ensuring ranks are consecutive.
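Putting the ranking functions side by side on tied data makes the gaps visible. A minimal sketch using Python's built-in sqlite3 module (SQLite 3.25 or newer); the scores table and its rows are hypothetical, and name is added to the ROW_NUMBER ordering only to break the tie deterministically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
# Hypothetical sample rows: Ann and Bob tie on 90.
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("Ann", 90), ("Bob", 90), ("Cy", 80), ("Dee", 70)],
)

rows = conn.execute(
    """
    SELECT name, score,
           ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num,
           RANK()       OVER (ORDER BY score DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC)       AS dense_rnk
    FROM scores
    ORDER BY score DESC, name
    """
).fetchall()

for row in rows:
    print(row)
# ('Ann', 90, 1, 1, 1)
# ('Bob', 90, 2, 1, 1)
# ('Cy', 80, 3, 3, 2)   <- RANK skips 2, DENSE_RANK does not
# ('Dee', 70, 4, 4, 3)
```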
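What are the LEAD and LAG window functions used for?
LEAD and LAG access data from a following row (LEAD) or a preceding row (LAG) in the same partition, as ordered by the ORDER BY in the OVER clause, without using a self-join. The default offset is one row, and the result is NULL when no such row exists unless a default value is supplied. They are useful for comparing the current row's values with those of neighboring rows.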
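A short sketch of LAG and LEAD on three rows, using Python's built-in sqlite3 module (SQLite 3.25 or newer). The sales table and its rows are hypothetical; note the NULLs (None in Python) at the two ends, where no neighboring row exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, amount INTEGER)")
# Hypothetical sample rows.
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 100), (2, 120), (3, 90)])

rows = conn.execute(
    """
    SELECT day, amount,
           LAG(amount)  OVER (ORDER BY day) AS previous_amount,
           LEAD(amount) OVER (ORDER BY day) AS next_amount
    FROM sales
    ORDER BY day
    """
).fetchall()

for row in rows:
    print(row)
# (1, 100, None, 120)
# (2, 120, 100, 90)
# (3, 90, 120, None)
```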
Can window functions be nested within other window functions?
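No, a window function cannot be nested directly inside another window function. Window functions are allowed only in the SELECT list and the ORDER BY clause of a query. To apply one window function to the result of another, compute the inner one in a subquery or common table expression and apply the outer one in the enclosing query.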
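When one window calculation needs to build on another, the two can be layered with a common table expression instead of nested. In the sketch below, the inner query uses LAG to compute a day-over-day change and the outer query ranks those changes. It uses Python's built-in sqlite3 module (SQLite 3.25 or newer); the sales table, the changes CTE, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, amount INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)", [(1, 100), (2, 130), (3, 120), (4, 170)]
)

rows = conn.execute(
    """
    WITH changes AS (
        -- First layer: change versus the previous day (NULL on day 1).
        SELECT day, amount - LAG(amount) OVER (ORDER BY day) AS change
        FROM sales
    )
    -- Second layer: rank the changes computed above.
    SELECT day, change, RANK() OVER (ORDER BY change DESC) AS change_rank
    FROM changes
    WHERE change IS NOT NULL
    ORDER BY day
    """
).fetchall()
print(rows)  # [(2, 30, 2), (3, -10, 3), (4, 50, 1)]
```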
What is a window frame in SQL?
A window frame is a subset of rows in a partition of a result set that a window function operates over. It defines the range of rows used to perform the calculation for each row in the partition.
How do you specify a window frame in SQL?
A window frame is specified using the ROWS or RANGE clause within the OVER clause of a window function. For example:
SUM(column_name) OVER (PARTITION BY column_name ORDER BY column_name
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
What is the difference between ROWS and RANGE in defining a window frame?
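ROWS defines the frame by counting physical rows relative to the current row. RANGE defines it by the values of the ORDER BY expression, so all rows that tie with the current row on that expression (its peers) enter or leave the frame together. RANGE therefore needs an ORDER BY to be meaningful, and when ORDER BY is present without an explicit frame, the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.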
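The two frame types only diverge when the ORDER BY values tie. The sketch below computes a running total both ways over orders where two rows share a date. It uses Python's built-in sqlite3 module (SQLite 3.25 or newer); the sample rows are hypothetical, and order_id is added to the ROWS ordering only so that the tied rows come in a fixed order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_date TEXT, amount INTEGER)")
# Hypothetical sample rows: orders 2 and 3 share a date.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", 100),
        (2, "2024-01-02", 50),
        (3, "2024-01-02", 25),
        (4, "2024-01-03", 10),
    ],
)

rows = conn.execute(
    """
    SELECT order_id,
           SUM(amount) OVER (
               ORDER BY order_date, order_id
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS by_rows,
           SUM(amount) OVER (
               ORDER BY order_date
               RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS by_range
    FROM orders
    ORDER BY order_id
    """
).fetchall()

for row in rows:
    print(row)
# (1, 100, 100)
# (2, 150, 175)   <- RANGE already includes order 3, its peer on the same date
# (3, 175, 175)
# (4, 185, 185)
```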
What does ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING mean?
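This defines a window frame that includes the row before the current row, the current row itself, and the row after it, all within the same partition. At the first or last row of a partition the missing neighbor is simply absent, so the frame contains only two rows.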
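A three-row moving sum shows this frame in action, including the shorter frames at the edges. A minimal sketch using Python's built-in sqlite3 module (SQLite 3.25 or newer); the readings table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (t INTEGER, value INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)", [(1, 10), (2, 20), (3, 30), (4, 40)]
)

rows = conn.execute(
    """
    SELECT t,
           SUM(value) OVER (
               ORDER BY t
               ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
           ) AS moving_sum
    FROM readings
    ORDER BY t
    """
).fetchall()
print(rows)  # [(1, 30), (2, 60), (3, 90), (4, 70)]
```

The first and last rows sum only two values (10 + 20 and 30 + 40) because one side of the frame falls outside the data.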