3_Advanced Queries Flashcards
Window Functions
The window functions are SQL functions performing calculations over the defined set of rows (a window). Compared to the aggregate functions, which return a single value as a result, the window functions allow you to add the aggregated value to each row in a separate column. This means the rows are not grouped and all the rows are kept as a query result.
RANK /DENSE_RANK /ROW_NUMBER: These assign a rank to each row by ordering specific columns. If any partition columns are given, rows are ranked within a partition group that it belongs to.
NOTE: When using ROW_NUMBER, each row will have a unique rank number and ranks for tied records are assigned randomly. With RANK/DENSE_RANK, tied records will have the same rank.
LAG /LEAD: It retrieves column values from a preceding or following row based on a specified order and partition group.
Window Functions - Example
Find the top 3 employees who have the highest salary in each department.
When an SQL question asks for “TOP N”, we can use either ORDER BY or ranking functions to answer the question. However, in this example, it asks to calculate “TOP N X in each Y”, which is a strong hint that we should use ranking functions because we need to rank rows within each partition group.
The query below finds exactly 3 highest-payed employees regardless of ties:
LAG/LEAD
LAG() and LEAD() are positional functions. A positional function is a type of window function.
LAG
The LAG() function allows access to a value stored in a different row above the current row. The row above may be adjacent or some number of rows above, as sorted by a specified column or set of columns.
Syntax: LAG(expression [,offset[,default_value]]) OVER(ORDER BY columns)
Only the first argument (column or expression) is required.
LEAD
LEAD() is similar to LAG(). Whereas LAG() accesses a value stored in a row above, LEAD() accesses a value stored in a row below.
Subqueries & Common Table Expressions (CTEs)
Subquery
A subquery is a query found within the query. It can occur in a SELECT clause, FROM clause, or WHERE clause.
CTE
A CTE or a Common Table Expression is a temporary result set returned by a query and used by another query. In that way, it’s similar to subquery. But the main difference is CTE can be named and can reference itself.
NULL
In SQL, any predicates can result in one of the three values: true, false, and NULL, a reserved keyword for unknown or missing data values.
NULL is different from ‘’ (empty string) and ‘NULL’, which are both known values and are treated differently, they can be compared to other empty string or blank spaces. NULL cannot be compared to another NULL.
Utilize functions such as:
- A IS (NOT) NULL: returns true/false depending on whether A is NULL or not.
- NVL(A,B): returns B if A is null, otherwise returns A.
- COALESCE(C1, C2,…): returns the first C that is not NULL. If all C’s are null, it returns NULL.
- NULLIF(A,B): returns NULL if A = B, otherwise returns A.
NULL - AND
A AND B is true if only if both A and B are true, and is false if either A or B is false. But what if A and B columns contain NULL?
- When A is NULL and B is true, A AND B is evaluated to NULL. Since B is true and A is unknown, A AND B completely depends on the value of A. Thus, A AND B is also unknown.
- When A is NULL and B is false, A AND B is evaluated to false. Since we already know B is false, A AND B must be false regardless of the value of A.
- When A and B are both NULL, A AND B is evaluated to NULL as we have no information on neither A or B.
NULL - OR
A OR B is true when either A or B is true. It is false when both A and B are false.
- A is NULL and B is True -> A OR B is True
- A is NULL and B is False -> A OR B is NULL
- A is NULL and B is NULL -> A OR B is NULL
NULL - NOT
NOT A is evaluated to true when A is false, and to false when A is true. When A is NULL (again, meaning the value of A is unknown), we of course don’t know the logical opposite of A, so NOT A is still NULL.