SQL - 04 Flashcards
What is a self join in the context of SQL?
A self join is a type of SQL join where a table is joined with itself. It’s used for comparing rows within the same table to each other.
When would you typically use a self join?
A self join is used when you need to compare or contrast data within the same table, such as finding pairs of rows that meet certain criteria.
How do you differentiate between the two instances of the same table in a self join?
In a self join, you use aliases to differentiate between the two instances of the same table.
Can you give an example of a situation where a self join would be useful?
A self join is useful in situations like finding employees who work in the same department, or products in the same category with similar characteristics.
What is a correlated subquery in SQL?
A correlated subquery is a subquery that refers to columns from a table in the outer query, making the subquery dependent on the outer query. It is executed repeatedly, once for each row that might be selected by the main query.
How does a correlated subquery differ from a non-correlated subquery?
Unlike a non-correlated subquery, which runs once and returns a value or set that can be used by the outer query, a correlated subquery must be re-evaluated for each row processed by the outer query because it depends on data from the outer query.
In what SQL clauses can correlated subqueries appear?
Correlated subqueries can appear in various SQL clauses, including SELECT, WHERE, and HAVING clauses.
Can you give an example of a correlated subquery used in a WHERE clause?
Yes, for instance, to find employees whose salary is greater than the average salary in their respective departments:
SELECT employee_name, salary
FROM employees e1
WHERE salary > (
SELECT AVG(salary)
FROM employees e2
WHERE e1.department_id = e2.department_id
)
What is the performance impact of using correlated subqueries?
Correlated subqueries can negatively impact performance because the subquery may be executed many times; once for each row evaluated by the outer query. Optimization or alternative querying methods may be needed for large datasets.
What is a common use case for correlated subqueries?
A common use case is to determine if a record exists in a database that meets specific conditions relative to each row processed by the outer query, such as checking each department for employees earning above a certain threshold.
How do correlated subqueries interact with the EXISTS operator?
Correlated subqueries are often used with the EXISTS operator to check for the existence of rows that meet certain conditions. EXISTS returns true as soon as it finds a matching row, making it efficient for this purpose.
Can correlated subqueries update data in SQL tables?
Yes, correlated subqueries can also be used in UPDATE and DELETE statements to specify which rows should be updated or deleted based on conditions related to data in the same table or in other tables.
Give an example of a correlated subquery used in an UPDATE statement.
For example, to increase the salary of employees who earn less than the average salary in their department:
UPDATE employees e1
SET salary = salary * 1.1
WHERE salary < (
SELECT AVG(salary)
FROM employees e2
WHERE e1.department_id = e2.department_id
)
What are the challenges of writing correlated subqueries?
The main challenges include ensuring the correctness of the reference to the outer query, avoiding excessive performance costs, and managing the complexity that comes with multiple dependencies and repeated evaluations.
What is the GROUP BY clause used for in SQL?
The GROUP BY clause in SQL is used to arrange identical data into groups. This clause is often used with SQL aggregate functions (COUNT, MAX, MIN, SUM, AVG) to perform a calculation on each group of data.
How do you use a basic GROUP BY clause in an SQL query?
A basic GROUP BY clause groups rows that have the same values in specified columns into summary rows. For example, to count the number of employees in each department:
SELECT department_id, COUNT(employee_id)
FROM employees
GROUP BY department_id;