SQL: from 0 to hero Flashcards
Learn active concepts about SQL to succeed in product data science interviews.
What is the difference between create table
and create view
?
Table 2: departments (dept_id, de
Table 1: employees (id, name, dept_id)
easy
create table
: Stores data physically; faster for repeated access but uses storage.
- Use: Static data or precomputed results.
- Avoid: Dynamic data needing frequent updates.
create view
: Virtual table; reflects live data but slower for complex queries.
Use: Dynamic, query-based data.
Avoid: Performance-intensive queries.
What is the difference between INNER JOIN
and OUTER JOIN
?
T2: departments (dept_id, dept_name)
T1: employees (id, name, dept_id), T2: departments (dept_id, dept_name)
easy
INNER JOIN
returns matching rows from both tables, while OUTER JOIN
includes unmatched rows from one or both tables.
– INNER JOIN: Only matching rows
SELECT e.name, d.dept_name FROM employees e INNER JOIN departments d ON e.dept_id = d.dept_id;
– OUTER JOIN: Includes unmatched rows
SELECT e.name, d.dept_name FROM employees e LEFT OUTER JOIN departments d ON e.dept_id = d.dept_id;
How can you **find duplicate records **in a table?
Table: employees (id, name, dept_id)
difficulty: easy
Use GROUP BY on columns and HAVING COUNT(*) > 1
SELECT name, COUNT(*) FROM employees GROUP BY name HAVING COUNT(*) > 1;
How do window functions differ from aggregate functions?
employees (id, name, dept_id, salary)
**difficulty: ** medium
Window functions operate over a subset of rows defined by OVER() without collapsing them into one row, unlike aggregate functions.
-- Window Function SELECT name, salary, SUM(salary) OVER (PARTITION BY dept_id) AS dept_salary FROM employees;
-- Aggregate Function SELECT dept_id, SUM(salary) AS dept_salary FROM employees GROUP BY dept_id;
What is a Common Table Expression (CTE), and when would you use it?
Table: sales (region, sales)
**difficulty: ** medium
A CTE is a temporary result set used for modular, readable queries (e.g., recursive operations).
WITH SalesCTE AS ( SELECT region, SUM(sales) AS total_sales FROM sales GROUP BY region ) SELECT region, total_sales FROM SalesCTE;
How can you calculate the **cumulative sum **of a column?
sales (name, sales, sales_date)
**difficulty: ** medium
Use SUM(column) OVER (ORDER BY column) in a window function.
SELECT name, sales, SUM(sales) OVER (ORDER BY sales_date) AS cumulative_sales FROM sales;
What is the purpose of the EXPLAIN
statement?
Table: employees (id, name, dept_id)
**difficulty: ** medium
It shows the query execution plan to optimize performance.
How do you remove duplicate rows from a table?
employees (id, name, dept_id)
**difficulty: ** medium
Use DELETE with ROW_NUMBER() or DISTINCT.
DELETE FROM employees WHERE ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) > 1
Write a query to get the second highest salary from an employees table.
Table: employees (id, name, salary)
**difficulty: ** medium
SELECT MAX(salary) FROM employees WHERE salary < (SELECT MAX(salary) FROM employees);
What is the difference between HAVING
and WHERE
?
employees (id, name, dept_id, salary)
**difficulty: ** easy
WHERE
filters rows before grouping and HAVING
filters groups after aggregation.
-- WHERE SELECT * FROM employees WHERE salary > 50000;
-- HAVING SELECT dept_id, AVG(salary) FROM employees GROUP BY dept_id HAVING AVG(salary) > 60000;
How do you handle nulls in SQL when performing calculations?
Table: employees (id, name, salary)
**difficulty: ** easy
Use COALESCE()
or ISNULL()
to replace nulls with default values.
SELECT name, COALESCE(salary, 0) AS salary FROM employees; -- COALESCE replaces NULL with a value
How do you find the 90th percentile in a dataset?
Table: employees (id, name, salary)
**difficulty: ** hard
With PERCENTILE_CONT
SELECT PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY salary) AS percentile_salary FROM employees;
How do you **rank rows **in a table?
Table: employees (id, name, salary)
**difficulty: ** medium
SELECT name, salary, RANK() OVER (ORDER BY salary DESC) AS rank FROM employees;
What is the difference between TRUNCATE
and DELETE
?
Table: employees (id, name, dept_id)
**difficulty: ** easy
TRUNCATE TABLE employees; -- Deletes all rows
DELETE FROM employees WHERE dept_id = 101; -- Deletes specific rows
Table: employees (id, name, dept_id)
What is a primary key, and why is it important? Create a table with a primary key.
Table: employees (id, name)
**difficulty: ** easy
The primary key is a column in a relational database table that’s distinctive for each record. Primary keys must contain UNIQUE values, and cannot contain NULL values.
CREATE TABLE employees ( id INT PRIMARY KEY, name VARCHAR(50) )
Learn more about primary keys.
How do you pivot data in SQL?
Table: employees (id, dept_id, gender)
**difficulty: ** medium
SELECT dept_id, SUM(CASE WHEN gender = 'Male' THEN 1 ELSE 0 END) AS male_count, SUM(CASE WHEN gender = 'Female' THEN 1 ELSE 0 END) AS female_count FROM employees GROUP BY dept_id
Write a query to calculate the percentage of total sales per region?
Table: sales (region, sales)
**difficulty: ** hard
SELECT region, SUM(sales) / SUM(SUM(sales)) OVER () * 100 AS percentage FROM sales GROUP BY region;
What is the purpose of normalization in SQL?
employees (id, name, dept_id)
Tables: departments (dept_id, dept_name),
**difficulty: ** easy
To reduce redundancy and dependency by organizing data into logical tables.
CREATE TABLE departments (dept_id INT PRIMARY KEY, dept_name VARCHAR(50)); CREATE TABLE employees (id INT PRIMARY KEY, name VARCHAR(50), dept_id INT);
How do you get the total row count of a table or grouped result set?
Table: employees (id, name, dept_id)
**difficulty: ** easy
Use COUNT(*)
for the table or COUNT(column)
for grouped subsets.
-- Total row count of the table SELECT COUNT(*) AS total_rows FROM employees;
-- Row count grouped by department SELECT dept_id, COUNT(*) AS dept_row_count FROM employees GROUP BY dept_id;
What is the difference between RANK()
, DENSE_RANK()
, and ROW_NUMBER()
?
Table: employees (id, name, salary)
**difficulty: ** medium
- RANK() assigns ranks with gaps for tied values.
- DENSE_RANK() assigns consecutive ranks without gaps.
- ROW_NUMBER() assigns unique row numbers regardless of ties.
SELECT name, salary, RANK() OVER (ORDER BY salary DESC) AS rank, DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank, ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_number FROM employees;
How can you replace rows based on a condition while updating a table?
Table: employees (id, name, salary, dept_id)
**difficulty: ** medium
Use the UPDATE
statement with a CASE
expression in the SET
clause to selectively update rows based on conditions.
UPDATE employees SET salary = CASE WHEN dept_id = 101 THEN salary * 1.1 ELSE salary END WHERE dept_id IN (101, 102);
What are common SQL data types, and how are they used?
sales (id INT, product STRING, amount DOUBLE, sale_date TIMESTAMP
**difficulty: ** easy
-
INT
: Whole numbers -
STRING
: Text -
DOUBLE
: Decimal numbers -
TIMESTAMP
: Date and time
CREATE TABLE sales ( id INT, product STRING, amount DOUBLE, sale_date TIMESTAMP );
How do you query across multiple databases in Databricks?
Tables: sales, campaigns
Databases: sales_db, marketing_db
**difficulty: ** medium
-- Joining tables across databases SELECT s.id, s.amount, c.campaign_name FROM sales_db.sales s JOIN marketing_db.campaigns c ON s.id = c.sale_id;
What is advanced SQL functions like NTILE
?
Table: sales (id, amount, sale_date)
**difficulty: ** medium
NTILE
: Divide rows into quartiles
SELECT id, amount, NTILE(4) OVER (ORDER BY amount) AS quartile FROM sales;
What is the advanced SQL function LEAD
?
Table: sales (id, amount, sale_date)
**difficulty: ** medium
LEAD
: Access the next row’s value
SELECT id, amount, LEAD(amount) OVER (ORDER BY sale_date) AS next_amount FROM sales;
How can you use CASE
for advanced query logic?
Table: sales (id, amount)
**difficulty: ** easy
Use CASE
to create conditional columns or filters in queries.
SELECT id, amount, CASE WHEN amount > 100 THEN 'High' WHEN amount > 50 THEN 'Medium' ELSE 'Low' END AS sales_category FROM sales;
What does the LAG()
function do in SQL?
Table: sales (id, amount, sale_date)
**difficulty: ** medium
LAG()
retrieves the value of a column from the previous row in a window.
SELECT id, amount, sale_date, LAG(amount) OVER (ORDER BY sale_date) AS prev_sale FROM sales;