SQL: from 0 to hero Flashcards

Learn active concepts about SQL

You may prefer our related Brainscape-certified flashcards:
1
Q

What is the difference between create table and create view?

Table 2: departments (dept_id, de

Table 1: employees (id, name, dept_id)

easy

A

create table: Stores data physically; faster for repeated access but uses storage.

  • **Use: **Static data or precomputed results.
  • **Avoid: **Dynamic data needing frequent updates.

create view: Virtual table; reflects live data but slower for complex queries.

Use: Dynamic, query-based data.
Avoid: Performance-intensive queries.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is the difference between INNER JOIN and OUTER JOIN?

T2: departments (dept_id, dept_name)

T1: employees (id, name, dept_id), T2: departments (dept_id, dept_name)

easy

A

INNER JOIN returns matching rows from both tables, while OUTER JOIN includes unmatched rows from one or both tables.

-- INNER JOIN: Only matching rows
SELECT e.name, d.dept_name 
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;
-- OUTER JOIN: Includes unmatched rows
SELECT e.name, d.dept_name 
FROM employees e
LEFT OUTER JOIN departments d ON e.dept_id = d.dept_id;
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

How can you **find duplicate records **in a table?

Table: employees (id, name, dept_id)

difficulty: easy

A

Use GROUP BY on columns and HAVING COUNT(*) > 1

SELECT name, COUNT(*) 
FROM employees 
GROUP BY name 
HAVING COUNT(*) > 1;
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

How do window functions differ from aggregate functions?

employees (id, name, dept_id, salary)

**difficulty: ** medium

A

Window functions operate over a subset of rows defined by OVER() without collapsing them into one row, unlike aggregate functions.

-- Window Function
SELECT name, salary, SUM(salary) OVER (PARTITION BY dept_id) AS dept_salary 
FROM employees;
-- Aggregate Function
SELECT dept_id, SUM(salary) AS dept_salary 
FROM employees 
GROUP BY dept_id;
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is a Common Table Expression (CTE), and when would you use it?

Table: sales (region, sales)

**difficulty: ** medium

A

A CTE is a temporary result set used for modular, readable queries (e.g., recursive operations).

WITH SalesCTE AS (
  SELECT region, SUM(sales) AS total_sales 
  FROM sales 
  GROUP BY region
)
SELECT region, total_sales 
FROM SalesCTE;
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

How can you calculate the **cumulative sum **of a column?

sales (name, sales, sales_date)

**difficulty: ** medium

A

Use SUM(column) OVER (ORDER BY column) in a window function.

SELECT name, sales, 
       SUM(sales) OVER (ORDER BY sales_date) AS cumulative_sales 
FROM sales;
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is the purpose of the EXPLAIN statement?

Table: employees (id, name, dept_id)

**difficulty: ** medium

A

It shows the query execution plan to optimize performance.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

How do you remove duplicate rows from a table?

employees (id, name, dept_id)

**difficulty: ** medium

A

Use DELETE with ROW_NUMBER() or DISTINCT.

DELETE FROM employees 
WHERE ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) > 1
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Write a query to get the second highest salary from an employees table.

Table: employees (id, name, salary)

**difficulty: ** medium

A
SELECT MAX(salary) 
FROM employees 
WHERE salary < (SELECT MAX(salary) FROM employees);
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is the difference between HAVING and WHERE?

employees (id, name, dept_id, salary)

**difficulty: ** easy

A

WHERE filters rows before grouping and HAVING filters groups after aggregation.

-- WHERE
SELECT * 
FROM employees 
WHERE salary > 50000;
-- HAVING
SELECT dept_id, AVG(salary) 
FROM employees 
GROUP BY dept_id 
HAVING AVG(salary) > 60000;
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

How do you handle nulls in SQL when performing calculations?

Table: employees (id, name, salary)

**difficulty: ** easy

A

Use COALESCE() or ISNULL() to replace nulls with default values.

SELECT name, COALESCE(salary, 0) AS salary 
FROM employees; -- COALESCE replaces NULL with a value
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

How do you find the 90th percentile in a dataset?

Table: employees (id, name, salary)

**difficulty: ** hard

A

With PERCENTILE_CONT

SELECT PERCENTILE_CONT(0.9) 
WITHIN GROUP (ORDER BY salary) AS percentile_salary 
FROM employees;
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

How do you **rank rows **in a table?

Table: employees (id, name, salary)

**difficulty: ** medium

A
SELECT name, salary, 
       RANK() OVER (ORDER BY salary DESC) AS rank 
FROM employees;
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is the difference between TRUNCATE and DELETE?

Table: employees (id, name, dept_id)

**difficulty: ** easy

A
TRUNCATE TABLE employees; -- Deletes all rows
DELETE FROM employees WHERE dept_id = 101; -- Deletes specific rows

Table: employees (id, name, dept_id)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is a primary key, and why is it important? Create a table with a primary key.

Table: employees (id, name)

**difficulty: ** easy

A

The primary key is a column in a relational database table that’s distinctive for each record. Primary keys must contain UNIQUE values, and cannot contain NULL values.

CREATE TABLE employees (
  id INT PRIMARY KEY,
  name VARCHAR(50)
)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

How do you pivot data in SQL?

Table: employees (id, dept_id, gender)

**difficulty: ** medium

A
SELECT dept_id,
       SUM(CASE WHEN gender = 'Male' THEN 1 ELSE 0 END) AS male_count,
       SUM(CASE WHEN gender = 'Female' THEN 1 ELSE 0 END) AS female_count
FROM employees 
GROUP BY dept_id
17
Q

Write a query to calculate the percentage of total sales per region?

Table: sales (region, sales)

**difficulty: ** hard

A
SELECT region, 
       SUM(sales) / SUM(SUM(sales)) OVER () * 100 AS percentage 
FROM sales 
GROUP BY region;
18
Q

What is the purpose of normalization in SQL?

employees (id, name, dept_id)

Tables: departments (dept_id, dept_name),

**difficulty: ** easy

A

To reduce redundancy and dependency by organizing data into logical tables.

CREATE TABLE departments (dept_id INT PRIMARY KEY, dept_name VARCHAR(50));
CREATE TABLE employees (id INT PRIMARY KEY, name VARCHAR(50), dept_id INT);
19
Q

How do you get the total row count of a table or grouped result set?

Table: employees (id, name, dept_id)

**difficulty: ** easy

A

Use COUNT(*) for the table or COUNT(column) for grouped subsets.

-- Total row count of the table
SELECT COUNT(*) AS total_rows 
FROM employees;
-- Row count grouped by department
SELECT dept_id, COUNT(*) AS dept_row_count 
FROM employees 
GROUP BY dept_id;
20
Q

What is the difference between RANK(), DENSE_RANK(), and ROW_NUMBER()?

Table: employees (id, name, salary)

**difficulty: ** medium

A
  • RANK() assigns ranks with gaps for tied values.
  • DENSE_RANK() assigns consecutive ranks without gaps.
  • ROW_NUMBER() assigns unique row numbers regardless of ties.
SELECT name, salary, 
       RANK() OVER (ORDER BY salary DESC) AS rank,
       DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank,
       ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_number
FROM employees;
21
Q

How can you replace rows based on a condition while updating a table?

Table: employees (id, name, salary, dept_id)

**difficulty: ** medium

A

Use the UPDATE statement with a CASE expression in the SET clause to selectively update rows based on conditions.

UPDATE employees
SET salary = CASE 
               WHEN dept_id = 101 THEN salary * 1.1 
               ELSE salary 
               END
WHERE dept_id IN (101, 102);
22
Q

What are common SQL data types, and how are they used?

sales (id INT, product STRING, amount DOUBLE, sale_date TIMESTAMP

**difficulty: ** easy

A
  • INT: Whole numbers
  • STRING: Text
  • DOUBLE: Decimal numbers
  • TIMESTAMP: Date and time
CREATE TABLE sales (
  id INT,
  product STRING,
  amount DOUBLE,
  sale_date TIMESTAMP
);
23
Q

How do you query across multiple databases in Databricks?

Tables: sales, campaigns

Databases: sales_db, marketing_db

**difficulty: ** medium

A
-- Joining tables across databases
SELECT s.id, s.amount, c.campaign_name
FROM sales_db.sales s
JOIN marketing_db.campaigns c 
ON s.id = c.sale_id;
24
Q

What is advanced SQL functions like NTILE?

Table: sales (id, amount, sale_date)

**difficulty: ** medium

A

NTILE: Divide rows into quartiles

SELECT id, amount, 
       NTILE(4) OVER (ORDER BY amount) AS quartile
FROM sales;
25
Q

What is the advanced SQL function LEAD?

Table: sales (id, amount, sale_date)

**difficulty: ** medium

A

LEAD: Access the next row’s value

SELECT id, amount, 
       LEAD(amount) OVER (ORDER BY sale_date) AS next_amount
FROM sales;
26
Q

How can you use CASE for advanced query logic?

Table: sales (id, amount)

**difficulty: ** easy

A

Use CASE to create conditional columns or filters in queries.

SELECT id, amount, 
       CASE 
         WHEN amount > 100 THEN 'High'
         WHEN amount > 50 THEN 'Medium'
         ELSE 'Low'
       END AS sales_category
FROM sales;
27
Q

What does the LAG() function do in SQL?

Table: sales (id, amount, sale_date)

**difficulty: ** medium

A

LAG() retrieves the value of a column from the previous row in a window.

SELECT id, amount, sale_date, 
       LAG(amount) OVER (ORDER BY sale_date) AS prev_sale
FROM sales;
28
Q

What does UNBOUNDED PRECEDING mean in window functions?

sales (id, amount, sale_date)

**difficulty: ** hard

A

It refers to the start of the window, including all rows before the current one.

To create a cumulative sum with UNBOUNDED PRECEDING:

SELECT id, amount, 
       SUM(amount) OVER (ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_sales
FROM sales;
29
Q

How does PARTITION BY work in SQL window functions?

Table: sales (id, amount, sale_date, region)

**difficulty: ** medium

A

PARTITION BY divides data into partitions to apply window functions separately to each group.

SELECT region, id, amount, 
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS region_rank
FROM sales;
30
Q

What is the difference between RANGE and ROWS in windowing?

Table: sales (id, amount, sale_date)

**difficulty: ** hard

A

ROWS operates on a physical set of rows (e.g., current and previous).
RANGE operates on a logical range of values (e.g., same amount).

– Using ROWS

SELECT id, amount, 
       SUM(amount) OVER (ORDER BY sale_date ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS moving_sum
FROM sales;

– Using RANGE

SELECT id, amount, 
       SUM(amount) OVER (ORDER BY sale_date RANGE BETWEEN INTERVAL 1 DAY PRECEDING AND CURRENT ROW) AS moving_sum
FROM sales;
31
Q

How do FIRST_VALUE() and LAST_VALUE() work in SQL?

Table: sales (id, amount, sale_date)

**difficulty: ** medium

A

FIRST_VALUE() returns the first value in a window; LAST_VALUE() returns the last value.

SELECT id, amount, sale_date, 
       FIRST_VALUE(amount) OVER (ORDER BY sale_date) AS first_sale,
       LAST_VALUE(amount) OVER (ORDER BY sale_date ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS last_sale
FROM sales;
32
Q

How do you calculate a moving average in SQL using window functions?

Table: sales (id, amount, sale_date)

**difficulty: ** hard

A

Use AVG() over a window frame to calculate a moving average.

– 3-day moving average

SELECT id, amount, sale_date, 
       AVG(amount) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg
FROM sales;
33
Q

How can you calculate percentiles with SQL window functions?

**difficulty: ** medium

A

Use PERCENT_RANK() or CUME_DIST() to calculate percentile rankings.

SELECT id, amount, 
       PERCENT_RANK() OVER (ORDER BY amount) AS percent_rank,
       CUME_DIST() OVER (ORDER BY amount) AS cumulative_dist
FROM sales;
34
Q

What is the QUALIFY statement in SQL, and how is it used?

sales (id, amount, sale_date)

**difficulty: ** medium

A

The QUALIFY statement is used to filter rows after applying window functions, simplifying complex filtering logic.

In the example below, we only keep the row with the highest rank:

SELECT id, amount, 
       RANK() OVER (ORDER BY amount DESC) AS rank
FROM sales
QUALIFY rank = 1;
36
Q

Exercise

How would you calculate the total revenue for each product for the past 7 days?

Table Name: orders
- order_id INT
- product_id INT
- order_date TIMESTAMP
- status STRING (values: ‘completed’, ‘pending’, ‘cancelled’)
- total_amount DOUBLE

Table Name: products
- product_id INT
- product_name STRING
- category STRING
- price DOUBLE

medium

A
SELECT p.product_name, SUM(o.total_amount) AS total_revenue
FROM orders o
JOIN products p ON o.product_id = p.product_id
WHERE o.status = 'completed'
  AND o.order_date >= CURRENT_DATE - INTERVAL 7 DAY
GROUP BY p.product_name;
37
Q
A