SQL: from 0 to hero Flashcards

Learn active concepts about SQL to succeed in product data science interviews.

1
Q

What is the difference between create table and create view?

Table 2: departments (dept_id, de

Table 1: employees (id, name, dept_id)

easy

A

create table: Stores data physically; faster for repeated access but uses storage.

  • Use: Static data or precomputed results.
  • Avoid: Dynamic data needing frequent updates.

create view: Virtual table; reflects live data but slower for complex queries.

Use: Dynamic, query-based data.
Avoid: Performance-intensive queries.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is the difference between INNER JOIN and OUTER JOIN?

T2: departments (dept_id, dept_name)

T1: employees (id, name, dept_id), T2: departments (dept_id, dept_name)

easy

A

INNER JOIN returns matching rows from both tables, while OUTER JOIN includes unmatched rows from one or both tables.

– INNER JOIN: Only matching rows

SELECT e.name, d.dept_name 
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;

– OUTER JOIN: Includes unmatched rows

SELECT e.name, d.dept_name 
FROM employees e
LEFT OUTER JOIN departments d ON e.dept_id = d.dept_id;
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

How can you **find duplicate records **in a table?

Table: employees (id, name, dept_id)

difficulty: easy

A

Use GROUP BY on columns and HAVING COUNT(*) > 1

SELECT name, COUNT(*) 
FROM employees 
GROUP BY name 
HAVING COUNT(*) > 1;
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

How do window functions differ from aggregate functions?

employees (id, name, dept_id, salary)

**difficulty: ** medium

A

Window functions operate over a subset of rows defined by OVER() without collapsing them into one row, unlike aggregate functions.

-- Window Function
SELECT name, salary, SUM(salary) OVER (PARTITION BY dept_id) AS dept_salary 
FROM employees;
-- Aggregate Function
SELECT dept_id, SUM(salary) AS dept_salary 
FROM employees 
GROUP BY dept_id;
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is a Common Table Expression (CTE), and when would you use it?

Table: sales (region, sales)

**difficulty: ** medium

A

A CTE is a temporary result set used for modular, readable queries (e.g., recursive operations).

WITH SalesCTE AS (
  SELECT region, SUM(sales) AS total_sales 
  FROM sales 
  GROUP BY region
)
SELECT region, total_sales 
FROM SalesCTE;
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

How can you calculate the **cumulative sum **of a column?

sales (name, sales, sales_date)

**difficulty: ** medium

A

Use SUM(column) OVER (ORDER BY column) in a window function.

SELECT name, sales, 
       SUM(sales) OVER (ORDER BY sales_date) AS cumulative_sales 
FROM sales;
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is the purpose of the EXPLAIN statement?

Table: employees (id, name, dept_id)

**difficulty: ** medium

A

It shows the query execution plan to optimize performance.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

How do you remove duplicate rows from a table?

employees (id, name, dept_id)

**difficulty: ** medium

A

Use DELETE with ROW_NUMBER() or DISTINCT.

DELETE FROM employees 
WHERE ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) > 1
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Write a query to get the second highest salary from an employees table.

Table: employees (id, name, salary)

**difficulty: ** medium

A
SELECT MAX(salary) 
FROM employees 
WHERE salary < (SELECT MAX(salary) FROM employees);
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is the difference between HAVING and WHERE?

employees (id, name, dept_id, salary)

**difficulty: ** easy

A

WHERE filters rows before grouping and HAVING filters groups after aggregation.

-- WHERE
SELECT * 
FROM employees 
WHERE salary > 50000;
-- HAVING
SELECT dept_id, AVG(salary) 
FROM employees 
GROUP BY dept_id 
HAVING AVG(salary) > 60000;
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

How do you handle nulls in SQL when performing calculations?

Table: employees (id, name, salary)

**difficulty: ** easy

A

Use COALESCE() or ISNULL() to replace nulls with default values.

SELECT name, COALESCE(salary, 0) AS salary 
FROM employees; -- COALESCE replaces NULL with a value
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

How do you find the 90th percentile in a dataset?

Table: employees (id, name, salary)

**difficulty: ** hard

A

With PERCENTILE_CONT

SELECT PERCENTILE_CONT(0.9) 
WITHIN GROUP (ORDER BY salary) AS percentile_salary 
FROM employees;
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

How do you **rank rows **in a table?

Table: employees (id, name, salary)

**difficulty: ** medium

A
SELECT name, salary, 
       RANK() OVER (ORDER BY salary DESC) AS rank 
FROM employees;
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is the difference between TRUNCATE and DELETE?

Table: employees (id, name, dept_id)

**difficulty: ** easy

A
TRUNCATE TABLE employees; -- Deletes all rows
DELETE FROM employees WHERE dept_id = 101; -- Deletes specific rows

Table: employees (id, name, dept_id)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is a primary key, and why is it important? Create a table with a primary key.

Table: employees (id, name)

**difficulty: ** easy

A

The primary key is a column in a relational database table that’s distinctive for each record. Primary keys must contain UNIQUE values, and cannot contain NULL values.

CREATE TABLE employees (
  id INT PRIMARY KEY,
  name VARCHAR(50)
)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

How do you pivot data in SQL?

Table: employees (id, dept_id, gender)

**difficulty: ** medium

A
SELECT dept_id,
       SUM(CASE WHEN gender = 'Male' THEN 1 ELSE 0 END) AS male_count,
       SUM(CASE WHEN gender = 'Female' THEN 1 ELSE 0 END) AS female_count
FROM employees 
GROUP BY dept_id
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Write a query to calculate the percentage of total sales per region?

Table: sales (region, sales)

**difficulty: ** hard

A
SELECT region, 
       SUM(sales) / SUM(SUM(sales)) OVER () * 100 AS percentage 
FROM sales 
GROUP BY region;
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What is the purpose of normalization in SQL?

employees (id, name, dept_id)

Tables: departments (dept_id, dept_name),

**difficulty: ** easy

A

To reduce redundancy and dependency by organizing data into logical tables.

CREATE TABLE departments (dept_id INT PRIMARY KEY, dept_name VARCHAR(50));
CREATE TABLE employees (id INT PRIMARY KEY, name VARCHAR(50), dept_id INT);
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

How do you get the total row count of a table or grouped result set?

Table: employees (id, name, dept_id)

**difficulty: ** easy

A

Use COUNT(*) for the table or COUNT(column) for grouped subsets.

-- Total row count of the table
SELECT COUNT(*) AS total_rows 
FROM employees;
-- Row count grouped by department
SELECT dept_id, COUNT(*) AS dept_row_count 
FROM employees 
GROUP BY dept_id;
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What is the difference between RANK(), DENSE_RANK(), and ROW_NUMBER()?

Table: employees (id, name, salary)

**difficulty: ** medium

A
  • RANK() assigns ranks with gaps for tied values.
  • DENSE_RANK() assigns consecutive ranks without gaps.
  • ROW_NUMBER() assigns unique row numbers regardless of ties.
SELECT name, salary, 
       RANK() OVER (ORDER BY salary DESC) AS rank,
       DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank,
       ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_number
FROM employees;
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

How can you replace rows based on a condition while updating a table?

Table: employees (id, name, salary, dept_id)

**difficulty: ** medium

A

Use the UPDATE statement with a CASE expression in the SET clause to selectively update rows based on conditions.

UPDATE employees
SET salary = CASE 
               WHEN dept_id = 101 THEN salary * 1.1 
               ELSE salary 
               END
WHERE dept_id IN (101, 102);
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

What are common SQL data types, and how are they used?

sales (id INT, product STRING, amount DOUBLE, sale_date TIMESTAMP

**difficulty: ** easy

A
  • INT: Whole numbers
  • STRING: Text
  • DOUBLE: Decimal numbers
  • TIMESTAMP: Date and time
CREATE TABLE sales (
  id INT,
  product STRING,
  amount DOUBLE,
  sale_date TIMESTAMP
);
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

How do you query across multiple databases in Databricks?

Tables: sales, campaigns

Databases: sales_db, marketing_db

**difficulty: ** medium

A
-- Joining tables across databases
SELECT s.id, s.amount, c.campaign_name
FROM sales_db.sales s
JOIN marketing_db.campaigns c 
ON s.id = c.sale_id;
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

What is advanced SQL functions like NTILE?

Table: sales (id, amount, sale_date)

**difficulty: ** medium

A

NTILE: Divide rows into quartiles

SELECT id, amount, 
       NTILE(4) OVER (ORDER BY amount) AS quartile
FROM sales;
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

What is the advanced SQL function LEAD?

Table: sales (id, amount, sale_date)

**difficulty: ** medium

A

LEAD: Access the next row’s value

SELECT id, amount, 
       LEAD(amount) OVER (ORDER BY sale_date) AS next_amount
FROM sales;
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

How can you use CASE for advanced query logic?

Table: sales (id, amount)

**difficulty: ** easy

A

Use CASE to create conditional columns or filters in queries.

SELECT id, amount, 
       CASE 
         WHEN amount > 100 THEN 'High'
         WHEN amount > 50 THEN 'Medium'
         ELSE 'Low'
       END AS sales_category
FROM sales;
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

What does the LAG() function do in SQL?

Table: sales (id, amount, sale_date)

**difficulty: ** medium

A

LAG() retrieves the value of a column from the previous row in a window.

SELECT id, amount, sale_date, 
       LAG(amount) OVER (ORDER BY sale_date) AS prev_sale
FROM sales;
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

What does UNBOUNDED PRECEDING mean in window functions?

sales (id, amount, sale_date)

**difficulty: ** hard

A

It refers to the start of the window, including all rows before the current one.

To create a cumulative sum with UNBOUNDED PRECEDING:

SELECT id, amount, 
       SUM(amount) OVER (ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_sales
FROM sales;
29
Q

How does PARTITION BY work in SQL window functions?

Table: sales (id, amount, sale_date, region)

**difficulty: ** medium

A

PARTITION BY divides data into partitions to apply window functions separately to each group.

SELECT region, id, amount, 
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS region_rank
FROM sales;
30
Q

What is the difference between RANGE and ROWS in windowing?

Table: sales (id, amount, sale_date)

**difficulty: ** hard

A

ROWS operates on a physical set of rows (e.g., current and previous).
RANGE operates on a logical range of values (e.g., same amount).

– Using ROWS

SELECT id, amount, 
       SUM(amount) OVER (ORDER BY sale_date ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS moving_sum
FROM sales;

– Using RANGE

SELECT id, amount, 
       SUM(amount) OVER (ORDER BY sale_date RANGE BETWEEN INTERVAL 1 DAY PRECEDING AND CURRENT ROW) AS moving_sum
FROM sales;
31
Q

How do FIRST_VALUE() and LAST_VALUE() work in SQL?

Table: sales (id, amount, sale_date)

**difficulty: ** medium

A

FIRST_VALUE() returns the first value in a window; LAST_VALUE() returns the last value.

SELECT id, amount, sale_date, 
       FIRST_VALUE(amount) OVER (ORDER BY sale_date) AS first_sale,
       LAST_VALUE(amount) OVER (ORDER BY sale_date ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS last_sale
FROM sales;
32
Q

How do you calculate a moving average in SQL using window functions?

Table: sales (id, amount, sale_date)

**difficulty: ** hard

A

Use AVG() over a window frame to calculate a moving average.

– 3-day moving average

SELECT id, amount, sale_date, 
       AVG(amount) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg
FROM sales;
33
Q

How can you calculate percentiles with SQL window functions?

**difficulty: ** medium

A

Use PERCENT_RANK() or CUME_DIST() to calculate percentile rankings.

SELECT id, amount, 
       PERCENT_RANK() OVER (ORDER BY amount) AS percent_rank,
       CUME_DIST() OVER (ORDER BY amount) AS cumulative_dist
FROM sales;
34
Q

What is the QUALIFY statement in SQL, and how is it used?

sales (id, amount, sale_date)

**difficulty: ** medium

A

The QUALIFY statement is used to filter rows after applying window functions, simplifying complex filtering logic.

In the example below, we only keep the row with the highest rank:

SELECT id, amount, 
       RANK() OVER (ORDER BY amount DESC) AS rank
FROM sales
QUALIFY rank = 1;
36
Q

Exercise

How would you calculate the total revenue for each product for the past 7 days?

Table Name: orders
- order_id INT
- product_id INT
- order_date TIMESTAMP
- status STRING (values: ‘completed’, ‘pending’, ‘cancelled’)
- total_amount DOUBLE

Table Name: products
- product_id INT
- product_name STRING
- category STRING
- price DOUBLE

medium

A
SELECT p.product_name, SUM(o.total_amount) AS total_revenue
FROM orders o
JOIN products p ON o.product_id = p.product_id
WHERE o.status = 'completed'
  AND o.order_date >= CURRENT_DATE - INTERVAL 7 DAY
GROUP BY p.product_name;
37
Q

How would you calculate the average order value for each product category in the last 60 days?

Table Name: orders
- order_id INT
- product_id INT
- order_date TIMESTAMP
- status STRING (values: ‘completed’, ‘pending’, ‘cancelled’)
- total_amount DOUBLE

Table Name: products
- product_id INT
- category STRING

medium

A
SELECT p.category, AVG(o.total_amount) AS avg_order_value
FROM orders o
JOIN products p ON o.product_id = p.product_id
WHERE o.status = 'completed'
  AND o.order_date >= CURRENT_DATE - INTERVAL 60 DAY
GROUP BY p.category
38
Q

How would you find the most recent order placed by each customer?

Table Name: orders
- order_id INT
- customer_id INT
- order_date TIMESTAMP
- total_amount DOUBLE

medium

A
SELECT customer_id, MAX(order_date) AS latest_order
FROM orders
GROUP BY customer_id;
39
Q

How would you count the total number of orders by order status for the last 90 days?

Table Name: orders
- order_id INT
- order_date TIMESTAMP
- status STRING (values: ‘completed’, ‘pending’, ‘cancelled’)

A
SELECT status, COUNT(order_id) AS total_orders
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL 90 DAY
GROUP BY status
40
Q

How would you calculate the 7-day rolling average of total sales for each product category?

Table Name: orders
- order_id INT
- product_id INT
- order_date TIMESTAMP
- total_amount DOUBLE

Table Name: products
- product_id INT
- category STRING

advanced

A
SELECT p.category, 
       o.order_date, 
       AVG(o.total_amount) 
         OVER (PARTITION BY p.category 
               ORDER BY o.order_date 
               RANGE BETWEEN INTERVAL 6 DAY PRECEDING AND CURRENT ROW) AS rolling_avg
FROM orders o
JOIN products p ON o.product_id = p.product_id;
41
Q

How would you compare the current order total with the previous order total for each customer and calculate the difference?

Table Name: orders
- order_id INT
- customer_id INT
- order_date TIMESTAMP
- total_amount DOUBLE

advanced

A
SELECT customer_id, 
       order_id, 
       total_amount, 
       LAG(total_amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS previous_order, 
       total_amount - LAG(total_amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS difference
FROM orders;
42
Q

How would you calculate the cumulative sales total for each product category?

Table Name: orders
- order_id INT
- product_id INT
- order_date TIMESTAMP
- total_amount DOUBLE

Table Name: products
- product_id INT
- category STRING

advanced

A
SELECT p.category, 
       o.order_date, 
       SUM(o.total_amount) 
         OVER (PARTITION BY p.category 
               ORDER BY o.order_date 
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_sales
FROM orders o
JOIN products p ON o.product_id = p.product_id
43
Q

How would you retrieve the first order placed by each customer?

Table Name: orders
- order_id INT
- customer_id INT
- order_date TIMESTAMP
- total_amount DOUBLE

advanced

A
WITH ranked_orders AS (
    SELECT 
        customer_id, 
        order_id, 
        order_date, 
        total_amount, 
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS row_num
    FROM orders
)
SELECT 
    customer_id, 
    order_id, 
    order_date, 
    total_amount
FROM ranked_orders
WHERE row_num = 1
44
Q

How would you rank products by total sales within each category for the last 30 days?

Table Name: orders
- order_id INT
- product_id INT
- order_date TIMESTAMP
- total_amount DOUBLE

Table Name: products
- product_id INT
- category STRING

advanced

A
SELECT p.category, 
       p.product_id, 
       SUM(o.total_amount) AS total_sales, 
       RANK() OVER (PARTITION BY p.category ORDER BY SUM(o.total_amount) DESC) AS sales_rank
FROM orders o
JOIN products p ON o.product_id = p.product_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL 30 DAY
GROUP BY p.category, p.product_id
45
Q

How would you rank products by their sales total in each category, ensuring no gaps in ranking?

Table Name: orders
- order_id INT
- product_id INT
- order_date TIMESTAMP
- total_amount DOUBLE

Table Name: products
- product_id INT
- category STRING

advanced

A
SELECT p.category, 
       p.product_id, 
       SUM(o.total_amount) AS total_sales, 
       DENSE_RANK() OVER (PARTITION BY p.category ORDER BY SUM(o.total_amount) DESC) AS sales_rank
FROM orders o
JOIN products p ON o.product_id = p.product_id
GROUP BY p.category, p.product_id
46
Q

How would you divide customers into 4 quartiles based on their total order value over the last 90 days?

Table Name: orders
- order_id INT
- customer_id INT
- order_date TIMESTAMP
- total_amount DOUBLE

advanced

A
SELECT customer_id, 
       SUM(total_amount) AS total_spent,
       NTILE(4) OVER (ORDER BY SUM(total_amount) DESC) AS quartile
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL 90 DAY
GROUP BY customer_id
47
Q

How would you calculate the total sales by category and overall sales using GROUPING SETS?

Table Name: orders
- order_id INT
- product_id INT
- order_date TIMESTAMP
- total_amount DOUBLE

Table Name: products
- product_id INT
- category STRING

advanced

A
SELECT p.category, 
       SUM(o.total_amount) AS total_sales
FROM orders o
JOIN products p ON o.product_id = p.product_id
GROUP BY GROUPING SETS (p.category, ())
48
Q

How would you generate a list of all possible combinations of products and customers?

Table Name: orders
- order_id INT
- customer_id INT
- product_id INT

Table Name: products
- product_id INT
- category STRING

Table Name: customers
- customer_id INT
- customer_name STRING

advanced

A
SELECT c.customer_id, 
       p.product_id
FROM customers c
CROSS JOIN products p
49
Q

How would you get a list of all customers with their order details (including those who didn’t place any order)?

Table Name: orders
- order_id INT
- customer_id INT
- order_date TIMESTAMP
- total_amount DOUBLE

Table Name: customers
- customer_id INT
- customer_name STRING

advanced

A
SELECT c.customer_name, 
       o.order_id, 
       o.total_amount
FROM customers c
FULL OUTER JOIN orders o ON c.customer_id = o.customer_id
50
Q

How would you extract values from a JSON column to find the product name from a JSON structure in the orders table?

Table Name: orders
- order_id INT
- order_date TIMESTAMP
- product_info STRING (JSON format: {"product_id": 1, "product_name": "Widget"})

advanced

A
SELECT order_id, 
       JSON_EXTRACT(product_info, '$.product_name') AS product_name
FROM orders
51
Q

How would you retrieve only the latest order for each customer using the QUALIFY statement?

Table Name: orders
- order_id INT
- customer_id INT
- order_date TIMESTAMP
- total_amount DOUBLE

advanced

A
SELECT customer_id, 
       order_id, 
       order_date, 
       total_amount
FROM (
    SELECT customer_id, 
           order_id, 
           order_date, 
           total_amount, 
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS row_num
    FROM orders
) 
QUALIFY row_num = 1
52
Q

How would you flatten an array of product IDs from the orders table to see each individual product in the order?

Table Name: orders
- order_id INT
- customer_id INT
- order_date TIMESTAMP
- product_ids ARRAY<INT> (example: `[1, 2, 3]`)</INT>

medium

A
SELECT order_id, 
       customer_id, 
       order_date, 
       EXplode(product_ids) AS product_id
FROM orders
53
Q

How would you join orders with a list of products using an exploded field for product_ids in the orders table?

Table Name: orders
- order_id INT
- customer_id INT
- order_date TIMESTAMP
- product_ids ARRAY<INT> (example: `[1, 2, 3]`)</INT>

Table Name: products
- product_id INT
- product_name STRING

advanced

A
SELECT o.order_id, 
       o.customer_id, 
       o.order_date, 
       p.product_name
FROM orders o
LATERAL VIEW EXPLODE(o.product_ids) exploded AS product_id
JOIN products p ON o.product_id = p.product_id
54
Q

How would you join orders with products and filter only the orders containing a specific product (e.g., product_id = 2) from the exploded product_ids field?

Table Name: orders
- order_id INT
- customer_id INT
- order_date TIMESTAMP
- product_ids ARRAY<INT> (example: `[1, 2, 3]`)</INT>

Table Name: products
- product_id INT
- product_name STRING

advanced

A
SELECT o.order_id, 
       o.customer_id, 
       o.order_date, 
       p.product_name
FROM orders o
LATERAL VIEW EXPLODE(o.product_ids) exploded AS product_id
JOIN products p ON exploded.product_id = p.product_id
WHERE exploded.product_id = 2
55
Q

How would you check the number of NULL values in the email column of the users table?

Table Name: users
- user_id INT
- email STRING

medium

A
SELECT COUNT(*) AS null_count
FROM users
WHERE email IS NULL
56
Q

How would you identify rows with missing values in the first_name or last_name columns?

Table Name: users
- user_id INT
- first_name STRING
- last_name STRING

medium

A
SELECT user_id, first_name, last_name
FROM users
WHERE first_name IS NULL OR last_name IS NULL;
57
Q

How would you check for duplicate entries in the email column of the users table?

Table Name: users
- user_id INT
- email STRING

easy

A
SELECT email, COUNT(*) AS duplicate_count
FROM users
GROUP BY email
HAVING COUNT(*) > 1
58
Q

How would you find non-numeric values in a column amount which should contain numeric data?

Table Name: transactions
- transaction_id INT
- amount STRING

advanced

A
SELECT transaction_id, amount
FROM transactions
WHERE NOT amount REGEXP '^[0-9]+(\.[0-9]+)?$'
59
Q

How would you identify if there are duplicate rows based on the combination of email and user_id in the users table?

Table Name: users
- user_id INT
- email STRING

medium

A
SELECT email, user_id, COUNT(*) AS duplicate_count
FROM users
GROUP BY email, user_id
HAVING COUNT(*) > 1
60
Q

How would you detect invalid date formats in the birth_date column (which should contain valid DATE values)?

Table Name: users
- user_id INT
- birth_date STRING

A
SELECT user_id, birth_date
FROM users
WHERE NOT ISDATE(birth_date)
61
Q

How would you check if the values in the product_id column in the orders table match the valid IDs in the products table?

Table Name: orders
- order_id INT
- product_id INT

Table Name: products
- product_id INT
- product_name STRING

A
SELECT o.order_id, o.product_id
FROM orders o
LEFT JOIN products p ON o.product_id = p.product_id
WHERE p.product_id IS NULL
62
Q

How would you identify orders with unusually high amounts greater than 1000 in the orders table?

Table Name: orders
- order_id INT
- total_amount DOUBLE

A
SELECT order_id, total_amount
FROM orders
WHERE total_amount > 1000
63
Q

Handling Missing Values (Imputation Techniques)

How would you fill in missing total_amount values in the orders table with the average total_amount?

Table Name: orders
- order_id INT
- total_amount DOUBLE

A
SELECT order_id, 
       COALESCE(total_amount, (SELECT AVG(total_amount) FROM orders)) AS filled_total_amount
FROM orders
64
Q

Handling Outliers in Numerical Data

How would you identify outliers in the total_amount column of the orders table using the Interquartile Range (IQR) method?

Table Name: orders
- order_id INT
- total_amount DOUBLE

advanced

A
WITH percentiles AS (
  SELECT 
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY total_amount) AS Q1,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY total_amount) AS Q3
  FROM orders
)
SELECT order_id, total_amount
FROM orders, percentiles
WHERE total_amount < Q1 - 1.5 * (Q3 - Q1)
   OR total_amount > Q3 + 1.5 * (Q3 - Q1)
65
Q

Removing Duplicates in a Large Dataset

How would you remove duplicate entries in the users table, keeping only the first occurrence based on email?

Table Name: users
- user_id INT
- email STRING

advanced

A
WITH ranked_users AS (
  SELECT user_id, email, ROW_NUMBER() OVER (PARTITION BY email ORDER BY user_id) AS row_num
  FROM users
)
DELETE FROM ranked_users
WHERE row_num > 1;
66
Q

Data Type Conversions and Implicit Casting

How would you convert the amount column from STRING to FLOAT in the transactions table and check for any conversion errors?

Table Name: transactions
- transaction_id INT
- amount STRING

A
SELECT transaction_id, 
       CAST(amount AS FLOAT) AS amount_float
FROM transactions
WHERE TRY_CAST(amount AS FLOAT) IS NULL
67
Q

Empty Strings vs NULL Values

How would you find rows where the email is either NULL or an empty string in the users table?

Table Name: users
- user_id INT
- email STRING

A
SELECT user_id, email
FROM users
WHERE email IS NULL OR email = ''
68
Q

Data Redundancy

How would you identify redundant rows where the same email appears multiple times in the users table?

Table Name: users
- user_id INT
- email STRING

A
SELECT email, COUNT(*) AS duplicate_count
FROM users
GROUP BY email
HAVING COUNT(*) > 1
69
Q

A product manager needs to update the price of a product, insert a new product, and delete an outdated product in the products table.

  1. Insert a new product with product_id = 101, product_name = ‘Wireless Mouse’, and price = 29.99.
  2. Update the price of the product with product_id = 101 to 24.99.
  3. Delete the product with product_id = 101 as it’s outdated.

Table Name: products
- product_id INT
- product_name STRING
- price DOUBLE

easy

A

insert

INSERT INTO products (product_id, product_name, price)
   VALUES (101, 'Wireless Mouse', 29.99);

** update**

UPDATE products
SET price = 24.99
WHERE product_id = 101;

delete

DELETE FROM products
WHERE product_id = 101;