PostgreSQL Flashcards

1
Q

What steps do you want for cleaning string data?

A

Restrict capitalization in column names
Remove extra divider spaces in column names
make column names uniform

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

INITCAP()

A

SELECT INITCAPT(column)
Output: Hello Friend!
Fixes capitalization

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

REPLACE()

A

SELECT REPLACE(column, string_to_replace, replacement)
SELECT REPLACE(streets, ‘Main Street’, ‘St’);
Output: Main St

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

LPAD(input_string, length, fill_value)

A

Left-Pad a Column
SELECT LPAD(‘column’, 7, ‘X’);
OUTPUT: XXXX123
Prepending text values to a string. The fourth value has to be a string.
SELECT LPAD(‘123’, 5, ‘0’); – Output: ‘00123’

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Like

A

Like is used for pattern matching using wildcards such as % or _
SELECT *
FROM employees
WHERE firstname LIKE ‘Ali%’ or firstname LIKE ‘_li’

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

%

A

Matches 0 or more characters

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

_

A

Matches exactly 1 character
Can use multiple _ to represent more than 1 character

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

REGEXP_REPLACE()

A

REGEXP_REPLACE(column, ‘regrex’, replacement, ‘g’)

‘g’ = global

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

DIFFERENCE()

A

DIFFERENCE is used to compare the SOUNDEX values of two strings and returns a score between 0 and 4:
DIFFERENCE(vehicle_color, ‘GRAY’) = 4;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

SOUNDEX()

A

SOUNDEX is a phonetic algorithm used to match words or strings that sound similar in English. It converts a word to a code based on its pronunciation.
Example:
SOUNDEX(‘GRAY’) → G600
SOUNDEX(‘GREY’) → G600

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

EXPLAIN ANALYZE

A

can be used to provide insight into the execution plan of a SQL query. actually runs the query. use with caution.

EXPLAIN ANALYZE
SELECT * FROM patient WHERE age = 60;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

EXPLAIN

A

the EXPLAIN command alone can be used to review the planned execution path without the associated overhead of query execution.
EXPLAIN
SELECT * FROM patient WHERE age = 60;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

INDEX

A

CREATE INDEX index_title
ON table(column);

example:
CREATE INDEX idx_patient_age_cholesterol_restingbp
ON patient(age,cholesterol,restingbp);

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Visualize how to query to confirm index exists

A

SELECT * FROM pg_indexes
WHERE tablename = ‘patient’
AND indexname = ‘idx_patient_age’;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Visualize how to extract a row data from a column, and create a new table, inserting the column data from an old table.

A

SELECT DISTINCT(chestpaintype) FROM patient;
CREATE TABLE IF NOT EXISTS chestpain (
id serial PRIMARY KEY,
type TEXT
);
INSERT INTO chestpain(type)
SELECT DISTINCT(chestpaintype) FROM patient;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Visualize how to alter a table with a foreign key

A

ALTER TABLE patient
ADD CONSTRAINT patient_chestpain_fk
FOREIGN KEY (chestpaintypeid) REFERENCES chestpain(id);

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Visualize how to use INFORMATION_SCHEMA

A

is used to retrieve metadata about the columns of a table from the database.
Example:
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_schema = ‘public’ #this is a default
AND table_name = ‘table’;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Visualize how to check for columns using INFORMATION_SCHEMA

A

SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE DATA_TYPE = ‘integer’;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Database transactions

A

A transaction is like a package of operations that are grouped together. Either all operations within the transaction succeed (committed) or none of them take effect (rolled back). It ensures ACID (Atomicity, Consistency, Isolation, Durability).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

How do you create a database transaction?

A

Use BEGIN and COMMIT
BEGIN starts the transaction, and all statements are part until transaction is ended with COMMIT.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

What is ACID?

A

Atomicity, Consistency, Isolation, Durability

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

ROLLBACK

A

If an error occurs or if you decide not to save the changes, you can use ROLLBACK instead of COMMIT.
ROLLBACK undoes all changes made during the transaction.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

SERIAL

A

Create surrogate keys

CREATE TABLE table_name(
column_number SERIAL PRIMARY KEY
)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

How to designate a foregin key

A

CREATE TABLE table_name(
column_name INT REFERENCES table(column) ON DELETE CASCADE)

When the column is deleted in the reference table, it automatically deletes when using ON DELETE CASCADE.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

How do you use a foreign key explicity and set to NULL on delete?

A

CREATE TABLE orders (
ord_id SERIAL PRIMARY KEY,
prod_sku INT,
FOREIGN KEY (prod_sku) REFERENCES product(prod_sku) ON DELETE SET NULL
);

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

Break up a table to normalize it?

A

INSERT INTO normalized_table (col 1, col 2, col3)
SELECT DISTINCT col1,col2,col3 FROM big_table;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

How do you update an existing constraint on a column?

A

1️⃣ Find the existing constraint name
SELECT conname FROM pg_constraint WHERE conrelid = ‘orders’::regclass AND contype = ‘c’;
2️⃣ Drop the old constraint
ALTER TABLE orders DROP CONSTRAINT chk_ord_priority;
3️⃣ Add the updated constraint
ALTER TABLE orders ADD CONSTRAINT chk_ord_priority CHECK (ord_priority IN (‘L’, ‘M’, ‘H’, ‘C’, ‘X’));
4️⃣ Fix invalid values if needed
UPDATE orders SET ord_priority = ‘L’ WHERE ord_priority NOT IN (‘L’, ‘M’, ‘H’, ‘C’, ‘X’);

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

What is pg_constraint?

A

pg_constraint is a system catalog table in PostgreSQL that stores all constraints for all tables in the database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

What are the constraints stored in pg_constraint?

A

Primary Keys (p)
Foreign Keys (f)
Unique Constraints (u)
Check Constraints (c)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

Importing data into PostgreSQL

A
  1. create the table
  2. Use the copy command
    COPY my_table (col1, col2, col3)
    FROM ‘/path/to/file.csv’
    DELIMITER ‘,’
    CSV HEADER;
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q

INTEGER GENERATED ALWAY AS IDENTITY

A

CREATE TABLE table_name (
col1 INTEGER GENERATED ALWAYS AS IDENTITY (START WITH 100 INCREMENT BY 10) PRIMARY KEY,
col2 VARCHAR(100)
);

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
31
Q

What happen when you join two tables where the column names are the same?

A

You have to use dot notation?

ex.
– Select name fields (with alias) and region
SELECT cities.name AS city , countries.name AS country, region
FROM cities
INNER JOIN countries
ON cities.country_code = countries.code;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
32
Q

USING

A

a join operations when both tables share a column with the same name.
Instead of:
SELECT * FROM table1 JOIN table2 ON table1.col = table2.col;
you can write:
SELECT * FROM table1 JOIN table2 USING(col);

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
33
Q

Visualize using multiple joins

A

SELECT A., B., C.*
FROM A
JOIN B ON A.id = B.a_id
JOIN C ON B.id = C.b_id;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
34
Q

T or F: You can mix different types of joins (INNER, LEFT, RIGHT, FULL) depending on your needs.

A

True.
SELECT A., B., C.*
FROM A
LEFT JOIN B ON A.id = B.a_id
INNER JOIN C ON B.id = C.b_id;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
35
Q

T or F: Order affects your result when using an INNER or LEFT/RIGHT join.

A

False. For inner joins, the order is generally interchangeable (thanks to the associative property of inner joins). However, for outer joins, the order can affect which rows are retained.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
36
Q

Visualize using on multiple joins

A

SELECT *
FROM A
JOIN B USING(common_column)
JOIN C USING(common_column);

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
37
Q

Using AND for multiple joins

A

SELECT*
FROM left_table
INNER JOIN right_table
ON left_table.id = right_table.id
AND left_table.date = right_table.date

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
38
Q

T or F: When converting a Left to a Right Join, you only have to change the join.

A

False. when converting a LEFT JOIN to a RIGHT JOIN, change both the type of join and the order of the tables to get equivalent results. You would get different results if you only changed the table order. The order of fields you are joining ON still does not matter.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
39
Q

What query
Removes dead tuples (deleted rows still taking up space).
Reclaims disk space.
Updates query statistics at the same time

A

VACUUM ANALYZE;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
40
Q

How do you update statistics?

A

ANALYZE;

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
41
Q

Where do you place the foreign key on a table?

A

1-to-1 (1:1): In the table that depends on the other.
1-to-Many (1:M): In the table on the “Many” side.
Many-to-Many (M:M): Create a junction table with two foreign keys.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
42
Q

How do you decide if you should add indexes onto a table?

A

If it has low sparsity (fewer unique values, meaning more repeated values).
If it is frequently used in joins and aggregations.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
43
Q

UNION

A

UNION takes two tables as input, and returns all records from both tables, excluding duplicates

SELECT* FROM left_table
UNION
SELECT* FROM right_table;

44
Q

UNION ALL

A

UNION ALL takes two tables and returns all records from both tables, including duplicates
SELECT* FROM left_table
UNION ALL
SELECT* FROM right_table;

45
Q

Begin your query by selecting all fields from economies2015.
Create a second query that selects all fields from economies2019.
Perform a set operation to combine the two queries you just created, ensuring you do not return duplicates.

A

– Select all fields from economies2015
SELECT * FROM economies2015
– Set operation
UNION
– Select all fields from economies2019
SELECT * FROM economies2019
ORDER BY code, year;

46
Q

INTERSECT

A

INTERSECT is a set operation that returns only the common rows between two queries without duplicates
SELECT col FROM table1
INTERSECT
SELECT col FROM table2;

47
Q

What is the difference between JOIN and INTERSECT

A

INTERSECT requires the tables to have the same number of columns and matching data types. While JOIN does not require queries to have the same columns or data types.

48
Q

EXCEPT

A

Returns only the tables in the left table that aren’t in the right table.
SELECT col FROM table1
EXCEPT
SELECT col FROM table2;

49
Q

Semi Join

A

it only checks for existence of the columns and data in another table. Think of it as a JOIN that doesn’t include extra columns from the second table.

50
Q

What would this Semi Join return?
SELECT e.name, e.department
FROM employees e
WHERE EXISTS (
SELECT 1 FROM managers m
WHERE e.manager_id = m.id
);

A

Selects employees who have a manager in the managers table.
Returns only data from employees (no extra columns from managers).
If a manager exists, the row is included.

51
Q

Anti-Join

A

returns rows from the first table that have NO match in the second table.
SELECT e.name, e.department
FROM employees e
WHERE NOT EXISTS (
SELECT 1 FROM managers m
WHERE e.manager_id = m.id
);

52
Q

Subquery

A

subquery is a query inside another query.
It is enclosed in parentheses.
It can return a single value (scalar subquery) or multiple rows (table subquery).

53
Q

Visualize a subquery with IN (Semi Join)

A

SELECT name, department
FROM employees
WHERE department IN (
SELECT dept_name FROM departments
);

54
Q

Visualize a subquery NOT IN (anti join

A

SELECT name, department
FROM employees
WHERE department NOT IN (
SELECT dept_name FROM departments
);

55
Q

Identify the languages spoken in the Middle East. The languages table contains information about languages and countries, but not the region. Use a semi join on the languages and countries table to get the “name” for the languages spoken in the “region” Middle East

A

SELECT DISTINCT name
FROM languages
– Add syntax to use bracketed subquery below as a filter
WHERE code IN
(SELECT code
FROM countries
WHERE region = ‘Middle East’)
ORDER BY name;

56
Q

Write a query that returns the names from the countries table where the continent is Oceania that are not in the currencies table using the country code

A

SELECT code, name
FROM countries
WHERE continent = ‘Oceania’
– Filter for countries not included in the bracketed subquery
AND code NOT IN
(SELECT code
FROM currencies);

57
Q

Using the countries and cities tables, Return the name, country_code and urbanarea_pop for all capital cities in the countries table

A

– Select relevant fields from cities table
SELECT name, country_code, urbanarea_pop
FROM cities
– Filter using a subquery on the countries table
WHERE name IN
(
select capital
FROM countries
)
ORDER BY urbanarea_pop DESC;

58
Q

Write a LEFT JOIN with countries on the left and the cities on the right, joining on country code.
In the SELECT statement of your join, include country names as country, and count the cities in each country, aliased as cities_num.
Sort by cities_num (descending), and country (ascending), limiting to the first nine records.

A

– Find top nine countries with the most cities
SELECT countries.name AS country, COUNT(*) AS cities_num
FROM countries
LEFT JOIN cities
ON countries.code = cities.country_code
GROUP BY country
– Order by count of cities as cities_num
ORDER BY cities_num DESC, country
LIMIT 9;

59
Q

Use the WHERE clause to enable the correct country codes to be matched in the cities and countries columns.

A

SELECT countries.name AS country,
– Subquery that provides the count of cities
(SELECT COUNT(*)
FROM cities
WHERE cities.country_code = countries.code) AS cities_num
FROM countries
ORDER BY cities_num DESC, country
LIMIT 9;

60
Q

T or F: This code will not produce an error
SELECT code, COUNT(*) AS local_name
FROM languages

A

False. , you need a GROUP BY clause to tell SQL how to group the data before applying the aggregation. Otherwise it will not know what to do with code.

61
Q

T or F: Group By is always required when using aggregate functions like COUNT(), SUM(), AVG(), etc.

A

False. GROUP BY is required when using aggregate functions with non-aggregated columns included in a SELECT statement.

62
Q

Wheren do you want to use a subquery in a FROM clause instead of WHERE?

A

You need to aggregate data (like COUNT(*)) before filtering.
You need to join with another table.
The subquery returns multiple columns (which is not possible in a WHERE clause).

63
Q

Using the the countries and languages table, select local_name from countries, with the aliased lang_num from your subquery and use WHERE to match the code field from countries and sub.

A

– Select local_name and lang_num from appropriate tables
SELECT local_name, sub.lang_num
FROM countries,
(SELECT code, COUNT(*) AS lang_num
FROM languages
GROUP BY code) AS sub
– Where codes match
WHERE countries.code = sub.code
ORDER BY lang_num DESC;

64
Q

Why would this code fail?
SELECT local_name
FROM countries
WHERE code IN (
SELECT code
FROM languages
GROUP BY code
HAVING COUNT(*) > 1
)
ORDER BY lang_num DESC;

A

WHERE can’t reference an aggregated column (COUNT(*)) directly unless inside HAVING.
You can’t order by lang_num because it isn’t in SELECT!

65
Q

💡 Use a CASE statement to classify match results based on scores!
WHEN home_goal > away_goal → “Home win!”
WHEN home_goal < away_goal → “Home loss :(“
ELSE → “Tie”

A

SELECT
– Select the date of the match
date,
– Identify home wins, losses, or ties
CASE WHEN home_goal > away_goal THEN ‘Home win!’
WHEN home_goal < away_goal THEN ‘Home loss :(‘
ELSE ‘Tie’ END AS outcome
FROM matches_spain;

66
Q

Use a CASE statement and LEFT JOIN to track Barcelona’s home game results!
LEFT JOIN teams_spain → Connects match data with team names.
CASE statement → Determines match outcome:
Win if home_goal > away_goal
Loss if home_goal < away_goal
Tie otherwise
WHERE m.hometeam_id = 8634 → Filters for Barcelona’s home matches.

A

SELECT
m.date,
t.team_long_name AS opponent,
– Complete the CASE statement with an alias
CASE WHEN m.home_goal > m.away_goal THEN ‘Barcelona win!’
WHEN m.home_goal < m.away_goal THEN ‘Barcelona loss :(‘
ELSE ‘Tie’ END AS outcome
FROM matches_spain AS m
LEFT JOIN teams_spain AS t
ON m.awayteam_id = t.team_api_id
– Filter for Barcelona as the home team
WHERE m.hometeam_id = 8634;

67
Q

Create a CASE statement that identifies the id of matches played in the 2012/2013 season from the Country’s table.
Join the country’s table to the match table. Specify that you want ELSE values to be NULL.
Wrap the CASE statement in a COUNT function and group the query by the country alias.

A

SELECT
c.name AS country,
– Count games from the 2012/2013 season
COUNT(CASE WHEN m.season = ‘2012/2013’
THEN m.id ELSE NULL END) AS matches_2012_2013
FROM country AS c
LEFT JOIN match AS m
ON c.id = m.country_id
– Group by country name alias
GROUP BY country;

68
Q

VISUALIZE USING A COUNT statement in CASE to get the years aliased as matches.

A

SELECT
c.name AS country,
– Count matches in each of the 3 seasons
COUNT(CASE WHEN m.season = ‘2012/2013’ THEN m.id END) AS matches_2012_2013,
COUNT(CASE WHEN m.season = ‘2013/2014’ THEN m.id END) AS matches_2013_2014,
COUNT(CASE WHEN m.season = ‘2014/2015’ THEN m.id END) AS matches_2014_2015
FROM country

69
Q

Visualize using case statements to return ones and 0’s so you can get the sums of a true and false scenarios.

A

SELECT
c.name AS country,
– Sum the total records in each season where the home team won
SUM(CASE WHEN m.season = ‘2012/2013’ AND m.home_goal > m.away_goal
THEN 1 ELSE 0 END) AS matches_2012_2013,
SUM(CASE WHEN m.season = ‘2013/2014’ AND m.home_goal > m.away_goal
THEN 1 ELSE 0 END) AS matches_2013_2014,
SUM(CASE WHEN m.season = ‘2014/2015’ AND m.home_goal > m.away_goal
THEN 1 ELSE 0 END) AS matches_2014_2015
FROM country AS c
LEFT JOIN match AS m
ON c.id = m.country_id
– Group by country name alias
GROUP BY country;

70
Q

How can you calculate the percentage of tied games in a season using a CASE statement?

Uses AVG + CASE to calculate the proportion of games that ended in a tie:
Tied match (home_goal = away_goal) → 1
Non-tied match (home_goal != away_goal) → 0
The average of these values gives the percentage of ties.
ROUND(AVG(…), 2) ensures the percentage is rounded to 2 decimal places

A

SELECT
c.name AS country,
– Round the percentage of tied games to 2 decimal points
ROUND(AVG(CASE WHEN m.season=’2013/2014’ AND m.home_goal = m.away_goal THEN 1
WHEN m.season=’2013/2014’ AND m.home_goal != m.away_goal THEN 0
END),2) AS pct_ties_2013_2014,
ROUND(AVG(CASE WHEN m.season=’2014/2015’ AND m.home_goal = m.away_goal THEN 1
WHEN m.season=’2014/2015’ AND m.home_goal != m.away_goal THEN 0
END),2) AS pct_ties_2014_2015

71
Q

How do you filter a table using a subquery in the WHERE clause?
Selects team_long_name and team_short_name from the team table.
Uses a subquery to find all hometeam_IDs from the match table where the home team scored 8 or more goals.
Filters the team table to show only teams that appear in the subquery.

A

SELECT
– Select the team long and short names
team_long_name,
team_short_name
FROM team
– Filter for teams with 8 or more home goals
WHERE team_api_id IN
(SELECT hometeam_ID
FROM match
WHERE home_goal >= 8);

72
Q

Why is EXISTS more efficient than IN in large datasets?

A

EXISTS stops evaluating after finding the first matching record, making it more efficient than IN in large datasets.

73
Q

What does this code do?
SELECT
c.name AS country_name,
COUNT(sub.id) AS matches
FROM country AS c
INNER JOIN (SELECT country_id, id
FROM match
WHERE (home_goal + away_goal) >= 10) AS sub
ON c.id = sub.country_id
GROUP BY country_name;

A

It helps filter matches first before joining, which can improve readability. It selects country_id (to track which country the match belongs to). It filters only matches where the total goals scored (home + away) is 10 or more.
This joins the subquery (sub) with the country table (c): The INNER JOIN ensures that only countries with at least one high-scoring match (10+ goals) appear in the final results.This ensures we count matches per country, rather than returning one big total.

74
Q

Visualize a query that:
✔ AVG() Function: Calculates the average goals scored in matches.
✔ GROUP BY: Groups by stage to calculate stage-specific averages.
✔ WHERE Clause: Filters only stages where the average goals exceed the season average.
✔ ROUND(): Rounds the output for better readability.

A

SELECT
s.stage,
ROUND(s.avg_goals,2) AS avg_goals
FROM
(SELECT
stage,
AVG(home_goal + away_goal) AS avg_goals
FROM match
WHERE season = ‘2012/2013’
GROUP BY stage) AS s
WHERE
s.avg_goals > (SELECT AVG(home_goal + away_goal)
FROM match WHERE season = ‘2012/2013’);

75
Q

CTE

A

WITH cte_name AS (
SELECT column1, column2
FROM table_name
WHERE condition
)
SELECT * FROM cte_name;

76
Q

What questions do you ask yourself when setting up CTE?

A

✅ What is the main goal of the query?
✅ What tables do I need?
✅ What filtering conditions do I need?
✅ What calculations do I need?

77
Q

How Does SQL Know the Difference Between a Subquery and a Correlated Query?

A

Even though subqueries and correlated queries use similar syntax (nested queries inside parentheses), SQL knows the difference based on whether the subquery references the main query.

78
Q

What is a subquery

A

A subquery is a query that is executed first, and its result is passed to the main query.
It is independent of the outer query.

79
Q

Visualize a correlated query that finds the title and ratings of movies where the rating is above average.

A

SELECT title, rating
FROM movies m1
WHERE rating > (
SELECT AVG(rating)
FROM movies m2
WHERE m1.movie_id = m2.movie_id
);

80
Q

OVER()

A

The OVER() function is used with window functions like RANK(), DENSE_RANK(), ROW_NUMBER(), and SUM(). It allows calculations over a specific set of rows while keeping all the rows in the result.

81
Q

What are window functions?

A

A window partition in SQL is a way to divide rows into groups (partitions) within a result set and apply window functions (like ROW_NUMBER(), RANK(), AVG(), SUM(), etc.) to each partition separately.

82
Q

How does the OVER() clause work with the WINDOW function?

A

The OVER() clause defines how a window function is applied to partitions.

83
Q

Visualize the basic syntax for a window function used with over()

A

SELECT column_name,
WINDOW_FUNCTION() OVER (PARTITION BY column_name ORDER BY column_name) AS alias

84
Q

What are examples of window functions?

A

SUM(), AVG(), RANK(), ROW_NUMBER(), DENSE_RANK(), LAG()

85
Q

PARTITION BY

A

keep all rows, but calculations are applied within each group

86
Q

Sliding Window

A

Focus on a small set of rows at a time.
✅ The window moves row by row, changing which rows are visible.
✅ You can calculate things like moving averages, running totals, or rankings.

87
Q

Visualize using ROWS BETWEEN <start> AND <finish></finish></start>

A

SELECT
col1 , co2,
AVG(col3) OVER (
ORDER BY col4
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS col_avg
FROM table;

88
Q

Keywords to use with ROWS BETWEEN <start> AND <finish></finish></start>

A

PRECEDING - Rows before the current row
FOLLOWING - Rows after the current row
UNBOUNDED PRECEDING -Starts from the very first row
UNBOUNDED FOLLOWING - Includes all future rows
CURRENT ROW - The row you’re looking at now

89
Q

LAG()

A

a window function that retrieves data from the previous row in a specified order without collapsing rows.

90
Q

Visualize the basic syntax for LAG()

A

LAG(column_name, offset, default_value) OVER (ORDER BY column_name)

91
Q

LEAD()

A

A window function that looks ahead
LEAD(Column, n)
N is the number of rows to look ahead

92
Q

FIRST_VALUE(), LAST_VALUE

A

Window functions that always check either the first value or the last value
FIRST_VALUE(column)
LAST_VALUE(column)

93
Q

What must we type with FIRST_VALUE and LAST_VALUE() to get mySQL to look at the entire dataset?

A

UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING

LAST_VALUE(student) OVER (
ORDER BY spot ASC
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)

94
Q

RANK() WITH partition vs. RANK() without

A

RANK() WITHOUT Partition 🐰 -Ranks all rows together as one big group.
Example: 🥇(1), 🥈(2), 🥈(2), 🥉(4)

RANK() WITH Partition 🍎 -Restarts ranks for each smaller group.
Example (Partition by class):
Class A: 🥇(1), 🥈(2), 🥉(3)
Class B: 🥇(1), 🥈(2), 🥈(2)

95
Q

RANK() vs. DENSE_RANK()

A

RANK() - Skips numbers if there’s a tie (1,2,2,4…)
DENSE_RANK() - Doesn’t skip numbers if there’s a tie (1,2,2,3…)

96
Q

What is paging?

A

If you have lots of data (like thousands of names), you split it into smaller pages to easily read and manage it.

97
Q

NTILE()

A

To easily split data into equal groups (like finding the top 25% of students).
To create pages when showing data on websites (showing 10 items per page).
To quickly group rankings (top group, middle groups, lower groups).

98
Q

Visualize using a CTE() named Events and split the column olympic_event into 111 unique groups and order by olympic_event in alphabetical order.

A

WITH Events AS (
SELECT DISTINCT olympic_event
FROM Summer_Medals)
SELECT
— Split up the distinct events into 111 unique groups
olympic_event,
NTILE(111) OVER (ORDER BY olympic_event ASC) AS Page # split and number your events into pages.
FROM Events
ORDER BY olympic_event ASC; # how the final results are shown on your screen

99
Q

Frame

A

A frame tells SQL how many rows to look at when using a window function like SUM(), AVG(), or LAST_VALUE().

100
Q

What is the default Frame?

A

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

🔹 Starts from the first row
🔹 Ends at the current row

101
Q

What does this frame mean?
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

A

🔹 Starts at the first row
🔹 Ends at the last row

102
Q

n PRECEDING

A

Look at n rows before current row

103
Q

CURRENT ROW

A

Just this row

104
Q

n FOLLOWING

A

Look at n rows after current row

105
Q

ROWS BETWEEN 3 PRECEDING AND CURRENT ROW

A

Looks at 3 rows before + current row

106
Q

ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING

A

Looks at 1 row before + 1 row after

107
Q

ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING

A

Looks at rows 5 to 1 before current row