SQL Advanced Flashcards

1
Q

What does this query do?

SELECT companies.permalink,
companies.founded_at_clean,
acquisitions.acquired_at_cleaned,
acquisitions.acquired_at_cleaned -
companies.founded_at_clean::timestamp AS time_to_acquisition
FROM tutorial.crunchbase_companies_clean_date companies
JOIN tutorial.crunchbase_acquisitions_clean_date acquisitions
ON acquisitions.company_permalink = companies.permalink
WHERE founded_at_clean IS NOT NULL

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What does this query do?

SELECT companies.permalink,
companies.founded_at_clean,
companies.founded_at_clean::timestamp +
INTERVAL ‘1 week’ AS plus_one_week
FROM tutorial.crunchbase_companies_clean_date companies
WHERE founded_at_clean IS NOT NULL

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Interval

A

The interval is defined using plain-English terms like ‘10 seconds’ or ‘5 months’. Also note that adding or subtracting a date column and an interval column results in another date column as in the above query.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

NOW()

A

get the current server time

SELECT companies.permalink, companies.founded_at_clean, NOW() - companies.founded_at_clean::timestamp AS founded_time_ago FROM tutorial.crunchbase_companies_clean_date companies WHERE founded_at_clean IS NOT NULL

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Write a query that counts the number of companies acquired within 3 years, 5 years, and 10 years of being founded (in 3 separate columns). Include a column for total companies acquired as well. Group by category and limit to only rows with a founding date.

A

SELECT companies.category_code,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL ‘3 years’
THEN 1 ELSE NULL END) AS acquired_3_yrs,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL ‘5 years’
THEN 1 ELSE NULL END) AS acquired_5_yrs,
COUNT(CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL ‘10 years’
THEN 1 ELSE NULL END) AS acquired_10_yrs,
COUNT(1) AS total
FROM tutorial.crunchbase_companies_clean_date companies
JOIN tutorial.crunchbase_acquisitions_clean_date acquisitions
ON acquisitions.company_permalink = companies.permalink
WHERE founded_at_clean IS NOT NULL
GROUP BY 1
ORDER BY 5 DESC

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

LEFT

A

You can use LEFT to pull a certain number of characters from the left side of a string and present them as a separate string. The syntax is LEFT(string, number of characters).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

SELECT incidnt_num,
date,
LEFT(date, 10) AS cleaned_date
FROM tutorial.sf_crime_incidents_2014_01

What will this do with this database?

A

It will take the first 10 characters from the date column and move them to cleaned_date a string

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

RIGHT

A

same as left but works from the right side

SELECT incidnt_num, date, LEFT(date, 10) AS cleaned_date, RIGHT(date, 17) AS cleaned_time FROM tutorial.sf_crime_incidents_2014_01

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

LENGTH()

A

returns the length of a string

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

SELECT incidnt_num, date, LEFT(date, 10) AS cleaned_date, RIGHT(date, LENGTH(date) - 11) AS cleaned_time FROM tutorial.sf_crime_incidents_2014_01

What will this produce with this database?

A

inner functions are always evaluated first

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

TRIM

A

The TRIM function takes 3 arguments. First, you have to specify whether you want to remove characters from the beginning (‘leading’), the end (‘trailing’), or both (‘both’, as used above). Next you must specify all characters to be trimmed. Any characters included in the single quotes will be removed from both beginning, end, or both sides of the string. Finally, you must specify the text you want to trim using FROM.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

SELECT location, TRIM(both ‘()’ FROM location) FROM tutorial.sf_crime_incidents_2014_01

What will this do in this database?

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

POSITION

A

POSITION allows you to specify a substring, then returns a numerical value equal to the character number (counting from left) where that substring first appears in the target string

POSITION(‘A’ IN position)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

SELECT incidnt_num,
descript,
POSITION(‘A’ IN descript) AS a_position
FROM tutorial.sf_crime_incidents_2014_01

What will this do in this database?

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

SUBSTR

A

SUBSTR(*string*, *starting character position*, *# of characters*):

removes strings at a particular location

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

SELECT incidnt_num, date,

SUBSTR(date, 4, 2) AS day

FROM tutorial.sf_crime_incidents_2014_01

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Write a query that separates the location field into separate fields for latitude and longitude.

A

SELECT location,

TRIM(leading ‘(‘ FROM LEFT(location, POSITION(‘,’ IN location) - 1)) AS lattitude,

TRIM(trailing ‘)’ FROM RIGHT(location, LENGTH(location) - POSITION(‘,’ IN location) ) )

AS longitude

FROM tutorial.sf_crime_incidents_2014_01

18
Q

CONCAT

A

Simply order the values you want to concatenate and separate them with commas. If you want to hard-code values, enclose them in single quotes.

19
Q

SELECT incidnt_num, day_of_week,

LEFT(date, 10) AS cleaned_date, CONCAT(day_of_week, ‘, ‘, LEFT(date, 10)) AS day_and_date

FROM tutorial.sf_crime_incidents_2014_01

What will this produce?

A
20
Q

Concatenate the lat and lon fields to form a field that is equivalent to the location field. (Note that the answer will have a different decimal precision.)

A

SELECT lat,
lon,
location,
CONCAT(‘(‘,lat,’, ‘,lon,’)’) AS concat_loc
FROM tutorial.sf_crime_incidents_2014_01

21
Q

Create the same concatenated location field from lat and lon, but using the || syntax instead of CONCAT.

A

SELECT lat,
lon,
location,
‘(‘ || lat || ‘,’ || lon || ‘)’ AS concat_loc
FROM tutorial.sf_crime_incidents_2014_01

22
Q

Write a query that creates a date column formatted YYYY-MM-DD.

A

SELECT date,
CONCAT(SUBSTR(date, 7, 4), ‘-‘, SUBSTR(date, 1, 2), ‘-‘, SUBSTR(date, 4, 2)) AS con_date
FROM tutorial.sf_crime_incidents_2014_01

23
Q

Changing case with UPPER and LOWER

A

SELECT incidnt_num, address, UPPER(address) AS address_upper, LOWER(address) AS address_lower FROM tutorial.sf_crime_incidents_2014_01

24
Q

Write a query that returns the category field, but with the first letter capitalized and the rest of the letters in lower-case.

A

SELECT incidnt_num, category, UPPER(LEFT(category, 1)) || LOWER(RIGHT(category, LENGTH(category) - 1)) AS category_cleaned FROM tutorial.sf_crime_incidents_2014_01

25
Q

Write a query that creates an accurate timestamp using the date and time columns in tutorial.sf_crime_incidents_2014_01. Include a field that is exactly 1 week later as well.

A

SELECT date, time,
CONCAT(SUBSTR(date,7,4),’-‘,SUBSTR(date,1,2),’-‘,SUBSTR(date,4,2), ‘ ‘, time, ‘:00’)::timestamp AS date_time,
CONCAT(SUBSTR(date,7,4),’-‘,SUBSTR(date,1,2),’-‘,SUBSTR(date,4,2), ‘ ‘, time, ‘:00’)::timestamp + INTERVAL ‘1 week’ AS date_time_week
FROM tutorial.sf_crime_incidents_2014_01

26
Q

EXTRACT

A

SELECT cleaned_date,
EXTRACT(‘year’ FROM cleaned_date) AS year,
EXTRACT(‘month’ FROM cleaned_date) AS month,
EXTRACT(‘day’ FROM cleaned_date) AS day,
EXTRACT(‘hour’ FROM cleaned_date) AS hour,
EXTRACT(‘minute’ FROM cleaned_date) AS minute,
EXTRACT(‘second’ FROM cleaned_date) AS second,
EXTRACT(‘decade’ FROM cleaned_date) AS decade,
EXTRACT(‘dow’ FROM cleaned_date) AS day_of_week
FROM tutorial.sf_crime_incidents_cleandate

27
Q

DATE_TRUNC

A

SELECT cleaned_date, DATE_TRUNC(‘year’ , cleaned_date) AS year, DATE_TRUNC(‘month’ , cleaned_date) AS month, DATE_TRUNC(‘week’ , cleaned_date) AS week, DATE_TRUNC(‘day’ , cleaned_date) AS day, DATE_TRUNC(‘hour’ , cleaned_date) AS hour, DATE_TRUNC(‘minute’ , cleaned_date) AS minute, DATE_TRUNC(‘second’ , cleaned_date) AS second, DATE_TRUNC(‘decade’ , cleaned_date) AS decade FROM tutorial.sf_crime_incidents_cleandate

28
Q

Write a query that counts the number of incidents reported by week. Cast the week as a date to get rid of the hours/minutes/seconds.

A

SELECT

DATE_TRUNC(‘week’, cleaned_date)::date AS week_beginning,

COUNT(*) AS incidents

FROM tutorial.sf_crime_incidents_cleandate GROUP BY 1

ORDER BY 1

29
Q

COALESCE

A

In cases like this, you can use COALESCE to replace the null values

SELECT incidnt_num, descript, COALESCE(descript, ‘No Description’) FROM tutorial.sf_crime_incidents_cleandate ORDER BY descript DESC

30
Q

SELECT COUNT(incidnt_num),
COUNT(descript),
COUNT(COALESCE(descript, ‘No Description’))
FROM tutorial.sf_crime_incidents_cleandate

A
31
Q

subqueries

A

Subqueries (also known as inner queries or nested queries) are a tool for performing operations in multiple steps. For example, if you wanted to take the sums of several columns, then average all of those values, you’d need to do each aggregation in a distinct step.

32
Q

Write a query that selects all Warrant Arrests from the tutorial.sf_crime_incidents_2014_01 dataset, then wrap it in an outer query that only displays unresolved incidents.

A

SELECT warrents.*
FROM (
SELECT *
FROM tutorial.sf_crime_incidents_2014_01
WHERE descript = ‘WARRANT ARREST’) warrents
WHERE warrents.resolution = ‘NONE’

33
Q

What if you wanted to figure out how many incidents get reported on each day of the week? Better yet, what if you wanted to know how many incidents happen, on average, on a Friday in December? In January?

A

There are two steps to this process: counting the number of incidents each day (inner query), then determining the monthly average (outer query):

SELECT

LEFT(sub.date, 2) AS cleaned_month, sub.day_of_week,

AVG(sub.incidents) AS average_incidents FROM ( SELECT day_of_week, date, COUNT(incidnt_num) AS incidents FROM tutorial.sf_crime_incidents_2014_01 GROUP BY 1,2 ) sub

GROUP BY 1,2

ORDER BY 1,2

Step 1 count the number of incidents for each day and day of week. Step two then take the average for each month and day of week from the counts

34
Q

Write a query that displays the average number of monthly incidents for each category. Hint: use tutorial.sf_crime_incidents_cleandate to make your life a little easier.

A

SELECT sub.category,

AVG(sub.incidents) AS avg_incidents_per_month FROM

( SELECT

EXTRACT(‘month’ FROM cleaned_date) AS month, category,

COUNT(1) AS incidents

FROM tutorial.sf_crime_incidents_cleandate GROUP BY 1,2 ) sub

GROUP BY 1

35
Q

Subqueries in conditional logic

A

ou can use subqueries in conditional logic (in conjunction with WHERE, JOIN/ON, or CASE).

SELECT * FROM tutorial.sf_crime_incidents_2014_01 WHERE Date = (SELECT MIN(date) FROM tutorial.sf_crime_incidents_2014_01 )

The following query returns all of the entries from the earliest date in the dataset

36
Q

SELECT *
FROM tutorial.sf_crime_incidents_2014_01
WHERE Date IN (SELECT date
FROM tutorial.sf_crime_incidents_2014_01
ORDER BY date
LIMIT 5
)

A

NO ALIAS WHEN subqueries are in conditionals

IN allows you to use subqueries with multiple return values. This will return the top 5 dates

37
Q

SELECT incidents.*,
sub.incidents AS incidents_that_day
FROM tutorial.sf_crime_incidents_2014_01 incidents
JOIN ( SELECT date,
COUNT(incidnt_num) AS incidents
FROM tutorial.sf_crime_incidents_2014_01
GROUP BY 1
) sub
ON incidents.date = sub.date
ORDER BY sub.incidents DESC, time

A

The following query ranks all of the results according to how many incidents were reported in a given day. It does this by aggregating the total number of incidents each day in the inner query, then using those values to sort the outer query:

38
Q

Write a query that displays all rows from the three categories with the fewest incidents reported.

A

SELECT incidents.*, sub.count AS total_incidents_in_category

FROM tutorial.sf_crime_incidents_2014_01 incidents

JOIN ( SELECT category,

COUNT(*) AS count

FROM tutorial.sf_crime_incidents_2014_01

GROUP BY 1

ORDER BY 2

LIMIT 3 ) sub

ON sub.category = incidents.category

39
Q

Write a query that counts the number of companies founded and acquired by quarter starting in Q1 2012. Create the aggregations in two separate queries, then join them.

USING acquisitions and investments

A

SELECT COALESCE(companies.quarter, acquisitions.quarter) AS quarter,
companies.companies_founded,
acquisitions.companies_acquired
FROM (
SELECT founded_quarter AS quarter,
COUNT(permalink) AS companies_founded
FROM tutorial.crunchbase_companies
WHERE founded_year >= 2012
GROUP BY 1
) companies

LEFT JOIN (
SELECT acquired_quarter AS quarter,
COUNT(DISTINCT company_permalink) AS companies_acquired
FROM tutorial.crunchbase_acquisitions
WHERE acquired_year >= 2012
GROUP BY 1
) acquisitions

ON companies.quarter = acquisitions.quarter
ORDER BY 1

40
Q

EXPLAIN
SELECT *
FROM benn.sample_event_table
WHERE event_date >= ‘2014-03-01’
AND event_date < ‘2014-04-01’
LIMIT 100

A

The entry at the bottom of the list is executed first. So this shows that the WHERE clause, which limits the date range, will be executed first. Then, the database will scan 600 rows (this is an approximate number). You can see the cost listed next to the number of rows—higher numbers mean longer run time.

41
Q

SELECT teams.conference AS conference,
players.year,
COUNT(1) AS players
FROM benn.college_football_players players
JOIN benn.college_football_teams teams
ON teams.school_name = players.school_name
GROUP BY 1,2
ORDER BY 1,2

Pivot this to give the total players, the nuber of players per year by conference

A

SELECT conference,
SUM(players) AS total_players,
SUM(CASE WHEN year = ‘FR’ THEN players ELSE NULL END) AS fr,
SUM(CASE WHEN year = ‘SO’ THEN players ELSE NULL END) AS so,
SUM(CASE WHEN year = ‘JR’ THEN players ELSE NULL END) AS jr,
SUM(CASE WHEN year = ‘SR’ THEN players ELSE NULL END) AS sr
FROM (
SELECT teams.conference AS conference,
players.year,
COUNT(1) AS players
FROM benn.college_football_players players
JOIN benn.college_football_teams teams
ON teams.school_name = players.school_name
GROUP BY 1,2
) sub
GROUP BY 1
ORDER BY 2 DESC

42
Q
A