Advanced SQL Flashcards
Generally, numeric column types in various SQL databases do not support commas or currency symbols. True/False?
True
How can we change data type of a column?
Use CAST or CONVERT to change the data type
What is the syntax for casting data types?
CAST(column_name AS integer)
column_name::integer
SELECT
CAST(funding_total_usd AS varchar) AS funding_total_usd_string,
founded_at_clean::varchar AS founded_at_string
FROM tutorial.crunchbase_companies_clean_date
When you perform arithmetic on dates (such as subtracting one date from another), what is the data type of the result?
The results are often stored as the interval data type—a series of integers that represent a period of time.
What does the below code do?
SELECT companies.permalink,
companies.founded_at_clean,
companies.founded_at_clean::timestamp +
INTERVAL ‘1 week’ AS plus_one_week
FROM tutorial.crunchbase_companies_clean_date companies
WHERE founded_at_clean IS NOT NULL
The interval is defined using plain-English terms like ‘10 seconds’ or ‘5 months’. using the INTERVAL function.
Also note that adding or subtracting a date column and an interval column results in another date column as in the above query.
The code adds a week to each date in the column “companies.founded_at_clean”
What does below code do?
SELECT companies.permalink,
companies.founded_at_clean,
NOW() - companies.founded_at_clean::timestamp AS founded_time_ago
FROM tutorial.crunchbase_companies_clean_date companies
WHERE founded_at_clean IS NOT NULL
You can add the current time (at the time you run the query) into your code using the NOW()function
Does the below code work?
COUNT ( CASE WHEN acquisitions.acquired_at_cleaned-INTERVAL ‘3 years’ > companies.founded_at_clean::timestamp>= acquisitions.acquired_at_cleaned- INTERVAL ‘5 years’ THEN 1 ELSE NULL END ) AS between_3_to_5
(Using two interval data types for comparison)
No, it returns a syntax error. Using interval data types, work as below (one-sided)
COUNT (CASE WHEN acquisitions.acquired_at_cleaned <= companies.founded_at_clean::timestamp + INTERVAL ‘3 years’
THEN 1 ELSE NULL END) AS acquired_3_yrs
What is data wrangling?
data wrangling (or munging) is the process of programmatically transforming data into a format that makes it easier to work with.
Does using a particular function, in many cases, change the data to the appropriate type in SQL? Explain your answer
Yes. LEFT, RIGHT, and TRIM are all used to select only certain elements of strings, but using them to select elements of a number or date will treat them as strings for the purpose of the function.
What does the left function do in SQL? how is the syntax?
You can use LEFT to pull a certain number of characters from the left side of a string and present them as a separate string. The syntax is LEFT (string, number of characters).
What does the below code do?
SELECT incidnt_num,
date,
LEFT(date, 10) AS cleaned_date
FROM tutorial.sf_crime_incidents_2014_01
The code, extracts only the first 10 characters on the left of the entries in date column
What does the below code do?
SELECT incidnt_num,
date,
LEFT(date, 10) AS cleaned_date,
RIGHT(date, 17) AS cleaned_time
FROM tutorial.sf_crime_incidents_2014_01
It extracts 10 characters from the left side and creates a new column
and then extracts 17 characters from the right side and creates another column
What does the length function do?
The LENGTH function returns the length of a string.
What does the below code do?
SELECT incidnt_num,
date,
LEFT(date, 10) AS cleaned_date,
RIGHT(date, LENGTH(date) - 11) AS cleaned_time
FROM tutorial.sf_crime_incidents_2014_01
it’s the same as:
SELECT incidnt_num,
date,
LEFT(date, 10) AS cleaned_date,
RIGHT(date, 17) AS cleaned_time
FROM tutorial.sf_crime_incidents_2014_01
When using functions within other functions, it’s important to remember that the outermost functions will be evaluated first, followed by the functions encapsulated by them. True/False?
False. When using functions within other functions, it’s important to remember that the 🎆 innermost 🎆 functions will be evaluated first, followed by the functions that encapsulate them.
What does the TRIM function do? How is its syntax?
The TRIM function is used to remove characters from the beginning and end of a string.
SELECT location,
TRIM(both ‘()’ FROM location)
FROM tutorial.sf_crime_incidents_2014_01
The TRIM function takes 3 arguments.
First, you have to specify whether you want to remove characters from the beginning (‘leading’), the end (‘trailing’), or both (‘both’, as used above).
Next you must specify all characters to be trimmed. Any characters included in the single quotes will be removed from both beginning, end, or both sides of the string.
Finally, you must specify the text you want to trim using FROM.
What does the function POSITION do? How is its syntax?
POSITION allows you to specify a substring, then returns a numerical value equal to the character number (counting from left) where that substring first appears in the target string.
For example, the following query will return the position of the character ‘A’ (case-sensitive) where it first appears in the descript field
SELECT incidnt_num,
descript,
POSITION(‘A’ IN descript) AS a_position
FROM tutorial.sf_crime_incidents_2014_01
Is POSITION function case-sensitive?
Yes
What is another function that does the same thing as POSITION? how is its syntax?
You can also use the STRPOS function to achieve the same results as POSITION—just replace IN with a comma and switch the order of the string and substring:
SELECT incidnt_num,
descript,
STRPOS(descript, ‘A’) AS a_position
FROM tutorial.sf_crime_incidents_2014_01
Is STRPOS case-sensitive?
yes
What do you do If you want to look for a character regardless of its case?
You can make your entire string a single case string by using the UPPER or LOWER functions
What does SUBSTR do? What is its syntax?
LEFT and RIGHT both create substrings of a specified length, but they only do so starting from the sides of an existing string. If you want to start in the middle of a string, you can use SUBSTR.
The syntax is SUBSTR(string, starting character position, # of characters)
SELECT incidnt_num,
date,
SUBSTR(date, 4, 2) AS day
FROM tutorial.sf_crime_incidents_2014_01
This returns 01/31/2014 08:00:00 AM +0000 as 31 (note that positions start from 1 not 0, and the starting position is inclusive)
Do indices in SQL start from 1 or 0?
1
What does CONCAT do? How is its syntax?
You can combine strings from several columns together (and with hard-coded values) using CONCAT. Simply order the values you want to concatenate and separate them with commas. If you want to hard-code values, enclose them in single quotes. Here’s an example
SELECT incidnt_num,
day_of_week,
LEFT(date, 10) AS cleaned_date,
CONCAT(day_of_week, ‘, ‘, LEFT(date, 10)) AS day_and_date
FROM tutorial.sf_crime_incidents_2014_01
How can we use || instead of CONCAT?
Alternatively, you can use two pipe characters (||) to perform the same concatenation:
SELECT incidnt_num,
day_of_week,
LEFT(date, 10) AS cleaned_date,
day_of_week || ‘, ‘ || LEFT(date, 10) AS day_and_date
FROM tutorial.sf_crime_incidents_2014_01
What is the use of LOWER, UPPER functions? How are their syntaxes?
They change the strings to upper or lower case.
SELECT incidnt_num,
address,
UPPER(address) AS address_upper,
LOWER(address) AS address_lower
FROM tutorial.sf_crime_incidents_2014_01
What are 3 reasons for scrambled dates?
📌The data was manipulated in Excel at some point, and the dates were changed to MM/DD/YYYY format or another format that is not compliant with SQL’s strict standards.
📌The data was manually entered by someone who use whatever formatting convention he/she was most familiar with.
📌The date uses text (Jan, Feb, etc.) instead of numbers to record months.
What is the difference between timestamp and date data types in SQL?
timestamp includes additional precision (hours, minutes, seconds)
Assume we have a date or timestamp field, how can we extract different parts of it? Demonstrate with code, for a date column named cleaned_date
Use EXTRACT to pull the pieces apart one-by-one:
SELECT cleaned_date,
EXTRACT(‘year’ FROM cleaned_date) AS year,
EXTRACT(‘month’ FROM cleaned_date) AS month,
EXTRACT(‘day’ FROM cleaned_date) AS day,
EXTRACT(‘hour’ FROM cleaned_date) AS hour,
EXTRACT(‘minute’ FROM cleaned_date) AS minute,
EXTRACT(‘second’ FROM cleaned_date) AS second,
EXTRACT(‘decade’ FROM cleaned_date) AS decade,
EXTRACT(‘dow’ FROM cleaned_date) AS day_of_week
FROM tutorial.sf_crime_incidents_cleandate
How can we round dates to the beginning of a certain unit of measurement? For example, we want to round dates based on year, so that every date in the table for each year becomes January 1st of that year?
You can also round dates to the nearest unit of measurement. This is particularly useful if you don’t care about an individual date, but do care about the week (or month, or quarter) that it occurred in. The DATE_TRUNC function rounds a date to whatever precision you specify. The value displayed is the first value in that period. So when you DATE_TRUNC by year, any value in that year will be listed as January 1st of that year:
SELECT cleaned_date,
DATE_TRUNC(‘year’ , cleaned_date) AS year,
DATE_TRUNC(‘month’ , cleaned_date) AS month,
DATE_TRUNC(‘week’ , cleaned_date) AS week,
DATE_TRUNC(‘day’ , cleaned_date) AS day,
DATE_TRUNC(‘hour’ , cleaned_date) AS hour,
DATE_TRUNC(‘minute’ , cleaned_date) AS minute,
DATE_TRUNC(‘second’ , cleaned_date) AS second,
DATE_TRUNC(‘decade’ , cleaned_date) AS decade
FROM tutorial.sf_crime_incidents_cleandate
How can we get today’s date or time using SQL? Does it require a FROM clause?
You can instruct your query to pull the local date and time at the time the query is run using any number of functions. Interestingly, you can run them 📍 without 📍a FROM clause:
SELECT CURRENT_DATE AS date,
CURRENT_TIME AS time,
CURRENT_TIMESTAMP AS timestamp,
LOCALTIME AS localtime,
LOCALTIMESTAMP AS localtimestamp,
NOW() AS now
How can we make current time appear in a different time zone in SQL?
You can make a time appear in a different time zone using AT TIME ZONE:
SELECT CURRENT_TIME AS time,
CURRENT_TIME AT TIME ZONE ‘PST’ AS time_pst