SQL Flashcards
Calculate median in SQL
with t1 as (
select
row_number() as rn,
count() over () as cnt,
round(count() over () / 2) as mid
from table
)
select * from t1
where rn = mid
or (cnt % 2 = 0 and rn = mid + 1)
– Pull where rn = md, so the middle number. This works if dataset is odd.
– If dataset is even, pull where the rn is mid AND another row mid + 1
Tips for Optimizing a Query
- Minimize full table scans
- Filter data
- Use CTE’s
- Temp tables
- Avoid joins
- User inner (limits result set)
- No select *
- Avoid distinct / union
- Materialize frequently used queries
Get difference between two times
timestampsidff(
second,
time1,
time2
)
Get min/max between two or more columns
- least()
- greatest()
What type of language is SQL
Declarative (as opposed to procedural)
You describe the results you want - NOT how to do it
Group strings
group_contact()
list_agg()
Scalar function to add strings together
concat()
Result of FULL OUTER JOIN where join keys are null
All from A and all from B but NOT where A and B intersect
Result of FULL OUTER JOIN
All from A and B and intersection
count(*) vs count(1) vs count(col)
- count(*): All rows including nulls
- count(1): All rows including nulls
- count(col): All rows EXCEPT nulls
Find start and end of continuous ranges
1
2
3
7
8
with t1 as (
select
log_id,
log_id - row_number() over (order by log_id asc) as grp
from logs
)
select
min(log_id) as start_id,
max(log_id) as end_id
from t1
group by grp
Recursive function in SQL
with recursive t1 as (
select 1 as col1 union all select col1 + 1 from t1 where col1 < 10
)
select * from t1
Partial Dependencies
When non-key is dependent on part of a candidate key
studentid | projectno | studentname | projectname
studentname and projectname are both dependent on the ids in the table.
Problem with this is if we don’t have any students assigned to a project, then we can’t get the project name from anywhere
Pros of 3NF
- Fast writes
- Better organization
- Reduction of redundant data
2nd Normal Form Rules
- Be in 1NF
- Have no partial dependencies
Fan-out Trap
Two 1-to-many joins
Where to use an Index
- Typically on columns in WHERE and JOIN clauses
- Column that appears in max number of queries
When would Indexing fail
- Have too many indexes
- If index column is hardly queried
- Too many DML actions
Nested Joins
For every value in OUTER, look for it in INNER
Can be extremely slow
Cache
Temporarily storing an expressive operation to speed up results
Repeatable Read Isolation Level
Can read within same transaction without anything changing
Read Uncommitted Isolation Level
Can read uncommitted data (dirty reads)
Read Committed Isolation Level
Each read gets latest data committed
Purpose of database normalization
- Eliminating redundant data
- Ensure data dependencies make logical sense
- Prevent insert, update and removal anomalies