SQL Flashcards
relational database
A relational database is a database that organizes information into one or more tables.
SQL statement
SQL statement is text that the database recognizes as a valid command. Statements always end in a semi-colon ;.
let’s create a new table in SQL
CREATE TABLE celebs (id INTEGER, name TEXT, age INTEGER); CREATE TABLE is a clause that tells SQL you want to create a new table. 2. celebs is the name of the table. 3. (id INTEGER, name TEXT, age INTEGER) is a list of parameters defining each column in the table and its data type. id is the first column in the table. It stores values of data type INTEGER name is the second column in the table. It stores values of data type TEXT age is the third column in the table. It stores values of data type INTEGER
Add a row to the table.
INSERT INTO celebs (id, name, age) VALUES (1, ‘Justin Bieber’, 21);
How to query only interested columns.
SELECT is used every time you want to query data from a database. SELECT column1, column2 FROM table_name;
SELECT * FROM celebs;
You can also query data from all columns in a table with SELECT. SELECT statements always return a new table called the result set.
let’s edit a row
UPDATE celebs SET age = 25 WHERE id = 1;
Add a new column to the table.
ALTER TABLE celebs ADD COLUMN twitter_handle TEXT;
Delete all of the rows that have a NULL value in the twitter column.
DELETE FROM celebs WHERE twitter_handle IS NULL;
Create a new table with constraints on the values
CREATE TABLE awards ( id INTEGER PRIMARY KEY, recipient TEXT NOT NULL, award_name TEXT DEFAULT “Grammy”);
Constraints that add information about how a column can be used are invoked after specifying the data type for a column. They can be used to tell the database to reject inserted data that does not adhere to a certain restriction. examples?
- PRIMARY KEY columns can be used to uniquely identify the row. Attempts to insert a row with an identical value to a row already in the table will result in a constraint violation which will not allow you to insert the new row. 2. UNIQUE columns have a different value for every row. This is similar to PRIMARY KEY except a table can have many different UNIQUE columns. 3. NOT NULL columns must have a value. Attempts to insert a row without a value for a NOT NULL column will result in a constraint violation and the new row will not be inserted. 4. DEFAULT columns take an additional argument that will be the assumed value for an inserted row if the new row does not specify a value for that column.
WHATS SQL
SQL is a programming language designed to manipulate and manage data stored in relational databases.One of the core purposes of the SQL language is to retrieve information stored in a database. This is commonly referred to as querying. A relational database is a database that organizes information into one or more tables. A table is a collection of data organized into rows and columns.
how to renamed the “name” column to Movies in table “movies”
SELECT name AS ‘Movies’ FROM movies; AS is a keyword in SQL that allows you to rename a column or table using an alias. The new name can be anything you want as long as you put it inside of single quotes. It is important to remember that the columns have not been renamed in the table. The aliases only appear in the result.
DISTINCT statement
DISTINCT is used to return unique values in the output. It filters out all duplicate values in the specified column(s).e.g. SELECT DISTINCT tools FROM inventory;
WHERE clause
We can restrict our query results using the WHERE clause in order to obtain only the information we want. Following this format, the statement below filters the result set to only include top rated movies (IMDb ratings greater than 8): SELECT * FROM movies WHERE imdb_rating > 8;
what is is an operator
Operators create a condition that can be evaluated as either true or false. Comparison operators used with the WHERE clause are: = equal to != not equal to > greater than < less than
LIKE statement
LIKE is a special operator used with the WHERE clause to search for a specific pattern in a column. e.g. The movies table contains two films with similar titles, ‘Se7en’ and ‘Seven’. SELECT * FROM movies WHERE name LIKE ‘Se_en’; here, name LIKE ‘Se_en’ is a condition evaluating the name column for a specific pattern. Se_en represents a pattern with a wildcard character. The _ means you can substitute any individual character here without breaking the pattern. The names Seven and Se7en both match this pattern.
wildcard character % and _
% is a wildcard character that matches zero or more missing letters in the pattern. This statement below filters the result set to only include movies with names that begin with the letter ‘A’: SELECT * FROM movies WHERE name LIKE ‘A%’; another e.g. , any movie that contains the word ‘man’ in its name will be returned in the result below. SELECT * FROM movies WHERE name LIKE ‘%man%’; LIKE is not case sensitive. ‘Batman’ and ‘Man of Steel’ will both appear in the result of the query above.
The _ means you can substitute any individual character here without breaking the pattern.
To filter for all movies with an IMDb rating in the name column of table movies
SELECT name FROM movies WHERE imdb_rating IS NOT NULL;
BETWEEN operator
BETWEEN operator can be used in a WHERE clause to filter the result set within a certain range. The values can be numbers, text or dates. This statement filters the result set to only include movies with names that begin with letters ‘A’ up to but not including ‘J’. SELECT * FROM movies WHERE name BETWEEN ‘A’ AND ‘J’; In this statement, the BETWEEN operator is being used to filter the result set to only include movies with years between 1990 up to and including 1999. SELECT * FROM movies WHERE year BETWEEN 1990 AND 1999; Really interesting point to emphasize again: BETWEEN two letters is not inclusive. BETWEEN two numbers is inclusive.
how to combine multiple conditions in a WHERE clause to make the result set more specific and useful.
e.g. we use the AND operator to only return 90’s romance movies. SELECT * FROM movies WHERE year BETWEEN 1990 AND 1999 AND genre = ‘romance’;
Suppose we want to check out a new movie or something action-packed:
SELECT * FROM movies WHERE year > 2014 OR genre = ‘action’;
ORDER BY
ORDER BY always goes after WHERE! ORDER BY is a clause that indicates you want to sort the result set by a particular column. if we want to sort everything by the movie’s title from A through Z SELECT * FROM movies ORDER BY name; Sometimes we want to sort things in a decreasing order. For example, if we want to select all of the well-received movies, sorted from highest to lowest by their year: SELECT * FROM movies WHERE imdb_rating > 8 ORDER BY year DESC; DESC is a keyword used in ORDER BY to sort the results in descending order (high to low or Z-A). ASC is a keyword used in ORDER BY to sort the results in ascending order (low to high or A-Z). The column that we ORDER BY doesn’t even have to be one of the columns that we’re displaying.
CASE statement
It is SQL’s way of handling if/then logic. A CASE statement allows us to create different outputs (usually in the SELECT statement). SELECT name, CASE WHEN imdb_rating > 7 THEN ‘Good’ WHEN imdb_rating > 5 THEN ‘Okay’ END FROM movies; Each WHEN tests a condition. The following THEN gives us the string if the condition is true. CASE statement must end with END. Here, we set all movies above 8 as ‘Good’ and movies above 5 ‘Okay’.
code for - Use a CASE statement to create a column called mood that is: fun if genre is romance fun if genre is comedy serious in all other cases
SELECT name, CASE WHEN genre = ‘romance’ OR ‘comedy’ THEN ‘fun’ ELSE ‘serious’ END AS ‘mood’ FROM movies;
what are aggregates. recall some important aggregates
Calculations performed on multiple rows of a table are called aggregates.
Here is a quick preview of some important aggregates
COUNT: count the number of values
SUM: add up all of the values in a column
MAX/MIN: get the largest/smallest value
AVG: get the mean for all values in a column
ROUND: round the values in the column
The fastest way to calculate how many rows are in a table is to use what
The fastest way to calculate how many rows are in a table is to use the COUNT function.
COUNT is a function that takes the name of a column as an argument and counts the number of non-empty values in that column.
SELECT COUNT(*) FROM table_name;
Here, we want to count every row, so we pass *as an argument inside parenthesis.
define SUM function
SUM is a function that takes the name of a column as an argument and returns the sum of all the values in that column.e.g.
SELECT SUM(downloads) FROM fake_apps;
This adds all values in the downloads column.
define MAX and MIN function
MAX takes the name of a column as an argument and returns the largest value in that column. Here, we returned the largest value in the downloadscolumn.
MIN works the same way but it does the exact opposite; it returns the smallest value.e.g.
How many downloads does the most popular app have?
SELECT MAX(downloads) FROM fake_apps;
The most popular app has 31,090 downloads!
ROUND function , How does it work
ROUND function takes two arguments inside the parenthesis:
a column name
an integer
It rounds the values in the column to the number of decimal places specified by the integer.
SELECT ROUND(price, 0) FROM fake_apps;
Here, we pass the column price and integer 0 as arguments. SQL rounds the values in the column to zero decimal places in the output.
define GROUP BY clause
GROUP BY is a clause in SQL that is used with aggregate functions. It is used in collaboration with the SELECT statement to arrange identical data into groups.
The GROUP BY statement comes after any WHEREstatements, but before ORDER BY or LIMIT.e.g.
SELECT year, AVG(imdb_rating) FROM movies GROUP BY year ORDER BY year;
how to Make the code here simplified -
we might want to know how many movies have IMDb ratings that round to 1, 2, 3, 4, 5. We could do this using the following syntax:
SELECT ROUND(imdb_rating),
COUNT(name)
FROM movies
GROUP BY ROUND(imdb_rating)
ORDER BY ROUND(imdb_rating);
However, this query may be time-consuming to write and more prone to error.
The following query is equivalent to the above:
SELECT ROUND(imdb_rating), COUNT(name) FROM movies GROUP BY 1 ORDER BY 1;
Here, the 1 refers to the first column in our SELECT statement, ROUND(imdb_rating)
SQL lets us use column reference(s) in our GROUP BY that will make our lives easier.
1 is the first column selected, 2 is the second column selected, and so on.
Difference between WHERE and HAVING clause
HAVING is very similar to WHERE. In fact, all types of WHERE clauses you learned about thus far can be used with HAVING.
When we want to limit the results of a query based on values of the individual rows, we use WHERE.
When we want to limit the results of a query based on an aggregate property, we use HAVING.
HAVING statement always comes after GROUP BY, but before ORDER BY and LIMIT.e.g.
SELECT price,
ROUND(AVG(downloads))
FROM fake_apps
GROUP BY price;
Certain price points don’t have very many apps, so the average is less meaningful.
Add a HAVING clause to restrict the query to prices where the total number of apps at that price point is greater than 9.
SELECT price,
ROUND(AVG(downloads))
FROM fake_apps
GROUP BY price
HAVING COUNT(*) > 9;
Suppose we have the three tables below. How do you combine table orders and table customers using join clause,
SELECT *
FROM orders
JOIN customers
ON orders.customer_id = customers.customer_id
If we only want to select certain columns, we can specify which ones we want.e.g.
SELECT orders.order_id, customers.customer_name
FROM orders
JOIN customers
ON orders.customer_id = customers.customer_id
Describe what this code does -
SELECT *
FROM orders
JOIN subscriptions
ON orders.subscription_id = subscriptions.subscription_id
WHERE subscriptions.description = ‘Fashion Magazine’;
only selects rows from the join where description is equal to ‘Fashion Magazine’.
Again, WHERE statement is used to refine and limit the results of a query based on values of individual rows
What are inner joins?
Consider an inner join of two tables on table1.key = table2.key:
suppose each table has 3 rows. The first and last rows have matching values of key. The middle rows do not match. The final result has all values form the first and last row, but does not include the non-matching middle row.
What data does the code below retrieve
SELECT COUNT(\*) FROM newspaper;
SELECT COUNT(\*) FROM online;
SELECT COUNT(*)
FROM newspaper
JOIN online
ON newspaper.id = online.id;
An inner JOIN does not return unmatched rows, while LEFT JOIN does. Describe how it works
A left join will keep all rows from the first table, regardless of whether there is a matching row in the second table. The final result will keep all rows of the first table but will omit the un-matched row from the second table.
How do you define primary keys, and what are the requirements?
Def- a column in a table that uniquely identifies each row of that table.
Primary keys have a few requirements:
None of the values can be NULL
Each value must be unique (i.e., you can’t have two customers with the same customer_id in the customers table)
A table can not have more than one primary key column
How do you define a foreign key? and how is it used the task of joining tables?
When the primary key for one table appears in a different table, it is called a foreign key. So customer_id is a primary key when it appears in customers, but a foreign key when it appears in orders.
The most common types of joins will be joining a foreign key from one table with the primary key from another table. For instance, when we join orders and customers, we join on customer_id, which is a foreign key in orders and the primary key in customers.
What is a CROSS JOIN and how is it constructed?
CROSS JOIN is used to combine all rows of one table with all rows of another table. A cross join is used when you wish to create combination of every row from two tables. All row combinations are included in the result; this is commonly called cross product join. A common use for a cross join is to create obtain all combinations of items, such as colors and sizes. e.g
SELECT shirts.shirt_color, pants.pant_color
FROM shirts
CROSS JOIN pants;
The first two lines select the columns shirt_color and pant_color
The third line pulls data from the table shirts
The fourth line performs a CROSS JOINwith pants
Notice that cross joins don’t require an ONstatement. You’re not really joining on any columns!
If we have three different shirts (red, yellow, and blue) and two different pants (navy and black), the results might look like this:
shirt_color pant_color
red navy
red black
yellow navy
yellow black
blue navy
blue black
A more common usage of CROSS JOIN is when we need to compare each row of a table to a list of values. how?
code
Let’s start by counting the number of customers who were subscribed to the newspaper during March (3).
– First query
SELECT COUNT(*)
FROM newspaper
WHERE start_month < 3
AND end_month > 3;
The previous query lets us investigate one month at a time. In order to check across all months, we’re going to need to use a cross join.
Our database contains another table called months which contains the numbers between 1 and 12. Select all columns from the cross join of newspaper and months.
– Second query
SELECT *
FROM newspaper
CROSS JOIN months;
Create a third query where you add a where statement to your cross join. The column month should be greater than start_month, but less than end_month. This will select all months where a user was subscribed.
– Third query
SELECT *
FROM newspaper
CROSS JOIN months
WHERE start_month < month
AND end_month > month;
Create a final query where you aggregate over each month.
– Fourth query
SELECT months.month,
COUNT(*)
FROM newspaper
CROSS JOIN months
WHERE start_month < month
AND end_month > month
GROUP BY months.month;
What clause allows us to stack one dataset on top of the other.?
code
UNION -
Suppose we have two tables and they have the same columns:
table1
name email
Sonny sonny@foobar.com
Eric posture@boy.net
table2
name email
Chloe wallflower1991@gsnail.com
Ned sk8terboi@lotmail.com
SELECT *
FROM table1
UNION
SELECT *
FROM table2;
SQL has strict rules for appending data:
Tables must have the same # of columns.
The columns must have the same data types in the same order as the first table.
what is and when is the WITH clause used
WITH allows us to define a bunch of temporary tables that can be used in the final query. i.e. allows us to combine two tables, when one of the tables is the result of another calculation.
The WITH statement allows us to perform a separate query (such as aggregating customer’s subscriptions)
previous_results is the alias that we will use to reference any columns from the query inside of the WITH clause
We can then go on to join our results with another table. e.g.
WITH previous_query AS (
SELECT customer_id,
COUNT(subscription_id) as subscriptions
FROM orders
GROUP BY customer_id)
SELECT customers.customer_name,
previous_query.subscriptions
FROM previous_query
JOIN customers
ON customers.customer_id = previous_query.customer_id;
explain
SELECT last_name, first_name
FROM employees
WHERE last_name = ‘Johnson’
ORDER BY last_name DESC, first_name ASC;
When sorting your result set using the SQL Server ORDER BY clause, you can use the ASC and DESC attributes in a singleSELECT statement.This SQL Server ORDER BY would return all records sorted by the last_name field in descending order, with a secondary sort by first_name in ascending order.
expln wht ths does
INSERT INTO contacts
(contact_id, last_name, first_name)
SELECT employee_id, last_name, first_name
FROM employees
WHERE first_name = ‘Joanne’
AND employee_id >= 800;
the AND condition can be used in the INSERT statement.
This SQL Server AND condition example would insert into the contacts table, all employee_id, last_name, and first_namerecords from the employees table where the first_name is ‘Joanne’ and the employee_id is greater than or equal to 800.
When combining AND condition and the OR condition together, it is important to use parentheses so that the database knows what order to evaluate each condition.e.g.
SELECT employee_id, last_name, first_name
FROM employees
WHERE (last_name = ‘Smith’)
OR (last_name = ‘Anderson AND first_name = ‘Sarah’)
OR (employee_id > 1000 and state = ‘California’);
expln
SELECT DISTINCT first_name, last_name
FROM employees
WHERE employee_id >=50
ORDER BY last_name;
This SQL Server DISTINCT clause example would return each unique first_name and last_name combination from the employees table where the employee_id is greater than or equal to 50. The results are sorted in ascending order by last_name.
The SQL Server (Transact-SQL) IN condition is used to help reduce the need to use multiple OR conditions in a SELECT, INSERT, UPDATE, or DELETE statement. e.g.
SELECT *
FROM employees
WHERE last_name IN (‘Smith’, ‘Anderson’, ‘Johnson’);
is th sme as
SELECT *
FROM employees
WHERE last_name = ‘Smith’
OR last_name = ‘Anderson’
OR last_name = ‘Johnson’;