Week 2: Data Extraction Flashcards
How do you get the unique values in a list?
list.unique()
What is the degree of a relation?
The number of columns
How do you get column sums in python?
df.sum()
What is the requirement of attribute values in a relation?
attribute values are normally required to be atomic: indivisible (not containing more than one value)
What are the 2 ways to create dataframes in python pandas?
data = {‘State’: [‘Ohio, ‘Ohio’, ‘Ohio’, ‘Nevada’, ‘Nevada’, ’Nevada'], ‘Year’: [2000, 2001, 2002, 2001, 2002, 2003], ‘Population’: [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]} df = pd.DataFrame(data)
data = [[‘Ohio’, 2000, 1.5], [‘Ohio’, 2001, 1.7], [‘Ohio’, 2002, 3.6], [Nevada, 2001, 2.4], [‘Nevada, 2002, 2.9], [‘Nevada’, 2003, 3.2]] cols = [‘State’, ‘Year’, ‘Population’] df = pd.DataFrame(data, columns = cols)
How do you get a vector containing the column names in python?
df.columns
How do you add a new column to a dataframe in python?
df[‘newcolumn’] = …
How do you do a left outer join in python?
pd.merge(df1, df2, on=‘attribute’, how = ‘left’)
How do you find mismatches of strings?
WHERE … NOT LIKE …
How do you match an exact substring and the beginning of a string?
‘Intro%’ #matches any string beginning with “Intro”
‘%Comp%’ #matches any string containing “Comp” as a substring
What are the distributive laws?
A ^ (BvC) = (A^B) v (A^C) A v (B^C) = (AvB) ^ (AvC)
What is the basic form of a select from where statement?
SELECT desired attributes
FROM one or more tables
WHERE condition about tuples of the tables
How do you concatenate dataframes row-wise in python?
pd.concat([df1, df2])
What does 5> unknown return?
Unknown
What are 2 ways to extract a single column by name in python?
df.director_name
df[“director_name”]
How do you extract the first 5 rows and column 2 in python?
df.iloc[0:5,1]
How do you force the elimination of duplicates in an SQL query?
SELECT DISTINCT
What are the comparison signs in SQL?
=, <> (not equal), , <=, >=
How do you get the first and last rows of a dataframe in python?
df. head(n) #get first n rows
df. tail(n) #get last n rows
What is a tuple comparison?
WHERE (instructor.ID, dept_name) = (teaches.ID, ‘Biology’)
How do you display the number of columns in python?
print(len(df.columns))
How do you extract multiple columns by name in python?
df[[‘actor’, ‘facebook_likes’, ‘content_rating’]]
What is a superkey of R?
K is a superkey of R if values for K are sufficient to identify a unique tuple of each possible relation
What are the absorptive laws?
A v (A^B) = A A ^ (AvB) = A
What is a self join?
Self join is where a table joins with itself on two columns with different names but storing data from the same domain. distinguish copies by renaming the relations
How do you display the number of rows in python?
print(len(df))
What is the cardinality of a relation?
The number of rows
How do you select all attributes in an SQL query?
SELECT *
How do you rename an attribute in SQL? (2 ways)
using “as”
e.g. SELECT ID, name, salary/12.0 as monthly_salary
FROM instructor
or simply write salary/12.0 monthly_salary
do not need the “as”
What happens to unknown values in the where clause?
treated as false if it evaluates to unknown
What is the relational algebra symbol for renaming?
rho (p)
What are the advantages of composing relational algebra operators?
- the output of a relational algebra (RA) is a relation that can be input for another RA operator
- don’t need heaps of functions for each line of code
How do you order the results of an SQL query?
What order does it put them in?
SELECT *
FROM instructor
WHERE dept_name = ‘Physics’
ORDER BY name
-puts results in ascending order
What aggregate functions can be used in SQL? (5)
- avg: average value
- min: minimum value
- max: maximum value
- sum: sum of values
- count: number of values
What is the double negation law?
~(~A) = A
What are the complement laws?
Av ~A = T
A^ ~A = F
What are the 4 boolean laws including true and false
A^F = F AvF = A A^T = A AvT = T
How do you get a true/false vector for whether the value is null in a column in python?
df[‘column’].isnull()
What is the result of SELECT ‘A’ FROM instructor
returns a relation with one column and N rows(the number of tuples in instructor) where each row will contain the value ‘A’