Week 2: Data Extraction Flashcards
How do you get the unique values in a list?
list.unique()
What is the degree of a relation?
The number of columns
How do you get column sums in python?
df.sum()
What is the requirement of attribute values in a relation?
attribute values are normally required to be atomic: indivisible (not containing more than one value)
What are the 2 ways to create dataframes in python pandas?
data = {‘State’: [‘Ohio, ‘Ohio’, ‘Ohio’, ‘Nevada’, ‘Nevada’, ’Nevada'], ‘Year’: [2000, 2001, 2002, 2001, 2002, 2003], ‘Population’: [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]} df = pd.DataFrame(data)
data = [[‘Ohio’, 2000, 1.5], [‘Ohio’, 2001, 1.7], [‘Ohio’, 2002, 3.6], [Nevada, 2001, 2.4], [‘Nevada, 2002, 2.9], [‘Nevada’, 2003, 3.2]] cols = [‘State’, ‘Year’, ‘Population’] df = pd.DataFrame(data, columns = cols)
How do you get a vector containing the column names in python?
df.columns
How do you add a new column to a dataframe in python?
df[‘newcolumn’] = …
How do you do a left outer join in python?
pd.merge(df1, df2, on=‘attribute’, how = ‘left’)
How do you find mismatches of strings?
WHERE … NOT LIKE …
How do you match an exact substring and the beginning of a string?
‘Intro%’ #matches any string beginning with “Intro”
‘%Comp%’ #matches any string containing “Comp” as a substring
What are the distributive laws?
A ^ (BvC) = (A^B) v (A^C) A v (B^C) = (AvB) ^ (AvC)
What is the basic form of a select from where statement?
SELECT desired attributes
FROM one or more tables
WHERE condition about tuples of the tables
How do you concatenate dataframes row-wise in python?
pd.concat([df1, df2])
What does 5> unknown return?
Unknown
What are 2 ways to extract a single column by name in python?
df.director_name
df[“director_name”]
How do you extract the first 5 rows and column 2 in python?
df.iloc[0:5,1]
How do you force the elimination of duplicates in an SQL query?
SELECT DISTINCT
What are the comparison signs in SQL?
=, <> (not equal), , <=, >=
How do you get the first and last rows of a dataframe in python?
df. head(n) #get first n rows
df. tail(n) #get last n rows
What is a tuple comparison?
WHERE (instructor.ID, dept_name) = (teaches.ID, ‘Biology’)
How do you display the number of columns in python?
print(len(df.columns))
How do you extract multiple columns by name in python?
df[[‘actor’, ‘facebook_likes’, ‘content_rating’]]
What is a superkey of R?
K is a superkey of R if values for K are sufficient to identify a unique tuple of each possible relation
What are the absorptive laws?
A v (A^B) = A A ^ (AvB) = A