advanced_pandas Flashcards
Given a dataframe ‘purchases’, how do you add a new column named ‘Date’ with the following values?
- December 1
- January 1
- mid-May
- df[‘Date’] = [‘December 1’, ‘January 1’, ‘mid-May’]
- What is a outer (full) join?
- What is an inner (intersection) join?
- Outer join: When there are two groups and we want to join everything between those two groups
- Inner join: When we have two groups and we only want to join what both groups have in common
- Given two DataFrames ‘staff_df’ and ‘student_df’:
- How do you outer join the two dataframes together?
- How do you inner join the two datframes together?
- How would you join if you wanted to get all ‘staff’ data, and any additional data if the staff were ‘students’ as well?
- How would you join if you wanted to get all ‘student’ data, and any additional data if the students were ‘staff’ as well?
- How would you join if you wanted to get all ‘staff’ data, and any additional data if the staff were ‘students’ as well AND you wanted to join on the specific common column ‘Name’?
- pd.merge(staff_df, student_df, how=’outer’, left_index=True, right_index=True)
- We have to tell Pandas that we want to ‘join on’ the left and right indices
- pd.merge(staff_df, student_df, how=’inner’, left_index=True, right_index=True_
- pd.merge(staff_df, student_df, how=’left’, left_index=True, right_index=True)
- pd.merge(staff_df, student_df, how=’right’, left_index=True, right_index=True)
- pd.merge(staff_df, student_df, how=’left’, left_on=’Name’, right_on=’Name’)
How would you merge the following two DataFrames together to generate totals into one DataFrame?
- products DataFrame:
Price
Product
Product ID
4109
5.0
Sushi Roll
1412
0.5
Egg
8931
1.5
Bagel
- invoices DataFrame
Customer
Product ID
Product
0
Ali
4109
1
1
Eric
1412
12
2
Ande
8931
6
3
Sam
4109
2
- pd.merge(products, invoices, left_index=True, right_on=’Product ID’)
Given a DataFrame ‘df’, how would you write a single query to do the following?
- Filter on ‘SUMLEV’ column that equals 50
- Remove the NaN values
- Set the index to the ‘STNAME’ and ‘CTYNAME’ columns
- Rename the column ‘ESTIMATESBASE2010’ to ‘Estimates Base 2010’
df.where(df[‘SUMLEV’] == 50)
.dropna()
.set_index([‘STNAME’, ‘CTYNAME’])
.rename(columns={‘ESIMATESBASE2010’ : ‘Estimates Base 2010’} ) )
Use method chaining to modify the DataFrame ‘df’ in one statement to drop any entries where ‘Quantity’ is 0 and rename the column ‘Weight’ to ‘Weight (oz.)’
df.drop(df[df[‘Quantity’] == 0].index)
.rename(columns={‘Weight’ : ‘Weight (oz.)’} ) )
What is the difference between axis=0 and axis=1 in Pandas?
- It specifies the axis along which the means are computed.
- axis=0 is within the column (vertical)
- axis=1 is across the row (horizontal)