pandas Flashcards
importing, filtering, slicing, querying
Create a DataFrame df from the dictionary data, using the list labels as the index.
df = pd.DataFrame(data, index=labels)
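A minimal sketch of this construction, assuming `pd` is pandas under its usual alias. The contents of `data` and `labels` below are hypothetical placeholders, since the card does not show the original values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values; the card's original data is not shown.
data = {
    "age": [2.5, 3.0, 0.5, np.nan],
    "animal": ["cat", "cat", "snake", "dog"],
    "priority": ["yes", "yes", "no", "yes"],
    "visits": [1, 3, 2, 3],
}
labels = ["a", "b", "c", "d"]

# Each dictionary key becomes a column; labels become the row index.
df = pd.DataFrame(data, index=labels)
print(df)
```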
Display a summary of the basic information about this DataFrame and its data.
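The answer for this card appears to be missing. A likely candidate, offered as an assumption rather than the author's stated answer, is `df.info()`, with `df.describe()` as a companion for numeric statistics. The sample frame below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame for illustration only.
df = pd.DataFrame(
    {"age": [2.5, np.nan, 0.5], "animal": ["cat", "dog", "snake"]},
    index=["a", "b", "c"],
)

# Prints the index, column dtypes, non-null counts, and memory usage.
df.info()

# Summary statistics (count, mean, std, quartiles) for numeric columns.
summary = df.describe()
print(summary)
```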
Return the first 3 rows of the DataFrame df.
or equivalently
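The answers on either side of "or equivalently" appear to be missing. Two calls that do return the first three rows are `df.head(3)` and `df.iloc[:3]`, sketched here on a hypothetical frame; whether these are the exact answers the card intended is an assumption.

```python
import pandas as pd

# Hypothetical sample frame for illustration only.
df = pd.DataFrame(
    {"animal": ["cat", "cat", "snake", "dog", "dog"], "visits": [1, 3, 2, 3, 2]},
    index=list("abcde"),
)

first_three = df.head(3)        # first 3 rows by convenience method
also_first_three = df.iloc[:3]  # first 3 rows by positional slice

print(first_three)
```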
Select just the ‘animal’ and ‘age’ columns from the DataFrame df.
df.loc[:, ['animal', 'age']]
df[['animal', 'age']]
Select the data in rows [3, 4, 8] and in columns [‘animal’, ‘age’].
df.loc[df.index[[3, 4, 8]], ['animal', 'age']]
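The answer above mixes positions for rows with names for columns: `df.index[[3, 4, 8]]` turns the row positions into labels so that `.loc` can take both. A short sketch on a hypothetical 10-row frame, compared with a purely positional row lookup via `.iloc`:

```python
import pandas as pd

# Hypothetical 10-row frame labelled 'a' through 'j'.
df = pd.DataFrame(
    {
        "animal": ["cat", "cat", "snake", "dog", "dog",
                   "cat", "snake", "cat", "dog", "dog"],
        "age": [2.5, 3.0, 0.5, 1.0, 5.0, 2.0, 4.5, 6.0, 7.0, 3.0],
    },
    index=list("abcdefghij"),
)

# Positions 3, 4, 8 are translated to the labels 'd', 'e', 'i' first.
by_label = df.loc[df.index[[3, 4, 8]], ["animal", "age"]]

# Same rows by position, then the columns by name.
by_position = df.iloc[[3, 4, 8]][["animal", "age"]]

print(by_label)
```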
Select only the rows where the number of visits is greater than 3.
df[df['visits'] > 3]
Select the rows where the age is missing, i.e. is NaN.
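The answer for this card appears to be missing. One approach that works, shown as a sketch on hypothetical data, is a boolean mask built with `Series.isna()` (`isnull()` is an alias).

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame for illustration only.
df = pd.DataFrame(
    {"animal": ["cat", "dog", "snake", "cat"], "age": [2.5, np.nan, 0.5, np.nan]},
    index=list("abcd"),
)

# isna() marks NaN entries; comparing with == np.nan would never match.
missing_age = df[df["age"].isna()]
print(missing_age)
```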
Select the rows where the animal is a cat and the age is less than 3.
df[(df['animal'] == 'cat') & (df['age'] < 3)]
Select the rows where the age is between 2 and 4 (inclusive).
df[df['age'].between(2, 4)]
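`between` includes both endpoints by default, so it matches the two-sided comparison written with `&`. A small check on hypothetical data (rows with a NaN age fail both forms):

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame for illustration only.
df = pd.DataFrame(
    {"age": [2.5, 3.0, 0.5, np.nan, 5.0, 2.0, 4.0]},
    index=list("abcdefg"),
)

in_range = df[df["age"].between(2, 4)]

# Equivalent explicit form; each comparison needs its own parentheses.
same_rows = df[(df["age"] >= 2) & (df["age"] <= 4)]

print(in_range)
```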
Change the age in row ‘f’ to 1.5.
df.loc['f', 'age'] = 1.5
Calculate the sum of all visits (the total number of visits).
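The answer for this card appears to be missing. A sketch of one way to do it, assuming the column is named `visits` as on the other cards; the values are hypothetical.

```python
import pandas as pd

# Hypothetical sample frame for illustration only.
df = pd.DataFrame({"visits": [1, 3, 2, 3, 2]}, index=list("abcde"))

# Sum a single column to get the total number of visits.
total_visits = df["visits"].sum()
print(total_visits)
```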
Calculate the mean age for each different animal in df.
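The answer for this card appears to be missing. One way to get a per-animal mean is `groupby` followed by `mean`, sketched here on hypothetical data. Missing ages are skipped by default.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame for illustration only.
df = pd.DataFrame(
    {
        "animal": ["cat", "cat", "snake", "dog", "dog"],
        "age": [2.5, 3.0, 0.5, np.nan, 5.0],
    },
    index=list("abcde"),
)

# Group rows by animal, then average the age within each group.
mean_age = df.groupby("animal")["age"].mean()
print(mean_age)
```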
Append a new row ‘k’ to df with your choice of values for each column. Then delete that row to return the original DataFrame.
df.loc['k'] = [5.5, 'dog', 'no', 2]
and then deleting the new row…
df = df.drop('k')
Count the number of each type of animal in df.
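The answer for this card appears to be missing. A sketch using `Series.value_counts()`, which is one common way to count occurrences of each value; the data is hypothetical.

```python
import pandas as pd

# Hypothetical sample frame for illustration only.
df = pd.DataFrame(
    {"animal": ["cat", "cat", "snake", "dog", "dog", "cat"]},
    index=list("abcdef"),
)

# Counts rows per distinct animal, most frequent first.
animal_counts = df["animal"].value_counts()
print(animal_counts)
```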
Sort df first by the values in the ‘age’ column in descending order, then by the values in the ‘visits’ column in ascending order.
df.sort_values(by=['age', 'visits'], ascending=[False, True])
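A small worked example of the two-key sort on hypothetical data. Rows that tie on age are ordered by visits, and `sort_values` returns a new frame rather than changing `df` in place.

```python
import pandas as pd

# Hypothetical sample frame for illustration only.
df = pd.DataFrame(
    {"age": [3.0, 5.0, 3.0, 0.5], "visits": [3, 2, 1, 2]},
    index=list("abcd"),
)

# Age descending; ties on age are broken by visits ascending.
sorted_df = df.sort_values(by=["age", "visits"], ascending=[False, True])
print(sorted_df)
```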