pandas Flashcards
importing, filtering, slicing, querying
Create a DataFrame df from the dictionary data, using the list labels as the index.
import pandas as pd
df = pd.DataFrame(data, index=labels)
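The cards never show the contents of data or labels, so here is a minimal sketch with made-up values. The column names animal, age and visits come from the later cards; the priority column and every value below are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the original dictionary is not shown in the cards.
data = {
    "animal": ["cat", "cat", "snake", "dog"],
    "age": [2.5, 3.0, 0.5, np.nan],
    "visits": [1, 3, 2, 3],
    "priority": ["yes", "yes", "no", "yes"],
}
labels = ["a", "b", "c", "d"]

# Each dictionary key becomes a column; labels become the row index.
df = pd.DataFrame(data, index=labels)
print(df)
```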
Display a summary of the basic information about this DataFrame and its data.
df.info()
…or, for summary statistics of the numeric columns…
df.describe()
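The two answers above report different things, which a small sketch can make visible. The data here is invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data, not the deck's original DataFrame.
df = pd.DataFrame(
    {
        "animal": ["cat", "dog", "cat", "snake"],
        "age": [2.5, np.nan, 3.0, 0.5],
        "visits": [1, 3, 2, 3],
    },
    index=["a", "b", "c", "d"],
)

# info() prints the index, column dtypes and non-null counts; it returns None.
df.info()

# describe() returns a DataFrame of statistics (count, mean, std, quartiles...)
# and by default covers only the numeric columns.
summary = df.describe()
print(summary)
```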
Return the first 3 rows of the DataFrame df.
df.iloc[:3]
or equivalently
df.head(3)
Select just the ‘animal’ and ‘age’ columns from the DataFrame df.
df.loc[:, ['animal', 'age']]
or
df[['animal', 'age']]
Select the data in rows [3, 4, 8] and in columns [‘animal’, ‘age’].
df.loc[df.index[[3, 4, 8]], ['animal', 'age']]
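The answer above mixes positions (3, 4, 8) with labels ('animal', 'age'): df.index[[3, 4, 8]] turns the positions into row labels so that loc can be used for both axes. A small sketch on invented data, with an iloc-based alternative for comparison:

```python
import pandas as pd

# Hypothetical ten-row DataFrame with string row labels.
df = pd.DataFrame(
    {
        "animal": ["cat", "cat", "snake", "dog", "dog",
                   "cat", "snake", "cat", "dog", "dog"],
        "age": [2.5, 3.0, 0.5, 6.0, 5.0, 2.0, 4.5, 1.0, 7.0, 3.0],
        "visits": [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    },
    index=list("abcdefghij"),
)

# Convert positions to labels, then select by label on both axes.
by_label = df.loc[df.index[[3, 4, 8]], ["animal", "age"]]

# Alternative: pick the rows by position first, then the columns by name.
by_position = df.iloc[[3, 4, 8]][["animal", "age"]]

print(by_label)
```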
Select only the rows where the number of visits is greater than 3.
df[df['visits'] > 3]
Select the rows where the age is missing, i.e. is NaN.
df[df['age'].isnull()]
Select the rows where the animal is a cat and the age is less than 3.
df[(df['animal'] == 'cat') & (df['age'] < 3)]
Select the rows where the age is between 2 and 4 (inclusive).
df[df['age'].between(2, 4)]
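The filtering cards above all follow one pattern: build a boolean Series, then index df with it. A minimal sketch on invented data shows what each mask keeps. Note that each comparison needs its own parentheses when combined with &, and that a NaN age fails every comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical data, not the deck's original DataFrame.
df = pd.DataFrame(
    {
        "animal": ["cat", "cat", "dog", "snake"],
        "age": [2.5, 3.0, np.nan, 4.0],
        "visits": [1, 3, 2, 4],
    },
    index=["a", "b", "c", "d"],
)

many_visits = df[df["visits"] > 3]                          # row d
missing_age = df[df["age"].isnull()]                        # row c
young_cats = df[(df["animal"] == "cat") & (df["age"] < 3)]  # row a
age_2_to_4 = df[df["age"].between(2, 4)]                    # rows a, b, d

print(age_2_to_4)
```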
Change the age in row ‘f’ to 1.5.
df.loc['f', 'age'] = 1.5
Calculate the sum of all visits (the total number of visits).
df['visits'].sum()
Calculate the mean age for each different animal in df.
df.groupby('animal')['age'].mean()
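A short sketch of the two aggregation cards above, on invented data. It also shows that the group mean skips missing ages by default.

```python
import numpy as np
import pandas as pd

# Hypothetical data, not the deck's original DataFrame.
df = pd.DataFrame(
    {
        "animal": ["cat", "cat", "dog", "dog"],
        "age": [2.0, 4.0, np.nan, 5.0],
        "visits": [1, 3, 2, 3],
    },
    index=["a", "b", "c", "d"],
)

total_visits = df["visits"].sum()

# One mean per animal; the NaN age in row c is ignored, not counted as zero.
mean_age = df.groupby("animal")["age"].mean()

print(total_visits)
print(mean_age)
```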
Append a new row ‘k’ to df with your choice of values for each column. Then delete that row to return the original DataFrame.
df.loc['k'] = [5.5, 'dog', 'no', 2]  # values must follow df's column order
and then delete the new row…
df = df.drop('k')
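The list assigned to the new row is matched to the columns by position, so its order has to follow the column order of the DataFrame at hand. A minimal sketch, assuming a hypothetical three-column frame ordered animal, age, visits:

```python
import pandas as pd

# Hypothetical data; the column order here is an assumption.
df = pd.DataFrame(
    {"animal": ["cat", "dog"], "age": [2.5, 3.0], "visits": [1, 2]},
    index=["a", "b"],
)

# Assigning to a label that does not exist yet adds a new row.
df.loc["k"] = ["dog", 5.5, 2]  # animal, age, visits
rows_after_append = list(df.index)

# drop() returns a new DataFrame without the row; reassign to keep the result.
df = df.drop("k")

print(df)
```

This sketch only compares labels and values before and after; it does not check that the column dtypes survive the round trip.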
Count the number of each type of animal in df.
df['animal'].value_counts()
Sort df first by the values in the ‘age’ column in descending order, then by the values in the ‘visits’ column in ascending order.
df.sort_values(by=['age', 'visits'], ascending=[False, True])
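A small sketch of the last two cards on invented data. The second sort key only matters where the first one ties, as with the two rows of age 3.0 below.

```python
import pandas as pd

# Hypothetical data, not the deck's original DataFrame.
df = pd.DataFrame(
    {
        "animal": ["cat", "dog", "cat"],
        "age": [3.0, 3.0, 1.0],
        "visits": [2, 1, 5],
    },
    index=["a", "b", "c"],
)

# Number of rows per animal, most frequent first.
counts = df["animal"].value_counts()

# age descending; ties on age are broken by visits ascending.
# sort_values returns a new DataFrame and leaves df unchanged.
sorted_df = df.sort_values(by=["age", "visits"], ascending=[False, True])

print(counts)
print(sorted_df)
```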