Pandas Flashcards
Filter df for when Column is null
df[ df.Column.isnull() ]
Filter df for when Column is not null
df[ df.Column.notna() ]
Create a boolean series for when colA >100 AND colB <0
(df.colA >100 & df.colB <0)
Create a boolean series for when colA >100 OR colB <0
(df.colA >100 | df.colB <0)
Return a dataframe’s data types
df.dtypes
Return the dimensions of a dataframe
df.shape
Rename column MANUfacturer as ‘manufacturer’
df.rename( columns= {‘MANUfacturer’ : ‘manufacturer’}, inplace=True)
Convert a string column to a float
df[‘column’] = df[‘column’].astype(float)
Extract first prefix when string column is split by a dash
df.column.str.split(‘-‘).str[0]
Replace values in Column using a mapping dictionary
df.column = df.column.map( {‘Key’ : ‘newkey’, ‘Key1’ : ‘newkey1’ } )
Export dataframe to csv file without index values
df.to_csv(‘filename.csv’, index=False)
Get meta-data information for the columns of a dataframe
df.info()
Get the name of the columns in a dataframe
df.columns
Get descriptive statistics for a column
df.column.describe()
Get frequencies for each unique value in a column
df.column.value_counts()
Get the averages of col_B grouped by col_A
df.groupby(df.col_A).col_B.mean()
Apply the size, min, and max functions to the dataframe grouped by col_A
df.groupby(df.col_A).agg( [‘size’, min, max] )
Create a pivot table where col_V1 is sumed and col_V2 shows the min and max. Have col_1 and col_2 as rows and col_A and col_B as columns. Include grand totals.
df.pivot_table( values=['col_V1', col_V2'], index=['col_1', 'col_2'], columns=['col_A', 'col_B'], aggfunc={ 'col_v1': sum, 'col_V2': [min, max] }, margins=True)