Pandas Pt 2 (UCSD) Flashcards
replace all of the cells in a dataframe that have value 9999 with 0
df = df.replace(9999, 0)
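A minimal sketch of the card above, assuming a small made-up dataframe in which 9999 is used as a sentinel value (the column names and numbers are illustrative only):

```python
import pandas as pd

# Hypothetical sample data; 9999 stands in for a sentinel value.
df = pd.DataFrame({"a": [1, 9999, 3], "b": [9999, 5, 6]})

# replace() returns a new dataframe, so the result is assigned back.
df = df.replace(9999, 0)
print(df)
```

The replacement applies to every column at once; no loop over columns is needed.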
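fill missing values in a dataframe with the last known value before them, or the next known value after them
df.ffill() ## forward; same as the older df.fillna(method='ffill'), which recent pandas versions deprecate
df.bfill() ## backward; same as the older df.fillna(method='backfill')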
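A short sketch of forward and backward filling, assuming a made-up one-column dataframe with a gap in the middle. It uses `df.ffill()` and `df.bfill()`, which do the same job as `fillna` with `method='ffill'` or `method='backfill'`:

```python
import numpy as np
import pandas as pd

# Hypothetical readings with two missing values in the middle.
df = pd.DataFrame({"temp": [20.0, np.nan, np.nan, 23.0]})

forward = df.ffill()   # each NaN takes the last known value before it
backward = df.bfill()  # each NaN takes the next known value after it

print(forward)
print(backward)
```

Neither call changes `df` itself; each returns a filled copy.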
drop rows or columns with any NaN values
df.dropna(axis=0) ## rows
df.dropna(axis=1) ## columns
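To see the difference between the two axes, here is a small sketch on a made-up dataframe where only column `a` contains a NaN (names and values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "a" has one NaN, column "b" has none.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

rows_dropped = df.dropna(axis=0)  # removes the one row that contains a NaN
cols_dropped = df.dropna(axis=1)  # removes the one column that contains a NaN

print(rows_dropped)
print(cols_dropped)
```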
interpolate missing values; the default is linear interpolation
df.interpolate() ## fills in missing values using linear interpolation by default; other methods can be chosen with the method argument
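A quick sketch of the default linear behavior, assuming a made-up column with two isolated gaps. Each gap is filled with the midpoint of its neighbors:

```python
import numpy as np
import pandas as pd

# Hypothetical series of values with two single-cell gaps.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0, np.nan, 7.0]})

# Default method is linear: rows are treated as equally spaced.
filled = df.interpolate()
print(filled)
```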
create a dataframe of boolean values, where True marks each null value
df.isnull()
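A small sketch of the boolean mask, assuming a made-up dataframe with one missing value per column. Summing the mask is a common way to count nulls per column:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({"a": [1.0, np.nan], "b": [None, "x"]})

mask = df.isnull()        # same shape as df, True where a value is missing
null_counts = mask.sum()  # True counts as 1, so this counts nulls per column

print(mask)
print(null_counts)
```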
some common plot functions (there are many more), called as df.plot.func()
funcs = bar(), box(), hist(), line(), pie(), scatter() ## hist and box can also be called directly on the dataframe, without .plot, as df.hist() and df.boxplot()
use a magic function to display matplotlib plots inline in a jupyter notebook
%matplotlib inline
get the histogram of the ratings column of the df dataframe in jupyter notebook
df.hist(column='ratings', figsize=(15, 10)) ## figsize is the (width, height) of the figure in inches
get the boxplot of the ratings column of the df dataframe in jupyter notebook
df.boxplot(column='ratings', figsize=(15, 10)) ## figsize is the (width, height) of the figure in inches
return all of the rows from a dataframe where 'col2' values are greater than 5
df[df['col2'] > 5]
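A minimal sketch of boolean filtering, assuming a made-up dataframe (`col1` and its values are illustrative). The comparison builds a True/False series first, and indexing with it keeps only the True rows:

```python
import pandas as pd

# Hypothetical data; only col2 matters for the filter.
df = pd.DataFrame({"col1": ["a", "b", "c", "d"], "col2": [3, 7, 5, 9]})

mask = df["col2"] > 5  # boolean series, one entry per row
result = df[mask]      # keeps rows where the mask is True

print(result)
```

Note that 5 itself is excluded, since the comparison is strictly greater than.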
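delete the rows at positions 5 and 6 in a dataframe, or delete rows by label
df.drop(df.index[[5, 6]]) or df.drop(['rowName', 'row2name']) ## drop returns a new dataframe; assign it back to keep the change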
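A short sketch of both forms of the call, assuming a made-up dataframe with letter labels as its index so that positions and labels are easy to tell apart:

```python
import pandas as pd

# Hypothetical data: 8 rows labeled "a" through "h".
df = pd.DataFrame({"val": range(8)}, index=list("abcdefgh"))

# By position: df.index[[5, 6]] looks up the labels at positions 5 and 6.
by_position = df.drop(df.index[[5, 6]])

# By label: pass the row labels directly.
by_label = df.drop(["a", "b"])

print(by_position)
print(by_label)
```

`drop` is called with parentheses, and `df` itself is left unchanged unless the result is assigned back.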
delete column 'col2' from a dataframe
del df['col2']
get the mean of rows aggregated on 'studentID'
df.groupby('studentID').mean() ## groupby aggregates the rows on the column specified
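A minimal sketch of the aggregation, assuming a made-up table of scores with two rows per student (the `score` column is an illustrative name):

```python
import pandas as pd

# Hypothetical scores: two rows for each of two students.
df = pd.DataFrame({
    "studentID": [1, 1, 2, 2],
    "score": [80, 90, 70, 60],
})

# One output row per studentID, holding the mean of each remaining column.
means = df.groupby("studentID").mean()
print(means)
```

The grouping column becomes the index of the result, so a single student's mean can be read with `means.loc[1, "score"]`.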
get the count of each unique row across a set of columns in a dataframe (each distinct combination of column values counts as one unique row)
df[list_of_cols].value_counts() ## list_of_cols is a list of column names and is optional; otherwise the entire df is used. gives each unique combination of column values and its frequency
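A small sketch of counting unique combinations, assuming a made-up dataframe with `city` and `year` columns and a pandas version recent enough to have `DataFrame.value_counts` (1.1 or later):

```python
import pandas as pd

# Hypothetical data: ("SD", 2020) appears twice, the other pairs once.
df = pd.DataFrame({
    "city": ["SD", "SD", "LA", "SD"],
    "year": [2020, 2020, 2020, 2021],
})

# Result is a series indexed by (city, year) pairs, sorted by frequency.
counts = df[["city", "year"]].value_counts()
print(counts)
```

A single combination can be looked up with a tuple, for example `counts.loc[("SD", 2020)]`.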