Pandas Pt 1 - Cont'd (UCSD) Flashcards
basic descriptive statistics functions for all columns of a dataframe df
df.describe(), df.corr(), df.mean(), df.median(), df.mode(), df.min(), df.max(), df.std()
use axis to get statistical values for the columns or the rows (mean as example)
df.mean(axis=1)  ## axis=0 ('index', the default): one mean per column; axis=1 ('columns'): one mean per row
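A small sketch of the axis argument, using a made-up two-column dataframe (the data and the names col_means and row_means are illustrative, not from the course):

```python
import pandas as pd

# Hypothetical toy data for illustration only
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# axis=0 collapses the rows: one mean per column
col_means = df.mean(axis=0)   # a -> 2.0, b -> 5.0

# axis=1 collapses the columns: one mean per row
row_means = df.mean(axis=1)   # 2.5, 3.5, 4.5
```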
check if any element in a dataframe or series is non-zero or non-empty (mostly useful with a boolean column)
df.any()
check if ALL elements in a dataframe or series are non-zero or non-empty (mostly useful with a boolean column)
df.all()
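A minimal sketch of any() versus all() on a dataframe, assuming a made-up boolean dataframe called flags. On a dataframe, both return one result per column by default:

```python
import pandas as pd

# Hypothetical boolean data for illustration only
flags = pd.DataFrame({
    "x": [True, False, True],
    "y": [True, True, True],
    "z": [False, False, False],
})

any_per_col = flags.any()   # x True, y True, z False
all_per_col = flags.all()   # x False, y True, z False
```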
use describe on a column called 'ratings' of a dataframe df
df['ratings'].describe()
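A quick sketch of what describe() returns for a single numeric column, using made-up values. The result is a Series indexed by statistic name (count, mean, std, min, 25%, 50%, 75%, max):

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({"ratings": [1.0, 2.0, 3.0, 4.0]})

summary = df["ratings"].describe()

# Individual statistics can be read by label
n = summary["count"]       # 4.0
avg = summary["mean"]      # 2.5
median = summary["50%"]    # 2.5
```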
Create a series with Boolean values if the 'rating' column from ratings df is > 5, then use any() to test for True on the series
filter_1 = ratings['rating'] > 5  ## a boolean Series: ratings['rating'] is a Series, so the comparison returns a Series
filter_1.any()
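A short check of the filter above on a made-up ratings dataframe (the values are illustrative). It confirms that the comparison yields a boolean Series and that any() collapses it to a single True/False:

```python
import pandas as pd

# Hypothetical data for illustration only
ratings = pd.DataFrame({"movieId": [1, 1, 2], "rating": [4.0, 5.0, 3.5]})

filter_1 = ratings["rating"] > 5

is_series = isinstance(filter_1, pd.Series)   # True: a Series, not a 1-column dataframe
has_high = filter_1.any()                      # False here: no rating is above 5
```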
Get the mean value from the 'rating' col for all rows that have movieId of 1, from dataframe ratings
ratings['rating'][ratings.movieId == 1].mean()
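A sketch of the same lookup on made-up data, shown two ways. The .loc form is an alternative spelling, not something the flashcard itself uses. It selects the rows and the column in one step and should give the same result:

```python
import pandas as pd

# Hypothetical data for illustration only
ratings = pd.DataFrame({"movieId": [1, 1, 2], "rating": [4.0, 5.0, 3.5]})

# Form used on the card: pick the column, then filter it with a boolean mask
mean_chained = ratings["rating"][ratings.movieId == 1].mean()

# Alternative: row mask and column label together in .loc
mean_loc = ratings.loc[ratings["movieId"] == 1, "rating"].mean()
```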