Analysis Flashcards
Get summary statistics of a variable
df['var'].describe()
Get the sum of revenue
df['revenue'].sum()
Get the average
df['var'].mean()
Get the minimum / maximum
df['var'].min()
df['var'].max()
Get the standard deviation
df['var'].std()
Get the average of multiple columns
df[['var1', 'var2']].mean()
Get both sum and mean from a variable
df['var'].agg(['sum', 'mean'])
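The single-column cards above can be tried together on a small frame. This is a minimal sketch: the data and the column names revenue and cost are made up for illustration. Note that std() returns the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

# Hypothetical sample data; column names are placeholders.
df = pd.DataFrame({
    "revenue": [100, 200, 300, 400],
    "cost": [40, 80, 120, 160],
})

total = df["revenue"].sum()                  # sum of one column
average = df["revenue"].mean()               # mean of one column
col_means = df[["revenue", "cost"]].mean()   # Series indexed by column name
both = df["revenue"].agg(["sum", "mean"])    # Series indexed by 'sum' and 'mean'

sample_std = df["revenue"].std()             # divides by n - 1 (default ddof=1)
population_std = df["revenue"].std(ddof=0)   # divides by n

print(both)
```

Passing the function names as strings avoids relying on a bare name such as mean, which is not defined in plain Python.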
Count per value of a variable
df['var'].value_counts()
The result is already sorted by count in descending order (sort = True is the default); pass ascending = True to reverse it
Count per value of a variable (but in percentage)
df['var'].value_counts(normalize = True)
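A short sketch of the two counting cards, using a made-up category column. With normalize=True the values are fractions that sum to 1, so multiply by 100 for percentages.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"category": ["a", "b", "a", "c", "a", "b"]})

# Counts per value, sorted by count in descending order by default.
counts = df["category"].value_counts()

# Same breakdown as fractions of the total.
shares = df["category"].value_counts(normalize=True)
percentages = shares * 100

print(counts)
print(percentages.round(1))
```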
Get the average per value of a variable
df.groupby('category')['var'].mean()
Get the average, min, max per value of a variable
df.groupby('category')['var'].agg(['mean', 'min', 'max'])
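The grouped cards can be sketched on a small frame as follows. The data is invented for illustration; category and var mirror the placeholder names used in the cards.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "category": ["a", "a", "b", "b", "b"],
    "var": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# One mean per category: a Series indexed by category.
means = df.groupby("category")["var"].mean()

# Several statistics per category: a DataFrame with one column per statistic.
stats = df.groupby("category")["var"].agg(["mean", "min", "max"])

print(stats)
```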
Create a pivot table output
df.pivot_table(values = 'var', index = 'rowheader')
Create a pivot table output for both mean and median
df.pivot_table(values = 'var', index = 'rowheader', aggfunc = ['mean', 'median'])
Create a pivot table output on both a row and a column variable
df.pivot_table(values = 'var', index = 'rowheader', columns = 'colheader')
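The three pivot table cards can be compared side by side in a minimal sketch. The data is invented, and the column name colheader is a placeholder assumed here for the column variable. pivot_table aggregates with the mean unless aggfunc says otherwise.

```python
import pandas as pd

# Hypothetical sample data; rowheader and colheader are placeholder names.
df = pd.DataFrame({
    "rowheader": ["x", "x", "y", "y"],
    "colheader": ["p", "q", "p", "q"],
    "var": [1.0, 2.0, 3.0, 5.0],
})

# Mean of var per row value (mean is the default aggregation).
by_row = df.pivot_table(values="var", index="rowheader")

# Mean and median side by side; the columns become (aggfunc, value) pairs.
both = df.pivot_table(values="var", index="rowheader",
                      aggfunc=["mean", "median"])

# Row variable down the side, column variable across the top.
cross = df.pivot_table(values="var", index="rowheader", columns="colheader")

print(cross)
```

Passing the aggregation names as strings keeps the example free of a NumPy import.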