Analysis Flashcards

Question 1

Q

Get summary statistics of a variable

Answer

A

df['var'].describe()
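
A minimal sketch of what this returns, assuming a small made-up DataFrame (the data and the column name 'var' are placeholders, not taken from any real dataset):

```python
import pandas as pd

# Hypothetical data; 'var' is only a placeholder column name.
df = pd.DataFrame({"var": [1, 2, 3, 4, 5]})

# describe() returns a Series with count, mean, std, min, quartiles and max.
summary = df["var"].describe()
print(summary)
```

The individual statistics can be read by label, for example summary["mean"] or summary["50%"].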

Question 2

Q

Get the sum of revenue

Answer

A

df['revenue'].sum()

Question 3

Q

Get the average

Answer

A

df['var'].mean()

Question 4

Q

Get the minimum / maximum

Answer

A

df['var'].min()

df['var'].max()

Question 5

Q

Get the standard deviation

Answer

A

df['var'].std()

Question 6

Q

Get the average of multiple columns

Answer

A

df[['var1', 'var2']].mean()

Question 7

Q

Get both sum and mean from a variable

Answer

A

df['var'].agg(['sum', 'mean'])
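
A small sketch of aggregating with several functions at once, assuming made-up data. The function names are passed as strings here because a bare name such as mean is not defined in plain Python:

```python
import pandas as pd

# Hypothetical data; 'var' is only a placeholder column name.
df = pd.DataFrame({"var": [10, 20, 30]})

# Passing a list of function names returns one value per function,
# indexed by the function name.
result = df["var"].agg(["sum", "mean"])
print(result)
```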

Question 8

Q

Count per value of a variable

Answer

A

df['var'].value_counts()

The result is already sorted by count in descending order (sort=True is the default); pass ascending=True to reverse the order
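
A quick sketch of the sort order, assuming a made-up column of category labels:

```python
import pandas as pd

# Hypothetical data; 'var' is only a placeholder column name.
df = pd.DataFrame({"var": ["a", "b", "a", "c", "a", "b"]})

# Default behavior: most frequent value first.
counts = df["var"].value_counts()

# ascending=True flips the order: least frequent value first.
counts_ascending = df["var"].value_counts(ascending=True)

print(counts)
print(counts_ascending)
```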

Question 9

Q

Count per value of a variable (as a proportion of the total)

Answer

A

df['var'].value_counts(normalize=True)
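
A short sketch, assuming made-up data, showing that normalize=True returns fractions between 0 and 1; multiplying by 100 is one way to turn them into percentages:

```python
import pandas as pd

# Hypothetical data; 'var' is only a placeholder column name.
df = pd.DataFrame({"var": ["a", "a", "a", "b"]})

# Fractions of the total row count per value.
shares = df["var"].value_counts(normalize=True)

# Convert the fractions to percentages.
percentages = shares * 100

print(shares)
print(percentages)
```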

Question 10

Q

Get the average per value of a variable

Answer

A

df.groupby('category')['var'].mean()

Question 11

Q

Get the average, min, max per value of a variable

Answer

A

df.groupby('category')['var'].agg(['mean', 'min', 'max'])
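
A minimal sketch of a grouped aggregation, assuming a made-up DataFrame with placeholder columns 'category' and 'var'. The function names are given as strings, since a bare mean is not a defined name:

```python
import pandas as pd

# Hypothetical data; both column names are placeholders.
df = pd.DataFrame({
    "category": ["x", "x", "y", "y"],
    "var": [1, 3, 10, 20],
})

# One row per category, one column per aggregation function.
stats = df.groupby("category")["var"].agg(["mean", "min", "max"])
print(stats)
```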

Question 12

Q

Create a pivot table output

Answer

A

df.pivot_table(values='var', index='rowheader')

Question 13

Q

Create a pivot table output for both mean and median

Answer

A

df.pivot_table(values='var', index='rowheader', aggfunc=['mean', 'median'])
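
A small sketch of a pivot table with two aggregations, assuming made-up data. String names are used for the functions so that NumPy does not need to be imported; recent pandas versions also warn when np.mean is passed directly:

```python
import pandas as pd

# Hypothetical data; 'rowheader' and 'var' are placeholder column names.
df = pd.DataFrame({
    "rowheader": ["a", "a", "a", "b", "b"],
    "var": [1, 2, 9, 4, 6],
})

# One row per 'rowheader' value; one column per aggregation function.
table = df.pivot_table(values="var", index="rowheader",
                       aggfunc=["mean", "median"])
print(table)
```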

Question 14

Q

Create a pivot table output on both a row and a column variable

Answer

A

df.pivot_table(values='var', index='rowvar', columns='colvar')
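
A minimal sketch of a two-way pivot table, assuming made-up data. The column names 'region' and 'product' are hypothetical stand-ins for the row and column variables, and the default aggregation (the mean) is used:

```python
import pandas as pd

# Hypothetical data; all column names are placeholders.
df = pd.DataFrame({
    "region": ["north", "north", "north", "south", "south"],
    "product": ["A", "A", "B", "A", "B"],
    "var": [1, 3, 2, 5, 4],
})

# Rows come from 'region', columns from 'product'; each cell holds
# the mean of 'var' for that combination.
table = df.pivot_table(values="var", index="region", columns="product")
print(table)
```

The row and column variables should be two different columns, otherwise the table has nothing to cross-tabulate.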