Summary Statistics Flashcards

Question 1

Q

Explore your new DataFrame first by printing the first few rows of the sales DataFrame.

Answer

A

print(sales.head())

Question 2

Q

Print the mean of the ‘weekly_sales’ column.

Answer

A

print(sales['weekly_sales'].mean())

Question 3

Q

Print information about the columns in the df, sales.

Answer

A

sales.info()  # info() prints its summary itself and returns None
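A quick check of why info() does not need print(). It writes its summary to a text buffer (stdout by default) and returns None. The tiny sales DataFrame here is hypothetical, made up only for illustration.

```python
import io
import pandas as pd

# Hypothetical miniature version of the sales DataFrame
sales = pd.DataFrame({
    "store": [1, 1, 2],
    "weekly_sales": [100.0, 250.0, 175.0],
})

# Send the summary to a string buffer instead of stdout so it can be inspected
buffer = io.StringIO()
result = sales.info(buf=buffer)
summary = buffer.getvalue()

print(summary)
print(result)  # None, which is what print(sales.info()) would add to the output
```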

Question 4

Q

Print the minimum of the ‘date’ column from the sales df.

Answer

A

print(sales['date'].min())

Question 5

Q

Print the maximum of the ‘date’ column from the sales df.

Answer

A

print(sales['date'].max())
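min() and max() on a date column only give true earliest and latest dates if the column holds datetime values. A sketch, assuming the dates arrived as US-style text (hypothetical values): as strings they compare character by character, so parsing with pd.to_datetime first matters.

```python
import pandas as pd

# Hypothetical dates stored as text, e.g. after reading a CSV
sales = pd.DataFrame({"date": ["12/03/2010", "01/15/2012", "06/17/2011"]})

# As strings, min() compares characters, so "01/15/2012" comes first
text_min = sales["date"].min()

# Parse to datetime64 so min() and max() compare actual dates
sales["date"] = pd.to_datetime(sales["date"], format="%m/%d/%Y")
date_min = sales["date"].min()
date_max = sales["date"].max()

print(text_min)  # 01/15/2012
print(date_min)  # 2010-12-03 00:00:00
print(date_max)  # 2012-01-15 00:00:00
```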

Question 6

Q

Print the median of the ‘weekly_sales’ column.

Answer

A

print(sales['weekly_sales'].median())
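Why look at the median as well as the mean: one extreme week pulls the mean up but barely moves the median. The weekly figures below are hypothetical.

```python
import pandas as pd

# Hypothetical weekly sales with one unusually large week
sales = pd.DataFrame({"weekly_sales": [100.0, 120.0, 110.0, 130.0, 1000.0]})

print(sales["weekly_sales"].mean())    # 292.0
print(sales["weekly_sales"].median())  # 120.0
```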

Question 7

Q

What does IQR mean?

Answer

A

Interquartile range: the 75th percentile minus the 25th percentile.
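The IQR can be computed in pandas with Series.quantile, here on a hypothetical set of weekly sales. quantile uses linear interpolation by default, so other tools with different interpolation rules may give slightly different values on other data.

```python
import pandas as pd

# Hypothetical weekly sales figures
weekly_sales = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

# IQR = 75th percentile minus 25th percentile
q1 = weekly_sales.quantile(0.25)
q3 = weekly_sales.quantile(0.75)
iqr = q3 - q1

print(q1, q3, iqr)  # 20.0 40.0 20.0
```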

Question 8

Q

Sort the rows of sales_1_1 by the date column in ascending order.

Answer

A

sales_1_1 = sales_1_1.sort_values('date')

Question 9

Q

Get the cumulative sum of ‘weekly_sales’ and add it as a new column of sales_1_1 called cum_weekly_sales.

Answer

A

sales_1_1['cum_weekly_sales'] = sales_1_1['weekly_sales'].cumsum()
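cumsum() accumulates in the current row order, which is why the rows are sorted by date first. A sketch with a hypothetical sales_1_1 whose rows start out of order:

```python
import pandas as pd

# Hypothetical sales for one store/department, deliberately out of date order
sales_1_1 = pd.DataFrame({
    "date": pd.to_datetime(["2010-02-19", "2010-02-05", "2010-02-12"]),
    "weekly_sales": [300.0, 100.0, 200.0],
})

# Sort first so the running total builds up in date order
sales_1_1 = sales_1_1.sort_values("date")
sales_1_1["cum_weekly_sales"] = sales_1_1["weekly_sales"].cumsum()

print(sales_1_1["cum_weekly_sales"].tolist())  # [100.0, 300.0, 600.0]
```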

Question 10

Q

Get the cumulative maximum of ‘weekly_sales’, and add it as a column called cum_max_sales.

Answer

A

sales_1_1['cum_max_sales'] = sales_1_1['weekly_sales'].cummax()
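cummax() keeps the largest value seen so far, so the new column never decreases. The weekly figures below are hypothetical and assumed to be in date order already.

```python
import pandas as pd

# Hypothetical weekly sales, already in date order
sales_1_1 = pd.DataFrame({"weekly_sales": [150.0, 90.0, 200.0, 180.0]})

# Each entry is the highest weekly_sales value up to and including that row
sales_1_1["cum_max_sales"] = sales_1_1["weekly_sales"].cummax()

print(sales_1_1["cum_max_sales"].tolist())  # [150.0, 150.0, 200.0, 200.0]
```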

Question 11

Q

Remove rows from the sales df with duplicate pairs of ‘store’ and ‘type’, save the result as store_types, and print the head.

Answer

A

store_types = sales.drop_duplicates(subset=['store', 'type'])
print(store_types.head())
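drop_duplicates(subset=...) treats rows as duplicates when they match on the listed columns only, and by default keeps the first such row. A sketch with hypothetical rows where store 1 appears twice:

```python
import pandas as pd

# Hypothetical rows: store 1 has two weeks of data with the same type
sales = pd.DataFrame({
    "store": [1, 1, 2, 3],
    "type": ["A", "A", "B", "A"],
    "weekly_sales": [100.0, 150.0, 80.0, 120.0],
})

# Keep only the first row for each (store, type) pair
store_types = sales.drop_duplicates(subset=["store", "type"])

print(store_types)
```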

Question 12

Q

Count the number of stores of each store ‘type’ in the df, store_types, and call it store_counts.

Answer

A

store_counts = store_types['type'].value_counts()

Question 13

Q

Count the proportion of stores of each store type in store_types.

Answer

A

store_props = store_types['type'].value_counts(normalize=True)
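Counts and proportions side by side: normalize=True divides each count by the total number of rows, so the proportions sum to 1. The one-row-per-store table here is hypothetical.

```python
import pandas as pd

# Hypothetical one-row-per-store table
store_types = pd.DataFrame({
    "store": [1, 2, 3, 4],
    "type": ["A", "B", "A", "A"],
})

store_counts = store_types["type"].value_counts()
store_props = store_types["type"].value_counts(normalize=True)

print(store_counts.to_dict())  # {'A': 3, 'B': 1}
print(store_props.to_dict())   # {'A': 0.75, 'B': 0.25}
```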

Question 14

Q

Count how many times each department appears in the ‘department’ column of the df, store_depts, sorting the counts in descending order.

Answer

A

dept_counts_sorted = store_depts['department'].value_counts(sort=True)

Question 15

Q

Count the proportion of different departments (from column ‘department’) in the df, store_depts, sorting the proportions in descending order.

Answer

A

dept_props_sorted = store_depts['department'].value_counts(sort=True, normalize=True)

Question 16

Q

Remove rows of the sales df with duplicate pairs of ‘store’ and ‘department’, save the result as store_depts, and print the head.

Answer

A

store_depts = sales.drop_duplicates(subset=['store', 'department'])
print(store_depts.head())

Question 17

Q

Subset the rows that are holiday weeks using the is_holiday column, drop rows with duplicate ‘date’ values, and save the result as holiday_dates.

Answer

A

holiday_dates = sales[sales['is_holiday']].drop_duplicates(subset=['date'])
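Because is_holiday is already a boolean column, it can be used directly as a row mask; drop_duplicates on 'date' then leaves one row per holiday week. The stores and dates below are hypothetical.

```python
import pandas as pd

# Hypothetical rows: two stores report the same holiday week
sales = pd.DataFrame({
    "store": [1, 2, 1, 2],
    "date": pd.to_datetime(["2010-02-12", "2010-02-12", "2010-02-19", "2010-09-10"]),
    "is_holiday": [True, True, False, True],
})

# Boolean mask keeps holiday rows; then keep one row per distinct date
holiday_dates = sales[sales["is_holiday"]].drop_duplicates(subset=["date"])

print(holiday_dates["date"].tolist())
```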

Question 18

Q

Count the number of stores of each store “type” in store_types.

Answer

A

store_counts = store_types['type'].value_counts()

Question 19

Q

Count the proportion of stores of each store ‘type’ in store_types.

Answer

A

store_props = store_types['type'].value_counts(normalize=True)

Question 20

Q

Count how many times each department appears in the ‘department’ column of store_depts, sorting the counts in descending order.

Answer

A

dept_counts_sorted = store_depts['department'].value_counts(sort=True)

Question 21

Q

Count the proportion of different departments in store_depts, sorting the proportions in descending order.

Answer

A

dept_props_sorted = store_depts['department'].value_counts(sort=True, normalize=True)

Question 22

Q

Calculate the total weekly_sales over the whole dataset (df=sales)

Answer

A

sales_all = sales["weekly_sales"].sum()

Question 23

Q

Subset the sales df for ‘type’ “A” stores and calculate their total weekly sales.

Answer

A

sales_A = sales[sales["type"] == "A"]["weekly_sales"].sum()
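Putting the last two cards together: the overall total and the type A total can be divided to get the share of sales from type A stores. The data is hypothetical, and share_A is an illustrative name, not part of the original cards.

```python
import pandas as pd

# Hypothetical sales across three store types
sales = pd.DataFrame({
    "type": ["A", "B", "A", "C"],
    "weekly_sales": [300.0, 100.0, 500.0, 100.0],
})

sales_all = sales["weekly_sales"].sum()
sales_A = sales[sales["type"] == "A"]["weekly_sales"].sum()

# Fraction of all weekly sales that came from type A stores
share_A = sales_A / sales_all

print(sales_all, sales_A, share_A)
```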