Summary Statistics Flashcards

1
Q

Explore your new DataFrame first by printing the first few rows of the sales DataFrame.

A

print(sales.head())

2
Q

Print the mean of the ‘weekly_sales’ column.

A

print(sales['weekly_sales'].mean())

2
Q

Print information about the columns in the df, sales.

A

sales.info()

2
Q

Print the minimum of the ‘date’ column of the sales df.

A

print(sales['date'].min())

3
Q

Print the maximum of the ‘date’ column of the sales df.

A

print(sales['date'].max())

3
Q

Print the median of the ‘weekly_sales’ column.

A

print(sales['weekly_sales'].median())

4
Q

What does IQR mean?

A

Inter-quartile range, which is the 75th percentile minus the 25th percentile
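
As a rough illustration of that definition, the sketch below computes an IQR with pandas. The toy Series is a hypothetical stand-in, since the real sales data is not shown in these cards, and the result depends on pandas' default linear interpolation for quantiles.

```python
import pandas as pd

# Hypothetical toy data standing in for a 'weekly_sales' column.
weekly_sales = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

q1 = weekly_sales.quantile(0.25)  # 25th percentile
q3 = weekly_sales.quantile(0.75)  # 75th percentile
iqr = q3 - q1                     # inter-quartile range

print(iqr)
```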

5
Q

Sort the rows of sales_1_1 by the date column in ascending order.

A

sales_1_1 = sales_1_1.sort_values('date')

6
Q

Get the cumulative sum of ‘weekly_sales’ and add it as a new column of sales_1_1 called cum_weekly_sales.

A

sales_1_1['cum_weekly_sales'] = sales_1_1['weekly_sales'].cumsum()

7
Q

Get the cumulative maximum of ‘weekly_sales’, and add it as a column called cum_max_sales.

A

sales_1_1['cum_max_sales'] = sales_1_1['weekly_sales'].cummax()
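
To see how sorting, cumsum() and cummax() fit together, here is a minimal sketch on made-up data. The sales_demo frame and its three rows are assumptions for illustration only; they are not the course's sales_1_1 data.

```python
import pandas as pd

# Hypothetical three-week sample, deliberately out of date order.
sales_demo = pd.DataFrame({
    "date": pd.to_datetime(["2010-02-19", "2010-02-05", "2010-02-12"]),
    "weekly_sales": [250.0, 100.0, 300.0],
})

# Sort by date first so the running totals follow time order.
sales_demo = sales_demo.sort_values("date")

# Running total and running maximum of weekly sales.
sales_demo["cum_weekly_sales"] = sales_demo["weekly_sales"].cumsum()
sales_demo["cum_max_sales"] = sales_demo["weekly_sales"].cummax()

print(sales_demo[["date", "weekly_sales", "cum_weekly_sales", "cum_max_sales"]])
```

In this sample the running maximum stays at 300.0 in the last week, because that week's sales (250.0) do not exceed the earlier peak.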

8
Q

Remove rows from the sales df with duplicate pairs of ‘store’ and ‘type’, save the result as store_types, and print the head.

A

store_types = sales.drop_duplicates(subset=['store', 'type'])
print(store_types.head())
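
The effect of the subset argument is easier to see on a small frame. The sketch below uses a hypothetical four-row DataFrame (demo) rather than the real sales data, and relies on the default keep='first' behaviour of drop_duplicates.

```python
import pandas as pd

# Hypothetical miniature version of the sales data.
demo = pd.DataFrame({
    "store": [1, 1, 2, 2],
    "type": ["A", "A", "B", "B"],
    "department": [1, 2, 1, 1],
})

# One row per unique (store, type) pair; the first occurrence is kept.
store_types_demo = demo.drop_duplicates(subset=["store", "type"])

# One row per unique (store, department) pair.
store_depts_demo = demo.drop_duplicates(subset=["store", "department"])

print(store_types_demo)
print(store_depts_demo)
```

Only the columns listed in subset are compared, so rows that differ elsewhere can still count as duplicates.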

9
Q

Count the number of stores of each store ‘type’ in the df store_types, and save the result as store_counts.

A

store_counts = store_types['type'].value_counts()

10
Q

Count the proportion of stores of each store type in store_types.

A

store_props = store_types['type'].value_counts(normalize=True)

11
Q

Count the occurrences of each ‘department’ value in the df store_depts, sorting the counts in descending order.

A

dept_counts_sorted = store_depts['department'].value_counts(sort=True)

12
Q

Count the proportion of different departments (from column ‘department’) in the df, store_depts, sorting the proportions in descending order.

A

dept_props_sorted = store_depts['department'].value_counts(sort=True, normalize=True)
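
For comparison, counts and proportions can be checked side by side on a tiny example. The types Series below is hypothetical. Note that sort=True is already the default for value_counts(), so spelling it out mainly documents the intent.

```python
import pandas as pd

# Hypothetical store types, one entry per store.
types = pd.Series(["A", "A", "B", "A"])

# Absolute counts, most frequent value first.
counts = types.value_counts(sort=True)

# Same ordering, but each count is divided by the total number of rows.
props = types.value_counts(sort=True, normalize=True)

print(counts)
print(props)
```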

13
Q

Remove rows of the sales df with duplicate pairs of ‘store’ and ‘department’, save the result as store_depts, and print the head.

A

store_depts = sales.drop_duplicates(subset=['store', 'department'])
print(store_depts.head())

14
Q

Subset the rows that are holiday weeks using the is_holiday column, and drop the duplicate “dates”, saving as holiday_dates.

A

holiday_dates = sales[sales['is_holiday']].drop_duplicates(subset=['date'])

15
Q

Count the number of stores of each store “type” in store_types.

A

store_counts = store_types['type'].value_counts()

16
Q

Count the proportion of stores of each store ‘type’ in store_types.

A

store_props = store_types['type'].value_counts(normalize=True)

17
Q

Count the number of different ‘departments’ in store_depts, sorting the counts in descending order.

A

dept_counts_sorted = store_depts['department'].value_counts(sort=True)

18
Q

Count the proportion of different departments in store_depts, sorting the proportions in descending order.

A

dept_props_sorted = store_depts['department'].value_counts(sort=True, normalize=True)

19
Q

Calculate the total weekly_sales over the whole dataset (the sales df).

A

sales_all = sales["weekly_sales"].sum()

20
Q

Subset for ‘type’ “A” stores, and calculate their total weekly sales (from sales df).

A

sales_A = sales[sales["type"] == "A"]["weekly_sales"].sum()
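
The subset-then-sum pattern can be sketched on toy data as follows. The demo frame, the is_type_a mask and the share_a ratio are illustrative assumptions and do not come from the cards. Using .loc with a boolean mask is one common alternative to the chained indexing shown in the answer, and both give the same total here.

```python
import pandas as pd

# Hypothetical sample with two type "A" rows and one type "B" row.
demo = pd.DataFrame({
    "type": ["A", "B", "A"],
    "weekly_sales": [100.0, 50.0, 150.0],
})

# Total over every row.
sales_all_demo = demo["weekly_sales"].sum()

# Boolean mask selecting type "A" rows, then sum only those rows.
is_type_a = demo["type"] == "A"
sales_a_demo = demo.loc[is_type_a, "weekly_sales"].sum()

# Share of total sales that comes from type "A" stores.
share_a = sales_a_demo / sales_all_demo

print(sales_all_demo, sales_a_demo, share_a)
```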