Summary Statistics Flashcards
Explore your new DataFrame first by printing the first few rows of the sales DataFrame.
print(sales.head())
Print the mean of the ‘weekly_sales’ column.
print(sales['weekly_sales'].mean())
Print information about the columns in the df, sales.
sales.info()  # info() prints its report itself and returns None
Print the minimum of the ‘date’ column from the sales df.
print(sales['date'].min())
Print the maximum of the ‘date’ column from the sales df.
print(sales['date'].max())
Print the median of the ‘weekly_sales’ column.
print(sales['weekly_sales'].median())
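The summary cards above can be tried on a small stand-in table. This is only a sketch: the real sales data is not shown in these cards, so the four rows below are made up, and mean_sales, median_sales, first_date and last_date are illustrative names.

```python
import pandas as pd

# Hypothetical toy data; the real sales DataFrame is not shown in these cards.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2010-02-05", "2010-02-12", "2010-02-19", "2010-02-26"]),
    "weekly_sales": [10.0, 20.0, 30.0, 1000.0],
})

mean_sales = sales["weekly_sales"].mean()      # pulled upward by the 1000.0 outlier
median_sales = sales["weekly_sales"].median()  # midpoint of the sorted values
first_date = sales["date"].min()               # earliest date
last_date = sales["date"].max()                # latest date

print(mean_sales, median_sales)
print(first_date, last_date)
```

With one large value in the column, the mean and the median land far apart, which is a good reason to check both.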
What does IQR mean?
Interquartile range: the 75th percentile minus the 25th percentile.
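One way to compute the IQR in pandas is with quantile(). A minimal sketch, assuming a made-up weekly_sales column; q1, q3 and iqr are illustrative names.

```python
import pandas as pd

# Hypothetical toy data standing in for the sales DataFrame.
sales = pd.DataFrame({"weekly_sales": [10.0, 20.0, 30.0, 40.0, 50.0]})

q1 = sales["weekly_sales"].quantile(0.25)  # 25th percentile
q3 = sales["weekly_sales"].quantile(0.75)  # 75th percentile
iqr = q3 - q1                              # interquartile range

print(iqr)
```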
Sort the rows of sales_1_1 by the date column in ascending order.
sales_1_1 = sales_1_1.sort_values('date')
Get the cumulative sum of ‘weekly_sales’ and add it as a new column of sales_1_1 called cum_weekly_sales.
sales_1_1['cum_weekly_sales'] = sales_1_1['weekly_sales'].cumsum()
Get the cumulative maximum of ‘weekly_sales’, and add it as a column called cum_max_sales.
sales_1_1['cum_max_sales'] = sales_1_1['weekly_sales'].cummax()
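The sort and cumulative cards can be run together on a tiny table. The three rows below are invented for illustration; only the column names come from the cards.

```python
import pandas as pd

# Hypothetical rows, deliberately out of date order.
sales_1_1 = pd.DataFrame({
    "date": pd.to_datetime(["2010-02-19", "2010-02-05", "2010-02-12"]),
    "weekly_sales": [250.0, 100.0, 300.0],
})

# Sort first, so the running totals follow calendar order.
sales_1_1 = sales_1_1.sort_values("date")

sales_1_1["cum_weekly_sales"] = sales_1_1["weekly_sales"].cumsum()  # running total
sales_1_1["cum_max_sales"] = sales_1_1["weekly_sales"].cummax()     # running maximum

print(sales_1_1)
```

Sorting before cumsum() and cummax() matters because both work row by row in the DataFrame's current order.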
Remove rows from the sales df with duplicate pairs of ‘store’ and ‘type’, save the result as store_types, and print the head.
store_types = sales.drop_duplicates(subset=['store', 'type'])
print(store_types.head())
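A small sketch of what drop_duplicates(subset=...) keeps, using invented rows. By default pandas keeps the first row of each duplicated pair and preserves the original index labels.

```python
import pandas as pd

# Hypothetical rows: stores 1 and 2 each appear twice with the same type.
sales = pd.DataFrame({
    "store": [1, 1, 2, 2, 3],
    "type": ["A", "A", "B", "B", "A"],
    "department": [1, 2, 1, 2, 1],
})

# Only 'store' and 'type' are compared; 'department' is ignored for the comparison.
store_types = sales.drop_duplicates(subset=["store", "type"])

print(store_types)
```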
Count the number of stores of each store ‘type’ in the df, store_types, and call it store_counts
store_counts = store_types['type'].value_counts()
Count the proportion of stores of each store type in store_types.
store_props = store_types['type'].value_counts(normalize=True)
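Counts and proportions side by side, on a made-up store_types table with four stores. value_counts() sorts in descending order by default, and normalize=True divides each count by the total.

```python
import pandas as pd

# Hypothetical de-duplicated table: one row per store.
store_types = pd.DataFrame({
    "store": [1, 2, 3, 4],
    "type": ["A", "B", "A", "A"],
})

store_counts = store_types["type"].value_counts()               # A: 3, B: 1
store_props = store_types["type"].value_counts(normalize=True)  # A: 0.75, B: 0.25

print(store_counts)
print(store_props)
```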
Count the number of rows for each value of the ‘department’ column in the df store_depts, sorting the counts in descending order.
dept_counts_sorted = store_depts['department'].value_counts(sort=True)
Count the proportion of each department (from the ‘department’ column) in the df store_depts, sorting the proportions in descending order.
dept_props_sorted = store_depts['department'].value_counts(sort=True, normalize=True)
Remove rows of the sales df with duplicate pairs of ‘store’ and ‘department’, save the result as store_depts, and print the head.
store_depts = sales.drop_duplicates(subset=['store', 'department'])
print(store_depts.head())
Subset the rows that are holiday weeks using the is_holiday column, and drop the duplicate “dates”, saving as holiday_dates.
holiday_dates = sales[sales['is_holiday']].drop_duplicates(subset=['date'])
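The holiday card chains a Boolean filter with drop_duplicates. A sketch on invented rows, assuming is_holiday is already a True/False column as the card implies:

```python
import pandas as pd

# Hypothetical rows: two stores report the same two weeks, one of them a holiday week.
sales = pd.DataFrame({
    "store": [1, 2, 1, 2],
    "date": pd.to_datetime(["2010-02-05", "2010-02-05", "2010-02-12", "2010-02-12"]),
    "is_holiday": [False, False, True, True],
})

# Keep the holiday rows, then keep one row per distinct date.
holiday_dates = sales[sales["is_holiday"]].drop_duplicates(subset=["date"])

print(holiday_dates["date"])
```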
Count the number of stores of each store “type” in store_types.
store_counts = store_types['type'].value_counts()
Count the proportion of stores of each store ‘type’ in store_types.
store_props = store_types['type'].value_counts(normalize=True)
Count the number of different ‘departments’ in store_depts, sorting the counts in descending order.
dept_counts_sorted = store_depts['department'].value_counts(sort=True)
Count the proportion of different departments in store_depts, sorting the proportions in descending order.
dept_props_sorted = store_depts['department'].value_counts(sort=True, normalize=True)
Calculate the total weekly_sales over the whole dataset (df=sales)
sales_all = sales["weekly_sales"].sum()
Subset for ‘type’ “A” stores, and calculate their total weekly sales (from sales df).
sales_A = sales[sales["type"] == "A"]["weekly_sales"].sum()
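The two totals can be checked on a tiny invented table. The final line, share_A, is not part of the cards; it is an assumed follow-up showing one thing the two totals can be used for.

```python
import pandas as pd

# Hypothetical rows standing in for the sales DataFrame.
sales = pd.DataFrame({
    "type": ["A", "A", "B"],
    "weekly_sales": [100.0, 300.0, 100.0],
})

sales_all = sales["weekly_sales"].sum()                      # total over every row
sales_A = sales[sales["type"] == "A"]["weekly_sales"].sum()  # total for type A rows only

# Assumed extra step: the share of all sales that came from type A stores.
share_A = sales_A / sales_all

print(sales_all, sales_A, share_A)
```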