Summary Statistics Flashcards
Explore your new DataFrame first by printing the first few rows of the sales DataFrame.
print(sales.head())
Print the mean of the ‘weekly_sales’ column.
print(sales['weekly_sales'].mean())
Print information about the columns in the df, sales.
sales.info()
Print the minimum of the ‘date’ column from the sales df.
print(sales['date'].min())
Print the maximum of the ‘date’ column from the sales df.
print(sales['date'].max())
Print the median of the ‘weekly_sales’ column.
print(sales['weekly_sales'].median())
What does IQR mean?
Inter-quartile range, which is the 75th percentile minus the 25th percentile.
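A minimal sketch of the IQR calculation, assuming a toy sales DataFrame invented here for illustration (the values are not from the real dataset). It takes the 75th and 25th percentiles with quantile() and subtracts them.

```python
import pandas as pd

# Hypothetical toy data standing in for the sales DataFrame
sales = pd.DataFrame({"weekly_sales": [10.0, 20.0, 30.0, 40.0, 50.0]})

# 25th and 75th percentiles of the column
q1 = sales["weekly_sales"].quantile(0.25)
q3 = sales["weekly_sales"].quantile(0.75)

# IQR is the 75th percentile minus the 25th percentile
iqr = q3 - q1
print(iqr)  # 20.0
```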
Sort the rows of sales_1_1 by the date column in ascending order.
sales_1_1 = sales_1_1.sort_values('date')
Get the cumulative sum of ‘weekly_sales’ and add it as a new column of sales_1_1 called cum_weekly_sales.
sales_1_1['cum_weekly_sales'] = sales_1_1['weekly_sales'].cumsum()
Get the cumulative maximum of ‘weekly_sales’, and add it as a column called cum_max_sales.
sales_1_1['cum_max_sales'] = sales_1_1['weekly_sales'].cummax()
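To see why sorting comes before the cumulative statistics, here is a small sketch using made-up rows for sales_1_1 (the dates and values are assumptions, not real data). The running total and running maximum both follow row order, so the rows need to be in date order first.

```python
import pandas as pd

# Hypothetical rows, deliberately out of date order
sales_1_1 = pd.DataFrame({
    "date": pd.to_datetime(["2010-02-19", "2010-02-05", "2010-02-12"]),
    "weekly_sales": [300.0, 250.0, 100.0],
})

# Sort by date so the cumulative values run forward in time
sales_1_1 = sales_1_1.sort_values("date")

# Running total and running maximum of weekly sales
sales_1_1["cum_weekly_sales"] = sales_1_1["weekly_sales"].cumsum()
sales_1_1["cum_max_sales"] = sales_1_1["weekly_sales"].cummax()

print(sales_1_1[["date", "weekly_sales", "cum_weekly_sales", "cum_max_sales"]])
```

With these values the running total is 250, 350, 650 and the running maximum is 250, 250, 300.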
Remove rows from the sales df with duplicate pairs of ‘store’ and ‘type’, save the result as store_types, and print the head.
store_types = sales.drop_duplicates(subset=['store', 'type'])
print(store_types.head())
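A short sketch of what drop_duplicates does with a subset, assuming a tiny invented sales DataFrame. By default the first row of each (store, type) pair is kept and later repeats are dropped, even if other columns differ.

```python
import pandas as pd

# Hypothetical data: stores 1 and 2 each appear twice
sales = pd.DataFrame({
    "store": [1, 1, 2, 2, 3],
    "type": ["A", "A", "B", "B", "A"],
    "weekly_sales": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Keep only the first row for each (store, type) pair
store_types = sales.drop_duplicates(subset=["store", "type"])
print(store_types)
```

Here the rows at index 0, 2, and 4 remain, one per store.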
Count the number of stores of each store ‘type’ in the df, store_types, and call it store_counts
store_counts = store_types['type'].value_counts()
Count the proportion of stores of each store type in store_types.
store_props = store_types['type'].value_counts(normalize=True)
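The difference between counts and proportions can be sketched with an invented store_types DataFrame (four hypothetical stores). Passing normalize=True divides each count by the total number of rows, so the proportions sum to 1.

```python
import pandas as pd

# Hypothetical data: three stores of type A, one of type B
store_types = pd.DataFrame({
    "store": [1, 2, 3, 4],
    "type": ["A", "A", "A", "B"],
})

# Raw counts per type, most frequent first
store_counts = store_types["type"].value_counts()

# Same counts expressed as fractions of the total
store_props = store_types["type"].value_counts(normalize=True)

print(store_counts)
print(store_props)
```

With this data the counts are A: 3, B: 1 and the proportions are A: 0.75, B: 0.25.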
Count how many times each ‘department’ appears in the df, store_depts, sorting the counts in descending order.
dept_counts_sorted = store_depts['department'].value_counts(sort=True)
Count the proportion of each department (from column ‘department’) in the df, store_depts, sorting the proportions in descending order.
dept_props_sorted = store_depts['department'].value_counts(sort=True, normalize=True)