Communicating Results Flashcards
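Summarizing descriptive statistics, plotting visualizations, drawing conclusions, and customizing visuals to communicate results
What is the .groupby() method?
It groups rows by the values in one or more columns so you can aggregate each group, for example with .mean() or .sum(). Passing numeric_only=True to the aggregation excludes columns that aren't numeric. Passing as_index=False to .groupby() keeps the grouping columns as regular columns instead of moving them into the index.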
df.groupby("column_name").mean(numeric_only=True)
or
df.groupby(["workclass", "race"], as_index=False)["capital-gain"].mean()
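A minimal sketch of both calls on made-up data. The rows below are hypothetical; only the column names workclass, race, and capital-gain come from the card.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the card's census example.
df = pd.DataFrame({
    "workclass": ["Private", "Private", "State-gov", "State-gov"],
    "race": ["White", "Black", "White", "White"],
    "capital-gain": [100, 300, 0, 200],
    "age": [30, 40, 50, 60],
})

# One grouping column: the group labels become the index,
# and numeric_only=True drops the non-numeric "race" column.
by_workclass = df.groupby("workclass").mean(numeric_only=True)

# Two grouping columns with as_index=False: the group labels
# stay as ordinary columns next to the aggregated value.
by_both = df.groupby(["workclass", "race"], as_index=False)["capital-gain"].mean()

print(by_workclass)
print(by_both)
```

Selecting ["capital-gain"] before .mean() limits the aggregation to that one column, which is why numeric_only is not needed in the second call.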
What is summation or .sum()?
It adds values down each column with .sum(axis=0), which is the default, or across each row with .sum(axis=1).
df_census[["capital-gain", "capital-loss"]].sum()
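A small sketch of the two axis options, assuming a df_census with the two columns named on the card. The values are invented for illustration.

```python
import pandas as pd

# Hypothetical values; only the column names come from the card.
df_census = pd.DataFrame({
    "capital-gain": [100, 0, 250],
    "capital-loss": [0, 50, 25],
})

cols = ["capital-gain", "capital-loss"]

# axis=0 (the default): one total per column.
col_totals = df_census[cols].sum()

# axis=1: one total per row, adding across the selected columns.
row_totals = df_census[cols].sum(axis=1)

print(col_totals)
print(row_totals)
```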
Visualize how to get the sum for each group using .groupby() and then sort the results in descending order
df.groupby(by="column").sum(numeric_only=True).sort_values(by="column2", ascending=False)
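The same chain on made-up data, as a sketch. Here workclass stands in for "column" and capital-gain for "column2"; both the stand-ins and the values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "workclass": ["Private", "State-gov", "Private", "Self-emp", "State-gov"],
    "race": ["White", "Black", "White", "White", "Black"],
    "capital-gain": [100, 500, 50, 900, 200],
})

totals = (
    df.groupby(by="workclass")          # one group per workclass value
      .sum(numeric_only=True)           # total the numeric columns per group
      .sort_values(by="capital-gain",   # order groups by their total
                   ascending=False)     # largest total first
)

print(totals)
```

Wrapping the chain in parentheses lets each step sit on its own line, which makes longer chains easier to read.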
What are the measures of center?
Mean = .mean()
Median = .median()
Mode = .mode()
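A quick sketch of the three methods on a pandas Series. The ages are invented for illustration.

```python
import pandas as pd

# Hypothetical values.
ages = pd.Series([25, 30, 30, 40, 75])

mean_age = ages.mean()      # sum of the values divided by their count
median_age = ages.median()  # middle value of the sorted data
mode_age = ages.mode()      # most frequent value(s), returned as a Series

print(mean_age, median_age, mode_age.tolist())
```

Note that .mean() and .median() return a single number, while .mode() returns a Series because more than one value can tie for most frequent.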
What is the mean?
It is the average: the sum of all values in a set divided by the number of values in the set.
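The definition can be checked by hand against pandas. This is a sketch with made-up numbers.

```python
import pandas as pd

# Hypothetical values.
values = [2, 4, 6, 8]

# Sum of the values divided by how many there are.
mean_by_hand = sum(values) / len(values)

# pandas gives the same result.
mean_pandas = pd.Series(values).mean()

print(mean_by_hand, mean_pandas)
```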
What is the median?
The middle value in a sorted set. Always sort the values first, then find the middle. If the set has an even number of values, the median is the average of the two middle values.
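A sketch of the sort-then-pick-the-middle procedure. The helper name median_by_hand is hypothetical, and it assumes a non-empty list of numbers.

```python
def median_by_hand(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)   # always sort first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_median = median_by_hand([7, 1, 5])       # sorted: 1, 5, 7
even_median = median_by_hand([7, 1, 5, 3])   # sorted: 1, 3, 5, 7

print(odd_median, even_median)
```

In practice, .median() on a pandas Series or DataFrame does the sorting for you.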
What is the mode?
It is the value with the highest frequency in a set. A set can have more than one mode when values tie for the highest frequency.
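A short sketch of .mode() with and without ties. The values are invented for illustration.

```python
import pandas as pd

# One clear winner: "Private" appears twice.
single_mode = pd.Series(["Private", "State-gov", "Private"]).mode()

# A tie: 1 and 2 both appear twice, so both are returned.
tied_modes = pd.Series([1, 1, 2, 2, 3]).mode()

print(single_mode.tolist())
print(tied_modes.tolist())
```

Because .mode() returns a Series, use .mode()[0] or .mode().iloc[0] when you only want the first mode.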