Descriptive Statistics Flashcards
Central tendency
Summarizes where the data cluster (the typical or central value)
Mean, median, mode
Mode
Most frequently occurring value (best for categorical data; only choice for nominal data)
Median
Middle value when data are sorted
Preferred for skewed numeric data, since outliers barely affect it (also usable for ordinal data)
Mean
Sum of values divided by the number of values
Best for numeric data without strong skew or outliers, since extreme values pull the mean toward them
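Example: a minimal sketch comparing the three measures on a small right-skewed data set, using Python's standard `statistics` module. The numbers in `values` are made up for illustration.

```python
import statistics

# Hypothetical data: mostly small values plus one large outlier (right-skewed).
values = [2, 3, 3, 4, 5, 6, 40]

mean_value = statistics.mean(values)      # pulled upward by the outlier
median_value = statistics.median(values)  # middle value of the sorted data
mode_value = statistics.mode(values)      # most frequently occurring value

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
```

Here the single outlier (40) pulls the mean well above the median, which is why the median is usually the better summary for skewed data.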
Variance
Average squared distance of the data points from the mean; measures how spread out the data are
s^2
Reason for squaring: keeps positive and negative deviations from canceling out, and weights values far from the mean more heavily than values close to it
Standard deviation
s (square root of variance)
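Example: a minimal sketch of variance and standard deviation computed by hand. It assumes s^2 means the sample variance (dividing by n - 1), which is the usual meaning of that symbol; the data in `values` are made up.

```python
import math

# Hypothetical sample data.
values = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(values)
mean_value = sum(values) / n

# Squared deviations from the mean: squaring makes every term non-negative.
squared_deviations = [(x - mean_value) ** 2 for x in values]

# Sample variance s^2: divide by n - 1 (assumption: the data are a sample).
sample_variance = sum(squared_deviations) / (n - 1)

# Standard deviation s: square root of the variance, in the data's own units.
sample_std = math.sqrt(sample_variance)

print("variance:", round(sample_variance, 3))
print("standard deviation:", round(sample_std, 3))
```

If the data were the whole population rather than a sample, the divisor would be n instead of n - 1.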
Empirical rule for normally distributed data
68% of data lie within 1 standard deviation of mean
95% of data lie within 2 standard deviations of mean
99.7% of data lie within 3 standard deviations of mean
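Example: the three percentages can be checked against a standard normal distribution. This is a sketch using `statistics.NormalDist` from the Python standard library (Python 3.8+); the variable names are illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

coverage = {}
for k in (1, 2, 3):
    # Probability of falling within k standard deviations of the mean.
    coverage[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} standard deviation(s): {coverage[k]:.1%}")
```

The computed values are roughly 68.3%, 95.4%, and 99.7%, which the rule rounds to 68%, 95%, and 99.7%.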
Z score
(value - mean) / standard deviation
Measures how many standard deviations a value lies from the mean (negative if below the mean, positive if above)
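Example: a minimal sketch of the z score formula. The function name `z_score` and the numbers passed to it are hypothetical.

```python
def z_score(value, mean, standard_deviation):
    """Return how many standard deviations `value` lies from `mean`."""
    return (value - mean) / standard_deviation

# Hypothetical exam: mean 70, standard deviation 10.
above = z_score(85, 70, 10)  # 1.5 standard deviations above the mean
below = z_score(60, 70, 10)  # 1 standard deviation below the mean

print(above, below)
```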
5 number summary for box plots
Minimum (end of lower whisker), Q1 (lower edge of box), median (line inside box), Q3 (upper edge of box), maximum (end of upper whisker)
Q1
25th percentile: the value below which the bottom 25% of data fall
Q3
75th percentile: the value above which the top 25% of data fall
Interquartile range
IQR= Q3 - Q1
Width of the range that contains the middle 50% of data
IQR method
Outliers: values below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR)
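Example: a minimal sketch of the IQR method using `statistics.quantiles` from the Python standard library (Python 3.8+). The data are made up, and the choice of `method="inclusive"` is an assumption: textbooks and software compute quartiles in slightly different ways, so Q1 and Q3 can differ a little between tools.

```python
import statistics

# Hypothetical data with one suspiciously large value.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]

# n=4 splits the data into quarters and returns the three cut points.
q1, median_value, q3 = statistics.quantiles(values, n=4, method="inclusive")

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Anything outside the fences is flagged as an outlier.
outliers = [x for x in values if x < lower_fence or x > upper_fence]

print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```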