Descriptive stats: width Flashcards
What are the 5 descriptive stats for width?
Range, interquartile range, boxplots, standard deviation, variance
What is range?
The difference between the maximum and minimum measured values
Why is range not a good estimation of distribution width?
The sample range is biased: it underestimates the population range, and the smaller the sample, the lower the estimate tends to be
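A small simulation can illustrate this bias. It is a sketch under assumed conditions: the population (10,000 draws from a normal distribution with mean 0 and SD 1), the sample sizes and the number of trials are arbitrary choices for illustration, and the helper names are hypothetical. The average sample range should grow with the sample size while staying below the population range.

```python
import random
import statistics

def sample_range(values):
    """Range: the maximum value minus the minimum value."""
    return max(values) - min(values)

def mean_sample_range(population, n, trials=2000, seed=0):
    """Average range over `trials` random samples of size n, drawn without replacement."""
    rng = random.Random(seed)
    return statistics.mean(sample_range(rng.sample(population, n)) for _ in range(trials))

# Hypothetical population: 10,000 values drawn from a normal distribution (mean 0, SD 1).
pop_rng = random.Random(42)
population = [pop_rng.gauss(0, 1) for _ in range(10_000)]
true_range = sample_range(population)

for n in (5, 20, 100):
    est = mean_sample_range(population, n)
    print(f"n={n:>3}: average sample range {est:.2f} (population range {true_range:.2f})")
```

Because every sample is a subset of the population, a sample range can never exceed the population range; small samples rarely include the most extreme values, so they fall furthest short.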
What would we report instead of range?
Maximum and minimum
What is the interquartile range?
The range between the 1st and 3rd quartiles, i.e. between the 25th and 75th percentiles (IQR = Q3 - Q1), which covers the middle 50% of the data
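A minimal sketch of computing the quartiles and IQR with Python's standard library, using a hypothetical sample. It assumes the default 'exclusive' method of `statistics.quantiles` (Python 3.8+); other software may use a different quartile convention, so IQR values can differ slightly between tools.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using statistics.quantiles' default 'exclusive' method."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return q1, q2, q3

def iqr(values):
    """Interquartile range: Q3 - Q1, the spread of the middle 50% of the data."""
    q1, _, q3 = quartiles(values)
    return q3 - q1

data = [2, 4, 4, 5, 6, 7, 8, 9, 12, 13, 40]  # hypothetical sample
print(quartiles(data))  # (4.0, 7.0, 12.0)
print(iqr(data))        # 8.0
```

Note that the extreme value 40 has no effect on the IQR, since only the middle half of the sorted data determines it.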
What is included in a boxplot?
The median, the interquartile range (the box), the spread of the remaining non-outlying data (the whiskers), and any outliers
How are outliers identified in a boxplot?
Any value more than 1.5 times the interquartile range below the 1st quartile or above the 3rd quartile
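The 1.5 × IQR rule (often called Tukey's fences) can be sketched as follows. This assumes the standard library's default quartile method and a hypothetical sample; `k` is the multiplier, conventionally 1.5, and the function names are illustrative only.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread

def outliers(values, k=1.5):
    """Values lying strictly below the lower fence or above the upper fence."""
    low, high = tukey_fences(values, k)
    return [v for v in values if v < low or v > high]

data = [2, 4, 4, 5, 6, 7, 8, 9, 12, 13, 40]  # hypothetical sample
print(tukey_fences(data))  # (-8.0, 24.0)
print(outliers(data))      # [40]
```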
Why would we use a boxplot?
Better representation of the variation in asymmetric distributions
Why shouldn’t we summarize asymmetric data with the mean and standard deviation?
The standard deviation isn’t a good representation of variability in asymmetric data, and it doesn’t show outliers
What are the whiskers on a box plot?
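They extend to the most extreme values that are not outliers; if there are no outliers, they show the maximum and minimum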
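A sketch of where the whiskers end under the common 1.5 × IQR convention, assuming the standard library's default quartile method and a hypothetical sample (some plotting tools offer other whisker rules). The whisker stops at the last observation inside the fences, so it can differ from the data's maximum.

```python
import statistics

def whisker_ends(values, k=1.5):
    """Whisker ends: the most extreme observations still inside the k*IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low_fence = q1 - k * (q3 - q1)
    high_fence = q3 + k * (q3 - q1)
    inside = [v for v in values if low_fence <= v <= high_fence]
    return min(inside), max(inside)

data = [2, 4, 4, 5, 6, 7, 8, 9, 12, 13, 40]  # hypothetical sample
print(min(data), max(data))  # 2 40
print(whisker_ends(data))    # (2, 13): 40 is drawn as a separate outlier point
```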
What are the boundaries of the box in a box plot?
The 1st and 3rd quartiles, so the box spans the interquartile range
What is variance?
A measure of spread based on the squared deviations of the observations from the mean (roughly their average); the standard deviation is its square root
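The link between variance and standard deviation can be sketched directly. This version assumes the sample variance, which divides by n - 1 (dividing by n instead gives the population variance, `statistics.pvariance`). The data and function names are hypothetical, and the hand-written functions are cross-checked against `statistics.variance` and `statistics.stdev`.

```python
import math
import statistics

def sample_variance(values):
    """Sample variance: the sum of squared deviations from the mean, divided by n - 1."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def sample_sd(values):
    """Standard deviation: the square root of the variance, so it is in the data's units."""
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, mean = 5
print(sample_variance(data), statistics.variance(data))  # both 32/7, about 4.571
print(sample_sd(data), statistics.stdev(data))           # both about 2.138
```

If `data` were in centimetres, the variance would be in square centimetres while the SD would be back in centimetres.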
Why do we typically use standard deviation over variance?
SD is a little more intuitive since it’s in the same units as the data (and the mean), whereas variance is in squared units
What is standard deviation?
A common measure of how far the observations typically are from the mean
What does it mean if standard deviation is large?
Most observations are far from the mean
What does it mean if standard deviation is small?
Most observations are close to the mean
What connection does standard deviation have to a normal distribution?
In a normal distribution, mean ± 1 SD contains ~68% of the data, and mean ± 2 SD contains ~95% of the data
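The ~68% and ~95% figures can be checked from the normal cumulative distribution function. A sketch using the standard library's `statistics.NormalDist` (Python 3.8+); because every normal distribution is a shifted and rescaled standard normal, the same fractions hold for any mean and SD.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def coverage(k):
    """Fraction of a normal distribution lying within mean ± k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2):
    print(f"mean ± {k} SD: {coverage(k):.1%}")
# mean ± 1 SD: 68.3%
# mean ± 2 SD: 95.4%
```

These fractions only apply to (approximately) normal data; for skewed data they can be far off.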
Why is standard deviation not a good measurement for skewed data? What should we use instead?
It’s even more sensitive to extreme values than the mean is. Better to use the IQR
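A quick sketch of this sensitivity, using hypothetical data: making the largest value extreme changes the SD dramatically, while the IQR (computed with the standard library's default quartile method) does not move at all.

```python
import statistics

def iqr(values):
    """Interquartile range (Q3 - Q1) using statistics.quantiles' default method."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

base = [2, 4, 4, 5, 6, 7, 8, 9, 12, 13, 14]  # hypothetical sample
skewed = base[:-1] + [140]                    # same data, largest value made extreme

for name, values in (("base", base), ("with extreme value", skewed)):
    print(f"{name}: SD = {statistics.stdev(values):.2f}, IQR = {iqr(values)}")
```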