Lecture 3 Flashcards
Measures of Dispersion
Why is knowing the dispersion of data important?
Two different data sets may share the same mean, but the dispersion (spread) of that data may be entirely different
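As a quick illustration, the sketch below compares two made-up data sets (the values are invented for this example, not taken from the lecture) that share a mean but differ greatly in spread.

```python
from statistics import mean, stdev

# Two invented data sets with the same mean but very different spread
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

# Both means are 50
print(mean(tight), mean(wide))

# The sample standard deviation of the second set is far larger
print(stdev(tight), stdev(wide))
```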
Measures of Dispersion
Range (including the interquartile range, IQR), Variance and Standard Deviation
Range
Calculated by subtracting the smallest value from the largest value. Strongly affected by outliers, because it depends only on the two most extreme values.
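A minimal sketch of the range calculation, using invented numbers and a hypothetical helper named value_range, shows how a single outlier changes the result.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

# Invented example data
data = [4, 5, 5, 6, 7]
with_outlier = data + [40]  # add one extreme value

print(value_range(data))          # 3
print(value_range(with_outlier))  # 36
```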
1.5 × IQR rule
An observation is a suspected outlier if it falls more than 1.5 × IQR above the third quartile or more than 1.5 × IQR below the first quartile. The rule can be used to justify excluding outlier data.
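The rule can be sketched as follows. The data are invented, and the quartiles come from Python's statistics.quantiles with its default method; other software may use a different quartile convention and give slightly different fences.

```python
from statistics import quantiles

# Invented sample; 30 is deliberately far from the rest
data = [2, 4, 5, 6, 7, 8, 9, 10, 30]

# quantiles(..., n=4) returns the three quartile cut points (Q1, median, Q3).
# The default "exclusive" method is one convention among several.
q1, _median, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Fences: 1.5 x IQR below Q1 and 1.5 x IQR above Q3
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

suspected_outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(q1, q3, iqr)
print(suspected_outliers)  # [30]
```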
Variance
The average squared difference between the data points and the mean. The larger the variance, the more widely the data are spread around the mean.
How to calculate variance?
- Find the difference between each value and the mean.
- Square these differences
- Add these squared differences together
- Divide by the number of data points minus 1 if working with a sample; if working with the whole population, divide by the number of data points
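The steps above can be sketched in code. The function name variance_by_steps and the example data are assumptions made for illustration; the standard library's statistics.variance and statistics.pvariance are used only as a cross-check.

```python
from statistics import pvariance, variance

def variance_by_steps(values, sample=True):
    n = len(values)
    mean = sum(values) / n
    # Step 1: difference between each value and the mean
    diffs = [x - mean for x in values]
    # Step 2: square these differences
    squared = [d ** 2 for d in diffs]
    # Step 3: add the squared differences together
    total = sum(squared)
    # Step 4: divide by n - 1 for a sample, or by n for a population
    divisor = n - 1 if sample else n
    return total / divisor

# Invented example data with mean 5
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(variance_by_steps(data, sample=True), variance(data))
print(variance_by_steps(data, sample=False), pvariance(data))
```

For this data the squared differences sum to 32, so the population variance is 32 / 8 = 4 and the sample variance is 32 / 7.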
Why do we calculate standard deviation rather than variance?
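To calculate variance we square the differences between the individual data points and the mean, so the result is in squared units, which are less intuitive. Standard deviation is the square root of the variance, which returns the measure to the original units of the data.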
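A small sketch of the units point, using invented heights in centimetres: the variance comes out in cm squared, while the standard deviation is back in cm.

```python
import math
from statistics import stdev, variance

# Invented sample of heights in centimetres
heights_cm = [160, 165, 170, 175, 180]

var_cm2 = variance(heights_cm)  # sample variance, in cm squared
sd_cm = math.sqrt(var_cm2)      # standard deviation, in cm

print(var_cm2)                    # 62.5
print(sd_cm, stdev(heights_cm))   # the two values agree
```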
Standard Deviation
The square root of the variance, expressed in the same units as the data. A very useful measure of dispersion IF our data are normally distributed: about 95% of values/observations then fall within 2 standard deviations either side of the mean.
The 68-95-99.7 Rule
For normally distributed data: about 68% of data lie within 1 SD of the mean, about 95% lie within 2 SDs of the mean, and about 99.7% lie within 3 SDs of the mean.
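These percentages can be checked against the standard normal distribution. The sketch below uses statistics.NormalDist from the Python standard library; the helper name share_within is an assumption made for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

coverage = {k: share_within(k) for k in (1, 2, 3)}

for k, share in coverage.items():
    print(f"within {k} SD: {share:.4f}")
```

The printed proportions are approximately 0.6827, 0.9545 and 0.9973, which the rule rounds to 68, 95 and 99.7 percent.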