Week 13 Flashcards
What are the three measures of dispersion?
Range (and the interquartile range, IQR)
Variance
Standard deviation
How do you calculate IQR?
Q3 - Q1
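A minimal sketch of the IQR calculation in Python, assuming the standard library's `statistics.quantiles` (default "exclusive" method). Textbooks and software use slightly different quartile conventions, so hand-calculated Q1 and Q3 may differ a little. The data values are hypothetical.

```python
import statistics

def iqr(data):
    """Interquartile range: Q3 - Q1 (using Python's default quartile method)."""
    q1, _median, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

scores = [2, 4, 4, 5, 7, 9, 10, 12]  # hypothetical example data
q1, _, q3 = statistics.quantiles(scores, n=4)
print(q1, q3, iqr(scores))  # 4.0 9.75 5.75
```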
If we want to justify excluding outlier data, which IQR method can we use?
1.5 x IQR rule
(We can call an observation a suspected outlier if it falls more than 1.5 x IQR above the third quartile or more than 1.5 x IQR below the first quartile.)
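The 1.5 x IQR rule can be sketched as follows, assuming Python's `statistics.quantiles` for the quartiles. The function name `suspected_outliers` and the data are hypothetical. Values strictly beyond the lower fence (Q1 - 1.5 x IQR) or upper fence (Q3 + 1.5 x IQR) are flagged.

```python
import statistics

def suspected_outliers(data, k=1.5):
    """Return values more than k x IQR below Q1 or above Q3."""
    q1, _median, q3 = statistics.quantiles(data, n=4)
    spread = q3 - q1
    lower_fence = q1 - k * spread
    upper_fence = q3 + k * spread
    return [x for x in data if x < lower_fence or x > upper_fence]

reaction_times = [2, 4, 4, 5, 7, 9, 10, 12, 40]  # hypothetical data
print(suspected_outliers(reaction_times))  # [40]
```

Here Q1 = 4 and Q3 = 11, so IQR = 7 and the fences are -6.5 and 21.5; only 40 lies outside them.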
What is variance (measures of dispersion)?
The overall degree to which data points differ from the mean; it tells us about the spread/dispersion of the data
What is the equation for sample variance?
s^2 = sum of (xi - x̄)^2 / (n - 1)
s^2 = sample variance
xi = each value in the data set
x̄ (x bar) = sample mean
n = sample size
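The formula written out in LaTeX, with a worked example on a small hypothetical sample {2, 4, 6}:

```latex
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}

% Hypothetical sample {2, 4, 6}, so n = 3
\bar{x} = \frac{2 + 4 + 6}{3} = 4

s^2 = \frac{(2-4)^2 + (4-4)^2 + (6-4)^2}{3 - 1} = \frac{4 + 0 + 4}{2} = 4
```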
How is variance calculated?
Find the difference between each value and the mean.
Square these differences
Add these squared differences together
Divide by the number of data points minus 1 if working with a sample (if working with a whole population, divide by the number of data points)
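The four steps above can be sketched in Python as below. The function name `variance` and its `sample` flag are illustrative assumptions; the results are checked against the standard library's `statistics.variance` (sample) and `statistics.pvariance` (population).

```python
import statistics

def variance(data, sample=True):
    """Variance following the four steps on the card."""
    mean = sum(data) / len(data)
    deviations = [x - mean for x in data]               # step 1: difference from the mean
    squared = [d ** 2 for d in deviations]              # step 2: square each difference
    total = sum(squared)                                # step 3: add them together
    divisor = len(data) - 1 if sample else len(data)    # step 4: n - 1 (sample) or N (population)
    return total / divisor

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5
print(variance(data, sample=True), statistics.variance(data))     # both 32 / 7, about 4.571
print(variance(data, sample=False), statistics.pvariance(data))   # both 32 / 8 = 4.0
```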
Why do we square the deviations when measuring variance?
Deviations below the mean are negative, so if we simply added the raw deviations they would always sum to zero. Squaring makes every deviation positive.
(The sum of squared deviations is then divided by the number of data points, or n - 1 for a sample, so variance is like an average of the squared deviations.)
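A small demonstration of why raw deviations are useless as a spread measure: for any data set the deviations from the mean sum to zero (since sum of xi - n x̄ = 0), while the squared deviations do not. `Fraction` is used here only so the arithmetic is exact; the data are hypothetical.

```python
from fractions import Fraction

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical data
mean = Fraction(sum(data), len(data))     # exact mean (5), avoids float rounding

deviations = [x - mean for x in data]
print(sum(deviations))                    # 0: positive and negative deviations cancel
print(sum(d ** 2 for d in deviations))    # 32: squared terms are never negative
```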
Why would a standard deviation be calculated?
When calculating the variance we square the differences between the individual data points and the mean. This means the variance is in squared units (e.g. cm² if the data were measured in cm), which are less intuitive. Taking the square root gives the standard deviation, which is back in the original units, so it is often preferable to report the standard deviation of a sample or population.
How do you calculate standard deviation of a population?
σ = square root of (sum of (each value in the distribution - population mean μ)^2 / total number of observations N)
How do you calculate standard deviation of a sample?
s = square root of (sum of (each value in the distribution - sample mean x̄)^2 / (n - 1))
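Both standard deviation formulas can be sketched side by side in Python. The helper names `population_sd` and `sample_sd` are hypothetical; the results are compared with the standard library's `statistics.pstdev` and `statistics.stdev`. The only difference is the divisor: N for a population, n - 1 for a sample.

```python
import math
import statistics

def population_sd(values):
    """sigma = sqrt(sum((x - mu)^2) / N)"""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))

def sample_sd(values):
    """s = sqrt(sum((x - x_bar)^2) / (n - 1))"""
    x_bar = sum(values) / len(values)
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (len(values) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(population_sd(data), statistics.pstdev(data))  # both 2.0
print(sample_sd(data), statistics.stdev(data))        # both about 2.138
```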
Standard deviation is a useful measure of dispersion if our data are…
normally distributed
(Many types of continuous/interval data follow this, particularly where many factors causally affect the quantity being measured.)
What is the 68-95-99.7 rule of standard deviation?
For a normal distribution, no matter what μ (the mean) and σ (the standard deviation) are:
the area between μ - σ and μ + σ is about 68%;
the area between μ - 2σ and μ + 2σ is about 95%;
and the area between μ - 3σ and μ + 3σ is about 99.7%.
So about 95% of values/observations fall within 2 standard deviations either side of the mean, and nearly all (99.7%) fall within 3.
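The 68-95-99.7 figures can be checked numerically with the standard library's `statistics.NormalDist`. The helper `area_within` and the parameter pairs (0, 1) and (100, 15) are illustrative choices; the loop shows the areas are the same whatever μ and σ are.

```python
from statistics import NormalDist

def area_within(k, mu=0.0, sigma=1.0):
    """Probability that a normal value lies between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# Hypothetical parameter choices: the areas do not depend on mu or sigma.
for mu, sigma in [(0, 1), (100, 15)]:
    print([round(area_within(k, mu, sigma), 4) for k in (1, 2, 3)])
# [0.6827, 0.9545, 0.9973] for both parameter choices
```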
By using standard deviation we can roughly check whether data are…
normally distributed
But why should we be careful with this?
It is a good enough rule of thumb, but we are really only testing whether the SD is less than half of the mean value, and this is only really useful if our sample size is large (n = 50+).
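A sketch of the "SD less than half the mean" rule of thumb from the card. It is a rough screen, not a formal normality test, and it mainly makes sense for measurements that cannot be negative, where a large SD relative to the mean hints at skew. The function name `sd_below_half_mean`, the n < 50 warning threshold (taken from the card) and the data are all illustrative.

```python
import statistics

def sd_below_half_mean(data):
    """Rule of thumb: is the sample SD less than half the mean? (not a formal test)"""
    if len(data) < 50:
        print("Warning: n < 50, so this rule of thumb is not very informative")
    return statistics.stdev(data) < statistics.mean(data) / 2

roughly_symmetric = [90, 95, 100, 105, 110] * 10   # n = 50, mean 100, SD about 7.1
strongly_skewed = [0, 0, 0, 0, 100] * 10           # n = 50, mean 20, SD about 40.4
print(sd_below_half_mean(roughly_symmetric))  # True
print(sd_below_half_mean(strongly_skewed))    # False
```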
Why can the range have limited usefulness?
Strongly affected by outliers, because it depends only on the two most extreme values
Range = max - min
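A quick comparison of how one extreme value affects the range versus the IQR, assuming Python's `statistics.quantiles` for the quartiles. The data and helper names are hypothetical.

```python
import statistics

def data_range(values):
    """Range = max - min."""
    return max(values) - min(values)

def iqr(values):
    """IQR = Q3 - Q1 (Python's default quartile method)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

without_outlier = [2, 4, 4, 5, 7, 9, 10, 12]   # hypothetical data
with_outlier = without_outlier + [40]           # one extreme value added

print(data_range(without_outlier), data_range(with_outlier))  # 10 38
print(iqr(without_outlier), iqr(with_outlier))                # 5.75 7.0
```

The range almost quadruples because of a single value, while the IQR changes only slightly.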
What is variance?
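The average of the squared differences from the mean (the sum of squared differences divided by n - 1 for a sample, or by N for a population). This is very useful, but remember that squaring changes the units.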
What is standard deviation?
simply the square root of the variance