Module 4, Measures of Variability Flashcards
Variability
- quantifies the amount of difference among the scores (we want to know the variability amongst the scores)
- concerned with the spread of the scores
- indicates the amount of difference among the scores (using measures of variability)
Variability is Measured in 3 Ways:
- range
- variance
- standard deviation
Why Variability?!
- describe variability
- understand variability
- explain variability
- predict variability
The Range: Measuring Variability
examines two endpoints of the distribution:
range = highest score - lowest score
- you just order the values from lowest to highest (have to pick out the lowest and highest value)
- you must report the range as a single number, the difference between the two scores, not as the pair of endpoints (if the scores run from 15 to 33, the range is 18, not "15-33")
- outliers can have a significant impact on the range
Range: Strengths and Weaknesses
strengths:
- easy to compute
- provides some information about the sample
weaknesses:
- only focuses on two scores out of the whole distribution
- may not accurately reflect the variability of the whole distribution
- cannot be used to test hypotheses about distributions
- affected by outliers (extreme scores): the range gets significantly inflated when there are outliers (but you do not remove them, keep them in the calculation)
4 8 8 9 9 9 9
4 5 5 6 7 7 9 (more variability)
- but the range is the same for both (5)
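As a rough illustration of the range and its sensitivity to outliers, here is a minimal Python sketch. The two lists are the distributions above; the function name `score_range` and the extra score of 30 are assumptions made only for this example.

```python
def score_range(scores):
    """Range = highest score - lowest score."""
    return max(scores) - min(scores)

first = [4, 8, 8, 9, 9, 9, 9]
second = [4, 5, 5, 6, 7, 7, 9]

# Different distributions, identical range.
print(score_range(first))   # 5
print(score_range(second))  # 5

# Hypothetical outlier: one extreme score inflates the range.
with_outlier = second + [30]
print(score_range(with_outlier))  # 26
```

The range only ever looks at two scores, so everything between the lowest and highest value is invisible to it.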
The Interquartile Range: Measuring Variability
the range of the middle 50% of the scores
- removes the highest and lowest 25% of the distribution (where outliers fall)
- minimizes the effect of outliers
interquartile range = (N - N/4)th score - (N/4 + 1)th score, with the scores ordered from lowest to highest
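The formula on this card can be sketched in Python as follows. This assumes N is divisible by 4 so that N/4 is a whole number; the function name and the eight scores are hypothetical. Statistical software often uses other quartile conventions, so its results may differ slightly.

```python
def interquartile_range(scores):
    """IQR as (N - N/4)th score minus (N/4 + 1)th score, scores in order."""
    ordered = sorted(scores)
    n = len(ordered)
    if n % 4 != 0:
        raise ValueError("this sketch assumes N is a multiple of 4")
    quarter = n // 4
    upper = ordered[n - quarter - 1]  # (N - N/4)th score as a 0-based index
    lower = ordered[quarter]          # (N/4 + 1)th score as a 0-based index
    return upper - lower

scores = [2, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data, N = 8
print(interquartile_range(scores))  # 8 - 5 = 3
print(max(scores) - min(scores))    # range = 28, inflated by the score of 30
```

Here the extreme score of 30 drives the range up to 28 but has no effect on the interquartile range.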
Interquartile Range: Strengths and Weaknesses
strengths:
- reduces the influence of outliers by focusing on the middle 50%
- can be reported with median (both compensating for outliers)
weaknesses:
- ignores the top 25% and the bottom 25%
- may not accurately reflect the variability of the whole distribution
- cannot be used to test hypotheses about distributions
The Variance (s²): Measuring Variability (Sample & Population Variance)
sample: n = 50, s² (symbol for sample variance)
population: N = 2,170,985, σ² (symbol for population variance)
- estimate population parameters based on sample statistics
- there is always error in estimates (not a miscalculation, but the result of random chance)
- different equations for population and samples
- population parameters could mean population mean, variance, SD, etc. (same goes for sample statistics)
The Variance (s²)
- includes all of the scores in the distribution (all the values we actually have)
- measures variability by examining the extent to which each score differs from the mean (the mean is a measure of central tendency, where most of the data is, which is why we compare to it)
The Variance (s²): The Issue and How to Resolve it?
- if we simply averaged the deviation scores, the result would always equal 0, because the positive and negative deviations cancel out
- implies that all scores are at the mean (all the deviation scores coming to 0 means that all the scores are at the mean)
- does this make sense? - no because it implies all the scores are the same
- how can we resolve this? by squaring each deviation score (add a fourth column)
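A small Python sketch of the issue described on this card: the deviation scores cancel out, while the squared deviations do not. The scores are example data chosen so the mean is exactly 8.

```python
scores = [4, 8, 8, 9, 9, 9, 9]     # example data; the mean is exactly 8
mean = sum(scores) / len(scores)

deviations = [x - mean for x in scores]  # score minus mean
squared = [d ** 2 for d in deviations]   # the extra "squared" column

print(sum(deviations))  # 0.0  (negatives and positives cancel)
print(sum(squared))     # 20.0 (squaring removes the negative signs)
```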
Variance Definition
average squared deviation from the mean
(get rid of negative values by squaring the deviation score)
for example: on average, the squared deviation of a score from the mean is 38.39
why divide by N - 1?
- dividing by N - 1 corrects for the bias that comes from using a sample to estimate population variability
- bias: systematic underestimation of the true population value (a sample tends to show less variability than its population)
- dividing by a smaller number makes the variance larger
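To see the effect of the denominator, the sketch below divides the same sum of squared deviations by N and by N - 1. The scores and the helper name `sum_of_squares` are hypothetical, chosen only for illustration.

```python
def sum_of_squares(scores):
    """Sum of squared deviations from the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)

scores = [4, 8, 8, 9, 9, 9, 9]  # hypothetical sample
n = len(scores)
ss = sum_of_squares(scores)     # 20.0

divide_by_n = ss / n                # N in the denominator
divide_by_n_minus_1 = ss / (n - 1)  # N - 1 in the denominator

print(round(divide_by_n, 2))          # 2.86
print(round(divide_by_n_minus_1, 2))  # 3.33 (smaller denominator, larger variance)
```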
Standard Deviation (s): Measuring Variability
to calculate the variance we needed squared deviations… BUT… in general, we prefer just the average deviation
standard deviation (s) = the average deviation of a score from the mean
- take the square root of the variance
- we can undo our squaring by square rooting (what we do in SD)
- a larger value means more variability (only compare values that use the same unit of measurement)
example: the average deviation of a score from the mean is 6.20
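The two example values in these notes are linked by a square root: a variance of 38.39 gives a standard deviation of about 6.20. A quick check in Python:

```python
import math

variance = 38.39           # example variance from these notes
sd = math.sqrt(variance)   # undo the squaring

print(round(sd, 2))        # 6.2
```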
Measures of Variability for Population
- we most frequently calculate sample statistics
- but we may have data from the entire population
- in that case, use just N (not N - 1) in the denominator
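Python's standard `statistics` module has separate functions for the two cases, which makes the difference easy to see. The scores below are hypothetical; whether they count as a sample or a population depends on the study, not on the numbers.

```python
import statistics

scores = [4, 8, 8, 9, 9, 9, 9]  # hypothetical data

# Treated as a sample: N - 1 in the denominator.
sample_var = statistics.variance(scores)
sample_sd = statistics.stdev(scores)

# Treated as the entire population: N in the denominator.
pop_var = statistics.pvariance(scores)
pop_sd = statistics.pstdev(scores)

print(round(sample_var, 2), round(sample_sd, 2))  # 3.33 1.83
print(round(pop_var, 2), round(pop_sd, 2))        # 2.86 1.69
```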
Box Plots
- box plots are useful in displaying variability in data
- the median is important (middle value in range of scores) - represented by a line in the box
- the box itself represents where the middle 50% of vertical jump height scores fall for males and females
- the line coming out of the top of the box represents the range of the top 25%
- the bottom line represents the bottom 25%
- both of these lines are called the whiskers
- a longer box and longer whiskers mean the scores are more spread out
- an outlier is represented by a dot
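The numbers behind a box plot can be sketched in Python as below. The jump scores are hypothetical, and the 1.5 × IQR cutoff for flagging outliers is an assumption: it is a common convention in plotting software, but these notes do not state which rule was used. Quartile conventions also vary between tools.

```python
import statistics

jumps = [30, 38, 40, 41, 43, 45, 46, 48, 50, 52, 75]  # hypothetical scores

# Box edges (Q1, Q3) and the median line inside the box.
q1, median, q3 = statistics.quantiles(jumps, n=4)
iqr = q3 - q1

# Assumed convention: scores beyond 1.5 * IQR from the box are outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in jumps if x < low_fence or x > high_fence]
inside = [x for x in jumps if low_fence <= x <= high_fence]

print(q1, median, q3)            # 40.0 45.0 50.0
print(min(inside), max(inside))  # 30 52 (ends of the whiskers)
print(outliers)                  # [75] (drawn as a dot)
```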