Chapter 4 - Central Tendency and Variability Flashcards
Define the 3 measures of central tendency:
MEAN
MEDIAN
MODE
When SKEWED:
- MEAN: the expectation or average of a set of data (take the sum of all the #’s in a data set and divide by the total # of data points). Good for data that is normally distributed. Can be distorted by outliers though! So the median is a better measure of the typical # when the data are skewed.
- MEDIAN: the middle #, if we lined up our data from smallest to largest. Ex: if there is an odd number of numbers, take the one in the middle; if there is an even number of numbers, take the middle two and take their mean.
- MODE: Latin, from modus, meaning ‘manner, fashion, or style’. The most popular #, meaning the value that appears most often in our data set. MOST useful when you have a relatively large sample so that you have a large number of the popular values.
Ex: (mode) if I wanted to know whether judges in my province are typically men or women, what measure of central tendency would I use to describe the typical judge? MODE. Because men/women is a nominal (categorical) variable, so we can’t rank OR do math with the words!
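A minimal sketch of the three measures, using Python's standard statistics module. The scores and the list of judges are made-up data for illustration only.

```python
from statistics import mean, median, mode

# Hypothetical quiz scores (made-up data).
scores = [2, 3, 3, 4, 5, 7]

score_mean = mean(scores)      # sum of the scores divided by how many there are
score_median = median(scores)  # even count, so the mean of the two middle scores
score_mode = mode(scores)      # the value that appears most often

# Nominal (categorical) data: we can't rank or do math, so only the mode applies.
judges = ["man", "woman", "man", "man", "woman"]
typical_judge = mode(judges)

print(score_mean, score_median, score_mode, typical_judge)
```

Here the mean is 4, the median is 3.5, the mode is 3, and the typical judge is "man".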
What is variability?
The degree to which data points are spread out around the mean
This variability tells us 2 things:
1. how consistent the scores are
2. how accurately the mean (or median) describes the distribution
What is the range?
A measure of the distance from the lowest score to the highest score
Range = highest score - lowest score
But it isn’t very reliable:
1. depends on only 2 observations (influenced by outliers)
2. tends to be larger with larger sample sizes
Ex: if only 5 students were sampled and the range of their heights was measured, it wouldn’t be representative of all the students, even though a ‘range’ was measured.
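A small sketch of why the range is fragile. The heights are hypothetical, and score_range is just an illustrative helper name.

```python
def score_range(scores):
    """Range = highest score - lowest score."""
    return max(scores) - min(scores)

# Hypothetical heights in cm for 5 students.
heights_cm = [158, 162, 165, 170, 171]

# The same students plus one unusually tall person (an outlier).
with_outlier = heights_cm + [210]

range_without = score_range(heights_cm)   # 171 - 158 = 13
range_with = score_range(with_outlier)    # 210 - 158 = 52

print(range_without, range_with)
```

One extra observation changes the range from 13 to 52, because only the two extreme scores matter.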
What is a distribution?
A distribution shows us how often each value occurs in our data set (frequency)
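One way to see a distribution is to count how often each value occurs. A rough sketch with made-up scores, using collections.Counter from the standard library:

```python
from collections import Counter

# Hypothetical scores (made-up data).
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Map each value to the number of times it occurs (its frequency).
frequency = Counter(scores)

# Print a simple text histogram, one row per value.
for value in sorted(frequency):
    print(value, "*" * frequency[value])
```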
The mean can be assessed visually and arithmetically.
Describe each method.
Explain how the mean mathematically balances the distribution
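A hedged illustration of the balancing idea: the distances of the scores below the mean add up to the same total as the distances of the scores above it, so the deviations from the mean sum to zero. The scores are made up.

```python
from statistics import mean

# Hypothetical scores (made-up data); the mean is 20 / 5 = 4.
scores = [1, 2, 3, 6, 8]
m = mean(scores)

# Deviation = how far each score is from the mean (negative if below it).
deviations = [x - m for x in scores]

below = sum(d for d in deviations if d < 0)   # -3 + -2 + -1 = -6
above = sum(d for d in deviations if d > 0)   #  2 + 4      =  6
total = sum(deviations)                       # the two sides cancel out

print(deviations, below, above, total)
```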
Explain what is meant by unimodal, bimodal, and multimodal distributions.
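- UNIMODAL: has 1 mode (a single peak, i.e. one most common value).
- BIMODAL: has 2 modes (two peaks); an example of multimodal, which often shows up when two groups of data are being measured together.
- MULTIMODAL: has 2+ modes (peaks), often because 2+ groups of data are being measured together.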
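A minimal sketch using statistics.multimode (Python 3.8+), which returns every value tied for most frequent. Both data sets are made up. This only detects exact ties, so treat it as a simplification: in real data, peaks are usually judged from the shape of the distribution rather than from exact counts.

```python
from statistics import multimode

# Hypothetical data sets (made-up values).
unimodal = [1, 2, 3, 3, 3, 4, 5]            # one most common value
bimodal = [1, 2, 2, 2, 3, 4, 5, 5, 5, 6]    # two values tie for most common

unimodal_modes = multimode(unimodal)   # [3]
bimodal_modes = multimode(bimodal)     # [2, 5]

print(len(unimodal_modes), len(bimodal_modes))
```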
Explain why the mean might not be useful for a bimodal or multimodal distribution.
What is an outlier?
A very small or very large data point that lies far outside the bulk of the data points
Pulls the mean toward it (a very large outlier increases the mean, a very small one decreases it), so we want to limit the influence of outliers!
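A quick sketch of how a single outlier moves the mean much more than the median. All values are made up.

```python
from statistics import mean, median

# Hypothetical scores with no outliers: mean 75, median 75.
scores = [70, 72, 75, 78, 80]

with_high = scores + [400]   # add one very large value
with_low = scores + [1]      # add one very small value

print(mean(scores), median(scores))
print(mean(with_high), median(with_high))   # mean jumps up, median barely moves
print(mean(with_low), median(with_low))     # mean drops, median barely moves
```

The medians stay close to 75 (76.5 and 73.5), while the means move to roughly 129 and 63.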
In which situations is the mode typically used?
When the data are categorical and contain words, such as man/woman or good/bad. The mode is used when MATH or RANK can’t be applied to the data.
How does the interquartile range differ from the range?
Take the difference between the 1st quartile (median of the lower half of the scores) and the 3rd quartile (median of the upper half of the scores).
It uses more of the data than the range, which depends on only the 2 most extreme scores.
IQR = Q3 - Q1.
This is nice, but it would be better to use all the data points
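A sketch of the "median of each half" method described on this card. The helper name iqr and the data are made up, and leaving the middle score out of both halves when the count is odd is an assumption (textbooks and software differ on this convention, so library functions may give slightly different quartiles).

```python
from statistics import median

def iqr(scores):
    """IQR = Q3 - Q1, with quartiles as medians of the lower and upper halves.

    Needs at least 2 scores. With an odd count, the middle score is left
    out of both halves (an assumed convention).
    """
    ordered = sorted(scores)
    half = len(ordered) // 2
    lower = ordered[:half]     # lower half of the scores
    upper = ordered[-half:]    # upper half of the scores
    q1 = median(lower)
    q3 = median(upper)
    return q3 - q1

# Hypothetical scores with one outlier (50).
scores = [1, 3, 4, 6, 7, 9, 10, 50]

full_range = max(scores) - min(scores)   # 50 - 1 = 49
spread = iqr(scores)                     # Q3 = 9.5, Q1 = 3.5, so IQR = 6.0

print(full_range, spread)
```

The outlier inflates the range to 49, while the IQR stays at 6.0.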
Explain the concept of standard deviation
Explain average deviation.
- the average amount that the scores are away from the mean. Useful because it allows us to quantify the variability around the mean.
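A hedged sketch of one common way to compute an average deviation: average the absolute distances from the mean. Using absolute values is an assumption here, chosen because the signed deviations always cancel out to zero. The scores are made up.

```python
from statistics import mean

# Hypothetical scores (made-up data); the mean is 25 / 5 = 5.
scores = [2, 4, 4, 6, 9]
m = mean(scores)

# Signed deviations cancel out: -3 + -1 + -1 + 1 + 4 = 0.
signed = [x - m for x in scores]

# Dropping the signs gives distances: 3 + 1 + 1 + 1 + 4 = 10, and 10 / 5 = 2.
absolute = [abs(d) for d in signed]
average_deviation = mean(absolute)

print(sum(signed), average_deviation)
```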
STANDARD DEVIATION & VARIANCE
Why is the standard deviation typically reported, rather than the variance?
FOR SAMPLES:
M = mean
SD^2 = Variance
SD = standard deviation
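A rough sketch of how M, SD^2, and SD relate, with made-up scores. The formula on the original card did not survive, so which denominator it used (N or N - 1) is not known; both versions are shown and neither should be read as the card's formula. Note that the variance is in squared units, while the SD is back in the same units as the scores.

```python
from math import sqrt
from statistics import mean

# Hypothetical scores (made-up data).
scores = [2, 4, 4, 6, 9]
m = mean(scores)   # M = 5

# Square each deviation from the mean so negatives don't cancel positives.
squared_deviations = [(x - m) ** 2 for x in scores]
sum_of_squares = sum(squared_deviations)   # 9 + 1 + 1 + 1 + 16 = 28

# Version 1: divide by N.
variance_n = sum_of_squares / len(scores)   # 28 / 5 = 5.6
sd_n = sqrt(variance_n)

# Version 2: divide by N - 1.
variance_n_minus_1 = sum_of_squares / (len(scores) - 1)   # 28 / 4 = 7.0
sd_n_minus_1 = sqrt(variance_n_minus_1)

print(variance_n, sd_n)
print(variance_n_minus_1, sd_n_minus_1)
```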
FOR POPULATIONS:
USE THIS FORMULA FOR NOW:
Sampling Populations: Volunteers
Sampling Populations: Convenience
ex: like choosing a ‘sample’ of 46 students from only the Psychology department!
Sampling Populations: Random Sample
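A loose sketch contrasting a convenience sample with a random sample. The population, the department names other than Psychology, and the seed are all made up for illustration. In a random sample every member of the population has the same chance of being picked; the convenience sample here just takes whoever is easiest to reach.

```python
import random

# Hypothetical population: 400 students spread evenly over 4 departments.
departments = ["Psychology", "Biology", "History", "Physics"]
population = [(student_id, departments[student_id % 4]) for student_id in range(400)]

# Convenience sample: the first 46 students from the Psychology department only.
psychology_only = [s for s in population if s[1] == "Psychology"]
convenience_sample = psychology_only[:46]

# Random sample: 46 students drawn without replacement from the whole population.
rng = random.Random(0)   # fixed seed so the sketch is repeatable
random_sample = rng.sample(population, 46)

print({dept for _, dept in convenience_sample})
print({dept for _, dept in random_sample})
```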