Central Tendency, Variability, & Z-scores Flashcards
What data classes are best for mean?
interval, ratio
What data classes are best for mode?
nominal, ordinal, interval, ratio
What data classes are best for median?
ordinal, interval, ratio
Under what conditions might a median be a better measure of central tendency than the mean?
- when the data is ordinal (mean does not apply)
- interval/ratio data if there are extreme values
It should seem clear why the mean and the median are measures of the central tendency of the data, since the mean is a familiar average and the median is the middle value. Why is the mode also considered a measure of central tendency?
Most data sets peak near the middle (bell shape). The mode is the value with the highest frequency, so it usually falls somewhere near the middle.
For any data set, what is Σ(X − X̄)?
Σ(X − X̄) may be written as:
ΣX − ΣX̄ = ΣX − nX̄ = ΣX − n(ΣX)/n = ΣX − ΣX = 0.
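A quick numerical check of this identity, as a minimal sketch. The data values are illustrative only, and the tolerance is an assumption chosen to absorb floating-point rounding.

```python
import math

# Illustrative values only; the identity holds for any numeric data set.
data = [6, 12, 9, 7, 8, 4, 3, 12, 15]

mean = sum(data) / len(data)

# Sum of deviations from the mean. Floating-point rounding can leave a
# tiny nonzero residue, so compare with a tolerance instead of == 0.
deviation_sum = sum(x - mean for x in data)

print(math.isclose(deviation_sum, 0.0, abs_tol=1e-9))
```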
The following data represent a sample of the time to complete a certain task in minutes and seconds (mm:ss).
6:30, 11:15, 6:22, 11:32, 8:12, 5:02, 9:17, 6:51, 8:44, 7:45, 9:37, 7:28, 4:29, 7:42
compute the mean:
compute the std. dev.:
Since the values are given in minutes and seconds, they first need to be converted either to decimal minutes (e.g., 6:30 = 6 + 30/60 = 6.5 min) or to seconds (e.g., 6:30 = 6*60 + 30 = 390 s) so that they can be easily added.
mean: 7:55
std. dev.: 2:04
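The conversion to seconds can be sketched in Python as follows. The helper names `to_seconds` and `to_mmss` are hypothetical, and the sample standard deviation (n − 1 denominator) is assumed because the card calls the data a sample.

```python
import statistics

times = ["6:30", "11:15", "6:22", "11:32", "8:12", "5:02", "9:17",
         "6:51", "8:44", "7:45", "9:37", "7:28", "4:29", "7:42"]

def to_seconds(mmss):
    """Convert an 'mm:ss' string to a whole number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

def to_mmss(total_seconds):
    """Round to the nearest second and format as m:ss."""
    total = round(total_seconds)
    return f"{total // 60}:{total % 60:02d}"

secs = [to_seconds(t) for t in times]
mean_s = statistics.mean(secs)   # mean in seconds
sd_s = statistics.stdev(secs)    # sample standard deviation in seconds

print("mean:", to_mmss(mean_s))
print("std. dev.:", to_mmss(sd_s))
```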
For a certain set of data, the mean and standard deviation are computed.
How does X̄ (data treated as a sample) compare to μ (data treated as a population)?
How does s (data treated as a sample) compare to σ (data treated as a population)?
X̄ is the sample mean and μ is the population mean; they are calculated the same way, so they are equal for the same data.
s divides the sum of squared deviations by n − 1, while σ divides by N, so s is slightly larger than σ for the same data.
Given the following sample data set:
6, 12, 9, 7, 8, 4, 3, 12, 15
Compute the mean.
What is the median?
What is the mode?
Compute the variance.
Compute the standard deviation.
mean: 8.44
median: 8
mode: 12
variance: 15.78
standard deviation: 3.97
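These answers can be checked with Python's standard `statistics` module. This is a sketch that treats the data as a sample, so `variance` and `stdev` use the n − 1 denominator.

```python
import statistics

data = [6, 12, 9, 7, 8, 4, 3, 12, 15]

mean = statistics.mean(data)
median = statistics.median(data)      # middle value of the sorted data
mode = statistics.mode(data)          # most frequent value
variance = statistics.variance(data)  # sample variance (n - 1)
sd = statistics.stdev(data)           # sample standard deviation

print(round(mean, 2), median, mode, round(variance, 2), round(sd, 2))
```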
For the following sample data set:
X frequency
52 5
54 8
57 2
Compute the mean.
Compute the variance.
Compute the standard deviation.
mean: 53.73
variance: 2.638
standard deviation: 1.62
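For a frequency table, each value is weighted by its frequency. A minimal sketch, assuming the data is a sample (n − 1 denominator); the dictionary `freq` is just one convenient way to hold the table.

```python
import math

# value -> frequency, copied from the table on the card
freq = {52: 5, 54: 8, 57: 2}

n = sum(freq.values())                            # total number of observations
mean = sum(x * f for x, f in freq.items()) / n    # weighted mean

# Sum of squared deviations, each weighted by its frequency
ss = sum(f * (x - mean) ** 2 for x, f in freq.items())
variance = ss / (n - 1)                           # sample variance
sd = math.sqrt(variance)

print(round(mean, 2), round(variance, 3), round(sd, 2))
```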
The following sample data of the number of communications are taken from logs of communication with Distance Education students:
5, 9, 5, 23, 27, 55, 34, 7, 30, 15, 22, 60, 14, 52, 297, 8, 51, 15, 51, 35, 15, 39, 137, 43, 38, 14, 93, 7
Compute the mean.
Compute the standard deviation.
Draw a boxplot with the minimum, Q1, Q2, Q3, and maximum.
Which is a better representation of the central tendency: mean or median? Explain.
mean: 42.89
std. dev.: 57.68
Minimum: 5
Q1: 14
Q2: 28.5
Q3: 51
Maximum: 297
The median is; the mean is pulled upward by the extreme values (137 and 297).
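The five-number summary can be sketched as below. Quartile conventions differ between textbooks and software; this sketch assumes the "median of each half" convention (dropping the middle value when n is odd), and `five_number_summary` is a hypothetical helper name.

```python
import statistics

logs = [5, 9, 5, 23, 27, 55, 34, 7, 30, 15, 22, 60, 14, 52, 297, 8, 51,
        15, 51, 35, 15, 39, 137, 43, 38, 14, 93, 7]

def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Skip the middle value when n is odd so both halves have equal size.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (ordered[0], statistics.median(lower),
            statistics.median(ordered), statistics.median(upper), ordered[-1])

summary = five_number_summary(logs)
mean = statistics.mean(logs)

print(summary)
print("mean above median by:", round(mean - summary[2], 2))
```

The gap between the mean and Q2 gives a rough sense of how strongly the extreme values pull the mean.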
If the two largest values in the sample data set of the previous problem were omitted,
Compute the mean.
Compute the standard deviation.
Draw a boxplot with the minimum, Q1, Q2, Q3, and maximum.
Which is a better representation of the central tendency: mean or median? Explain.
mean: 29.50
std. dev.: 21.68
minimum: 5
Q1: 14
Q2: 25
Q3: 43
Maximum: 93
Mean may now be a better measure because extreme outliers have been removed.
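To see how sensitive each statistic is to the two largest values, the sketch below compares the full and trimmed data. The variable names are illustrative, and the sample standard deviation is assumed.

```python
import statistics

logs = [5, 9, 5, 23, 27, 55, 34, 7, 30, 15, 22, 60, 14, 52, 297, 8, 51,
        15, 51, 35, 15, 39, 137, 43, 38, 14, 93, 7]

# Drop the two largest values (137 and 297)
trimmed = sorted(logs)[:-2]

for label, values in (("full", logs), ("trimmed", trimmed)):
    print(label,
          "mean:", round(statistics.mean(values), 2),
          "median:", statistics.median(values),
          "std. dev.:", round(statistics.stdev(values), 2))

# How far each statistic moves when the two values are removed
mean_shift = statistics.mean(logs) - statistics.mean(trimmed)
median_shift = statistics.median(logs) - statistics.median(trimmed)
print("mean shift:", round(mean_shift, 2), "median shift:", median_shift)
```

The mean moves far more than the median, which is the sense in which the median is less sensitive to extreme values.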
Consider the following data set:
21, 34, 18, 26, 30, 35, 24, 29, 25
If this is a population, compute the mean.
If this is a sample, compute the mean.
If this is a population, compute the standard deviation.
If this is a sample, compute the standard deviation.
μ = 26.9
X̄ = 26.9
σ = 5.34
s = 5.67
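In Python's `statistics` module, `pstdev` uses the population formula (divide by N) and `stdev` uses the sample formula (divide by n − 1). A short sketch on this card's data:

```python
import math
import statistics

data = [21, 34, 18, 26, 30, 35, 24, 29, 25]
n = len(data)

mean = statistics.mean(data)     # same formula for population and sample
sigma = statistics.pstdev(data)  # population standard deviation (N)
s = statistics.stdev(data)       # sample standard deviation (n - 1)

print(round(mean, 1), round(sigma, 2), round(s, 2))

# The two differ only by the denominator: s = sigma * sqrt(N / (N - 1))
print(math.isclose(s, sigma * math.sqrt(n / (n - 1))))
```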
If we had a set of ordinal values (not interval/ratio), could you create a boxplot?
Technically yes, because quartiles depend only on the position in the ordered data set. Thus, one could determine the positions in the ordered set for Q1, Q2 (median), and Q3, and the first and last positions for the minimum and maximum. However, without interval/ratio data the distances between categories are undefined, so the box and whisker lengths of a boxplot would have no meaningful scale.
For example, imagine you ask 9 people what size drink they ordered, small, medium, or large. The ordered data might be: small, small, small, medium, medium, large, large, large, large. Q1 is position 2.5 (small), Q2 is position 5 (medium), and Q3 is position 7.5 (large), minimum is position 1 (small) and maximum is position 9 (large).
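The drink-size example can be sketched in code. The category order has to be supplied by hand, and the positions 2.5, 5, and 7.5 are taken from the card rather than computed; `value_at` is a hypothetical helper.

```python
import math

# Ordinal labels have a rank but no numeric spacing, so the order is given explicitly.
order = ["small", "medium", "large"]
responses = ["large", "small", "medium", "large", "small",
             "large", "medium", "small", "large"]

ranked = sorted(responses, key=order.index)

def value_at(position):
    """Category at a 1-based position; a .5 position checks both neighbours."""
    lower = ranked[math.floor(position) - 1]
    upper = ranked[math.ceil(position) - 1]
    # If the neighbours differ, no single category can be reported.
    return lower if lower == upper else (lower, upper)

summary = {
    "min": value_at(1),
    "Q1": value_at(2.5),
    "Q2": value_at(5),
    "Q3": value_at(7.5),
    "max": value_at(len(ranked)),
}
print(summary)
```

Note that a .5 position falling between two different categories cannot be averaged, which is one more reason the boxplot is awkward for ordinal data.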
Typically we consider quantitative data that is symmetric about the mean. If we have a data set that has a few extreme high values, then
a. How is it skewed?
b. Would you use a mean or median? Why?
It is positively skewed (right-skewed)
You would use median since it is less sensitive to extreme values.
- For the MCAT, µ = 500 and σ = 10. What is the probability of an individual getting a score greater than 502.5?
z = (502.5 − 500)/10 = 0.25
p = P(Z > 0.25) = 1 − 0.5987 = 0.4013
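The tail probability can be checked with `statistics.NormalDist` from the Python standard library, a sketch that assumes scores are normally distributed with the stated µ and σ.

```python
from statistics import NormalDist

mu, sigma = 500, 10
score = 502.5

z = (score - mu) / sigma                 # standardized score
mcat = NormalDist(mu=mu, sigma=sigma)

# P(X > score) is the area to the right of the score
p_above = 1 - mcat.cdf(score)

print(z, round(p_above, 4))
```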
- For the MCAT, µ = 500 and σ = 10.
What is the minimum score you would have to obtain to be in the top 5%?
What is the minimum score you would have to obtain to be in the top 2.5%?
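Top 5% (95th percentile): z ≈ 1.645
500 + (1.645 x 10) = 516.45 (516.4 if the table value z = 1.64 is used)
Top 2.5% (97.5th percentile): z = 1.96
500 + (1.96 x 10) = 519.6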
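The cutoffs can also be found with the inverse CDF instead of a z-table. This sketch assumes a normal distribution with the stated µ and σ; because it uses unrounded z-values, the results can differ slightly from table-based answers.

```python
from statistics import NormalDist

mcat = NormalDist(mu=500, sigma=10)

# Top 5% starts at the 95th percentile; top 2.5% at the 97.5th percentile.
cutoff_top_5 = mcat.inv_cdf(0.95)
cutoff_top_2_5 = mcat.inv_cdf(0.975)

# Corresponding z-values from the standard normal distribution
z_95 = NormalDist().inv_cdf(0.95)
z_975 = NormalDist().inv_cdf(0.975)

print(round(z_95, 3), round(cutoff_top_5, 2))
print(round(z_975, 2), round(cutoff_top_2_5, 1))
```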
Correlational method
looking for relationships between variables (correlation or regression)