Chapter 4: Measures of Central Tendency and Variability Flashcards
Variability
a measure of the dispersion or spread of scores in a distribution and ranges from 0 to +∞.
range
the difference between the largest value (L) and smallest value (S) in a data set.
interquartile range (IQR)
the range of values between the upper (Q3) and lower (Q1) quartiles of a data set.
Quartiles
the three values (Q1, Q2, and Q3) that divide a data set into four equal parts or sections, each containing 25% of the data
semi-interquartile range (SIQR)
a measure of half the distance between the upper quartile (Q3) and lower quartile (Q1) of a data set and is computed by dividing the IQR in half.
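The range, IQR, and SIQR can be computed together. This is a minimal sketch that assumes the median-of-halves convention for quartiles, which is one common textbook approach. Other conventions, such as Python's `statistics.quantiles`, can give slightly different Q1 and Q3 values. The function names and scores are hypothetical.

```python
from statistics import median

def quartiles(scores):
    """Return (Q1, Q3) using the median-of-halves convention (an assumption;
    other quartile conventions can give slightly different values)."""
    data = sorted(scores)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two scores")
    half = n // 2
    lower = data[:half]             # scores below the median position
    upper = data[half + (n % 2):]   # scores above it (middle score excluded when n is odd)
    return median(lower), median(upper)

def spread_summary(scores):
    q1, q3 = quartiles(scores)
    iqr = q3 - q1                          # IQR = Q3 - Q1
    return {
        "range": max(scores) - min(scores),  # L - S
        "Q1": q1,
        "Q3": q3,
        "IQR": iqr,
        "SIQR": iqr / 2,                   # half of the IQR
    }

scores = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical scores
print(spread_summary(scores))
# {'range': 7, 'Q1': 2.5, 'Q3': 6.5, 'IQR': 4.0, 'SIQR': 2.0}
```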
Variance
a measure of variability for the average squared distance that scores deviate from their mean.
Population variance
a measure of variability for the average squared distance that scores in a population deviate from the mean. It is computed only when all scores in a given population are recorded.
deviation
the difference between a score and the mean of its distribution (X − M).
sum of squares (SS)
the sum of the squared deviations of scores from their mean. The SS is the numerator in the variance formula.
Sample variance
a measure of variability for the average squared distance that scores in a sample deviate from the mean. It is computed when only a portion or sample of data is measured in a population.
biased estimator
any sample statistic, such as the sample variance when we divide SS by n, obtained from a randomly selected sample that does not equal the value of its respective population parameter, such as a population variance, on average.
unbiased estimator
any sample statistic, such as the sample variance when we divide SS by (n − 1), obtained from a randomly selected sample that equals the value of its respective population parameter, such as a population variance, on average.
When we divide SS by (n − 1), the sample variance is an unbiased estimator of the population variance. This is one reason why researchers place (n − 1) in the denominator of sample variance.
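The unbiasedness claim can be checked exactly on a tiny example. This sketch assumes a hypothetical population {1, 2, 3}. It enumerates every possible sample of size n = 2 drawn with replacement, which is the setting where the (n − 1) result holds exactly. It then averages both versions of the sample variance over all samples, using exact fractions.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical tiny population
n = 2                   # sample size

def ss(scores):
    # Sum of squared deviations from the mean, computed exactly.
    m = Fraction(sum(scores), len(scores))
    return sum((x - m) ** 2 for x in scores)

pop_var = ss(population) / len(population)  # population variance: SS / N

# Every ordered sample of size n drawn with replacement (3**2 = 9 samples).
samples = list(product(population, repeat=n))

avg_biased = sum(ss(s) / n for s in samples) / len(samples)          # divide SS by n
avg_unbiased = sum(ss(s) / (n - 1) for s in samples) / len(samples)  # divide SS by (n - 1)

print(pop_var, avg_biased, avg_unbiased)  # 2/3 1/3 2/3
```

Averaged over all samples, SS/(n − 1) equals the population variance of 2/3 exactly. SS/n averages to only 1/3, so on average it underestimates the population variance.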
definitional formula for variance
a way to calculate the population variance and sample variance that requires summing the squared differences of scores from their mean to compute the SS in the numerator.
computational formula for variance, or the raw scores method for variance
a way to calculate the population variance and sample variance without needing to sum the squared differences of scores from their mean to compute the SS in the numerator.
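Both formulas produce the same SS. This minimal sketch uses the definitional form Σ(X − M)² and the computational (raw scores) form ΣX² − (ΣX)²/n. The scores are hypothetical, and exact fractions are used so the two results can be compared without rounding error.

```python
from fractions import Fraction

def ss_definitional(scores):
    # Definitional formula: sum the squared deviation of each score from the mean.
    # Assumes integer scores so the mean can be stored as an exact Fraction.
    m = Fraction(sum(scores), len(scores))
    return sum((x - m) ** 2 for x in scores)

def ss_computational(scores):
    # Computational formula: sum of squared scores minus (sum of scores) squared over n.
    # Assumes integer scores.
    n = len(scores)
    return sum(x * x for x in scores) - Fraction(sum(scores) ** 2, n)

scores = [3, 5, 7, 9]  # hypothetical scores

ss = ss_definitional(scores)
assert ss == ss_computational(scores)

population_variance = ss / len(scores)    # SS / N
sample_variance = ss / (len(scores) - 1)  # SS / (n - 1)
print(ss, population_variance, sample_variance)  # 20 5 20/3
```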
standard deviation
a measure of variability for the average distance that scores deviate from their mean; it is also called the root mean square deviation. It is calculated by taking the square root of the variance.
population standard deviation
a measure of variability for the average distance that scores in a population deviate from their mean. It is calculated by taking the square root of the population variance.
sample standard deviation
a measure of variability for the average distance that scores in a sample deviate from their mean. It is calculated by taking the square root of the sample variance.
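Python's standard library provides both versions. `statistics.pstdev` divides SS by N, and `statistics.stdev` divides SS by (n − 1). This minimal sketch uses hypothetical scores and checks each value against the square root of the matching variance.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores (M = 5, SS = 32)

pop_sd = statistics.pstdev(scores)    # sqrt(SS / N): treats the scores as a whole population
sample_sd = statistics.stdev(scores)  # sqrt(SS / (n - 1)): treats the scores as a sample

# The standard deviation is the square root of the corresponding variance.
assert math.isclose(pop_sd, math.sqrt(statistics.pvariance(scores)))
assert math.isclose(sample_sd, math.sqrt(statistics.variance(scores)))

print(f"population SD = {pop_sd:.3f}")  # population SD = 2.000
print(f"sample SD     = {sample_sd:.3f}")  # sample SD     = 2.138
```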
empirical rule
states that for data that are normally distributed, at least 99.7% of data lie within three standard deviations of the mean, at least 95% of data lie within two standard deviations of the mean, and at least 68% of data lie within one standard deviation of the mean
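The empirical rule's percentages come from the standard normal distribution. For that distribution, the proportion of data within k standard deviations of the mean is erf(k/√2). Here is a short sketch using `math.erf`; the function name is hypothetical.

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean:
    P(|Z| < k) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.2%}")
# within 1 SD: 68.27%
# within 2 SD: 95.45%
# within 3 SD: 99.73%
```

These exact values are consistent with the rounded 68%, 95%, and 99.7% figures in the rule.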
Chebyshev’s theorem
defines the minimum percentage of data from any distribution that will be contained within k standard deviations of the mean: at least 1 − 1/k² of the data, where k > 1.
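Chebyshev's bound 1 − 1/k² can be tabulated directly. In this sketch the function name is hypothetical, and the normal-distribution proportions are included only for comparison. Because the bound must hold for any distribution, it is much looser than the empirical rule for normal data.

```python
import math

def chebyshev_min_proportion(k):
    """Chebyshev's theorem: at least 1 - 1/k**2 of the scores in any distribution
    lie within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2

for k in (2, 3):
    normal = math.erf(k / math.sqrt(2))  # exact proportion for a normal distribution
    print(f"k = {k}: any distribution >= {chebyshev_min_proportion(k):.2%}, normal = {normal:.2%}")
# k = 2: any distribution >= 75.00%, normal = 95.45%
# k = 3: any distribution >= 88.89%, normal = 99.73%
```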
CHARACTERISTICS OF THE STANDARD DEVIATION
The standard deviation is never negative: SD ≥ 0. The standard deviation is a measure of variability. Data sets can either vary from the mean (SD greater than 0) or not vary at all (SD equal to 0). A negative variability is meaningless.
The standard deviation is used to describe quantitative data. The standard deviation is a numeric value: it is the square root of the variance. For this reason, it is used to describe quantitative data, which can be continuous or discrete.
The standard deviation is most informative when reported with the mean. The standard deviation is the average distance that scores deviate from their mean, so it is most informative to report the mean and the standard deviation together. For normally distributed data, knowing just the mean and standard deviation tells the reader where nearly all of the recorded data lie (at least 99.7% of data fall within 3 SD of the mean). A common way to report the mean and standard deviation in a scientific article is “mean plus or minus standard deviation,” or M ± SD. For example, if a data set consists of scores with M = 16 and SD = 4, then these values can be reported as 16 ± 4.
The value for the standard deviation is affected by the value of each score in a distribution. To change the standard deviation, you must change the distance of scores from the mean and from each other. To illustrate this, consider two cases: one where changing scores in a distribution has no effect on standard deviation and another where the standard deviation is changed.
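Two illustrative cases follow. They use hypothetical scores and are not necessarily the cases the chapter uses. In the first case, adding the same constant to every score shifts the mean but leaves the distances between scores unchanged, so the SD stays the same. In the second case, multiplying every score by a constant stretches those distances, so the SD is multiplied by that constant.

```python
import statistics

scores = [2, 4, 6, 8]                # hypothetical scores
shifted = [x + 10 for x in scores]   # case 1: every score moves by the same amount
scaled = [x * 3 for x in scores]     # case 2: every distance between scores triples

for label, data in (("original", scores), ("shifted", shifted), ("scaled", scaled)):
    print(f"{label:8} M = {statistics.mean(data):5}  SD = {statistics.pstdev(data):.3f}")
# original and shifted share SD = 2.236; scaled has SD = 6.708 (3 x 2.236)
```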
Which measure of central tendency is best for interval and ratio scales?
Mean
Which measure of central tendency is best for ordinal scales?
Median
Which measure of central tendency is best for nominal scales?
Mode
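This minimal sketch pairs each scale of measurement with the measure of central tendency given above, using Python's `statistics` module. The variable names and values are hypothetical.

```python
import statistics

eye_color = ["brown", "blue", "brown", "green", "brown", "blue"]  # nominal: categories only
satisfaction_rank = [1, 2, 2, 3, 4, 5, 5]                         # ordinal: ordered ranks
reaction_time_ms = [310.0, 295.5, 402.0, 330.25]                  # ratio: true zero, equal intervals

print(statistics.mode(eye_color))            # brown (most frequent category)
print(statistics.median(satisfaction_rank))  # 3 (middle rank)
print(statistics.mean(reaction_time_ms))     # 334.4375
```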
Mean and median positions in negatively skewed data
The mean is located to the left of the median.
Mean and median positions in positively skewed data
The mean is located to the right of the median.
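Here is a hypothetical illustration. A single extreme score in one tail pulls the mean toward that tail, while the median stays at the middle of the ordered scores.

```python
import statistics

positive_skew = [1, 2, 2, 3, 3, 3, 4, 20]        # one extreme high score (tail to the right)
negative_skew = [1, 17, 18, 18, 19, 19, 19, 20]  # one extreme low score (tail to the left)

for label, scores in (("positive", positive_skew), ("negative", negative_skew)):
    m, md = statistics.mean(scores), statistics.median(scores)
    side = "right" if m > md else "left"
    print(f"{label} skew: mean = {m}, median = {md}, mean is to the {side} of the median")
```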
how to calculate the weighted mean
weighted sum/combined n
multiply each sample mean (M) by its respective sample size (n)
add the sample sizes together.
To compute the weighted mean, we find the product, M × n, for each sample. This weights the mean of each sample by its sample size. By adding these products, we arrive at the weighted sum.
Then, we divide the weighted sum by the combined sample size (n), which is computed by adding the sample sizes in the denominator.
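The steps above can be sketched as follows. The function name and the two samples are hypothetical. The sketch also checks the result against the mean of all scores pooled together.

```python
def weighted_mean(means, sizes):
    # Weighted sum: multiply each sample mean by its sample size, then add the products.
    weighted_sum = sum(m * n for m, n in zip(means, sizes))
    # Combined sample size: add the sample sizes together.
    combined_n = sum(sizes)
    return weighted_sum / combined_n

sample_a = [2, 4]     # hypothetical sample: M = 3, n = 2
sample_b = [4, 5, 6]  # hypothetical sample: M = 5, n = 3

means = [sum(s) / len(s) for s in (sample_a, sample_b)]
sizes = [len(s) for s in (sample_a, sample_b)]

print(weighted_mean(means, sizes))                          # 4.2
print(sum(sample_a + sample_b) / len(sample_a + sample_b))  # 4.2, mean of all five scores pooled
```

The weighted mean (4.2) matches the mean of the pooled scores. Averaging the two sample means without weights would give 4.0 instead.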
How to calculate the median from a frequency distribution
$\text{Median} = LL + W\left(\frac{0.5n - cf}{f}\right)$
LL –> the lower “real limit” of the interval that contains the median
This is half a unit smaller than the apparent lower limit
W –> the interval width
If the data are whole numbers and ungrouped, this is always 1.
n –> the sample size
cf –> the cumulative frequency of scores below the interval that contains the median
f –> the number of scores (frequency) in the interval that contains the median
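This minimal sketch applies the formula to ungrouped whole-number data. It assumes each score is its own interval, so W = 1 and LL is the score minus 0.5. The function name and scores are hypothetical.

```python
from collections import Counter

def median_from_frequencies(scores):
    """Median = LL + W * (0.5*n - cf) / f, assuming ungrouped whole-number scores
    (each score is its own interval, so W = 1)."""
    freq = Counter(scores)
    n = sum(freq.values())
    if n == 0:
        raise ValueError("no scores")
    width = 1
    cf = 0  # cumulative frequency of scores below the current interval
    for score in sorted(freq):
        f = freq[score]
        if cf + f >= 0.5 * n:  # first interval whose cumulative frequency reaches n/2
            ll = score - 0.5   # lower real limit: half a unit below the apparent limit
            return ll + width * (0.5 * n - cf) / f
        cf += f

scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7]  # hypothetical scores, n = 10
print(round(median_from_frequencies(scores), 3))  # 4.167  (3.5 + 1 * (5 - 3) / 3)
```

The simple middle-score median of these data is 4. The formula gives 4.167 because it interpolates within the interval from 3.5 to 4.5, treating the three scores of 4 as spread across that interval.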