Chapter 12: Data-Based and Statistical Reasoning Flashcards
What are the measures of central tendency?
Measures of central tendency are those that describe the middle of a sample.
The three measures of central tendency are the mean, median, and mode.
What is the mean? Can it be used for populations and samples? When is it useful? What causes a skewed mean?
The mean or average of a set of data (the arithmetic mean) is calculated by adding up all of the individual values within the data set and dividing the result by the number of values.
The mean may be a parameter or a statistic (a parameter for a population, a statistic for a sample).
Mean values are a good indicator of central tendency when all of the values tend to be fairly close to one another.
An outlier, an extremely large or extremely small value compared to the other data values, can shift the mean toward one end of the range.
Example mean page 437
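A minimal Python sketch of the mean calculation described above. The data values are made up for illustration and are not the textbook's example; the point is only to show how a single outlier pulls the mean toward one end of the range.

```python
# Hypothetical data set: values that sit fairly close to one another.
values = [4, 5, 5, 6, 5]

# Mean = sum of all values divided by the number of values.
manual_mean = sum(values) / len(values)

# Adding one extremely large value (an outlier) shifts the mean upward.
with_outlier = values + [65]
outlier_mean = sum(with_outlier) / len(with_outlier)

print(manual_mean)    # mean of the tightly grouped values
print(outlier_mean)   # mean after the outlier is added
```

Here the first mean is 5.0, which describes the data well; the second is 15.0, which is larger than every value except the outlier itself.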
What is the median? How do we calculate the median? When is the use of a median appropriate or not? Are medians susceptible to outliers?
The median value for a set of data is its midpoint, where half of the data points are greater than the value and half are smaller. In data sets with an odd number of values, the median will actually be one of the data points. In data sets with an even number of values, the median will be the mean of the two central data points.
The data must first be listed in increasing order; the median can then be calculated as the image shows.
The median tends to be the measure least susceptible to outliers, but it may not be useful for data sets with very large ranges or multiple modes.
Median example page 438
If n, the number of data points, is even, the median will be the average of the two center data points.
If n is odd, the median will be the center data point.
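The odd and even cases above can be sketched in Python as follows. The function name and the sample values are illustrative assumptions, not part of the textbook example.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)   # the data must be in increasing order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the median is the center data point.
        return ordered[mid]
    # Even n: the median is the mean of the two center data points.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 5]))             # odd n
print(median([7, 1, 5, 3]))          # even n
print(median([1, 3, 5, 7, 1000]))    # a large outlier does not move the median
```

The last call illustrates why the median resists outliers: replacing 9 with 1000 at the top of the list would leave the center data point unchanged.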
What can be implied if the mean and the median are far from each other? Close together?
If the mean and the median are far from each other, this implies the presence of an outlier or a skewed distribution. If the mean and median are very close, this implies a symmetrical distribution.
What is the mode? When are modes used?
The mode is the number that appears most often in a set of data.
There may be multiple modes in a data set, or, if all numbers appear equally often, there may be no mode at all.
The mode is not typically used as a measure of central tendency for a set of data, but the number of modes and their distance from one another is often informative. If the data set has two modes with a small number of values between them, it may be useful to analyze these portions separately, or to look for other variables that may be responsible for dividing the distribution in two parts.
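A short Python sketch of finding modes, following the convention stated above that a data set in which every number appears equally often has no mode. The helper name `modes` and the sample data are hypothetical.

```python
from collections import Counter

def modes(values):
    """Return the sorted list of modes of a non-empty list.

    Returns an empty list when every distinct value appears equally often
    (the "no mode" convention used in these notes).
    """
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 2, 2, 3]))        # one mode
print(modes([1, 1, 2, 3, 3]))     # two modes
print(modes([1, 2, 3]))           # every value appears once: no mode
```

When two modes are returned, the distance between them is the quantity the card above suggests looking at.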
MCAT concept check central tendency 12.1 page 439 question 1
What types of data sets are best analyzed using the mean as a measure of central tendency?
The mean is the best measure of central tendency for a data set with a relatively normal distribution. The mean performs poorly in data sets with outliers.
MCAT concept check central tendency 12.1 page 439 question 2
What is a normal distribution? What are the mean, median, and mode for a normal distribution? What percentage of the distribution is within one standard deviation of the mean? Within two? Three?
What is the standard distribution?
A normal distribution can be transformed into a standard distribution with a mean of zero and a standard deviation of one. The newly generated curve can then be used to get information about probability or percentages of populations.
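One common way to carry out this transformation is the z-score, z = (x - mean) / standard deviation. The card above does not spell out the formula, so treat the sketch below as an illustration under that assumption; the mean of 100 and standard deviation of 15 are made-up numbers.

```python
from statistics import NormalDist

# Hypothetical normal distribution: mean 100, standard deviation 15.
mu, sigma = 100.0, 15.0

def z_score(x):
    """Convert a raw value to the standard distribution (mean 0, SD 1)."""
    return (x - mu) / sigma

# The standard distribution itself.
standard = NormalDist(mu=0.0, sigma=1.0)

z = z_score(130.0)                  # how many SDs above the mean 130 lies
fraction_below = standard.cdf(z)    # fraction of the population below 130

print(z)
print(round(fraction_below, 4))
```

A value of 130 lies two standard deviations above the mean, so a little under 98% of this hypothetical population falls below it.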
What is a skewed distribution? Where will the mean, median, and mode be on negative or positive skewed distributions?
A skewed distribution is an asymmetric distribution that contains a tail on one side or the other of the data.
The tail points in the direction of the skew. A tail on the left means the distribution is skewed left (negatively skewed). A tail on the right means it is skewed right (positively skewed).
What is bimodal distribution?
A distribution containing two peaks with a valley in between is called bimodal.
Bimodal distributions might only have one mode if one peak is slightly higher than the other. However, even when the peaks are of two different sizes, we still call the distribution bimodal.
If there is a sufficient separation of the two peaks, or a sufficiently small amount of data within the valley region, bimodal distributions can often be analyzed as two separate distributions. However, they do not HAVE to be analyzed as two separate distributions.
MCAT concept check distributions 12.2 page 442 question 1
How do the mean, median, and mode compare for a right skewed distribution?
The mean of a right (positively) skewed distribution is to the right of the median, which is to the right of the mode.
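The ordering described above can be checked on a small right-skewed data set. The values below are invented for illustration: most of them cluster at the low end, with a tail stretching to the right.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data: a cluster of small values and a long right tail.
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

mode_value = mode(data)       # most frequent value
median_value = median(data)   # midpoint of the ordered data
mean_value = mean(data)       # pulled toward the tail

print(mode_value, median_value, mean_value)
```

For this data set the mode is 2, the median is 3, and the mean is 5, so the mean sits furthest toward the tail.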
MCAT concept check distributions 12.2 page 442 question 2
Can data that do not follow a normal distribution be analyzed with measures of central tendency and measures of distribution? Why or why not?
Yes. A distribution that is not normal may still be analyzed with these measures. By virtue of the central limit theorem, the means of sufficiently large samples drawn from a distribution tend toward a normal distribution, even when the underlying data are not normally distributed.
MCAT concept check distributions 12.2 page 442 question 3
What is the difference between normal or skewed distributions, and bimodal distributions?
Bimodal distributions have two peaks, whereas normal or skewed distributions have only one.
What is the range of a data set? How do we calculate the range of a data set? What do we do when we cannot calculate the standard deviation for a normal distribution?
The range of a data set is the difference between its largest and smallest values.
Range does not consider the number of items in the data set, nor does it consider the placement of any measures of central tendency. Range is therefore heavily affected by the presence of data outliers.
When the standard deviation of a normal distribution cannot be calculated because the entire data set is not provided, it can be approximated as one-fourth of the range.
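A brief Python sketch of the range and of the rough range/4 estimate described above. The data are hypothetical, and the estimate is only meaningful when the data are approximately normal and free of outliers.

```python
# Hypothetical data set with no obvious outlier.
values = [11, 12, 14, 15, 18]

data_range = max(values) - min(values)   # largest value minus smallest value
approx_sd = data_range / 4               # rough estimate of the standard deviation

# A single outlier changes the range (and the estimate) dramatically.
with_outlier = values + [31]
outlier_range = max(with_outlier) - min(with_outlier)

print(data_range, approx_sd, outlier_range)
```

The range goes from 7 to 20 when the single value 31 is added, which is why the shortcut should not be trusted when outliers are present.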
What is the interquartile range?
Interquartile range is related to the median, first, and third quartiles. Quartiles, including the median (Q2), divide data (when placed in ascending order) into groups that comprise one-fourth of the entire set.
How do we calculate quartiles? How do we calculate interquartile range (IQR)?
Interquartile range calculation example page 443.
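The textbook's own procedure is not reproduced in this card, so the sketch below uses one common convention as an assumption: Q1 is the median of the lower half of the ordered data and Q3 is the median of the upper half (the middle value is left out of both halves when n is odd). Other conventions exist and can give slightly different quartiles. The IQR is then Q3 minus Q1.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) for a list of at least two numbers.

    Uses the median-of-halves convention, which is an assumption here
    and may differ from the textbook's procedure.
    """
    ordered = sorted(values)          # quartiles require ascending order
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # values below the median position
    upper = ordered[half + n % 2:]    # values above it (middle value skipped if n is odd)
    return median(lower), median(ordered), median(upper)

data = [1, 3, 5, 7, 9, 11, 13, 15]    # hypothetical data set
q1, q2, q3 = quartiles(data)
iqr = q3 - q1                          # interquartile range

print(q1, q2, q3, iqr)
```

For this data set the quartiles are 4, 8, and 12, giving an IQR of 8.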
What is standard deviation? How do we calculate it?
Standard deviation is an informative measure of distribution. It is calculated relative to the mean of the data.
Standard deviation is calculated by taking the difference between each data point and the mean, squaring each difference, summing the squared differences, dividing that sum by the number of points in the data set minus one, and then taking the square root of the result.
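The steps above translate directly into a few lines of Python. The function name and data values are illustrative assumptions rather than the textbook's worked example.

```python
from math import sqrt

def sample_sd(values):
    """Sample standard deviation of a list with at least two numbers."""
    n = len(values)
    mean = sum(values) / n
    # Difference between each data point and the mean, squared.
    squared_diffs = [(x - mean) ** 2 for x in values]
    # Sum of the squared values divided by (number of points - 1).
    variance = sum(squared_diffs) / (n - 1)
    # Square root of the result.
    return sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean = 5
sd = sample_sd(data)
print(round(sd, 3))
```

The squared differences here sum to 32, so the variance is 32/7 and the standard deviation is a little over 2.1. Python's built-in `statistics.stdev` uses the same n - 1 denominator and can be used to check the result.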
How do we determine an outlier using standard deviation?
Another definition of an outlier is any value that lies more than three standard deviations from the mean.
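A minimal sketch of this three-standard-deviation rule in Python. The function name, the threshold parameter, and the data are hypothetical.

```python
from statistics import mean, stdev

def outliers_by_sd(values, threshold=3.0):
    """Return the values lying more than `threshold` SDs from the mean."""
    m = mean(values)
    sd = stdev(values)   # sample standard deviation (n - 1 denominator)
    return [x for x in values if abs(x - m) > threshold * sd]

# Hypothetical data: twenty values near 10 plus one extreme value.
data = [9, 10, 11] * 6 + [10, 10, 50]
print(outliers_by_sd(data))

# In a very small sample, one extreme value inflates the SD so much
# that it may not exceed the three-SD cutoff.
print(outliers_by_sd([1, 2, 3, 4, 100]))
```

The first call flags 50; the second flags nothing, which shows that this definition works best with larger data sets.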
Standard deviation calculation example page 445
What percentage of the data points fall within one standard deviation of the mean? Two standard deviations? Three?
For a normal distribution, approximately 68% fall within one standard deviation.
Approximately 95% fall within two.
Approximately 99.7% (often rounded to 99%) fall within three.
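These percentages apply to a normal distribution and are rounded. As a quick check, they can be recomputed from the standard distribution using Python's `statistics.NormalDist`; the helper name `fraction_within` is an illustrative assumption.

```python
from statistics import NormalDist

# Standard distribution: mean 0, standard deviation 1.
standard = NormalDist(mu=0.0, sigma=1.0)

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard.cdf(k) - standard.cdf(-k)

for k in (1, 2, 3):
    print(k, round(fraction_within(k) * 100, 1))
```

The printed values are slightly more precise versions of the figures memorized for the exam.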