Ch. 12: Data-Based and Statistical Reasoning Flashcards
defn: measures of central tendency
those that describe the middle of a sample
how do we find the mean + aka?
aka: average, arithmetic mean
add up all the individual values within the data set and divide the result by the number of values
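The calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `mean` and the sample values are made up for this card.

```python
def mean(values):
    """Arithmetic mean: add up the values, then divide by how many there are."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)


scores = [4, 5, 5, 6, 5]            # made-up values that sit close together
print(mean(scores))                 # 5.0

with_outlier = [4, 5, 5, 6, 50]     # same set, with one extreme value swapped in
print(mean(with_outlier))           # 14.0
```

One extreme value is enough to pull the mean well away from where most of the data sits.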
when are means good indicators of central tendency?
when all of the values tend to be fairly close to one another
defn + impact: outlier
an extremely large or extremely small value compared to the other data values (can shift the mean toward one end of the range)
defn + how to find: median
the midpoint of a set of data (half of data points are greater than the value and half are smaller)
in data sets with an odd number of values, the median will be one of the data points
in data sets with an even number of values, the median will be the mean of the two central data points
to calculate, first sort the data in increasing order
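A minimal sketch of the odd and even cases described above. The function name `median` and the sample values are illustrative assumptions.

```python
def median(values):
    """Midpoint of the data: sort first, then pick or average the central values."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)        # organize the data in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]         # odd count: the median is one of the data points
    # even count: the mean of the two central data points
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 3]))      # 3
print(median([7, 1, 3, 5]))   # 4.0
```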
when is the median a good tool to use? when is it not helpful?
GOOD FOR: data sets with outliers (of the measures of central tendency, it is the least susceptible to them)
BAD FOR: may not be useful for data sets with large ranges or multiple modes
what does it mean if the mean and median are far from each other? if they are close to each other?
IF FAR: this implies the presence of outliers or a skewed distribution
IF CLOSE: implies a symmetrical distribution
defn: mode
the number that appears the most often in a set of data
there may be multiple modes (or even no mode!)
peaks represent modes in a data set
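A small sketch of finding modes, including the multiple-mode and no-mode cases. The function name `modes` is hypothetical, and treating "no value repeats" as "no mode" is a convention assumed here.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest count, or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []                   # assumed convention: no repeats means no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([4, 4, 5]))          # [4]
print(modes([1, 2, 2, 3, 3]))    # [2, 3]
print(modes([1, 2, 3]))          # []
```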
is the mode a measure of central tendency?
it is not typically used as one, but the number of modes and their distances from one another are informative
what does it mean to “solve” a normal distribution?
we can transform any normal distribution to a STANDARD normal distribution (mean of zero, standard deviation of one) and then use the newly generated curve to get information about probabilities or percentages of populations
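One way to picture the transformation is the z-score, z = (x - mean) / SD. The sketch below is an illustration only: the function names and the example population (mean 100, SD 15) are assumptions, and `math.erf` is used to evaluate the standard normal curve.

```python
import math


def z_score(x, mu, sigma):
    """Rescale x so the distribution has mean 0 and standard deviation 1."""
    return (x - mu) / sigma


def standard_normal_cdf(z):
    """Fraction of a standard normal population that falls below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Hypothetical population: mean 100, SD 15. Where does a value of 130 fall?
z = z_score(130, 100, 15)
print(z)                                   # 2.0
print(round(standard_normal_cdf(z), 3))    # 0.977
```

So a value two standard deviations above the mean sits above roughly 97.7% of the population.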
what is the basis of the bell curve?
the normal distribution
what % of the distribution (normal) is within one SD? within 2 SD? within 3 SD?
1 SD: 68%
2 SD: 95%
3 SD: 99.7% (often rounded to 99%)
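These rounded percentages can be checked numerically. A minimal sketch, assuming an ideal normal distribution; the helper name `fraction_within` is made up here.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(100 * fraction_within(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```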
defn: skewed distribution
one that contains a tail on one side or the other of the data set
why are skewed distributions often confusing?
the VISUAL shift in the data appears OPPOSITE the direction of the skew
the direction of a skew in a sample is determined by its TAIL, not the bulk of the distribution
defn: negatively vs. positively skewed distribution
NEGATIVELY = tail on left (negative) side
POSITIVELY = tail on right (positive) side
why is the mean of a negatively skewed distribution lower than the median?
why is the mean of a positively skewed distribution higher than the median?
because the mean is more susceptible to outliers than the median
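A quick illustration with made-up samples, using Python's standard `statistics` module. The tail drags the mean toward it, while the median barely moves.

```python
from statistics import mean, median

# Made-up samples: the bulk sits near 3 (or -3), with one value far out in the tail.
positively_skewed = [1, 2, 2, 3, 3, 3, 4, 20]          # tail on the right
negatively_skewed = [-20, -4, -3, -3, -3, -2, -2, -1]  # tail on the left

print(mean(positively_skewed), median(positively_skewed))   # 4.75 3.0
print(mean(negatively_skewed), median(negatively_skewed))   # -4.75 -3.0
```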
defn: bimodal
a distribution containing 2 peaks with a valley
note: it might only have one actual MODE if one peak is slightly higher than the other
in what circumstances can we (but don’t have to!) analyze bimodal distributions as two separate distributions?
if there is sufficient separation of the two peaks, or a sufficiently small amount of data within the valley region
can measures of central tendency and measures of distribution be applied to bimodal distributions?
Yes!
defn: range
the difference between the largest and smallest values in a data set
what is range affected heavily by?
the presence of data outliers
what is an estimate of the SD based on the range when it is not possible to calculate the SD?
SD is approx. 1/4 of the range
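A sketch of the range and the quick SD estimate. The function names and sample values are illustrative, and the result is only a rough estimate, not a calculated SD.

```python
def data_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)


def estimate_sd_from_range(values):
    """Rough estimate only: SD is approximately one-fourth of the range."""
    return data_range(values) / 4


data = [2, 4, 4, 5, 7, 10]             # made-up values
print(data_range(data))                # 8
print(estimate_sd_from_range(data))    # 2.0
```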
defn: quartile
values that divide the data (when placed in ascending order) into four groups, each comprising one-fourth of the entire set
what are the 4 steps to calculating the quartiles?
- to find the position of Q1 in a set of data sorted in ascending order, multiply n by 0.25
- if this is a whole number, the quartile is the mean of the value at this position and the next highest position
- if this is a decimal, round up to the next whole number and take that as the quartile position
- to calculate the position of Q3 multiply the value of n by 0.75. Again, if this is a whole number, take the mean of this position and the next. If it is a decimal, round up to the next whole number, and take that as the quartile position
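The four steps above can be sketched as one helper. The function name `quartile` and the sample data are assumptions for illustration. Other quartile conventions exist, so statistics libraries may give slightly different answers.

```python
import math


def quartile(values, fraction):
    """Quartile by the position rule above; fraction is 0.25 for Q1, 0.75 for Q3."""
    ordered = sorted(values)            # data must be in ascending order
    n = len(ordered)
    position = n * fraction             # 1-based position in the sorted data
    if position.is_integer():
        k = int(position)
        # whole number: mean of the value at this position and the next one
        return (ordered[k - 1] + ordered[k]) / 2
    # decimal: round up and take the value at that position
    k = math.ceil(position)
    return ordered[k - 1]


data = [1, 3, 4, 6, 8, 9, 12, 15]       # n = 8, so positions are 2 and 6
print(quartile(data, 0.25))             # 3.5
print(quartile(data, 0.75))             # 10.5

odd = [1, 3, 4, 6, 8, 9, 12]            # n = 7, positions 1.75 and 5.25 round up
print(quartile(odd, 0.25))              # 3
print(quartile(odd, 0.75))              # 9
```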
how do you calculate the interquartile range?
IQR = Q3 - Q1
what is the IQR helpful for determining? + how?
outliers
any value that falls more than 1.5 IQRs below the first quartile or above the third quartile is considered an outlier
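A minimal sketch of the 1.5 × IQR rule. The function name `find_outliers` is hypothetical, and Q1 and Q3 are passed in by the caller; the values used here were worked out by hand for this made-up data set using the quartile steps in these cards.

```python
def find_outliers(values, q1, q3):
    """Flag values more than 1.5 IQRs below Q1 or above Q3."""
    iqr = q3 - q1
    low_cutoff = q1 - 1.5 * iqr
    high_cutoff = q3 + 1.5 * iqr
    return [v for v in values if v < low_cutoff or v > high_cutoff]


data = [1, 3, 4, 6, 8, 9, 12, 40]       # made-up values
q1, q3 = 3.5, 10.5                      # IQR = 7, so the cutoffs are -7 and 21
print(find_outliers(data, q1, q3))      # [40]
```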
what is the most informative measure of distribution?
standard deviation
how is std dev calculated (in words)?
by taking the difference between each data point and the mean, squaring this value, dividing the sum of all of these squared values by the number of points in the data set minus one, and then taking the square root of the result
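The description above, written out step by step. The function name `sample_sd` and the data are illustrative. Dividing by n minus one makes this the sample standard deviation.

```python
import math


def sample_sd(values):
    """Standard deviation with the n - 1 divisor, following the steps above."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    m = sum(values) / n                                    # the mean
    squared_diffs = [(x - m) ** 2 for x in values]         # difference from mean, squared
    return math.sqrt(sum(squared_diffs) / (n - 1))         # divide by n - 1, then square root


data = [2, 4, 4, 4, 5, 5, 7, 9]          # made-up values with mean 5
print(round(sample_sd(data), 3))         # 2.138
```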