Chapter 3: Describing, Exploring, and Comparing Data Flashcards
Measure of Center
the value at the center or middle of a data set
Arithmetic Mean (Mean)
the measure of center obtained by adding the values and dividing the total by the number of values; what most people call an average
median
the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude
Are median values affected by extreme values?
Not substantially. The median is a resistant measure of center: a few extreme values do not change it much.
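A quick way to see this resistance is to compare the mean and the median before and after one value is replaced by an extreme value. This is a minimal Python sketch using the standard-library `statistics` module; the two small data sets are made up for illustration.

```python
from statistics import mean, median

# Hypothetical data: the second list swaps the largest value for an extreme one.
values = [2, 3, 5, 7, 8]
with_outlier = [2, 3, 5, 7, 800]

print(mean(values), median(values))              # 5 5
print(mean(with_outlier), median(with_outlier))  # 163.4 5
```

The single extreme value drags the mean far upward, while the median stays at 5.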
How is the median found for a data set that has an odd number of values?
1. Sort the values. 2. The median is the number located in the exact middle of the sorted list.
How is the median found for a data set that has an even number of values?
1. Sort the values. 2. The median is found by computing the mean of the two middle numbers.
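Both cases of the sort-then-locate procedure fit in one small function. A minimal sketch; the name `median_of` is hypothetical, and in practice `statistics.median` does the same job.

```python
def median_of(data):
    """Median by sorting, then taking the middle value (odd n) or the mean of the two middle values (even n)."""
    ordered = sorted(data)               # step 1: sort the values
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                       # odd number of values: exact middle
        return ordered[mid]
    # even number of values: mean of the two middle numbers
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([5, 1, 4]))      # 4    (sorted: 1, 4, 5)
print(median_of([5, 1, 4, 2]))   # 3.0  (sorted: 1, 2, 4, 5)
```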
mode
the value that occurs with the greatest frequency
bimodal
two data values occur with the same greatest frequency
multimodal
more than two data values occur with the same greatest frequency
no mode
no data value is repeated
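The mode cases above (one mode, bimodal, multimodal, no mode) can be sketched in Python with `collections.Counter`. The function name `describe_mode` and its labels are hypothetical. Because it only counts how often each value occurs, it works on nominal (text) data as well as numbers.

```python
from collections import Counter

def describe_mode(data):
    """Classify a non-empty data set by its mode(s)."""
    counts = Counter(data)               # frequency of each distinct value
    top = max(counts.values())           # the greatest frequency
    if top == 1:                         # no data value is repeated
        return "no mode", []
    # every value that reaches the greatest frequency, in order of first appearance
    modes = [value for value, c in counts.items() if c == top]
    if len(modes) == 1:
        return "one mode", modes
    if len(modes) == 2:
        return "bimodal", modes
    return "multimodal", modes

print(describe_mode(["red", "blue", "red"]))   # ('one mode', ['red'])
print(describe_mode([1, 1, 2, 2, 3]))          # ('bimodal', [1, 2])
print(describe_mode([1, 1, 2, 2, 3, 3, 4]))    # ('multimodal', [1, 2, 3])
print(describe_mode([4, 8, 15]))               # ('no mode', [])
```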
Which measure of central tendency can be used with nominal data?
Only the mode
midrange
the value midway between the maximum and minimum values in the original data set
How is the midrange calculated?
(max value + min value)/2
What is the range of a set of data values?
The difference between the max data value and the min data value.
How is the range of a set of data values calculated?
range= (max value) - (min value)
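The midrange and range formulas both use only the maximum and minimum, so they translate directly into code. A minimal sketch; the function names and the sample list are hypothetical.

```python
def data_range(data):
    """Range = (max value) - (min value)."""
    return max(data) - min(data)

def midrange(data):
    """Midrange = (max value + min value) / 2."""
    return (max(data) + min(data)) / 2

sample = [12, 7, 3, 20, 9]
print(data_range(sample))  # 17   (20 - 3)
print(midrange(sample))    # 11.5 ((20 + 3) / 2)
```

Because both depend only on the two most extreme values, neither is resistant: one outlier changes them completely.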
What is the standard deviation of a set of sample values?
A measure of how much data values deviate from the mean.
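This card gives no formula; the usual sample formula is s = sqrt( sum of (x - mean)^2 / (n - 1) ). A minimal sketch of that formula, checked against the standard library's `statistics.stdev`; the name `sample_sd` and the data are hypothetical.

```python
import math
import statistics

def sample_sd(data):
    """Sample standard deviation: square root of (sum of squared deviations) / (n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    xbar = sum(data) / n                              # sample mean
    squared_devs = sum((x - xbar) ** 2 for x in data)  # sum of (x - mean)^2
    return math.sqrt(squared_devs / (n - 1))

sample = [2, 4, 4, 4, 5, 5, 7, 9]               # mean 5, sum of squared deviations 32
print(round(sample_sd(sample), 4))              # 2.1381
print(round(statistics.stdev(sample), 4))       # 2.1381 (standard-library result)
```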
Can the value of a standard deviation be negative?
No. A standard deviation is never negative; it equals zero only when all data values are the same.
What units are standard deviations expressed in?
The units are the same as the units of the original data values.
What is the range rule of thumb for understanding standard deviation?
For many data sets, the vast majority (such as 95%) of sample values lie within two standard deviations of the mean.
How are “usual” values in a data set determined using the range rule of thumb?
Usual values lie between (mean) - 2*(standard deviation) and (mean) + 2*(standard deviation).
Using the range rule of thumb, how are standard deviations roughly estimated from a collection of known samples?
range/4
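Both uses of the range rule of thumb are one-line calculations. A minimal sketch; the function names and the mean 100 / standard deviation 15 example are hypothetical, and both results are only rough guides.

```python
def usual_limits(mean, sd):
    """Range rule of thumb: usual values lie within 2 standard deviations of the mean."""
    return mean - 2 * sd, mean + 2 * sd

def rough_sd(data):
    """Range rule of thumb estimate of the standard deviation: range / 4."""
    return (max(data) - min(data)) / 4

print(usual_limits(100, 15))        # (70, 130)
print(rough_sd([3, 7, 9, 12, 20]))  # 4.25  (range 17 divided by 4)
```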
variance
a measure of variation equal to the square of the standard deviation
Why is the sample variance s^2 an unbiased estimator of the population variance?
Because the values of s^2 tend to target the value of the population variance instead of systematically overestimating or underestimating it.
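"Unbiased" describes the long-run average of s^2 over many samples, so a simulation shows it better than any single data set. A minimal sketch, assuming independent samples of size 3 drawn with replacement from a small made-up population (the faces of a fair die): the average of s^2, which divides by n - 1, should land near the population variance, while dividing the same sums of squares by n comes out systematically too small.

```python
import random
import statistics

random.seed(1)                               # fixed seed so the run is repeatable
population = [1, 2, 3, 4, 5, 6]              # hypothetical population: faces of a fair die
sigma2 = statistics.pvariance(population)    # population variance = 35/12

trials, n = 10_000, 3
with_n_minus_1 = []                          # sample variance s^2 (divides by n - 1)
with_n = []                                  # same sums of squares divided by n instead
for _ in range(trials):
    sample = random.choices(population, k=n)          # draw with replacement
    with_n_minus_1.append(statistics.variance(sample))
    with_n.append(statistics.pvariance(sample))

print(round(sigma2, 3))                               # 2.917
print(round(statistics.mean(with_n_minus_1), 3))      # close to 2.917
print(round(statistics.mean(with_n), 3))              # close to 2/3 of that, i.e. too small
```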
What is the empirical rule?
For data sets having a distribution that is approximately bell shaped, the following properties apply: ~68% of all values fall within 1 standard deviation of the mean; ~95% of all values fall within 2 standard deviations of the mean; ~99.7% of all values fall within 3 standard deviations of the mean.
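The empirical-rule percentages are rounded versions of exact areas under a normal curve. Assuming a perfectly normal distribution, the standard library's `statistics.NormalDist` can reproduce them:

```python
from statistics import NormalDist

z = NormalDist()   # standard normal distribution: mean 0, standard deviation 1

# Proportion of values within k standard deviations of the mean
shares = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(k, round(share, 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

Real data that are only roughly bell shaped will match these proportions only approximately.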
What is Chebyshev’s Theorem?
The proportion (or fraction) of any set of data lying within K standard deviations of the mean is always at least 1 - 1/K^2, where K is any positive number greater than 1.
Using Chebyshev’s Theorem, what does K=2 mean?
At least 3/4 (or 75%) of all values lie within 2 standard deviations of the mean.
Using Chebyshev’s Theorem, what does K=3 mean?
At least 8/9 (about 89%) of all values lie within 3 standard deviations of the mean.
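A minimal sketch of Chebyshev's bound, checked against a deliberately skewed (not bell shaped) made-up data set. The function names are hypothetical; `share_within` treats the list as the whole data set, so it assumes the population formula `statistics.pstdev`.

```python
from fractions import Fraction
import statistics

def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations of the mean: 1 - 1/k^2."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / Fraction(k) ** 2

def share_within(data, k):
    """Observed proportion of values within k standard deviations of the mean."""
    m = statistics.mean(data)
    s = statistics.pstdev(data)          # population formula: the list is the whole data set
    inside = sum(1 for x in data if m - k * s <= x <= m + k * s)
    return inside / len(data)

skewed = [1, 1, 1, 1, 1, 1, 1, 1, 1, 50]        # hypothetical, far from bell shaped
print(chebyshev_bound(2), chebyshev_bound(3))   # 3/4 8/9
print(share_within(skewed, 2))                  # 0.9, which is at least 3/4
```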
What is the coefficient of variation?
For a set of nonnegative sample or population data, the coefficient of variation (CV) describes the standard deviation relative to the mean, expressed as a percent.
How is the coefficient of variation calculated?
CV = (standard deviation / mean) * 100%
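A minimal sketch of the CV calculation for sample data; the function name and data are hypothetical. For population data, the population standard deviation (`statistics.pstdev`) and population mean would be used instead.

```python
import statistics

def coefficient_of_variation(data):
    """CV as a percent: sample standard deviation relative to the sample mean."""
    m = statistics.mean(data)
    if m <= 0:
        raise ValueError("CV is meant for nonnegative data with a positive mean")
    return statistics.stdev(data) / m * 100

sample = [2, 4, 4, 4, 5, 5, 7, 9]                    # hypothetical data, mean 5
print(round(coefficient_of_variation(sample), 1))    # 42.8
```

Because CV has no units, it can compare variation between data sets measured in different units.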
What are measures of relative standing?
Numbers showing the location of data values relative to the other values within a data set.
In which ways are measures of relative standing used?
To compare values from different data sets or to compare values within the same data set.
Name 4 examples of measures of relative standing.
z scores, percentiles, quartiles, and boxplots
What is a z score?
The number of standard deviations that a given value x is above or below the mean.
How are z scores determined?
z = (x - mean)/(standard deviation); round z scores to 2 decimal places
Whenever a value is less than the mean, is its z-score positive or negative?
Negative
What are ordinary z-score values?
-2 <= z <= 2
What are unusual z-score values?
z < -2 or z > 2
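The z-score formula and the ordinary/unusual cutoffs combine into a short sketch. The function names and the mean 100 / standard deviation 15 example are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean, rounded to 2 decimals."""
    return round((x - mean) / sd, 2)

def classify(z):
    """Ordinary if -2 <= z <= 2, unusual otherwise."""
    return "unusual" if z < -2 or z > 2 else "ordinary"

# Hypothetical mean 100 and standard deviation 15
for x in (85, 130, 135):
    z = z_score(x, 100, 15)
    print(x, z, classify(z))
# 85 -1.0 ordinary
# 130 2.0 ordinary
# 135 2.33 unusual
```

Note that 85 is below the mean, so its z score is negative.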
What are percentiles?
Measures of location that divide a set of data into 100 groups with about 1% of the values in each group.
How is the percentile of a data value found?
percentile of x = (number of values less than x / total number of values) * 100
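A minimal sketch of the percentile-of-a-value formula; the function name and data are hypothetical.

```python
def percentile_of(x, data):
    """Percentile of value x: (number of values less than x / total number of values) * 100."""
    below = sum(1 for v in data if v < x)   # count values strictly less than x
    return below * 100 / len(data)

data = [3, 7, 9, 12, 20]                 # hypothetical data
print(percentile_of(12, data))           # 60.0  (3 of the 5 values are less than 12)
```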
How is a percentile converted to a data value?
Sort the data, then compute the locator L = (k/100)*n, where n = total number of values in the data set, k = percentile being used, L = locator that gives the position of a value, and Pk = kth percentile.
How is Pk found when the locator L is a whole number?
Pk is the mean of the Lth value and the next value: add the Lth and (L+1)th values of the sorted data and divide the total by 2.
How is Pk found when the locator L is not a whole number?
Round L up to the next larger whole number; Pk is the value in that position of the sorted data.
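The locator procedure for converting a percentile to a data value can be sketched as below. The function name `percentile_value` is hypothetical, and it assumes k is a whole number from 1 to 99; integer arithmetic is used so that testing whether L is a whole number is exact.

```python
def percentile_value(data, k):
    """kth percentile Pk using the locator L = (k/100) * n on sorted data."""
    if not isinstance(k, int) or not 1 <= k <= 99:
        raise ValueError("k must be a whole number from 1 to 99")
    ordered = sorted(data)
    n = len(ordered)
    if (k * n) % 100 == 0:               # L is a whole number
        L = k * n // 100
        # mean of the Lth and (L+1)th values (positions are 1-based, list indexes 0-based)
        return (ordered[L - 1] + ordered[L]) / 2
    L = -(-k * n // 100)                 # L is not whole: round up to the next whole number
    return ordered[L - 1]                # the Lth value

data = [3, 7, 9, 12, 20, 25, 30, 31, 40, 44]   # hypothetical, n = 10
print(percentile_value(data, 25))   # 9     (L = 2.5, rounded up to 3)
print(percentile_value(data, 50))   # 22.5  (L = 5, so (20 + 25) / 2)
print(percentile_value(data, 90))   # 42.0  (L = 9, so (40 + 44) / 2)
```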
What are quartiles?
Measures of location, denoted Q1, Q2, and Q3, which divide a set of data into 4 equal parts with about 25% of the values in each group.
Q1
separates the bottom 25% of sorted values from the top 75%
Q2
same as the median; separates the bottom 50% of sorted values from the top 50%
Q3
separates the bottom 75% of sorted values from the top 25%
What is an interquartile range (IQR)?
Q3-Q1
What is a semi-interquartile range?
(Q3-Q1)/2
What is a midquartile?
(Q3+Q1)/2
How is the 10-90 percentile range determined?
P90-P10
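The quartile-based measures can be computed from percentiles, assuming Q1 = P25 and Q3 = P75 found with the locator method from the percentile cards. Statistical software often uses a different quartile method, so its results may differ slightly. A minimal, self-contained sketch; the function names and data are hypothetical.

```python
def percentile_value(data, k):
    """kth percentile via the locator L = (k/100) * n on sorted data (k a whole number, 1 to 99)."""
    ordered = sorted(data)
    n = len(ordered)
    if (k * n) % 100 == 0:                        # L is whole: mean of Lth and (L+1)th values
        L = k * n // 100
        return (ordered[L - 1] + ordered[L]) / 2
    return ordered[-(-k * n // 100) - 1]          # otherwise round L up and take the Lth value

def quartile_summary(data):
    """IQR, semi-interquartile range, midquartile, and 10-90 percentile range."""
    q1 = percentile_value(data, 25)
    q3 = percentile_value(data, 75)
    return {
        "IQR": q3 - q1,
        "semi-IQR": (q3 - q1) / 2,
        "midquartile": (q3 + q1) / 2,
        "10-90 range": percentile_value(data, 90) - percentile_value(data, 10),
    }

data = [3, 7, 9, 12, 20, 25, 30, 31, 40, 44]   # hypothetical data
print(quartile_summary(data))
# {'IQR': 22, 'semi-IQR': 11.0, 'midquartile': 20.0, '10-90 range': 37.0}
```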
What is a boxplot?
A graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at Q1, the median, and Q3.
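A boxplot is drawn from five numbers: minimum, Q1, median, Q3, and maximum. A minimal sketch that computes them, assuming the same locator-based percentile method as above (Q1 = P25, median = P50, Q3 = P75); plotting libraries may compute quartiles differently. The function names and data are hypothetical.

```python
def percentile_value(data, k):
    """kth percentile via the locator L = (k/100) * n on sorted data (k a whole number, 1 to 99)."""
    ordered = sorted(data)
    n = len(ordered)
    if (k * n) % 100 == 0:                        # L is whole: mean of Lth and (L+1)th values
        L = k * n // 100
        return (ordered[L - 1] + ordered[L]) / 2
    return ordered[-(-k * n // 100) - 1]          # otherwise round L up and take the Lth value

def five_number_summary(data):
    """(minimum, Q1, median, Q3, maximum): the values a boxplot displays."""
    ordered = sorted(data)
    return (
        ordered[0],
        percentile_value(ordered, 25),
        percentile_value(ordered, 50),            # P50 equals the median
        percentile_value(ordered, 75),
        ordered[-1],
    )

data = [3, 7, 9, 12, 20, 25, 30, 31, 40, 44]   # hypothetical data
print(five_number_summary(data))               # (3, 9, 22.5, 31, 44)
```

The box spans Q1 to Q3 with a line at the median, and the whiskers extend to the minimum and maximum.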