Chapter 3 and Chapter 20 Flashcards
A numerical value used as a summary measure for a population.
Population Parameter
A numerical value used as a summary measure for a sample.
Sample Statistic
A sample statistic used to estimate the corresponding population parameter.
Point Estimator
A measure of central location computed by summing the data values and dividing by the number of observations.
Mean
A measure of central location provided by the value in the middle when the data are arranged in ascending order.
Median
A measure of location defined as the value that occurs with the greatest frequency.
Mode
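A minimal sketch of the three measures of location above, using Python's standard `statistics` module. The data values are hypothetical and chosen only for illustration.

```python
from statistics import mean, median, multimode

# Hypothetical sample (illustrative values only).
data = [46, 54, 42, 46, 32]

sample_mean = mean(data)        # sum of the values divided by the number of observations
sample_median = median(data)    # middle value after sorting: 32, 42, 46, 46, 54
sample_modes = multimode(data)  # list of the most frequent value(s); there can be more than one

print(sample_mean, sample_median, sample_modes)
```

For this sample the mean is 44, the median is 46, and the only mode is 46.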
The mean obtained by assigning each observation a weight that reflects its importance.
Weighted Mean
A measure of location that is calculated by finding the nth root of the product of n values.
Geometric Mean
A measure of location that is calculated by removing a percentage of the smallest and largest values from a data set, then calculating the average of the remaining values.
Trimmed Mean
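The weighted, geometric, and trimmed means can be sketched as follows. The helper names `weighted_mean` and `trimmed_mean` and all data values are hypothetical. The sketch also assumes that the trimming percentage is removed from each end of the sorted data, which is one common convention.

```python
from statistics import geometric_mean

def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def trimmed_mean(values, percent):
    """Drop `percent`% of the observations from EACH end (assumed convention),
    then average what remains. Expects percent < 50."""
    ordered = sorted(values)
    k = int(len(ordered) * percent / 100)  # number of values dropped per end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

wm = weighted_mean([10, 20], [1, 3])        # (1*10 + 3*20) / (1 + 3)
gm = geometric_mean([1.0, 4.0])             # square root of 1 * 4, up to floating-point rounding
tm = trimmed_mean([1, 2, 3, 4, 100], 20)    # drops 1 and 100, averages 2, 3, 4

print(wm, gm, tm)
```

Here the weighted mean is 17.5, the geometric mean is approximately 2, and the 20% trimmed mean is 3.0, which ignores the extreme value 100.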
A value such that at least p% of the observations are less than or equal to this value and at least (100 - p)% of the observations are greater than or equal to this value.
Percentile
The 25th, 50th, and 75th percentiles which can be used to divide a data set into four parts, with each part containing approximately 25% of the data.
Quartiles
A measure of variability defined to be the largest value minus the smallest value.
Range
A measure of variability defined to be the difference between the third and first quartiles.
Interquartile Range (IQR)
A technique that uses the smallest value, first quartile, median, third quartile, and largest value to summarize the data set.
Five-Number Summary
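A short sketch of quartiles, range, IQR, and the five-number summary. Textbooks and software use slightly different rules for locating percentiles, so the `"inclusive"` method chosen here is an assumption and may not match a given course's hand calculation. The data are hypothetical.

```python
from statistics import quantiles

# Hypothetical data set.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Assumption: "inclusive" interpolation; other conventions can give
# slightly different quartiles for the same data.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

data_range = max(data) - min(data)   # largest value minus smallest value
iqr = q3 - q1                        # third quartile minus first quartile
five_number_summary = (min(data), q1, q2, q3, max(data))

print(five_number_summary, data_range, iqr)
```

For these values the quartiles are 3, 5, and 7, the range is 8, and the IQR is 4.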
A graphical summary of data based on a five-number summary.
Boxplot
A measure of variability based on the squared deviations of the data values about the mean.
Variance
A measure of variability computed by taking the positive square root of the variance.
Standard Deviation
A measure of relative variability computed by dividing the standard deviation by the mean and multiplying by 100.
Coefficient of Variation
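A minimal sketch of the three measures of variability above. It assumes sample statistics (dividing by n - 1), which the cards do not state explicitly, and uses hypothetical data.

```python
from statistics import mean, stdev, variance

# Hypothetical sample.
data = [46, 54, 42, 46, 32]

sample_variance = variance(data)   # sum of squared deviations about the mean / (n - 1)
sample_std = stdev(data)           # positive square root of the variance
coefficient_of_variation = sample_std / mean(data) * 100  # as a percentage of the mean

print(sample_variance, sample_std, round(coefficient_of_variation, 1))
```

Here the variance is 64, the standard deviation is 8, and the coefficient of variation is about 18.2%, meaning the standard deviation is roughly 18.2% of the mean.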
Name the descriptive statistic described by the following statement: The sample standard deviation is 18.2% of the value of the sample mean.
Coefficient of Variation
Name the descriptive statistic that is useful for comparing the variability of variables that have different standard deviations and different means.
Coefficient of Variation
Name the descriptive statistic that measures the typical distance of the data values from the mean by 1) squaring each value's deviation from the mean, 2) averaging the squared deviations, and 3) taking the square root.
Standard Deviation
A higher coefficient of variation means the data set is more variable / less variable.
More Variable
A value computed by dividing the deviation of a data value from the mean by the standard deviation.
Z-score
Another name for a standard score.
Z-score
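The z-score definition can be sketched directly from its formula, z = (x - mean) / standard deviation. The sample standard deviation is assumed, and the data are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical sample.
data = [46, 54, 42, 46, 32]

x_bar = mean(data)
s = stdev(data)  # sample standard deviation (assumption)

# Deviation of each value from the mean, measured in standard deviations.
z_scores = [(x - x_bar) / s for x in data]

print(z_scores)
```

With a mean of 44 and a standard deviation of 8, the z-scores are 0.25, 1.25, -0.25, 0.25, and -1.5. A negative z-score marks a value below the mean.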
An unusually small or unusually large data value.
Outlier
A measure of the shape of a data distribution.
Skewness
If the data are skewed to the left, the skewness is positive / negative.
Negative
If the data are skewed to the right, the skewness is positive / negative.
Positive
A symmetric data distribution has a skewness equal to _________ .
Zero
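The sign of skewness can be checked numerically. The cards do not give a formula, so this sketch assumes one common sample skewness formula, n / ((n - 1)(n - 2)) times the sum of cubed z-scores. The function name `sample_skewness` and the data sets are hypothetical.

```python
from statistics import mean, stdev

def sample_skewness(values):
    """One common sample skewness formula (an assumption; requires n >= 3)."""
    n = len(values)
    x_bar = mean(values)
    s = stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in values)

symmetric = [1, 2, 3, 4, 5]        # mirror image about the mean
right_skewed = [1, 2, 3, 4, 20]    # long tail toward large values
left_skewed = [-20, -4, -3, -2, -1]  # long tail toward small values

skew_sym = sample_skewness(symmetric)
skew_right = sample_skewness(right_skewed)
skew_left = sample_skewness(left_skewed)

print(skew_sym, skew_right, skew_left)
```

The symmetric set gives a skewness of zero (up to floating-point rounding), the right-skewed set a positive value, and the left-skewed set a negative value.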
A theorem that can be used to make statements about the proportion of data values that must be within a specified number of standard deviations of the mean.
Chebyshev’s Theorem
A rule that gives the approximate percentage of data values that are within one, two, and three standard deviations of the mean for data that exhibit a bell-shaped distribution.
Empirical Rule
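The cards name the two rules without stating their numbers. As a sketch, Chebyshev's theorem is usually stated as "at least 1 - 1/z^2 of the values lie within z standard deviations of the mean, for any z > 1", and the empirical rule as approximately 68%, 95%, and 99.7% for one, two, and three standard deviations. The function and constant names below are hypothetical.

```python
def chebyshev_min_proportion(z):
    """Minimum proportion of values within z standard deviations of the mean.
    Applies to any data set, but only for z > 1."""
    if z <= 1:
        raise ValueError("z must be greater than 1")
    return 1 - 1 / z**2

# Approximate percentages for bell-shaped data only (empirical rule).
EMPIRICAL_RULE_PERCENT = {1: 68, 2: 95, 3: 99.7}

within_2 = chebyshev_min_proportion(2)
within_3 = chebyshev_min_proportion(3)

print(within_2, within_3, EMPIRICAL_RULE_PERCENT[2])
```

For z = 2, Chebyshev guarantees at least 75% of the values for any distribution, while the empirical rule suggests about 95% when the distribution is bell-shaped.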
A measure of linear association between two variables.
Covariance
A positive covariance indicates a positive / negative linear relationship.
Positive
A negative covariance indicates a positive / negative linear relationship.
Negative
A measure of linear association between two variables that takes on values between -1 and +1.
Correlation Coefficient
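A minimal sketch of sample covariance and the correlation coefficient, written out by hand so that each step is visible. It assumes the sample versions (dividing by n - 1). The function names and data are hypothetical.

```python
from statistics import mean, stdev

def sample_covariance(x, y):
    """Average product of paired deviations from the means, using n - 1."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)

def correlation_coefficient(x, y):
    """Covariance divided by the product of the standard deviations."""
    return sample_covariance(x, y) / (stdev(x) * stdev(y))

x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]     # rises perfectly with x
y_down = [10, 8, 6, 4, 2]   # falls perfectly with x

cov_up = sample_covariance(x, y_up)
r_up = correlation_coefficient(x, y_up)
r_down = correlation_coefficient(x, y_down)

print(cov_up, r_up, r_down)
```

The covariance of 5 shows only that the relationship is positive, and its size depends on the units of x and y. The correlation coefficients of approximately +1 and -1 show both direction and strength on a unit-free scale.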
If a correlation coefficient value is near +1, this indicates a 1) strong / weak 2) positive / negative linear relationship.
1) Strong
2) Positive
If a correlation coefficient value is near -1, this indicates a 1) strong / weak 2) positive / negative linear relationship.
1) Strong
2) Negative
If a correlation coefficient value is near zero, this indicates…
A lack of linear relationship
What are two methods of detecting outliers?
1) Z-score
2) Interquartile Range (Fences)
When using z-scores to identify outliers, a data value with a z-score greater than ____ or less than ____ is treated as an outlier.
+3
-3
When using the IQR (fences) to identify outliers, a data value is classified as an outlier if it is greater than the ______ ______ or less than the _______ _______ .
Upper Limit
Lower Limit
How do you compute the upper limit?
Q3 + 1.5 (IQR)
How do you compute the lower limit?
Q1 - 1.5 (IQR)
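The two limits above can be sketched as a small helper. The function name `iqr_fences` and the data are hypothetical, and the `"inclusive"` quartile method is an assumption, since quartile conventions vary between textbooks and software.

```python
from statistics import quantiles

def iqr_fences(values):
    """Return (lower limit, upper limit) = (Q1 - 1.5 * IQR, Q3 + 1.5 * IQR)."""
    # Assumption: "inclusive" quartile method.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical data with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 30]

lower_limit, upper_limit = iqr_fences(data)
outliers = [x for x in data if x < lower_limit or x > upper_limit]

print(lower_limit, upper_limit, outliers)
```

With Q1 = 3 and Q3 = 7, the limits are -3 and 13, so only the value 30 is classified as an outlier.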
What are three reasons a data set may contain outliers?
1) A data value was incorrectly recorded
2) An observation was incorrectly included in the data set
3) A data value is unusual, but it was recorded correctly and should be included in the data set
The ______ _______ states that for data sets with a bell-shaped distribution, almost all the data values will be within ___ standard deviations of the mean.
Empirical Rule
3
Of the two methods for detecting outliers, the _______ method cannot be used for data sets that do not have a bell-shaped distribution.
Z-score
The correlation coefficient indicates the ________ and ________ of a linear relationship.
Strength
Direction