Module 3 Notes - Numerical Descriptive Measures Flashcards
The _______ ________ is the extent to which the values of a numerical variable group around a typical or central value.
central tendency
the _________ is the amount of dispersion or scattering away from a central value that the values of a numerical variable show
variation
the _____ is the pattern of a distribution of values from the lowest to the highest value
shape
Arithmetic Mean
A= \frac {1}{n} \sum \limits_{i=1}^n a_i
Middle value in the ordered array
Median
Most frequently observed value
Mode
the __________ ____ (often just called “mean”) is the most common measure of central tendency.
*For a sample of size n (lower case n):
arithmetic mean
*The most common measure of _______ ________.
*____ = sum of values divided by the number of values
*Affected by extreme values (outliers).
Mean
*In an ordered array, the ______ is the “middle” number (50% above, 50% below)
*less sensitive than the mean to extreme values
median
Locating the Median
*The location of the median when the values are in numerical order (smallest to largest):
*If the number of values is odd, the median is the middle number
*If the number of values is even, the median is the average of the two middle numbers
Median position = (n+1)/2 position in the ordered data
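The position rule above can be sketched in Python. This is a minimal illustration, not part of the original notes; the function name `median_by_position` and the sample values are assumptions made for the example.

```python
def median_by_position(values):
    """Locate the median using the (n + 1) / 2 position in the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2  # 1-based position in the ordered array
    if n % 2 == 1:
        # Odd n: the position is a whole number, so take that value.
        return ordered[int(position) - 1]
    # Even n: the position ends in .5, so average the two middle values.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


odd_result = median_by_position([7, 1, 5])      # ordered: 1, 5, 7
even_result = median_by_position([7, 1, 5, 3])  # ordered: 1, 3, 5, 7
```

With three values the position is 2, so the median is the second ordered value; with four values the position is 2.5, so the median is the average of the second and third ordered values.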
*Value that occurs most often
*Not affected by extreme values.
Mode
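To see how an extreme value affects each measure of central tendency, here is a small sketch using Python's standard `statistics` module. The two data sets are made up for illustration.

```python
import statistics

typical = [10, 12, 12, 14, 17]
with_outlier = [10, 12, 12, 14, 170]  # same data, but the largest value is extreme

for label, data in (("typical", typical), ("with outlier", with_outlier)):
    print(
        label,
        "mean =", statistics.mean(data),
        "median =", statistics.median(data),
        "mode =", statistics.mode(data),
    )
```

Only the mean moves when the outlier is introduced; the median and mode stay at 12 in both data sets.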
Range, Variance, Standard Deviation, Coefficient of Variation
-Measures of _________ give information on the spread or variability or dispersion of the data values
Measures of Variation
*Simplest measure of variation.
*Difference between the largest and smallest value
Range.
*Does not account for how the data are distributed.
*Sensitive to outliers
“Why the _____ can be misleading”
range
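A short Python sketch of why the range can mislead: two made-up data sets with very different distributions share the same range. The names `evenly_spread`, `clustered_with_outlier`, and `value_range` are illustrative assumptions.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


evenly_spread = [1, 4, 7, 10, 13]
clustered_with_outlier = [1, 1, 1, 1, 13]  # four identical values plus one outlier

range_a = value_range(evenly_spread)
range_b = value_range(clustered_with_outlier)
```

Both ranges equal 12, even though the second data set is concentrated at a single value apart from one outlier.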
*Average (Approx.) of squared deviations of values from the mean.
Sample Variance
*Most commonly used measure of variation.
*Shows variation about the mean.
*Is the square root of the variance.
*Has the same units as the original data.
Sample standard deviation
Steps for computing _________ _________
1. Compute the difference between each value and the mean.
2. Square each difference.
3. Add the squared differences.
4. Divide this total by n-1 to get the sample variance.
5. Take the square root of the sample variance to get the sample ________ _________
standard deviation.
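The five steps can be followed line by line in Python. This is a sketch with made-up data; the function name `sample_standard_deviation` is an assumption for the example.

```python
import math


def sample_standard_deviation(values):
    n = len(values)
    mean = sum(values) / n
    differences = [x - mean for x in values]  # step 1: value minus mean
    squared = [d ** 2 for d in differences]   # step 2: square each difference
    total = sum(squared)                      # step 3: add the squared differences
    sample_variance = total / (n - 1)         # step 4: divide by n - 1
    return math.sqrt(sample_variance)         # step 5: take the square root


data = [2, 4, 4, 4, 5, 5, 7, 9]
s = sample_standard_deviation(data)
```

For this data the mean is 5 and the squared differences add up to 32, so the sample variance is 32/7 and the sample standard deviation is its square root. The standard library's `statistics.stdev` should give the same result.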
*Measures relative variation.
*Always in percentage (%)
*Shows variation relative to mean.
*Can be used to compare the variability of two or more sets of data measured in different units.
Coefficient of Variation: CV = (Standard Deviation / Mean) * 100%
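A minimal Python sketch of the coefficient of variation, comparing two hypothetical data sets measured in different units. The data and the function name `coefficient_of_variation` are assumptions for illustration.

```python
import statistics


def coefficient_of_variation(values):
    """Return the sample CV as a percentage: (S / mean) * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100


heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)
```

Both data sets have the same standard deviation, but the weights have the smaller mean, so their variation relative to the mean (the CV) is larger.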
Locating Extreme Outliers: _-_____
Z = (X - x̄) / S
Where X represents the data value
x̄ is the sample mean
S is the sample standard deviation
Z-score
*Suppose the mean math SAT score is 490, with a standard deviation of 100.
*Compute the Z-score for a test score of 620.
Z = (X - x̄)/S = (620 - 490)/100 = 130/100 = 1.3
-A score of 620 is 1.3 standard deviations above the mean and would not be considered an outlier.
*A data value is considered an extreme outlier if its Z-score is less than -3.0 or greater than +3.0
Z-score
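The SAT example and the ±3.0 rule can be checked with a short Python sketch. The function names `z_score` and `is_extreme_outlier` are assumptions for the example.

```python
def z_score(x, mean, std_dev):
    """Z = (X - mean) / S."""
    return (x - mean) / std_dev


def is_extreme_outlier(z, threshold=3.0):
    """Flag values whose Z-score is below -3.0 or above +3.0."""
    return z < -threshold or z > threshold


z = z_score(620, 490, 100)
flagged = is_extreme_outlier(z)
```

Here `z` is 1.3, which lies inside the interval from -3.0 to +3.0, so the score of 620 is not flagged.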
The more data are spread out, the greater the _____, ________, and ________ __________.
range, variance, standard deviation
The more data are concentrated, the smaller the _____, ________, and ________ _________.
range, variance, and standard deviation
If the values are all the same (no variation) all these measures will be zero
range, variance, and standard deviation
None of these measures is ever negative.
range, variance, and standard deviation
The larger the absolute value of the _-_____, the farther the data value is from the mean.
Z-score
*Measures the extent to which data values are not symmetrical.
Skewness
*Measures the peakedness of the curve of the distribution, that is, how sharply the curve rises approaching the center of the distribution.
Kurtosis
Measures the extent to which data is not symmetrical.
Skewness
Mean < Median
Left-Skewed
Mean = Median
Symmetric
Median < Mean
Right Skewed
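The mean-versus-median comparison on these cards can be sketched in Python. This only applies the comparison as stated; it does not compute a skewness statistic. The function name `skew_direction` and the data are assumptions for illustration.

```python
import math
import statistics


def skew_direction(values):
    """Classify shape by comparing the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if math.isclose(mean, median):
        return "symmetric"
    return "left-skewed" if mean < median else "right-skewed"


symmetric_data = [1, 2, 3, 4, 5]
right_data = [1, 2, 2, 3, 100]    # a large value pulls the mean above the median
left_data = [1, 98, 99, 99, 100]  # a small value pulls the mean below the median
```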
Sharper peak than bell-shaped (Kurtosis > 0)
Leptokurtic
Bell-shaped (Kurtosis = 0)
Mesokurtic
Flatter than bell-shaped (Kurtosis < 0)
Platykurtic
*Can visualize the distribution of the values for a numerical variable by computing:
*The _________
*The five-number _______.
*Constructing a _______.
quartiles, summary, boxplot
_________ split the ranked data into 4 segments with an equal number of values per segment.
Quartiles
*the first ________, Q1, is the value for which 25% of the values are smaller and 75% are larger.
quartile
*Q_ is the same as the median (50% of the values are smaller and 50% are larger).
Q2
*Only 25% of the values are greater than the third quartile.
Find a ________ by determining the value in the appropriate position in the ranked data
quartile
_____ quartile position: Q1 = (n+1)/4
where n is the number of observed values
First
______ quartile position: Q2=(n+1)/2
where n is the number of observed values
Second
_____ quartile position: Q3 = 3(n+1)/4
where n is the number of observed values
Third
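The three position formulas can be sketched in Python. The data set is made up and has n = 11 values, chosen so that every position is a whole number; how to handle fractional positions depends on a rounding or interpolation convention that these cards do not state, so the sketch does not attempt it. The function name `quartile_positions` is an assumption.

```python
def quartile_positions(n):
    """Return the 1-based positions of Q1, Q2, and Q3 in the ranked data."""
    return (n + 1) / 4, (n + 1) / 2, 3 * (n + 1) / 4


ranked = sorted([11, 12, 13, 16, 16, 17, 18, 21, 22, 27, 35])
positions = quartile_positions(len(ranked))

# With n = 11 the positions are whole numbers, so each quartile
# is read directly from the ranked data (converting to 0-based indexing).
q1, q2, q3 = (ranked[int(p) - 1] for p in positions)
```

The positions are 3, 6, and 9, which gives Q1 = 13, Q2 = 17, and Q3 = 22 for this data.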
The ___ is Q3-Q1 and measures the spread in the middle 50% of the data.
IQR
The ___ is also called the midspread because it covers the middle 50% of the data.
IQR
-measure of variability that is not influenced by outliers or extreme values.
IQR
Measures like Q1, Q3, and IQR that are not influenced by outliers are called _________ _______.
resistant measures
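A small Python sketch of what "resistant" means in practice: replacing the largest value with an extreme one changes the range but leaves the IQR alone. The data are made up, and the helper name `iqr` is an assumption. `statistics.quantiles` with its default `method="exclusive"` uses the (n+1)-based positions, interpolating when a position is fractional.

```python
import statistics


def iqr(values):
    """Interquartile range: Q3 - Q1."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    return q3 - q1


original = [11, 12, 13, 16, 16, 17, 18, 21, 22, 27, 35]
with_outlier = [11, 12, 13, 16, 16, 17, 18, 21, 22, 27, 350]

iqr_original = iqr(original)
iqr_outlier = iqr(with_outlier)
range_original = max(original) - min(original)
range_outlier = max(with_outlier) - min(with_outlier)
```

The IQR is 9 for both data sets, while the range jumps from 24 to 339.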
*Range is the difference between the largest and smallest values
*IQR is
Q3-Q1
The five numbers that describe center, spread, and shape of data are:
*Xsmallest
*First Quartile (Q1)
*Median (Q2)
*Third Quartile (Q3)
*Xlargest
Five number Summary
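A minimal Python sketch of the five-number summary, using `statistics.quantiles` for the quartiles. The data and the function name `five_number_summary` are assumptions for illustration.

```python
import statistics


def five_number_summary(values):
    """Return (Xsmallest, Q1, Median, Q3, Xlargest)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


summary = five_number_summary([11, 12, 13, 16, 16, 17, 18, 21, 22, 27, 35])
```

For this data the summary is 11, 13, 17, 22, 35. These are the five values a boxplot draws: the two endpoints, the two edges of the box, and the line inside the box.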
The _______: A graphical display of the data based on the five-number summary
Boxplot, Xsmallest – Q1 – Median – Q3 – Xlargest
*If the data are symmetric around the median, then the box and central line are centered between the endpoints
*A _______ can be shown in either a vertical or horizontal orientation
Boxplot
*The __________ mean is the sum of the values in the population (not the sample) divided by the population size, N (not the sample size)
Population mean
μ
population mean
Population mean equation: N
Population size (Capital N)
Population mean equation: Xi
ith value of the variable X
Average of squared deviations of values from the population mean.
Population variance.
*Most commonly used measure of variation.
*Shows variation about the mean.
*Is the square root of the population variance.
*Has the same units as the original data.
The Standard Deviation σ
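The population versions divide by N rather than n-1. A short Python sketch, treating a made-up list as the entire population; the function name `population_variance` is an assumption.

```python
import math


def population_variance(values):
    big_n = len(values)        # population size N
    mu = sum(values) / big_n   # population mean
    return sum((x - mu) ** 2 for x in values) / big_n


population = [2, 4, 4, 4, 5, 5, 7, 9]
sigma_squared = population_variance(population)  # 32 / 8
sigma = math.sqrt(sigma_squared)
```

Here the population variance is 4 and the population standard deviation is 2. Treating the same eight values as a sample would instead divide by 7, giving a slightly larger variance.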
Mean: μ
Variance: σ^2
Standard Deviation: σ
Population Parameter Measure
Mean: x̄
Variance: S^2
Standard Deviation: S
Sample Statistic Measure
*The _________ ____ approximates the variation of data in a symmetric mound-shaped distribution.
*Approximately __% of the data in a symmetric mound shaped distribution is within 1 standard deviation of the mean or μ ± 1 σ
The Empirical Rule, 68
approximately __% of the data in a symmetric mound-shaped distribution lies within two standard deviations of the mean, or μ ± 2σ
95
approximately __% of the data in a symmetric mound-shaped distribution lies within three standard deviations of the mean, or μ ± 3σ
99.7
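The empirical rule percentages can be checked against a normal distribution, which is one example of a symmetric mound-shaped distribution. This Python sketch assumes a standard normal (μ = 0, σ = 1) purely for illustration; the helper name `share_within` is an assumption.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


shares = {k: share_within(k) for k in (1, 2, 3)}
```

The three proportions come out close to 0.68, 0.95, and 0.997, matching the rule. Real data that are only roughly mound-shaped will match these figures only approximately.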
______ plots allow you to visually examine the relationship between two numerical variables. Two quantitative measures of such relationships are:
*The Covariance
*The Coefficient of Correlation.
Scatter plots
*The ___________ measures the strength of the linear relationship between two numerical variables (X&Y)