Chapter 3 Flashcards
3 main “measures of center”
- Mean
- Median
- Mode
Mean
Obtained by dividing the sum of all values by the number of values in the data set.
Median
The value that divides a data set that has been sorted in increasing order into two equal halves.
Mode
The value that occurs w/ the highest frequency in a data set.
Mean for population data
μ = Σx / N (sum of all x’s divided by the population size N)
Mean for sample data
x̄ = Σx / n (sum of all x’s divided by the sample size n)
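A minimal Python sketch of the mean formula, assuming the data are a plain list of numbers. The arithmetic is the same for a population (N values) and a sample (n values); only the symbol changes. The name `mean_of` is made up for illustration.

```python
def mean_of(values):
    """Mean = sum of all values / number of values."""
    return sum(values) / len(values)

print(mean_of([2, 4, 6, 8]))  # 5.0
```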
2 steps to calculate the median
- Sort the data set into increasing order
- Find the value that divides the sorted data set in two equal parts.
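The two steps above can be sketched in Python as follows. This assumes a non-empty list of numbers and uses the usual convention of averaging the two middle values when the count is even; `median_of` is a hypothetical name.

```python
def median_of(values):
    """Median by the two-step method: sort, then take the middle."""
    ranked = sorted(values)            # step 1: sort into increasing order
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:                     # odd count: the single middle value
        return ranked[mid]
    # even count: average of the two middle values
    return (ranked[mid - 1] + ranked[mid]) / 2

print(median_of([7, 1, 5, 3, 9]))      # 5
print(median_of([7, 1, 5, 3]))         # 4.0
```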
Can there be no mode?
Yes
Can modes be from qualitative data?
Yes
Can there be more than one mode?
Yes
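The three "Yes" answers about the mode can be checked with a small Python sketch. It assumes a non-empty list and treats a data set in which every value occurs exactly once as having no mode; `modes_of` is a hypothetical name.

```python
from collections import Counter

def modes_of(values):
    """Return every value tied for the highest frequency; [] if no mode."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:                       # every value occurs once: no mode
        return []
    return [v for v, c in counts.items() if c == top]

print(modes_of([1, 2, 3]))                # []  (no mode)
print(modes_of(["red", "blue", "red"]))   # ['red']  (qualitative data)
print(modes_of([1, 1, 2, 2, 3]))          # [1, 2]  (more than one mode)
```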
What are the mean, median, and mode of a symmetrical histogram / distribution curve?
Mean = median = mode
Mean, median, and mode of a right-skewed histogram
Mean > median > mode
Left-skewed histogram mean, median, and mode
Mean < median < mode
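A small Python check of the right-skewed ordering, using a made-up data set with one large value pulling the mean upward. The ordering is a rule of thumb for typical skewed shapes, not a guarantee for every data set.

```python
import statistics

# Made-up right-skewed data: most values are small, one is large.
data = [1, 2, 2, 3, 4, 5, 12]

mean = statistics.mean(data)       # pulled toward the long right tail
median = statistics.median(data)   # middle value of the sorted data
mode = statistics.mode(data)       # most frequent value

print(round(mean, 2), median, mode)  # 4.14 3 2
```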
Trimmed mean
After we drop K% of the values from each end of a ranked data set, the mean of the remaining values is called the K% trimmed mean.
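A minimal Python sketch of the K% trimmed mean. It assumes K is below 50 and that, when K% of the count is not a whole number, the number of values dropped from each end is rounded down; `trimmed_mean` is a hypothetical name.

```python
def trimmed_mean(values, k_percent):
    """Mean after dropping k_percent of the values from EACH end."""
    ranked = sorted(values)
    drop = int(len(ranked) * k_percent / 100)   # count dropped per end (rounded down)
    kept = ranked[drop:len(ranked) - drop]
    return sum(kept) / len(kept)

data = [2, 4, 5, 6, 7, 8, 9, 10, 11, 50]
print(trimmed_mean(data, 10))   # 7.5  (drops 2 and 50)
print(sum(data) / len(data))    # 11.2 (ordinary mean, pulled up by 50)
```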
Weighted mean
The mean used when the values of a data set are assigned different weights (levels of importance).
Weighted mean = Σ(x × w) / Σw
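A short Python sketch of the weighted mean, assuming two equal-length lists (values and their weights). The scores and weights below are made up for illustration.

```python
def weighted_mean(values, weights):
    """Sum of (value * weight) divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

scores = [90, 80, 70]
weights = [3, 2, 1]          # e.g. credit hours (made-up numbers)
print(round(weighted_mean(scores, weights), 2))  # 83.33
```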
Measures of dispersion tell us…
How much variation exists around that “typical value”
3 main measures of dispersion
- Range
- Variance
- Standard deviation
Range
The difference between the largest value and the smallest value.
Variance
A measure of how much the values in a dataset differ from the mean.
Standard deviation
A measure of roughly how far, on average, the data values lie from the mean. It is the square root of the variance.
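A minimal Python sketch of variance and standard deviation. The cards do not give the formulas, so this uses the standard ones as an assumption: sum of squared deviations from the mean, divided by N for a population or by n − 1 for a sample. The function names are made up.

```python
import math

def variance(values, sample=True):
    """Sum of squared deviations from the mean, over n - 1 (sample) or N (population)."""
    n = len(values)
    mean = sum(values) / n
    squared_devs = sum((x - mean) ** 2 for x in values)
    return squared_devs / (n - 1) if sample else squared_devs / n

def std_dev(values, sample=True):
    """Standard deviation = square root of the variance."""
    return math.sqrt(variance(values, sample))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data, sample=False))  # 4.0
print(std_dev(data, sample=False))   # 2.0
```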
Range formula
Largest value - smallest value
Disadvantages of range
- Only based on 2 values
- Affected by outliers
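A quick Python illustration of both disadvantages, using made-up numbers: the range looks only at the two extreme values, so a single outlier changes it completely.

```python
def range_of(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

print(range_of([1, 2, 3, 4, 5]))        # 4
print(range_of([1, 2, 3, 4, 5, 100]))   # 99 (one outlier dominates)
```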
Can the variance and the standard deviation be negative?
No
Units for standard deviation
Same as the original units
Units for variance
The square of the original data’s units
Coefficient of variation ( CV )
A measure of relative variability.
Useful if you are comparing the variation of two datasets w/ different magnitudes of value.
Variance and SD depend on…
The units of measurement
Coefficient of variation units
Expressed as a percentage of the mean.
Has no units and is always expressed as a percentage.
Coefficient of variation formula
CV = (SD / mean) × 100%
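A Python sketch of the CV, assuming sample data (so the sample standard deviation is used) and a nonzero mean. The two made-up data sets have the same SD but very different magnitudes, which is where the CV is useful.

```python
import statistics

def coefficient_of_variation(values):
    """CV = (SD / mean) * 100, expressed as a percentage."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

small = [10, 20, 30]          # SD = 10, mean = 20
large = [1010, 1020, 1030]    # SD = 10, mean = 1020

print(coefficient_of_variation(small))            # 50.0
print(round(coefficient_of_variation(large), 2))  # 0.98
```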
Mean of grouped data for population data
μ = Σ(m × f) / N, where m is the class midpoint, f is the class frequency, and N is the population size
Mean of grouped data for sample data
x̄ = Σ(m × f) / n, where m is the class midpoint, f is the class frequency, and n is the sample size
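A short Python sketch of the grouped-data mean, assuming the number of observations equals the sum of the class frequencies. The classes and frequencies below are made up.

```python
def grouped_mean(midpoints, frequencies):
    """Sum of (midpoint * frequency) divided by the total frequency."""
    total = sum(m * f for m, f in zip(midpoints, frequencies))
    return total / sum(frequencies)

# Made-up classes 0-10, 10-20, 20-30 with midpoints 5, 15, 25
print(grouped_mean([5, 15, 25], [2, 5, 3]))  # 16.0
```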
Chebyshev’s theorem
For any number k greater than 1, at least (1 - 1/k²) of the data values lie within k standard deviations of the mean.
Chebyshev’s theorem works for…
Any distribution shape
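A tiny Python sketch of the Chebyshev bound, returning the minimum fraction of values within k standard deviations of the mean. The function name is hypothetical.

```python
def chebyshev_min_fraction(k):
    """Minimum fraction of data within k SDs of the mean: 1 - 1/k^2."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

print(chebyshev_min_fraction(2))            # 0.75  (at least 75%)
print(round(chebyshev_min_fraction(3), 3))  # 0.889 (at least about 89%)
```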
Empirical rule
If our distribution is “bell-shaped” (also called “normal” or “Gaussian”), we use the empirical rule.
68% of the observations lie within 1 SD of the mean
95% of the observations lie within 2 SDs of the mean
99.7% of the observations lie within 3 SDs of the mean
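A Python sketch that turns the empirical rule into intervals for a given mean and SD. It only applies if the distribution is approximately bell-shaped; the mean of 100 and SD of 15 are made-up numbers.

```python
def empirical_intervals(mean, sd):
    """Map each percentage to the interval mean ± k SDs (k = 1, 2, 3)."""
    return {pct: (mean - k * sd, mean + k * sd)
            for k, pct in ((1, 68), (2, 95), (3, 99.7))}

for pct, (low, high) in empirical_intervals(100, 15).items():
    print(f"about {pct}% of observations lie between {low} and {high}")
```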
Quartile
Three summary measures ( Q1, Q2, Q3 ) that divide a ranked data set into four equal parts.
Q2 is the same as the median.
Splits the data into 4 sections. (Each contains 25% of the observations of a data set)
Interquartile range (IQR)
The difference between Q3 and Q1
IQR = Q3 - Q1
Another measure of dispersion.
Small IQR = less spread out data
Large IQR = more spread out data
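A Python sketch of the quartiles and the interquartile range. It assumes one common textbook convention: Q1 is the median of the values below the median position and Q3 is the median of the values above it. Software packages may use other conventions and give slightly different answers. The function name is made up.

```python
import statistics

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ranked = sorted(values)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]               # values below the median position
    upper = ranked[half + (n % 2):]     # values above the median position
    return (statistics.median(lower),
            statistics.median(ranked),
            statistics.median(upper))

q1, q2, q3 = quartiles([1, 3, 5, 7, 9, 11, 13, 15])
print(q1, q2, q3)   # 4.0 8.0 12.0
print(q3 - q1)      # 8.0  (Q3 - Q1)
```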
Percentiles
99 summary measures that divide a ranked data set into 100 equal parts.
Each portion contains 1% of the observations of a data set.
The (approximate) value of the kth percentile in a sample of size n is:
Pk = value of the (kn / 100)th term in a ranked data set
Always round the position up.
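A Python sketch of the percentile-value rule above, rounding the position up as the card says. It assumes k is a whole number from 1 to 99; `percentile_value` is a hypothetical name and the data are made up.

```python
import math

def percentile_value(values, k):
    """Value of the (k*n/100)th term in the ranked data, position rounded up."""
    if not 1 <= k <= 99:
        raise ValueError("k must be between 1 and 99")
    ranked = sorted(values)
    position = math.ceil(k * len(ranked) / 100)   # 1-based position, rounded up
    return ranked[position - 1]                   # convert to 0-based index

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
print(percentile_value(data, 42))   # 60  (42*12/100 = 5.04, so the 6th term)
```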
Percentile rank of a given value x in a data set
Percentile rank of x = (number of values less than x / total number of values in the data set) × 100%
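A short Python sketch of the percentile rank formula, counting only values strictly less than the given value. The function name and the data are made up.

```python
def percentile_rank(values, x):
    """Percentage of values in the data set that are less than x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(percentile_rank(data, 80))   # 70.0  (7 of the 10 values are below 80)
```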
Box-and-whisker plot
Shows 5 measures:
1. Median
2. Q1
3. Q3
4. Minimum
5. Maximum
Lower inner fence = Q1 - 1.5 × IQR
Upper inner fence = Q3 + 1.5 × IQR
Outliers are plotted outside the fences.
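A Python sketch of the inner fences and outlier check. Q1 and Q3 are passed in rather than computed, since quartile conventions vary; the function name and the numbers are made up.

```python
def find_outliers(values, q1, q3):
    """Return (lower fence, upper fence, values lying outside the fences)."""
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

data = [1, 3, 5, 7, 9, 11, 13, 40]
print(find_outliers(data, q1=4, q3=12))   # (-8.0, 24.0, [40])
```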