lecture 8 Flashcards
describe empirical rule - gen
distributional properties in pop or sample
based on measures of central tendency and variability
tells us how much of the data we will observe within 1, 2 or 3 standard deviations of the mean
*for perfectly symmetric mound shaped distributions, concentrated around mean
describe empirical rule - specifics
roughly 68% of the obs will lie in the range = mean +- s.d.
roughly 95% of the obs will lie in the range = mean +- (2 x s.d.)
roughly 99.7% of the obs will lie in the range = mean +- (3 x s.d.)
“how much data in region”
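The three percentages can be checked against the normal curve, the idealised symmetric mound-shaped distribution. A minimal sketch using Python's standard library; the function name `normal_coverage` is made up for illustration:

```python
from statistics import NormalDist

def normal_coverage(k):
    """Proportion of a normal distribution within k s.d. of its mean."""
    z = NormalDist()  # standard normal: mean 0, s.d. 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    # Prints roughly 68.3%, 95.4% and 99.7%
    print(f"within {k} s.d.: {normal_coverage(k):.1%}")
```

Real samples are never perfectly normal, so observed proportions will only be close to these values.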
describe empirical rule - graphs
only theoretical
area under curve over a region - just add up the bar heights (counts) in that region then divide by sample size = gives relative freq
Riemann sum = histogram approximation
sample-based calculation: if we replace pop quantities (mean, s.d.) with sample quantities (xbar, s) = get same result
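The "histogram approximation" idea can be sketched as a midpoint Riemann sum: slice the region into thin bars, add up bar height x bar width. This assumes a standard normal curve as the theoretical shape; `riemann_area` and `n_bars` are illustrative names, not from the lecture:

```python
from statistics import NormalDist

def riemann_area(pdf, lo, hi, n_bars=1000):
    """Approximate the area under pdf between lo and hi with n_bars thin bars."""
    width = (hi - lo) / n_bars
    # Height of each bar is taken at the midpoint of the bar
    heights = (pdf(lo + (i + 0.5) * width) for i in range(n_bars))
    return sum(heights) * width

# Area within 1 s.d. of the mean for a standard normal curve
area = riemann_area(NormalDist().pdf, -1.0, 1.0)
print(round(area, 3))
```

With many thin bars the result should land very near the 68% quoted by the empirical rule.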
describe empirical rule - body temp ex
gives bounds = 1,2 or 3 s.d. away
Empirical rule = theoretical percentages
actual = pretty close but not exact
as long as graph is symmetric (no skew) = result will hold
describe empirical rule - heights ex where mean (xbar) = 72 inches and s = 3.5 inches
roughly 68% of men in sample will have height within 1 s.d. of the mean, and roughly 95% within 2 s.d.
(72 - 3.5, 72 + 3.5) = (68.5, 75.5) inches ~68%
(72 - 2 x 3.5, 72 + 2 x 3.5) = (65, 79) inches ~95%
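The interval arithmetic for the heights example, as a small sketch. The helper name `sd_interval` is hypothetical; xbar = 72 and s = 3.5 come from the card:

```python
def sd_interval(mean, sd, k):
    """Interval covering k standard deviations either side of the mean."""
    return (mean - k * sd, mean + k * sd)

xbar, s = 72.0, 3.5  # inches, from the heights example

print(sd_interval(xbar, s, 1))  # (68.5, 75.5), roughly 68% of obs
print(sd_interval(xbar, s, 2))  # (65.0, 79.0), roughly 95% of obs
```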
describe empirical rule - statistics grades ex - want to find the interval that captures ~95% of observed data
(xbar-2s, xbar+2s) is the interval
approximating sample content in particular regions of observation range
describe empirical rule - mathematics grades ex
if assume symmetric and calculate for ~95%, interval = (56.3, 110.7)
upper bound is above 100
limitation = symmetric construction can lead to an interval that extends beyond the measurement range
interval is symmetric but the mound of the data sits in the upper range
empirical rule NOT good to use here
can adjust interval to (56.3, 100)
upper bound not possible in context of experiment; outliers at lower end inflate the s.d.
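The adjustment to (56.3, 100) can be sketched as clipping the interval to the measurement range. Assumptions: xbar = 83.5 and s = 13.6 are back-calculated from the interval (56.3, 110.7) and are not stated in the notes; the lower limit of 0 is also assumed; `clipped_interval` is a made-up helper name:

```python
def clipped_interval(mean, sd, k, lo_limit, hi_limit):
    """k-s.d. interval around the mean, cut off at the measurement range."""
    lo, hi = mean - k * sd, mean + k * sd
    return (max(lo, lo_limit), min(hi, hi_limit))

# Assumed values, back-calculated from the interval on the card
xbar, s = 83.5, 13.6

lo, hi = clipped_interval(xbar, s, 2, lo_limit=0, hi_limit=100)
print(round(lo, 1), hi)  # upper bound is capped at 100
```

Clipping only makes the interval sensible to report. It does not make the ~95% claim any more trustworthy for data like this.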
describe empirical rule - exam grades ex
graph shows long left-hand tail = negative skew
also upper bound of interval is above measurement range
Empirical rule does not work well
still roughly right proportion but range not sensible
does the empirical rule always work
no
for skewed data = does not work
happens often with range restricted observations
Empirical rule usually pretty robust but can be broken
interval will not contain amount of data it says it will
what is chebyshev’s rule - gen
gives approx lower bound
at least a certain % of obs will fall within interval
empirical rule = symmetric
but this is for any distribution
describe chebyshev’s rule - formula
for any distribution and any number k>1: at least (1 - 1/k^2) x 100% of observations will fall into interval (xbar-ks, xbar+ks)
REGARDLESS of shape of histogram
shows relationship between mean, variance and distribution
describe chebyshev’s rule - formula for specific k’s
k=2, % of obs falling within 2 s.d. of mean is AT LEAST = (1-1/2^2) x100% = 75%
k=3, % of obs falling within 3 s.d. of mean is AT LEAST = (1-1/3^2) x100% ~ 88.9%
Result applies to samples and populations
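The formula (1 - 1/k^2) evaluated for a few k, as a quick sketch. `chebyshev_bound` is an illustrative name:

```python
def chebyshev_bound(k):
    """Minimum proportion of obs within k s.d. of the mean (needs k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's rule is only informative for k > 1")
    return 1 - 1 / k**2

for k in (2, 3):
    # k=2 gives 75.0%, k=3 gives about 88.9%
    print(f"k={k}: at least {chebyshev_bound(k):.1%}")
```

Non-integer k also works, for example k = 1.5 gives a bound of about 55.6%.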
describe chebyshev’s rule - math grades ex
couldnt rely on empirical rule
since distribution skewed
gave an interval that didn’t work
still gives same interval for chebyshev’s rule but now = AT LEAST 75% instead of ~95% from empirical rule
describe chebyshev’s rule - body temp ex
can apply chebyshev’s and say that at least 75% of temps are in interval (xbar-2s, xbar+2s)
but in this case empirical rule does apply so
Roughly 95% of obs is a better approx than at least 75% in this case
empirical rule is more precise
describe chebyshev’s rule - conclusions
if data is mound shaped and approx symmetrical = better to use empirical rule = close approximation to the actual percentage
otherwise = use chebyshev’s
can approximate bin content in any part of range just using sample mean and variance
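A sketch comparing the two rules against what a sample actually contains. The `grades` list is made-up, left-skewed data for illustration only (not from the lecture), and the function names are hypothetical:

```python
from statistics import mean, stdev

def chebyshev_bound(k):
    """Minimum proportion within k s.d. of the mean, any distribution, k > 1."""
    return 1 - 1 / k**2

def proportion_within(data, k):
    """Observed proportion of data strictly inside (xbar - k*s, xbar + k*s)."""
    xbar, s = mean(data), stdev(data)  # sample mean and sample s.d.
    lo, hi = xbar - k * s, xbar + k * s
    return sum(lo < x < hi for x in data) / len(data)

# Hypothetical left-skewed grades: a few low outliers, most near the top
grades = [20, 35, 50, 88, 90, 91, 92, 93, 94, 95, 95, 96, 97, 98, 99, 100]

observed = proportion_within(grades, 2)
print(f"observed within 2 s.d.: {observed:.1%}")
print(f"chebyshev guarantees at least: {chebyshev_bound(2):.1%}")
print("empirical rule would suggest roughly 95%")
```

The Chebyshev guarantee holds for any sample. How close the observed proportion is to 95% depends on how mound shaped and symmetric the data are.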