16 - Confidence Intervals Flashcards
What is a confidence interval?
- range of feasible values for an unknown population parameter
- µ (pop mean), p (pop proportion)
- statement conveying the confidence that the range of feasible values really does include the unknown population value
Because proportions are averages, the CLT implies…
a normal model for the sampling distribution of p^ if the sample size n is large enough:
p^ ~ N[p, p(1-p)/n]
SE (p^)
sqr[p(1-p)/n]
If we use the 97.5th percentile of the standard normal distribution, z_{0.025}, then
z_{0.025} = 1.96, and
P(-1.96 SE(p^) <= p^ - p <= 1.96 SE (p^)) = 0.95
p^ lies within 1.96 standard errors of p in 95% of samples
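The 1.96 cutoff and the 95% probability can be checked numerically. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

# Upper 2.5% point: the value with 97.5% of the probability below it.
z_crit = z.inv_cdf(0.975)

# Probability that a standard normal lands within +/- 1.96.
coverage = z.cdf(1.96) - z.cdf(-1.96)

print(round(z_crit, 2))    # 1.96
print(round(coverage, 3))  # 0.95
```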
se(p^) =
sqr[p^(1-p^)/n]
CI for p
The 100(1-a)% z-interval for p is the interval from…
p^ - z_{a/2} sqr[p^(1-p^)/n] to p^ + z_{a/2} sqr[p^(1-p^)/n]
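A short sketch of the z-interval for p, assuming the standard error is computed with n in the denominator. The function name `proportion_z_interval` and the counts (140 successes out of 400) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_interval(successes, n, confidence=0.95):
    """Return (low, high) for p^ +/- z_{a/2} * se(p^)."""
    p_hat = successes / n
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. 1.96 for 95%
    se = sqrt(p_hat * (1 - p_hat) / n)            # estimated standard error
    return p_hat - z_crit * se, p_hat + z_crit * se

# Hypothetical data: 140 successes in a sample of 400, so p^ = 0.35.
low, high = proportion_z_interval(successes=140, n=400)
print(round(low, 3), round(high, 3))  # 0.303 0.397
```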
Xbar = mean of a randomly chosen sample (a random variable)
xbar = mean of the observed sample (a number)
SE(Xbar) = σ/sqr(n)
se(Xbar) = s/sqr(n)
Student’s t-distribution
very similar to the normal distribution, but the t has fatter tails
incorporates excess variability
as sample size n gets larger, the t-distribution converges to the standard normal distribution
defining the t-distribution
any normal random variable, its Z-score:
(Xbar - µ)/(σ/sqr(n)) = Z ~ N(0, 1)
replace σ with s (sample SD) to get the t-statistic:
(Xbar - µ)/(s/sqr(n)) = T ~ T_{n-1}
- T_{n-1} → a t-distributed random variable with n-1 degrees of freedom
Student’s t-distribution compensates for…
substituting s for σ in the standard error.
Exact sampling distribution of random variable T_{n-1}…
For a normal population, Student’s t-distribution with n-1 degrees of freedom is the exact sampling distribution of T_{n-1} = (Xbar - µ)/(S/sqr(n))
degrees of freedom
n-1; the larger n is, the closer the t-distribution is to the standard normal distribution
Degrees of freedom are necessary because…
they track the sample size:
s varies more from sample to sample when n is small than when n is large
Confidence interval for µ
The 100(1-a)% confidence t-interval for µ is
xbar - t_{a/2, n-1} s/sqr(n) to xbar + t_{a/2, n-1} s/sqr(n)
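A minimal sketch of the t-interval using only the standard library. The critical value is passed in from a t-table (2.262 is the tabulated t_{0.025, 9}) because the standard library has no t quantile function. The function name `t_interval` and the ten measurements are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(sample, t_crit):
    """Return (low, high) for xbar +/- t_crit * s/sqrt(n)."""
    n = len(sample)
    xbar = mean(sample)
    s = stdev(sample)            # sample SD (divides by n-1)
    moe = t_crit * s / sqrt(n)   # critical value * standard error
    return xbar - moe, xbar + moe

# Hypothetical sample of n = 10 measurements, so df = 9.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]
T_CRIT_95_DF9 = 2.262  # t_{0.025, 9} from a t-table

low, high = t_interval(sample, T_CRIT_95_DF9)
print(round(low, 2), round(high, 2))
```

With a larger sample the tabulated critical value shrinks toward 1.96, so the t-interval approaches the z-interval.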
Interpreting CI’s
95% of intervals created according to this procedure are expected to contain μ.
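The "95% of intervals" reading can be illustrated by simulation. This is a sketch under stated assumptions: a normal population with known σ, the z-interval Xbar +/- 1.96 σ/sqr(n), and made-up values for µ, σ, n, and the number of repetitions.

```python
import random
from math import sqrt
from statistics import mean

def coverage_rate(mu=50.0, sigma=10.0, n=30, reps=2000, seed=1):
    """Fraction of simulated 95% z-intervals that contain the true mu."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        # Draw one sample of size n and compute its mean.
        xbar = mean([rng.gauss(mu, sigma) for _ in range(n)])
        # Does this sample's interval cover the true mean?
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

rate = coverage_rate()
print(rate)  # expected to be close to 0.95
```

Any single interval either contains µ or does not; the 95% describes the long-run success rate of the procedure.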
From the CLT, we know:
For the sample mean:
E(Xbar) = μ
Var(Xbar) = σ^2/n
SD(Xbar) = SE(Xbar) = σ/sqr(n)
Manipulating CI’s
you can transform the ends of the CI to obtain a new CI for the transformed parameter
Ex. MPG → L/100km
Ex. probabilities → odds
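A sketch of the MPG → L/100km case, assuming US gallons (L/100km = 235.215 / MPG). Because this transformation is decreasing, the endpoints swap: the upper MPG limit becomes the lower L/100km limit. The interval (25, 30) MPG is a made-up example.

```python
# Conversion constant for US gallons: L/100km = 235.215 / MPG.
MPG_TO_L_PER_100KM = 235.215

def mpg_interval_to_l_per_100km(low_mpg, high_mpg):
    """Transform the ends of a CI for MPG into a CI for L/100km."""
    # Higher MPG means lower fuel use, so the endpoints trade places.
    return MPG_TO_L_PER_100KM / high_mpg, MPG_TO_L_PER_100KM / low_mpg

# Hypothetical 95% CI for mean fuel economy: 25 to 30 MPG.
low, high = mpg_interval_to_l_per_100km(25, 30)
print(round(low, 2), round(high, 2))  # 7.84 9.41
```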
_________ provides a 95% confidence interval for µ
Xbar +/- 1.96 σ/sqr(n)
or
Xbar +/- 2 σ/sqr(n)
sample mean of 0/1 variables
sample proportion of 1’s
this means we can still apply CLT to sample proportion
p = population proportion
p^ = sample proportion
Sampling distribution of p^
p^ ~ N(p, p(1-p)/n)
key assumptions of sampling distribution of p^
- we have an independent sample from the pop
- sample size is large enough for CLT to be applied
- sample size rule: np^ and n(1-p^) > 10
Rule for the sample size
both np^ and n(1-p^) > 10
*#of successes and failures each have to be greater than 10*
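Since np^ is the number of successes and n(1-p^) the number of failures, the rule reduces to a count check. A small sketch; the function name `large_enough_for_clt` is hypothetical.

```python
def large_enough_for_clt(successes, n, threshold=10):
    """Check the sample size rule: successes and failures both exceed 10."""
    failures = n - successes
    return successes > threshold and failures > threshold

print(large_enough_for_clt(30, 100))  # True: 30 successes, 70 failures
print(large_enough_for_clt(5, 100))   # False: only 5 successes
```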
_______ provides a 95% confidence interval for the population proportion
p^ +/- 2SE(p^) = p^ +/- 2sqr[p(1-p)/n]
but since we don’t know p, we replace it with p^:
p^ +/- 2sqr[p^(1-p^)/n]
What is MoE?
Margin of Error is the distance from the center to the edge of the interval
MoE = critical value * standard error, e.g. 2 s/sqr(n) for a 95% interval
Ex. if you want a 95% confidence level, MoE = 1.96 σ/sqr(n)
Sample size formula for a population mean (µ) with 95% z-interval
n = (critical value * σ/MoE)^2
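A sketch of the sample size formula, rounding up because n must be a whole number. The values σ = 15 and MoE = 2 are made up, and `sample_size_for_mean` is a hypothetical name.

```python
from math import ceil

def sample_size_for_mean(sigma, moe, critical_value=1.96):
    """Smallest n with critical_value * sigma / sqrt(n) <= moe."""
    return ceil((critical_value * sigma / moe) ** 2)

# Hypothetical planning values: sigma = 15, desired MoE = 2.
n_needed = sample_size_for_mean(sigma=15, moe=2)
print(n_needed)  # 217, since (1.96 * 15 / 2)^2 = 216.09
```

Halving the margin of error roughly quadruples the required sample size.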
sample size for estimating a population proportion
n = (1/MoE)^2
use only if:
- you want a 95% CI
- p (pop proportion) lies between 0.25 & 0.75
Ex. a company has estimated the proportion of doctors who say they’ll prescribe a new drug as 30%. They used a sample size of 100, but didn’t provide a CI.
What was the MoE for a 95% CI?
n = (1/MoE)^2 ⇔ MoE = 1/sqr(n)
MoE = 1/10 = 0.1
CI is approx (30% +/- 10% ) = (20%, 40%)
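For comparison, the shortcut MoE = 1/sqr(n) can be set against the margin of error computed from p^ itself. A sketch using the numbers from this example (p^ = 0.30, n = 100) and a critical value of 1.96.

```python
from math import sqrt

p_hat, n = 0.30, 100

# Shortcut: 2 * sqrt(0.25 / n) = 1 / sqrt(n), the worst case at p = 0.5.
approx_moe = 1 / sqrt(n)

# Plug-in version: 1.96 * sqrt(p^(1 - p^) / n).
exact_moe = 1.96 * sqrt(p_hat * (1 - p_hat) / n)

print(round(approx_moe, 3))  # 0.1
print(round(exact_moe, 3))   # 0.09
```

The shortcut is slightly conservative here: it gives (20%, 40%), while the plug-in margin gives roughly (21%, 39%).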