Interval Estimation Flashcards
What are point estimates?
Single numbers, computed from sample statistics, that estimate population parameters (e.g. the sample mean X̄ as an estimate of μ).
What is a confidence interval?
A range of values, computed from the sample, that is likely to contain the true value of a population parameter. Because there is variability (uncertainty) around the point estimate, the interval indicates how close our estimate is likely to be to the true value.
What is the confidence interval formula for the mean μ when the population is normally/approximately normally distributed (when the population variance is known - unlikely case)? write it down
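This comes from the formula:
z = (X̄ − μ) / (σ/√n)
P(X̄ − z_{α/2} σ/√n ≤ μ ≤ X̄ + z_{α/2} σ/√n) = p
CI: (X̄ − z_{α/2} σ/√n, X̄ + z_{α/2} σ/√n)
Where p is known as the confidence value, e.g. 95%, α = 1 − p, and z_{α/2} is the value of the standard normal distribution with area α/2 in the upper tail.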
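A minimal Python sketch of the known-variance interval for the mean, using only the standard library. The function name and the input numbers are hypothetical and chosen purely for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, p=0.95):
    """CI for the mean when the population standard deviation sigma is known."""
    alpha = 1 - p
    # Standard normal value with area alpha/2 in the upper tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical sample summary: mean 50, known sigma 10, n = 100.
low, high = z_confidence_interval(xbar=50.0, sigma=10.0, n=100, p=0.95)
print(round(low, 2), round(high, 2))  # prints: 48.04 51.96
```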
What does a 95% confidence interval for μ mean?
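That if we repeated the sampling many times and built an interval from each sample, about 95% of those intervals would contain μ.
It does not mean that μ has a 95% probability of lying in one particular computed interval: μ is a fixed value, and a given interval either contains it or does not.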
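The repeated-sampling reading can be checked with a small simulation. This is only a sketch under assumed values (normal population with μ = 10, σ = 2, samples of size 25, known σ); the function name and the seed are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu=10.0, sigma=2.0, n=25, p=0.95, trials=2000, seed=0):
    """Fraction of simulated known-sigma intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - p) / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its mean.
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials


print(coverage())  # expected to be close to 0.95
```

Each individual interval either contains μ or misses it; the 95% describes the long-run proportion of intervals that contain it.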
What is a critical value for a confidence interval?
The value of the test statistic that defines the upper and lower bounds of a confidence interval.
(when population variance is known - unlikely case)
E.g. normal distribution, 95% CI for μ: α = 1 − 0.95 = 0.05, so each tail has area α/2 = 0.025. The lower bound corresponds to the 0.025 quantile, z = −1.96, and the upper bound to the 0.975 quantile, z = 1.96. Due to the symmetry of the normal distribution, the critical values for the 95% CI of μ are −1.96 and 1.96.
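As a quick check, the normal critical values can be looked up with Python's standard library instead of a z table. The helper name is hypothetical.

```python
from statistics import NormalDist


def z_critical_values(p):
    """Lower and upper standard normal critical values for confidence value p."""
    alpha = 1 - p
    std_normal = NormalDist()  # mean 0, standard deviation 1
    lower = std_normal.inv_cdf(alpha / 2)
    upper = std_normal.inv_cdf(1 - alpha / 2)
    return lower, upper


for p in (0.90, 0.95, 0.99):
    lower, upper = z_critical_values(p)
    print(f"{p:.0%}: {lower:.3f}, {upper:.3f}")
```

For p = 0.95 this gives approximately −1.960 and 1.960, matching the table values.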
What does the width of the interval depend on?
(when population variance is known - unlikely case)
Look at the formula:
X̄ − z_{α/2} σ/√n ≤ μ ≤ X̄ + z_{α/2} σ/√n, so the width is 2 z_{α/2} σ/√n (with α = 1 − p)
It depends on the population standard deviation σ, the confidence value p, and the sample size n.
- If σ or p increases, the interval gets wider
- If n increases it gets narrower
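The effect of σ, p and n on the width can be seen numerically. A small sketch with hypothetical inputs, assuming the known-variance (z) interval:

```python
from math import sqrt
from statistics import NormalDist


def z_interval_width(sigma, n, p):
    """Width of the known-sigma interval: 2 * z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - p) / 2)
    return 2 * z * sigma / sqrt(n)


base = z_interval_width(sigma=10.0, n=100, p=0.95)
print(z_interval_width(sigma=20.0, n=100, p=0.95) > base)  # larger sigma: wider
print(z_interval_width(sigma=10.0, n=100, p=0.99) > base)  # larger p: wider
print(z_interval_width(sigma=10.0, n=400, p=0.95) < base)  # larger n: narrower
```

Because n enters through √n, quadrupling the sample size halves the width.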
When the population is normally distributed/approximately normally distributed and the population variance σ^2 and the population mean μ are unknown what formula is used instead of the Z one?
The random variable T has a Student's t-distribution with n − 1 DF (degrees of freedom):
T = (X̄ − μ) / (s/√n)
The only change is that the population standard deviation σ is replaced by the sample standard deviation s.
This makes sense since X̄ is normally distributed and (n − 1)s²/σ² is χ²(n − 1) (chi-squared) distributed.
What are the characteristics of a student’s t-distribution?
- It’s bell-shaped
- is symmetric about 0 and has mean 0
- has a larger variance (it's more spread out) than the standard normal distribution
- as n increases the variance decreases and the distribution approaches the standard normal distribution. If n > 40, the normal distribution is an acceptable approximation.
What is the confidence interval formula for the mean μ when the population is normally/approximately normally distributed (when the population variance is unknown - likely case)? write it down
c = t_{α/2, n−1}
n − 1 = degrees of freedom (DF)
α/2 = (1 − p)/2
P(X̄ − c s/√n ≤ μ ≤ X̄ + c s/√n) = p
CI: (X̄ − c s/√n, X̄ + c s/√n)
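A sketch of the unknown-variance interval in Python. The standard library has no t-distribution, so the critical value c is passed in as read from a t table; the function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def t_confidence_interval(sample, c):
    """CI for the mean using the sample standard deviation s and a t critical value c."""
    n = len(sample)
    xbar = mean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    margin = c * s / sqrt(n)
    return xbar - margin, xbar + margin


# Hypothetical data with n = 5, so DF = 4; the t table gives t_{0.025, 4} = 2.776.
data = [4.9, 5.1, 5.0, 5.3, 4.7]
low, high = t_confidence_interval(data, c=2.776)
print(round(low, 3), round(high, 3))
```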
When can the t-distribution be used if the population is NOT normally/approximately normally distributed? ( 2 cases)
- When n ≥ 30 (large-sample case, Central Limit Theorem rule)
- When n < 30 but the population has a roughly bell-shaped distribution, like a Binomial distribution with the probability of success very close to 0.5.
E.g.
X̄ = 8.17, s² = 1.42, s = 1.191, population is roughly bell-shaped and n = 56 → the t-distribution is a good approximation
95% CI for μ?
n − 1 = 55
α/2 = (1 − 0.95)/2 = 0.05/2 = 0.025
c = t_{0.025, 55}. 55 is not in the t-distribution table, so either interpolate between the nearest tabulated values or, since n > 40, treat the normal distribution as an acceptable approximation → use the z table.
CI: (X̄ − c s/√n, X̄ + c s/√n)
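The example can be finished numerically. This sketch takes the z-table route mentioned above (normal approximation, since n > 40) rather than an exact t value, so the result is approximate.

```python
from math import sqrt
from statistics import NormalDist

# Values from the example: sample mean, sample sd, sample size, confidence value.
xbar, s, n, p = 8.17, 1.191, 56, 0.95

# Normal approximation to t_{0.025, 55}.
c = NormalDist().inv_cdf(1 - (1 - p) / 2)
margin = c * s / sqrt(n)
low, high = xbar - margin, xbar + margin
print(f"({low:.2f}, {high:.2f})")  # prints: (7.86, 8.48)
```

An exact t critical value is slightly larger than 1.96, so the t-based interval would be marginally wider.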
What is the formula for w, the width of the interval? Write it down.
If s is approximately known in advance, what can it be used for?
w = 2 t_{α/2, n−1} (s/√n)
n = 4 (t_{α/2, n−1})² s² / w²
The second formula can be used to work out how large the sample must be to estimate the mean within an interval of a given width w.
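A sketch of the sample-size calculation. Since the t critical value itself depends on n, this version substitutes the normal critical value as an approximation (reasonable when the resulting n is large); the function name and the target width are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(s, w, p=0.95):
    """Approximate n for an interval of total width w, with z in place of t."""
    z = NormalDist().inv_cdf(1 - (1 - p) / 2)
    # n = 4 z^2 s^2 / w^2, rounded up to the next whole observation.
    return ceil(4 * z**2 * s**2 / w**2)


# With s = 1.191 and a target width of 0.5:
print(required_sample_size(s=1.191, w=0.5))  # prints: 88
```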
How do you compare two sample means?
Two samples can be compared by establishing a confidence interval for the difference of their means.
- X1..Xm is a random i.i.d. sample from a population with mean μ1 and variance σ1²
- Y1..Yn is a random i.i.d. sample from a population with mean μ2 and variance σ2²
- X and Y are independent
For μ1 − μ2 we can use the unbiased estimator X̄ − Ȳ, which has variance Var(X̄ − Ȳ) = σ1²/m + σ2²/n. The variance has this form because of independence: Var(X̄ − Ȳ) = Var(X̄) + Var(Ȳ).
How do you compare two sample means when the population variances are unknown (likely case)? 2 cases
1) The sample sizes are large: m > 30 and n > 30
Starting from the one-sample formula:
z = (X̄ − μ) / (σ/√n)
the two-sample statistic is:
z = ((X̄ − Ȳ) − (μ1 − μ2)) / √(s1²/m + s2²/n)
In this case (comparison of two samples), thanks to the Central Limit Theorem the statistic is approximately normally distributed (even if the sample sizes are not > 40).
The CI would be:
(X̄ − Ȳ − z_{α/2} √(s1²/m + s2²/n), X̄ − Ȳ + z_{α/2} √(s1²/m + s2²/n)), with α = 1 − p
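A sketch of the large-sample interval for the difference of two means, working from summary statistics. The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_interval(xbar, ybar, s1_sq, s2_sq, m, n, p=0.95):
    """Large-sample CI for mu1 - mu2 using the sample variances s1_sq and s2_sq."""
    z = NormalDist().inv_cdf(1 - (1 - p) / 2)
    se = sqrt(s1_sq / m + s2_sq / n)  # estimated standard error of xbar - ybar
    diff = xbar - ybar
    return diff - z * se, diff + z * se


# Hypothetical summaries for two samples of size 100 each.
low, high = two_sample_z_interval(
    xbar=12.0, ybar=10.0, s1_sq=4.0, s2_sq=9.0, m=100, n=100
)
print(round(low, 3), round(high, 3))
```

With these made-up numbers the interval lies entirely above 0.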
2) Both populations are normally/approximately normally distributed and σ1² = σ2² = σ², or "reasonably close" (a histogram of the data can be used to check).
Given the variance: Var(X̄ − Ȳ) = σ²/m + σ²/n = σ² (1/m + 1/n)
An unbiased estimator for the common variance σ² is the pooled estimator sp²:
sp² = ((m − 1)s1² + (n − 1)s2²) / (m + n − 2)
Then, from:
T = (X̄ − μ) / (s/√n)
t = ((X̄ − Ȳ) − (μ1 − μ2)) / (sp √(1/m + 1/n))
With m + n − 2 DF
c = t_{α/2, m+n−2}, with α = 1 − p
CI: ((X̄ − Ȳ) − c sp √(1/m + 1/n), (X̄ − Ȳ) + c sp √(1/m + 1/n))
Note: for small samples, when σ1² and σ2² are very different, there is no easy procedure to find a confidence interval for the difference in means, even if the populations are normally distributed.
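A sketch of the pooled (equal-variance) interval from case 2. The critical value c is passed in as read from a t table, because the Python standard library has no t-distribution; the function name and the numbers are hypothetical.

```python
from math import sqrt


def pooled_t_interval(xbar, ybar, s1_sq, s2_sq, m, n, c):
    """CI for mu1 - mu2 assuming equal population variances (pooled estimator)."""
    # Pooled variance: sample variances weighted by their degrees of freedom.
    sp_sq = ((m - 1) * s1_sq + (n - 1) * s2_sq) / (m + n - 2)
    se = sqrt(sp_sq) * sqrt(1 / m + 1 / n)
    diff = xbar - ybar
    return diff - c * se, diff + c * se


# Hypothetical summaries with m = n = 10, so DF = 18; the t table gives t_{0.025, 18} = 2.101.
low, high = pooled_t_interval(
    xbar=5.0, ybar=4.0, s1_sq=1.0, s2_sq=1.0, m=10, n=10, c=2.101
)
print(round(low, 3), round(high, 3))
```

Note that c multiplies sp √(1/m + 1/n); it is not divided by it.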
How do you compare two sample means when the population variances are known (unlikely case)?
Starting from the one-sample formula:
z = (X̄ − μ) / (σ/√n)
the two-sample statistic is:
z = ((X̄ − Ȳ) − (μ1 − μ2)) / √(σ1²/m + σ2²/n)
which follows a standard normal distribution N(0,1).
The CI would be:
(X̄ − Ȳ − z_{α/2} √(σ1²/m + σ2²/n), X̄ − Ȳ + z_{α/2} √(σ1²/m + σ2²/n)), with α = 1 − p
What does it mean when 0 is included in a confidence interval for a parameter (such as the difference between two means)?
It suggests that there is no statistically significant effect or difference at the given confidence level.