1.5: Point Estimates, Confidence Intervals, and Resampling Flashcards
The two branches of statistical inference
hypothesis testing and estimation
hypothesis testing
seeks to find if the value of a parameter equals some specific value
Estimation
seeks to find the value of the parameter
Estimators
the formulas used to calculate the sample statistics
Estimates
are the particular values derived from these estimators
An unbiased estimator
one whose expected value equals the parameter it is estimating
An efficient unbiased estimator
has the smallest sampling-distribution variance among all unbiased estimators of the same parameter, for a given sample size
ex:
–> Estimator A is efficient because its estimates are tightly grouped around the true value of μ (smaller standard error).
–> Estimator B is inefficient because its estimates are more spread out from the true value of μ (larger standard error)
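A minimal simulation sketch of efficiency. The cards do not say which estimators A and B are, so this assumes two stand-ins for illustration only: the sample mean and the sample median of normally distributed data. Both are unbiased for μ in that setting, but their sampling variances differ. The function name, sample size, and seed are hypothetical choices.

```python
import random
import statistics

def sampling_variance(estimator, n=25, reps=2000, mu=0.0, sigma=1.0, seed=42):
    """Variance of an estimator's estimates across many simulated samples."""
    rng = random.Random(seed)  # same seed, so both estimators see the same samples
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        estimates.append(estimator(sample))
    return statistics.pvariance(estimates)

# Stand-ins (assumed, not from the cards): mean vs. median as estimators of mu
var_mean = sampling_variance(statistics.mean)
var_median = sampling_variance(statistics.median)

print(f"variance of sample means:   {var_mean:.4f}")
print(f"variance of sample medians: {var_median:.4f}")
```

The estimator with the smaller variance is the more efficient of the two; its square root is the smaller standard error.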
A consistent estimator
gets closer to the population parameter’s value as the sample size increases
As the sample size approaches infinity, the standard error will approach zero, and the distribution will fully concentrate over the true population value
ex:
–> Estimator A is consistent because its standard error significantly narrows down when sample size increases.
–> Estimator B is inconsistent. Increasing sample size barely improves the accuracy of the estimate
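A rough sketch of consistency, assuming the sample mean of normally distributed data as the estimator (an illustrative choice, not one named in the cards). It simulates the standard error at several sample sizes; the function name and parameter values are hypothetical.

```python
import random
import statistics

def simulated_standard_error(n, reps=300, mu=5.0, sigma=2.0, seed=7):
    """Standard deviation of the sample mean across `reps` simulated samples of size n."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.pstdev(means)

# The standard error should shrink as n grows (roughly like sigma / sqrt(n))
for n in (10, 100, 1000):
    print(f"n = {n:4d}  simulated standard error = {simulated_standard_error(n):.4f}")
```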
A point estimate is unlikely to exactly equal the population parameter due to sampling error
what should we use then?
An interval estimate
A 100(1−α)% confidence interval
is a range constructed so that, across repeated samples, it contains the parameter with probability 1−α, where α is the significance level
ex: using a 5% significance level creates a 95% confidence interval around the sample mean. We can be 95% confident that the population mean falls somewhere in this interval
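One way to see what "95% confident" means is to simulate many samples and count how often the interval contains the true mean. The sketch below assumes a normal population with known σ and uses the known-variance z interval covered later in this set; the function name and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def coverage(alpha=0.05, n=30, reps=2000, mu=10.0, sigma=3.0, seed=1):
    """Share of simulated 100(1 - alpha)% intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # reliability factor
    half_width = z * sigma / n ** 0.5         # reliability factor x standard error
    hits = 0
    for _ in range(reps):
        x_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / reps

print(f"share of intervals containing mu: {coverage():.3f}")
```

The share should land near 1−α, which is the sense in which the interval procedure is "95% confident".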
A 100(1−α)% confidence interval is calculated by:
Point Estimate ± Reliability Factor × Standard Error
The 100(1−α)% confidence interval for a population mean from a normally distributed population with known variance is:
what does this do?
$\bar{X} \pm z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$
This produces a confidence interval whose upper and lower bounds leave a total probability of α that the population mean lies outside the interval.
$z_{\alpha/2}$ is used because α is split between the two tails, so α/2 is the probability in each tail.
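A short sketch of the known-variance interval, using only the Python standard library. The function name and the example numbers (sample mean 50, σ = 10, n = 100) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, alpha=0.05):
    """Point estimate +/- reliability factor x standard error, with known sigma."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # leaves alpha/2 in each tail
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical inputs: sample mean 50, population sigma 10, sample size 100
low, high = z_confidence_interval(x_bar=50.0, sigma=10.0, n=100, alpha=0.05)
print(f"95% CI: ({low:.2f}, {high:.2f})")  # 95% CI: (48.04, 51.96)
```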
When the population variance is unknown, as is often the case, it is appropriate to use the sample standard deviation as a substitute for the population standard deviation. With z as the reliability factor, this is acceptable only when the sample is large.
what is the formula?
$\bar{X} \pm z_{\alpha/2} \times \frac{s}{\sqrt{n}}$
the t-distribution
used for confidence intervals when the population variance is unknown
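This is valid even when the sample size is small, provided the population is approximately normally distributed
Since it is more conservative (i.e., the reliability factor is bigger), the confidence interval will be wider than the corresponding z-based interval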
The confidence interval for the population mean can use the t-distribution when the variance is unknown provided the sample is large, or the population is approximately normally distributed.
what is the formula to do so?
$\bar{X} \pm t_{\alpha/2} \times \frac{s}{\sqrt{n}}$
degrees of freedom: n - 1
The reliability factor is read from the t table, at the intersection of the chosen confidence level and the degrees of freedom.
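A small sketch of the t-based interval. To mirror the table lookup, the reliability factor is passed in by hand rather than computed. The function name and example numbers are hypothetical; 2.064 is the standard table value for 24 degrees of freedom at 95% confidence (two-tailed).

```python
from math import sqrt

def t_confidence_interval(x_bar, s, n, t_crit):
    """Point estimate +/- t reliability factor x standard error (s / sqrt(n))."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical inputs: sample mean 50, sample std dev 10, n = 25, so df = n - 1 = 24.
# t table, df = 24, 95% confidence (two-tailed): about 2.064
low, high = t_confidence_interval(x_bar=50.0, s=10.0, n=25, t_crit=2.064)
print(f"95% CI with t: ({low:.2f}, {high:.2f})")  # 95% CI with t: (45.87, 54.13)

# Same inputs with the z reliability factor (about 1.96) give a narrower interval
z_low, z_high = t_confidence_interval(x_bar=50.0, s=10.0, n=25, t_crit=1.96)
print(f"95% CI with z: ({z_low:.2f}, {z_high:.2f})")  # 95% CI with z: (46.08, 53.92)
```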
which do we use between z and t distributions for:
large sample size
unknown population variance
t is better
z is acceptable
which do we use between z and t distributions for:
large sample size
known population variance
z
which do we use between z and t distributions for:
small sample size
not a normal distribution
not available
which do we use between z and t distributions for:
small sample size
normal distribution
known population variance
z
which do we use between z and t distributions for:
small sample size
normal distribution
unknown population variance
t
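The five z-versus-t cards above can be collected into one lookup. This is only a sketch of the cards as written; the function name and the boolean parameters are hypothetical, and what counts as a "large" sample is left to the caller because the cards do not give a cutoff.

```python
def reliability_factor_choice(large_sample, normal_population, variance_known):
    """Return which distribution the cards recommend for the reliability factor."""
    if large_sample:
        # The large-sample cards do not condition on normality
        return "z" if variance_known else "t (z acceptable)"
    if not normal_population:
        # Small sample from a non-normal population
        return "not available"
    # Small sample from a normal population
    return "z" if variance_known else "t"

print(reliability_factor_choice(large_sample=True, normal_population=False, variance_known=False))
print(reliability_factor_choice(large_sample=False, normal_population=True, variance_known=False))
```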