Confidence Intervals Flashcards
Point Estimate
- uses a single value to estimate a population parameter
- doesn't express the uncertainty of the estimate
Interval estimate
- uses a range of values to estimate a population parameter.
- example: a confidence interval
What’s the equation for a confidence interval?
Confidence interval = sample statistic (e.g., the sample mean) ± margin of error
Main Components of Confidence Interval
- sample statistic
- margin of error
- confidence level
Sample Statistic
sample mean or sample proportion
Confidence level
- The confidence level describes the likelihood that a particular sampling method will produce a confidence interval that includes the population parameter.
- common confidence levels
- 90%
- 95% - most popular
- 99%
Margin of error
ME = z-score * SE (large sample, n >= 30)
ME = t-score * SE (small sample, n < 30)
- SE is the standard error of the sample statistic.
- The margin of error represents the maximum expected difference between a population parameter and a sample estimate.
- This range of values expresses the uncertainty in your estimate due to random sampling
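As a rough sketch of the formulas above, the snippet below computes a margin of error from a critical value and a standard error. The function name `margin_of_error` and the sample numbers are hypothetical, and SE is assumed to be the sample standard deviation divided by the square root of n. The same function would accept a t-score in place of the z-score for a small sample.

```python
import math
from statistics import NormalDist

def margin_of_error(critical_value, sample_sd, n):
    """ME = critical value * SE, where SE = sample_sd / sqrt(n)."""
    standard_error = sample_sd / math.sqrt(n)
    return critical_value * standard_error

# z-score for a two-sided 95% confidence level: the 97.5th percentile
# of the standard normal distribution
z_95 = NormalDist().inv_cdf(0.975)

# Hypothetical large sample: n = 100, sample standard deviation = 15
me_large = margin_of_error(z_95, sample_sd=15, n=100)
print(f"z = {z_95:.2f}, ME = {me_large:.2f}")
```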
Steps to Construct Confidence Interval
- Identify a sample statistic.
- Choose a confidence level.
- Find the margin of error.
- Calculate the interval.
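The four steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name `confidence_interval` and the data are made up, and it assumes a large sample (n >= 30) so that a z-score applies.

```python
import math
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, confidence_level=0.95):
    """Return (lower, upper) for the mean, using a z-score (assumes n >= 30)."""
    # Step 1: identify the sample statistic (here, the sample mean)
    sample_mean = mean(sample)
    # Step 2: the confidence level is chosen by the caller (default 95%)
    tail = (1 - confidence_level) / 2
    # Step 3: find the margin of error, ME = z-score * SE
    z_score = NormalDist().inv_cdf(1 - tail)
    standard_error = stdev(sample) / math.sqrt(len(sample))
    me = z_score * standard_error
    # Step 4: calculate the interval
    return sample_mean - me, sample_mean + me

# Hypothetical data: 36 measurements invented for illustration
sample = [48, 52, 50, 47, 53, 49] * 6
lower, upper = confidence_interval(sample, 0.95)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```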
Why do data professionals use confidence intervals?
To help describe the uncertainty surrounding an estimate.
Interpretation of the confidence interval
Technically, 95% confidence means that if you take repeated random samples from a population, and construct a confidence interval for each sample using the same method, you can expect that 95% of these intervals will capture the population mean. You can also expect that 5% of the total will not capture the population mean.
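The repeated-sampling interpretation can be checked with a small simulation. The sketch below assumes a normally distributed population with a made-up mean and standard deviation, draws many samples, builds a 95% interval from each, and counts how many capture the population mean. The share should land near 95%, though it varies from run to run and tends to sit slightly below 95% when a z-score is used with an estimated standard deviation.

```python
import math
import random
from statistics import NormalDist, mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population values, known only because this is a simulation
POPULATION_MEAN = 100
POPULATION_SD = 15
SAMPLE_SIZE = 50
NUM_SAMPLES = 2000

z_95 = NormalDist().inv_cdf(0.975)
captured = 0
for _ in range(NUM_SAMPLES):
    sample = [random.gauss(POPULATION_MEAN, POPULATION_SD)
              for _ in range(SAMPLE_SIZE)]
    sample_mean = mean(sample)
    me = z_95 * stdev(sample) / math.sqrt(SAMPLE_SIZE)
    # Count the interval if it contains the (fixed) population mean
    if sample_mean - me <= POPULATION_MEAN <= sample_mean + me:
        captured += 1

coverage = captured / NUM_SAMPLES
print(f"{coverage:.1%} of intervals captured the population mean")
```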
Incorrect interpretation of the confidence interval
- 95% refers to the probability that the population mean falls within the constructed interval. This is incorrect: saying there is a 95% chance that your confidence interval captures the population mean implies that the population mean is variable. Intervals change from sample to sample, but the value of the population mean is constant.
- 95% refers to the percentage of data values that fall within the interval
- 95% refers to the percentage of sample means that fall within the interval
Z-scores
For large sample sizes, you use z-scores to calculate the margin of error.
This is because of the central limit theorem: for large sample sizes, the sample mean is approximately normally distributed.
For a standard normal distribution, also called a z-distribution, you use z-scores to make calculations about your data.
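As an illustration, the z-scores for the common confidence levels can be looked up from the standard normal distribution. The helper name `z_score_for` is hypothetical; it assumes a two-sided interval, so the leftover probability is split evenly between the two tails.

```python
from statistics import NormalDist

def z_score_for(confidence_level):
    """z-score for a two-sided interval at the given confidence level."""
    tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_score_for(level):.3f}")
# 90%: z = 1.645
# 95%: z = 1.960
# 99%: z = 2.576
```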
T-scores
For small sample sizes (n < 30), you need to use the t-distribution.
Statistically speaking, this is because there is more uncertainty involved in estimating the standard error for small sample sizes.
The t-distribution has heavier tails than the standard normal distribution. The heavier tails reflect the greater chance of extreme values that comes with a small dataset.
As the sample size increases, the t-distribution approaches the normal distribution.
When the sample size reaches about 30, the two distributions are practically the same, and you can use z-scores for your calculations.
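To see how the t-score approaches the z-score as the sample grows, the sketch below compares the two at a 95% confidence level. Python's standard library has no t-distribution, so the t-scores here are copied from a standard t-table (two-sided 95%, rounded to three decimals) rather than computed; with SciPy installed, `scipy.stats.t.ppf(0.975, df)` should give the same values. The dictionary name `T_SCORES_95` is hypothetical.

```python
from statistics import NormalDist

# Two-sided 95% t-scores from a standard t-table,
# keyed by degrees of freedom (sample size minus 1)
T_SCORES_95 = {4: 2.776, 9: 2.262, 14: 2.145, 29: 2.045}

z_95 = NormalDist().inv_cdf(0.975)

gaps = []
for df, t_score in T_SCORES_95.items():
    gap = t_score - z_95  # how much wider the t-based interval is, per unit of SE
    gaps.append(gap)
    print(f"n = {df + 1:>2}: t = {t_score:.3f}, z = {z_95:.3f}, gap = {gap:.3f}")
```

The gap shrinks as n grows, which is why the z-score is treated as a reasonable substitute once the sample is large.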
CI Step 1: Identify a sample statistic
If your sample statistic is the average emissions rate for a sample of 15 engines, you're working with a sample mean.
CI Step 2: Choose a confidence level
common confidence levels
- 90%
- 95% (most popular)
- 99%