Exam 2 Study Guide Flashcards
What is Statistical Inference?
Draw conclusions about population parameters using sample statistics.
What is a Parameter?
A measurement on a population.
What is a Statistic?
A measurement on a sample.
What two questions should we ask to identify the parameter in inference?
- How many variables are being measured (one or two)?
- What type of data do we have (categorical or quantitative)?
What parameter is used for one quantitative variable?
μ, the population mean.
What parameter is used for one categorical variable?
p, the population proportion.
What is the parameter for two categorical variables?
The difference between population proportions, p1 – p2.
What is the parameter for two categorical groups measured with quantitative data?
The difference between population means, μ1 - μ2.
Are parameters fixed values?
Yes, parameters are fixed values but often unknown.
What do statistics do when different samples are taken?
Statistics vary from sample to sample.
What is a Sampling Distribution?
Distribution of all statistics calculated from all samples of the same size, n, collected from a population.
What is required for constructing sampling distributions?
All samples must be collected randomly.
What shape does the sampling distribution of the sample proportion take with a large enough sample?
Bell-shaped and symmetric.
What is the center of the sampling distribution of the sample proportion?
The population proportion, p.
What happens to the standard error as sample size n increases?
The standard error decreases.
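A minimal simulation sketch of this effect. It assumes a hypothetical population proportion of p = 0.63 and compares sample sizes 35 and 65; the function name, the seed, and the number of simulated samples are illustrative choices, not part of the course material.

```python
import random
import statistics

def simulate_sample_proportions(p, n, num_samples, rng):
    """Return num_samples sample proportions, each from a random sample of size n."""
    proportions = []
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

rng = random.Random(0)   # fixed seed so the run is repeatable
p = 0.63                 # hypothetical population proportion

props_35 = simulate_sample_proportions(p, 35, 2000, rng)
props_65 = simulate_sample_proportions(p, 65, 2000, rng)

# The standard deviation of the simulated statistics estimates the standard error.
se_35 = statistics.stdev(props_35)
se_65 = statistics.stdev(props_65)

print(f"n = 35: center = {statistics.mean(props_35):.3f}, SE = {se_35:.3f}")
print(f"n = 65: center = {statistics.mean(props_65):.3f}, SE = {se_65:.3f}")
```

Both simulated distributions should be centered near p, while the spread for n = 65 should be visibly smaller than for n = 35.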
What shape does the sampling distribution of the sample mean take if the population is normal?
Normal.
What is the center of the sampling distribution of the sample mean?
The population mean, μ.
What is a Confidence Interval?
An interval computed from sample data that captures the parameter for a specified proportion of all samples.
What does a confidence interval provide?
- A range of plausible values for a population parameter.
- A confidence level.
What does a 95% confidence interval interpretation mean?
We are 95% confident that the interval captures the true population parameter.
What is the formula for a Confidence Interval (CI)?
CI: statistic ± margin of error.
What does the margin of error include?
The standard error.
What is Bootstrapping?
A method to estimate the standard error of a statistic without knowledge of the population.
What does the bootstrapping process involve?
Sampling with replacement from the original sample.
How is a 95% confidence interval constructed using bootstrapping?
95% CI: statistic ± 2(SE).
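A rough sketch of the "statistic ± 2(SE)" approach, where the SE is estimated as the standard deviation of the bootstrap statistics. The data values, seed, and function name are hypothetical and chosen only for illustration.

```python
import random
import statistics

def bootstrap_means(sample, num_resamples, rng):
    """Resample with replacement (same size as the sample) and record each mean."""
    n = len(sample)
    return [statistics.mean(rng.choices(sample, k=n)) for _ in range(num_resamples)]

rng = random.Random(1)
# Hypothetical original sample; not taken from the study guide.
sample = [3.1, 7.4, 5.0, 9.2, 6.3, 8.8, 4.5, 7.9, 10.1, 6.7]

boot_stats = bootstrap_means(sample, 5000, rng)
standard_error = statistics.stdev(boot_stats)   # bootstrap estimate of the SE
statistic = statistics.mean(sample)

ci = (statistic - 2 * standard_error, statistic + 2 * standard_error)
print(f"statistic = {statistic:.2f}, SE = {standard_error:.2f}")
print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f}")
```

This shortcut relies on the bootstrap distribution being roughly bell-shaped and symmetric.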
What happens to the width of the CI as the sample size increases?
The width of the CI decreases.
What is the purpose of a Hypothesis Test?
To judge or test between two different claims about the parameter.
What is stated in the null hypothesis?
The null hypothesis (H0) always states that there is no effect, no difference, or no relationship.
What are the steps of a Hypothesis Test?
- State the null and alternative hypotheses.
- Find the observed sample statistic.
- Obtain the p-value.
- Make a decision about H0.
- Make a conclusion in context.
What does the Null Hypothesis claim?
The effect we are looking for does not exist.
What does the Alternative Hypothesis claim?
The effect we are looking for does exist.
What is the p-value?
The proportion of samples that would give a statistic at least as extreme as the observed sample result, when the null hypothesis is true.
What does a smaller p-value indicate?
Stronger statistical evidence against the null hypothesis.
What is the Decision Rule in hypothesis testing?
- If p-value < α, then reject H0.
- If p-value ≥ α, then do not reject H0.
What is a Type I Error?
Occurs when we reject a true null hypothesis (false positive).
What is a Type II Error?
Occurs when we do not reject a false null hypothesis (false negative).
How is a two-tailed hypothesis test connected to confidence intervals?
Reject H0 if the parameter stated in H0 is not captured by a (1 – α) confidence interval.
Symbol for
One Quantitative - Mean
Parameter/Statistic
What is a Parameter and what is a Statistic?
μ / x̄
A parameter is a measurement on a population (p for p).
A statistic is a measurement on a sample (s for s).
Symbol for
One Quantitative - SD
Parameter/Statistic
σ / s
Symbol
One Categorical - Proportion
Parameter/Statistic
p / p̂
Symbol
Two categorical
Parameter/Statistic
p1 – p2 / p̂1 – p̂2
Symbol
One Categorical/One Quantitative
μ1 – μ2 / x̄1 – x̄2
a.) 0.56
b.) 0.63
c.) 0.70
d.) 0.91
b.) 0.63
The sampling distribution is bell-shaped and symmetric, so its mean is the value at the center of the graph, which is 0.63.
If the samples of size n=65 had been used instead of n = 35, what would be true?
- The variability in the sample statistics would be about the same
- The sample statistics would have more variability
- The sample statistics would have less variability
The sample statistics would have less variability
A larger sample size gives a smaller standard error, so the sample statistics vary less from sample to sample.
Reasonably likely
Unusual but might occur occasionally
Extremely unlikely to occur
Extremely unlikely to occur
x-bar represents the sample mean, or average value
- The sample means would be centered at the same value
- The sample means would be centered at a smaller value
- The sample means would be centered at a larger value
- The sample means would be centered at the same value
US adults (age 18 and over)
1012 randomly selected US adults
p-hat = proportion of the sample of 1012 randomly selected US adults who are dissatisfied = 0.53
Incorrect; A 95% confidence interval means that if we took many random samples, about 95% of the intervals would contain the true population parameter. It reflects how confident we are in the estimate based on the sample.
Incorrect
Answer: Not a proper interpretation.
Once the interval is calculated, the population parameter either is or isn’t in it — there’s no probability anymore. The 95% refers to the long-run success rate of the method, not the specific interval.
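The "long-run success rate" idea can be checked with a small simulation sketch. It assumes a hypothetical true proportion of 0.53 and samples of size 1012, and it uses the standard normal-approximation interval p-hat ± 1.96·sqrt(p-hat(1 − p-hat)/n), which is an assumption here rather than a formula stated in this guide.

```python
import math
import random

def proportion_ci_95(successes, n):
    """Normal-approximation 95% CI for a proportion: p-hat +/- 1.96 * SE."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - 1.96 * se, p_hat + 1.96 * se

rng = random.Random(2)
true_p = 0.53       # hypothetical population proportion (unknown in practice)
n = 1012
num_samples = 1000

captured = 0
for _ in range(num_samples):
    successes = sum(1 for _ in range(n) if rng.random() < true_p)
    low, high = proportion_ci_95(successes, n)
    # Each individual interval either contains true_p or it does not.
    if low <= true_p <= high:
        captured += 1

coverage = captured / num_samples
print(f"Proportion of intervals capturing the parameter: {coverage:.3f}")
```

The coverage should land close to 0.95, which describes the method across many samples and not any single interval.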
Answer: 8.7% to 10.1%
Explanation: Use the formula for a 95% confidence interval:
point estimate ± 1.96 × standard error
9.4% ± 1.96 × 0.35% = 9.4% ± 0.686% ⇒ [8.7%, 10.1%]
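The same arithmetic as a short sketch; the function name is illustrative, and the inputs are the point estimate (9.4%) and standard error (0.35%) from the explanation.

```python
def confidence_interval_95(point_estimate, standard_error):
    """Return (low, high) for point estimate +/- 1.96 * standard error."""
    margin_of_error = 1.96 * standard_error
    return point_estimate - margin_of_error, point_estimate + margin_of_error

low, high = confidence_interval_95(9.4, 0.35)
print(f"95% CI: {low:.1f}% to {high:.1f}%")   # prints: 95% CI: 8.7% to 10.1%
```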
Answer: Possible
Explanation: Bootstrap samples are created by randomly sampling with replacement from the original sample. Since 20 is in the original, it’s possible (though unlikely) to draw 20 repeatedly.
Answer: μ = mean number of hours per week all customers use the Internet
Explanation: The parameter of interest refers to the population mean, not the sample mean. μ represents the average for all customers, which is what the study is trying to estimate.
Answer: x̄ = 7.2 hours
Explanation: x̄ represents the sample mean, which is the best estimate of the population mean μ. Since the sample average is 7.2 hours, we write it as x̄ = 7.2.
Answer: 6.322 hours to 8.082 hours
Explanation:
A 95% confidence interval from a bootstrap distribution uses the 2.5th and 97.5th percentiles.
From the table:
- 2.5% = 6.322
- 97.5% = 8.082
So the 95% CI is 6.322 to 8.082 hours.
Bootstrap distribution created by…
The bootstrap distribution is created by:
- Resampling the original sample with replacement many times (e.g. 1,000+ samples).
- Calculating the sample mean for each resample.
- Plotting those means to form a distribution, which is the bootstrap distribution.
It estimates the sampling distribution when the population data is unknown.
Answer: 6.438 hours to 7.947 hours
Explanation:
A 90% confidence interval from a bootstrap distribution uses the 5th and 95th percentiles.
From the table:
- 5% = 6.438
- 95% = 7.947
So the 90% CI is 6.438 to 7.947 hours.
A 90% confidence interval means you’re capturing the middle 90% of the distribution. That leaves 5% in each tail (totaling the 10% you’re excluding).
So:
- Lower bound = 5th percentile
- Upper bound = 95th percentile
That’s why the 90% bootstrap CI is from the 5th to the 95th percentile.
How to generate a bootstrap distribution
Bootstrap samples are generated through this process:
1. Start with your original sample of data.
2. Randomly sample from it with replacement (same size as the original sample).
3. Calculate a statistic (like the mean) from this resample.
4. Repeat steps 2–3 many times (often 1,000+).
5. The distribution of all those statistics is your bootstrap distribution.
You then use percentiles from this distribution to build confidence intervals.
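The process above can be sketched as follows. The sample of weekly Internet hours is hypothetical (it is not the data behind the table in this guide), and the function names, seed, and simple index-based percentile rule are illustrative assumptions.

```python
import random
import statistics

def bootstrap_distribution(sample, num_resamples, rng):
    """Resample with replacement (same size) and record the mean each time."""
    n = len(sample)
    return [statistics.mean(rng.choices(sample, k=n)) for _ in range(num_resamples)]

def percentile_interval(boot_stats, confidence_level):
    """Keep the middle confidence_level share of the bootstrap statistics."""
    ordered = sorted(boot_stats)
    tail = (1 - confidence_level) / 2          # e.g. 0.05 in each tail for 90%
    lower_index = int(tail * len(ordered))
    upper_index = int((1 - tail) * len(ordered)) - 1
    return ordered[lower_index], ordered[upper_index]

rng = random.Random(5)
# Hypothetical original sample of weekly Internet hours.
hours = [2.5, 9.0, 7.5, 6.0, 12.0, 4.5, 8.0, 7.0, 10.5, 5.0, 6.5, 8.5]

boot_stats = bootstrap_distribution(hours, 5000, rng)
ci_95 = percentile_interval(boot_stats, 0.95)   # 2.5th to 97.5th percentile
ci_90 = percentile_interval(boot_stats, 0.90)   # 5th to 95th percentile

print(f"95% CI: {ci_95[0]:.3f} to {ci_95[1]:.3f}")
print(f"90% CI: {ci_90[0]:.3f} to {ci_90[1]:.3f}")
```

Because the 90% interval cuts off more of each tail, it should sit inside the 95% interval.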
Here are the steps:
1. Take a random sample from a population (this is your original sample).
2. Randomly resample with replacement from that original sample (same size).
3. Calculate the statistic of interest (e.g., mean) from the resample.
4. Repeat steps 2–3 many times (e.g., 1,000 times).
5. Use the distribution of those statistics (bootstrap distribution) to estimate uncertainty, like confidence intervals.
the probability, when the null hypothesis is true, of obtaining a sample as extreme as (or more extreme than) the observed sample
Sample D
Answer: H0: μ1 = μ2 vs HA: μ1 > μ2
Explanation:
The question asks if sales are higher in group 1 (approached by salespeople), so the alternative hypothesis is μ1 > μ2.
The null always assumes no difference: μ1 = μ2.
Answer: H0: μ = 50 vs HA: μ > 50
Explanation:
The research question asks if the average age is greater than 50, so the alternative hypothesis is μ > 50.
The null hypothesis assumes no difference: μ = 50.
Answer: 0.8
Explanation:
In a randomization test, the distribution is centered at the null hypothesis value, which is p = 0.8 in this case. This allows comparison of the observed result to what’s expected if H0 is true.
Answer: 98 because that is the size of the original sample
Explanation:
Randomization samples must match the original sample size to properly simulate what could happen under the null hypothesis. Here, that’s 98 students.
Answer: p̂
Explanation:
Since the hypotheses involve the population proportion p, the sample statistic used for simulation is the sample proportion p̂. This is what you’d calculate for each simulated sample to build the randomization distribution.
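A hedged sketch of such a randomization distribution, using the null value p = 0.8 and the sample size 98 from the answers above. The observed count (70 out of 98) and the left-tailed alternative are hypothetical, since the original question is not reproduced here; the function name and seed are also illustrative.

```python
import random
import statistics

def randomization_proportions(null_p, n, num_samples, rng):
    """Simulate sample proportions of size n assuming the null value is true."""
    stats = []
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < null_p)
        stats.append(successes / n)
    return stats

rng = random.Random(3)
null_p = 0.8               # value stated in H0
n = 98                     # matches the original sample size
observed_p_hat = 70 / 98   # hypothetical observed sample proportion

rand_stats = randomization_proportions(null_p, n, 5000, rng)
center = statistics.mean(rand_stats)

# Assumed left-tailed alternative (p < 0.8): count simulated statistics
# at least as extreme as the observed one.
p_value = sum(1 for s in rand_stats if s <= observed_p_hat) / len(rand_stats)

print(f"center of randomization distribution = {center:.3f}")
print(f"p-value = {p_value:.4f}")
```

The center should come out very close to 0.8, because every simulated sample is generated as if H0 were true.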
Answer: x̄1 = 18.3, x̄2 = 15.0
Explanation:
The test statistic is D = x̄1 − x̄2.
- First option: 17.4 − 18.6 = −1.2
- Second option: 18.3 − 15.0 = 3.3
Since 3.3 is farther from 0 (the center of the null distribution) than −1.2, it provides stronger evidence against H0.
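The comparison can be sketched in a few lines. The option labels and function name are illustrative; the sample means are the ones quoted in the explanation, and "stronger evidence" is measured here by distance from 0, following that explanation.

```python
def difference_in_means(mean_1, mean_2):
    """Test statistic D = x-bar1 - x-bar2."""
    return mean_1 - mean_2

options = {
    "first option": (17.4, 18.6),
    "second option": (18.3, 15.0),
}

differences = {name: difference_in_means(m1, m2) for name, (m1, m2) in options.items()}

# The null distribution is centered at 0, so pick the D farthest from 0.
strongest = max(differences, key=lambda name: abs(differences[name]))

for name, d in differences.items():
    print(f"{name}: D = {d:.1f}")
print(f"Strongest evidence against H0: {strongest}")
```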
Rejecting the null hypothesis when the null hypothesis is true
What does it mean when you reject a null hypothesis? What does it mean when you accept it?
Rejecting the null hypothesis means there is enough evidence to support the alternative hypothesis: the observed effect or difference would be unlikely if the null were true.
Failing to reject the null (often loosely called "accepting" it) means there is not enough evidence to support a difference, but it doesn’t prove the null is true.
Decreasing the significance level of a hypothesis test (say, from 5% to 1%) will cause the p-value of an observed test statistic to… increase/decrease/stay the same
Stay the same.
Explanation:
The p-value is calculated from the data and does not depend on the chosen significance level (α).
Changing α affects the decision (reject or not), not the p-value itself.
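A small sketch of this point: the p-value is fixed by the data, and only the decision changes with α. The p-value of 0.03 is hypothetical, and the function name is illustrative.

```python
def decide(p_value, alpha):
    """Apply the decision rule: reject H0 only when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"

p_value = 0.03   # hypothetical p-value, computed from the data alone

for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: p-value = {p_value} -> {decide(p_value, alpha)}")
```

Lowering α from 0.05 to 0.01 flips the decision here even though the p-value never changes.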
What is a significance level?
The significance level (denoted α) is the threshold for deciding whether to reject the null hypothesis.
Common values:
- α = 0.05 (5%)
- α = 0.01 (1%)
It represents the maximum probability of making a Type I error, that is, rejecting a true null hypothesis.
Answer: Reject H0
Explanation:
The p-value (0.0005) is much smaller than the significance level (0.05), so the result is statistically significant. This means there’s strong evidence against the null hypothesis.
Answer: Parameter: μ = average gas price in Illinois. Hypotheses: H0: μ = 2.18 vs HA: μ > 2.18
Explanation:
The question is testing whether Illinois gas prices are higher than the national average, so the alternative hypothesis is greater than. The null always assumes no difference: μ = 2.18.
Answer: Parameters: μ1 = mean number of hours of television per week for male students and μ2 = mean number of hours for female students. Hypotheses: H0: μ1 = μ2 vs HA: μ1 > μ2
Explanation:
The research question asks if male students watch more TV than female students, so the alternative hypothesis should be μ1 > μ2. The null assumes no difference: μ1 = μ2.
Answer: This sample provides no evidence that male students watch more television each week than female students at this university.
Explanation:
The observed difference is x̄1 − x̄2 = 6 − 4 = 2, but on the randomization plot, a difference of 2 or more is not unusual; it’s within the spread of the null distribution. So, the p-value is not small, and we fail to reject H0.
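A rough sketch of how such a randomization distribution could be built by reshuffling group labels. The two small data sets are hypothetical, chosen only so that the sample means are 6 and 4 as in the explanation; they are not the study's data, and the function names and seed are illustrative.

```python
import random

def mean(values):
    return sum(values) / len(values)

def randomization_differences(group_1, group_2, num_shuffles, rng):
    """Shuffle the pooled data and recompute the difference in means each time."""
    pooled = list(group_1) + list(group_2)
    n1 = len(group_1)
    diffs = []
    for _ in range(num_shuffles):
        rng.shuffle(pooled)   # under H0, group labels are interchangeable
        diffs.append(mean(pooled[:n1]) - mean(pooled[n1:]))
    return diffs

rng = random.Random(4)
male_hours = [2, 10, 6, 1, 11, 6]     # hypothetical data, mean 6
female_hours = [4, 0, 9, 3, 5, 3]     # hypothetical data, mean 4

observed = mean(male_hours) - mean(female_hours)
diffs = randomization_differences(male_hours, female_hours, 5000, rng)

# Right-tailed test (HA: mu1 > mu2): share of shuffles with a difference >= observed.
p_value = sum(1 for d in diffs if d >= observed) / len(diffs)

print(f"observed difference = {observed}")
print(f"p-value = {p_value:.3f}")
```

With data this spread out, a difference of 2 shows up often among the shuffles, so the p-value is not small.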