CHAPTER 6 Samples, Uncertainty, and Statistical Inference Flashcards
What are the three terms that all quantitative estimates consist of?
The true quantity of interest, bias, and noise.
What does statistical hypothesis testing allow analysts to assess?
Whether an estimate was likely to have arisen from noise.
True or False: Statistical significance and substantive significance are the same.
False.
What is the purpose of using relationships between variables in a sample?
To make inferences about relationships in the larger population.
What is the term for the true quantity of interest in a population?
Estimand.
What is an estimate?
The number we get as a result of our analysis.
What symbol is commonly used to denote an estimate?
A letter with a hat over it (e.g., q-hat).
What are the two reasons an estimate can differ from the estimand?
- Bias
- Noise
Fill in the blank: Bias refers to errors that occur for ______ reasons.
systematic
Fill in the blank: Noise refers to errors that occur because of ______.
chance
What analogy is used to explain the difference between bias and noise?
Curling.
What is bias in the context of estimators?
A systematic error that causes estimates to differ from the estimand.
What does it mean for an estimator to be unbiased?
The average value of the estimates it generates equals the estimand.
What is sampling variation?
Natural variability that results from sampling.
What is the desired quality of a good estimator?
It should be both unbiased and precise.
What happens to estimates if the estimator is unbiased but imprecise?
Estimates will typically differ from the estimand because of noise.
What occurs if an estimator is biased but precise?
Estimates will differ from the estimand because they are estimating the wrong quantity.
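The contrast between an unbiased but imprecise estimator and a biased but precise one can be sketched with a small simulation. This is only an illustration under stated assumptions: the true support level (`ESTIMAND`), the sample sizes, and the 5-point systematic error are hypothetical values chosen for the example, not figures from the chapter.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

ESTIMAND = 0.50  # hypothetical true share of voters supporting a candidate


def unbiased_but_noisy():
    """Small random sample: right on average, but estimates jump around."""
    sample = [random.random() < ESTIMAND for _ in range(25)]
    return sum(sample) / len(sample)


def biased_but_precise():
    """Large sample that systematically overstates support by 5 points."""
    skewed_share = ESTIMAND + 0.05  # the assumed systematic error
    sample = [random.random() < skewed_share for _ in range(2500)]
    return sum(sample) / len(sample)


# Apply each estimator many times, as in the gray-dot illustration.
noisy = [unbiased_but_noisy() for _ in range(500)]
skewed = [biased_but_precise() for _ in range(500)]

for name, estimates in [("unbiased, imprecise", noisy), ("biased, precise", skewed)]:
    average_error = statistics.mean(estimates) - ESTIMAND
    spread = statistics.stdev(estimates)
    print(f"{name}: average error {average_error:+.3f}, spread {spread:.3f}")
```

The first estimator misses in both directions but is centered on the estimand; the second clusters tightly around the wrong value.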
What is the relationship between bias and precision in estimators?
There can be trade-offs between bias and precision.
What do gray dots represent in the illustration of estimators?
Various estimates from repeated applications of a given estimator.
What do black diamonds represent in the illustration of estimators?
The estimand: the true value in the world we are interested in.
Fill in the blank: An estimator that yields similar estimates with each iteration is considered ______.
precise
What can lead to bias in political polling?
- Voters systematically lie to pollsters
- Different turnout rates between parties
- Differences in who is contacted by pollsters
What does it mean if an estimator is precise?
It means there is very little noise, yielding similar estimates with each iteration.
What is the relationship between bias and precision in estimators?
There are trade-offs between bias and precision; sometimes a certain amount of bias is acceptable for a gain in precision.
What is a common example of trade-offs between bias and precision?
Polling, where a larger convenience sample may be more precise but biased, while a smaller professional sample may be less biased but less precise.
What is the standard error?
The standard deviation of the sampling distribution, quantifying the precision of an estimator.
What does a large standard error indicate?
Estimates are spread out and the estimator is relatively imprecise.
What does a small standard error indicate?
Estimates are close together and the estimator is relatively precise.
What factors affect the standard error in polling?
- Sample size (N)
- True proportion (q) of the population
How does sample size affect standard error?
As sample size increases, standard error decreases.
What is the relationship between true proportion (q) and standard error?
Standard error is largest when q is near 0.5 and smallest when q is close to 0 or 1.
What is the implication of diminishing returns in sample size?
Increasing the sample size by a factor of 10 improves precision by approximately threefold.
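The cards above describe how N and q affect the standard error without giving a formula. A minimal sketch, assuming the standard textbook formula for the standard error of a sample proportion under simple random sampling, sqrt(q(1 - q)/N); the function name `standard_error` is just an illustrative choice.

```python
import math


def standard_error(q, n):
    """Standard error of a sample proportion: sqrt(q * (1 - q) / n)."""
    return math.sqrt(q * (1 - q) / n)


# Larger samples shrink the standard error, but with diminishing returns:
# each tenfold increase in n divides it by sqrt(10), roughly 3.16.
for n in (100, 1_000, 10_000):
    print(n, round(standard_error(0.5, n), 4))

# For a fixed n, the standard error peaks at q = 0.5
# and falls as q moves toward 0 or 1.
for q in (0.1, 0.5, 0.9):
    print(q, round(standard_error(q, 1_000), 4))
```

Because N sits under a square root, a tenfold increase in sample size improves precision by only about a factor of three.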
What can lead to misleading conclusions regarding standard error?
With small sample sizes or values of q close to 0 or 1, the usual standard error approximations can be misleading.
Why are small towns often overrepresented in lists of extreme cancer rates?
Small populations yield less precise estimates, so their rates vary more and are more likely to look extreme by chance alone.
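The small-town pattern can be reproduced in a simulation where every town has exactly the same underlying rate. This is a sketch under hypothetical assumptions: the rate (`TRUE_RATE`), the town sizes, and the number of towns are invented for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

TRUE_RATE = 0.02  # hypothetical: every town shares the same underlying rate


def observed_rate(population):
    """Share of residents who happen to be cases in one simulated town."""
    cases = sum(random.random() < TRUE_RATE for _ in range(population))
    return cases / population


small_towns = [observed_rate(100) for _ in range(200)]
large_towns = [observed_rate(5_000) for _ in range(200)]

print("small towns: lowest", min(small_towns), "highest", max(small_towns))
print("large towns: lowest", min(large_towns), "highest", max(large_towns))
```

Both the highest and the lowest observed rates come from small towns, even though nothing about the towns differs except their size.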
What is a confidence interval?
A range that estimates the true value with a specified level of confidence, often 95%.
What does the 95% confidence interval represent?
If the estimation procedure were repeated an infinite number of times, the resulting intervals would contain the estimand 95% of the time.
How does the width of a confidence interval relate to the confidence level?
A higher confidence level (e.g., 99%) results in a wider confidence interval compared to a lower level (e.g., 95%).
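A confidence interval for a poll can be sketched as the estimate plus or minus a multiple of the standard error. This example assumes the normal approximation for a sample proportion; the poll result (52% of 1,000 respondents) is hypothetical, and the multiplier for 95% comes out near 1.96, which the cards round to 2.

```python
import math
from statistics import NormalDist


def confidence_interval(estimate, n, level=0.95):
    """Normal-approximation confidence interval for a sample proportion."""
    se = math.sqrt(estimate * (1 - estimate) / n)
    # Multiplier: about 1.96 for a 95% level and about 2.58 for a 99% level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * se, estimate + z * se


# Hypothetical poll: 52% support among 1,000 respondents.
low95, high95 = confidence_interval(0.52, 1_000, level=0.95)
low99, high99 = confidence_interval(0.52, 1_000, level=0.99)

print(f"95% interval: {low95:.3f} to {high95:.3f}")
print(f"99% interval: {low99:.3f} to {high99:.3f}")
```

Asking for more confidence requires a larger multiplier, which is why the 99% interval is wider than the 95% interval.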
What is the Law of Large Numbers?
As sample size increases, the sample average converges to the true population average, so the noise in estimates decreases.
What is the Central Limit Theorem?
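For large samples, the sampling distribution of an estimate is approximately normal. For unbiased polls, this implies that approximately 95% of estimates will be within 2 standard errors of the true population value.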
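The 2-standard-error rule can be checked by simulating many unbiased polls and counting how often the estimate lands within 2 standard errors of the truth. The true share, poll size, and number of polls below are hypothetical choices for this sketch.

```python
import math
import random

random.seed(2)  # fixed seed so the sketch is reproducible

Q, N, POLLS = 0.4, 500, 2_000  # hypothetical true share, poll size, number of polls
se = math.sqrt(Q * (1 - Q) / N)  # standard error of a sample proportion


def run_poll():
    """Simulate one unbiased poll and return the estimated share."""
    return sum(random.random() < Q for _ in range(N)) / N


estimates = [run_poll() for _ in range(POLLS)]
share_within_2se = sum(abs(e - Q) <= 2 * se for e in estimates) / POLLS

print(f"share of polls within 2 standard errors: {share_within_2se:.3f}")
```

The share should come out close to 0.95, though it will not match exactly because the number of simulated polls is finite.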
What is the margin of error in polling?
Twice the standard error.
What is the relationship between the 95% and 99% confidence intervals?
The 99% confidence interval is wider than the 95% confidence interval.
What is the primary question of statistical inference?
How do we make inferences about populations using estimates from samples?
What does hypothesis testing assess?
Whether a particular hypothesis is reasonable based on sample data.
In hypothesis testing, what is the null hypothesis?
The assumption that there is no relationship or difference between groups.
What is a p-value?
The probability of obtaining a result at least as extreme as the observed result, assuming the null hypothesis is true.
What does it mean if a p-value is below a pre-specified threshold (e.g., .05)?
We reject the null hypothesis and conclude that there is statistically significant evidence for the alternative hypothesis.
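A p-value calculation for a poll can be sketched as follows. The example assumes a normal approximation and a hypothetical null hypothesis that true support is exactly 50%; the poll numbers and the function name `two_sided_p_value` are illustrative only.

```python
import math
from statistics import NormalDist


def two_sided_p_value(estimate, null_value, n):
    """Two-sided p-value for a sample proportion (normal approximation)."""
    # Standard error of the estimate if the null hypothesis were true.
    se = math.sqrt(null_value * (1 - null_value) / n)
    z = (estimate - null_value) / se
    # Probability of a result at least this far from the null in either direction.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical poll: 54% support among 1,000 respondents, null of 50%.
p = two_sided_p_value(estimate=0.54, null_value=0.50, n=1_000)
reject_null = p < 0.05

print(f"p-value: {p:.4f}, reject null at .05: {reject_null}")
```

The p-value here is the probability of seeing a poll this far from 50% if the null were true. It is not the probability that the null is true.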
True or False: A low p-value indicates that the null hypothesis is true.
False
What is the standard error?
A measure of the variability of an estimator, indicating how far the estimate is likely to be from the true parameter.
What is the purpose of constructing a confidence interval?
To estimate the range within which the true parameter is likely to fall.
What does it signify if a regression coefficient is statistically significant?
There is evidence that a relationship exists between the explanatory and outcome variables in the population.
What is the difference between estimand and estimate?
Estimand is the true parameter we want to know; estimate is the calculation derived from sample data.
Fill in the blank: Statistical hypothesis testing helps determine if an observed phenomenon is likely due to _______.
chance
What is substantive significance?
The practical importance of a result, beyond just its statistical significance.
What is the Central Limit Theorem’s implication for unbiased polls?
95% of estimates will fall within 2 standard errors of the truth.
What happens if we conduct hypothesis testing on a population with complete data?
We can still assess the likelihood of observing a result under the null hypothesis.
What is the significance of testing the null hypothesis that the true relationship between income and education is zero?
It helps determine if there is a statistically significant correlation between the two variables.
What can be a common error in interpreting p-values?
Assuming the p-value indicates the probability that the null hypothesis is true.
What do we need to consider when making inferences from sample data?
Both bias and noise.
True or False: Statistical significance guarantees a large effect size.
False
What is a common threshold for statistical significance?
.05
What does the standard error allow us to do with regression estimates?
Construct confidence intervals and conduct hypothesis tests.
Fill in the blank: Statistical inference can help identify if observed relationships are genuine or simply due to _______.
noise
What can statistical inference tell us even with complete population data?
Whether observed differences are likely due to chance.
What is the question of substantive significance?
It asks how much effect marketing has on sales, rather than just whether there is an effect.
What was the main finding of the 2012 study published in Nature regarding Facebook and voting?
People were more likely to vote if their Facebook pages displayed a banner indicating which of their friends voted.
What was the estimated effect of Facebook banners on voter turnout?
Less than 0.4 percentage points.
What do the researchers conclude about strong ties in social networks?
They are instrumental for spreading both online and real-world behavior.
What does a large sample size allow researchers to do?
Detect genuine relationships more reliably.
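The link between sample size and the ability to detect small effects can be illustrated with a difference in proportions. This is a sketch under hypothetical assumptions: the baseline turnout of 50% and the group sizes are invented, and only the 0.4 percentage point gap echoes the size of effect mentioned in the cards.

```python
import math
from statistics import NormalDist


def difference_p_value(rate_a, rate_b, n_per_group):
    """Two-sided p-value for a difference in proportions (normal approximation)."""
    se = math.sqrt(rate_a * (1 - rate_a) / n_per_group
                   + rate_b * (1 - rate_b) / n_per_group)
    z = (rate_a - rate_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical turnout rates that differ by 0.4 percentage points.
p_small_study = difference_p_value(0.504, 0.500, n_per_group=1_000)
p_large_study = difference_p_value(0.504, 0.500, n_per_group=1_000_000)

print(f"1,000 per group:     p = {p_small_study:.3f}")
print(f"1,000,000 per group: p = {p_large_study:.2g}")
```

The same tiny difference is indistinguishable from noise in the small study and highly statistically significant in the large one. Its substantive size is identical in both.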
What did Berlinski and Dewan find regarding the Second Reform Act of 1867?
They found no statistically significant effect on election outcomes, despite the doubling of the eligible electorate.
What does statistical insignificance imply about the Reform Act’s effects?
It does not mean the effects were non-existent; estimates suggested a large effect despite imprecision.
What two reasons can cause estimates to differ from estimands?
- Bias
- Noise
What is bias in the context of estimates?
Differences between the estimand and estimate that arise for systematic reasons.
What is noise?
Differences between the estimand and estimate that arise due to idiosyncratic facts about the sample.
What is unbiasedness in estimation?
An estimator is unbiased if the average of its estimates across repeated samples equals the estimand.
What is the expected value?
The average value of an infinite number of draws of a variable.
Define precision in the context of estimation.
An estimator is precise if repeated applications yield similar estimates.
What is a sampling distribution?
The distribution of estimates from repeated applications of an estimator on new samples.
What is the standard error?
The standard deviation of the sampling distribution.
What is the margin of error in polling?
Typically, the standard error multiplied by 2.
What does a 95% confidence interval indicate?
The estimand would be contained in the interval 95% of the time if the estimation procedure is repeated.
What is hypothesis testing?
Statistical techniques for assessing confidence that a data feature reflects a real feature rather than noise.
What is the null hypothesis?
The hypothesis that a feature of the data is entirely the result of noise.
What is statistical significance?
Evidence for a hypothesis when the null hypothesis can be rejected at a pre-specified level of confidence.
What does a p-value represent?
The probability of finding a relationship as strong as or stronger than the observed relationship if the null hypothesis is true.
True or False: A p-value indicates the probability that the null hypothesis is true.
False.
What is the role of noise in statistical studies?
It creates uncertainty and can lead to statistically significant results that are not indicative of real relationships.
What can happen if only statistically significant findings are reported?
It may lead to systematically incorrect conclusions.
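The danger of reporting only significant findings can be sketched by simulating many noisy studies of the same small effect. The true effect, the standard error, and the number of studies are hypothetical values chosen for illustration.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is reproducible

TRUE_EFFECT = 0.1     # hypothetical small true effect
STANDARD_ERROR = 0.5  # hypothetical: each study is quite noisy

# Each study's estimate is the true effect plus noise.
estimates = [random.gauss(TRUE_EFFECT, STANDARD_ERROR) for _ in range(10_000)]

# Keep only the studies that are statistically significant at the .05 level.
significant = [e for e in estimates if abs(e) > 1.96 * STANDARD_ERROR]

mean_all = statistics.mean(estimates)
mean_abs_significant = statistics.mean([abs(e) for e in significant])

print(f"average of all estimates: {mean_all:.3f}")
print(f"average size of significant estimates: {mean_abs_significant:.3f}")
```

Averaged over all studies, the estimates are close to the true effect. The significant ones are necessarily far from zero, so a literature made up only of them would greatly overstate the effect.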
What phenomenon can noise create that leads to misinterpretations?
Reversion to the mean.
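Reversion to the mean can be illustrated with a simulation in which each observed score is a stable component plus independent noise. The setup (5,000 units, equal amounts of skill and noise, selecting the top 10%) is a hypothetical example, not one taken from the chapter.

```python
import random
import statistics

random.seed(4)  # fixed seed so the sketch is reproducible

# Hypothetical setup: each score is stable skill plus independent noise.
skills = [random.gauss(0, 1) for _ in range(5_000)]
round_one = [s + random.gauss(0, 1) for s in skills]
round_two = [s + random.gauss(0, 1) for s in skills]

# Select the top 10% of performers in round one.
cutoff = sorted(round_one)[int(0.9 * len(round_one))]
top = [i for i, score in enumerate(round_one) if score >= cutoff]

top_round_one = statistics.mean([round_one[i] for i in top])
top_round_two = statistics.mean([round_two[i] for i in top])

print(f"top group, round one: {top_round_one:.2f}")
print(f"top group, round two: {top_round_two:.2f}")
```

The top group still scores above average in round two, but noticeably lower than in round one, because part of its round-one success was favorable noise that does not repeat.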
What is the population in statistical terms?
The units in the world we are trying to learn about.
What is a sample?
A subset of the population for which we have data.
What is an estimand?
The unobserved quantity we are trying to learn about with our data analysis.
What is an estimator?
The procedure applied to data to generate a numerical result.
What is an estimate?
The numerical result from applying an estimator to a specific set of data.
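The five vocabulary terms above can be lined up in a few lines of code. This is a minimal sketch using a hypothetical population of 10,000 voters, where 1 marks a supporter; the variable names simply mirror the terms.

```python
import random

random.seed(5)  # fixed seed so the sketch is reproducible

# Population: the units we want to learn about (hypothetical voters).
population = [1] * 5_200 + [0] * 4_800

# Estimand: the true share of supporters, normally unobserved.
estimand = sum(population) / len(population)

# Sample: the subset of the population for which we have data.
sample = random.sample(population, 500)


def estimator(data):
    """Estimator: the procedure applied to data, here the sample mean."""
    return sum(data) / len(data)


# Estimate: the number the estimator returns for this particular sample.
estimate = estimator(sample)

print(f"estimand: {estimand:.3f}, estimate: {estimate:.3f}")
```

Drawing a different sample would give a different estimate from the same estimator, while the estimand stays fixed.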