Module 5 - Manley Ch 6 - Generalizations Flashcards
What is a statistical inference?
Reasoning that uses specific observations as evidence for general claims about a larger group, or uses general facts about a group as evidence about specific cases.
What is the difference between a statistical generalization and a statistical instantiation?
Generalization: Moves from a sample to conclusions about the whole population.
Instantiation: Moves from known facts about the population to conclusions about a sample.
Why is forming appropriate generalizations difficult?
It requires careful sampling, avoiding biases, and understanding probability, which is why the field of statistics exists.
When does a sample provide strong evidence for a hypothesis?
When the likelihood of observing the sample is much greater if the hypothesis is true than if it is false.
What factors weaken the reliability of a sample?
Small sample size
Sampling bias (e.g., non-random or convenience sampling)
What is sampling bias?
A selection effect where the method of choosing the sample skews the results, making them unrepresentative of the population.
How can experiments be designed to reduce sampling bias?
By selecting the sample at random, so that every member of the population has an equal chance of being chosen and the sample is more likely to be representative.
Why is a larger sample size important?
It increases the likelihood that the sample’s proportions closely resemble those of the whole population.
What is the law of large numbers?
The principle that larger samples tend to reflect the true distribution of the population more accurately.
Why do smaller samples often have more extreme proportions?
Because random variation has a larger effect on smaller groups, leading to greater deviations from the population average.
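A minimal sketch of this effect, assuming a population in which exactly half the members have some trait (the 50% figure and the 40%-60% "non-extreme" band are illustrative assumptions, not from the text). It computes the exact binomial probability that a sample's proportion falls outside that band.

```python
from math import comb

def prob_extreme(n, p=0.5, low=0.4, high=0.6):
    """Probability that a random sample of size n has a proportion outside [low, high]."""
    inside = sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if low <= k / n <= high
    )
    return 1 - inside

# The chance of an "extreme" sample proportion shrinks as the sample grows.
for n in (10, 100, 1000):
    print(n, round(prob_extreme(n), 4))
```

With these assumed numbers, a sample of 10 lands outside the band about a third of the time, while a sample of 1000 almost never does.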
Why might small counties show extreme rates of a disease compared to larger ones?
Small samples are more prone to random fluctuations, leading to unusually high or low rates.
Why are the hospitals or schools with the “best” or “worst” rates often small ones?
Small sample sizes magnify the effects of random variation, making extreme outcomes more likely.
What should you consider when interpreting extreme results in small samples?
Whether the results could be explained by random variation rather than true differences.
How do you test the strength of evidence?
By comparing the likelihood of observing the evidence if the hypothesis is true versus if it is false.
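One way to make this comparison concrete is a likelihood ratio. The sketch below assumes the evidence is a count of successes in a random sample and that each hypothesis specifies a success probability; the function names and the example numbers are illustrative assumptions.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def likelihood_ratio(k, n, p_if_true, p_if_false):
    """How many times more likely the evidence is if the hypothesis is true than if it is false."""
    return binom_pmf(k, n, p_if_true) / binom_pmf(k, n, p_if_false)

# Hypothesis: 80% of the population has the trait. Alternative: 50% does.
print(likelihood_ratio(8, 10, 0.8, 0.5))  # above 1: evidence favors the hypothesis
print(likelihood_ratio(5, 10, 0.8, 0.5))  # below 1: evidence favors the alternative
```

A ratio far from 1 means the observation distinguishes strongly between the two hypotheses; a ratio near 1 means it barely distinguishes them at all.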
Why is it important to design experiments where evidence strongly distinguishes between hypotheses?
To ensure that the observations are far more likely under one hypothesis than the other, making the evidence meaningful.
What is stratified sampling?
A method of sampling where the population is divided into subgroups (strata) based on specific characteristics, and the sample is drawn so that each subgroup appears in the same proportion as in the population.
Why does sample size matter in statistical generalizations?
Larger samples are more likely to reflect the true characteristics of the population and reduce the margin of error.
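As a rough illustration of how the margin of error shrinks with sample size, the sketch below uses the common normal-approximation formula for a sample proportion. It assumes a simple random sample and a 95% confidence level (z = 1.96); the function name is illustrative.

```python
from math import sqrt

def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sample proportion p_hat from n random draws."""
    return z * sqrt(p_hat * (1 - p_hat) / n)

# Quadrupling the sample size halves the margin of error.
for n in (100, 400, 1600):
    print(n, round(margin_of_error(0.5, n), 4))
```

Under this approximation the margin falls with the square root of the sample size, so each extra point of precision costs far more respondents than the last.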
What does a 95% confidence interval mean?
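That the interval comes from a method that captures the true population value in about 95% of samples. Put another way, if the true value lay outside the interval, a sample result like the one observed would be expected less than about 5% of the time.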
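A 95% confidence level can also be read in terms of repeated sampling. The simulation below is a sketch under stated assumptions: a hypothetical true proportion of 0.3, samples of 500, and the normal-approximation interval. It counts how often the interval built from each sample contains the true value.

```python
import random
from math import sqrt

def coverage_rate(true_p=0.3, n=500, trials=2000, z=1.96, seed=0):
    """Share of simulated samples whose 95% interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        margin = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / trials

print(coverage_rate())  # typically close to 0.95
```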
Why is random sampling important?
It minimizes selection effects and makes it more likely that the sample is representative of the population.
What are some common issues with non-random sampling methods?
Oversampling certain groups (e.g., cars on low-traffic roads).
Failing to account for relevant subgroups.
Bias from convenience sampling.
How does stratified sampling reduce bias?
By ensuring the sample mirrors the population in terms of characteristics like age, gender, or car type.
Why does stratification not replace randomization?
There may be unconsidered subgroups or hidden biases, so random sampling within strata is still essential.
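The combination of stratification and randomization can be sketched as follows. This is a minimal illustration, not a standard library routine: the function name, the car-type population, and the rounding of each stratum's quota are all assumptions. Because quotas are rounded, the total can differ slightly from the requested size for other inputs.

```python
import random

def stratified_sample(population, key, sample_size, seed=0):
    """Draw a random sample whose strata match the population's proportions."""
    rng = random.Random(seed)
    # Group the population into strata by the chosen characteristic.
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        # Each stratum's quota is proportional to its share of the population.
        quota = round(sample_size * len(members) / len(population))
        # Selection within the stratum is still random.
        sample.extend(rng.sample(members, quota))
    return sample

# Hypothetical population: 600 sedans, 300 trucks, 100 vans.
cars = (
    [("sedan", i) for i in range(600)]
    + [("truck", i) for i in range(300)]
    + [("van", i) for i in range(100)]
)
sample = stratified_sample(cars, key=lambda car: car[0], sample_size=50)
print(len(sample))
```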
What is participation bias in surveys?
When certain groups are more likely to participate than others, skewing the sample.
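The size of this skew can be worked out directly. The sketch below assumes two hypothetical groups of equal size with different response rates and different levels of support for some proposal; all numbers are made up for illustration.

```python
def expected_survey_result(groups):
    """Expected share of supporters among respondents.

    groups: list of (population_share, response_rate, support_rate) tuples.
    """
    responders = sum(share * respond for share, respond, _ in groups)
    supporters = sum(share * respond * support for share, respond, support in groups)
    return supporters / responders

# Group A responds often and mostly supports; group B rarely responds and mostly opposes.
groups = [(0.5, 0.6, 0.8), (0.5, 0.1, 0.2)]
true_support = sum(share * support for share, _, support in groups)

print(round(true_support, 3))                    # support in the whole population
print(round(expected_survey_result(groups), 3))  # support among those who respond
```

With these assumed numbers, true support is 50%, but the survey would be expected to report about 71%, because the supportive group is overrepresented among respondents.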
How can offering cash incentives for survey participation still lead to bias?
The fixed amount may motivate some groups more than others, leaving the sample unrepresentative.
What is response bias?
When participants give inaccurate answers, for example to seem socially acceptable, to avoid embarrassment, or to hide that they do not know the answer.
How can the wording of a survey question introduce bias?
Different terms or phrasings can evoke different emotional responses, leading to skewed results (e.g., “death tax” vs. “estate tax”).
What should you do if you cannot eliminate all selection effects in sampling?
Use randomization and stratification to minimize bias as much as possible.
Why are voluntary surveys particularly prone to bias?
They attract participants with strong opinions, skewing results toward those who care most about the topic.
What are the two key questions we must answer when summarizing statistical data?
What features of the data are most important to us?
What’s the clearest way to present those features?
Why can statistical summaries be misleading?
They omit some facts and can be selectively presented to make true but misleading claims.
What are the three main measures of central tendency?
Mean, median, and mode.
When is the mean most useful?
When calculating the total of a quantitative feature or when outliers do not significantly skew the data.
Why might the median be preferred over the mean?
The median resists being skewed by outliers and better reflects the “typical” value in some cases.
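A small illustration with made-up income figures, using Python's standard `statistics` module: adding a single very large value moves the mean dramatically but barely moves the median.

```python
from statistics import mean, median

# Hypothetical incomes, chosen only for illustration.
incomes = [30_000, 35_000, 40_000, 45_000, 50_000]
with_outlier = incomes + [5_000_000]

print(mean(incomes), median(incomes))            # both 40000
print(mean(with_outlier), median(with_outlier))  # mean jumps; median is 42500.0
```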
What is the mode?
The value that appears most frequently in a dataset.
What is an outlier?
A data point that is significantly different from other values in the dataset.
How does a truncated mean handle outliers?
It calculates the mean after discarding the most extreme values, typically a fixed percentage of the highest and lowest data points.
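One common form of truncated (trimmed) mean drops a fixed share of the smallest and largest values before averaging. The sketch below assumes that form; the function name and the 10% default are illustrative choices.

```python
def truncated_mean(values, proportion=0.1):
    """Mean after dropping the given proportion of values from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values to drop from each end
    trimmed = ordered[k : len(ordered) - k]
    return sum(trimmed) / len(trimmed)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
print(sum(data) / len(data))  # ordinary mean, pulled up by the outlier
print(truncated_mean(data))   # 5.5 after dropping the lowest and highest value
```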
Why is visualizing the shape of data important?
It helps identify patterns, distributions, and inequalities that measures like the mean or median cannot capture.
What does the standard deviation measure?
How spread out the data points are around the mean. It is the square root of the average squared distance from the mean, so it can be read roughly as a typical distance from the mean.
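To show what the standard deviation computes, the sketch below uses a small made-up dataset and compares the population standard deviation (from Python's `statistics` module) with the plain average distance from the mean. The two are related but not identical, because the standard deviation squares the distances before averaging.

```python
from statistics import mean, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values with mean 5
center = mean(data)

# Population standard deviation: square root of the average squared deviation.
std_dev = pstdev(data)

# Average absolute distance from the mean, for comparison.
mean_abs_dev = sum(abs(x - center) for x in data) / len(data)

print(std_dev)       # 2.0
print(mean_abs_dev)  # 1.5
```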
What is cherry-picking data, and why is it misleading?
Selecting specific data points to create a false impression of a trend while ignoring the full dataset.
What are loose generalizations?
Vague or unclear generalizations, often expressed as “Most Fs are Gs” or “Many Fs are Gs,” without clear evidence or definition.
What is a stereotype?
A widely held loose generalization about a social group, often influenced by in-group bias.
Why are loose generalizations problematic?
They can smuggle false or misleading ideas under the guise of truth and are often vague enough to evade scrutiny.
How can even true generalizations be misleading?
They might imply a causal or explanatory relationship where none exists, confusing the true cause of a phenomenon.
Give an example of a true but misleading generalization.
“Older women are dangerous drivers.” While true on average, it misrepresents the real cause: visual and cognitive decline that affects some seniors, regardless of gender.
What is the representativeness heuristic?
A cognitive shortcut where people estimate probabilities based on how strongly two features are associated in their minds, rather than actual statistical relationships.
What is an example of the representativeness heuristic in action?
Judging “Linda is a bank teller and active in the feminist movement” as more probable than “Linda is a bank teller,” even though a conjunction can never be more probable than either of its parts.
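The rule behind the Linda example can be checked numerically. The probabilities below are hypothetical and chosen only for illustration: however likely it is that a bank teller named Linda is a feminist, the probability of both together cannot exceed the probability of her being a bank teller.

```python
def conjunction_probability(p_a, p_b_given_a):
    """P(A and B) = P(A) * P(B given A)."""
    return p_a * p_b_given_a

p_bank_teller = 0.05  # hypothetical value

# Even if nearly every such bank teller were a feminist, the conjunction stays below P(A).
for p_feminist_given_teller in (0.1, 0.5, 0.9, 1.0):
    p_both = conjunction_probability(p_bank_teller, p_feminist_given_teller)
    print(p_feminist_given_teller, p_both, p_both <= p_bank_teller)
```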
What is base rate neglect?
Ignoring the general prevalence of events or conditions when assessing probabilities.
How does base rate neglect affect decision-making?
It leads to errors by focusing on specific details or similarities while ignoring how common each possibility is overall.
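A worked example of why base rates matter, using Bayes' rule with hypothetical numbers: a condition that affects 1% of people, a test that detects it 90% of the time, and a 9% false positive rate. The function name and all figures are illustrative assumptions.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """Probability of the condition given a positive test, by Bayes' rule."""
    true_pos = prior * true_positive_rate
    false_pos = (1 - prior) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# With a 1% base rate, most positive results come from the 99% who are unaffected.
print(round(posterior(0.01, 0.90, 0.09), 3))  # about 0.092
```

Someone who ignores the base rate might guess the answer is close to 90%; with these numbers it is under 10%.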
How can we make better generalizations?
Ensure clarity by defining terms and specifying proportions.
Avoid causal implications unless they are supported by evidence.
Consider base rates and broader context before drawing conclusions.
Why is clarity important in generalizations?
It prevents vague or misleading interpretations and forces accountability for claims.