CHAPTER 7 Over-Comparing, Under-Reporting Flashcards
What is the consequence of analysts making numerous comparisons but only reporting statistically significant ones?
There will be lots of false positive results and over-estimates.
What is p-hacking?
A form of nefarious researcher behavior leading to false positives.
What is p-screening?
A situation where entirely honest researchers also contribute to false positives.
What tools do analysts and consumers have to reduce misleading results?
There are some tools at their disposal, though no easy solution exists.
What creature was known for predicting the outcomes of soccer matches?
Paul the Octopus.
How did Paul the Octopus predict match outcomes?
By choosing between two boxes of food marked with the flags of competing countries.
How many predictions did Paul make, and how many did he get right?
Paul made 14 predictions and was correct in 12 of them.
What is the null hypothesis in the context of Paul’s predictions?
That Paul was picking in a completely random fashion.
How can we calculate the probability of Paul guessing correctly?
By calculating the likelihood of getting exactly 12, 13, or 14 correct predictions.
What is the probability of Paul getting at least 12 correct predictions if he was guessing randomly?
Approximately 1 in 155.
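A minimal Python sketch of this calculation, assuming each of Paul's 14 picks is an independent 50-50 guess under the null:

```python
from math import comb

n, p = 14, 0.5  # 14 predictions, 50-50 chance per pick under the null
# P(at least 12 correct) = P(exactly 12) + P(exactly 13) + P(exactly 14)
tail = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(12, n + 1))
print(tail)      # ~0.0065
print(1 / tail)  # ~155, i.e., roughly 1 in 155
```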
What does a p-value represent in hypothesis testing?
The probability of observing an outcome at least as extreme as the one observed if the null hypothesis is true.
What was the adjusted probability of Paul getting 11 or more predictions right if he was predisposed to pick Germany?
About 0.03 or 1 in 33.
Why should we be skeptical of Paul’s predictive abilities?
Because he was primarily predicting games involving Germany, which he favored.
What is the significance of having multiple octopuses making predictions?
It raises the likelihood that at least one would achieve a record similar to Paul’s by chance.
What is the probability that at least one of ten octopuses generates a p-value as good as Paul’s?
About 1 in 4.
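A quick check of this figure, assuming each of ten octopuses independently has the adjusted 1-in-33 (0.03) chance from the earlier card of matching Paul's record by luck:

```python
p_single = 0.03            # adjusted per-octopus probability of a record like Paul's
n_octopuses = 10
p_at_least_one = 1 - (1 - p_single) ** n_octopuses
print(p_at_least_one)      # ~0.26, i.e., roughly 1 in 4
```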
Name some other animals that forecasted soccer match winners around the same time as Paul.
- Leon the Porcupine
- Petty the Hippopotamus
- Anton the Tamarin
- Mani the Parakeet
What is the implication of many animals making predictions?
Many could be celebrated for their predictions, even if their success was due to chance.
Fill in the blank: The term for the product of n and every positive whole number less than n is called _______.
factorial.
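As a worked illustration of the definition: 4! = 4 × 3 × 2 × 1 = 24. Factorials are what build the binomial coefficients in Paul's calculation; for example, the number of ways to get exactly 12 of 14 predictions right is 14! / (12! × 2!) = 91.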
What is the issue with only reporting statistically significant results?
It leads to publication bias, in which published estimates systematically overestimate the true effects
This occurs because only the results that reject the null hypothesis are published.
What does the term ‘publication bias’ refer to?
The phenomenon where only statistically significant results are reported, leading to a distorted understanding of research findings
This bias can occur even when all studies are well-designed.
What is p-hacking?
The practice of manipulating data or statistical tests until a desired p-value is achieved
This can involve tweaking experiments or trying different statistical models.
What is p-screening?
The practice of not publishing studies with p-values above a certain threshold, leading to under-reporting of null results
This can occur even when researchers act honestly.
How does over-comparing contribute to publication bias?
It increases the likelihood of finding statistically significant results purely by chance, which are then reported while null results are ignored
This can happen when numerous hypotheses are tested without proper correction.
True or False: All scientific results are published regardless of their significance.
False
Statistically insignificant results are often not published, leading to biased literature.
What is the file drawer problem?
The tendency for studies with null results to remain unpublished, leading to a lack of awareness about such findings
This contributes to publication bias in scientific literature.
What happens to the true estimand when only statistically significant results are reported?
The reported estimates systematically overestimate the true estimand
This occurs even if the original estimates are unbiased.
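A small simulation sketch of this point (the true effect, standard error, and number of studies below are illustrative assumptions, not values from the text): each individual estimate is unbiased, but the average of only the statistically significant estimates overshoots the true estimand.

```python
import numpy as np

rng = np.random.default_rng(0)
true_effect = 0.2        # illustrative true estimand
se = 0.1                 # illustrative standard error of each study's estimate
n_studies = 100_000

estimates = rng.normal(true_effect, se, n_studies)  # unbiased but noisy estimates
significant = np.abs(estimates / se) > 1.96         # two-sided test at the .05 level

print(estimates.mean())               # ~0.20: averaging all studies recovers the truth
print(estimates[significant].mean())  # noticeably larger: the "published" average overestimates
```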
What is the impact of noise on scientific estimates?
Noise can cause estimates to differ from the true quantity of interest, affecting the validity of conclusions drawn from them
This is particularly relevant in studies with small sample sizes.
Fill in the blank: The act of testing many different outcomes in a study can lead to _______.
over-comparing
This increases the chances of finding a statistically significant result by chance.
What should researchers do to avoid p-hacking?
Adhere to pre-registered study designs and avoid manipulating data post-hoc
Transparency in research practices is crucial.
What is the consequence of publication bias on scientific knowledge?
It undermines the reliability of scientific consensus and leads to the belief that many accepted facts may be false
This has caused concern among scientists about the integrity of their fields.
What did Daryl Bem’s 2010 study claim?
It claimed that human beings possess extrasensory perception (ESP)
The study was controversial and sparked debates about the validity of such claims.
Why is it difficult to accumulate knowledge in a field affected by publication bias?
Because over-comparing and under-reporting distort the average of published estimates, making it hard to get close to the true estimand
This can lead to a misrepresentation of the consensus in scientific literature.
What is the relationship between noise and the true estimand?
Noise can cause estimates to deviate from the true estimand, complicating the interpretation of results
This can lead to erroneous conclusions if not accounted for.
True or False: Publication bias affects only the results of individual studies.
False
It affects the overall distribution of estimates in scientific literature.
What does the term ‘estimand’ refer to?
The true quantity of interest that a study aims to estimate
Understanding the estimand is crucial for interpreting research findings.
What is the significance of Daryl Bem’s 2010 study?
It claimed that human beings have extrasensory perception (ESP) and reported statistically significant evidence that subjects could predict the location of hidden objects better than chance.
What is publication bias?
It occurs when studies with statistically significant results are more likely to be published than those without significant results.
Define p-hacking.
The practice of manipulating data or statistical analyses to obtain a statistically significant result.
What did Bem’s study find specifically about the type of objects involved in ESP?
Evidence of ESP was only found when the objects were erotic in nature.
What was the initial response of the psychological community to Bem’s findings?
The community remained skeptical and several follow-up studies failed to replicate the findings.
True or False: The Journal of Personality and Social Psychology initially published replication studies of Bem’s claim.
False.
What was the average estimated effect of get-out-the-vote interventions according to published studies?
About a 3.5 percentage point increase in voter turnout.
What was the actual average effect of get-out-the-vote interventions found by Green, McGrath, and Aronow?
Half a percentage point.
What does the distribution of p-values help to assess?
It helps to diagnose whether p-hacking has occurred in a body of literature.
In which case would we expect a uniform distribution of p-values?
When there is no real relationship in the world and no p-hacking.
What does it suggest if a literature shows more low p-values than high p-values?
It suggests that the literature is detecting a real relationship in the world.
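A simulation sketch of the first case, using an illustrative setup not taken from the text (repeated two-sample t-tests on data with no real difference between groups): the resulting p-values spread roughly evenly between 0 and 1.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
pvals = []
for _ in range(10_000):
    a = rng.normal(0, 1, 50)   # both samples come from the same distribution,
    b = rng.normal(0, 1, 50)   # so the null hypothesis is true
    pvals.append(stats.ttest_ind(a, b).pvalue)

# Under the null (and no p-hacking), each tenth of [0, 1] holds about 10% of the p-values
print(np.histogram(pvals, bins=10, range=(0, 1))[0] / len(pvals))
```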
What are some signs of p-hacking identified by Simonsohn, Nelson, and Simmons?
- Observations being excluded from the analysis
- Variables being transformed
What is one proposed solution to reduce publication bias?
Reduce the significance threshold for p-values from .05 to a lower value.
What is a potential downside to lowering the significance threshold?
It could increase incentives for p-hacking by making statistically significant results rarer and more valuable.
What does a significance threshold of .005 mean for false positives?
It means fewer false positives but at the cost of more false negatives.
What is the consequence of lowering the significance threshold?
It might increase incentives for p-hacking and make statistically significant results rarer and more valuable
Lowering the threshold could lead to complacency in critical analysis.
What is the trade-off when using a significance threshold of .005?
It results in fewer false positives at the cost of more false negatives
False positives are rejecting the null hypothesis when it is true, while false negatives are failing to reject the null hypothesis when it is false.
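A sketch of this trade-off under illustrative assumptions (a modest true effect, 50 observations per group, and a two-sample t-test, none of which come from the text): moving the threshold from .05 to .005 lowers the false-positive rate but raises the false-negative rate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def rejection_rate(effect, alpha, n=50, sims=5_000):
    """Share of simulated studies whose t-test p-value falls below alpha."""
    hits = 0
    for _ in range(sims):
        a = rng.normal(0, 1, n)
        b = rng.normal(effect, 1, n)
        hits += stats.ttest_ind(a, b).pvalue < alpha
    return hits / sims

for alpha in (0.05, 0.005):
    false_pos = rejection_rate(effect=0.0, alpha=alpha)       # null is true, yet we reject
    false_neg = 1 - rejection_rate(effect=0.4, alpha=alpha)   # effect is real, yet we fail to reject
    print(alpha, round(false_pos, 3), round(false_neg, 3))
```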
How can p-values be adjusted?
By correcting for the number of tests run
This helps in better assessing the state of the evidence.
What is a limitation of simple p-value corrections?
They only work if the tests are truly independent
Related tests may require more complex adjustments.
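A minimal sketch of the simplest such correction, a Bonferroni adjustment, using made-up p-values from five hypothetical independent tests:

```python
# Made-up p-values from five independent tests (illustrative only)
pvals = [0.008, 0.04, 0.20, 0.03, 0.50]
m = len(pvals)

adjusted = [min(p * m, 1.0) for p in pvals]  # Bonferroni: multiply each p-value by the number of tests
print(adjusted)                              # only the first (0.008 -> 0.04) stays below .05
# Equivalent view: compare the raw p-values against the stricter threshold .05 / m
print([p < 0.05 / m for p in pvals])
```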
What does the threshold of .05 represent in statistical testing?
An arbitrary number for determining statistical significance
It may not reflect the substantive importance of effects.
What is pre-registration in research?
A commitment to test specific hypotheses before seeing the data
It helps prevent over-comparing and under-reporting.
What was the NHLBI’s requirement for clinical trials?
Developers must pre-register the goals of the drug or supplement
Success is only declared if there is a statistically significant effect on the pre-registered outcome.
What was the success rate of clinical trials after pre-registration according to Kaplan and Irvin’s study?
It dropped from 57 percent to 8 percent
This indicates many prior successes were likely due to over-comparing.
What is replication in research?
Reassessing an estimated effect using new, independently generated data
It helps verify the genuineness of findings.
How does the probability of finding a false positive change with replication?
It decreases with each independent replication
Multiple replications reduce the likelihood of spurious conclusions.
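A quick illustration of the arithmetic, assuming each study independently runs a 5 percent false-positive risk when the null is true: the chance that an original finding and every replication are all false positives shrinks geometrically.

```python
alpha = 0.05  # false-positive rate of a single test when the null hypothesis is true
for k in range(1, 5):
    # probability that k independent studies all produce a false positive
    print(k, alpha ** k)   # 0.05, 0.0025, 0.000125, ...
```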
What is the significance of testing additional hypotheses related to a finding?
It helps assess the underlying mechanisms and validity of the original claim
This method can provide insights even when direct replication isn’t possible.
What should raise concerns about a study’s findings?
If the study would not have been published had the opposite result been found
This indicates potential issues with over-comparing and under-reporting.
What is the power pose hypothesis?
Adopting a power pose influences attitudes and behaviors
The study’s underlying science is disputed and replication attempts have failed.
What broader issue does the story of Paul the Octopus illustrate?
The challenges of over-comparing and under-reporting extend beyond science
This issue can affect everyday decision-making and consumer behavior.
What is p-hacking?
Searching over lots of different ways to run an experiment, make a comparison, or specify a statistical model until you find one that yields a statistically significant result and then only reporting that one.
What is publication bias?
The phenomenon whereby published results are systematically over-estimates because there is a bias toward publishing statistically significant results.
What is p-screening?
A social process whereby a community of researchers, through its publication standards, screens out studies with p-values above some threshold, giving rise to publication bias.
How can over-comparing and under-reporting affect scientific findings?
They create deep challenges for the scientific community, leading to potentially misleading interpretations of data.
What does the efficient-market hypothesis suggest?
No fund or investment strategy should be able to systematically beat the market average over the long run.
What is the probability of an investor beating the market 15 years in a row by chance?
About 1 in 30,000.
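The 1-in-30,000 figure treats each year as a coin flip under the efficient-market hypothesis. The sketch below reproduces it and adds an illustrative assumption of 10,000 funds trying, to show why some streak like this is likely to occur somewhere by luck alone:

```python
p_streak = 0.5 ** 15                  # one fund beating the market 15 years running by chance
print(p_streak, round(1 / p_streak))  # ~0.00003, i.e., about 1 in 30,000 (exactly 1 in 32,768)

n_funds = 10_000                      # illustrative number of funds or managers trying
p_someone = 1 - (1 - p_streak) ** n_funds
print(p_someone)                      # ~0.26: such a streak somewhere is not so surprising
```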
True or False: Bill Miller’s success as a fund manager can be solely attributed to his investment strategies.
False.
What should you consider when assessing the validity of a claim about an investment strategy’s success?
Whether the comparison made is the natural one or if it was chosen to make the strategy look better.
Fill in the blank: One of the main reasons to be cautious about superstars in finance is that their success may be due to _______.
good luck.
What happened to Bill Miller’s fund after his streak of success?
His fund lost 55 percent of its value during the 2008 financial crisis and continued to trail the market for several more years.
What is one practice that can help mitigate the problem of over-comparing and under-reporting?
Thinking clearly about the naturalness of comparisons made.
What is the significance of pre-registration in a study?
It helps in assessing the confidence in the findings by outlining expected outcomes beforehand.
What might indicate that a study’s results are unreliable?
If the primary outcome of interest was revised during the study.
What are some potential measures of prior exposure to Trump in the 2016 U.S. presidential election?
Watching The Apprentice, watching Home Alone 2, or both.
Why should investors be skeptical of claims made by successful fund managers?
Due to the sheer number of traders and funds, exceptional track records may arise by chance.
What is a common outcome of publication bias in scientific literature?
Misleadingly high estimates of effect sizes due to the preference for statistically significant findings.
What does the term ‘superstars’ refer to in the context of finance?
Individuals who achieve remarkable success, often leading to misleading inferences about their abilities.
Fill in the blank: The tendency of scientific estimates to shrink over time is explained by _______.
reversion to the mean.
What is the primary focus of the analysis mentioned in the text?
Finding statistically significant relationships suggesting that prior exposure to Trump corresponded to political behaviors in the 2016 presidential election.
What types of prior Trump exposure can be tested?
- Having seen The Apprentice
- Having seen Home Alone 2
- Both shows
What political behaviors can be measured as outcomes?
- Support for Trump
- Support for Hillary Clinton
- Voter turnout in 2016
What demographic subgroups can be analyzed for voter behavior?
- Women
- Blacks
- Southerners
- Rich
- Young
What was revealed about the variables related to the respondents’ exposure to Trump?
The variables were made up and generated completely at random.
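A simulation sketch of this exercise (the variable names, sample size, and number of comparisons below are all made up for illustration): even when the exposure variables are pure noise, testing them against several outcomes within several subgroups reliably produces a few statistically significant results at the .05 level.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 2_000  # illustrative number of survey respondents

# Made-up exposure, outcome, and subgroup variables, all generated at random
exposures = {name: rng.integers(0, 2, n) for name in ["apprentice", "home_alone_2", "both"]}
outcomes = {name: rng.integers(0, 2, n) for name in ["trump_support", "clinton_support", "turnout_2016"]}
subgroups = {name: rng.integers(0, 2, n).astype(bool) for name in ["women", "black", "southern", "rich", "young"]}

significant, total = 0, 0
for ex in exposures.values():
    for out in outcomes.values():
        for sub in subgroups.values():
            # compare the outcome between exposed and unexposed respondents within the subgroup
            p = stats.ttest_ind(out[sub & (ex == 1)], out[sub & (ex == 0)]).pvalue
            total += 1
            significant += p < 0.05

print(f"{significant} of {total} comparisons significant at .05")  # roughly 5% by chance alone
```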
What should be considered when interpreting the relationships found between variables and political behavior?
Consider if the relationships would hold with new data from another set of respondents.
What is a concern regarding academic studies as mentioned in the text?
Problems of over-comparing and under-reporting.
What are some suggested actions for authors to address concerns of over-comparing?
- Disclose additional information
- Conduct additional analyses
What is p-hacking?
Undisclosed flexibility in data collection and analysis that allows presenting anything as significant.
Who coined the term p-hacking?
Joseph Simmons, Leif Nelson, and Uri Simonsohn.
What is the significance of the 2016 Cooperative Congressional Election Study in this context?
It provided real survey data for analysis.
What do the authors suggest regarding statistical significance?
Lowering the threshold for statistical significance.
What is the purpose of pre-registration in research?
To mitigate biases in data analysis and reporting.
What was the main finding of the power posing studies?
Initial claims of effects on hormones and risk tolerance were not replicated.
What is a potential effect of irrelevant events on voter behavior?
Voters’ evaluations of government performance may be affected.
True or False: The authors initially provided accurate data about respondents’ media exposure.
False
Fill in the blank: The probability that one investor gets 15 years in a row right is _______.
About 1 in 30,000, since (1/2)^15 ≈ 1/32,768.
What is the significance of the study by Kaplan and Irvin published in 2015?
It discussed the likelihood of null effects in large NHLBI clinical trials over time.
What is a common issue in observational research mentioned in the text?
False-positive results.