CHAPTER 7 Over-Comparing, Under-Reporting Flashcards
What is the consequence of analysts making numerous comparisons but only reporting statistically significant ones?
There will be lots of false positive results and over-estimates.
What is p-hacking?
A form of nefarious researcher behavior leading to false positives.
What is p-screening?
A situation where entirely honest researchers also contribute to false positives.
What tools do analysts and consumers have to reduce misleading results?
There are some tools at their disposal, though no easy solution exists.
What creature was known for predicting the outcomes of soccer matches?
Paul the Octopus.
How did Paul the Octopus predict match outcomes?
By choosing between two boxes of food marked with the flags of competing countries.
How many predictions did Paul make, and how many did he get right?
Paul made 14 predictions and was correct in 12 of them.
What is the null hypothesis in the context of Paul’s predictions?
That Paul was picking in a completely random fashion.
How can we calculate the probability of Paul guessing correctly?
By calculating the likelihood of getting exactly 12, 13, or 14 correct predictions.
What is the probability of Paul getting at least 12 correct predictions if he was guessing randomly?
Approximately 1 in 155.
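A minimal Python sketch of this calculation, assuming each of Paul's 14 picks is an independent 50-50 guess under the null:

```python
from math import comb

n, p = 14, 0.5  # 14 predictions, 50-50 chance per pick under the null
# P(at least 12 correct) = P(exactly 12) + P(exactly 13) + P(exactly 14)
tail = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(12, n + 1))
print(tail)      # ~0.0065
print(1 / tail)  # ~155, i.e., roughly 1 in 155
```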
What does a p-value represent in hypothesis testing?
The probability of observing an outcome at least as extreme as the one observed if the null hypothesis is true.
What was the adjusted probability of Paul getting 11 or more predictions right if he was predisposed to pick Germany?
About 0.03 or 1 in 33.
Why should we be skeptical of Paul’s predictive abilities?
Because he was primarily predicting games involving Germany, which he favored.
What is the significance of having multiple octopuses making predictions?
It raises the likelihood that at least one would achieve a record similar to Paul’s by chance.
What is the probability that at least one of ten octopuses generates a p-value as good as Paul’s?
About 1 in 4.
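A quick check of this figure, assuming each of ten octopuses independently has the adjusted 1-in-33 (0.03) chance from the earlier card of matching Paul's record by luck:

```python
p_single = 0.03            # adjusted per-octopus probability of a record like Paul's
n_octopuses = 10
p_at_least_one = 1 - (1 - p_single) ** n_octopuses
print(p_at_least_one)      # ~0.26, i.e., roughly 1 in 4
```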
Name some other animals that forecasted soccer match winners around the same time as Paul.
- Leon the Porcupine
- Petty the Hippopotamus
- Anton the Tamarin
- Mani the Parakeet
What is the implication of many animals making predictions?
Many could be celebrated for their predictions, even if their success was due to chance.
Fill in the blank: The term for the product of n and every positive whole number less than n is called _______.
factorial.
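As a worked illustration of the definition: 4! = 4 × 3 × 2 × 1 = 24. Factorials are what build the binomial coefficients in Paul's calculation; for example, the number of ways to get exactly 12 of 14 predictions right is 14! / (12! × 2!) = 91.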
What is the issue with only reporting statistically significant results?
It leads to publication bias, in which published estimates systematically overestimate the true effects
This occurs because only the results that reject the null hypothesis are published.
What does the term ‘publication bias’ refer to?
The phenomenon where only statistically significant results are reported, leading to a distorted understanding of research findings
This bias can occur even when all studies are well-designed.
What is p-hacking?
The practice of manipulating data or statistical tests until a desired p-value is achieved
This can involve tweaking experiments or trying different statistical models.
What is p-screening?
The practice of not publishing studies with p-values above a certain threshold, leading to under-reporting of null results
This can occur even when researchers act honestly.
How does over-comparing contribute to publication bias?
It increases the likelihood of finding statistically significant results purely by chance, which are then reported while null results are ignored
This can happen when numerous hypotheses are tested without proper correction.
True or False: All scientific results are published regardless of their significance.
False
Statistically insignificant results are often not published, leading to biased literature.
What is the file drawer problem?
The tendency for studies with null results to remain unpublished, leading to a lack of awareness about such findings
This contributes to publication bias in scientific literature.
What happens to the true estimand when only statistically significant results are reported?
The reported estimates systematically overestimate the true estimand
This occurs even if the original estimates are unbiased.
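A small simulation sketch of this point (the true effect, standard error, and number of studies below are illustrative assumptions, not values from the text): each individual estimate is unbiased, but the average of only the statistically significant estimates overshoots the true estimand.

```python
import numpy as np

rng = np.random.default_rng(0)
true_effect = 0.2        # illustrative true estimand
se = 0.1                 # illustrative standard error of each study's estimate
n_studies = 100_000

estimates = rng.normal(true_effect, se, n_studies)  # unbiased but noisy estimates
significant = np.abs(estimates / se) > 1.96         # two-sided test at the .05 level

print(estimates.mean())               # ~0.20: averaging all studies recovers the truth
print(estimates[significant].mean())  # noticeably larger: the "published" average overestimates
```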
What is the impact of noise on scientific estimates?
Noise can cause estimates to differ from the true quantity of interest, affecting the validity of conclusions drawn from them
This is particularly relevant in studies with small sample sizes.
Fill in the blank: The act of testing many different outcomes in a study can lead to _______.
over-comparing
This increases the chances of finding a statistically significant result by chance.
What should researchers do to avoid p-hacking?
Adhere to pre-registered study designs and avoid manipulating data post-hoc
Transparency in research practices is crucial.
What is the consequence of publication bias on scientific knowledge?
It undermines the reliability of scientific consensus and leads to the belief that many accepted facts may be false
This has caused concern among scientists about the integrity of their fields.
What did Daryl Bem’s 2010 study claim?
It claimed that human beings possess extrasensory perception (ESP)
The study was controversial and sparked debates about the validity of such claims.
Why is it difficult to accumulate knowledge in a field affected by publication bias?
Because over-comparing and under-reporting distort the average of published estimates, making it hard to get close to the true estimand
This can lead to a misrepresentation of the consensus in scientific literature.
What is the relationship between noise and the true estimand?
Noise can cause estimates to deviate from the true estimand, complicating the interpretation of results
This can lead to erroneous conclusions if not accounted for.
True or False: Publication bias affects only the results of individual studies.
False
It affects the overall distribution of estimates in scientific literature.
What does the term ‘estimand’ refer to?
The true quantity of interest that a study aims to estimate
Understanding the estimand is crucial for interpreting research findings.
What is the significance of Daryl Bem’s 2010 study?
It claimed that human beings have extrasensory perception (ESP) and reported statistically significant evidence that subjects could predict the location of hidden objects better than chance.
What is publication bias?
It occurs when studies with statistically significant results are more likely to be published than those without significant results.
Define p-hacking.
The practice of manipulating data or statistical analyses to obtain a statistically significant result.
What did Bem’s study find specifically about the type of objects involved in ESP?
Evidence of ESP was only found when the objects were erotic in nature.
What was the initial response of the psychological community to Bem’s findings?
The community remained skeptical and several follow-up studies failed to replicate the findings.
True or False: The Journal of Personality and Social Psychology initially published replication studies of Bem’s claim.
False.
What was the average estimated effect of get-out-the-vote interventions according to published studies?
About a 3.5 percentage point increase in voter turnout.
What was the actual average effect of get-out-the-vote interventions found by Green, McGrath, and Aronow?
Half a percentage point.
What does the distribution of p-values help to assess?
It helps to diagnose whether p-hacking has occurred in a body of literature.
In which case would we expect a uniform distribution of p-values?
When there is no real relationship in the world and no p-hacking.
What does it suggest if a literature shows more low p-values than high p-values?
It suggests that the literature is detecting a real relationship in the world.
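A simulation sketch of the first case, using an illustrative setup not taken from the text (repeated two-sample t-tests on data with no real difference between groups): the resulting p-values spread roughly evenly between 0 and 1.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
pvals = []
for _ in range(10_000):
    a = rng.normal(0, 1, 50)   # both samples come from the same distribution,
    b = rng.normal(0, 1, 50)   # so the null hypothesis is true
    pvals.append(stats.ttest_ind(a, b).pvalue)

# Under the null (and no p-hacking), each tenth of [0, 1] holds about 10% of the p-values
print(np.histogram(pvals, bins=10, range=(0, 1))[0] / len(pvals))
```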
What are some signs of p-hacking identified by Simonsohn, Nelson, and Simmons?
- Observations being excluded from the analysis
- Variables being transformed
What is one proposed solution to reduce publication bias?
Reduce the significance threshold for p-values from .05 to a lower value.
What is a potential downside to lowering the significance threshold?
It could increase incentives for p-hacking by making statistically significant results rarer and more valuable.
What does a significance threshold of .005 mean for false positives?
It means fewer false positives but at the cost of more false negatives.
What is the consequence of lowering the significance threshold?
It might increase incentives for p-hacking and make statistically significant results rarer and more valuable
Lowering the threshold could lead to complacency in critical analysis.
What is the trade-off when using a significance threshold of .005?
It results in fewer false positives at the cost of more false negatives
False positives are rejecting the null hypothesis when it is true, while false negatives are failing to reject the null hypothesis when it is false.
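A sketch of this trade-off under illustrative assumptions (a modest true effect, 50 observations per group, and a two-sample t-test, none of which come from the text): moving the threshold from .05 to .005 lowers the false-positive rate but raises the false-negative rate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def rejection_rate(effect, alpha, n=50, sims=5_000):
    """Share of simulated studies whose t-test p-value falls below alpha."""
    hits = 0
    for _ in range(sims):
        a = rng.normal(0, 1, n)
        b = rng.normal(effect, 1, n)
        hits += stats.ttest_ind(a, b).pvalue < alpha
    return hits / sims

for alpha in (0.05, 0.005):
    false_pos = rejection_rate(effect=0.0, alpha=alpha)       # null is true, yet we reject
    false_neg = 1 - rejection_rate(effect=0.4, alpha=alpha)   # effect is real, yet we fail to reject
    print(alpha, round(false_pos, 3), round(false_neg, 3))
```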
How can p-values be adjusted?
By correcting for the number of tests run
This helps in better assessing the state of the evidence.
What is a limitation of simple p-value corrections?
They only work if the tests are truly independent
Related tests may require more complex adjustments.
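A minimal sketch of the simplest such correction, a Bonferroni adjustment, using made-up p-values from five hypothetical independent tests:

```python
# Made-up p-values from five independent tests (illustrative only)
pvals = [0.008, 0.04, 0.20, 0.03, 0.50]
m = len(pvals)

adjusted = [min(p * m, 1.0) for p in pvals]  # Bonferroni: multiply each p-value by the number of tests
print(adjusted)                              # only the first (0.008 -> 0.04) stays below .05
# Equivalent view: compare the raw p-values against the stricter threshold .05 / m
print([p < 0.05 / m for p in pvals])
```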
What does the threshold of .05 represent in statistical testing?
An arbitrary number for determining statistical significance
It may not reflect the substantive importance of effects.
What is pre-registration in research?
A commitment to test specific hypotheses before seeing the data
It helps prevent over-comparing and under-reporting.
What was the NHLBI’s requirement for clinical trials?
Developers must pre-register the goals of the drug or supplement
Success is only declared if there is a statistically significant effect on the pre-registered outcome.
What was the success rate of clinical trials after pre-registration according to Kaplan and Irvin’s study?
It dropped from 57 percent to 8 percent
This indicates many prior successes were likely due to over-comparing.
What is replication in research?
Reassessing an estimated effect using new, independently generated data
It helps verify the genuineness of findings.
How does the probability of finding a false positive change with replication?
It decreases with each independent replication
Multiple replications reduce the likelihood of spurious conclusions.
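A quick illustration of the arithmetic, assuming each study independently runs a 5 percent false-positive risk when the null is true: the chance that an original finding and every replication are all false positives shrinks geometrically.

```python
alpha = 0.05  # false-positive rate of a single test when the null hypothesis is true
for k in range(1, 5):
    # probability that k independent studies all produce a false positive
    print(k, alpha ** k)   # 0.05, 0.0025, 0.000125, ...
```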
What is the significance of testing additional hypotheses related to a finding?
It helps assess the underlying mechanisms and validity of the original claim
This method can provide insights even when direct replication isn’t possible.
What should raise concerns about a study’s findings?
If the study would not have been published had the opposite result been found
This indicates potential issues with over-comparing and under-reporting.
What is the power pose hypothesis?
Adopting a power pose influences attitudes and behaviors
The study’s underlying science is disputed and replication attempts have failed.
What broader issue does the story of Paul the Octopus illustrate?
The challenges of over-comparing and under-reporting extend beyond science
This issue can affect everyday decision-making and consumer behavior.
What is p-hacking?
Searching over lots of different ways to run an experiment, make a comparison, or specify a statistical model until you find one that yields a statistically significant result and then only reporting that one.
What is publication bias?
The phenomenon whereby published results are systematically over-estimates because there is a bias toward publishing statistically significant results.
What is p-screening?
A social process whereby a community of researchers, through its publication standards, screens out studies with p-values above some threshold, giving rise to publication bias.
How can over-comparing and under-reporting affect scientific findings?
They create deep challenges for the scientific community, leading to potentially misleading interpretations of data.
What does the efficient-market hypothesis suggest?
No fund or investment strategy should be able to systematically beat the market average over the long run.
What is the probability of an investor beating the market 15 years in a row by chance?
About 1 in 30,000.
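The 1-in-30,000 figure treats each year as a coin flip under the efficient-market hypothesis. The sketch below reproduces it and adds an illustrative assumption of 10,000 funds trying, to show why some streak like this is likely to occur somewhere by luck alone:

```python
p_streak = 0.5 ** 15                  # one fund beating the market 15 years running by chance
print(p_streak, round(1 / p_streak))  # ~0.00003, i.e., about 1 in 30,000 (exactly 1 in 32,768)

n_funds = 10_000                      # illustrative number of funds or managers trying
p_someone = 1 - (1 - p_streak) ** n_funds
print(p_someone)                      # ~0.26: such a streak somewhere is not so surprising
```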
True or False: Bill Miller’s success as a fund manager can be solely attributed to his investment strategies.
False.
What should you consider when assessing the validity of a claim about an investment strategy’s success?
Whether the comparison made is the natural one or if it was chosen to make the strategy look better.
Fill in the blank: One of the main reasons to be cautious about superstars in finance is that their success may be due to _______.
good luck.
What happened to Bill Miller’s fund after his streak of success?
His fund lost 55 percent of its value during the 2008 financial crisis and continued to trail the market for several more years.
What is one practice that can help mitigate the problem of over-comparing and under-reporting?
Thinking clearly about the naturalness of comparisons made.
What is the significance of pre-registration in a study?
It helps in assessing the confidence in the findings by outlining expected outcomes beforehand.
What might indicate that a study’s results are unreliable?
If the primary outcome of interest was revised during the study.
What are some potential measures of prior exposure to Trump in the 2016 U.S. presidential election?
Watching The Apprentice, watching Home Alone 2, or both.
Why should investors be skeptical of claims made by successful fund managers?
Due to the sheer number of traders and funds, exceptional track records may arise by chance.
What is a common outcome of publication bias in scientific literature?
Misleadingly high estimates of effect sizes due to the preference for statistically significant findings.
What does the term ‘superstars’ refer to in the context of finance?
Individuals who achieve remarkable success, often leading to misleading inferences about their abilities.
Fill in the blank: The tendency of scientific estimates to shrink over time is explained by _______.
reversion to the mean.
What is the primary focus of the analysis mentioned in the text?
Finding statistically significant relationships suggesting that prior exposure to Trump corresponded to political behaviors in the 2016 presidential election.
What types of prior Trump exposure can be tested?
- Having seen The Apprentice
- Having seen Home Alone 2
- Both shows
What political behaviors can be measured as outcomes?
- Support for Trump
- Support for Hillary Clinton
- Voter turnout in 2016
What demographic subgroups can be analyzed for voter behavior?
- Women
- Blacks
- Southerners
- Rich
- Young
What was revealed about the variables related to the respondents’ exposure to Trump?
The variables were made up and generated completely at random.
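A simulation sketch of this exercise (the variable names, sample size, and number of comparisons below are all made up for illustration): even when the exposure variables are pure noise, testing them against several outcomes within several subgroups reliably produces a few statistically significant results at the .05 level.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 2_000  # illustrative number of survey respondents

# Made-up exposure, outcome, and subgroup variables, all generated at random
exposures = {name: rng.integers(0, 2, n) for name in ["apprentice", "home_alone_2", "both"]}
outcomes = {name: rng.integers(0, 2, n) for name in ["trump_support", "clinton_support", "turnout_2016"]}
subgroups = {name: rng.integers(0, 2, n).astype(bool) for name in ["women", "black", "southern", "rich", "young"]}

significant, total = 0, 0
for ex in exposures.values():
    for out in outcomes.values():
        for sub in subgroups.values():
            # compare the outcome between exposed and unexposed respondents within the subgroup
            p = stats.ttest_ind(out[sub & (ex == 1)], out[sub & (ex == 0)]).pvalue
            total += 1
            significant += p < 0.05

print(f"{significant} of {total} comparisons significant at .05")  # roughly 5% by chance alone
```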
What should be considered when interpreting the relationships found between variables and political behavior?
Consider if the relationships would hold with new data from another set of respondents.
What is a concern regarding academic studies as mentioned in the text?
Problems of over-comparing and under-reporting.
What are some suggested actions for authors to address concerns of over-comparing?
- Disclose additional information
- Conduct additional analyses
What is p-hacking?
Undisclosed flexibility in data collection and analysis that allows presenting anything as significant.
Who coined the term p-hacking?
Joseph Simmons, Leif Nelson, and Uri Simonsohn.
What is the significance of the 2016 Cooperative Congressional Election Study in this context?
It provided real survey data for analysis.
What do the authors suggest regarding statistical significance?
Lowering the threshold for statistical significance.
What is the purpose of pre-registration in research?
To mitigate biases in data analysis and reporting.
What was the main finding of the power posing studies?
Initial claims of effects on hormones and risk tolerance were not replicated.
What is a potential effect of irrelevant events on voter behavior?
Voters’ evaluations of government performance may be affected.
True or False: The authors initially provided accurate data about respondents’ media exposure.
False
Fill in the blank: The probability that one investor gets 15 years in a row right is _______.
About 1 in 30,000, since (1/2)^15 ≈ 1/32,768.
What is the significance of the study by Kaplan and Irvin published in 2015?
It discussed the likelihood of null effects in large NHLBI clinical trials over time.
What is a common issue in observational research mentioned in the text?
False-positive results.