Reproducibility Crisis Flashcards
Brian Wansink
- experiments on eating behaviours
- abused statistical procedures to make his research look successful
- p-hacking and HARKing
What is ‘Crisis of Reproducibility’?
published research can’t be replicated/reproduced
- misuse of statistics
- not just about statistics
Why do researchers use statistics?
- to find relationships between variables they think are linked
- to ignore noise and try to find relationships that hold true ‘in general’
What hypothesis do researchers test for?
the null hypothesis; statistical tests estimate how well the data fit the null hypothesis
- researchers rarely test directly whether a relationship exists; instead they test whether the data are consistent with no relationship existing
what is the null hypothesis?
the hypothesis that no relationship exists between the variables being tested
What is the p-value?
- the probability of getting results at least as extreme as those observed, assuming the null hypothesis is true (loosely, the probability that the results are due to chance)
- lower p-value –> data are less consistent with the null hypothesis; more reasonable to reject it
- higher p-value –> data are consistent with the null hypothesis (no evidence a relationship exists); fail to reject it (see the sketch below)
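A minimal sketch of how a p-value is computed in practice, assuming Python with numpy and scipy available; the two samples are invented purely for illustration:

```python
# Minimal sketch: computing a p-value with a two-sample t-test.
# The samples are simulated and purely illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=10.0, scale=2.0, size=30)  # hypothetical sample A
group_b = rng.normal(loc=11.0, scale=2.0, size=30)  # hypothetical sample B

# Null hypothesis: the two groups have the same mean.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# A small p-value means data this extreme would be rare if the null were true.
```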
What would be the null hypothesis for a study testing whether each extra year of schooling is associated with an extra $1000 of annual income?
- there is no relationship between years of schooling and income
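As a hedged sketch, a test like this could be run as a simple linear regression, where the reported p-value tests the null hypothesis that the slope is zero (no relationship); the data below are simulated, not from any real study:

```python
# Sketch: testing the schooling-income null hypothesis via linear regression.
# All data are simulated for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
years_schooling = rng.integers(8, 21, size=200)
# Simulate a true effect of $1000 extra income per year of schooling, plus noise.
income = 20_000 + 1_000 * years_schooling + rng.normal(0, 10_000, size=200)

result = stats.linregress(years_schooling, income)
# result.pvalue tests the null hypothesis that the slope is zero,
# i.e. that there is no relationship between schooling and income.
print(f"slope = ${result.slope:.0f} per year, p = {result.pvalue:.4f}")
```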
What does a p-value of 0.55 mean?
- if the null hypothesis were true, results this extreme would occur 55% of the time (not literally a 55% chance that the null hypothesis is true)
- no evidence against ‘no relationship between years of schooling and income’; no grounds to reject the null hypothesis
What does a p-value of 0.01 mean?
- if the null hypothesis were true, results this extreme would occur only 1% of the time
- the data would be very surprising if no relationship existed between the two variables
- reasonable grounds to reject the null hypothesis
Most common cut off for statistical significance
p < 0.05
- when the null hypothesis is actually true, researchers incorrectly reject it only 5% of the time (1 in 20)
What is the issue with p-value cut off being <0.05?
- even when the null hypothesis is true, chance alone produces p < 0.05 in 1 of every 20 tests (5%)
- so for every 20 tests run on true null hypotheses, you will incorrectly reject the null hypothesis about once (see the simulation below)
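A small simulation, assuming Python with numpy and scipy, illustrating that a p < 0.05 cut-off rejects a true null hypothesis roughly 1 time in 20:

```python
# Sketch: when the null hypothesis is TRUE, p < 0.05 still occurs ~5% of the time.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_tests = 10_000
false_positives = 0
for _ in range(n_tests):
    # Both groups come from the SAME distribution, so the null is true.
    a = rng.normal(0.0, 1.0, size=30)
    b = rng.normal(0.0, 1.0, size=30)
    _, p = stats.ttest_ind(a, b)
    if p < 0.05:
        false_positives += 1

print(f"false positive rate: {false_positives / n_tests:.3f}")  # close to 0.05
```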
What is p-hacking?
- running many statistical tests (or reanalysing the same data in many ways) until one yields a false positive / falsely ‘statistically significant’ result, then reporting only that test (see the sketch below)
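A sketch of why p-hacking inflates false positives, under the assumption of pure-noise data: run 20 independent tests per ‘experiment’ and keep only the best p-value, and most experiments produce a spurious significant result (about 1 - 0.95^20, roughly 64%):

```python
# Sketch: p-hacking by running 20 tests on pure noise and keeping the best one.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_experiments = 2_000
hacked_hits = 0
for _ in range(n_experiments):
    # 20 tests per experiment; no variable has any real effect.
    p_values = [
        stats.ttest_ind(rng.normal(0, 1, 30), rng.normal(0, 1, 30)).pvalue
        for _ in range(20)
    ]
    if min(p_values) < 0.05:  # report only the "significant" finding
        hacked_hits += 1

# Roughly 1 - 0.95**20 = ~0.64 of experiments yield a spurious positive.
print(f"experiments with a false 'significant' result: {hacked_hits / n_experiments:.2f}")
```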
What is HARKing?
Hypothesizing After the Results are Known
- you can’t collect data first and then frame a hypothesis around it, presenting that hypothesis as if it had been made in advance
What is the risk of a false positive with the accepted p-value cut-off for a statistically significant result?
- 1 in 20 chance of false positive (rejecting null hypothesis when it is true and there’s actually no relationship)
Examples of why research can be wrong
- small sample size
- publishing studies with small effects
- relying on a small number of studies
- generating new hypotheses to fit the data (HARKing)
- flexibility in research design
- intellectual bias
- conflict of interest
- competition to produce positive results