Lecture 7 Flashcards
What are the 3 misconceptions about null hypothesis significance testing?
1) “Significant” = important
2) “Non-significant” = the null hypothesis is true
3) “Significant” = the null hypothesis is false
What is one of the biggest problems in NHST?
It encourages all-or-nothing thinking, which pushes researchers into questionable practices to find a significant difference. The p < .05 cutoff is also arbitrary.
What was the reproducibility project?
Replicated 100 psychology studies, following each original study’s methods exactly to see whether the results held up. 97% of the original studies had reported significant results.
What percentage of studies were actually significant in the reproducibility project?
Only 36%. 61% of studies claimed a significant difference that the replication didn’t find.
Why do we care about reproducibility?
1) False positives blur the picture of truth
2) Can lead to horrible public policy decisions (ex: turning right on a red light)
3) Waste of resources in pursuing line of faulty thinking
4) False ideas persist in the literature because only significant results get published
5) Loss of credibility as a field
What did a meta-analysis show on the actual effects of turning right on a red light?
The original studies showed no difference (because they were underpowered). The meta-analysis showed a 61% increase in collisions with pedestrians and a 100% increase in collisions with cyclists.
What are incentive structures?
Science is supposed to be an objective search for truth, but scientists must compete for resources: more publications mean more funding and opportunities, and journals only publish significant results.
What is the publication bias?
About 90% of published studies in psychology report statistically significant results.
What is the publication bias driven by?
Journal editors/reviewers selecting which papers to publish (about 7 of 10 submissions are rejected), and researchers only bothering to submit significant results.
What is researcher degrees of freedom?
Researchers contributing to the problem through the analysis choices they make; an abuse of that flexibility.
What is selective reporting?
When the researcher reports only the significant measures, even though more were measured, without correcting for multiple tests. This inflates the Type 1 error rate.
How do you calculate an increase in type 1 error due to multiple groups?
Compute 1 − (1 − α)^k, where α is the significance level and k is the number of tests. For example, with α = .05 and 3 tests: 1 − .95³ ≈ .14.
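The formula above can be sketched in a few lines of Python (the function name is my own, not from the lecture):

```python
def familywise_error(alpha, k):
    """Probability of at least one false positive across k independent tests,
    each run at significance level alpha."""
    return 1 - (1 - alpha) ** k

# With the usual alpha = .05, the chance of a spurious "significant"
# result grows quickly with the number of tests:
print(round(familywise_error(0.05, 1), 3))   # 0.05
print(round(familywise_error(0.05, 3), 3))   # 0.143
print(round(familywise_error(0.05, 10), 3))  # 0.401
```

This is why uncorrected selective reporting inflates Type 1 error: with 10 uncorrected tests, the chance of at least one false positive is about 40%, not 5%.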
How do researchers deal with extremes?
They choose which scores count as extreme and which don’t. There are also different methods of dealing with outliers, such as data transformation and deletion.
What percentage of researchers admit to stopping data collection once they achieve significance?
70%
What can stopping data collection lead to?
Truth inflation (magnitude error): effects that happen to reach significance early tend to overestimate the true effect size.
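A small simulation (my own sketch, not from the lecture) shows why stopping as soon as the result is significant inflates false positives. Here every data point is drawn from a null distribution with zero effect, yet repeatedly peeking and stopping at the first “significant” result triggers far more often than the nominal 5%:

```python
import math
import random

def z_stat(xs):
    """Approximate test statistic for H0: population mean = 0."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean / math.sqrt(var / n)

def run_trials(trials=2000, max_n=100, peek_every=10, crit=1.96, seed=42):
    rng = random.Random(seed)
    fixed_hits = peek_hits = 0
    for _ in range(trials):
        data = [rng.gauss(0, 1) for _ in range(max_n)]  # true effect is zero
        # Honest analysis: test once, at the planned sample size.
        if abs(z_stat(data)) > crit:
            fixed_hits += 1
        # Optional stopping: test after every batch of observations,
        # stopping at the first peek that looks "significant".
        if any(abs(z_stat(data[:n])) > crit
               for n in range(peek_every, max_n + 1, peek_every)):
            peek_hits += 1
    return fixed_hits / trials, peek_hits / trials

fixed_rate, peek_rate = run_trials()
print(f"fixed-n false positive rate:  {fixed_rate:.3f}")  # near 0.05
print(f"peeking false positive rate:  {peek_rate:.3f}")   # noticeably higher
```

Because the peeking analyst also gets the effects that drift significant by chance at small n, the same practice produces truth inflation: the effect sizes that trigger early stopping are the overestimates.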