Lecture 15 - Open Science and other Current Issues Flashcards
Describe the method of the Bem (2011) study
- N = 100
- Task with 20 minute duration
- 36 trials
- Different types of image: positive, negative, romantic, erotic
- Participants asked to click on the curtain which they feel has the picture behind it
- If participants are purely guessing, the expected hit rate is 50% (chance)
Describe the results of the Bem (2011) study
- Hit rate significantly above chance for erotic images:
- 53.1%, t(99) = 2.51, p = .01 (i.e. significantly above 50/50)
- Hit rate not significantly different from chance for other image types
- Conclusion: people can tell the future, but only for erotic images
- Bem (2011)
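A quick arithmetic check on the reported result, as a minimal Python sketch. It uses only the numbers on the card above (N = 100, t = 2.51, 53.1% vs 50%); the variable names are illustrative, and because the inputs are rounded the outputs are approximate. It converts the t statistic into a standardised effect size (Cohen's d = t / sqrt(N) for a one-sample test) and backs out the standard deviation of hit rates that the t value implies.

```python
import math

# Values reported on the card above for the erotic-image trials.
n = 100            # participants
t_value = 2.51     # t(99)
hit_rate = 53.1    # observed hit rate, in percent
chance = 50.0      # expected hit rate under pure guessing, in percent

# For a one-sample t-test, Cohen's d = t / sqrt(N).
cohens_d = t_value / math.sqrt(n)

# Rearranging t = (mean - chance) / (sd / sqrt(N)) gives the implied SD.
implied_sd = (hit_rate - chance) * math.sqrt(n) / t_value

print(f"Cohen's d = {cohens_d:.2f}")
print(f"Implied SD of hit rates = {implied_sd:.1f} percentage points")
```

Run as written, d comes out at about 0.25, i.e. a small standardised effect despite the significant p value.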
What are the implications of the Bem (2011) study?
- The findings of this study are impossible
- However:
- It used the conventional statistics (e.g. t and p)
- It was published in a reputable journal
- What went wrong?
How do we infer from samples?
- Our results in the sample should match the population
- True negative = good because we know something doesn’t work (want what is true in the sample to match what is true in the real world)
- Statistical power = probability of detecting an effect that really exists (a true positive)
- Alpha = the highest acceptable risk of a false positive (typically 5%) – still risk of a false positive but it is acceptably low
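Alpha and power can be seen directly by simulating many studies. The sketch below is an illustration under stated assumptions, not a prescribed method: it uses a two-sided z-test with a known SD of 1, n = 30 per study, and a hypothetical true effect of 0.5 SD for the power case. The function name and all parameter values are made up for the example.

```python
import random
from statistics import NormalDist

def rejection_rate(true_effect, n=30, sims=2000, alpha=0.05, seed=1):
    """Share of simulated studies with p < alpha (two-sided z-test, SD assumed known = 1)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(sims):
        sample = [rng.gauss(true_effect, 1.0) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = sample_mean * n ** 0.5   # standard error of the mean is 1 / sqrt(n)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / sims

# No real effect: every significant result is a false positive.
false_positive_rate = rejection_rate(true_effect=0.0)
# Real effect of 0.5 SD: every significant result is a true positive.
power = rejection_rate(true_effect=0.5)

print(f"False positive rate: {false_positive_rate:.3f}")
print(f"Power: {power:.3f}")
```

The first number should land close to alpha (5%), and the second is the power for this particular effect size and sample size. Raising n raises the power but leaves the false positive rate near 5%.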
What is publication bias and the file drawer problem?
- Researchers biased toward results which support their theories
- Significant results are more likely to be published (could be true or false positives)
- Many journals value novelty and surprising results
- Non-significant results are often not published
- Non-significant replications are hard to publish (paves the way for silly papers like Bem’s, which stay in the literature for a long time because it’s hard to publish the studies which prove them wrong)
- Researchers are under pressure to find significant results (‘publish or perish’)
- File drawer problem: non-significant studies are not published and stay tucked away in the researcher’s file drawer
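One consequence of the file drawer can be sketched with a simulation. The assumptions here are all hypothetical: a small true effect of 0.2 SD, 25 participants per study, a z-test with known SD, and a strict rule that only significant studies get published. Under those assumptions the published studies overstate the effect, because only the studies that happened to get large estimates pass the filter.

```python
import random
from statistics import NormalDist

def simulate_literature(true_effect=0.2, n=25, studies=3000, alpha=0.05, seed=2):
    """Return (all effect estimates, estimates from 'published' studies only)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    all_estimates, published = [], []
    for _ in range(studies):
        sample = [rng.gauss(true_effect, 1.0) for _ in range(n)]
        estimate = sum(sample) / n
        z = estimate * n ** 0.5
        all_estimates.append(estimate)
        if abs(z) > z_crit:   # assumed rule: only significant studies leave the file drawer
            published.append(estimate)
    return all_estimates, published

all_estimates, published = simulate_literature()
mean_all = sum(all_estimates) / len(all_estimates)
mean_published = sum(published) / len(published)

print(f"Studies run: {len(all_estimates)}, published: {len(published)}")
print(f"Mean effect across all studies: {mean_all:.2f}")
print(f"Mean effect in published studies: {mean_published:.2f}")
```

The mean across all studies sits near the true 0.2, while the published mean is noticeably larger, so a reader who only sees the journals gets a distorted picture.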
What is the importance of null results?
- A study, if well-designed, does not fail; it tells the truth
- Important null results:
- Phrenology = bumps on head predict criminal behaviour (found to be nonsignificant)
- Repressed memories = don’t explain all mental illness (no evidence – repressed memories have no relationship with psychological health)
- Physics = believed heavier things would fall faster than light things (Galileo found this is not true – mass of object has no relationship with how fast it falls)
What are some questionable research practices?
- Distorting the data to support the researchers’ hypotheses, e.g. running multiple analyses, finding a significant one and pretending that was the only planned test
- We typically say a result is significant if p < .05
- It is almost always possible to get some result where p<.05
- HARKing: hypothesising after the results are known
- “If you torture the data long enough, it will confess to anything” (Ronald Coase, Economist)
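Why it is almost always possible to find some p < .05: if k tests are run and no effect is real, the chance that at least one comes out significant is 1 - (1 - alpha)^k. The sketch below assumes the tests are independent, which real analyses of one dataset usually are not, so treat it as a rough guide; the function name is illustrative.

```python
def chance_of_false_positive(k, alpha=0.05):
    """P(at least one p < alpha) across k independent tests when no effect is real."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 10, 20):
    print(f"{k:>2} tests: {chance_of_false_positive(k):.0%}")
```

With 20 independent tests the chance of at least one false positive is about 64%, even though each test on its own keeps the 5% rate.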
What are researcher degrees of freedom?
- Researcher gets to make decisions about how the data is analysed
- There are many valid ways to analyse a given dataset:
- Different statistical tests
- Different variables
- Different rules for excluding outliers (e.g. by different number of SDs)
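The choices multiply. As a small sketch, the option lists below are invented for illustration (they are not taken from any particular study); the point is only how quickly a few reasonable-looking decisions add up to many possible analyses.

```python
from itertools import product

# Hypothetical options, invented for illustration only.
tests = ["t-test", "Mann-Whitney U", "regression with covariates"]
outcomes = ["accuracy", "reaction time", "self-report rating"]
outlier_rules = ["none", "2 SD", "2.5 SD", "3 SD"]

# Every combination of test, outcome variable and outlier rule is one analysis path.
analysis_paths = list(product(tests, outcomes, outlier_rules))

print(f"{len(analysis_paths)} possible analyses")
for path in analysis_paths[:3]:
    print(path)
```

Three tests, three variables and four outlier rules already give 36 defensible ways to analyse the same dataset.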
What is P-hacking?
- P-hacking is a way to cheat/lie with statistics
- For any test, we accept a 5% probability of a false positive
- P-hacking:
- Performing the analysis in different ways to get p<.05
- Only reporting the significant result (HARKing is the equivalent on the hypothesis side)
- This results in false positives: we cannot trust the results
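A simulated version of p-hacking, as a sketch under assumptions: each study measures five hypothetical outcome variables that are all pure noise, each is tested with a z-test (SD assumed known), and the "hacked" researcher reports whichever test gave the smallest p. The honest researcher reports only the one planned test. All names and parameter values are illustrative.

```python
import random
from statistics import NormalDist

def p_value(sample):
    """Two-sided p for a one-sample z-test against 0 (SD assumed known = 1)."""
    z = sum(sample) / len(sample) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate(n_outcomes=5, n=30, sims=2000, alpha=0.05, seed=3):
    """Return (honest false positive rate, p-hacked false positive rate)."""
    rng = random.Random(seed)
    honest_hits = hacked_hits = 0
    for _ in range(sims):
        # Every outcome is pure noise, so any significant result is a false positive.
        p_values = [p_value([rng.gauss(0, 1) for _ in range(n)])
                    for _ in range(n_outcomes)]
        honest_hits += p_values[0] < alpha     # the single planned test
        hacked_hits += min(p_values) < alpha   # report whichever test "worked"
    return honest_hits / sims, hacked_hits / sims

honest_rate, hacked_rate = simulate()
print(f"Planned test only: {honest_rate:.1%} false positives")
print(f"Best of five tests: {hacked_rate:.1%} false positives")
```

The planned test stays near the nominal 5%, while picking the best of five pushes the false positive rate to roughly one in five.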
What is multiverse analysis?
- Run many possible analyses
- See how many get a significant result
- Munoz and Young (2018):
- Analysed the data with 1,152 different regressions
- Fewer than 5% had a significant effect
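The same idea on a toy scale, as a hedged sketch rather than a reproduction of any published analysis. The data are simulated with no true group difference, the three outcome names and four outlier rules are hypothetical, and the p value comes from a large-sample z approximation to Welch's t-test so that only the standard library is needed.

```python
import random
from itertools import product
from statistics import NormalDist, mean, stdev

def trim(values, sd_limit):
    """Drop values more than sd_limit SDs from the mean (None keeps everything)."""
    if sd_limit is None:
        return values
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= sd_limit * s]

def p_two_groups(a, b):
    """Two-sided p from a large-sample z approximation to Welch's t-test."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(4)
outcomes = ["outcome_a", "outcome_b", "outcome_c"]   # hypothetical measures
outlier_rules = [None, 2.0, 2.5, 3.0]                # SD cut-offs for exclusion

# Simulated data: 200 people per group, no true difference on any outcome.
data = {o: {g: [rng.gauss(0, 1) for _ in range(200)]
            for g in ("control", "treatment")}
        for o in outcomes}

# Run every specification and keep its p value.
results = {}
for outcome, rule in product(outcomes, outlier_rules):
    control = trim(data[outcome]["control"], rule)
    treatment = trim(data[outcome]["treatment"], rule)
    results[(outcome, rule)] = p_two_groups(control, treatment)

share_significant = sum(p < 0.05 for p in results.values()) / len(results)
print(f"{len(results)} specifications, {share_significant:.0%} significant")
```

Reporting the full set of specifications, rather than one hand-picked result, shows whether a finding holds up across reasonable analysis choices or depends on a particular one.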
What are the two problems which lead to many false positives in the literature?
- Significant results easier to publish – including false positives
- Many papers are underpowered (sample size not large enough) – true positives are not seen
- Leads to many false positives in the literature (fewer true positives/negatives)
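How the two problems combine can be worked through with a little arithmetic. The sketch below computes the share of significant results that are true positives, given alpha, power, and the share of tested hypotheses that are really true. That last number (10% here) is purely an assumption for illustration, as is the function name.

```python
def positive_predictive_value(power, prior, alpha=0.05):
    """Share of significant results that are true positives."""
    true_positives = power * prior          # real effects that are detected
    false_positives = alpha * (1 - prior)   # null effects that reach p < alpha
    return true_positives / (true_positives + false_positives)

prior = 0.10  # assumed share of tested hypotheses that are actually true
for power in (0.8, 0.5, 0.2):
    ppv = positive_predictive_value(power, prior)
    print(f"power = {power:.0%}: {ppv:.0%} of significant results are true positives")
```

Under these assumptions, dropping power from 80% to 20% cuts the share of trustworthy significant results from 64% to about 31%, so low power makes the published significant findings less reliable, not just rarer.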
What is the reproducibility crisis?
- Baker (2016)
- Surveyed around 1,500 scientists, asking whether there is a reproducibility crisis in science
- 52% said yes, there is a significant crisis
What are some factors which contribute to irreproducible research?
Selective reporting, pressure to publish, publication bias and low power
What are some factors which could boost reproducibility?
Better understanding of statistics and design, and incentives for researchers to be open and honest with their data (open science)
How do we solve the crisis?
- Transparency:
- (1) Open materials
- (2) Open data
- (3) Preregistration
- Papers earn badges so other scientists can see that the authors shared their data/materials and preregistered
Open materials
- Share the materials
- Exact instructions, program, stimuli
- Makes it easier for others to replicate
Open data
- Share the raw data
- So other researchers can perform the analysis and see how other variables/analyses affect the results
Preregistration
- Plan the study in advance, including materials and planned analyses
- E.g. Open Science Framework and AsPredicted
- Prevents p-hacking and HARKing
- Researchers can compare your preregistration to the final study (if they don’t match, it is a sign of p-hacking)
What did the Nosek et al. (2022) study find about preregistration?
The number of preregistrations is increasing