1- Reproducibility, Replicability and Robustness Flashcards
Why are these three notions important?
In assessing credibility of study findings
What is replicability?
We replicate a study and expect to get the same findings
What is Nosek et al’s definition of replicability?
Testing reliability of previous findings with different data
What does the credibility of scientific findings depend partly on?
Replicability of supporting evidence
Why is replication not always straightforward?
Difficult to determine what counts as same study/same outcome
What is Nosek et al’s definition of reproducibility?
Testing reliability of previous findings using same data and same analysis strategy
Why should the same results theoretically occur when findings are reproduced?
Someone is applying same analysis to same data
What are 2 reasons why reproducibility tests may fail?
- Original analysis may not be repeated
- Data/necessary software tools may not be available
What is the most common problem of reproducibility?
Data availability
What is Nosek et al’s definition of robustness?
Testing reliability of previous findings using same data and different analysis strategy
What is fragility (a lack of robustness) a risk factor for?
Replicability and generalisability
What are the three types of replication?
Direct replication, systematic replication, conceptual replication
What is direct replication?
New study using the same procedure, measures and study population as the original
What is systematic replication?
Secondary features have been changed (e.g. different stimuli order)
What is conceptual replication?
Intentionally different from direct replication, examines validity and generalisability of original findings, similar but not the same
What happened in the 1960s-70s?
Social psychologists started to doubt the validity of research
When is an effect declared statistically significant?
When a null hypothesis test gives p < 0.05
What is the problem with the statistical significance criterion?
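To make the p < 0.05 criterion concrete, here is a minimal sketch that turns a test statistic into a two-sided p-value and compares it with the threshold. It assumes a z-test (standard normal under the null hypothesis); the function name `two_sided_p_value` and the example z-values are illustrative only and do not come from any study on these cards.

```python
from statistics import NormalDist

ALPHA = 0.05  # conventional significance threshold


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for a z-statistic, assuming a standard normal null."""
    # Probability of a statistic at least this extreme in either tail.
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Illustrative z-values only; these are not results from a real study.
for z in (1.5, 2.5):
    p = two_sided_p_value(z)
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"z = {z}: p = {p:.4f} ({verdict})")
```

A statistic of about 1.96 sits right at the threshold, which is why results just either side of it get such different treatment under this criterion.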
Encourages publication and analytic bias favouring significant results
What was Strack et al’s (1988) study?
Study on facial feedback hypothesis, 92 US students, hold pen with non-dominant hand/teeth/lips, rate funniness of cartoons
What did Strack et al find?
Cartoons were rated funnier when holding the pen between the teeth (smile) than between the lips (pout), supporting the facial feedback hypothesis
What was Wagenmakers et al’s (2016) study?
17 replications of the same study as Strack et al.; pre-registered, N = 1,894
What did Wagenmakers et al find?
Strack et al’s findings not replicated
What are 3 reasons why Strack et al’s findings may not have been replicated?
- Social norms
- Task was not very valid
- Instruction may be differently interpreted
What 3 ways does Giner-Sorolla propose to address the replication crisis?
- Publish both null and significant findings
- Rigorous and transparent methodology
- Pre-register hypotheses and studies
What is the purpose of statistical power?
Tells you whether a sample is large enough to find an effect
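As a rough illustration of how power relates to sample size, the sketch below estimates the participants needed per group to compare two group means. It assumes a two-sided test, equal group sizes, a standardised effect size (Cohen's d) and the normal approximation n = 2(z_{1-alpha/2} + z_{power})^2 / d^2; the function name `sample_size_per_group` is hypothetical, and exact t-test calculations give slightly larger numbers.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size: float, alpha: float = 0.05,
                          power: float = 0.80) -> int:
    """Approximate n per group for a two-sided, two-sample comparison.

    Normal approximation; effect_size is Cohen's d (assumed, not from the cards).
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)              # quantile for target power
    n = 2.0 * (z_alpha + z_power) ** 2 / effect_size ** 2
    return math.ceil(n)  # round up to whole participants


# Smaller effects need far larger samples.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {sample_size_per_group(d)} per group")
```

Because the effect size is squared in the denominator, halving the expected effect roughly quadruples the required sample.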
What does using a statistically incorrect sample size lead to?
Inadequate results: too small a sample may miss a real effect, too large a sample wastes resources
Why do we need an effective sample size?
Enables efficient, high significance studies
What is sensitivity?
How likely a study is to distinguish between actual effect and chance effect
Why is statistical power generally set at 80%?
By convention; it means that if the effect really exists, the study has an 80% chance of detecting it
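Power can also be read in the other direction: given a sample size, how likely is the study to detect a true effect? The sketch below uses the normal approximation for a two-sided, two-sample comparison with equal groups, power ≈ Φ(d·sqrt(n/2) − z_{1-alpha/2}), ignoring the negligible opposite tail. The function name `approximate_power` and the example values are assumptions for illustration.

```python
import math
from statistics import NormalDist


def approximate_power(n_per_group: int, effect_size: float,
                      alpha: float = 0.05) -> float:
    """Approximate probability of detecting a true effect of the given size.

    Normal approximation; effect_size is Cohen's d (an assumed input).
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Expected z-statistic when the effect is real.
    expected_z = effect_size * math.sqrt(n_per_group / 2.0)
    return NormalDist().cdf(expected_z - z_alpha)


# Illustrative group sizes for a medium effect (d = 0.5).
for n in (20, 63, 100):
    print(f"n = {n} per group: power = {approximate_power(n, 0.5):.2f}")
```

Under these assumptions a small study of 20 per group would miss a medium-sized true effect more often than it finds it, which is one reason underpowered findings are hard to replicate.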
Why should we pay attention to the sample size?
Effect is unreliable if sample size is too low