critical perspectives 1 - replication
% of studies replicated across different major psych journals
only 36% of findings replicated (Open Science Collaboration, 2015)
23% for the social psychology journal sampled (the lowest)
replication crisis
highly cited studies in emotion research - biases in the citation of studies
top 65 studies - 40 observational and 25 experimental
highly cited studies reported bigger effects than less cited studies
suggests a preference for big effects over the most representative or realistic results, also seen in other meta-analyses
what is replication
if you do the same study again, do you get the same result/effect?
more evidence showing the same result = more reason to believe it
if not - why not?
why is replication important - advantages of it? (5)
- protects against false positives (e.g. sampling error)
- controls for artifacts (e.g. leading or oddly worded questions)
- addresses researcher fraud (pressure to get published)
- tests whether findings generalise to different populations
- tests the same hypothesis using a different procedure
direct replication
recreate the critical elements of an original study
e.g. samples, procedures, and measures are kept the same
direct, not exact (exact replication is basically impossible)
getting the same (or similar) results is an indication that the findings are accurate or reproducible
conceptual replication
test the same hypothesis using a different procedure
same/similar results = findings are robust to alternative research designs, operational definitions, and samples
how direct replications can be done
registered replication reports = a call for researchers to redo a study to a high standard
labs sign up, follow the agreed procedures and protocols, then share their findings so a conclusion can be reached across many researchers
results of replication attempts are published regardless of the outcome
4 reasons for non-replication
- faking
- sloppy science
- outcome switching/ p-hacking
- small samples/lack of statistical power
non-replication: faking example
Diederik Stapel - made up his research about human nature and published it
publishers/media preferred clear answers, but his research detailed the complexities of life, so he simplified it to get published
made the data fit the narrative
non-replication: “sloppy science”
nine circles of scientific hell
issues with scientific research
from 1 (best) to 9 (worst):
1. limbo
2. overselling
3. post-hoc storytelling - hypothesising after results are known
4. p-value fishing
5. creative use of outliers
6. plagiarism
7. non-publication - though it is difficult to publish all work
8. partial publication
9. inventing data - faking
non-replication - outcome switching/ p-value fishing
4th circle of scientific hell
changing the outcomes of interest in a study depending on the observed results
driven by the desire to obtain a p value under .05 - a significant result
e.g. you run a study on the effect of music on sadness but instead find an effect on happiness, so you ditch the original sadness outcome and write about happiness –> switching the outcome of interest –> you should write up all effects, not ignore the non-significant sadness result
p-hacking
making decisions to maximise the likelihood of a statistically significant effect, rather than on objective or scientific grounds
need to report everything in a study, not ignore the non-significant results
non-replication: small samples
small sample = less statistical power
a significant effect found in a small sample is therefore more likely to be a false positive or inflated, so it may not replicate with larger samples
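a minimal simulation sketch of this point (assuming Python with numpy/scipy; the effect size, sample sizes, and function name are illustrative, not from the lecture):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def estimate_power(n, effect=0.3, sims=5000, alpha=0.05):
    """Estimate power: the share of simulated two-group studies
    that detect (p < alpha) a true effect of the given size."""
    hits = 0
    for _ in range(sims):
        control = rng.normal(0.0, 1.0, n)       # group with no effect
        treatment = rng.normal(effect, 1.0, n)  # true effect = 0.3 SD
        if stats.ttest_ind(treatment, control).pvalue < alpha:
            hits += 1
    return hits / sims

print(estimate_power(n=20))   # small sample: power is low (~15%)
print(estimate_power(n=200))  # large sample: power is high (~85%)
```

with n=20 the study usually misses the real effect, so any significant result it does produce is more likely a fluke or an inflated estimate - which is why it can vanish in a larger replication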
publication bias
of the 9 circles of scientific hell –> 7. non-publication, and 8. partial publication
findings that are statistically significant are more likely to be published than those that are not
not sharing these findings with others is bad science
there are good reasons why some work is not published, e.g. ambiguity over the results
file drawer problem with publication
published studies could represent the 5% of findings that occur by chance alone - because so much goes unpublished, the non-significant results sitting in the file drawer could represent the truth
form of publication bias
cratering of knowledge
the distribution of published knowledge is like a bell curve but with a massive dip in the middle around 0 - it craters between the two p = .05 thresholds
shows that published research favours results that are just about significant
shows that the published literature may not reflect reality
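a rough simulation sketch of the crater (Python with numpy/scipy assumed; the numbers are illustrative): simulate many studies of a true-zero effect, "publish" only the significant ones, and the published effect estimates pile up on either side of zero with a dip in the middle:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 30  # participants per simulated study

published = []
for _ in range(10_000):
    sample = rng.normal(0.0, 1.0, n)  # true effect is exactly 0
    if stats.ttest_1samp(sample, 0.0).pvalue < 0.05:
        published.append(sample.mean())  # only significant results "get published"

published = np.array(published)
print(len(published) / 10_000)   # ~5% of null studies still get published
print(np.abs(published).min())   # no published effect lands near 0 -> the crater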
how common is sloppy science - survey of researchers
a survey of 2,000 psychologists about their research and questionable practices
practices:
- failing to report all measures/conditions
- deciding to collect more data after peeking to see if the results are significant - i.e. choosing when to stop once the results look good (you shouldn't peek at the data - decide the sample size in advance; see the sketch after this card)
- selectively reporting studies that worked
conclusion:
- a surprisingly high % of respondents admitted to engaging in questionable practices
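a minimal sketch (Python with numpy/scipy assumed; batch sizes and limits are illustrative) of why peeking inflates false positives: test after every batch of participants and stop as soon as p < .05, even though the true effect is zero:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def peeking_study(start=10, step=10, max_n=100, alpha=0.05):
    """One null 'study' with optional stopping: run a t-test after
    each batch of participants, stop the first time p < alpha."""
    a = list(rng.normal(0, 1, start))
    b = list(rng.normal(0, 1, start))
    while len(a) <= max_n:
        if stats.ttest_ind(a, b).pvalue < alpha:
            return True  # a false positive: the true effect is zero
        a.extend(rng.normal(0, 1, step))
        b.extend(rng.normal(0, 1, step))
    return False

sims = 2000
print(sum(peeking_study() for _ in range(sims)) / sims)
# well above the nominal 5% false-positive rate
```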
issue of flexibility in research - sloppy science example (when i'm 64)
flexibility in data collection, analysis, and reporting
this increases actual false-positive rates
an ANCOVA found that people actually became younger after listening to "when i'm 64" when controlling for father's age –> this is obviously impossible
how this happened:
the write-up was misleading: they actually asked participants lots of questions but only reported the father's-age analysis because it gave a significant effect
the idea is that the p threshold is 1/20 (.05), so if you test 20 things you are likely to get at least one significant effect by chance (see the sketch below)
therefore you have to report everything you do, not pretend you didn't do it
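the arithmetic behind that "1 in 20" point, as a sketch (Python with numpy/scipy assumed; the helper name is made up for illustration): with 20 independent null tests at alpha = .05, the chance of at least one significant result is 1 - 0.95^20 ≈ 64%:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def any_false_positive(k=20, n=30, alpha=0.05):
    """Run k independent null tests (e.g. k outcome measures);
    return True if any reaches p < alpha purely by chance."""
    for _ in range(k):
        sample = rng.normal(0.0, 1.0, n)  # no real effect anywhere
        if stats.ttest_1samp(sample, 0.0).pvalue < alpha:
            return True
    return False

sims = 2000
print(sum(any_false_positive() for _ in range(sims)) / sims)  # simulated rate
print(1 - 0.95 ** 20)  # analytic answer: ~0.64
```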
moderators as a reason why findings may not replicate (ego-depletion example)
variables that influence the nature of an effect - e.g. country, culture
- e.g. ego-depletion = self-control is depleted by one thing so you can’t control yourself on the next
- reverse ego-depletion also found = self-control on one task makes self-control better on the next task
- the reversal was explained by culture - e.g. in India, self-control is praised more
- this means it is not a failure of replication, but is the effect of a moderator
identifying moderators is good because it improves understanding of research
moderator = second generation research (first gen = is there an effect? second gen = when is there an effect?)
scientist error OR poor replication
just because a replication got a different result doesn't mean the original was wrong –> the replication could have got it wrong
example:
one study found that priming participants with words about the elderly ("wrinkly", "grey") made them walk more slowly away from the lab after the study
a replication didn't find this - but the original researcher refuted the failure, arguing the replication studied it differently and got it wrong