W5 Peer Review, Replication, Empathy Flashcards
What are the steps of peer review?
Step 1 = read by an editor: does it suit the journal? Does it have an impact on the area of study? Is it exciting/influential?
Step 2 = sent to 2-3 anonymous expert reviewers (most journals let reviewers see who the authors are), who make one of the following decisions
ACCEPT it for the journal
REJECT it for the journal
INVITE A REVISION: changes requested by the reviewers, sometimes to major parts of the experiment
Step 3 = authors might revise the paper in line with the reviewers' requests, or argue why the changes are not necessary
Step 4 = if changes are made, another round of peer review is done, which can again reject, revise, or accept the paper; multiple rounds sometimes happen
What are the challenges of going through peer review?
- Often multiple iterations of peer review
- Each round of peer review can take several months, eg. 2-3 months
- Authors are often given a deadline to revise the manuscript for it to be accepted as a revision
Why is peer review important? (3)
what is the counter-argument to this?
- Papers of poor quality are excluded
- Made up of multiple experts and editors to be as objective as possible
- Helpful and constructive to improve the manuscript
Other alternatives to peer review
Some argue EVERYTHING in raw form should be published, with no gatekeeping or quality control; the literature will 'self-correct'
What are the cons of paywalled (student/researcher-only) access vs. open access?
(Scientific journals give you the choice between readers paying for your research or you paying for open access)
- Paywalled: the general public is excluded from the scientific literature, which keeps people from learning about psychological and medical knowledge, and creates pressure to publish only in mainstream journals rather than start your own
- Open access: authors have to pay $$$, which may limit publications from less well-funded institutions and teams, biasing the scientific literature toward rich/Western institutions
What’s a solution for researcher vs. open access?
- Hybrid approach for limited papers to be paid / accessible OR
- Better science communication, eg. findings translated into a blog
What were the findings of the Reproducibility project?
(Part of what is called the replication crisis in psychology)
- Studied how well published effects replicate DIRECTLY
While 96% of the original studies were statistically significant, ONLY 36% of the replications were statistically significant
The MEAN EFFECT SIZE of the replications was about HALF that of the originals
What is direct vs. conceptual replication?
- Direct replication = method is repeated as closely as possible
- Conceptual replication = different methods are used to test the same hypothesis
What are the 4 reasons why a replication study might fail to obtain the same results as the original?
- Unidentified contextual effects
- Unidentified individual differences, eg. participants in the first study might all have had higher levels of some trait that the replication sample did not, because individual differences weren't controlled for initially
- The original could be a TYPE 1 ERROR ("false positive"), expected about 5% of the time at alpha = .05, ie. 1 in 20 tests
- Poor research practices, eg. no Power calculations
Type 1 vs Type 2 error, which is more serious?
- Type 1 Error = the null hypothesis is true, BUT we reject it: "false positive", we say there's an effect even though there's no real effect (a false alarm). Usually considered more serious.
- Type 2 Error = the null hypothesis is not true, BUT we retain it: "false negative", there's a real effect but we say there's no effect (a miss).
How to avoid the ‘fishing expedition’ in research?
Decide a priori what you want to look for; otherwise you go on a "fishing expedition" for statistically significant results
Eg. with uncorrected multiple comparisons, you can attribute scanner noise to "neural activity" in a salmon
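The fishing-expedition problem can be simulated. A minimal Python sketch (stdlib only; the study and test counts are made-up numbers for illustration), using the fact that p-values are uniformly distributed when the null hypothesis is true:

```python
import random

random.seed(42)

# When the null hypothesis is true, p-values are uniformly distributed
# on [0, 1]. Simulate a "fishing expedition": 20 independent tests run
# on pure noise, repeated over many hypothetical studies.
n_studies = 10_000
n_tests = 20       # eg. 20 brain regions, subgroups, questionnaire items
alpha = 0.05

studies_with_a_hit = 0
for _ in range(n_studies):
    p_values = [random.random() for _ in range(n_tests)]
    if any(p < alpha for p in p_values):
        studies_with_a_hit += 1   # at least one "significant" result by chance

observed = studies_with_a_hit / n_studies
expected = 1 - (1 - alpha) ** n_tests   # analytic value, about 0.64
print(observed, expected)
```

So with 20 uncorrected tests there is roughly a 64% chance of at least one false positive somewhere, which is exactly how noise gets mistaken for a real effect.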
If we changed alpha level from .05 to .01 what errors would increase/decrease?
Type 1 error would decrease from 5% chance to 1% chance
Type 2 error would increase (saying there is no effect when there is an effect)
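This trade-off can be simulated directly. A rough sketch (assuming a toy one-sample z-test with known variance; the sample size of 20 and true effect of 0.5 are arbitrary choices, not from the lecture):

```python
import random
from statistics import NormalDist, mean

random.seed(0)
nd = NormalDist()

def z_test_p(sample, sigma=1.0):
    # Two-sided one-sample z-test against a true mean of 0 (toy setup).
    z = mean(sample) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - nd.cdf(abs(z)))

def rejection_rate(alpha, true_mean, n=20, sims=5000):
    # Fraction of simulated studies that reject the null hypothesis.
    hits = sum(
        z_test_p([random.gauss(true_mean, 1.0) for _ in range(n)]) < alpha
        for _ in range(sims)
    )
    return hits / sims

type1_05 = rejection_rate(0.05, true_mean=0.0)      # null true: false alarms
type1_01 = rejection_rate(0.01, true_mean=0.0)
type2_05 = 1 - rejection_rate(0.05, true_mean=0.5)  # null false: misses
type2_01 = 1 - rejection_rate(0.01, true_mean=0.5)

print(type1_05, type1_01)   # Type 1 rate drops from ~5% to ~1%
print(type2_05, type2_01)   # Type 2 rate rises under the stricter alpha
```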
What would be the consequences of using an alpha level of 0.01?
- With a lower alpha, some valid effects would be rejected and not published, and more participants would be needed to detect the same effects - you need to increase SENSITIVITY (power) to really determine whether an effect is there or not
What is power?
What are the 2 things Power is affected by?
- The likelihood of correctly rejecting the null hypothesis, ie. of correctly detecting a statistically significant effect in a sample when the effect really exists in the population
Power is affected by number of PEOPLE and number of TRIALS
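How power grows with the number of people can be computed analytically for a simple design. A sketch assuming a two-sided one-sample z-test and a medium standardised effect of 0.5 (function name and numbers are illustrative, not from the lecture):

```python
from statistics import NormalDist

nd = NormalDist()

def power_one_sample(n, effect_size, alpha=0.05):
    # Power of a two-sided one-sample z-test: the chance the observed z
    # lands beyond the critical value, given the true standardised effect.
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = effect_size * n ** 0.5      # expected z under the alternative
    return (1 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)

# More people -> more power, for the same effect size:
for n in (10, 20, 50, 100):
    print(n, round(power_one_sample(n, effect_size=0.5), 2))
```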
How does low Power increase Type 1 and 2 errors?
- With small samples, random error variance from any one person in the sample has a larger effect on the mean
- Small sample sizes lead to both Type 1 and Type 2 statistical errors, commonly driven by outliers
- In the last 5-10 years it has become the norm to conduct power analyses; many older studies are harder to replicate because they were 'underpowered', eg. low sample sizes
What do you need to consider in high sample sizes?
- Larger sample sizes minimise the impact of outliers and increase generalisability, BUT you NEED to consider effect size: with HIGH power, a minute difference between groups can produce a statistically significant result even though the magnitude of the effect is trivial
Is it justified to use the same amount of participants as previous studies?
No - better to do a Power analysis to figure out what is the minimum sample size to get decent power
This is limited by logistics and replication studies
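For the simplest designs, the minimum n can be estimated with a standard back-of-envelope formula instead of borrowing a previous study's sample size. A sketch (assuming a two-sided one-sample z-test; real power analyses for complex designs usually need dedicated software or simulation):

```python
from math import ceil
from statistics import NormalDist

nd = NormalDist()

def min_sample_size(effect_size, alpha=0.05, power=0.80):
    # Classic approximation: n = ((z_{1-alpha/2} + z_{power}) / d)^2,
    # where d is the standardised effect size.
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    z_power = nd.inv_cdf(power)
    return ceil(((z_alpha + z_power) / effect_size) ** 2)

print(min_sample_size(0.5))   # medium effect: 32 participants
print(min_sample_size(0.2))   # small effect: 197 participants
```

Note how quickly the required n climbs as the expected effect shrinks; this is why underpowered studies of small effects are so common.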
After Power analysis, what do we do?
Have a firm ‘stopping rule’ for recruiting participants regardless of statistical significance
If you check for significance after every participant (or small batch) and stop once you've reached it, this inflates the Type 1 error rate ("false positives")
Also check that the task/performance was appropriately difficult, eg. avoiding ceiling and floor effects
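Why peeking inflates false positives can be shown by simulation. A sketch under a TRUE null (no real effect), again assuming a toy one-sample z-test; the batch size and maximum n are arbitrary:

```python
import random
from statistics import NormalDist, mean

random.seed(1)
nd = NormalDist()

def z_test_p(sample):
    # Two-sided one-sample z-test against mean 0, known sigma = 1 (toy setup).
    z = mean(sample) / (1 / len(sample) ** 0.5)
    return 2 * (1 - nd.cdf(abs(z)))

def run_study(peek_every, max_n=100, alpha=0.05):
    # Data come from a TRUE null, so every "significant" result here is a
    # false positive. Peeking lets us stop the moment noise looks like a hit.
    sample = []
    while len(sample) < max_n:
        sample += [random.gauss(0.0, 1.0) for _ in range(peek_every)]
        if z_test_p(sample) < alpha:
            return True           # stopped early, declared an effect
    return False

sims = 2000
firm_rule = sum(run_study(peek_every=100) for _ in range(sims)) / sims
peeking = sum(run_study(peek_every=5) for _ in range(sims)) / sims
print(firm_rule)   # one look at n = 100: close to the nominal 5%
print(peeking)     # checking every 5 participants: well above 5%
```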
How do you determine a formal stopping rule? and can we always test this?
- run a formal power analysis (can inform how many people you should study)
- No, it's unfeasible to do a power analysis in some complex designs; otherwise you need to justify why you didn't do one
Why are multiple experiments in a study good?
Try to replicate differences you have already observed, because there is a 5% chance that the differences are JUST DUE TO CHANCE
Multiple experiments within a manuscript, eg. 2 independent experiments that find the same result = 5% * 5% = 0.25% chance of a false positive
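The arithmetic behind this, sketched in Python (assuming the two experiments are fully independent):

```python
alpha = 0.05  # chance a single experiment is a false positive

# For the SAME spurious result to appear in two independent experiments,
# both must be false positives, so the probabilities multiply:
both_false_positives = alpha * alpha
print(round(both_false_positives, 4))   # 0.0025, ie. 0.25%
```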
What 3 things are needed for transparency and what happens if we don’t include it?
- Describe all the decisions made
- Report both non-significant and significant results
- Share de-identified raw research data ‘open science framework’ for others to replicate or check your data
If we don't:
- Distorted sense of the scientific literature: published papers are more likely to contain Type 1 errors, since there may be many (unpublished) findings showing no effect and only one showing an effect
- Valuable knowledge is lost