L2 - Critical thinking about psychological research Flashcards
Replicability
- same research, but with a different sample
- procedures are kept as close as possible to the original study, but with different participants
= if the results differ, the study lacks replicability
Reproducibility
- extent to which others can reproduce the findings using the same data and the same data-analysis protocol
- no different sample and no different research question (RQ)
> e.g. an attempt to reproduce economics studies found that more than half were not reproducible
what are some possible reasons for lack of reproducibility?
- data was not provided
- analysis protocols were not provided
- authors did not respond
- …
> even then, reproduction sometimes fails around 50% of the time for no apparent reason
! this could also partly explain failures of replicability
Robustness
- the same data is given to different researchers, who choose different analyses
- they get different results depending on the analysis chosen
Researcher degrees of freedom
- all the individual decisions researchers have to make to analyse the data (the choices made are not the same for everyone)
> e.g. which measurement procedure? which analysis? how many participants? what counts as a relevant effect? …
- if a researcher is biased, these degrees of freedom will largely impact the results
what are the steps of the publication process?
- researcher writes manuscript
- sends it to the editors
- reviewers review it
- reviewers make suggestions for revision and a recommendation about publication
- the author revises the manuscript
- study gets published
what are studies reviewed on before publication?
- clarity
- accuracy
- appropriate methodology
- theoretical base
What is a case that explains biases in reviewing?
- the exact same paper, but with different directions of the results (in some versions the expected results occurred, in others they didn't)
- different peer reviewers in the different conditions gave very different reviews
= “expected results condition”: basically no criticism, very positive feedback
= “not-expected results condition”: harsh feedback, much criticism
!! high influence of expected results on reviewer’s judgement
what is the issue with not publishing studies with non-significant results?
(e.g. no effect found)
- researchers start pursuing only significant results, which leads to bias
> they might want to move their career forward and not lose their jobs, which might happen if they don't publish
→ the goals of science and of the scientist no longer align
> study showed that 96% of results in studies were the expected ones
File drawer effect
- not publishing studies with results that were not expected
- it is estimated that 50% of studies in psychology remain unpublished
- leads to publication bias
what is usually overemphasized in studies?
- counterintuitive findings
- small and noisy samples
! too much emphasis on new, surprising findings is problematic
(see picture 2)
what are counterintuitive findings in terms of conditional probability?
- prior probability: the unconditional probability of the hypothesis being true, regardless of the research results
→ for counterintuitive findings, the prior probability is very low (= low base rate)
= low prior probability of the alternative hypothesis being true (see the sketch below)
(to be rechecked)
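A minimal sketch (not from the lecture; the alpha, power, and prior values are assumed) of why the low prior matters: applying Bayes' rule, a significant result for a counterintuitive (low-prior) hypothesis is much less likely to reflect a true effect.
```python
# Sketch: probability that a significant result reflects a true effect,
# via Bayes' rule. alpha = .05 and power = .80 are assumed; only the prior varies.
def prob_true_given_significant(prior, power=0.80, alpha=0.05):
    true_positives = prior * power          # H1 true and test significant
    false_positives = (1 - prior) * alpha   # H1 false but test significant anyway
    return true_positives / (true_positives + false_positives)

print(round(prob_true_given_significant(prior=0.50), 2))  # plausible hypothesis: ~0.94
print(round(prob_true_given_significant(prior=0.05), 2))  # counterintuitive one: ~0.46
```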
what is the problem when using small and noisy samples?
- with smaller and noisier samples, statistical power is lower as well
→ increased Type II error
→ more likely to fail to reject the null hypothesis when it is false
= lower probability that the null hypothesis is false given that it was rejected (see the sketch below)
(to be rechecked)
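A complementary sketch (values assumed for illustration) of the power side: with the prior held fixed, lower power shrinks the probability that a rejected null reflects a real effect.
```python
# Sketch: same Bayes-rule calculation, now holding a modest prior fixed (.10)
# and lowering power, as happens with small and noisy samples.
def prob_effect_is_real(prior, power, alpha=0.05):
    return (prior * power) / (prior * power + (1 - prior) * alpha)

for power in (0.80, 0.50, 0.35):
    print(power, round(prob_effect_is_real(prior=0.10, power=power), 2))
# 0.8 -> 0.64, 0.5 -> 0.53, 0.35 -> 0.44
```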
Publication and reporting bias
(see picture 1)
- of the original studies, e.g. 50% found no effect
1. publication bias
2. outcome reporting bias
3. spin
4. citation bias
= in the end, almost all published and cited studies point at the intended effect
Publication bias
- some studies with unexpected results are not published
> all studies with positive results, but only some with negative results, get published
Outcome reporting bias
- variables and conditions that initially gave negative results are left out of the published studies
- some questionable practices are applied
Spin
- results are interpreted in a vague and inaccurate way that makes them sound significant
→ it doesn't stand out as much that the results weren't as predicted
> e.g. “(insignificant) effect is significantly more marked than placebo”
Citation bias
- studies with positive results are cited more than studies with negative results
p-hacking
- being on the lookout for significant findings
- data analysis is carried out in such a way that the findings come out significant
> one of the questionable research practices
Type I & Type II error
- Type I error: rejecting H0 when it is true (conventionally allowed in 5% of cases, alpha = .05)
- Type II error: failing to reject H0 when it is false (conventionally 20% of cases, i.e. power = 1 - beta = 80%)
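A small simulation sketch of both error rates; the effect size (0.5 SD), group size (20 per group), and number of repetitions are assumed purely for illustration.
```python
# Sketch: empirical Type I and Type II error rates for a two-sample t-test.
# Effect size (0.5 SD), n per group (20), and reps are assumed for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, n, reps = 0.05, 20, 5000

type1 = type2 = 0
for _ in range(reps):
    a = rng.normal(0, 1, n)
    b_null = rng.normal(0, 1, n)    # H0 true: both groups from the same population
    b_alt = rng.normal(0.5, 1, n)   # H0 false: true difference of 0.5 SD
    if stats.ttest_ind(a, b_null).pvalue < alpha:
        type1 += 1                  # false positive
    if stats.ttest_ind(a, b_alt).pvalue >= alpha:
        type2 += 1                  # miss (false negative)

print("Type I rate: ", type1 / reps)  # close to .05
print("Type II rate:", type2 / reps)  # around .65 here, i.e. power of only ~.35
```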
what is the ideal scenario when deciding analyses?
what is common practice instead?
- ideally: a cost-benefit analysis of the chosen power and alpha level
- in practice, people just use a power of 80% and an alpha of 5% (the default values)
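As an illustration of what the default power = 80% and alpha = 5% imply, a quick sample-size calculation with statsmodels; the medium effect size (d = 0.5) is an assumption for the example.
```python
# Sketch: participants per group needed to reach the default power = .80 at
# alpha = .05, assuming a medium standardized effect size (d = 0.5).
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(round(n_per_group))  # roughly 64 per group
```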
What is the effect of questionable research practices on the interpretation of the results?
- they inflate the likelihood of a Type I error
- instead of using one outcome, they use several, leading to an increased chance of getting the expected outcome
how does the probability of getting Type I error change based on how many tests we run?
- the probability of making a Type I error in an individual test is 0.05, but the probability of a Type I error across the whole collection of tests is higher (see below)
- this is what p-hacking exploits
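A minimal calculation of how the family-wise error rate grows with the number of tests, assuming independent tests each run at alpha = .05 (so FWER = 1 - (1 - alpha)^k).
```python
# Sketch: chance of at least one Type I error across k independent tests,
# each run at alpha = .05: FWER = 1 - (1 - alpha)**k.
alpha = 0.05
for k in (1, 3, 5, 10, 20):
    print(k, round(1 - (1 - alpha) ** k, 2))
# 1 -> 0.05, 3 -> 0.14, 5 -> 0.23, 10 -> 0.40, 20 -> 0.64
```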
in what occasions would the probability of getting type I error be higher?
- when measuring multiple dependent variables
- when comparing multiple groups
- when testing a difference both with and without including a covariate
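A quick simulation sketch of the first case (multiple dependent variables); the numbers (5 DVs, 30 per group, no true effect anywhere) are assumed for illustration.
```python
# Sketch: two groups with NO true effect, but 5 dependent variables tested
# separately; count how often at least one t-test is "significant" by chance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, n_dvs, reps, alpha = 30, 5, 2000, 0.05

at_least_one_hit = 0
for _ in range(reps):
    group_a = rng.normal(0, 1, (n, n_dvs))  # both groups come from the
    group_b = rng.normal(0, 1, (n, n_dvs))  # same population: H0 is true
    pvals = [stats.ttest_ind(group_a[:, j], group_b[:, j]).pvalue
             for j in range(n_dvs)]
    if min(pvals) < alpha:
        at_least_one_hit += 1

print(at_least_one_hit / reps)  # roughly .23 instead of the nominal .05
```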
how can we control for type I error? why?
By setting up and communicating a clear sampling plan in advance:
> adding observations and re-testing after each new addition increases the probability of a Type I error
> so continuing data collection until a significant difference is found essentially guarantees a Type I error (see the sketch below)
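A small simulation sketch of such "peeking": under a true null, test after every batch of new observations and stop at the first significant p-value. All numbers (starting n, step size, maximum n) are assumed for illustration.
```python
# Sketch: optional stopping under a true null. Start with 10 per group, add 10
# at a time up to 100, test after every addition, stop early if p < .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alpha, reps, start, step, max_n = 0.05, 2000, 10, 10, 100

false_positives = 0
for _ in range(reps):
    a = list(rng.normal(0, 1, start))
    b = list(rng.normal(0, 1, start))
    while True:
        if stats.ttest_ind(a, b).pvalue < alpha:
            false_positives += 1              # stopped on a chance "effect"
            break
        if len(a) >= max_n:
            break                             # gave up: no Type I error
        a.extend(rng.normal(0, 1, step))      # collect a bit more data ...
        b.extend(rng.normal(0, 1, step))      # ... and test again

print(false_positives / reps)  # well above .05 (roughly .15-.20 in this setup)
```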
What are the estimates of actual power in studies?
- it is nominally set at 80%
- estimates range from about .35 to .50
> around .31 in specific domains (e.g. neuroscience)
!! so power is quite low, yet 96% of published studies report the expected results → the percentages don't match up (see the sketch below)
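A back-of-the-envelope sketch of that mismatch, with assumed (and generous) inputs: even if power were .50 and most tested hypotheses were true, the expected share of significant results stays far below 96%.
```python
# Sketch: expected share of significant results, assuming power = .50, alpha = .05,
# and that (generously) 60% of tested hypotheses are actually true.
power, alpha, share_true = 0.50, 0.05, 0.60
expected_significant = share_true * power + (1 - share_true) * alpha
print(round(expected_significant, 2))  # 0.32, far below the 96% of expected results reported
```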
Why is “checking for significance” an issue?
Because of:
- lack of sensitivity
> underpowered studies make for inconclusive replication attempts
> 49% of replications were inconclusive (but are often reported as conclusive failures to replicate)
- lack of differentiation
> is the found effect in the replication meaningfully different from the original?
what are some solutions for all the questionable research practices?
- quality of research should be determining factor in how research is evaluated (and not statistical significance)
- replication should be more central (both direct and conceptual)
- open science and pre-registration (making data, materials, and the procedure available online, already before publication)