L2 - Critical thinking about psychological research Flashcards
Replicability
- same research but with a different sample
- procedures are kept as similar as possible to the original study, but with different participants
= if different results, then study lacks replicability
Reproducibility
- extent to which others can reproduce the findings using same data and same data analysis protocol
- same data and same research question (RQ), no new sample
> e.g. researchers tried to reproduce published economics studies, and more than half were not reproducible
what are some possible reasons for lack of reproducibility?
- data was not provided
- analysis protocols were not provided
- authors did not respond
- …
> even then, reproduction sometimes fails about 50% of the time for no apparent reason
! this could also partly explain failures of replicability
Robustness
- same data to different researchers, different analysis
- they get different results based on analysis chosen
Researcher degrees of freedom
- all the individual decisions that researchers have to make to analyse data (the choices made are not the same for everyone)
> e.g. which measurement procedure? which analysis? how many participants? what counts as a relevant effect? …
- if a researcher is biased, this will strongly affect the results, because of these degrees of freedom
what are the steps of the publication process?
- researcher writes manuscript
- sends it to the editors
- reviewers review it
- suggestions for revision and publication
- author fixes it
- study gets published
on what criteria are studies reviewed before publication?
- clarity
- accuracy
- appropriate methodology
- theoretical base
What is a case that explains biases in reviewing?
- the exact same paper was sent out with different directions of results (in some versions the expected results occurred, in others they didn’t)
- different peer reviewers in different conditions led to very different reviews
= “expected results condition”: basically no criticism, very positive feedback
= “not-expected results condition”: harsh feedback, much criticism
!! high influence of expected results on reviewer’s judgement
what is the issue with not publishing studies with insignificant results?
(e.g. no effect found)
- researchers start pursuing only significant results, which leads to biases
> they might want to advance their career and not lose their job, which could happen if they don’t publish
→ the goals of science and of the scientist no longer align
> study showed that 96% of results in studies were the expected ones
File drawer effect
- not publishing studies with results that were not expected
- it is estimated that 50% of studies in psychology remain unpublished
- leads to publication bias
what is usually overemphasized in studies?
- counterintuitive findings
- small and noisy samples
! too much emphasis on new, surprising findings is problematic
(see picture 2)
what are counterintuitive findings in terms of conditional probability?
- prior probability: the probability of the hypothesis being true before (and regardless of) the research results
→ counterintuitive findings have, by definition, a very low prior probability (= low base rate)
= low prior probability of the alternative hypothesis being true
(see the worked example below)
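A minimal worked example (illustrative numbers, not from the lecture): the probability that the alternative hypothesis is true given a significant result is
P(H1 | significant) = (power × prior) / (power × prior + alpha × (1 − prior))
with a low prior of .05, power = .80 and alpha = .05:
P(H1 | significant) = (.80 × .05) / (.80 × .05 + .05 × .95) = .04 / .0875 ≈ .46
= even a significant result for a counterintuitive (low-prior) hypothesis is close to a coin flip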
what is the problem when using small and noisy samples?
- with smaller and noisier samples, statistical power is lower
→ increased Type II error rate
→ more likely to fail to reject the null hypothesis when it is false
= also a lower probability that the null hypothesis is actually false, given that it was rejected
(see the worked example below)
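Continuing the same illustrative example, but now with a small, noisy sample giving power = .30 (same prior of .05 and alpha = .05):
P(H1 | significant) = (.30 × .05) / (.30 × .05 + .05 × .95) = .015 / .0625 = .24
= with low power, a rejected null is even less likely to reflect a real effect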
Publication and reporting bias
(see picture 1)
- e.g. of the original studies, 50% found no effect
1. publication bias
2. outcome reporting bias
3. spin
4. citation bias
= in the end, almost all published and cited studies appear to point to the intended effect
Publication bias
- some studies with unexpected (negative) results are not published
> nearly all studies with positive results are published, but only some with negative results
Outcome reporting bias
- variables and conditions that initially yielded negative results are left out of the published study
- some questionable practices are applied
Spin
- results are interpreted in vague and not accurate way, making results sound significant
→ doesn’t stand out as much that results weren’t as predicted
> e.g. “(insignificant) effect is significantly more marked than placebo”
Citation bias
- studies with positive results are cited more often than studies with negative results
p-hacking
- being on the lookout for significant findings
- carrying out the data analysis in whatever way makes the findings come out significant
> one of the questionable research practices
Type I & Type II error
- Type I error: rejecting H0 when it is true (alpha, conventionally 5%)
- Type II error: failing to reject H0 when it is false (conventionally 20%; power = 80%)
what is the ideal scenario when deciding analyses?
what is common practice instead?
- ideal: a cost & benefit analysis of the power and alpha level to use for the specific study
- common practice: simply using the default values of 80% power and 5% alpha
What is the effect of questionable research practices on the interpretation of the results?
- they inflate the likelihood of type I error
- instead of using one outcome, they use several, leading to an increased chance of getting the expected outcome
how does the probability of getting Type I error change based on how many tests we run?
- the probability of making a Type I error in an individual test is 0.05, but the probability of a Type I error across the whole collection of tests is higher (see the calculation below)
- this is what p-hacking exploits
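A quick illustrative calculation (assuming independent tests, each run at alpha = .05): the probability of at least one Type I error across k tests is
P(at least one false positive) = 1 − (1 − .05)^k
k = 1 → .05; k = 3 → 1 − .95^3 ≈ .14; k = 10 → 1 − .95^10 ≈ .40
= running many tests and highlighting whichever comes out significant quickly inflates the error rate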
in what occasions would the probability of getting type I error be higher?
- when measuring multiple dependent variables
- when comparing multiple groups
- when testing a difference both with and without including a covariate
how can we control for type I error? why?
By setting up and communicating a clear sampling plan in advance:
> adding observations and re-testing after each new addition increases the probability of a Type I error
> so continuing data collection until a significant difference is found eventually guarantees a Type I error (see the simulation sketch below)
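A minimal simulation sketch of this in Python (my own illustration, not from the course; the numbers are arbitrary): two groups with no real difference are compared, 10 extra participants per group are added at a time, and a t-test is run after every addition, stopping as soon as p < .05.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def one_run(max_n=100, step=10, alpha=0.05):
    # two groups drawn from the SAME distribution, so the null hypothesis is true
    a = np.empty(0)
    b = np.empty(0)
    while len(a) < max_n:
        a = np.concatenate([a, rng.normal(size=step)])   # add a new batch
        b = np.concatenate([b, rng.normal(size=step)])
        if stats.ttest_ind(a, b).pvalue < alpha:         # "peek" after each batch
            return True   # stopped early with a false-positive "significant" result
    return False

rate = sum(one_run() for _ in range(2000)) / 2000
print(rate)   # well above the nominal 0.05 (typically somewhere around 0.15-0.20)

= fixing the sample size (or the stopping rule) before data collection, and testing only once, keeps the error rate at the nominal alpha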
What are the estimates of actual power in published studies?
- it is nominally set at 80%
- estimates range from .35 to .50
> .31 in specific domains (e.g. neuroscience)
!! so power is quite low, yet 96% of published studies report the expected results → even if every tested hypothesis were true, only roughly 35–50% of studies should come out as predicted, so the percentages do not match
Why is “checking for significance” an issue?
Because of:
- lack of sensitivity
> underpowered studies make for inconclusive replication attempts
> 49% of replications were inconclusive (but are often reported as conclusive failures to replicate)
- lack of differentiation
> is the found effect in the replication meaningfully different from the original?
what are some solutions for all the questionable research practices?
- quality of research should be determining factor in how research is evaluated (and not statistical significance)
- replication should be more central (both direct and conceptual)
- open science and pre-registration (online availability of data, materials, procedure and pre-publication)
what are direct and conceptual replications?
- Direct: same research, different sample
- Conceptual: different research and sample, same concept
Registered report
- publisher guarantees to publish study no matter what results are
- around 45% of predicted results are met (instead of 96%)
Why are questionable practices so common in research?
- most times they are not intended
- there are various different outcomes that a study could have, based on the researcher’s degrees of freedom
- sometimes, the outcome is the consequence of a series of decisions
what other decisions could have been made by the authors of the study on fertility and religion?
- do not memorize, it is just to understand previous flashcard
- different cycle days (still reasonable)
- relationship status (ambiguous questions)
→ this leads to 180 different possible outcomes (if other decisions were taken)
(see pictures 3 & on)
! hard to assess how robust the finding is, if researchers are not open about the decisions made
Asymmetric attention
- bias in critical thinking
- rigorously checking unexpected results, but giving expected results a free pass
- motivated skepticism (skepticism biased towards things we don’t want to accept / agree with)
Black box argument
- (hypothesis myopia)
- something happens, and we observe the response
- we then infer an interpretation of the process that produced it
(see picture 6)
hypothesis myopia
- bias in critical thinking
- collecting evidence to support a hypothesis, while not looking for evidence against it, and ignoring other explanations
- prove hypothesis without looking for alternative interpretation
Texas sharpshooter
- bias in critical thinking
- seizing on random patterns in the data and mistaking them for interesting findings
Just-so storytelling
- bias in critical thinking
- finding stories after the fact to rationalize whatever the results turn out to be
- researchers are not necessarily aware of how their decisions influenced results, so they think that results are accurate and representative of state of the world
Actively Open-Minded Thinking (AOT)
- we should take into consideration all possibilities, evidence and goals
- pay attention to sufficiency, fairness and confidence
> e.g. were all possibilities considered? was the evidence considered in light of all hypotheses? were the relevant criteria applied? …
Article 2
What are the two main problems in research today?
- scientific field is more competitive than ever > emphasis on piling up publications with statistically significant results
- not good-enough tools when considering multiple variables
Article 2
What is Hypothesis Myopia?
- collect evidence to support just one hypothesis
- not look for evidence against it
- fail to consider other explanations
Article 2
what is an example of hypothesis myopia?
- Sally Clark convicted of murdering her two sons, because Sudden Infant Death Syndrome appeared very unlikely to happen twice in one family
- failed to consider base rate of double murder happening in a family
- the likelihood ratio was roughly 9:1 in favour of double SIDS over double murder
= they failed to account for other hypotheses and explanations for the event, and only looked for support for their initial hypothesis
Article 2
What is the Texas Sharpshooter?
- “drawing the target around the pattern of bullets already shot”
- pick the one option that explains the most agreeable results
Article 2
what is p-hacking?
- exploiting researcher degrees of freedom until p<0.05
- misuse of data analysis to find patterns in data that can be represented as statistically significant, thus increasing the risk of false positives
- e.g. perform many statistical tests and only report those that came back with significant results
Article 2
What is HARKing?
- report unexpected findings as having been predicted from the start
- hypothesizing after the results are known
Article 2
What is asymmetric attention?
- giving expected results a free pass, but rigorously check non-intuitive results
- e.g. in 88% of cases in which results did not align with the hypothesis, the inconsistencies were blamed on how the experiments were conducted, not on the researchers’ theory
Article 2
what is Just-so Storytelling?
- justifying results that come up after obtaining them
Article 2
what is JARKing?
- justifying after results are known
- rationalize why results should have come up a certain way but did not
Article 2
what are some solutions to the bias in researching?
- strong inference:
> explicitly considering competing hypotheses + developing experiments to test them
> tackles hypothesis myopia
- explicitly listing alternative explanations:
> reduces just-so storytelling
Article 2
Transparency
- share methods, data, computer code and results
- registered reports (presenting the plan for peer review before the experiment is run)
Article 2
Team of rivals
- adversarial collaboration (proponent-sceptic)
- team up with “rivals” in the field to get to the truth
- hard to carry out, because it’s hard for researchers to team up with people that will try to dismantle their research
Article 2
Blind data analysis
- researchers who do not know how close they are to desired results will be less likely to find what they are unconsciously looking for
Article 2
how can blind data analysis be carried out?
- write a program that creates alternative data sets (eg add random noise or move participants to different conditions)
> they carry out the analysis on the fake data and only at the end receive the real results, so the analysis cannot be fiddled with afterwards (see the sketch below)
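A minimal sketch of one way to do this in Python (my own illustration, not the article’s code; the column names are made up): the condition labels are shuffled, so the whole analysis pipeline can be written and frozen before anyone sees the real group assignment.

import numpy as np
import pandas as pd

def blind(df, label_col="condition", seed=0):
    # return a copy of the data with the condition labels randomly permuted,
    # so analysts cannot tell how close they are to the "desired" result
    rng = np.random.default_rng(seed)
    blinded = df.copy()
    blinded[label_col] = rng.permutation(blinded[label_col].values)
    return blinded

df = pd.DataFrame({"condition": ["treatment"] * 5 + ["control"] * 5,
                   "score": [4, 6, 5, 7, 5, 3, 4, 6, 5, 4]})
analysis_data = blind(df)   # develop and finalize the analysis on this blinded copy
# only once the analysis script is frozen is it run on the original df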
Article 3
What are the problems with false positives?
(Rejecting the null hypothesis when true)
- particularly persistent mistake in research
- researchers have little incentive to pursue null results (they will not be published)
- false positives waste resources
- published false positives risk damaging the credibility of the scientific field and of the researchers involved
Article 3
how do false positives come to be?
- because of the researchers’ degrees of freedom
- with all the decisions researchers can make in the study design and analysis plan, the likelihood of false positives is higher than 0.05 (alpha)
Article 3
where does this exploratory behavior stem from?
- ambiguity in what decision is best
- researcher’s desire to find statistically significant result
Article 3
what are some possible degrees of freedom of researchers (rDf)?
- choosing among dependent variables
- using covariates
- reporting subsets of experimental conditions
- choosing sample size
! an experiment conducted across these rDf showed that they can increase the false-positive rate up to 50%
Article 3
What are the requirements for authors?
- solution to problem of false-positive publications
- authors must decide the rule for terminating data collection before it begins, and report the rule in the article
- authors must collect at least 20 observations per cell or provide a compelling cost-of-data-collection justification
- authors must report all experimental conditions, including failed manipulations
- if observations are eliminated, authors must also report what the statistical results are if those observations are included
- if analysis includes a covariate, authors must report the statistical results of the analysis without the covariate
Article 3
Sample-size rDf
- many researchers stop data collection on the basis of interim data analysis (an estimated 70%)
- e.g. stop collecting data when statistical significance is obtained, or when the number of observations reaches 50
- based on the wrong idea that an effect that is significant with a small sample would also be significant with a larger sample
Article 3
what are the guidelines for reviewers?
- solution to problem of false-positive publications
- reviewers should ensure that authors follow the requirements
- reviewers should be more tolerant of imperfections in results
- reviewers should require authors to demonstrate that their results do not hinge on arbitrary analytic decisions
> reviewers should ask for the alternatives to the arbitrary decisions made by the authors
> and ensure that arbitrary decisions are consistent across studies
- if the justifications for data collection or analysis are not compelling, reviewers should require the authors to conduct an exact replication
Article 3
what is wrong with the study about listening to music and age felt?
see image 7
Article 3
file-drawer problem
- only reporting the experiments that work
Article 3
what is a solution for the file-drawer problem?
- asking researchers to submit studies independently of the result
> but how to enforce submission?
> and how to ensure disclosure of degrees of freedom?
- publishers should give incentives and reinforce disclosure practices, until it becomes common
Article 3
what are other possible solutions for rDf? what are their criticisms?
- Correcting alpha levels
> might be interpreted as yet another rDf
> there is no clear effect of a specific rDf on findings, therefore no clear direction for correcting alpha
- Using Bayesian statistics
> increases rDf (additional judgements, e.g. about the prior distribution)
> creates a new set of analyses that can themselves be applied selectively to the data
- Using conceptual replications
> researchers might choose different conditions and report different measures
- Posting materials and data
> puts too big a cost on the reader and reviewer
> a dropped condition can be removed from the raw data as well (= still no transparency)
Article 1
what are two widespread problems of science?
- science is ignored
> judgements that could be better made by computers are made by humans (e.g. selection procedures in university)
- “replication crisis”
> most studies cannot be replicated
> journals will publish studies with most surprising results (= with higher chances of being wrong)
Article 1
Notes - bias in science
- journals are becoming more concerned with financial conflict of interest (studies showing results that sponsors want)
- cognitive dissonance occurs when researchers defend their positions despite evidence that they are wrong
Article 1
Actively open-minded thinking (AOT)
- thinkers should be open to challenges and seek out challenges proactively
> e.g. thinking of alternative possible conclusions, asking questions about interpretations of conclusion, …
Article 1
statistical control
- persistence in the methodology used
> often interaction effects are interpreted badly and interactions disappear when we change measurement tools
> we must not interpret interactions unless the additional variable reverses the direction of the effect (cross-over interaction)
Article 1
what are the bases of AOT?
- search
> possibilities (possible answers to the questions we are asking)
> evidence (facts and beliefs that influence the evaluation of possibilities)
> goals (what we apply to evaluate possibilities in light of the evidence)
- inference
Article 1
conjectures and refutations
- don’t wait around for contrasting results, look proactively to more data as clearly as possible
Article 1
My-side bias
- tendency to search for reasons supporting a favoured conclusion
- ignoring alternative explanations, …