Lecture 2: Effect sizes and a-priori power analyses Flashcards
What is the main limitation of statistical significance?
With many participants, results easily reach significance. Significance therefore does not directly address how large or clinically meaningful an effect is, and the criterion for significance is arbitrary.
What are effect sizes?
Standardized measures of how large an effect is. For continuous outcome measures, the common types are:
* Pearson’s r (.1 = small, .3 = medium, .5 = large)
* Cohen’s d (0.2 = small, 0.5 = medium, 0.8 = large)
* Hedges’ g (0.2 = small, 0.5 = medium, 0.8 = large)
What is Cohen’s d (check slide 20)?
d = (y1 - y2) / SD_pooled, where y1 and y2 are the two group means and SD_pooled is the pooled standard deviation
What is Hedges’ g (check slide 20)?
g = Cohen’s d * J, where J is a correction factor for small samples
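The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function names are made up here, and J is computed with the common approximation J = 1 - 3 / (4 * df - 1) with df = n1 + n2 - 2, which is an assumption about which version of J the slide uses.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group1, group2):
    """Difference between group means divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)


def hedges_g(group1, group2):
    """Cohen's d multiplied by the small-sample correction factor J (approximation)."""
    df = len(group1) + len(group2) - 2
    j = 1 - 3 / (4 * df - 1)  # assumed approximation of J
    return cohens_d(group1, group2) * j
```

Because J is always below 1, g is slightly smaller than d, and the two converge as the sample grows.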
What is the difference between standard deviation and standard error?
SD = √var
SE = √(var/N) = SD/√N
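As a small illustration of the difference, the sketch below computes both quantities for one sample. The function name is hypothetical, and it assumes the sample variance (dividing by N - 1) is the variance meant on the card.

```python
from math import sqrt
from statistics import variance


def sd_and_se(scores):
    """Return (standard deviation, standard error of the mean) for one sample."""
    var = variance(scores)         # sample variance, assumed here
    sd = sqrt(var)                 # spread of individual scores
    se = sqrt(var / len(scores))   # uncertainty of the sample mean
    return sd, se
```

The SD stays roughly the same as N grows, while the SE shrinks, which is why large samples reach significance so easily.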
Which d’s are common in intervention research?
d = 0.8 for intervention vs waiting list control
d = 0.5 for intervention vs other intervention
-> difficult to grasp how clinically meaningful an effect is
How can the issue of clinical meaning be overcome?
By standardizing scores against a population norm and establishing a cut-off point for being recovered. Also by using effect sizes for discrete outcome measures:
* Risk ratio
* Odds ratio (1.5 = small, 3.5 = medium, 9 = large)
* Number Needed to Treat (NNT)
Odds ratio
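For each condition, divide the number recovered by the number not recovered to get the odds of recovery. The odds ratio is the odds in one condition divided by the odds in the other. An odds ratio of x means the odds of recovering (rather than not recovering) are x times higher after one condition than after the other.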
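The three discrete effect sizes can be computed from the same four counts. The sketch below is illustrative only: the function name is hypothetical, and NNT is taken as 1 divided by the difference in recovery rates, which is the usual definition but is not spelled out on the cards.

```python
def discrete_effect_sizes(rec_a, not_rec_a, rec_b, not_rec_b):
    """Risk ratio, odds ratio and NNT for condition A versus condition B."""
    rate_a = rec_a / (rec_a + not_rec_a)   # proportion recovered in A
    rate_b = rec_b / (rec_b + not_rec_b)   # proportion recovered in B
    risk_ratio = rate_a / rate_b
    odds_ratio = (rec_a / not_rec_a) / (rec_b / not_rec_b)
    nnt = 1 / (rate_a - rate_b)            # clients to treat for one extra recovery
    return risk_ratio, odds_ratio, nnt
```

With made-up counts of 30 recovered and 20 not recovered after the intervention, against 15 and 35 in the control condition, `discrete_effect_sizes(30, 20, 15, 35)` gives a risk ratio of 2, an odds ratio of 3.5 and an NNT of about 3.3. The function does not guard against zero counts or equal recovery rates.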
What are the limitations for discrete measures?
- effect sizes are warped
- comparability across studies is limited
- standardized scores can become more abstract
What is the role of effect sizes?
- Provide a standardized measure of the strength of an effect
- Allow to draw conclusions across multiple studies (meta-analysis)
- Help to calculate how many subjects you need when planning a new study (power analysis)
What is an a-priori power analysis used for?
To determine how many subjects are needed to have a reasonable chance (80%) of obtaining a significant result
What is needed to determine the required N?
- significance level/alpha (usually .05)
- effect size that you expect (e.g. d = 0.8)
- desired power (usually 0.8)
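The three ingredients above are enough for a rough sample-size estimate. The sketch below assumes a two-sided comparison of two equally sized independent groups and uses the normal approximation n per group = 2 * ((z_alpha + z_power) / d)^2. The function name is hypothetical, and dedicated power-analysis software based on the t distribution will return slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.8):
    """Approximate participants needed per group (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)
```

Under these assumptions, d = 0.8 needs about 25 participants per group and d = 0.5 about 63, which shows how quickly the required N rises as the expected effect shrinks.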
Why are the selection criteria for clients a shortcoming?
Selected clients have low comorbidity and less complex forms of psychopathology. Within-group variance can therefore be low (which could result in a higher F value if not corrected for). Results thus may not generalize -> generalization crisis in psychology. Solutions: meta-analysis and N=1 studies
Why is it an issue that RCTs focus on specific symptom outcome measures?
These measures do not capture clients' key problems or their causes -> broader issue of measuring latent constructs in psychology. Solutions: measures like quality of life and other clinical significance measures
Why is it an issue that RCTs report effects on various outcome measures?
Running multiple tests increases the probability of finding a significant result, so a single study shows a mix of effects (some significant, some not) -> analytic flexibility issue in psychology. Solutions: register the RCT with a priori hypotheses, and meta-analysis
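How quickly multiple tests inflate the chance of a false positive can be sketched with the familywise error rate, 1 - (1 - alpha)^k. This assumes k independent tests with no true effects, which is a simplification because outcome measures in one RCT are usually correlated. The function name is hypothetical.

```python
def familywise_error_rate(k, alpha=0.05):
    """Chance of at least one false positive across k independent tests."""
    return 1 - (1 - alpha) ** k
```

Under these assumptions, five outcome measures tested at alpha = .05 already give roughly a 23% chance of at least one significant result by chance alone.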