Lecture 2: Effect sizes and a-priori power analyses Flashcards
What is the main limitation of statistical significance?
With many participants, results easily become significant. Significance therefore does not directly tell you how large or clinically significant an effect is, and the criterion for significance is arbitrary.
What are effect sizes?
standardized measures of how large an effect is. These are the types for continuous outcome measures:
* Pearson’s r (.1 = small, .3 = medium, .5 = large)
* Cohen’s d (0.2 = small, 0.5 = medium, 0.8 = large)
* Hedges’ g (0.2 = small, 0.5 = medium, 0.8 = large)
What is Cohen’s d (check slide 20)?
(y1 - y2) / SD_pooled, where y1 and y2 are the two group means and SD_pooled is the pooled standard deviation
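A minimal sketch of this formula in Python. The scores are made up for illustration, and the pooled SD is computed in the usual way (weighting each group's variance by its degrees of freedom), which is assumed to match the slide.

```python
from math import sqrt
from statistics import mean, variance


def pooled_sd(group1, group2):
    """Pooled SD: the two sample variances weighted by their degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return sqrt(pooled_var)


def cohens_d(group1, group2):
    """Difference between the group means, expressed in pooled-SD units."""
    return (mean(group1) - mean(group2)) / pooled_sd(group1, group2)


# Hypothetical scores, not taken from the lecture
intervention = [12, 14, 16, 18, 20]
control = [10, 12, 14, 16, 18]
print(round(cohens_d(intervention, control), 2))
```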
What is Hedges’ g (check slide 20)?
Cohen’s d * J, where J is a correction factor for small samples
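A small sketch of the correction. The flashcard does not define J, so the code below assumes the commonly used approximation J = 1 - 3 / (4 * df - 1) with df = n1 + n2 - 2. Check slide 20 for the exact form used in the lecture.

```python
def correction_factor(n1, n2):
    """Small-sample correction J (assumed approximation: 1 - 3 / (4 * df - 1))."""
    df = n1 + n2 - 2
    return 1 - 3 / (4 * df - 1)


def hedges_g(d, n1, n2):
    """Hedges' g: Cohen's d multiplied by the correction factor J."""
    return d * correction_factor(n1, n2)


# J is clearly below 1 for small samples and approaches 1 for large ones
print(round(correction_factor(5, 5), 3))
print(round(correction_factor(100, 100), 3))
```

Because J is always slightly below 1, g is slightly smaller than d, and the difference shrinks as the sample grows.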
What is the difference between standard deviation and standard error?
SD = √var
SE = √(var/N) = SD/√N
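To make the difference concrete, a short sketch with made-up scores: the SD describes the spread of individual scores, while the SE describes the uncertainty of the mean and shrinks as N grows.

```python
from math import sqrt
from statistics import variance


def standard_deviation(scores):
    """SD: square root of the sample variance."""
    return sqrt(variance(scores))


def standard_error(scores):
    """SE of the mean: square root of (variance / N)."""
    return sqrt(variance(scores) / len(scores))


# Hypothetical scores, not taken from the lecture
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(standard_deviation(scores), 2))
print(round(standard_error(scores), 2))
```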
Which d’s are common in intervention research?
d= 0.8 for intervention vs waiting list control
d= 0.5 for intervention vs other intervention
-> difficult to grasp how clinically meaningful an effect is
How can the issue of clinical meaning be overcome?
By standardizing scores to a population norm and establishing a cut-off point for being recovered. Also by using effect sizes for discrete outcome measures:
* Risk ratio
* Odds ratio (1.5 = small, 3.5 = medium, 9 = large)
* Number Needed to Treat (NNT)
Odds ratio
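For each condition, compute the odds of recovery: the number recovered divided by the number not recovered. The odds ratio is the odds in one condition divided by the odds in the other condition. This means that the odds of recovering (versus not recovering) are x times higher after one condition than after the other condition.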
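A sketch of the three discrete effect sizes listed above, computed from a 2x2 table. The counts are hypothetical, and NNT is taken here as 1 divided by the difference in recovery rates, which is the usual definition but is not spelled out on the flashcard.

```python
def risk_ratio(rec_a, not_a, rec_b, not_b):
    """Recovery rate in condition A divided by recovery rate in condition B."""
    return (rec_a / (rec_a + not_a)) / (rec_b / (rec_b + not_b))


def odds_ratio(rec_a, not_a, rec_b, not_b):
    """Odds of recovery (recovered / not recovered) in A divided by the odds in B."""
    return (rec_a / not_a) / (rec_b / not_b)


def number_needed_to_treat(rec_a, not_a, rec_b, not_b):
    """1 / difference in recovery rates (assumed definition of NNT)."""
    return 1 / (rec_a / (rec_a + not_a) - rec_b / (rec_b + not_b))


# Hypothetical counts: intervention 30 recovered, 20 not; control 15 recovered, 35 not
counts = (30, 20, 15, 35)
print(round(risk_ratio(*counts), 2))
print(round(odds_ratio(*counts), 2))
print(round(number_needed_to_treat(*counts), 2))
```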
What are the limitations for discrete measures?
- effect sizes are warped
- comparability across studies is limited
- standardized scores can become more abstract
What is the role of effect sizes?
- Provide a standardized measure of the strength of an effect
- Allow to draw conclusions across multiple studies (meta-analysis)
- Help to calculate how many subjects you need when planning a new study (power analysis)
What is an a-priori power analysis used for?
To determine how many subjects are needed to have a reasonable chance (usually 80%) of obtaining a significant result
What is needed to determine the required N?
- significance level/alpha (usually .05)
- effect size that you expect (e.g. d = 0.8)
- desired power (usually 0.8)
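A rough sketch of how these three ingredients combine into a required N. It assumes a two-sided comparison of two equally sized groups and uses a normal approximation, so it is not the exact t-test calculation that dedicated power software performs (exact results tend to be slightly larger).

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.8):
    """Approximate subjects per group for a two-sided, two-group comparison.

    Normal approximation (an assumption of this sketch):
    n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Smaller expected effects require many more subjects
for d in (0.8, 0.5, 0.2):
    print(d, n_per_group(d))
```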
Why are the selection criteria for clients a shortcoming?
Selected clients have low comorbidity and less complex forms of psychopathology. Within-group variance may therefore be low (which could result in a higher F value when not correct). Results thus may not generalize -> generalization crisis in psychology. Solutions: meta-analysis and N=1 study
Why is it an issue that RCTs focus on specific symptom outcome measures?
These measures may not capture clients' key problems or their causes -> broader issue of measuring latent constructs in psychology. Solutions: measures like quality of life and other clinical significance measures
Why is it an issue that RCTs report effects on various outcome measures?
Multiple tests increase the probability of finding significant results, so a study can show different effects (some significant, some not) -> analytic flexibility issue in psychology. Solutions: register the RCT with a priori hypotheses, and meta-analysis
Why is it an issue that RCTs focus on pre-post difference in symptoms?
It does not tell you the mechanism through which the intervention works and does not account for non-linear change processes. Solutions: study mediators, N=1 study
What are shortcomings of clinical practice?
- Overconfidence in clinical impression and tailored interventions as reliability is low and unclear if/how treatment works. Big issue in complex decision-making-> look at moderators
- No systematic evaluation as difficult to tell when to stop a treatment, and knowledge stops when therapist retires-> evaluate intervention systematically, N=1 study
Why are both RCTs and clinical info needed?
RCTs provide an empirical basis for the effectiveness of therapy, and clinical experience is needed to monitor treatment in the real world
What is the relationship between statistical significance and sample size?
Statistical significance is strongly determined by the number of subjects. As the sample size increases, the result is more likely to become statistically significant.
Because of this, even a small effect can be statistically significant with a large sample size.
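A quick illustration of this point: the same small effect (d = 0.1, a hypothetical value) tested with increasingly large groups. The p-value is approximated with a z-test for two equally sized groups, which is an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(d, n_per_group):
    """Approximate two-sided p-value for an observed d with n subjects per group."""
    z = d * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The effect size stays the same, only the sample size changes
for n in (50, 500, 5000):
    print(n, round(two_sided_p(0.1, n), 4))
```

The effect is no larger or more clinically meaningful at N = 5000 than at N = 50; only the p-value has changed.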
How would using the control group's standard deviation instead of the pooled standard deviation affect the effect size?
The standard deviation of the control group is smaller (SD = √3) than the pooled standard deviation of both groups (SD = 2). By using the smaller SD of the control group, Cohen’s d will become larger.
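The two SDs from this card can be plugged into the formula for d directly. The mean difference of 2 points is an assumed value for illustration, since the card does not give the group means.

```python
from math import sqrt

mean_difference = 2.0   # assumed for illustration, not given on the card
sd_control = sqrt(3)    # SD of the control group (from the card)
sd_pooled = 2.0         # pooled SD of both groups (from the card)

d_pooled = mean_difference / sd_pooled
d_control = mean_difference / sd_control

# Dividing by the smaller SD gives the larger effect size
print(round(d_pooled, 2), round(d_control, 2))
```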
How to interpret AUC?
An AUC of x% means that when a subject from the intervention group is compared to a subject from the control group, in x% of the cases the subject from the intervention group has a better outcome than the subject from the control group.
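This interpretation can be reproduced by literally comparing every intervention subject with every control subject. The sketch assumes that higher scores mean a better outcome and counts ties as half a win; the scores are made up.

```python
def auc(intervention, control):
    """Proportion of intervention-control pairs in which the intervention
    subject has the better (here: higher) score. Ties count as half."""
    wins = 0.0
    for x in intervention:
        for y in control:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(intervention) * len(control))


# Hypothetical outcome scores, not taken from the lecture
print(auc([5, 6, 7, 8], [3, 4, 5, 6]))
```

An AUC of 0.5 would mean the two groups are indistinguishable, and 1.0 would mean every intervention subject outperforms every control subject.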
Why is it needed to perform an a-priori power analysis?
Limitation of too few subjects: high probability that the result is not significant while there may be a true effect (Type II error). Limitations of too many subjects: many patients unnecessarily receive a treatment that may not be effective or may have side effects (= unethical); wasted research resources (i.e., research hours, money); trivial differences might become significant.
How does a smaller alpha value affect sample size?
With a smaller alpha, the critical F-value becomes larger, so a larger sample size is needed for the observed F-value to exceed this threshold.
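A sketch of this relationship under a normal approximation for two equally sized groups. The squared critical z-value is used as a large-sample stand-in for the critical F with one numerator degree of freedom, which is an assumption of this sketch rather than the exact F-distribution.

```python
from math import ceil
from statistics import NormalDist


def critical_z(alpha):
    """Two-sided critical z-value for a given alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def n_per_group(d, alpha, power=0.8):
    """Approximate subjects per group (normal approximation, two-sided test)."""
    z_total = critical_z(alpha) + NormalDist().inv_cdf(power)
    return ceil(2 * (z_total / d) ** 2)


# As alpha shrinks, the threshold rises and so does the required sample size
for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(critical_z(alpha) ** 2, 2), n_per_group(0.8, alpha))
```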