lecture 15 - power Flashcards
type I error
falsely rejecting the null hypothesis
main focus of statistics
false positive
type II error
failing to reject the null hypothesis when it is actually false (a false negative)
focus of power analysis
statistical power
the intuition: power is the potential to detect a particular effect (e.g. a difference) if it really exists
say an effect definitely exists and it has a large effect size
then it will be relatively easy to ‘detect’, e.g. to get a statistically significant difference in an experiment with relatively few participants. the experiment has high power even with a small n.
in contrast, say an effect definitely exists but it has a small effect size.
an experiment with only a few participants would be unlikely to detect the difference, i.e. to find a statistically significant difference. the experiment has low power.
OK, but if I know an effect definitely exists, then why do I care about power? For that matter, why would I even run the experiment?! It’s not about knowing the effect definitely exists, but rather about assuming it does, in order to see what you’d need to do to detect it if it actually exists.
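A rough way to see this intuition in numbers is to simulate many experiments in which the effect really exists and count how often the t-test comes out significant. The sketch below is only an illustration under stated assumptions: normal populations with SD 1, 20 participants per group, Cohen's benchmark values of d, and a critical t of 2.024 (two-tailed, α = .05, df = 38) hard-coded from a standard t table. The function name `simulated_power` is made up for this example.

```python
import math
import random
import statistics


def simulated_power(d, n, t_crit, n_sims=2000, seed=0):
    """Share of simulated two-group experiments with |t| above t_crit.

    Scores come from normal populations with SD 1 whose means differ by d,
    so d is the true effect size in standard-deviation units.
    """
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_sims):
        group1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        group2 = [rng.gauss(d, 1.0) for _ in range(n)]
        # With equal group sizes the pooled variance is the mean of the two variances.
        pooled_var = (statistics.variance(group1) + statistics.variance(group2)) / 2
        diff = statistics.mean(group2) - statistics.mean(group1)
        t = diff / math.sqrt(2 * pooled_var / n)
        if abs(t) > t_crit:
            significant += 1
    return significant / n_sims


# Assumed table value: two-tailed critical t, alpha = .05, df = 20 + 20 - 2 = 38.
T_CRIT_DF38 = 2.024

power_large = simulated_power(d=0.8, n=20, t_crit=T_CRIT_DF38)
power_small = simulated_power(d=0.2, n=20, t_crit=T_CRIT_DF38)
print(f"large effect (d = 0.8), 20 per group: power ~ {power_large:.2f}")
print(f"small effect (d = 0.2), 20 per group: power ~ {power_small:.2f}")
```

With the same number of participants, the large effect should be detected in most simulated experiments and the small effect only rarely, which is the high power versus low power contrast described on this card.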
power the intuition continued
High power: high probability of detecting an effect when there actually is an effect! ☺
* low power: low probability of detecting an effect even though it actually exists!
* If a result is not statistically significant, power can help to clarify whether the difference:
– doesn’t exist, or
– exists but the experiment just missed it
High power suggests the difference doesn’t exist (or you would likely have detected it).
But low power does not clarify which is likely the case.
NOTE. The effect size in the power calculation should not be from the current data
Remember that to do a power calculation you need an effect size that you want to detect. Importantly, this effect size should not usually be the effect size from the current experiment’s data. You can calculate power after the fact from the effect size in your data, but this “post hoc” power isn’t very helpful for clarifying the current (nonsignificant) result, because it is directly related to the nonsignificant p-value you got for that data. Rather, you want to do power calculations before you’ve collected data, to determine how much data to collect, based on an effect size from somewhere else, e.g. from a different data set, from some aspect of relevant theory, etc.
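To see why post hoc power adds nothing beyond the p-value, here is a minimal sketch that computes "observed" power directly from a p-value. It assumes a two-tailed z-test (a simplification of the t-test used in the lecture), and `post_hoc_power` is a name invented for this illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def post_hoc_power(p_value, alpha=0.05):
    """'Observed' power of a two-tailed z-test, computed from its own p-value."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    z_obs = Z.inv_cdf(1 - p_value / 2)  # the |z| that produced this p-value
    # Pretend the observed effect is the true effect and ask how often
    # a replication would land beyond either critical value.
    return Z.cdf(z_obs - z_crit) + Z.cdf(-z_obs - z_crit)


p_values = [0.05, 0.10, 0.20, 0.50]
powers = [post_hoc_power(p) for p in p_values]
for p, power in zip(p_values, powers):
    print(f"p = {p:.2f} -> post hoc power ~ {power:.2f}")
```

Under these assumptions the post hoc power is fixed entirely by the p-value: a result sitting exactly at p = .05 has post hoc power of about one half, and any nonsignificant result has less, so the calculation cannot tell you anything the p-value has not already told you.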
the sampled populations
null hypothesis - according to null hypothesis populations are the same
experimental hypothesis - populations are different
see the graphs in the lecture notes
the effect of increasing N on type II error
increasing N reduces the type II error rate and so increases power (power = 1 - type II error rate)
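A small sketch of how the type II error rate shrinks as N grows, for a fixed effect size. It assumes a medium effect (d = 0.5), α = .05 two-tailed, the relation δ = d√(n/2) used later in these notes, and a normal approximation to the t-test; `approx_power` is a hypothetical helper, not something from the lecture.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-tailed, two-group comparison."""
    delta = d * sqrt(n_per_group / 2)
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(delta - z_crit) + Z.cdf(-delta - z_crit)


sizes = [10, 25, 63, 100]
betas = [1 - approx_power(0.5, n) for n in sizes]  # type II error rates
for n, beta in zip(sizes, betas):
    print(f"n = {n:3d} per group: type II error ~ {beta:.2f}, power ~ {1 - beta:.2f}")
```

Each increase in n lowers the type II error rate, and because power is just 1 minus that rate, power rises by the same amount.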
influences on power - assuming the effect size
$t = \dfrac{\bar{D}}{S / \sqrt{N}}$
t increases with the mean difference, $\bar{D}$
t increases with decreasing standard deviation, S
t increases with increasing N
look at critical value table
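The three influences on t can be checked by plugging numbers into the formula. This sketch assumes D-bar and S are the mean and standard deviation of N difference scores; the function name and all the numbers are made up for illustration.

```python
from math import sqrt


def t_from_differences(mean_diff, sd_diff, n):
    """t = D-bar / (S / sqrt(N)) for a set of N difference scores."""
    return mean_diff / (sd_diff / sqrt(n))


baseline = t_from_differences(mean_diff=2.0, sd_diff=4.0, n=16)
bigger_diff = t_from_differences(mean_diff=4.0, sd_diff=4.0, n=16)  # D-bar doubled
smaller_sd = t_from_differences(mean_diff=2.0, sd_diff=2.0, n=16)   # S halved
bigger_n = t_from_differences(mean_diff=2.0, sd_diff=4.0, n=64)     # N quadrupled

print(f"baseline t        = {baseline}")
print(f"larger difference = {bigger_diff}")
print(f"smaller SD        = {smaller_sd}")
print(f"larger N          = {bigger_n}")
```

Doubling the mean difference or halving the standard deviation doubles t, but N has to be quadrupled to get the same gain, because N only enters through its square root.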
power - how many ptps do you need?
- Power depends on:
– The difference between means
– The variability of the scores
– The number of participants
- So, if we want to calculate the number of subjects needed, then we need to estimate the likely difference between means and the variability.
- We also need to set what power level we want – 0.8 is typical (i.e. 80% of the time the experiment will produce a significant result, if there really is an effect).
$t = \dfrac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\dfrac{S_p^2}{N_1} + \dfrac{S_p^2}{N_2}}}$
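As a sketch of how this between-subjects t is computed from raw scores: the notes do not define the pooled variance, so the code assumes the usual textbook definition (each group's variance weighted by its degrees of freedom). The scores and the name `pooled_t` are invented for the example.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group1, group2):
    """Between-subjects t using the pooled variance S_p^2."""
    n1, n2 = len(group1), len(group2)
    # Assumed definition: variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var / n1 + pooled_var / n2)


scores_a = [1, 2, 3, 4, 5]  # made-up scores, mean 3
scores_b = [3, 4, 5, 6, 7]  # made-up scores, mean 5
t = pooled_t(scores_a, scores_b)
print(f"t = {t:.2f}")
```

Here both groups have variance 2.5, so the denominator is √(2.5/5 + 2.5/5) = 1 and t is simply the difference between the means, -2.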
power - effect sizes
The difference between means, and the expected variability, can be estimated from pilot work or the previous literature.
* Combining these, we can calculate an “effect size”, e.g. Cohen’s d.
* That is, d is how many standard deviations there will be between the conditions.
* Cohen’s “rule of thumb” is that a d of 0.5 is a medium effect size. Large = 0.8, small = 0.2.
* A research-derived effect size is usually better than assuming “a medium effect size”.
Cohen’s $d = \dfrac{\bar{Y}_1 - \bar{Y}_2}{S_p}$, where $S_p = \sqrt{S_p^2}$ is the pooled standard deviation
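A minimal sketch of the effect size calculation from summary statistics, such as might come from pilot work. The means and pooled SD are hypothetical, and treating Cohen's benchmarks as hard cut-offs in `cohen_label` is a simplification made for this example only.

```python
def cohens_d(mean1, mean2, pooled_sd):
    """Difference between condition means in pooled-standard-deviation units."""
    return (mean1 - mean2) / pooled_sd


def cohen_label(d):
    """Rough label based on Cohen's rule-of-thumb values (0.2, 0.5, 0.8)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


# Hypothetical pilot data: condition means of 105 and 100, pooled SD of 10.
d = cohens_d(mean1=105, mean2=100, pooled_sd=10)
print(f"d = {d} ({cohen_label(d)})")
```

A 5-point difference with a pooled SD of 10 puts the conditions half a standard deviation apart, which is Cohen's "medium" benchmark.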
power - the components of calculating n
The formula we need, for a between-subjects t-test, is:
$n = \dfrac{2\delta^2}{d^2}$
where n is the number of participants per condition.
* So, in addition to the effect size, d, we’d need one other parameter, δ.
– This is a “noncentrality parameter” (essentially a way of saying “how wrong the null hypothesis is”).
* δ varies as a function of α level and power in the table below.
– For a power of 0.8 and an α of 0.05 (i.e. the typical values we would want to use), δ = 2.80
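The δ value of 2.80 can be reproduced, at least approximately, without the table. The sketch below assumes δ comes from a normal approximation, i.e. the two-tailed critical z for α plus the z-score that cuts off the desired power; the notes do not say how the table was built, so treat this as a plausibility check rather than the table's actual method. `delta_for` is a hypothetical name.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def delta_for(power, alpha=0.05):
    """Approximate delta: two-tailed critical z plus the z for the target power."""
    return Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)


delta = delta_for(power=0.8, alpha=0.05)
print(f"delta for power 0.8, alpha 0.05 ~ {delta:.2f}")
```

For power 0.8 and α = .05 this gives roughly 1.96 + 0.84, which rounds to the 2.80 quoted above.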
power - the calculation. sample size?
$n = \dfrac{2\delta^2}{d^2}$
Assume we are looking for a “medium” effect (d = 0.5), and that we want a power of 0.8 and an α of 0.05 (this is a pretty typical sort of experiment). Then:
$n = \dfrac{2 \times 2.8^2}{0.5^2} = \dfrac{15.68}{0.25} = 62.72$
- That is, we would need 63 subjects per condition to get a reasonably high level (0.8) of power
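The same calculation as a short sketch, which also makes it easy to try other effect sizes. δ = 2.80 is the table value quoted in these notes for power 0.8 and α = .05; the function name `n_per_condition` is made up.

```python
from math import ceil


def n_per_condition(d, delta):
    """n = 2 * delta^2 / d^2, the (unrounded) number of subjects per condition."""
    return 2 * delta**2 / d**2


DELTA = 2.80  # table value for power 0.8, alpha 0.05

raw_n = n_per_condition(d=0.5, delta=DELTA)
needed = ceil(raw_n)  # always round up: you cannot test a fraction of a person
print(f"medium effect (d = 0.5): n = {raw_n:.2f}, so {needed} per condition")

# Cohen's small and large benchmarks, for comparison.
for d in (0.2, 0.8):
    print(f"d = {d}: n = {n_per_condition(d, DELTA):.1f} per condition")
```

Because d is squared in the denominator, halving the effect size you want to detect roughly quadruples the number of subjects needed.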
power - how much power will my experiment have?
We have a formula relating effect size and the number of subjects:
$n = \dfrac{2\delta^2}{d^2}$
* If we re-arrange that formula it becomes
$\delta = d\sqrt{\dfrac{n}{2}}$
* We can then use this and the table of δ values to work out the power of a particular experiment
assume we are looking for a ‘medium’ effect (d = 0.5) and that we have 25 subjects in each of two groups
$\delta = d\sqrt{n/2} = 0.5\sqrt{25/2} = 1.77$
given δ = 1.77, power is about 0.44. that is, a typical experiment under these conditions would reach significance less than half the time when there is a real effect of this size.
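Going the other way (from n to power) can be sketched like this. The δ step follows the rearranged formula exactly; the conversion from δ to power uses a normal approximation in place of the lecture's table, which is an assumption, and both function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def delta_from_n(d, n_per_group):
    """delta = d * sqrt(n / 2)."""
    return d * sqrt(n_per_group / 2)


def approx_power_from_delta(delta, alpha=0.05):
    """Two-tailed power under a normal approximation (stands in for the table)."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(delta - z_crit) + Z.cdf(-delta - z_crit)


delta = delta_from_n(d=0.5, n_per_group=25)
power = approx_power_from_delta(delta)
print(f"delta ~ {delta:.2f}, approximate power ~ {power:.2f}")
```

The approximation comes out a little below the 0.44 read from the table, presumably because a table has to be entered at the nearest listed δ. Either way the conclusion is the same: with 25 per group, a real medium effect would be detected less than half the time.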
power what you need to know
- What power is conceptually.
- How power relates to effect size.
- That you could (hypothetically) calculate power and use it to:
- determine the number of participants you need in an experiment
- clarify if a result isn’t significant because the effect doesn’t exist or because the experiment lacked the power to detect it.
- Software can help you calculate power (SPSS, G*Power, R, etc.)
power what you don’t need to know
* The formula relating power to n, α, and δ
* The table relating power to n, α, and δ
* How to calculate n based on power
* How to calculate power based on n