lecture 15 - power Flashcards

1
Q

type I error

A

falsely rejecting the null hypothesis
the main focus of conventional significance testing
false +ve (false positive)

2
Q

type II error

A

falsely accepting (failing to reject) the null hypothesis
the focus of power analysis
false -ve (false negative)

3
Q

statistical power

A

the intuition - power is the potential to detect a particular effect, e.g. a difference, if it really exists
say an effect definitely exists and it has a large effect size.
then it will be relatively easy to 'detect', e.g. get a statistically significant difference, in an experiment with relatively few ptps. the experiment has high power even with small N.
in contrast, say an effect definitely exists but it has a small effect size.
an experiment with only a few ptps would be unlikely to detect the difference, e.g. find a statistically significant result. the experiment has low power.

OK, but if I know an effect definitely exists, then why do I care about power? For that matter, why would I even run the experiment?! The point is not knowing that the effect definitely exists, but rather assuming it does in order to work out what you'd need to do to detect it, if it actually exists.

4
Q

power the intuition continued

A

High power: high probability of detecting an effect when there actually is an effect! ☺
* Low power: low probability of detecting an effect even though it actually exists!
* If a result is not statistically significant, power can help to clarify whether:
the difference doesn't exist
or
the experiment just missed it
High power suggests the difference doesn't exist (or you would likely have detected it).
But low power does not clarify which is likely the case.
NOTE: the effect size in the power calculation should not come from the current data.

Remember, you need an effect size that you want to detect in order to do a power calculation. Importantly, this effect size should not usually be the effect size from the current experiment's data. You can calculate power after the fact based on the effect size you've got in your data, but this "post hoc" power isn't very helpful for clarifying the current (non-significant) result, because it is directly related to the non-significant p-value you've got for that data. Rather, you want to do power calculations before you've collected data, to determine how much data to collect, based on an effect size from somewhere else, e.g. a different data set, some aspect of relevant theory, etc.

5
Q

the sampled populations

A

null hypothesis - according to the null hypothesis, the sampled populations are the same

experimental hypothesis - the sampled populations are different

6
Q

look at the graphs in the lecture notes

A
7
Q

the effect of increasing N on type II error

A

increasing N reduces the Type II error rate, and so increases power (power = 1 − Type II error rate)

8
Q

influences on power - assuming the effect size

A

t = D-bar / (S / √N)

t increases with the mean difference, D-bar

t increases with decreasing standard deviation, S

t increases with increasing N

(see the critical value table)
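As a sketch, the paired-samples t statistic above can be computed directly from the difference scores (the data below are made up for illustration):

```python
import math

def paired_t(differences):
    """Paired-samples t statistic: t = D-bar / (S / sqrt(N))."""
    n = len(differences)
    d_bar = sum(differences) / n                     # mean difference, D-bar
    # sample standard deviation of the difference scores, S
    s = math.sqrt(sum((d - d_bar) ** 2 for d in differences) / (n - 1))
    return d_bar / (s / math.sqrt(n))

# hypothetical difference scores (condition A minus condition B)
diffs = [2.0, 1.5, 3.0, 0.5, 2.5, 1.0, 2.0, 1.5]
print(round(paired_t(diffs), 2))               # → 6.17
# doubling the data keeps D-bar the same but increases N, so t grows
print(paired_t(diffs * 2) > paired_t(diffs))   # → True
```

The second print illustrates the "t increases with increasing N" point: the mean difference and spread stay roughly the same, but the larger N shrinks the standard error.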

9
Q

power - how many ptps do you need?

A
  • Power depends on:
    – The difference between means
    – The variability of the scores
    – The number of participants
  • So, if we want to calculate the number of subjects
    needed, then we need to estimate the likely differences
    between means and the variability.
  • We also need to set what power level we want – 0.8 is
    typical (i.e. 80% of the time a significant result will
    come from the experiment – if there really is an effect).

t = (Y-bar1 − Y-bar2) / √(S²p/N1 + S²p/N2)
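A minimal sketch of this between-subjects t statistic, using pooled variance (the group scores are hypothetical, for illustration only):

```python
import math

def pooled_t(group1, group2):
    """Between-subjects t: t = (Ybar1 - Ybar2) / sqrt(S2p/N1 + S2p/N2)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    ss1 = sum((y - m1) ** 2 for y in group1)
    ss2 = sum((y - m2) ** 2 for y in group2)
    s2p = (ss1 + ss2) / (n1 + n2 - 2)    # pooled variance, S^2p
    return (m1 - m2) / math.sqrt(s2p / n1 + s2p / n2)

# hypothetical scores for two independent groups
g1 = [12, 14, 11, 15, 13]
g2 = [10, 11, 9, 12, 10]
print(round(pooled_t(g1, g2), 2))   # → 2.98
```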

10
Q

power - effect sizes

A

The differences between means, and the expected variability,
can be estimated from pilot work or the previous literature.
* Combining these, we can calculate an "effect size", e.g. Cohen's d.
* That is, d is how many standard deviations there will be
between the conditions.
* Cohen's "rule of thumb" is that a d of 0.5 is a medium effect
size. Large = 0.8, small = 0.2.
* A research-derived effect size is usually better than
assuming "a medium effect size"….

Cohen's d = (Y-bar1 − Y-bar2) / Sp   (where Sp = √S²p is the pooled standard deviation)
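A minimal sketch of computing Cohen's d from two groups' scores, using the pooled standard deviation (the pilot data are made up for illustration):

```python
import math

def cohens_d(group1, group2):
    """Cohen's d = (Ybar1 - Ybar2) / Sp, where Sp is the pooled SD."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    ss1 = sum((y - m1) ** 2 for y in group1)
    ss2 = sum((y - m2) ** 2 for y in group2)
    sp = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))   # pooled standard deviation
    return (m1 - m2) / sp

# hypothetical pilot data
g1 = [12, 14, 11, 15, 13]
g2 = [10, 11, 9, 12, 10]
print(round(cohens_d(g1, g2), 2))   # → 1.89, "large" by Cohen's rule of thumb
```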

11
Q

power - the components of calculating n

A

The formula we need, for a between-subjects t-test, is:

n = 2δ^2 / d^2

* So, in addition to the effect size, d, we need one other parameter, δ.
– This is a "noncentrality parameter" (essentially a way of saying
"how wrong the null hypothesis is").
* δ varies as a function of α level and power (see the table in the notes).
– For a power of 0.8 and an α of 0.05 (i.e. the typical values we would
want to use), δ = 2.80.

12
Q

power - the calculation. sample size?

A

N = 2δ^2 / d^2

Assume we are looking for a "medium" effect (d = 0.5),
and that we want a power of 0.8 and an α of 0.05 (this is a
pretty typical sort of experiment). Then:

N = (2 × 2.8^2) / 0.5^2 = 62.72

  • That is, we would need 63 subjects per condition to get a
    reasonably high level (0.8) of power
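The hand calculation can be checked in a few lines (δ = 2.80 is the tabled value for power 0.8 at α = 0.05; we round up because participants come in whole numbers):

```python
import math

def n_per_group(delta, d):
    """n = 2*delta^2 / d^2, rounded up to whole participants."""
    return math.ceil(2 * delta ** 2 / d ** 2)

# delta = 2.80 corresponds to power 0.8 at alpha 0.05 (tabled value)
print(n_per_group(2.80, 0.5))   # → 63
```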
13
Q

power - how much power will my experiment have?

A

We have a formula relating effect size and the number of subjects:

n = 2δ^2 / d^2

* If we rearrange that formula it becomes:

δ = d √(n/2)

* We can then use this and the table of δ values to work out the
power of a particular experiment.

assume we are looking for a "medium" effect (d = 0.5) and that we have 25 subjects in each of two groups:

δ = d √(n/2) = 0.5 √(25/2) = 1.77

given δ = 1.77, power is about 0.44. that is, a typical experiment under these conditions would reach significance less than half the time, even when a real effect of this size exists.
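The lecture reads power off a table of δ values. As a rough stand-in, a normal approximation (power ≈ P(Z > z_crit − δ); this approximation is my assumption for illustration, not the lecture's exact table) gives a similar answer:

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def approx_power(delta, z_crit=1.96):
    """Two-tailed power, normal approximation:
    power ≈ P(Z > z_crit - delta) + P(Z < -z_crit - delta)."""
    return (1 - norm_cdf(z_crit - delta)) + norm_cdf(-z_crit - delta)

n, d = 25, 0.5
delta = d * math.sqrt(n / 2)             # δ = d√(n/2) ≈ 1.77
print(round(approx_power(delta), 2))     # → 0.42 (the lecture's table gives about 0.44)
```

The small discrepancy (0.42 vs 0.44) comes from the approximation; the qualitative conclusion - power well below 0.8 - is the same.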

14
Q

power what you need to know

A
  • What power is conceptually.
  • How power relates to effect size.
  • That you could (hypothetically) calculate power and use it to:
  • determine the number of participants you need in an experiment;
  • clarify whether a result isn't significant because the effect doesn't exist or because
    the experiment lacked the power to detect it.
  • Software can help you calculate power (SPSS, G*Power, R, etc.)
15
Q

power what you don’t need to know

A

The formula relating power to n, α and δ
* The table relating power to n, α and δ
* How to calculate n based on power
* How to calculate power based on n

16
Q

statistical power

A

It is important to control the Type I error rate so that we don't too often mistakenly think that an effect is genuine when it is not. The opposite problem relates to the Type II error: how often we will miss an effect in the population that genuinely exists. If the Type II error rate is high then we will miss a lot of genuine effects; if it is low, we will miss fewer. The ability of a test to find an effect is known as its statistical power.
The power of a test is the probability that a given test will find an effect, assuming that one exists in the population.
This is the opposite of the probability that a given test will not find an effect assuming that one exists in the population, which, as we have seen, is the β-level (i.e., the Type II error rate). Therefore, the power of a test can be expressed as 1 − β. Given that Cohen (1988, 1992) recommends a 0.2 probability of failing to detect a genuine effect (see above), the corresponding level of power would be 1 − 0.2, or 0.8. Therefore, we typically aim to achieve a power of 0.8, or put another way, an 80% chance of detecting an effect if one genuinely exists. The power of a statistical test depends on:

How big the effect is, because bigger effects will be easier to spot. This is known as the effect size (we'll discuss it in Section 3.5). Power will also depend on whether the test is a one- or two-tailed test (see Section 2.9.5), but, as we have seen, you'd normally do a two-tailed test.
How strict we are about deciding that an effect is significant. The stricter we are, the harder it will be to 'find' an effect. This strictness is reflected in the α-level. This brings us back to our point in the previous section about correcting for multiple tests. If we use a more conservative Type I error rate for each test (such as a Bonferroni correction), then the probability of missing an effect that does exist is increased (we're more likely to make a Type II error). In other words, when we apply a Bonferroni correction, the tests will have less power to detect effects.
The sample size: we saw earlier in this chapter that larger samples are better approximations of the population; therefore, they have less sampling error. Remember that test statistics are basically a signal-to-noise ratio, so given that large samples have less 'noise', they make it easier to find the 'signal'.
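The effect of sample size on power can also be seen by simulation. The sketch below assumes normal populations with known SD and uses a z-test purely for simplicity (a real analysis would use a t-test); the effect size and sample sizes are chosen for illustration:

```python
import math
import random

random.seed(1)

def simulated_power(n, d=0.5, sims=2000):
    """Fraction of simulated two-group experiments (true effect = d
    population SDs, population SD known) that reach two-tailed
    significance at alpha = 0.05, using a z-test for simplicity."""
    hits = 0
    for _ in range(sims):
        g1 = [random.gauss(d, 1) for _ in range(n)]   # population means differ by d
        g2 = [random.gauss(0, 1) for _ in range(n)]
        diff = sum(g1) / n - sum(g2) / n
        z = diff / math.sqrt(1 / n + 1 / n)           # known population SD = 1
        if abs(z) > 1.96:
            hits += 1
    return hits / sims

print(simulated_power(10))   # small n: power around 0.2
print(simulated_power(64))   # larger n: power around 0.8
```

The same "signal" (d = 0.5) is detected far more often with 64 participants per group than with 10, because the larger sample has less sampling "noise".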

17
Q

Given that power (1 − β), the α-level, sample size, and the size of the effect are all linked, if we know three of these things, then we can find out the remaining one. There are two things that scientists do with this knowledge

A

Calculate the power of a test: Given that we’ve conducted our experiment, we will have already selected a value of α, we can estimate the effect size based on our sample data, and we will know how many participants we used. Therefore, we can use these values to calculate 1 − β, the power of our test. If this value turns out to be 0.8 or more, then we can be confident that we have achieved sufficient power to detect any effects that might have existed, but if the resulting value is less, then we might want to replicate the experiment using more participants to increase the power.
Calculate the sample size necessary to achieve a given level of power: We can set the value of α and 1 − β to be whatever we want (normally, 0.05 and 0.8, respectively). We can also estimate the likely effect size in the population by using data from past research. Even if no one had previously done the exact experiment that we intend to do, we can still estimate the likely effect size based on similar experiments. Given this information, we can calculate how many participants we would need to detect that effect (based on the values of α and 1 − β that we’ve chosen)
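This second use, solving for the sample size, can be sketched as a search for the smallest per-group n that reaches the target power. The normal approximation to power used here is an assumption for illustration; real software (G*Power, R's pwr package) uses the noncentral t distribution, so answers may differ slightly:

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def approx_power(n, d, z_crit=1.96):
    """Two-tailed power for a between-subjects design, normal approximation."""
    delta = d * math.sqrt(n / 2)          # noncentrality: δ = d√(n/2)
    return (1 - norm_cdf(z_crit - delta)) + norm_cdf(-z_crit - delta)

def n_for_power(d, target=0.8):
    """Smallest per-group n whose approximate power reaches the target."""
    n = 2
    while approx_power(n, d) < target:
        n += 1
    return n

print(n_for_power(0.5))   # → 63, matching the hand calculation earlier in the notes
```

Smaller effects demand much larger samples: halving d roughly quadruples the required n, since n is proportional to 1/d².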