Power Analysis Flashcards

1
Q

Learning Objectives:

A

Overview:
- Hypothesis Testing: Type 1 and 2 Errors
- Significance Criterion, Effect Sizes, Statistical Power
- Power Analysis (Prospective and Retrospective)
- Using G*Power3
- Reporting a Power Analysis
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Learning Objectives:
1. Understand and be able to explain… type 1 and type 2 errors.
2. Understand and be able to explain… the significance criterion, effect sizes and statistical power.
3. Understand and be able to explain… the factors that determine statistical power.
4. Be able to… conduct, report and interpret prospective and retrospective power analyses for the statistical tests covered in PSY104 and PSY279.

2
Q

Quick note

A

Power analysis shifts the emphasis from reporting only the statistical significance of results to reporting the effect size, which is more informative.

3
Q

T-test/ T-statistic

A

A t-test looks for evidence of a significant difference between two population means, or between a sample mean and a hypothesized mean.

The t-value measures the size of the difference relative to the variation in the sample.
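
For illustration, a minimal sketch of an independent t-test in Python with scipy (the group means, SD and sizes are made up):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)

    # Two hypothetical independent groups (e.g., treatment vs. control scores)
    group1 = rng.normal(loc=105, scale=15, size=30)
    group2 = rng.normal(loc=100, scale=15, size=30)

    # t measures the size of the mean difference relative to sample variation
    t_value, p_value = stats.ttest_ind(group1, group2)
    print(f"t = {t_value:.2f}, p = {p_value:.3f}")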

4
Q

Hypothesis testing

A

When we run an experiment we set up a hypothesis, e.g. "there will be a difference between these two groups", then collect data and calculate means to compare.

H0 - null hypothesis: there is no difference between the groups.

p-value - the probability of obtaining the data we did, given that there is no difference (the null hypothesis is true); i.e. what's the chance we got these results by chance?

If p is larger than .05, the result is non-significant.

When the null hypothesis is true, most of the time no difference is found, but sometimes an error is made (we think there is an effect when there isn't).
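
To make that last point concrete, here is a hypothetical simulation in which the null hypothesis is true by construction (both groups come from the same population); about 5% of the t-tests still come out significant:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_experiments, false_positives = 10_000, 0

    for _ in range(n_experiments):
        # Both samples drawn from the SAME population: H0 is true
        a = rng.normal(0, 1, size=30)
        b = rng.normal(0, 1, size=30)
        _, p = stats.ttest_ind(a, b)
        if p < .05:
            false_positives += 1  # a Type I error: "effect" found by chance

    print(false_positives / n_experiments)  # ~0.05, i.e. about 5% of the time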

5
Q

Type I and II errors

A

Type I error (α) - falsely rejecting the null hypothesis: thinking there is an effect when there isn't one.

Type II error (β) - falsely accepting (failing to reject) the null hypothesis: thinking there isn't an effect when there is one.

6
Q

Significance criterion

A

α, the risk of making a type I error, is usually set at .05 (significance level), or 5%.

The limitations of statistical significance testing…
- Statistical probability cannot be used as a measure of the magnitude of the result, as a small p-value may reflect either a large effect size or a large sample size…

…two studies conducted in the same way may produce very different results, in terms of statistical significance, simply because they have different sample sizes.

Significance simply tells you the chance that you got the results by fluke; it does not tell you how big the effect of one variable on another was, i.e. it doesn't reflect the magnitude of the result.

e.g.
Small effect size, small sample - non-significant result

Small effect size, large sample - significant result

The larger the sample, the less likely it is that the results obtained arose by chance, despite the small effect size (see the sketch below).

Too few participants can result in a Type II error.
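
A quick numeric sketch of the example above: the same small effect (d = .20) is non-significant with 20 participants per group but significant with 400 per group. It uses the standard relation t = d·√(n/2) for an equal-n independent t-test; the sample sizes are arbitrary:

    from math import sqrt
    from scipy import stats

    d = 0.20  # a small effect size (Cohen, 1992)

    # For an independent t-test with n per group, t = d * sqrt(n / 2)
    for n in (20, 400):
        t = d * sqrt(n / 2)
        p = 2 * stats.t.sf(t, df=2 * n - 2)  # two-tailed p-value
        print(f"n = {n:>3} per group: t = {t:.2f}, p = {p:.3f}")
    # n = 20: p ~ .53 (non-significant); n = 400: p ~ .005 (significant)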

7
Q

Effect size

A

A measure of the magnitude of the results, independent of sample size.

As well as looking at statistical significance we must also look at the effect size when analyzing findings.

Example: Comparing the difference between two means

An independent t test - t value - gives significance

Effect size: d = (mean of population 1 − mean of population 2) / pooled standard deviation

i.e. How many standard deviations are there between the populations?

Standard deviation is the expression of how much data varies about the mean (the square root of variance). Variance is the average of the squared differences from the mean.
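
A minimal Python sketch of the formula (the function name is just illustrative):

    import numpy as np

    def cohens_d(x, y):
        """d = (mean1 - mean2) / pooled standard deviation."""
        nx, ny = len(x), len(y)
        # Pooled SD: weighted average of the two groups' variances
        pooled_var = ((nx - 1) * np.var(x, ddof=1)
                      + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
        return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)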

8
Q

Interpreting effect sizes

A

Cohen (1992) - effect size conventions for different statistical tests:

d = Cohen's d (used for t-test and ANOVA data)

r = Pearson's correlation coefficient (the effect size for correlation)

f² = Cohen's f² (used for regression)

____________________________________________
Effect Size           Small   Medium   Large
____________________________________________
d                      .20     .50      .80
r                      .10     .30      .50
f² = R²/(1 − R²)       .02     .15      .35
____________________________________________
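
For example, converting a hypothetical R² of .26 into f²:

    # Cohen's f² for regression, computed from the model's R²
    r_squared = 0.26  # hypothetical model fit
    f_squared = r_squared / (1 - r_squared)
    print(round(f_squared, 2))  # 0.35 -> a "large" effect by Cohen's conventions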
9
Q

Statistical power

A

The power of a test is defined as the probability of avoiding making a Type II error (i.e., 1 - β). Recommended level of power is .80.

If power is set at .80 the probability of making a Type II error is .20, or 20%.

If the significance criterion is set at .05, the probability of making a Type I error is .05.

Therefore there is a 1:4 ratio of Type I to Type II error risk (.05 : .20). It's considered better to say there's no effect when there is one than to say there is an effect when there isn't: better to err on the side of caution.

Statistical power depends on:

  1. effect size
  2. sample size
  3. precision of measurements
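
As an illustration, power can also be computed in Python with statsmodels rather than G*Power (the n of 30 per group and d = .50 are made up):

    from statsmodels.stats.power import TTestIndPower

    # Achieved power of an independent t-test with 30 per group,
    # a medium effect (d = .50) and alpha = .05
    power = TTestIndPower().power(effect_size=0.5, nobs1=30, alpha=0.05)
    print(f"power = {power:.2f}")  # ~0.48, well below the recommended .80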
10
Q

Statistical power depends on 1. effect size

How effect size affects power in the t-test…

A

Larger effect sizes are much easier to detect: if the effect size is large, we are less likely to say there is no effect when there is one.
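
A quick sketch of this with statsmodels, holding the design fixed (n = 30 per group, alpha = .05) and varying only the effect size (the numbers are illustrative):

    from statsmodels.stats.power import TTestIndPower

    for d in (0.2, 0.5, 0.8):
        power = TTestIndPower().power(effect_size=d, nobs1=30, alpha=0.05)
        print(f"d = {d}: power = {power:.2f}")
    # Power climbs steeply as d grows (roughly .12, .48, .86)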

11
Q

Statistical power depends on 2. sample size

How sample size affects power in the t-test…

A

Error bars here represent the variability of the mean, i.e. the standard error of the mean, calculated as SE = SD / √n.

The larger the sample size, the more accurate the estimation of the means, so the less likely we are to make a Type II error, and the more likely we are to get significant results (even if those results are practically meaningless).

Larger samples give smaller error bars; therefore the larger the sample, the greater the power (due to smaller standard errors), i.e. effects are easier to detect because there's less overlap of error bars.
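
A tiny numeric sketch of SE = SD/√n (the SD of 15 is arbitrary):

    import numpy as np

    sd = 15  # population SD of the measure (hypothetical)

    for n in (10, 40, 160, 640):
        print(f"n = {n:>3}: SE = {sd / np.sqrt(n):.2f}")
    # Each quadrupling of the sample halves the error bars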

12
Q

Statistical power depends on 3. precision of measurements

How precision of measurements affects power in the t-test…

A

More reliable measures give more precise estimates of the latent variable (the underlying psychological construct), so there's less error variance/noise. This also results in smaller standard errors.

This makes a Type II error less likely.
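
A rough sketch of why: measurement noise inflates the observed SD, which shrinks the observed effect size (the true d and noise levels here are made up):

    import numpy as np

    d_true = 0.50   # effect size with a perfectly reliable measure (hypothetical)
    sd_true = 1.0   # true-score SD

    # Measurement error adds variance, inflating the observed SD
    for noise_sd in (0.0, 0.5, 1.0):
        sd_observed = np.sqrt(sd_true**2 + noise_sd**2)
        print(f"noise SD = {noise_sd}: observed d = "
              f"{d_true * sd_true / sd_observed:.2f}")
    # The noisier the measure, the smaller the observed effect -> lower power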

13
Q

Prospective power analysis

A

Prospective power analysis helps to determine the sample size we want to recruit

  • we don’t want to recruit too few and get a non-sig result
  • don’t want to recruit too many and get sig. results regardless of effect size

So…

  • set sig criterion (.05)
  • set statistical power (.80)
  • estimate effect size
  • calculate required sample size needed to find sig. results

We don't know the effect size in advance, so it has to be estimated…

  • on the basis of previous research
  • by conducting a pilot study
  • or by deciding on a small/medium/large effect size (Cohen's conventions)

…then conduct the power analysis (see the sketch below).
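
A sketch of the a priori calculation in statsmodels rather than G*Power, assuming a medium estimated effect:

    from math import ceil
    from statsmodels.stats.power import TTestIndPower

    # Required n per group for d = .50, alpha = .05, power = .80
    n = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
    print(ceil(n))  # 64, matching Cohen's (1992) table for a medium d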
14
Q

Cohen's table of estimated sample sizes needed for different tests, for small, medium and large effect sizes

A

Sample size calculations (with alpha = .05 and power = .80)

_________________________________________________
Effect Size                 Small   Medium   Large
_________________________________________________
d (a)                        393      64       26
r                            783      85       28
f² = R²/(1 − R²), 3 IVs      547      76       34
f² = R²/(1 − R²), 4 IVs      599      84       38
_________________________________________________
Note. (a) Sample size per group. All other Ns are total sample sizes.

15
Q

Retrospective power analysis

(a) determining the minimum effect size that could have been detected
(b) calculating the statistical power of the test
(c) estimating the required sample size for the effect size found

A

(a) Calculating the minimum effect size that could have been detected, given the significance criterion, statistical power and sample size

  • set significance criterion (.05)
  • set statistical power (.80)
  • sample size is known
  • estimate minimum effect size that could be detected

Compare this to the effect size actually found: what's the discrepancy? If you got a non-significant result, this can help to explain why.

(b) Calculating the statistical power, i.e. the likelihood of avoiding a Type II error

If the results were non-significant, we will often find that the test was underpowered, i.e. more likely to make a Type II error.

So with the effect size observed, a significance criterion of .05 and the sample size used, what was the likelihood that a Type II error was made?

  • calculate effect size
  • significance criterion is known
  • sample size is known
  • calculate statistical power of test

(c) Calculating the required sample size for the effect size… how many participants would have been needed to get a significant result?

With the effect size observed, the significance criterion at .05 and statistical power at .80, how many participants would be needed to get a significant result?

  • calculate effect size
  • set significance criterion (.05)
  • set statistical power (.80)
  • estimate required sample size for the effect size
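
A sketch of (a) and (b) in statsmodels (the sample sizes and observed d are made up; (c) is the same call as the a priori analysis, fed the observed effect size):

    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()

    # (a) Sensitivity: minimum detectable d with 25 per group,
    #     alpha = .05 and power = .80
    min_d = analysis.solve_power(nobs1=25, alpha=0.05, power=0.8)
    print(f"minimum detectable d = {min_d:.2f}")  # ~0.81

    # (b) Post hoc: achieved power for the d actually observed (say d = .35)
    achieved = analysis.solve_power(effect_size=0.35, nobs1=25, alpha=0.05)
    print(f"achieved power = {achieved:.2f}")  # ~0.22: underpowered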
16
Q

Using G*Power3 to conduct a power analysis

A

A computer program for conducting power analyses.

Select test from top menu bar:
Tests -> test class -> design

Choose one of five types of power analysis:

  • a priori (computes required sample size)
  • post hoc (computes achieved power)
  • compromise
  • sensitivity (computes min effect size poss.)
  • criterion

Provide input parameters

Click ‘calculate’ to obtain output of, e.g., total N

17
Q

Order to do tests in…

A
  1. A priori (prospective) - the sample size we want to recruit (using an estimated effect size)
  2. Sensitivity (retrospective) - the minimum effect size detectable with the sample recruited
  3. Post hoc (retrospective) - the statistical power achieved
  4. A priori (kind of retrospective) - the sample size that would have been needed (for a significant result) given the effect size actually obtained
18
Q

Reporting power analysis

A

Must state all the inputs determining the result:

  • Effect size, e.g., d or r or other
  • Significance level (alpha)
  • Required power
  • Sample size
  1. A priori, for the sample size we want

An a priori power analysis indicated that it would be necessary to recruit __ participants per group in order to detect an effect size of d = __, with alpha set at .05, at 80% power.

  2. Sensitivity, for the minimum possible effect size with the sample recruited

A power analysis indicated that with a sample of __ in the experimental group and __ in the control group, it would be possible to detect a minimum effect size of d = __, with alpha set at .05, at 80% power.

  3. Post hoc, for the statistical power/effect size of the study

The effect size for the experiment was calculated as d = __. A power analysis indicated that with a sample of __ in the experimental group and __ in the control group, and with alpha set at .05, the analysis only achieved __% power.

  4. A priori, for the sample size that would have been needed for significant results at the achieved effect size

The effect size for the experiment was calculated as d = __. A power analysis indicated that it would be necessary to recruit at least __ participants per group to detect this effect size, with alpha set at .05, at 80% power.