Power Analysis Flashcards
Overview:
- Hypothesis Testing: Type I and Type II Errors
- Significance Criterion, Effect Sizes, Statistical Power
- Power Analysis (Prospective and Retrospective)
- Using G*Power3
- Reporting a Power Analysis
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Learning Objectives:
1. Understand and be able to explain… Type I and Type II errors.
2. Understand and be able to explain… the significance criterion, effect sizes and statistical power.
3. Understand and be able to explain… the factors that determine statistical power.
4. Be able to… conduct, report and interpret prospective and retrospective power analyses for the statistical tests covered in PSY104 and PSY279.
Quick note
Power analysis moves away from reporting only the significance of results towards reporting the effect size… which is more useful.
T-test/ T-statistic
Tests for a significant difference between two population means, or between a sample mean and a hypothesized mean.
t-value = the size of the difference relative to the variation in the sample.
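As a minimal sketch of what this looks like in code (Python with scipy; the sample values are made up for illustration):

```python
import numpy as np
from scipy import stats

# Two hypothetical samples (illustrative values only).
group_a = np.array([5.1, 4.8, 6.0, 5.5, 4.9, 5.7])
group_b = np.array([6.2, 6.8, 5.9, 7.1, 6.5, 6.0])

# Independent-samples t-test: size of the difference in means
# relative to the variation within the samples.
t, p = stats.ttest_ind(group_a, group_b)
print(f"t = {t:.2f}, p = {p:.3f}")
```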
Hypothesis testing
When we run an experiment we set up a hypothesis ("there will be a difference between these two groups"), collect data, and calculate means to compare.
H0 - the null hypothesis: there is no difference between the groups.
p-value - the probability of obtaining the data we did, given that there is no difference (the null hypothesis is true); i.e., what is the chance we got these results by chance alone?
If p is larger than .05, the result is non-significant.
When the null hypothesis is true, most of the time no difference is found, but sometimes an error is made (we conclude there is an effect when there isn't one).
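This can be demonstrated by simulation. In the sketch below (Python with numpy and scipy; the population values, group size and seed are arbitrary choices), both groups are drawn from the same population, so the null hypothesis is true by construction, yet the t-test still comes out significant about 5% of the time:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, alpha = 10_000, 0.05
false_positives = 0
for _ in range(n_sims):
    # Both groups come from the SAME population, so H0 is true.
    a = rng.normal(100, 15, size=30)
    b = rng.normal(100, 15, size=30)
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_positives += 1  # a Type I error
print(false_positives / n_sims)  # close to .05, as alpha predicts
```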
Type I and II errors
Type I error (α) - falsely rejecting the null hypothesis: concluding there is an effect when there isn't one.
Type II error (β) - failing to reject the null hypothesis when it is false: concluding there is no effect when there is one.
Significance criterion
α, the risk of making a Type I error, is usually set at .05 (the significance level), or 5%.
The limitations of statistical significance testing…
- The p-value cannot be used as a measure of the magnitude of a result, because it reflects both the effect size and the sample size…
…two studies conducted in the same way may produce very different results, in terms of statistical significance, simply because they have different sample sizes.
Significance only tells you the chance that you got the results by fluke; it does not tell you how big the effect of one variable on another was, so it does not reflect the magnitude of the result.
e.g.
Small effect size, small sample -> non-significant result
Small effect size, large sample -> significant result
The larger the sample, the less likely it is that the results obtained are due to chance, even when the effect size is small.
Too few participants can result in a Type II error.
Effect size
A measure of the magnitude of the results, independent of sample size.
As well as looking at statistical significance we must also look at the effect size when analyzing findings.
Example: Comparing the difference between two means
An independent t-test gives a t-value, which tells us about significance.
Effect size: d = (mean of population 1 - mean of population 2) / pooled standard deviation
i.e. How many standard deviations are there between the populations?
Standard deviation is the expression of how much data varies about the mean (the square root of variance). Variance is the average of the squared differences from the mean.
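A short sketch of this calculation (Python with numpy; the input values are hypothetical):

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d: (mean of group a - mean of group b) / pooled SD."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    # Pooled variance: weighted average of the two sample variances (ddof=1).
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

print(cohens_d([5.1, 4.8, 6.0, 5.5], [6.2, 6.8, 5.9, 7.1]))
```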
Interpreting effect sizes
Cohen (1992) - Effect sizes for different statistical tests
d = Cohen's d (used for t-test and ANOVA data)
r = Pearson's correlation coefficient (the effect size for correlations)
f2 = Cohen's f2 (used for regression)
____________________________________________
Effect Size              Small  Medium  Large
____________________________________________
d                         .20    .50     .80
r                         .10    .30     .50
f2 = R2/(1-R2)            .02    .15     .35
____________________________________________
Statistical power
The power of a test is defined as the probability of avoiding a Type II error (i.e., 1 - β). The recommended level of power is .80.
If power is set at .80 the probability of making a Type II error is .20, or 20%.
If the significance criterion is set at .05, the probability of making a Type I error is .05.
Therefore the ratio of Type I risk to Type II risk is 1:4 (.05 : .20). It's considered worse to say there is an effect when there isn't one than to say there's no effect when there is, so we err on the side of caution.
Statistical power depends on:
- effect size
- sample size
- precision of measurements
Statistical power depends on 1. effect size
How effect size affects power in the t-test…
Larger effects are easier to detect: when the effect size is large, we are less likely to say there is no effect when there is one.
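The notes use G*Power3 for these calculations; as an illustration, the same relationship can be shown with Python's statsmodels library (the per-group n of 30 is an arbitrary choice):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Power of a two-tailed independent t-test, 30 participants per group,
# alpha = .05, for small, medium and large effects.
for d in (0.2, 0.5, 0.8):
    print(d, round(analysis.power(effect_size=d, nobs1=30, alpha=0.05), 2))
# Larger d -> higher power: roughly .12, .47 and .86 respectively.
```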
Statistical power depends on 2. sample size
How sample size affects power in the t-test…
Error bars here represent the variability of the mean: the standard error of the mean, calculated as SD / √n (the standard deviation divided by the square root of the sample size).
The larger the sample, the more accurate the estimate of the means and the smaller the standard errors, so we are less likely to make a Type II error (although with a very large sample even a trivially small effect can come out significant).
Larger samples give smaller error bars; therefore the larger the sample, the greater the power (smaller standard errors mean effects are easier to detect because there is less overlap between error bars).
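Again as an illustrative sketch with statsmodels, holding the effect fixed at d = .50 and varying the per-group sample size:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Medium effect (d = .50), alpha = .05, two-tailed; vary n per group.
for n in (10, 30, 64, 100):
    print(n, round(analysis.power(effect_size=0.5, nobs1=n, alpha=0.05), 2))
# Power climbs with n (roughly .18, .47, .80, .94) and hits .80 at 64 per group.
```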
Statistical power depends on 3. precision of measurements
How precision of measurements affects power in the t-test…
More reliable measures give more precise estimates of the latent variable (the underlying psychological construct), so there is less error variance/noise. This also results in smaller standard errors.
We are therefore less likely to make a Type II error.
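A simulation sketch of this point (Python with numpy and scipy; the true difference of 5 and the noise levels are invented for illustration): the real group difference is held constant while the measurement noise grows, and estimated power falls:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def estimated_power(noise_sd, n=30, true_diff=5.0, n_sims=2000):
    """Proportion of simulated experiments reaching p < .05.
    The true group difference is fixed; only measurement noise varies."""
    hits = 0
    for _ in range(n_sims):
        a = rng.normal(100.0, noise_sd, n)
        b = rng.normal(100.0 + true_diff, noise_sd, n)
        hits += stats.ttest_ind(a, b).pvalue < 0.05
    return hits / n_sims

for sd in (5, 10, 20):  # noisier (less precise) measures -> lower power
    print(sd, estimated_power(sd))  # roughly .97, .47, .15
```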
Prospective power analysis
Prospective power analysis helps to determine the sample size we want to recruit
- we don’t want to recruit too few and get a non-sig result
- we don't want to recruit too many and get significant results regardless of effect size
So…
- set sig criterion (.05)
- set statistical power (.80)
- estimate effect size
- calculate the sample size required to detect a significant effect
The effect size is not known in advance, so we estimate it…
- on the basis of previous research
- by conducting a pilot study
- or by deciding whether to look for a small, medium or large effect
…then conduct the power analysis.
Cohen's (1992) table of estimated necessary sample sizes for different tests, for small, medium and large effect sizes
Sample size calculations (with α = .05 and power = .80)
_________________________________________________
Effect Size                  Small  Medium  Large
_________________________________________________
d (a)                         393     64      26
r                             783     85      28
f2 = R2/(1-R2), 3 IVs         547     76      34
                4 IVs         599     84      38
_________________________________________________
Note. (a) Sample size per group. All other Ns are total sample sizes.
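G*Power3 performs these calculations interactively; as a cross-check, the d row of the table (for the independent t-test) can be reproduced with Python's statsmodels library:

```python
from statsmodels.stats.power import TTestIndPower

# Leave nobs1 unspecified so solve_power returns the per-group sample size
# needed for a medium effect (d = .50) at alpha = .05 and power = .80.
n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05,
                                          power=0.80, ratio=1.0)
print(n_per_group)  # ~63.8 -> recruit 64 per group, matching the table above
```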
Retrospective power analysis
(a) determining the minimum effect size that could have been detected
(b) calculating the statistical power of the test
(c) estimating the sample size required for the observed effect size
(a) Calculating the minimum effect size that could have been detected, given the significance criterion, statistical power and sample size
- set significance criterion (.05)
- set statistical power (.80)
- sample size is known
- estimate minimum effect size that could be detected
Compare this to the effect size actually found: what is the discrepancy? If you got a non-significant result, this can help explain why.
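A sketch of calculation (a) with statsmodels, for the independent t-test case (the sample size of 25 per group is a hypothetical example):

```python
from statsmodels.stats.power import TTestIndPower

# Leave effect_size unspecified so solve_power returns the minimum detectable d.
min_d = TTestIndPower().solve_power(nobs1=25, alpha=0.05, power=0.80)
print(min_d)  # ~0.81: with 25 per group, only large effects were detectable
```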
(b) Calculating the statistical power… the likelihood of avoiding a Type II error
If the results were non-significant, we will often find that the test was underpowered, i.e. more likely to produce a Type II error.
So, with the effect size observed, a significance criterion of .05, and the sample size used, what was the likelihood that a Type II error was made? (see the sketch after the list below)
- calculate effect size
- significance criterion is known
- sample size is known
- calculate statistical power of test
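A sketch of calculation (b) with statsmodels (the observed d = .30 and n = 25 per group are hypothetical example values):

```python
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower().power(effect_size=0.30, nobs1=25, alpha=0.05)
print(power)      # ~0.18
print(1 - power)  # beta ~0.82: a Type II error was very likely
```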
(c) Calculating the required sample size for the effect size… how many participants would have been needed to get a significant result?
With the effect size observed, the significance criterion at .05 and statistical power at .80, how many participants would have been needed to get a significant result? (see the sketch after the list below)
- calculate effect size
- set significance criterion (.05)
- set statistical power (.80)
- estimate required sample size for the effect size
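A sketch of calculation (c) with statsmodels (again using the hypothetical observed d = .30):

```python
from statsmodels.stats.power import TTestIndPower

# Leave nobs1 unspecified to solve for the required per-group sample size.
n_needed = TTestIndPower().solve_power(effect_size=0.30, alpha=0.05, power=0.80)
print(n_needed)  # ~175 per group would have been needed for d = .30
```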