Sample Size and Power Flashcards
Critical value
The estimated effect size that exactly corresponds to the significance level.
If we are testing whether the effect size is bigger than 0 at the 95% confidence level (5% significance level), then the critical value is…
the level of the estimate where exactly 5% of area under the curve lies to the right
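A minimal sketch of the critical value card, assuming the estimate is normally distributed around zero under the null hypothesis and the test is one-sided. The function name `critical_value` and the `se` parameter are illustrative, not from the flashcards.

```python
from statistics import NormalDist

def critical_value(alpha: float, se: float = 1.0) -> float:
    """One-sided critical value: the estimate with exactly `alpha` of the
    null curve (centered at 0, standard error `se`) lying to its right."""
    return NormalDist(mu=0.0, sigma=se).inv_cdf(1.0 - alpha)

# With a standard error of 1, the 5% one-sided critical value is about 1.645.
print(round(critical_value(0.05), 3))
```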
Usually in program evaluation, instead of “presumption of innocence,” the rule is:
“presumption of zero”
The ‘burden of proof’ is showing that there was an impact
Null hypothesis
The intervention had no impact
When do you reject the null hypothesis?
If it is very unlikely (less than a 5% probability) that the difference is solely due to chance. At this point, we could say “our program has a statistically significant impact”
Type 1 Errors
False positive.
If the program truly has no impact, then at the 5% significance level we will still say it has an impact 5% of the time
Statistical power
The probability that, if the true effect is of a given size, our proposed experiment will be able to distinguish the estimated effect from zero (you find that it's significant); in other words, the probability of avoiding Type 2 errors
Graphically, it's the proportion of the alternative-hypothesis curve (centered on the true effect) that lies to the right of the critical value for the null hypothesis
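The graphical definition of power can be sketched as a short calculation. This assumes both curves are normal with the same standard error and a one-sided test; the names `power`, `effect`, and `se` are illustrative.

```python
from statistics import NormalDist

def power(effect: float, se: float, alpha: float = 0.05) -> float:
    """Share of the alternative curve (centered at `effect`) that lies to
    the right of the one-sided critical value of the null curve (centered at 0)."""
    crit = NormalDist(0.0, se).inv_cdf(1.0 - alpha)
    return 1.0 - NormalDist(effect, se).cdf(crit)

# Power rises as the true effect moves away from zero.
for effect in (0.5, 1.5, 2.5, 3.5):
    print(effect, round(power(effect, se=1.0), 2))
```

When the true effect is zero, the function returns `alpha`, which is the Type 1 error rate.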
Type 2 Errors
False negative.
Traditionally, we aim for 80% power (some aim for 90%)
Low power means we may not find a significant effect even though an effect exists
What are the four possible results from hypothesis testing?
No error, when there is no effect = True negative
No error, when there is an effect = True positive
Error, when there is no effect = False positive (Type 1)
Error, when there is an effect = False negative (Type 2)
When curves overlap, does this suggest higher or lower power?
Less power when curves overlap
How does effect size relate to the size & shape of the distribution curves? How does effect size relate to power?
If we expect a small effect size, the curves will be closer together - so power will be lower
If we expect a large effect size, the curves will be farther apart - so power will be higher (the standard error is unchanged; only the distance between the curves grows)
Bigger effect size = more power
What if we have a large effect size, but a low rate of take-up? How will this affect power & the distribution curves?
A low take-up will dilute the average effect size. Even if those who take up the treatment experience a massive effect, if only a few people take it up, then it's going to have a negative effect on power.
The average effect size will drop, the curves will be closer together or overlapping & we will be back to having low power
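The dilution from low take-up can be sketched numerically. This assumes no one in the control group takes up the program and that people who do not take it up experience zero effect; the function names are hypothetical.

```python
def diluted_effect(effect_on_takers: float, take_up: float) -> float:
    """Average effect across the whole treatment group when only a share
    `take_up` receives the program (assumes zero effect on everyone else)."""
    if not 0.0 < take_up <= 1.0:
        raise ValueError("take_up must be in (0, 1]")
    return effect_on_takers * take_up

def sample_inflation(take_up: float) -> float:
    """Factor by which the sample must grow to keep power unchanged.
    The standard error shrinks with sqrt(N), so N scales with 1 / take_up**2."""
    if not 0.0 < take_up <= 1.0:
        raise ValueError("take_up must be in (0, 1]")
    return 1.0 / take_up ** 2

# An effect of 0.4 on takers with 25% take-up averages out to 0.1,
# and the sample would need to be 16 times larger for the same power.
print(diluted_effect(0.4, 0.25), sample_inflation(0.25))
```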
What happens to the distribution curves as you increase the sample size? Does this increase accuracy or precision?
By increasing sample size, you are increasing precision. The curves will become narrower, as the estimates are more likely to fall close to the true average
How does variance correlate to power?
Lower variation will make estimates more tightly clustered, suggesting higher power.
As variance goes down, curves will become narrower, so they will overlap less & the critical value will be closer to zero = higher power
As variance goes up, estimates will be more dispersed. Curves will get wider and overlap more - meaning the critical value will be farther to the right and power will be lower
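The sample size and variance cards both act through the standard error of the estimated effect, which sets the width of the curves. A rough sketch, assuming a simple difference in means with the same outcome standard deviation `sigma` in both groups (the function and parameter names are illustrative):

```python
import math

def standard_error(sigma: float, n: int, p: float = 0.5) -> float:
    """SE of a difference in means: sigma * sqrt(1 / (p * (1 - p) * n)),
    where n is the total sample and p the share assigned to treatment."""
    return sigma * math.sqrt(1.0 / (p * (1.0 - p) * n))

# Quadrupling the sample halves the standard error (narrower curves);
# doubling sigma doubles it (wider curves).
print(standard_error(1.0, 400), standard_error(1.0, 1600), standard_error(2.0, 400))
```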
If there’s not a perfect 50/50 split between treatment and control, what happens to the shape of the curves?
Uneven split: for a given total sample, the estimated effect is less precise, so the curves get wider and there is less power
More efficient to just increase the sample size & make the curves narrower
Allocation ratio
The fraction of the total sample allocated to the treatment group
Usually, for a given sample size, power is maximized when half the sample is allocated to treatment and half to control
There is a diminishing marginal benefit to precision from adding sample to either group, so it is best to add to both equally
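One way to see why a 50/50 split is usually best: for a fixed total sample, the standard error is proportional to sqrt(1 / (P(1 - P))), which is smallest at P = 0.5. A small sketch, assuming equal outcome variance and equal cost per person in both groups (`relative_se` is a hypothetical helper):

```python
import math

def relative_se(p: float) -> float:
    """SE of the estimated effect relative to a 50/50 split, for the same
    total sample: sqrt(0.25 / (p * (1 - p)))."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.sqrt(0.25 / (p * (1.0 - p)))

# A 20/80 split gives a standard error about 25% larger than a 50/50 split.
for p in (0.5, 0.3, 0.2, 0.1):
    print(p, round(relative_se(p), 2))
```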
What is the key difference in achieving high power in clustered level RCTs vs. individual level RCTs?
You need a bigger sample size to achieve the same power
Intra-cluster correlations (ICC, rho) - high vs. low ICC?
Intra-cluster correlations refer to how similar people are WITHIN clusters. If there are many different types of people within clusters, then the ICC is low. When there are very similar types of people within clusters, you’ll have a very high ICC.
Intra-cluster correlations: Definition & equation
The proportion of total variation explained by between-cluster level variance
ICC equation = between cluster var / (between + within cluster variance)
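The ICC equation on this card translates directly into code. A minimal sketch; the function name and argument names are illustrative.

```python
def icc(between_var: float, within_var: float) -> float:
    """Intra-cluster correlation: the share of total variance that is
    between clusters, between / (between + within)."""
    total = between_var + within_var
    if between_var < 0 or within_var < 0 or total == 0:
        raise ValueError("variances must be non-negative and not both zero")
    return between_var / total

# Very different people within each cluster (high within-cluster variance)
# gives a low ICC; very similar people within each cluster gives a high ICC.
print(icc(between_var=1.0, within_var=9.0), icc(between_var=9.0, within_var=1.0))
```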
Total variance definition (two segments)
Can be divided into within cluster variance and between cluster variance
When within-cluster variance is high relative to between-cluster variance, people within a cluster are not very similar to each other & the ICC is low
How does ICC impact power?
For a given N we have less power when we randomize by cluster (unless ICC is zero)
There are diminishing returns to surveying more people per cluster. Usually the number of clusters is the key determinant of power, not the number of people per cluster
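The diminishing returns to surveying more people per cluster can be illustrated with the usual design-effect formula, 1 + (m - 1) × ICC. This formula is not stated on the cards; the sketch assumes equal-sized clusters of size m, and the function names are hypothetical.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Variance inflation from randomizing by cluster: 1 + (m - 1) * icc."""
    return 1.0 + (cluster_size - 1.0) * icc

def effective_sample_size(n_clusters: int, cluster_size: int, icc: float) -> float:
    """Number of independent observations that the clustered sample is worth."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)

# Starting from 20 clusters of 50 people with ICC = 0.1, compare doubling
# the people per cluster with doubling the number of clusters.
base = effective_sample_size(20, 50, 0.1)
more_people = effective_sample_size(20, 100, 0.1)
more_clusters = effective_sample_size(40, 50, 0.1)
print(round(base), round(more_people), round(more_clusters))
```

In this illustration, doubling the number of clusters doubles the effective sample, while doubling the people per cluster adds very little.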
If ICC is high, what is the more efficient way of increasing power?
Include more clusters. Adding more people within each cluster also increases power, but when the ICC is high the gain from doing so is small.
Power with clustering equation
The equation is the same as the non-clustered power equation, except that we add two terms: the ICC and the average cluster size (the number of people within each cluster). These two terms have a huge impact on power
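The card does not write the equation out. One common way to express it is through the minimum detectable effect (MDE), shown here as a sketch under the assumption of equal-sized clusters. The symbols are assumptions rather than notation from the flashcards: κ is power, α the significance level, P the allocation ratio, σ² the outcome variance, N the total sample, m the average cluster size, and ρ the ICC.

```latex
\text{Without clustering:}\quad
\mathrm{MDE} = \left(t_{1-\kappa} + t_{\alpha}\right)
\sqrt{\frac{1}{P(1-P)}}\,
\sqrt{\frac{\sigma^{2}}{N}}

\text{With clustering:}\quad
\mathrm{MDE} = \left(t_{1-\kappa} + t_{\alpha}\right)
\sqrt{\frac{1}{P(1-P)}}\,
\sqrt{\frac{\sigma^{2}}{N}}\,
\sqrt{1 + (m-1)\rho}
```

The only difference is the final factor, which equals 1 when ρ = 0 or m = 1 and grows with both the ICC and the cluster size.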