Study Power and Sample Size Flashcards
Why is sample size important?
Before starting any study, it is essential to decide how many people you need in the study in order to answer the question of interest. If the study is too small, you may fail to detect an important effect in the outcome of interest, or estimates of effect may be too imprecise (i.e. a wide confidence interval). On the other hand, a study that is larger than necessary will be a waste of resources.
What program can we use to check the power of a study and what does this tell us?
We can use the ‘Power Calculator’ program to check the power of a study. This tells us the power of a study, and also the number of individuals we would need in each group to achieve a certain percentage power for the study (e.g. 80% power).
What are the two approaches to sample size calculation?
There are two approaches to sample size calculation that depend on the objectives of the study. These are based on the two different ways in which we can judge results using statistical methods:
- We can judge results by evaluating the precision of our estimated effect using a confidence interval. If this is the objective, then we compute how many subjects are needed to estimate the effect of interest (e.g. prevalence or difference in prevalence) with a specified degree of precision.
- We can judge results by carrying out a hypothesis test to evaluate the strength of evidence (P-value) of a true effect. In this case, we compute the number of subjects needed to test the hypothesis of interest with a certain power. The power of a study is the probability of getting a significant result when the true size of effect is as specified.
If we are going to judge the results of a study by evaluating the precision of our estimated effect using a confidence interval then how should we approach sample size calculation?
If this is the objective, then we compute how many subjects are needed to estimate the effect of interest (e.g. prevalence or difference in prevalence) with a specified degree of precision.
If we are going to judge the results of a study by carrying out a hypothesis test to evaluate the strength of evidence of a true effect then how should we approach sample size calculation?
If this is the objective, then we compute the number of subjects needed to test the hypothesis of interest with a certain power. The power of a study is the probability of getting a significant result when the true size of effect is as specified.
Discuss the estimation of a single proportion.
We shall look at the situation where we want to estimate a single proportion. This assumes the sample is a simple random or otherwise representative sample and the outcome of interest is binary. E.g. we may design a cross-sectional study with random sampling to determine the prevalence of a certain condition or disease.
For a single proportion (sample proportion) the standard error is given by:
SEp = sqrt [ pi(1 - pi) / n ]
where pi is the population proportion and n is the sample size. Therefore, for a 95% confidence interval of p ± w, the half-width w is
w = 1.96 x sqrt [ pi(1 - pi) / n ]
clearly the larger the sample size n is, the smaller w and hence the narrower the confidence interval. Rearranging we get the formula for computing the sample size:
n = [3.84 pi (1 - pi)] / w^2    (note that 3.84 = 1.96^2)
From this formula, we can see that n is large when w is small, i.e. when we want a very precise estimate with a narrow confidence interval, and when disease prevalence is close to 50%.
To decide on the appropriate size of n:
- Choose a maximum half-width w for our confidence interval, i.e. decide how precisely we want to estimate the effect (to within ± w). There is no set precision you should select for a study and your choice may depend on the purpose of the study.
- Put a value of the population proportion into the equation. Of course we don’t know this at this stage of the study, so we must estimate or guess the value. We may do this by looking at previous studies or using our common sense. If we really don’t know, we can use 0.5, which is what we call the ‘safe value’ as it gives the most conservative sample size estimate (i.e. the biggest).
For example:
Suppose we want to carry out a study to estimate the prevalence of lung cancer in people aged 65 and over in the UK. We think the prevalence will be somewhere between 5% and 8% and we want to get an estimate to somewhere within 2% (w) with 95% certainty. Using pi=0.08 (the value closer to 0.5, and hence the safer bet), and w=0.02 we get:
n = (3.84 x 0.08 x 0.92) / (0.02^2) = 706.6, which we round up to 707
In other words, we would need to study over 700 people to be confident (95% confident) that our sample prevalence is estimated to within 2% of the true prevalence.
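As a quick check, the calculation above can be sketched in a few lines of Python (the function name and the rounding-up convention are our own; the formula is the one given above, using z² in place of 3.84 ≈ 1.96²):

```python
import math

def n_single_proportion(pi, w, z=1.96):
    """Sample size needed to estimate a proportion pi to within +/- w,
    where z sets the confidence level (1.96 for 95%, 2.576 for 99%).
    Sample sizes are always rounded up to the next whole person."""
    return math.ceil(z ** 2 * pi * (1 - pi) / w ** 2)

print(n_single_proportion(0.08, 0.02))  # lung cancer example: 707
```

Passing z=2.576 (99% confidence) to the same call gives 1221, illustrating that greater confidence at the same precision demands more people.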
How might you go about estimating an effect with a certain degree of precision for estimates other than sample proportion?
The same ideas of estimating an effect with a certain degree of precision can be applied to a single mean, an incidence rate, or the difference between 2 means, rates or proportions. The sample size formula for estimating a single mean:
n = (3.84 σ^2) / w^2
where σ is the standard deviation of the outcome variable in the whole population. Here n is big when w is small and when σ (i.e. the variation in our outcome) is large. As before, we must decide on a value for w and make a rough guess of the population parameter σ.
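As a sketch of the single-mean formula (the example values σ = 10 and w = 2 are hypothetical, purely for illustration):

```python
import math

def n_single_mean(sigma, w, z=1.96):
    """Sample size needed to estimate a mean to within +/- w,
    given a guess sigma for the population SD (z = 1.96 for 95%)."""
    return math.ceil(z ** 2 * sigma ** 2 / w ** 2)

# Hypothetical outcome with SD 10, estimated to within +/- 2 units:
print(n_single_mean(10, 2))  # -> 97
```

Doubling the guessed SD quadruples the required sample size, since σ enters the formula squared.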
What factors affect sample size calculations based on precision?
The factors that affect sample size calculations based on precision are:
- Desired precision w. If we were happy with less precision, say estimating the prevalence to within say 10%, then we would need fewer people, whereas if we require a very precise estimate, say to within 1%, we would have to study more people.
- Desired confidence. The above calculations are based on a 95% confidence interval. We would need more people in the study to obtain the same precision but with greater confidence. E.g. for a 99% CI, instead of using 3.84 (= 1.96^2) we would use 2.576^2 = 6.63 in the formula.
- The outcome of interest. Where the outcome is a binary variable, the closer the prevalence is to 0.5, the more people are needed. Where it is a continuous variable, the bigger the population standard deviation, the greater the sample size.
Testing a hypothesis with certain power.
Often the study aim is to compare two groups with respect to some variable, and a hypothesis test is used to assess the evidence against the null hypothesis of no difference between the groups. When we carry out a hypothesis test, two types of error can occur which will give us the wrong result:
- Type I error - we reject the null hypothesis when it is actually true. We have met this error before - the probability of it occurring is simply the significance level of the test. We shall denote this probability by α.
- Type II error - we fail to reject the null hypothesis when it is false. The probability of this occurring is denoted by β.
Clearly we want both α and β to be small. If we want β to be small, then it follows that 1-β needs to be large. This probability 1-β is known as the power of the study and can be thought of as the probability of detecting an effect as significant if it really exists. The bigger the sample size, the greater the power. If a study is under-powered and it yields a non-significant difference between groups, we do not know whether this is because there really is no true difference, or because the study failed to detect a true difference that exists. It is therefore essential to choose an adequate sample size so that the study has a small probability of type I error (usually 5%) and sufficient power (usually 80-90%).
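The link between sample size and power can be illustrated by simulation. The sketch below uses entirely hypothetical numbers (two groups with true prevalences of 20% and 30%, compared with a two-sided z-test at the 5% level) and estimates power as the fraction of simulated studies that reach significance:

```python
import math
import random

def simulated_power(n, p1, p2, sims=1000, seed=42):
    """Estimate power: the fraction of simulated studies (n per group,
    true proportions p1 and p2) giving a significant two-sided z-test
    at the 5% level. An illustrative sketch, not a production routine."""
    random.seed(seed)
    significant = 0
    for _ in range(sims):
        # Draw one simulated study: binary outcomes in each group.
        x1 = sum(random.random() < p1 for _ in range(n))
        x2 = sum(random.random() < p2 for _ in range(n))
        q1, q2 = x1 / n, x2 / n
        se = math.sqrt(q1 * (1 - q1) / n + q2 * (1 - q2) / n)
        if se > 0 and abs(q1 - q2) / se > 1.96:
            significant += 1
    return significant / sims

print(simulated_power(100, 0.20, 0.30))  # small study: low power
print(simulated_power(400, 0.20, 0.30))  # larger study: much higher power
```

The estimated power rises sharply from the smaller to the larger sample size, which is exactly the point: the bigger the sample size, the greater the power.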
The basic steps to compute the sample size based on power are:
- Make a guess of the prevalence/mean in the baseline group.
- Decide what is the minimum effect / difference you want to be able to detect. Think about what is clinically important.
- Specify the significance level you want the study to have, e.g. 5%.
- Specify the power you want the study to have, e.g. 90%.
- Plug these values into the appropriate formula.
Describe the basic steps to compute the sample size based on power.
The basic steps to compute the sample size based on power are:
- Make a guess of the prevalence/mean in the baseline group.
- Decide what is the minimum effect / difference you want to be able to detect. Think about what is clinically important.
- Specify the significance level you want the study to have, e.g. 5%.
- Specify the power you want the study to have, e.g. 90%.
- Plug these values into the appropriate formula.
Comparing two proportions.
We shall illustrate this by looking at how to compute the sample size for a study which aims to determine whether the proportion of a particular outcome differs between two groups. This may be the proportion with disease in the exposed compared with unexposed group in a cross-sectional or cohort study, or the proportion of people exhibiting the outcome of interest in the treatment group compared with placebo group in a clinical trial.
let: pi1 be the true proportion in group 1
pi2 be the true proportion in group 2
pi1-pi2 is the difference between the two groups.
The formula for the sample size for comparing two proportions is:
n = [ ( pi1(1-pi1) + pi2(1-pi2) ) / D^2 ] x F
where n is the number in each group, and F is a factor that depends on the power and significance level chosen; it can be obtained from standard tables.
We do not know pi1 and pi2, so we have to make a guess as to the prevalence in the baseline group (say pi1) and decide what difference we want to be able to detect (this is D). The value to put in for pi2 can then be deduced (pi1-D).
For example:
We want to determine whether the prevalence of lung cancer in the UK elderly population differs between men and women. We guess the prevalence in women will be about 4% and we would consider a 3% higher prevalence in men (i.e. 7%) to be a clinically important difference. We want to be 90% sure of getting a significant result if such a difference really existed (i.e. 90% power), and we define ‘significant’ by the 5% significance level (i.e. a 5% probability of a type I error is acceptable). Then F=10.5, and the number of subjects in each group will be:
n = [ ( 0.04 x (1-0.04) + 0.07 x (1-0.07) ) / 0.03^2 ] x 10.5 = 1208 in each group
Note: this formula is only suitable for studies with the same number of people in each group.
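A sketch of this calculation in Python (the function name is ours; F = 10.5 corresponds to 90% power at the 5% significance level, as in the example above):

```python
import math

def n_per_group_two_proportions(pi1, pi2, F):
    """Per-group sample size for detecting a difference between two
    proportions pi1 and pi2. F depends on the power and significance
    level (e.g. F = 10.5 for 90% power at the 5% level)."""
    D = pi1 - pi2
    return math.ceil((pi1 * (1 - pi1) + pi2 * (1 - pi2)) / D ** 2 * F)

print(n_per_group_two_proportions(0.04, 0.07, 10.5))  # lung cancer example: 1208
```

Note that, as in the text, this assumes two equal-sized groups.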
Applying the principles behind comparing two proportions to other measures of effect.
The same ideas can be applied to other study designs. When the measure of effect is the difference between two means, the following formula (based on a 5% significance level, i.e. α = 5%) is used:
n = [ 2σ^2 (1.96 + EB)^2 ] / D^2
where:
σ = SD of the outcome variable in the whole population
EB = 1.282 for power of 90% or 0.842 for power of 80%
D = difference in means between the 2 groups
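As a sketch of this formula (the example values σ = 10 and D = 5 are hypothetical; the EB values are those given above):

```python
import math

# EB values from the text: they depend on the desired power.
EB = {0.90: 1.282, 0.80: 0.842}

def n_per_group_two_means(sigma, D, power=0.90):
    """Per-group sample size for detecting a difference D between two
    means at the 5% significance level, given population SD sigma."""
    return math.ceil(2 * sigma ** 2 * (1.96 + EB[power]) ** 2 / D ** 2)

# Hypothetical: outcome SD 10, want to detect a difference of 5 units.
print(n_per_group_two_means(10, 5))             # 90% power: 85 per group
print(n_per_group_two_means(10, 5, power=0.80))  # 80% power: 63 per group
```

Dropping the required power from 90% to 80% noticeably reduces the sample size, at the cost of a higher chance of missing a real difference.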
For study designs where your measure of effect is a risk ratio (cross-sectional or cohort study) or odds ratio (cross-sectional, cohort or case-control), sample size calculations can be carried out in packages such as EpiInfo.
List the factors affecting sample size calculations based on power.
The factors which affect the sample size calculations in this section are:
- The power: the greater the required power (i.e. the smaller the probability of a type 2 error - failing to reject the null hypothesis when it is actually false), the more people are needed.
- The significance level: the smaller the significance level (i.e. smaller probability of type 1 error / rejecting the null hypothesis when it is actually true), the more people are needed.
- The outcome of interest. For binary outcomes, the closer the population proportions are to 0.5, the more people are needed. For continuous outcomes, the bigger the variation in the outcome, the bigger n.
- The size of the difference between the groups, D. The smaller the difference you want to detect, the more people are needed.
General points about sample size calculations.
Sample size calculations are only rough estimates of how many people you need as they are based on some degree of guessing the size of parameters.
- Try working out sample size for a number of different scenarios to give an idea of the possible scope of the study. It will help you decide whether the study is both feasible and worth doing.
- Remember to increase the sample size obtained from the calculation to allow for non-response, drop-outs, etc.
- The examples given have all assumed the comparison of two equal sized groups. However in some situations we may have different sized groups e.g. 1 case to 2 controls. In a case-control study, inclusion of more controls will give you more power, but little is to be gained by going beyond 4 controls per case.
What might be the problem with having too many people in a sample?
Waste of resources.