Sample size estimation and sampling Flashcards
Sound knowledge
Sample size - general considerations (4)
- Precision of estimate (width of confidence interval) [typically +/- 0.05 or +/- 0.1] - ESTIMATING POPULATION PARAMETERS ONLY
- Expected variation in data (p*q or sigma squared)
- Level of confidence (descriptive study - how sure we want to be that the confidence interval includes the true population value; analytic study - how sure we want to be that any observed difference is not due to chance) [typically 95% i.e. Type 1 error rate of 5%]
- Power (how sure we want to be that we reject the null hypothesis when we should; more power = larger sample size) [typically 80% i.e. Type II error rate of 20%] - COMPARING PROPORTIONS/MEANS ONLY
Sample size - estimating a proportion
$n = \dfrac{Z_{\alpha}^{2}\, p\, q}{L^{2}}$
Defined by user:
- p = a priori estimate of proportion, Note: maximum sample size at p=0.5
- q = 1-p
Common/fixed values
- L = precision [usually set to 0.05, or 0.1 if less precise], Note: more precise = larger required sample size
- Z = value of Z required for desired confidence level, i.e. 95% or 1 - alpha [1.96]
Can make adjustments for:
- finite populations (if sample size is >10% of total population)
- imperfect tests (true prevalence vs apparent prevalence)
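A minimal Python sketch of the formula on this card, without the finite-population or imperfect-test adjustments. The function name and the choice to round up to a whole animal are illustrative assumptions, not part of the card.

```python
import math

def sample_size_proportion(p, L=0.05, z=1.96):
    """n = Z^2 * p * q / L^2, rounded up to a whole sampling unit.

    p: a priori estimate of the proportion
    L: precision (half-width of the confidence interval)
    z: Z value for the desired confidence level (1.96 for 95%)
    """
    q = 1 - p
    return math.ceil(z**2 * p * q / L**2)

print(sample_size_proportion(0.5))          # p = 0.5 gives the largest n
print(sample_size_proportion(0.2))          # smaller p*q, smaller n
print(sample_size_proportion(0.5, L=0.1))   # less precise, smaller n
```

Running it with p = 0.5 versus p = 0.2 shows why p = 0.5 is the conservative choice when no prior estimate is available.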
Sample size - estimating a mean
$n = \dfrac{Z_{\alpha}^{2}\, \sigma^{2}}{L^{2}}$
Defined by user:
- a priori estimate of population variance (sigma squared) [typically estimated as (upper limit minus lower limit of the range expected to contain 95% of values)/4 (= sigma, sd), then squared to get sigma squared] Note: larger variance = larger required sample size
Common/fixed values:
- Precision [usually set to 0.05, or 0.1 if less precise], Note: more precise = larger required sample size
- Value of Z required for desired confidence level, i.e. 95% or 1 - alpha [1.96]
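A short Python sketch of this card, assuming the range/4 rule of thumb for sigma described above and a precision L expressed in the same units as the measurement. Function names and the example values are hypothetical.

```python
import math

def sigma_from_range(lower, upper):
    """Rule of thumb: sd is roughly (upper - lower) / 4 for a range
    expected to cover about 95% of values."""
    return (upper - lower) / 4

def sample_size_mean(sigma, L, z=1.96):
    """n = Z^2 * sigma^2 / L^2, rounded up to a whole sampling unit."""
    return math.ceil(z**2 * sigma**2 / L**2)

# Hypothetical example: most values expected between 40 and 60 units
sigma = sigma_from_range(40, 60)
print(sigma)
print(sample_size_mean(sigma, L=1.0))
print(sample_size_mean(sigma, L=0.5))   # halving L roughly quadruples n
```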
Sample size - comparing 2 independent proportions
Defined by user:
- p1 = a priori estimate of proportion in group 1
- p2 = a priori estimate of proportion in group 2 Note: the smaller the difference between p1 and p2 the larger the required sample size (for given power and confidence level)
- p=(p1+p2)/2
- q = 1-p
Common/fixed values:
- Value of Z required for desired confidence level, i.e. 95% or 1 - alpha [1.96]
- Value of Z required for desired power, i.e. 1 - beta [-0.84]
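The card lists the inputs but not the formula itself. One commonly used form that matches these inputs (including the negative Z value for power) is n = [Z_alpha * sqrt(2pq) - Z_beta * sqrt(p1*q1 + p2*q2)]^2 / (p1 - p2)^2 per group. The Python sketch below assumes that form; treat it as an illustration rather than the formula the card's author necessarily had in mind.

```python
import math

def sample_size_two_proportions(p1, p2, z_alpha=1.96, z_beta=-0.84):
    """Sample size per group for comparing two independent proportions.

    Assumed formula (not stated on the card):
    n = [Za*sqrt(2pq) - Zb*sqrt(p1*q1 + p2*q2)]^2 / (p1 - p2)^2
    """
    p = (p1 + p2) / 2
    q = 1 - p
    numerator = (z_alpha * math.sqrt(2 * p * q)
                 - z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

print(sample_size_two_proportions(0.30, 0.15))
print(sample_size_two_proportions(0.30, 0.25))  # smaller difference, larger n
```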
Sample size - comparing 2 means
Defined by user:
- a priori estimate of population mean in group 1 (mu1)
- a priori estimate of population mean in group 2 (mu2) Note: the smaller the difference between mu1 and mu2 the larger the required sample size (for given power and confidence level)
- a priori estimate of population variance (sigma squared) Note: larger variance = larger required sample size
Common/fixed values
- Value of Z required for desired confidence level, i.e. 95% or 1 - alpha [1.96]
- Value of Z required for desired power, i.e. 1 - beta [-0.84]
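As with the proportions card, the formula is not written out here. A commonly used form consistent with the listed inputs is n = 2 * (Z_alpha - Z_beta)^2 * sigma^2 / (mu1 - mu2)^2 per group. The Python sketch below assumes that form and a common variance in both groups; the example values are hypothetical.

```python
import math

def sample_size_two_means(mu1, mu2, sigma, z_alpha=1.96, z_beta=-0.84):
    """Sample size per group for comparing two means.

    Assumed formula (not stated on the card):
    n = 2 * (Za - Zb)^2 * sigma^2 / (mu1 - mu2)^2
    """
    n = 2 * (z_alpha - z_beta) ** 2 * sigma ** 2 / (mu1 - mu2) ** 2
    return math.ceil(n)

print(sample_size_two_means(50, 45, sigma=10))
print(sample_size_two_means(50, 48, sigma=10))  # smaller difference, larger n
print(sample_size_two_means(50, 45, sigma=20))  # larger variance, larger n
```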
Sample size - detecting disease
$n = \left(1 - \alpha^{1/D}\right)\left(N - \dfrac{D - 1}{2}\right)$
Defined by user
- Population size, N. Note: larger population size = larger required sample size
- D, number of diseased animals (population size * minimum expected prevalence), Note: smaller minimum expected prevalence = larger required sample size
Common/fixed values
- Alpha, 1 - confidence level (usually set to 0.05)
Can make adjustments for:
- imperfect tests (e.g. if test sensitivity is 80%, use 0.8 of the expected number of diseased animals, i.e. 0.8 * D in place of D)
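A Python sketch of the detection formula on this card. The optional sensitivity argument scales D by the test sensitivity, which is one reading of the imperfect-test adjustment and should be treated as an assumption. Function and parameter names are illustrative.

```python
import math

def sample_size_detect_disease(N, min_prevalence, alpha=0.05, sensitivity=1.0):
    """n = (1 - alpha^(1/D)) * (N - (D - 1) / 2), rounded up.

    N: population size
    min_prevalence: minimum expected prevalence (D = N * min_prevalence)
    sensitivity: assumed adjustment, D is multiplied by test sensitivity
    """
    D = N * min_prevalence * sensitivity
    n = (1 - alpha ** (1 / D)) * (N - (D - 1) / 2)
    return math.ceil(n)

print(sample_size_detect_disease(1000, 0.05))
print(sample_size_detect_disease(1000, 0.01))                   # lower prevalence, larger n
print(sample_size_detect_disease(1000, 0.05, sensitivity=0.8))  # imperfect test, larger n
```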
Probability (random) sampling - advantages, types
Every unit in the source population has a chance (non-zero) of being selected in the sample, and this probability can be accurately determined. Suitable for descriptive studies (surveys).
Advantages:
- Provides best chance for selecting representative sample (avoids selection bias).
- Possible to calculate how reliable survey results are (i.e. formulas for inferring prevalence and calculating 95% CI assume random sampling).
Types
- Simple random sampling (SRS)
- Systematic sampling
- Stratified sampling
- Cluster sampling
- Multistage
- Targeted
Simple random sampling - definition, methods (2), disadvantage (1)
(Type of probability sampling)
Every subject in source population has equal probability of being selected.
Methods:
- Physical randomization e.g. drawing numbers from hat (all ear tag numbers in hat, draw number needed for sample)
- Random numbers e.g. make list of all animals/herds and number consecutively, use a random number table or computer-generated random numbers to pick numbers, then find the herds/villages corresponding to those numbers
Disadvantages: impractical if animals are not already identified
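The random-numbers method can be sketched in Python as follows, assuming a complete list of ear tag numbers is already available. The tag format and herd size are made up for the example.

```python
import random

def simple_random_sample(ear_tags, n, seed=None):
    """Draw n distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)   # seed only to make the example repeatable
    return rng.sample(ear_tags, n)

# Hypothetical sampling frame: 200 animals numbered consecutively
herd = [f"TAG{i:04d}" for i in range(1, 201)]
chosen = simple_random_sample(herd, 20, seed=1)
print(chosen)
```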
Systematic random sampling - definition, method, advantages/disadvantages
(Type of probability sampling)
Used when animals are not individually identified but can be ordered in some way. The first animal is selected at random (the ith animal), then every jth animal after it is selected. The sampling interval (j) is calculated as the source population size divided by the required sample size. Appropriate when an estimate of the total number of animals in the source population is available and the animals (or their records) are accessible, e.g. running animals through a chute and selecting every 10th animal.
Advantages: Practical, especially if animals are not already identified
Disadvantages: Bias might arise if factor being studied is related to sampling interval
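A small Python sketch of the method, assuming animals pass in a fixed order and are counted from 1. The function name is illustrative. When the population size is not an exact multiple of the sample size, the list is truncated, which slightly changes selection probabilities for the last few animals.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Return the positions (1-based) of animals to select."""
    rng = random.Random(seed)
    j = population_size // sample_size   # sampling interval
    i = rng.randint(1, j)                # random start within the first interval
    positions = list(range(i, population_size + 1, j))
    return positions[:sample_size]

# Hypothetical example: 200 animals through the chute, 20 needed
positions = systematic_sample(200, 20, seed=3)
print(positions)
```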
Stratified random sampling - method and advantages
Method:
- Source population is divided into mutually exclusive strata (e.g. climate zone, enterprise type, mob)
- Simple or systematic random sample is drawn from each stratum
Simplest form is proportional sampling where the number selected from each stratum is proportional to the number of individuals in that stratum in the source population.
Advantages:
- Ensures all strata are represented, enabling stratum-specific estimation (note overall precision of estimates is maintained when using stratification, however precision in each stratum is much lower. If a certain precision is needed in each stratum, calculate the sample size separately for each stratum, then combine the stratum results to give the overall estimate)
- Operationally convenient - can do survey in stages
- Can produce more precise results - within each stratum the variability is much lower, so when the results are combined the overall variability is lower and precision is greater
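Proportional allocation can be sketched in Python as below, assuming each stratum is available as a list of units. The strata names and sizes are invented for the example, and rounding means the total drawn can differ slightly from the target when the proportions do not divide evenly.

```python
import random

def proportional_stratified_sample(strata, total_n, seed=None):
    """Draw a simple random sample from each stratum, sized in
    proportion to that stratum's share of the source population."""
    rng = random.Random(seed)
    population = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_stratum = round(total_n * len(units) / population)
        sample[name] = rng.sample(units, n_stratum)
    return sample

# Hypothetical strata by enterprise type
strata = {
    "dairy": [f"D{i}" for i in range(600)],
    "beef": [f"B{i}" for i in range(300)],
    "mixed": [f"M{i}" for i in range(100)],
}
sample = proportional_stratified_sample(strata, total_n=50, seed=7)
print({name: len(units) for name, units in sample.items()})
```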
Cluster sampling - method, advantage
(Type of probability sample)
Used when it is difficult to get a sampling frame for individual animals.
Method:
- Source population occurs in natural groupings (clusters) e.g. herd/village
- Simple or systematic random sample is used to determine which cluster(s) are to be included in the sample
- ALL animals within the selected cluster(s) are included in the sample
Advantages: practical
Note: It is not a cluster random sample if the unit of analysis is at the group level e.g. study to assess whether herds are infected with particular disease agent
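A Python sketch of the cluster method, assuming only a list of herds (not of individual animals) is needed up front. Herd names and sizes are hypothetical.

```python
import random

def cluster_sample(herds, n_clusters, seed=None):
    """Randomly select clusters, then include every animal in each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(herds), n_clusters)   # sampling frame = herd names
    return {name: list(herds[name]) for name in chosen}

# Hypothetical source population: 6 herds of different sizes
herds = {f"herd_{k}": [f"herd_{k}_animal_{i}" for i in range(size)]
         for k, size in enumerate([12, 30, 8, 45, 20, 15])}
selected = cluster_sample(herds, n_clusters=2, seed=5)
for name, animals in selected.items():
    print(name, len(animals))
```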
Multistage sampling - method, advantages, disadvantages
Method:
- Primary sampling unit (PSU) selected first (e.g. herds) [same as for cluster sample]
- Secondary sampling unit selected (e.g. animals in that herd).
Two main ways to ensure each individual has same probability of being selected:
- Probability proportional to size (PPS) - villages/herds selected with a probability proportional to herd size, then fixed number of animals selected from each herd using SRS (possible only if complete sampling frame is available on all herds/villages AND reliable livestock population data exists)[advantage: simplifies field work since number of animals to be sampled is known prior to visiting village/herd]
- Simple random sampling (SRS) – every village/herd has same probability of being selected, fixed proportion of animals selected from each (suitable when a complete sampling frame is available on all herds/villages BUT no reliable livestock population data exists) [disadvantage: field work more difficult since number of animals to be sampled is unknown prior to visiting village/herd]
Also possible to select a fixed number of flocks and a fixed number of animals per flock, or a proportion of flocks and a proportion of animals per flock, and then adjust for the unequal selection probabilities in the analysis
Advantages: simplifies field work (fewer villages to visit vs simple random sample)
Disadvantages: more complex survey design and analysis
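The Python sketch below illustrates why both designs give each animal the same overall selection probability. It multiplies the stage-one and stage-two probabilities for herds of different sizes. The PPS herd probability is approximated as (number of herds selected) * (herd size) / (total animals), which is an assumption that holds only while that value stays below 1. All herd sizes are hypothetical.

```python
def pps_probability(herd_size, total_animals, n_herds, animals_per_herd):
    """PPS: herd chosen in proportion to size, fixed number of animals per herd."""
    p_herd = n_herds * herd_size / total_animals   # approximate stage-one probability
    p_animal = animals_per_herd / herd_size        # stage-two probability within herd
    return p_herd * p_animal

def srs_probability(n_herds, total_herds, sampling_fraction):
    """SRS of herds, then a fixed proportion of animals in each selected herd."""
    return (n_herds / total_herds) * sampling_fraction

herd_sizes = [50, 100, 150, 200]
total = sum(herd_sizes)

# PPS: the herd-size terms cancel, so every animal has the same probability
pps = [pps_probability(size, total, n_herds=2, animals_per_herd=10)
       for size in herd_sizes]
print([round(p, 4) for p in pps])

# SRS: herd size never enters the calculation
srs = srs_probability(n_herds=2, total_herds=len(herd_sizes), sampling_fraction=0.1)
print(round(srs, 4))
```

Under PPS the number of animals sampled per herd is fixed in advance (10 here), whereas under SRS it depends on the herd size found on arrival (10% of the herd), which matches the field-work trade-off described above.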
Targeted (risk-based) sampling - method, advantages, disadvantages
Method:
- Source population divided into categories based on probability of disease occurrence - assigned point values
- Sample drawn from high risk strata only (or weighting the sample heavily in favour of the high risk strata). i.e. some animals have a zero probability of being included.
Advantages: requires smaller sample sizes (good if outcome of interest is rare)
Disadvantages: Only possible to make population inferences if we have:
- Estimate of how the characteristic used to divide population into risk-based strata relates to probability of disease (risk ratio)
- Estimate of frequency of characteristic in population e.g. if we know the likelihood that an animal with neurologic signs has BSE, and the prevalence of neurologic signs in the population, we can estimate the level of BSE in the population by sampling only animals with neurologic signs.
E.g. BSE: highest point value for clinically suspect [above 30 mo] > downers/emergency slaughter [30 mo] > fallen stock (DOA) [30 mo] > routine slaughter [above 36 mo.]
Non-probability sampling - definition, types (3)
Samples drawn without an explicit method for determining an individual's probability of selection (i.e. sample drawn without a formal process for random selection). Not suitable for descriptive studies (surveys) since the methods are unable to reliably select a representative sample and may therefore be biased. Often used for analytical studies.
Types:
- Convenience sample
- Purposive sample
- Judgement sample
Judgement sample
(Type of non-probability sample)
Investigator selects sample because in his/her opinion the sample is “representative” of the source population.
Disadvantages: criteria are implicit, not explicit