Sample size estimation and sampling Flashcards

Sound knowledge

1
Q

Sample size - general considerations (4)

A
  1. Precision of estimate (width of confidence interval) [typically +/- 0.05 or +/- 0.1] - ESTIMATING POPULATION PARAMETERS ONLY
  2. Expected variation in data (p*q or sigma squared)
  3. Level of confidence (descriptive study - how sure we want to be that the confidence interval includes the true population value; analytic study - how sure we want to be that any observed difference is not due to chance) [typically 95% i.e. Type I error rate of 5%]
  4. Power (how sure we want to be that we reject the null hypothesis when we should; more power = larger sample size) [typically 80% i.e. Type II error rate of 20%] - COMPARING PROPORTIONS/MEANS ONLY
2
Q

Sample size - estimating a proportion

A

$$ n = \frac{Z_{\alpha}^{2}\, p\, q}{L^{2}} $$

Defined by user:

  • p = a priori estimate of proportion, Note: maximum sample size at p=0.5
  • q = 1-p

Common/fixed values

  • L = precision [usually set to 0.05, or 0.1 if less precise], Note: more precise = larger required sample size
  • Z = value of Z required for desired confidence level, i.e. 95% or 1 - alpha [1.96]

Can make adjustments for:

  • finite populations (if sample size is >10% of total population)
  • imperfect tests (true prevalence vs apparent prevalence)
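
A minimal sketch of the formula on this card, without the finite-population or imperfect-test adjustments. The function name and the rounding up to a whole animal are illustrative assumptions, not part of the card.

```python
import math

def sample_size_proportion(p, precision, z=1.96):
    """n = Z^2 * p * q / L^2, rounded up to a whole unit.

    p: a priori estimate of the proportion
    precision: L, the allowable error (half-width of the CI)
    z: Z value for the confidence level (1.96 for 95%)
    """
    q = 1.0 - p
    return math.ceil(z**2 * p * q / precision**2)

# p = 0.5 gives the largest sample size for a given precision
print(sample_size_proportion(0.5, 0.05))  # 385
print(sample_size_proportion(0.2, 0.05))  # 246
print(sample_size_proportion(0.5, 0.10))  # 97
```

Halving L from 0.1 to 0.05 roughly quadruples the required sample size, because L is squared in the denominator.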
3
Q

Sample size - estimating a mean

A

$$ n = \frac{Z_{\alpha}^{2}\, \sigma^{2}}{L^{2}} $$

Defined by user:

  • a priori estimate of population variance (sigma squared) [sigma (sd) is typically estimated as (upper limit minus lower limit)/4, using the range expected to contain 95% of values; square it to get sigma squared] Note: larger variance = larger required sample size

Common/fixed values:

  • Precision [usually set to 0.05, or 0.1 if less precise], Note: more precise = larger required sample size
  • Value of Z required for desired confidence level, i.e. 95% or 1 - alpha [1.96]
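
A minimal sketch of the calculation on this card, including the range/4 rule of thumb for sigma. The function names and the example values are hypothetical.

```python
import math

def sigma_from_range(lower, upper):
    """Rule of thumb: sd is about (upper - lower) / 4 for a range
    expected to contain about 95% of values."""
    return (upper - lower) / 4

def sample_size_mean(sigma, precision, z=1.96):
    """n = Z^2 * sigma^2 / L^2, rounded up to a whole unit.
    precision (L) is in the same units as the measurement."""
    return math.ceil(z**2 * sigma**2 / precision**2)

# Hypothetical example: 95% of values expected between 20 and 60,
# and the mean is to be estimated to within +/- 2 units.
sigma = sigma_from_range(20, 60)      # 10.0
print(sample_size_mean(sigma, 2))     # 97
```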
4
Q

Sample size - comparing 2 independent proportions

A

Defined by user:

  • p1 = a priori estimate of proportion in group 1
  • p2 = a priori estimate of proportion in group 2 Note: the smaller the difference between p1 and p2 the larger the required sample size (for given power and confidence level)
  • p=(p1+p2)/2
  • q = 1-p

Common/fixed values:

  • Value of Z required for desired confidence level, i.e. 95% or 1 - alpha [1.96]
  • Value of Z required for desired power, i.e. 1 - beta [-0.84 for 80% power]
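
The card lists the inputs but not the formula. One commonly used form, assumed here rather than taken from the card, is n per group = [Zα·√(2pq) − Zβ·√(p1q1 + p2q2)]² / (p1 − p2)², with Zβ entered as a negative value to match the -0.84 above. A sketch under that assumption (function name hypothetical):

```python
import math

def sample_size_two_proportions(p1, p2, z_alpha=1.96, z_beta=-0.84):
    """Sample size per group for comparing two independent proportions.

    Assumed formula (not stated on the card):
    n = [Za*sqrt(2pq) - Zb*sqrt(p1*q1 + p2*q2)]^2 / (p1 - p2)^2
    z_beta is negative in this sign convention (-0.84 for 80% power).
    """
    p = (p1 + p2) / 2
    q = 1 - p
    numerator = (z_alpha * math.sqrt(2 * p * q)
                 - z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

print(sample_size_two_proportions(0.30, 0.15))  # 121 per group
# A smaller difference between p1 and p2 needs a larger sample
print(sample_size_two_proportions(0.30, 0.25))
```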
5
Q

Sample size - comparing 2 means

A

Defined by user:

  • a priori estimate of population mean in group 1 (mu1)
  • a priori estimate of population mean in group 2 (mu2) Note: the smaller the difference between mu1 and mu2 the larger the required sample size (for given power and confidence level)
  • a priori estimate of population variance (sigma squared) Note: larger variance = larger required sample size

Common/fixed values

  • Value of Z required for desired confidence level, i.e. 95% or 1 - alpha [1.96]
  • Value of Z required for desired power, i.e. 1 - beta [-0.84 for 80% power]
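
The card does not give the formula. A commonly used form, assumed here, is n per group = 2·(Zα − Zβ)²·σ² / (μ1 − μ2)², with Zβ negative to match the -0.84 above. A sketch under that assumption (function name and example values hypothetical):

```python
import math

def sample_size_two_means(mu1, mu2, sigma, z_alpha=1.96, z_beta=-0.84):
    """Sample size per group for comparing two means.

    Assumed formula (not stated on the card):
    n = 2 * (Za - Zb)^2 * sigma^2 / (mu1 - mu2)^2
    z_beta is negative in this sign convention (-0.84 for 80% power).
    """
    n = 2 * (z_alpha - z_beta) ** 2 * sigma**2 / (mu1 - mu2) ** 2
    return math.ceil(n)

# Hypothetical example: detect a difference of 5 units when sd = 10
print(sample_size_two_means(50, 45, 10))  # 63 per group
```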
6
Q

Sample size - detecting disease

A

$$ n = \left(1 - \alpha^{1/D}\right)\left[N - \tfrac{1}{2}(D - 1)\right] $$

Defined by user

  • Population size, N. Note: larger population size = larger required sample size
  • D, number of diseased animals (population size * minimum expected prevalence), Note: smaller minimum expected prevalence = larger required sample size

Common/fixed values

  • Alpha, 1 - confidence level (usually set to 0.05)

Can make adjustments for:

  • imperfect tests (e.g. if sensitivity is 80%, use D × 0.8, i.e. 0.8 of the expected number of diseased animals)
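
A minimal sketch of the formula on this card, n = (1 − α^(1/D))·[N − (D − 1)/2], with the sensitivity adjustment applied by scaling D. The function name, the rounding up and the example values are illustrative assumptions.

```python
import math

def sample_size_detect_disease(population_size, min_prevalence,
                               alpha=0.05, sensitivity=1.0):
    """Sample size needed to detect at least one diseased animal.

    D = N * minimum expected prevalence, scaled by test sensitivity
    (a simple adjustment for an imperfect test).
    """
    d = population_size * min_prevalence * sensitivity
    n = (1 - alpha ** (1 / d)) * (population_size - (d - 1) / 2)
    return math.ceil(n)

# Hypothetical herd of 1000 animals
print(sample_size_detect_disease(1000, 0.05))                    # 57
print(sample_size_detect_disease(1000, 0.05, sensitivity=0.8))   # 71
print(sample_size_detect_disease(1000, 0.01))                    # lower prevalence, larger n
```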
21
Q

Probability (random) sampling - advantages, types

A

Every unit in the source population has a chance (non-zero) of being selected in the sample, and this probability can be accurately determined. Suitable for descriptive studies (surveys).

Advantages:

  1. Provides best chance for selecting representative sample (avoids selection bias).
  2. Possible to calculate how reliable survey results are (i.e. formulas for inferring prevalence and calculating 95% CI assume random sampling).

Types

  1. Simple random sampling (SRS)
  2. Systematic sampling
  3. Stratified sampling
  4. Cluster sampling
  5. Multistage
  6. Targeted
22
Q

Simple random sampling - definition, methods (2), disadvantage (1)

A

(Type of probability sampling)

Every subject in source population has equal probability of being selected.

Methods:

  1. Physical randomization e.g. drawing numbers from hat (all ear tag numbers in hat, draw number needed for sample)
  2. Random numbers e.g. make a list of all animals/herds and number them consecutively, use a random number table or computer-generated random numbers to select numbers, then find the herds/villages corresponding to each random number

Disadvantages: impractical if animals are not already identified
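
The random-numbers method can be sketched with Python's standard `random` module. The ear tag list and the function name are hypothetical; the seed is only there to make the draw repeatable.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n units without replacement; every unit in the
    sampling frame has the same probability of selection."""
    rng = random.Random(seed)
    return rng.sample(sampling_frame, n)

# Hypothetical sampling frame: 200 identified animals
ear_tags = [f"TAG{i:03d}" for i in range(1, 201)]
chosen = simple_random_sample(ear_tags, 20, seed=42)
print(sorted(chosen))
```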

23
Q

Systematic random sampling - definition, method, advantages/disadvantages

A

(Type of probability sampling)

Used when animals are not individually identified but can be ordered in some way. The first animal is selected at random (the ith animal), and every jth animal is selected thereafter. The sampling interval (j) is calculated as the source population size divided by the required sample size. Appropriate when an estimate of the total number of animals in the source population is available and the animals (or their records) are accessible, e.g. running animals through a chute and selecting every 10th animal.

Advantages: Practical, especially if animals are not already identified

Disadvantages: Bias might arise if factor being studied is related to sampling interval
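
A minimal sketch of the method, assuming the animals can be represented as an ordered list. The function name and the integer rounding of the sampling interval are illustrative choices.

```python
import random

def systematic_sample(ordered_units, n, seed=None):
    """Random start, then every jth unit, where
    j = population size // required sample size."""
    rng = random.Random(seed)
    interval = len(ordered_units) // n
    if interval < 1:
        raise ValueError("sample size exceeds population size")
    start = rng.randrange(interval)  # random start within the first interval
    return ordered_units[start::interval][:n]

# Hypothetical example: 100 animals through a chute, 10 needed
animals = list(range(1, 101))
picked = systematic_sample(animals, 10, seed=1)
print(picked)  # every 10th animal from a random start
```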

24
Q

Stratified random sampling - method and advantages

A

Method:

  1. Source population is divided into mutually exclusive strata (e.g. climate zone, enterprise type, mob)
  2. Simple or systematic random sample is drawn from each stratum

Simplest form is proportional sampling, where the number selected from each stratum is proportional to the number of individuals in that stratum in the source population.

Advantages:

  1. Ensures all strata are represented, enabling stratum-specific estimation (note: overall precision of estimates is maintained when using stratification; however, precision within each stratum is much lower. If a certain precision is needed in each stratum, calculate the sample size separately for each stratum, then combine the stratum results to give the overall estimate)
  2. Operationally convenient - can do survey in stages
  3. Can produce more precise results - within each stratum the variability is much less, so when the results are combined the overall variability is less and precision is greater
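
Proportional stratified sampling can be sketched as follows. The strata names, sizes and function name are hypothetical. Note that rounding each stratum's allocation can make the total differ slightly from the target.

```python
import random

def proportional_stratified_sample(strata, n, seed=None):
    """Simple random sample within each stratum, with the number
    drawn proportional to the stratum's share of the population."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / total)  # proportional allocation
        sample[name] = rng.sample(units, n_h)
    return sample

# Hypothetical source population split by climate zone
strata = {
    "arid": [f"arid-{i}" for i in range(500)],
    "temperate": [f"temperate-{i}" for i in range(300)],
    "tropical": [f"tropical-{i}" for i in range(200)],
}
result = proportional_stratified_sample(strata, 50, seed=7)
print({zone: len(units) for zone, units in result.items()})
# {'arid': 25, 'temperate': 15, 'tropical': 10}
```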
25
Q

Cluster sampling - method, advantage

A

(Type of probability sample)

Used when it is difficult to get a sampling frame for individual animals.

Method:

  1. Source population occurs in natural groupings (clusters) e.g. herd/village
  2. Simple or systematic random sample is used to determine which cluster(s) are to be included in the sample
  3. ALL animals within the selected cluster(s) are included in the sample

Advantages: practical

Note: It is not a cluster random sample if the unit of analysis is at the group level e.g. study to assess whether herds are infected with particular disease agent
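
A minimal sketch of the method above, using simple random sampling to choose the clusters. The herd names, herd sizes and function name are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select clusters, then include ALL units in each
    selected cluster. Only a list of clusters is needed as the
    sampling frame, not a list of individual animals."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical herds of different sizes
herds = {f"herd{h}": [f"herd{h}-cow{i}" for i in range(10 + h)]
         for h in range(1, 21)}
selected = cluster_sample(herds, 3, seed=5)
for name, animals in selected.items():
    print(name, len(animals))  # every animal in the herd is sampled
```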

26
Q

Multistage sampling - method, advantages, disadvantages

A

Method:

  1. Primary sampling unit (PSU) selected first (e.g. herds) [same as for cluster sample]
  2. Secondary sampling unit selected (e.g. animals in that herd).

Two main ways to ensure each individual has same probability of being selected:

  1. Probability proportional to size (PPS) - villages/herds selected with a probability proportional to herd size, then fixed number of animals selected from each herd using SRS (possible only if complete sampling frame is available on all herds/villages AND reliable livestock population data exists)[advantage: simplifies field work since number of animals to be sampled is known prior to visiting village/herd]
  2. Simple random sampling (SRS) – every village/herd has same probability of being selected, fixed proportion of animals selected from each (suitable when a complete sampling frame is available on all herds/villages BUT no reliable livestock population data exists) [disadvantage: field work more difficult since number of animals to be sampled is unknown prior to visiting village/herd]

Also possible to select fixed number of flocks, fixed number of animals or proportional flocks, proportional number of animals and then adjust in analysis

Advantages: simplifies field work (fewer villages to visit vs simple random sample)

Disadvantages: more complex survey design and analysis
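
The PPS approach can be sketched as two stages: herds drawn with probability proportional to size, then a fixed number of animals per herd by SRS. This sketch draws herds with replacement via `random.choices`, which is a simplifying assumption (a large herd can be drawn more than once); the herd data and function name are hypothetical.

```python
import random

def pps_two_stage_sample(herds, n_herds, n_per_herd, seed=None):
    """Stage 1: select herds with probability proportional to size
    (with replacement). Stage 2: SRS of a fixed number of animals
    within each selected herd."""
    rng = random.Random(seed)
    names = sorted(herds)
    sizes = [len(herds[name]) for name in names]
    drawn = rng.choices(names, weights=sizes, k=n_herds)
    return [(name, rng.sample(herds[name], n_per_herd)) for name in drawn]

# Hypothetical herds with 20 to 115 animals each
herds = {f"herd{h}": [f"herd{h}-cow{i}" for i in range(20 + 5 * h)]
         for h in range(20)}
stage_two = pps_two_stage_sample(herds, n_herds=5, n_per_herd=10, seed=11)
for name, animals in stage_two:
    print(name, len(animals))  # 10 animals from each selected herd
```

Because the number of animals per herd is fixed in advance, the field workload is known before visiting each herd, which is the advantage noted for PPS above.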

27
Q

Targeted (risk-based) sampling - method, advantages, disadvantages

A

Method:

  1. Source population divided into categories based on probability of disease occurrence - assigned point values
  2. Sample drawn from high risk strata only (or weighting the sample heavily in favour of the high risk strata). i.e. some animals have a zero probability of being included.

Advantages: requires smaller sample sizes (good if outcome of interest is rare)

Disadvantages: Only possible to make population inferences if we have:

  1. Estimate of how the characteristic used to divide the population into risk-based strata relates to probability of disease (risk ratio)
  2. Estimate of frequency of characteristic in population e.g. if we know the likelihood that an animal with neurologic signs has BSE, and the prevalence of neurologic signs in the population, we can estimate the level of BSE in the population by sampling only animals with neurologic signs.

E.g. BSE: highest point value for clinically suspect [above 30 mo] > downers/emergency slaughter [30 mo] > fallen stock (DOA) [30 mo] > routine slaughter [above 36 mo.]

28
Q

Non-probability sampling - definition, types (3)

A

Samples drawn without an explicit method for determining an individual's probability of selection (i.e. sample drawn without a formal process for random selection). Not suitable for descriptive studies (surveys) since the methods cannot reliably select a representative sample and may therefore be biased. Often used for analytical studies.

Types:

  1. Convenience sample
  2. Purposive sample
  3. Judgement sample
29
Q

Judgement sample

A

(Type of non-probability sample)

Investigator selects sample because in his/her opinion the sample is “representative” of the source population.

Disadvantages: criteria are implicit, not explicit

30
Q

Convenience sample

A

(Type of non-probability sample)

Chosen because it is easy, quick or inexpensive to obtain e.g. proximity to research facility, first 10 cows to arrive at milking shed. Suitable when the need for the study sample to be representative of the source population can be relaxed.

Advantage: convenient

Disadvantage: potential for bias

31
Q

Purposive sampling

A

(Type of non-probability sample)

Sample is based on the study subjects possessing one or more attributes (e.g. known exposure, disease status - as in cohort/case-control studies). Becomes a probability sample if a random sample of subjects who meet this criterion is selected from the source population.