Sampling Flashcards

1
Q

Epidemiological approaches require us to gather information about ___________.

A

populations

(health of a group; can’t test everyone —> sample)

2
Q

What are the main 3 approaches to epidemiological studies?

A
  1. establish level of occurrence of a disease - prevalence
  2. test hypotheses - changes in levels of occurrence in different populations, identification of risk factors
  3. detection of disease - declare freedom from disease for trade
3
Q

What is the difference between a census and a sample?

A

CENSUS = every individual in the population is sampled giving an exact measurement of what is going on without the need for statistics ($$$)

SAMPLE = group from the population is used to make inferences about the population based on information from the sample (quick, $, practical)

4
Q

What is a target population? Source population? Sample?

A

TARGET POP = population you want to make inferences about with respect to your objectives (feral cats in NA)

SOURCE POP = subset of the target population from which you will draw your sample (feral cats on LI)

SAMPLE = individuals actually measured or enrolled in the study (feral cats spayed or neutered by LIU’s team in 2021)

5
Q

What is the difference between internal validity and external validity?

A

INTERNAL = whether or not the study results obtained from the study sample are valid for the source population

EXTERNAL = how well the study results can be generalized to the target population; subjective assessment of whether or not the source population and sample are broadly representative of the target population (are the results from the US relevant to a reader in Australia?)

6
Q

What is a sampling frame? Sampling unit?

A

SAMPLING FRAME = complete list of all sampling units in the source population; required for most probability sampling methods

SAMPLING UNIT = unit of interest (animals, litters, pens, farms)

7
Q

What are the 2 approaches to sampling?

A
  1. probability sampling
  2. non-probability sampling
8
Q

What is probability sampling?

A

formal process of random selection used to ensure representativeness of the sample where every element in the population has a known probability of being included in the sample

  • statistical inference can be made about the population from which you drew your sample
9
Q

What are the 5 types of probability sampling?

A
  1. simple random
  2. systematic random
  3. stratified
  4. cluster
  5. multistage
10
Q

What is the main characteristic of probability sampling?

A

every element in the population has a known non-zero probability of being included in the sample

11
Q

What is simple random sampling (SRS)? What does it require?

A

basis of most probability sampling methods, where every sampling unit has an equal probability of being included in the sample

list of all subjects (sampling frame)

12
Q

What are the 3 steps to the procedure of simple random sampling (SRS)?

A
  1. generate a list of animals
  2. assign a unique number (ID) to each animal
  3. generate random numbers (Excel or another random number generator) and sample the animals whose IDs are drawn
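
The three steps above can be sketched in Python as an alternative to Excel. This is a minimal illustration, not a prescribed procedure: the function name `simple_random_sample`, the frame of 200 IDs, and the sample size of 20 are all hypothetical.

```python
import random

def simple_random_sample(animal_ids, sample_size, seed=None):
    """Draw a simple random sample (without replacement) from a sampling frame."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(list(animal_ids), sample_size)

# Steps 1 and 2: hypothetical sampling frame of 200 animals with unique IDs 1..200
frame = range(1, 201)

# Step 3: randomly draw 20 IDs; every animal has the same chance of selection
chosen = simple_random_sample(frame, 20, seed=42)
print(sorted(chosen))
```

Recording the seed alongside the study notes lets someone else reproduce the same selection.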
13
Q

What are the main 2 limitations of simple random sampling (SRS)?

A
  1. difficult to obtain a complete sampling frame for large groups of animals, and to actually locate and sample the selected animals
  2. costs may be higher than other methods
14
Q

How does systematic random sampling compare to simple random sampling?

A
  • does not require a sampling frame, but all animals need to be available in sequence (processing line, appointment book, file)
  • the exact number of sampling units does not need to be known, but a rough estimate is needed to set the sampling interval
15
Q

What are the 3 steps to the procedure of systematic random sampling?

A
  1. select the sampling interval, j (e.g., j = 20)
  2. randomly select a starting number within the first sampling interval (e.g., 7)
  3. sample every jth animal after that (7, 27, 47, …)
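
A short Python sketch of the same procedure, assuming animals pass by in a fixed order and are counted from 1. The function name `systematic_sample` and the figures (100 animals, interval of 20) are hypothetical.

```python
import random

def systematic_sample(n_animals, interval, seed=None):
    """Return the positions to sample: a random start, then every interval-th animal."""
    rng = random.Random(seed)
    start = rng.randint(1, interval)  # random start within the first interval (inclusive)
    return list(range(start, n_animals + 1, interval))

# Hypothetical example: about 100 animals expected, sampling interval j = 20
positions = systematic_sample(n_animals=100, interval=20, seed=1)
print(positions)
```

If the random start happens to be 7, the positions are 7, 27, 47, 67, 87, matching the pattern on the card.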
16
Q

What is the main limitation of systematic random sampling?

A

can be biased if the sampling interval coincides with a pattern in the order of the animals

  • if there are 5 pens and the producer takes one cow at a time from each pen in a sequence for them to run through a chute, you shouldn’t sample every 5th because only cows from one pen will be tested
17
Q

What is stratified random sampling?

A

a sampling frame is divided into groups, or strata, of defined common characteristics (breed, sex, age) and randomized within each stratum

18
Q

What ensures each stratum in stratified random sampling is present and properly represented?

A

proportional allocation: the number sampled from each stratum is proportional to that stratum's share of the total population
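
One way to picture proportional allocation is the Python sketch below. It assumes each stratum is available as a list of IDs; the function name `stratified_sample` is hypothetical, and the numbers (1000 cats, 70 of them Siamese, sample of 100) echo the cat example used elsewhere in this deck.

```python
import random

def stratified_sample(strata, total_sample_size, seed=None):
    """strata maps a stratum name to its list of unit IDs.

    Each stratum gets a share of the sample proportional to its size,
    then units are drawn at random within the stratum.
    """
    rng = random.Random(seed)
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Rounded proportional share; with awkward proportions the rounded
        # shares may not add up exactly to total_sample_size.
        n_stratum = round(total_sample_size * len(units) / population_size)
        sample[name] = rng.sample(units, n_stratum)
    return sample

strata = {
    "Siamese": list(range(1, 71)),      # 70 cats
    "DSH/DLH": list(range(71, 1001)),   # 930 cats
}
sample = stratified_sample(strata, 100, seed=0)
print({name: len(ids) for name, ids in sample.items()})
```

With these inputs the allocation is 7 Siamese and 93 DSH/DLH cats, so the breed of interest is always represented in proportion.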

19
Q

In what situation should stratified random sampling occur? Why would it be better than simple random sampling?

A

if it is suspected that a certain breed (Siamese) has a higher prevalence of disease than other breeds (DLH/DSH), a fair representation of breeds in the study is needed to prevent bias in the estimate of prevalence

  • population of 1000 cats, ~70 Siamese
  • sample size of 100 cats, ~7 Siamese in study
  • SRS = 10% chance of having 4 or less Siamese and 10% chance of having 10 or more Siamese
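
The percentages on this card are rounded. How far the number of Siamese in a simple random sample can stray from 7 can be checked with the hypergeometric distribution, which models drawing without replacement. The sketch below is illustrative only; the helper names are hypothetical, and the result depends on the assumed model, so it may not match the card's figures exactly.

```python
from math import comb

def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k 'successes' in the sample) when drawing without replacement."""
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))

def prob_at_most(k_max, population, successes, draws):
    return sum(hypergeom_pmf(k, population, successes, draws)
               for k in range(0, k_max + 1))

def prob_at_least(k_min, population, successes, draws):
    return 1.0 - prob_at_most(k_min - 1, population, successes, draws)

# 1000 cats, 70 Siamese, simple random sample of 100
p_low = prob_at_most(4, 1000, 70, 100)     # 4 or fewer Siamese
p_high = prob_at_least(10, 1000, 70, 100)  # 10 or more Siamese
print(f"P(<=4 Siamese) = {p_low:.3f}, P(>=10 Siamese) = {p_high:.3f}")
```

Either tail is far from negligible, which is the point of the card: SRS can easily under- or over-represent a small stratum, whereas stratification fixes its share.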
20
Q

What are 2 limitations to stratified random sampling?

A
  1. need to know about stratification variable in advance (risk factor)
  2. less efficient than SRS if the strata do not explain variance within the population (strata are not risk factors)
21
Q

What are the 2 most common sampling methods used in veterinary medicine?

A
  1. cluster
  2. multistage
22
Q

What is cluster sampling?

A

sampling frame is divided into logical aggregations (clusters) and a random selection of clusters is performed

  • clusters are common in vet med: any group that shares common characteristics, like farms, pens, and litters

(pick a shelter at random and sample all the animals there)

23
Q

How do clusters compare to others in the population?

A

animals within a cluster are more similar than animals between clusters

24
Q

What are the 3 advantages to cluster sampling?

A
  1. only require a sampling frame for groups, but not for the members within the group
  2. fewer sampling units (clusters) are required to achieve sample size of study units
  3. more economical than SRS
25
Q

What are the 2 limitations to cluster sampling?

A
  1. higher variability than SRS, so a larger overall sample size (units of concern) is required to achieve the same precision - animals within the same shelter tend to be similar; low hygiene = low overall health of all animals, which is not indicative of the target population
  2. when between-cluster variation is large relative to within-cluster variation = need to sample many more clusters
26
Q

What is multistage sampling? Why do we do this?

A

same as cluster sampling, but take a sub-sample of subjects within clusters

animals within a cluster are more alike than animals between clusters, so it’s more efficient to sample fewer animals

27
Q

How are multistage sampling clusters picked?

A

clusters are randomly selected, then individuals within each selected cluster are randomly selected (primary sampling units)

farm —> sows —> litters —> piglets
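
A two-stage version of this idea can be sketched in Python: first pick farms at random, then pick animals at random within each chosen farm. The function name `two_stage_sample` and the data (10 farms of 30 animals) are hypothetical, and real designs may add further stages (sows, litters, piglets).

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """clusters maps a cluster name (e.g. a farm) to its list of animal IDs."""
    rng = random.Random(seed)
    # Stage 1: random selection of clusters
    chosen_clusters = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen_clusters:
        animals = clusters[name]
        # Stage 2: random selection of animals within the chosen cluster
        k = min(n_per_cluster, len(animals))
        sample[name] = rng.sample(animals, k)
    return sample

# Hypothetical population: 10 farms with 30 animals each
farms = {
    f"farm_{i}": [f"farm_{i}_animal_{j}" for j in range(1, 31)]
    for i in range(1, 11)
}
sample = two_stage_sample(farms, n_clusters=3, n_per_cluster=5, seed=7)
for farm, animals in sample.items():
    print(farm, animals)
```

Note that only the list of farms is needed up front; animal lists are only required for the farms that were actually selected.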

28
Q

What are the 3 advantages to multistage clusters?

A
  1. only require a list of clusters (farms, not animals; sites, not pens or fish)
  2. most practical for large studies
  3. can adjust the number of clusters and primary sampling units (PSUs) based on the cost of acquiring samples and running those samples
29
Q

In multistage sampling, what happens when within-cluster variation is high? When between-cluster variation is high?

A

high WITHIN CLUSTER variation = smaller number of clusters, with more PSUs = weak farm effect - need fewer farms, but more animals

high BETWEEN CLUSTER variation = larger number of clusters, with fewer PSUs = strong farm effect - need more farms, but fewer animals

30
Q

What are the 2 main limitations of multistage sampling?

A
  1. larger sample sizes (clusters) may be required to achieve the same precision as SRS due to dependence (similarity) among animals within clusters
  2. greater complexity of statistical methods to analyze and estimate sample size
31
Q

What are the 4 types of non-probability sampling?

A
  1. judgement (haphazard)
  2. convenience
  3. purposive
  4. targeted (risk-based)
32
Q

What is judgement sampling?

A

investigator chooses subjects that “represent” the population in the judgement of the investigator

33
Q

What is convenience sampling? When is it most appropriate?

A

investigator uses samples that are easy to obtain

when the sample does not need to be representative of the source population

34
Q

What is purposive sampling?

A

investigator chooses samples based on attributes (known exposure, cases); if random samples are then taken from that group, it is a probability sample from the source population

35
Q

What is targeted (risk-based) sampling?

A

specifically used in animal disease surveillance programs, using a biased sample to help find rare diseases or determine freedom from disease (DOESN’T represent target population)

36
Q

What are the 2 advantages to non-probability sampling?

A
  1. convenient and inexpensive
  2. appropriate under certain circumstances, like exploratory/preliminary investigation, homogeneous populations, and sampling to determine if disease is present in a population
37
Q

What are the 2 disadvantages to non-probability sampling?

A
  1. not possible to know the probability of each unit being selected (not appropriate for descriptive studies at the population level)
  2. can lead to biased population estimates with no way to quantify the extent of the bias
38
Q

What are the 6 steps to the sampling decision process?

A
  1. define question
  2. define source population
  3. develop sampling frame
  4. specify sampling method and determine sample size
  5. select samples
  6. analyze results
39
Q

What are the 4 data types? What are some examples of each?

A
  1. continuous - any real numbers (including decimals); temperature, weight
  2. dichotomous - yes/no, 1/0; pregnancy, gender
  3. categorical (nominal or ordinal) - more than 2 possibilities; breed, cancer class
  4. count - integers, all whole numbers
40
Q

What are the 2 types of graphs used in veterinary epidemiology?

A
  1. histograms
  2. boxplots
41
Q

What is the difference between binomial and normal distribution?

A

the binomial distribution is discrete, whereas the normal distribution is continuous: a binomial variable can take only a finite number of values (0 to n events), whereas a normal variable can take an infinite number of values

42
Q

What is skewed distribution?

A

a distribution that is neither symmetric nor normal because the data values trail off more sharply on one side than on the other

43
Q

What are the mean, median, and mode?

A

MEAN = average of a data set

MEDIAN = middle value when a data set is ordered from least to greatest

MODE = number that occurs most often in a data set
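
A quick check of the three measures using Python's standard `statistics` module. The weights are made-up values, chosen so that one large value pulls the mean above the median.

```python
import statistics

# Hypothetical body weights (kg); 9.3 is a deliberately large value
weights = [3.2, 4.1, 4.1, 4.8, 5.0, 5.5, 9.3]

mean = statistics.mean(weights)      # sum of values / number of values
median = statistics.median(weights)  # middle value of the ordered data
mode = statistics.mode(weights)      # most frequent value

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
```

Here the median is 4.8 and the mode is 4.1, while the mean (about 5.14) is dragged upward by the single heavy animal, which is typical of right-skewed data.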

44
Q

What is the standard error of the mean?

A

indicates how different the population mean is likely to be from a sample mean, telling how much the sample mean would vary if you were to repeat a study using new samples from within a single population
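
As a small illustration, the standard error of the mean is usually estimated as the sample standard deviation divided by the square root of the sample size. The function name and the milk yields below are hypothetical.

```python
import math
import statistics

def standard_error_of_mean(values):
    """SEM = sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical daily milk yields (kg/day) from 4 cows
yields = [28, 30, 32, 30]
sem = standard_error_of_mean(yields)
print(f"SEM = {sem:.3f} kg/day")
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.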

45
Q

What are confidence intervals?

A

a range of values, calculated from the sample, that is expected to contain the population parameter a stated proportion of the time (e.g., 95%) if the study were repeated

46
Q

How much of the data should fall within 2 standard deviations in the area under the normal curve?

A

95.5%
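
The figure can be verified with the error function from Python's `math` module, since the area of a normal curve within k standard deviations of the mean equals erf(k / sqrt(2)). The function name is hypothetical.

```python
import math

def normal_area_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 1.96, 2, 3):
    print(f"within {k} SD: {normal_area_within(k):.4f}")
```

The same function shows why 1.96 standard deviations is used for 95% confidence intervals: that is the width that captures almost exactly 95% of the area.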

47
Q

What are the 2 types of statistical errors?

A

TYPE I (α) - effect present in the study, effect absent in nature

TYPE II (β) - effect absent in study, effect present in nature (lack of power)

48
Q

What is statistical power?

A

probability that a study finds a true difference (1-β)

49
Q

What are the 4 possible explanations when a study finds no effect?

A

(type II (β) error)

  1. truly no effect
  2. study design issues (wrong dose, route of infection)
  3. sample size too small (power < 0.80)
  4. bad luck
50
Q

What is a p-value? When are we reasonably sure that the effect detected is not due to chance?

A

(type I (α) error)

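probability of seeing a difference at least as large as the one observed by chance alone (that is, if there were truly no effect)

P < 0.05 (by convention; a difference this large would arise by chance less than 5% of the time if there were truly no effect)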

51
Q

What is the standard error? What is it related to?

A

measure of the precision of the point estimate

variance, standard deviation, sample size

52
Q

What is a significance (hypothesis) test?

A

determines if the point estimate is significantly different from the value specified by the null hypothesis (reported as a p-value)

53
Q

What is a confidence interval? When does it suggest a non-significant effect?

A

range of likely values for the parameter being estimated (95% CI) - a range that would contain the true value in 95% of repeated studies

if 95% CI includes the null value
(picture = significant effect)
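
To make the rule concrete, here is a rough Python sketch that builds a 95% confidence interval for a proportion and checks whether it includes a null value. It uses the simple normal-approximation (Wald) interval, which is only one of several methods; the function name `wald_ci`, the observed count (18 of 100), and the null value (0.10) are invented for illustration.

```python
import math

def wald_ci(successes, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    return p - z * se, p + z * se

# Hypothetical data: 18 positive animals out of 100, null prevalence 10%
null_value = 0.10
lower, upper = wald_ci(18, 100)
includes_null = lower <= null_value <= upper

print(f"95% CI: {lower:.3f} to {upper:.3f}")
print("non-significant" if includes_null else "significant at the 5% level")
```

In this made-up case the interval sits just above 0.10, so the null value is excluded; with a smaller sample the interval would widen and could include it.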

54
Q

EXAMPLE: visiting a herd of 100 cows, producer wonders if there is more mastitis on their farm than normal
(the normal prevalence is 10%)

A
55
Q

EXAMPLE: visiting a herd of 100 cows, producer says they are producing an average of 30 kg/day
(normal milk production is estimated at 28 kg/day)

A
56
Q

What 3 things are being done with sample size calculation?

A
  1. establish level of occurrence of a disease by estimating a proportion or mean with desired precision (level of FeLV in feral cats in LI)
  2. test hypotheses by comparing 2 proportions or means (cats on placebo vs. those on treatment)
  3. detection of disease/declare freedom from disease
57
Q

It’s impossible to test all the animals in a population to be certain there is no disease. So, what is done?

A

test to a theoretical expected prevalence (level) of disease - “design prevalence”

  • if a disease was present in the population, you’d expect 1% of the animals to be infected, so you need to sample enough to be reasonably confident that you would detect the disease if the prevalence was 1% or higher
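
A common simplified way to turn a design prevalence into a sample size is sketched below. It assumes a perfect test (100% sensitivity and specificity) and a population large enough that each sampled animal can be treated as an independent draw; under those assumptions the chance that all n sampled animals test negative is (1 - p)^n, so n is chosen to make that chance no larger than 1 - confidence. The function name is hypothetical, and real surveillance designs usually adjust for test sensitivity and population size.

```python
import math

def sample_size_freedom(design_prevalence, confidence=0.95):
    """Smallest n with (1 - design_prevalence)**n <= 1 - confidence.

    Assumes a perfect test and a large population (illustrative only).
    """
    n = math.log(1 - confidence) / math.log(1 - design_prevalence)
    return math.ceil(n)

print(sample_size_freedom(0.01))  # design prevalence 1%, 95% confidence -> 299
print(sample_size_freedom(0.05))  # design prevalence 5%, 95% confidence -> 59
```

The lower the design prevalence, the more animals must be tested to be reasonably confident the disease would have been detected.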