Sampling Epidemiology Flashcards

1
Q

Epidemiological approaches require?

A

that we gather information about populations
- understand information at the population level, particularly health status
- we usually cannot measure every individual in a population, so we must sample.

2
Q

When you want to conduct a study, what exactly do you need to start?

A

You need to have a question. Some examples of the types of questions are:
1. Establish level of occurrence of a disease e.g. prevalence
2. Test hypotheses
* Has the level of disease changed? (Changes in level of occurrence)
* Identification of risk factors associated with disease. Sample in a way that lets us find differences in disease among various groups.
3. Detection of disease
* Particularly important for trade and for declaring freedom from disease.
- There are many rules and standards a country must meet before it can declare itself free of a disease.

3
Q

How are you going to measure disease within a population?

A
  • What’s going on in the population?
    • What’s the best way to get this information?
      • Sample every animal? (aka a census) If you have everyone in the population you do not need statistics. Very powerful, expensive, and difficult to do.
      • Sample a portion of the animals?
4
Q

Define a census

A
  • Census
  • Every individual in the population is sampled
  • Exact measurement of what is going on
5
Q

Define a sample.

A
  • Sample
  • Take a sample from the population
  • Make inferences about the population based on information in the sample
  • Quicker, less expensive, and practical!
6
Q

How would you define the target population?

A
  • Population you want to make inferences about with respect to your objectives. E.g. Feral cats in US and Canada

Population you are trying to understand.

7
Q

Source population

A
  • Subset of the target population from which you will draw your sample. E.g. Feral cats on Long Island

Taking information from that to develop a study population.

8
Q

Study population

A
  • Individuals actually measured or enrolled in your study. E.g. Feral cats spayed or neutered by LIU’s Team in 2021

Samples you can actually get your hands on.

9
Q

Define internal validity.

A
  • Whether or not the study results (obtained from the study sample) are valid for the source population.
10
Q

Define external validity.

A
  • How well the study results can be generalized to the target population
  • Subjective assessment of whether or not the source population and sample are broadly representative of the target population

This tends to concern the reader, who is asking: is this study relevant to me? Do I care about the results of this study? Do they affect me?

11
Q

Define sampling frame.

A
  • Complete list of all sampling units.
  • Required for most probabilistic sampling methods.
12
Q

Define sampling unit.

A
  • Unit of interest…
  • Animals (usually the unit in vet med)
  • Litters
  • Pens
  • Farms
  • etc. …
13
Q

What are the two approaches to sampling?

A
  • Two approaches to sampling
    1. Probability sampling
    2. Non-probability sampling
14
Q

Define probability sampling.

A
  1. Probability sampling
    * Every element in the population has a known, non-zero probability of being included in the sample. This is the setting in which statistical methods and their assumptions hold.
    * Formal process of random selection used to ensure representativeness of the sample.
    * Statistical inference can be made about the population from which you drew your sample.
15
Q

Name the types of probability sampling.

A
  • Five types of probability sample:
    1. Simple random = the easiest one. (also called SRS in stats).
    2. Systematic random
    3. Stratified
    4. Cluster
    5. Multistage

Every element in the population has a known non-zero probability of being included in the sample.

16
Q

Define simple random probability sampling.

A
  • AKA Simple Random Sample (SRS)
  • Forms the basis of most probability sampling methods
  • Every subject has equal probability of being included in the sample
  • Requires a list of all subjects (sampling frame)
  • Not always easy or possible
  • Procedure
  • Generate list of all animals
  • Assign a unique number (ID) to each animal
  • Generate random numbers and select the animals whose IDs match them
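
The procedure above can be sketched in Python. This is a minimal illustration rather than part of the original notes: the herd size of 200, the sample size of 20, and the names `animal_ids` and `simple_random_sample` are assumptions made for the example.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n units without replacement; every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(sampling_frame, n)

# Hypothetical sampling frame: a unique ID for each of 200 animals.
animal_ids = list(range(1, 201))
selected = simple_random_sample(animal_ids, n=20, seed=1)
print(sorted(selected))
```

Fixing the seed only makes the draw repeatable; the selection itself still depends on having the complete list of IDs, which is the practical obstacle noted above.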
17
Q

What are the limitations of simple random probability sampling? Provide an example.

A
  • Limitations
  • Difficult to obtain a complete sampling frame
  • Costs may be higher than other methods
18
Q

Define systematic random probability sampling.

A
  • Systematic random
  • Does not require sampling frame, but need all animals presented sequentially
  • E.g. fish processing, or appointment book (or old folder filing system)
  • Do not necessarily need to know the number of sampling units (but need a rough guess)
19
Q

Describe the procedure used for systematic random probability sampling. Provide an example. (see slideshow)

A
  • Procedure
  • Select your sampling interval, j
  • Randomly select a starting number within your sampling interval (between 1 and j)
  • Sample that animal, then every jth animal after it
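
A rough sketch of this procedure, assuming the animals arrive as an ordered Python list. The sequence of 100 animals, the interval of 10, and the function name `systematic_sample` are hypothetical choices for illustration.

```python
import random

def systematic_sample(units, j, seed=None):
    """Pick a random start within the first interval, then take every j-th unit."""
    rng = random.Random(seed)
    start = rng.randrange(j)      # random position 0..j-1 within the first interval
    return units[start::j]

# Hypothetical example: about 100 animals presented in sequence and roughly
# 10 wanted, so the sampling interval is j = 100 // 10 = 10.
animals = list(range(1, 101))
sample = systematic_sample(animals, j=10, seed=3)
print(sample)
```

Only a rough guess of the total is needed to choose j; the list itself does not have to exist before the animals are presented.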
20
Q

What are the limitations of systematic random probability sampling?

A
  • Limitations
  • Could be subject to bias in the interval
  • E.g. (not a great example, but) if you have 5 pens and the producer takes one cow at a time from each pen in sequence to run them through a chute… you shouldn’t sample every 5th animal going through the chute, because you would only ever sample cows from one pen!
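
The pen example can be checked with a few lines of Python. The pen labels, the 50 animals, and the starting position are made up for illustration.

```python
# Hypothetical chute order: one cow from each of 5 pens, in rotation.
pens = ["A", "B", "C", "D", "E"]
chute = [(pens[i % 5], i) for i in range(50)]   # (pen, position in the chute)

j = 5        # sampling interval equal to the number of pens
start = 2    # any start within the interval shows the same problem
sample = chute[start::j]

pens_sampled = {pen for pen, _ in sample}
print(pens_sampled)   # a single pen, whatever the starting position
```

Choosing an interval that does not line up with the rotation (for example 7) would spread the sample across all pens.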
21
Q

Define stratified random probability sampling.

A
  • Stratified random
  • Sampling frame is divided into groups (strata) of defined common characteristic
  • Breed, sex, age (parity), etc.
  • Randomize within each stratum
  • Simple random or systematic random
  • Proportional to the total number, within the stratum
  • Ensures all strata are present and properly represented in the study
22
Q

What is an example of stratified random probability sampling? (see slideshow)

A
  • E.g. You want to estimate the prevalence of a disease in a population of cats
  • Suspect Siamese cats have a much higher prevalence of disease than DLH/DSH cats
  • Want a fair representation of breeds in your study to prevent bias in estimate of prevalence
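
A sketch of stratified random sampling with proportional allocation, using breed as the stratum. The counts (900 DSH/DLH cats, 100 Siamese), the total sample size of 50, and the function name `stratified_sample` are assumptions for illustration only.

```python
import random

def stratified_sample(strata, n_total, seed=None):
    """Proportional allocation, then a simple random sample within each stratum."""
    rng = random.Random(seed)
    population = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_stratum = round(n_total * len(units) / population)
        sample[name] = rng.sample(units, n_stratum)
    return sample

# Hypothetical sampling frame, already divided by breed.
strata = {
    "DSH/DLH": [f"dsh-{i}" for i in range(900)],
    "Siamese": [f"siamese-{i}" for i in range(100)],
}
sample = stratified_sample(strata, n_total=50, seed=7)
print({name: len(units) for name, units in sample.items()})
```

With these numbers the allocation is 45 and 5. In general, rounding can make the stratum sizes add up to slightly more or less than the requested total, so a real design would need to handle that.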
23
Q

What are the limitations of stratified random probability sampling?

A
  • Need to know about stratification variable in advance
  • Less efficient than SRS if the strata do not explain variance within the population (i.e. strata are not risk factors)
24
Q

Define cluster probability sampling.

A
  • Sampling frame is divided into logical aggregations (clusters) and a random selection of clusters is performed
    • Clusters are common in veterinary medicine
      • Any group that shares common characteristics
        • Farm
        • Pen
        • Litter
      • Animals within a cluster are more similar than animals between clusters
  • All individual sampling units within the selected clusters are included in the sample
  • Random selection of clusters (groups)
  • The cluster is the primary sampling unit (PSU); the unit of concern is each individual within the cluster
25
Q

What are the advantages of cluster probability sampling?

A
  • Only require a sampling frame for groups, but not for the members within the group
  • Fewer sampling units (clusters) are required to achieve a certain sample size of study units
  • More economical than SRS
26
Q

What are the limitations of cluster probability sampling?

A
  • Higher variability than SRS, so a larger overall sample size (units of concern) is required to achieve the same precision
  • When between-cluster variation is large relative to within-cluster variation -> need to sample many more clusters! If the shelters are all very similar, the differences between individual animals matter more. Ask: how similar are the animals to one another within a shelter? How similar are the shelters to one another?
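
One common way to put a number on this loss of precision is the design effect, 1 + (m - 1) × ICC, where m is the number of animals sampled per cluster and ICC is the intra-cluster correlation. This formula is not given in the notes above; it is brought in here as a standard approximation, and the values used (100 animals under SRS, 20 animals per shelter, ICC of 0.25) are hypothetical.

```python
import math

def design_effect(cluster_size, icc):
    """Variance inflation of cluster sampling relative to SRS: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def cluster_adjusted_n(n_srs, cluster_size, icc):
    """Total animals needed under cluster sampling to match the precision of n_srs."""
    return math.ceil(n_srs * design_effect(cluster_size, icc))

# Hypothetical values: animals within a shelter are fairly alike (ICC = 0.25).
print(design_effect(20, 0.25))
print(cluster_adjusted_n(100, 20, 0.25))
```

The more alike the animals within a cluster (higher ICC), the larger the inflation, which is why adding clusters usually helps more than adding animals within a cluster.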
27
Q

Define multistage probability sampling.

A
  • Same as Cluster Sampling, but take a sub-sample of subjects within clusters.
  • Why do this?
  • Animals within a cluster are more alike than animals between clusters
  • Don’t gain much information from sampling over and over within a cluster!
  • Gain more info by sampling more herds and taking a small sub-sample within each herd
  • Very common in veterinary epidemiology
  • Cluster is randomly selected
  • Individuals within each selected cluster are randomly selected (the clusters are the primary sampling units, PSUs; the individuals are secondary sampling units)
  • Can have multiple stages:
  • Farm -> Sows -> Litters -> Piglets
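
A minimal two-stage sketch of the idea above: herds are selected at random first, then a small random sub-sample of animals is taken within each selected herd. The 30 herds of 40 animals, the sample sizes, and the name `two_stage_sample` are assumptions for illustration.

```python
import random

def two_stage_sample(herds, n_herds, n_per_herd, seed=None):
    """Stage 1: random herds. Stage 2: random animals within each selected herd."""
    rng = random.Random(seed)
    chosen_herds = rng.sample(sorted(herds), n_herds)
    return {
        herd: rng.sample(herds[herd], min(n_per_herd, len(herds[herd])))
        for herd in chosen_herds
    }

# Hypothetical population: 30 herds of 40 animals each.
herds = {f"herd-{h}": [f"herd-{h}/cow-{c}" for c in range(40)] for h in range(30)}
sample = two_stage_sample(herds, n_herds=10, n_per_herd=5, seed=11)
print(len(sample), "herds,", sum(len(v) for v in sample.values()), "animals")
```

For simplicity the sketch builds an animal list for every herd. In practice only the list of herds is needed up front, and animals are listed only within the herds that are selected.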
28
Q

What are the advantages of multistage probability sampling?

A
  • Advantages
  • Only requires a list of clusters
    • Farms, not animals
    • Sites, not pens, not fish
  • Most practical for large studies (national)
  • Adjust the number of clusters and the number of animals sampled per cluster based on the cost of acquiring samples vs running samples
    • Also depends on within- and between-cluster variation

If within-cluster variation is high -> smaller number of clusters, with more animals per cluster
i.e. weak ‘farm’ effect -> need fewer farms, but more animals
If between-cluster variation is high -> larger number of clusters, with fewer animals per cluster
i.e. strong ‘farm’ effect -> need more farms, but fewer animals

29
Q

What are the limitations of multistage probability sampling?

A
  • Limitations
  • Larger sample sizes (number of clusters) may be required to achieve the same precision as SRS
    * Due to dependence (similarity)
  • Greater complexity of statistical methods to analyze and estimate sample size
30
Q

Define non-probability sampling.

A
  1. Judgement (haphazard)
    * Investigator chooses subjects that ‘represent’ the population in the judgement of the investigator
  2. Convenience
    * Samples that are easy to obtain
    * OK if you do not need a group that is representative of the source population
  3. Purposive
    * Chosen on attributes (known exposure, cases)
    * If random samples are taken from this group, then it is a probability sample from the source population
  4. Targeted (risk-based)
    * Specifically used in animal disease surveillance programs
    * Biased sample… to help:
    * Find rare disease
    * Determine freedom from disease
31
Q

What is an example of targeted (risk-based) sampling?

A

Example:
Bovine spongiform encephalopathy (BSE) surveillance:
* Cattle over 30 months
* Dead, Down, Diseased, Dying (4 Ds)
ISA surveillance (fish in Canada):
* Mortalities and moribund fish routinely submitted

32
Q

What are the advantages of non-probability sampling?

A
  • Advantages
  • Convenient and inexpensive
  • Appropriate under certain circumstances
    • Exploratory or preliminary investigation
    • Homogenous populations
    • Sampling to determine if disease is present in a population
33
Q

What are the disadvantages of non-probability sampling?

A
  • Not possible to know the probability of each unit being selected
    • Cannot estimate sample size or standard errors
    • Not appropriate for descriptive studies (at the population-level)
  • Can lead to biased population estimates, with no way to quantify the extent of the bias
    (selection bias)!
34
Q

Sample Size requires ‘Basic Statistics’
* Extra statistical resources
* CDC: Principles of Epidemiology in Public Health Practice
https://www.cdc.gov/csels/dsepd/ss1978/ss1978.pdf
* Lesson 2: Summarizing Data (page 109 to 179)

Breeze through this

A
35
Q

What are the different types of data?

A
  1. Continuous: Real numbers such as temperature, weight, decimal values, etc.
  2. Dichotomous: Yes/No… 1/0, male vs. female
  3. Categorical: More than two possible categories; either nominal (unordered, e.g. breed) or ordinal (ordered, e.g. Type 1, 2, 3)
  4. Count: Integers
36
Q

Histogram

A

Frequency (count) on the y-axis, plotted against binned values of a variable on the x-axis.

37
Q

boxplots

A

see graph in powerpoint

38
Q

What are the different types of distributions?

A
  • Distributions
  • Binomial vs. Normal
  • Shapes: skewed (Right vs Left)
  • Mean, median, mode
  • Percentiles, Quartiles, and Interquartile ranges
39
Q

What is the difference between a binomial and a normal distribution?

A

The main difference between the binomial distribution and the normal distribution is that the binomial distribution is discrete, whereas the normal distribution is continuous. A binomial variable can take only a finite number of values (0 to n successes in n trials), whereas a normal variable can take any value along a continuous scale.
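
The contrast can be illustrated with Python's standard library. The numbers are hypothetical: 20 animals with 30% prevalence for the binomial case, and cat weights with mean 4.0 kg and SD 0.5 kg for the normal case.

```python
from math import comb
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """Probability of exactly k positives among n animals when prevalence is p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(sum(pmf))   # the n + 1 discrete outcomes account for all of the probability

# A normal variable is continuous: probability belongs to intervals, not single values.
weight = NormalDist(mu=4.0, sigma=0.5)
within_one_sd = weight.cdf(4.5) - weight.cdf(3.5)   # P(3.5 kg <= weight <= 4.5 kg)
print(within_one_sd)
```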

40
Q

What are the different measures of spread?

A
  • Variance vs. Standard Deviation
41
Q

What is the difference between standard deviation and variance?

A

Variance is the average squared deviation of the values in a data set from their mean, so it is expressed in squared units.
Standard deviation is the square root of the variance. It is expressed in the same units as the data and describes how far values typically lie from the mean.
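
A short check of the relationship between the two measures, using made-up cat weights and Python's `statistics` module (which computes the sample versions, dividing by n - 1).

```python
import statistics

weights_kg = [3.8, 4.1, 4.4, 3.9, 4.3]   # made-up values for illustration

variance = statistics.variance(weights_kg)   # sample variance, in kg squared
sd = statistics.stdev(weights_kg)            # sample standard deviation, in kg

print(variance, sd)
```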

42
Q

Define standard error of the mean.

A

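The standard deviation of the sampling distribution of the mean, estimated as SD / √n. It is used to build Confidence Intervals (e.g. 95%CI).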

43
Q

If you increase sample size, what would happen to your precision? standard deviation? sample error?

A

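Standard deviation = stays about the same (it describes the spread of individual values; its estimate just becomes more stable). It is the standard error that decreases.
Precision = increases
Sampling error = decreases (b/c a larger sample represents the population more closely).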

44
Q

Normal distribution graph

A
45
Q

Statistical power is defined as?

A

The probability of detecting a TRUE difference when one exists (1 - beta, where beta is the probability of a Type II error).

46
Q

In what situations would your study not find an effect?

A
  1. There truly is no effect.
  2. Study defect.
  3. Sample size too small.
  4. Bad luck
47
Q

What is a Type I (alpha) error?

A

A Type I error means that your study found an effect that does not exist in the real world (a false positive).

AKA Effect present in study, but absent in real world.

48
Q

What is a Type II (beta) error?

A

A Type II error means that there is something that exists in the real world but your study did not find it (a false negative). This often means that your study lacks power.

AKA Effect absent in study, but present in real world.

49
Q

If you run your study, and find that the effect was absent, do you accept or reject the null hypothesis?

A

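You fail to reject the null hypothesis (often loosely called accepting it). Not finding an effect is not proof that no effect exists.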

50
Q

If you run your study, and find that the effect was present, do you accept or reject the null hypothesis?

A

You reject the null hypothesis.

51
Q

What is the definition of the p-value? What value do we want this to equal and why?

A

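The p-value is the probability of obtaining a result at least as extreme as the one you observed if the null hypothesis were true (i.e. by chance alone).

If you report an effect that arose only by chance, you are making a Type I error and reporting an effect that does not exist. Therefore we try to keep the risk of this error low, conventionally 5%. We do not want the p-value to equal a particular value; we want it to be less than 0.05 before we reject the null hypothesis.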

52
Q

Define the term standard error.

A

Standard error measures the precision of your estimate. It is related to variance, standard deviation, and sample size: for a mean, standard error = standard deviation / √n.

53
Q

What is a hypothesis test?

A

A test to determine whether your point estimate is significantly different from the value you specified in your null hypothesis.

E.g.
The null value for milk production is 28 kg.
Sample the cows on my farm: are they producing more or less than 28 kg, or are they producing about that? The p-value indicates how compatible your sample values are with this null value.
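
The milk example could be tested along these lines. This sketch uses a two-sided z-test with a normal approximation, which is an assumption made to keep the code dependency-free (a t-test would be the more usual choice for a small sample). The 30 yields are made-up values and `one_sample_z_test` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def one_sample_z_test(values, null_value):
    """Two-sided test of the sample mean against null_value (normal approximation)."""
    se = stdev(values) / sqrt(len(values))   # standard error of the mean
    z = (mean(values) - null_value) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Made-up milk yields (kg) for 30 cows.
yields = [29.1, 30.4, 27.8, 31.2, 28.9, 30.0, 29.5, 31.8, 28.2, 30.7,
          29.9, 27.5, 30.3, 31.0, 28.6, 29.4, 30.9, 29.0, 28.4, 30.6,
          31.5, 29.7, 28.8, 30.1, 29.3, 30.8, 27.9, 29.6, 30.2, 31.1]
z, p = one_sample_z_test(yields, null_value=28)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value here would suggest the herd mean differs from 28 kg; a large one would mean the sample is compatible with the null value.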

54
Q

Define the term Confidence interval

A

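A range of values, calculated from your sample, that is likely to contain the true population value.
The more you sample, the narrower the interval and the more precise your estimate.

A 95% confidence interval means that if you repeated the sampling many times, about 95% of the intervals calculated would contain the true population value. It is not the range in which 95% of individual values fall.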

55
Q

A higher confidence level (e.g. 99% instead of 95%) indicates?

A

That you are more confident the interval contains the true value, but the interval is wider (less precise)

56
Q

What is the null hypothesis?

A

The null hypothesis is the default assumption, usually that there is no effect or no difference. You are looking for evidence to disprove, nullify, or reject it.

57
Q

As you increase your sample size, what happens to your standard error value? Confidence interval?

A

Standard error value decreases b/c a larger sample estimates the population value more precisely.
Confidence interval decreases (narrows)
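
The effect of sample size can be seen numerically. This sketch assumes a normal-approximation interval for a mean, with hypothetical summary values (mean 28 kg, SD 4 kg); `ci_for_mean` is an illustrative helper, not a library function.

```python
from math import sqrt
from statistics import NormalDist

def ci_for_mean(sample_mean, sd, n, confidence=0.95):
    """Normal-approximation confidence interval for a mean."""
    se = sd / sqrt(n)                                  # standard error shrinks with n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)     # about 1.96 for 95%
    return sample_mean - z * se, sample_mean + z * se

# Same mean and SD, increasing sample size: print n, standard error, interval width.
for n in (25, 100, 400):
    low, high = ci_for_mean(28, 4, n)
    print(n, round(4 / sqrt(n), 2), round(high - low, 2))
```

Each fourfold increase in sample size halves both the standard error and the width of the interval.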