Lecture 6 & 7: Sampling Flashcards

1
Q

What is the difference between census and sample?

A

Census (most reliable way)
-Every individual in a population is evaluated
-EX Stats CA population Census

Sample
-Only a subset of individuals in a population are evaluated
-EX 5% of Canadians selected for a study

2
Q

Why do we take a sample?

A

-Descriptive study: To describe characteristics of a population (who, what, when, etc.) EX more than 1 billion adults are overweight, and 300 million of them are obese

-Analytical study: To assess specific associations between risk factors (exposures) and disease (or other outcomes), ie to compare 2 groups EX children with pets vs children without, to see the difference

3
Q

Why don’t we just take a census if it’s more accurate?

A

Keep in mind that a census involves every individual in the country, so resources are a huge issue. This is why Stats CA only does one every 5 years

-Time
-Expense $$
-Logistics (need a list of everyone, need to get a hold of everyone)
-Need everyone to volunteer/participate
-Data quantity vs data quality
-But poor sampling can result in bias

4
Q

What are the main stages to sampling?

A
  1. Determine WHO/WHAT to sample
  2. Determine HOW you’re going to choose these subjects
  3. Determine HOW MANY you’ll need to be confident in your findings
5
Q

Why is it important to determine WHO you are sampling?

A

-How you choose your subjects will have an impact on the validity of your results (how accurate the results are for the population)
-If your subjects are not truly representative of the population of interest then your conclusions may be biased

6
Q

What should you consider when choosing your subjects?

A

Want to make sure you are getting a good representation of the population of interest

-Establish criteria for participating BEFORE you start sampling
-INCLUSION CRITERIA= characteristics needed to be eligible for the study
-EXCLUSION CRITERIA= characteristics that would exclude or prevent someone from participating
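A minimal sketch of applying criteria that were fixed before sampling. The criteria and subject fields used here (`age`, `owns_dog`, `works_at_clinic`) are hypothetical and only illustrate the idea: a subject is eligible when every inclusion criterion is met and no exclusion criterion applies.

```python
def is_eligible(subject, inclusion, exclusion):
    """Eligible = meets every inclusion criterion and no exclusion criterion."""
    meets_all = all(rule(subject) for rule in inclusion)
    hits_any = any(rule(subject) for rule in exclusion)
    return meets_all and not hits_any

# Hypothetical criteria, written down before any sampling starts
inclusion = [lambda s: s["age"] >= 18, lambda s: s["owns_dog"]]
exclusion = [lambda s: s["works_at_clinic"]]

# Hypothetical candidate subjects
candidates = [
    {"id": 1, "age": 34, "owns_dog": True, "works_at_clinic": False},
    {"id": 2, "age": 16, "owns_dog": True, "works_at_clinic": False},
    {"id": 3, "age": 52, "owns_dog": True, "works_at_clinic": True},
]
eligible_ids = [s["id"] for s in candidates if is_eligible(s, inclusion, exclusion)]
print(eligible_ids)  # [1]
```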

7
Q

What are the 3 areas/layers of a population when defining your populations?

A

Outermost: Target population: ex all Canadians
-Population to which it might be possible to extrapolate results
-May not always be clearly defined in write-ups

Middle: Source population: LIST (subset of target pop)
-Population from which the study subjects are drawn
-We should be able to list all members (sampling units) of this population (=sampling frame)

Innermost: Study Population:
-The individuals included in your study

8
Q

What are the 2 types of validity?

A

External validity: How well can the study results be extrapolated to the target population? (from study population to target population)

Internal validity: How well do the study results relate to the source population? (from study population to source population)

9
Q

What should you consider when determining HOW to sample?

A

-Your sample strategy will determine the nature of any extrapolations you might make from the sample to the population
-From which groups should you choose subjects?
-How should you sample them?

10
Q

What are the 3 different sample strategies?

A
  1. Non-probability sampling
    -Convenience sampling
    -Judgment sampling
    -Purposive sampling
  2. Probability sampling
    -Simple random sampling
    -Systematic random sampling
    -Stratified random sampling
  3. Others (either non- or probability)
    -Cluster sampling
    -Multi-stage sampling
11
Q

What are the 3 types of non-probability sampling?

A

The probability of being selected is unknown in non-probability sampling

Convenience sampling: sampling units are chosen bc they are easy to get (ex animals in traps, farms close to UoG)

Judgment sampling: the investigator chooses what they deem to be units that are representative of the population (PhD student made her survey include ppl that were from a farm and knew concepts she was talking about)

Purposive sampling: Sampling units are chosen on purpose bc of their exposure or disease status (in an analytic study)

12
Q

What is the problem with convenience sampling?

A

-Ex in class: estimating how many dogs ppl have by asking only the ppl sitting in the first row

-Problem: not truly representative of the distribution of dog ownership bc service dogs and their owners tend to sit closer to the front etc

13
Q

When do you use non-probability sampling?

A

Often used in analytical studies

Pros:
-Relatively cheap and easy
-Good for a homogeneous population

Cons:
-Can produce biased results if the subjects you select are not representative of the target population
-Can limit how far you can extrapolate your results

14
Q

What is probability sampling?

A

-Uses some form of random selection process
-All individuals in a population have some non-zero probability of being selected for the study AND that probability can be calculated

15
Q

What is simple random sampling? (the first type of random sampling we talked about)

A
  1. Simple random sampling
    -A fixed % of the source pop is chosen using a formal random process (flip a coin, random # draw)
    -All individuals have an equal chance of being chosen
    -If done properly, the sample chosen should be representative of the population under investigation
    -You need to know the sampling frame (and therefore the total # of individuals in your population) to use this method, ie that list
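A minimal sketch of simple random sampling, assuming the sampling frame is available as a Python list of ID numbers. The frame, the 5% fraction, and the `seed` argument (used only to make the draw repeatable) are illustrative assumptions.

```python
import random

def simple_random_sample(sampling_frame, fraction, seed=None):
    """Draw a fixed fraction of the frame; every unit is equally likely."""
    rng = random.Random(seed)  # the seed only makes the draw reproducible
    n = round(len(sampling_frame) * fraction)  # sample size from the fixed %
    return rng.sample(sampling_frame, n)  # sampling without replacement

# Hypothetical sampling frame: ID numbers for 200 animals
frame = list(range(1, 201))
srs = simple_random_sample(frame, fraction=0.05, seed=1)
print(len(srs))  # 10 animals, ie 5% of 200
```

Note that the function cannot run without the full list, which mirrors the requirement that the sampling frame be known.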
16
Q

What is systematic random sampling? (the 2nd type of random sampling)

A
  2. Systematic random sampling
    -Good when you don’t have a complete list of individuals in the population to be sampled IF you know how many individuals there are in total
    -Often used with cattle/sheep run through a chute bc need the individuals to be sequentially available
    -Sampling interval (j) is calculated (j= source population size/required sample size)
    -Starting point in the first interval is selected on a formal, random basis (ex with j=5, the first sheep sampled is chosen at random from the first 5 through the chute, then every 5th sheep after that is sampled)
    -ie randomly select your starting point from among the first j individuals, and then sample every jth individual after that
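A minimal sketch of the interval calculation and selection, assuming only the population size and required sample size are known (no list of individuals). Rounding j down when the division is not exact is an assumption of this sketch, not something stated in the lecture.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Return the 1-based positions (order through the chute) to sample."""
    rng = random.Random(seed)
    j = population_size // sample_size  # sampling interval, rounded down
    start = rng.randint(1, j)  # random start among the first j individuals
    return [start + k * j for k in range(sample_size)]

# Hypothetical example: 100 sheep through the chute, 20 needed, so j = 5
positions = systematic_sample(population_size=100, sample_size=20, seed=3)
print(positions[1] - positions[0])  # 5, the sampling interval j
```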
17
Q

What is stratified random sampling? (the 3rd type of random sampling)

A
  3. Stratified random sampling
    -Before choosing participants, the sampling frame is broken down into strata based on some factor likely to influence the level of the characteristic being measured
    -Then, simple or systematic random sampling is conducted within each stratum
    -The % sampled within each stratum does not have to be the same in all groups
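A minimal sketch of stratified random sampling, using simple random sampling within each stratum. The frame of (animal ID, breed) pairs and the per-stratum fractions are hypothetical; they show that the % sampled can differ between strata.

```python
import random

def stratified_sample(frame, stratum_of, fractions, seed=None):
    """Simple random sample within each stratum, one fraction per stratum."""
    rng = random.Random(seed)
    strata = {}
    for unit in frame:  # split the sampling frame into strata first
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = {}
    for name, units in strata.items():
        n = round(len(units) * fractions[name])
        sample[name] = rng.sample(units, n)
    return sample

# Hypothetical frame: 80 dairy and 20 beef animals, stratified by breed type
animals = [(i, "dairy") for i in range(80)] + [(i, "beef") for i in range(80, 100)]
chosen = stratified_sample(
    animals,
    stratum_of=lambda unit: unit[1],
    fractions={"dairy": 0.10, "beef": 0.50},
    seed=7,
)
print({name: len(units) for name, units in chosen.items()})  # {'dairy': 8, 'beef': 10}
```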
18
Q

What is cluster sampling? (the other sampling strategy)

A

-The sampling unit is a group of individuals with things in common (ex herd, household, geographic region)
-But the unit of concern is still the individual (ex cow, or person)
-All individuals in the sampling units are selected (ex all cows in the herd, all ppl in the home)
-Overall, method of choice may be either non-probability or probability
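A minimal sketch of cluster sampling with a probability-based choice of clusters (the card notes a non-probability choice is also possible). The herds and cow names are hypothetical. The sampling unit is the herd, but every cow in a chosen herd ends up in the study.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then include every individual in them."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen_names}

# Hypothetical list of clusters: 10 herds of 50 cows each
herds = {f"herd_{h}": [f"cow_{h}_{c}" for c in range(50)] for h in range(10)}
picked = cluster_sample(herds, n_clusters=2, seed=11)
print(sum(len(cows) for cows in picked.values()))  # 100: all 50 cows in each of 2 herds
```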

19
Q

Why is cluster sampling often used for animal research?

A

-It’s easier to get a list of all clusters (farms) in the area than it is to get a list of all animals
-It’s cheaper/easier to test all of the animals in, ex, 20 herds until we reach our required sample size of 1000 animals than to drive around ON testing, ex, 5 animals at each of 200 farms for the same sample size of 1000

20
Q

What is multi-stage sampling? (other sampling strategy)

A

-Similar to cluster sampling, except that sampling takes place at BOTH the cluster level AND the individual level
-Convenient when there are too many individuals in a cluster to obtain measurements on (ex a feedlot of 2000 cattle)
-Also convenient when the individuals in a cluster are so alike that measuring just a few will provide sufficient info (ex a couple of purebred puppies out of a 10-pup litter)
-ie only a (randomly selected) proportion of individuals in each cluster is measured, not all of them as in cluster sampling
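A minimal sketch of two-stage sampling, assuming random selection at both stages. The feedlots and the numbers chosen (2 feedlots, 30 cattle per feedlot) are hypothetical.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly pick clusters. Stage 2: randomly pick individuals within each."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)  # cluster level
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen_names  # individual level
    }

# Hypothetical clusters: 5 feedlots of 2000 cattle each
feedlots = {f"feedlot_{f}": list(range(2000)) for f in range(5)}
stage_sample = two_stage_sample(feedlots, n_clusters=2, n_per_cluster=30, seed=5)
print(sum(len(cattle) for cattle in stage_sample.values()))  # 60 cattle measured, not 4000
```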

21
Q

What is important to consider no matter what sampling strategy you are choosing?

A

-What kind of subjects are likely to participate
-Choosing subjects that are not representative of your target population can limit the inferences you can make

22
Q

What is the 3rd main stage to sampling?

A
  3. Determine how many you’ll need to be confident in your findings.

-For descriptive epi (estimating means, proportions)
-For analytical epi (comparing means, proportions)

23
Q

Now that we know who/what and how we are sampling, now what do we do?

A

-Essential that adequate sample sizes be estimated prior to start of study
-Too small a sample: you might not find what you’re looking for
-Too large a sample: wastes resources, ethical concerns, statistical significance vs practical significance

24
Q

What are some sampling considerations?

A

Non-stat (more focus on resources)
-Time
-Money
-Sampling frame
-Research objective

Stat
-Power
-Confidence
-Type 1 and 2 errors
-Hypotheses

25
Q

What is confidence?

A

Confidence= degree of certainty about your estimation process (commonly 95%), ie how sure you are that the true mean lies b/w the limits of your interval

Confidence intervals= a range of values around the sample estimate that is expected to include the true mean/proportion of the source population
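A minimal sketch of a 95% confidence interval for a proportion. It assumes the usual normal approximation (estimate ± 1.96 × standard error), which is a common textbook method and not necessarily the one used in the lecture. The counts are hypothetical.

```python
import math

def proportion_ci_95(positives, n):
    """95% CI for a proportion using the normal approximation (an assumption here)."""
    p = positives / n  # sample estimate
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    half_width = 1.96 * se  # 1.96 is the Z value for 95% confidence
    return p - half_width, p + half_width

# Hypothetical sample: 40 of 200 animals test positive (estimate = 20%)
low, high = proportion_ci_95(positives=40, n=200)
print(round(low, 3), round(high, 3))  # 0.145 0.255
```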

26
Q

What is precision?

A

Precision= how tight the confidence interval is around your estimate (ie how narrow the range of values is)

Precision depends on:
-Sample size
-Variability within population
-Your sampling strategy

*Smaller range is more precise

27
Q

What is the letter for precision in sample sizes?

A

-L=precision of an estimate (ie allowable error)

Smaller=better bc less error
ex want weight of a pig to be within 2kg of the true weight not 5kg

28
Q

What are the steps to estimate sample sizes for descriptive characteristics?

A
  1. Determine the level of confidence that you wish to use (always 95% in this class)
  2. Specify the desired precision of your estimate (ex within 6% of the true value); this is your L
  3. Estimate the expected variance for the characteristic in the population (if measuring a proportion, can use an a priori proportion; if measuring a mean, can use the expected variance)
  4. Use the appropriate formula to calculate sample size
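The steps above can be sketched as follows. The cards do not give the formulas, so this assumes the common textbook forms n = Z² × p × (1 − p) / L² for a proportion and n = Z² × variance / L² for a mean, with Z = 1.96 for 95% confidence. The a priori proportion (30%) and the variance (100 kg²) are hypothetical; the values of L reuse the 6% and 2 kg examples from the cards.

```python
import math

Z_95 = 1.96  # Z value for 95% confidence

def n_for_proportion(p, L):
    """Sample size to estimate a proportion to within +/- L (assumed formula)."""
    return math.ceil(Z_95 ** 2 * p * (1 - p) / L ** 2)

def n_for_mean(variance, L):
    """Sample size to estimate a mean to within +/- L (assumed formula)."""
    return math.ceil(Z_95 ** 2 * variance / L ** 2)

# Hypothetical a priori proportion of 30%, wanted within 6 percentage points
print(n_for_proportion(p=0.30, L=0.06))  # 225
# Hypothetical pig weight variance of 100 kg^2, wanted within 2 kg
print(n_for_mean(variance=100, L=2))  # 97
```

Rounding up is used because a fraction of a subject cannot be sampled. If the course formula differs, only the two function bodies need to change.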
29
Q

What is a hypothesis, and what are the null and alternative hypotheses?

A

-Hypotheses reflect a research question of interest

Null hypothesis (Ho): states that there is no difference b/w the groups being compared

Alternative hypothesis: a claim contrary to the null hypothesis. If we reject the null hypothesis, we conclude there is enough evidence to support the alternative hypothesis

30
Q

What is the difference between a one-sided and a two-sided hypothesis?

A

One-sided hypothesis: EX The risk of developing lung cancer in ON men with long-term exposure to radon is higher than the risk in ON men without long-term radon exposure

Two-sided hypothesis: EX The risk of developing lung cancer in ON men with long-term exposure to radon is either higher or lower than the risk in ON men without long-term exposure to radon

IN EPI ALMOST ALL HYPOTHESES ARE 2-sided

31
Q

What is a type 1 and 2 error?

A

Type 1 error: outcomes in the groups being compared are declared different, when they are not

Type 2 error: outcomes in the groups being compared are declared as not being different, when they are

32
Q

What are the steps to estimate sample sizes to test for differences between groups?

A
  1. State your null hypothesis and determine whether you will have a 1-sided or 2-sided alternative hypothesis (always 2-sided for our purposes)
  2. Determine whether you are comparing a mean or a proportion
  3. Determine how much of a difference between the groups you want to detect (and the expected variance, if necessary)
  4. Set α (related to confidence; always 5%) and β (20%; related to power, which is usually 80%)
  5. Use the appropriate formula to calculate sample size
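The steps above can be sketched for two proportions. The cards do not give the formula, so this assumes a common approximation: n per group = (Zα + Zβ)² × [p1(1 − p1) + p2(1 − p2)] / (p1 − p2)², with Zα = 1.96 (two-sided α = 5%) and Zβ = 0.84 (β = 20%, power = 80%). The two proportions are hypothetical.

```python
import math

Z_ALPHA = 1.96  # two-sided alpha = 5%
Z_BETA = 0.84   # beta = 20%, ie power = 80%

def n_per_group_proportions(p1, p2):
    """Approximate sample size per group to detect p1 vs p2 (assumed formula)."""
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((Z_ALPHA + Z_BETA) ** 2 * variance_sum / (p1 - p2) ** 2)

# Hypothetical: 30% disease risk in the exposed group vs 15% in the unexposed group
print(n_per_group_proportions(0.30, 0.15))  # 118 per group
```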
33
Q

What makes the required sample size increase?

A

Required sample size generally increases as:
-The size of the difference between 2 means or proportions decreases, ie the smaller the difference to be detected, the higher the sample size
-The level of power to detect a difference b/w 2 groups increases, ie the higher the power wanted, the higher the sample size needed
-The number of confounders you’re controlling for increases
-The number of hypotheses tested increases
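The first two trends can be checked numerically. This sketch assumes a common approximation for comparing two means, n per group = 2 × (Zα + Zβ)² × variance / difference², which is not given on the cards. The variance (100) and the differences are hypothetical. The effects of confounders and of multiple hypotheses are not modelled here.

```python
import math

def n_per_group_means(difference, variance, z_alpha=1.96, z_beta=0.84):
    """Approximate sample size per group to detect a difference in means (assumed formula)."""
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * variance / difference ** 2)

# Smaller difference to detect -> larger sample (power fixed at 80%)
for d in (10, 5, 2.5):
    print(d, n_per_group_means(difference=d, variance=100))  # 16, then 63, then 251

# Higher power (90%, Z_beta = 1.28) -> larger sample for the same difference
print(n_per_group_means(difference=5, variance=100, z_beta=1.28))  # 84
```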