Lecture 6 & 7: Sampling Flashcards
What is the difference between census and sample?
Census (most reliable way)
-Every individual in a population is evaluated
-EX Stats CA population Census
Sample
-Only a subset of individuals in a population are evaluated
-EX 5% of Canadians selected for a study
Why do we take a sample?
-Descriptive study: To describe characteristics of a population (who, what, when, etc). EX more than 1 billion adults are overweight and 300 million of them are obese
-Analytical study: To assess specific associations between risk factors (exposures) and disease (or other outcomes) ie to compare 2 groups EX children with pets vs without to see the difference
Why don’t we just take a census if it’s more accurate?
A census involves every individual in the country, so resources are a huge issue. This is why Stats CA only runs one every 5 years
-Time
-Expense $$
-Logistics (need a list of everyone, need to get a hold of everyone)
-Need everyone to volunteer/participate
-Data quantity vs data quality
-but poor sampling can result in bias
What are the main stages to sampling?
- Determine WHO/WHAT to sample
- Determine HOW you’re going to choose these subjects
- Determine HOW MANY you’ll need to be confident in your findings
Why is it important to determine WHO you are sampling?
-How you choose your subjects will have an impact on the validity of your results (how accurate the results are for the population)
-If your subjects are not truly representative of the population of interest then your conclusions may be biased
What should you consider when choosing your subjects?
Want to make sure you are getting a good representation of the population of interest
-Establish criteria for participating BEFORE you start sampling
-INCLUSION CRITERIA= characteristics needed to be eligible for the study
-EXCLUSION CRITERIA= characteristics that would exclude or prevent someone from participating
What are the 3 areas/layers of a population when defining your populations?
outermost: Target population ex all Canadians
-Population to which it might be possible to extrapolate results
-May not always be clearly defined in write-ups
Middle: Source population: LIST (subset of target pop)
-Population from which the study subjects are drawn
-We should be able to list all members (sampling units) of this population (=sampling frame)
Innermost: Study Population:
-The individuals included in your study
What are the 2 types of validity?
External validity: How well can the study results be extrapolated to the target population? (from study population to target population)
Internal Validity: How well do the study results relate to the source population? (from study population to source population)
What should you consider when determining HOW to sample?
-Your sample strategy will determine the nature of any extrapolations you might make from the sample to the population
-From which groups should you choose subjects?
-How should you sample them?
What are the 3 different sample strategies?
- Non-probability sampling
-Convenience sampling
-Judgment sampling
-Purposive sampling
- Probability sampling
-Simple random sampling
-Systematic random sampling
-Stratified random sampling
- Others (either non-probability or probability)
-Cluster sampling
-Multi-stage sampling
What are the 3 types of non-probability sampling?
The probability of being selected is unknown in non-probability sampling
Convenience sampling: sampling units are chosen bc they are easy to get (ex animals in traps, farms close to UoG)
Judgment sampling: the investigator chooses what they deem to be units that are representative of the population (PhD student made her survey include ppl that were from a farm and knew concepts she was talking about)
Purposive sampling: Sampling units are chosen on purpose bc of their exposure or disease status (in an analytic study)
What is the problem with convenience sampling?
-Ex in class: estimating how many dogs ppl have by asking only those sitting in the first row
-Problem: not truly representative of the distribution of dog ownership bc service dogs and owners sit closer to the front etc
When do you use non-probability sampling?
Often used in analytical studies
Pros:
-Relatively cheap and easy
-Good for a homogeneous population
Cons:
-Can produce biased results if the subjects you select are not representative of the target population
-Can limit how far you can extrapolate your results
What is probability sampling?
-Uses some form of random selection process
-All individuals in a population have some non-zero probability of being selected for the study AND that probability can be calculated
What is simple random sampling? (the first type of random sampling we talked about)
- Simple random sampling
-A fixed % of the source pop is chosen using a formal random process (flip a coin, random # draw)
-All individuals have an equal chance of being chosen
-If done properly, the sample chosen should be representative of the population under investigation
-You need to know the sampling frame (and therefore the total # of individuals in your population) to use this method, ie that list
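A minimal sketch of simple random sampling, assuming the sampling frame is available as a Python list. The function name, the ID numbers, and the 5% fraction are made up for illustration.

```python
import random

def simple_random_sample(sampling_frame, fraction, seed=None):
    """Draw a fixed fraction of the sampling frame without replacement.

    Every unit in the frame has the same chance of being chosen.
    """
    rng = random.Random(seed)  # seed only makes the draw repeatable
    n = round(len(sampling_frame) * fraction)
    return rng.sample(sampling_frame, n)

# Hypothetical sampling frame: ID numbers for 200 animals.
frame = list(range(1, 201))
sample = simple_random_sample(frame, fraction=0.05, seed=1)
print(len(sample))  # 10 animals, ie 5% of 200
```

The function cannot run without the full list, which is why this method needs the sampling frame.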
What is systematic random sampling? (the 2nd type of random sampling)
- Systematic random sampling
-Good when you don’t have a complete list of individuals in the population to be sampled IF you know how many individuals there are in total
-Often used with cattle/sheep run through a chute bc need the individuals to be sequentially available
-Sampling interval (j) is calculated (j= source population size/required sample size)
-Starting point in the first interval is selected on a formal, random basis (ex with j=5, the first sheep sampled is chosen at random from the first 5 through the chute, then every 5th sheep after that is sampled)
-ie randomly select your starting point from among the first j individuals, and then sample every jth individual after that
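A minimal sketch of systematic random sampling, assuming individuals are numbered 1 to N in the order they come through the chute. The function name and the numbers (100 sheep, 20 samples) are made up for illustration.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Return the positions (1-based) of the individuals to sample."""
    rng = random.Random(seed)
    j = population_size // sample_size  # sampling interval
    start = rng.randint(1, j)           # random start within the first j
    positions = list(range(start, population_size + 1, j))
    return positions[:sample_size]

# Hypothetical example: 100 sheep through the chute, 20 samples needed, so j = 5.
positions = systematic_sample(100, 20, seed=7)
print(positions[:4])
```

Only the total count is needed to compute j, not a list of individuals, which is why this works without a full sampling frame.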
What is stratified random sampling? (the 3rd type of random sampling)
- Stratified Random sampling
-Before choosing participants, the sampling frame is broken down into strata based on some factor likely to influence the level of the characteristic being measured
-Then, simple or systematic random sampling is conducted within each stratum
-The % sampled within each stratum does not have to be the same in all groups
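A minimal sketch of stratified random sampling, assuming the sampling frame has already been split into strata and that simple random sampling is used within each one. The strata names, sizes, and sampling fractions are hypothetical.

```python
import random

def stratified_sample(strata, fractions, seed=None):
    """Run a simple random sample within each stratum.

    strata:    dict mapping stratum name -> list of sampling units
    fractions: dict mapping stratum name -> fraction to sample there
    """
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        n = round(len(units) * fractions[name])
        sample[name] = rng.sample(units, n)
    return sample

# Hypothetical frame split by farm type; the % sampled differs by stratum.
strata = {
    "dairy": [f"dairy_{i}" for i in range(100)],
    "beef": [f"beef_{i}" for i in range(40)],
}
fractions = {"dairy": 0.10, "beef": 0.25}
chosen = stratified_sample(strata, fractions, seed=3)
print({name: len(units) for name, units in chosen.items()})
```

Sampling the smaller stratum at a higher % here guarantees it contributes as many units as the larger one.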
What is cluster sampling? (the other sampling strategy)
-The sampling unit is a group of individuals with things in common (ex herd, household, geographic region)
-But the unit of concern is still the individual (ex cow, or person)
-All individuals in the sampling units are selected (ex all cows in the herd, all ppl in the home)
-Overall, the clusters may be chosen by either a non-probability or a probability method
Why is cluster sampling often used for animal research?
-It’s easier to get a list of all clusters (farms) in the area than it is to get a list of all animals
-It’s cheaper/easier to test all of the animals in ex 20 herds until we reach our required goal of 1000 animals than to drive around ON testing ex 5 animals at each of 200 farms for the same sample size of 1000
What is multi-stage sampling? (other sampling strategy)
-Similar to cluster sampling, except that sampling takes place at BOTH the cluster level AND the individual level
-Convenient when there are too many individuals in a cluster to obtain measurements on (ex a feedlot of 2000 cattle)
-Also convenient when the individuals in a cluster are so alike that measuring just a few will provide sufficient info (ex a couple of purebred puppies out of a 10-pup litter)
-ie a proportion of the individuals in each selected cluster is (randomly) selected and measured, rather than all of them as in cluster sampling
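A minimal sketch contrasting cluster and multi-stage sampling, assuming clusters are chosen by simple random sampling (they could also be chosen by a non-probability method). The herd names, herd sizes, and function names are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every individual in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

def multistage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Pick clusters at random, then pick individuals at random within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Hypothetical data: 10 herds of 50 cows each.
herds = {f"herd_{h}": [f"herd_{h}_cow_{c}" for c in range(50)] for h in range(10)}
rng = random.Random(42)

one_stage = cluster_sample(herds, n_clusters=3, rng=rng)
two_stage = multistage_sample(herds, n_clusters=3, n_per_cluster=5, rng=rng)

print(sum(len(cows) for cows in one_stage.values()))  # 150: every cow in 3 herds
print(sum(len(cows) for cows in two_stage.values()))  # 15: 5 cows in each of 3 herds
```

The only difference between the two functions is the second sampling step inside each chosen cluster.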
What is important to consider no matter what sampling strategy you are choosing?
-What kind of subjects are likely to participate
-Choosing subjects that are not representative of your target population can limit the inferences you can make
What is the 3rd main stage to sampling?
- Determine how many you’ll need to be confident in your findings.
-For descriptive epi (estimating means, proportions)
-For analytical epi (comparing means, proportions)
Now that we know who/what and how we are sampling, now what do we do?
-Essential that adequate sample sizes be estimated prior to start of study
-Too small a sample: you might not find what you’re looking for
-Too large a sample: wasted resources, ethical concerns, statistical significance vs practical significance
What are some sampling considerations?
Non-stat (more focus on resources)
-Time
-Money
-Sampling frame
-Research objective
Stat
-Power
-Confidence
-Type 1 and 2 errors
-Hypotheses
What is confidence?
Confidence= degree of certainty about your estimation process (commonly 95%), ie how sure you are that the true mean lies b/w these values
Confidence intervals= a range of values around the sample estimate that is expected to include the true mean/proportion of the source population
What is precision?
Precision= how tight the confidence interval is around your estimate (ie how narrow the range of plausible values is)
Precision depends on:
-Sample size
-Variability within population
-Your sampling strategy
*Smaller range is more precise
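A small sketch of how sample size affects precision. The notes do not give a confidence interval formula, so this assumes the common normal approximation for a proportion, p ± 1.96 × sqrt(p(1-p)/n), at 95% confidence. The 30% estimate and the sample sizes are made up.

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Approximate 95% confidence interval for a proportion (normal approximation)."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Same estimate (30%), increasing sample size: the interval gets tighter.
for n in (100, 400, 1600):
    low, high = proportion_ci(0.30, n)
    print(n, round(low, 3), round(high, 3), "width:", round(high - low, 3))
```

Under this approximation, quadrupling the sample size halves the width of the interval.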
What is the letter for precision in sample sizes?
-L=precision of an estimate (ie allowable error)
Smaller=better bc less error
ex want weight of a pig to be within 2kg of the true weight not 5kg
What are the steps to estimate sample sizes for descriptive characteristics?
- Determine the level of confidence that you wish to use (always 95% in this class)
- Specify the desired precision of your estimate (ex within 6% of the true value); this is your L
- Estimate the expected variance for the characteristic in the population (if measuring a proportion, can use an a priori estimate of the proportion; if measuring a mean, can use the expected variance)
- Use the appropriate formula to calculate sample size
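The notes do not state which formula is used, so the sketch below assumes the common textbook ones: n = Z²·p·(1-p)/L² for a proportion and n = Z²·σ²/L² for a mean, with Z = 1.96 for 95% confidence. The function names, the prior proportion of 0.5, and the variance of 100 kg² are illustrative assumptions.

```python
import math

Z_95 = 1.96  # Z value for 95% confidence

def n_for_proportion(p, L, z=Z_95):
    """Sample size to estimate a proportion to within +/- L (assumed formula)."""
    return math.ceil(z**2 * p * (1 - p) / L**2)

def n_for_mean(variance, L, z=Z_95):
    """Sample size to estimate a mean to within +/- L (assumed formula)."""
    return math.ceil(z**2 * variance / L**2)

# Proportion: prior guess of 50%, want to be within 6% of the true value.
print(n_for_proportion(p=0.5, L=0.06))  # 267

# Mean pig weight, assuming a variance of 100 kg^2 (SD of 10 kg).
print(n_for_mean(variance=100, L=2))    # 97 pigs to be within 2 kg
print(n_for_mean(variance=100, L=5))    # 16 pigs to be within 5 kg
```

Results are rounded up because you cannot sample a fraction of an individual. A smaller L (more precision) needs a larger sample.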
What is a hypothesis, and what are the null and alternative hypotheses?
-Hypotheses reflect a research question of interest
Null hypothesis (Ho): states that there are no differences b/w the groups being compared
Alternative hypothesis: a claim contrary to the null hypothesis. If we reject the null hypothesis, then we can conclude there is enough evidence to support the alternative hypothesis
What is the difference between one sided and two sided hypothesis?
One-sided hypothesis: EX The risk of developing lung cancer in ON men with long-term exposure to radon is higher than the risk in ON men without long-term radon exposure
Two-sided hypothesis: EX The risk of developing lung cancer in ON men with long-term exposure to radon is either higher or lower than the risk in ON men without long-term exposure to radon
IN EPI ALMOST ALL HYPOTHESES ARE 2-sided
What is a type 1 and 2 error?
Type 1 error: outcomes in the groups being compared are declared different, when they are not
Type 2 error: outcomes in the groups being compared are declared as not being different when they are
What are the steps to estimate sample sizes to test for differences between groups?
- State your null hypothesis and determine whether you will have a 1-sided or 2-sided alternative hypothesis (always 2-sided for our purposes)
- Determine whether you are comparing a mean or proportion
- Determine how much of a difference between the groups you want to detect (and the expected var, if necessary)
- Set α (type 1 error rate, related to confidence; always 5%) and β (type 2 error rate, usually 20%; related to power, since power = 1 - β, which is usually 80%)
- Use the appropriate formula to calculate sample size
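The notes do not give the formula, so this sketch assumes common textbook versions for a 2-sided test with equal group sizes. For proportions: n per group = [Zα·sqrt(2·p̄·q̄) + Zβ·sqrt(p1·q1 + p2·q2)]² / (p1 - p2)². For means: n per group = 2·(Zα + Zβ)²·σ² / (difference)². Function names and example values are hypothetical.

```python
import math
from statistics import NormalDist

def _z_values(alpha, power):
    """Z for a 2-sided alpha and Z for the requested power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 when power = 0.80
    return z_alpha, z_beta

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Sample size per group to detect p1 vs p2 (assumed textbook formula)."""
    z_alpha, z_beta = _z_values(alpha, power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

def n_per_group_means(difference, variance, alpha=0.05, power=0.80):
    """Sample size per group to detect a given difference in means."""
    z_alpha, z_beta = _z_values(alpha, power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * variance / difference ** 2)

# Hypothetical example: 30% vs 15% diseased in exposed vs unexposed groups.
print(n_per_group_proportions(0.30, 0.15))
# A smaller difference to detect, or higher power, needs more subjects.
print(n_per_group_proportions(0.30, 0.25))
print(n_per_group_proportions(0.30, 0.15, power=0.90))
```

The defaults mirror the values used in this class: α = 5% and power = 80% (β = 20%).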
When does the required sample size increase?
Required sample size generally increases as:
-The size of the difference between 2 means or proportions decreases (ie the smaller the difference you need to detect, the larger the sample size)
-The level of power to detect a difference b/w 2 groups increases (ie the higher the power wanted, the larger the sample size needed)
-The number of confounders you’re controlling for increases
-The number of hypotheses tested increases