Study Design and Sampling Flashcards
How does a sampling design fit into the framework of statistical inference?
- A sampling design ensures that the process of selecting units for observation is unbiased and representative of the statistical population. This allows for valid inferential statistics, where conclusions drawn from the sample can be generalized to the larger population.
What are the four goals of an ideal sampling design?
All sampling units are selectable: Every unit must have a non-zero chance of being selected.
Example: In a penguin study, every penguin (both mature and immature) must be selectable.
Selection is unbiased: No attribute of the sampling unit should influence its probability of being selected.
Example: Not choosing only juvenile penguins because they are easier to catch.
Selection is independent: Selecting one unit should not affect the probability of selecting another.
Example: Sampling one person in a group should not make it more or less likely that others in that group are sampled.
All samples are possible: All combinations of sampling units must be possible in the sample.
Example: When sampling people from different city sides, the design should not rule out any combination, such as a sample that includes residents of both the east and west sides.
What is bias in sampling, and how can it affect a study?
- Bias is a systematic over- or under-estimate of a population value that results from flawed sampling. It occurs when some units have a higher chance of being selected than others, leading to unrepresentative results.
- Example: If only dominant birds at a feeder are sampled, the sample will overestimate the average bird mass.
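The feeder example can be checked with a small simulation. This is a minimal sketch, not a real dataset: the bird masses are made up, and it assumes that "dominant" simply means the heaviest 20% of birds. The function name simulate_feeder_bias and all of its parameters are hypothetical.

```python
import random
import statistics


def simulate_feeder_bias(n_birds=1000, sample_size=50, seed=1):
    """Compare an unbiased sample with one drawn only from dominant birds."""
    rng = random.Random(seed)
    # Hypothetical population of bird masses in grams (invented values).
    masses = [rng.gauss(20.0, 3.0) for _ in range(n_birds)]
    population_mean = statistics.mean(masses)

    # Unbiased selection: every bird has the same chance of being chosen.
    unbiased = rng.sample(masses, sample_size)

    # Biased selection: only the heaviest 20% (assumed to be the dominant
    # birds at the feeder) can be chosen.
    n_dominant = n_birds // 5
    dominant = sorted(masses)[-n_dominant:]
    biased = rng.sample(dominant, sample_size)

    return population_mean, statistics.mean(unbiased), statistics.mean(biased)


population_mean, unbiased_mean, biased_mean = simulate_feeder_bias()
print(f"population mean: {population_mean:.2f} g")
print(f"unbiased sample: {unbiased_mean:.2f} g")
print(f"feeder sample:   {biased_mean:.2f} g")
```

Under these assumptions the feeder sample always lands above the population mean, because lighter birds have no chance of being selected.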
What is sampling independence, and why is it important?
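Sampling independence means that selecting one unit does not change the probability of selecting any other unit. Without it, sampled units tend to resemble one another, and the sample may not represent the population.
- Example: When interviewing people at a park, choosing one person should not lead you to also interview the rest of that person's group.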
What errors can occur in sampling design?
Errors include bias, lack of independence in selection, and designs that make some units or samples impossible to select. These errors produce samples that do not represent the statistical population, so estimates based on them are inaccurate.
What are explanatory variables, response variables, and confounding variables?
Explanatory Variable: The factor that is investigated for its effect.
- Example: Smoking habits in a lung cancer study.
Response Variable: The outcome measured in response to the explanatory variable.
- Example: Risk of lung cancer.
Confounding Variable: A variable that is related to both the explanatory variable and the response variable but is not accounted for, potentially leading to spurious relationships.
- Example: Age affecting both smoking habits and cancer risk.
What are the goals of an observational study?
Observational studies aim to collect real-world data from a statistical population to investigate relationships among variables without manipulating them. These studies show associations but cannot prove causation.
- Studying the relationship between smoking and lung cancer.
What is the difference between retrospective and prospective designs?
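Retrospective: The outcome has already occurred and is known when the study begins, so researchers look back at past data. This increases the risk of spurious relationships.
Prospective: The outcome is unknown when the study begins, and subjects are followed over time to observe outcomes.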
What are common observational study designs?
Simple Random Survey: Randomly selecting units from the statistical population.
- Example: Randomly selecting GPS points to find bird nests.
Stratified Survey: Dividing the population into subgroups (strata) and sampling within each group.
- Example: Sampling people from different age groups to study candy preferences.
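Cluster Survey: Dividing the population into clusters, randomly selecting some of the clusters, and sampling the units within the selected clusters.
- Example: Randomly selecting households and sampling the people in them to study wage variation.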
Case-Control Survey: Comparing two groups—one with a particular outcome and one without.
- Example: Studying lifestyle factors of heart attack survivors (case) versus non-survivors (control).
Cohort Survey: Following a random group over time to observe outcomes.
- Example: Following a group of young people to study the development of heart disease.
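The first three designs above differ only in how units are drawn, which a short sketch can make concrete. The population below is invented, and the function names (simple_random_sample, stratified_sample, cluster_sample) are hypothetical helpers written for this card, not part of any library. The cluster version assumes every unit in a selected cluster is sampled.

```python
import random


def simple_random_sample(units, n, rng):
    """Draw n units; every unit has the same chance of selection."""
    return rng.sample(units, n)


def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Group units into strata, then draw randomly within every stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample


def cluster_sample(units, cluster_of, n_clusters, rng):
    """Randomly choose whole clusters, then keep every unit in them."""
    clusters = {}
    for unit in units:
        clusters.setdefault(cluster_of(unit), []).append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]


# Invented population: 60 people, 3 age groups, 15 households of 4 people.
rng = random.Random(42)
people = [
    {"id": i, "age_group": ("child", "adult", "senior")[i % 3], "household": i // 4}
    for i in range(60)
]

srs = simple_random_sample(people, 10, rng)
strat = stratified_sample(people, lambda p: p["age_group"], 4, rng)
clus = cluster_sample(people, lambda p: p["household"], 3, rng)
```

The stratified sample is guaranteed to contain every age group, while the cluster sample contains only people from the three chosen households.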
What is the goal of an experimental study?
The goal is to study the effect of one or more manipulated variables (factors) on one or more response variables. Because the researcher manipulates the variables, experimental studies can establish cause-and-effect relationships.
- Example: Testing whether adding nitrogen to soil affects soybean production.
How do experimental studies differ from observational studies?
In experimental studies, the researcher manipulates the explanatory variable and randomly assigns sampling units to treatments. In observational studies, variables are not controlled, and the researcher observes existing relationships.
What are treatment factors and levels?
A factor is a manipulated variable in an experiment, and a level is a specific value of that factor. Each factor needs at least two levels so that treatments can be compared.
- Example: In a soybean study, nitrogen addition is the factor, and the levels are “no nitrogen” and “nitrogen added.”
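Random assignment of sampling units to the levels of a factor can be sketched as follows. The plot names, the number of plots, and the helper assign_treatments are all hypothetical. The sketch assumes a balanced design in which each level receives the same number of plots.

```python
import random


def assign_treatments(unit_ids, levels, seed=None):
    """Randomly assign each sampling unit to one level of the factor."""
    rng = random.Random(seed)
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled units to the levels in turn, so group sizes
    # differ by at most one.
    return {unit: levels[i % len(levels)] for i, unit in enumerate(shuffled)}


# Eight hypothetical soybean plots and the two levels of the nitrogen factor.
plots = [f"plot_{i}" for i in range(1, 9)]
assignment = assign_treatments(plots, ["no nitrogen", "nitrogen added"], seed=7)

for plot in plots:
    print(plot, "->", assignment[plot])
```

Shuffling before dealing means that no attribute of a plot, such as its position in the field, influences which level it receives.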
What is replication, and why is it important?
Replication is applying each treatment to multiple sampling units to see if the results are consistent. Replication makes it possible to judge whether an observed effect could be due to chance. Each replicate is an independent sampling unit.
What is pseudoreplication?
Pseudoreplication occurs when data from observation units (not sampling units) are treated as independent in the analysis, inflating the number of replicates.
- Example: In a soybean study, treating each plant as a replicate, instead of each plot, would be pseudoreplication.
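One common way to avoid pseudoreplication in the soybean case is to collapse the plant measurements to one value per plot before analysis. The sketch below assumes the treatment was applied to whole plots. The yields are invented and plot_means is a hypothetical helper.

```python
import statistics
from collections import defaultdict


def plot_means(plant_records):
    """Collapse plant-level yields to one mean per plot (the sampling unit)."""
    by_plot = defaultdict(list)
    for plot_id, treatment, yield_g in plant_records:
        by_plot[(plot_id, treatment)].append(yield_g)
    return [
        (plot_id, treatment, statistics.mean(yields))
        for (plot_id, treatment), yields in by_plot.items()
    ]


# Invented data: 4 plots (sampling units), 3 plants each (observation units).
records = [
    ("plot_1", "no nitrogen", 10.0),
    ("plot_1", "no nitrogen", 12.0),
    ("plot_1", "no nitrogen", 11.0),
    ("plot_2", "no nitrogen", 9.0),
    ("plot_2", "no nitrogen", 10.0),
    ("plot_2", "no nitrogen", 11.0),
    ("plot_3", "nitrogen added", 14.0),
    ("plot_3", "nitrogen added", 15.0),
    ("plot_3", "nitrogen added", 16.0),
    ("plot_4", "nitrogen added", 13.0),
    ("plot_4", "nitrogen added", 15.0),
    ("plot_4", "nitrogen added", 14.0),
]

replicates = plot_means(records)
print(f"{len(records)} plant measurements, {len(replicates)} true replicates")
```

Analyzing the 12 plant values as if they were independent would claim six replicates per treatment, when the design only has two.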
What is blinding in an experimental study?
Blinding is a method in which people involved in a study do not know which treatment each participant is receiving. A single-blind study blinds participants, while a double-blind study blinds both participants and researchers to the treatments.