C1 Intro to Probability and Data with R M1-3 Data Flashcards
Which type of variable is hdi (Human Development Index, which combines life expectancy, educational attainment, and income), with levels very high, high, medium, and low human development?
Ordinal Categorical Variable
There is an inherent ordering to the levels of this categorical variable (from very high to low), and hence this is an ordinal categorical variable.
What are the two main types of numerical variables?
Continuous and Discrete
Continuous variables can take any value within a range, while discrete variables can only take specific values.
Define continuous variables.
Can take any value within a range (e.g., height)
Continuous variables allow for an infinite number of possible values.
Define discrete variables.
Can only take specific values (e.g., number of cars owned)
Discrete variables are countable and often represented as whole numbers.
What are the two categories of categorical variables?
Ordinal and Nominal
Categorical variables represent characteristics or qualities.
Define ordinal variables.
Have a meaningful order (e.g., satisfaction levels)
The order matters in ordinal variables, unlike in nominal variables.
Define nominal variables.
No inherent order (e.g., morning person vs. afternoon person)
Nominal variables categorize data without a ranking system.
What do researchers do in observational studies?
Collect data without interfering with how it arises.
What can researchers establish in observational studies?
An association (correlation) between variables.
In general, observational studies can provide evidence of a naturally occurring association between variables, but they cannot by themselves show a causal connection.
What are the two types of observational studies?
- Retrospective studies (using past data)
- Prospective studies (collecting data throughout the study)
What is the main feature of experiments in research?
Researchers randomly assign subjects to treatments.
What do experiments allow researchers to establish?
Causal connections.
Why is random assignment important in experiments?
It helps control for confounding variables.
What are confounding variables?
Extraneous factors that may influence both the explanatory and response variables.
What is Convenience Sample Bias?
When only easily accessible individuals are included.
This type of bias can lead to non-representative samples because it does not account for the broader population.
What causes Non-response Bias?
Only a non-random fraction of the randomly sampled individuals respond, so the respondents no longer represent the population.
It can skew the results if the non-respondents differ significantly from respondents.
What is Voluntary Response Bias?
Arises when only those with strong opinions choose to respond.
This bias often leads to overrepresentation of extreme views in survey results.
What is Simple Random Sampling?
Each case has an equal chance of selection.
This method ensures that every individual in the population has the same probability of being chosen.
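A minimal sketch of simple random sampling, using Python's standard `random` module. The population of ID numbers and the seed are hypothetical, chosen only for illustration.

```python
import random

# Hypothetical population: ID numbers 1..100 (illustrative only).
population = list(range(1, 101))

rng = random.Random(42)  # fixed seed so the draw can be repeated

# Draw 10 distinct cases without replacement; each case in the
# population has the same chance of ending up in the sample.
srs = rng.sample(population, k=10)
print(srs)
```

Because `sample` draws without replacement, no case can appear twice in the sample.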
Define Stratified Sampling.
Population is divided into strata, and samples are taken from each.
This technique is useful for ensuring representation from different segments of the population.
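One way to sketch stratified sampling in Python: group the cases by stratum, then take a simple random sample inside each group. The population, the `age_group` labels, and the helper name `stratified_sample` are assumptions made for this example.

```python
import random

# Hypothetical population of (person_id, age_group) pairs:
# 30 cases are "under_30" and 60 cases are "30_plus".
population = [(i, "under_30" if i % 3 == 0 else "30_plus") for i in range(90)]

def stratified_sample(cases, stratum_of, n_per_stratum, rng):
    """Take a simple random sample of n_per_stratum cases from every stratum."""
    strata = {}
    for case in cases:
        strata.setdefault(stratum_of(case), []).append(case)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

rng = random.Random(1)
sample = stratified_sample(population, lambda case: case[1], 5, rng)
print(sample)
```

Every stratum contributes exactly five cases here, so the smaller "under_30" group is guaranteed to be represented.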
What characterizes Cluster Sampling?
Population is divided into clusters, and entire clusters are sampled.
This method is often used when populations are large and geographically dispersed.
Explain Multistage Sampling.
Combines cluster sampling with additional sampling within selected clusters.
This approach allows for a more refined sampling process, potentially increasing efficiency.
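The contrast between cluster and multistage sampling can be sketched as follows. The villages, household labels, and sample sizes are hypothetical.

```python
import random

# Hypothetical population: 8 villages (clusters) of 20 households each.
clusters = {
    f"village_{v}": [f"v{v}_h{h}" for h in range(20)]
    for v in range(8)
}

rng = random.Random(7)

# Stage 1 (shared by both methods): randomly choose 3 clusters.
chosen = rng.sample(sorted(clusters), k=3)

# Cluster sampling: keep every household in the chosen clusters.
cluster_sample = [h for name in chosen for h in clusters[name]]

# Multistage sampling: randomly sample 5 households within each chosen cluster.
multistage_sample = [
    h for name in chosen for h in rng.sample(clusters[name], k=5)
]

print(len(cluster_sample), len(multistage_sample))
```

The only difference is the second stage: cluster sampling takes whole clusters, while multistage sampling draws a random sample inside each one.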
What is a strategy to minimize sampling bias in studies?
Use Random Sampling
Ensures that every individual in the population has an equal chance of being selected.
What is Stratified Sampling?
Dividing the population into homogeneous subgroups and randomly sampling from each stratum
Ensures representation across key characteristics like age or gender.
How does increasing sample size help in studies?
It reduces sampling variability and so increases the reliability (precision) of results
A larger sample generally gives more precise estimates, but it does not correct bias introduced by a flawed sampling method.
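A small simulation can illustrate the effect of sample size on reliability: the sample mean varies less from sample to sample as the sample grows. The simulated heights, the sample sizes, and the helper name `spread_of_sample_means` are assumptions for this sketch.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: 10,000 simulated heights in cm.
population = [rng.gauss(170, 10) for _ in range(10_000)]

def spread_of_sample_means(n, reps=200):
    """Standard deviation of the sample mean across repeated random samples of size n."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(reps)]
    return statistics.stdev(means)

small = spread_of_sample_means(10)
large = spread_of_sample_means(400)
print(f"n=10: {small:.2f}   n=400: {large:.2f}")
```

The spread for n=400 comes out far smaller than for n=10, which is what "more reliable" means here: repeated studies would agree with each other more closely.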
What is a method to avoid sampling bias related to participant selection?
Avoid Convenience Sampling
Relying solely on easily accessible individuals can lead to non-representative samples.
What should be done to address Non-response Bias?
Follow up with individuals who do not respond to surveys
Encouraging participation ensures a more representative sample.
What is a benefit of using multiple sampling methods?
Enhances representativeness and reduces bias
Combining methods like multistage sampling can improve the quality of the sample.
What is the principle of Control in experimental design?
Comparing the treatment group to a control group.
The control group serves as a baseline to evaluate the effect of the treatment.
Define Randomization in the context of experimental design.
Randomly assigning subjects to different treatment groups.
This helps to eliminate bias and ensures that the groups are comparable.
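Random assignment can be sketched by shuffling the subjects and splitting the shuffled list. The subject labels and the equal group sizes are illustrative assumptions.

```python
import random

# Hypothetical subjects already enrolled in the experiment.
subjects = [f"subject_{i}" for i in range(20)]

rng = random.Random(3)

# Shuffle a copy so chance alone decides who lands in which group.
shuffled = subjects[:]
rng.shuffle(shuffled)

half = len(shuffled) // 2
treatment = shuffled[:half]
control = shuffled[half:]

print(treatment)
print(control)
```

Neither the researcher nor the subject chooses the group, so any characteristic of the subjects is expected to be spread across both groups by chance.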
What does Replication refer to in experimental studies?
Ensuring a sufficiently large sample size or repeating the entire study.
Replication increases the reliability of the results.
What is the purpose of Blocking in experimental design?
Grouping subjects based on known or suspected variables that may affect the response variable before random assignment.
Blocking helps to control for variables that could confound the results.
What is a placebo?
A fake treatment used as a control.
Placebos are often used in clinical trials to test the efficacy of a new drug.
What is the placebo effect?
Improvement due to belief in receiving treatment.
This phenomenon can occur even when patients receive no active therapeutic intervention.
What does blinding refer to in research?
Participants unaware of their group assignment.
This helps reduce bias in the results.
What is a double-blind study?
Both participants and researchers are unaware of group assignments.
This design minimizes both participant and researcher bias.
What is random sampling?
Selection of subjects randomly from a population, ensuring equal chance of being chosen
This tends to produce a representative sample, which allows the study results to be generalized.
What is the purpose of random sampling in study design?
To create a sample that is likely representative of the population
This enables the results of the study to be generalized.
What is random assignment?
Assignment of subjects to different treatment groups in experimental settings
On average, this balances subject characteristics across the treatment and control groups.
How does random assignment contribute to research?
It allows researchers to attribute observed differences in outcomes directly to the treatment being tested
This strengthens the validity of causal conclusions.
What are the implications of using both random sampling and random assignment in studies?
Allows for causal conclusions that can be generalized to the population
Studies lacking one or both methods have limitations in their conclusions.
True or False: Random sampling ensures that study results can be generalized to the population.
True
A representative sample is essential for generalization.
A study is designed to test the effect of light level and noise level on the exam performance of students. The researcher also believes that light and noise levels might have different effects on males and females, so they want to make sure both genders are represented equally under each condition.
What would be the explanatory, response and blocking variables in this scenario?
The researchers are interested in the effect of light and noise on exam performance. Since they believe these two variables might be affecting the outcome, these are the explanatory variables and exam performance is the response variable. Gender of the student is a nuisance variable they want to control for, hence they block for it. Unlike light and noise, gender is not a treatment that is being imposed on the subjects.
A retail store considering updates to their credit card policies randomly samples 1000 of their credit card holders to survey on the phone. The phone calls are made during business hours, therefore there is a lower rate of responses from members who work during these hours. What type of bias is this indicative of?
non-response bias
There is an initial random sample, but not everyone in this random sample is reached. Therefore the issue is non-response of the sampled individuals.
A city council has requested a household survey be conducted in a suburban area of their city. The area is broken into many distinct and unique neighborhoods, some including large homes, some with only apartments. Which approach would likely be the least effective?
The least effective approach would be cluster sampling, where each cluster is a neighborhood
What is the most important difference between observational studies and experiments?
Random assignment
Random assignment helps to eliminate bias and establish causality in experiments.
What are associated variables?
When two variables show some connection with one another, they are called associated variables. Associated variables can also be called dependent variables and vice-versa.
The multi-unit and home ownership rates are said to be associated because the plot shows a discernible pattern.
True or False: A pair of variables is either associated or independent, but not both.
True
A pair of variables is either related in some way (associated) or not (independent). No pair of variables is both associated and independent.
When is a pair of variables said to be independent?
If two variables are not associated, then they are said to be independent. That is, two variables are independent if there is no evident relationship between the two.
What is the term for a variable that might causally affect another?
Explanatory variable
This variable is hypothesized to influence the response variable.
What do we call the variable that is affected by the explanatory variable?
Response variable
This variable responds to changes in the explanatory variable.
Fill in the blank: The _______ variable is the one that might affect another variable.
explanatory
This variable is often used in causal analysis.
Fill in the blank: The _______ variable is the one that is affected by the explanatory variable.
response
This variable measures the effect of the explanatory variable.
What is stratified sampling?
A divide-and-conquer sampling strategy where the population is divided into groups called strata
How are strata chosen in stratified sampling?
Strata are chosen so that similar cases are grouped together
What is the second sampling method employed within each stratum in stratified sampling?
Usually simple random sampling
Why is stratified sampling useful?
It is especially useful when the cases in each stratum are very similar with respect to the outcome of interest
Fill in the blank: Stratified sampling is a _______ sampling strategy.
divide-and-conquer
What are the four principles of experimental design?
Controlling, Randomization, Replication, Blocking
What does controlling refer to in experimental design?
Researchers assign treatments to cases and control other differences in the groups
What is the purpose of randomization in experiments?
To account for uncontrolled variables and prevent accidental bias
Why is replication important in experimental design?
It allows researchers to estimate the effect of the explanatory variable more accurately
What is blocking in the context of experimental design?
Grouping individuals based on a variable before randomizing them into treatment groups
When is blocking particularly useful in an experiment?
When researchers suspect that other variables influence the response
What is an example of using blocking in a drug study for heart attacks?
Split patients into low-risk and high-risk blocks before random assignment
What is the significance of incorporating the first three principles of experimental design?
Controlling, Randomization & Replication
They are essential for any study to ensure valid results
What does randomization help to prevent in a study?
Accidental bias
Fill in the blank: An extraneous variable that is related to both the explanatory and response variables, and that prevents us from deducing causal relationships from observational studies, is called a _______.
confounding variable
In an experiment, what purpose does blocking serve?
Control for variables that may influence the response.
If there are variables that are known or suspected to affect the response variable, we first group subjects into blocks based on these variables, and then randomize cases within each block to treatment groups. This allows us to control for possible effects of these confounding variables by making sure they are equally represented in all treatment groups.
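The block-then-randomize procedure can be sketched as below, using the low-risk and high-risk patient blocks from the heart attack example. The patient list and the helper name `blocked_assignment` are hypothetical.

```python
import random

# Hypothetical patients tagged with a risk level (the blocking variable):
# 8 high-risk and 12 low-risk.
patients = [(f"p{i}", "high" if i < 8 else "low") for i in range(20)]

def blocked_assignment(cases, block_of, rng):
    """Group cases into blocks, then randomly split each block between the two groups."""
    blocks = {}
    for case in cases:
        blocks.setdefault(block_of(case), []).append(case)
    groups = {"treatment": [], "control": []}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        groups["treatment"].extend(shuffled[:half])
        groups["control"].extend(shuffled[half:])
    return groups

groups = blocked_assignment(patients, lambda p: p[1], random.Random(11))
for name, members in groups.items():
    high = sum(1 for _, risk in members if risk == "high")
    print(name, "high-risk:", high, "low-risk:", len(members) - high)
```

Randomizing inside each block gives both groups the same number of high-risk patients, so risk level cannot account for a difference in outcomes between treatment and control.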
What is the purpose of random sampling in data collection?
To make results generalizable to the target population
Random sampling ensures that every individual in the population has an equal chance of being selected, enhancing the representativeness of the sample.
What does random assignment in study design suggest?
Causality
Random assignment helps ensure that any differences observed in the study outcomes can be attributed to the treatment rather than pre-existing differences between groups.
How does random sampling relate to the population?
It allows generalization of results to the population at large
This is important for making valid inferences based on the sample studied.
Stratified sampling is a form of which broader type of sampling?
Random sampling
Stratified sampling involves dividing the population into subgroups and randomly sampling from each subgroup to ensure representation.
Further classify a variable once it is identified as categorical.
If a variable is categorical, determine whether it is ordinal or nominal based on whether or not its levels have a natural ordering.
Further classify a variable once it is identified as numerical.
If a variable is numerical, further classify it as continuous or discrete based on whether it can take any of an infinite number of values within a range or only specific countable values (typically non-negative whole numbers), respectively.
True or False: Labeling variables as explanatory and response does not guarantee that the relationship between the two is actually causal, even if there is an association identified between the two variables.
True