C1 Intro to Probability and Data with R M1-3 Data Flashcards

Question 1

Q

Which type of variable is hdi (Human Development Index, which combines life expectancy, educational attainment, and income), with levels very high, high, medium, and low human development?

Answer

A

Ordinal Categorical Variable

There is an inherent ordering to the levels of this categorical variable (from very high to low), and hence this is an ordinal categorical variable.

Question 2

Q

What are the two main types of numerical variables?

Answer

A

Continuous and Discrete

Continuous variables can take any value within a range, while discrete variables can only take specific values.

Question 3

Q

Define continuous variables.

Answer

A

Can take any value within a range (e.g., height)

Continuous variables allow for an infinite number of possible values.

Question 4

Q

Define discrete variables.

Answer

A

Can only take specific values (e.g., number of cars owned)

Discrete variables are countable and often represented as whole numbers.

Question 5

Q

What are the two categories of categorical variables?

Answer

A

Ordinal and Nominal

Categorical variables represent characteristics or qualities.

Question 6

Q

Define ordinal variables.

Answer

A

Have a meaningful order (e.g., satisfaction levels)

The order matters in ordinal variables, unlike in nominal variables.

Question 7

Q

Define nominal variables.

Answer

A

No inherent order (e.g., morning person vs. afternoon person)

Nominal variables categorize data without a ranking system.

Question 8

Q

What do researchers do in observational studies?

Answer

A

Collect data without interfering with how it arises.

Question 9

Q

What can researchers establish in observational studies?

Answer

A

An association (correlation) between variables.

In general, observational studies can provide evidence of a naturally occurring association between variables, but they cannot by themselves show a causal connection.

Question 10

Q

What are the two types of observational studies?

Answer

A

Retrospective studies (using past data)
Prospective studies (collecting data throughout the study)

Question 11

Q

What is the main feature of experiments in research?

Answer

A

Researchers randomly assign subjects to treatments.

Question 12

Q

What do experiments allow researchers to establish?

Answer

A

Causal connections.

Question 13

Q

Why is random assignment important in experiments?

Answer

A

It helps control for confounding variables.

Question 14

Q

What are confounding variables?

Answer

A

Extraneous factors that may influence both the explanatory and response variables.

Question 15

Q

What is Convenience Sample Bias?

Answer

A

When only easily accessible individuals are included.

This type of bias can lead to non-representative samples because it does not account for the broader population.

Question 16

Q

What causes Non-response Bias?

Answer

A

Occurs when a non-random fraction of the sampled individuals respond, leading to unrepresentative results.

It can skew the results if the non-respondents differ significantly from respondents.

Question 17

Q

What is Voluntary Response Bias?

Answer

A

Arises when only those with strong opinions choose to respond.

This bias often leads to overrepresentation of extreme views in survey results.

Question 18

Q

What is Simple Random Sampling?

Answer

A

Each case has an equal chance of selection.

This method ensures that every individual in the population has the same probability of being chosen.
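As a minimal sketch of simple random sampling, the snippet below draws 10 cases without replacement from a made-up population of 100 case IDs. The population, the seed, and the sample size are illustrative assumptions, not part of the course material.

```python
import random

# Hypothetical population: 100 case IDs (illustrative only).
population = list(range(1, 101))

# A fixed seed makes the draw reproducible; any seed would do.
rng = random.Random(42)

# random.sample draws without replacement, so every case is
# equally likely to be chosen and no case appears twice.
srs = rng.sample(population, k=10)
print(srs)
```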

Question 19

Q

Define Stratified Sampling.

Answer

A

Population is divided into strata, and samples are taken from each.

This technique is useful for ensuring representation from different segments of the population.
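One way to picture stratified sampling is to group cases by stratum and then take a simple random sample inside each group. The sketch below assumes a made-up population labelled "undergrad" or "grad"; the labels, sizes, and the helper name stratified_sample are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical population of (case_id, stratum) pairs.
population = [(i, "undergrad" if i < 60 else "grad") for i in range(100)]

def stratified_sample(cases, per_stratum, rng):
    """Group cases by stratum, then draw a simple random sample from each."""
    strata = defaultdict(list)
    for case_id, stratum in cases:
        strata[stratum].append(case_id)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

sample = stratified_sample(population, per_stratum=5, rng=random.Random(0))
print(sample)
```

Every stratum contributes cases, which is what guarantees representation from each segment.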

Question 20

Q

What characterizes Cluster Sampling?

Answer

A

Population is divided into clusters, and entire clusters are sampled.

This method is often used when populations are large and geographically dispersed.
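Cluster sampling can be sketched as choosing a few clusters at random and keeping every case inside them. The neighborhoods and household IDs below are invented for illustration, and cluster_sample is a hypothetical helper name.

```python
import random

# Hypothetical clusters: neighborhood name -> household IDs.
clusters = {
    "north": [1, 2, 3, 4],
    "south": [5, 6, 7],
    "east": [8, 9, 10, 11, 12],
    "west": [13, 14],
}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters; every case in a picked cluster is included."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

sampled = cluster_sample(clusters, n_clusters=2, rng=random.Random(3))
print(sampled)
```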

Question 21

Q

Explain Multistage Sampling.

Answer

A

Combines cluster sampling with additional sampling within selected clusters.

This approach allows for a more refined sampling process, potentially increasing efficiency.
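A rough sketch of multistage sampling: first pick clusters at random, then take a simple random sample within each picked cluster instead of keeping everyone. The villages, resident IDs, and the helper name multistage_sample are assumptions made up for this example.

```python
import random

# Hypothetical clusters: village name -> resident IDs.
clusters = {
    "A": list(range(0, 20)),
    "B": list(range(20, 45)),
    "C": list(range(45, 60)),
    "D": list(range(60, 90)),
}

def multistage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: sample clusters. Stage 2: sample cases within each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

staged = multistage_sample(clusters, n_clusters=2, n_per_cluster=5,
                           rng=random.Random(11))
print(staged)
```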

Question 22

Q

What is a strategy to minimize sampling bias in studies?

Answer

A

Use Random Sampling

Ensures that every individual in the population has an equal chance of being selected.

Question 23

Q

What is Stratified Sampling?

Answer

A

Dividing the population into homogeneous subgroups and randomly sampling from each stratum

Ensures representation across key characteristics like age or gender.

Question 24

Q

How does increasing sample size help in studies?

Answer

A

It reduces sampling variability and increases the reliability of results

A larger sample size generally leads to more precise estimates, but it does not by itself correct bias introduced by a flawed sampling method.
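A small simulation can illustrate one effect of sample size: sample means vary less from sample to sample as the sample grows. The population here is simulated (uniform values between 0 and 100), and the sample sizes, repeat count, and helper name are arbitrary assumptions.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical population: 10,000 simulated values between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(10_000)]

def spread_of_sample_means(n, repeats=200):
    """Standard deviation of the sample mean across repeated random samples of size n."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

small = spread_of_sample_means(10)
large = spread_of_sample_means(1000)
print(f"n=10: {small:.2f}   n=1000: {large:.2f}")
```

Note that every sample in this simulation is drawn at random. A biased sampling method would stay biased no matter how large the sample.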

Question 25

Q

What is a method to avoid sampling bias related to participant selection?

Answer

A

Avoid Convenience Sampling

Relying solely on easily accessible individuals can lead to non-representative samples.

Question 26

Q

What should be done to address Non-response Bias?

Answer

A

Follow up with individuals who do not respond to surveys

Encouraging participation ensures a more representative sample.

Question 27

Q

What is a benefit of using multiple sampling methods?

Answer

A

Enhances representativeness and reduces bias

Combining methods like multistage sampling can improve the quality of the sample.

Question 28

Q

What is the principle of Control in experimental design?

Answer

A

Comparing the treatment group to a control group.

The control group serves as a baseline to evaluate the effect of the treatment.

Question 29

Q

Define Randomization in the context of experimental design.

Answer

A

Randomly assigning subjects to different treatment groups.

This helps to eliminate bias and ensures that the groups are comparable.
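Random assignment can be sketched as shuffling the subjects and dealing them out to groups. The subject IDs, group names, and the helper name randomize below are hypothetical.

```python
import random

def randomize(subjects, groups, rng):
    """Shuffle subjects, then deal them round-robin into the groups."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    # Slicing with a step keeps group sizes within one of each other.
    return {g: shuffled[i::len(groups)] for i, g in enumerate(groups)}

# Hypothetical subject IDs S01..S20.
subjects = [f"S{i:02d}" for i in range(1, 21)]
assignment = randomize(subjects, ["treatment", "control"], random.Random(1))
print(assignment)
```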

Question 30

Q

What does Replication refer to in experimental studies?

Answer

A

Ensuring a sufficiently large sample size or repeating the entire study.

Replication increases the reliability of the results.

Question 31

Q

What is the purpose of Blocking in experimental design?

Answer

A

Grouping subjects based on known or suspected variables that may affect the response variable before random assignment.

Blocking helps to control for variables that could confound the results.

Question 32

Q

What is a placebo?

Answer

A

A fake treatment used as a control.

Placebos are often used in clinical trials to test the efficacy of a new drug.

Question 33

Q

What is the placebo effect?

Answer

A

Improvement due to belief in receiving treatment.

This phenomenon can occur even when patients receive no active therapeutic intervention.

Question 34

Q

What does blinding refer to in research?

Answer

A

Participants unaware of their group assignment.

This helps reduce bias in the results.

Question 35

Q

What is a double-blind study?

Answer

A

Both participants and researchers are unaware of group assignments.

This design minimizes both participant and researcher bias.

Question 36

Q

What is random sampling?

Answer

A

Selection of subjects randomly from a population, ensuring equal chance of being chosen

This results in a representative sample that allows generalization of study results.

Question 37

Q

What is the purpose of random sampling in study design?

Answer

A

To create a sample that is likely representative of the population

This enables the results of the study to be generalized.

Question 38

Q

What is random assignment?

Answer

A

Assignment of subjects to different treatment groups in experimental settings

This tends to balance differences in subject characteristics across the treatment and control groups.

Question 39

Q

How does random assignment contribute to research?

Answer

A

It allows researchers to attribute observed differences in outcomes directly to the treatment being tested

This strengthens the validity of causal conclusions.

Question 40

Q

What are the implications of using both random sampling and random assignment in studies?

Answer

A

Allows for causal conclusions that can be generalized to the population

Studies lacking one or both methods have limitations in their conclusions.

Question 41

Q

True or False: Random sampling ensures that study results can be generalized to the population.

Answer

A

True

A representative sample is essential for generalization.

Question 42

Q

A study is designed to test the effect of light level and noise level on exam performance of students. The researcher also believes that light and noise levels might have different effects on males and females, so wants to make sure both genders are represented equally under different conditions.
What would be the explanatory, response and blocking variables in this scenario?

Answer

A

The researchers are interested in the effect of light and noise on exam performance. Since they believe these two variables might be affecting the outcome, these are the explanatory variables and exam performance is the response variable. Gender of the student is a nuisance variable they want to control for, hence they block for it. Unlike light and noise, gender is not a treatment that is being imposed on the subjects.

Question 43

Q

A retail store considering updates to their credit card policies randomly samples 1000 of their credit card holders to survey on the phone. The phone calls are made during business hours, therefore there is a lower rate of responses from members who work during these hours. What type of bias is this indicative of?

Answer

A

non-response bias

There is an initial random sample, but not everyone in this random sample is reached. Therefore the issue is non-response of the sampled individuals.

Question 44

Q

A city council has requested a household survey be conducted in a suburban area of their city. The area is broken into many distinct and unique neighborhoods, some including large homes, some with only apartments. Which approach would likely be the least effective?

Answer

A

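The least effective approach would be cluster sampling, where each cluster is a neighborhood.

Because the neighborhoods differ from one another, sampling only a few of them could leave entire housing types out of the sample.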

Question 45

Q

What is the most important difference between observational studies and experiments?

Answer

A

Random assignment

Random assignment helps to eliminate bias and establish causality in experiments.

Question 46

Q

What are associated variables?

Answer

A

When two variables show some connection with one another, they are called associated variables. Associated variables can also be called dependent variables and vice-versa.

The multi-unit and home ownership rates are said to be associated because the plot shows a discernible pattern.

Question 47

Q

True or False: A pair of variables is either associated or independent, not both.

Answer

A

True. A pair of variables is either related in some way (associated) or not (independent). No pair of variables is both associated and independent.

Question 48

Q

When is a pair of variables said to be independent?

Answer

A

If two variables are not associated, then they are said to be independent. That is, two variables are independent if there is no evident relationship between the two.

Question 49

Q

What is the term for a variable that might causally affect another?

Answer

A

Explanatory variable

This variable is hypothesized to influence the response variable.

Question 50

Q

What do we call the variable that is affected by the explanatory variable?

Answer

A

Response variable

This variable responds to changes in the explanatory variable.

Question 51

Q

Fill in the blank: The _______ variable is the one that might affect another variable.

Answer

A

explanatory

This variable is often used in causal analysis.

Question 52

Q

Fill in the blank: The _______ variable is the one that is affected by the explanatory variable.

Answer

A

response

This variable measures the effect of the explanatory variable.

Question 53

Q

What is stratified sampling?

Answer

A

A divide-and-conquer sampling strategy where the population is divided into groups called strata

Question 54

Q

How are strata chosen in stratified sampling?

Answer

A

Strata are chosen so that similar cases are grouped together

Question 55

Q

What is the second sampling method employed within each stratum in stratified sampling?

Answer

A

Usually simple random sampling

Question 56

Q

Why is stratified sampling useful?

Answer

A

It is especially useful when the cases in each stratum are very similar with respect to the outcome of interest

Question 57

Q

Fill in the blank: Stratified sampling is a _______ sampling strategy.

Answer

A

divide-and-conquer

Question 58

Q

What are the four principles of experimental design?

Answer

A

Controlling, Randomization, Replication, Blocking

Question 59

Q

What does controlling refer to in experimental design?

Answer

A

Researchers assign treatments to cases and control other differences in the groups

Question 60

Q

What is the purpose of randomization in experiments?

Answer

A

To account for uncontrolled variables and prevent accidental bias

Question 61

Q

Why is replication important in experimental design?

Answer

A

It allows researchers to estimate the effect of the explanatory variable more accurately

Question 62

Q

What is blocking in the context of experimental design?

Answer

A

Grouping individuals based on a variable before randomizing them into treatment groups

Question 63

Q

When is blocking particularly useful in an experiment?

Answer

A

When researchers suspect that other variables influence the response

Question 64

Q

What is an example of using blocking in a drug study for heart attacks?

Answer

A

Split patients into low-risk and high-risk blocks before random assignment
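The blocking idea in this example can be sketched as follows: group patients by risk level first, then randomize to treatment and control separately inside each block. The patient IDs, the rule used to label risk, and the helper name blocked_assignment are all invented for illustration.

```python
import random

# Hypothetical patients: every third one is labelled high-risk (8 high, 16 low).
patients = [(f"P{i:02d}", "high" if i % 3 == 0 else "low") for i in range(1, 25)]

def blocked_assignment(patients, rng):
    """Block on risk level, then randomly split each block in half."""
    blocks = {}
    for pid, risk in patients:
        blocks.setdefault(risk, []).append(pid)
    assignment = {}
    for members in blocks.values():
        shuffled = list(members)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for pid in shuffled[:half]:
            assignment[pid] = "treatment"
        for pid in shuffled[half:]:
            assignment[pid] = "control"
    return assignment

assignment = blocked_assignment(patients, random.Random(5))
risk_of = dict(patients)
high_treated = sum(1 for pid, grp in assignment.items()
                   if grp == "treatment" and risk_of[pid] == "high")
print(high_treated)
```

Since each block is split evenly, high-risk and low-risk patients are equally represented in both groups.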

Answer 65

A

They are essential for any study to ensure valid results

Answer 66

A

Accidental bias

Answer 67

A

confounding variable

Answer 68

A

Control for variables that may influence the response.

If there are variables that are known or suspected to affect the response variable, we first group subjects into blocks based on these variables, and then randomize cases within each block to treatment groups. This allows us to control for possible effects of these confounding variables by making sure they are equally represented in all treatment groups.

Answer 69

A

To make results generalizable to the target population

Random sampling ensures that every individual in the population has an equal chance of being selected, enhancing the representativeness of the sample.

Answer 70

A

Causality

Random assignment helps ensure that any differences observed in the study outcomes can be attributed to the treatment rather than pre-existing differences between groups.

Answer 71

A

It allows generalization of results to the population at large

This is important for making valid inferences based on the sample studied.

Answer 72

A

Random sampling

Stratified sampling involves dividing the population into subgroups and randomly sampling from each subgroup to ensure representation.

Answer 73

A

If a variable is categorical, determine whether it is ordinal based on whether or not the levels have a natural ordering.

Answer 74

A

If a variable is numerical, further classify it as continuous or discrete based on whether it can take on an infinite number of values within a range or only non-negative whole numbers, respectively.