C1 Intro to Probability and Data with R M1-3 Data Flashcards

1
Q

Which type of variable is hdi (Human Development Index, combining factors of life expectancy, educational attainment, and income), with levels very high, high, medium, and low human development?

A

Ordinal Categorical Variable

There is an inherent ordering to the levels of this categorical variable (from very high to low), and hence this is an ordinal categorical variable.
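To make the role of the ordering concrete, here is a minimal Python sketch. It assumes a hypothetical list of hdi observations; the names `HDI_LEVELS`, `LEVEL_RANK`, and `sort_by_level` are illustrative and do not come from the course material. It sorts values by level rank instead of alphabetically, which only makes sense for an ordinal variable.

```python
# Levels of the ordinal variable, listed from lowest to highest.
HDI_LEVELS = ["low", "medium", "high", "very high"]

# Map each level to its rank so values can be compared meaningfully.
LEVEL_RANK = {level: rank for rank, level in enumerate(HDI_LEVELS)}


def sort_by_level(values):
    """Sort hdi values by their position in HDI_LEVELS, not alphabetically."""
    return sorted(values, key=LEVEL_RANK.__getitem__)


# Hypothetical observations for five countries (made-up data).
observed = ["high", "low", "very high", "medium", "high"]
ordered = sort_by_level(observed)
print(ordered)
```

A nominal variable has no such rank table, so any ordering of its levels would be arbitrary.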

2
Q

What are the two main types of numerical variables?

A

Continuous and Discrete

Continuous variables can take any value within a range, while discrete variables can only take specific values.

3
Q

Define continuous variables.

A

Can take any value within a range (e.g., height)

Continuous variables allow for an infinite number of possible values.

4
Q

Define discrete variables.

A

Can only take specific values (e.g., number of cars owned)

Discrete variables are countable and often represented as whole numbers.

5
Q

What are the two categories of categorical variables?

A

Ordinal and Nominal

Categorical variables represent characteristics or qualities.

6
Q

Define ordinal variables.

A

Have a meaningful order (e.g., satisfaction levels)

The order matters in ordinal variables, unlike in nominal variables.

7
Q

Define nominal variables.

A

No inherent order (e.g., morning person vs. afternoon person)

Nominal variables categorize data without a ranking system.

8
Q

What do researchers do in observational studies?

A

Collect data without interfering with how it arises.

9
Q

What can researchers establish in observational studies?

A

An association (correlation) between variables.

In general, observational studies can provide evidence of a naturally occurring association between variables, but they cannot by themselves show a causal connection.

10
Q

What are the two types of observational studies?

A
  • Retrospective studies (using past data)
  • Prospective studies (collecting data throughout the study)
11
Q

What is the main feature of experiments in research?

A

Researchers randomly assign subjects to treatments.

12
Q

What do experiments allow researchers to establish?

A

Causal connections.

13
Q

Why is random assignment important in experiments?

A

It helps control for confounding variables.

14
Q

What are confounding variables?

A

Extraneous factors that may influence both the explanatory and response variables.

15
Q

What is Convenience Sample Bias?

A

When only easily accessible individuals are included.

This type of bias can lead to non-representative samples because it does not account for the broader population.

16
Q

What causes Non-response Bias?

A

Only a non-random fraction of the sampled individuals respond, so the respondents may not represent the population.

It can skew the results if the non-respondents differ significantly from respondents.

17
Q

What is Voluntary Response Bias?

A

Arises when only those with strong opinions choose to respond.

This bias often leads to overrepresentation of extreme views in survey results.

18
Q

What is Simple Random Sampling?

A

Each case has an equal chance of selection.

This method ensures that every individual in the population has the same probability of being chosen.
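As an illustration, a simple random sample can be sketched in Python with the standard library. The helper name `simple_random_sample`, the case IDs, and the seed are assumptions made for this example only.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct cases; every case is equally likely to be chosen."""
    rng = random.Random(seed)  # a seed is used only to make the demo repeatable
    return rng.sample(population, n)


# Hypothetical population: case IDs 1 through 100.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=1)
print(sample)
```

`random.sample` draws without replacement, so no case can appear twice in the sample.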

19
Q

Define Stratified Sampling.

A

Population is divided into strata, and samples are taken from each.

This technique is useful for ensuring representation from different segments of the population.
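A minimal Python sketch of the idea, assuming a hypothetical population tagged with an age group. The function `stratified_sample` and its parameters are illustrative names, and taking the same number of cases from each stratum is one design choice among several.

```python
import random
from collections import defaultdict


def stratified_sample(cases, stratum_of, n_per_stratum, seed=None):
    """Group cases into strata, then draw a simple random sample from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in cases:
        strata[stratum_of(case)].append(case)
    # Every stratum contributes cases, so none is left out by chance.
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


# Hypothetical population of (id, age_group) pairs (made-up data).
cases = [(i, "under 40" if i % 2 == 0 else "40 and over") for i in range(100)]
picked = stratified_sample(cases, stratum_of=lambda c: c[1],
                           n_per_stratum=5, seed=2)
for name, members in picked.items():
    print(name, members)
```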

20
Q

What characterizes Cluster Sampling?

A

Population is divided into clusters, and entire clusters are sampled.

This method is often used when populations are large and geographically dispersed.
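The following Python sketch shows the mechanics under stated assumptions: the villages and people are made-up data, and `cluster_sample` is a hypothetical helper name. Whole clusters are chosen at random and every case inside a chosen cluster is kept.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every case inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical population: 8 villages with 20 people each.
clusters = {
    f"village_{k}": [f"v{k}_person_{i}" for i in range(20)]
    for k in range(8)
}
sampled = cluster_sample(clusters, 3, seed=3)
print(sorted(sampled))
```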

21
Q

Explain Multistage Sampling.

A

Combines cluster sampling with additional sampling within selected clusters.

This approach allows for a more refined sampling process, potentially increasing efficiency.
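A two-stage version can be sketched in Python as follows. The data and the name `multistage_sample` are assumptions for illustration; the second stage here is a simple random sample within each chosen cluster.

```python
import random


def multistage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: sample clusters. Stage 2: sample cases within each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)      # stage 1
    return {name: rng.sample(clusters[name], n_per_cluster)  # stage 2
            for name in chosen}


# Hypothetical population: 8 villages with 20 people each.
clusters = {
    f"village_{k}": [f"v{k}_person_{i}" for i in range(20)]
    for k in range(8)
}
sampled = multistage_sample(clusters, n_clusters=3, n_per_cluster=5, seed=4)
for name, people in sampled.items():
    print(name, people)
```

Compared with cluster sampling, only some cases in each chosen cluster are observed, which reduces the number of cases to collect.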

22
Q

What is a strategy to minimize sampling bias in studies?

A

Use Random Sampling

Ensures that every individual in the population has an equal chance of being selected.

23
Q

What is Stratified Sampling?

A

Dividing the population into homogeneous subgroups and randomly sampling from each stratum

Ensures representation across key characteristics like age or gender.

24
Q

How does increasing sample size help in studies?

A

It reduces sampling variability and so increases the reliability (precision) of results

A larger sample generally gives more precise estimates, but it does not by itself correct bias from a flawed sampling method.
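The effect of sample size on precision can be checked with a small simulation. This is a sketch under assumptions: the population of heights is simulated, and `spread_of_sample_means`, the seeds, and the number of repetitions are choices made for the example.

```python
import random
import statistics


def spread_of_sample_means(population, n, reps=200, seed=0):
    """Standard deviation of the sample mean across repeated random samples."""
    rng = random.Random(seed)
    means = [sum(rng.sample(population, n)) / n for _ in range(reps)]
    return statistics.stdev(means)


# Hypothetical population: 10,000 simulated heights in cm.
pop_rng = random.Random(42)
population = [pop_rng.gauss(170, 10) for _ in range(10_000)]

small = spread_of_sample_means(population, n=10)
large = spread_of_sample_means(population, n=1000)
print(f"spread with n=10:   {small:.2f}")
print(f"spread with n=1000: {large:.2f}")
```

The sample means from the larger samples cluster more tightly around the population mean, which is what a gain in reliability looks like here.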

25
Q

What is a method to avoid sampling bias related to participant selection?

A

Avoid Convenience Sampling

Relying solely on easily accessible individuals can lead to non-representative samples.

26
Q

What should be done to address Non-response Bias?

A

Follow up with individuals who do not respond to surveys

Encouraging participation helps make the sample more representative.

27
Q

What is a benefit of using multiple sampling methods?

A

Enhances representativeness and reduces bias

Combining methods like multistage sampling can improve the quality of the sample.

28
Q

What is the principle of Control in experimental design?

A

Comparing the treatment group to a control group.

The control group serves as a baseline to evaluate the effect of the treatment.

29
Q

Define Randomization in the context of experimental design.

A

Randomly assigning subjects to different treatment groups.

This balances uncontrolled characteristics across the groups on average, which helps prevent bias and makes the groups comparable.
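Random assignment can be sketched in Python by shuffling the subjects and dealing them into groups. The subject labels, group names, and the helper `randomly_assign` are hypothetical.

```python
import random


def randomly_assign(subjects, groups=("treatment", "control"), seed=None):
    """Shuffle the subjects, then deal them into the groups in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for i, subject in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(subject)
    return assignment


# Hypothetical subjects recruited for an experiment.
subjects = [f"subject_{i}" for i in range(20)]
assignment = randomly_assign(subjects, seed=4)
print(assignment["treatment"])
print(assignment["control"])
```

Because chance alone decides who lands in each group, neither the researcher nor the subject can steer the assignment.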

30
Q

What does Replication refer to in experimental studies?

A

Ensuring a sufficiently large sample size or repeating the entire study.

Replication increases the reliability of the results.

31
Q

What is the purpose of Blocking in experimental design?

A

Grouping subjects based on known or suspected variables that may affect the response variable before random assignment.

Blocking helps to control for variables that could confound the results.

32
Q

What is a placebo?

A

A fake treatment used as a control.

Placebos are often used in clinical trials to test the efficacy of a new drug.

33
Q

What is the placebo effect?

A

Improvement due to belief in receiving treatment.

This phenomenon can occur even when patients receive no active therapeutic intervention.

34
Q

What does blinding refer to in research?

A

Participants unaware of their group assignment.

This helps reduce bias in the results.

35
Q

What is a double-blind study?

A

Both participants and researchers are unaware of group assignments.

This design minimizes both participant and researcher bias.

36
Q

What is random sampling?

A

Selection of subjects randomly from a population, ensuring equal chance of being chosen

This results in a representative sample that allows generalization of study results.

37
Q

What is the purpose of random sampling in study design?

A

To create a sample that is likely representative of the population

This enables the results of the study to be generalized.

38
Q

What is random assignment?

A

Assigning subjects to different treatment groups at random in an experimental setting

This tends to balance differences in subject characteristics between the treatment and control groups.

39
Q

How does random assignment contribute to research?

A

It allows researchers to attribute observed differences in outcomes directly to the treatment being tested

This strengthens the validity of causal conclusions.

40
Q

What are the implications of using both random sampling and random assignment in studies?

A

Allows for causal conclusions that can be generalized to the population

Studies lacking one or both methods have limitations in their conclusions.

41
Q

True or False: Random sampling ensures that study results can be generalized to the population.

A

True

A representative sample is essential for generalization.

42
Q

A study is designed to test the effect of light level and noise level on exam performance of students. The researcher also believes that light and noise levels might have different effects on males and females, so wants to make sure both genders are represented equally under different conditions.
What would be the explanatory, response and blocking variables in this scenario?

A

The researchers are interested in the effect of light and noise on exam performance. Since they believe these two variables might be affecting the outcome, these are the explanatory variables and exam performance is the response variable. Gender of the student is a nuisance variable they want to control for, hence they block for it. Unlike light and noise, gender is not a treatment that is being imposed on the subjects.

43
Q

A retail store considering updates to their credit card policies randomly samples 1000 of their credit card holders to survey on the phone. The phone calls are made during business hours, therefore there is a lower rate of responses from members who work during these hours. What type of bias is this indicative of?

A

non-response bias

There is an initial random sample, but not everyone in this random sample is reached. Therefore the issue is non-response of the sampled individuals.

44
Q

A city council has requested a household survey be conducted in a suburban area of their city. The area is broken into many distinct and unique neighborhoods, some including large homes, some with only apartments. Which approach would likely be the least effective?

A

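The least effective approach would be cluster sampling, where each cluster is a neighborhood.

Cluster sampling works best when the clusters resemble one another. Because these neighborhoods differ markedly, a few sampled neighborhoods are unlikely to represent the whole area.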

45
Q

What is the most important difference between observational studies and experiments?

A

Random assignment

Random assignment helps to eliminate bias and establish causality in experiments.

46
Q

What are associated variables?

A

When two variables show some connection with one another, they are called associated variables. Associated variables can also be called dependent variables and vice-versa.

Example: a scatterplot of county data suggests a relationship, in that counties with a higher rate of multi-unit structures tend to have lower homeownership rates.

The multi-unit and homeownership rates are said to be associated because the plot shows a discernible pattern.

47
Q

True or False: A pair of variables is either associated or independent, not both.

A

True. A pair of variables is either related in some way (associated) or not (independent). No pair of variables is both associated and independent.

48
Q

When is a pair of variables said to be independent?

A

If two variables are not associated, then they are said to be independent. That is, two variables are independent if there is no evident relationship between the two.

49
Q

What is the term for a variable that might causally affect another?

A

Explanatory variable

This variable is hypothesized to influence the response variable.

50
Q

What do we call the variable that is affected by the explanatory variable?

A

Response variable

This variable responds to changes in the explanatory variable.

51
Q

Fill in the blank: The _______ variable is the one that might affect another variable.

A

explanatory

This variable is often used in causal analysis.

52
Q

Fill in the blank: The _______ variable is the one that is affected by the explanatory variable.

A

response

This variable measures the effect of the explanatory variable.

53
Q

What is stratified sampling?

A

A divide-and-conquer sampling strategy where the population is divided into groups called strata

54
Q

How are strata chosen in stratified sampling?

A

Strata are chosen so that similar cases are grouped together

55
Q

What is the second sampling method employed within each stratum in stratified sampling?

A

Usually simple random sampling

56
Q

Why is stratified sampling useful?

A

It is especially useful when the cases in each stratum are very similar with respect to the outcome of interest

57
Q

Fill in the blank: Stratified sampling is a _______ sampling strategy.

A

divide-and-conquer

58
Q

What are the four principles of experimental design?

A

Controlling, Randomization, Replication, Blocking

59
Q

What does controlling refer to in experimental design?

A

Researchers assign treatments to cases and control other differences in the groups

60
Q

What is the purpose of randomization in experiments?

A

To account for uncontrolled variables and prevent accidental bias

61
Q

Why is replication important in experimental design?

A

It allows researchers to estimate the effect of the explanatory variable more accurately

62
Q

What is blocking in the context of experimental design?

A

Grouping individuals based on a variable before randomizing them into treatment groups

63
Q

When is blocking particularly useful in an experiment?

A

When researchers suspect that other variables influence the response

64
Q

What is an example of using blocking in a drug study for heart attacks?

A

Split patients into low-risk and high-risk blocks before random assignment
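A minimal Python sketch of this blocked design, assuming made-up patients already labeled low-risk or high-risk. The function `blocked_assignment` and the even split within each block are illustrative choices, not taken from the course.

```python
import random
from collections import defaultdict


def blocked_assignment(patients, block_of, seed=None):
    """Group patients into blocks, then randomize within each block."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for patient in patients:
        blocks[block_of(patient)].append(patient)

    assignment = {"treatment": [], "control": []}
    for block in sorted(blocks):
        members = blocks[block]
        rng.shuffle(members)           # random order within the block
        half = len(members) // 2
        assignment["treatment"].extend(members[:half])
        assignment["control"].extend(members[half:])
    return assignment


# Hypothetical patients: 8 high-risk and 12 low-risk.
patients = [(f"patient_{i}", "high-risk" if i < 8 else "low-risk")
            for i in range(20)]
result = blocked_assignment(patients, block_of=lambda p: p[1], seed=5)
for group, members in result.items():
    high = sum(1 for _, risk in members if risk == "high-risk")
    print(group, "high-risk:", high, "low-risk:", len(members) - high)
```

Each treatment group ends up with the same mix of risk levels, so risk level cannot account for a difference in outcomes between the groups.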

65
Q

What is the significance of incorporating the first three principles of experimental design (controlling, randomization, and replication)?

A

They are essential for any study to ensure valid results

66
Q

What does randomization help to prevent in a study?

A

Accidental bias

67
Q

Fill in the blank: An extraneous variable that is related to the explanatory and response variables and that prevents us from deducing causal relationships based on observational studies is called a _______.

A

confounding variable

68
Q

In an experiment, what purpose does blocking serve?

A

Control for variables that may influence the response.

If there are variables that are known or suspected to affect the response variable, we first group subjects into blocks based on these variables, and then randomize cases within each block to treatment groups. This allows us to control for possible effects of these confounding variables by making sure they are equally represented in all treatment groups.

69
Q

What is the purpose of random sampling in data collection?

A

To make results generalizable to the target population

Random sampling ensures that every individual in the population has an equal chance of being selected, enhancing the representativeness of the sample.

70
Q

What does random assignment in study design suggest?

A

Causality

Random assignment helps ensure that any differences observed in the study outcomes can be attributed to the treatment rather than pre-existing differences between groups.

71
Q

How does random sampling relate to the population?

A

It allows generalization of results to the population at large

This is important for making valid inferences based on the sample studied.

72
Q

What type of sampling is described as stratified sampling?

A

Random sampling

Stratified sampling involves dividing the population into subgroups and randomly sampling from each subgroup to ensure representation.

73
Q

Further classify a variable once it is identified as categorical.

A

If a variable is categorical, determine whether it is ordinal or nominal based on whether its levels have a natural ordering.

74
Q

Further classify a variable once it is identified as numerical.

A

If a variable is numerical, classify it further as continuous or discrete: continuous if it can take any of an infinite number of values within a range, discrete if it can take only non-negative whole numbers.

75
Q

True or False
Labeling variables as explanatory and response does not guarantee that the relationship between the two is actually causal, even if there is an association identified between the two variables.