CHAPTER 10 Controlling for Confounders Flashcards

1
Q

What is the main purpose of controlling for confounders?

A

To mitigate bias arising from confounders.

2
Q

What is the most common method to control for a confounder?

A

Including it in a regression.

3
Q

True or False: Controlling for confounders eliminates all bias in a study.

A

False.

4
Q

What should we typically control for, confounders or mechanisms?

A

Confounders.

5
Q

What does controlling involve in statistical analysis?

A

Finding the correlation between two variables while holding other variables constant.

6
Q

In the context of U.S. Congress, which party is more likely to vote conservatively?

A

Republicans.

7
Q

What does a higher ACU score indicate?

A

A more conservative voting record.

8
Q

What was the average ACU score for Republicans in 1997 according to the data?

A

83.

9
Q

What was the average ACU score for Democrats in 1997 according to the data?

A

19.

10
Q

How much more conservatively do Republicans vote compared to Democrats on average?

A

64 ACU points.

11
Q

What is a potential confounder that affects both party membership and voting records?

A

Personal ideology.

12
Q

What survey was administered to congressional candidates to measure personal ideology?

A

National Political Awareness Test (NPAT).

13
Q

What does controlling for personal ideology mean in this context?

A

Comparing voting records of Democrats and Republicans with similar NPAT scores.

14
Q

What does Table 10.2 illustrate about the difference in voting records after controlling for ideology?

A

The difference shrinks substantially compared to the unadjusted difference.

15
Q

What is the unit of analysis in the regression model discussed?

A

An individual representative.

16
Q

What does the coefficient β1 in the regression represent?

A

The correlation between ACU score and being a Republican, controlling for personal ideology.

17
Q

What was the estimated value of β1 from the regression on the data?

A

24.

18
Q

Why might the estimate of the causal effect of party discipline still be questionable?

A

Due to the presence of other confounders beyond personal ideology.

19
Q

What is a heterogeneous treatment effect?

A

When the effect of a treatment varies across different units of observation.

20
Q

Why is it important to consider heterogeneous treatment effects when controlling for confounders?

A

It can change the subset of units for which we estimate the average effect.

21
Q

What are ATE and LATE in the context of treatment effects?

A

ATE is average treatment effect; LATE is local average treatment effect.

22
Q

What does ATE stand for?

A

Average Treatment Effect

23
Q

What does LATE stand for?

A

Local Average Treatment Effect

24
Q

True or False: LATE and ATE are always the same.

A

False

25
Q

What are the key ingredients in any regression for causal inference?

A
  • Dependent variable
  • Treatment variable
  • Control variables
26
Q

What is a dependent variable?

A

The outcome you are trying to understand

27
Q

What is a treatment variable?

A

The feature of the world whose effect on the dependent variable you are trying to estimate

28
Q

What are control variables?

A

Potential confounders included in the regression to reduce bias

29
Q

In the regression equation, what do the parameters α, β, and γ represent?

A
  • α: intercept
  • β: effect of the treatment
  • γ: effect of the control variable
30
Q

What does the error term ε represent in a regression?

A

Idiosyncratic factors reflecting differences from predicted outcomes

31
Q

What does BLACEF stand for?

A

Best Linear Approximation to the Conditional Expectation Function

32
Q

True or False: OLS regression provides the best linear approximation to the conditional expectation function even when the data-generating process is unknown.

A

True

33
Q

What happens if there are no baseline differences across values of T after controlling for X?

A

BLACEF corresponds to the average effect of T on Y

34
Q

What does the omitted variable bias formula quantify?

A

The bias associated with failing to include a confounder in regression

35
Q

What is the formula for omitted variable bias?

A

β^S − β = π · γ

36
Q

What does π represent in the omitted variable bias formula?

A

The correlation between T and X

37
Q

What does γ represent in the omitted variable bias formula?

A

The effect of the control variable on the outcome

38
Q

If an unobserved confounder is positively related to both T and Y, what is the sign of the bias?

A

Positive bias

39
Q

What can cause an under-estimate of the effect of T?

A

If the confounder is positively related to T but negatively related to Y
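
The omitted variable bias formula on the preceding cards can be checked numerically. The following is a minimal sketch on simulated data, not taken from the chapter: the function names, the sample size, and the true coefficients (1.0 on T, 2.0 on X, 0.8 linking X to T) are all assumptions chosen for illustration. It uses only the Python standard library and obtains the coefficients of the regression with the control by residualizing each variable first.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope(y, x):
    """OLS slope from regressing y on x with an intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def residuals(y, x):
    """Residuals from regressing y on x with an intercept."""
    b = slope(y, x)
    a = mean(y) - b * mean(x)
    return [yi - a - b * xi for xi, yi in zip(x, y)]


def ovb_demo(n=2000, seed=0):
    """Return (beta_short, beta, gamma, pi) on simulated data (assumed setup)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]           # confounder X
    t = [0.8 * xi + rng.gauss(0, 1) for xi in x]      # treatment T depends on X
    y = [1.0 * ti + 2.0 * xi + rng.gauss(0, 1) for ti, xi in zip(t, x)]

    beta_short = slope(y, t)                          # Y on T only (X omitted)
    beta = slope(residuals(y, x), residuals(t, x))    # Y on T, controlling for X
    gamma = slope(residuals(y, t), residuals(x, t))   # Y on X, controlling for T
    pi = slope(x, t)                                  # X on T
    return beta_short, beta, gamma, pi


beta_short, beta, gamma, pi = ovb_demo()
print(f"bias (beta_short - beta): {beta_short - beta:.4f}")
print(f"pi * gamma:               {pi * gamma:.4f}")
```

The two printed values agree up to floating-point error. Because X is positively related to both T and Y in this setup, the bias is positive; changing the assumed 2.0 to a negative number makes the regression without X under-estimate the effect of T.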

40
Q

How does controlling for a variable (X) affect the relationship between T and Y?

A

It changes the estimated relationship if X is correlated with T and has an independent relationship with Y

41
Q

What is a potential confounder when regressing income on height?

A

Gender.

42
Q

What does running separate regressions for men and women allow us to see?

A

The correlation between income and height separately for each gender

43
Q

What happens to the slope when separate regressions for men and women are run?

A

The slope is greater for men than for women

44
Q

What is the purpose of running a regression of income on both height and gender?

A

To obtain a summary estimate of the correlation between income and height, controlling for gender

45
Q

What is the relationship between the slopes of two lines when controlling for gender?

A

The slopes are identical: the common slope is a weighted average of the separate slopes for men and women.

46
Q

What does the intercept of the regression line for women represent?

A

Predicted income for women who are 5 feet tall

47
Q

What is the predicted income for women who are 5 feet tall?

A

The predicted income for women who are 5 feet tall is represented by the intercept of the regression line for women.

This intercept is a key parameter in the regression model.

48
Q

What is the predicted difference in income between men and women of the same height?

A

The predicted difference in income between men and women of the same height is represented by the vertical gap between the two parallel regression lines, that is, the coefficient on gender.

The common slope of the two lines, by contrast, indicates how income varies with height when controlling for gender.

49
Q

What is the average relationship between height and income, controlling for gender?

A

The average relationship between height and income, controlling for gender, is approximately 8.1.

This value is derived after controlling for the confounding effect of gender.

50
Q

What was the previous estimate for the slope of the relationship between height and income before controlling for gender?

A

The previous estimate for the slope was 14.8.

This estimate was corrected to 8.1 after accounting for the confounding influence of gender.
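
The drop in the height slope once gender is controlled for can be reproduced in miniature. This is a sketch on simulated data only: the chapter's data set is not reproduced here, so the estimates 14.8 and 8.1 will not appear, and every number in the code (a true height slope of 1.0, a gender gap of 10, the noise levels) is an assumption made for illustration. Height is measured as inches above 5 feet, an assumed convention that gives the intercept the interpretation used on the cards above.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope(y, x):
    """OLS slope from regressing y on x with an intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def residuals(y, x):
    """Residuals from regressing y on x with an intercept."""
    b = slope(y, x)
    a = mean(y) - b * mean(x)
    return [yi - a - b * xi for xi, yi in zip(x, y)]


def height_income_demo(n=4000, seed=1):
    """Return (naive slope, slope controlling for gender, gender gap)."""
    rng = random.Random(seed)
    male = [rng.randint(0, 1) for _ in range(n)]            # dummy: 1 = male
    height = [4 + 5 * m + rng.gauss(0, 3) for m in male]    # men taller on average
    income = [30 + 1.0 * h + 10 * m + rng.gauss(0, 5)
              for h, m in zip(height, male)]

    naive = slope(income, height)                           # income on height only
    controlled = slope(residuals(income, male),             # height slope, holding
                       residuals(height, male))             # gender fixed
    gender_gap = slope(residuals(income, height),           # gap between the two
                       residuals(male, height))             # parallel lines
    return naive, controlled, gender_gap


naive, controlled, gender_gap = height_income_demo()
print(f"slope without control: {naive:.2f}")
print(f"slope with gender:     {controlled:.2f}")
print(f"gender gap:            {gender_gap:.2f}")
```

In this setup the naive slope overstates the relationship because gender is positively related to both height and income, the same pattern the cards describe.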

51
Q

What does controlling for a confounder do to the precision of estimates?

A

Controlling for a confounder can either improve or harm the precision of estimates.

The effect on precision depends on the correlation of the control variable with the outcome and treatment.

52
Q

What is p-hacking?

A

P-hacking is the practice of trying different control variables until the estimate becomes statistically significant.

This practice is discouraged as it can lead to misleading results.

53
Q

What is the NPAT score in the context of the congressional politics example?

A

The NPAT score is a continuous measure of personal political ideology used as a confounder in the analysis.

This score helps to control for ideology in the regression of ACU score on party affiliation.

54
Q

What does the regression of ACU Rating on NPAT Conservativeness score aim to achieve?

A

The regression aims to control for personal ideology and estimate the effect of party affiliation on ACU Rating.

It controls for ideology with the continuous NPAT score rather than with NPAT categories.

55
Q

What does the gap between the two regression lines represent in the context of ACU ratings?

A

The gap between the two regression lines represents the difference in predicted ACU Rating between Republicans and Democrats for a given NPAT score.

This allows for a nuanced understanding of party differences across ideological lines.

56
Q

What are the conditions for controlling to yield an unbiased estimate of a causal effect?

A

To yield an unbiased estimate, all confounders must be controlled for, and there must be no reverse causality.

This highlights the challenges in achieving true causal inference in observational studies.

57
Q

What is reverse causation?

A

Reverse causation occurs when the outcome affects the treatment, complicating causal interpretations.

It emphasizes the difficulty in establishing clear cause-and-effect relationships.

58
Q

What was the main finding of the study on social media usage and subjective well-being?

A

The experimental estimate of Facebook usage’s effect on subjective well-being was about one-third the size of the estimates from the simple correlation.

This suggests that controlling for confounders still leads to an over-estimate of the true effect.

59
Q

What is the significance of a regression table?

A

A regression table summarizes the results of regression analyses, including coefficients, standard errors, and statistical significance.

Understanding regression tables is crucial for interpreting the results of statistical analyses.

60
Q

What does the first column of a regression table typically contain?

A

The first column of a regression table typically contains labels for the variables involved in the regression.

This helps in identifying which variables are included in the analysis.

61
Q

How is statistical significance indicated in a regression table?

A

Statistical significance is indicated by stars next to the coefficient estimates in the regression table.

This helps to quickly identify which results are statistically reliable.

62
Q

What is the estimated coefficient on Republican in the ACU rating regression before controlling for NPAT score?

A

64.32

This is the unadjusted Republican-Democrat difference in ACU rating, before considering NPAT categories or scores.

63
Q

What happens to the coefficient on Republican when controlling for NPAT categories?

A

Drops to 23.74

This indicates a substantial reduction in the estimated effect.

64
Q

What is the coefficient on Republican when controlling for the continuous NPAT Conservativeness score?

A

24.28

This value reflects the use of a continuous, more fine-grained measure of ideology.

65
Q

What does the r-squared statistic represent in regression analysis?

A

The proportion of variation in the dependent variable that is predicted by the explanatory variables in the regression

A higher r-squared indicates a better fit of the model to the data.
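
As a rough illustration of how r-squared is computed, the sketch below fits a one-variable regression and compares the residual variation with the total variation in the outcome. The function names and the five data points are made up for this example and do not come from the chapter.

```python
def mean(values):
    return sum(values) / len(values)


def fit_line(y, x):
    """Return (intercept, slope) from an OLS regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def r_squared(y, x):
    """Share of the variation in y predicted by the fitted line."""
    a, b = fit_line(y, x)
    my = mean(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    ss_tot = sum((yi - my) ** 2 for yi in y)                        # total
    return 1 - ss_res / ss_tot


x = [0, 1, 2, 3, 4]          # hypothetical explanatory variable
y = [1, 3, 2, 5, 4]          # hypothetical outcome
print(f"r-squared: {r_squared(y, x):.2f}")
```

Nothing in this calculation refers to confounders or to the direction of causation, which is why a high r-squared by itself says nothing about whether a coefficient is a good causal estimate.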

66
Q

How many observations were included in the regression analysis?

A

349

This number reflects the congresspeople who completed the NPAT survey in 1997.

67
Q

What is the coefficient estimate for the NPAT category 81-100 in relation to ACU rating?

A

59.77 **

This coefficient indicates a significant positive relationship.

68
Q

True or False: A high r-squared statistic alone guarantees understanding of causal relationships.

A

False

A high r-squared does not imply that all confounders have been controlled for or that the model is correctly specified.

69
Q

What is a confounder in the context of regression analysis?

A

A variable that affects both the treatment and the outcome

It can introduce bias if not controlled for.

70
Q

What is the challenge when a variable is both a confounder and a mechanism?

A

It complicates the decision on whether to control for it

This is because the variable both influences the treatment (as a confounder) and is influenced by the treatment (as a mechanism).

71
Q

What is the local average treatment effect (LATE)?

A

The average treatment effect for a specific subset of the population

This concept helps in understanding causal effects in targeted groups.

72
Q

Fill in the blank: Controlling is a way to account for _______ and obtain better estimates.

A

[confounders]

Controlling helps reduce bias in estimating causal relationships.

73
Q

What is the purpose of matching in statistical analysis?

A

To control for confounding variables by pairing treated and untreated units with similar characteristics

This technique allows for comparison while accounting for observable differences.
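
One simple form of matching is exact matching on a single discrete characteristic. The sketch below is an assumed, minimal version of that idea with hypothetical data: the function names, the "low"/"high" labels, and the outcome values are invented for illustration, and real applications typically match on several characteristics or on a summary score.

```python
def naive_difference(units):
    """Mean outcome of treated units minus mean outcome of untreated units."""
    treated = [y for t, _, y in units if t]
    untreated = [y for t, _, y in units if not t]
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


def exact_match_estimate(units):
    """Average, over treated units, of the outcome minus the mean outcome of
    untreated units that share the same characteristic.

    units: list of (treated, characteristic, outcome) tuples.
    Treated units with no untreated match are dropped.
    """
    controls = {}
    for treated, x, y in units:
        if not treated:
            controls.setdefault(x, []).append(y)

    diffs = []
    for treated, x, y in units:
        if treated and x in controls:
            matched_mean = sum(controls[x]) / len(controls[x])
            diffs.append(y - matched_mean)
    if not diffs:
        raise ValueError("no treated unit has an untreated match")
    return sum(diffs) / len(diffs)


# Hypothetical data: (treated, characteristic, outcome)
units = [
    (1, "low", 10), (0, "low", 8), (0, "low", 6),
    (1, "high", 20), (1, "high", 22), (0, "high", 15),
]
print(f"naive difference: {naive_difference(units):.2f}")
print(f"matched estimate: {exact_match_estimate(units):.2f}")
```

Because treated units in this made-up data are more often "high", the naive difference is larger than the matched estimate. Matching only accounts for the characteristic that is observed; unobserved confounders remain a concern.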

74
Q

What is omitted variables bias?

A

The bias resulting from failing to control for some confounder

This can lead to incorrect estimates of causal effects.

75
Q

What is a dummy variable?

A

A variable indicating whether a unit has a particular characteristic (1 for yes, 0 for no)

These are often used in regression models to represent categorical data.

76
Q

What is the treatment variable in regression analysis?

A

The variable representing the feature whose effect on the dependent variable is being estimated

It is crucial for understanding causal relationships.

77
Q

What is a dependent or outcome variable?

A

The variable in the data that corresponds to the feature being explained or predicted

It is the primary focus of the analysis.

78
Q

What should researchers be cautious about when controlling for variables?

A

Unobservable confounders, reverse causation, and confounders that are also mechanisms

These factors can still lead to biased estimates despite controlling for other variables.

79
Q

What is the main goal of controlling variables in regression?

A

To generate more credible estimates by comparing treated and untreated units with similar characteristics

This helps in approximating causal effects more accurately.

80
Q

What is the main focus of the study by Allcott et al. (2020)?

A

The Welfare Effects of Social Media

Published in the American Economic Review, the study investigates how social media impacts subjective well-being.

81
Q

What happened to NPAT survey response rates in later elections?

A

Response rates declined considerably in subsequent elections

This decline is why data from the late 1990s are presented.

82
Q

How many NPAT categories are there, and which one is omitted from the regression?

A

There are five NPAT categories

One category must be omitted from the regression to serve as the reference group; here it is the 1st–20th percentile.

83
Q

Why can’t both Democrat and Republican variables be included in the regression?

A

Every member is either one or the other

This prevents separate identification of their effects.
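
The identification problem can be seen directly in the design matrix. In the sketch below, which uses six hypothetical members and helper names invented for this example, the Democrat and Republican dummies always sum to the intercept column, so the matrix X'X that OLS needs to invert has a determinant of zero.

```python
def gram(columns):
    """Return X'X for a design matrix supplied as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )


republican = [1, 0, 1, 1, 0, 0]               # hypothetical members
democrat = [1 - r for r in republican]        # everyone is one or the other
intercept = [1] * len(republican)

both = det(gram([intercept, republican, democrat]))
one = det(gram([intercept, republican]))
print(f"det with both dummies:   {both}")     # zero: coefficients not identified
print(f"det with Republican only: {one}")     # nonzero: regression can be estimated
```

Dropping one of the two dummies removes the redundancy, and the remaining coefficient is read relative to the omitted party.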

84
Q

What does including a Republican variable in the regression imply?

A

The coefficient is interpreted as the effect of being a Republican rather than a Democrat

Democrats serve as the omitted reference group against which Republicans are compared.