16; Dummy Variables and Collider Bias Flashcards

1
Q

What do dummy variables allow you to incorporate into regression analysis?

A

Nominal and ordinal variables

Examples include sex, county of birth, and church attendance.

2
Q

How are dummy variables coded?

A

As binary variables with values of 0 or 1.

3
Q

What is the coding for male and female in a dummy variable for sex?

A

Male = 1, Female = 0.

4
Q

What is a key characteristic of dummy variables when there are more than two categories?

A

Create one dummy variable per category, leaving one category out as the reference group.

5
Q

In Cinnirella’s paper, how are birth counties represented?

A

By creating a variable for each county, e.g., Essex = 1 if born in Essex, otherwise 0.
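A minimal sketch of this coding in plain Python. The helper `make_dummies`, the county list, and the choice of London as the omitted reference category are assumptions for illustration, not taken from the paper.

```python
def make_dummies(values, reference):
    """Return one 0/1 list per category, omitting the reference category."""
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}

# Hypothetical birth counties, for illustration only.
counties = ["Essex", "Kent", "London", "Essex", "London"]
dummies = make_dummies(counties, reference="London")

print(dummies["Essex"])  # [1, 0, 0, 1, 0]
print(dummies["Kent"])   # [0, 1, 0, 0, 0]
```

Someone born in London has 0 on every dummy, which is what makes London the baseline.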

6
Q

What effect do dummy variables have on the regression line?

A

They shift the regression line up or down without affecting the slope.
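The parallel shift can be checked with a small sketch. The equation y = b0 + b1*x + b2*dummy and all coefficient values below are made up for illustration.

```python
def predict(x, dummy, b0=150.0, b1=0.5, b2=12.0):
    """Predicted y from y = b0 + b1*x + b2*dummy (coefficients are made up)."""
    return b0 + b1 * x + b2 * dummy

for x in (0, 10, 20):
    gap = predict(x, 1) - predict(x, 0)
    # The gap between the two groups equals b2 at every value of x.
    print(x, predict(x, 0), predict(x, 1), gap)
```

The gap between the two lines is b2 everywhere, so the lines are parallel: same slope, different intercept.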

7
Q

What is the reference category in a regression with dummy variables?

A

The category against which other categories are compared.

If we regress on multiple dummies covering multiple categories, each coefficient is a comparison against the reference group.

e.g. all other regional coefficients are relative to the reference group of London.

8
Q

In the regression of height on sex, which category is the reference category?

A

Female (male = 1, female = 0).

Thus the coefficient on sex is the amount by which men are taller than women, on average.

9
Q

What happens to coefficients and statistical significance when changing the reference group?

A

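They will change, because each coefficient is now a comparison against the new reference group. The model's predicted values stay the same.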
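A sketch of why this happens, using the fact that in a regression on group dummies alone the constant is the reference group's mean and each dummy coefficient is that group's mean minus the reference mean. The IMR numbers and the helper `dummy_coefficients` are hypothetical.

```python
from statistics import mean

# Hypothetical infant mortality rates by district type (made-up numbers).
imr = {
    "Textile": [180, 170, 190],
    "Mining": [160, 150, 170],
    "Other": [140, 130, 150],
}

def dummy_coefficients(groups, reference):
    """Constant and dummy coefficients from a regression on group dummies only.

    With no other regressors, OLS gives constant = mean of the reference group
    and each dummy coefficient = that group's mean minus the reference mean.
    """
    base = mean(groups[reference])
    coefs = {g: mean(v) - base for g, v in groups.items() if g != reference}
    return base, coefs

print(dummy_coefficients(imr, "Other"))    # Textile and Mining lie above the baseline
print(dummy_coefficients(imr, "Textile"))  # Mining and Other lie below the baseline
```

The same group means produce different coefficients under each baseline, while constant plus coefficient still gives the same prediction for every group.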

10
Q

What is the dependent variable in the dummy variable practice exercise?

A

Infant Mortality Rate (infant deaths per 1000 births).

11
Q

What do interactions in regression analysis allow you to test?

A

Whether there is a differential effect, i.e. whether the effect of one variable differs across groups.

12
Q

How do you figure out the reference group?

A
13
Q

Explain what is going on in these regression tables

A

The first column uses ‘Other District’ as the reference category.

The second column uses ‘Textile’ as the reference category.

The constant is always the intercept for the reference category, e.g. in the second column it is the predicted IMR for Textile districts at log(pop. density) = 0.

14
Q

What are interactions?

A

Dummy variables assume that the other variables have identical effects across groups (the slope does not change).

Interactions allow you to test for a differential effect by adding a term such as β3 (PopDens × Mining).
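A sketch of how an interaction term gives each group its own line. It assumes a model of the form IMR = b0 + b1*PopDens + b2*Mining + b3*(PopDens*Mining); the coefficient values and function names are made up for illustration.

```python
def predicted_imr(pop_dens, mining, b0=100.0, b1=10.0, b2=20.0, b3=5.0):
    """IMR = b0 + b1*PopDens + b2*Mining + b3*(PopDens*Mining); made-up values."""
    return b0 + b1 * pop_dens + b2 * mining + b3 * pop_dens * mining

def group_line(mining, b0=100.0, b1=10.0, b2=20.0, b3=5.0):
    """Intercept and slope of the fitted line for one group (mining = 0 or 1)."""
    return b0 + b2 * mining, b1 + b3 * mining

print(group_line(0))  # other districts: intercept b0, slope b1
print(group_line(1))  # mining districts: intercept b0 + b2, slope b1 + b3
```

The dummy coefficient b2 shifts the intercept and the interaction coefficient b3 changes the slope, so a significant b3 is evidence of a differential effect.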

15
Q

Explain how the Mining and Others Equation is being calculated here using interactions

A
16
Q

What is a common rule of thumb for T-statistics in terms of significance?

A

t-statistics above 2 or below -2 are significant at roughly the 5% level.

17
Q

How can statistical significance be measured using standard error?

A

A coefficient is significant at roughly the 5% level if its absolute value is more than 2 times its standard error.
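The two rules of thumb are the same check, because the t-statistic is the coefficient divided by its standard error. A minimal sketch with made-up numbers and hypothetical function names:

```python
def t_statistic(coefficient, standard_error):
    """t = coefficient / standard error."""
    return coefficient / standard_error

def significant_at_5_percent(coefficient, standard_error, threshold=2.0):
    """Rule of thumb: |t| > 2 (the exact large-sample cutoff is about 1.96)."""
    return abs(t_statistic(coefficient, standard_error)) > threshold

print(significant_at_5_percent(12.0, 4.0))  # t = 3.0, so True
print(significant_at_5_percent(-3.0, 2.0))  # t = -1.5, so False
```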

18
Q

What do Ziliak and McCloskey (2008) argue about statistical significance?

A

Statistical significance does not imply historical or economic significance.

With very large samples, many variables become statistically significant even when their effects are tiny.
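A sketch of the sample-size point, assuming two equal-sized groups with the same standard deviation, so the standard error of the mean difference is sd * sqrt(2/n). The 0.1 difference and the standard deviation of 7 are made-up numbers.

```python
import math

def t_for_mean_difference(diff, sd, n_per_group):
    """t-statistic for a difference in means of two equal-sized groups."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return diff / standard_error

# The same tiny difference, tested with ever larger samples.
for n in (100, 10_000, 1_000_000):
    print(n, round(t_for_mean_difference(0.1, 7.0, n), 2))
```

The difference never changes in size, but its t-statistic grows with the square root of n and eventually passes 2.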

19
Q

What does statistical significance indicate? What do we look for as well?

A

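Whether the mean difference or coefficient can be reliably distinguished from zero, i.e. whether an estimate this large would be unlikely if the true value were zero.

We also look at how much X influences Y, which comes from the relative magnitude of the mean difference or coefficient.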

20
Q

What do regression techniques allow for, according to the conclusions?

A

More flexibility than first appears.

21
Q

Fill in the blank: Dummy variables allow us to incorporate _______ and ordinal variables into regressions.

A

categorical.

22
Q

True or False: Not everything that is statistically significant is meaningful.

A

True.

23
Q

What is collider bias?

A

Collider bias occurs when you control for a variable (the “collider”) that is influenced by two other variables — and this opens up a spurious association between them, even if none existed before.

24
Q

Why is this an example of collider bias?

A

It’s collider bias because Movie Star status is a variable influenced by both beauty and talent, and conditioning on it (i.e., analyzing only movie stars) creates a false negative correlation between beauty and talent — even though they are uncorrelated in the general population.
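A small simulation sketch of this example. It assumes beauty and talent are independent standard normal scores and that someone becomes a movie star when the two scores sum to more than 2; the threshold and sample size are arbitrary choices.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)
n = 20_000
beauty = [rng.gauss(0, 1) for _ in range(n)]
talent = [rng.gauss(0, 1) for _ in range(n)]  # drawn independently of beauty

# Assumed selection rule: only people with a high combined score become stars.
stars = [(b, t) for b, t in zip(beauty, talent) if b + t > 2.0]
star_beauty, star_talent = zip(*stars)

r_all = pearson(beauty, talent)
r_stars = pearson(star_beauty, star_talent)
print(round(r_all, 2), round(r_stars, 2))
```

In the full population the correlation is close to zero, while among the selected stars it is clearly negative: a star who is low on one trait must be high on the other to have been selected.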

25
Q

Why are these also examples of collider bias from selection on unobservables?

A

In both examples, nutrition and adult height are causally connected, but by using a selected sample (soldiers or prisoners), you are conditioning on a collider (like wage or enlistment/crime status), which creates a spurious backdoor path between nutrition and height through an unobservable variable.

26
Q

Why is this both collider bias and nonresponse bias?

A

Collider bias arises because both social status and marriage influence migration, and conditioning on migration (through only observing non-migrants) opens a spurious path between them. Nonresponse bias occurs because individuals who migrate (and thus drop out of the dataset) do so in a way that is systematically related to both their social status and marital outcomes.

27
Q

Why is this both collider bias and ascertainment bias?

A

Collider bias arises because both wealth and dying in the plague influence whether someone was probated, and conditioning on being probated opens a spurious association between wealth and plague mortality. Ascertainment bias occurs because only individuals whose deaths were recorded through probate are included in the data, and the likelihood of being probated is systematically related to both wealth and cause of death.

28
Q

Why is this both collider bias and M-bias?

A

M-bias arises when conditioning on a collider (a common effect of two variables) creates a misleading association between those variables and so biases downstream relationships, even when no causal link exists. Migration is a collider because it is influenced by both Parental SES and Education. If you condition on Migration (e.g., by adjusting for it in a regression or by selecting only those who migrated), you open a non-causal path between Parental SES and Education. This path can then bias the estimated relationship between Hookworm Exposure (which is influenced by SES) and Income at Age 50 (which is influenced by Education).

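
A simulation sketch of this M-structure, under strong simplifying assumptions: all variables are standardized scores, every effect has size 1, higher SES lowers hookworm exposure, and hookworm has no effect on income at all. The adjusted coefficient is computed by partialling migration out of both variables (the Frisch-Waugh approach) to keep the code dependency-free.

```python
import random

def slope(xs, ys):
    """OLS slope of ys on xs (with an intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def residuals(xs, ys):
    """Residuals of ys after regressing on xs (with an intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = slope(xs, ys)
    return [(y - my) - b * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(1)
n = 20_000
ses = [rng.gauss(0, 1) for _ in range(n)]      # parental SES
educ = [rng.gauss(0, 1) for _ in range(n)]     # education, independent of SES
migration = [s + e + rng.gauss(0, 1) for s, e in zip(ses, educ)]  # collider
hookworm = [-s + rng.gauss(0, 1) for s in ses]  # exposure, lower at high SES
income = [e + rng.gauss(0, 1) for e in educ]    # outcome; no hookworm effect

unadjusted = slope(hookworm, income)
# Partial migration out of both variables, then regress residual on residual.
adjusted = slope(residuals(migration, hookworm), residuals(migration, income))
print(round(unadjusted, 2), round(adjusted, 2))
```

Without adjustment the estimated hookworm coefficient is close to its true value of zero. Adjusting for migration pushes it away from zero, even though hookworm has no effect on income in this simulation.
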
29
Q

What is pre-treatment collider bias?

A

A specific subtype of collider bias in which the collider you condition on is a pre-treatment variable: it is measured before the treatment/exposure, but it is still a common effect of factors that influence the treatment and the outcome.

Why it's tricky: in observational studies, researchers often adjust for pre-treatment covariates to “control for confounding.” But some pre-treatment variables are not confounders; they are colliders. Conditioning on them still opens spurious paths, despite their being pre-treatment.

Example (from the Beach & Hanlon diagram): income is determined partly by residence choice and other socioeconomic factors. If you adjust for income when estimating the effect of coal pollution on infant mortality, you open a backdoor path (through the collider), causing bias.

30
Q

Give an intuitive example of pre-treatment collider bias.

A

Household income is a collider: it is influenced by both parental education and neighborhood choice. If you condition on income, you open a spurious path between parental education and neighborhood. That in turn biases your estimate of the effect of lunch programs on performance, because school quality (which varies by neighborhood) is now falsely correlated with the program via income.

The intuition: you think you are “controlling for income” to remove confounding, but in reality you are opening a backdoor path that lets in bias, connecting two causes (education and neighborhood) that were otherwise unconnected, and letting school quality sneak into the estimate as if it were part of the treatment effect.

31