Key Things to Memorize Flashcards

1
Q

Actual definition of selection bias

A

The difference between the estimate of 𝛽1 from the (short) regression and the true causal effect of the regressor on the outcome.

2
Q

actual definition of OVB

A

OVB is the mathematical difference between the coefficient on the regressor of interest in the short regression and its coefficient in the long regression.

In the subway example, the long regression estimates the association of subways with pollution holding fixed land area, regulation, and population. If regulation and population capture the effects of the confounders, then the estimate from this long regression will be closer to the true causal effect of interest.

3
Q

What does the OVB explain intuitively

A

Intuitively, in context, the OVB formula tells us the short regression coefficient bundles 3 things (or 2, if the long regression has only 2 regressors):
(1) the true causal effect of interest,
(2) the effect of regulation on 𝑦, scaled by the variation in regulation that is correlated with subways, and
(3) the effect of population on 𝑦, scaled by the variation in population that is correlated with subways

4
Q

OVB formula

A

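coef short = coef long + (coef on the omitted regressor in the long regression x auxiliary coef)

auxiliary coef = coefficient on the regressor of interest when the omitted regressor is regressed on it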
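A minimal numerical sketch of the formula above, using simulated data. The variable names (`x` for the regressor of interest, `w` for the omitted regressor) and all the numbers are assumptions made for illustration, not part of the original cards. The identity holds exactly in any sample, not only on average.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Hypothetical data: w is the omitted regressor and is correlated with x.
w = rng.normal(size=n)
x = 0.5 * w + rng.normal(size=n)
y = 1.0 + 2.0 * x + 3.0 * w + rng.normal(size=n)


def ols(y, *regressors):
    """Return OLS coefficients; index 0 is the intercept."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]


b_short = ols(y, x)[1]             # short regression: y on x
b_long, g_long = ols(y, x, w)[1:]  # long regression: y on x and w
d_aux = ols(w, x)[1]               # auxiliary regression: w on x

# The two printed numbers agree up to floating-point error.
print(b_short, b_long + g_long * d_aux)
```

Here `g_long * d_aux` is the omitted variable bias term: it is zero only if `w` has no effect on `y` or is uncorrelated with `x`.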

5
Q

What is the residual and when does it coincide with the error

A

The residual is the estimate of the error. They coincide if we have full population data, in which case there is no sampling variability in the estimate of the error.

6
Q

How does having full population data influence OLS

A

OLS is only approximating the conditional expectation function, whether you have population data or a sample. If you have population data, you perfectly learn the coefficients that best approximate the conditional expectation function. If you have a sample, you have estimates of those.

Even if you have population-level data, the 𝛽0, 𝛽1, and 𝑒𝑖 might not represent the parameters and values of interest (for example, 𝛽1 need not be the true causal effect).

7
Q

What is the mean of the residuals in OLS

A

0 (always 0 when the regression includes an intercept; this is a property of OLS)

8
Q

What is 𝐶𝑜𝑣(𝑥𝑖, 𝑒̂𝑖) in OLS?

A

0 (it is a property of OLS that residuals and regressors are uncorrelated)
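A short sketch, on simulated data, checking the two mechanical properties of OLS residuals: they average to zero (given an intercept) and are uncorrelated with the regressor. The data-generating numbers are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sample of 200 observations.
x = rng.normal(size=200)
y = 4.0 - 1.5 * x + rng.normal(size=200)

# OLS with an intercept.
X = np.column_stack([np.ones_like(x), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat

mean_resid = resid.mean()             # zero up to floating-point error
cov_x_resid = np.cov(x, resid)[0, 1]  # zero up to floating-point error
print(mean_resid, cov_x_resid)
```

Both values are zero by construction, so they say nothing about whether the true error is uncorrelated with the regressor.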

9
Q

How does using an instrument remove the bias in OLS when you have simultaneity (reverse causality causing bias) - explain the intuition

A

When we use an instrument, we use only the variation in the regressor that is created by the instrument. The intuition is that the instrument creates variation in the regressor that is unrelated to the reverse causality (e.g. demand for food options).

Isolating this one channel removes the reverse causality.

10
Q

What are the two assumptions for IV to be valid (in simple terms)

A

relevance, i.e. cov(instrument, regressor) not equal to 0

exogeneity, i.e. cov(instrument, ui) = 0, where ui is the error term in the equation for the outcome

11
Q

what are the two parts of exogeneity

A

exclusion = instrument only affects the outcome through the regressor

as good as randomly assigned = the instrument is uncorrelated with all unobserved determinants of the outcome (the error term)

12
Q

How do you estimate the coefficient on the regressor of interest using 2SLS

A

Form the first stage, second stage and reduced form equations using your instrument.

Run OLS on the first stage and reduced form equations to get their coefficients.

coefficient of interest = reduced form coefficient / first stage coefficient (with one instrument for one biased regressor)
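The ratio above can be sketched on simulated data with one instrument. Everything in the block (the instrument `z`, the confounder `u`, the true effect of 2.0) is a hypothetical setup chosen for illustration. The block also checks that regressing the outcome on the first-stage fitted values gives the same number as the ratio.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

z = rng.normal(size=n)                            # instrument
u = rng.normal(size=n)                            # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)              # biased regressor
y = 1.0 + 2.0 * x - 3.0 * u + rng.normal(size=n)  # true effect of x is 2.0


def slope(y, x):
    """Slope coefficient from a simple regression of y on x."""
    c = np.cov(x, y)
    return c[0, 1] / c[0, 0]


first_stage = slope(x, z)    # regressor on instrument
reduced_form = slope(y, z)   # outcome on instrument
iv_estimate = reduced_form / first_stage

# Same estimate in two explicit stages: regress y on the fitted x.
x_hat = x.mean() + first_stage * (z - z.mean())
two_stage = slope(y, x_hat)

ols_estimate = slope(y, x)   # biased because x is correlated with u
print(ols_estimate, iv_estimate, two_stage)
```

In this setup the OLS slope sits well below 2.0, while the ratio lands near it. Running the second stage by hand like this gives the right coefficient but not the right standard errors.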

13
Q

what are the first stage, second stage and reduced form equations i.e. how do you form them

A

(a, b and e are different in each of the three equations)

second stage = the classic regression that would be biased using OLS
general form -> outcome of interest = a + b(regressor of interest) + e
for example audit rate = a + b(cheating) + e

first stage = regress the regressor of interest on the instrument
general form -> regressor of interest = a + b(instrument) + e
for example cheating = a + b(average education) + e

reduced form = plug first stage into second stage and group terms
general form -> outcome of interest = a + b(instrument) + e
for example audit rate = a + b(average education) + e

14
Q

What does it mean for a parameter to be overidentified in 2SLS

A

If there are more instruments than biased regressors (e.g. 2 instruments for 1 biased regressor), the parameter is said to be overidentified.

15
Q

What does it mean for a parameter to be just identified in 2SLS

A

If there are as many instruments as biased regressors (e.g. 1 instrument for the 1 biased regressor), the parameter is said to be just identified.

16
Q

What does it mean for a parameter to be not identified in 2SLS

A

There are no instruments for the biased regressor, so you cannot estimate the coefficient of interest

17
Q

what is the benefit of 2SLS when you have overidentified parameters

A

2SLS gives a single estimate of the coefficient of interest that incorporates the variation from all of the instruments, as a weighted average of what each instrument would give on its own.

18
Q

How does a first difference estimator overcome bias

A

If the regressor is correlated with ai (the part of the error term that is unchanged over time, e.g. culture), OLS is biased. The first difference method removes this bias: because ai is constant over time, it drops out when we difference, and the method uses only variation over time in the regressor to estimate the effect of interest.
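A minimal simulated sketch of differencing with two time periods. The setup is an assumption for illustration: `a` stands for the time-invariant ai, it is correlated with the regressor, and the true effect is 1.5.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000

a = rng.normal(size=n)        # time-invariant unobserved a_i
x1 = a + rng.normal(size=n)   # regressor at t=1, correlated with a_i
x2 = a + rng.normal(size=n)   # regressor at t=2, correlated with a_i
y1 = 1.5 * x1 + 2.0 * a + rng.normal(size=n)
y2 = 1.5 * x2 + 2.0 * a + rng.normal(size=n)


def slope(y, x):
    """Slope coefficient from a simple regression of y on x."""
    c = np.cov(x, y)
    return c[0, 1] / c[0, 0]


# Pooled OLS ignores a_i and is biased upward in this setup.
pooled = slope(np.concatenate([y1, y2]), np.concatenate([x1, x2]))

# First differences: a_i cancels in y2 - y1 and x2 - x1.
first_diff = slope(y2 - y1, x2 - x1)
print(pooled, first_diff)
```

The first-difference slope lands near 1.5 because `a` appears identically in both periods and subtracts out.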

19
Q

How does the fixed effects estimator help control the concern of ai

A

For each 𝑖 we define 𝛿𝑖 to be a dummy variable taking the value 1 if the observation is from city 𝑖. We include 𝛿𝑖 in our model for all 𝑖 (except for one, to avoid the dummy variable trap and a violation of "no perfect collinearity").

This is the same as running a regression with multiple regressors to capture the effect of confounders: the city dummies absorb ai.
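A sketch of the dummy-variable regression on simulated city-by-day data. The city effects, the sample sizes and the true slope of 0.5 are hypothetical. As a cross-check (an addition, not from the cards), the block also computes the slope after subtracting city means, which gives the same coefficient as including the dummies.

```python
import numpy as np

rng = np.random.default_rng(4)
n_cities, n_days = 5, 50
city = np.repeat(np.arange(n_cities), n_days)

# Hypothetical city effects a_i, correlated with the regressor.
a = np.array([0.0, 2.0, -1.0, 3.0, 1.0])[city]
x = a + rng.normal(size=city.size)
y = 10.0 + 0.5 * x + 4.0 * a + rng.normal(size=city.size)

# One dummy per city, leaving out city 0 to avoid perfect collinearity.
dummies = (city[:, None] == np.arange(1, n_cities)).astype(float)
X = np.column_stack([np.ones(city.size), x, dummies])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
b_dummy = beta[1]  # coefficient on x with city fixed effects


def demean(v):
    """Subtract each city's own mean from its observations."""
    means = np.bincount(city, weights=v) / np.bincount(city)
    return v - means[city]


xd, yd = demean(x), demean(y)
b_within = (xd @ yd) / (xd @ xd)
print(b_dummy, b_within)
```

The remaining entries of `beta` (positions 2 onward) are the city dummies' coefficients, each measured relative to the omitted city 0.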

20
Q

What does b0 show when you use fixed effects estimator

A

The expected outcome in the omitted city (the one without a dummy variable) when regressor = 0.

21
Q

(Fixed effects) Assume ai = a(nyc). Regressor = PPM, outcome = absences. Omitted city = Chicago. What is the interpretation of a(nyc)?

A

The average difference in the number of absences in New York City compared to Chicago, holding fixed the value of PPM.

22
Q

How does fixed effects work with time periods

A

Same as the other fixed effects method, but now with a binary variable that takes the value 1 if the observation is from day t. Include a binary variable for all days except one in order to avoid the dummy variable trap and a violation of no perfect collinearity.

23
Q

What is interpretation of b0 when you have two fixed effects, city and day (time)

A

The average outcome in the omitted city on the omitted day if regressor = 0.

24
Q

What is the interpretation of Ct in fixed effects (time), e.g. C(jan31)=100, c being the day-level analogue of 'a', i.e. a part of the error term that is common to all cities on day t.

A

The outcome was higher by 100 in all cities on Jan 31 compared to the omitted day, holding the regressor fixed (essentially just the difference between that day's binary variable and the omitted one).

25
Q

What happens to a binary variable when you use first difference estimator

A

It is 'differenced away': a binary variable that is either always 0 or always 1 over the time period has a difference of 0, so it is removed from the equation.

26
Q

What is the downside of using first difference estimators

A

We no longer have the bias due to the confounder 𝑎𝑖, but we are also not able to estimate the effect on the outcome of any binary variable that does not change over time.

27
Q

What is the difference-in-differences estimator using i and t

A

Assuming i=2 is the treated group, and writing (i, t) for the average outcome in group i (e.g. a region) at time t:

[(i=2, t=2) - (i=2, t=1)] - [(i=1, t=2) - (i=1, t=1)]
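A worked example of the estimator with made-up numbers. The four average outcomes below are hypothetical, keyed by (i, t) with i=2 as the treated group.

```python
# Hypothetical average outcomes, keyed by (group i, time t).
ybar = {
    (1, 1): 10.0,  # untreated group, before
    (1, 2): 12.0,  # untreated group, after
    (2, 1): 11.0,  # treated group, before
    (2, 2): 16.0,  # treated group, after
}

change_treated = ybar[(2, 2)] - ybar[(2, 1)]    # 5.0
change_untreated = ybar[(1, 2)] - ybar[(1, 1)]  # 2.0

# Difference-in-differences: the treated change minus the untreated change.
did = change_treated - change_untreated
print(did)  # prints 3.0
```

The untreated group's change of 2.0 stands in for what would have happened to the treated group without treatment, so 3.0 is attributed to the treatment.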

28
Q

what is the critical assumption of the difference-in-differences estimator

A

parallel trends

"The change in outcome for the untreated group between t=1 and t=2 is what would have happened to the treated group if there had not been treatment."

Essentially, the untreated group provides the counterfactual.

29
Q

What is the assumption in regression discontinuity

A

We need to assume that the individuals above the threshold who receive treatment are comparable to the ones below the threshold who do not receive treatment.
We do this by only considering data from just below and just above the cutoff, and making sure the individuals cannot choose where they are relative to the cutoff.
If so, treatment is plausibly as good as randomly assigned near the cutoff.

30
Q

What are the pros and cons of increasing window size in regression discontinuity

A

Increasing the window allows us to use more observations, which will decrease standard errors of estimated coefficients. However, by increasing the size of the window we might use observations that are not as comparable, and the "as good as randomly assigned" assumption is more likely to fail.

31
Q

What is standard deviation

A

The standard deviation measures the spread of 𝑦𝑖. It is a measure of the average distance between 𝑦𝑖 and the average value, 𝑦̅𝑁.

32
Q

What is standard error

A

The standard error measures the spread of 𝑦̅𝑁. The idea is that, if we had a different draw of 𝑁 observations, we would have a different value of 𝑦̅𝑁. The standard error is a measure of how far, on average, 𝑦̅𝑁 is from the true mean when taking repeated samples.
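A small simulation contrasting the two ideas. The population (mean 50, standard deviation 10) and the sample size are assumptions chosen for illustration. The block uses the standard formula for the standard error of a sample mean, SD divided by the square root of 𝑁, and compares it with the spread of the mean across many repeated samples.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 100

# One hypothetical sample of N observations.
y = rng.normal(loc=50.0, scale=10.0, size=N)
sd = y.std(ddof=1)     # spread of the individual observations
se = sd / np.sqrt(N)   # estimated spread of the sample mean

# Repeated sampling: 5,000 fresh samples of size N, one mean from each.
means = rng.normal(loc=50.0, scale=10.0, size=(5_000, N)).mean(axis=1)
spread_of_means = means.std(ddof=1)

print(sd, se, spread_of_means)
```

The standard deviation stays near 10 however large 𝑁 is, while the standard error and the simulated spread of the means are both near 1 here and shrink as 𝑁 grows.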

33
Q

what is heteroskedasticity

A

Heteroskedasticity is when the variance of the error changes with the regressor.