Key Things to Memorize Flashcards
Actual definition of selection bias
the difference between the estimate of β1 from the (short) regression and the true causal effect of the regressor on the outcome
Actual definition of OVB
OVB is the mathematical difference between the regression coefficients from the short regression and the long regression.
That long regression estimates the association of subways with pollution holding fixed land area, regulation, and population. If regulation and population capture the effects of confounders, then the estimate from this long regression will be closer to the true causal effect of interest.
What does the OVB formula explain intuitively
Intuitively, in context, the OVB formula tells us the short regression coefficient bundles three things (or two, if only one variable is omitted):
(1) the true causal effect of interest,
(2) the effect of Regulation on y times the variation in regulation that is correlated with subways, and
(3) the effect of Population on y times the variation in population that is correlated with subways
OVB formula
coef short = coef long + (coef on the omitted variable in the long regression × slope from the auxiliary regression of the omitted variable on the regressor of interest)
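A minimal numpy sketch, using hypothetical simulated variables (subways, regulation, pollution, matching the example above) and made-up effect sizes, showing that the formula holds exactly in-sample:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
regulation = rng.normal(size=n)                    # omitted variable
subways = 0.5 * regulation + rng.normal(size=n)    # regressor, correlated with it
pollution = 2.0 * subways - 3.0 * regulation + rng.normal(size=n)

def ols(y, X):
    """Return OLS coefficients, with an intercept prepended to X."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_short = ols(pollution, subways)[1]                           # short regression
b_long, g_long = ols(pollution, np.column_stack([subways, regulation]))[1:]
delta = ols(regulation, subways)[1]                            # auxiliary regression

# The two printed numbers agree up to floating-point error.
print(b_short, b_long + g_long * delta)
```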
What is the residual and when does it coincide with the error
The residual is the estimate of the error. They coincide if we have full population data, in which case there is no sampling variability in the estimate of the error
How does having full population data influence OLS
OLS only approximates the conditional expectation function, whether you have population data or a sample. If you have population data, you learn exactly the coefficients that best approximate the conditional expectation function; if you have a sample, you only have estimates of those coefficients.
Even if you have population-level data, β0, β1, and u_i might not represent the parameters and values of interest (not the true causal effect)
What is the mean of the residuals in OLS
0 (always 0 when the regression includes an intercept; this is a property of OLS)
What is Cov(x_i, û_i) in OLS?
0 (it is a property of OLS that residuals and regressors are uncorrelated)
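A small numpy sketch (simulated data) confirming both mechanical properties above at once: the residuals average to zero and are uncorrelated with the regressor:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1_000)
y = 1.0 + 2.0 * x + rng.normal(size=1_000)

# Fit OLS with an intercept and form the residuals.
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

print(resid.mean())            # ~0 up to floating-point error
print(np.cov(x, resid)[0, 1])  # ~0: residuals uncorrelated with the regressor
```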
How does using an instrument remove the bias in OLS when you have simultaneity (reverse causality causing bias) - explain the intuition
when we use an instrument, we use only the variation in the regressor that is created by the instrument. The intuition is that the instrument creates variation in the regressor that is unrelated to the reverse causality (e.g. demand for food options)
isolating one channel -> removes reverse causality
What are the two assumptions for IV to be valid (in simple terms)
relevance, i.e. Cov(instrument, regressor) ≠ 0
exogeneity, i.e. Cov(instrument, u_i) = 0
what are the two parts of exogeneity
exclusion = instrument only affects the outcome through the regressor
as good as randomly assigned = the instrument is uncorrelated with all unobserved determinants of the outcome (the error term)
How do you estimate the coefficient on the regressor of interest using 2SLS
form the first stage, second stage, and reduced form equations using your instrument
estimate the first stage and reduced form equations by OLS to get their coefficients
coefficient of interest = reduced form coefficient / first stage coefficient
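A numpy sketch of this recipe on simulated data (the names z, x, y and all effect sizes are made up for illustration): the reduced form / first stage ratio recovers the true effect while plain OLS is biased by the confounder:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
u = rng.normal(size=n)                      # unobserved confounder
z = rng.normal(size=n)                      # instrument: relevant and exogenous
x = 1.0 * z + u + rng.normal(size=n)        # endogenous regressor
y = 2.0 * x + 3.0 * u + rng.normal(size=n)  # true causal effect is 2

def slope(outcome, regressor):
    """Bivariate OLS slope: Cov(regressor, outcome) / Var(regressor)."""
    return np.cov(regressor, outcome)[0, 1] / np.var(regressor, ddof=1)

first_stage = slope(x, z)       # effect of instrument on regressor
reduced_form = slope(y, z)      # effect of instrument on outcome

print(slope(y, x))                  # biased OLS: well above 2
print(reduced_form / first_stage)   # IV estimate: close to 2
```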
what are the first stage, second stage and reduced form equations i.e. how do you form them
second stage = the classic regression that would be biased using OLS
for example audit rate = a + b(cheating) + e
first stage = regressor of interest = a + b(instrument for the regressor of interest) + e
for example cheating = a + b(average education) + e
reduced form = plug first stage into second stage and group terms
general form -> outcome of interest = a + b(instrument) + e
for example audit rate = a + b(average education) + e
What does it mean for a parameter to be overidentified in 2SLS
if there are more instruments than biased regressors (e.g. 2 instruments for 1 biased regressor), the parameter is said to be overidentified
What does it mean for a parameter to be just identified in 2SLS
if there are as many instruments as biased regressors (e.g. 1 instrument for the 1 biased regressor), the parameter is said to be just identified
What does it mean for a parameter to be not identified in 2SLS
There are no instruments for the biased regressor, so you cannot estimate the coefficient of interest
what is the benefit of 2SLS when you have overidentified parameters
It gives an estimate of the coefficient of interest by incorporating the variation from all of the instruments as a weighted average, as sketched below
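A numpy sketch (simulated data; the two instruments z1 and z2 are hypothetical) of what 2SLS does in the overidentified case: project the regressor on both instruments, then regress the outcome on the fitted values, blending the variation from both:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
u = rng.normal(size=n)                        # unobserved confounder
z1, z2 = rng.normal(size=n), rng.normal(size=n)
x = z1 + 0.5 * z2 + u + rng.normal(size=n)    # regressor moved by both instruments
y = 2.0 * x + 3.0 * u + rng.normal(size=n)    # true causal effect is 2

# First stage with both instruments: fitted values x_hat.
Z = np.column_stack([np.ones(n), z1, z2])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]

# Second stage: regress y on x_hat (plus an intercept).
X_hat = np.column_stack([np.ones(n), x_hat])
print(np.linalg.lstsq(X_hat, y, rcond=None)[0][1])  # close to 2
```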
How does a first difference estimator overcome bias
If the bias comes from a_i (a component of the error that is unchanged over time, e.g. culture), the first differences method removes that bias: a_i is constant over time, so it drops out when we difference, and the method uses the variation over time of the regressor to estimate the effect of interest.
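A numpy sketch (simulated two-period panel, made-up effect sizes) showing that differencing removes a_i even when it is strongly correlated with the regressor:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
a = rng.normal(size=n)                          # time-invariant effect, e.g. culture
x1 = a + rng.normal(size=n)                     # regressor correlated with a
x2 = a + rng.normal(size=n)
y1 = 2.0 * x1 + 5.0 * a + rng.normal(size=n)    # true causal effect is 2
y2 = 2.0 * x2 + 5.0 * a + rng.normal(size=n)

def slope(y, x):
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

print(slope(y1, x1))             # cross-sectional OLS: badly biased by a
print(slope(y2 - y1, x2 - x1))   # first differences: a cancels, close to 2
```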
How does the fixed effects estimator help control the concern of ai
For each i we define D_i to be a dummy variable taking the value 1 if the observation is from city i. We include D_i in our model for all i (except one, to avoid the dummy variable trap and a violation of "no perfect collinearity").
This is the same as running a regression with multiple regressors to capture the effect of confounders, as sketched below
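A numpy sketch of the dummy variable version (simulated data; the number of cities and all effect sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(5)
n_cities, n_per = 50, 200
city = np.repeat(np.arange(n_cities), n_per)        # city label of each observation
a = rng.normal(size=n_cities)[city]                 # city fixed effect a_i
x = a + rng.normal(size=city.size)                  # regressor correlated with a_i
y = 2.0 * x + 5.0 * a + rng.normal(size=city.size)  # true causal effect is 2

# Dummies for cities 1..49; city 0 is omitted to avoid the dummy variable trap.
D = (city[:, None] == np.arange(1, n_cities)[None, :]).astype(float)
X = np.column_stack([np.ones(city.size), x, D])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta[1])   # coefficient on x: close to 2
```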
What does b0 show when you use fixed effects estimator
the expected outcome in the omitted category (the omitted city) when the regressor = 0
(Fixed effects) Assume ai = a(nyc). Regressor = PPM, outcome = absences. Omitted city = Chicago. What is the interpretation of a(nyc)?
The average difference in the number of absences in New York City compared to Chicago, holding fixed the value of PPM
How does fixed effects work with time periods
same as the other fixed effects method, but now the binary variable takes the value 1 if the observation is from day t. Include a binary variable for all days except one in order to avoid the dummy variable trap and a violation of no perfect collinearity.
What is the interpretation of b0 when you have two fixed effects, city and day (time)
the expected outcome in the omitted city on the omitted day when the regressor = 0
What is the interpretation of C_t in fixed effects (time), e.g. C(jan31) = 100, C playing the same role as "a", i.e. a component of the error that is common to all units on a given day
the outcome was higher by 100 for all units on Jan 31, relative to the omitted day (essentially just the difference between that day's binary variable and the omitted one)
What happens to a binary variable when you use first difference estimator
it is "differenced away": a binary variable that is either always 0 or always 1 over the time period is removed from the equation when we difference
What is the downside of using first difference estimators
We no longer have the bias due to the confounder a_i, but we are also not able to estimate the effect on the outcome of any time-invariant binary variable we used
What is the difference-in-differences estimator using i and t
assuming i=2 means treated, and (i, t) denotes the average outcome for group i (e.g. region) at time t:
DiD = [(i=2, t=2) - (i=2, t=1)] - [(i=1, t=2) - (i=1, t=1)]
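A tiny sketch of the arithmetic, using made-up cell means (group i=1 untreated, i=2 treated; treatment arrives in t=2):

```python
# Average outcome by (group, period); the numbers are hypothetical.
means = {(1, 1): 10.0, (1, 2): 12.0,   # untreated group: trend of +2
         (2, 1): 11.0, (2, 2): 18.0}   # treated group: +2 trend plus treatment

did = (means[(2, 2)] - means[(2, 1)]) - (means[(1, 2)] - means[(1, 1)])
print(did)   # 5.0: the treatment effect, assuming parallel trends
```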
what is the critical assumption of the difference-in-differences estimator
parallel trends
"the change in outcome for the untreated unit between t=1 and t=2 is what would have happened to the treated unit if there had not been treatment"
essentially a counterfactual
What is the assumption in regression discontinuity
We need to assume that the individuals above the threshold who receive treatment are comparable to the ones below the threshold who do not receive treatment.
We do this by only considering data from just below and just above the cutoff, and making sure the individuals cannot choose where they are relative to the cutoff.
This plausibly implies that treatment near the cutoff is as good as randomly assigned
What are the pros and cons of increasing window size in regression discontinuity
Increasing the window allows us to use more observations, which will decrease the standard errors of the estimated coefficients. However, by increasing the size of the window we might use observations that are not as comparable, and the "as good as randomly assigned" assumption is more likely to fail.
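A numpy sketch of the tradeoff (simulated sharp design; cutoff, window sizes, and effect size are made up): the narrow window is nearly unbiased but uses few observations, while the wide window uses many more but drifts away from the true jump:

```python
import numpy as np

rng = np.random.default_rng(6)
score = rng.uniform(-1, 1, size=20_000)      # running variable, cutoff at 0
treated = score >= 0                         # sharp design: treated above cutoff
y = 1.0 * score + 3.0 * treated + rng.normal(size=score.size)  # true jump is 3

for window in (0.05, 0.5):
    keep = np.abs(score) < window
    est = y[keep & treated].mean() - y[keep & ~treated].mean()
    print(window, keep.sum(), est)  # narrow: few obs, ~3; wide: many obs, biased up
```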
What is standard deviation
The standard deviation measures the spread of y_i. It is a measure of the average distance between y_i and the sample average, ȳ.
What is standard error
The standard error measures the spread of ȳ. The idea is that, if we had a different draw of n observations, we would have a different value of ȳ. The standard error is a measure of how far, on average, ȳ is from the true mean across repeated samples
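A numpy sketch (simulated draws) contrasting the two: one sample's standard deviation versus the spread of the sample mean across repeated samples, which is roughly sd/sqrt(n):

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 100, 10_000
samples = rng.normal(loc=5.0, scale=2.0, size=(reps, n))  # many repeated samples

sd_within = samples[0].std(ddof=1)        # spread of y_i within one sample: ~2
se_of_mean = samples.mean(axis=1).std()   # spread of the sample mean across samples
print(sd_within, se_of_mean, sd_within / np.sqrt(n))  # last two are close: ~0.2
```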
what is heteroskedasticity
Heteroskedasticity is when the variance of the error changes with the regressor.
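A numpy sketch (simulated data, made-up functional form) of what this looks like: the spread of the error grows with the regressor, so the error variance is not constant:

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.uniform(1, 10, size=50_000)
errors = rng.normal(size=x.size) * x      # error sd proportional to x
y = 1.0 + 2.0 * x + errors

resid = y - (1.0 + 2.0 * x)               # recover the errors using the true line
print(resid[x < 3].std(), resid[x > 8].std())  # spread is much larger at high x
```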