Key Things to Memorize Flashcards
Actual definition of selection bias
the difference between the estimate of β1 from the (short) regression and the true causal effect of the regressor on the outcome
Actual definition of OVB
OVB is the mathematical difference between the regression coefficients from the short regression and the long regression.
That long regression estimates the association of subways with pollution holding fixed land area, regulation, and population. If regulation and population capture the effects of confounders, then the estimate from this long regression will be closer to the true causal effect of interest.
What does the OVB formula explain intuitively
Intuitively, in context, the OVB formula tells us the short regression coefficient bundles three things (or two, if only one variable is omitted):
(1) the true causal effect of interest,
(2) the effect of Regulation on y times the variation in regulation that is correlated with subways, and
(3) the effect of Population on y times the variation in population that is correlated with subways
OVB formula
coef short = coef long + (coef on the omitted variable in the long regression × slope from the auxiliary regression of the omitted variable on the regressor of interest)
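A minimal numpy sketch, using hypothetical simulated variables (subways, regulation, pollution, matching the example above) and made-up effect sizes, showing that the formula holds exactly in-sample:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
regulation = rng.normal(size=n)                    # omitted variable
subways = 0.5 * regulation + rng.normal(size=n)    # regressor, correlated with it
pollution = 2.0 * subways - 3.0 * regulation + rng.normal(size=n)

def ols(y, X):
    """Return OLS coefficients, with an intercept prepended to X."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_short = ols(pollution, subways)[1]                           # short regression
b_long, g_long = ols(pollution, np.column_stack([subways, regulation]))[1:]
delta = ols(regulation, subways)[1]                            # auxiliary regression

# The two printed numbers agree up to floating-point error.
print(b_short, b_long + g_long * delta)
```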
What is the residual and when does it coincide with the error
The residual is the estimate of the error. They coincide if we have full population data, in which case there is no sampling variability in the estimate of the error
How does having full population data influence OLS
OLS only approximates the conditional expectation function, whether you have population data or a sample. If you have population data, you learn exactly the coefficients that best approximate the conditional expectation function; if you have a sample, you only have estimates of those coefficients.
Even if you have population-level data, β0, β1, and u_i might not represent the parameters and values of interest (not the true causal effect)
What is the mean of the residuals in OLS
0 (always 0 when the regression includes an intercept; this is a property of OLS)
What is Cov(x_i, û_i) in OLS?
0 (it is a property of OLS that residuals and regressors are uncorrelated)
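A small numpy sketch (simulated data) confirming both mechanical properties above at once: the residuals average to zero and are uncorrelated with the regressor:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=1_000)
y = 1.0 + 2.0 * x + rng.normal(size=1_000)

# Fit OLS with an intercept and form the residuals.
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

print(resid.mean())            # ~0 up to floating-point error
print(np.cov(x, resid)[0, 1])  # ~0: residuals uncorrelated with the regressor
```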
How does using an instrument remove the bias in OLS when you have simultaneity (reverse causality causing bias) - explain the intuition
when we use an instrument, we use only the variation in the regressor that is created by the instrument. The intuition is that the instrument creates variation in the regressor that is unrelated to the reverse causality (e.g. demand for food options)
isolating one channel -> removes reverse causality
What are the two assumptions for IV to be valid (in simple terms)
relevance, i.e. Cov(instrument, regressor) ≠ 0
exogeneity, i.e. Cov(instrument, u_i) = 0
what are the two parts of exogeneity
exclusion = instrument only affects the outcome through the regressor
as good as randomly assigned = the instrument is uncorrelated with all unobserved determinants of the outcome (the error term)
How do you estimate the coefficient on the regressor of interest using 2SLS
form the first stage, second stage, and reduced form equations using your instrument
estimate the first stage and reduced form equations by OLS to get their coefficients
coefficient of interest = reduced form coefficient / first stage coefficient
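A numpy sketch of this recipe on simulated data (the names z, x, y and all effect sizes are made up for illustration): the reduced form / first stage ratio recovers the true effect while plain OLS is biased by the confounder:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
u = rng.normal(size=n)                      # unobserved confounder
z = rng.normal(size=n)                      # instrument: relevant and exogenous
x = 1.0 * z + u + rng.normal(size=n)        # endogenous regressor
y = 2.0 * x + 3.0 * u + rng.normal(size=n)  # true causal effect is 2

def slope(outcome, regressor):
    """Bivariate OLS slope: Cov(regressor, outcome) / Var(regressor)."""
    return np.cov(regressor, outcome)[0, 1] / np.var(regressor, ddof=1)

first_stage = slope(x, z)       # effect of instrument on regressor
reduced_form = slope(y, z)      # effect of instrument on outcome

print(slope(y, x))                  # biased OLS: well above 2
print(reduced_form / first_stage)   # IV estimate: close to 2
```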
what are the first stage, second stage and reduced form equations i.e. how do you form them
second stage = the classic regression that would be biased using OLS
for example audit rate = a + b(cheating) + e
first stage = regressor of interest = a + b(instrument for the regressor of interest) + e
for example cheating = a + b(average education) + e
reduced form = plug first stage into second stage and group terms
general form -> outcome of interest = a + b(instrument) + e
for example audit rate = a + b(average education) + e
What does it mean for a parameter to be overidentified in 2SLS
if there are more instruments than biased regressors (e.g. 2 instruments for 1 biased regressor), the parameter is said to be overidentified
What does it mean for a parameter to be just identified in 2SLS
if there are as many instruments as biased regressors (e.g. 1 instrument for the 1 biased regressor), the parameter is said to be just identified
What does it mean for a parameter to be not identified in 2SLS
There are no instruments for the biased regressor, so you cannot estimate the coefficient of interest
what is the benefit of 2SLS when you have overidentified parameters
It gives an estimate of the coefficient of interest by incorporating the variation from all of the instruments as a weighted average, as sketched below
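A numpy sketch (simulated data; the two instruments z1 and z2 are hypothetical) of what 2SLS does in the overidentified case: project the regressor on both instruments, then regress the outcome on the fitted values, blending the variation from both:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
u = rng.normal(size=n)                        # unobserved confounder
z1, z2 = rng.normal(size=n), rng.normal(size=n)
x = z1 + 0.5 * z2 + u + rng.normal(size=n)    # regressor moved by both instruments
y = 2.0 * x + 3.0 * u + rng.normal(size=n)    # true causal effect is 2

# First stage with both instruments: fitted values x_hat.
Z = np.column_stack([np.ones(n), z1, z2])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]

# Second stage: regress y on x_hat (plus an intercept).
X_hat = np.column_stack([np.ones(n), x_hat])
print(np.linalg.lstsq(X_hat, y, rcond=None)[0][1])  # close to 2
```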
How does a first difference estimator overcome bias
If the bias comes from a_i (a component of the error that is unchanged over time, e.g. culture), the first differences method removes that bias: a_i is constant over time, so it drops out when we difference, and the method uses the variation over time of the regressor to estimate the effect of interest.
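A numpy sketch (simulated two-period panel, made-up effect sizes) showing that differencing removes a_i even when it is strongly correlated with the regressor:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
a = rng.normal(size=n)                          # time-invariant effect, e.g. culture
x1 = a + rng.normal(size=n)                     # regressor correlated with a
x2 = a + rng.normal(size=n)
y1 = 2.0 * x1 + 5.0 * a + rng.normal(size=n)    # true causal effect is 2
y2 = 2.0 * x2 + 5.0 * a + rng.normal(size=n)

def slope(y, x):
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

print(slope(y1, x1))             # cross-sectional OLS: badly biased by a
print(slope(y2 - y1, x2 - x1))   # first differences: a cancels, close to 2
```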
How does the fixed effects estimator help control the concern of ai
For each i we define D_i to be a dummy variable taking the value 1 if the observation is from city i. We include D_i in our model for all i (except one, to avoid the dummy variable trap and a violation of "no perfect collinearity").
This is the same as running a regression with multiple regressors to capture the effect of confounders, as sketched below
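A numpy sketch of the dummy variable version (simulated data; the number of cities and all effect sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(5)
n_cities, n_per = 50, 200
city = np.repeat(np.arange(n_cities), n_per)        # city label of each observation
a = rng.normal(size=n_cities)[city]                 # city fixed effect a_i
x = a + rng.normal(size=city.size)                  # regressor correlated with a_i
y = 2.0 * x + 5.0 * a + rng.normal(size=city.size)  # true causal effect is 2

# Dummies for cities 1..49; city 0 is omitted to avoid the dummy variable trap.
D = (city[:, None] == np.arange(1, n_cities)[None, :]).astype(float)
X = np.column_stack([np.ones(city.size), x, D])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta[1])   # coefficient on x: close to 2
```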
What does b0 show when you use fixed effects estimator
the expected outcome in the omitted category (the omitted city) when the regressor = 0
(Fixed effects) Assume ai = a(nyc). Regressor = PPM, outcome = absences. Omitted city = Chicago. What is the interpretation of a(nyc)?
The average difference in the number of absences in New York City compared to Chicago, holding fixed the value of PPM
How does fixed effects work with time periods
same as the other fixed effects method, but now the binary variable takes the value 1 if the observation is from day t. Include a binary variable for all days except one in order to avoid the dummy variable trap and a violation of no perfect collinearity.
What is the interpretation of b0 when you have two fixed effects, city and day (time)
the expected outcome in the omitted city on the omitted day when the regressor = 0
What is the interpretation of C_t in fixed effects (time), e.g. C(jan31) = 100, C playing the same role as "a", i.e. a component of the error that is common to all units on a given day
the outcome was higher by 100 for all units on Jan 31, relative to the omitted day (essentially just the difference between that day's binary variable and the omitted one)
What happens to a binary variable when you use first difference estimator
it is "differenced away": a binary variable that is either always 0 or always 1 over the time period is removed from the equation when we difference
What is the downside of using first difference estimators
We no longer have the bias due to the confounder a_i, but we are also not able to estimate the effect on the outcome of any time-invariant binary variable we used
What is the difference-in-differences estimator using i and t
assuming i=2 means treated, and (i, t) denotes the average outcome for group i (e.g. region) at time t:
DiD = [(i=2, t=2) - (i=2, t=1)] - [(i=1, t=2) - (i=1, t=1)]
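A tiny sketch of the arithmetic, using made-up cell means (group i=1 untreated, i=2 treated; treatment arrives in t=2):

```python
# Average outcome by (group, period); the numbers are hypothetical.
means = {(1, 1): 10.0, (1, 2): 12.0,   # untreated group: trend of +2
         (2, 1): 11.0, (2, 2): 18.0}   # treated group: +2 trend plus treatment

did = (means[(2, 2)] - means[(2, 1)]) - (means[(1, 2)] - means[(1, 1)])
print(did)   # 5.0: the treatment effect, assuming parallel trends
```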
what is the critical assumption of the difference-in-differences estimator
parallel trends
"the change in outcome for the untreated unit between t=1 and t=2 is what would have happened to the treated unit if there had not been treatment"
essentially a counterfactual
What is the assumption in regression discontinuity
We need to assume that the individuals above the threshold who receive treatment are comparable to the ones below the threshold who do not receive treatment.
We do this by only considering data from just below and just above the cutoff, and making sure the individuals cannot choose where they are relative to the cutoff.
This plausibly implies that treatment near the cutoff is as good as randomly assigned
What are the pros and cons of increasing window size in regression discontinuity
Increasing the window allows us to use more observations, which will decrease the standard errors of the estimated coefficients. However, by increasing the size of the window we might use observations that are not as comparable, and the "as good as randomly assigned" assumption is more likely to fail.
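A numpy sketch of the tradeoff (simulated sharp design; cutoff, window sizes, and effect size are made up): the narrow window is nearly unbiased but uses few observations, while the wide window uses many more but drifts away from the true jump:

```python
import numpy as np

rng = np.random.default_rng(6)
score = rng.uniform(-1, 1, size=20_000)      # running variable, cutoff at 0
treated = score >= 0                         # sharp design: treated above cutoff
y = 1.0 * score + 3.0 * treated + rng.normal(size=score.size)  # true jump is 3

for window in (0.05, 0.5):
    keep = np.abs(score) < window
    est = y[keep & treated].mean() - y[keep & ~treated].mean()
    print(window, keep.sum(), est)  # narrow: few obs, ~3; wide: many obs, biased up
```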
What is standard deviation
The standard deviation measures the spread of y_i. It is a measure of the average distance between y_i and the sample average, ȳ.
What is standard error
The standard error measures the spread of ȳ. The idea is that, if we had a different draw of n observations, we would have a different value of ȳ. The standard error is a measure of how far, on average, ȳ is from the true mean across repeated samples
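A numpy sketch (simulated draws) contrasting the two: one sample's standard deviation versus the spread of the sample mean across repeated samples, which is roughly sd/sqrt(n):

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 100, 10_000
samples = rng.normal(loc=5.0, scale=2.0, size=(reps, n))  # many repeated samples

sd_within = samples[0].std(ddof=1)        # spread of y_i within one sample: ~2
se_of_mean = samples.mean(axis=1).std()   # spread of the sample mean across samples
print(sd_within, se_of_mean, sd_within / np.sqrt(n))  # last two are close: ~0.2
```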
what is heteroskedasticity
Heteroskedasticity is when the variance of the error changes with the regressor.
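A numpy sketch (simulated data, made-up functional form) of what this looks like: the spread of the error grows with the regressor, so the error variance is not constant:

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.uniform(1, 10, size=50_000)
errors = rng.normal(size=x.size) * x      # error sd proportional to x
y = 1.0 + 2.0 * x + errors

resid = y - (1.0 + 2.0 * x)               # recover the errors using the true line
print(resid[x < 3].std(), resid[x > 8].std())  # spread is much larger at high x
```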