IV Flashcards
In an IV, we impose the mechanics directly through the exclusion restriction. But as we have learned, this is not likely to hold. How can we use IV to try to learn about the ceteris paribus (partial derivative) effects if we have violations of the exclusion restriction?
When estimating a DD, we are actually estimating a reduced-form effect. We then need to discuss the channel (mechanism) behind our reduced-form effect. That is one approach we can take with an instrument: we just estimate the reduced-form effect, as in a DD, and speculate. Another approach is to estimate the reduced-form effect and then run a structural model in order to learn about the mechanism.
What is the thing causing the problem with weak instruments?
Weak instruments cause problems since \beta_{IV} (the estimator) is a ratio! A weak instrument contaminates the denominator with bad variation. This is what causes all the problems. If we have a strong instrument, we will not have so much concern even though we might have some endogeneity.
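As a reminder of why the ratio matters, here is the textbook single-instrument, single-regressor form (a sketch, not notation from any specific paper in the course):

$$\hat{\beta}_{IV} = \frac{\widehat{\mathrm{cov}}(Z, Y)}{\widehat{\mathrm{cov}}(Z, X)}$$

With a weak instrument the denominator $\widehat{\mathrm{cov}}(Z, X)$ is close to zero, so sampling noise in it makes the ratio unstable and its distribution badly behaved.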
Research questions that cannot be answered by an experiment are called…? Give an example.
FUQed questions: It stands for “fundamentally unidentified question.” In their book Mostly Harmless Econometrics, economists Joshua Angrist and Jörn-Steffen Pischke define FUQs as “research questions that cannot be answered by an experiment.”
Gender, ability, etc. cannot be randomly assigned and are thus FUQed. We could, however, randomize gender on CVs and see how employers respond.
What is an IV regression with fixed effects?
If we run an IV regression with fixed effects, we have a fuzzy DiD, so we are assuming parallel trends. It is a rather strange way of looking at the data, and it is hard to understand where the identification comes from.
If we have a fuzzy DiD, we must have parallel trends.
What format should the outcome variable have in a panel data study?
With panel data, the outcome should be in levels; otherwise the beta coefficients will not have the interpretation people think they have. We should thus not use first differences.
What will happen if we have panel data and do not account for serial correlation? Particularly in a fuzzy DiD setting.
With panel data, we should account for serial correlation in the time-series variation, e.g., by clustering the standard errors. Not doing this will artificially drive up the F-values.
What do we think about robustness checks not targeted towards the identification?
Having a lot of robustness checks not targeted towards the identification is a way of trying to fool the reader. This is really bad.
What can be a problem when conducting a study based on contemporaneous historians’ statements about the past?
We will have measurement errors.
What do we need to do when we are estimating an interaction model?
Every time we estimate an interaction model we must de-mean the model, that is, subtract the mean from each variable entering the interaction before forming the product. This does not change the coefficients on the interaction terms, but it changes the coefficients for the main effects.
We do this in order to be able to interpret the main effects.
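A minimal sketch of the point on simulated data (all variable names are hypothetical): the interaction coefficient is unchanged by de-meaning, while the main effects become effects evaluated at the mean of the other variable.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1_000
x = rng.normal(2.0, 1.0, n)          # continuous regressor (hypothetical)
d = rng.binomial(1, 0.5, n)          # binary regressor (hypothetical)
y = 1 + 0.5 * x + 1.0 * d + 0.3 * x * d + rng.normal(0, 1, n)

def fit(x_, d_):
    """OLS of y on x, d and their interaction; returns the coefficients."""
    X = sm.add_constant(np.column_stack([x_, d_, x_ * d_]))
    return sm.OLS(y, X).fit().params

raw = fit(x, d)                               # main effects evaluated at x = 0 / d = 0
demeaned = fit(x - x.mean(), d - d.mean())    # main effects evaluated at the means

print(raw)        # interaction coefficient is ~0.3
print(demeaned)   # same interaction coefficient, different main-effect coefficients
```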
How should we think about units that cannot be treated?
Someone that cannot be treated is not a good control. For example, some people do not have the genes for being depressed.
Should we state that we are assuming normality?
What does Per think about random effects?
Random effects do not solve anything, which is why we have not talked about them much in this course.
What is a “poor man’s” instrument?
Lagged treatment and aggregating.
How much endogeneity do we have when the instrument is as good as randomly assigned?
None.
If we are to standardize some measures, what assumptions do we make?
To be able to standardize something, it requires that all the variables actually can be on the same scale, that they have the same functional form, etc.
What do we need to do if we have a clustered IV design?
If we have a clustered IV design, it is essential to test whether the clusters create a leverage problem or not. This is done by excluding one cluster at a time, a leave-one-out estimate. If the estimates change much, we have a cluster problem.
Keane and Neal (2021) argue that some other method can perform better than an IV. Which?
A controlled OLS can actually do better than an IV. Their simulations suggest the first-stage F must be well above 10 to have high confidence that 2SLS will outperform OLS, unless endogeneity is quite severe. So unless a higher standard can be met, OLS combined with a serious attempt to control for sources of endogeneity may be a superior research strategy to reliance on IV.
What is a good thing about the tF test regarding published papers?
The tF adjustment can be easily applied to re-assess studies that have already been published, provided that the first-stage F-statistic has been reported, and does not require access to the original data.
Who is discussing the problem with one instrument causing many different behaviors (violation of exclusion restriction)?
Mellon (2021); Gallen and Raymond (2021).
Who is proposing the AR-test and tF-adjustment?
AR test: Keane and Neal (2021). tF adjustment: Lee et al. (2021).
Will COVID-19 be a good instrument to use?
COVID-19 has affected everything, so it is not plausibly a natural experiment or a random assignment for anything but itself. COVID-19 may also be affected by the weather (Shen, Cai, and Li 2020), causally attaching it to the messy web of relationships.
If we have a violation of the exclusion restriction, what can we do?
We can estimate the reduced form and just analyse that total-derivative effect, e.g., the effect of weather on something. But we cannot use weather as an instrument.
What test do Gallen and Raymond (2021) suggest?
They propose a new test related to the Hausman test: running a “single paper” IV regression ignoring the other potentially endogenous covariates, and comparing the regression coefficient of interest to an IV regression that includes all those potentially endogenous variables as exogenous controls. Statistical equality between coefficients suggests that either their biases are both small, or they are coincidentally similar.
What are the identifying assumptions in IV?
Exclusion restriction (no direct effect of Z → Y), relevance (there is a first stage), and exogeneity (the instrument is as good as randomly assigned).
What is the exclusion restriction and how can it be tested?
Exclusion restriction = the instrument (Z) has no direct effect on the outcome variable (Y).
It cannot be tested explicitly, but we can:
- Argue with theory
- Run a refutability test
  - If we find a sample where there is no first stage, there should be no reduced-form effect. If there is one, the exclusion restriction is violated.
- Show that no other studies use the same instrument
  - If there is collective use of the same instrument, the exclusion restriction will be violated.
  - The same instrument can’t be used by anyone else.
- Run a J-test (over-identification test)
What is instrument relevance and how can it be tested?
Relevance = there is a first stage. It can be assessed with:
- F-test
  - The F-value should be at least 105.
  - An F-value of around 10 can be enough if we accept a CI whose coverage is only approximately 95%.
- Anderson and Rubin → AR test
- P-value on the reduced form
  - If it is significant, we are on the safe side.
- tF test
  - An AR alternative that uses a correction factor for the SE.
  - Does not need access to the data; can be applied to published papers with reported F-statistics.
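A minimal sketch of the first two checks on simulated data (variable names y, x, z are hypothetical); with a single instrument the first-stage F is just the squared t-statistic on the instrument:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
z = rng.normal(size=n)                    # instrument (hypothetical)
u = rng.normal(size=n)                    # unobserved confounder
x = 0.2 * z + u + rng.normal(size=n)      # endogenous regressor
y = 1.0 * x + u + rng.normal(size=n)      # outcome

# First stage: regress x on the instrument and compute F
first_stage = sm.OLS(x, sm.add_constant(z)).fit()
print(f"first-stage F: {first_stage.tvalues[1] ** 2:.1f}")

# Reduced form: regress y on the instrument and look at the p-value
reduced_form = sm.OLS(y, sm.add_constant(z)).fit()
print(f"reduced-form p-value: {reduced_form.pvalues[1]:.3f}")
```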
What is instrument exogeneity and how can it be tested?
Exogeneity = cov(Z, \epsilon) = 0. The instrument is as good as randomly assigned.
This assumption cannot be tested explicitly, but we can:
- Argue with theory
- Show that Z is uncorrelated with the other covariates in our study
- Add controls and show that the estimates do not change
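A minimal sketch of the covariate check with hypothetical covariates (x1, x2): regress the instrument on the observed covariates and look at the coefficients and the joint F-test; insignificance is consistent with, but does not prove, as-good-as-random assignment.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)   # pre-determined covariates (hypothetical)
z = rng.normal(size=n)                            # instrument, here truly random

# "Balance" regression of the instrument on the covariates
balance = sm.OLS(z, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(balance.params)                              # individual coefficients should be ~0
print(f"joint F p-value: {balance.f_pvalue:.3f}")  # should not be significant
```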
What is the problem with weak instruments?
Weak instrument = weak first stage.
Asymptotic theory does not hold if we have weak instruments. We will make type-1 errors! We will find significant effects when there are none.
With weak instruments, our estimates will be biased towards OLS (with OVB), with more (false) precision. As F → 0, we get the same bias as in OLS.
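A minimal simulation sketch of this bias (the first-stage strength pi and all other names are hypothetical): as the first stage gets weaker, the median 2SLS estimate drifts from the true effect towards the upward-biased OLS estimate.

```python
import numpy as np

rng = np.random.default_rng(3)

def sim(pi, n=200, reps=2000, beta=1.0):
    """Median 2SLS (Wald ratio) and OLS slope for a given first-stage strength pi."""
    iv, ols = [], []
    for _ in range(reps):
        z = rng.normal(size=n)
        u = rng.normal(size=n)                              # confounder: OLS biased upwards
        x = pi * z + u + rng.normal(size=n)
        y = beta * x + u + rng.normal(size=n)
        iv.append(np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1])  # 2SLS with one instrument
        ols.append(np.cov(x, y)[0, 1] / np.var(x, ddof=1))  # simple OLS slope
    return np.median(iv), np.median(ols)

for pi in (1.0, 0.2, 0.02):
    iv_med, ols_med = sim(pi)
    print(f"pi={pi:<4}: median 2SLS={iv_med:5.2f}, median OLS={ols_med:5.2f}")
```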
What is the suggested level of F for a two-tailed test?
F > 104.7 (approximately 105).
What is Angrist and Kolesár’s (2021) view on weak instruments?
Angrist and Kolesár (2021) argue that there is no big problem with weak instruments if we do not have much endogeneity, since we work in finite samples. The authors propose that we should screen on the first stage. That is, we should have some sense of the sign and size of our first stage; conditioning on the sign/size of the first stage, we can avoid this problem.
How should we think about weak instruments if we have a micro-data paper?
If we are doing a micro application there might not be any problem if our F > 10, but we should at least refer to the discussion about weak IVs even though we can’t solve it. We should also always look at the reduced form and check whether the p-value is smaller than 0.05.
What is a refutability test?
A kind of placebo test of whether the exclusion restriction holds. If we find a sample where there is no first stage, there should be no reduced-form effect. If there is one, the exclusion restriction is violated.
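A minimal sketch of what the test looks like in code, under entirely hypothetical assumptions: a group indicator marks a subsample where the instrument mechanically cannot move the treatment, and the reduced form is then estimated in that subsample.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 1_000
group = rng.binomial(1, 0.5, n)            # group == 0: instrument cannot move treatment
z = rng.normal(size=n)
u = rng.normal(size=n)
x = np.where(group == 1, 0.8 * z, 0.0) + u + rng.normal(size=n)
y = 1.0 * x + u + rng.normal(size=n)       # no direct effect of z on y in this DGP

mask = group == 0
first_stage = sm.OLS(x[mask], sm.add_constant(z[mask])).fit()
reduced_form = sm.OLS(y[mask], sm.add_constant(z[mask])).fit()
print(f"no-first-stage subsample: first-stage p = {first_stage.pvalues[1]:.2f}, "
      f"reduced-form p = {reduced_form.pvalues[1]:.2f}")
# A significant reduced form here would point to a violation of the exclusion restriction.
```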
How can we fix the problem with weak instruments?
- See if our reduced-form p-value is below 0.05
- or use the $tF$ adjustment to fix our SEs.
A violation of the exclusion restriction is worse with weaker instruments and not so bad with a really strong instrument. The problem is a decreasing function of compliance.
What are the problems with clusters and high leverage in IV?
If we have non-iid data (a clustered design) and high leverage (outliers), our normal approximation will be bad. This is discussed in Young (2021). Most papers have both high leverage and a cluster problem. Add a weak instrument to that design and we’re in trouble.
How do we test if we have the problem with non-iid data and high leverage?
If we have clustered data (> 50 clusters), we cluster our SEs at the correct level. Then we run the regression while excluding one cluster at a time. If our point estimates change much when we exclude some particular clusters, then we have this type of problem.
This is called the “delete-one” sensitivity test.
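A minimal sketch of the delete-one exercise on hypothetical data (cluster labels and variable names are made up); in an actual IV paper the OLS fit below would be replaced by the 2SLS estimate of interest.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n_clusters, per_cluster = 50, 40
cluster = np.repeat(np.arange(n_clusters), per_cluster)
x = rng.normal(size=cluster.size) + 0.3 * rng.normal(size=n_clusters)[cluster]
y = 0.5 * x + rng.normal(size=cluster.size)
df = pd.DataFrame({"y": y, "x": x, "cluster": cluster})

estimates = []
for c in range(n_clusters):
    sub = df[df["cluster"] != c]                 # exclude one cluster at a time
    fit = smf.ols("y ~ x", data=sub).fit(
        cov_type="cluster", cov_kwds={"groups": sub["cluster"]}
    )
    estimates.append(fit.params["x"])

estimates = np.array(estimates)
print(f"leave-one-out estimates range from {estimates.min():.3f} to {estimates.max():.3f}")
# Large swings when particular clusters are dropped signal a leverage problem.
```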
What are the only credible instruments according to Per?
Per claims that the only credible instruments are those derived from an RCT or a fuzzy RDD. If we have an RCT with partial compliance, we actually need to use the IV approach with assignment as the instrument. That is a credible design.
What about heterogeneity in IV-estimations?
If we have heterogeneous treatment effects, we can’t do the over-identification test (testing whether different instruments yield the same effects), since in a heterogeneous world different instruments actually can yield different effects.
In an IV setting, if we have two instruments that yield different estimates, we do not know whether one instrument is bad, or whether both are good and actually capture different effects because the treatment has heterogeneous effects in the population.
What does the AR test stand for?
Anderson and Rubin.