Instrumental Variables Flashcards
What are instrumental variables?
Instrumental variables (IV) are used when X is correlated with the error term. Think of X as having two parts: one part that is correlated with the error term and one part that is not. By using instruments, we isolate the part that is not correlated with the error term.
One study from the USA looked at how being in the military affected future income. There is good reason to believe that people who volunteer for the military tend to come from poor neighborhoods, which is itself correlated with lower future income. There is also good reason to believe that people from good neighborhoods have more money and power and are therefore less likely to join the military. To fix this bias, the study used being drafted to the military as an instrument. Being drafted to the military is completely random. By using this as an instrument, one was able to split X into two parts and eliminate the effect that people in the military might come from poor neighborhoods.
Endogeneity and Exogeneity
Endogenous: variables that are correlated with the error term.
Exogenous: variables that are not correlated with the error term (they are determined outside of the model).
What are the conditions for a valid instrument?
RELEVANT and EXOGENOUS: the two conditions for a valid instrument:
- Instrument relevance: if an instrument is relevant, then variation in the instrument is related to variation in X.
- Instrument exogeneity: Z is uncorrelated with the error term, so it is correlated with Y solely through its correlation with X.
Relevant: the instrument actually affects X.
Exogenous: the instrument only affects Y through X.
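The two conditions can also be written compactly. This is a sketch in assumed notation: the regression is written with an error term u and coefficients beta_0 and beta_1, symbols that the cards above do not define explicitly.

```latex
\begin{aligned}
Y_i &= \beta_0 + \beta_1 X_i + u_i \\
\text{Relevance:} \quad & \operatorname{corr}(Z_i, X_i) \neq 0 \\
\text{Exogeneity:} \quad & \operatorname{corr}(Z_i, u_i) = 0
\end{aligned}
```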
How do we use IV?
- Z is correlated with X, but not with the error term. It has to satisfy the conditions for relevance and exogeneity.
Use Two Stage Least Squares:
- Regress X on the instrumental variable (X as the dependent variable).
- Use the fitted values of X from this regression in place of X in the original regression.
Let's call the instrument Z. If it satisfies the two conditions of relevance and exogeneity, we can estimate B1 by using an IV estimator called two stage least squares (TSLS). TSLS is calculated in two stages. The first stage splits X into two parts: one part that is problematic and might be correlated with the error term, and another part that is problem-free. The second stage uses the problem-free part to estimate B1.
In the first stage you regress X on its instrument, which gives X(hat).
You then put X(hat) in place of X in the original regression.
From our example, the intuition is that we only use the variation in military service that comes from being drafted, which eliminates the bias that arises because those who choose to join the military usually earn less.
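The two stages described above can be sketched in plain Python for the simplest case of one instrument and one endogenous regressor. The data are simulated and entirely hypothetical: the true B1 is set to 2, and the helper names (ols_simple, tsls_simple, simulate) are made up for illustration.

```python
import random

def ols_simple(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def tsls_simple(z, x, y):
    """TSLS with one instrument z and one endogenous regressor x."""
    # Stage 1: regress X on Z and keep the fitted values X(hat).
    a0, a1 = ols_simple(z, x)
    x_hat = [a0 + a1 * zi for zi in z]
    # Stage 2: regress Y on X(hat) instead of X.
    return ols_simple(x_hat, y)

def simulate(n=5000, beta1=2.0, seed=0):
    """Hypothetical data: X = Z + v, and the error u is correlated with v."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)               # instrument, unrelated to u
        vi = rng.gauss(0, 1)               # problematic part of X
        ui = 0.8 * vi + rng.gauss(0, 0.6)  # error term, correlated with X via v
        xi = zi + vi
        z.append(zi)
        x.append(xi)
        y.append(1.0 + beta1 * xi + ui)
    return z, x, y

z, x, y = simulate()
_, b1_ols = ols_simple(x, y)
_, b1_tsls = tsls_simple(z, x, y)
print(f"OLS slope:  {b1_ols:.3f}")   # pulled away from 2 by the endogeneity
print(f"TSLS slope: {b1_tsls:.3f}")  # should land close to 2
```

In this setup OLS overstates B1 because X and the error move together, while TSLS only uses the variation in X that comes from Z.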
How many instruments can we have? What do we call these models?
It is UNDERIDENTIFIED if it has fewer instruments than endogenous variables. It cannot be computed.
It is EXACTLY IDENTIFIED if it has the same number of instruments as endogenous variables. It can now be computed, but the exogeneity of the instruments cannot be tested. Hence, one needs a good story and economic knowledge to argue that the instrument is the right one to use.
It is OVER-IDENTIFIED if it has more instruments than endogenous variables. It can now be tested whether the instruments are RELEVANT and EXOGENOUS.
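The counting rule can be written as a tiny helper. This is only an illustration, and the function name identification_status is made up.

```python
def identification_status(n_instruments, n_endogenous):
    """Classify an IV model by comparing the number of instruments
    with the number of endogenous variables."""
    if n_instruments < n_endogenous:
        return "underidentified"      # cannot be computed
    if n_instruments == n_endogenous:
        return "exactly identified"   # computable, exogeneity not testable
    return "overidentified"           # computable, exogeneity testable

print(identification_status(1, 2))
print(identification_status(1, 1))
print(identification_status(2, 1))
```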
How do we test the instrument variables?
1. First step is to test for relevance:
First we regress X on all of its instruments.
H0: The instruments do not have any effect on X. If H0 is rejected, the instruments are relevant. A general rule of thumb for this test is to look at the first-stage F-statistic. If it is greater than 10, we reject H0 and conclude that the instruments are relevant. If it is lower than 10, the instruments are weak.
2. Second step is to test for exogeneity:
We use a J-test on the residuals from the TSLS regression to see if the instruments are exogenous. If they are exogenous, the error term has a conditional mean of 0 given the instruments.
The null hypothesis of the J-test is that all of our instruments are exogenous, that is, that they have no relation to the error term. Under H0 the J-statistic has a chi-squared distribution with (m – k) degrees of freedom, where m is the number of instruments and k is the number of endogenous variables. To compute it, we regress the TSLS residuals on the instruments, take the F-statistic for the hypothesis that all instrument coefficients are zero, set J = mF, and look at the p-value. A p-value over 0.05 means that we fail to reject H0 at the 5% level: the data give no evidence against exogeneity, although this does not prove that the instruments are exogenous. So we want to keep H0.
In short: with a J-test we want a p-value that is higher than 0.05 or 0.10, because then we keep (fail to reject) the H0 that our instruments are exogenous.
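A sketch of the J-test for two instruments and one endogenous variable, so that m – k = 1. Everything here is an assumption made for illustration: the simulated data, the helper names, and the use of the homoskedasticity-only F-statistic. The p-value formula in chi2_sf_df1 is only valid for one degree of freedom.

```python
import math
import random

def ols_fit(columns, y):
    """OLS of y on an intercept plus the given columns; returns (coefs, residuals)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented normal equations (X'X | X'y).
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda row: abs(a[row][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [v - factor * w for v, w in zip(a[r], a[c])]
    coefs = [a[i][k] / a[i][i] for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(coefs, row)) for row, yi in zip(rows, y)]
    return coefs, resid

def j_test(instruments, x, y):
    """Return (J, degrees of freedom) for one endogenous regressor x."""
    n, m = len(y), len(instruments)
    # TSLS: the first stage gives X(hat), the second stage regresses Y on X(hat).
    _, v_hat = ols_fit(instruments, x)
    x_hat = [xi - vi for xi, vi in zip(x, v_hat)]
    (b0, b1), _ = ols_fit([x_hat], y)
    # TSLS residuals are computed with the actual X, not X(hat).
    u_hat = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Regress the residuals on the instruments and F-test all slopes = 0.
    _, e = ols_fit(instruments, u_hat)
    ssr_u = sum(ei * ei for ei in e)
    mean_u = sum(u_hat) / n
    ssr_r = sum((ui - mean_u) ** 2 for ui in u_hat)
    f_stat = ((ssr_r - ssr_u) / m) / (ssr_u / (n - m - 1))
    return max(m * f_stat, 0.0), m - 1

def chi2_sf_df1(j):
    """p-value of a chi-squared statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(j / 2))

def simulate(direct_effect, n=2000, seed=0):
    """Hypothetical data; direct_effect != 0 makes z2 correlated with the error."""
    rng = random.Random(seed)
    z1, z2, x, y = [], [], [], []
    for _ in range(n):
        a, b, v = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        u = 0.8 * v + direct_effect * b + rng.gauss(0, 0.6)
        xi = a + b + v
        z1.append(a); z2.append(b); x.append(xi); y.append(1.0 + 2.0 * xi + u)
    return [z1, z2], x, y

j_valid, df = j_test(*simulate(direct_effect=0.0))
j_invalid, _ = j_test(*simulate(direct_effect=0.5))
p_valid, p_invalid = chi2_sf_df1(j_valid), chi2_sf_df1(j_invalid)
print(f"valid instruments:   J = {j_valid:.2f}, p = {p_valid:.3f}")
print(f"invalid instrument:  J = {j_invalid:.2f}, p = {p_invalid:.3g}")
```

When one instrument affects Y directly, J becomes large and H0 is rejected. With valid instruments a small p-value can still occur by chance about 5% of the time.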
What are the two assumptions for including an instrumental variable?
It should be relevant (there should be a (strong) correlation with the explanatory variable) and exogenous (there should be no correlation with the error term).
Instrumental variables: Which test is used to check the second assumption, that an instrument is exogenous?
The J-statistic. It can only be used when the regression is overidentified, and it tests whether the error term is correlated with the instrumental variables. The null hypothesis is that they are uncorrelated.
What are some of the drawbacks of using IVs?
1. It is difficult to find good instruments that capture the part of the variance of the endogenous variable which is not correlated with the error.
2. The instrument is often not well correlated with the endogenous variable, which is a problem (weak instrument/low relevance).
3. The OLS standard errors from the second stage regression are not right, because they ignore that X(hat) was itself estimated in the first stage (use the ones provided by the software).
UNDERIDENTIFIED
Has fewer instruments than endogenous variables. It cannot be computed.
EXACTLY IDENTIFIED
Has the same number of instruments as endogenous variables. It can now be computed, but the exogeneity of the instruments cannot be tested. Hence, one needs a good story and economic knowledge to argue that the instrument is the right one to use.
OVER-IDENTIFIED
Has more instruments than endogenous variables. It can now be tested whether the instruments are RELEVANT and EXOGENOUS.
Which assumptions does an instrumental variable need to satisfy?
- Correlate with X, but not with the error term.
- Relevance
- Exogeneity
How do we test IV variables for relevance?
- Regress X on all of its instruments (first step, X = b0 + b1*Z)
This is the first stage regression
H0: All instruments have no effect on X
- If rejected, they are relevant
- Look at the F-stat; the rule of thumb is an F-stat over 10
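If the number of instruments is not greater than the number of endogenous X variables, exogeneity cannot be tested (relevance still can)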
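A sketch of the first-stage F-statistic for two instruments, under assumptions that are not in the cards: the data are simulated, the helper names are made up, there are no control variables besides the intercept, and the F-statistic is the homoskedasticity-only version.

```python
import random

def ols_residuals(columns, y):
    """Residuals from OLS of y on an intercept plus the given columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Augmented normal equations (X'X | X'y).
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda row: abs(a[row][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [v - factor * w for v, w in zip(a[r], a[c])]
    coefs = [a[i][k] / a[i][i] for i in range(k)]
    return [yi - sum(b * v for b, v in zip(coefs, row)) for row, yi in zip(rows, y)]

def first_stage_f(instruments, x):
    """F-statistic for H0: all instrument coefficients in the first stage are zero."""
    n, m = len(x), len(instruments)
    resid = ols_residuals(instruments, x)
    ssr_unrestricted = sum(e * e for e in resid)
    mean_x = sum(x) / n
    ssr_restricted = sum((xi - mean_x) ** 2 for xi in x)  # intercept-only model
    return ((ssr_restricted - ssr_unrestricted) / m) / (ssr_unrestricted / (n - m - 1))

def simulate(strength, n=1000, seed=0):
    """Hypothetical first stage: X = strength * (Z1 + Z2) + v."""
    rng = random.Random(seed)
    z1 = [rng.gauss(0, 1) for _ in range(n)]
    z2 = [rng.gauss(0, 1) for _ in range(n)]
    x = [strength * (a + b) + rng.gauss(0, 1) for a, b in zip(z1, z2)]
    return [z1, z2], x

f_strong = first_stage_f(*simulate(strength=0.5))
f_weak = first_stage_f(*simulate(strength=0.01))
print(f"strong instruments: F = {f_strong:.1f}")  # well above 10 in this setup
print(f"weak instruments:   F = {f_weak:.1f}")    # will typically be below 10
```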
Consequences of weak instruments
If the instruments are weak:
- the TSLS estimator will be biased, and
- statistical inferences (standard errors, hypothesis tests, confidence intervals) can be misleading
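A small simulation can illustrate the bias. This is a sketch under assumed, hypothetical settings: the true B1 is 2, there is one instrument, samples have 100 observations, and the strength of the first stage is controlled by a made-up parameter pi. With a single instrument the TSLS estimate equals cov(Z, Y) / cov(Z, X), which is what iv_slope computes.

```python
import random
import statistics

def iv_slope(z, x, y):
    """IV estimate of the slope with one instrument: cov(Z, Y) / cov(Z, X)."""
    n = len(z)
    mean_z, mean_x, mean_y = sum(z) / n, sum(x) / n, sum(y) / n
    szy = sum((zi - mean_z) * (yi - mean_y) for zi, yi in zip(z, y))
    szx = sum((zi - mean_z) * (xi - mean_x) for zi, xi in zip(z, x))
    return szy / szx

def simulate_estimates(pi, reps=500, n=100, beta1=2.0, seed=1):
    """Repeat the IV estimation on many hypothetical samples.
    pi is the first-stage coefficient: X = pi * Z + v."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        z = [rng.gauss(0, 1) for _ in range(n)]
        v = [rng.gauss(0, 1) for _ in range(n)]
        u = [0.8 * vi + rng.gauss(0, 0.6) for vi in v]  # error correlated with X
        x = [pi * zi + vi for zi, vi in zip(z, v)]
        y = [1.0 + beta1 * xi + ui for xi, ui in zip(x, u)]
        estimates.append(iv_slope(z, x, y))
    return estimates

median_strong = statistics.median(simulate_estimates(pi=1.0))
median_weak = statistics.median(simulate_estimates(pi=0.02))
print(f"median estimate, strong instrument: {median_strong:.2f}")
print(f"median estimate, weak instrument:   {median_weak:.2f}")
```

The median is used instead of the mean because weak-instrument estimates occasionally take extreme values. In this setup the weak-instrument estimates are centered far from 2, in the direction of the OLS bias.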