4.IV Assignment Flashcards
What is attenuation bias?
What effect does it have?
How do we solve it?
Classical measurement error.
Regression dilution, also known as regression attenuation, is the biasing of the regression slope towards zero (the underestimation of its absolute value), caused by errors in the independent variable.
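So it is measurement error in the X variable. Measurement error in Y only adds uncertainty (less precision), but measurement error in X biases the slope toward zero.
If the true effect is positive (B > 0), the bias makes us underestimate it. If the true effect is negative (B < 0), the estimate is pulled up toward zero, so it is less negative than the true effect. In both cases the magnitude is understated.
We can use an instrumental variable to remove the bias. This works even if the instrument is itself noisy, as long as its noise is unrelated to the measurement error in X.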
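The attenuation result and the IV fix can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the notes: a true slope of 2, standard normal variables, classical measurement error with the same variance as X (so the OLS slope should shrink by about one half), and a second, independently noisy measurement of X used as the instrument. All names are hypothetical.

```python
import numpy as np


def simulate_attenuation(n=200_000, beta=2.0, seed=0):
    """Return (OLS slope on the noisy X, IV slope using a second noisy measure)."""
    rng = np.random.default_rng(seed)
    x_true = rng.standard_normal(n)                # the regressor we cannot observe
    y = beta * x_true + rng.standard_normal(n)     # outcome depends on the true X
    x_obs = x_true + rng.standard_normal(n)        # observed X with classical error
    z = x_true + rng.standard_normal(n)            # noisy instrument, independent error

    # OLS of y on the mismeasured regressor: Cov(x_obs, y) / Var(x_obs)
    ols = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)
    # Simple IV estimator: Cov(z, y) / Cov(z, x_obs)
    iv = np.cov(z, y)[0, 1] / np.cov(z, x_obs)[0, 1]
    return float(ols), float(iv)


ols_slope, iv_slope = simulate_attenuation()
print(f"OLS slope: {ols_slope:.3f}  (attenuated toward zero, roughly beta / 2)")
print(f"IV slope:  {iv_slope:.3f}  (close to the true beta)")
```

With equal variances for X and its measurement error, the attenuation factor Var(X) / (Var(X) + Var(error)) is 1/2, which is why the OLS slope lands near half the true value in this setup.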
Angrist and Krueger (1991) use an instrumental variables (IV) strategy to overcome the endogeneity problem in their study. What are the key identifying assumptions underlying their empirical strategy?
The IV strategy relies on the assumption that a child’s quarter of birth (QOB) provides a valid instrument for the endogenous variable, namely years of schooling. For this to be the case, the QOB instrument needs to satisfy the following conditions:
- Relevance: The instrument affects the endogenous regressor (i.e. “there is a first-stage”). This means that the instrument cannot be completely uninformative about the endogenous regressor. In this setting, this requires that a child’s QOB needs to affect his/her years of schooling. Corr(X,Z) ≠ 0.
They show that the null hypothesis that the coefficients on the instruments are jointly zero in the first-stage regressions is rejected.
- Exogeneity: The instrument is as good as randomly assigned. In practice, this means that the child’s QOB is not correlated with unobserved child characteristics. In Section III of the paper, the authors consider one important potential source of violation of the exogeneity assumption, namely that children’s QOB may be correlated with their parents’ socioeconomic status. This is a concern since it is well-known in labor economics that parents’ socioeconomic background affects children’s educational attainment and labor market outcomes. However, referring to existing research, they argue that this seems unlikely to be the case.
- Exclusion restriction: The instrument only affects outcomes through its effect on the endogenous variable. The exclusion restriction is tightly linked to the exogeneity assumption. In practice, this assumption requires that a child’s QOB cannot directly affect his/her later earnings or wages.
QOB could directly affect a child’s years of schooling because it affects his/her relative age within the class. However, they argue that there is no clear evidence of such a relative age effect on years of schooling. They also note that the coefficients on the QOB instruments are not jointly statistically significantly different from zero when including them in the structural equation.
What problems arise if the instrument(s) used (m) are weak in a just-identified model (m = k) and in an over-identified model (m > k), respectively?
Just-identified models:
In this case, the IV estimator is approximately unbiased unless the instruments are very weak (the first-stage coefficients are very close to zero). However, the standard errors of the IV estimator may be too small.
Over-identified models: For these models, weak instruments are more problematic because the 2SLS estimator is biased in finite samples, although it is consistent. With weak instruments, 1) the 2SLS estimate is biased toward the OLS estimate, and 2) the 2SLS standard errors may be too small (as with the IV estimator).
In general, weak instruments also cause any small violation of the exclusion restriction to be magnified, leading to more biased and inconsistent estimates.
Which tests should you run if you have an over-identified model and possibly weak instruments?
What assumption do you make then?
Run an over-identification test.
An over-identification test only really makes sense under the assumption of homogeneous treatment effects.
If we instead allow for heterogeneous treatment effects, then a rejection of the null hypothesis in the over-identification test is hard to interpret: it can arise either because at least one instrument is invalid, or because the instruments identify different LATEs.
But even with homogeneous treatment effects, a rejection in the over-identification test only suggests that some of the instruments are invalid. It does not, however, tell you which instrument(s) is/are the problematic one(s).
What effect do weak instruments have on the first-stage F-statistic?
With weak instruments, the first-stage F-statistic varies inversely with the number of instruments.
Adding more weak instruments lowers the F-statistic and thereby increases the bias of the 2SLS estimate.
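The link between the number of weak instruments, the first-stage F-statistic and the 2SLS bias can be checked with a small Monte Carlo. This is a rough sketch, not a result from the notes: the design (one mildly relevant instrument plus pure-noise instruments, a true slope of 1, error correlation of 0.8, no intercept because all variables have mean zero) and all function names are assumptions made for illustration.

```python
import numpy as np


def tsls(y, x, Z):
    """2SLS slope for one endogenous regressor, no intercept."""
    pi_hat, *_ = np.linalg.lstsq(Z, x, rcond=None)   # first stage: x on Z
    x_hat = Z @ pi_hat                               # fitted values of x
    return float(x_hat @ y / (x_hat @ x))            # second stage slope


def first_stage_f(x, Z):
    """F-statistic for the joint null that all first-stage coefficients are zero."""
    n, m = Z.shape
    pi_hat, *_ = np.linalg.lstsq(Z, x, rcond=None)
    fitted = Z @ pi_hat
    resid = x - fitted
    return float((fitted @ fitted / m) / (resid @ resid / (n - m)))


def simulate(m, reps=300, n=500, beta=1.0, pi=0.1, seed=0):
    """Median 2SLS estimate and mean first-stage F with m instruments.

    Only the first instrument is relevant (coefficient pi); the other
    m - 1 instruments are pure noise.
    """
    rng = np.random.default_rng(seed)
    estimates, f_stats = [], []
    for _ in range(reps):
        Z = rng.standard_normal((n, m))
        v = rng.standard_normal(n)
        u = 0.8 * v + 0.6 * rng.standard_normal(n)   # u correlated with v: endogeneity
        x = pi * Z[:, 0] + v
        y = beta * x + u
        estimates.append(tsls(y, x, Z))
        f_stats.append(first_stage_f(x, Z))
    return float(np.median(estimates)), float(np.mean(f_stats))


for m in (1, 20):
    med, f_mean = simulate(m)
    print(f"m = {m:2d}: mean first-stage F = {f_mean:.2f}, median 2SLS = {med:.2f}")
```

In this design, OLS is biased upward because u and v are positively correlated, so a median 2SLS estimate that moves above the true value of 1 as noise instruments are added is a move toward OLS.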
If you have weak instruments, which should you trust more: the Wald estimator or 2SLS?
Given all these problems we have noted with our 2SLS estimates, it seems safe to say that we should trust the simple Wald estimate in Exercise 2 more than the 2SLS estimates.
In the general context of instrumental variables, explain the concepts of compliers, never-takers and always-takers.
• Compliers: Individuals who only complete more years of schooling because they were born later in the year, i.e.
Di(1) = 1 and Di(0) = 0.
• Always-takers: Individuals who complete more years of schooling regardless of their QOB, i.e.
Di(1) = 1 and Di(0) = 1.
• Never-takers: Individuals who complete fewer years of schooling regardless of their QOB, i.e.
Di(1) = 0 and Di(0) = 0.
As we can see, compliers are the only ones who respond to the instrument.
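The three definitions above can be written as a small lookup on the pair of potential treatments. This is an illustrative sketch with a hypothetical function name; the fourth combination (defiers) is not in the list above and is included only because it is the case that the monotonicity assumption rules out.

```python
def compliance_type(d1, d0):
    """Classify an individual from potential treatments D_i(1) and D_i(0)."""
    if d1 == 1 and d0 == 0:
        return "complier"        # treated only when the instrument is switched on
    if d1 == 1 and d0 == 1:
        return "always-taker"    # treated regardless of the instrument
    if d1 == 0 and d0 == 0:
        return "never-taker"     # untreated regardless of the instrument
    return "defier"              # D_i(1) = 0, D_i(0) = 1: excluded by monotonicity


for d1, d0 in [(1, 0), (1, 1), (0, 0), (0, 1)]:
    print(f"D(1) = {d1}, D(0) = {d0}: {compliance_type(d1, d0)}")
```

In real data only one of D_i(1) and D_i(0) is observed for each individual, so this classification cannot be applied person by person; only the group shares can be recovered.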
Suppose education affects individuals differently. Which effect does IV then identify in this study?
In this case, the IV/2SLS estimator identifies a LATE. We can then reason as follows:
• If every individual is a complier, then the LATE coincides with the ATE, and our IV/2SLS estimates should be interpreted as ATEs.
• If there are no always-takers, then by the “Bloom result”, we know that our 2SLS/IV LATE estimates coincide with the ATT.
• If each individual’s decision to pursue further education is not influenced by returns to education, then we know that the LATE coincides with the ATT.
• In other cases (i.e. when there are both never- and always-takers and schooling decisions are affected by the returns to education), our 2SLS/IV estimates should be interpreted only as LATEs.
Say we have a binary treatment variable D that equals 1 if a person has completed at least 12 years of schooling.
Z is an instrument that indicates whether a person was born late in the year or not.
We know that Pr[Di = 1] = 0.77, i.e. 77% of the people in our sample have completed at least 12 years of schooling.
We also know that Pr[Zi = 1] = 0.75, i.e. 75% were born late in the year (75% are in the treatment group).
Given this information, how do we compute the shares of compliers, always-takers and never-takers?
Given the monotonicity assumption (the instrument moves treatment in one direction only: it can push someone into treatment but never out of it), the share of compliers is given by the regression coefficient of D on Z (i.e. the effect of Z on D), say B = 0.012 = 1.2%.
Given monotonicity, those with D = 1 are either compliers or always-takers.
We first compute the share of compliers among the treated as Pr[Z = 1] / Pr[D = 1] x B.
In our example this is 0.75 / 0.77 x 0.012 = 0.0117, i.e. 1.17%.
Then we take Pr[D = 1] x (1 - 0.0117) = 0.761, so the share of always-takers is 76.1%.
The share of never-takers is what remains, i.e. 100% - 1.2% - 76.1% = approx. 22.7%.
So in this example the share of compliers is very small.
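The arithmetic in this example can be reproduced in a few lines. The function and argument names are hypothetical; the inputs are the three numbers given above (Pr[D = 1] = 0.77, Pr[Z = 1] = 0.75 and a first-stage coefficient of 0.012), and the calculation relies on monotonicity exactly as the text does.

```python
def compliance_shares(p_d, p_z, first_stage):
    """Shares of compliers, always-takers and never-takers under monotonicity.

    p_d         -- Pr[D = 1], share treated in the sample
    p_z         -- Pr[Z = 1], share with the instrument switched on
    first_stage -- coefficient from regressing D on Z (share of compliers)
    """
    compliers = first_stage
    # Share of the treated who are compliers: Pr[Z = 1] / Pr[D = 1] * first stage
    compliers_among_treated = p_z / p_d * first_stage
    # The remaining treated individuals must be always-takers
    always_takers = p_d * (1 - compliers_among_treated)
    # Never-takers are whatever is left of the population
    never_takers = 1 - compliers - always_takers
    return compliers, always_takers, never_takers


c, a, n = compliance_shares(p_d=0.77, p_z=0.75, first_stage=0.012)
print(f"compliers: {c:.1%}, always-takers: {a:.1%}, never-takers: {n:.1%}")
```

The always-taker step is equivalent to Pr[D = 1] - Pr[Z = 1] x B = 0.77 - 0.009, which is a quick way to check the result by hand.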
What is the ATE and how should it be interpreted?
ATE = average treatment effect = the causal effect of the treatment, E[Y1] - E[Y0].
Unless you have a very good RCT with perfect control, you cannot recover this. But it is the quantity you want to estimate.
Y1 = the outcome with treatment for an individual
Y0 = the outcome without treatment for an individual
How do we interpret E[X|Z = 1] - E[X|Z = 0]?
It is the effect of Z on X (the first stage). With a binary X and under monotonicity, it is interpreted as the probability that someone is a complier, i.e. the share of compliers.
How do we interpret (E[Y|Z = 1] - E[Y|Z = 0]) / (E[X|Z = 1] - E[X|Z = 0])?
This is the local average treatment effect (LATE): the effect of Z on Y scaled by the share of compliers. If the share of compliers is 100%, LATE and ATE are the same thing.
According to Angrist and co-authors, “LATE is the effect of the treatment on the population of compliers”.
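The ratio above (the Wald estimator) can be computed directly from group means. The sketch below uses a tiny made-up dataset, not data from the study: eight observations where the outcome is constructed as Y = 1 + 2X, so the ratio should recover a treatment effect of 2. The function names are hypothetical.

```python
def mean(values):
    """Plain arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def wald_estimate(y, x, z):
    """(E[Y|Z=1] - E[Y|Z=0]) / (E[X|Z=1] - E[X|Z=0]) for a binary instrument z."""
    y1 = [yi for yi, zi in zip(y, z) if zi == 1]
    y0 = [yi for yi, zi in zip(y, z) if zi == 0]
    x1 = [xi for xi, zi in zip(x, z) if zi == 1]
    x0 = [xi for xi, zi in zip(x, z) if zi == 0]
    reduced_form = mean(y1) - mean(y0)   # effect of Z on Y
    first_stage = mean(x1) - mean(x0)    # effect of Z on X (share of compliers)
    return reduced_form / first_stage


# Made-up illustration data: treatment rate is 75% when z = 1 and 25% when z = 0
z = [1, 1, 1, 1, 0, 0, 0, 0]
x = [1, 1, 1, 0, 1, 0, 0, 0]
y = [1 + 2 * xi for xi in x]

print(wald_estimate(y, x, z))  # 2.0
```

Here the first stage is 0.75 - 0.25 = 0.5 and the reduced form is 2.5 - 1.5 = 1.0, so the ratio is 2.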
What is one way to support the claim that Cov(u, Z) = 0?
The condition cannot be tested directly, but if you run 2SLS and add controls, the estimate should in principle not change, only the precision.
What are the assumptions behind LATE?
independence (Z is independent of all potential outcomes),
exclusion restriction (Cov[Zi, εi] = 0),
existence of a first stage (Cov(X, Z) ≠ 0),
and monotonicity.
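If we have an instrument so weak that the first-stage F = 0, what does that mean?
2SLS = OLS: the instruments then explain none of the variation in X in the sample, and the 2SLS estimate has (approximately) the same bias as OLS.
This is why 2SLS is said to be biased toward OLS: the larger the bias (i.e. the weaker the first stage), the closer the two estimates are.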
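A commonly cited textbook approximation, which is an addition here and not stated in these notes, puts the 2SLS bias at roughly the OLS bias multiplied by 1 / (F + 1), where F is the population first-stage F-statistic. Under that assumption, the relationship described above can be tabulated as follows (the function name is hypothetical):

```python
def relative_2sls_bias(f_stat):
    """Approximate 2SLS bias as a fraction of the OLS bias.

    Assumes the approximation bias(2SLS) ~ bias(OLS) / (F + 1); this is a
    rule of thumb, not an exact finite-sample result.
    """
    if f_stat < 0:
        raise ValueError("an F-statistic cannot be negative")
    return 1.0 / (f_stat + 1.0)


for f in (0, 1, 9, 99):
    print(f"F = {f:3d}: 2SLS bias is about {relative_2sls_bias(f):.0%} of the OLS bias")
```

At F = 0 the fraction is 100%, which matches the statement that 2SLS then behaves like OLS, and it shrinks toward zero as the first stage gets stronger.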