Misspecification and Model Selection Flashcards
3 types of misspecification
Omitted relevant variables (underfitting)
Inclusion of irrelevant variables (overfitting)
Incorrect functional form
Omitting relevant variables:
In a misspecified model, what happens to our estimate of β₁?
β~₁, which is β₁ plus a bias term
What is the final expectation of β~₁
E(β~₁) = β₁+β₂ x Cov(Xi,Zi)/Var(Xi) ≠ β₁
Zi is the relevant variable that has been omitted.
This ≠ β₁ in general, since Zi is omitted: the model is misspecified and OLS is biased!
So OLS is biased in this underspecified model:
Unless.. (2)
Cov(Xi,Zi)=0 (Zi unrelated to Xi)
β₂=0 i.e Zi is not actually relevant!
How do we know the sign of bias, i.e know if we are overestimating or underestimating β₁?
If covariance and β₂ have same signs i.e both > 0 or both <0 , we get positive bias i.e overestimating β₁.
if they have opposite signs i.e Cov>0 but β₂<0, or Cov<0 and β₂>0 , we get negative bias, underestimate β₁
(and of course if either =0 no bias! as mentioned in previous FC)
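The bias formula and its sign can be checked with a small simulation (a sketch: the values b1 = 2, b2 = 0.5 and the 0.8 loading of X on Z are illustrative assumptions, chosen so Cov > 0 and β₂ > 0 give positive bias):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# True model: y = b0 + b1*x + b2*z + eps, with x and z positively correlated
b0, b1, b2 = 1.0, 2.0, 0.5
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(size=n)           # Cov(x, z) > 0
y = b0 + b1 * x + b2 * z + rng.normal(size=n)

# Misspecified OLS of y on x alone: slope = Cov(x, y) / Var(x)
b1_tilde = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Omitted-variable-bias formula: b1 + b2 * Cov(x, z) / Var(x)
predicted = b1 + b2 * np.cov(x, z)[0, 1] / np.var(x, ddof=1)

print(b1_tilde, predicted)   # both ≈ 2.24 > 2.0: positive bias, since Cov > 0 and b2 > 0
```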
Suppose we estimate wage on extra year of education.
True model is
ln(wagei) = β₀+ β₁ Educi + β₂ Abilityi + εi
But of course ability is hard to measure. So we omit it.
What is our β~₁ (use formula)
β~₁=β₁+β₂ x Cov(Educ,Ability)/Var(Educ)
using this, would our estimate of β₁ be positive (overestimated) or negative (underestimated) bias?
Since we expect Cov(Educ,Ability) > 0 (education and ability positively correlated), and β₂ to be > 0…
β~₁=β₁+β₂ x Cov(Educ,Ability)/Var(Educ) > β₁
Positively biased! Overestimated
So β₁ is likely upward biased (the estimated return to education is bigger than its true value)
How can we proceed from here (3)
Measure ability (HARD!)
Experiment: give random amount of education to people
Quasi experiments i.e replicate in natural settings
Detecting an omitted relevant variable
Consider true model again
Yi = β₀ + β₁ Xi + β₂ Zi + εi
But we estimate the misspecified model
Yi = β₀ + β₁ Xi + vi
How to test? Natural suggestion would be to specify
vi = γ₀ + γ₁Zi + εi (Z is contained in the error term v)
And test whether γ₁=0 (if not, Z is relevant!)
Problems with this suggestion (2)
vi is not observed.
Eval: could take residuals from the misspecified model as an estimate of vi (v^i)
We dont know what Z is (otherwise would’ve included it in our model i.e the true model)
So there is no good test for omitted relevant variables: instead, use economic theory and intuition to think about which variables might be omitted and what bias they would bring to the parameters
2nd misspecification: Including irrelevant variables
Consider model:
Yi = β0 + β1 Xi + β2 Ii + εi
Where I is irrelevant. What does it mean for the coefficient β₂
The true population coefficient β₂ is 0
So I is irrelevant, and so we should get β^₂ ≈ 0 (E(β^₂)=0)
What happens for our estimate of β₁ , and why?
Nothing - still unbiased.
The true model is just a restricted version of the estimated model (since β₂=0), so including I does not harm unbiasedness!
So under classical assumptions: what does this mean for E(β^j)?
E(β^j) = βj for all values
e.g
E(βˆ₀) = β₀, E(βˆ₁) = β₁, E(βˆ₂) = β₂ = 0
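A quick Monte Carlo sketch of this result (the true coefficients, sample size, and the correlation between X and the irrelevant regressor are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 500, 2000
b0, b1 = 1.0, 2.0        # true model: y = b0 + b1*x + eps, so beta2 = 0

b1_hats, b2_hats = [], []
for _ in range(reps):
    x = rng.normal(size=n)
    irrel = 0.5 * x + rng.normal(size=n)   # irrelevant, though correlated with x
    y = b0 + b1 * x + rng.normal(size=n)   # y does not depend on irrel
    D = np.column_stack([np.ones(n), x, irrel])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    b1_hats.append(coef[1])
    b2_hats.append(coef[2])

print(np.mean(b1_hats), np.mean(b2_hats))  # ≈ 2.0 and ≈ 0.0: no bias
```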
So bias is not an issue for inclusion of irrelevant variables: but what is?
bias-variance tradeoff
Bias variance tradeoff captured: First part: Bias
True model
Yi = β0 + β1 Xi + β2 Zi + εi
where β₂ may = 0
What are 2 estimates of β₁?
β^₁ from: Y^i = β^₀ + β^₁Xi + β^₂Zi
β~₁ from: Y~i = β~₀+β~₁Xi
Recall omitted relevant variable bias: if β₂≠0 and Cov(Xi,Zi)≠0, β~₁ is biased, while β^₁ isn't.
2nd part of Bias-Variance tradeoff:
Variance of the 2 estimators
B) which has a lower variance (unless)…
Var(β~₁) = σ²/Σ(Xi - Xbar)²
Var(β^₁) = σ²/[(1-R²zx)Σ(Xi - Xbar)²]
β~₁ has the lower variance, unless R²zx=0 (Xi & Zi uncorrelated, in which case the two are equal)
So β^₁ is better for unbiasedness (unless β₂=0 or Cov=0, in which case β~₁ is also unbiased), while β~₁ is better for variance (unless R²zx=0)
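The variance side of the tradeoff can also be simulated; in this illustrative setup (β₂=0, the 0.9 loading of Z on X gives R²zx ≈ 0.45), the ratio of the two estimator variances should come out near 1/(1-R²zx) ≈ 1.8:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 300, 4000
tilde, hat = [], []

for _ in range(reps):
    x = rng.normal(size=n)
    z = 0.9 * x + rng.normal(size=n)       # correlated with x, but beta2 = 0
    y = 1.0 + 2.0 * x + rng.normal(size=n)

    # short regression (omit z): beta1_tilde = Cov(x, y) / Var(x)
    tilde.append(np.cov(x, y)[0, 1] / np.var(x, ddof=1))

    # long regression (include the irrelevant z): beta1_hat
    D = np.column_stack([np.ones(n), x, z])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    hat.append(coef[1])

ratio = np.var(hat) / np.var(tilde)
print(ratio)   # ≈ 1 / (1 - R²zx) ≈ 1.8 for this setup
```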
Bias variance trade off summary
When β₂≠0:
β^₁ is unbiased
β~₁ is biased
But var(β~₁) < var(β^₁), so there is a genuine tradeoff
When β₂=0:
Both are unbiased
var(β~₁) < var(β^₁) (again, so β~₁ is clearly preferred, i.e. not including the irrelevant variable is better, since it estimates the effect of Xi on Y more precisely)
So using this… when does the bias variance tradeoff exist
When β₂≠0
What estimator is preferred in large samples
β^₁, as it is unbiased unconditionally, and in large samples the variance disadvantage fades since variance decreases with sample size (N)
3rd misspecification: functional form misspecification
when we do not properly account for the shape of the relationship between dependent and independent variables.
example: return to experience in a wage equation
ln(wagei) = β₀+ β₁ agei + νi
But true model includes a squared term: ln(wagei)=β₀+β₁agei+β₂age²i +εi
To capture diminishing marginal returns (DMR)! (so the shape is quadratic, not linear)
so it is a type of omitted relevant variables issue (not adding age²)
thus possibly introducing bias (positive bias if Cov and β₂ share same signs, negative for opposite)
2 tests for functional form misspecification
Ramsey RESET (REgression specification error test)
Davidson-MacKinnon test (test for non-nested alternatives)
Ramsey reset:
Assume general model
Yi =β₀ +β₁ X₁i +…+βk Xki +εi
We want to test whether to include any second-order terms (squared variables, or interaction terms e.g. X x Z)
We could include them all in the model, but why don’t we?
A lot of variables = lots of parameters (k) to estimate: high k loses degrees of freedom
So how does Ramsey RESET work
(And include hypothesis test)
Takes fitted values (Y^i)
Y^i =β^₀ +β^₁ X₁i +…+β^k Xki
Then take polynomial terms of these as additional regressors
So for a second order (testing the squared variables)
Yi =β₀ +β₁ X₁i +…+βk Xki + δ₁Y^²i + ui
H₀:δ₁=0 (no functional form problem)
H₁:δ₁≠0 (functional form problem i.e need to include squared terms)
This process can be extended to higher orders
Up to 4th order (variable⁴)
What would the hypotheses be for a 4th order misspecification, and what test statistic formula
H0 :δ1 =δ2 =δ3 =0
H1: δj ≠ 0 for at least one j
Since a joint significance test, use F test
F = [(RSSr - RSSu)/q] / [RSSu/(n - (k+1))]
q = 3, the number of restrictions (one per equality in H₀)
k is the number of parameters in the new unrestricted model with the polynomial terms (δ₁Y^², etc.) as added regressors
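The RESET procedure can be sketched in a few lines of numpy (shown here for a simple regression with one regressor; the data-generating processes are illustrative assumptions, and for real work statsmodels provides a ready-made `linear_reset` diagnostic):

```python
import numpy as np

def reset_test(y, x, order=2):
    """Ramsey RESET for a regression of y on x: re-fit with powers of the
    fitted values (yhat^2 ... yhat^order) added, then F-test them jointly."""
    n = len(y)
    Xr = np.column_stack([np.ones(n), x])            # restricted model
    br, *_ = np.linalg.lstsq(Xr, y, rcond=None)
    yhat = Xr @ br
    rss_r = np.sum((y - yhat) ** 2)

    powers = np.column_stack([yhat ** p for p in range(2, order + 1)])
    Xu = np.column_stack([Xr, powers])               # unrestricted model
    bu, *_ = np.linalg.lstsq(Xu, y, rcond=None)
    rss_u = np.sum((y - Xu @ bu) ** 2)

    q = order - 1                                    # number of restrictions
    k = Xu.shape[1] - 1                              # slopes in unrestricted model
    return ((rss_r - rss_u) / q) / (rss_u / (n - (k + 1)))

rng = np.random.default_rng(3)
x = rng.normal(size=500)
y_lin = 1 + 2 * x + rng.normal(size=500)                 # correctly specified
y_quad = 1 + 2 * x + 1.5 * x**2 + rng.normal(size=500)   # x² omitted by the fit

print(reset_test(y_lin, x))    # small F: no sign of misspecification
print(reset_test(y_quad, x))   # large F: functional form problem detected
```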
In Ramsey RESET, the restricted model is nested within the unrestricted one (we just say some parameters jointly equal zero)
What if the models were non-nested? e.g. test
Yi =β₀ +β₁ X₁i +…+βk Xki +ui
Against
Yi =β₀+β₁ ln(X₁i)+…+βk ln(Xki)+vi
How does the Davidson-MacKinnon test work, and what is the test statistic?
Obtain 2 fitted values for both equations
Yˆi =βˆ₀ +βˆ₁ X₁i +…+βˆk Xki
Yˇi =βˇ₀ + βˇ₁ ln(X₁i)+…+βˇk ln(Xki)
Then estimate the 2 models to get
Yi =β₀ +β₁ X₁i +…+βk Xki +θ₁ Yˇi + ui
Yi =β₀ +β₁ ln(X₁i)+…+βk ln(Xki)+θ₂ Yˆi +vi
(So the hat signs flip, add Y^i to log model, add Yˇi to linear model)
Then test using t-tests to see whether θ₁ and θ₂ are different from zero
If θ₁ ≠ 0: evidence against the model-in-levels (we shouldn't use the model without logs)
If θ₂ ≠ 0: evidence against the model-in-logarithms (we shouldn't use the logarithm-based model)
Potential problem: a clear winner may not emerge, since both or neither may get rejected!
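A sketch of the cross-fitting procedure above, assuming the true model is in logs so the t-tests pick the right winner (the data, coefficients, and `ols_fit` helper are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
x = rng.uniform(1, 10, size=n)
y = 1 + 2 * np.log(x) + rng.normal(size=n)    # true model is in logs (assumption)

def ols_fit(y, cols):
    """OLS with intercept; returns coefficients, fitted values, t-stats."""
    D = np.column_stack([np.ones(len(y))] + cols)
    b, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ b
    s2 = resid @ resid / (len(y) - D.shape[1])           # error variance
    se = np.sqrt(s2 * np.diag(np.linalg.inv(D.T @ D)))   # classical s.e.
    return b, D @ b, b / se

# Step 1: fitted values from each candidate model
_, yhat_lev, _ = ols_fit(y, [x])              # model in levels
_, yhat_log, _ = ols_fit(y, [np.log(x)])      # model in logarithms

# Step 2: cross-add the rival model's fitted values and t-test theta
_, _, t_lev = ols_fit(y, [x, yhat_log])           # theta1 = coef on yhat_log
_, _, t_log = ols_fit(y, [np.log(x), yhat_lev])   # theta2 = coef on yhat_lev

# theta1 significant -> reject levels; theta2 insignificant -> keep logs
print(t_lev[2], t_log[2])
```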
What can we do if neither model is rejected?
Use additional tools such as adjusted R²
What if both are rejected (neither model is good)
There may be a third functional form specification we have not considered