Week 4 Flashcards
How can we predict the value of y* with the model y* = X* beta + u*?
If we want to estimate E(y*) = X*beta, then by the Gauss-Markov theorem y(hat)* = X*beta(hat) is the best linear unbiased estimator.
If we want to know y* itself, we should first note that we cannot estimate it, because it is a random variable; we can only predict it.
It turns out that A'y = X*(X'X)^-1X'y = X*beta(hat) is also the best linear unbiased predictor (derived below).
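A minimal numpy sketch (all data simulated; names are illustrative, not from the course) of this point prediction:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 50, 3, 5                      # training size, regressors, new points
X = rng.normal(size=(n, k))             # observed design matrix
beta = np.array([1.0, -2.0, 0.5])       # "true" coefficients (simulation only)
y = X @ beta + rng.normal(size=n)       # observed responses
X_star = rng.normal(size=(m, k))        # new design points

# OLS estimate and point prediction y(hat)* = X* beta(hat)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_star_hat = X_star @ beta_hat
print(y_star_hat)
```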
How is the predictor of y* derived?
We define the prediction error y(hat)* - y* = X*(beta(hat) - beta) - u* = X*(X’X)^-1 X’u - u*.
Which has mean zero and variance:
Var(y(hat)* - y*) = sigma^2 (X*(X'X)^-1 X*' + I_m).
If A'y is another linear predictor of y*, its prediction error is:
A'y - y* = (A'X - X*)beta + A'u - u*.
If we define
D = A - X(X'X)^-1X*'
then A'y - y* = D'X beta + (D' + X*(X'X)^-1X')u - u*.
A predictor is unbiased if its prediction error has mean zero for all beta, which holds only when D'X = 0. Under this restriction the prediction error variance is:
Var(A'y - y*) = sigma^2 (D'D + X*(X'X)^-1X*' + I_m),
which is minimized when D'D = 0, thus when D = 0, thus A = X(X'X)^-1X*'. Therefore:
A'y = X*(X'X)^-1X'y = X*beta(hat) is the best linear unbiased predictor (BLUP).
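A Monte Carlo sketch (simulated data; names are illustrative) checking the prediction-error variance formula above:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, m, sigma, reps = 40, 3, 2, 1.5, 20_000
X = rng.normal(size=(n, k))
X_star = rng.normal(size=(m, k))
beta = np.array([1.0, 0.5, -1.0])
XtX_inv = np.linalg.inv(X.T @ X)

errors = np.empty((reps, m))
for r in range(reps):
    u = sigma * rng.normal(size=n)
    u_star = sigma * rng.normal(size=m)
    y = X @ beta + u
    beta_hat = XtX_inv @ X.T @ y
    errors[r] = X_star @ beta_hat - (X_star @ beta + u_star)   # y(hat)* - y*

print(np.cov(errors, rowvar=False))                            # simulated variance
print(sigma**2 * (X_star @ XtX_inv @ X_star.T + np.eye(m)))    # theoretical
```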
What are the restricted and unrestricted models?
We can consider two models, the unrestricted model:
y = X1beta1 + X2beta2 + u
And the restricted model:
y = X1beta1 + u1
How are the estimators of beta defined in the restricted and unrestricted models?
Restricted: beta(hat)1r = (X1'X1)^-1X1'y
Unrestricted: beta(hat)u = (beta(hat)1u', beta(hat)2u')' = (X'X)^-1X'y, with
beta(hat)1u = beta(hat)1r - (X1'X1)^-1X1'X2 beta(hat)2u and beta(hat)2u = (X2'M1X2)^-1X2'M1y,
where M1 = I_n - X1(X1'X1)^-1X1' is the residual maker of X1.
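A minimal numpy sketch (simulated data; names are illustrative) verifying this partitioned decomposition (the Frisch-Waugh-Lovell result):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k1, k2 = 60, 2, 2
X1 = rng.normal(size=(n, k1))
X2 = rng.normal(size=(n, k2))
y = X1 @ np.ones(k1) + X2 @ np.ones(k2) + rng.normal(size=n)

X = np.hstack([X1, X2])
beta_u = np.linalg.solve(X.T @ X, X.T @ y)            # unrestricted OLS
beta_1r = np.linalg.solve(X1.T @ X1, X1.T @ y)        # restricted OLS

M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)  # residual maker of X1
beta_2u = np.linalg.solve(X2.T @ M1 @ X2, X2.T @ M1 @ y)
beta_1u = beta_1r - np.linalg.solve(X1.T @ X1, X1.T @ X2 @ beta_2u)

print(np.allclose(beta_u, np.concatenate([beta_1u, beta_2u])))  # True
```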
What are the expectation and variance of beta(hat)1r in the restricted model?
E(beta(hat)1r) = beta1 + (X1'X1)^-1X1'X2 beta2
Var(beta(hat)1r) = sigma^2(X1'X1)^-1
Thus beta(hat)1r is biased, unless beta2 = 0 or X1'X2 = 0.
What are the expectations and variances of the betas in the unrestricted model?
E(beta(hat)1u) = beta1,
E(beta(hat)2u) = beta2
Var(beta(hat)1u) = sigma^2((X1'X1)^-1 + delta),
Var(beta(hat)2u) = sigma^2(X2'M1X2)^-1
with delta = (X1'X1)^-1X1'X2(X2'M1X2)^-1X2'X1(X1'X1)^-1
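A quick numeric check (simulated X; names are illustrative) that these variances are exactly the diagonal blocks of sigma^2(X'X)^-1:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k1, k2 = 30, 2, 2
X1 = rng.normal(size=(n, k1))
X2 = rng.normal(size=(n, k2))
X = np.hstack([X1, X2])

XtX_inv = np.linalg.inv(X.T @ X)
X1tX1_inv = np.linalg.inv(X1.T @ X1)
M1 = np.eye(n) - X1 @ X1tX1_inv @ X1.T

delta = X1tX1_inv @ X1.T @ X2 @ np.linalg.inv(X2.T @ M1 @ X2) @ X2.T @ X1 @ X1tX1_inv

print(np.allclose(XtX_inv[:k1, :k1], X1tX1_inv + delta))              # True
print(np.allclose(XtX_inv[k1:, k1:], np.linalg.inv(X2.T @ M1 @ X2)))  # True
```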
Are the beta(hat)'s in the restricted and unrestricted models correlated? Answer for every possible combination.
- beta(hat)1r and beta(hat)1u are always correlated
- beta(hat)1u and beta(hat)2u are uncorrelated only when X1'X2 = 0
- beta(hat)1r and beta(hat)2u are always uncorrelated (because M1X1 = 0, so Cov(X1'y, X2'M1y) = sigma^2 X1'M1X2 = 0)
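A Monte Carlo sketch (simulated data; names are illustrative) of these three correlations, with X1'X2 != 0 by construction:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 40
x1 = rng.normal(size=(n, 1))
x2 = x1 + rng.normal(size=(n, 1))        # correlated regressors, X1'X2 != 0
X = np.hstack([x1, x2])

reps = 20_000
b1r = np.empty(reps); b1u = np.empty(reps); b2u = np.empty(reps)
for r in range(reps):
    y = x1[:, 0] + x2[:, 0] + rng.normal(size=n)
    b1r[r] = np.linalg.solve(x1.T @ x1, x1.T @ y)[0]   # restricted
    b1u[r], b2u[r] = np.linalg.solve(X.T @ X, X.T @ y)  # unrestricted

print(np.corrcoef(b1r, b1u)[0, 1])   # nonzero
print(np.corrcoef(b1u, b2u)[0, 1])   # nonzero here, since X1'X2 != 0
print(np.corrcoef(b1r, b2u)[0, 1])   # ~ 0
```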
When should we include beta2?
If |beta2| is large, or sigma^2/(x2'M1x2) (the variance of beta(hat)2u) is small, or both: in other words, when beta2 is large relative to the standard deviation of its estimator.
What is important about large models?
We should generally not build large models. Even when the large model is the "true" one, so that we expect little bias, we then have to estimate many parameters, which gives large standard errors.
Thus by removing a small, unimportant parameter we may get slightly more bias but a much more precise estimate.
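A small Monte Carlo sketch (simulated data; names are illustrative, and beta2 is deliberately small) of this bias-precision trade-off:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 30
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)     # correlated with x1
X = np.column_stack([x1, x2])
beta1, beta2 = 1.0, 0.1                # beta2 small and "unimportant"

reps = 20_000
b1r = np.empty(reps); b1u = np.empty(reps)
for r in range(reps):
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    b1r[r] = (x1 @ y) / (x1 @ x1)                     # restricted (omit x2)
    b1u[r] = np.linalg.solve(X.T @ X, X.T @ y)[0]     # unrestricted

for name, b in [("restricted", b1r), ("unrestricted", b1u)]:
    print(name, "bias:", b.mean() - beta1, "MSE:", ((b - beta1) ** 2).mean())
```

Typically the restricted estimator shows a small bias but a lower mean squared error, which is the point of the card.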
What is meant by a balanced addition?
You must make sure to include variables that are required together; adding only part of such a group can worsen your estimates rather than improve them.
What is the distribution of beta(hat)?
beta(hat) ~ N(beta, sigma^2(X'X)^-1),
and for linear combinations R beta(hat):
R beta(hat) = R(X'X)^-1X'y ~ N(R beta, sigma^2 R(X'X)^-1R')
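A simulation sketch (illustrative R and data, not from the course) checking the mean and variance of R beta(hat):

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, sigma = 25, 3, 2.0
X = rng.normal(size=(n, k))
beta = np.array([1.0, -1.0, 0.5])
R = np.array([[1.0, -1.0, 0.0]])       # linear combination beta1 - beta2
XtX_inv = np.linalg.inv(X.T @ X)

reps = 40_000
draws = np.empty(reps)
for r in range(reps):
    y = X @ beta + sigma * rng.normal(size=n)
    draws[r] = (R @ XtX_inv @ X.T @ y)[0]

print(draws.mean(), (R @ beta)[0])                        # mean ~ R beta
print(draws.var(), (sigma**2 * R @ XtX_inv @ R.T)[0, 0])  # variance matches
```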
Show the expectation and variance of e (residuals).
E(e) = E(y - X beta(hat)) = E(y) - E(X beta(hat)) = X beta - X beta = 0
Var(e) = Var(Mu) = M Var(u) M' = M sigma^2 I_n M' = sigma^2 M (using M' = M and MM = M)
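A quick numpy check (simulated X; names are illustrative) of the properties of M used in this step:

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 20, 3
X = rng.normal(size=(n, k))
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

print(np.allclose(M, M.T))        # symmetric: M' = M
print(np.allclose(M @ M, M))      # idempotent, so M I_n M' = M
print(np.allclose(M @ X, 0))      # MX = 0
```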
How is the distribution of s^2 determined?
First we know that u ~ N(0, sigma^2 I_n), so v = u/sigma ~ N(0, I_n). Since M is symmetric and idempotent with rank r(M) = n - k:
e'e/sigma^2 = v'Mv ~ chi^2(n-k).
As s^2 = e'e/(n-k), we have:
(n-k)s^2/sigma^2 ~ chi^2(n-k).
Thus E((n-k)s^2/sigma^2) = n - k and Var((n-k)s^2/sigma^2) = 2(n-k), so
E(s^2) = sigma^2 and Var(s^2) = 2 sigma^4/(n-k).
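A Monte Carlo sketch (simulated errors; names are illustrative) of these two moments of s^2:

```python
import numpy as np

rng = np.random.default_rng(8)
n, k, sigma = 20, 4, 1.5
X = rng.normal(size=(n, k))
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

reps = 50_000
s2 = np.empty(reps)
for r in range(reps):
    u = sigma * rng.normal(size=n)
    e = M @ u                       # residuals e = Mu
    s2[r] = (e @ e) / (n - k)

print(s2.mean(), sigma**2)                       # E(s^2) = sigma^2
print(s2.var(), 2 * sigma**4 / (n - k))          # Var(s^2) = 2 sigma^4/(n-k)
```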
What is M?
M = I_n - X(X'X)^-1X', the matrix in e = Mu, where e is the residual and u the actual error term. M is symmetric and idempotent, and MX = 0.
Show that beta(hat) and s^2 are independent.
Let v = u/sigma, so v ~ N(0, I_n).
We write: beta(hat) - beta = (X'X)^-1X'u = sigma(X'X)^-1X'v, and
e'e = u'Mu = sigma^2 v'Mv. Setting L = (X'X)^-1X', we find LM = 0 because MX = 0.
Since v is standard normal, Lv and Mv are jointly normal and uncorrelated (Cov(Lv, Mv) = LM = 0), hence independent; and v'Mv = (Mv)'(Mv) is a function of Mv alone. Thus beta(hat) = beta + sigma Lv and e'e = sigma^2 v'Mv are independent.
From this it follows that beta(hat) and s^2 are independent.
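A Monte Carlo sketch (simulated data; names are illustrative) consistent with this independence:

```python
import numpy as np

rng = np.random.default_rng(9)
n, k, sigma = 20, 2, 1.0
X = rng.normal(size=(n, k))
XtX_inv = np.linalg.inv(X.T @ X)
M = np.eye(n) - X @ XtX_inv @ X.T
beta = np.array([1.0, -1.0])

reps = 50_000
b1 = np.empty(reps); s2 = np.empty(reps)
for r in range(reps):
    y = X @ beta + sigma * rng.normal(size=n)
    b1[r] = (XtX_inv @ X.T @ y)[0]    # first element of beta(hat)
    e = M @ y                         # residuals
    s2[r] = (e @ e) / (n - k)

# Independence implies zero correlation between functions of beta(hat) and s^2
print(np.corrcoef(b1, s2)[0, 1])      # ~ 0
print(np.corrcoef(b1**2, s2)[0, 1])   # ~ 0 as well
```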