Week 2 Flashcards
What is the multivariate MSE?
MSE(θ̂) = Var(θ̂) + bias(θ̂) bias(θ̂)’, where bias(θ̂) = E(θ̂) - θ
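A short derivation sketch of why the decomposition holds. It assumes the usual definition MSE(θ̂) = E[(θ̂ - θ)(θ̂ - θ)’], which the card itself does not state, and writes b for the bias vector (a symbol introduced here for brevity).

```latex
\begin{aligned}
\mathrm{MSE}(\hat\theta)
  &= E\big[(\hat\theta-\theta)(\hat\theta-\theta)'\big] \\
  &= E\big[(\hat\theta-E\hat\theta+b)(\hat\theta-E\hat\theta+b)'\big],
     \qquad b = E\hat\theta-\theta \\
  &= E\big[(\hat\theta-E\hat\theta)(\hat\theta-E\hat\theta)'\big] + b\,b' \\
  &= \mathrm{Var}(\hat\theta) + b\,b'
\end{aligned}
```

The cross terms drop out because b is non-random and E(θ̂ - Eθ̂) = 0.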
What is the multivariate normal distribution?
f(x) = (2π)^(-n/2) |Σ|^(-1/2) exp(-(1/2) (x-μ)’Σ^-1(x-μ))
With μ = the n×1 vector of means
And Σ = the n×n covariance matrix (variances on the diagonal; it is diagonal only when the components are uncorrelated)
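A minimal NumPy sketch of the density, written to mirror the formula term by term. The numbers for μ, Σ and x are made up for illustration, and the helper names mvn_pdf and normal_pdf are hypothetical. With a diagonal Σ the joint density should equal the product of the univariate normal densities, which gives an easy check.

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x: (2*pi)^(-n/2) |Sigma|^(-1/2) exp(-q/2)."""
    n = len(mu)
    d = x - mu
    quad = d @ np.linalg.solve(sigma, d)  # q = (x-mu)' Sigma^-1 (x-mu)
    norm = (2 * np.pi) ** (-n / 2) * np.linalg.det(sigma) ** (-0.5)
    return norm * np.exp(-0.5 * quad)

def normal_pdf(z, m, var):
    """Univariate normal density, used only for the comparison."""
    return np.exp(-0.5 * (z - m) ** 2 / var) / np.sqrt(2 * np.pi * var)

# Illustrative values (assumptions, not from the flashcards).
mu = np.array([1.0, -2.0])
sigma = np.diag([4.0, 9.0])  # uncorrelated components, so Sigma is diagonal
x = np.array([0.5, 0.0])

joint = mvn_pdf(x, mu, sigma)
product = normal_pdf(x[0], mu[0], 4.0) * normal_pdf(x[1], mu[1], 9.0)
```

Here joint and product agree up to floating-point error; with off-diagonal covariances they would not.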
How can a (multivariate) normal distribution be rewritten to a chi squared distribution?
If x is N(μ, Σ) distributed with n components, then (x-μ)’Σ^-1(x-μ) is χ²(n) distributed, or
If x is N(0, I_n) distributed and matrix A is symmetric idempotent of rank r, then
x’Ax is χ²(r) distributed
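A small simulation sketch of the second statement. It builds a symmetric idempotent A of rank r as a projection matrix onto the columns of a random matrix Z (an illustrative choice; any symmetric idempotent matrix would do) and checks that the simulated x’Ax has roughly the mean r of a χ²(r) variable. The seed and sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 5, 2

# Projection onto the column space of Z: symmetric, idempotent, rank r.
Z = rng.standard_normal((n, r))
A = Z @ np.linalg.solve(Z.T @ Z, Z.T)

# Draw many x ~ N(0, I_n) and compute the quadratic form x'Ax for each draw.
x = rng.standard_normal((100_000, n))
q = np.einsum("ij,jk,ik->i", x, A, x)

sample_mean = q.mean()  # should be close to r, the mean of chi-squared(r)
```

This only compares the first moment; it is a sanity check, not a proof of the distributional result.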
How can a Normal and chi squared distribution be rewritten to a students t-distribution?
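If x is N(0, 1) distributed, w is χ²(n) distributed, and x and w are independent, then x/sqrt(w/n) is t(n) distributed. As n -> inf, this converges to the standard normal distribution.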
How can two chi squared distributions be rewritten to a F-distribution?
If v is χ²(m) distributed and w is χ²(n) distributed, then (v/m)/(w/n) is F(m, n) distributed (given that v and w are independent).
How can a Student’s t-distribution be rewritten to a F-distribution?
If t is t(n) distributed, then t^2 is F(1, n) distributed
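A one-line derivation sketch of this result, using the constructions from the earlier cards: x is N(0, 1) distributed, w is χ²(n) distributed, and the two are independent.

```latex
t = \frac{x}{\sqrt{w/n}}
\quad\Longrightarrow\quad
t^2 = \frac{x^2}{w/n} = \frac{x^2/1}{w/n} \sim F(1, n),
\qquad \text{since } x^2 \sim \chi^2(1) \text{ is independent of } w \sim \chi^2(n).
```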
What can be said about the limit of a F-distribution?
If n -> inf, then the denominator w/n (with w χ²(n) distributed) converges to 1, so F(m, n) converges to a (χ²(m))/m distribution
What is the derivation of the Least Squares estimation?
- We need to choose a β s.t. the sum of squared errors is minimized, so we create the function ɸ(β) = sum of u_i^2 = u’u = (y - Xβ)’(y - Xβ)
- We want to minimize this function, so first we expand it:
ɸ(β) = y’y - y’Xβ - β’X’y + β’X’Xβ
= y’y - 2y’Xβ + β’X’Xβ (because y’Xβ = β’X’y)
- Then we differentiate and equate to 0:
derivative of ɸ(β) with respect to β = -2y’X + 2β’X’X = 0
gives
X’Xb = X’y, so b = (X’X)^-1X’y (provided X’X is invertible)
- So we get yhat = Xb = X(X’X)^-1X’y
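The derivation above can be sketched numerically as follows. The data are simulated, and the "true" coefficients in beta are hypothetical values chosen only for illustration. The sketch solves the normal equations X’Xb = X’y directly and compares the result with NumPy's least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

# Design matrix with a constant and one regressor (simulated, illustrative).
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta = np.array([2.0, -1.0])  # hypothetical coefficients used to generate y
y = X @ beta + 0.1 * rng.standard_normal(n)

# Normal equations: X'X b = X'y. Solving is preferred over forming the inverse.
b = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ b  # fitted values, yhat = Xb
e = y - y_hat  # residuals

# Cross-check against NumPy's built-in least-squares solver.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The first-order condition shows up as X.T @ e being numerically zero, which is the same fact (X’e = 0) used later in the R2 card.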
What is the residual maker matrix, and why is it named like that?
- M = I_n - X(X’X)^-1X’, this is a symmetric idempotent matrix (M’ = M and MM = M).
- It’s named like that because multiplying y by M gives the residuals, i.e. e = y - Xb = y - X(X’X)^-1X’y = My
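A quick numerical check of these properties, as a sketch on random data (sizes and seed are arbitrary): M is symmetric and idempotent, it annihilates X, and My reproduces the least squares residuals.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 2
X = rng.standard_normal((n, k))
y = rng.standard_normal(n)

# Residual maker: M = I_n - X (X'X)^-1 X'
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

# Residuals computed the long way, via b = (X'X)^-1 X'y
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

is_symmetric = np.allclose(M, M.T)
is_idempotent = np.allclose(M @ M, M)
kills_X = np.allclose(M @ X, 0.0)  # MX = 0
makes_residuals = np.allclose(M @ y, e)  # My = e
```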
What is the formula for b?
b = (X’X)^-1X’y
What is the least absolute deviations estimator?
Instead of minimizing the sum of u_i^2, we now minimize the sum of the absolute values |u_i|.
What are the advantages and disadvantages of the least absolute deviations?
Advantages:
- Resistant to outliers
Disadvantages:
- Might have multiple solutions
- The objective is not differentiable wherever a residual equals zero, which makes theoretical results much harder to derive
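Both points can be illustrated with the simplest possible model, a regression on a constant only (an assumption made here to keep the example tiny). In that case least squares gives the sample mean and least absolute deviations gives the sample median. The data are made up, with one deliberate outlier.

```python
import statistics

# Made-up sample with one outlier (100.0).
y = [1.0, 2.0, 3.0, 4.0, 100.0]

def sum_abs(data, c):
    """LAD objective for a constant-only model with fitted value c."""
    return sum(abs(v - c) for v in data)

ls_fit = statistics.mean(y)  # least squares fit: pulled towards the outlier
lad_fit = statistics.median(y)  # LAD fit: unaffected by how large the outlier is

# With an even number of observations the LAD objective is flat between the
# two middle values, so every c in [2, 3] is a minimizer: multiple solutions.
y_even = [1.0, 2.0, 3.0, 4.0]
abs_loss = [sum_abs(y_even, c) for c in (2.0, 2.5, 3.0)]
```

Here ls_fit is 22.0 while lad_fit is 3.0, and the three entries of abs_loss are identical.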
How is R^2 derived?
- We have that y = Xb + e = yhat + e
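- Since X’e = 0, we have yhat’e = b’X’e = 0, thus y’y = yhat’yhat + e’e. So we can create the ratio:
(yhat’yhat)/(y’y) = 1 - (e’e)/(y’y)
- If we create the symmetric idempotent matrix A = I_n - (1/n)ιι’ (ι = vector of ones; A takes deviations from the mean, and Ae = e when the model contains a constant term), then
Ay = AXb + Ae = AXb + e, so then
y’Ay = b’X’AXb + e’e, because the cross term b’X’Ae = b’X’e = 0 (a.k.a. SST (a.k.a. TSS) = SSE + SSR, i.e. total = explained + residual sum of squares)
- Thus we can usually write R^2 = 1 - SSR/SST (a.k.a. TSS) = 1 - (e’e)/(y’Ay)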
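A numerical sketch of the decomposition on simulated data. It assumes A is the centering matrix I_n - (1/n)ιι’ and that X contains a constant column, which is what makes Ae = e hold. The coefficients used to generate y are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40

# Simulated data; X includes a constant so the residuals sum to zero.
X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.standard_normal(n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

# Centering matrix: A v subtracts the mean of v from every element of v.
A = np.eye(n) - np.ones((n, n)) / n

sst = y @ A @ y  # total sum of squares, y'Ay
sse = b @ X.T @ A @ X @ b  # explained sum of squares, b'X'AXb
ssr = e @ e  # residual sum of squares, e'e

r2 = 1 - ssr / sst
```

With a constant in the model, r2 also equals the squared sample correlation between y and the fitted values, which is a convenient cross-check.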
(AB)’ = … ?
B’A’
(AB)^-1 = … ?
B^-1A^-1 (for invertible square matrices A and B of the same size)
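Both rules are easy to sanity-check numerically. The sketch below uses random 3×3 matrices with a multiple of the identity added, an arbitrary choice that keeps them comfortably invertible for this seed.

```python
import numpy as np

rng = np.random.default_rng(4)

# Random square matrices, shifted by 3*I so they are well away from singular.
A = rng.standard_normal((3, 3)) + 3 * np.eye(3)
B = rng.standard_normal((3, 3)) + 3 * np.eye(3)

# (AB)' = B'A'
transpose_rule = np.allclose((A @ B).T, B.T @ A.T)

# (AB)^-1 = B^-1 A^-1
inverse_rule = np.allclose(np.linalg.inv(A @ B),
                           np.linalg.inv(B) @ np.linalg.inv(A))

# The order matters: A'B' is in general not equal to (AB)'.
wrong_order = np.allclose((A @ B).T, A.T @ B.T)
```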