Chapter 1 Linear regression Flashcards
A1. Linearity, implication
The model is linear in the parameters: y_i = x_i'beta + epsilon_i for i = 1, ..., n. Implication: the marginal effect of each regressor does not depend on the level of the regressors.
A2. Strict exogeneity
E(epsilon_i|X) = 0 for all i: conditional on the regressors of all observations (not just observation i), the error has mean zero.
A3. No multicollinearity
The rank of the nxK data matrix X is K with probability 1 (no column of X is a linear combination of the others).
A4. Spherical error variance
E(epsilon_i^2|X) = sigma^2 > 0 for all i (conditional homoskedasticity) and E(epsilon_i epsilon_j|X) = 0 for i != j (no conditional correlation between observations).
Implications of A2
1. E(epsilon_i) = 0 for all i (by the law of total expectations)
2. E(x_jk epsilon_i) = 0 for all i, j, k (orthogonality), and therefore Cov(x_jk, epsilon_i) = 0
The assumption itself is justified by economic theory, not by econometrics, and is usually not satisfied by time series data (see the example below).
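A standard illustration of the time-series failure, with an AR(1) model as the example: if y_t = beta*y_{t-1} + epsilon_t, the regressor of observation t+1 is y_t, which depends on epsilon_t, so E(epsilon_t|X) != 0 even though epsilon_t may be uncorrelated with its own regressor y_{t-1}. Strict exogeneity rules out any feedback from the error into future regressors.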
Implications of A4
Combined with A2, Var(epsilon_i|X) = E(epsilon_i^2|X) = sigma^2 is constant across i, and Cov(epsilon_i, epsilon_j|X) = 0 for i != j, so Var(epsilon|X) = sigma^2 I_n.
How does iid sampling affect our assumptions' restrictiveness?
Under random sampling the observations (x_i, epsilon_i) are independent across i, so A2 reduces to E(epsilon_i|x_i) = 0: the error only needs to be mean-independent of its own observation's regressors, which makes A2 less restrictive.
E(epsilon_i^2) remains constant across i (unconditional homoskedasticity), but the value of E(epsilon_i^2|x_i) may still differ across i, therefore A4 remains restrictive.
How do we look for the parameters in OLS? Does it make sense?
We choose b to minimize the sum of squared residuals SSR (the loss function).
It makes sense if we want to predict, but not necessarily if we want to give the coefficients a causal interpretation.
SSR formula
SSR(b) = (y - Xb)'(y - Xb) = sum_i (y_i - x_i'b)^2
To isolate b from the first-order condition of the minimized SSR we need the inverse of X'X to exist. Is this fulfilled?
Yes:
1. By A3, X has rank K with probability 1, so X'X also has rank K and its determinant is different from 0
2. X'X is a KxK square matrix by construction
3. n > K, so full column rank is attainable
The resulting estimator is derived below.
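A minimal derivation sketch in the notation above:

SSR(b) = (y - Xb)'(y - Xb) = y'y - 2b'X'y + b'X'Xb
dSSR/db = -2X'y + 2X'Xb = 0 (first-order condition)
X'Xb = X'y (normal equations)
b = (X'X)^{-1}X'y (the inverse exists by the argument above)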
Projection matrix
P=X(X’X)^{-1}X’
PY
=Xb
Annihilator matrix
M=I-P
MY
=Y-Xb=e
(residuals)
Properties of M and P
Both are symmetric (A' = A) and idempotent (AA = A); see the numerical check below.
PX
=X
MX
=0
PM
=0
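A minimal numerical check of these identities, as a sketch assuming numpy is available (the simulated X is purely illustrative):

import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
X = rng.normal(size=(n, k))           # n x K data matrix, full column rank w.p. 1
P = X @ np.linalg.inv(X.T @ X) @ X.T  # projection matrix
M = np.eye(n) - P                     # annihilator matrix

assert np.allclose(P, P.T) and np.allclose(M, M.T)      # symmetric
assert np.allclose(P @ P, P) and np.allclose(M @ M, M)  # idempotent (AA = A)
assert np.allclose(P @ X, X)  # PX = X
assert np.allclose(M @ X, 0)  # MX = 0
assert np.allclose(P @ M, 0)  # PM = 0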
Finding the variance in OLS (sigma^2)
Since we do not observe epsilon_i, we estimate sigma^2 from the residuals: s^2 = e'e/(n - K) = SSR/(n - K), which is unbiased under A1-A4 (see the sketch below).
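A sketch of why dividing by n - K gives unbiasedness, under A1-A4:

e = My = M(Xbeta + epsilon) = Mepsilon (since MX = 0)
E(e'e|X) = E(epsilon'Mepsilon|X) = sigma^2 trace(M) = sigma^2 (n - K)
hence E(s^2) = sigma^2.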
Finding the R^2
Uncentered: R^2 = 1 - e'e/(y'y), the share of the variation of y (around zero) explained by the fitted values.
Centered R^2
Centered: R^2 = 1 - e'e / sum_i (y_i - ybar)^2. By removing the mean, the centered R^2 describes the explanatory power of the Xs beyond the constant term, not of the mean itself. A computation sketch follows below.
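A minimal computation sketch, assuming numpy; the data-generating process is purely illustrative:

import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # constant + 2 regressors
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)     # illustrative DGP
b = np.linalg.solve(X.T @ X, X.T @ y)  # OLS: solve the normal equations
e = y - X @ b                          # residuals
r2_uncentered = 1 - (e @ e) / (y @ y)
r2_centered = 1 - (e @ e) / np.sum((y - y.mean()) ** 2)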
Influence of an observation
b - b_{(i)} = (1/(1 - p_i)) (X'X)^{-1} x_i e_i, where the subindex (i) indicates the estimator computed without the ith observation.
p_i = x_i'(X'X)^{-1}x_i, the ith diagonal element of P
trace P = K.
If all observations contribute similarly, each p_i is approximately K/n. If observation i is an outlier in the regressors, p_i is much larger (see the check below).
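A quick numerical check of the trace property, again assuming numpy (illustrative data):

import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
X = rng.normal(size=(n, k))
P = X @ np.linalg.inv(X.T @ X) @ X.T
p = np.diag(P)                 # leverage p_i of each observation
assert np.isclose(p.sum(), k)  # trace P = K, so the average leverage is K/n
print(p.max(), k / n)          # an outlier in X would push its p_i far above K/n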
Statistical properties of b: 1. Unbiasedness (E(b) = beta).
Which assumptions do we need?
- Linearity, to substitute y = Xbeta + epsilon into b = (X'X)^{-1}X'y
- Strict exogeneity, to make the term (X'X)^{-1}X'E(epsilon|X) vanish
- No multicollinearity, so that the inverse exists
(A sketch follows below.) Note: if A2 does not hold (as is typical in time series), b is biased.
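A minimal proof sketch:

b = (X'X)^{-1}X'y = (X'X)^{-1}X'(Xbeta + epsilon) = beta + (X'X)^{-1}X'epsilon (A1, A3)
E(b|X) = beta + (X'X)^{-1}X'E(epsilon|X) = beta (A2)
E(b) = E(E(b|X)) = beta, by the law of total expectations.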
Definition of conditional variance for a vector
Var(b|X) = E[(b - E(b|X))(b - E(b|X))'|X], a KxK matrix with the variances on the diagonal and the covariances off it.
Statistical properties of b: 2. BLUE.
Develop the variance of b OLS under conditional homoskedasticity
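Using b - beta = (X'X)^{-1}X'epsilon from the unbiasedness sketch:

Var(b|X) = (X'X)^{-1}X' E(epsilon epsilon'|X) X(X'X)^{-1}
         = (X'X)^{-1}X' (sigma^2 I_n) X(X'X)^{-1} (by A2 and A4)
         = sigma^2 (X'X)^{-1}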
By the Gauss-Markov theorem, this is the smallest conditional variance among all linear unbiased estimators of beta, so b is BLUE.